Putting Structured Business Vocabularies to Work

                                                                                November 4, 2008
                                               Data Management and Information Quality Conference
                                                                                          IRM UK

                                                                                        Ian Davis
                                                     Global Project Manager, Dow Jones & Company




© Copyright 2008 Dow Jones and Company, Inc.
What we’ll cover today:

        Understanding the challenges of controlled versus
         uncontrolled vocabularies

        Developing a strategy to create and maintain
         controlled vocabularies

        Identifying how you want to integrate your controlled
         vocabularies into your systems

        Understanding the requirements of integrating
         controlled vocabularies into multiple applications




Setting the Context




Once upon a time…


        Most of the business was IT enabled.
        There was some degree of “sharing” of information
         and content, there were even some large, well
         structured document repositories.
        Yet, no one could find anything.
        Actually, they found things,
          but not what they wanted when they wanted it
          and they were never sure they found the “best” or “saw
            it all”.




Once upon a time…


        The C-level executives were a bit irritated.
          They’d spent lots on the technology
          and people really weren’t much more efficient,
          the pinch point in the workflow had simply
           moved further downstream.
        So, what happened next?




Once upon a time…


        They SPENT <more> MONEY and bought the
         best in class search utilities.
        Yet, no one could find anything.
        Actually, they found things,
          but not what they wanted when they wanted it
          and they were never sure they found the “best”
           or “saw it all”.




Once upon a time…


        The C-level executives became a bit more
         irritated.
        Everyone was a bit frustrated.
        What was missing?




Optimized?

                 Is the search utility optimized using all the
                  bells and whistles it came with?
                  Relevancy rankings
                  “Thesaurus” files (synonym lists)
                  Multi-lingual capabilities
                  Common searches saved and presented to
                     users
                  Logs reviewed to understand user issues




Usable?
                 Is the user interface considerate to users?
                  Was it designed with YOUR users in mind?
                     Designed for occasional users?
                     Designed for power users?
                  Was it designed with YOUR business in mind?
                     Task-based views for context sensitive
                       searches
                     Present results in a format readily used
                       within work flows



Metadata?

                 Are there required metadata fields within the CMS?
                  Author, Title, Language, Topic, Product/Service, etc.
                 Are the entry values to those fields controlled?
                  Lookups against authority files, taxonomies, thesauri
                 Does the search utility support fielded searches?
                 Does the search utility weight terms within metadata
                  fields higher than free-text?




Metadata?
                 For example:
                  If a financial analyst enters the query term “stock”
                     within the company’s knowledge base,
                  Will he get back results with the documents
                     specifically discussing “stock” as a financial
                     instrument listed first?
                  Or will he have to look through hundreds of documents
                  discussing what’s relevant to him as well as every
                  document that references free-text in the body of
                  the document about:
                     soup stock (food industry),
                     cows (livestock industry),
                 or stock car racing (professional sports industry)?


Metadata?
                 Precise and comprehensive searches
                  Only if controlled vocabularies have been used to
                     populate metadata fields
                 AND
                  The search utility takes advantage of that by giving
                     priority to query term occurrence within controlled
                     value metadata fields
                 OR
                  Fielded searches are enabled
                     e.g. <Author = Smith> + <Service = Consulting> +
                        <Industry = Automotive> + <Date = January 2006>
                        + <Content Type = Proposal>
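
The two options above can be sketched in a few lines of Python. This is only an illustration, not the behavior of any particular search product: the sample records, the field names, and the weight of 5.0 for a controlled metadata field are all assumptions invented for the example.

```python
# Hypothetical documents: "topic" and "industry" hold controlled-vocabulary
# values, "body" holds free text. All records are invented for illustration.
DOCS = [
    {"id": 1, "topic": "stock", "industry": "finance",
     "body": "Quarterly outlook for listed equities."},
    {"id": 2, "topic": "soup", "industry": "food",
     "body": "Simmer the stock for two hours before serving."},
    {"id": 3, "topic": "motor racing", "industry": "sports",
     "body": "The stock car season opens next month."},
]

FIELD_WEIGHT = 5.0  # assumed boost for a hit in a controlled metadata field
BODY_WEIGHT = 1.0   # assumed weight for each hit in free text


def score(doc, term):
    """Score one document, weighting a metadata hit above free-text hits."""
    term = term.lower()
    total = 0.0
    if doc["topic"].lower() == term:
        total += FIELD_WEIGHT
    total += BODY_WEIGHT * doc["body"].lower().split().count(term)
    return total


def ranked_search(docs, term):
    """Return ids of matching documents, highest score first."""
    scored = [(score(d, term), d["id"]) for d in docs]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in ordered if s > 0]


def fielded_search(docs, **criteria):
    """Return ids of documents whose metadata equals every criterion."""
    return [d["id"] for d in docs
            if all(d.get(name) == value for name, value in criteria.items())]
```

With these sample records, `ranked_search(DOCS, "stock")` lists the financial document first because its controlled topic field matches, while `fielded_search(DOCS, topic="stock", industry="finance")` skips the soup and racing documents entirely.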


Challenges:
                Controlled versus Uncontrolled




Controlled Vocabularies Explained


        Authority files
           e.g. Company’s active directory, ISO standard for Languages
           Typically a flat list of allowed values
        Taxonomies
           e.g. Linnaean Classification (kingdom, phylum, class, order,
            family, genus, and species)
           Typically includes only hierarchical relationships between terms
        Thesauri
           e.g. NASA Thesaurus (http://www.sti.nasa.gov/thesfrm1.htm)
           Includes full set of semantic relationships defined between terms
            (hierarchical, associative, equivalence)
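
One way to picture the difference between the three types is as data structures. The sketch below is an assumption-laden illustration: the class name `ThesaurusTerm`, the field names, and the sample entries are hypothetical, and real vocabulary tools model far more detail.

```python
from dataclasses import dataclass, field

# Authority file: a flat list of allowed values
# (here, a few ISO 639-1 language codes).
LANGUAGE_AUTHORITY = {"en", "fr", "de", "es"}

# Taxonomy: hierarchical relationships only, stored as child -> parent.
TAXONOMY_PARENT = {
    "mammals": "vertebrates",
    "dogs": "mammals",
}


@dataclass
class ThesaurusTerm:
    """One thesaurus entry carrying all three relationship types."""
    preferred: str
    broader: list = field(default_factory=list)   # hierarchical, upward
    narrower: list = field(default_factory=list)  # hierarchical, downward
    related: list = field(default_factory=list)   # associative
    used_for: list = field(default_factory=list)  # equivalence (synonyms)


THESAURUS = {
    "dogs": ThesaurusTerm("dogs", broader=["mammals"], used_for=["canines"]),
    "accounting": ThesaurusTerm("accounting", related=["accountant"]),
}


def is_allowed(value, authority):
    """Check a metadata entry value against an authority file."""
    return value in authority


def ancestors(term, parent_of):
    """Walk up a taxonomy from a term toward the root."""
    chain = []
    while term in parent_of:
        term = parent_of[term]
        chain.append(term)
    return chain
```

The authority file can only answer "is this value allowed?", the taxonomy can also answer "what is this a kind of?", and only the thesaurus entry records synonyms and associated terms.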




NASA Thesaurus – Sample Entry




Semantic Relationships

        Hierarchical
           Superordination - representing a class or a whole, and
            subordination - referring to members or parts
              e.g. mammals and vertebrates
              e.g. cherry pie and cherry pie slices
        Equivalence
           One concept expressed by two or more terms
              e.g. dogs and canines
        Associative
           Terms that are conceptually linked, but not through
            hierarchy or equivalence
              e.g. accounting and accountant

Challenges – Uncontrolled Vocabularies

        Uncontrolled vocabularies are:
          Comprehensive but noisy
             Only comprehensive if synonym lists are
              used
          Limited in their precision and relevancy
             Time lost scanning through hundreds of
              “miss” hits
          Less effective for cross-repository
           searches
             Limited ways to disambiguate ‘soup stock’
              from ‘stock car’

Challenges - Controlled Vocabularies

        Controlled vocabularies bring their own challenges:
          Potentially significant overhead effort (manual
           and technical)
          Organizational politics that can add YEARS to
           establishing an initial set of controlled
           vocabularies
          A lack of basic understanding of what
           controlled vocabularies are and how they work,
           which impedes effective development and utilization



Challenges - Controlled Vocabularies

        Controlled vocabularies:
                   Richness and power come from a full set of semantic
                   relationships, not just hierarchical ones
                     Hierarchy supports the ability to narrow and broaden
                      search queries
                     Association supports “did you mean” and “you might
                      also want to look at”
                     Equivalence lets users search in familiar language
                      and still retrieve content that is conceptually on
                      target but never uses their term
                          e.g. user enters dog and search utility expands
                           query to include “canine, k-9, puppy”
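
The three uses above can be sketched as simple look-ups. The relationship tables below are hypothetical and only mirror the examples from these slides; a real search utility would draw them from the managed vocabulary.

```python
# Hypothetical relationship tables built from the slide examples.
EQUIVALENT = {"dog": ["canine", "k-9", "puppy"]}
BROADER = {"dog": ["mammal"]}
NARROWER = {"mammal": ["dog"]}
RELATED = {"accounting": ["accountant"]}


def expand_query(term):
    """Equivalence: add synonyms so content using other wording still matches."""
    return [term] + EQUIVALENT.get(term, [])


def broaden(term):
    """Hierarchy: move up to broader terms to widen a search."""
    return BROADER.get(term, [])


def narrow(term):
    """Hierarchy: move down to narrower terms to focus a search."""
    return NARROWER.get(term, [])


def suggestions(term):
    """Association: candidates for 'you might also want to look at'."""
    return RELATED.get(term, [])


def matches(text, term):
    """True if the text contains the term or any of its equivalents."""
    words = set(text.lower().split())
    return any(candidate in words for candidate in expand_query(term))
```

A document titled "Canine training tips" never uses the word "dog", yet `matches("Canine training tips", "dog")` succeeds because the query was expanded through the equivalence table.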


Challenges - Controlled Vocabularies

        Controlled vocabularies:
          Richness and power come at the cost of
           added complexity of development,
           implementation, integration and maintenance
          Utilization of controlled vocabularies can
           produce performance issues
             During search index creation
             During query run time




Tackling the Challenges




Strategy – Creation and Maintenance


                 State the business case clearly
                  Benefits
                     Reduced time for knowledge discovery
                     Increased richness of knowledge discovery
                     Decreased risk to firm of making business
                       decisions with partial information
                  Scope
                     One business unit or enterprise-wide?
                  Resource requirements
                     Skill sets (IS, IT, business knowledge)
                     Time commitment


Strategy – Creation and Maintenance


                 Tackle organizational politics head-on
                  Gain credibility and ensure usability by establishing a
                    cross-functional working committee that will become
                    the Review Committee
                  Include all major stakeholder groups and any
                    interested parties (even the non-supporters)
                  Establish methods of broadly soliciting end-user input
                    that will become a source of change requests during
                    maintenance phases




Strategy – Creation and Maintenance


                 Additional considerations before you start:
                  How rigorous does it need to be?
                    What external standards should be adopted?
                        ANSI/NISO Z39.19-2005
                        British Standard – BS 8723
                    What internal standards should be developed?
                        Editorial Guidelines
                        Usage Guidelines
                  How extensive will it be?
                    Depth and breadth within and across facets
                  What about adaptability and flexibility?
                    Will there be a need for local extensions?


Strategy – Creation and Maintenance


                 Additional considerations before you start:
                  Projected frequency of revisions
                    How quickly does the content base change with
                       respect to concepts; is there significant content
                       drift?
                    How volatile is the language?
                        Management consulting vs. accounting
                  Vocabulary Management Software
                    DON’T spend money just to spend money
                    However, you CAN’T manage controlled
                       vocabularies in a spreadsheet
                    Buy the tool you need based on your documented
                       functional requirements

Strategy – Integration Choices

        Performance trade-offs
           Store UIDs within content, then use look-up table at
            query run time
           Store full-text of a term, then touch all content when
            taxonomy value changes (must re-assign new term
            value)
        Version control
           Use static versions of controlled vocabularies within
            CMS and search utilities, releasing new versions
            periodically
           Use dynamic version of controlled vocabularies with
            continuous revisions occurring
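
The performance trade-off can be made concrete with a small sketch. The UIDs (`T100`, `T200`), labels, and content records are invented for illustration; the point is only where the cost of a term rename lands under each option.

```python
# Hypothetical vocabulary look-up table: UID -> current preferred label.
LABELS = {"T100": "Automotive", "T200": "Consulting"}

# Option 1: content stores UIDs; labels are resolved at query run time.
CONTENT_BY_UID = [
    {"id": 1, "industry": "T100"},
    {"id": 2, "industry": "T100"},
    {"id": 3, "industry": "T200"},
]

# Option 2: content stores the full text of the term.
CONTENT_BY_TEXT = [
    {"id": 1, "industry": "Automotive"},
    {"id": 2, "industry": "Automotive"},
    {"id": 3, "industry": "Consulting"},
]


def rename_with_uids(labels, uid, new_label):
    """Rename a term with one write to the look-up table."""
    labels[uid] = new_label
    return 0  # number of content records that had to be rewritten


def rename_with_text(content, field_name, old_label, new_label):
    """Rename a term by rewriting every record that carries the old label."""
    touched = 0
    for record in content:
        if record[field_name] == old_label:
            record[field_name] = new_label
            touched += 1
    return touched


def display_label(record, field_name, labels):
    """The look-up that the UID option pays for on every read."""
    return labels[record[field_name]]
```

Storing UIDs makes a rename cheap but adds a look-up to every query; storing text makes reads direct but forces a pass over the content whenever a taxonomy value changes.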


Strategy – Integration Choices

        Utilizing semantic relationships
          Store full set (term values or UIDs) within
           content record
         OR
          Store single UID and have search utility use
           reference tables to determine related terms
        Display of semantic relationships
          User interface considerations for effective
           presentation of non-hierarchically related terms
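
The two storage options for semantic relationships can be compared with a small sketch. The UIDs and the reference table `RELATED_UIDS` are hypothetical, and the table is assumed to be symmetric for simplicity.

```python
# Hypothetical reference table: UID -> UIDs of semantically related terms.
RELATED_UIDS = {"T1": ["T2", "T3"], "T2": ["T1"], "T3": ["T1"]}


def index_with_full_set(primary_uid):
    """Option 1: store the assigned term plus all related terms in the record."""
    return {"terms": [primary_uid] + RELATED_UIDS.get(primary_uid, [])}


def index_with_single_uid(primary_uid):
    """Option 2: store only the assigned term in the record."""
    return {"terms": [primary_uid]}


def search_full_set(records, uid):
    """Related terms are already in each record, so match directly."""
    return [i for i, record in enumerate(records) if uid in record["terms"]]


def search_single_uid(records, uid):
    """Resolve related terms from the reference table at query time."""
    wanted = {uid, *RELATED_UIDS.get(uid, [])}
    return [i for i, record in enumerate(records)
            if wanted & set(record["terms"])]
```

For three records assigned `T1`, `T2`, and `T3`, a search on `T2` returns the first two records under either option in this example. The difference is where the work happens: the full-set option does it at index time and must re-index when relationships change, while the single-UID option does it on every query.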


Strategy – Integration Choices


        [Diagram: layout of a search results page]
           Top: query entry (including ability to broaden or
            narrow current search results)
           Left: browse navigation options
           Center: previous query statement user entered, plus
            any auto-expansion done by engine
           Right: related topics (defined through associative
            relationships)
           Bottom: query results listing




Strategy – Multiple Applications

        Expanding the adoption and use of controlled
         vocabularies
           Know the business objectives of the applications
               In conjunction with the search utility, does the
                controlled vocabulary enable this objective?
           Are there metadata fields available within current
            application for the controlled vocabulary?
           Does the business have resources to assign the
            controlled vocabulary?
           What format does the controlled vocabulary need to be
            in to be integrated with the application?



Strategy – Multiple Applications

        Additional considerations
          Will there be conflicting version management
           needs?
          How does search currently index these
           applications and will that change with the use
           of controlled vocabularies?




Five Key Points

       1. Controlled vocabularies are a lever to improve
          precision and comprehensiveness
       2. Controlled vocabularies are never finished – they are
          always a work in process
       3. Search utilities can only be tweaked so far
       4. Tapping into the richness of the semantic
          relationships between terms can be extremely
          powerful
       5. There are lots of options for implementing and
          integrating controlled vocabularies




Thank you for your attention!

                           Ian Davis
                           ian.davis@dowjones.com




 
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdf
 
Boost the utilization of your HCL environment by reevaluating use cases and f...
Boost the utilization of your HCL environment by reevaluating use cases and f...Boost the utilization of your HCL environment by reevaluating use cases and f...
Boost the utilization of your HCL environment by reevaluating use cases and f...
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Call Girls in Gomti Nagar - 7388211116  - With room ServiceCall Girls in Gomti Nagar - 7388211116  - With room Service
Call Girls in Gomti Nagar - 7388211116 - With room Service
 
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf0183760ssssssssssssssssssssssssssss00101011 (27).pdf
0183760ssssssssssssssssssssssssssss00101011 (27).pdf
 
9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi
9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi
9599632723 Top Call Girls in Delhi at your Door Step Available 24x7 Delhi
 
GD Birla and his contribution in management
GD Birla and his contribution in managementGD Birla and his contribution in management
GD Birla and his contribution in management
 
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
 
M.C Lodges -- Guest House in Jhang.
M.C Lodges --  Guest House in Jhang.M.C Lodges --  Guest House in Jhang.
M.C Lodges -- Guest House in Jhang.
 
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
 
Insurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usageInsurers' journeys to build a mastery in the IoT usage
Insurers' journeys to build a mastery in the IoT usage
 
Call Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine ServiceCall Girls In Panjim North Goa 9971646499 Genuine Service
Call Girls In Panjim North Goa 9971646499 Genuine Service
 
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRLMONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
MONA 98765-12871 CALL GIRLS IN LUDHIANA LUDHIANA CALL GIRL
 
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
Keppel Ltd. 1Q 2024 Business Update  Presentation SlidesKeppel Ltd. 1Q 2024 Business Update  Presentation Slides
Keppel Ltd. 1Q 2024 Business Update Presentation Slides
 
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...
 
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒VIP Call Girls In Saharaganj ( Lucknow  ) 🔝 8923113531 🔝  Cash Payment (COD) 👒
VIP Call Girls In Saharaganj ( Lucknow ) 🔝 8923113531 🔝 Cash Payment (COD) 👒
 
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call GirlsCash Payment 9602870969 Escort Service in Udaipur Call Girls
Cash Payment 9602870969 Escort Service in Udaipur Call Girls
 

Putting Controlled Vocabulary To Work I Davis 2008

  • 1. Putting Structured Business Vocabularies to Work. November 4, 2008. Data Management and Information Quality Conference, IRM UK. Ian Davis, Global Project Manager, Dow Jones & Company. © Copyright 2008 Dow Jones and Company, Inc.
  • 2. What we’ll cover today:
      - Understanding the challenges of controlled versus uncontrolled vocabularies
      - Developing a strategy to create and maintain controlled vocabularies
      - Identifying how you want to integrate your controlled vocabularies into your systems
      - Understanding the requirements of integrating controlled vocabularies into multiple applications
  • 3. Setting the Context
  • 4. Once upon a time…
      - Most of the business was IT enabled.
      - There was some degree of “sharing” of information and content; there were even some large, well-structured document repositories.
      - Yet, no one could find anything.
      - Actually, they found things,
          - but not what they wanted when they wanted it
          - and they were never sure they found the “best” or “saw it all”.
  • 5. Once upon a time…
      - The C-level executives were a bit irritated.
          - They’d spent lots on the technology
          - and people really weren’t much more efficient,
          - the pinch point in the workflow had simply moved further downstream.
      - So, what happened next?
  • 6. Once upon a time…
      - They SPENT <more> MONEY and bought the best-in-class search utilities.
      - Yet, no one could find anything.
      - Actually, they found things,
          - but not what they wanted when they wanted it
          - and they were never sure they found the “best” or “saw it all”.
  • 7. Once upon a time…
      - The C-level executives became a bit more irritated.
      - Everyone was a bit frustrated.
      - What was missing?
  • 8. Optimized?
      - Is the search utility optimized using all the bells and whistles it came with?
          - Relevancy rankings
          - “Thesaurus” files (synonym lists)
          - Multilingual capabilities
          - Common searches saved and presented to users
          - Logs reviewed to understand user issues
  • 9. Usable?
      - Is the user interface considerate to users?
      - Was it designed with YOUR users in mind?
          - Designed for occasional users?
          - Designed for power users?
      - Was it designed with YOUR business in mind?
          - Task-based views for context-sensitive searches
          - Results presented in a format readily used within workflows
  • 10. Metadata?
      - Are there required metadata fields within the CMS?
          - Author, Title, Language, Topic, Product/Service, etc.
      - Are the entry values to those fields controlled?
          - Lookups against authority files, taxonomies, thesauri
      - Does the search utility support fielded searches?
      - Does the search utility weight terms within metadata fields higher than free text?
  • 11. Metadata?
      - For example: if a financial analyst enters the query term “stock” within the company’s knowledge base,
          - will he get back results with the documents specifically discussing “stock” as a financial instrument listed first?
          - Or will he have to look through hundreds of documents, both those relevant to him and every document whose free-text body mentions:
              - soup stock (food industry),
              - cows (livestock industry),
              - or stock car racing (professional sports industry)?
  • 12. Metadata?
      - Precise and comprehensive searches are possible only if:
          - controlled vocabularies have been used to populate metadata fields, AND
          - the search utility takes advantage of that by giving priority to query term occurrence within controlled-value metadata fields, OR
          - fielded searches are enabled
              - e.g. <Author = Smith> + <Service = Consulting> + <Industry = Automotive> + <Date = January 2006> + <Content Type = Proposal>
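The fielded search described on slide 12 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document records, their ids, and the `fielded_search` helper are hypothetical and do not represent any particular search utility's API. The field names and the values Smith, Consulting, Automotive, and Proposal come from the slide; the other values are made up.

```python
from typing import Dict, List

# Hypothetical records: each metadata field holds a controlled value.
DOCUMENTS: List[Dict[str, str]] = [
    {"id": "doc-1", "Author": "Smith", "Service": "Consulting",
     "Industry": "Automotive", "Content Type": "Proposal"},
    {"id": "doc-2", "Author": "Smith", "Service": "Audit",
     "Industry": "Automotive", "Content Type": "Report"},
    {"id": "doc-3", "Author": "Jones", "Service": "Consulting",
     "Industry": "Retail", "Content Type": "Proposal"},
]


def fielded_search(documents: List[Dict[str, str]],
                   criteria: Dict[str, str]) -> List[str]:
    """Return ids of documents whose metadata matches every field=value pair."""
    return [
        doc["id"]
        for doc in documents
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    query = {"Author": "Smith", "Service": "Consulting", "Industry": "Automotive"}
    print(fielded_search(DOCUMENTS, query))
```

Exact matching on field values only works because the values are controlled: if one record said "Automotive" and another "Auto industry", the same query would silently miss content.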
  • 13. Challenges: Controlled versus Uncontrolled
  • 14. Controlled Vocabularies Explained
      - Authority files
          - e.g. Company’s active directory, ISO standard for Languages
          - Typically a flat list of allowed values
      - Taxonomies
          - e.g. Linnaean Classification (kingdom, phylum, class, order, family, genus, and species)
          - Typically includes only hierarchical relationships between terms
      - Thesauri
          - e.g. NASA Thesaurus (http://www.sti.nasa.gov/thesfrm1.htm)
          - Includes full set of semantic relationships defined between terms (hierarchical, associative, equivalence)
  • 15. NASA Thesaurus – Sample Entry
  • 16. Semantic Relationships
      - Hierarchical
          - Superordination (representing a class or a whole) and subordination (referring to members or parts)
          - e.g. mammals and vertebrates
          - e.g. cherry pie and cherry pie slices
      - Equivalence
          - One concept expressed by two or more terms
          - e.g. dogs and canines
      - Associative
          - Terms that are conceptually linked, but not through hierarchy or equivalence
          - e.g. accounting and accountant
  • 17. Challenges – Uncontrolled Vocabularies
      - Uncontrolled vocabularies are:
          - Comprehensive but noisy
          - Only comprehensive if synonym lists are used
          - Limited in their precision and relevancy
          - Time lost scanning through hundreds of “miss” hits
          - Reduced effectiveness of cross-repository searches
          - Limited ways to disambiguate ‘soup stock’ from ‘stock car’
  • 18. Challenges – Controlled Vocabularies
      - Controlled vocabularies can produce:
          - Potentially significant overhead effort (manual and technical)
          - Organizational politics can add YEARS to establishing an initial set of controlled vocabularies
          - A lack of basic understanding of what the controlled vocabularies are and how they work impedes effective development and utilization
  • 19. Challenges – Controlled Vocabularies
      - Controlled vocabularies:
          - Richness and power comes from a full set of semantic relationships, not just hierarchical ones
          - Hierarchy supports the ability to narrow and broaden search queries
          - Association supports “did you mean” and “you might also want to look at”
          - Equivalence enables the use of familiar language to retrieve content which is conceptually on target but never uses their term
              - e.g. user enters dog and search utility expands query to include “canine, k-9, puppy”
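The three relationship types on slides 16 and 19 map naturally onto a small data structure. The sketch below is a hedged illustration, not how any specific thesaurus tool stores terms: the `Term` class, the `THESAURUS` dictionary, and the helper functions are hypothetical names. The equivalents for "dog" come from slide 19; the broader term "mammal" and the related term "veterinarian" are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Term:
    preferred: str
    equivalents: List[str] = field(default_factory=list)  # equivalence
    broader: List[str] = field(default_factory=list)      # hierarchical (up)
    related: List[str] = field(default_factory=list)      # associative


# Hypothetical one-entry thesaurus keyed by preferred term.
THESAURUS: Dict[str, Term] = {
    "dog": Term(
        preferred="dog",
        equivalents=["canine", "k-9", "puppy"],  # from slide 19
        broader=["mammal"],                      # assumed for illustration
        related=["veterinarian"],                # assumed for illustration
    ),
}


def find_term(user_term: str, thesaurus: Dict[str, Term]) -> Optional[Term]:
    """Find the entry whose preferred or equivalent name matches the input."""
    needle = user_term.lower()
    for term in thesaurus.values():
        names = [term.preferred] + term.equivalents
        if needle in (name.lower() for name in names):
            return term
    return None


def expand_query(user_term: str, thesaurus: Dict[str, Term]) -> List[str]:
    """Equivalence: search on the preferred term plus all its equivalents."""
    term = find_term(user_term, thesaurus)
    return [term.preferred] + term.equivalents if term else [user_term]


def broaden(user_term: str, thesaurus: Dict[str, Term]) -> List[str]:
    """Hierarchy: offer broader terms to widen the search."""
    term = find_term(user_term, thesaurus)
    return list(term.broader) if term else []


def also_look_at(user_term: str, thesaurus: Dict[str, Term]) -> List[str]:
    """Association: suggestions for 'you might also want to look at'."""
    term = find_term(user_term, thesaurus)
    return list(term.related) if term else []


if __name__ == "__main__":
    print(expand_query("dog", THESAURUS))
    print(broaden("puppy", THESAURUS))
    print(also_look_at("canine", THESAURUS))
```

A term that is not in the thesaurus falls through unchanged, so the expansion never blocks a free-text search; it only enriches queries the vocabulary knows about.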
  • 20. Challenges – Controlled Vocabularies
      - Controlled vocabularies:
          - Richness and power comes at the cost of added complexity of development, implementation, integration and maintenance
          - Utilization of controlled vocabularies can produce performance issues
              - During search index creation
              - During query run time
  • 21. Tackling the Challenges
  • 22. Strategy – Creation and Maintenance
      - State the business case clearly
          - Benefits
              - Reduced time for knowledge discovery
              - Increased richness of knowledge discovery
              - Decreased risk to firm of making business decisions with partial information
          - Scope
              - One business unit or enterprise-wide?
          - Resource requirements
              - Skill sets (IS, IT, business knowledge)
              - Time commitment
  • 23. Strategy – Creation and Maintenance
      - Tackle organizational politics head-on
          - Gain credibility and ensure usability by establishing a cross-functional working committee that will become the Review Committee
          - Include all major stakeholder groups and any interested parties (even the non-supporters)
          - Establish methods of broadly soliciting end-user input that will become a source of change requests during maintenance phases
  • 24. Strategy – Creation and Maintenance
      - Additional considerations before you start:
          - How rigorous does it need to be?
              - What external standards should be adopted?
                  - ANSI/NISO Z39.19-2005
                  - British Standard – BS 8723
              - What internal standards should be developed?
                  - Editorial Guidelines
                  - Usage Guidelines
          - How extensive will it be?
              - Depth and breadth within and across facets
          - What about adaptability and flexibility?
              - Will there be a need for local extensions?
  • 25. Strategy – Creation and Maintenance
      - Additional considerations before you start:
          - Projected frequency of revisions
              - How quickly does the content base change with respect to concepts; is there significant content drift?
              - How volatile is the language?
                  - Management consulting vs. accounting
          - Vocabulary Management Software
              - DON’T spend money just to spend money
              - However, you CAN’T manage controlled vocabularies in a spreadsheet
              - Buy the tool you need based on your documented functional requirements
  • 26. Strategy – Integration Choices
      - Performance trade-offs
          - Store UIDs within content, then use look-up table at query run time
          - Store full-text of a term, then touch all content when taxonomy value changes (must re-assign new term value)
      - Version control
          - Use static versions of controlled vocabularies within CMS and search utilities, releasing new versions periodically
          - Use dynamic version of controlled vocabularies with continuous revisions occurring
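The performance trade-off on slide 26 can be made concrete with a small comparison. The sketch below assumes a simple in-memory model; the record layout, the UID "T001", the labels, and all function names are hypothetical. It contrasts what a term rename costs when content stores a UID versus when it stores the term text.

```python
from typing import Dict, List


def rename_term_by_uid(labels: Dict[str, str], uid: str, new_label: str) -> int:
    """UID approach: update one look-up row; no content records are touched."""
    labels[uid] = new_label
    return 0  # number of content records rewritten


def rename_term_by_text(documents: List[Dict[str, str]],
                        old_label: str, new_label: str) -> int:
    """Full-text approach: every record carrying the old term is rewritten."""
    touched = 0
    for doc in documents:
        if doc["topic"] == old_label:
            doc["topic"] = new_label
            touched += 1
    return touched


def display_topic(doc: Dict[str, str], labels: Dict[str, str]) -> str:
    """UID approach pays at query time: resolve the UID through the table."""
    return labels[doc["topic_uid"]]


if __name__ == "__main__":
    labels = {"T001": "Automotive"}
    uid_docs = [{"id": "doc-1", "topic_uid": "T001"},
                {"id": "doc-2", "topic_uid": "T001"}]
    text_docs = [{"id": "doc-1", "topic": "Automotive"},
                 {"id": "doc-2", "topic": "Automotive"}]

    print(rename_term_by_uid(labels, "T001", "Automotive Industry"))
    print(display_topic(uid_docs[0], labels))
    print(rename_term_by_text(text_docs, "Automotive", "Automotive Industry"))
```

In this model the UID approach makes vocabulary maintenance cheap and adds a look-up per query, while the full-text approach keeps queries simple and makes every rename a bulk content update.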
  • 27. Strategy – Integration Choices
      - Utilizing semantic relationships
          - Store full set (term values or UIDs) within content record, OR
          - Store single UID and have search utility use reference tables to determine related terms
      - Display of semantic relationships
          - User interface considerations for effective presentation of non-hierarchically related terms
  • 28. Strategy – Integration Choices (search screen mockup; labeled regions):
      - Query entry (including ability to broaden or narrow current search results)
      - Previous query statement user entered plus any auto-expansion done by engine
      - Related topics (defined through Associative relationships)
      - Browse navigation options
      - Query results listing
  • 29. Strategy – Multiple Applications
      - Expanding the adoption and use of controlled vocabularies
          - Know the business objectives of the applications
          - In conjunction with the search utility, does the controlled vocabulary enable this objective?
          - Are there metadata fields available within current application for the controlled vocabulary?
          - Does the business have resources to assign the controlled vocabulary?
          - What format does the controlled vocabulary need to be in to be integrated with the application?
  • 30. Strategy – Multiple Applications
      - Additional considerations
          - Will there be conflicting version management needs?
          - How does search currently index these applications and will that change with the use of controlled vocabularies?
  • 31. Five Key Points
      1. Controlled vocabularies are a lever to improve precision and comprehensiveness
      2. Controlled vocabularies are never finished – they are always a work in process
      3. Search utilities can only be tweaked so far
      4. Tapping into the richness of the semantic relationships between terms can be extremely powerful
      5. There are lots of options for implementing and integrating controlled vocabularies
  • 32. Thank you for your attention! Ian Davis, ian.davis@dowjones.com