Machine Aided Indexer™
Machine Aided Indexer™ is available as a
stand-alone version or as part of MAIstro™
(integrated with Thesaurus Master™).
M.A.I.™ creates a simple rulebase from your
thesaurus terms to use for categorizing
documents.
You can fine-tune the rulebase to reflect
editorial knowledge and judgment, specifying
when thesaurus terms should be used.
    Your result: Precision Indexing
M.A.I. under the hood
   Concept Extractor™
      Compares text to Knowledge Base rules to present
      suggested index terms
   Statistics Collector™
      Gathers and stores the index experience of the
      system, sorting into Hits / Misses / Noise
      Prioritizes terms needing rule fine-tuning to improve
      indexing accuracy
   Rule Builder™
      Human editor creates, edits, and reviews rules for
      indexing terms
[Diagram: M.A.I. workflow]
   IN: Text goes to the M.A.I. Concept Extractor, which draws on the Knowledge Base
   Concept Extractor produces a list of suggested terms from the controlled vocabulary
   Human review results in:
      Hits: selected terms
      Misses: added terms
      Noise: rejected terms
   OUT: Database holding the indexed set of documents
   Rule Builder: editor manages the Knowledge Base
   Statistics Collector improves the Knowledge Base
Objective in indexing:
apply indexing terms with...
   Accuracy
   Speed
   Depth -- specificity
   Breadth -- exhaustivity
   Consistency
Objective in M.A.I. rulebuilding:
make rules reflect human thinking for
optimal categorization
How?

 Formulate standard rules

    for interpreting text

    for applying thesaurus terms as subject
     metadata to index/categorize
     documents

 2/14/2012
Why use rules for indexing?
Rules provide consistent direction for
interpreting text and applying indexing terms.

Accurate indexing results in precise
information retrieval.
M.A.I.’s starter rulebase
   M.A.I. automatically generates rules
   Starter rules match exactly to words in text
       Identity rules for thesaurus terms
       Synonym rules for established NonPreferred
        terms
   Success out of the box depends on
       Taxonomy term expression of concepts
       Writer’s creative expression of concepts
Fine-tuned by editors, rules enable
   context clues to pinpoint word meaning
   “reading between the lines”
   natural language processing
   greater accuracy over simple rule
    indexing
       Use M.A.I.’s Rule Builder Module
     to fine-tune rules for applying terms.
Indexing and rule-building –
       two processes
 Indexing:
  Read and interpret document text

  Decide on indexing term

 Rulebuilding:
  Identify prompt word(s)
             What brought the indexing term to mind?
             This text to match in the document is the
              starting point for rule-building.
Indexer reads the document text


“Indian leaders are asking the government…”




Indexer considers indexing terms
             “government”
                      State government?
                      Federal government?
                      City government?
             “Indians”
                      in India?
                      Indigenous people?
                      Native Americans?
Indexer selects indexing terms

“Indian leaders are asking the government
   to prevent a repeat of the 1990 census
   undercount that missed nearly 3000
   Indians in New Mexico.”


M.A.I. term suggestions
   Government
   New Mexico

Use your knowledge to select best terms –
 from M.A.I. suggested terms

 from thesaurus


Decide on indexing terms and apply them to
  document.
Indexing done,
rule-building begins
The rule-building editor’s question:

     What words in the text prompted
     selection of those terms?

This word (or words) is the starting point for
building a rule with M.A.I. – the “gatekeeper.”
Choose the MAI Rule Builder tab


A rule has two parts:
   Text to Match
   rule body

Viewing options: font, style, size
M.A.I. rule starts with
Text to Match
  The prompt word (or word part or
  phrase) in the document --

  whatever made the indexer think of
  a specific indexing term --

  becomes the Text to Match of a rule.
Importance of Text to Match
   TTM opens the door to the rulebase
       Without a word or phrase to match, the
        knowledge of the rulebase is unavailable.

   M.A.I. system programmatically creates a
    starter rulebase
       Identity rules – exact match to thesaurus term
       Synonym rules – exact match to NonPref term

Starting point for a rulebase – Ready for fine-tuning
M.A.I. out of the box

    Estimate 60% accuracy

    Success depends on:
        Style of thesaurus terms
        Writing style of documents
        Addition of synonyms
If only…

Document authors wrote
using the language of thesaurus terms,
then the starter rulebase would be sufficient…

but...
Editors make M.A.I. rules smarter
 1.   Modify the Text to Match

 2.   Modify the rule body
1. Modify the Text to Match
   Words with the same root
      crystal ~ crystallize ~ crystalline ~
      crystallization ~ crystal-forming
          Text to match: crystal*
   Words in inverted sequence
      Power, Solar = Solar power
         Text to match: solar
   Phrases with same meaning, different syntax
      Pollution control = Control of pollution
          Text to match: pollution
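
The truncation idea above can be sketched in Python. This is an illustration only, not M.A.I.'s matching engine: the helper name `ttm_to_regex` is hypothetical, and it assumes that `*` stands for any run of non-space characters and that a match must begin at a word boundary.

```python
import re

def ttm_to_regex(ttm):
    """Compile a Text to Match pattern into a case-insensitive regex.

    Assumption (not taken from M.A.I. documentation): '*' stands for any
    run of non-space characters, and a match starts at a word boundary.
    """
    parts = [re.escape(part) for part in ttm.split("*")]
    body = r"\S*".join(parts)
    # (?!\w) stops "solar" from matching inside "solarium".
    return re.compile(r"\b" + body + r"(?!\w)", re.IGNORECASE)

pattern = ttm_to_regex("crystal*")
words = ["crystal", "crystallize", "crystalline",
         "crystallization", "crystal-forming"]
matched = [w for w in words if pattern.fullmatch(w)]
```

Under these assumptions `matched` holds all five word forms from the slide, and the untruncated pattern `solar` finds the word in "Power, Solar" without firing on a longer word such as "solarium".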
2. Modify the rule body
   Starter rules (identity and synonym)
    specify term to be used –
            no ifs, ands or buts
   You can
       establish conditions or limits on the
        suggestion of the indexing term(s)
       direct M.A.I. to ignore a word or phrase in
        text (NULL rule)
Two basic types of rules
   1. Simple rules (starter rules)
       no conditions to limit the use
       of the indexing term
   2. Condition rules
       where rules get interesting!
Simple rules – how they work
   The prompt word in the text suggests the
    same indexing term every time that word
    occurs
   No IFs qualify the use of the indexing
    term
   Text to Match in the document → USE Indexing term
3 Types of simple rules
     1.   Identity rules

     2.   Synonym rules

     3.   NULL rules
Simple rules – identity rule

Text to Match is identical to the thesaurus term
in the rule body -- no conditions
Simple rules – identity
Text to match: irrigation
                    USE Irrigation
Text to match: Lake Michigan
                   USE Lake Michigan
Text to match: marriage and divorce records
                   USE Marriage and divorce records
Identity rules are created programmatically
Simple rules – synonym rule
   Show term equivalents (Use/Used
    for)
Text to match: jobless      USE Unemployment
Text to match: fish farm    USE Aquaculture
Text to match: Y2K          USE Y2K issue
Text to match: parish       USE County
Text to match: e-business   USE Ecommerce
Simple rules – synonym rule
   Simplify morphological, punctuation,
    spelling, and sequencing variations
Text to match: worker’s compensation
              workman’s compensation
              workmen’s compensation
              work* comp*
                         USE Worker’s compensation
Text to match: e-commerce
                         USE Ecommerce
A synonym rule for the Text to Match “jobless”
suggests …               USE Unemployment




           When M.A.I. is integrated with
           Thesaurus Master,
           synonym rules for Non Preferred terms
           are generated programmatically.
Simple rules – synonym rule
   Separate out compound terms

Text to match: fishing   USE Fishing and hunting
Text to match: hunting   USE Fishing and hunting
Text to match: adoption  USE Adoption and foster
                                  care
Text to match: divorce   USE Marriage and divorce
                                  records
      TIP: Trim TTM down to one core element
Simple rules – NULL
Ignore a thesaurus word that occurs
  • as part of an irrelevant phrase
              “physician’s orders”
  • as part of an idiom
              “in light of…”
              “a bird in hand”
              “looking back...”
      Text to match:    in light of
      Rule:             NULL
NULL rule –
    Do not index with the thesaurus term
    “Light” in this instance.
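
The three kinds of simple rule can be pictured as one lookup table. The Python sketch below is a hedged illustration, not M.A.I.'s implementation: the dictionary `SIMPLE_RULES` and the function `suggest_terms` are hypothetical names, and letting the longest Text to Match win is an assumption made here so that the NULL rule for "in light of" can suppress the identity rule for "light".

```python
import re

# Hypothetical in-memory rulebase: Text to Match -> indexing term.
# None stands for a NULL rule (match the text, suggest nothing).
SIMPLE_RULES = {
    "irrigation": "Irrigation",    # identity rule
    "light": "Light",              # identity rule
    "jobless": "Unemployment",     # synonym rule
    "fish farm": "Aquaculture",    # synonym rule
    "in light of": None,           # NULL rule
}

def suggest_terms(text, rules=SIMPLE_RULES):
    """Return suggested terms, letting the longest Text to Match win."""
    remaining = text.lower()
    suggestions = []
    for ttm in sorted(rules, key=len, reverse=True):
        pattern = r"\b" + re.escape(ttm) + r"\b"
        if re.search(pattern, remaining):
            # Blank out the matched text so shorter rules cannot reuse it.
            remaining = re.sub(pattern, " ", remaining)
            term = rules[ttm]
            if term is not None and term not in suggestions:
                suggestions.append(term)
    return suggestions
```

With this table, a sentence containing "in light of" no longer suggests Light, while a sentence such as "The light was dim." still does.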
Two basic types of rules
   1. Simple rules (starter rules)
       no conditions limit the use
       of the indexing term
   2. Condition rules
       where rules get interesting!
Dealing with ambiguity
Jay Leno’s headlines
   Police Begin Campaign to Run Down
    Jaywalkers
   Local High School Dropouts Cut in Half
   Red Tape Holds Up New Bridges
   Include Your Children When Baking
    Cookies
   Kids Make Nutritious Snacks
   Iraqi Head Loses Arm
How would you disambiguate…
•   bush – What other words and/or conditions
    should lead to using the term
          Shrubs – OR
          U.S. presidents
   balloon
          Aerostatic aviation – OR
          Party supplies
   will(s)
          Jurisprudence, Last will and testament, Living wills
          (auxiliary verb)
Example: routing
   vehicles (direction)
   work (workflow)
   people, data, stuff (distribute, disperse)
   the other team (overwhelming defeat)
   wood (using power tool)
Example: Technology –
            Need conditions?
     Top term
     Narrow terms
        Engineering                   Information technology
        Medical technology            Technology transfer
        Radio frequency identification technology
     Scope note
        The practical use of scientific knowledge in industry and
        everyday life; the scientific method and material used to
        achieve a commercial or industrial objective
     Related terms
        Technology assessment         Technology research

Set conditions on using term Technology?
 “new fangled technology” “cooking technology”
 “report from the Massachusetts Institute of Technology”
When the prompt word
is ambiguous
   Could prompt word be interpreted differently?
     Indian leaders are asking the government…
     balloon
     bush
     bridge
     adoption
   Under what conditions would another
    interpretation be correct?
Thinking conditionally –
let the IFs begin...
   Convergent thinking
      What other words in text would
      confirm your interpretation of the
      text-to-match meaning and your
      proposed indexing term?
   Divergent thinking
      What words in text would contradict
      your interpretation?
Condition rules – IF rules
   For ambiguous word meanings, editor can set
    IF conditions that must be met for rules to
    suggest an indexing term.
   Can incorporate conditions from Scope Notes
   Editor can set one or more conditions, joined
    with Boolean operators AND, OR, and NOT.
Example: Sniffer
   BT Malicious code
   SN A program that intercepts routed data and
    examines each packet in search of specified
    information, such as passwords transmitted in
    clear text.
   M.A.I. rule
             TTM: sniffer
             USE Sniffer
        “Customs used a sniffer dog to identify
        the contraband …”
In a botany taxonomy, “bushes” is a NonPref Term
that prompts the preferred term “Shrubs” --
even if the text is about (former) President Bush.




When a simple rule won’t do, set conditions in the
rule to increase precision Hits and decrease Noise.
Simplify the TTM – then add conditions in the
                                    rule body
4 types of conditions
1.   Proximity of rule’s TTM to quoted word
     from document text
       (4 levels of proximity)
2.   Capitalization of TTM
3.   Exact MATCH of TTM to word in text
4.   TTM begins or ends a sentence
 Mix and match conditions with
 Boolean operators: AND, OR, NOT
Condition rules – Proximity
   Text to match: safety
IF (NEAR “security”)               WITHIN 3 WORDS
     USE Crime prevention
ENDIF
IF (WITH “community”)              WITHIN SENTENCE
    USE Public safety
ENDIF
IF (AROUND “product”)              WITHIN 50 WORDS
    USE Product safety
ENDIF
IF (MENTIONS “food”)               WITHIN 250 WORDS
    USE Food handling and safety
ENDIF
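
The four proximity levels in the rule above can be sketched in Python. This is a rough illustration under stated assumptions, not M.A.I.'s engine: the window sizes come from the slide, but the tokenizer, the sentence splitter, the inclusive reading of "within N words", and the names `tokenize`, `condition_met`, and `suggest_for_safety` are all hypothetical.

```python
import re

# Window sizes in words, as given on the slide; WITH means "same sentence".
WINDOWS = {"NEAR": 3, "AROUND": 50, "MENTIONS": 250}

def tokenize(text):
    """Return (word, sentence_number) pairs, lowercased."""
    tokens = []
    for sent_no, sentence in enumerate(re.split(r"(?<=[.!?])\s+", text)):
        for word in re.findall(r"[A-Za-z0-9'-]+", sentence):
            tokens.append((word.lower(), sent_no))
    return tokens

def condition_met(op, tokens, ttm_pos, word):
    """True if `word` stands in relation `op` to the token at ttm_pos."""
    for pos, (tok, sent_no) in enumerate(tokens):
        if tok != word or pos == ttm_pos:
            continue
        if op == "WITH":
            if sent_no == tokens[ttm_pos][1]:
                return True
        elif abs(pos - ttm_pos) <= WINDOWS[op]:
            return True
    return False

def suggest_for_safety(text):
    """Apply the four independent IF blocks for Text to Match: safety."""
    rules = [("NEAR", "security", "Crime prevention"),
             ("WITH", "community", "Public safety"),
             ("AROUND", "product", "Product safety"),
             ("MENTIONS", "food", "Food handling and safety")]
    tokens = tokenize(text)
    terms = []
    for pos, (tok, _) in enumerate(tokens):
        if tok != "safety":
            continue
        for op, word, term in rules:
            if condition_met(op, tokens, pos, word) and term not in terms:
                terms.append(term)
    return terms
```

Because the four IF blocks are independent, one occurrence of "safety" can suggest several terms when more than one condition holds.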
Condition rules – Proximity
   Text to match: bear
      IF (NEAR “Chicago” OR WITH “football”)
            USE Chicago Bears
      ENDIF

       IF (NEAR “market” OR AROUND “stock”)
            USE Stock market
       ENDIF

       IF (MENTIONS “forest” OR MENTIONS “woods”)
             USE Wild animals
       ENDIF
Example: Documentation
Text to match: documentation
USE Documentation
     Identity rule created problems

  Add conditions for greater precision:
 IF (AROUND “software” OR WITH “application”
 OR AROUND “hardware” OR WITH “instruction”)
       USE Documentation
 ENDIF
Condition rules – Negation
   Text to match: wages
              IF (NOT WITH “war”)
                    USE Wages and salaries
              ENDIF

• Text to match: web
              IF (NOT WITH “spin*”)
                    USE Internet
              ENDIF

      (“spider” no longer differentiates internet from arachnids)
Condition rules – Case
   Text to match: aids
      IF (ALL CAPS)
            USE AIDS and HIV
      ENDIF
   Text to match: masters
      IF (INITIAL CAPS AND MENTIONS “poet*”)
            USE Edgar Lee Masters
      ENDIF
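
A minimal Python sketch of the two case tests, for illustration only. The function names are hypothetical, and treating INITIAL CAPS as "first letter upper case, remaining letters lower case" is an assumption; the slides do not define it that precisely.

```python
def all_caps(word):
    """ALL CAPS: every letter in the matched word is upper case."""
    return word.isupper()

def initial_caps(word):
    """INITIAL CAPS: first letter upper case, the rest lower case (assumed)."""
    return word[:1].isupper() and word[1:].islower()

def suggest_for_aids(matched_word):
    """Text to match: aids, with the condition IF (ALL CAPS)."""
    return ["AIDS and HIV"] if all_caps(matched_word) else []
```

The point of the condition is that "AIDS" in the text suggests the term, while "aids" as an ordinary verb or noun does not.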
Condition rules – Match
   Text to match: employ*

    IF (MATCH “employment”)
         USE Employment
    ENDIF

    IF (MATCH “employee” AND
    (WITH “municipal” OR WITH “city”
    OR WITH “town”))
         USE Municipal employees
    ENDIF
Condition rules – Sentence position

         IF (BEGIN SENTENCE)

         IF (END SENTENCE)
Conditions in rules help
   increase precision Hits
   decrease Noise
for more precise information retrieval.
 Conditions depend on human logic.
M.A.I. can save illogical statements → bad results.
M.A.I. cannot save a rule with incorrect syntax.
Rule Check and Save check the syntax of a rule.
Error warning – explains syntax problems
               – shows line location




Closing parenthesis missing
Mind your IFs and ( )s – come in 2s
 IF starts the system thinking about a condition;
 ENDIF completes the thought.

 Every IF condition goes in ( )s.

 Every ( must close with ) -- multiple ( )s are OK.
 Every IF condition must close with an ENDIF.
 Every “ must close with ”.

 Function words must be spelled correctly.
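
The pairing checks listed above can be approximated in a few lines of Python. This is a sketch of the idea only, not the Rule Check feature itself: `check_rule` is a hypothetical helper, its messages are invented for illustration, and it does not verify the spelling of function words.

```python
import re

def check_rule(rule_text):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []

    # Quotes: straight quotes must come in pairs, curly quotes must balance.
    if rule_text.count('"') % 2 or rule_text.count("“") != rule_text.count("”"):
        problems.append("unbalanced quotation marks")

    # Drop quoted words so their contents cannot confuse the later checks.
    bare = re.sub(r'"[^"\n]*"|“[^”\n]*”', "", rule_text)

    # Parentheses: track nesting depth and report the line of a stray ')'.
    depth = 0
    for line_no, line in enumerate(bare.splitlines(), start=1):
        for ch in line:
            if ch == "(":
                depth += 1
            elif ch == ")":
                if depth == 0:
                    problems.append(f"line {line_no}: unexpected closing parenthesis")
                else:
                    depth -= 1
    if depth:
        problems.append(f"closing parenthesis missing ({depth})")

    # Every IF (including the IF in ELSE IF) needs its own ENDIF.
    ifs = len(re.findall(r"\bIF\b", bare))
    endifs = len(re.findall(r"\bENDIF\b", bare))
    if ifs != endifs:
        problems.append(f"{ifs} IF but {endifs} ENDIF")
    return problems
```

A rule that passes returns an empty list; a rule whose condition is never closed returns a single "closing parenthesis missing" entry.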
Kicking rules up a notch
      Rules can express
         Multiple concepts
         Alternative concepts
         Contingent concepts
Condition rules – IF-IF
Text to match: housing
IF (AROUND “afford*”)
      USE Affordable housing
ENDIF
IF (AROUND “public”)
      USE Public housing
ENDIF
Independent conditions

IS DIFFERENT FROM

Text to match: housing
IF (AROUND “afford*”)
      USE Affordable housing
      IF (AROUND “public”)
            USE Public housing
      ENDIF
ENDIF
Contingent conditions
Condition rules – IF-IF
Text to Match: agricultur*
IF (WITH “products”)
    USE Agricultural products
    IF (WITH “programs”)
        USE Agricultural programs
    ENDIF
ENDIF
Agricultural programs is available ONLY IF the
Agricultural products condition is met.

Text to Match: agricultur*
IF (WITH “products”)
    USE Agricultural products
ENDIF
IF (WITH “programs”)
    USE Agricultural programs
ENDIF
BOTH terms may be used: they are independent.
Condition rules – IF-IF
Text to Match: agricultur*
IF (WITH “products”)
      USE Agricultural products
      IF (WITH “programs”)
           USE Agricultural programs
      ENDIF
ENDIF

Indentation emphasizes contingent condition
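
The difference between independent and contingent conditions maps directly onto sequential versus nested `if` statements. The Python sketch below is illustrative only: the function names are hypothetical, and the two booleans stand for whether each WITH condition holds for the matched text.

```python
def independent(with_products, with_programs):
    """Two separate IF ... ENDIF blocks: each is tested on its own."""
    terms = []
    if with_products:
        terms.append("Agricultural products")
    if with_programs:
        terms.append("Agricultural programs")
    return terms

def contingent(with_products, with_programs):
    """Nested IF: the inner test is reached only if the outer one holds."""
    terms = []
    if with_products:
        terms.append("Agricultural products")
        if with_programs:
            terms.append("Agricultural programs")
    return terms
```

When only "programs" is present, `independent` still suggests Agricultural programs, whereas `contingent` suggests nothing.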
Condition rules – IF-ELSE (1)

   IF - ELSE provides further options in
    rules: a default if the first condition is not
    met.
      ELSE may be used without a condition of its own
          Text to match: technology
                 IF (AROUND “transfer*”)
                       USE Technology transfer
                 ELSE
                       USE Technology
                 ENDIF
Condition rules – IF-ELSE (2)

     Text to match: norwegian
           IF (AROUND “language” OR
           WITH “speak*”)
                USE Norwegian language
           ELSE
                USE Norway
           ENDIF
Condition rules – IF-ELSE IF
    IF - ELSE IF
       or add extra conditions
        Text to match: norwegian
              IF (MENTIONS “language”)
                   USE Norwegian language
              ELSE IF (MENTIONS “country”)
                   USE Norway
              ENDIF
              ENDIF
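
The practical difference between ELSE and ELSE IF is whether a default term exists. The Python sketch below illustrates that reading of the two norwegian rules. It is not M.A.I. code, the function names are hypothetical, and the claim that no term is suggested when neither ELSE IF condition holds is an interpretation of the rule, not something the slides state.

```python
def if_else(language_condition_met):
    """IF ... ELSE: the ELSE branch is an unconditional default."""
    if language_condition_met:
        return "Norwegian language"
    return "Norway"

def if_else_if(mentions_language, mentions_country):
    """IF ... ELSE IF: the second term needs its own condition."""
    if mentions_language:
        return "Norwegian language"
    if mentions_country:
        return "Norway"
    return None  # neither condition met: no term suggested (assumed)
```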
You can...

    Truncate a single word with *
       e.g. agri*
    Use * as a wild card between words,
       e.g. drinking * driving
    Truncate in the text to match and/or
       in the rule body
And you can...
   Include multiple conditions in a
    rule, starting from a single text-to-match
Text to match: tax*
IF (WITH “business”)      USE Business taxes
IF (WITH “income”)        USE Income taxes
IF (WITH “sales”)         USE Sales taxes
IF (AROUND “forms”)       USE Tax forms
IF (AROUND “law*” OR AROUND “legis*”
      OR AROUND “legal”)  USE Tax laws
And you can...
   Use multiple Boolean operators in rules
   Embed clauses within clauses using Boolean
    operators
      Text to match: activit*
      IF (WITH “extracurricular” OR (WITH “school” AND
      (WITH “after” OR WITH “before” OR WITH “outside”)))
                  USE Extracurricular activities
      ENDIF
                  Watch the ( )s!
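
The grouping in the rule above is easier to follow once written as an ordinary Boolean expression. The Python sketch below evaluates it for a single sentence, since WITH is a same-sentence condition. The function name and the simple letter-only tokenizer are assumptions made for illustration.

```python
import re

def suggest_activity_term(sentence):
    """Evaluate the nested condition for Text to Match activit* on one sentence."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    if not any(word.startswith("activit") for word in words):
        return []  # Text to Match absent, so the rule never fires
    with_time_word = bool(words & {"after", "before", "outside"})
    # Same grouping as the rule: A OR (B AND (C OR D OR E))
    if "extracurricular" in words or ("school" in words and with_time_word):
        return ["Extracurricular activities"]
    return []
```

Moving a parenthesis changes the grouping and therefore the result, which is why the slide warns to watch the ( )s.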
M.A.I. in action
(105 ILCS 45/1-20)
  Sec. 1-20. Enrollment. If the parents or guardians of a homeless
child or youth choose to enroll the child in a school other than the
school of origin, that school immediately shall enroll the homeless
child or youth even if the child or youth is unable to produce records
normally required for enrollment, such as previous academic records,
medical records, proof of residency, or other documentation. Nothing in
this subsection shall prohibit school districts from requiring parents
or guardians of a homeless child to submit an address or such other
contact information as the district may require from parents or
guardians of nonhomeless children. It shall be the duty of the
enrolling school to immediately contact the school last attended by the
child or youth to obtain relevant academic and other records. If the
child or youth must obtain immunizations, it shall be the duty of the
enrolling school to promptly refer the child or youth for those
immunizations.
(Source: P.A. 88-634, eff. 1-1-95; 88-686, eff. 1-24-95.)
Original identity rule for “Children and youth”
              Modify rule for “Children and youth”
              to Text to Match: child*
Reading M.A.I. results
  Indexing terms | Document words match TTM

 Children and youth | (15) child*(9) youth (6)

 Schools | (7) school*(7)

 Homeless people | (3) homeless*(3)

 Immunizations | (2) immuniz*(2)
M.A.I. Statistics let you track performance
as you fine-tune the Knowledge Base.




    M.A.I.’s Statistics Collector gathers and stores
    indexing experience.
    Statistics compare the editor’s indexing results to
    M.A.I.’s suggestions → Hits, Misses, Noise
    Statistics prioritize the terms for which rules
    need fine-tuning.
M.A.I. statistics
 Hits
     System suggests indexing terms that are
     chosen by the editor--good!
 Misses
     System misses terms editor uses
 Noise
   System suggests terms not used by editor


Misses and Noise … need more rule-building
Open Misses to reveal thesaurus terms used
by an editor but not suggested by M.A.I.




 Buddhism was used by editors for indexing
 3 records, but was not suggested by M.A.I.
Open the key beside the term to see the list
of records where the term was used...




 The file name, record number and editor’s
 name are stored with each record.
Click to highlight any record line on the left.




The full record appears on the right, with
M.A.I.’s Suggested Terms and the editor’s
Used Terms.
In this record, M.A.I. interprets “devotion” and
suggests the indexing term “Prayer” -- Hit.
The editor used “Buddhism” though M.A.I.
did not suggest the term -- Miss.
M.A.I. suggested “Libraries” and “Religions”
though the terms were not used -- Noise.
For this record, M.A.I. scored
• 3 Hits -- Prayer, Sri Lanka, Religious beliefs
• 1 Miss -- Buddhism
• 2 Noise -- Religions, Libraries
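
The scoring for this record is plain set arithmetic on the two term lists. The Python sketch below reproduces it. The function name `score` is hypothetical and this is not how the Statistics Collector is implemented; it only shows the comparison being made.

```python
def score(suggested, used):
    """Compare M.A.I.'s suggested terms with the editor's used terms."""
    suggested, used = set(suggested), set(used)
    return {
        "hits": sorted(suggested & used),    # suggested and chosen
        "misses": sorted(used - suggested),  # chosen but not suggested
        "noise": sorted(suggested - used),   # suggested but not chosen
    }

result = score(
    suggested=["Prayer", "Sri Lanka", "Religious beliefs", "Religions", "Libraries"],
    used=["Prayer", "Sri Lanka", "Religious beliefs", "Buddhism"],
)
```

For the record above this gives three Hits, one Miss (Buddhism), and two Noise terms (Libraries, Religions), matching the slide.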
The word “Buddhism” does not appear
in the record, although “Buddha” does.
The editor’s use of the thesaurus term
Buddhism to index the record is
appropriate.

M.A.I.’s Knowledge Base can be fine-tuned
to reflect human knowledge and
interpretation of the text.
Search the Knowledge Base for rules for Buddha.
         (Truncate buddh* to widen the search.)




Click Search,
results appear
Rules exist for “Buddhism” and “Buddhist”
  but not for “Buddha,” which is in the text.

  You can easily create a new rule …
          Text to Match: Buddha
          IF (MENTIONS “religion”)
               USE Buddhism
          ENDIF
If “buddha” and “religion” are both in the text,
M.A.I. suggests the indexing term Buddhism.
Enter a rule for Text to Match: buddha ...




    Better yet: combine all 3 rules by using
          Text to Match: buddh*
Click Save, OK to verify, and then Retry...
The new rule Text to Match: buddha prompts
Buddhism in Suggested Terms for indexing.
At any time, you can:
   modify a rule
   check the rule for syntax
   save the rule
   see the rule’s history
   add an editorial note
   find a word
   clear the screen
   delete the rule
Each rule in the Knowledge Base that the
editor fine-tunes increases M.A.I.’s ability to
• recognize synonyms,
• find connections between non-contiguous
      words,
• interpret idioms,
• make sense of allusions,
• “read between the lines”

    Over time, statistics for Hits increase,
    while Misses and Noise decrease.
M.A.I.’s Statistics Report summarizes
  Hit/Miss/Noise figures over time
When to make rules
   Before processing documents
       Proactive rule building provides head start
       Increases hits from the start
   After processing documents
       Statistics report lets indexer see what rules
        need fine-tuning to improve Hits, avoid Misses,
        and decrease Noise, based on comparison of
        M.A.I. suggestions with editor’s indexing
Rule-building is an on-going process
       Frequency diminishes, results improve
Custom configure M.A.I.
 How many term suggestions?
 Limit use of a term to n documents?
 How much text to scan?
 Treat singular the same as plural?
 Ignore stopwords?
 Quote marks?
 Most specific term only?
 Suggest Candidates?
M.A.I. measurably improves indexing results:

  • Consistency
       same term suggested under same text conditions
  • Indexing coverage
       terms reflect full range of indexable concepts in data
  • Indexing depth
       terms reflect the granularity and precision of
       deeper levels of thesaurus
  • Faster throughput
       nearly 7 times faster indexing
M.A.I. mines the full depth of your
thesaurus, suggesting the most specific
and appropriate indexing term.

M.A.I. can also filter indexing terms,
displaying more general Broad Terms,
while retaining the more precise indexing
terms stored with the document.
Pairing Machine Aided Indexer
    with Thesaurus Master
      as MAIstro provides
 • simple thesaurus construction
       and maintenance
 • faster indexing
 • deeper indexing
 • greater concept coverage
 • more consistent indexing

   Efficiency and Economy
 in document storage and retrieval

Ai webinar 2 -what's in a name (consolidated pdf)Ai webinar 2 -what's in a name (consolidated pdf)
Ai webinar 2 -what's in a name (consolidated pdf)
 
Tagging overview - Why Keywords Don't Cut It
Tagging overview  - Why Keywords Don't Cut ItTagging overview  - Why Keywords Don't Cut It
Tagging overview - Why Keywords Don't Cut It
 
Health Affairs - Why Keywords Don't Cut It
Health Affairs - Why Keywords Don't Cut ItHealth Affairs - Why Keywords Don't Cut It
Health Affairs - Why Keywords Don't Cut It
 
Why Keywords Don't Cut It
Why Keywords Don't Cut ItWhy Keywords Don't Cut It
Why Keywords Don't Cut It
 
Data Harmony update 2020 final
Data Harmony update 2020 finalData Harmony update 2020 final
Data Harmony update 2020 final
 
Data Harmony Update 2020 final
Data Harmony Update 2020 finalData Harmony Update 2020 final
Data Harmony Update 2020 final
 
DHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository InteroperabilityDHUG 2018: Towards Web-Centric Repository Interoperability
DHUG 2018: Towards Web-Centric Repository Interoperability
 
DHUG 2018 - Florida Thesis OCR
DHUG 2018 - Florida Thesis OCRDHUG 2018 - Florida Thesis OCR
DHUG 2018 - Florida Thesis OCR
 
DHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
DHUG 2017 - Understanding ROI Just Enough to Get Your Project FundedDHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
DHUG 2017 - Understanding ROI Just Enough to Get Your Project Funded
 

Último

Call Girls In Noida 959961⊹3876 Independent Escort Service Noida
Call Girls In Noida 959961⊹3876 Independent Escort Service NoidaCall Girls In Noida 959961⊹3876 Independent Escort Service Noida
Call Girls In Noida 959961⊹3876 Independent Escort Service Noida
dlhescort
 
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876
Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876
Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876
dlhescort
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
daisycvs
 
Call Girls Kengeri Satellite Town Just Call 👗 7737669865 👗 Top Class Call Gir...
Call Girls Kengeri Satellite Town Just Call 👗 7737669865 👗 Top Class Call Gir...Call Girls Kengeri Satellite Town Just Call 👗 7737669865 👗 Top Class Call Gir...
Call Girls Kengeri Satellite Town Just Call 👗 7737669865 👗 Top Class Call Gir...
amitlee9823
 
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
amitlee9823
 
Call Now ☎️🔝 9332606886🔝 Call Girls ❤ Service In Bhilwara Female Escorts Serv...
Call Now ☎️🔝 9332606886🔝 Call Girls ❤ Service In Bhilwara Female Escorts Serv...Call Now ☎️🔝 9332606886🔝 Call Girls ❤ Service In Bhilwara Female Escorts Serv...
Call Now ☎️🔝 9332606886🔝 Call Girls ❤ Service In Bhilwara Female Escorts Serv...
Anamikakaur10
 
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
amitlee9823
 

Último (20)

Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023Mondelez State of Snacking and Future Trends 2023
Mondelez State of Snacking and Future Trends 2023
 
A DAY IN THE LIFE OF A SALESMAN / WOMAN
A DAY IN THE LIFE OF A  SALESMAN / WOMANA DAY IN THE LIFE OF A  SALESMAN / WOMAN
A DAY IN THE LIFE OF A SALESMAN / WOMAN
 
Call Girls In Noida 959961⊹3876 Independent Escort Service Noida
Call Girls In Noida 959961⊹3876 Independent Escort Service NoidaCall Girls In Noida 959961⊹3876 Independent Escort Service Noida
Call Girls In Noida 959961⊹3876 Independent Escort Service Noida
 
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Hebbal Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876
Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876
Call Girls in Delhi, Escort Service Available 24x7 in Delhi 959961-/-3876
 
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
Quick Doctor In Kuwait +2773`7758`557 Kuwait Doha Qatar Dubai Abu Dhabi Sharj...
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
Cracking the Cultural Competence Code.pptx
Cracking the Cultural Competence Code.pptxCracking the Cultural Competence Code.pptx
Cracking the Cultural Competence Code.pptx
 
Call Girls Service In Old Town Dubai ((0551707352)) Old Town Dubai Call Girl ...
Call Girls Service In Old Town Dubai ((0551707352)) Old Town Dubai Call Girl ...Call Girls Service In Old Town Dubai ((0551707352)) Old Town Dubai Call Girl ...
Call Girls Service In Old Town Dubai ((0551707352)) Old Town Dubai Call Girl ...
 
Call Girls Kengeri Satellite Town Just Call 👗 7737669865 👗 Top Class Call Gir...
Call Girls Kengeri Satellite Town Just Call 👗 7737669865 👗 Top Class Call Gir...Call Girls Kengeri Satellite Town Just Call 👗 7737669865 👗 Top Class Call Gir...
Call Girls Kengeri Satellite Town Just Call 👗 7737669865 👗 Top Class Call Gir...
 
(Anamika) VIP Call Girls Napur Call Now 8617697112 Napur Escorts 24x7
(Anamika) VIP Call Girls Napur Call Now 8617697112 Napur Escorts 24x7(Anamika) VIP Call Girls Napur Call Now 8617697112 Napur Escorts 24x7
(Anamika) VIP Call Girls Napur Call Now 8617697112 Napur Escorts 24x7
 
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
Call Girls Jp Nagar Just Call 👗 7737669865 👗 Top Class Call Girl Service Bang...
 
Organizational Transformation Lead with Culture
Organizational Transformation Lead with CultureOrganizational Transformation Lead with Culture
Organizational Transformation Lead with Culture
 
Famous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st CenturyFamous Olympic Siblings from the 21st Century
Famous Olympic Siblings from the 21st Century
 
👉Chandigarh Call Girls 👉9878799926👉Just Call👉Chandigarh Call Girl In Chandiga...
👉Chandigarh Call Girls 👉9878799926👉Just Call👉Chandigarh Call Girl In Chandiga...👉Chandigarh Call Girls 👉9878799926👉Just Call👉Chandigarh Call Girl In Chandiga...
👉Chandigarh Call Girls 👉9878799926👉Just Call👉Chandigarh Call Girl In Chandiga...
 
Falcon Invoice Discounting: The best investment platform in india for investors
Falcon Invoice Discounting: The best investment platform in india for investorsFalcon Invoice Discounting: The best investment platform in india for investors
Falcon Invoice Discounting: The best investment platform in india for investors
 
Uneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration PresentationUneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration Presentation
 
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best ServicesMysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
Mysore Call Girls 8617370543 WhatsApp Number 24x7 Best Services
 
Call Now ☎️🔝 9332606886🔝 Call Girls ❤ Service In Bhilwara Female Escorts Serv...
Call Now ☎️🔝 9332606886🔝 Call Girls ❤ Service In Bhilwara Female Escorts Serv...Call Now ☎️🔝 9332606886🔝 Call Girls ❤ Service In Bhilwara Female Escorts Serv...
Call Now ☎️🔝 9332606886🔝 Call Girls ❤ Service In Bhilwara Female Escorts Serv...
 
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
Call Girls Electronic City Just Call 👗 7737669865 👗 Top Class Call Girl Servi...
 

Machine Aided Indexer

  • 2. Machine Aided Indexer™ is available as a stand-alone version or as part of MAIstro™ (integrated with Thesaurus Master™). M.A.I.™ creates a simple rulebase from your thesaurus terms to use for categorizing documents. You can fine-tune the rulebase to reflect editorial knowledge and judgment, specifying when thesaurus terms should be used. Your result: Precision Indexing
  • 3. M.A.I. under the hood. Concept Extractor™: compares text to Knowledge Base rules to present suggested index terms. Statistics Collector™: gathers and stores the indexing experience of the system, sorting it into Hits / Misses / Noise, and prioritizes terms needing rule fine-tuning to improve indexing accuracy. Rule Builder™: human editor creates, edits, and reviews rules for indexing terms.
  • 4. [Diagram: M.A.I. workflow] IN: Text enters the M.A.I. Concept Extractor, which consults the Knowledge Base and produces a list of suggested terms from the controlled vocabulary. Human review results in Hits (selected terms), Misses (added terms), and Noise (rejected terms). The Statistics Collector improves the Knowledge Base; the editor manages the Knowledge Base with the Rule Builder. OUT: Database, an indexed set of documents.
  • 5. Objective in indexing: apply indexing terms with accuracy; speed; depth (specificity); breadth (exhaustivity); consistency. Objective in M.A.I. rulebuilding: make rules reflect human thinking for optimal categorization.
  • 6. How? Formulate standard rules for interpreting text and for applying thesaurus terms as subject metadata to index/categorize documents. 2/14/2012
  • 7. Why use rules for indexing? Rules provide consistent direction for interpreting text and applying indexing terms. Accurate indexing results in precise information retrieval.
  • 8. M.A.I.’s starter rulebase. M.A.I. automatically generates rules. Starter rules match exactly to words in text: identity rules for thesaurus terms; synonym rules for established NonPreferred terms. Success out of the box depends on: taxonomy term expression of concepts; writer’s creative expression of concepts.
  • 9. Fine-tuned by editors, rules enable: context clues to pinpoint word meaning; “reading between the lines”; natural language processing; greater accuracy over simple rule indexing. Use M.A.I.’s Rule Builder Module to fine-tune rules for applying terms.
  • 10. Indexing and rule-building – two processes. Indexing: read and interpret document text; decide on indexing term. Rulebuilding: identify prompt word(s). What brought the indexing term to mind? This text to match in the document is the starting point for rule-building.
  • 11. Indexer reads the document text: “Indian leaders are asking the government…”
  • 12. Indexer considers indexing terms. “government”: State government? Federal government? City government? “Indians”: in India? Indigenous people? Native Americans?
  • 13. Indexer selects indexing terms: “Indian leaders are asking the government to prevent a repeat of the 1990 census undercount that missed nearly 3000 Indians in New Mexico.”
  • 14. M.A.I. term suggestions: Government; New Mexico. Use your knowledge to select the best terms from M.A.I. suggested terms and from the thesaurus. Decide on indexing terms and apply them to the document.
  • 15. Indexing done, rule-building begins. The rule-building editor’s question: What words in the text prompted selection of those terms? This word (or words) is the starting point for building a rule with M.A.I. – the “gatekeeper.”
  • 16. Choose the MAI Rule Builder tab. A rule has two parts: Text to Match and rule body. Viewing options: font, style, size.
  • 17. M.A.I. rule starts with Text to Match The prompt word (or word part or phrase) in the document -- whatever made the indexer think of a specific indexing term -- becomes the Text to Match of a rule.
  • 18. Importance of Text to Match. TTM opens the door to the rulebase. Without a word or phrase to match, the knowledge of the rulebase is unavailable. M.A.I. system programmatically creates a starter rulebase: identity rules – exact match to thesaurus term; synonym rules – exact match to NonPref term. Starting point for a rulebase – ready for fine-tuning.
  • 19. M.A.I. out of the box. Estimate 60% accuracy. Success depends on: style of thesaurus terms; writing style of documents; addition of synonyms.
  • 20. If only… Document authors wrote using the language of thesaurus terms, then the starter rulebase would be sufficient… but...
  • 21. Editors make M.A.I. rules smarter 1. Modify the Text to Match 2. Modify the rule body
  • 22. 1. Modify the Text to Match. Words with the same root: crystal ~ crystallize ~ crystalline ~ crystallization ~ crystal-forming. Text to match: crystal*. Words in inverted sequence: Power, Solar = Solar power. Text to match: solar. Phrases with same meaning, different syntax: Pollution control = Control of pollution. Text to match: pollution.
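As a rough illustration of how a truncated Text to Match such as crystal* could be compared against document words, here is a minimal Python sketch. It is not M.A.I.'s implementation: the function name ttm_to_regex is hypothetical, and treating * as "any run of letters, digits, or hyphens" is an assumption made for this example.

```python
import re


def ttm_to_regex(ttm: str) -> re.Pattern:
    """Translate a Text to Match pattern into a case-insensitive regex.

    Assumption (not from the slides): '*' stands for zero or more
    word characters or hyphens.
    """
    parts = [re.escape(part) for part in ttm.split("*")]
    body = r"[\w-]*".join(parts)
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


pattern = ttm_to_regex("crystal*")
words = ["crystal", "crystallize", "crystalline",
         "crystallization", "crystal-forming", "cryptic"]
# Keep only the words that the truncated pattern covers completely.
matched = [w for w in words if pattern.fullmatch(w)]
```

Under these assumptions, one truncated Text to Match covers all five word forms listed on the slide, while an unrelated word such as "cryptic" is left alone.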
  • 23. 2. Modify the rule body. Starter rules (identity and synonym) specify the term to be used – no ifs, ands or buts. You can: establish conditions or limits on the suggestion of the indexing term(s); direct M.A.I. to ignore a word or phrase in text (NULL rule).
  • 24. Two basic types of rules. 1. Simple rules (starter rules): no conditions to limit the use of the indexing term. 2. Condition rules: where rules get interesting!
  • 25. Simple rules – how they work. The prompt word in the text suggests the same indexing term every time that word occurs. No IFs qualify the use of the indexing term. Text to Match in the document → USE Indexing term.
  • 26. 3 Types of simple rules 1. Identity rules 2. Synonym rules 3. NULL rules
  • 27. Simple rules – identity rule Text to Match is identical to thesaurus term in the rule body -- No conditions
  • 28. Simple rules – identity. Text to match: irrigation; USE Irrigation. Text to match: Lake Michigan; USE Lake Michigan. Text to match: marriage and divorce records; USE Marriage and divorce records.
  • 29. Identity rules are created programmatically
  • 30. Simple rules – synonym rule. Show term equivalents (Use/Used for). Text to match: jobless; USE Unemployment. Text to match: fish farm; USE Aquaculture. Text to match: Y2K; USE Y2K issue. Text to match: parish; USE County. Text to match: e-business; USE Ecommerce.
  • 31. Simple rules – synonym rule. Simplify morphological, punctuation, spelling, and sequencing variations. Text to match: worker’s compensation, workman’s compensation, workmen’s compensation, work* comp*; USE Worker’s compensation. Text to match: e-commerce; USE Ecommerce.
  • 32. A synonym rule for the Text to Match “jobless” suggests … USE Unemployment. When M.A.I. is integrated with Thesaurus Master, synonym rules for Non Preferred terms are generated programmatically.
  • 33. Simple rules – synonym rule. Separate out compound terms. Text to match: fishing; USE Fishing and hunting. Text to match: hunting; USE Fishing and hunting. Text to match: adoption; USE Adoption and foster care. Text to match: divorce; USE Marriage and divorce records. TIP: Trim TTM down to one core element.
  • 34. Simple rules – NULL. Ignore a thesaurus word that occurs • as part of an irrelevant phrase: “physician’s orders” • as part of an idiom: “in light of…”, “a bird in hand”, “looking back...” Text to match: in light of; Rule: NULL
  • 35. NULL rule – Do not index with the thesaurus term “Light” in this instance.
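The three kinds of simple rule (identity, synonym, NULL) can be pictured as a lookup from Text to Match to an indexing term. The Python sketch below is only an illustration under stated assumptions: the SIMPLE_RULES dictionary and suggest_terms function are hypothetical names, and resolving overlaps by trying the longest Text to Match first is an assumption, not documented M.A.I. behavior.

```python
import re

# Hypothetical in-memory rulebase: Text to Match -> indexing term.
# None stands for a NULL rule (match the text, suggest nothing).
SIMPLE_RULES = {
    "irrigation": "Irrigation",   # identity rule
    "jobless": "Unemployment",    # synonym rule
    "fish farm": "Aquaculture",   # synonym rule
    "in light of": None,          # NULL rule for the idiom
    "light": "Light",             # identity rule
}


def suggest_terms(text: str) -> list[str]:
    """Return terms suggested by simple rules, trying longer matches first."""
    remaining = text.lower()
    suggestions = []
    # Assumption: a longer Text to Match takes precedence over a shorter one.
    for ttm in sorted(SIMPLE_RULES, key=len, reverse=True):
        pattern = re.compile(r"\b" + re.escape(ttm) + r"\b")
        if pattern.search(remaining):
            term = SIMPLE_RULES[ttm]
            if term is not None and term not in suggestions:
                suggestions.append(term)
            # Blank out the matched text so shorter rules cannot fire on it.
            remaining = pattern.sub(" ", remaining)
    return suggestions
```

With this sketch, the sentence "In light of jobless figures, the fish farm expanded irrigation." yields Irrigation, Aquaculture, and Unemployment, and the NULL rule keeps "Light" out of the suggestions.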
  • 36. Two basic types of rules. 1. Simple rules (starter rules): no conditions limit the use of the indexing term. 2. Condition rules: where rules get interesting!
  • 38.
  • 39. Jay Leno’s headlines: Police Begin Campaign to Run Down Jaywalkers; Local High School Dropouts Cut in Half; Red Tape Holds Up New Bridges; Include Your Children When Baking Cookies; Kids Make Nutritious Snacks; Iraqi Head Loses Arm
  • 40. How would you disambiguate… What other words and/or conditions should lead to using the term? • bush: Shrubs OR U.S. presidents • balloon: Aerostatic aviation OR Party supplies • will(s): Jurisprudence, Last will and testament, Living wills, or (auxiliary verb)
  • 41. Example: routing. Vehicles (direction); work (workflow); people, data, stuff (distribute, disperse); the other team (overwhelming defeat); wood (using power tool).
  • 42. Example: Technology – Need conditions? Top term. Narrow terms: Engineering; Information technology; Medical technology; Technology transfer; Radio frequency identification technology. Scope note: The practical use of scientific knowledge in industry and everyday life; the scientific method and material used to achieve a commercial or industrial objective. Related terms: Technology assessment; Technology research. Set conditions on using term Technology? “new fangled technology”; “cooking technology”; “report from the Massachusetts Institute of Technology”
  • 43. When the prompt word is ambiguous. Could the prompt word be interpreted differently? Examples: Indian leaders are asking the government…; balloon; bush; bridge; adoption. Under what conditions would another interpretation be correct?
  • 44. Thinking conditionally – let the IFs begin... Convergent thinking: What other words in text would confirm your interpretation of the text-to-match meaning and your proposed indexing term? Divergent thinking: What words in text would contradict your interpretation?
  • 45. Condition rules – IF rules. For ambiguous word meanings, the editor can set IF conditions that must be met for rules to suggest an indexing term. Rules can incorporate conditions from Scope Notes. The editor can set one or more conditions, joined with Boolean operators AND, OR, and NOT.
  • 46. Example: Sniffer. BT: Malicious code. SN: A program that intercepts routed data and examines each packet in search of specified information, such as passwords transmitted in clear text. M.A.I. rule: TTM: sniffer; USE Sniffer. “Customs used a sniffer dog to identify the contraband …”
  • 47. In a botany taxonomy, “bushes” is a NonPref Term that prompts the preferred term “Shrubs” -- even if the text is about (former) President Bush. When a simple rule won’t do, set conditions in the rule to increase precision Hits and decrease Noise.
  • 48. Simplify the TTM – then add conditions in the rule body
  • 49. 4 types of conditions 1. Proximity of rule’s TTM to quoted word from document text (4 levels of proximity) 2. Capitalization of TTM 3. Exact MATCH of TTM to word in text 4. TTM begins or ends a sentence Mix and match conditions with Boolean operators: AND, OR, NOT
  • 50. Condition rules – Proximity. Text to match: safety. IF (NEAR “security”) [within 3 words] USE Crime prevention ENDIF. IF (WITH “community”) [within sentence] USE Public safety ENDIF. IF (AROUND “product”) [within 50 words] USE Product safety ENDIF. IF (MENTIONS “food”) [within 250 words] USE Food handling and safety ENDIF.
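The four proximity levels on this slide (NEAR within 3 words, WITH within the sentence, AROUND within 50 words, MENTIONS within 250 words) can be sketched in Python as follows. This is an illustration, not the M.A.I. engine: the helper names are hypothetical, the tokenizer is a simple regex, and letting word windows cross sentence boundaries is an assumption.

```python
import re

# Window sizes in words, taken from the slide's annotations.
WORD_WINDOWS = {"NEAR": 3, "AROUND": 50, "MENTIONS": 250}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[\w'-]+", text.lower())


def within_words(text: str, ttm: str, cue: str, window: int) -> bool:
    """True if some occurrence of cue lies within `window` words of ttm."""
    tokens = tokenize(text)
    ttm_positions = [i for i, tok in enumerate(tokens) if tok == ttm]
    cue_positions = [i for i, tok in enumerate(tokens) if tok == cue]
    return any(abs(i - j) <= window
               for i in ttm_positions for j in cue_positions)


def same_sentence(text: str, ttm: str, cue: str) -> bool:
    """True if ttm and cue occur together in at least one sentence."""
    for sentence in re.split(r"[.!?]+", text):
        tokens = tokenize(sentence)
        if ttm in tokens and cue in tokens:
            return True
    return False


def check(op: str, text: str, ttm: str, cue: str) -> bool:
    """Evaluate one proximity condition (NEAR, WITH, AROUND, MENTIONS)."""
    if op == "WITH":
        return same_sentence(text, ttm, cue)
    return within_words(text, ttm, cue, WORD_WINDOWS[op])


def suggest_for_safety(text: str) -> list[str]:
    """Mirror the slide's rule for Text to match: safety."""
    conditions = [
        ("NEAR", "security", "Crime prevention"),
        ("WITH", "community", "Public safety"),
        ("AROUND", "product", "Product safety"),
        ("MENTIONS", "food", "Food handling and safety"),
    ]
    return [term for op, cue, term in conditions
            if check(op, text, "safety", cue)]
```

Because the four IF blocks are independent, a single document can receive several of these terms at once when more than one cue word is close enough.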
  • 51. Condition rules – Proximity. Text to match: bear. IF (NEAR “Chicago” OR WITH “football”) USE Chicago Bears ENDIF. IF (NEAR “market” OR AROUND “stock”) USE Stock market ENDIF. IF (MENTIONS “forest” OR MENTIONS “woods”) USE Wild animals ENDIF.
  • 52. Example: Documentation. Text to match: documentation; USE Documentation. The identity rule created problems. Add conditions for greater precision: IF (AROUND "software" OR WITH "application" OR AROUND "hardware" OR WITH "instruction") USE Documentation ENDIF
  • 53. Condition rules – Negation. Text to match: wages. IF (NOT WITH “war”) USE Wages and salaries ENDIF. Text to match: web. IF (NOT WITH “spin*”) USE Internet ENDIF (“spider” no longer differentiates internet from arachnids)
  • 54. Condition rules – Case. Text to match: aids. IF (ALL CAPS) USE AIDS and HIV ENDIF. Text to match: masters. IF (INITIAL CAPS AND MENTIONS “poet*”) USE Edgar Lee Masters ENDIF.
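A case condition inspects how the matched word is capitalized in the document. The following Python sketch shows one way to express ALL CAPS and INITIAL CAPS checks; the function name case_matches is hypothetical, and the exact definitions of the two conditions are assumptions made for this example.

```python
import re


def case_matches(text: str, ttm: str, condition: str) -> bool:
    """Check a capitalization condition on any occurrence of ttm in text.

    Assumptions: 'ALL CAPS' means every letter is uppercase, and
    'INITIAL CAPS' means an uppercase first letter followed by lowercase.
    """
    pattern = re.compile(r"\b" + re.escape(ttm) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        word = match.group(0)
        if condition == "ALL CAPS" and word.isupper():
            return True
        if (condition == "INITIAL CAPS"
                and word[0].isupper() and word[1:].islower()):
            return True
    return False
```

For example, the rule for "aids" would fire on "Funding for AIDS research rose." but not on "The program aids farmers."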
  • 55. Condition rules – Match. Text to match: employ*. IF (MATCH “employment”) USE Employment ENDIF. IF (MATCH “employee” AND (WITH “municipal” OR WITH “city” OR WITH “town”)) USE Municipal employees ENDIF.
  • 56. Condition rules – Sentence position. IF (BEGIN SENTENCE); IF (END SENTENCE)
  • 57. Conditions in rules help increase precision Hits and decrease Noise for more precise information retrieval. Conditions depend on human logic.
  • 58. M.A.I. can save illogical statements, which lead to bad results. M.A.I. cannot save a rule with incorrect syntax. Rule Check and Save check the syntax of a rule. The error warning explains syntax problems and shows the line location, for example: “Closing parenthesis missing”.
  • 59. Mind your IFs and ( )s – come in 2s. IF starts the system thinking about a condition; ENDIF completes the thought. Every IF condition goes in ( )s. Every ( must close with ) -- multiple ( )s are OK. Every IF condition must close with an ENDIF. Every “ must close with ”. Function words must be spelled correctly.
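The pairing rules above lend themselves to a simple mechanical check. The Python sketch below is a simplified stand-in for the kind of validation Rule Check performs, not its actual logic: check_rule_syntax is a hypothetical function, the error messages are illustrative, and counting parentheses inside quoted strings is an assumption.

```python
import re


def check_rule_syntax(rule: str) -> list[str]:
    """Return a list of pairing problems found in a rule body."""
    errors = []
    # Treat curly quotes like straight quotes for counting purposes.
    rule = rule.replace("\u201c", '"').replace("\u201d", '"')

    # Every ( must close with ).
    depth = 0
    for ch in rule:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                errors.append("closing parenthesis without opening parenthesis")
                depth = 0
    if depth > 0:
        errors.append("closing parenthesis missing")

    # Every opening quote must have a closing quote.
    if rule.count('"') % 2 != 0:
        errors.append("closing quotation mark missing")

    # Every IF must close with an ENDIF (ELSE IF counts as an IF).
    unquoted = re.sub(r'"[^"]*"', " ", rule)
    if_count = len(re.findall(r"\bIF\b", unquoted))
    endif_count = len(re.findall(r"\bENDIF\b", unquoted))
    if if_count != endif_count:
        errors.append(f"{if_count} IF but {endif_count} ENDIF")
    return errors
```

A well-formed rule returns an empty list; a rule such as IF (WITH "city" USE Municipal employees reports both the missing parenthesis and the missing ENDIF.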
  • 60. Kicking rules up a notch. Rules can express: multiple concepts; alternative concepts; contingent concepts.
  • 61. Condition rules – IF-IF. Independent conditions:
      Text to match: housing
      IF (AROUND “afford*”)
        USE Affordable housing
      ENDIF
      IF (AROUND “public”)
        USE Public housing
      ENDIF
    IS DIFFERENT FROM contingent conditions:
      Text to match: housing
      IF (AROUND “afford*”)
        USE Affordable housing
        IF (AROUND “public”)
          USE Public housing
        ENDIF
      ENDIF
  • 62. Condition rules – IF-IF. Contingent: Agricultural programs is available ONLY IF the Agricultural products condition is met.
      Text to Match: agricultur*
      IF (WITH “products”)
        USE Agricultural products
        IF (WITH “programs”)
          USE Agricultural programs
        ENDIF
      ENDIF
    Independent: BOTH terms may be used; they are independent.
      Text to Match: agricultur*
      IF (WITH “products”)
        USE Agricultural products
      ENDIF
      IF (WITH “programs”)
        USE Agricultural programs
      ENDIF
  • 63. Condition rules – IF-IF
      Text to Match: agricultur*
      IF (WITH “products”)
        USE Agricultural products
        IF (WITH “programs”)
          USE Agricultural programs
        ENDIF
      ENDIF
    Indentation emphasizes the contingent condition.
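The difference between independent and contingent conditions is the same as the difference between sequential and nested if statements in a programming language. The Python sketch below makes that analogy concrete; the function names are hypothetical, and the set of cue words stands in for whatever the WITH conditions found near the Text to Match.

```python
def independent_rule(cues: set[str]) -> list[str]:
    """Two separate IF ... ENDIF blocks: each condition is tested on its own."""
    terms = []
    if "products" in cues:
        terms.append("Agricultural products")
    if "programs" in cues:
        terms.append("Agricultural programs")
    return terms


def contingent_rule(cues: set[str]) -> list[str]:
    """Nested IF: the inner condition is tested only if the outer one holds."""
    terms = []
    if "products" in cues:
        terms.append("Agricultural products")
        if "programs" in cues:
            terms.append("Agricultural programs")
    return terms
```

When only "programs" appears near the Text to Match, the independent form still suggests Agricultural programs, while the contingent form suggests nothing.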
  • 64. Condition rules – IF-ELSE 1. IF - ELSE provides further options in rules: a default if the first condition is not met. ELSE may be used without a condition. Text to match: technology. IF (AROUND “transfer*”) USE Technology transfer ELSE USE Technology ENDIF
  • 65. Condition rules – IF-ELSE 2 Text to match: norwegian IF (AROUND “language” OR WITH “speak*”) USE Norwegian language ELSE USE Norway ENDIF
  • 66. Condition rules – IF-ELSE IF. Use IF - ELSE IF to add extra conditions. Text to match: norwegian. IF (MENTIONS “language”) USE Norwegian language ELSE IF (MENTIONS “country”) USE Norway ENDIF ENDIF
  • 67. You can... Truncate a single word with *, e.g. agri*. Use * as a wild card between words, e.g. drinking * driving. Truncate in the text to match and/or in the rule body.
  • 68. And you can... Include multiple conditions in a rule, starting from a single text-to-match. Text to match: tax* IF (WITH “business”) USE Business taxes IF (WITH “income”) USE Income taxes IF (WITH “sales”) USE Sales taxes IF (AROUND “forms”) USE Tax forms IF (AROUND “law*” OR AROUND “legis*” OR AROUND “legal”) USE Tax laws
  • 69.
  • 70.
  • 71. And you can... Use multiple Boolean operators in rules. Embed clauses within clauses using Boolean operators. Text to match: activit* IF (WITH “extracurricular” OR (WITH “school” AND (WITH “after” OR WITH “before” OR WITH “outside”))) USE Extracurricular activities ENDIF. Watch the ( )s!
  • 72. M.A.I. in action (105 ILCS 45/1-20) Sec. 1-20. Enrollment. If the parents or guardians of a homeless child or youth choose to enroll the child in a school other than the school of origin, that school immediately shall enroll the homeless child or youth even if the child or youth is unable to produce records normally required for enrollment, such as previous academic records, medical records, proof of residency, or other documentation. Nothing in this subsection shall prohibit school districts from requiring parents or guardians of a homeless child to submit an address or such other contact information as the district may require from parents or guardians of nonhomeless children. It shall be the duty of the enrolling school to immediately contact the school last attended by the child or youth to obtain relevant academic and other records. If the child or youth must obtain immunizations, it shall be the duty of the enrolling school to promptly refer the child or youth for those immunizations. (Source: P.A. 88-634, eff. 1-1-95; 88-686, eff. 1-24-95.)
  • 73. Original identity rule for “Children and youth”. Modify the rule for “Children and youth” to Text to Match: child*
  • 74.
  • 75. Reading M.A.I. results
      Indexing terms | Document words match TTM
      Children and youth | (15) child* (9), youth (6)
      Schools | (7) school* (7)
      Homeless people | (3) homeless* (3)
      Immunizations | (2) immuniz* (2)
  • 76. M.A.I. Statistics let you track performance as you fine-tune the Knowledge Base. M.A.I.’s Statistics Collector gathers and stores indexing experience. Statistics compare the editor’s indexing results to M.A.I.’s suggestions: Hits, Misses, Noise. Statistics prioritize the terms for which rules need fine-tuning.
  • 77. M.A.I. statistics. Hits: system suggests indexing terms that are chosen by the editor -- good! Misses: system misses terms the editor uses. Noise: system suggests terms not used by the editor. Misses and Noise … need more rule-building.
  • 78. Open Misses to reveal thesaurus terms used by an editor but not suggested by M.A.I. Buddhism was used by editors for indexing 3 records, but was not suggested by M.A.I.
  • 79. Open the key beside the term to see the list of records where the term was used... The file name, record number and editor’s name are stored with each record.
  • 80. Click to highlight any record line on the left. The full record appears on the right, with M.A.I.’s Suggested Terms and the editor’s Used Terms.
  • 81. In this record, M.A.I. interprets “devotion” and suggests the indexing term “Prayer” -- Hit. The editor used “Buddhism” though M.A.I. did not suggest the term -- Miss. M.A.I. suggested “Libraries” and “Religions” though the terms were not used -- Noise. For this record, M.A.I. scored • 3 Hits -- Prayer, Sri Lanka, Religious beliefs • 1 Miss -- Buddhism • 2 Noise -- Religions, Libraries
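The Hit / Miss / Noise scoring for a single record amounts to comparing two sets of terms. The Python sketch below reproduces the tally for the record on this slide; the function name score_record is hypothetical, and the sketch says nothing about how the Statistics Collector actually stores its data.

```python
def score_record(suggested: set[str], used: set[str]) -> dict[str, set[str]]:
    """Compare M.A.I.'s suggested terms with the editor's used terms."""
    return {
        "hits": suggested & used,    # suggested and chosen by the editor
        "misses": used - suggested,  # chosen by the editor, not suggested
        "noise": suggested - used,   # suggested, not chosen by the editor
    }


# Terms taken from the record described on the slide.
suggested_terms = {"Prayer", "Sri Lanka", "Religious beliefs",
                   "Religions", "Libraries"}
used_terms = {"Prayer", "Sri Lanka", "Religious beliefs", "Buddhism"}

scores = score_record(suggested_terms, used_terms)
counts = {name: len(terms) for name, terms in scores.items()}
```

For this record the counts come out as 3 Hits, 1 Miss, and 2 Noise, matching the slide.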
  • 82. The word “Buddhism” does not appear in the record, although “Buddha” does. The editor’s use of the thesaurus term Buddhism to index the record is appropriate. M.A.I.’s Knowledge Base can be fine-tuned to reflect human knowledge and interpretation of the text.
  • 83. Search the Knowledge Base for rules for Buddha. (Truncate buddh* to widen the search.) Click Search, results appear
  • 84. Rules exist for “Buddhism” and “Buddhist” but not for “Buddha,” which is in the text. You can easily create a new rule … Text to Match: Buddha IF (MENTIONS “religion”) USE Buddhism ENDIF. If “buddha” and “religion” are both in the text, M.A.I. suggests the indexing term Buddhism.
  • 85. Enter a rule for Text to Match: buddha ... Better yet: combine all 3 rules by using Text to Match: buddh*
  • 86. Click Save, OK to verify, and then Retry...
  • 87. The new rule Text to Match: buddha prompts Buddhism in Suggested Terms for indexing.
  • 88. At any time, you can: modify a rule; check the rule for syntax; save the rule; see the rule’s history; add an editorial note; find a word; clear the screen; delete the rule.
  • 89. Each rule in the Knowledge Base that the editor fine-tunes increases M.A.I.’s ability to • recognize synonyms, • find connections between non-contiguous words, • interpret idioms, • make sense of allusions, • “read between the lines.” Over time, statistics for Hits increase, while Misses and Noise decrease.
  • 90. M.A.I.’s Statistics Report summarizes Hit/Miss/Noise figures over time
  • 91. When to make rules. Before processing documents: proactive rule building provides a head start and increases Hits from the start. After processing documents: the Statistics report lets the indexer see what rules need fine-tuning to improve Hits, avoid Misses, and decrease Noise, based on comparison of M.A.I. suggestions with the editor’s indexing. Rule-building is an ongoing process: frequency diminishes, results improve.
  • 92. Custom configure M.A.I. How many term suggestions? Limit use of a term to n documents? How much text to scan? Treat singular the same as plural? Ignore stopwords? Quote marks? Plural=Singular? Most specific term only? Suggest Candidates?
  • 93. M.A.I. measurably improves indexing results: • Consistency: same term suggested under same text conditions • Indexing coverage: terms reflect full range of indexable concepts in data • Indexing depth: terms reflect the granularity and precision of deeper levels of thesaurus • Faster throughput: nearly 7 times faster indexing
  • 94. M.A.I. mines the full depth of your thesaurus, suggesting the most specific and appropriate indexing term. M.A.I. can also filter indexing terms, displaying more general Broad Terms, while retaining the more precise indexing terms stored with the document.
  • 95. Pairing Machine Aided Indexer with Thesaurus Master as MAIstro provides • simple thesaurus construction and maintenance • faster indexing • deeper indexing • greater concept coverage • more consistent indexing Efficiency and Economy in document storage and retrieval