Gena M. San Nicholas, a taxonomist and biology subject-matter expert (SME) at Access Innovations, Inc., shows how Data Harmony's machine-aided indexing (M.A.I.) module produces tagged subject terms within bodies of text for XML and other repositories. This aids in search and leverages subject metadata, resulting in added value to data collections.
2. Introduction
What’s the big deal about Data Harmony, anyway?
My background—biology
Searching through science databases was tedious and laborious
Frequently, the only way to tell if an article was what you wanted was to actually read the whole thing
Costly if your institution didn’t have access rights to that particular publication
3. Data Harmony allows the user to “browse the book”
Rulebase allows editors to assign context to full-text and disambiguate terms
Indexing terms are XML-tagged by Data Harmony in the document
Rulebase is auto-generated but is easily edited
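To make the idea of a rule that "assigns context" concrete, here is a minimal sketch of context-based disambiguation. It is not Data Harmony's rule syntax or engine, which these slides do not show. The `RULES` table, the `suggest_terms` function, the example thesaurus terms, and the five-word window are all assumptions made for illustration.

```python
import re

# Hypothetical toy rules (NOT Data Harmony syntax): each ambiguous trigger
# word maps to candidate thesaurus terms, each with context words that must
# appear near the trigger for that term to be suggested.
RULES = {
    "mercury": [
        ("Mercury (planet)", {"orbit", "planet", "solar"}),
        ("Mercury (element)", {"toxicity", "metal", "thermometer"}),
    ],
}

def suggest_terms(text, rules=RULES, window=5):
    """Return thesaurus terms whose trigger and context words co-occur."""
    words = re.findall(r"[a-z]+", text.lower())
    found = []
    for i, word in enumerate(words):
        for term, context in rules.get(word, []):
            # Look at the words within `window` positions of the trigger.
            nearby = set(words[max(0, i - window): i + window + 1])
            if nearby & context and term not in found:
                found.append(term)
    return found

print(suggest_terms("Mercury toxicity was measured in fish tissue."))
```

In this toy version, editing a rule means changing the set of context words, which is roughly the kind of tweak an editor would make after comparing a rule against the trigger words in a test document.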
4. “Easily edited”: easy for an experienced editor
Test MAI
Look at indexing results
Compare rule to trigger words in full-text test
Tweak rule as necessary
7. With Inline Tagging, we make it EVEN EASIER for you!!!
What is Inline Tagging?
From the DH-Inline tagging documentation: “Access Innovations’ Inline
Tagging function finds and labels thesaurus concepts (identified by rules
stored within the thesaurus rule base) within the full text of an article (in
XML or PDF format) by applying XML wrappers, or “tags”. The process
of adding XML tags within content is called “inline tagging.” Thanks to
the XML format, metadata can be included within content files—not
just in a set-aside area at the beginning or end of the file, but woven
into the very text.”
This lets users truly “browse the book” according to their content management needs.
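As a rough illustration of what "applying XML wrappers" within the text means, the sketch below wraps known terms in an XML element. It is a toy, not the Inline Tagging implementation: the function name `tag_inline`, the element name `term`, and the plain case-insensitive string matching are assumptions, and the real product identifies concepts through its rule base rather than a simple term list.

```python
import re
from xml.sax.saxutils import escape

def tag_inline(text, terms, tag="term"):
    """Wrap each occurrence of a known term in an XML element."""
    if not terms:
        return escape(text)
    # Try longer terms first so "cell membrane" wins over "cell".
    ordered = sorted(terms, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in ordered) + r")\b",
        re.IGNORECASE,
    )
    pieces, last = [], 0
    for m in pattern.finditer(text):
        # Escape the untagged text so the result stays well-formed XML.
        pieces.append(escape(text[last:m.start()]))
        pieces.append(f"<{tag}>{escape(m.group(0))}</{tag}>")
        last = m.end()
    pieces.append(escape(text[last:]))
    return "".join(pieces)

print(tag_inline("The cell membrane regulates transport.",
                 ["cell", "cell membrane"]))
```

The tagged terms stay in place within the sentence, which is the point of inline tagging: the metadata sits where the concept occurs rather than in a separate header block.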
11. MAIstro Inline Tagging Web Service
To facilitate integration of Data Harmony's MAIstro suite with a
publishing pipeline or other workflow, a simple web service can be
installed that performs automatic indexing. This web service is an
abstraction of the Java APIs that Data Harmony's MAIstro uses.
The web service has two functions:
TestSettings: For configuration and debugging
GetTerms: Calls MAIstro's GetTerms API and returns a formatted document with the subject terms tagged inline with XML tags
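The slide names the two functions but not how they are called, so the following is only a sketch of how a pipeline might address them. Only the names `TestSettings` and `GetTerms` come from the slide. The base URL, the path layout, the `text` parameter, and the helper `build_request_url` are hypothetical placeholders, and the real service may use a different protocol entirely (for example SOAP rather than a plain URL).

```python
from urllib.parse import urlencode

# Only these two function names come from the slide; the base URL, path
# layout, and parameter names used below are hypothetical placeholders.
FUNCTIONS = {"TestSettings", "GetTerms"}

def build_request_url(base_url, function, **params):
    """Build a URL for one of the two service functions (assumed layout)."""
    if function not in FUNCTIONS:
        raise ValueError(f"unknown function: {function}")
    url = f"{base_url.rstrip('/')}/{function}"
    if params:
        # Sort parameters so the resulting URL is deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url

# Check the configuration first, then request tagged output.
print(build_request_url("http://localhost:8080/service", "TestSettings"))
print(build_request_url("http://localhost:8080/service", "GetTerms",
                        text="cell membrane"))
```

Calling TestSettings before GetTerms mirrors the split on the slide: one function to confirm the installation is configured, one to do the indexing.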