Allex Lyons, a programmer at Access Innovations, Inc., explains the company's decision to replace the random access file search that XIS uses for Docsets with a Lucene index that is faster, more reliable, and more efficient.
2. What is XIS?
XIS is an XML schema-based database system used to store user data
All records are stored in individual XML files
Option to zip XML files available with XIS Project DTD
3. How XIS Data Is Stored
Docsets
Stores records with multiple fields (similar to SQL Table)
Can also have subfields and lists of field values nested within a record
Can look up values from fields in other Docsets or from Tables
Tables
Stores a single list of values
Can be referenced by other Docsets
Can be directly accessible for editing or kept hidden from user view
4. How to Create a XIS Project
Create DTD file for XIS project
Specify MAI Thesaurus to link to project
Create Docset and Tables
Specify ID lengths for each Docset
Create fields for Docsets
Save DTD to dhserver/projects/projects/xml folder
Create XIS Project folder under dhserver/data
Create a subfolder for each Docset, plus a Tables directory, under the XIS Project folder
XIS Projects can only be created by administrators
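The folder steps above can be sketched in a few lines of Python. This is only an illustration: the project name, the Docset names, and the folder name `Tables` are assumptions, and a temporary directory stands in for the real dhserver root. The slides do not say how the folders are actually created.

```python
import tempfile
from pathlib import Path


def create_project_folders(dhserver_root, project, docsets):
    """Create the data folders for a XIS project and return the new paths."""
    # Hypothetical layout: dhserver/data/<project>/<docset or Tables>
    project_dir = Path(dhserver_root) / "data" / project
    created = []
    # One subfolder per Docset, plus a single directory for Tables.
    for name in list(docsets) + ["Tables"]:
        folder = project_dir / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created


# A temporary directory stands in for the real dhserver root.
demo_root = Path(tempfile.mkdtemp()) / "dhserver"
demo_paths = create_project_folders(demo_root, "DemoProject", ["Articles", "Authors"])
for path in demo_paths:
    print(path.relative_to(demo_root))
```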
5. Starting a XIS Project
Start Data Harmony server where project is located
Log in to Admin module
Start MAI Thesaurus
Start XIS Project
Index XIS Project, especially if just created
Run startXis program
Enter server, port, thesaurus, username, and password to log in
11. XIS Record Format
Saved in XML file
Starts with a tag named for the Docset, with the record ID as an attribute
Fields are listed within Docset tag along with values.
Subfields are nested within their parent fields
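A record in this format might look like the sample below. The Docset name `Articles`, the attribute name `ID`, and every field name and value are made up for illustration; only the overall shape (Docset tag with an ID attribute, fields inside it, subfields nested in their parent) comes from the slide. The sketch reads the record with Python's standard XML parser and flattens nested subfields into dotted paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: tag = Docset name, ID carried as an attribute.
SAMPLE_RECORD = """<Articles ID="000042">
  <Title>Example title</Title>
  <Author>
    <LastName>Doe</LastName>
    <FirstName>Jane</FirstName>
  </Author>
  <Year>2012</Year>
</Articles>"""


def flatten_fields(element, prefix=""):
    """Yield (field_path, value) pairs, using dots to mark subfield nesting."""
    for child in element:
        path = f"{prefix}{child.tag}"
        if len(child):
            # The field has subfields, so descend into them.
            yield from flatten_fields(child, prefix=path + ".")
        else:
            yield path, (child.text or "").strip()


root = ET.fromstring(SAMPLE_RECORD)
docset_name, record_id = root.tag, root.get("ID")
fields = list(flatten_fields(root))

print(docset_name, record_id)
for path, value in fields:
    print(f"{path} = {value}")
```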
14. Current XIS Indexing and Search
Uses text-based indexes
Creates large number of index files (one for each field)
Generates temporary files for results
Uses less reliable RandomAccessFile search
Supports only a limited number of search operators
Does not treat numerical values as numbers
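The last point is easiest to see with a small example. The sketch below is a general illustration of what happens when numbers are compared as text. It is not taken from the XIS code, and the slides do not describe how the existing index compares values.

```python
# Field values as a text-based index would hold them: plain strings.
values = ["9", "10", "100", "25"]

# Sorting as text compares character by character, so "100" lands before "25".
as_text = sorted(values)
# Sorting as numbers gives the order a user would expect.
as_numbers = sorted(values, key=int)

# A range search for 10..50 goes wrong the same way when done on text.
text_range = [v for v in values if "10" <= v <= "50"]
number_range = [v for v in values if 10 <= int(v) <= 50]

print(as_text)       # ['10', '100', '25', '9']
print(as_numbers)    # ['9', '10', '25', '100']
print(text_range)    # ['10', '100', '25']
print(number_range)  # ['10', '25']
```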
15. Lucene vs. Current XIS Index
Fewer index files needed
Allows for broader searches
Fuzzy matching
Start and end wildcard searches
Recognizes numerical and date fields as such
Can be utilized to remove stopwords
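To show what these search features mean in practice, here is a toy sketch that uses only the Python standard library. It does not use Lucene, and Lucene implements fuzzy matching, wildcards, and stopword filtering in its own way. The term list, the stopword set, and the similarity cutoff are all assumptions chosen for the example.

```python
import difflib
import fnmatch

# Hypothetical indexed terms and stopword list.
TERMS = ["thesaurus", "taxonomy", "indexing", "index", "reindex"]
STOPWORDS = {"the", "a", "of", "and"}


def fuzzy(term, vocabulary, cutoff=0.8):
    """Return vocabulary terms that are close to a possibly misspelled term."""
    return difflib.get_close_matches(term, vocabulary, n=5, cutoff=cutoff)


def wildcard(pattern, vocabulary):
    """Return terms matching a pattern with * at the start or the end."""
    return [t for t in vocabulary if fnmatch.fnmatchcase(t, pattern)]


def remove_stopwords(text):
    """Lowercase the text, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


fuzzy_hits = fuzzy("thesarus", TERMS)        # misspelled query term
end_wildcard = wildcard("index*", TERMS)     # wildcard at the end
start_wildcard = wildcard("*index", TERMS)   # wildcard at the start
kept_words = remove_stopwords("The index of a thesaurus")

print(fuzzy_hits)
print(end_wildcard)
print(start_wildcard)
print(kept_words)
```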
16. New Lucene Search Process
Establish index reader to perform search
Submit query string containing fields and parameters
Return results
17. Other Lucene Functions
Will be used for adding, updating, and deleting XIS records
Indexes will be housed on Data Harmony server