Semantic & Multilingual Strategies in Lucene/Solr 
Trey Grainger 
Director of Engineering, Search & Analytics @ CareerBuilder
Outline 
• Introduction
• Text Analysis Refresher
• Language-specific Text Analysis
• Multilingual Search Strategies
• Automatic Language Identification
• Semantic Search Strategies (understanding “meaning”)
• Conclusion
About Me 
Trey Grainger 
Director of Engineering, Search & Analytics 
Joined CareerBuilder in 2007 as Software Engineer
MBA, Management of Technology – GA Tech
BA, Computer Science, Business, & Philosophy – Furman University
Mining Massive Datasets (in progress) – Stanford University
Fun outside of CB:
• Co-author of Solr in Action, plus several research papers
• Frequent conference speaker
• Founder of Celiaccess.com, the gluten-free search engine
• Lucene/Solr contributor
At CareerBuilder, Solr Powers...
Text Analysis Refresher
Text Analysis Refresher 
A text field in Lucene/Solr has an Analyzer containing:
① Zero or more CharFilters
Takes incoming text and “cleans it up” before it is tokenized
② One Tokenizer
Splits incoming text into a Token Stream containing zero or more Tokens
③ Zero or more TokenFilters
Examines and optionally modifies each Token in the Token Stream
*From Solr in Action, Chapter 6
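The three analyzer stages above can be sketched in a few lines of Python. This is only an illustration of the data flow, not Lucene code: the function names (strip_html, whitespace_tokenizer, lowercase, remove_stopwords, analyze) and the tiny stopword list are assumptions made up for this example.

```python
import re
from typing import Callable, List

# Hypothetical stand-ins for the three analyzer stages (not Lucene classes).

def strip_html(text: str) -> str:
    """CharFilter: clean up the raw text before it is tokenized."""
    return re.sub(r"<[^>]+>", " ", text)

def whitespace_tokenizer(text: str) -> List[str]:
    """Tokenizer: split the cleaned text into a stream of tokens."""
    return text.split()

def lowercase(tokens: List[str]) -> List[str]:
    """TokenFilter: modify each token."""
    return [t.lower() for t in tokens]

def remove_stopwords(tokens: List[str]) -> List[str]:
    """TokenFilter: drop tokens found in a (toy) stopword list."""
    stopwords = {"the", "a", "of"}
    return [t for t in tokens if t not in stopwords]

def analyze(text: str,
            char_filters: List[Callable[[str], str]],
            tokenizer: Callable[[str], List[str]],
            token_filters: List[Callable[[List[str]], List[str]]]) -> List[str]:
    """Run zero or more char filters, exactly one tokenizer, zero or more token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens

tokens = analyze("<b>The</b> Adventures of Huckleberry Finn",
                 [strip_html], whitespace_tokenizer,
                 [lowercase, remove_stopwords])
print(tokens)
```

The order of the token filters matters here just as it does in a Solr analysis chain: lowercasing runs before stopword removal so that "The" is matched by the lowercase stopword list.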
Language-specific Text Analysis
Example English Analysis Chains 
<fieldType name="text_en" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory"
            words="lang/stopwords_en.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="lang/en_protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="lang/en_synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
            generateNumberParts="1" catenateWords="1" catenateNumbers="1"
            catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.KStemFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
Per-language Analysis Chains 
*Some of the 32 different language configurations in Appendix B of Solr in Action
Which Stemmer do I choose? 
*From Solr in Action, Chapter 14
Common English Stemmers
*From Solr in Action, Chapter 14
When Stemming goes awry 
Fixing Stemming Mistakes:
• Unfortunately, every stemmer has problem cases that aren’t handled as you would expect
• Thankfully, stemmers can be overridden
• KeywordMarkerFilter: protects a list of terms you specify from being stemmed
• StemmerOverrideFilter: applies a list of custom term mappings you specify
Alternate strategy:
• Use lemmatization (root-form analysis) instead of stemming
• Commercial vendors help tremendously in this space (see http://www.basistech.com/case-study-career-builder/)
• The Hunspell stemmer enables dictionary-based support of varying quality in over 100 languages
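The two override mechanisms listed above can be illustrated with a small Python sketch. The stemmer here is a deliberately naive suffix stripper invented for this example (it is far simpler than Porter or KStem), and the protected and override lists are hypothetical; the point is only to show where protection and custom mappings sit relative to the stemmer.

```python
from typing import Dict, Iterable, List, Set

def naive_stem(token: str) -> str:
    """Toy suffix-stripping stemmer (an assumption; not Porter or KStem)."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def stem_with_overrides(tokens: Iterable[str],
                        protected: Set[str],
                        overrides: Dict[str, str]) -> List[str]:
    """Apply custom mappings first, leave protected terms alone, stem the rest."""
    result = []
    for token in tokens:
        if token in overrides:        # role played by StemmerOverrideFilter
            result.append(overrides[token])
        elif token in protected:      # role played by KeywordMarkerFilter
            result.append(token)
        else:
            result.append(naive_stem(token))
    return result

stems = stem_with_overrides(["running", "news", "mice", "engineers"],
                            protected={"news"},
                            overrides={"mice": "mouse"})
print(stems)
```

Without the protected list, this toy stemmer would reduce "news" to "new", which is the kind of problem case the slide refers to.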
Stemming vs. Lemmatization 
• Stemming: algorithmic manipulation of text, based upon common per-language rules
• Lemmatization: finds the dictionary form of a term (lemma means “root”)
- Dramatically improves precision (only matching terms that “should” match), while not significantly impacting recall (all terms that should match do match).
*From Solr in Action, Chapter 14
Multilingual Search Strategies
Multilingual Search Strategies 
How do you handle:
…a different language per document?
…multiple languages in the same document?
…multiple languages in the same field?
Strategies:
1) Separate field per language
2) Separate collection/core per language
3) All languages in one field
Strategy 1: Separate field per language
*From Solr in Action, Chapter 14
Separate field per language 
schema.xml:
<field name="id" type="string" indexed="true" stored="true" />
<field name="title" type="string" indexed="true" stored="true" />
<field name="content_english" type="text_english" indexed="true" stored="true" />
<field name="content_french" type="text_french" indexed="true" stored="true" />
<field name="content_spanish" type="text_spanish" indexed="true" stored="true" />
<fieldType name="text_english" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_en.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory"
            protected="protwords.txt"/>
    <filter class="solr.KStemFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_spanish" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_es.txt" format="snowball"/>
    <filter class="solr.SpanishLightStemFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_french" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.ElisionFilterFactory" ignoreCase="true"
            articles="lang/contractions_fr.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="lang/stopwords_fr.txt" format="snowball"/>
    <filter class="solr.FrenchLightStemFilterFactory"/>
  </analyzer>
</fieldType>
*From Solr in Action, Chapter 14
Separate field per language: one language per document 
<doc>
  <field name="id">1</field>
  <field name="title">The Adventures of Huckleberry Finn</field>
  <field name="content_english">YOU don't know about me without you have read
    a book by the name of The Adventures of Tom Sawyer; but that ain't no
    matter. That book was made by Mr. Mark Twain, and he told the truth,
    mainly. There was things which he stretched, but mainly he told the truth.
  </field>
</doc>
<doc>
  <field name="id">2</field>
  <field name="title">Les Misérables</field>
  <field name="content_french">Nul n'aurait pu le dire; tout ce qu'on savait,
    c'est que, lorsqu'il revint d'Italie, il était prêtre.
  </field>
</doc>
<doc>
  <field name="id">3</field>
  <field name="title">Don Quixote</field>
  <field name="content_spanish">Demasiada cordura puede ser la peor de las
    locuras, ver la vida como es y no como debería de ser.
  </field>
</doc>
Query:
http://localhost:8983/solr/field-per-language/select?
fl=title&
defType=edismax&
qf=content_english content_french content_spanish&
q="he told the truth" OR "il était prêtre" OR "ver la vida como es"
Response:
{
"response":{"numFound":3,"start":0,"docs":[
{
"title":["The Adventures of Huckleberry Finn"]},
{
"title":["Don Quixote"]},
{
"title":["Les Misérables"]}]
}}
*From Solr in Action, Chapter 14
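The query above contains spaces, quotes, and accented characters, so it has to be URL-encoded before it is sent. A minimal Python sketch of building that request URL follows; the helper name build_multilingual_query is an assumption for this example, while the host, core name, and parameters are the ones shown on the slide.

```python
from typing import List
from urllib.parse import parse_qs, urlencode, urlsplit

def build_multilingual_query(base_url: str, phrases: List[str], fields: List[str]) -> str:
    """Build an edismax request that searches every per-language field at once."""
    params = {
        "fl": "title",
        "defType": "edismax",
        "qf": " ".join(fields),                             # one entry per language field
        "q": " OR ".join(f'"{phrase}"' for phrase in phrases),
    }
    return f"{base_url}?{urlencode(params)}"

url = build_multilingual_query(
    "http://localhost:8983/solr/field-per-language/select",
    ["he told the truth", "il était prêtre", "ver la vida como es"],
    ["content_english", "content_french", "content_spanish"],
)

# Decode the URL again to confirm the parameters survive the round trip.
decoded = parse_qs(urlsplit(url).query)
print(url)
print(decoded["q"][0])
```

Because qf lists all three fields, each phrase is analyzed with every language's chain and matches whichever field holds text in that language.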
Separate field per language: multiple languages per document 
Query 1:
http://localhost:8983/solr/field-per-language/select?
fl=title&
defType=edismax&
qf=content_english content_french content_spanish&
q="wisdom"
Query 2:
http://localhost:8983/solr/field-per-language/select?...
q="sabiduría"
Query 3:
http://localhost:8983/solr/field-per-language/select?...
q="sagesse"
Response: (same for queries 1–3)
{
"response":{"numFound":1,"start":0,"docs":[
{
"title":["Proverbs"]}]
}}
Documents:
<doc>
  <field name="id">4</field>
  <field name="title">Proverbs</field>
  <field name="content_spanish">No la abandones y ella velará sobre
    ti, ámala y ella te protegerá. Lo principal es la sabiduría; adquiere
    sabiduría, y con todo lo que obtengas adquiere inteligencia.
  </field>
  <field name="content_english">Do not forsake wisdom, and she will protect
    you; love her, and she will watch over you. Wisdom is supreme;
    therefore get wisdom. Though it cost all you have, get understanding.
  </field>
  <field name="content_french">N'abandonne pas la sagesse, et elle te
    gardera, aime-la, et elle te protégera. Voici le début de la sagesse:
    acquiers la sagesse, procure-toi le discernement au prix de tout ce que tu possèdes.
  </field>
</doc>
*From Solr in Action, Chapter 14
Summary: Separate field per language 
*From Solr in Action, Chapter 14
Strategy 2: Separate collection per language
*From Solr in Action, Chapter 14
Separate collection per language: schema.xml
*From Solr in Action, Chapter 14
Separate collection per language: Indexing & Querying 
Indexing:
cd $SOLR_IN_ACTION/example-docs/
java -jar -Durl=http://localhost:8983/solr/english/update post.jar
➥ ch14/documents/english.xml
java -jar -Durl=http://localhost:8983/solr/spanish/update post.jar
➥ ch14/documents/spanish.xml
java -jar -Durl=http://localhost:8983/solr/french/update post.jar
➥ ch14/documents/french.xml
Query (collections in SolrCloud):
http://localhost:8983/solr/aggregator/select?
shards=english,spanish,french&
df=content&
q=query in any language here
Query (specific cores):
http://localhost:8983/solr/aggregator/select?
shards=localhost:8983/solr/english,
localhost:8983/solr/spanish,
localhost:8983/solr/french&
df=content&
q=query in any language here
Documents:
All documents just have a single “content” field. The documents get routed to a different language-specific Solr collection based upon the language of the content field.
*From Solr in Action, Chapter 14
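The routing step described above can be sketched as follows. This is an illustration only: it assumes each document already carries a "language" code (for example from a language detector), and the names COLLECTIONS, update_url, and shards_param are made up for this example. The collection names and host are the ones used on the slide.

```python
from typing import Dict

# Language code -> per-language collection, mirroring the slide's three collections.
COLLECTIONS: Dict[str, str] = {"en": "english", "es": "spanish", "fr": "french"}

def update_url(doc: dict,
               solr_base: str = "http://localhost:8983/solr",
               fallback: str = "en") -> str:
    """Pick the update endpoint of the collection matching the document's language."""
    language = doc.get("language", fallback)   # "language" field is an assumption
    collection = COLLECTIONS.get(language, COLLECTIONS[fallback])
    return f"{solr_base}/{collection}/update"

def shards_param(solr_host: str = "localhost:8983/solr") -> str:
    """Build the shards value that fans a query out to every language core."""
    return ",".join(f"{solr_host}/{name}" for name in COLLECTIONS.values())

print(update_url({"language": "fr", "content": "Nul n'aurait pu le dire"}))
print(shards_param())
```

Unknown languages fall back to the English collection here; whether to drop, reject, or default such documents is a design choice the slide does not specify.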
Summary: Separate index per language 
*From Solr in Action, Chapter 14
Strategy 3: One Field for all languages
*From Solr in Action, Chapter 14
One Field for all languages: Feature Status
• Note: This feature is not yet committed to Solr
• I’m working on it in my free time. Currently it supports:
• An Update Request Processor which can automatically detect the languages of documents and choose the correct analyzers
• A Field Type which allows dynamically choosing one or more analyzers on a per-field (indexing) and per-term (querying) basis
• The current code from Solr in Action is freely available on GitHub.
• There is a JIRA ticket open to ultimately contribute this back to Solr: SOLR-6492
• Some work is still necessary to make querying more user friendly.
One Field for all languages 
Step 1: Define Multilingual Field
schema.xml:
<fieldType name="multilingual_text" class="sia.ch14.MultiTextField"
           sortMissingLast="true" defaultFieldType="text_general"
           fieldMappings="en:text_english,
                          es:text_spanish,
                          fr:text_french,
                          de:text_german"/> [1]
<field name="text" type="multilingual_text" indexed="true" multiValued="true" />
Step 2: Index documents
<add><doc>…
  <field name="text">general keywords</field> [2]
  <field name="text">en,es|the school, las escuelas</field>…
</doc></add>
<add><doc>…
  <field name="text">en|the school</field>
  <field name="text">es|las escuelas</field>…
</doc></add>
Step 3: Search
http://localhost:8983/solr/collection1/select?
q=es|escuela OR en,es,de|school OR school [2]
[1] Note that "text_english", "text_spanish", "text_french", and "text_german" refer to field types defined elsewhere in the schema.xml
[2] Uses the "defaultFieldType", in this case "text_general", defined elsewhere in schema.xml
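The field values and query terms on this slide carry an optional language prefix such as "en,es|". A minimal Python sketch of splitting such a value is shown below. It is inferred only from the examples on the slide; the function name parse_language_prefix is hypothetical, and the actual parsing inside sia.ch14.MultiTextField may differ.

```python
from typing import List, Sequence, Tuple

def parse_language_prefix(value: str,
                          default_langs: Sequence[str] = ()) -> Tuple[List[str], str]:
    """Split 'en,es|the school' into (['en', 'es'], 'the school').

    Values without a '|' separator keep the default languages, which
    corresponds to falling back to the defaultFieldType.
    """
    prefix, separator, text = value.partition("|")
    if not separator:
        return list(default_langs), value
    languages = [code.strip() for code in prefix.split(",") if code.strip()]
    return languages, text

print(parse_language_prefix("en,es|the school, las escuelas"))
print(parse_language_prefix("general keywords"))
```

One limitation of this sketch: unprefixed text that happens to contain a "|" character would be misread as having a language prefix.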
One Field For All Languages: Stacked Token Streams 
1) English Field 
2) Spanish Field 
3) English + Spanish combined in Multilingual Text Field 
multilingual_text 
① For each language requested, the appropriate field type is chosen
② The input text is passed separately to the Analyzer chain for each field type
③ The resulting Token Streams from each Analyzer chain are stacked into a unified Token Stream based upon their position increments
*Screenshot from Solr in Action, Chapter 14
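The stacking step can be sketched in Python by converting position increments to absolute positions, merging, and converting back. The function name stack_token_streams and the sample analyzer outputs are assumptions for illustration; they are not the output of real Solr analyzers and this is not the MultiTextField implementation.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, int]  # (term, position increment)

def stack_token_streams(streams: List[List[Token]]) -> List[Token]:
    """Merge several token streams into one, aligning tokens by absolute position."""
    by_position: Dict[int, List[str]] = {}
    for stream in streams:
        position = 0
        for term, increment in stream:
            position += increment
            terms = by_position.setdefault(position, [])
            if term not in terms:            # skip duplicates at the same position
                terms.append(term)

    stacked: List[Token] = []
    previous = 0
    for position in sorted(by_position):
        for index, term in enumerate(by_position[position]):
            # The first term at a position advances the stream; later terms
            # get increment 0, i.e. they are stacked on the same position.
            stacked.append((term, position - previous if index == 0 else 0))
        previous = position
    return stacked

# Hypothetical analyzer output for the text "las escuelas":
english = [("las", 1), ("escuelas", 1)]   # English chain leaves both words as tokens
spanish = [("escuela", 2)]                # Spanish chain drops "las", stems the noun
print(stack_token_streams([english, spanish]))
```

A position increment of 2 on the Spanish token records the gap left by the removed stopword, which is what lets both streams line up on the same positions.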
Strategy 3: All languages in one field 
*See Solr in Action, Chapter 14
Automatic Language Identification
Identifying languages in documents 
solrconfig.xml
...
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="invariants">
      <str name="langid.fl">content,content_lang1,content_lang2,content_lang3</str>
      <str name="langid.langField">language</str>
      <str name="langid.langsField">languages</str>
      ...
    </lst>
  </processor>
  ...
</updateRequestProcessorChain>
...
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="invariants">
    <str name="update.chain">langid</str>
  </lst>
</requestHandler>
...
schema.xml
...
<field name="language" type="string" indexed="true" stored="true" />
<field name="languages" type="string" indexed="true" stored="true" multiValued="true"/>
...
*See Solr in Action, Chapter 14
Identifying languages in documents 
Sending documents:
cd $SOLR_IN_ACTION/example-docs/
java -Durl=http://localhost:8983/solr/langid/update
➥ -jar post.jar ch14/documents/langid.xml
Query:
http://localhost:8983/solr/langid/select?
q=*:*&
fl=title,language,languages
Results:
[{
"title":"The Adventures of Huckleberry Finn",
"language":"en",
"languages":["en"]},
{
"title":"Les Misérables",
"language":"fr",
"languages":["fr"]},
{
"title":"Don Quixote",
"language":"es",
"languages":["es"]},
{
"title":"Proverbs",
"language":"fr",
"languages":["fr", "en", "es"]}]
*See Solr in Action, Chapter 14
Mapping data to language-specific fields 
solrconfig.xml
...
<updateRequestProcessorChain name="langid">
  <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <lst name="invariants">
      <str name="langid.fl">content</str>
      <str name="langid.langField">language</str>
      <str name="langid.map">true</str>
      <str name="langid.map.fl">content</str>
      <str name="langid.whitelist">en,es,fr</str>
      <str name="langid.map.lcmap">en:english es:spanish fr:french</str>
      <str name="langid.fallback">en</str>
    </lst>
  </processor>
  ...
</updateRequestProcessorChain>
...
Indexed Documents:
[{
"title":"The Adventures of Huckleberry Finn",
"language":"en",
"content_english":[ "YOU don't know about me without..."]},
{
"title":"Les Misérables",
"language":"fr",
"content_french":[ "Nul n'aurait pu le dire; tout ce..."]},
{
"title":"Don Quixote",
"language":"es",
"content_spanish":[ "Demasiada cordura puede ser la peor..."]}]
*See Solr in Action, Chapter 14
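The effect of the mapping parameters above (langid.map, langid.whitelist, langid.map.lcmap, langid.fallback) can be approximated in Python as below. This is a simplified sketch, not the Solr processor: the detected language is passed in rather than detected, the whitelist is treated as a simple post-filter, and the function name map_to_language_field is hypothetical.

```python
from typing import Dict

LCMAP: Dict[str, str] = {"en": "english", "es": "spanish", "fr": "french"}  # langid.map.lcmap
WHITELIST = {"en", "es", "fr"}                                              # langid.whitelist
FALLBACK = "en"                                                             # langid.fallback

def map_to_language_field(doc: dict, detected: str, field: str = "content") -> dict:
    """Record the language and move `field` to its language-specific variant."""
    language = detected if detected in WHITELIST else FALLBACK
    mapped = dict(doc)                       # leave the caller's document untouched
    mapped["language"] = language
    mapped[f"{field}_{LCMAP[language]}"] = mapped.pop(field)
    return mapped

doc = {"title": "Les Misérables", "content": "Nul n'aurait pu le dire; tout ce..."}
print(map_to_language_field(doc, "fr"))
```

With these settings the French document ends up with a content_french field, matching the indexed documents shown on the slide, and a document detected as an unlisted language is stored under content_english.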
Semantic Strategies
The need for Semantic Search 
User’s Query: machine learning research and development Portland, OR software engineer AND hadoop java
Traditional Query Parsing: (machine AND learning AND research AND development AND portland) OR (software AND engineer AND hadoop AND java)
Semantic Query Parsing: "machine learning" AND "research and development" AND "Portland, OR" AND "software engineer" AND hadoop AND java
Semantically Expanded Query: ("machine learning"^10 OR "data scientist" OR "data mining" OR "computer vision") AND ("research and development"^10 OR "r&d") AND ("Portland, OR"^10 OR "Portland, Oregon" OR {!geofilt pt=45.512,-122.676 d=50 sfield=geo}) AND ("software engineer"^10 OR "software developer") AND (hadoop^10 OR "big data" OR hbase OR hive) AND (java^10 OR j2ee)
Semantic Search Architecture – Query Parsing
1) Generate a model of domain-specific phrases
• Can mine query logs or actual text of documents for significant phrases within your domain [1]
2) Feed known phrases to SolrTextTagger [2] (uses Lucene FST for high-throughput term lookups)
3) Use SolrTextTagger to perform entity extraction on incoming queries (tagging documents is also an option)
4) Shown on next slide: pass extracted entities to a Query Augmentation phase to rewrite the query with enhanced semantic understanding (synonyms, related keywords, related categories, etc.)
[1] K. Aljadda, M. Korayem, T. Grainger, C. Russell. "Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific Jargon," in IEEE Big Data 2014.
[2] https://github.com/OpenSextant/SolrTextTagger
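The phrase-tagging step can be illustrated with a greedy longest-match over a set of known phrases. This Python sketch is a stand-in for the idea only: it uses a plain set instead of the FST-based lookup in SolrTextTagger, and the phrase list, the function name tag_phrases, and the four-word window are assumptions. Punctuated entities such as "Portland, OR" are left out to keep the tokenization trivial.

```python
from typing import List, Set

# Hypothetical domain phrases, e.g. mined from query logs.
KNOWN_PHRASES: Set[str] = {"machine learning", "research and development", "software engineer"}

def tag_phrases(query: str, phrases: Set[str] = KNOWN_PHRASES, max_len: int = 4) -> str:
    """Rewrite a keyword query so that known multi-word phrases become quoted phrases."""
    words = query.lower().split()
    clauses: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest window first so longer phrases win over shorter ones.
        for n in range(min(max_len, len(words) - i), 1, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in phrases:
                clauses.append(f'"{candidate}"')
                i += n
                break
        else:
            clauses.append(words[i])     # no phrase starts here: keep the single term
            i += 1
    return " AND ".join(clauses)

parsed = tag_phrases("machine learning research and development software engineer hadoop java")
print(parsed)
```

The result keeps "research and development" together as a single phrase instead of splitting it into three independent terms, which is the difference between the traditional and semantic parses shown earlier.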
Semantic Search Architecture – Query Augmentation
Known keyword phrases:
java developer
machine learning
registered nurse
Keywords: machine learning
Semantic Query Augmentation:
Search Behavior, Application Behavior, etc.
Job Title Classifier, Skills Extractor, Job Level Classifier, etc.
Clustering relationships
Related Phrases
machine learning: { data mining .9, matlab .8, data scientist .75, artificial intelligence .7, neural networks .55 }
Common Job Titles
machine learning: { software engineer .65, data manager .3, data scientist .25, hadoop engineer .2 }
Related Occupations
machine learning: {
15-1031.00 .58 Computer Software Engineers, Applications
15-1011.00 .55 Computer and Information Scientists, Research
15-1032.00 .52 Computer Software Engineers, Systems Software }
Modified Query:
keywords:((machine learning)^10 OR { AT_LEAST_2: ("data mining"^0.9, matlab^0.8, "data scientist"^0.75, "artificial intelligence"^0.7, "neural networks"^0.55) })
{ BOOST_TO_TOP: (job_title:("software engineer" OR "data manager" OR "data scientist" OR "hadoop engineer")) }
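A simplified version of the augmentation step can be sketched in Python using the related-phrase weights from this slide. It emits a plain boosted OR clause rather than the AT_LEAST_2 and BOOST_TO_TOP pseudo-syntax of the modified query, and the names RELATED_PHRASES, quote, and augment are assumptions for this example.

```python
from typing import Dict, List, Tuple

# Related phrases and weights as listed on the slide.
RELATED_PHRASES: Dict[str, List[Tuple[str, float]]] = {
    "machine learning": [
        ("data mining", 0.9),
        ("matlab", 0.8),
        ("data scientist", 0.75),
        ("artificial intelligence", 0.7),
        ("neural networks", 0.55),
    ],
}

def quote(term: str) -> str:
    """Quote multi-word terms so they are searched as phrases."""
    return f'"{term}"' if " " in term else term

def augment(phrase: str,
            related: Dict[str, List[Tuple[str, float]]] = RELATED_PHRASES,
            boost: int = 10,
            min_weight: float = 0.0) -> str:
    """Expand a phrase into a boosted OR clause of the phrase and its related terms."""
    clauses = [f"{quote(phrase)}^{boost}"]
    for term, weight in related.get(phrase, []):
        if weight >= min_weight:
            clauses.append(f"{quote(term)}^{weight}")
    return "(" + " OR ".join(clauses) + ")"

expanded = augment("machine learning", min_weight=0.7)
print(expanded)
```

The original phrase keeps a much higher boost than any related term, so expansion mainly widens recall without letting related terms outrank exact matches.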
Differentiating related terms 
Synonyms:
cpa => certified public accountant
rn => registered nurse
r.n. => registered nurse
Ambiguous Terms*:
driver => driver (trucking) ~80%
driver => driver (software) ~20%
Related Terms:
r.n. => nursing, bsn
hadoop => mapreduce, hive, pig
*differentiated based upon user and query context
Semantic Search “under the hood”
2014 Publications & Presentations 
Books: 
Solr in Action - A comprehensive guide to implementing scalable search using Apache Solr
Research papers:
● Towards a Job Title Classification System
● Augmenting Recommendation Systems Using a Model of Semantically-related Terms Extracted from User Behavior
● sCooL: A system for academic institution name normalization
● Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific Jargon
● PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems
● SKILL: A System for Skill Identification and Normalization
Speaking Engagements:
● WSDM 2014 Workshop: “Web-Scale Classification: Classifying Big Data from the Web”
● Atlanta Solr Meetup
● Atlanta Big Data Meetup
● The Second International Symposium on Big Data and Data Analytics
● Lucene/Solr Revolution 2014
● RecSys 2014
● IEEE Big Data Conference 2014
Conclusion 
• Language analysis options for each language are very configurable
• There are multiple strategies for handling multilingual content based upon your use case
• When in doubt, automatic language detection can be easily leveraged in your indexing pipeline
• The next generation of query/relevancy improvements will be able to understand the intent of the user.
Contact Info 
Yes, WE ARE HIRING @ CareerBuilder. Come talk with me if you are interested…
Trey Grainger
trey.grainger@careerbuilder.com
@treygrainger
http://solrinaction.com
Conference discount (43% off): lusorevcftw
Other presentations: http://www.treygrainger.com
AI, Search, and the Disruption of Knowledge ManagementTrey Grainger
 
Measuring Relevance in the Negative Space
Measuring Relevance in the Negative SpaceMeasuring Relevance in the Negative Space
Measuring Relevance in the Negative SpaceTrey Grainger
 
Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)Trey Grainger
 
The Future of Search and AI
The Future of Search and AIThe Future of Search and AI
The Future of Search and AITrey Grainger
 
How to Build a Semantic Search System
How to Build a Semantic Search SystemHow to Build a Semantic Search System
How to Build a Semantic Search SystemTrey Grainger
 
The Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge GraphThe Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge GraphTrey Grainger
 
Searching for Meaning
Searching for MeaningSearching for Meaning
Searching for MeaningTrey Grainger
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesTrey Grainger
 
The Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge GraphThe Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge GraphTrey Grainger
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsTrey Grainger
 
South Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis PanelSouth Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis PanelTrey Grainger
 
Reflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemReflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemTrey Grainger
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systemsTrey Grainger
 
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Trey Grainger
 

Mais de Trey Grainger (20)

Balancing the Dimensions of User Intent
Balancing the Dimensions of User IntentBalancing the Dimensions of User Intent
Balancing the Dimensions of User Intent
 
Reflected Intelligence: Real world AI in Digital Transformation
Reflected Intelligence: Real world AI in Digital TransformationReflected Intelligence: Real world AI in Digital Transformation
Reflected Intelligence: Real world AI in Digital Transformation
 
Thought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered SearchThought Vectors and Knowledge Graphs in AI-powered Search
Thought Vectors and Knowledge Graphs in AI-powered Search
 
Natural Language Search with Knowledge Graphs (Chicago Meetup)
Natural Language Search with Knowledge Graphs (Chicago Meetup)Natural Language Search with Knowledge Graphs (Chicago Meetup)
Natural Language Search with Knowledge Graphs (Chicago Meetup)
 
The Next Generation of AI-powered Search
The Next Generation of AI-powered SearchThe Next Generation of AI-powered Search
The Next Generation of AI-powered Search
 
Natural Language Search with Knowledge Graphs (Activate 2019)
Natural Language Search with Knowledge Graphs (Activate 2019)Natural Language Search with Knowledge Graphs (Activate 2019)
Natural Language Search with Knowledge Graphs (Activate 2019)
 
AI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge ManagementAI, Search, and the Disruption of Knowledge Management
AI, Search, and the Disruption of Knowledge Management
 
Measuring Relevance in the Negative Space
Measuring Relevance in the Negative SpaceMeasuring Relevance in the Negative Space
Measuring Relevance in the Negative Space
 
Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)Natural Language Search with Knowledge Graphs (Haystack 2019)
Natural Language Search with Knowledge Graphs (Haystack 2019)
 
The Future of Search and AI
The Future of Search and AIThe Future of Search and AI
The Future of Search and AI
 
How to Build a Semantic Search System
How to Build a Semantic Search SystemHow to Build a Semantic Search System
How to Build a Semantic Search System
 
The Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge GraphThe Relevance of the Apache Solr Semantic Knowledge Graph
The Relevance of the Apache Solr Semantic Knowledge Graph
 
Searching for Meaning
Searching for MeaningSearching for Meaning
Searching for Meaning
 
The Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation EnginesThe Intent Algorithms of Search & Recommendation Engines
The Intent Algorithms of Search & Recommendation Engines
 
The Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge GraphThe Apache Solr Semantic Knowledge Graph
The Apache Solr Semantic Knowledge Graph
 
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval SystemsIntent Algorithms: The Data Science of Smart Information Retrieval Systems
Intent Algorithms: The Data Science of Smart Information Retrieval Systems
 
South Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis PanelSouth Big Data Hub: Text Data Analysis Panel
South Big Data Hub: Text Data Analysis Panel
 
Reflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data systemReflected Intelligence: Lucene/Solr as a self-learning data system
Reflected Intelligence: Lucene/Solr as a self-learning data system
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systems
 
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
Searching on Intent: Knowledge Graphs, Personalization, and Contextual Disamb...
 

Último

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Último (20)

🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

Semantic & Multilingual Strategies in Lucene/Solr

  • 1. Semantic & Multilingual Strategies in Lucene/Solr
      Trey Grainger
      Director of Engineering, Search & Analytics @ CareerBuilder
  • 2. Outline
      • Introduction
      • Text Analysis Refresher
      • Language-specific Text Analysis
      • Multilingual Search Strategies
      • Automatic Language Identification
      • Semantic Search Strategies (understanding “meaning”)
      • Conclusion
  • 3. About Me
      Trey Grainger
      Director of Engineering, Search & Analytics
      Joined CareerBuilder in 2007 as Software Engineer
      MBA, Management of Technology – GA Tech
      BA, Computer Science, Business, & Philosophy – Furman University
      Mining Massive Datasets (in progress) – Stanford University
      Fun outside of CB:
      • Co-author of Solr in Action, plus several research papers
      • Frequent conference speaker
      • Founder of Celiaccess.com, the gluten-free search engine
      • Lucene/Solr contributor
  • 6. Text Analysis Refresher
      A text field in Lucene/Solr has an Analyzer containing:
      ① Zero or more CharFilters
         Takes incoming text and “cleans it up” before it is tokenized
      ② One Tokenizer
         Splits incoming text into a Token Stream containing zero or more Tokens
      ③ Zero or more TokenFilters
         Examines and optionally modifies each Token in the Token Stream
      *From Solr in Action, Chapter 6
  • 11. Example English Analysis Chains
      <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.StopFilterFactory" words="lang/stopwords_en.txt" ignoreCase="true"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.EnglishPossessiveFilterFactory"/>
          <filter class="solr.KeywordMarkerFilterFactory" protected="lang/en_protwords.txt"/>
          <filter class="solr.PorterStemFilterFactory"/>
        </analyzer>
      </fieldType>

      <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <charFilter class="solr.HTMLStripCharFilterFactory"/>
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.SynonymFilterFactory" synonyms="lang/en_synonyms.txt" ignoreCase="true" expand="true"/>
          <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"
                  catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
          <filter class="solr.ASCIIFoldingFilterFactory"/>
          <filter class="solr.KStemFilterFactory"/>
          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        </analyzer>
      </fieldType>
  • 12. Per-language Analysis Chains *Some of the 32 different language configurations in Appendix B of Solr in Action
  • 13. Per-language Analysis Chains *Some of the 32 different language configurations in Appendix B of Solr in Action
  • 14. Which Stemmer do I choose? *From Solr in Action, Chapter 14
  • 15. Common English Stemmers *From Solr in Action, Chapter 14
  • 16. When Stemming goes awry
      Fixing Stemming Mistakes:
      • Unfortunately, every stemmer will have problem cases that aren’t handled as you would expect
      • Thankfully, stemmers can be overridden
      • KeywordMarkerFilter: protects a list of terms you specify from being stemmed
      • StemmerOverrideFilter: applies a list of custom term mappings you specify
      Alternate strategy:
      • Use Lemmatization (root-form analysis) instead of Stemming
      • Commercial vendors help tremendously in this space (see http://www.basistech.com/case-study-career-builder/)
      • The Hunspell stemmer enables dictionary-based support of varying quality in over 100 languages
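The two override mechanisms on this slide can be illustrated with a toy pipeline. This is a minimal Python sketch, not Lucene code: `naive_stem`, `PROTECTED`, and `OVERRIDES` are hypothetical names, the word lists are made-up sample data, and the suffix-stripping rule is a deliberately crude stand-in for a real stemmer.

```python
from typing import Dict, Set

# Hypothetical sample data, playing the role of a protected-words file
# and a stemmer-override dictionary.
PROTECTED: Set[str] = {"news"}
OVERRIDES: Dict[str, str] = {"mice": "mouse"}


def naive_stem(term: str) -> str:
    """Crude stand-in for a real stemmer: strip one trailing 's'."""
    if len(term) > 3 and term.endswith("s"):
        return term[:-1]
    return term


def stem_with_overrides(term: str) -> str:
    """Apply protection and overrides before falling back to the stemmer."""
    term = term.lower()
    if term in PROTECTED:
        # KeywordMarkerFilter-style behavior: the term is left unstemmed.
        return term
    if term in OVERRIDES:
        # StemmerOverrideFilter-style behavior: use the custom mapping.
        return OVERRIDES[term]
    return naive_stem(term)
```

Without the protected list, `naive_stem("news")` would produce "new", which is the kind of problem case the slide describes. In a real analysis chain the same effect comes from placing the protecting filter before the stemming filter.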
  • 17. Stemming vs. Lemmatization
      • Stemming: algorithmic manipulation of text, based upon common per-language rules
      • Lemmatization: finds the dictionary form of a term (lemma means “root”). It dramatically improves precision (only matching terms that “should” match), while not significantly impacting recall (all terms that should match do match).
      *From Solr in Action, Chapter 14
  • 19. Multilingual Search Strategies
      How do you handle:
      … a different language per document?
      … multiple languages in the same document?
      … multiple languages in the same field?
      Strategies:
      1) Separate field per language
      2) Separate collection/core per language
      3) All languages in one field
  • 20. Strategy 1: Separate field per language *From Solr in Action, Chapter 14
  • 21. Separate field per language
      schema.xml:
      <field name="id" type="string" indexed="true" stored="true" />
      <field name="title" type="string" indexed="true" stored="true" />
      <field name="content_english" type="text_english" indexed="true" stored="true" />
      <field name="content_french" type="text_french" indexed="true" stored="true" />
      <field name="content_spanish" type="text_spanish" indexed="true" stored="true" />

      <fieldType name="text_english" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.EnglishPossessiveFilterFactory"/>
          <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
          <filter class="solr.KStemFilterFactory"/>
        </analyzer>
      </fieldType>

      <fieldType name="text_spanish" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_es.txt" format="snowball"/>
          <filter class="solr.SpanishLightStemFilterFactory"/>
        </analyzer>
      </fieldType>

      <fieldType name="text_french" class="solr.TextField" positionIncrementGap="100">
        <analyzer>
          <tokenizer class="solr.StandardTokenizerFactory"/>
          <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball"/>
          <filter class="solr.FrenchLightStemFilterFactory"/>
        </analyzer>
      </fieldType>
      *From Solr in Action, Chapter 14
  • 22. Separate field per language: one language per document
      <doc>
        <field name="id">1</field>
        <field name="title">The Adventures of Huckleberry Finn</field>
        <field name="content_english">YOU don't know about me without you have read a book by the name of The Adventures of Tom Sawyer; but that ain't no matter. That book was made by Mr. Mark Twain, and he told the truth, mainly. There was things which he stretched, but mainly he told the truth.</field>
      </doc>
      <doc>
        <field name="id">2</field>
        <field name="title">Les Misérables</field>
        <field name="content_french">Nul n'aurait pu le dire; tout ce qu'on savait, c'est que, lorsqu'il revint d'Italie, il était prêtre.</field>
      </doc>
      <doc>
        <field name="id">3</field>
        <field name="title">Don Quixote</field>
        <field name="content_spanish">Demasiada cordura puede ser la peor de las locuras, ver la vida como es y no como debería de ser.</field>
      </doc>

      Query:
      http://localhost:8983/solr/field-per-language/select?
        fl=title&
        defType=edismax&
        qf=content_english content_french content_spanish&
        q="he told the truth" OR "il était prêtre" OR "ver la vida como es"

      Response:
      {
        "response":{"numFound":3,"start":0,"docs":[
          {"title":["The Adventures of Huckleberry Finn"]},
          {"title":["Don Quixote"]},
          {"title":["Les Misérables"]}]
        }
      }
      *From Solr in Action, Chapter 14
  • 23. Separate field per language: multiple languages per document
      Query 1:
      http://localhost:8983/solr/field-per-language/select?
        fl=title&
        defType=edismax&
        qf=content_english content_french content_spanish&
        q="wisdom"
      Query 2:
      http://localhost:8983/solr/field-per-language/select?... q="sabiduría"
      Query 3:
      http://localhost:8983/solr/field-per-language/select?... q="sagesse"

      Response (same for queries 1–3):
      {
        "response":{"numFound":1,"start":0,"docs":[
          {"title":["Proverbs"]}]
        }
      }

      Documents:
      <doc>
        <field name="id">4</field>
        <field name="title">Proverbs</field>
        <field name="content_spanish">No la abandones y ella velará sobre ti, ámala y ella te protegerá. Lo principal es la sabiduría; adquiere sabiduría, y con todo lo que obtengas adquiere inteligencia.</field>
        <field name="content_english">Do not forsake wisdom, and she will protect you; love her, and she will watch over you. Wisdom is supreme; therefore get wisdom. Though it cost all you have, get understanding.</field>
        <field name="content_french">N'abandonne pas la sagesse, et elle te gardera, aime-la, et elle te protégera. Voici le début de la sagesse: acquiers la sagesse, procure-toi le discernement au prix de tout ce que tu possèdes.</field>
      </doc>
      *From Solr in Action, Chapter 14
  • 24. Summary: Separate field per language *From Solr in Action, Chapter 14
  • 25. Strategy 2: Separate collection per language *From Solr in Action, Chapter 14
  • 26. Separate collection per language: schema.xml *From Solr in Action, Chapter 14
  • 27. Separate collection per language: Indexing & Querying
      Indexing:
      cd $SOLR_IN_ACTION/example-docs/
      java -jar -Durl=http://localhost:8983/solr/english/update post.jar ch14/documents/english.xml
      java -jar -Durl=http://localhost:8983/solr/spanish/update post.jar ch14/documents/spanish.xml
      java -jar -Durl=http://localhost:8983/solr/french/update post.jar ch14/documents/french.xml

      Query (collections in SolrCloud):
      http://localhost:8983/solr/aggregator/select?
        shards=english,spanish,french&
        df=content&
        q=query in any language here

      Query (specific cores):
      http://localhost:8983/solr/aggregator/select?
        shards=localhost:8983/solr/english,localhost:8983/solr/spanish,localhost:8983/solr/french&
        df=content&
        q=query in any language here

      Documents:
      All documents have just a single “content” field. Each document is routed to a language-specific Solr collection based upon the language of its content field.
      *From Solr in Action, Chapter 14
  • 28. Summary: Separate index per language *From Solr in Action, Chapter 14
  • 29. Strategy 3: One Field for all languages *From Solr in Action, Chapter 14
  • 30. One Field for all languages: Feature Status
      • Note: This feature is not yet committed to Solr
      • I’m working on it in my free time. Currently it supports:
          • An Update Request Processor which can automatically detect the languages of documents and choose the correct analyzers
          • A Field Type which allows dynamically choosing one or more analyzers on a per-field (indexing) and per-term (querying) basis
      • The current code from Solr in Action is freely available on GitHub.
      • There is a JIRA ticket open to ultimately contribute this back to Solr: SOLR-6492
      • Some work is still necessary to make querying more user friendly.
  • 31. One Field for all languages
      Step 1: Define Multilingual Field
      schema.xml:
      <fieldType name="multilingual_text" class="sia.ch14.MultiTextField" sortMissingLast="true"
                 defaultFieldType="text_general"
                 fieldMappings="en:text_english, es:text_spanish, fr:text_french, de:text_german"/> [1]
      <field name="text" type="multilingual_text" indexed="true" multiValued="true" />

      Step 2: Index documents
      <add><doc>…
        <field name="text">general keywords</field> [2]
        <field name="text">en,es|the school, las escuelas</field>…
      </doc></add>
      <add><doc>…
        <field name="text">en|the school</field>
        <field name="text">es|las escuelas</field>…
      </doc></add>

      Step 3: Search
      http://localhost:8983/solr/collection1/select?
        q=es|escuela OR en,es,de|school OR school [2]

      [1] Note that "text_english", "text_spanish", "text_french", and "text_german" refer to field types defined elsewhere in the schema.xml
      [2] Uses the "defaultFieldType", in this case "text_general", defined elsewhere in schema.xml
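The `en,es|…` prefix convention shown in steps 2 and 3 can be sketched as a small parser. This is an illustrative Python sketch under stated assumptions, not the `MultiTextField` implementation: the function name `split_language_prefix` is hypothetical, and the handling of unknown language codes (falling back to the default field type) is an assumption of this sketch.

```python
from typing import Dict, List, Optional, Tuple

# Mirrors the fieldMappings attribute from the schema.xml on this slide.
FIELD_MAPPINGS: Dict[str, str] = {
    "en": "text_english",
    "es": "text_spanish",
    "fr": "text_french",
    "de": "text_german",
}
DEFAULT_FIELD_TYPE = "text_general"


def split_language_prefix(
    value: str,
    mappings: Optional[Dict[str, str]] = None,
    default_type: str = DEFAULT_FIELD_TYPE,
) -> Tuple[List[str], str]:
    """Return (field types to analyze with, text without the prefix)."""
    mappings = FIELD_MAPPINGS if mappings is None else mappings
    if "|" not in value:
        # No prefix at all: footnote [2], use the default field type.
        return [default_type], value
    prefix, text = value.split("|", 1)
    codes = [code.strip() for code in prefix.split(",")]
    field_types = [mappings[code] for code in codes if code in mappings]
    # Assumption: unknown codes are ignored; if none match, use the default.
    return (field_types or [default_type]), text
```

For example, `split_language_prefix("en,es|the school, las escuelas")` selects both the English and the Spanish field types for the same text, while an unprefixed value such as "general keywords" is analyzed with `text_general`.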
  • 32. One Field For All Languages: Stacked Token Streams
      1) English Field
      2) Spanish Field
      3) English + Spanish combined in Multilingual Text Field (multilingual_text)
      ① For each language requested, the appropriate field type is chosen
      ② The input text is passed separately to the Analyzer chain for each field type
      ③ The resulting Token Streams from each Analyzer chain are stacked into a unified Token Stream based upon their position increments
      *Screenshot from Solr in Action, Chapter 14
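The three steps on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Lucene code: `toy_english` and `toy_spanish` are hypothetical analyzers with made-up stop lists and a crude plural-stripping rule, and dropping identical tokens at the same position is a simplification chosen for this sketch.

```python
from typing import Callable, List, Set, Tuple

Token = Tuple[str, int]  # (term, position), positions start at 1


def _toy_analyze(text: str, stop_words: Set[str]) -> List[Token]:
    """Lowercase, split on whitespace, drop stop words, strip a trailing 's'."""
    tokens: List[Token] = []
    for position, word in enumerate(text.lower().split(), start=1):
        if word in stop_words:
            continue  # a removed stop word leaves a gap in the positions
        term = word[:-1] if word.endswith("s") else word
        tokens.append((term, position))
    return tokens


def toy_english(text: str) -> List[Token]:
    return _toy_analyze(text, {"the"})


def toy_spanish(text: str) -> List[Token]:
    return _toy_analyze(text, {"las", "los"})


def stack_token_streams(
    text: str, analyzers: List[Callable[[str], List[Token]]]
) -> List[Token]:
    """Run the same text through every analyzer and merge by position."""
    seen: Set[Token] = set()
    stacked: List[Token] = []
    for analyze in analyzers:
        for token in analyze(text):
            if token not in seen:  # simplification: drop exact duplicates
                seen.add(token)
                stacked.append(token)
    # Stable sort: tokens sharing a position keep analyzer order.
    return sorted(stacked, key=lambda token: token[1])
```

Calling `stack_token_streams("the schools", [toy_english, toy_spanish])` keeps "the" at position 1 (only the English chain treats it as a stop word) and a single "school" at position 2, which is the stacking effect the slide illustrates.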
  • 33. Strategy 3: All languages in one field* *See Solr in Action, Chapter 14
  • 35. Identifying languages in documents solrconfig.xml ... <updateRequestProcessorChain name="langid"> <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory"> <lst name="invariants"> <str name="langid.fl">content,content_lang1,content_lang2,content_lang3</str> <str name="langid.langField">language</str> <str name="langid.langsField">languages</str> ... </lst> </processor> ... </updateRequestProcessorChain> ... <requestHandler name="/update" class="solr.UpdateRequestHandler"> <lst name="invariants"> <str name="update.chain">langid</str> </lst> </requestHandler> ... schema.xml ... <field name="language" type="string" indexed="true" stored="true" /> <field name="languages" type="string" indexed="true" stored="true" multiValued="true"/> ... *See Solr in Action, Chapter 14
  • 36. Identifying languages in documents Sending documents: cd $SOLR_IN_ACTION/example-docs/ java -Durl=http://localhost:8983/solr/langid/update -jar post.jar ch14/documents/langid.xml Query: http://localhost:8983/solr/langid/select?q=*:*&fl=title,language,languages Results: [{ "title":"The Adventures of Huckleberry Finn", "language":"en", "languages":["en"]}, { "title":"Les Misérables", "language":"fr", "languages":["fr"]}, { "title":"Don Quixote", "language":"es", "languages":["es"]}, { "title":"Proverbs", "language":"fr", "languages":["fr", "en", "es"]}] *See Solr in Action, Chapter 14
  • 37. Mapping data to language-specific fields solrconfig.xml ... <updateRequestProcessorChain name="langid"> <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory"> <lst name="invariants"> <str name="langid.fl">content</str> <str name="langid.langField">language</str> <str name="langid.map">true</str> <str name="langid.map.fl">content</str> <str name="langid.whitelist">en,es,fr</str> <str name="langid.map.lcmap">en:english es:spanish fr:french</str> <str name="langid.fallback">en</str> </lst> </processor> ... </updateRequestProcessorChain> ... Indexed Documents: [{ "title":"The Adventures of Huckleberry Finn", "language":"en", "content_english":[ "YOU don't know about me without..."]}, { "title":"Les Misérables", "language":"fr", "content_french":[ "Nul n'aurait pu le dire; tout ce..."]}, { "title":"Don Quixote", "language":"es", "content_spanish":[ "Demasiada cordura puede ser la peor..."]}] *See Solr in Action, Chapter 14
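The effect of the `langid.map`, `langid.whitelist`, `langid.map.lcmap`, and `langid.fallback` settings on the target field name can be approximated with a short Python sketch. This is a simplified model of the behaviour shown in the indexed documents above, not Solr's actual implementation; `parse_lcmap` and `map_field_name` are hypothetical helper names.

```python
def parse_lcmap(lcmap):
    """Parse 'en:english es:spanish' into {'en': 'english', 'es': 'spanish'}."""
    return dict(pair.split(":", 1) for pair in lcmap.split())


def map_field_name(field, detected_lang, whitelist, lcmap, fallback):
    """Return (mapped field name, language recorded for the document).

    Simplified assumption: a detected language outside the whitelist is
    replaced by the fallback language before the field name is built.
    """
    language = detected_lang if detected_lang in whitelist else fallback
    suffix = lcmap.get(language, language)
    return f"{field}_{suffix}", language


if __name__ == "__main__":
    lcmap = parse_lcmap("en:english es:spanish fr:french")
    whitelist = {"en", "es", "fr"}
    for detected in ("fr", "es", "de"):
        print(detected, map_field_name("content", detected, whitelist, lcmap, "en"))
```

With these settings, each `content_<language>` field can be given its own language-specific field type in schema.xml.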
  • 39. The need for Semantic Search User’s Query: machine learning research and development Portland, OR software engineer AND hadoop java Traditional Query Parsing: (machine AND learning AND research AND development AND portland) OR (software AND engineer AND hadoop AND java) Semantic Query Parsing: "machine learning" AND "research and development" AND "Portland, OR" AND "software engineer" AND hadoop AND java Semantically Expanded Query: ("machine learning"^10 OR "data scientist" OR "data mining" OR "computer vision") AND ("research and development"^10 OR "r&d") AND ("Portland, OR"^10 OR "Portland, Oregon" OR {!geofilt pt=45.512,-122.676 d=50 sfield=geo}) AND ("software engineer"^10 OR "software developer") AND (hadoop^10 OR "big data" OR hbase OR hive) AND (java^10 OR j2ee)
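The step from the semantically parsed query to the semantically expanded query can be sketched in Python as follows. The `related` dictionary is hand-written here to mirror the slide; in practice it would come from a mined model. The function name `expand_query` and the fixed boost of 10 are illustrative assumptions, and the geo filter clause is left out.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are treated as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(entities, related, boost=10):
    """Build an AND of OR-groups: the boosted original term plus related terms."""
    clauses = []
    for entity in entities:
        parts = [f"{quote(entity)}^{boost}"]
        parts += [quote(alternative) for alternative in related.get(entity, [])]
        clauses.append("(" + " OR ".join(parts) + ")")
    return " AND ".join(clauses)


if __name__ == "__main__":
    # Hand-written example data mirroring the slide (not mined from logs).
    related = {
        "machine learning": ["data scientist", "data mining", "computer vision"],
        "software engineer": ["software developer"],
        "hadoop": ["big data", "hbase", "hive"],
        "java": ["j2ee"],
    }
    print(expand_query(["machine learning", "software engineer", "hadoop", "java"], related))
```

Boosting the original term above its expansions keeps exact matches ranked ahead of documents that only match a related term.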
  • 40. Semantic Search Architecture – Query Parsing 1) Generate a model of domain-specific phrases •Can mine query logs or the actual text of documents for significant phrases within your domain [1] 2) Feed known phrases to SolrTextTagger [2] (uses a Lucene FST for high-throughput term lookups) 3) Use SolrTextTagger to perform entity extraction on incoming queries (tagging documents is also an option) 4) Shown on next slide: Pass extracted entities to a Query Augmentation phase to rewrite the query with enhanced semantic understanding (synonyms, related keywords, related categories, etc.) [1] K. Aljadda, M. Korayem, T. Grainger, C. Russell. "Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific Jargon," in IEEE Big Data 2014. [2] https://github.com/OpenSextant/SolrTextTagger
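Steps 2 and 3 (looking up known phrases in an incoming query) can be illustrated with a greedy longest-match tagger in Python. This is a deliberately simple stand-in: SolrTextTagger uses a Lucene FST, whereas this sketch uses a plain set lookup, and `tag_phrases` with its `max_words` limit is a hypothetical helper.

```python
def tag_phrases(query, known_phrases, max_words=4):
    """Greedily split a query into the longest known phrases, left to right.

    Words that do not start any known phrase are emitted as single terms.
    """
    words = query.lower().split()
    known = {phrase.lower() for phrase in known_phrases}
    tagged = []
    i = 0
    while i < len(words):
        # Try the longest candidate first, shrinking down to a single word.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in known:
                tagged.append(candidate)
                i += n
                break
    return tagged


if __name__ == "__main__":
    phrases = ["machine learning", "research and development", "software engineer"]
    query = "machine learning research and development software engineer hadoop java"
    print(tag_phrases(query, phrases))
```

The resulting list of phrases and single terms is the kind of input the query augmentation phase on the next slide would work from.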
  • 41. Semantic Search Architecture – Query Augmentation Known keyword phrases: java developer, machine learning, registered nurse Keywords: machine learning Inputs: Search Behavior, Application Behavior, etc.; Job Title Classifier, Skills Extractor, Job Level Classifier, etc.; Clustering relationships Related Phrases: machine learning: { data mining .9, matlab .8, data scientist .75, artificial intelligence .7, neural networks .55 } Common Job Titles: machine learning: { software engineer .65, data manager .3, data scientist .25, hadoop engineer .2 } Related Occupations: machine learning: { 15-1031.00 .58 Computer Software Engineers, Applications; 15-1011.00 .55 Computer and Information Scientists, Research; 15-1032.00 .52 Computer Software Engineers, Systems Software } Semantic Query Augmentation Modified Query: keywords:((machine learning)^10 OR { AT_LEAST_2: ("data mining"^0.9, matlab^0.8, "data scientist"^0.75, "artificial intelligence"^0.7, "neural networks"^0.55))} { BOOST_TO_TOP:(job_title:( "software engineer" OR "data manager" OR "data scientist" OR "hadoop engineer"))}
  • 42. Differentiating related terms Synonyms: cpa => certified public accountant; rn => registered nurse; r.n. => registered nurse Ambiguous Terms*: driver => driver (trucking) ~80%; driver => driver (software) ~20% Related Terms: r.n. => nursing, bsn; hadoop => mapreduce, hive, pig *differentiated based upon user and query context
  • 44. 2014 Publications & Presentations Books: Solr in Action - A comprehensive guide to implementing scalable search using Apache Solr Research papers: ●Towards a Job Title Classification System ●Augmenting Recommendation Systems Using a Model of Semantically-related Terms Extracted from User Behavior ●sCooL: A system for academic institution name normalization ●Crowdsourced Query Augmentation through Semantic Discovery of Domain-specific Jargon ●PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems ●SKILL: A System for Skill Identification and Normalization Speaking Engagements: ●WSDM 2014 Workshop: “Web-Scale Classification: Classifying Big Data from the Web” ●Atlanta Solr Meetup ●Atlanta Big Data Meetup ●The Second International Symposium on Big Data and Data Analytics ●Lucene/Solr Revolution 2014 ●RecSys 2014 ●IEEE Big Data Conference 2014
  • 45. Conclusion •Language analysis options for each language are very configurable •There are multiple strategies for handling multilingual content based upon your use case •When in doubt, automatic language detection can be easily leveraged in your indexing pipeline •The next generation of query/relevancy improvements will be able to understand the intent of the user.
  • 46. Contact Info Yes, WE ARE HIRING @CareerBuilder. Come talk with me if you are interested… Trey Grainger trey.grainger@careerbuilder.com @treygrainger http://solrinaction.com Conference discount (43% off): lusorevcftw Other presentations: http://www.treygrainger.com