SlideShare uma empresa Scribd logo
1 de 51
Baixar para ler offline
Where Search Meets Machine Learning
Diana Hu @sdianahu — Data Science Lead, Verizon
Joaquin Delgado @joaquind — Director of Engineering, Verizon
Disclaimer
2
The content of this presentation are of the authors’
personal statements and does not officially represent their
employer’s view in anyway. Included content is especially
not intended to convey the views of OnCue or Verizon
01
Index
1.  Introduction
2.  Search and Information Retrieval
3.  ML problems as Search-based Systems
4.  ML Meets Search!
Introduction
Scaling learning systems is hard!
•  Millions of users, items
•  Billions of features
•  Imbalanced Datasets
•  Complex Distributed Systems
•  Many algorithms have not been tested at “Internet Scale”
Typical approaches
•  Distributed systems – Fault tolerance, Throughput vs.
latency
•  Parallelization Strategies – Hashing, trees
•  Processing – Map reduce variants, MPI, graph parallel
•  Databases – Key/Value Stores, NoSQL
Such a custom system requires TLC
Search and
Information Retrieval
Search
Search is about finding specific things that are either known
or assumed to exist, Discovery is about is about helping the
user encounter what he/she didn’t even know exists.
•  Focused on Search: Search Engines, Database Systems
•  Focused on Discovery: Recommender Systems, Advertising
Predicate Logic and Declarative Languages Rock!
Search stack
Matched Hits
Representation
Function
Similarity
Calculation
Matched HitsDocuments
Representation
Function
Input Query
Matched HitsMatched HitsRetrieved Documents
Online
Processing
Offline
Processing
(*)RelevanceFeedback
Query Representation
Doc Representation Index
*Metadata Engineering
(*) Optional
Relevance: Vector Space Model
Search Engines: the big hammer
•  Search engines are largely used to solve non-IR
search problems, because:
•  Widely Available
•  Fast and Scalable
•  Integrates well with existing data stores
But… Are we using the right tool?
•  Search Engines were originally designed for IR.
•  Complex non-IR search tasks sometimes require a two
phase approach
Phase1) Filter Phase 2) Rank
Finding commonalities
Relevance
aka Ranking
RecSys
Discovery
IRSearch
Advertising
ML problems as
Search-based Systems
Machine Learning
Machine Learning in particular supervised learning refer to
techniques used to learn how to classify or score previously
unseen objects based on a training dataset
Inference and Generalization are the Key!
Supervised learning pipeline
Learning systems’ stack
Visualization / UI
Retrieval
Ranking
Query Generation and
Contextual Pre-filtering
Model Building
Index Building
Data/Events Collections
Data Analytics
Contextual Post Filtering
OnlineOffline
Experimentation
Case study: Recommender Systems
•  Reduce information load by estimating relevance
•  Ranking (aka Relevance) Approaches:
•  Collaborative filtering
•  Content Based
•  Knowledge Based
•  Hybrid
•  Beyond rating prediction and ranking
•  Business filtering logic
•  Low latency and Scale
RecSys: Content based models
•  Rec Task: Given a user profile find the best matching items by their
attributes
•  Similarity calculation: based on keyword overlap between user/items
•  Neighborhood method (i.e. nearest neighbor)
•  Query-based retrieval (i.e. Rocchio’s method)
•  Probabilistic methods (classical text classification)
•  Explicit decision models
•  Feature representation: based on content analysis
•  Vector space model
•  TF-IDF
•  Topic Modeling
RecSys: Collaborative Filtering
Matrix
Factorization
Rating
Dataset
User
Factors
Item
Factors
Re-Ranking
Model
Input Query
Online
Processing
Offline
Processing
Recommendations
RecSys: Collaborative Filtering
Matrix
Factorization
Rating
Dataset
User
Factors
Item
Factors
Re-Ranking
Model
Input Query
Online
Processing
Offline
Processing
Recommendations
ML Meets Search! ML Search
Remember the elephant?
Visualization / UI
Retrieval
Ranking
Query Generation and
Contextual Pre-filtering
Model Building
Index Building
Data/Events Collections
Data Analytics
Contextual Post Filtering
OnlineOffline
Experimentation
Simplifying the stack!
Visualization / UI
Query Generation and
Contextual Pre-filtering
Model Building
Index Building
Data/Events Collections
Data Analytics
OnlineOffline
Experimentation
Retrieval
Contextual Post Filtering
Ranking
Search stack
Matched Hits
Representation
Function
Similarity
Calculation
Matched HitsDocuments
Representation
Function
Input Query
Matched HitsMatched HitsRetrieved Documents
Online
Processing
Offline
Processing
(*)RelevanceFeedback
Query Representation
Doc Representation Index
*Metadata Engineering
(*) Optional
Simplifying the Search stack
Matched Hits
Representation
Function
Similarity
Calculation
Matched HitsDocuments
Representation
Function
Input Query
Matched HitsMatched HitsRetrieved Documents
Online
Processing
Offline
Processing
(*)RelevanceFeedback
Query Representation
Doc Representation Index
*Metadata Engineering
(*) Optional
Retrieval
Contextual Post Filtering
Ranking
ML-Scoring Plugin
Serialized
ML Model
ML-Scoring architecture
Lucene/Solr
Instances +
Labels
Instances
Index
ML
Scoring
Plugin
Serialized
ML Model
Online
Processing
Offline
Processing
Trainer
+
Indexer
ML-Scoring Options
•  Option A: Solr FunctionQuery
•  Pro: Model is just a query!
•  Cons: Limits expressiveness of models
•  Option B: Solr Custom Function Query
•  Pro: Loading any type of model (also PMML)
•  Cons: Memory limitations, also multiple model reloading
•  Option C: Lucene CustomScoreQuery
•  Pro: Can use PMML and tune how PMML gets loaded
•  Cons: No control on matches
•  Option D: Lucene Low level Custom Query
•  *Mahout vectors from Lucene text (only trains, so not an option)
Real-life Problem
•  Census database that contains documents with the following
fields:
1. Age: continuous; 2. Workclass: 8 values; 3. Fnlwgt: continuous.; 4.
Education: 16 values; 5. Education-num: continuous.; 6. Marital-status: 7
values; 7. Occupation: 14 values; 8. Relationship: 6 values; 9. Race: 5
values; 10. Sex: Male, Female; 11. Capital-gain: continuous.;12. Capital-
loss: continuous.; 13. Hours-per-week: continuous.; 14. Native-country:
41 values; 15. >50K Income: Yes, No.
•  Task is to predict whether a person makes more than 50k a
year based on their attributes
1) Learn from the (training) data
Naïve
Bayes
SVM
Logistic
Regression
Decision
Trees
Train with your favorite
ML Framework
Option A: Just a Solr Function Query
q=“sum(C,	
  
	
   	
  	
  	
  	
  	
  product(age,w1),	
  
	
   	
  	
  	
  	
  	
  product(Workclass,w2),	
  
	
   	
  	
  	
  	
  	
  product(Fnlwgt,	
  w3),	
  
	
   	
  	
  	
  	
  	
  product(Education,	
  w4),	
  
	
   	
  	
  	
  	
  	
  ….)”	
  
Serialized ML Model
as Query
Trainer
+
Indexer
Y_prediction = C + XB
May result in a crazy Solr functionQuery
See more at https://wiki.apache.org/solr/FunctionQuery
q=dismax&bf="ord(educaton-num)^0.5 recip(rord(age),1,1000,1000)^0.3"
What about models like this?
Option B: Custom Solr FuntionQuery
1.  Subclass org.apache.solr.search.ValueSourceParser.
public	
  class	
  MyValueSourceParser	
  extends	
  ValueSourceParser	
  {	
  
	
  public	
  void	
  init	
  (NamedList	
  namedList)	
  {	
  
	
  	
  	
  	
  	
  …	
  
	
  	
  }	
  
	
  	
  public	
  ValueSource	
  parse(FunctionQParser	
  fqp)	
  throws	
  ParseException	
  {	
  
	
  	
  	
  	
  	
  return	
  new	
  MyValueSource();	
  
	
  	
  }	
  
}
2.  In solrconfig.xml, register your new ValueSourceParser directly under the <config> tag
<valueSourceParser	
  name=“myfunc”	
  class=“com.custom.MyValueSourceParser”	
  />	
  
3.  Subclass org.apache.solr.search.ValueSource and instantiate it in
ValueSourceParser.parse()
Option C: Lucene CustomScoreQuery
2C) Serialize model with PMML
•  Can use JPMML library to read serialized model in Lucene
•  On Lucene will need to implement an extension with
JPMML-evaluator to take vectors as expected
3C) In Lucene:
•  Override CustomScoreQuery: load PMML
•  Create CustomScoreProvider: do model PMML data marshaling
•  Rescoring: PMML evaluation
Predictive Model Markup Language
•  Why use PMML
•  Allows users to build a model in one system
•  Export model and deploy it in a different environment for prediction
•  Fast iteration: from research to deployment to production
•  Model is a XML document with:
•  Header: description of model, and where it was generated
•  DataDictionary: defines fields used by model
•  Model: structure and parameters of model
•  http://dmg.org/pmml/v4-2-1/GeneralStructure.html
Example: Train in Spark to PMML
import	
  org.apache.spark.mllib.clustering.KMeans	
  	
  
import	
  org.apache.spark.mllib.linalg.Vectors	
  	
  	
  
	
  
//	
  Load	
  and	
  parse	
  the	
  data	
  	
  
val	
  data	
  =	
  sc.textFile("/path/to/file")	
  	
  	
  	
  
	
   	
  .map(s	
  =>	
  Vectors.dense(s.split(',').map(_.toDouble)))	
  	
  	
  
	
  	
  
//	
  Cluster	
  the	
  data	
  into	
  three	
  classes	
  using	
  KMeans	
  	
  
val	
  numIterations	
  =	
  20	
  	
  
val	
  numClusters	
  =	
  3	
  	
  
val	
  kmeansModel	
  =	
  KMeans.train(data,	
  numClusters,	
  numIterations)	
  	
  	
  
	
  	
  
//	
  Export	
  clustering	
  model	
  to	
  PMML	
  	
  
kmeansModel.toPMML("/path/to/kmeans.xml")	
  
PMML XML File
Overriding scores with
CustomScoreQuery
CustomScoreProvider CustomScoreQuery
Lucene Query
Find next
Match
Score
Rescore Doc
New Score
*Credit to Doug Turnbull’s
Hacking Lucene forCustom Search Results
Overriding scores with
CustomScoreQuery
•  Matching remains
•  Scoring overridden
CustomScoreProvider CustomScoreQuery
Lucene Query
Find next
Match
Score
Rescore Doc
New Score
*Credit to Doug Turnbull’s
Hacking Lucene forCustom Search Results
Implementing CustomScoreQuery
1.  Given normal Lucene Query, use a CustomScoreQuery to wrap it
TermQuery	
  q	
  =	
  New	
  TermQuery(term)	
  
MyCustomScoreQuery	
  mcsq	
  =	
  New	
  MyCustomScoreQuery(q)	
  
//Make	
  sure	
  query	
  has	
  all	
  fields	
  needed	
  by	
  PMML!
Implementing CustomScoreQuery
2.  Initialize PMML
PMML	
  pmml	
  =	
  ...;	
  
ModelEvaluatorFactory	
  modelEvaluatorFactory	
  =	
  	
  
	
   	
   	
   	
   	
  ModelEvaluatorFactory.newInstance();	
  
ModelEvaluator<?>	
  modelEvaluator	
  =	
  	
  
	
   	
   	
   	
   	
  modelEvaluatorFactory.newModelManager(pmml);	
  
Evaluator	
  evaluator	
  =	
  (Evaluator)modelEvaluator;	
  
	
   	
  	
  
Implementing CustomScoreQuery
2.  Rescore each doc with IndexReader and docID
public	
  float	
  customScore(int	
  doc,	
  float	
  subQueryScore,	
  float	
  
valSrcScores[])	
  throws	
  IOException	
  {	
  
//Lucene	
  reader	
  
IndexReader	
  r	
  =	
  context.reader();	
  
Terms	
  tv	
  =	
  r.getTermVector(doc,	
  _field);	
  
TermsEnum	
  tenum	
  =	
  null;	
  
tenum	
  =	
  tv.iterator(tenum);	
  	
  	
  	
  	
  
//convert	
  the	
  iterator	
  order	
  to	
  fields	
  needed	
  by	
  model	
  
TermsEnum	
  tenumPMML	
  =	
  tenum2PMML(tenum,	
  
	
   	
   	
   	
   	
   	
   	
   	
   	
   	
   	
  evaluator.getActiveFields());	
  
	
   	
  	
  
Implementing CustomScoreQuery
2.  Rescore each doc with IndexReader and docID
//Marshall	
  Data	
  into	
  PMML	
  
Map<FieldName,	
  FieldValue>	
  arguments	
  =	
  	
  
	
   	
   	
   	
   	
   	
   	
   	
  new	
  LinkedHashMap<FieldName,	
  FieldValue>();	
  
List<FieldName>	
  activeFields	
  =	
  evaluator.getActiveFields();	
  
for(FieldName	
  activeField	
  :	
  activeFields){	
  
	
   	
  //	
  The	
  raw	
  is	
  value	
  has	
  been	
  sorted	
  with	
  number	
  of	
  fields	
  needed	
  
	
   	
  Object	
  rawValue	
  =	
  tenumPMML.next;	
  
	
   	
  FieldValue	
  activeValue	
  =	
  evaluator.prepare(activeField,	
  rawValue);	
  
	
   	
  arguments.put(activeField,	
  activeValue);	
  
}	
  
	
  
	
  	
  
Implementing CustomScoreQuery
2.  Rescore each doc with IndexReader and docID
//Rescore	
  and	
  evaluate	
  with	
  PMML	
  
Map<FieldName,	
  ?>	
  results	
  =	
  evaluator.evaluate(arguments);	
  
FieldName	
  targetName	
  =	
  evaluator.getTargetField();	
  
Object	
  targetValue	
  =	
  results.get(targetName);	
  
return	
  (float)	
  targetValue;	
  
	
  	
  
Potential issues
•  Performance
•  If search space is very large
•  If model complexity explodes (i.e. kernel expansion)
•  Operations
•  Code is running on key infrastructure
•  Versioning
•  Binary Compatibility
Option D: Low Level Lucene
•  CustomScoreQuery or Custom FunctionScore can’t control
matches
•  If you want custom matches and scoring….
•  Implement:
•  Custom Query Class
•  Custom Weight Class
•  Custom Scorer Class
•  http://opensourceconnections.com/blog/2014/03/12/using-
customscorequery-for-custom-solrlucene-scoring/
Conclusion
•  Importance of the full picture – Learning systems from the
lenses of the whole elephant
•  Reducing the time from science to production is
complicated
•  Scalability is hard!
•  Why not have ML use Search in its core during online eval?
•  Solr and Lucene are a start to customize your learning system
We are Hiring!
Contact me at
diana.hu@verizon.com
@sdianahu
Q&A
O C T O B E R 1 3 - 1 6 , 2 0 1 6 • A U S T I N , T X
Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado, Verizon

Mais conteúdo relacionado

Mais procurados

Webinar: Solr 6 Deep Dive - SQL and Graph
Webinar: Solr 6 Deep Dive - SQL and GraphWebinar: Solr 6 Deep Dive - SQL and Graph
Webinar: Solr 6 Deep Dive - SQL and GraphLucidworks
 
Seattle Scalability Mahout
Seattle Scalability MahoutSeattle Scalability Mahout
Seattle Scalability MahoutJake Mannix
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformTrey Grainger
 
Lucene And Solr Document Classification
Lucene And Solr Document ClassificationLucene And Solr Document Classification
Lucene And Solr Document ClassificationAlessandro Benedetti
 
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...Lucidworks
 
Solr 6.0 Graph Query Overview
Solr 6.0 Graph Query OverviewSolr 6.0 Graph Query Overview
Solr 6.0 Graph Query OverviewKevin Watters
 
Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...
Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...
Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...Ramzi Alqrainy
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Lucidworks
 
A Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, LucidworksA Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, LucidworksLucidworks
 
Practical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkPractical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkJake Mannix
 
Natural Language Search in Solr
Natural Language Search in SolrNatural Language Search in Solr
Natural Language Search in SolrTommaso Teofili
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowDatabricks
 
Latent Semantic Analysis of Wikipedia with Spark
Latent Semantic Analysis of Wikipedia with SparkLatent Semantic Analysis of Wikipedia with Spark
Latent Semantic Analysis of Wikipedia with SparkSandy Ryza
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Lucidworks
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseLucidworks
 
Exploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, LucidworksExploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, LucidworksLucidworks
 
MongoDB & Machine Learning
MongoDB & Machine LearningMongoDB & Machine Learning
MongoDB & Machine LearningTom Maiaroto
 

Mais procurados (19)

Webinar: Solr 6 Deep Dive - SQL and Graph
Webinar: Solr 6 Deep Dive - SQL and GraphWebinar: Solr 6 Deep Dive - SQL and Graph
Webinar: Solr 6 Deep Dive - SQL and Graph
 
Seattle Scalability Mahout
Seattle Scalability MahoutSeattle Scalability Mahout
Seattle Scalability Mahout
 
Deploying Machine Learning Models to Production
Deploying Machine Learning Models to ProductionDeploying Machine Learning Models to Production
Deploying Machine Learning Models to Production
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
 
Lucene And Solr Document Classification
Lucene And Solr Document ClassificationLucene And Solr Document Classification
Lucene And Solr Document Classification
 
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...
An Introduction to NLP4L - Natural Language Processing Tool for Apache Lucene...
 
Solr 6.0 Graph Query Overview
Solr 6.0 Graph Query OverviewSolr 6.0 Graph Query Overview
Solr 6.0 Graph Query Overview
 
Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...
Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...
Apache Solr 4 Part 1 - Introduction, Features, Recency Ranking and Popularity...
 
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine: Presented by T...
 
A Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, LucidworksA Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
A Multifaceted Look At Faceting - Ted Sullivan, Lucidworks
 
Practical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and SparkPractical Machine Learning for Smarter Search with Solr and Spark
Practical Machine Learning for Smarter Search with Solr and Spark
 
Natural Language Search in Solr
Natural Language Search in SolrNatural Language Search in Solr
Natural Language Search in Solr
 
Best Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflowBest Practices for Hyperparameter Tuning with MLflow
Best Practices for Hyperparameter Tuning with MLflow
 
Latent Semantic Analysis of Wikipedia with Spark
Latent Semantic Analysis of Wikipedia with SparkLatent Semantic Analysis of Wikipedia with Spark
Latent Semantic Analysis of Wikipedia with Spark
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
 
Exploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, LucidworksExploring Direct Concept Search - Steve Rowe, Lucidworks
Exploring Direct Concept Search - Steve Rowe, Lucidworks
 
MongoDB & Machine Learning
MongoDB & Machine LearningMongoDB & Machine Learning
MongoDB & Machine Learning
 

Destaque

Building a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation EngineBuilding a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation Enginelucenerevolution
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and SparkLucidworks
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineNYC Predictive Analytics
 
Search is the UI
Search is the UI Search is the UI
Search is the UI danielbeach
 
Lucene solr 4 spatial extended deep dive
Lucene solr 4 spatial   extended deep diveLucene solr 4 spatial   extended deep dive
Lucene solr 4 spatial extended deep divelucenerevolution
 
Twitter Search Architecture
Twitter Search Architecture Twitter Search Architecture
Twitter Search Architecture Ramez Al-Fayez
 
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...Lucidworks
 
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...Lucidworks
 
Pachyderm: Building a Big Data Beast On Kubernetes
Pachyderm: Building a Big Data Beast On KubernetesPachyderm: Building a Big Data Beast On Kubernetes
Pachyderm: Building a Big Data Beast On KubernetesKubeAcademy
 
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...Lucidworks
 
Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...
Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...
Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...Lucidworks
 
Numeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and SolrNumeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and SolrVadim Kirilchuk
 
Understanding the Solr Security Framekwork: Presented by Anshum Gupta, IBM
Understanding the Solr Security Framekwork: Presented by Anshum Gupta, IBMUnderstanding the Solr Security Framekwork: Presented by Anshum Gupta, IBM
Understanding the Solr Security Framekwork: Presented by Anshum Gupta, IBMLucidworks
 
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Lucidworks
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Lucidworks
 
Build a Great Application in Minutes!: Presented by Stefan Olafsson, Twigkit
Build a Great Application in Minutes!: Presented by Stefan Olafsson, TwigkitBuild a Great Application in Minutes!: Presented by Stefan Olafsson, Twigkit
Build a Great Application in Minutes!: Presented by Stefan Olafsson, TwigkitLucidworks
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...Joaquin Delgado PhD.
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisHelena Edelson
 
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, LucidworksState of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, LucidworksLucidworks
 
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Lucidworks
 

Destaque (20)

Building a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation EngineBuilding a Real-time Solr-powered Recommendation Engine
Building a Real-time Solr-powered Recommendation Engine
 
Data Engineering with Solr and Spark
Data Engineering with Solr and SparkData Engineering with Solr and Spark
Data Engineering with Solr and Spark
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engine
 
Search is the UI
Search is the UI Search is the UI
Search is the UI
 
Lucene solr 4 spatial extended deep dive
Lucene solr 4 spatial   extended deep diveLucene solr 4 spatial   extended deep dive
Lucene solr 4 spatial extended deep dive
 
Twitter Search Architecture
Twitter Search Architecture Twitter Search Architecture
Twitter Search Architecture
 
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
Queue Based Solr Indexing with Collection Management: Presented by Devansh Dh...
 
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
Events Processing and Data Analysis with Lucidworks Fusion: Presented by Kira...
 
Pachyderm: Building a Big Data Beast On Kubernetes
Pachyderm: Building a Big Data Beast On KubernetesPachyderm: Building a Big Data Beast On Kubernetes
Pachyderm: Building a Big Data Beast On Kubernetes
 
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
PlayStation and Lucene - Indexing 1M documents per second: Presented by Alexa...
 
Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...
Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...
Large Scale Log Analytics with Solr: Presented by Rafał Kuć & Radu Gheorghe, ...
 
Numeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and SolrNumeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and Solr
 
Understanding the Solr Security Framekwork: Presented by Anshum Gupta, IBM
Understanding the Solr Security Framekwork: Presented by Anshum Gupta, IBMUnderstanding the Solr Security Framekwork: Presented by Anshum Gupta, IBM
Understanding the Solr Security Framekwork: Presented by Anshum Gupta, IBM
 
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
 
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
Building a Large Scale SEO/SEM Application with Apache Solr: Presented by Rah...
 
Build a Great Application in Minutes!: Presented by Stefan Olafsson, Twigkit
Build a Great Application in Minutes!: Presented by Stefan Olafsson, TwigkitBuild a Great Application in Minutes!: Presented by Stefan Olafsson, Twigkit
Build a Great Application in Minutes!: Presented by Stefan Olafsson, Twigkit
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch AnalysisNoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
NoLambda: Combining Streaming, Ad-Hoc, Machine Learning and Batch Analysis
 
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, LucidworksState of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
State of Solr Security 2016: Presented by Ishan Chattopadhyaya, Lucidworks
 
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
Loading 350M documents into a large Solr cluster: Presented by Dion Olsthoorn...
 

Semelhante a Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado, Verizon

RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningRahul Jain
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildSujit Pal
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
 
Webinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's NewWebinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's NewLucidworks
 
ML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdf
ML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdfML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdf
ML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdfAvijitChaudhuri3
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...Ilkay Altintas, Ph.D.
 
Large scale computing
Large scale computing Large scale computing
Large scale computing Bhupesh Bansal
 
Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...mestato
 
Lec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrustLec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrustMenchita Falcutila Dumlao
 
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Lucidworks
 
Preliminary committee presentation
Preliminary committee presentationPreliminary committee presentation
Preliminary committee presentationRichard Drake
 
Database Systems - Lecture Week 1
Database Systems - Lecture Week 1Database Systems - Lecture Week 1
Database Systems - Lecture Week 1Dios Kurniawan
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningVarad Meru
 
Three Tools for "Human-in-the-loop" Data Science
Three Tools for "Human-in-the-loop" Data ScienceThree Tools for "Human-in-the-loop" Data Science
Three Tools for "Human-in-the-loop" Data ScienceAditya Parameswaran
 

Semelhante a Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado, Verizon (20)

RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Learning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search GuildLearning to Rank Presentation (v2) at LexisNexis Search Guild
Learning to Rank Presentation (v2) at LexisNexis Search Guild
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
Webinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's NewWebinar: Fusion 3.1 - What's New
Webinar: Fusion 3.1 - What's New
 
ML.ppt
ML.pptML.ppt
ML.ppt
 
ML.ppt
ML.pptML.ppt
ML.ppt
 
ML.ppt
ML.pptML.ppt
ML.ppt
 
ML.ppt
ML.pptML.ppt
ML.ppt
 
ML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdf
ML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdfML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdf
ML.pptvdvdvdvdvdfvdfgvdsdgdsfgdfgdfgdfgdf
 
ML.ppt
ML.pptML.ppt
ML.ppt
 
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
A Maturing Role of Workflows in the Presence of Heterogenous Computing Archit...
 
Large scale computing
Large scale computing Large scale computing
Large scale computing
 
Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...
 
Lec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrustLec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrust
 
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
 
Preliminary committee presentation
Preliminary committee presentationPreliminary committee presentation
Preliminary committee presentation
 
Database Systems - Lecture Week 1
Database Systems - Lecture Week 1Database Systems - Lecture Week 1
Database Systems - Lecture Week 1
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Three Tools for "Human-in-the-loop" Data Science
Three Tools for "Human-in-the-loop" Data ScienceThree Tools for "Human-in-the-loop" Data Science
Three Tools for "Human-in-the-loop" Data Science
 

Mais de Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceLucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesLucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchLucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondLucidworks
 

Mais de Lucidworks (20)

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce Strategy
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in Salesforce
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant Products
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized Experiences
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and Rosette
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - Europe
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 Research
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise Search
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and Beyond
 

Último

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Último (20)

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado, Verizon

  • 1. Where Search Meets Machine Learning Diana Hu @sdianahu — Data Science Lead, Verizon Joaquin Delgado @joaquind — Director of Engineering, Verizon
  • 2. Disclaimer 2 The content of this presentation are of the authors’ personal statements and does not officially represent their employer’s view in anyway. Included content is especially not intended to convey the views of OnCue or Verizon 01
  • 3. Index 1.  Introduction 2.  Search and Information Retrieval 3.  ML problems as Search-based Systems 4.  ML Meets Search!
  • 5. Scaling learning systems is hard! •  Millions of users, items •  Billions of features •  Imbalanced Datasets •  Complex Distributed Systems •  Many algorithms have not been tested at “Internet Scale”
  • 6. Typical approaches •  Distributed systems – Fault tolerance, Throughput vs. latency •  Parallelization Strategies – Hashing, trees •  Processing – Map reduce variants, MPI, graph parallel •  Databases – Key/Value Stores, NoSQL Such a custom system requires TLC
  • 8. Search Search is about finding specific things that are either known or assumed to exist, Discovery is about is about helping the user encounter what he/she didn’t even know exists. •  Focused on Search: Search Engines, Database Systems •  Focused on Discovery: Recommender Systems, Advertising Predicate Logic and Declarative Languages Rock!
  • 9. Search stack Matched Hits Representation Function Similarity Calculation Matched HitsDocuments Representation Function Input Query Matched HitsMatched HitsRetrieved Documents Online Processing Offline Processing (*)RelevanceFeedback Query Representation Doc Representation Index *Metadata Engineering (*) Optional
  • 11. Search Engines: the big hammer •  Search engines are largely used to solve non-IR search problems, because: •  Widely Available •  Fast and Scalable •  Integrates well with existing data stores
  • 12. But… Are we using the right tool? •  Search Engines were originally designed for IR. •  Complex non-IR search tasks sometimes require a two phase approach Phase1) Filter Phase 2) Rank
  • 15. Machine Learning Machine Learning in particular supervised learning refer to techniques used to learn how to classify or score previously unseen objects based on a training dataset Inference and Generalization are the Key!
  • 17. Learning systems’ stack Visualization / UI Retrieval Ranking Query Generation and Contextual Pre-filtering Model Building Index Building Data/Events Collections Data Analytics Contextual Post Filtering OnlineOffline Experimentation
  • 18. Case study: Recommender Systems •  Reduce information load by estimating relevance •  Ranking (aka Relevance) Approaches: •  Collaborative filtering •  Content Based •  Knowledge Based •  Hybrid •  Beyond rating prediction and ranking •  Business filtering logic •  Low latency and Scale
  • 19. RecSys: Content based models •  Rec Task: Given a user profile find the best matching items by their attributes •  Similarity calculation: based on keyword overlap between user/items •  Neighborhood method (i.e. nearest neighbor) •  Query-based retrieval (i.e. Rocchio’s method) •  Probabilistic methods (classical text classification) •  Explicit decision models •  Feature representation: based on content analysis •  Vector space model •  TF-IDF •  Topic Modeling
  • 22. ML Meets Search! ML Search
  • 23. Remember the elephant? Visualization / UI Retrieval Ranking Query Generation and Contextual Pre-filtering Model Building Index Building Data/Events Collections Data Analytics Contextual Post Filtering OnlineOffline Experimentation
  • 24. Simplifying the stack! Visualization / UI Query Generation and Contextual Pre-filtering Model Building Index Building Data/Events Collections Data Analytics OnlineOffline Experimentation Retrieval Contextual Post Filtering Ranking
  • 25. Search stack Matched Hits Representation Function Similarity Calculation Matched HitsDocuments Representation Function Input Query Matched HitsMatched HitsRetrieved Documents Online Processing Offline Processing (*)RelevanceFeedback Query Representation Doc Representation Index *Metadata Engineering (*) Optional
  • 26. Simplifying the Search stack Matched Hits Representation Function Similarity Calculation Matched HitsDocuments Representation Function Input Query Matched HitsMatched HitsRetrieved Documents Online Processing Offline Processing (*)RelevanceFeedback Query Representation Doc Representation Index *Metadata Engineering (*) Optional Retrieval Contextual Post Filtering Ranking ML-Scoring Plugin Serialized ML Model
  • 28. ML-Scoring Options •  Option A: Solr FunctionQuery •  Pro: Model is just a query! •  Cons: Limits expressiveness of models •  Option B: Solr Custom Function Query •  Pro: Loading any type of model (also PMML) •  Cons: Memory limitations, also multiple model reloading •  Option C: Lucene CustomScoreQuery •  Pro: Can use PMML and tune how PMML gets loaded •  Cons: No control on matches •  Option D: Lucene Low level Custom Query •  *Mahout vectors from Lucene text (only trains, so not an option)
  • 29. Real-life Problem •  Census database that contains documents with the following fields: 1. Age: continuous; 2. Workclass: 8 values; 3. Fnlwgt: continuous.; 4. Education: 16 values; 5. Education-num: continuous.; 6. Marital-status: 7 values; 7. Occupation: 14 values; 8. Relationship: 6 values; 9. Race: 5 values; 10. Sex: Male, Female; 11. Capital-gain: continuous.;12. Capital- loss: continuous.; 13. Hours-per-week: continuous.; 14. Native-country: 41 values; 15. >50K Income: Yes, No. •  Task is to predict whether a person makes more than 50k a year based on their attributes
  • 30. 1) Learn from the (training) data Naïve Bayes SVM Logistic Regression Decision Trees Train with your favorite ML Framework
  • 31. Option A: Just a Solr Function Query q=“sum(C,              product(age,w1),              product(Workclass,w2),              product(Fnlwgt,  w3),              product(Education,  w4),              ….)”   Serialized ML Model as Query Trainer + Indexer Y_prediction = C + XB
  • 32. May result in a crazy Solr functionQuery See more at https://wiki.apache.org/solr/FunctionQuery q=dismax&bf="ord(educaton-num)^0.5 recip(rord(age),1,1000,1000)^0.3"
  • 33. What about models like this?
  • 34. Option B: Custom Solr FuntionQuery 1.  Subclass org.apache.solr.search.ValueSourceParser. public  class  MyValueSourceParser  extends  ValueSourceParser  {    public  void  init  (NamedList  namedList)  {            …      }      public  ValueSource  parse(FunctionQParser  fqp)  throws  ParseException  {            return  new  MyValueSource();      }   } 2.  In solrconfig.xml, register your new ValueSourceParser directly under the <config> tag <valueSourceParser  name=“myfunc”  class=“com.custom.MyValueSourceParser”  />   3.  Subclass org.apache.solr.search.ValueSource and instantiate it in ValueSourceParser.parse()
  • 35. Option C: Lucene CustomScoreQuery 2C) Serialize model with PMML •  Can use JPMML library to read serialized model in Lucene •  On Lucene will need to implement an extension with JPMML-evaluator to take vectors as expected 3C) In Lucene: •  Override CustomScoreQuery: load PMML •  Create CustomScoreProvider: do model PMML data marshaling •  Rescoring: PMML evaluation
  • 36. Predictive Model Markup Language •  Why use PMML •  Allows users to build a model in one system •  Export model and deploy it in a different environment for prediction •  Fast iteration: from research to deployment to production •  Model is a XML document with: •  Header: description of model, and where it was generated •  DataDictionary: defines fields used by model •  Model: structure and parameters of model •  http://dmg.org/pmml/v4-2-1/GeneralStructure.html
  • 37. Example: Train in Spark to PMML import  org.apache.spark.mllib.clustering.KMeans     import  org.apache.spark.mllib.linalg.Vectors         //  Load  and  parse  the  data     val  data  =  sc.textFile("/path/to/file")            .map(s  =>  Vectors.dense(s.split(',').map(_.toDouble)))           //  Cluster  the  data  into  three  classes  using  KMeans     val  numIterations  =  20     val  numClusters  =  3     val  kmeansModel  =  KMeans.train(data,  numClusters,  numIterations)           //  Export  clustering  model  to  PMML     kmeansModel.toPMML("/path/to/kmeans.xml")  
  • 39. Overriding scores with CustomScoreQuery CustomScoreProvider CustomScoreQuery Lucene Query Find next Match Score Rescore Doc New Score *Credit to Doug Turnbull’s Hacking Lucene forCustom Search Results
  • 40. Overriding scores with CustomScoreQuery •  Matching remains •  Scoring overridden CustomScoreProvider CustomScoreQuery Lucene Query Find next Match Score Rescore Doc New Score *Credit to Doug Turnbull’s Hacking Lucene forCustom Search Results
  • 41. Implementing CustomScoreQuery 1.  Given normal Lucene Query, use a CustomScoreQuery to wrap it TermQuery  q  =  New  TermQuery(term)   MyCustomScoreQuery  mcsq  =  New  MyCustomScoreQuery(q)   //Make  sure  query  has  all  fields  needed  by  PMML!
  • 42. Implementing CustomScoreQuery 2.  Initialize PMML PMML  pmml  =  ...;   ModelEvaluatorFactory  modelEvaluatorFactory  =              ModelEvaluatorFactory.newInstance();   ModelEvaluator<?>  modelEvaluator  =              modelEvaluatorFactory.newModelManager(pmml);   Evaluator  evaluator  =  (Evaluator)modelEvaluator;        
  • 43. Implementing CustomScoreQuery 2.  Rescore each doc with IndexReader and docID public  float  customScore(int  doc,  float  subQueryScore,  float   valSrcScores[])  throws  IOException  {   //Lucene  reader   IndexReader  r  =  context.reader();   Terms  tv  =  r.getTermVector(doc,  _field);   TermsEnum  tenum  =  null;   tenum  =  tv.iterator(tenum);           //convert  the  iterator  order  to  fields  needed  by  model   TermsEnum  tenumPMML  =  tenum2PMML(tenum,                        evaluator.getActiveFields());        
  • 44. Implementing CustomScoreQuery 2.  Rescore each doc with IndexReader and docID //Marshall  Data  into  PMML   Map<FieldName,  FieldValue>  arguments  =                    new  LinkedHashMap<FieldName,  FieldValue>();   List<FieldName>  activeFields  =  evaluator.getActiveFields();   for(FieldName  activeField  :  activeFields){      //  The  raw  is  value  has  been  sorted  with  number  of  fields  needed      Object  rawValue  =  tenumPMML.next;      FieldValue  activeValue  =  evaluator.prepare(activeField,  rawValue);      arguments.put(activeField,  activeValue);   }        
  • 45. Implementing CustomScoreQuery 2.  Rescore each doc with IndexReader and docID //Rescore  and  evaluate  with  PMML   Map<FieldName,  ?>  results  =  evaluator.evaluate(arguments);   FieldName  targetName  =  evaluator.getTargetField();   Object  targetValue  =  results.get(targetName);   return  (float)  targetValue;      
  • 46. Potential issues •  Performance •  If search space is very large •  If model complexity explodes (i.e. kernel expansion) •  Operations •  Code is running on key infrastructure •  Versioning •  Binary Compatibility
  • 47. Option D: Low Level Lucene •  CustomScoreQuery or Custom FunctionScore can’t control matches •  If you want custom matches and scoring…. •  Implement: •  Custom Query Class •  Custom Weight Class •  Custom Scorer Class •  http://opensourceconnections.com/blog/2014/03/12/using- customscorequery-for-custom-solrlucene-scoring/
  • 48. Conclusion •  Importance of the full picture – Learning systems from the lenses of the whole elephant •  Reducing the time from science to production is complicated •  Scalability is hard! •  Why not have ML use Search in its core during online eval? •  Solr and Lucene are a start to customize your learning system
  • 49. We are Hiring! Contact me at diana.hu@verizon.com @sdianahu Q&A
  • 50. O C T O B E R 1 3 - 1 6 , 2 0 1 6 • A U S T I N , T X