Instant Search - A Hands-on Tutorial
ACM SIGIR 2016
Ganesh Venkataraman, Viet Ha-Thuc, Dhruv Arya and Abhimanyu Lad
LinkedIn Search
1
The Actors
2
Where to find information
Code - https://github.com/linkedin/instantsearch-tutorial
Wiki - https://github.com/linkedin/instantsearch-tutorial/wiki
Slack - https://instantsearchtutorial.slack.com/
Slides - will be posted on SlideShare; we will update the wiki and tweet the link
Twitter - #instantsearchtutorial (twitter.com/search)
3
The Plot
● At the end of this tutorial, attendees should:
○ Understand the challenges and constraints of instant search (latency,
tolerance to user errors, etc.)
○ Get a broad overview of the theoretical foundations behind:
■ Indexing
■ Query Processing
■ Ranking and Blending (including personalization)
○ Understand open source options available to put together an ‘end-to-end’ instant search
solution
○ Put together an end-to-end solution on their own (with some helper code)
4
What would graduation look like?
● Instant result solution built over
stackoverflow data
● Built based on open source tools
(elasticsearch, typeahead.js)
● Ability to experiment further to
modify ranking/query construction
5
Final Output from hands on tutorial
6
Agenda
● Terminology and Background
● Indexing & Retrieval
○ Instant Results
○ Query Autocomplete
● Ranking
● Hands on tutorial with data from stackoverflow
○ Index and search posts from stackoverflow
○ Play around with ranking
7
Terminology - Query Autocomplete
● Intention is to complete the user query
9
Terminology - Instant Results
● Get the result to the user as they type the query
10
Terminology - Instant Answers
● We will NOT be covering answers for this tutorial
11
Terminology - Navigational Query
● Queries where the information need can be satisfied by only one
result/document
12
Terminology - Exploratory Queries
● Multiple results can potentially satisfy the user's need
13
When to display instant results vs query completion
● LinkedIn product decision
○ when the confidence level is high enough for a
particular result, show the result
● What is ‘high enough’ could be application specific and
not merely a function of score
14
Completing query vs instant results
● “lin” => first degree connection with lots of common connections, same
company etc.
● “link” => better off completing the query (even with possible suggestions for
verticals)
15
Terminology - Blending
● Bringing results from different search verticals (news, web, answers etc)
16
Blending on prefix
17
Why Instant Search and why now?
● Natural evolution of search
● Users have gotten used to getting immediate feedback
● Mobile devices => need to type less
18
Agenda
● Terminology and Background
● Indexing & Retrieval
○ Instant Results
○ Query Autocomplete
● Ranking
● Hands on tutorial with data from stackoverflow
○ Index and search posts from stackoverflow
○ Play around with ranking
19
Instant Search at Scale
● Constraints (example: LinkedIn people search)
○ Scale - ability to store and retrieve hundreds of millions or billions of
documents via prefix
○ Fast - ability to return results quicker than typing speed
○ Resilience to user errors
○ Personalized
20
Instant Search via Inverted Index
● Scalable
● Ability to form complex boolean queries
● Open source availability (Lucene/Elasticsearch)
● Easy to add metadata (payloads, forward index)
21
The Search Index
Inverted Index: Mapping from (search) terms to list of
documents (they are present in)
Forward Index: Mapping from documents to metadata about
them
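The two index structures can be sketched in a few lines. This is a minimal illustration only; the toy corpus, the field names (`title`, `score`) and the helper `build_indexes` are assumptions and are not taken from the tutorial code.

```python
from collections import defaultdict


def build_indexes(docs):
    """Build an inverted index (term -> sorted doc ids) and a forward
    index (doc id -> metadata) from a dict of documents."""
    inverted = defaultdict(list)
    forward = {}
    for doc_id in sorted(docs):
        doc = docs[doc_id]
        # Forward index: metadata that ranking can look up by doc id.
        forward[doc_id] = {"title": doc["title"], "score": doc["score"]}
        # Inverted index: each distinct term points back to the document.
        for term in set(doc["title"].lower().split()):
            inverted[term].append(doc_id)
    return dict(inverted), forward


# Hypothetical toy corpus.
DOCS = {
    1: {"title": "Abraham Lincoln", "score": 10},
    2: {"title": "Lincoln Park", "score": 3},
}
INVERTED, FORWARD = build_indexes(DOCS)
```

Because documents are visited in ascending id order, every posting list comes out sorted, which is what list intersection relies on.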
22
The Posting List
23
Candidate selection
● Posting lists
○ “abraham” => {5, 7, 8, 23, 47, 101}
○ “lincoln” => {7, 23, 101, 151}
● Query = “abraham AND lincoln”
○ Retrieved set => {7, 23, 101}
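The AND query above amounts to intersecting sorted posting lists. A minimal sketch, using the posting lists from the slide; the function names `intersect` and `and_query` are illustrative and not part of Lucene or Elasticsearch.

```python
from functools import reduce


def intersect(a, b):
    """Intersect two sorted posting lists with a two-pointer merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def and_query(terms, postings):
    """Return the doc ids that contain every term."""
    # Starting with the shortest list keeps intermediate results small.
    lists = sorted((postings.get(t, []) for t in terms), key=len)
    return reduce(intersect, lists) if lists else []


POSTINGS = {
    "abraham": [5, 7, 8, 23, 47, 101],
    "lincoln": [7, 23, 101, 151],
}
RETRIEVED = and_query(["abraham", "lincoln"], POSTINGS)
```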
24
Prefix indexing
● With instant search, the query is not just ‘abraham’
● Queries = [‘a’, ‘ab’, … , ‘abraham’]
● Need to index each prefix
● Elasticsearch refers to this form of tokenization as ‘edge n-gram’
● Issues
○ Bigger index
○ Big posting list for short prefixes => much higher number of documents retrieved
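Prefix indexing can be illustrated without a search engine. The sketch below generates edge n-grams in plain Python and indexes every prefix; it is not Elasticsearch's tokenizer, and the three-document corpus is a made-up example.

```python
def edge_ngrams(token, min_len=1):
    """Return every prefix of the token, shortest first."""
    return [token[:n] for n in range(min_len, len(token) + 1)]


def index_prefixes(docs):
    """Map each prefix to the sorted list of doc ids containing a token
    that starts with it."""
    index = {}
    for doc_id, text in sorted(docs.items()):
        for token in text.lower().split():
            for prefix in edge_ngrams(token):
                postings = index.setdefault(prefix, [])
                # Avoid duplicates when two tokens share a prefix.
                if not postings or postings[-1] != doc_id:
                    postings.append(doc_id)
    return index


# Hypothetical toy corpus.
DOCS = {1: "abraham lincoln", 2: "abigail adams", 3: "lincoln park"}
PREFIX_INDEX = index_prefixes(DOCS)
```

Even on this tiny corpus the posting list for `a` is longer than the one for `abr`, which is the "big posting list for short prefixes" issue in miniature.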
25
Early Termination
● We cannot ‘afford’ to retrieve and score all documents that match the query
● We terminate posting list traversal when a certain number of documents have
been retrieved
● We may miss out on recall
26
Static Rank
● Order the posting lists so that documents with a high (query independent) prior
probability of relevance appear first
● Use application specific logic to rewrite query
● Once the query has achieved a certain number of matches in the posting list,
we stop. This number of matches is referred to as “early termination limit”
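A sketch of traversal with an early termination limit. It assumes, for illustration only, that doc ids were assigned in static-rank order, so a smaller id means a higher prior probability of relevance and ascending order is best-first. The function name is hypothetical.

```python
def intersect_with_limit(a, b, limit):
    """Intersect two posting lists ordered by static rank, stopping as
    soon as `limit` matches have been collected."""
    matches = []
    i = j = 0
    while i < len(a) and j < len(b) and len(matches) < limit:
        if a[i] == b[j]:
            matches.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matches


# Posting lists reused from the candidate selection example.
ABRAHAM = [5, 7, 8, 23, 47, 101]
LINCOLN = [7, 23, 101, 151]
TOP_TWO = intersect_with_limit(ABRAHAM, LINCOLN, limit=2)
```

With a limit of 2 the traversal never reaches document 101, which shows how the limit trades recall for latency.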
27
Static Rank Example - People Search at LinkedIn
● Some factors that go into static rank computation
○ Member popularity, measured by profile views both
within and outside the network
○ Spam in the person’s name
○ Security and Spam. Downgrade profiles flagged by
LinkedIn’s internal security team
○ Celebrities and Influencers
28
Static Rank Case study - People Search at LinkedIn
29
[Figure: recall plotted against the early termination limit]
Resilience to Spelling errors
● We focus on names as they can be (often) hard to get right (ex: “marissa
mayer” or “marissa meyer”?)
● Names vs traditional spelling errors:
○ “program manager” vs “program manger” - only one of these is right
○ “Mayer” vs “Meyer” - no clear source of truth
● Edit distance based approaches can be wrong both ways:
○ “Mohamad” and “Muhammed” are 3 edits apart and yet plausible variants
○ “Jeff” and “Joff” are 1 edit apart, but highly unlikely to be plausible variants of the
same name
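The edit distances quoted above can be checked with a standard Levenshtein dynamic program. This is a generic textbook implementation, shown only to make the numbers concrete; it is not LinkedIn's name-matching code.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or match
        prev = cur
    return prev[-1]


DISTANCES = {
    ("mohamad", "muhammed"): levenshtein("mohamad", "muhammed"),
    ("jeff", "joff"): levenshtein("jeff", "joff"),
    ("mayer", "meyer"): levenshtein("mayer", "meyer"),
}
```

A fixed threshold of 1 would accept "Jeff"/"Joff" and reject "Mohamad"/"Muhammed", which is the failure mode the slide describes.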
30
LinkedIn Approach - Name clusters
Solution touches indexing, query reformulation and ranking
31
Name Clusters - Two step clustering
● Coarse level clustering
○ Uses double metaphone + some known heuristics
○ Focus on recall
● Fine level clustering
○ similarity function that takes into account Jaro-Winkler distance
○ User session data
32
Overall approach for Name Clusters
● Indexing
○ Store clusterID for each cluster in a separate field (say ‘NAMECLUSTERID’)
○ ‘Cris’ and ‘chris’ in same name cluster CHRISID
○ NAME:cris NAMECLUSTERID:chris
● Query processing
○ user query = ‘chris’
○ Rewritten query = ?NAME:chris ?NAMECLUSTERID:chris
● Ranking
○ Different weights for ‘perfect match’ vs. ‘name cluster match’
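The three steps above can be sketched end to end. Everything here is illustrative: the cluster table, the weights, and the reading of the `?` prefix as an optional (should-match) clause are assumptions, not details confirmed by the slides.

```python
# Hypothetical cluster table: name -> cluster id.
CLUSTER_OF = {"chris": "chris", "cris": "chris", "john": "john"}

# Hypothetical weights: an exact name match counts more than a cluster match.
WEIGHTS = {"NAME": 2.0, "NAMECLUSTERID": 1.0}


def index_fields(name):
    """Indexing: store the name and its cluster id in separate fields."""
    name = name.lower()
    return {"NAME": name, "NAMECLUSTERID": CLUSTER_OF.get(name, name)}


def rewrite(query):
    """Query processing: emit one optional clause per field."""
    q = query.lower()
    return [("NAME", q), ("NAMECLUSTERID", CLUSTER_OF.get(q, q))]


def score(doc_fields, clauses):
    """Ranking: sum the weights of the clauses the document matches."""
    return sum(WEIGHTS[field] for field, value in clauses
               if doc_fields.get(field) == value)


CLAUSES = rewrite("chris")
SCORES = {name: score(index_fields(name), CLAUSES)
          for name in ["Chris", "Cris", "John"]}
```

"Chris" matches both clauses, "Cris" matches only the cluster clause, and "John" matches neither, so the exact spelling ranks first while the variant is still retrieved.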
33
Instant Results via Inverted Index - Some Takeaways
● Used for documents at very high scale
● Use early termination
● Approach the problem as a combination of indexing/query processing/ranking
34
Agenda
● Terminology and Background
● Indexing & Retrieval
○ Instant Results
○ Query Autocomplete
● Ranking
● Hands on tutorial with data from stackoverflow
○ Index and search posts from stackoverflow
○ Play around with ranking
35
Query Autocomplete - Problem Statement
● Let $q = w_1, w_2, \ldots, w_k^{*}$ represent
the query with $k$ words, where the
$k$-th token is a prefix as denoted by
the asterisk
● Goal: Find one or more relevant
completions for the query
36
Trie
● Used to store an associative array
where keys are strings
● Only certain keys and leaves are
of interest
● Structure shares only prefixes
between keys
● Representation is not memory
efficient
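A minimal trie with prefix completion, as a sketch of the data structure described above. The class and method names are illustrative; the example words are the ones used in the slide's figure.

```python
class Trie:
    """A character trie; a node marks a complete key with `is_key`."""

    def __init__(self):
        self.children = {}
        self.is_key = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_key = True

    def complete(self, prefix):
        """Return all stored keys that start with `prefix`, sorted."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        results = []
        stack = [(node, prefix)]
        while stack:
            cur, text = stack.pop()
            if cur.is_key:
                results.append(text)
            for ch, child in cur.children.items():
                stack.append((child, text + ch))
        return sorted(results)


TRIE = Trie()
for word in ["space", "spark", "moth"]:
    TRIE.insert(word)
```

"space" and "spark" share the nodes for `s`, `p` and `a`, but no suffix is shared between keys, and each node carries its own dictionary, which is why the representation is memory hungry.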
37
A trie of the words {space, spark, moth}
Finite State Transducers (FST)
● Allows efficient retrieval of
completions at runtime
● Can fit entirely into RAM
● Useful when keys have
commonalities to them, allowing
better compression
● Lucene has support for FSTs*
FST for words: software, scala,
scalding, spark
*Lucene FST implementation based on “Direct Construction of Minimal Acyclic Subsequential Transducers (2001)” by Stoyan Mihov, Denis Maurel
38
Query Autocomplete vs. Instant Results
● For query autocomplete, the corpus of terms remains relatively constant; for instant
results, documents can be continuously added or removed
● Query autocomplete focuses only on prefix-based retrieval, whereas instant
results use complex query construction for retrieval
● Query autocomplete retrieval is based on a dictionary, hence the index can be
refreshed periodically instead of in real time
39
Query Tagging
● Segment query based on
recognized entities
● Annotate query with:
○ Named Entity Tags
○ Standardized Identifiers
○ Related Entities
○ Additional Entity Specific Metadata
40
Data Processing
● Break queries into recognized entities and individual tokens
● Past query logs are parsed into recognized entities and tokens, which are fed into an FST
for retrieval of candidate suggestions.
41
Retrieval
● All candidate completions over increasingly longer suffixes of the query are
used to capture enough context
● Given a query like “linkedin sof*” we look up completions for:
○ sof*, linkedin sof*
● Candidates are then provided to the scoring phase.
42
Retrieval
● From the above FST, for the query “linkedin sof*” we retrieve the
candidates:
○ sof: [software developer, software engineer]
○ linkedin sof: []
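The suffix-based lookup can be sketched as follows. A linear scan over a list stands in for the FST here, purely for illustration; the completion list mirrors the candidates shown on the slide, and the function names are hypothetical.

```python
def query_suffixes(query):
    """Return increasingly longer suffixes of the query, shortest first."""
    tokens = query.split()
    return [" ".join(tokens[i:]) for i in range(len(tokens) - 1, -1, -1)]


def retrieve_candidates(query, completions):
    """Look up completions for every suffix of the query."""
    return {suffix: sorted(c for c in completions if c.startswith(suffix))
            for suffix in query_suffixes(query)}


# Stand-in for the contents of the FST.
COMPLETIONS = ["software developer", "software engineer"]
CANDIDATES = retrieve_candidates("linkedin sof", COMPLETIONS)
```

The shortest suffix has the most matches and the least context; longer suffixes carry more context but may match nothing, so all of them are passed on to scoring.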
43
Payloads
● Each query autocomplete result
can have a payload associated
with it.
● A payload holds serialized data
useful in scoring the autocomplete
result
44
Fuzzy Matching - LinkedIn Autocomplete
45
Fuzzy Matching
● Use Levenshtein automata constructed from
a word and a maximum edit distance
● Based on the automaton and the letters input
to it, we decide whether to continue or not
● Ex. searching for “dpark” (s and d being close on
the keyboard) with edit distance 1 returns
[spark]
An index of {space, spark, moth}
represented as a trie
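The "continue or not" decision can be sketched without building a real Levenshtein automaton. The code below swaps in a simpler, well-known technique: it walks the trie while carrying one row of the edit-distance table, and prunes a branch once every entry in the row exceeds the budget. The function names and the `$` end-of-word marker are illustrative choices, and words containing `$` are not supported.

```python
def build_trie(words):
    """Nested-dict trie; the key '$' stores the complete word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = word
    return root


def fuzzy_search(root, query, max_edits):
    """Return (word, distance) pairs within max_edits of the query."""
    results = []

    def walk(node, ch, prev_row):
        # Extend the edit-distance table by one row for character ch.
        row = [prev_row[0] + 1]
        for i in range(1, len(query) + 1):
            cost = 0 if query[i - 1] == ch else 1
            row.append(min(row[i - 1] + 1,
                           prev_row[i] + 1,
                           prev_row[i - 1] + cost))
        if "$" in node and row[-1] <= max_edits:
            results.append((node["$"], row[-1]))
        # Continue only if some extension can still fit the budget.
        if min(row) <= max_edits:
            for nxt, child in node.items():
                if nxt != "$":
                    walk(child, nxt, row)

    first_row = list(range(len(query) + 1))
    for ch, child in root.items():
        if ch != "$":
            walk(child, ch, first_row)
    return sorted(results)


INDEX = build_trie(["space", "spark", "moth"])
SUGGESTIONS = fuzzy_search(INDEX, "dpark", max_edits=1)
```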
46
47
48
49
Suggestion = Spark
Agenda
● Terminology and Background
● Indexing & Retrieval
● Ranking
○ Ranking instant results
○ Ranking query suggestions
○ Blending
● Hands on tutorial with data from stackoverflow
50
Ranking Challenge
● Short query prefixes
● Context beyond query
○ Personalized context
○ Global context
■ Global popularity
■ Trending
51
Hand-Tuned vs. Machine-Learned Ranking
● Hard to manually tune with very large number of features
● Challenging to personalize
● LTR allows leveraging large volume of click data in an automated way
52
Agenda
● Terminology and Background
● Indexing & Retrieval
● Ranking
○ Ranking instant results
○ Ranking query suggestions
○ Blending
● Hands on tutorial with data from stackoverflow
53
Features
● Text match
○ Match query terms with different fields on documents
54
Features
● Document Quality
○ Global Popularity
■ Celebrities
○ Spamminess
55
Features
● Social Affinity (personalized features)
○ Network distance between searcher and result
○ Connection Strength
■ Within the same company
■ Common connections
■ From the same school
56
Training Data
● Human judgement
● Challenge:
○ Personalization
○ Scale
57
Training Data
● Log-based
○ Personalized
○ Available in large quantity
● Position Bias
○ Top-K randomization
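Top-K randomization can be sketched as shuffling the first K results before showing them, so that clicks on those results are less tied to the position the ranker originally gave them. This is a simplified illustration; the function name and the choice of K are assumptions.

```python
import random


def randomize_top_k(ranked, k, rng):
    """Shuffle the first k results and leave the rest in ranked order."""
    head = list(ranked[:k])
    rng.shuffle(head)
    return head + list(ranked[k:])


RANKED = ["a", "b", "c", "d", "e", "f"]
# A seeded generator keeps the example reproducible.
SHOWN = randomize_top_k(RANKED, k=3, rng=random.Random(0))
```

In practice only a small share of traffic would be served this way, since shuffling degrades the experience for the users who see it.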
58
Learning to Rank
▪ Pointwise: Reduce ranking to binary classification
LinkedIn Confidential ©2013 All Rights Reserved 59
[Figure: documents labeled relevant (+) or non-relevant (-)]
Limitations
▪ Relevant documents associated with different queries are put into the
same class
Learning to Rank
▪ Pairwise: Reduce ranking to classification of document pairs w.r.t. the
same query
– {(Q1, A>B), (Q2, C>D), (Q3, E>F)}
Learning to Rank
▪ Pairwise
– Limitation: Does not differentiate inversions at top vs. bottom positions
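A small sketch of the pairwise view and its limitation. Preference pairs are derived from graded labels for a single query, and a ranking is judged by how many pairs it orders wrongly. The labels and function names are made up for illustration.

```python
from itertools import combinations


def preference_pairs(labels):
    """Return (better, worse) document pairs for one query."""
    pairs = []
    for a, b in combinations(sorted(labels), 2):
        if labels[a] > labels[b]:
            pairs.append((a, b))
        elif labels[b] > labels[a]:
            pairs.append((b, a))
    return pairs


def inversions(ranking, labels):
    """Count the preference pairs that the ranking puts in the wrong order."""
    position = {doc: i for i, doc in enumerate(ranking)}
    return sum(1 for better, worse in preference_pairs(labels)
               if position[better] > position[worse])


# Hypothetical graded relevance labels for one query.
LABELS = {"A": 3, "B": 2, "C": 1, "D": 0}
SWAP_AT_TOP = inversions(["B", "A", "C", "D"], LABELS)
SWAP_AT_BOTTOM = inversions(["A", "B", "D", "C"], LABELS)
```

Both rankings contain exactly one inversion, so a pairwise loss treats them alike even though the swap at the top is the one a user is more likely to notice.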
Learning to Rank
▪ Listwise
– Directly operate on ranked lists
– Optimize listwise objective function, e.g. IR metrics
▪ Mean Average Precision (MAP)
▪ Normalized Discounted Cumulative Gain (NDCG)
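NDCG can be computed in a few lines. The sketch below uses the common exponential-gain form with a log2 position discount; other variants exist, and the relevance grades are made up for illustration.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of relevance grades in ranked order."""
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


IDEAL = ndcg([3, 2, 1, 0])
SWAP_AT_TOP = ndcg([2, 3, 1, 0])
SWAP_AT_BOTTOM = ndcg([3, 2, 0, 1])
```

Unlike an inversion count, the metric penalizes a swap at the top of the list more than the same swap at the bottom, which is the property listwise objectives aim to capture.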
Agenda
● Terminology and Background
● Indexing & Retrieval
● Ranking
○ Ranking vertical results
○ Ranking query suggestions
○ Blending
● Hands on tutorial with data from stackoverflow
66
Features
● Query Popularity
○ Candidate completion $q = s_1, s_2, \ldots, s_k$
○ Likelihood q is a query in the query corpus, estimated by N-gram
language model
$$\Pr(q) = \Pr(s_1, s_2, \ldots, s_k) = \Pr(s_1) \cdot \Pr(s_2 \mid s_1) \cdots \Pr(s_k \mid s_{k-1})$$
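The factorization above is the bigram case, and it can be estimated from a query log by counting. This sketch uses unsmoothed maximum-likelihood estimates and a start marker `<s>` so that the first term is Pr(s1 | start of query); the marker, the tiny query log and the function names are assumptions made for illustration.

```python
from collections import Counter


def train_bigram(queries):
    """Count contexts and bigrams over a log of past queries."""
    context = Counter()
    bigram = Counter()
    for query in queries:
        tokens = ["<s>"] + query.split()
        for prev, cur in zip(tokens, tokens[1:]):
            context[prev] += 1
            bigram[(prev, cur)] += 1
    return context, bigram


def query_probability(query, context, bigram):
    """Pr(q) as a product of conditional bigram probabilities."""
    tokens = ["<s>"] + query.split()
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if context[prev] == 0:
            return 0.0  # unseen context; a real model would smooth
        prob *= bigram[(prev, cur)] / context[prev]
    return prob


# Hypothetical query log.
LOG = ["software engineer", "software engineer",
       "software developer", "data scientist"]
CONTEXT, BIGRAM = train_bigram(LOG)
P_ENGINEER = query_probability("software engineer", CONTEXT, BIGRAM)
P_DEVELOPER = query_probability("software developer", CONTEXT, BIGRAM)
```

Here "software engineer" scores 3/4 × 2/3 and "software developer" scores 3/4 × 1/3, so the more frequent completion gets the higher popularity feature.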
67
Features
● Time-sensitive popularity [Shokouhi et al. SIGIR 12]
○ Trending query
○ Periodic Pattern
■ Weekend -> Disneyland
○ Time-series: Forecasted frequencies
68
Features
● Recency-based suggestion (Personalized feature)
69
Agenda
● Terminology and Background
● Indexing & Retrieval
● Ranking
○ Ranking instant results
○ Ranking query suggestions
○ Blending
● Hands on tutorial with data from stackoverflow
70
Blending
71
Blending
72
[Figure: blending architecture with the components Query Prefix, Federator, Company Instant, People Instant, Query Autocompletion and Blender]
Blending Challenges
● Different verticals are associated with different signals
○ People: network distance
○ Groups: time of the last edit
○ Query suggestion: edit distance
● Even common features may not be equally predictive
across verticals
○ Popularity
○ Text similarity
● Scores might not be comparable across verticals
73
Approaches
● Separate binary classifiers
[Figure: People results with features (f1, f2, f3) are scored by Classifier1; Jobs results with features (f1, f2, f4) are scored by Classifier2]
74
Approaches
● Separate binary classifiers
○ Pros
■ Handle vertical-specific features
■ Handle common features with different predictive powers
○ Cons
■ Need to calibrate output scores of multiple classifiers
75
Approaches
● Learning-to-rank - Equal correlation assumption
○ Take the union of the feature schemas and pad non-applicable features with zeros
○ Equal correlation assumption
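[Figure: People features (f1, f2, f3) and Jobs features (f1, f2, f4) are mapped onto the union schema (f1, f2, f3, f4), with f4 = 0 for People and f3 = 0 for Jobs, and fed to a single Model]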
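The union schema with zero padding can be sketched as a plain vectorization step. Feature names follow the figure; the feature values and the helper name are made up for illustration.

```python
# Union of the People schema (f1, f2, f3) and the Jobs schema (f1, f2, f4).
UNION_SCHEMA = ["f1", "f2", "f3", "f4"]


def to_union_vector(features):
    """Lay features out in the union schema, padding missing ones with 0."""
    return [features.get(name, 0.0) for name in UNION_SCHEMA]


# Hypothetical feature values for one result from each vertical.
PEOPLE_RESULT = {"f1": 0.9, "f2": 0.4, "f3": 1.0}
JOBS_RESULT = {"f1": 0.2, "f2": 0.7, "f4": 0.5}

PEOPLE_VECTOR = to_union_vector(PEOPLE_RESULT)
JOBS_VECTOR = to_union_vector(JOBS_RESULT)
```

Both vectors now share one layout, so a single model can score them on a comparable scale. The cost is that f1 and f2 get one weight each, regardless of which vertical the result came from.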
76
Approaches
● Learning-to-rank - Equal correlation assumption
○ Pros
■ Handle vertical-specific features
■ Comparable output scores across verticals
○ Cons
■ Assume common features are equally predictive of vertical relevance
77
Approaches
● Learning-to-rank - Without equal correlation assumption
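[Figure: People vertical features (f1, f2, f3) and Job vertical features (f4, f5, f6) are concatenated into one vector; People results become (f1, f2, f3, 0, 0, 0) and Jobs results become (0, 0, 0, f4, f5, f6), and both are fed to a single Model]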
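Dropping the equal correlation assumption amounts to giving each vertical its own block of the feature vector. A minimal sketch, with feature names taken from the figure and made-up values; the helper name and vertical keys are illustrative.

```python
# Each vertical owns a disjoint block of the combined feature vector.
VERTICAL_FEATURES = {
    "people": ["f1", "f2", "f3"],
    "jobs": ["f4", "f5", "f6"],
}
VERTICAL_ORDER = ["people", "jobs"]


def to_block_vector(vertical, features):
    """Fill the block of the result's own vertical and zero the others."""
    vector = []
    for name in VERTICAL_ORDER:
        for feature in VERTICAL_FEATURES[name]:
            vector.append(features[feature] if name == vertical else 0.0)
    return vector


# Hypothetical feature values.
PEOPLE_VECTOR = to_block_vector("people", {"f1": 0.9, "f2": 0.4, "f3": 1.0})
JOBS_VECTOR = to_block_vector("jobs", {"f4": 0.2, "f5": 0.7, "f6": 0.5})
```

Because a signal such as popularity now has a separate slot per vertical, the model can learn a different weight for it in each one. The vector length grows with the number of verticals, which is where the overfitting and training-data concerns come from.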
78
Approaches
● Learning-to-rank - Without equal correlation assumption
○ Pros
■ Handle vertical-specific features
■ No equal correlation assumption -> the model automatically learns the evidence-vertical
association
■ Comparable output scores across verticals
○ Cons
■ The number of features is huge
● Overfitting
● Require a huge amount of training data
79
Evaluation
● “If you can’t measure it, you can’t improve it”
● Metrics
○ Successful search rate
○ Number of keystrokes per search: query length + clicked result rank
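Both metrics can be computed from a simple search log. The log format below is hypothetical, and treating a search as successful when it ends in a click is an assumption; the slides do not define success.

```python
def successful_search_rate(searches):
    """Fraction of searches that ended with a click on a result."""
    if not searches:
        return 0.0
    clicked = sum(1 for s in searches if s["clicked_rank"] is not None)
    return clicked / len(searches)


def keystrokes_per_search(searches):
    """Mean of (query length + clicked result rank) over clicked searches."""
    clicked = [s for s in searches if s["clicked_rank"] is not None]
    if not clicked:
        return None
    total = sum(len(s["query"]) + s["clicked_rank"] for s in clicked)
    return total / len(clicked)


# Hypothetical log: the prefix typed so far and the 1-based clicked rank.
SEARCHES = [
    {"query": "lin", "clicked_rank": 1},
    {"query": "linke", "clicked_rank": 2},
    {"query": "xyz", "clicked_rank": None},
]
SUCCESS_RATE = successful_search_rate(SEARCHES)
KEYSTROKES = keystrokes_per_search(SEARCHES)
```

A lower keystroke count means users reach the result they want after typing less and scanning fewer results.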
80
Take-Aways
● Speed
○ Instant results: Early termination
○ Autocompletion: FST
● Tolerance to spelling errors
● Relevance: go beyond query prefix
○ Personalized context
○ Global context
81
Agenda
● Terminology and Background
● Indexing & Retrieval
● Ranking
○ Ranking instant results
○ Ranking query suggestions
○ Blending
● Hands on tutorial with data from stackoverflow
82
Dataset
● Posts and Tags from stackoverflow.com
● Posts are questions posted by users and contain the following attributes
○ Title
○ Score
● Tags help identify a suitable category for the post and contain the following
attributes
○ Tag Name
○ Count
● Each post can have a maximum of five tags
83
stackoverflow.com
Title
Tags
Score
84
stackoverflow.com
Question
Tags
Score
Tags & counts
85
The End Product
86
Search Query Input
Query Autocomplete
Instant Results
Tools
87
Architecture
88
Assignments
● Assignments available on Github
● Each assignment builds on a component of the end product
● Tests are provided at end of each assignment for validation
● Finished files available for reference (if needed)
● Raise hand if you need help or have a question
89
Assignment 0
Setting up the machine
90
Assignment 1
Building Instant Search and Autocomplete Index
91
Take-Aways
● Index should be used primarily for retrieval
● Data sources should be kept separate from the index
● Building an index is not instantaneous, hence keep replicas in production
● Real-world indexes can seldom be stored in a single shard
92
Assignment 2
Building the Mid-Tier
93
Take-Aways
● Make incremental additions
● Allow for relevance changes to be compared
● Document relevance changes
● Do side by side evaluations
94
Assignment 3
Visualizing the blended result set
95
Assignment 4
Relevance Improvements
96
Summary
● Theoretical understanding of indexing, retrieval and ranking for instant search
results and query autocomplete
● Insights and learnings from linkedin.com case studies
● Working end-to-end implementation of query autocomplete and instant results
with stackoverflow.com dataset
97
98

Structure, Personalization, Scale: A Deep Dive into LinkedIn SearchStructure, Personalization, Scale: A Deep Dive into LinkedIn Search
Structure, Personalization, Scale: A Deep Dive into LinkedIn SearchC4Media
 
Natural Language Query to SQL conversion using Machine Learning Approach
Natural Language Query to SQL conversion using Machine Learning ApproachNatural Language Query to SQL conversion using Machine Learning Approach
Natural Language Query to SQL conversion using Machine Learning ApproachMinhazul Arefin
 

Semelhante a Instant search - A hands-on tutorial (20)

Candidate selection tutorial
Candidate selection tutorialCandidate selection tutorial
Candidate selection tutorial
 
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...
 
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
Discovering Emerging Tech through Graph Analysis - Henry Hwangbo @ GraphConne...
 
Data Discovery and Metadata
Data Discovery and MetadataData Discovery and Metadata
Data Discovery and Metadata
 
Role of Data Science in eCommerce
Role of Data Science in eCommerceRole of Data Science in eCommerce
Role of Data Science in eCommerce
 
Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank Talk
 
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by S...
 
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
Recsys 2016 tutorial: Lessons learned from building real-life recommender sys...
 
Amundsen: From discovering to security data
Amundsen: From discovering to security dataAmundsen: From discovering to security data
Amundsen: From discovering to security data
 
dipLODocus[RDF]: Short and Long-Tail RDF Analytics for Massive Webs of Data
dipLODocus[RDF]: Short and Long-Tail RDF Analytics for Massive Webs of DatadipLODocus[RDF]: Short and Long-Tail RDF Analytics for Massive Webs of Data
dipLODocus[RDF]: Short and Long-Tail RDF Analytics for Massive Webs of Data
 
Reflected intelligence evolving self-learning data systems
Reflected intelligence  evolving self-learning data systemsReflected intelligence  evolving self-learning data systems
Reflected intelligence evolving self-learning data systems
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
 
Curtain call of zooey - what i've learned in yahoo
Curtain call of zooey - what i've learned in yahooCurtain call of zooey - what i've learned in yahoo
Curtain call of zooey - what i've learned in yahoo
 
ExperTwin: An Alter Ego in Cyberspace for Knowledge Workers
ExperTwin: An Alter Ego in Cyberspace for Knowledge WorkersExperTwin: An Alter Ego in Cyberspace for Knowledge Workers
ExperTwin: An Alter Ego in Cyberspace for Knowledge Workers
 
Personalized Search and Job Recommendations - Simon Hughes, Dice.com
Personalized Search and Job Recommendations - Simon Hughes, Dice.comPersonalized Search and Job Recommendations - Simon Hughes, Dice.com
Personalized Search and Job Recommendations - Simon Hughes, Dice.com
 
Structure, Personalization, Scale: A Deep Dive into LinkedIn Search
Structure, Personalization, Scale: A Deep Dive into LinkedIn SearchStructure, Personalization, Scale: A Deep Dive into LinkedIn Search
Structure, Personalization, Scale: A Deep Dive into LinkedIn Search
 
Data Structures & Algorithms
Data Structures & AlgorithmsData Structures & Algorithms
Data Structures & Algorithms
 
Natural Language Query to SQL conversion using Machine Learning Approach
Natural Language Query to SQL conversion using Machine Learning ApproachNatural Language Query to SQL conversion using Machine Learning Approach
Natural Language Query to SQL conversion using Machine Learning Approach
 

Último

Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating SystemRashmi Bhat
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxsiddharthjain2303
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingBootNeck1
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfRajuKanojiya4
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating SystemRashmi Bhat
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AIabhishek36461
 
The SRE Report 2024 - Great Findings for the teams
The SRE Report 2024 - Great Findings for the teamsThe SRE Report 2024 - Great Findings for the teams
The SRE Report 2024 - Great Findings for the teamsDILIPKUMARMONDAL6
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdfCaalaaAbdulkerim
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfROCENODodongVILLACER
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionMebane Rash
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...Amil Baba Dawood bangali
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadaditya806802
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvLewisJB
 

Último (20)

Virtual memory management in Operating System
Virtual memory management in Operating SystemVirtual memory management in Operating System
Virtual memory management in Operating System
 
Energy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptxEnergy Awareness training ppt for manufacturing process.pptx
Energy Awareness training ppt for manufacturing process.pptx
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
System Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event SchedulingSystem Simulation and Modelling with types and Event Scheduling
System Simulation and Modelling with types and Event Scheduling
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
National Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdfNational Level Hackathon Participation Certificate.pdf
National Level Hackathon Participation Certificate.pdf
 
Main Memory Management in Operating System
Main Memory Management in Operating SystemMain Memory Management in Operating System
Main Memory Management in Operating System
 
Past, Present and Future of Generative AI
Past, Present and Future of Generative AIPast, Present and Future of Generative AI
Past, Present and Future of Generative AI
 
The SRE Report 2024 - Great Findings for the teams
The SRE Report 2024 - Great Findings for the teamsThe SRE Report 2024 - Great Findings for the teams
The SRE Report 2024 - Great Findings for the teams
 
Research Methodology for Engineering pdf
Research Methodology for Engineering pdfResearch Methodology for Engineering pdf
Research Methodology for Engineering pdf
 
Risk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdfRisk Assessment For Installation of Drainage Pipes.pdf
Risk Assessment For Installation of Drainage Pipes.pdf
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
US Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of ActionUS Department of Education FAFSA Week of Action
US Department of Education FAFSA Week of Action
 
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
NO1 Certified Black Magic Specialist Expert Amil baba in Uae Dubai Abu Dhabi ...
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
home automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasadhome automation using Arduino by Aditya Prasad
home automation using Arduino by Aditya Prasad
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
Work Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvvWork Experience-Dalton Park.pptxfvvvvvvv
Work Experience-Dalton Park.pptxfvvvvvvv
 

Instant search - A hands-on tutorial

  • 1. Instant Search - A Hands-on Tutorial ACM SIGIR 2016 Ganesh Venkataraman, Viet Ha-Thuc, Dhruv Arya and Abhimanyu Lad LinkedIn Search 1
  • 3. Where to find information Code - https://github.com/linkedin/instantsearch-tutorial Wiki - https://github.com/linkedin/instantsearch-tutorial/wiki Slack - https://instantsearchtutorial.slack.com/ Slides - will be on the slideshare and we will update the wiki/tweet Twitter - #instantsearchtutorial (twitter.com/search) 3
  • 4. The Plot ● At the end of this tutorial, attendees should: ○ Understand the challenges/constraints faced while dealing with instant search (latency, tolerance to user errors, etc.) ○ Get a broad overview of the theoretical foundations behind: ■ Indexing ■ Query Processing ■ Ranking and Blending (including personalization) ○ Understand the open source options available to put together an ‘end-to-end’ instant search solution ○ Put together an end-to-end solution on their own (with some helper code) 4
  • 5. What would graduation look like? ● Instant result solution built over stackoverflow data ● Built on open source tools (Elasticsearch, typeahead.js) ● Ability to experiment further to modify ranking/query construction 5
  • 6. Final Output from hands on tutorial 6
  • 7. Agenda ● Terminology and Background ● Indexing & Retrieval ○ Instant Results ○ Query Autocomplete ● Ranking ● Hands on tutorial with data from stackoverflow ○ Index and search posts from stackoverflow ○ Play around with ranking 7
  • 9. Terminology - Query Autocomplete ● Intention is to complete the user query 9
  • 10. Terminology - Instant Results ● Get the result to the user as they type the query 10
  • 11. Terminology - Instant Answers ● We will NOT be covering answers for this tutorial 11
  • 12. Terminology - Navigational Query ● Queries where the information need can be satisfied by only one result/document 12
  • 13. Terminology - Exploratory Queries ● Multiple results can potentially satisfy the user's need 13
  • 14. When to display instant results vs query completion ● LinkedIn product decision ○ when the confidence level is high enough for a particular result, show the result ● What is ‘high enough’ could be application specific and not merely a function of score 14
  • 15. Completing query vs instant results ● “lin” => first degree connection with lots of common connections, same company etc. ● “link” => better off completing the query (even with possible suggestions for verticals) 15
  • 16. Terminology - Blending ● Bringing results from different search verticals (news, web, answers etc) 16
  • 18. Why Instant Search and why now? ● Natural evolution of search ● Users have gotten used to getting immediate feedback ● Mobile devices => need to type less 18
  • 19. Agenda ● Terminology and Background ● Indexing & Retrieval ○ Instant Results ○ Query Autocomplete ● Ranking ● Hands on tutorial with data from stackoverflow ○ Index and search xx posts from stackoverflow ○ Play around with ranking 19
  • 20. Instant Search at Scale ● Constraints (example: LinkedIn people search) ○ Scale - ability to store and retrieve hundreds of millions to billions of documents via prefix ○ Fast - ability to return results faster than the user types ○ Resilience to user errors ○ Personalized 20
  • 21. Instant Search via Inverted Index ● Scalable ● Ability to form complex boolean queries ● Open source availability (Lucene/Elasticsearch) ● Easy to add metadata (payloads, forward index) 21
  • 22. The Search Index Inverted Index: Mapping from (search) terms to list of documents (they are present in) Forward Index: Mapping from documents to metadata about them 22
  • 24. Candidate selection ● Posting lists ○ “abraham” => {5, 7, 8, 23, 47, 101} ○ “lincoln” => {7, 23, 101, 151} ● Query = “abraham AND lincoln” ○ Retrieved set => {7, 23, 101} 24
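The candidate selection step on slide 24 can be sketched as a merge of two sorted posting lists. This is a minimal illustration rather than how Lucene or Elasticsearch implement it; the `intersect` helper and the in-memory `postings` dictionary are assumptions made for the example, while the document IDs are the ones from the slide.

```python
def intersect(a, b):
    """Intersect two sorted posting lists with a linear two-pointer merge."""
    i, j, out = 0, 0, []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


# Posting lists taken from the slide; the dictionary layout is hypothetical.
postings = {
    "abraham": [5, 7, 8, 23, 47, 101],
    "lincoln": [7, 23, 101, 151],
}

# Query = "abraham AND lincoln"
retrieved = intersect(postings["abraham"], postings["lincoln"])
print(retrieved)  # [7, 23, 101]
```

Because both lists are sorted by document ID, the merge touches each entry at most once.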
  • 25. Prefix indexing ● Instant search, query != ‘abraham’ ● Queries = [‘a’, ‘ab’, … , ‘abraham’] ● Need to index each prefix ● Elasticsearch refers to this form of tokenization as ‘edge n-gram’ ● Issues ○ Bigger index ○ Big posting list for short prefixes => much higher number of documents retrieved 25
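Prefix indexing as described on slide 25 can be sketched in a few lines. This is an illustrative toy, not Elasticsearch's edge n-gram tokenizer; the function names `edge_ngrams` and `build_prefix_index` and the sample documents are assumptions for the example.

```python
def edge_ngrams(token, min_len=1):
    """Return every prefix of `token`, shortest first."""
    return [token[:n] for n in range(min_len, len(token) + 1)]


def build_prefix_index(docs):
    """Map each prefix of each token to the set of documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in text.lower().split():
            for prefix in edge_ngrams(token):
                index.setdefault(prefix, set()).add(doc_id)
    return index


# Hypothetical two-document corpus.
docs = {1: "Abraham Lincoln", 2: "Abigail Adams"}
prefix_index = build_prefix_index(docs)

print(edge_ngrams("abraham"))
print(sorted(prefix_index["ab"]))   # both documents match the short prefix
print(sorted(prefix_index["abr"]))  # only document 1 matches the longer one
```

Even in this toy corpus the posting list for a short prefix is longer than the one for a long prefix, which is the index-size and retrieval-cost issue the slide points out.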
  • 26. Early Termination ● We cannot ‘afford’ to retrieve and score all documents that match the query ● We terminate posting list traversal when a certain number of documents have been retrieved ● We may lose some recall 26
  • 27. Static Rank ● Order the posting lists so that documents with a high (query independent) prior probability of relevance appear first ● Use application specific logic to rewrite the query ● Once the query has achieved a certain number of matches in the posting list, we stop. This number of matches is referred to as the “early termination limit” 27
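Early termination over a posting list ordered by static rank can be sketched as follows. This is a simplified illustration under stated assumptions: the `static_rank` scores, the document IDs and the `retrieve` helper are hypothetical, and a real engine would apply this inside posting list traversal rather than over a Python list.

```python
def retrieve(posting_list, matches, limit):
    """Walk a posting list sorted by descending static rank and stop
    once `limit` matching documents have been collected."""
    hits = []
    for doc_id in posting_list:
        if matches(doc_id):
            hits.append(doc_id)
            if len(hits) >= limit:
                break  # early termination limit reached
    return hits


# Hypothetical query-independent scores (e.g. derived from profile views).
static_rank = {"d1": 0.9, "d2": 0.2, "d3": 0.7, "d4": 0.5}

# Order the posting list so that high static rank documents come first.
posting_list = sorted(static_rank, key=static_rank.get, reverse=True)

top = retrieve(posting_list, matches=lambda doc_id: True, limit=2)
print(top)  # ['d1', 'd3']
```

Documents that match but sit beyond the limit are never scored, which is where the recall loss mentioned on slide 26 comes from.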
  • 28. Static Rank Example - People Search at LinkedIn ● Some factors that go into static rank computation ○ Member popularity measured by profile views both within and outside network ○ Spam in person’s name ○ Security and Spam. Downgrade profiles flagged by LinkedIn’s internal security team ○ Celebrities and Influencers 28
  • 29. Static Rank Case study - People Search at LinkedIn [Figure: recall vs. early termination limit] 29
  • 30. Resilience to Spelling errors ● We focus on names as they can (often) be hard to get right (ex: “marissa mayer” or “marissa meyer”?) ● Names vs traditional spelling errors: ○ “program manager” vs “program manger” - only one of these is right ○ “Mayer” vs “Meyer” - no clear source of truth ● Edit distance based approaches can be wrong both ways: ○ “Mohamad” and “Muhammed” are 3 edits apart and yet plausible variants ○ “Jeff” and “Joff” are 1 edit apart, but highly unlikely to be plausible variants of the same name 30
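The edit distances quoted on slide 30 can be checked with a standard Levenshtein dynamic program. The `levenshtein` function below is a textbook sketch written for this example, not code from the tutorial repository.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


d_names = levenshtein("mohamad", "muhammed")
d_jeff = levenshtein("jeff", "joff")
print(d_names, d_jeff)  # 3 1
```

The numbers agree with the slide: raw edit distance ranks "Jeff"/"Joff" as closer than "Mohamad"/"Muhammed", which is the opposite of what a name matcher wants.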
  • 31. LinkedIn Approach - Name clusters Solution touches indexing, query reformulation and ranking 31
  • 32. Name Clusters - Two step clustering ● Coarse level clustering ○ Uses double metaphone + some known heuristics ○ Focus on recall ● Fine level clustering ○ Similarity function that takes into account Jaro-Winkler distance ○ User session data 32
  • 33. Overall approach for Name Clusters ● Indexing ○ Store clusterID for each cluster in a separate field (say ‘NAMECLUSTERID’) ○ ‘Cris’ and ‘chris’ in same name cluster CHRISID ○ NAME:cris NAMECLUSTERID:chris ● Query processing ○ user query = ‘chris’ ○ Rewritten query = ?NAME:chris ?NAMECLUSTERID:chris ● Ranking ○ Different weights for ‘perfect match’ vs. ‘name cluster match’ 33
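The indexing, query rewriting and ranking steps on slide 33 can be sketched together. Everything here is a toy under stated assumptions: the `NAME_CLUSTERS` table, the weights and the helper names are hypothetical, and the real clusters come from the two step clustering described on slide 32.

```python
# Hypothetical cluster table: name variant -> cluster id.
NAME_CLUSTERS = {"chris": "chris", "cris": "chris", "kris": "chris"}


def index_fields(name):
    """Fields stored at indexing time: the literal name and its cluster id."""
    name = name.lower()
    return {"NAME": name, "NAMECLUSTERID": NAME_CLUSTERS.get(name, name)}


def rewrite(query):
    """Rewrite the user query into two optional clauses, one per field."""
    q = query.lower()
    return {"NAME": q, "NAMECLUSTERID": NAME_CLUSTERS.get(q, q)}


def score(doc, rewritten, exact_weight=2.0, cluster_weight=1.0):
    """Weight a perfect name match higher than a name cluster match.
    The weights are illustrative assumptions, not values from the talk."""
    s = 0.0
    if doc["NAME"] == rewritten["NAME"]:
        s += exact_weight
    if doc["NAMECLUSTERID"] == rewritten["NAMECLUSTERID"]:
        s += cluster_weight
    return s


docs = [index_fields(n) for n in ("Cris", "Chris", "Joff")]
rewritten = rewrite("chris")
scores = {d["NAME"]: score(d, rewritten) for d in docs}
print(scores)  # {'cris': 1.0, 'chris': 3.0, 'joff': 0.0}
```

"Cris" is still retrieved for the query "chris" through the cluster field, but it ranks below the exact match.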
  • 34. Instant Results via Inverted Index - Some Takeaways ● Used for documents at very high scale ● Use early termination ● Approach the problem as a combination of indexing/query processing/ranking 34
  • 35. Agenda ● Terminology and Background ● Indexing & Retrieval ○ Instant Results ○ Query Autocomplete ● Ranking ● Hands on tutorial with data from stackoverflow ○ Index and search xx posts from stackoverflow ○ Play around with ranking 35
  • 36. Query Autocomplete - Problem Statement ● Let $q = w_1, w_2, \ldots, w_k{*}$ represent the query with $k$ words, where the $k$-th token is a prefix, as denoted by the asterisk ● Goal: Find one or more relevant completions for the query 36
  • 37. Trie ● Used to store an associative array where keys are strings ● Only certain keys and leaves are of interest ● Structure allows sharing of prefixes only ● Representation is not memory efficient [Figure: a trie of the words {space, spark, moth}] 37
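A trie over the three words from the slide can be sketched as below. The `Trie` and `TrieNode` classes are written for this example and are not part of the tutorial code; they only illustrate prefix sharing and prefix lookup.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_key = False  # True when a stored word ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_key = True

    def complete(self, prefix):
        """Return all stored words that start with `prefix`, sorted."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        found, stack = [], [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_key:
                found.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(found)


trie = Trie(["space", "spark", "moth"])
print(trie.complete("sp"))  # ['space', 'spark']
```

"space" and "spark" share the nodes for "s", "p" and "a", but their suffixes are stored separately. An FST can also share suffixes, which is one reason it compresses better.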
  • 38. Finite State Transducers (FST) ● Allows efficient retrieval of completions at runtime ● Can fit entirely into RAM ● Useful when keys have commonalities to them, allowing better compression ● Lucene has support for FSTs* FST for words: software, scala, scalding, spark *Lucene FST implementation based on “Direct Construction of Minimal Acyclic Subsequential Transducers (2001)” by Stoyan Mihov, Denis Maurel 38
  • 39. Query Autocomplete vs. Instant Results ● For query autocomplete the corpus of terms remains relatively constant, whereas instant results documents can be continuously added/removed ● Query autocomplete focuses only on prefix based retrieval, whereas instant search results utilize complex query construction for retrieval ● Query autocomplete retrieval is based on a dictionary, hence the index can be refreshed periodically instead of in real time 39
  • 40. Query Tagging ● Segment query based on recognized entities ● Annotate query with: ○ Named Entity Tags ○ Standardized Identifiers ○ Related Entities ○ Additional Entity Specific Metadata 40
  • 41. Data Processing ● Break queries into recognized entities and individual tokens ● Past query logs are parsed for recognized entities and tokens, which are fed into an FST for retrieval of candidate suggestions. 41
  • 42. Retrieval ● All candidate completions over increasingly longer suffixes of the query are used to capture enough context ● Given a query like “linkedin sof*” we look up completions for: ○ sof*, linkedin sof* ● Candidates are then provided to the scoring phase. 42
  • 43. Retrieval ● From the above FST, for the query “linkedin sof*” we retrieve the candidates: ○ sof: [software developer, software engineer] ○ linkedin sof: [] 43
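Retrieval over increasingly longer suffixes of the query (slides 42 and 43) can be sketched with a sorted list standing in for the FST. The sorted list, the `complete` and `candidates` helpers and the extra phrase "scala" are assumptions for this illustration; the two "software" completions are the ones shown on the slide.

```python
import bisect


def complete(sorted_phrases, prefix):
    """Return all phrases starting with `prefix` from a sorted list."""
    start = bisect.bisect_left(sorted_phrases, prefix)
    out = []
    for phrase in sorted_phrases[start:]:
        if not phrase.startswith(prefix):
            break  # matches are contiguous in sorted order
        out.append(phrase)
    return out


def candidates(sorted_phrases, query):
    """Look up completions for each suffix of the query, shortest first."""
    tokens = query.split()
    result = {}
    for start in range(len(tokens) - 1, -1, -1):
        suffix = " ".join(tokens[start:])
        result[suffix] = complete(sorted_phrases, suffix)
    return result


# Stand-in for the suggestion dictionary built from past query logs.
phrases = sorted(["software developer", "software engineer", "scala"])

found = candidates(phrases, "linkedin sof")
print(found)
# {'sof': ['software developer', 'software engineer'], 'linkedin sof': []}
```

All candidates from every suffix would then be handed to the scoring phase.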
  • 44. Payloads ● Each query autocomplete result can have a payload associated with it. ● A payload holds serialized data useful in scoring the autocomplete result 44
  • 45. Fuzzy Matching - LinkedIn Autocomplete 45
  • 46. Fuzzy Matching ● Use a Levenshtein automaton constructed from a word and a maximum edit distance ● Based on the automaton and the letters input to it, we decide whether to continue or not ● Ex. search for “dpark” (s/d being close on the keyboard) with edit distance 1 = [spark] [Figure: an index of {space, spark, moth} represented as a trie] 46
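The "continue or not" decision on slide 46 can be imitated without building a real Levenshtein automaton. The sketch below swaps in a simpler technique: it walks a trie while carrying one row of the edit distance table and prunes a branch as soon as no cell in the row is within the allowed distance. The `fuzzy_search` function and the `"$"` end marker are assumptions for this example (so words containing `"$"` are not supported).

```python
def fuzzy_search(words, query, max_edits):
    """Return (word, distance) pairs within `max_edits` of `query`."""
    # Build a nested-dict trie; "$" marks the end of a stored word.
    trie = {}
    for word in words:
        node = trie
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = word

    results = []
    first_row = list(range(len(query) + 1))

    def walk(node, ch, prev_row):
        # Extend the edit distance table by one row for character `ch`.
        row = [prev_row[0] + 1]
        for j in range(1, len(query) + 1):
            row.append(min(
                row[j - 1] + 1,
                prev_row[j] + 1,
                prev_row[j - 1] + (query[j - 1] != ch),
            ))
        if "$" in node and row[-1] <= max_edits:
            results.append((node["$"], row[-1]))
        # Continue only while some alignment is still within budget.
        if min(row) <= max_edits:
            for nxt, child in node.items():
                if nxt != "$":
                    walk(child, nxt, row)

    for ch, child in trie.items():
        if ch != "$":
            walk(child, ch, first_row)
    return sorted(results)


matches = fuzzy_search(["space", "spark", "moth"], "dpark", max_edits=1)
print(matches)  # [('spark', 1)]
```

A Levenshtein automaton reaches the same accept or reject decisions with precomputed state transitions instead of recomputing a table row per character.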
  • 50. Agenda ● Terminology and Background ● Indexing & Retrieval ● Ranking ○ Ranking instant results ○ Ranking query suggestions ○ Blending ● Hands on tutorial with data from stackoverflow 50
  • 51. Ranking Challenge ● Short query prefixes ● Context beyond query ○ Personalized context ○ Global context ■ Global popularity ■ Trending 51
  • 52. Hand-Tuned vs. Machine-Learned Ranking ● Hard to manually tune with very large number of features ● Challenging to personalize ● LTR allows leveraging large volume of click data in an automated way 52
  • 53. Agenda ● Terminology and Background ● Indexing & Retrieval ● Ranking ○ Ranking instant results ○ Ranking query suggestions ○ Blending ● Hands on tutorial with data from stackoverflow 53
  • 54. Features ● Text match ○ Match query terms with different fields on documents 54
  • 55. Features ● Document Quality ○ Global Popularity ■ Celebrities ○ Spamminess 55
  • 56. Features ● Social Affinity (personalized features) ○ Network distance between searcher and result ○ Connection Strength ■ Within the same company ■ Common connections ■ From the same school 56
  • 57. Training Data ● Human judgement ● Challenge: ○ Personalization ○ Scale 57
  • 58. Training Data ● Log-based ○ Personalized ○ Available in large quantity ● Position Bias ○ Top-K randomization 58
  • 59. Learning to Rank ▪ Pointwise: Reduce ranking to binary classification LinkedIn Confidential ©2013 All Rights Reserved 59 + + + - + - - - + + - -
  • 61. Learning to Rank ▪ Pointwise: Reduce ranking to binary classification ▪ Limitations: Relevant documents associated with different queries are put into the same class 61
  • 62. Learning to Rank ▪ Pairwise: Reduce ranking to classification of document pairs w.r.t. the same query – {(Q1 , A>B), (Q2 , C>D), (Q3 , E>F)} 62
  • 64. Learning to Rank ▪ Pairwise – Limitation: Does not differentiate inversions at top vs. bottom positions 64
  • 65. Learning to Rank ▪ Listwise – Directly operate on ranked lists – Optimize listwise objective function, e.g. IR metrics ▪ Mean Average Precision (MAP) ▪ Normalized Discounted Cumulative Gain (NDCG) 65
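To see why a listwise metric distinguishes inversions at the top from inversions at the bottom, here is a small NDCG sketch. The gain `2**rel - 1` and the `log2` position discount are one common formulation and are an assumption here, since the slides do not spell out the exact variant; the relevance labels are made up.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain for a ranked list of relevance labels."""
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(relevances))


def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Same single inversion, applied at the top vs. at the bottom of the list.
swap_at_top = ndcg([0, 1, 0, 0])
swap_at_bottom = ndcg([1, 0, 0, 0]), ndcg([0, 0, 0, 1])

print(round(swap_at_top, 4))  # 0.6309
print(swap_at_bottom)
```

Moving the only relevant document from rank 1 to rank 2 already drops NDCG to about 0.63, whereas a pairwise loss would count it as one inversion regardless of where it happens.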
  • 66. Agenda ● Terminology and Background ● Indexing & Retrieval ● Ranking ○ Ranking vertical results ○ Ranking query suggestions ○ Blending ● Hands on tutorial with data from stackoverflow 66
  • 67. Features ● Query Popularity ○ Candidate completion $q = s_1, s_2, \ldots, s_k$ ○ Likelihood that $q$ is a query in the query corpus, estimated by an N-gram language model: $\Pr(q) = \Pr(s_1, s_2, \ldots, s_k) = \Pr(s_1)\,\Pr(s_2 \mid s_1) \cdots \Pr(s_k \mid s_{k-1})$ 67
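The query popularity feature can be sketched as an unsmoothed bigram model trained on a query log. This is a minimal illustration under stated assumptions: the tiny `query_log` is made up, the `<s>` start token is introduced so that the first factor is the probability of a query starting with `s1`, and a production model would add smoothing for unseen word pairs.

```python
from collections import Counter


def train_bigram(queries):
    """Count unigrams and bigrams over whitespace-tokenized queries."""
    unigrams, bigrams = Counter(), Counter()
    for query in queries:
        tokens = ["<s>"] + query.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def query_probability(query, unigrams, bigrams):
    """Pr(q) as a product of maximum-likelihood bigram probabilities."""
    tokens = ["<s>"] + query.split()
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0  # unseen context, no smoothing in this sketch
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob


# Hypothetical query log.
query_log = [
    "software engineer",
    "software engineer",
    "software developer",
    "data engineer",
]
unigrams, bigrams = train_bigram(query_log)

p_engineer = query_probability("software engineer", unigrams, bigrams)
p_developer = query_probability("software developer", unigrams, bigrams)
print(p_engineer, p_developer)
```

The more frequent completion gets the higher likelihood, so it can be used directly as a ranking feature for candidate completions.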
  • 68. Features ● Time-sensitive popularity [Shokouhi et al. SIGIR 12] ○ Trending query ○ Periodic Pattern ■ Weekend -> Disneyland ○ Time-series: Forecasted frequencies 68
  • 69. Features ● Recency-based suggestion (Personalized feature) 69
  • 70. Agenda ● Terminology and Background ● Indexing & Retrieval ● Ranking ○ Ranking instant results ○ Ranking query suggestions ○ Blending ● Hands on tutorial with data from stackoverflow 70
  • 72. Blending [Diagram: Query Prefix, Federator, Company Instant, People Instant, Query Autocompletion, Blender] 72
  • 73. Blending Challenges ● Different verticals are associated with different signals ○ People: network distance ○ Groups: time of the last edit ○ Query suggestion: edit distance ● Even common features may not be equally predictive across verticals ○ Popularity ○ Text similarity ● Scores might not be comparable across verticals 73
  • 74. Approaches ● Separate binary classifiers [Diagram: People features (f1, f2, f3) feed Classifier1; Jobs features (f1, f2, f4) feed Classifier2] 74
  • 75. Approaches ● Separate binary classifiers ○ Pros ■ Handle vertical-specific features ■ Handle common features with different predictive powers ○ Cons ■ Need to calibrate output scores of multiple classifiers 75
  • 76. Approaches ● Learning-to-rank - Equal correlation assumption ○ Union the feature schemas and pad non-applicable features with zeros ○ Equal correlation assumption [Diagram: People (f1, f2, f3) becomes (f1, f2, f3, f4=0); Jobs (f1, f2, f4) becomes (f1, f2, f3=0, f4); both feed a single Model] 76
  • 77. Approaches ● Learning-to-rank - Equal correlation assumption ○ Pros ■ Handle vertical-specific features ■ Comparable output scores across verticals ○ Cons ■ Assume common features are equally predictive of vertical relevance 77
  • 78. Approaches ● Learning-to-rank - Without equal correlation assumption [Diagram: People vertical features (f1, f2, f3) become (f1, f2, f3, 0, 0, 0); Job vertical features (f4, f5, f6) become (0, 0, 0, f4, f5, f6); both feed a single Model] 78
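The two feature layouts from slides 76 and 78 can be sketched as vector builders. The helper names `pad_to_union` and `pad_to_blocks` and all feature values are hypothetical; only the feature names and the zero padding patterns follow the slides.

```python
def pad_to_union(features, union_schema):
    """Equal correlation layout: one shared slot per feature name,
    with zeros for features a vertical does not have."""
    return [features.get(name, 0.0) for name in union_schema]


def pad_to_blocks(features, vertical, vertical_schemas):
    """Layout without the equal correlation assumption: every vertical
    owns its own block of slots, and all other blocks are zero."""
    vector = []
    for name_of_vertical, names in vertical_schemas.items():
        for name in names:
            own_block = name_of_vertical == vertical
            vector.append(features.get(name, 0.0) if own_block else 0.0)
    return vector


# Slide 76: People has f1, f2, f3 and Jobs has f1, f2, f4.
union_schema = ["f1", "f2", "f3", "f4"]
people_union = pad_to_union({"f1": 0.4, "f2": 0.9, "f3": 2.0}, union_schema)
jobs_union = pad_to_union({"f1": 0.7, "f2": 0.1, "f4": 5.0}, union_schema)

# Slide 78: each vertical gets its own block of features.
blocks = {"people": ["f1", "f2", "f3"], "jobs": ["f4", "f5", "f6"]}
people_block = pad_to_blocks({"f1": 0.4, "f2": 0.9, "f3": 2.0}, "people", blocks)
jobs_block = pad_to_blocks({"f4": 5.0, "f5": 0.3, "f6": 1.0}, "jobs", blocks)

print(people_union, jobs_union)
print(people_block, jobs_block)
```

In the union layout one weight per shared feature serves both verticals. In the block layout the model can learn a separate weight per vertical, at the cost of a longer vector and more training data.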
  • 79. Approaches ● Learning-to-rank - Without equal correlation assumption ○ Pros ■ Handle vertical-specific features ■ Without equal correlation assumption -> auto learn evidence-vertical association ■ Comparable output scores across verticals ○ Cons ■ The number of features is huge ● Overfitting ● Require a huge amount of training data 79
  • 80. Evaluation ● “If you can’t measure it, you can’t improve it” ● Metrics ○ Successful search rate ○ Number of keystrokes per search: query length + clicked result rank 80
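The two metrics on slide 80 can be computed from a session log as sketched below. The session format, the `summarize` helper and the reading of "successful search" as "a search that ended with a click" are assumptions for this example; the keystroke formula (query length plus clicked result rank) is the one given on the slide.

```python
def keystrokes_per_search(typed_query, clicked_rank):
    """Query length plus the 1-based rank of the clicked result."""
    return len(typed_query) + clicked_rank


def summarize(sessions):
    """Return (successful search rate, mean keystrokes per successful search).
    Each session is (typed_query, clicked_rank or None when nothing was clicked)."""
    successes = [(q, r) for q, r in sessions if r is not None]
    rate = len(successes) / len(sessions) if sessions else 0.0
    mean_keystrokes = (
        sum(keystrokes_per_search(q, r) for q, r in successes) / len(successes)
        if successes else 0.0
    )
    return rate, mean_keystrokes


# Hypothetical log: typed prefix and the rank that was clicked, if any.
sessions = [("lin", 1), ("link", None), ("abra", 2), ("sof", 1)]
success_rate, mean_keystrokes = summarize(sessions)
print(success_rate, mean_keystrokes)
```

A ranking change that surfaces the right result after fewer typed characters, or at a higher position, lowers the keystroke metric.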
  • 81. Take-Aways ● Speed ○ Instant results: Early termination ○ Autocompletion: FST ● Tolerance to spelling errors ● Relevance: go beyond query prefix ○ Personalized context ○ Global context 81
  • 82. Agenda ● Terminology and Background ● Indexing & Retrieval ● Ranking ○ Ranking instant results ○ Ranking query suggestions ○ Blending ● Hands on tutorial with data from stackoverflow 82
  • 83. Dataset ● Posts and Tags from stackoverflow.com ● Posts are questions posted by users and contain the following attributes ○ Title ○ Score ● Tags help identify a suitable category for the post and contain the following attributes ○ Tag Name ○ Count ● Each post can have a maximum of five tags 83
  • 86. The End Product 86 Search Query Input Query Autocomplete Instant Results
  • 89. Assignments ● Assignments available on GitHub ● Each assignment builds on a component of the end product ● Tests are provided at the end of each assignment for validation ● Finished files available for reference (if needed) ● Raise your hand if you need help or have a question 89
  • 90. Assignment 0 Setting up the machine 90
  • 91. Assignment 1 Building Instant Search and Autocomplete Index 91
  • 92. Take-Aways ● Index should be used primarily for retrieval ● Data sources should be kept separate from the index ● Building an index is not instantaneous, hence have replicas in production ● Real world indexes can seldom be stored in a single shard 92
  • 94. Take-Aways ● Make incremental additions ● Allow for relevance changes to be compared ● Document relevance changes ● Do side by side evaluations 94
  • 95. Assignment 3 Visualizing the blended result set 95
  • 97. Summary ● Theoretical understanding of indexing, retrieval and ranking for instant search results and query autocomplete ● Insights and learnings from linkedin.com case studies ● Working end-to-end implementation of query autocomplete and instant results with stackoverflow.com dataset 97