Fast, Lenient, and Accurate – Building Personalized Instant Search Experience at LinkedIn

1,913 views

We describe the challenges that we faced while building the instant search experience at LinkedIn, and present techniques that we developed to overcome them. We discuss three aspects of instant search – performance, tolerance to user errors, and accuracy of search results.

Published in: Technology

  1. Fast, Lenient, and Accurate: Building Personalized Instant Search Experience at LinkedIn
      Ganesh Venkataraman, Abhi Lad, Lin Guo, Shakti Sinha (LinkedIn)
  2. Agenda
      ● LinkedIn
      ● LinkedIn Search
        ○ Navigational vs Exploratory searches
        ○ Typeahead vs SERP
      ● Big picture and problem statement
      ● Instant search – Search-as-you-type
        ○ Query autocomplete
        ○ Entity-aware suggestions
        ○ Instant results
      ● Conclusions & Future work
  3. LinkedIn – Professional Identity
  4. LinkedIn – Professional Graph
  5. LinkedIn – Jobs
  6. LinkedIn – And much more...
      Companies, Skills, Professional Content
  7. LinkedIn – Massive Scale
  8. LinkedIn Search
  9. Navigational Search
      Looking for someone specific by name. Query has a single correct result.
  10. Exploratory Search
      Finding people that match a given set of criteria. Multiple results match the user’s query.
  11. Instant Search – Search-as-you-type
      Satisfy navigational searches: Show instant search results.
      Help frame exploratory searches: Complete the user’s query and show search suggestions.
  12. Big Picture
      [Diagram components: partial query, instant results, autocomplete, search suggestions, query tagger, full-text search, search results, manually entered query]
  13. Big Picture
      [Diagram as on slide 12]
      Focus today:
      ● Autocomplete
      ● Search suggestions
      ● Instant results
  14. Problem Statement
      [Diagram and focus list as on slide 13]
      How can we build an instant search experience that scales to 450+ million members, and is fast, lenient, and accurate?
      ● Instant search = Query autocomplete + search suggestions + instant results
      ● Fast = Search-as-you-type latencies
      ● Lenient = Handle spelling errors and common variations
      ● Accurate = Highly relevant and personalized results
  15. Query Tagging
      [Figure: an example query tagged as PERSON, TITLE (ID=126), COMPANY (ID=1337)]
      Entity types identified: person name, job title, company, school, skills, locations.
      Key part of query processing! Impacts: autocomplete, spelling correction, search suggestions, query rewriting, ranking.
      Sequential prediction model (CRF – Conditional Random Fields)
      Training data:
      ● Standardized dictionaries (people names, companies, schools, titles, skills, locations)
      ● Query logs
      ● Clickthrough (CTR) data
      ● Crowdsourced labels
  16. Query Autocomplete
      ● Fast
      ● Relevant and contextual
      ● Resilient to spelling errors
  17. Query Autocomplete – Offline processing
      Query logs (examples): linkedin software engineer, software engineer, big data, data scientist, data engineer, expert systems, ...
      Entities (examples): [linkedin] [software engineer]
      Index: FST (Finite State Transducers)
      Compact + fast retrieval + fuzzy match (via Levenshtein automata)
  18. Query Autocomplete – Online processing
      Two step process:
      1. Retrieval (Candidate generation)
      User’s query: [big data e]
      Candidates = C(big data e) ∪ C(data e) ∪ C(e) = big data engineer, big data expert systems, big data entry, ...
      Query logs (examples): linkedin software engineer, software engineer, big data, data scientist, data engineer, expert systems, ...
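The retrieval step on slide 18 can be sketched roughly as follows. This is a minimal illustration, not the production system: a sorted list with binary search stands in for the FST index, the `PHRASES` list is hypothetical, and prepending the words the user has already typed to each suffix completion is an assumption about how the union is assembled.

```python
from bisect import bisect_left

# Hypothetical completion phrases, standing in for entries mined from query logs.
PHRASES = sorted([
    "big data", "big data engineer", "big data entry", "big data expert systems",
    "data engineer", "data entry", "data scientist",
    "engineer", "expert systems", "linkedin software engineer", "software engineer",
])


def completions(prefix, phrases=PHRASES):
    """C(prefix): every indexed phrase that starts with `prefix`."""
    start = bisect_left(phrases, prefix)
    found = []
    for phrase in phrases[start:]:
        if not phrase.startswith(prefix):
            break  # sorted order: no later phrase can match
        found.append(phrase)
    return found


def candidates(query):
    """Union of C(suffix) over every word-level suffix of the query."""
    words = query.split(" ")
    seen, result = set(), []
    for i in range(len(words)):
        kept = " ".join(words[:i])     # words the user already typed
        suffix = " ".join(words[i:])   # part of the query being completed
        for completion in completions(suffix):
            full = (kept + " " + completion).strip()
            if full not in seen:
                seen.add(full)
                result.append(full)
    return result
```

For the slide's query, `candidates("big data e")` combines matches for "big data e", "data e", and "e" into one deduplicated candidate list that the scoring step can then rank.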
  19. Query Autocomplete – Online processing
      Two step process:
      2. Scoring (Ranking)
      User’s query: [big data e]
      Candidate completions: “big data engineer”, “big data expert”, “big data entry”
      Score(“big data engineer”):
      P(s1, s2, s3, …) ≈ P(s1)·P(s2|s1)·P(s3|s2)·… // Bigram language model
      Use entities: P([engineer] | [big data])
      Fall back to words: P(engineer | data)·P(data | big)
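A word-level bigram scorer along the lines of slide 19 might look like the sketch below. The query log is made up, the add-one smoothing is an assumption (the slides do not say how unseen bigrams are handled), and the entity-level model that the slide prefers over the word-level fallback is not shown.

```python
import math
from collections import Counter

# Hypothetical query log used to estimate the language model.
QUERY_LOG = [
    "big data engineer", "big data engineer", "big data entry",
    "data engineer", "big data expert", "software engineer",
]

unigrams, bigrams = Counter(), Counter()
for logged_query in QUERY_LOG:
    tokens = ["<s>"] + logged_query.split()  # <s> marks the query start
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

VOCAB_SIZE = len(unigrams)


def log_prob(completion):
    """log P(s1, s2, ...) under a bigram model with add-one smoothing."""
    tokens = ["<s>"] + completion.split()
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        # Smoothing is an assumption made for this sketch.
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + VOCAB_SIZE))
    return total


def rank(candidate_completions):
    """Order candidate completions from most to least likely."""
    return sorted(candidate_completions, key=log_prob, reverse=True)
```

With this toy log, "big data engineer" outranks "big data entry" because the bigram (data, engineer) was observed more often than (data, entry).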
  20. Query Suggestions – Autocomplete + query tagger
      “linke” ⇒ “Linkedin” ⇒ COMPANY
      “had” ⇒ “Hadoop” ⇒ SKILL
  21. Instant Results
      ● Fast retrieval over 450+ million members
      ● Highly personalized
      ● Balance personalization & popularity
      ● Resilient to spelling variations
  22. Instant Results – Indexing
      ● Prefix-based tokenization (example document, DOCID 4):
        NAME: richard    PREFIX: r, ri, ric, rich, richa, ...
        NAME: branson    PREFIX: b, br, bra, bran, brans, ...
      ● Inverted Index (maps token to list of docs that contain that token, i.e. posting lists):
        NAME:richard => [1, 4, 10, 15, …] // Everyone named “richard”
        PREFIX:ri => [1, 2, 4, 7, 10, 15, …] // Everyone whose name starts with “ri”
        …
      ● Retrieval approach
        User’s query – richard b
        Rewritten query – +NAME:richard +PREFIX:b
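The prefix tokenization and query rewrite on slide 22 can be illustrated with a small in-memory index. This is a sketch under stated assumptions: the `MEMBERS` data is invented, Python sets stand in for posting lists, and treating every word except the last as a complete `NAME` term is inferred from the slide's single example.

```python
from collections import defaultdict

# Hypothetical member names keyed by document ID.
MEMBERS = {1: "richard burton", 4: "richard branson", 7: "rita moreno", 10: "brian richards"}


def tokens_for(name):
    """Index-time tokens: one NAME token per word plus all of its prefixes."""
    tokens = set()
    for word in name.lower().split():
        tokens.add("NAME:" + word)
        for end in range(1, len(word) + 1):
            tokens.add("PREFIX:" + word[:end])
    return tokens


def build_index(members):
    """Inverted index: token -> set of doc IDs containing that token."""
    index = defaultdict(set)
    for doc_id, name in members.items():
        for token in tokens_for(name):
            index[token].add(doc_id)
    return index


def rewrite(query):
    """Completed words become NAME terms; the word being typed is a PREFIX term."""
    words = query.lower().split()
    if not words:
        return []
    return ["NAME:" + w for w in words[:-1]] + ["PREFIX:" + words[-1]]


def search(index, query):
    """All rewritten terms are required (the '+' operator), so intersect postings."""
    postings = [index.get(term, set()) for term in rewrite(query)]
    return sorted(set.intersection(*postings)) if postings else []


INDEX = build_index(MEMBERS)
```

Here `search(INDEX, "richard b")` intersects the postings for `NAME:richard` and `PREFIX:b`, so only members named richard whose other name starts with "b" come back.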
  23. Instant Results – Indexing
      ● Connections Index (example document, DOCID 4):
        CONN: 1, 10, 15
      ● Inverted Index
        CONN:4 => [1, 10, 15] // Everyone connected to Richard Branson
        CONN:1 => [4, ...]
        CONN:10 => [4, ...]
        ...
      ● Retrieval approach
        User’s query – richard b
        Rewritten query – +NAME:richard +PREFIX:b +CONN:1
        (Everyone named richard b… and connected to User:1)
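To make the connections index concrete, the sketch below adds `CONN` tokens to a toy inverted index and runs the slide's rewritten query. The names and edges are hypothetical, chosen so that `CONN:4` matches the slide's posting list `[1, 10, 15]`; connections are assumed to be symmetric.

```python
from collections import defaultdict

# Hypothetical members and undirected connection edges.
NAMES = {
    1: "maria silva", 4: "richard branson", 10: "richard burton",
    15: "richard brown", 20: "li wei",
}
CONNECTIONS = [(1, 4), (4, 10), (4, 15), (1, 20)]


def build_index(names, connections):
    """Inverted index with NAME, PREFIX, and CONN tokens."""
    index = defaultdict(set)
    for doc_id, name in names.items():
        for word in name.split():
            index["NAME:" + word].add(doc_id)
            for end in range(1, len(word) + 1):
                index["PREFIX:" + word[:end]].add(doc_id)
    for a, b in connections:
        # CONN:a lists every document connected to member a, and vice versa.
        index[f"CONN:{a}"].add(b)
        index[f"CONN:{b}"].add(a)
    return index


def search(index, required_terms):
    """Every term is required ('+'), so the result is the intersection."""
    postings = [index.get(term, set()) for term in required_terms]
    return sorted(set.intersection(*postings)) if postings else []


INDEX = build_index(NAMES, CONNECTIONS)
```

With searcher ID 1, `search(INDEX, ["NAME:richard", "PREFIX:b", "CONN:1"])` keeps only the one "richard b…" who is connected to the searcher, which shows why the purely required form is restrictive.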
  24. Instant Results – Indexing: Early Termination
      Problem: A query like [PREFIX:ri] might retrieve too many candidate documents. How can we retrieve the most promising documents first so that we don’t need to score all of them?
      Static Rank: Order documents based on their prior (query independent) likelihood of relevance. A combination of:
      ● Profile views
      ● Spam and security related scores
      ● Editorial rules (Celebrities, influencers, …)
      numToScore: The number of documents to retrieve and score for any query
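One way to picture early termination is to keep each posting list ordered by static rank and stop once `numToScore` matches have been collected. The sketch below assumes that arrangement; the document fields, the weights inside `static_rank`, and the helper names are all hypothetical and only illustrate the idea.

```python
# Hypothetical member documents; field names and weights are illustrative only.
DOCS = {
    1: {"profile_views": 50, "trust": 1.0, "celebrity": False},
    2: {"profile_views": 10, "trust": 0.1, "celebrity": False},
    4: {"profile_views": 900, "trust": 1.0, "celebrity": True},
    7: {"profile_views": 300, "trust": 1.0, "celebrity": False},
}


def static_rank(doc):
    """Query-independent prior: views scaled by a trust score, plus an editorial boost."""
    score = doc["profile_views"] * doc["trust"]
    if doc["celebrity"]:
        score += 1000
    return score


def build_posting_list(doc_ids, docs=DOCS):
    """Index time: store doc IDs in descending static-rank order."""
    return sorted(doc_ids, key=lambda d: static_rank(docs[d]), reverse=True)


def retrieve(posting_list, matches, num_to_score):
    """Query time: walk the list in static-rank order and stop early."""
    if num_to_score <= 0:
        return []
    picked = []
    for doc_id in posting_list:
        if matches(doc_id):
            picked.append(doc_id)
            if len(picked) >= num_to_score:
                break  # early termination: the rest is never examined
    return picked


POSTINGS = build_posting_list([1, 2, 4, 7])
```

Only the documents returned by `retrieve` go on to the more expensive scoring stage, so the cost per query is bounded by `num_to_score` rather than by the posting list length.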
  25. Instant Results – Retrieval: Balancing Popularity and Personalization
      Query: richard b…
      Are you looking for Richard Branson, or a colleague named Richard Burton?
      (Assume searcher’s ID = 1)
      Rewritten Query:
      ● +NAME:richard +PREFIX:b +CONN:1 // Too restrictive. Only finds the searcher’s connections.
      ● +NAME:richard +PREFIX:b ?CONN:1[50%] // Try to retrieve 50% of results from the searcher’s connections
      Custom search operator: “Weighted OR”
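The slides do not spell out how the "Weighted OR" operator is implemented, so the following is only one plausible reading of `?CONN:1[50%]`: reserve a share of the `numToScore` slots for documents matching the optional clause, and backfill from the other group when either side runs short. The function name and parameters are hypothetical.

```python
def weighted_or(candidates, is_preferred, num_to_score, share=0.5):
    """Pick up to num_to_score docs, aiming for `share` of them to be preferred.

    `candidates` already match the required clauses and are assumed to be in
    static-rank order; `is_preferred` tests the optional clause (e.g. CONN:1).
    """
    quota = int(num_to_score * share)
    preferred = [d for d in candidates if is_preferred(d)]
    others = [d for d in candidates if not is_preferred(d)]

    picked = preferred[:quota] + others[:num_to_score - quota]
    # Backfill: if one group was too small, take the best leftovers from the other.
    leftovers = preferred[quota:] + others[num_to_score - quota:]
    picked += leftovers[:num_to_score - len(picked)]
    return picked


# Hypothetical candidates for "richard b", in static-rank order.
CANDIDATES = [4, 9, 12, 10, 15, 22]
```

With connections `{10, 15}` and `num_to_score=4`, half the slots go to the searcher's connections and half to globally popular members, so both Richard Branson and a connected Richard Burton can reach the scoring stage.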
  26. Instant Results – Spelling Variations
      weiner ⇔ wiener
      catherine ⇔ kathryn
      dipak ⇔ deepak
  27. Instant Results – Spelling Variations: Name Clusters
      Offline process to cluster together similar sounding or similarly spelt names.
      Two step process:
      1. Coarse clustering (optimized for broad coverage)
         Normalization: repeated chars, accented chars, common phonetic variations (c ⇔ k, ph ⇔ f)
         Combination of edit distance & double metaphone (sound)
         E.g. (dipak = deepak), (wiener = weiner), (catherine = kathryn), (jeff = joff)
      2. Fine-grained clustering (optimized for precision)
         Split up clusters based on more sophisticated rules
         Position and character-aware edit distance
         Query reformulation data (q1 → q2 → click)
         E.g. (jeff ≠ joff)
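The coarse clustering step can be approximated with normalization plus plain edit distance, as in the sketch below. It is deliberately incomplete: the double metaphone component is omitted (it is not in the Python standard library), the distance threshold of 2 is an assumption, and pairs such as catherine/kathryn would likely need the phonetic signal to be grouped.

```python
import re
import unicodedata


def normalize(name):
    """Lowercase, strip accents, map ph->f and c->k, collapse repeated characters."""
    text = unicodedata.normalize("NFKD", name.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.replace("ph", "f").replace("c", "k")
    return re.sub(r"(.)\1+", r"\1", text)


def edit_distance(a, b):
    """Classic Levenshtein distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        cur = [i]
        for j, ch_b in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                   # deletion
                cur[j - 1] + 1,                # insertion
                prev[j - 1] + (ch_a != ch_b),  # substitution or match
            ))
        prev = cur
    return prev[-1]


def same_coarse_cluster(a, b, max_distance=2):
    """Coarse test: normalized forms within a small edit distance (threshold assumed)."""
    return edit_distance(normalize(a), normalize(b)) <= max_distance
```

As on the slide, this coarse test groups jeff with joff; separating them again is the job of the fine-grained step, which relies on signals such as query reformulation data that are not modeled here.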
  28. Instant Results – Spelling Variations
      Indexing time:
        NAME: kathryn
        CLUSTER: katharine
      Query time:
        Potential queries: katherine, kathryn, katharine, catharine
        Rewritten queries:
          ?NAME:katherine ?CLUSTER:katharine
          ?NAME:kathryn ?CLUSTER:katharine
          ?NAME:katharine ?CLUSTER:katharine
          ?NAME:catharine ?CLUSTER:katharine
      Either match original query term or match the name cluster
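The index-time and query-time halves of slide 28 fit together roughly as sketched here. The `NAME_TO_CLUSTER` mapping stands in for the output of the offline clustering job, the member data is invented, and the `?` operator is modeled as a plain union of posting lists.

```python
from collections import defaultdict

# Hypothetical output of the offline name-clustering job.
NAME_TO_CLUSTER = {
    "katherine": "katharine", "kathryn": "katharine",
    "katharine": "katharine", "catharine": "katharine",
}
# Hypothetical members keyed by document ID (first names only).
MEMBERS = {1: "kathryn", 2: "catharine", 3: "karen"}


def index_tokens(first_name):
    """Index time: the literal name plus its cluster representative, if any."""
    tokens = ["NAME:" + first_name]
    cluster = NAME_TO_CLUSTER.get(first_name)
    if cluster:
        tokens.append("CLUSTER:" + cluster)
    return tokens


def rewrite(term):
    """Query time: optionally match the typed name or its cluster."""
    clauses = ["?NAME:" + term]
    cluster = NAME_TO_CLUSTER.get(term)
    if cluster:
        clauses.append("?CLUSTER:" + cluster)
    return clauses


def build_index(members):
    index = defaultdict(set)
    for doc_id, first_name in members.items():
        for token in index_tokens(first_name):
            index[token].add(doc_id)
    return index


def search(index, term):
    """A document matches if it satisfies either optional clause."""
    hits = set()
    for clause in rewrite(term):
        hits |= index.get(clause[1:], set())  # drop the leading '?'
    return sorted(hits)


INDEX = build_index(MEMBERS)
```

A search for "katherine" finds no literal `NAME` match in this toy data but still returns the members named kathryn and catharine through the shared cluster token.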
  29. Instant Results – Scoring: Learning to Rank (Machine-learned ranking)
      Training data
      ● Click data from previous typeahead sessions
      ● <searcher, query, doc> ⇒ positive/negative
      Clicked result treated as positive. All other shown results treated as negative.
      Since this is navigational search, we assume there’s only 1 correct result => low presentation bias.
      Features / signals
      ● Textual match against various fields
      ● Network distance, number of shared connections
      ● Global popularity
      ● Compound features
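The labeling rule on slide 29 is simple enough to state in a few lines. The sketch below turns one typeahead session into labeled `<searcher, query, doc>` tuples; the function name and tuple layout are hypothetical, and skipping sessions without a click is an assumption.

```python
def label_session(searcher_id, query, shown_doc_ids, clicked_doc_id):
    """Return (searcher, query, doc, label) tuples for one typeahead session.

    The clicked document is the single positive (label 1); every other
    document that was shown is a negative (label 0).
    """
    if clicked_doc_id not in shown_doc_ids:
        return []  # assumption: sessions without a usable click are skipped
    return [
        (searcher_id, query, doc_id, 1 if doc_id == clicked_doc_id else 0)
        for doc_id in shown_doc_ids
    ]
```

Each tuple would then be joined with features such as textual match, network distance, and global popularity before training a ranking model.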
  30. Conclusions
      ● Instant search experience
        ○ Directly satisfy navigational search uses in typeahead via Instant Results
        ○ Help the user frame exploratory search queries via Query Autocomplete & Search Suggestions
      ● Combination of techniques
        ○ Query tagger for entity extraction – “Things not Strings”
        ○ FST-based query completion
        ○ Inverted index-based instant results + Early termination + Weighted OR
        ○ Name clusters for fuzzy name matching
  31. Future Work
      ● Personalized query completions
        ○ m ⇒ machine learning
        ○ m ⇒ machinist
      ● Multi-entity query suggestions
        ○ Now: [linkedin] ⇒ “Find people who work at LinkedIn”
        ○ Future: [linkedin data scientist] ⇒ “Find data scientists at LinkedIn”
      ● Better blending
        ○ Autocomplete + query suggestions + instant results
        ○ Query features – what does the query mean?
        ○ Results features – what results come back from each system?
  32. Thank You!
  33. LinkedIn – The Economic Graph
  34. LinkedIn Search – SERP (Jobs)
  35. LinkedIn Search – Typeahead
  36. LinkedIn Search – SERP
