[In]formation Retrieval: Search at LinkedIn
Shakti Sinha               Daniel Tunkelang
Head, Search Relevance     Head, Query Understanding

    Recruiting Solutions                               1
Why do 200M+ people use LinkedIn?




                                    2
People use LinkedIn because of other people.




                                          3
Search helps members find and be found.




                                          4
Rich collection of professional content.




                                           5
Every search is personalized.




                                6
Let’s talk a bit about how it all works.

§  Query Understanding

§  Search Spam

§  Unified Search

More at http://data.linkedin.com/search.



                                           7
Query Understanding




                      8
People are semi-structured objects.




  for i in [1..n]
    B[i] ← {}
    s ← w1 w2 … wi
    if Pc(s) > 0
      a ← new Segment()
      a.segs ← {s}
      a.prob ← Pc(s)
      B[i] ← {a}
    for j in [1..i-1]
      for b in B[j]
        s ← w(j+1) … wi
        if Pc(s) > 0
          a ← new Segment()
          a.segs ← b.segs ∪ {s}
          a.prob ← b.prob * Pc(s)
          B[i] ← B[i] ∪ {a}
    sort B[i] by prob
    truncate B[i] to size k
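
The pseudocode on this slide keeps, for every query prefix, the k most probable ways to split that prefix into segments. A minimal runnable sketch of the same dynamic program follows. It assumes that B[j] covers the first j words, so each new segment starts at word j+1, and it folds the "whole prefix is one segment" case into an empty base entry at position 0. The names `Segmentation`, `top_k_segmentations`, and `TOY_PC`, and all the probabilities, are invented for illustration and do not come from the talk.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Segmentation:
    segs: Tuple[str, ...]  # segments in query order
    prob: float            # product of the segment probabilities


def top_k_segmentations(
    words: List[str],
    concept_prob: Callable[[str], float],
    k: int = 3,
) -> List[Segmentation]:
    """Return up to k most probable segmentations of `words`."""
    n = len(words)
    # best[i] holds the top-k segmentations of the first i words.
    # best[0] is the empty segmentation, so j = 0 covers the case
    # where the whole prefix is a single segment.
    best: Dict[int, List[Segmentation]] = {0: [Segmentation((), 1.0)]}
    for i in range(1, n + 1):
        candidates: List[Segmentation] = []
        for j in range(i):
            segment = " ".join(words[j:i])  # words j+1 .. i, 1-based
            p = concept_prob(segment)
            if p <= 0:
                continue
            for b in best[j]:
                candidates.append(Segmentation(b.segs + (segment,), b.prob * p))
        candidates.sort(key=lambda s: s.prob, reverse=True)
        best[i] = candidates[:k]  # truncate to size k
    return best[n]


# Hypothetical concept probabilities, invented for this example.
TOY_PC = {
    "software": 0.05,
    "engineer": 0.04,
    "software engineer": 0.03,
    "google": 0.06,
}

result = top_k_segmentations(
    ["software", "engineer", "google"], lambda s: TOY_PC.get(s, 0.0), k=2
)
for seg in result:
    print(seg.segs, seg.prob)
```

With these toy numbers the top segmentation is "software engineer" followed by "google", because the two-word concept is far more probable than the product of its single-word parts.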



                                       9
Word sense is contextual.




                            10
Understand queries as early as possible.




                                           11
Query structure has many applications.

§    Boost results that match query interpretation.
§    Bucket search log analysis by query classes.
§    Query rewriting specific to query classes.
§    …



      Query understanding focuses on set-level metrics.

                  Not just about best answer,
                  but getting to best question.


                                                          12
Search Spam




              13
Let’s look at a search spammer.




                                  14
Summary is verbose but legitimate.




                                     15
But then comes the keyword stuffing.




                                       16
How we train our search spam classifier.

§  Find the queries targeted by spammers.
   –  10,000 most common non-name queries.


§  Look at top results for a generic user.
   –  i.e., show unpersonalized search results.


§  Remove private profiles.
   –  Members first! Can’t sacrifice privacy to fight spammers.


§  Label data by crowdsourcing.
   –  Relevance is subjective, but spam is relatively objective.


                                                                   17
ROC curve for spam thresholding.

[Figure: ROC curve, both axes running from 0 to 1, with two spam score thresholds a and b marked on the curve, where 0 < a < b < 1.]
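
A ROC curve like the one on this slide is traced by sweeping the spam score threshold and recording, at each value, the false positive rate and the true positive rate. The sketch below computes one such point per threshold. The function `roc_point`, the scores, the labels, and the two threshold values are all invented for illustration, and the convention that a profile is flagged when its score is at or above the threshold is an assumption.

```python
from typing import List, Tuple


def roc_point(
    scores: List[float], is_spam: List[bool], threshold: float
) -> Tuple[float, float]:
    """Return (false positive rate, true positive rate) when every profile
    whose spam score is >= threshold is flagged as spam."""
    tp = fp = fn = tn = 0
    for score, spam in zip(scores, is_spam):
        flagged = score >= threshold
        if flagged and spam:
            tp += 1
        elif flagged:
            fp += 1
        elif spam:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, tpr


# Invented scores and labels, for illustration only.
SCORES = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
IS_SPAM = [True, True, False, True, False, False]

a, b = 0.35, 0.70  # two hypothetical thresholds with 0 < a < b < 1
point_a = roc_point(SCORES, IS_SPAM, a)  # lower threshold flags more profiles
point_b = roc_point(SCORES, IS_SPAM, b)  # higher threshold flags fewer profiles
print(point_a, point_b)
```

On this toy data the lower threshold catches all three spam profiles at the cost of one false positive, while the higher threshold catches two of them with no false positives.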




                                                                                      18
Integrate spamminess into relevance score.

§  Spam model yields a probability between 0 and 1.

§  Use spam score as piecewise linear factor:
      if score < spammin:
           # not a spammer
           relevance *= 1.0
      elif score > spammax:
           # spammer
           relevance *= 0.0
      else:
           # linear function of spamminess
           relevance *= (spammax - score) / (spammax - spammin)
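
The fragment above can be wrapped into a small runnable function. This is a sketch under stated assumptions: `spam_min` and `spam_max` stand for the slide's spammin and spammax, the default values 0.3 and 0.8 are hypothetical, and the input validation is an addition that the slide does not show.

```python
def spam_factor(score: float, spam_min: float, spam_max: float) -> float:
    """Piecewise linear multiplier in [0, 1] derived from a spam probability."""
    if not 0.0 <= spam_min < spam_max <= 1.0:
        raise ValueError("expected 0 <= spam_min < spam_max <= 1")
    if score < spam_min:
        return 1.0  # not a spammer: relevance is left unchanged
    if score > spam_max:
        return 0.0  # spammer: relevance is zeroed out
    # In between, the factor falls linearly from 1 at spam_min to 0 at spam_max.
    return (spam_max - score) / (spam_max - spam_min)


def demoted_relevance(
    relevance: float,
    score: float,
    spam_min: float = 0.3,  # hypothetical threshold
    spam_max: float = 0.8,  # hypothetical threshold
) -> float:
    """Scale a relevance score by the spam factor."""
    return relevance * spam_factor(score, spam_min, spam_max)


for s in (0.1, 0.55, 0.9):
    print(s, demoted_relevance(10.0, s))
```

Because the factor is continuous at both thresholds, a small change in the spam score never causes a sudden jump in ranking.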


                                                                  19
Spam is an arms race.

§  We can’t reveal precisely which features we use for spam
    detection, or spammers will work around them.

§  Spammers will try to reverse-engineer us anyway.

§  Personalization benefits us and our legitimate users – it’s
    hard to spam your way to high personalized ranking.

§  Fighting spam is all about making the investment less
    profitable for the spammer.



                                                              20
Unified Search




                 21
Un-Unified Search




                    22
Introducing LinkedIn Unified Search!

Goal: make all of our content more discoverable.

Three new features:
§  Query Auto-Complete
§  Content Type Suggestions
§  Unified Search Result Page




                                                   23
Query Auto-Complete




                      24
Best completion not always the most popular.

§  In a heavy-tailed distribution, even the most popular
    queries account for a small fraction of the distribution.

§  We don’t want to suggest generic queries that would
    produce useless results.
   –  e.g., c -> company, j -> jobs


§  Goal is not only to infer the user’s intent but also to suggest
    a search that yields relevant results across content types.
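
One way to read the points above is that a completion should be scored by how useful its results tend to be, not only by how often it is typed. The sketch below is a hypothetical illustration of that idea and not the method used at LinkedIn: the `Candidate` fields, the `min_success` cutoff, the scoring formula, and all the numbers are assumptions.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    query: str
    frequency: int       # how often the query appears in the log
    success_rate: float  # fraction of those searches judged successful


def suggest(
    prefix: str,
    candidates: List[Candidate],
    limit: int = 3,
    min_success: float = 0.2,  # hypothetical cutoff for "useless results"
) -> List[str]:
    """Rank completions by expected successful searches, dropping generic ones."""
    matches = [
        c
        for c in candidates
        if c.query.startswith(prefix) and c.success_rate >= min_success
    ]
    matches.sort(key=lambda c: c.frequency * c.success_rate, reverse=True)
    return [c.query for c in matches[:limit]]


# Invented log statistics, for illustration only.
LOG = [
    Candidate("company", 1000, 0.05),
    Candidate("cisco", 400, 0.60),
    Candidate("chief technology officer", 300, 0.50),
    Candidate("jobs", 2000, 0.04),
]

print(suggest("c", LOG))
```

Here "company" is the most frequent match for the prefix "c", but it is dropped because searches for it rarely succeed in the toy data.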




                                                                25
Content Type Suggestions




                           26
How we compute content type suggestions.

§  Rank content types by likelihood of a successful search.
   –  Consider click-through behavior as well as downstream actions.


§  Bootstrap using what we know from pre-unified search
    behavior.
   –  Tricky part is compensating for findability bias.


§  Continuously evaluate and collect feedback through user
    behavior.
   –  E.g., members using the left rail to select a particular vertical.
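
Ranking content types by the likelihood of a successful search can be sketched as a smoothed success rate per content type. This is a hypothetical illustration: the log format, the function name `rank_content_types`, and the additive smoothing are assumptions, "success" stands for whatever mix of click-through and downstream actions is chosen, and the sketch does not attempt to correct for findability bias.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

LogEntry = Tuple[str, str, bool]  # (query, content type, was the search successful)


def rank_content_types(
    log: Iterable[LogEntry],
    query: str,
    prior_successes: float = 1.0,  # hypothetical smoothing pseudo-counts
    prior_trials: float = 2.0,
) -> List[Tuple[str, float]]:
    """Rank content types for `query` by smoothed success rate."""
    trials: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for q, content_type, success in log:
        if q != query:
            continue
        trials[content_type] += 1
        successes[content_type] += int(success)
    scored = [
        (ct, (successes[ct] + prior_successes) / (trials[ct] + prior_trials))
        for ct in trials
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented search log, for illustration only.
TOY_LOG: List[LogEntry] = (
    [("java", "jobs", True)] * 3
    + [("java", "jobs", False)]
    + [("java", "people", True)]
    + [("java", "people", False)] * 3
    + [("java", "groups", True), ("java", "groups", False)]
)

ranked = rank_content_types(TOY_LOG, "java")
print(ranked)
```

The smoothing keeps a content type with only a handful of observations from jumping to the top or bottom of the list on thin evidence.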




                                                                           27
Unified Search Result Page




                             28
Intent Detection and Page Construction

§  Relevance is now a two-part computation:

              P(Content Type | User, Query)
                             ×
          P(Document | User, Query, Content Type)

§  Intent detection comes first: inefficient to send all queries
    to all verticals.

§  Secondary components introduce diversity.
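
The two-part computation can be sketched as follows. The function `blend`, the `min_type_prob` cutoff, and all the probabilities are assumptions made for illustration. In this sketch the nested dictionary stands in for calls to the verticals, and a vertical is consulted only when its content type probability clears the cutoff, which mirrors the point that intent detection comes first.

```python
from typing import Dict, List, Tuple


def blend(
    p_type: Dict[str, float],            # P(Content Type | User, Query)
    p_doc: Dict[str, Dict[str, float]],  # P(Document | User, Query, Content Type)
    min_type_prob: float = 0.1,          # hypothetical cutoff for querying a vertical
) -> List[Tuple[str, str, float]]:
    """Return (content type, document, score) triples, best first."""
    ranked: List[Tuple[str, str, float]] = []
    for content_type, pt in p_type.items():
        if pt < min_type_prob:
            continue  # intent detection says this vertical is not worth querying
        for doc, pd in p_doc.get(content_type, {}).items():
            ranked.append((content_type, doc, pt * pd))
    ranked.sort(key=lambda row: row[2], reverse=True)
    return ranked


# Invented probabilities, for illustration only.
P_TYPE = {"people": 0.70, "jobs": 0.25, "groups": 0.05}
P_DOC = {
    "people": {"alice": 0.9, "bob": 0.4},
    "jobs": {"data scientist": 0.8},
    "groups": {"ml group": 0.99},
}

page = blend(P_TYPE, P_DOC)
for row in page:
    print(row)
```

The "groups" result never appears even though its document probability is the highest, because the content type itself is unlikely for this user and query.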


                                                                    29
Summary

§    Personalize every search and leverage structure.
§    Understand queries as early as possible.
§    Fight the spammers that be.
§    Unify and simplify the search experience.


             Goal: help LinkedIn’s 200M+
             members find and be found.




                                                         30
Thank you!




             31
Want to learn more?

§  Check out http://data.linkedin.com/search.

§  Contact us:
   –  Shakti: ssinha@linkedin.com
              http://linkedin.com/in/sdsinha

   –  Daniel: dtunkelang@linkedin.com
              http://linkedin.com/in/dtunkelang

§  Did we mention that we’re hiring?


                                                  32
