- Fast, Lenient, and Accurate – Build... by Abhimanyu Lad 1916 views
- Query Understanding at LinkedIn [Ta... by Abhimanyu Lad 1316 views
- Find and be Found: Information Retr... by Daniel Tunkelang 14440 views
- [In]formation Retrieval: Search at ... by Daniel Tunkelang 49758 views
- Search Quality at LinkedIn by Daniel Tunkelang 21117 views
- Better Search Through Query Underst... by Daniel Tunkelang 23417 views

ECIR 2013 workshop keynote

- 1. Recruiters, Job Seekers and Spammers: Innovations in Job Search at LinkedIn. Daria Sorokina, Senior Data Scientist, LinkedIn
- 2. Part I: Recruiters. “Multiple Objective Optimization in Recommendation Systems”, Mario Rodriguez, Christian Posse, Ethan Zhang. RecSys'12
- 3. TalentMatch: a job posting and member profiles go into TalentMatch, which outputs ranked talent.
- 4. TalentMatch Model
  - Job posting fields: title, industry, …, geo, description, company, functional area
  - Candidate fields, general: expertise, specialties, education, headline, geo, experience, …
  - Candidate fields, current position: title, summary, tenure length, industry, functional area
  - Text similarity features are computed between the job posting and the candidate fields.
  - The model can be trained on user activity signals like job ad clicks or job applications.
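
Slide 4 lists text similarity features between job posting and candidate fields but does not say how they are computed. A minimal sketch, assuming a plain bag-of-words cosine similarity over lowercased tokens; the helper names and the field-pair list are hypothetical, not the TalentMatch implementation.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

def tokenize(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: str, b: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two strings."""
    va, vb = tokenize(a), tokenize(b)
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

def text_similarity_features(job: Dict[str, str], member: Dict[str, str],
                             pairs: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Hypothetical: one cosine feature per (job field, member field) pair."""
    return {f"{jf}~{mf}": cosine(job.get(jf, ""), member.get(mf, "")) for jf, mf in pairs}
```

For example, pairing the job title with the member's current title, and the job description with the member's summary, yields two numeric features that a ranking model can consume alongside non-text features such as geo or industry.
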
- 5. TalentMatch
  - Utility = fn(email rate, reply rate)
  - The recruiter drives the email rate; the reply rate depends on the candidate.
  - Problem! Is the candidate a job seeker?
- 6. Job Seeker Intent
  - Intent categories: non-job-seeker, passive, active
  - Model: time until the job change
    - How long will this person stay in this job after this date?
    - Trained on past job positions from our users' profiles
    - Accelerated failure time (AFT) model: $T_i = \exp\left(\sum_k \beta_k x_{ik} + \sigma \epsilon_i\right)$
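
The AFT model on slide 6 makes the log of the remaining tenure linear in the features plus a scaled noise term. A minimal sketch, assuming a log-normal AFT (standard normal $\epsilon_i$), under which the median predicted time is $\exp(\sum_k \beta_k x_{ik})$; the coefficients in the usage are made up, and fitting $\beta$ and $\sigma$ is not shown.

```python
import math
import random
from typing import Sequence

def aft_time(beta: Sequence[float], x: Sequence[float], sigma: float, eps: float) -> float:
    """T_i = exp(sum_k beta_k * x_ik + sigma * eps_i)."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)) + sigma * eps)

def median_time_lognormal(beta: Sequence[float], x: Sequence[float]) -> float:
    """With eps ~ N(0, 1), exp() is monotone, so the median of T_i is reached at eps = 0."""
    return aft_time(beta, x, sigma=1.0, eps=0.0)

def sample_time(beta: Sequence[float], x: Sequence[float], sigma: float, rng: random.Random) -> float:
    """Draw one remaining-tenure sample under the log-normal assumption."""
    return aft_time(beta, x, sigma, rng.gauss(0.0, 1.0))
```

A short predicted time until the job change would then be read as a signal of active job-seeking intent.
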
- 7. Job-Seeker Feature Example: Attrition by Industry (plot of probability vs. time)
- 8. TalentMatch
  - Utility = fn(email rate, reply rate)
  - Job-seeking intent: 16x reply rate on career-related mail
- 9. How: Controlled Re-ranking
  - Original TalentMatch ranking (match score, seeker status):
    - 1. Item X, 0.98, Non-Seeker
    - 2. Item Y, 0.91, Non-Seeker
    - 3. Item Z, 0.89, Active
  - A re-ranking function f() optimizes for both the divergence score between the ranking score distributions and the objective score (#Active in top N).
  - Improved ranking (match score, re-ranking score, seeker status):
    - 1. Item X, 0.98, 0.98, Non-Seeker
    - 2. Item Z, 0.89, 0.93, Active
    - 3. Item Y, 0.91, 0.91, Non-Seeker
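
Slide 9 does not define f(), the divergence score, or how the two objectives are weighted. A minimal sketch under stated assumptions: f() adds a fixed boost to active job seekers, divergence is approximated by the mean absolute shift between match and re-ranking scores, and the boost is chosen by grid search to maximize #Active in top N minus `lam` times divergence. With N = 2, `lam = 10`, and boosts 0, 0.04, 0.08 (all hypothetical), it reproduces the improved ranking on the slide.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

@dataclass
class Candidate:
    item: str
    match_score: float
    active: bool  # job-seeker intent flag, e.g. from the intent model

Ranked = List[Tuple[Candidate, float]]

def rerank(cands: Sequence[Candidate], boost: float) -> Ranked:
    """Assumed f(): add a fixed boost to active job seekers, then sort by the new score."""
    scored = [(c, c.match_score + (boost if c.active else 0.0)) for c in cands]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

def active_in_top_n(ranked: Ranked, n: int) -> int:
    """Objective score: number of active job seekers in the top n."""
    return sum(1 for c, _ in ranked[:n] if c.active)

def divergence(ranked: Ranked) -> float:
    """Assumed divergence proxy: mean absolute shift from match score to re-ranking score."""
    return sum(abs(s - c.match_score) for c, s in ranked) / len(ranked)

def choose_boost(cands: Sequence[Candidate], n: int, lam: float,
                 boosts: Sequence[float]) -> Tuple[float, Ranked]:
    """Grid-search the boost maximizing #Active in top n minus lam * divergence."""
    best = None
    for b in boosts:
        ranked = rerank(cands, b)
        obj = active_in_top_n(ranked, n) - lam * divergence(ranked)
        if best is None or obj > best[0]:
            best = (obj, b, ranked)
    return best[1], best[2]

cands = [Candidate("X", 0.98, False), Candidate("Y", 0.91, False), Candidate("Z", 0.89, True)]
boost, ranked = choose_boost(cands, n=2, lam=10.0, boosts=[0.0, 0.04, 0.08])
```

A larger boost also moves Z into the top 2, but it pays a bigger divergence penalty, so the smallest boost that reaches the objective wins.
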
- 10. Part II: Job Seekers. Learning to Rank: fast and personalized.
- 11. Job Search. Query: “Data Scientist LinkedIn”
- 12. Learning To Rank
  - Regular approach
    - A data point is a pair: {Query, Document}
    - Data label: “Is this document relevant for this query?”
    - Can be done by crowdsourcing
  - Job Search reality
    - A data point is a triple: {Query, Job position, User}
    - Data label: “Is this job relevant for this user who asked this query?”
    - Depends on the user's location, industry, seniority…
    - Too much to ask from a random person
    - Have to collect labels from user signals
- 13. We use a simplified version of FairPairs (Radlinski, Joachims AAAI'06)
  - Each pair of results is flipped with a 50% chance.
  - Choose pairs where only the lower document is clicked.
  - Save 1 positive (lower, label 1) and 1 negative (upper, label 0) result for the labeled data set.
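
The simplified FairPairs procedure on slide 13 can be sketched in two steps: randomly flip pairs before display, then turn clicks into labels. This assumes results are grouped into adjacent pairs starting from the top (original FairPairs also randomizes the pairing offset, which is omitted here); the function names are hypothetical.

```python
import random
from typing import List, Sequence, Set, Tuple

def fairpairs_present(results: Sequence[str], rng: random.Random) -> List[str]:
    """Group results into adjacent pairs and flip each pair with probability 0.5."""
    shown = list(results)
    for i in range(0, len(shown) - 1, 2):
        if rng.random() < 0.5:
            shown[i], shown[i + 1] = shown[i + 1], shown[i]
    return shown

def fairpairs_labels(shown: Sequence[str], clicked: Set[str]) -> List[Tuple[str, int]]:
    """For pairs where only the lower result was clicked, emit (lower, 1) and (upper, 0)."""
    labeled = []
    for i in range(0, len(shown) - 1, 2):
        upper, lower = shown[i], shown[i + 1]
        if lower in clicked and upper not in clicked:
            labeled.append((lower, 1))
            labeled.append((upper, 0))
    return labeled
```

Only "lower clicked, upper skipped" pairs are kept because a click below a skipped result is evidence of preference that position bias cannot explain; the random flip ensures each document appears in either position equally often.
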
- 14. FairPairs data is not enough for training
  - The user clicks or skips only whatever is shown.
  - Bad results are not shown, so there will be no “really bad” negatives in the training data.
  - We need to add them! For queries with many results, add all results from the last page as “easy negatives” (label 0).
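
Slide 14 does not say how many results count as "many" or what the page size is. A minimal sketch of the easy-negative step, assuming hypothetical defaults of 10 results per page and a 100-result minimum, with labeled examples stored as `(query, doc, label)` triples.

```python
from typing import List, Sequence, Tuple

Example = Tuple[str, str, int]  # (query, doc, label)

def last_page(results: Sequence[str], page_size: int = 10) -> List[str]:
    """Results that would appear on the final page of the paginated list."""
    if not results:
        return []
    n_last = len(results) % page_size or page_size
    return list(results[-n_last:])

def add_easy_negatives(query: str, results: Sequence[str], labeled: Sequence[Example],
                       page_size: int = 10, min_results: int = 100) -> List[Example]:
    """Append (query, doc, 0) for each last-page result of a query with many results."""
    if len(results) < min_results:
        return list(labeled)
    return list(labeled) + [(query, doc, 0) for doc in last_page(results, page_size)]
```
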
- 15. Learning To Rank: Training a Model
  - Best models for LTR are complex ensembles of trees
    - See results of the Yahoo Learning to Rank '10 competition
    - LambdaMART, BagBoo, Additive Groves, MatrixNet, …
  - Complex models come at a cost
    - It takes long to calculate predictions
    - Requires a lot of optimization; often used with multi-level ranking
  - Can we train a simple model that will resemble a complex one?
    - Train a complex model
    - Get insights on what it looks like
    - Modify a simple model accordingly
- 16. Training a Simple Model using a Complex Model
  - Base simple model: logistic or linear regression, $\log\frac{p}{1-p} = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
    - Does not handle features with non-linear effects well
    - Does not handle interactions (e.g., if-then-else rules)
  - Target complex model: Additive Groves (Sorokina, Caruana, Riedewald ECML'07), an average of $N$ groves, each a sum of trees: $\frac{1}{N}(T_{1,1} + \dots + T_{1,n}) + \dots + \frac{1}{N}(T_{N,1} + \dots + T_{N,n})$
    - Comes with interaction detection and effect visualization tools
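
For reference, the base model on slide 16 is ordinary logistic regression: a linear score is mapped to a probability by inverting the logit. A minimal scoring sketch (fitting is not shown; coefficient values in the checks are made up).

```python
import math
from typing import Sequence

def linear_score(b: Sequence[float], x: Sequence[float]) -> float:
    """b0 + b1*x1 + ... + bn*xn, where b[0] is the intercept."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))

def predict_proba(b: Sequence[float], x: Sequence[float]) -> float:
    """Invert log(p / (1 - p)) = score, i.e. p = 1 / (1 + exp(-score))."""
    return 1.0 / (1.0 + math.exp(-linear_score(b, x)))
```

Each feature contributes a single slope $b_j$, which is exactly why non-linear effects and interactions are out of reach without the transformations on the following slides.
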
- 17. Improving LR: Feature Transformations
  - Additive Groves can model and visualize non-linear effects (plot: average prediction vs. feature values).
  - Approximate the effect curve with a polynomial transform $T(x)$; anything simple will do.
  - Apply $T(x)$ to the original feature values.
  - Now the feature effect is linear (plot: average prediction vs. $T(x)$ values), and the regression model will love it: $b_0 + b_1 T(x_1) + b_2 x_2 + \dots + b_n x_n$
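
The transform step on slide 17 can be sketched with a least-squares polynomial fit. This assumes the effect curve is available as sampled (feature value, average prediction) points, for example exported from an Additive Groves effect plot; `fit_transform_T` and the example curve are hypothetical, while `numpy.polyfit` and `numpy.poly1d` are standard NumPy.

```python
import numpy as np

def fit_transform_T(feature_values, avg_predictions, degree=2):
    """Fit a low-degree polynomial T(x) to an effect curve and return it as a callable."""
    coeffs = np.polyfit(np.asarray(feature_values, float),
                        np.asarray(avg_predictions, float), degree)
    return np.poly1d(coeffs)

# A U-shaped effect that a single b1 * x term cannot represent.
xs = np.linspace(-3, 3, 61)
effect = 0.2 * xs**2 - 0.1          # hypothetical effect curve
T = fit_transform_T(xs, effect)
transformed = T(xs)                 # new feature column T(x1) for the regression
```

After the transform, the average prediction is linear in `transformed`, so a single regression coefficient on $T(x_1)$ can capture the whole effect.
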
- 18. Improving LR: Interaction Splits
  - The Additive Groves interaction detection tool produces a list of strong interactions and corresponding joint effect plots (plot: average prediction vs. values of feature $X_1$, for $X_2 = 0$ and $X_2 = 1$).
  - The effect of $X_1$ is stronger when $X_2 = 0$; simple regression will not capture this.
  - Often such an $X_2$ interacts with other features as well.
  - Solution: build separate models for different values of $X_2$, e.g., $b_0 + b_1 x_1 + \dots + b_n x_n$ for one value and $a_0 + a_1 x_1 + \dots + a_n x_n$ for the other.
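
The interaction split on slide 18 amounts to partitioning the data on the interacting feature and fitting one linear model per partition. A minimal sketch using least-squares linear regression (a logistic fit would follow the same pattern); the synthetic data below is made up so that $X_1$ has slope 2 when $X_2 = 0$ and slope 0.5 when $X_2 = 1$.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y ~ c0 + c . x; returns [c0, c1, ..., cn]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def fit_split_models(X, y, split_col):
    """Fit one linear model per distinct value of the split feature X[:, split_col]."""
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    models = {}
    for v in np.unique(X[:, split_col]):
        mask = X[:, split_col] == v
        # The split feature is constant inside a partition, so drop it from the design.
        models[float(v)] = fit_linear(np.delete(X[mask], split_col, axis=1), y[mask])
    return models

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 200)
x2 = rng.integers(0, 2, 200)
y = np.where(x2 == 0, 1 + 2.0 * x1, 1 + 0.5 * x1)   # X1 matters more when X2 = 0
models = fit_split_models(np.column_stack([x1, x2]), y, split_col=1)
```
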
- 19. Improving LR: Tree with LR Leaves and Transforms
  - Both operations (effect transforms and interaction splits) can be applied multiple times in any order.
  - Resulting model: a simple tree with regression-model leaves, e.g., a root split on $X_2$, with one branch ending in the leaf $b_0 + b_1 T(x_1) + \dots + b_n x_n$ and the other split further on $X_{10} < 0.1234$ into the leaves $a_0 + a_1 P(x_1) + \dots + a_n Q(x_n)$ and $\gamma_0 + \gamma_1 R(x_1) + \dots + \gamma_n Q(x_n)$.
  - Gives a significant boost to the performance of the basic LR model.
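
A tree with regression leaves, as on slide 19, can be represented with two node types: a split and a linear leaf whose terms carry their own feature transforms. A minimal sketch; the tree shape mirrors the slide, but the branch directions, coefficients, and transforms below are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union

Features = Dict[str, float]

@dataclass
class LinearLeaf:
    """Leaf model: intercept + sum of coefficient * transform(feature value)."""
    intercept: float
    terms: List[Tuple[str, float, Callable[[float], float]]]

    def predict(self, x: Features) -> float:
        return self.intercept + sum(c * t(x[f]) for f, c, t in self.terms)

@dataclass
class Split:
    """Internal node: route the example to one child by a boolean test."""
    test: Callable[[Features], bool]
    if_true: "Node"
    if_false: "Node"

    def predict(self, x: Features) -> float:
        return (self.if_true if self.test(x) else self.if_false).predict(x)

Node = Union[LinearLeaf, Split]

def identity(v: float) -> float:
    return v

tree = Split(
    test=lambda x: x["x2"] == 1,                   # hypothetical branch direction
    if_true=LinearLeaf(0.1, [("x1", 0.5, lambda v: v * v), ("x3", 0.2, identity)]),
    if_false=Split(
        test=lambda x: x["x10"] < 0.1234,
        if_true=LinearLeaf(-0.2, [("x1", 1.0, math.log1p), ("x3", 0.3, identity)]),
        if_false=LinearLeaf(0.0, [("x1", 0.7, identity), ("x3", -0.1, identity)]),
    ),
)
```

For a logistic model, the leaf output is the logit and can be passed through $1 / (1 + e^{-s})$ to obtain a probability.
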
- 20. TreeExtra package: a set of machine learning tools
  - Additive Groves ensemble
  - Interaction detection
  - Effect and interaction visualization
  - http://additivegroves.net
  - Created by Daria Sorokina while at Cornell, CMU, Yandex, and LinkedIn, from 2006 to 2013
- 21. Part III: Spammers. Fighting black SEO
- 22. Search Spam
- 23. Search Spam
- 24. Search Spam
- 25. Training data for the search spam classifier
  - Find the queries targeted by spammers.
    - Take the 10,000 most common non-name queries.
    - Spammers love optimizing for [marketing], but not so much for [david smith].
  - Look at top results for a generic user, i.e., show unpersonalized search results.
  - Label data by crowdsourcing, since the definition of spam is non-personalized.
  - Train a model.
    - Spam scores are recalculated offline once in a while, so model complexity is not an issue.
    - Additive Groves works well (any ensemble of trees could be used).
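
The first step on slide 25 selects the most frequent non-name queries from the query log. A minimal sketch, assuming simple lowercase/whitespace normalization and a hypothetical `is_name_query` predicate, since the name detector is not described.

```python
from collections import Counter
from typing import Callable, Iterable, List

def spam_target_queries(query_log: Iterable[str],
                        is_name_query: Callable[[str], bool],
                        k: int = 10_000) -> List[str]:
    """Return the k most frequent normalized queries that are not person names."""
    counts = Counter(q.strip().lower() for q in query_log)
    return [q for q, _ in counts.most_common() if not is_name_query(q)][:k]
```

The top unpersonalized results for these queries are then sent for crowdsourced spam labeling.
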
- 26. ROC curve. Choosing thresholds.
  - Two spam score thresholds, $0 < a < b < 1$, are marked on the ROC curve (plot with both axes from 0 to 1).
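
Slide 26 marks two thresholds $a < b$ on the ROC curve but does not state the selection rule. A minimal sketch under hypothetical criteria: $b$ is the lowest threshold whose false-positive rate is at most `max_fpr` (confident spam), and $a$ is the highest threshold whose true-positive rate is at least `min_tpr` (almost every spammer scores above $a$).

```python
from typing import Sequence, Tuple

def roc_point(scores: Sequence[float], labels: Sequence[int], t: float) -> Tuple[float, float]:
    """(FPR, TPR) when every item with score >= t is flagged as spam; label 1 means spam."""
    pos = sum(labels)
    neg = len(labels) - pos
    tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
    return fp / neg, tp / pos

def choose_thresholds(scores: Sequence[float], labels: Sequence[int],
                      max_fpr: float = 0.01, min_tpr: float = 0.95) -> Tuple[float, float]:
    """Pick (a, b) from candidate thresholds using the hypothetical FPR/TPR targets."""
    candidates = sorted(set(scores)) + [float("inf")]  # inf flags nothing: FPR = TPR = 0
    b = min(t for t in candidates if roc_point(scores, labels, t)[0] <= max_fpr)
    a = max(t for t in candidates if roc_point(scores, labels, t)[1] >= min_tpr)
    if not a < b <= 1.0:
        raise ValueError("targets too strict for this classifier: need a < b <= 1")
    return a, b
```

Scores between the two thresholds form the "suspicious" band that the next slide scales linearly.
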
- 27. Integrating the Spam Score into Relevance
  - The spam model yields a probability between 0 and 1.
  - Convert the spam score into a factor:
    - 0.0 ≤ score ≤ a: not a spammer, factor = 1.0
    - b ≤ score ≤ 1.0: spammer, factor = 0.0
    - a ≤ score ≤ b: suspicious; linearly scale the score from [a, b] to [1, 0]
  - Multiply the relevance score by the factor.
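
The conversion on slide 27 is a piecewise-linear function of the spam score. A minimal sketch of that mapping and of the final multiplication; the threshold values in the checks are made up.

```python
def spam_factor(score: float, a: float, b: float) -> float:
    """Map a spam probability to a relevance multiplier using thresholds 0 < a < b < 1."""
    if not 0.0 < a < b < 1.0:
        raise ValueError("expected 0 < a < b < 1")
    if score <= a:
        return 1.0                      # not a spammer
    if score >= b:
        return 0.0                      # spammer
    return (b - score) / (b - a)        # suspicious: 1 at score = a, 0 at score = b

def adjusted_relevance(relevance: float, score: float, a: float, b: float) -> float:
    """Multiply the relevance score by the spam factor."""
    return relevance * spam_factor(score, a, b)
```

The two cases agree with the linear segment at its endpoints, so the factor is continuous in the score.
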
- 28. We are hiring!