LinkedIn Skills: RecSys Conference 2014

My talk at RecSys 2014 about LinkedIn Skills and how it was built. More details can be found in the paper.

Published in: Data & Analytics

  1. LinkedIn Skills: Large-Scale Topic Extraction and Inference
     Mathieu Bastian, LinkedIn Corporation ©2014 All Rights Reserved
  2. The World’s Largest Professional Network
     • 313M+ members worldwide
     • 2 new members per second
     • 100M+ monthly unique visitors
     • 3M+ company pages
     • Connecting Talent → Opportunity. At scale…
  3. LinkedIn Profile
     • 313M+ profiles in 200+ countries
     • Organized into sections:
       – Standardized: Companies, Titles, Industry, Location, etc.
       – Unstandardized: Text (Summary, Position description, Specialties)
     • Skills & Endorsements section:
       – Introduced in 2011
       – Limited to 50 skills per profile
  4. Skills at LinkedIn
     • Key component of the professional identity
     • Dictionary of 45k+ skills in English
     • Members have diverse skills, e.g.:
       – Java Programming
       – Ballet
       – Politics
       – Bow Hunting
     • Many of these are long-tail
     [Figure: Example of a Skills section on a LinkedIn profile]
  5. Folksonomy creation
  6. Folksonomy creation
     • Create a folksonomy of skills based on LinkedIn profiles
     • Leverage the “specialties” section
     • Detect comma-separated lists and extract skill phrases
     • Use a stop-list and exclude other entities (e.g. companies, titles, degrees)
     • 150k skill phrases extracted after removing noisy long-tail phrases
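The extraction step on this slide can be sketched roughly as follows. This is a minimal illustration, not LinkedIn's pipeline: the stop-list and excluded-entity sets are tiny hypothetical placeholders, a field counts as a "list" only if it splits into at least two comma-separated parts, and the `min_count` threshold used to drop long-tail phrases is an assumed parameter.

```python
from collections import Counter

# Hypothetical stand-ins: the real stop-list and entity dictionaries
# (companies, titles, degrees) are not described in the talk.
STOP_PHRASES = {"and more", "etc", "others"}
EXCLUDED_ENTITIES = {"linkedin", "google", "product manager", "mba"}


def extract_skill_phrases(specialties):
    """Split a free-text "specialties" field into candidate skill phrases.

    Only fields that look like comma-separated lists are used; a field that
    does not split into at least two parts (e.g. a prose sentence) is skipped.
    """
    parts = [p.strip(" .").lower() for p in specialties.split(",")]
    parts = [p for p in parts if p]
    if len(parts) < 2:
        return []
    return [p for p in parts
            if p not in STOP_PHRASES and p not in EXCLUDED_ENTITIES]


def build_phrase_counts(all_specialties, min_count=2):
    """Count how many profiles use each phrase, then drop the long tail."""
    counts = Counter()
    for text in all_specialties:
        counts.update(set(extract_skill_phrases(text)))
    return {p: c for p, c in counts.items() if c >= min_count}
```

For example, `build_phrase_counts(["Java, Hadoop, machine learning", "Java, Python, LinkedIn", "Python, java"])` keeps only `java` and `python`: every other phrase appears on a single profile or is an excluded entity.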
  7. Disambiguation
     • Need to add context to differentiate skill phrases with multiple meanings (e.g. NLP = Natural Language Processing vs. NLP = Neuro-linguistic programming)
     • Different meanings have different sets of related phrases
     • Use Jaccard similarity on LinkedIn profiles to find related phrases, then SVD + k-means to identify clusters of phrases
     References: R. Baeza-Yates, B. Ribeiro-Neto, et al. Modern Information Retrieval, volume 463.
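The related-phrase step can be illustrated with Jaccard similarity between the sets of profiles on which two phrases appear. This sketch is an assumption about how that similarity might be computed; the toy profile IDs are invented, and the subsequent SVD + k-means clustering from the slide is not shown.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_phrases(phrase_to_profiles, phrase, k=5):
    """Rank other phrases by the Jaccard similarity of their profile sets.

    phrase_to_profiles maps each skill phrase to the set of profile IDs
    whose specialties contain it.
    """
    target = phrase_to_profiles[phrase]
    scored = [(other, jaccard(target, profiles))
              for other, profiles in phrase_to_profiles.items()
              if other != phrase]
    scored = [(other, sim) for other, sim in scored if sim > 0]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]
```

With a toy mapping in which "nlp" shares profiles with both "text mining" and "life coaching", both land among its nearest neighbors. A neighbor set that mixes unrelated topics like this is the signal that the clustering step should split the phrase into separate senses.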
  8. De-duplication
     • Need to group phrases with similar meaning together. Examples:
       – Acronyms: B2B, Business to Business
       – Synonyms: Java Programming, Java Development
       – Typos: Government Liason (for “Government Liaison”)
     • Many of the skill phrases could be tied to a Wikipedia page
     • Built a Mechanical Turk (www.mturk.com) task to find the Wikipedia page associated with a skill phrase
     [Figure: “Java programming”, “Java development”, and “Java” all linked to http://en.wikipedia.org/wiki/Java_(programming_language), forming one cluster]
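Once crowd workers have linked phrases to Wikipedia pages, merging is essentially a group-by on the page URL. In this sketch `phrase_to_url` stands in for the MTurk answers (its format is assumed), phrases without a page stay as singletons, and picking the most frequent phrase as the cluster's canonical name is a hypothetical rule, not something the talk specifies.

```python
from collections import defaultdict


def group_by_wikipedia(phrase_to_url, phrase_counts):
    """Merge phrases that were linked to the same Wikipedia page.

    phrase_to_url: phrase -> page URL (missing or None when no page was found),
        standing in for the crowd-sourced answers.
    phrase_counts: phrase -> number of profiles using it.
    Returns {canonical phrase: sorted member phrases}.
    """
    clusters = defaultdict(list)
    for phrase in phrase_counts:
        # Phrases without a page get a unique key, so they stay on their own.
        key = phrase_to_url.get(phrase) or ("no-page", phrase)
        clusters[key].append(phrase)

    result = {}
    for members in clusters.values():
        # Hypothetical rule: the most used phrase names the cluster
        # (alphabetical order breaks ties).
        canonical = min(members, key=lambda p: (-phrase_counts[p], p))
        result[canonical] = sorted(members)
    return result
```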
  9. Folksonomy creation summary
     • Extraction based on 12M LinkedIn profiles with “specialties”
     • Extracted 150k skill phrases
     • Clustered related phrases to add industry context to ambiguous phrases
     • De-duplication using MTurk
     • Final master list contains 50k skills
     [Figure: Examples of synonyms of “Microsoft Office”]
  10. Inference and Recommendation
  11. Skills Inference and Recommendation
     • The goal was to boost skills adoption with a recommender system: “suggested skills”
     • Inferring the skills members have is similar to discovering latent attributes in profiles
     • Develop a collaborative filtering solution using profile attributes
     [Figures: Skills Typeahead on LinkedIn; Suggested Skills]
     References:
       – A. Mislove et al. You are who you know: Inferring user profiles in online social networks.
       – R. Jäschke et al. Tag recommendations in folksonomies.
  12. Features
     • Large number of standardized profile attributes (i.e. each can be represented by a unique identifier)
     • Members with similar profile attributes are likely to have similar skills (e.g. if you work at Apple, you probably know “Mac OS”)
       Type                        | Example                  | Cardinality
       Title (Headline)            | Product Manager          | Thousands
       Function                    | Engineering              | Dozens
       Industry                    | Healthcare               | Dozens
       Title (Employment Position) | Product Manager          | Thousands
       Company                     | LinkedIn                 | Millions
       Group membership            | Healthcare Professionals | Millions
       Skills                      | Matlab                   | Thousands
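Because every attribute in the feature table is standardized, a profile can be turned into a set of discrete tokens. The field names and the `type:id` token format below are hypothetical; the point is that the same identifier in two different fields (for example the same title used as headline and as position title) becomes two distinct features, matching the separate rows in the table.

```python
# Hypothetical field names, one per row of the feature table.
FEATURE_TYPES = ("headline_title", "function", "industry",
                 "position_title", "company", "group", "skill")


def profile_features(profile):
    """Convert standardized attribute IDs into a set of "type:id" tokens.

    Single-valued fields hold one ID; multi-valued fields (groups, skills,
    past positions) hold a list. Missing or None fields are skipped.
    """
    tokens = set()
    for ftype in FEATURE_TYPES:
        value = profile.get(ftype)
        if value is None:
            continue
        values = value if isinstance(value, (list, tuple, set)) else [value]
        for v in values:
            tokens.add(f"{ftype}:{v}")
    return frozenset(tokens)
```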
  13. Problem
     • Calculate the likelihood that a member has a given skill, given their profile attributes
     • No direct user similarity metric
     • Large number of features (e.g. 3M companies) and 50k classes
     • Estimate $P(s \mid A)$ for each skill $s \in S$, where $A$ is the set of profile attributes and $S$ is the folksonomy of skills
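Writing $A$ for a member's set of profile attributes and $s \in S$ for a candidate skill (notation chosen here for illustration), Bayes' rule plus the conditional-independence assumption of a Naïve Bayes classifier turns the problem into a product of per-attribute terms:

```latex
P(s \mid A) = \frac{P(s)\,P(A \mid s)}{P(A)}
  \;\propto\; P(s) \prod_{a \in A} P(a \mid s),
\qquad
\log P(s \mid A) = \log P(s) + \sum_{a \in A} \log P(a \mid s) + \text{const.}
```

Since $P(A)$ does not depend on $s$, it can be dropped when ranking skills for one member, and the log form avoids numerical underflow when $A$ contains many attributes.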
  14. Naïve Bayes Classifier
     • Used a Naïve Bayes classifier to produce inferred skills
     • Training data based on members who already list skills
     • Result is a ranking of inferred skills, which can be used directly in “suggested skills”
     • Evaluation methodology:
       – AUC for each skill
       – Precision@k (P@k) and recall for evaluating the recommendations
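A compact version of such a classifier is sketched below. The slide does not say which Naïve Bayes variant or smoothing was used; this sketch assumes a multinomial model over attribute tokens with add-alpha (Laplace) smoothing, where every skill a training member lists counts as one example of that class. Skills the member already lists are excluded from the suggestions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSkills:
    """Multinomial Naive Bayes with one class per skill (a sketch).

    Each (member, listed skill) pair is one training example whose features
    are the member's attribute tokens. Add-alpha smoothing is an assumption.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.skill_counts = Counter()               # skill -> members listing it
        self.feature_counts = defaultdict(Counter)  # skill -> token -> count
        self.feature_totals = Counter()             # skill -> total token count
        self.vocabulary = set()
        self.n_examples = 0

    def fit(self, members):
        """members: iterable of (feature_set, skill_set) pairs."""
        for features, skills in members:
            self.vocabulary.update(features)
            for skill in skills:
                self.n_examples += 1
                self.skill_counts[skill] += 1
                self.feature_counts[skill].update(features)
                self.feature_totals[skill] += len(features)
        return self

    def log_score(self, features, skill):
        """log P(s) + sum of log P(a | s) over known tokens a, up to a constant."""
        score = math.log(self.skill_counts[skill] / self.n_examples)
        denom = self.feature_totals[skill] + self.alpha * len(self.vocabulary)
        for a in features:
            if a in self.vocabulary:  # tokens never seen in training are ignored
                score += math.log((self.feature_counts[skill][a] + self.alpha) / denom)
        return score

    def suggest(self, features, existing_skills=(), k=5):
        """Top-k skills the member does not list yet, best first."""
        candidates = [s for s in self.skill_counts if s not in existing_skills]
        candidates.sort(key=lambda s: (-self.log_score(features, s), s))
        return candidates[:k]
```

In a toy training set where two Apple employees list "mac os", a new member with features `{"company:apple", "function:eng"}` gets "mac os" ranked first, echoing the Apple example on the features slide.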
  15. Evaluation
     • Evaluate how well we can predict which skills members have
     [Figures: ROC of skill “Hadoop”; Distribution of ROC across all skills]
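The offline metrics named in the evaluation methodology can be written as small functions. `auc` uses the pairwise-ranking definition of the area under the ROC curve (ties count one half), which is quadratic and only suitable for small samples. Treating a member's held-out listed skills as the `relevant` set for P@k and recall is an assumption of this sketch.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via its pairwise definition.

    pos_scores: model scores for members who have the skill;
    neg_scores: scores for members who do not. Ties count one half.
    Quadratic in the number of members, so only for small samples.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k suggestions that are relevant."""
    return sum(1 for s in ranked[:k] if s in relevant) / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant skills that appear in the top-k suggestions."""
    if not relevant:
        return 0.0
    return sum(1 for s in ranked[:k] if s in relevant) / len(relevant)
```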
  16. Results
     • 12X improvement in conversion using “suggested skills”
     [Figure panels: Without “suggested skills”; With “suggested skills”]
  17. Our Contributions
     • End-to-end creation of a skills folksonomy based on the free-text specialties section
     • Efficient inferred-skills model with good offline performance
     • Skills recommender system based on profile attributes
  18. Thank You