The document summarizes the Lunatic Goats @PoliMi team's approach for the ACM RecSys Challenge 2017 on cold-start job recommendations. Their solution uses an ensemble of content-based filtering, profile matching, and collaborative filtering algorithms. It discusses the data analysis, local validation process, individual algorithm implementations and tuning, ensemble structure, parameter tuning methodology, online changes, system architecture, and final scores in the local and leaderboard evaluations.
Content-Based approaches for Cold-Start Job Recommendations
1. Content-Based approaches for Cold-Start Job Recommendations
ACM RecSys Challenge 2017
Lunatic Goats @PoliMi
M. Bianchi, F. Cesaro, F. Ciceri, M. Dagrada, A. Gasparin, D. Grattarola,
I. Inajjar, A. M. Metelli, L. Cella
2. Lunatic Goats @PoliMi
Task Outline
● Cold Start recommendation scenario:
○ job posting recommendations;
○ focus on getting positive interactions;
○ negative interactions are penalized;
○ recruiter interest is rewarded.
● Two phases:
○ Offline - predictions for fixed sets of items and users.
○ Online - daily recommendations to variable sets of users.
3. Lunatic Goats @PoliMi
Data Analysis - Impressions vs Interactions
● Impressions: ~97% of the data, carrying little to no information (discarded).
● Interactions: ~3% of the data.
● Interactions are divided into:
○ positive interactions (types 1, 2 and 3);
○ negative interactions (type 4);
○ recruiter interest (type 5).
● Interactions are treated as implicit feedback.
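The grouping above can be sketched as follows. The `(user, item, type)` tuple layout and all names are illustrative assumptions, not the team's code; only the mapping of types 1-5 comes from the slide.

```python
# Hypothetical interaction log entries: (user_id, item_id, interaction_type).
POSITIVE_TYPES = {1, 2, 3}
NEGATIVE_TYPES = {4}
RECRUITER_TYPES = {5}

def split_implicit(interactions):
    """Group interactions into binary (implicit) user-item sets."""
    groups = {"positive": set(), "negative": set(), "recruiter": set()}
    for user, item, kind in interactions:
        if kind in POSITIVE_TYPES:
            groups["positive"].add((user, item))
        elif kind in NEGATIVE_TYPES:
            groups["negative"].add((user, item))
        elif kind in RECRUITER_TYPES:
            groups["recruiter"].add((user, item))
        # Any other type (e.g. impressions) is discarded.
    return groups

log = [(1, 10, 1), (1, 10, 3), (2, 10, 4), (3, 11, 5), (4, 12, 0)]
groups = split_implicit(log)
```

Storing pairs in sets makes the feedback binary: repeated interactions of the same kind between a user and an item count once.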
4. Lunatic Goats @PoliMi
Local Validation
● Split the dataset into training and validation sets.
● Random sampling procedure:
○ randomly select target items from the dataset;
○ remove all interactions with these items;
○ pick target users as a subset of those who have interactions with these items.
● Preserve the user-item ratio.
● No cross-validation: the dataset is too large.
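A minimal sketch of the sampling procedure above, assuming interactions are stored as `(user, item)` pairs. The function name and parameters are hypothetical; `n_target_items` and `n_target_users` would be chosen so that the user-item ratio of the real challenge targets is preserved.

```python
import random

def make_validation_split(interactions, n_target_items, n_target_users, seed=0):
    """Hold out all interactions of randomly chosen items (simulated cold start)."""
    rng = random.Random(seed)
    items = sorted({item for _, item in interactions})
    target_items = set(rng.sample(items, n_target_items))

    # Training data keeps no interaction with a target item.
    train = [(u, i) for u, i in interactions if i not in target_items]
    held_out = [(u, i) for u, i in interactions if i in target_items]

    # Target users are drawn only from users who interacted with target items.
    candidates = sorted({u for u, _ in held_out})
    target_users = set(rng.sample(candidates, min(n_target_users, len(candidates))))
    validation = [(u, i) for u, i in held_out if u in target_users]
    return train, validation, target_items, target_users

data = [(u, i) for u in range(20) for i in range(10) if (u + i) % 3 == 0]
train, validation, target_items, target_users = make_validation_split(data, 3, 5)
```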
5. Lunatic Goats @PoliMi
Solution - Preprocessing
● One-hot encoding of both user and item features.
● Feature aggregation.
● TF-IDF application.
● Negative user filtering: removing heavy deleters.
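The TF-IDF step could look like the sketch below. The slides do not state which TF-IDF variant was used, so the plain `log(N / df)` weighting and the `name=value` feature strings are assumptions for illustration.

```python
import math
from collections import Counter

def tfidf_weights(feature_sets):
    """Weight one-hot features by inverse document frequency.

    feature_sets maps an entity id to its set of active features.
    With binary features the term frequency is 1, so the weight is the IDF.
    """
    n = len(feature_sets)
    doc_freq = Counter(f for feats in feature_sets.values() for f in feats)
    idf = {f: math.log(n / df) for f, df in doc_freq.items()}
    return {eid: {f: idf[f] for f in feats} for eid, feats in feature_sets.items()}

items = {
    "a": {"country=de", "tag=python"},
    "b": {"country=de", "tag=java"},
    "c": {"country=de", "tag=python"},
}
weighted = tfidf_weights(items)
```

A feature shared by every item (here `country=de`) receives weight zero, while rarer features dominate the similarity computations that follow.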
7. Lunatic Goats @PoliMi
Solution - Negative Recommendation
● The scoring function heavily penalizes negative (type 4) interactions.
● Predict type 4 interactions with a CBF approach.
● Add these predictions to the ensemble with a negative weight.
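As a rough illustration of the negative weighting, assume each algorithm returns a per-user dictionary of item scores. The weight value below is hypothetical; in the actual system it would come out of the ensemble tuning.

```python
def combine_scores(positive, negative, negative_weight=0.5):
    """Subtract weighted negative-interaction predictions from positive ones.

    positive and negative map an item id to a score for a single user.
    negative_weight is an illustrative value, not the tuned one.
    """
    items = set(positive) | set(negative)
    return {
        i: positive.get(i, 0.0) - negative_weight * negative.get(i, 0.0)
        for i in items
    }

cbf_plus = {"job1": 0.9, "job2": 0.8}
cbf_minus = {"job2": 1.0, "job3": 0.4}
final = combine_scores(cbf_plus, cbf_minus)
ranking = sorted(final, key=final.get, reverse=True)
```

An item the user is predicted to delete is pushed down the ranking even when its positive score is high.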
8. Lunatic Goats @PoliMi
Solution – Content Based Filtering algorithms (CBF)
Recommend to a user items similar to the ones he/she likes.
● Run separately on positive (CBF+) and negative (CBF-) interactions.
● Tanimoto similarity between items.
● Recommendation performed for filtered users only.
● Penalize heavy clickers.
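The formula on the original slide is not preserved in this transcript. The sketch below uses the standard Tanimoto definition, T(a, b) = a·b / (|a|² + |b|² - a·b), on sparse feature dictionaries. Summing similarities over the user's liked items is an assumed aggregation, and the heavy-clicker penalty is not shown.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two sparse vectors stored as {feature: weight}."""
    dot = sum(w * b[f] for f, w in a.items() if f in b)
    norm_a = sum(w * w for w in a.values())
    norm_b = sum(w * w for w in b.values())
    denom = norm_a + norm_b - dot
    return dot / denom if denom > 0 else 0.0

def cbf_scores(liked_items, candidate_items, item_features):
    """Score each candidate by its summed similarity to the user's liked items."""
    return {
        c: sum(tanimoto(item_features[c], item_features[l]) for l in liked_items)
        for c in candidate_items
    }

features = {
    "old1": {"python": 1.0, "berlin": 1.0},
    "new1": {"python": 1.0, "berlin": 1.0, "senior": 1.0},
    "new2": {"java": 1.0},
}
scores = cbf_scores(["old1"], ["new1", "new2"], features)
```

Running the same routine on deleted items instead of liked items would give the CBF- scores.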
9. Lunatic Goats @PoliMi
Solution – Profile Matching (PM)
Recommend to a user items matching his/her profile.
● Cosine similarity between user and item.
● Items’ tags and titles are compared with users’ job roles.
● Recommendation performed for filtered users only.
● Unlike CBF, PM can also recommend to cold-start users.
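A minimal sketch of profile matching, assuming job roles, tags and titles are all represented as token weights in a shared vocabulary. The token ids and function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as {token: weight}."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm > 0 else 0.0

def profile_match(user_jobroles, items):
    """Rank items by similarity between their tags/titles and the user's job roles."""
    scores = {item: cosine(user_jobroles, tokens) for item, tokens in items.items()}
    return sorted(scores, key=scores.get, reverse=True)

user = {"101": 1.0, "202": 1.0}  # hypothetical job-role token ids
items = {
    "jobA": {"101": 1.0, "202": 1.0},
    "jobB": {"101": 1.0, "303": 1.0},
    "jobC": {"404": 1.0},
}
ranking = profile_match(user, items)
```

Because only the user's profile is needed, this works for users with no interaction history.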
10. Lunatic Goats @PoliMi
Solution – Collaborative Filtering algorithms
● CF cannot be run directly in a cold-start scenario.
● Content-based microclustering approach:
○ associate with each cold-start item the interactions of its 5 most CBF-similar non-cold-start items;
○ run standard CF algorithms.
● CF algorithms:
○ CF with item cosine similarity;
○ iALS (Implicit Alternating Least Squares).
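The microclustering step might be sketched as below. The Jaccard index stands in for the CBF similarity (on binary feature sets, Tanimoto reduces to Jaccard); the data layout and names are assumptions.

```python
def jaccard(a, b):
    """Set overlap, used here as a stand-in for the CBF item similarity."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def microcluster(cold_items, warm_items, features, interactions, k=5):
    """Give each cold-start item the users of its k most similar warm items."""
    borrowed = {}
    for cold in cold_items:
        neighbours = sorted(
            warm_items,
            key=lambda w: jaccard(features[cold], features[w]),
            reverse=True,
        )[:k]
        borrowed[cold] = set().union(*(interactions[w] for w in neighbours))
    return borrowed

features = {
    "new": {"python", "berlin"},
    "w1": {"python", "berlin"},
    "w2": {"python"},
    "w3": {"java"},
}
interactions = {"w1": {1, 2}, "w2": {3}, "w3": {9}}
borrowed = microcluster(["new"], ["w1", "w2", "w3"], features, interactions, k=2)
```

Once every cold-start item has borrowed interactions, a standard CF algorithm can treat it like any other item.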
11. Lunatic Goats @PoliMi
Solution - Ensemble Structure
● Divide algorithms by nature.
● Normalize and weight each layer.
● Generate upper layers by adding lower layers.
● Output 100 best scores.
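One way to picture the layered ensemble, for a single user, is sketched below. The normalization type is not given on the slide, so max-normalization is an assumption, as are the weights and block names.

```python
def normalize(scores):
    """Scale one user's scores so that the maximum is 1 (one possible choice)."""
    top = max(scores.values(), default=0.0)
    return {i: s / top for i, s in scores.items()} if top > 0 else dict(scores)

def merge_layer(blocks, weights):
    """Upper layer = weighted sum of the normalized lower-layer blocks."""
    merged = {}
    for block, weight in zip(blocks, weights):
        for item, score in normalize(block).items():
            merged[item] = merged.get(item, 0.0) + weight * score
    return merged

def top_n(scores, n=100):
    """Return the n best-scoring item ids."""
    return sorted(scores, key=scores.get, reverse=True)[:n]

cbf = {"a": 4.0, "b": 2.0}
pm = {"b": 0.5, "c": 0.5}
content_layer = merge_layer([cbf, pm], [1.0, 0.4])
recommendations = top_n(content_layer)
```

The output of `merge_layer` is itself a score dictionary, so it can be fed into another `merge_layer` call to build the next layer up.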
12. Lunatic Goats @PoliMi
Solution - Parameter Tuning
● Ensemble tuning:
○ 9 weights (one for each block), reduced to 6 due to normalization;
○ non-differentiable scoring function;
○ gradient-free optimization methods:
■ Genetic Algorithms - quick and acceptable results;
■ Powell’s Conjugate Direction method - slower but superior results.
● Individual algorithms tuning:
○ greedy search on local test.
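To make the gradient-free idea concrete, here is a toy genetic search over ensemble weights. It is a sketch under stated assumptions, not the team's implementation: the population size, mutation scale and the piecewise-constant `toy_score` (standing in for the challenge metric on the local validation set) are all invented for illustration.

```python
import random

def genetic_search(score_fn, n_weights, pop_size=20, generations=30, seed=0):
    """Maximize score_fn over weight vectors in [0, 1] without gradients."""
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_weights)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score_fn, reverse=True)
        parents = population[: pop_size // 2]  # keep the best half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation, clipped to [0, 1].
            child = [rng.choice(pair) for pair in zip(mother, father)]
            child = [min(1.0, max(0.0, w + rng.gauss(0, 0.1))) for w in child]
            children.append(child)
        population = parents + children
    return max(population, key=score_fn)

def toy_score(weights):
    """Piecewise-constant toy objective: counts weights close to hidden targets."""
    targets = (0.2, 0.8, 0.5)
    return sum(1 for w, t in zip(weights, targets) if abs(w - t) < 0.1)

best = genetic_search(toy_score, n_weights=3)
```

Because the best half of each generation survives unchanged, the best score never decreases. Powell's method, the other optimizer named above, is available for instance through `scipy.optimize.minimize` with `method="Powell"`.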
13. Lunatic Goats @PoliMi
Online - Changes to ensemble
● Normalization type.
● Cutting for each user before items.
● Excluding slower algorithms: pushing recommendations promptly gives more exposure, hence better scores.
14. Lunatic Goats @PoliMi
Architecture & Runtime
● The recommender runs on VMs with 8 cores and 16 GB RAM.
● The only exceptions are content-based microclustering and iALS, which run on a machine with 8 cores and 64 GB RAM.
● The code is heavily optimized to use memory efficiently (sparse matrix representations, efficient matrix operations).
● This results in short runtimes.
15. Lunatic Goats @PoliMi
Scores - Local vs Offline
Algorithm       Local score   Leaderboard score   Execution time
CBF+            57852         60257               13 min
CBF-            -1330         -8529               4 min
PM              17260         16777               7 min
CF              42213         39250               12 min
iALS            48081         52411               150 min
XING Baseline   14742         14395               40 min
Ensemble        60625         71372               2 min
16. Lunatic Goats @PoliMi
Results and Conclusions
● 2nd place in the online phase;
● 1st place in the offline phase.
● Points of strength:
○ speed (in particular offline ~20 min);
○ ease of implementation.
● Extensions:
○ feature weighting (user personalized, feature interaction);
○ time decay models.