Feature Selection Algorithms for Learning to Rank
Andrea Gigli
Slides: http://www.slideshare.net/andrgig
March 2016
Outline
 Machine Learning for Ranking
 Proposed Feature Selection Algorithms (FSA) and Feature Selection Protocol
 Application to Publicly Available Web Search Data
Information Retrieval and Ranking Systems
 Information Retrieval is the activity of providing information offers relevant to an information need from a collection of information resources.
 Ranking consists of sorting the information offers according to some criterion, so that the "best" results appear early in the returned list.
Information Retrieval and Ranking Systems
(Diagram: an information request (query) is processed against the indexed documents, and the ranking system returns the (top) ranked documents from the information offer.)
How to Rank
 Compute numeric scores on query/document pairs: Cosine Similarity, BM25 score, LMIR probabilities, …
 Use Machine Learning to build a ranking model: Learning to Rank (L2R)
(Diagram: an information request (query) and a set of information offers (documents A, B, C, …) enter the ranking system, which returns a ranked list.)
How to Rank using Supervised Learning
(Diagram: at training time, the learning system receives queries $q_1, \ldots, q_M$, each with its documents $d_{i,1}, \ldots, d_{i,N_i}$ and observed labels $\ell_{i,1}, \ldots, \ell_{i,N_i}$, and learns a scoring function $f(q, d)$; at prediction time, the ranking system applies $f(q_{M+1}, d)$ to the indexed documents of a new query $q_{M+1}$.)
Notation:
 $q_i$: the i-th query
 $d_{i,j}$: the j-th document associated with the i-th query
 $\ell_{i,j}$: the observed score of the j-th document associated with the i-th query
 $f(\boldsymbol{q}, \boldsymbol{o})$: the scoring function
Machine Learning for Ranking Documents: Application Fields (figure)
Machine Learning for Ranking Documents: Business Cases (figure)
Outline
 Machine Learning for Ranking
 Proposed Feature Selection Algorithms (FSA)
and Feature Selection Protocol
 Application to Publicly Available Web Search Data
Query & Information Offer Features
Each query $q_i$ comes with documents $d_{i,1}, d_{i,2}, \ldots, d_{i,N_i}$ and labels $\ell_{i,1}, \ell_{i,2}, \ldots, \ell_{i,N_i}$; every query/document pair is described by a feature vector
$$\boldsymbol{x}_{i,j} = \left( x_{i,j}^{(1)}, x_{i,j}^{(2)}, x_{i,j}^{(3)}, \ldots, x_{i,j}^{(F)} \right)$$
 $f(\boldsymbol{q}, \boldsymbol{o}) \rightarrow f(\boldsymbol{x})$
 $F$ is of the order of hundreds or thousands
Which features?
Case: Web Search
 Query-URL matching features: number of occurrences of query terms in the document, BM25, N-gram BM25, Tf-Idf, …
 URL importance features: PageRank, number of in-links, number of clicks, BrowseRank, spam score, page quality score, …
Case: Online Advertisement
 User features: last page visited, time since the last visit, last advertisement clicked, products queried, …
 Product features: product description, product category, price, …
 User-product matching features: tf-idf, expected rating, …
 Page-product matching features: topic, category, tf-idf, …
Case: Collaborative Filtering
 User features: age, gender, consumption history, …
 Product characteristics: category, price, description, …
 Context-product matching: tag matching, tf-idf, …
How to select features in L2R
 The main goal of any feature selection process is to select a subset of n elements from a set of N measurements, with n < N, without significantly degrading the performance of the system.
 Searching for the optimal subset requires examining $2^N$ possible subsets.
How to select features in L2R
(Plot: the number of possible feature subsets, $2^N$, against the number of features N; it grows exponentially, exceeding 8,000,000 already at N = 23.)
A suboptimal criterion is needed.
Proposed Protocol for Comparing Feature Selection Algorithms
1. Measure the relevance of each feature.
2. Measure the similarity of each pair of features.
3. Select a feature subset using a feature selector.
4. Train the L2R model.
5. Measure the L2R model performance on the test set.
6. Compare the feature selection algorithms.
Repeat for different subset sizes, and repeat from step 3 for every feature selection algorithm, as sketched below.
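To make the looping structure concrete, here is a minimal Python sketch of steps 1–3 on synthetic data; using |Spearman| for both relevance and similarity anticipates choices discussed later in the deck, and the top-relevance selector is only a stand-in for the algorithms introduced next.

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 12))        # 500 query/document rows, 12 features
y = rng.integers(0, 5, size=500)      # graded relevance labels 0..4

# Step 1: relevance of each feature (here: |Spearman| against the label)
relevance = np.array([abs(spearmanr(X[:, i], y).correlation) for i in range(X.shape[1])])

# Step 2: similarity of each pair of features
similarity = np.abs(spearmanr(X).correlation)   # 12 x 12 matrix of |rho|

# Step 3: select a subset with a feature selector (naive top-relevance stand-in)
subset = np.argsort(-relevance)[:6]

# Steps 4-6: train the L2R model on X[:, subset], measure NDCG@10 on the test
# set, and repeat from step 3 for each subset size and selection algorithm.
```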
Competing Algorithms for Feature Selection
We developed the following algorithms:
 Naïve Greedy search Algorithm for feature Selection (NGAS)
 Naïve Greedy search Algorithm for feature Selection - Extended (NGAS-E)
 Hierarchical Clustering search Algorithm for feature Selection (HCAS)
Competing Algorithm for Feature Selection #1: NGAS
(Illustrated step by step on a complete undirected graph of eight feature nodes.)
1. Build the undirected graph and initialize the set S of selected features.
2. Assuming Node 1 has the highest relevance, add it to S.
3. Select the node with the lowest similarity to Node 1, say Node 7, and the one with the highest similarity to Node 7, say Node 5.
4. Remove Node 1. Node 5 has the higher relevance between 5 and 7; add it to S.
5. Select the node with the lowest similarity to Node 5, say Node 2, and the one with the highest similarity to Node 2, say Node 3.
6. Remove Node 5. Assuming Node 2 has the higher relevance between 2 and 3, add it to S.
7. Select the node with the lowest similarity to Node 2, say Node 4, and the one with the highest similarity to Node 4, say Node 8.
8. Remove Node 2. Assuming Node 4 has the higher relevance between 4 and 8, add it to S.
9. Select the node with the lowest similarity to Node 4, say Node 6, and the one with the highest similarity to Node 6, say Node 7.
10. Remove Node 4. Assuming Node 6 has the higher relevance between 6 and 7, add it to S.
11. Select the node with the lowest similarity to Node 6, say Node 3, and the one with the highest similarity to Node 3, say Node 8.
12. Remove Node 6. Assuming Node 3 has the higher relevance between 3 and 8, add it to S.
13. Select the node with the lowest similarity to Node 3, say Node 8, and the one with the highest similarity to Node 8, say Node 7.
14. Remove Node 3. Assuming Node 8 has the higher relevance between 8 and 7, add it to S.
15. Add the last node, 7, to S.
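The walk above can be condensed into a few lines. Below is a minimal Python sketch of NGAS under the semantics just described; the function signature and the array-based inputs (a relevance vector and a symmetric similarity matrix) are our own framing, not part of the original deck.

```python
import numpy as np

def ngas(relevance, similarity):
    """NGAS sketch: returns all features in selection order; keeping the
    first n elements gives a subset of size n."""
    relevance = np.asarray(relevance, dtype=float)
    remaining = set(range(len(relevance)))
    selected = []
    current = max(remaining, key=lambda i: relevance[i])    # highest-relevance node
    while len(remaining) > 2:
        # node least similar to the current one ...
        a = min(remaining - {current}, key=lambda j: similarity[current, j])
        # ... and the node most similar to that one
        b = max(remaining - {current, a}, key=lambda j: similarity[a, j])
        selected.append(current)
        remaining.remove(current)                           # remove the current node
        current = a if relevance[a] >= relevance[b] else b  # keep the more relevant
    selected.append(current)
    remaining.remove(current)
    selected.extend(remaining)                              # the last node
    return selected
```

Taking the first n elements of the returned list gives the size-n subset used in step 3 of the protocol.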
Competing Algorithm for Feature Selection #2: NGAS-E (p = 50%)
1. Build the undirected graph and initialize the set S of selected features.
2. Assuming Node 1 has the highest relevance, add it to S.
3. Select the ⌈7 · 50%⌉ nodes least similar to Node 1.
4. Remove Node 1 from the graph. Among the selected nodes, add the one with the highest relevance (say Node 5) to S.
5. Select the ⌈6 · 50%⌉ nodes least similar to Node 5.
6. Remove Node 5 from the graph. Among the selected nodes, add the one with the highest relevance (say Node 3) to S.
7. Select the ⌈5 · 50%⌉ nodes least similar to Node 3.
8. Remove Node 3 from the graph. Among the selected nodes, add the one with the highest relevance (say Node 4) to S.
9. Select the ⌈4 · 50%⌉ nodes least similar to Node 4.
10. Remove Node 4 from the graph. Among the selected nodes, add the one with the highest relevance (say Node 6) to S.
11. Select the ⌈3 · 50%⌉ nodes least similar to Node 6.
12. Remove Node 6 from the graph. Among the selected nodes, add the one with the highest relevance (say Node 2) to S.
13. Select the ⌈2 · 50%⌉ nodes least similar to Node 2.
14. Remove Node 2 from the graph and add Node 8 to S.
15. Remove Node 8 from the graph and add the last node, 7, to S.
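A matching sketch of NGAS-E, with p as the fraction of least-similar candidates retained at each step; again the interface is of our own making.

```python
import math
import numpy as np

def ngas_e(relevance, similarity, p=0.5):
    """NGAS-E sketch: like NGAS, but at each step the next feature is the most
    relevant among the ceil(|remaining| * p) nodes least similar to the
    current one."""
    relevance = np.asarray(relevance, dtype=float)
    remaining = set(range(len(relevance)))
    selected = []
    current = max(remaining, key=lambda i: relevance[i])   # highest-relevance node
    while remaining:
        selected.append(current)
        remaining.remove(current)                          # remove from the graph
        if not remaining:
            break
        k = math.ceil(len(remaining) * p)
        # the ceil(|remaining| * p) nodes least similar to the current one
        candidates = sorted(remaining, key=lambda j: similarity[current, j])[:k]
        current = max(candidates, key=lambda j: relevance[j])  # most relevant candidate
    return selected
```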
Competing Algorithm for Feature Selection #3: HCAS
(Figure: a dendrogram from hierarchical clustering of the features by pairwise similarity; the numbers 23, 11, 5 and 3 mark cuts yielding subsets of different sizes.)
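The deck illustrates HCAS only with a dendrogram, so the following is a sketch of the idea it suggests rather than a faithful reimplementation: cluster the features hierarchically on a distance derived from their pairwise similarity, cut the tree into n clusters, and keep the most relevant feature of each cluster. The "single" and "ward" linkage options mirror the two HCAS variants reported in the results later on; the similarity-to-distance mapping is an assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def hcas(relevance, similarity, n, method="single"):
    """HCAS sketch: select n features, one per cluster of similar features."""
    relevance = np.asarray(relevance, dtype=float)
    distance = 1.0 - np.abs(np.asarray(similarity, dtype=float))  # assumed mapping
    np.fill_diagonal(distance, 0.0)
    tree = linkage(squareform(distance, checks=False), method=method)
    labels = fcluster(tree, t=n, criterion="maxclust")    # cut into n clusters
    selected = []
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        selected.append(int(members[np.argmax(relevance[members])]))  # best of cluster
    return selected
```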
Outline
 Machine Learning for Ranking
 Proposed Feature Selection Algorithms (FSA)
and Feature Selection Protocol
 Application to Publicly Available Web Search Data
Application to Web Search Engine Data
 Bing data: http://research.microsoft.com/en-us/projects/mslr/
 Yahoo! data: http://webscope.sandbox.yahoo.com

Yahoo! dataset:
             Train     Validation   Test
  #queries   19,944    2,994        6,983
  #urls      473,134   71,083       165,660
  #features  519

Bing dataset:
             Train     Validation   Test
  #queries   18,919    6,306        6,306
  #urls      723,412   235,259      241,521
  #features  136
Proposed Protocol for Comparing Feature Selection Algorithms (recap; the next slides cover the L2R model used in step 4).
Learning to Rank Algorithms: a timeline of major contributions
(Timeline figure spanning 2000–2010; algorithms shown, classified as pointwise, pairwise or listwise: Pranking, RankSVM, RankBoost, RankNet, IR-SVM, AdaRank, GBRank, ListNet, McRank, QBRank, RankCosine, RankGP, RankRLS, SVMmap, FRank, ListMLE, PermuRank, SoftRank, SSRankBoost, RR, SortNet, MPBoost, BoltzRank, BayesRank, NDCG Boost, GBlend, IntervalRank, CRR, LambdaRank, LambdaMART.)
LambdaMART Model for LtR
LambdaMART combines Multiple Additive Regression Trees with the lambda function of LambdaRank. Key traits:
 Ensemble method: tree boosting
 Works even though the rank-based loss function is not differentiable (the lambdas stand in for its gradient)
 Sorting characteristic
 Speed
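As a concrete stand-in for training such a model, the sketch below uses LightGBM's LGBMRanker, whose lambdarank objective is a LambdaMART-style implementation; the synthetic data and the parameter values are illustrative only, not the deck's experimental setup.

```python
import numpy as np
from lightgbm import LGBMRanker

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))           # 20 query/document features
y = rng.integers(0, 5, size=1000)         # graded relevance labels 0..4
group = [50] * 20                         # 20 queries with 50 documents each

model = LGBMRanker(objective="lambdarank", n_estimators=100)
model.fit(X, y, group=group)              # gradient-boosted trees + lambda gradients
scores = model.predict(X[:50])            # scores used to sort one query's documents
```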
Proposed Protocol for Comparing Feature Selection Algorithms (recap; next: step 1, measuring feature relevance).
Feature Relevance
 The relevance of a document is measured with a categorical variable (0, 1, 2, 3, 4), so we need metrics that are good at measuring "dependence" between discrete/continuous feature variables and a categorical label variable.
 In the following we use:
 Normalized Mutual Information (NMI)
 Spearman coefficient (S)
 Kendall's tau (K)
 Average Group Variance (AGV)
 One-Variable NDCG@10 (1VNDCG)
Feature Relevance via Normalized Mutual Information
 Mutual Information (MI) measures how much, on average, the realization of a random variable X tells us about the realization of the random variable Y, i.e. how much the entropy of Y, H(Y), is reduced by knowing the realization of X:
$$MI(X, Y) = H(X) - H(X \mid Y) = H(X) + H(Y) - H(X, Y)$$
The normalized version is
$$NMI(X, Y) = \frac{MI(X, Y)}{\sqrt{H(X)\, H(Y)}}$$
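A possible computation, assuming continuous features are first discretized by equal-frequency binning (the deck does not state its discretization); scikit-learn's "geometric" averaging matches the $\sqrt{H(X)H(Y)}$ normalization above.

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

def feature_nmi(feature_values, labels, n_bins=32):
    """NMI between a (possibly continuous) feature and the categorical label."""
    feature_values = np.asarray(feature_values)
    ranks = np.argsort(np.argsort(feature_values))    # dense ranks 0..n-1
    bins = ranks * n_bins // len(feature_values)      # equal-frequency bins
    return normalized_mutual_info_score(labels, bins, average_method="geometric")
```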
Feature Relevance via Spearman's coefficient
 Spearman's rank correlation coefficient is a non-parametric measure of statistical dependence between two random variables. It is given by
$$\rho = 1 - \frac{6 \sum_i d_i^2}{n(n^2 - 1)}$$
where n is the sample size and $d_i = \operatorname{rank}(x_i) - \operatorname{rank}(y_i)$.
Feature Relevance via Kendall's tau
 Kendall's tau is a measure of association defined on two ranking lists of length n. It is defined as
$$\tau = \frac{n_c - n_d}{\sqrt{\left(\frac{n(n-1)}{2} - n_1\right)\left(\frac{n(n-1)}{2} - n_2\right)}}$$
where $n_c$ denotes the number of concordant pairs between the two lists, $n_d$ the number of discordant pairs, $n_1 = \sum_i t_i(t_i - 1)/2$, $n_2 = \sum_j u_j(u_j - 1)/2$, $t_i$ is the number of tied values in the i-th group of ties for the first list, and $u_j$ is the number of tied values in the j-th group of ties for the second list.
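Both rank-based measures are available directly in SciPy; kendalltau computes the tie-corrected tau-b, which is exactly the formula above. The synthetic data below is only for illustration.

```python
import numpy as np
from scipy.stats import kendalltau, spearmanr

rng = np.random.default_rng(0)
feature = rng.normal(size=1000)            # one feature column
labels = rng.integers(0, 5, size=1000)     # graded relevance labels

rho, _ = spearmanr(feature, labels)        # Spearman's rho
tau, _ = kendalltau(feature, labels)       # Kendall's tau-b (tie-corrected)
print(f"S = {abs(rho):.3f}, K = {abs(tau):.3f}")   # absolute value as relevance strength
```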
Feature Relevance via Average Group Variance
 Average Group Variance measures the discrimination power of a feature. The intuitive justification is that a feature is useful if it is capable of discriminating a small portion of the ordered scale from the rest, and that features with a small variance are those which satisfy this property.
$$AGV = 1 - \frac{\sum_{g=1}^{5} n_g\,(\bar{x}_g - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}$$
where $n_g$ is the size of group g, $\bar{x}_g$ the sample mean of feature x in the g-th group, and $\bar{x}$ the whole-sample mean.
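A direct NumPy transcription of the formula, with the five relevance grades as groups:

```python
import numpy as np

def agv(feature_values, labels):
    """Average Group Variance: 1 minus the between-group share of total variance."""
    x = np.asarray(feature_values, dtype=float)
    y = np.asarray(labels)
    grand_mean = x.mean()
    between = sum(
        (y == g).sum() * (x[y == g].mean() - grand_mean) ** 2
        for g in np.unique(y)                 # one group per relevance grade
    )
    total = ((x - grand_mean) ** 2).sum()
    return 1.0 - between / total
```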
Feature Relevance via single-feature LambdaMART scoring
 For each feature i we run LambdaMART on that feature alone and compute $NDCG_{i,q}@10$ for each query q.
 The i-th feature relevance is measured by averaging $NDCG_{i,q}@10$ over the whole query set:
$$NDCG_i@10 = \frac{1}{|Q|} \sum_{q \in Q} NDCG_{i,q}@10$$
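A sketch of the idea with LightGBM and scikit-learn; the exponential gains $2^{rel} - 1$ match the DCG definition on the next slides, while evaluating in-sample and the specific hyperparameters are simplifications of ours.

```python
import numpy as np
from lightgbm import LGBMRanker
from sklearn.metrics import ndcg_score

def one_feature_ndcg10(X, y, groups, col):
    """1VNDCG sketch: train a ranker on feature `col` alone and average
    NDCG@10 over queries. `groups` lists each query's document count, in row order."""
    model = LGBMRanker(objective="lambdarank", n_estimators=50)
    model.fit(X[:, [col]], y, group=groups)
    scores = model.predict(X[:, [col]])
    ndcgs, start = [], 0
    for g in groups:                                   # one NDCG@10 per query
        true, pred = y[start:start + g], scores[start:start + g]
        ndcgs.append(ndcg_score([2.0 ** true - 1], [pred], k=10))  # 2^rel - 1 gains
        start += g
    return float(np.mean(ndcgs))
```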
How to Measure Ranking Performance on query i
 Precision at k:
$$P_i@k = \frac{\#\,\text{relevant documents in top } k \text{ results}}{k}$$
 Average precision:
$$AP_i = \frac{1}{\#\,\text{relevant documents}} \sum_{k=1}^{|D|} P_i@k \cdot \mathbb{I}(\text{document } k \text{ is relevant})$$
 Discounted Cumulative Gain:
$$DCG_i = \sum_{j=1}^{k} \frac{2^{rel_{i,j}} - 1}{\log_2(1 + rank_j)}$$
How to Measure Ranking Performance: Normalized DCG
Relevance ratings are mapped to gains: Perfect 2^5-1=31, Excellent 2^4-1=15, Good 2^3-1=7, Fair 2^2-1=3, Bad 2^1-1=1.

Example ranking (discounts 1/log2(1+rank), rounded to 1, 0.63, 0.5, 0.4):
Document     Gain   Cumulative gain   Discounted cumulative gain
Document 1   31     31                31x1 = 31
Document 2   3      34                31 + 3x0.63 = 32.9
Document 3   7      41                32.9 + 7x0.5 = 36.4
Document 4   31     72                36.4 + 31x0.4 = 48.8

Normalization: divide DCG by the ideal DCG, obtained by sorting documents by gain:
Document     Gain   Cumulative gain   Discounted cumulative gain
Document 1   31     31                31x1 = 31
Document 4   31     62                31 + 31x0.63 = 50.53
Document 3   7      69                50.53 + 7x0.5 = 54.03
Document 2   3      72                54.03 + 3x0.4 = 55.23
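The worked example can be checked with a few lines; with exact discounts $1/\log_2(1+\text{rank})$ the numbers differ slightly from the slides, which round the discounts.

```python
import numpy as np

def dcg(gains, k=10):
    """DCG of a gain list (gains already mapped with 2^rel - 1)."""
    gains = np.asarray(gains, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, gains.size + 2))  # 1, 0.63, 0.5, 0.43, ...
    return float((gains * discounts).sum())

ranking = [31, 3, 7, 31]                  # gains in the order the system ranked them
ideal = sorted(ranking, reverse=True)     # [31, 31, 7, 3]
print(dcg(ranking) / dcg(ideal))          # NDCG ~ 0.90
```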
Feature Relevance (figures)
Choosing the Relevance Measure (1/2)
FSA performance is measured using the average NDCG@10 obtained from LambdaMART on the test set.

NDCG@10 on the Yahoo test set, by feature subset size:
Measure   5%        10%       20%       30%       40%       50%       75%       100%
NMI       0.73398   0.75952   0.76241   0.7678    0.76912   0.769     0.77015   0.76935
AGV       0.7524    0.7548    0.76168   0.76493   0.76498   0.76717   0.76971   0.76935
S         0.74963   0.75396   0.76099   0.76398   0.7649    0.76753   0.77002   0.76935
K         0.75225   0.75291   0.76145   0.76304   0.7648    0.76673   0.76972   0.76935
1VNDCG    0.75246   0.75768   0.76452   0.76672   0.76823   0.77008   0.77027   0.76935

NDCG@10 on the Bing test set, by feature subset size:
Measure   5%        10%       20%       30%       40%       50%       75%       100%
NMI       0.38927   0.3978    0.41347   0.41539   0.44785   0.44966   0.45083   0.46336
AGV       0.32682   0.33168   0.36043   0.36976   0.37383   0.37612   0.43444   0.46336
S         0.32969   0.3346    0.3428    0.34592   0.36711   0.42475   0.42809   0.46336
K         0.32917   0.3346    0.34356   0.42124   0.42071   0.4245    0.42706   0.46336
1VNDCG    0.41633   0.42571   0.42413   0.42601   0.42757   0.43795   0.46222   0.46336
Choosing the Relevance Measure (2/2) (figure)
Proposed Protocol for Comparing Feature Selection Algorithms (recap; next: step 2, measuring feature similarity).
Feature Similarity
 We used Spearman's rank coefficient for measuring feature similarity.
 Spearman's rank coefficient is faster to compute than NMI, Kendall's tau and 1VNDCG.
The FSA benchmark: Greedy Algorithm for feature Selection (GAS)
1. Build a complete undirected graph $G_0$, in which
   a) each node represents the i-th feature, with weight $w_i$, and
   b) each edge has weight $e_{i,j}$.
2. Let $S_0 = \emptyset$ be the set of selected features at step 0.
3. For i = 1, …, n:
   a) Select the node with the largest weight from $G_{i-1}$; suppose it is the k-th node.
   b) Punish all the nodes connected to the k-th node: $w_j \leftarrow w_j - 2 c\, e_{k,j}$, for $j \neq k$.
   c) Add the k-th node to $S_{i-1}$.
   d) Remove the k-th node from $G_{i-1}$.
4. Return $S_n$.
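A minimal transcription of the GAS pseudocode, assuming the node weights are the relevance scores and the edge weights the pairwise similarities:

```python
import numpy as np

def gas(relevance, similarity, n, c=0.01):
    """GAS sketch: greedily pick the heaviest node, then punish its neighbours."""
    w = np.asarray(relevance, dtype=float).copy()     # node weights w_i
    remaining = set(range(len(w)))
    selected = []
    for _ in range(n):
        k = max(remaining, key=lambda i: w[i])        # 3a: largest-weight node
        for j in remaining - {k}:
            w[j] -= 2.0 * c * similarity[k, j]        # 3b: punish neighbours
        selected.append(k)                            # 3c: add to S
        remaining.remove(k)                           # 3d: remove from the graph
    return selected
```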
Proposed Protocol for Comparing Feature Selection Algorithms (recap of steps 3–6: select a feature subset, train the L2R model, measure its performance on the test set, and compare the feature selection algorithms).
LambdaMART Performance
Statistical significance is assessed with a randomization test (▲/▼: significantly better/worse than the GAS benchmark).

NDCG@10 on the Yahoo test set, by feature subset size:
Algorithm          5%        10%       20%       30%       40%      50%      75%      100%
NGAS               0.7430▼   0.7601    0.7672    0.7717    0.7724   0.7759   0.7766   0.7753
NGAS-E, p = 0.8    0.7655    0.7666    0.7723    0.7742    0.7751   0.7759   0.776    0.7753
HCAS, "single"     0.7350▼   0.7635    0.7666    0.7738    0.7742   0.7754   0.7756   0.7753
HCAS, "ward"       0.7570▼   0.7626    0.7704    0.7743    0.7755   0.7763   0.7757   0.7753
GAS, c = 0.01      0.7628    0.7649    0.7671    0.773     0.7737   0.7737   0.7758   0.7753

NDCG@10 on the Bing test set, by feature subset size:
Algorithm          5%        10%       20%       30%       40%      50%      75%      100%
NGAS               0.4011▼   0.4459    0.471     0.4739▼   0.4813   0.4837   0.4831   0.4863
NGAS-E, p = 0.05   0.4376▲   0.4528    0.4577▼   0.4825    0.4834   0.4845   0.4867   0.4863
HCAS, "single"     0.4423▲   0.4643▲   0.4870▲   0.4854    0.4848   0.4847   0.4853   0.4863
HCAS, "ward"       0.4289    0.4434▼   0.4820    0.4879    0.4853   0.4837   0.4870   0.4863
GAS, c = 0.01      0.4294    0.4515    0.4758    0.4848    0.4863   0.4860   0.4868   0.4863
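The deck does not spell out the randomization test behind the ▲/▼ markers; a common choice for per-query IR metrics is the paired sign-flip randomization test sketched below, shown as an illustration rather than the authors' exact procedure.

```python
import numpy as np

def randomization_test(ndcg_a, ndcg_b, n_perm=1000, seed=0):
    """Two-sided paired randomization (sign-flip) test on per-query NDCG@10."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(ndcg_a, dtype=float) - np.asarray(ndcg_b, dtype=float)
    observed = abs(diff.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diff.size))  # random pair swaps
    permuted = np.abs((signs * diff).mean(axis=1))
    return float((permuted >= observed).mean())                # p-value
```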
LambdaMART Performance (figure)
Conclusions
 We designed 3 FSAs and applied them to the web search page ranking problem.
 NGAS-E and HCAS have performance equal to or greater than the benchmark model.
 HCAS and NGAS are very fast.
 The proposed FSAs can be implemented independently of the L2R model.
 The proposed FSAs can be applied to other ML contexts, to sorting problems and to model ensembling.
Thanks!
Andrea Gigli
Slides: http://www.slideshare.net/andrgig

Mais conteúdo relacionado

Mais procurados

Review Mining of Products of Amazon.com
Review Mining of Products of Amazon.comReview Mining of Products of Amazon.com
Review Mining of Products of Amazon.comShobhit Monga
 
Real-world News Recommender Systems
Real-world News Recommender SystemsReal-world News Recommender Systems
Real-world News Recommender Systemskib_83
 
Visualizing the model selection process
Visualizing the model selection processVisualizing the model selection process
Visualizing the model selection processRebecca Bilbro
 
Recommendation Engine Powered by Hadoop
Recommendation Engine Powered by HadoopRecommendation Engine Powered by Hadoop
Recommendation Engine Powered by HadoopPranab Ghosh
 
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...Bartlomiej Twardowski
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearnPratap Dangeti
 
Doctoral Thesis Dissertation 2014-03-20 @PoliMi
Doctoral Thesis Dissertation 2014-03-20 @PoliMiDoctoral Thesis Dissertation 2014-03-20 @PoliMi
Doctoral Thesis Dissertation 2014-03-20 @PoliMiDavide Chicco
 
Yellowbrick: Steering machine learning with visual transformers
Yellowbrick: Steering machine learning with visual transformersYellowbrick: Steering machine learning with visual transformers
Yellowbrick: Steering machine learning with visual transformersRebecca Bilbro
 
Open and Automated Machine Learning
Open and Automated Machine LearningOpen and Automated Machine Learning
Open and Automated Machine LearningJoaquin Vanschoren
 
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...Kishor Datta Gupta
 
Quiz2 cs141-1-17
Quiz2 cs141-1-17Quiz2 cs141-1-17
Quiz2 cs141-1-17Fahadaio
 
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalParallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalGianmario Spacagna
 
Testing of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven StrategiesTesting of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven StrategiesLionel Briand
 
Data exploration validation and sanitization
Data exploration validation and sanitizationData exploration validation and sanitization
Data exploration validation and sanitizationVenkata Reddy Konasani
 

Mais procurados (20)

Review Mining of Products of Amazon.com
Review Mining of Products of Amazon.comReview Mining of Products of Amazon.com
Review Mining of Products of Amazon.com
 
OpenML NeurIPS2018
OpenML NeurIPS2018OpenML NeurIPS2018
OpenML NeurIPS2018
 
OpenML 2019
OpenML 2019OpenML 2019
OpenML 2019
 
Real-world News Recommender Systems
Real-world News Recommender SystemsReal-world News Recommender Systems
Real-world News Recommender Systems
 
Visualizing the model selection process
Visualizing the model selection processVisualizing the model selection process
Visualizing the model selection process
 
Recommendation Engine Powered by Hadoop
Recommendation Engine Powered by HadoopRecommendation Engine Powered by Hadoop
Recommendation Engine Powered by Hadoop
 
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...
Recsys 2016: Modeling Contextual Information in Session-Aware Recommender Sys...
 
Machine learning with scikitlearn
Machine learning with scikitlearnMachine learning with scikitlearn
Machine learning with scikitlearn
 
Doctoral Thesis Dissertation 2014-03-20 @PoliMi
Doctoral Thesis Dissertation 2014-03-20 @PoliMiDoctoral Thesis Dissertation 2014-03-20 @PoliMi
Doctoral Thesis Dissertation 2014-03-20 @PoliMi
 
Yellowbrick: Steering machine learning with visual transformers
Yellowbrick: Steering machine learning with visual transformersYellowbrick: Steering machine learning with visual transformers
Yellowbrick: Steering machine learning with visual transformers
 
Open and Automated Machine Learning
Open and Automated Machine LearningOpen and Automated Machine Learning
Open and Automated Machine Learning
 
Exposé Ontology
Exposé OntologyExposé Ontology
Exposé Ontology
 
Credit risk meetup
Credit risk meetupCredit risk meetup
Credit risk meetup
 
Neural Networks made easy
Neural Networks made easyNeural Networks made easy
Neural Networks made easy
 
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
Using Negative Detectors for Identifying Adversarial Data Manipulation in Mac...
 
Quiz2 cs141-1-17
Quiz2 cs141-1-17Quiz2 cs141-1-17
Quiz2 cs141-1-17
 
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis ProposalParallel Tuning of Machine Learning Algorithms, Thesis Proposal
Parallel Tuning of Machine Learning Algorithms, Thesis Proposal
 
Chap07
Chap07Chap07
Chap07
 
Testing of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven StrategiesTesting of Cyber-Physical Systems: Diversity-driven Strategies
Testing of Cyber-Physical Systems: Diversity-driven Strategies
 
Data exploration validation and sanitization
Data exploration validation and sanitizationData exploration validation and sanitization
Data exploration validation and sanitization
 

Destaque

Gfpi f-019 guia de aprendizaje 01 tda orientar fpi
Gfpi f-019 guia de aprendizaje 01 tda orientar fpiGfpi f-019 guia de aprendizaje 01 tda orientar fpi
Gfpi f-019 guia de aprendizaje 01 tda orientar fpilisbet bravo
 
JULIOPARI - Elaborando un Plan de Negocios
JULIOPARI - Elaborando un Plan de NegociosJULIOPARI - Elaborando un Plan de Negocios
JULIOPARI - Elaborando un Plan de NegociosJulio Pari
 
El emprendedor y el empresario profesional cert
El emprendedor y el empresario profesional certEl emprendedor y el empresario profesional cert
El emprendedor y el empresario profesional certMaestros Online
 
Onderzoeksrapport acrs v3.0_definitief
Onderzoeksrapport acrs v3.0_definitiefOnderzoeksrapport acrs v3.0_definitief
Onderzoeksrapport acrs v3.0_definitiefrloggen
 
Como hacer un plan de negocios
Como hacer un plan de negociosComo hacer un plan de negocios
Como hacer un plan de negociosXPINNERPablo
 
Schrijven voor het web
Schrijven voor het webSchrijven voor het web
Schrijven voor het webSimone Levie
 
Evidence: Describing my kitchen. ENGLISH DOT WORKS 2. SENA.
Evidence: Describing my kitchen. ENGLISH DOT WORKS 2. SENA.Evidence: Describing my kitchen. ENGLISH DOT WORKS 2. SENA.
Evidence: Describing my kitchen. ENGLISH DOT WORKS 2. SENA... ..
 
Estrategias competitivas básicas
Estrategias competitivas básicasEstrategias competitivas básicas
Estrategias competitivas básicasLarryJimenez
 
2. describing cities and places. ENGLISH DOT WORKS 2. SENA. semana 4 acitivda...
2. describing cities and places. ENGLISH DOT WORKS 2. SENA. semana 4 acitivda...2. describing cities and places. ENGLISH DOT WORKS 2. SENA. semana 4 acitivda...
2. describing cities and places. ENGLISH DOT WORKS 2. SENA. semana 4 acitivda..... ..
 
3.Evidence: Getting to Bogota.ENGLISH DOT WORKS 2. SENA.semana 4 actividad 3.
3.Evidence: Getting to Bogota.ENGLISH DOT WORKS 2. SENA.semana 4 actividad 3.3.Evidence: Getting to Bogota.ENGLISH DOT WORKS 2. SENA.semana 4 actividad 3.
3.Evidence: Getting to Bogota.ENGLISH DOT WORKS 2. SENA.semana 4 actividad 3... ..
 
Evidence: Going to the restaurant . ENGLISH DOT WORKS 2. SENA.
Evidence: Going to the restaurant . ENGLISH DOT WORKS 2. SENA.Evidence: Going to the restaurant . ENGLISH DOT WORKS 2. SENA.
Evidence: Going to the restaurant . ENGLISH DOT WORKS 2. SENA... ..
 
Evidence: I can’t believe it.ENGLISH DOT WORKS 2. semana 3 actividad 1.SENA.
Evidence: I can’t believe it.ENGLISH DOT WORKS 2. semana 3 actividad 1.SENA.Evidence: I can’t believe it.ENGLISH DOT WORKS 2. semana 3 actividad 1.SENA.
Evidence: I can’t believe it.ENGLISH DOT WORKS 2. semana 3 actividad 1.SENA... ..
 
Evidence: Memorable moments.ENGLISH DOT WORKS 2. SENA. semana 2 actividad 2.
Evidence: Memorable moments.ENGLISH DOT WORKS 2. SENA. semana 2 actividad 2.Evidence: Memorable moments.ENGLISH DOT WORKS 2. SENA. semana 2 actividad 2.
Evidence: Memorable moments.ENGLISH DOT WORKS 2. SENA. semana 2 actividad 2... ..
 
Evidence: Planning my trip. ENGLISH DOT WORKS 2. SENA. semana 4 actividad 1.
Evidence: Planning my trip. ENGLISH DOT WORKS 2. SENA. semana 4 actividad 1.Evidence: Planning my trip. ENGLISH DOT WORKS 2. SENA. semana 4 actividad 1.
Evidence: Planning my trip. ENGLISH DOT WORKS 2. SENA. semana 4 actividad 1... ..
 
3. Your next holiday destination ACTIVIDAD 3 SEMANA 3 ENGLISH DOT WORKS 2.
3. Your next holiday destination ACTIVIDAD 3 SEMANA 3 ENGLISH DOT WORKS 2.3. Your next holiday destination ACTIVIDAD 3 SEMANA 3 ENGLISH DOT WORKS 2.
3. Your next holiday destination ACTIVIDAD 3 SEMANA 3 ENGLISH DOT WORKS 2... ..
 

Destaque (20)

Gfpi f-019 guia de aprendizaje 01 tda orientar fpi
Gfpi f-019 guia de aprendizaje 01 tda orientar fpiGfpi f-019 guia de aprendizaje 01 tda orientar fpi
Gfpi f-019 guia de aprendizaje 01 tda orientar fpi
 
JULIOPARI - Elaborando un Plan de Negocios
JULIOPARI - Elaborando un Plan de NegociosJULIOPARI - Elaborando un Plan de Negocios
JULIOPARI - Elaborando un Plan de Negocios
 
El emprendedor y el empresario profesional cert
El emprendedor y el empresario profesional certEl emprendedor y el empresario profesional cert
El emprendedor y el empresario profesional cert
 
Onderzoeksrapport acrs v3.0_definitief
Onderzoeksrapport acrs v3.0_definitiefOnderzoeksrapport acrs v3.0_definitief
Onderzoeksrapport acrs v3.0_definitief
 
Como hacer un plan de negocios
Como hacer un plan de negociosComo hacer un plan de negocios
Como hacer un plan de negocios
 
Schrijven voor het web
Schrijven voor het webSchrijven voor het web
Schrijven voor het web
 
Evidence: Describing my kitchen. ENGLISH DOT WORKS 2. SENA.
Evidence: Describing my kitchen. ENGLISH DOT WORKS 2. SENA.Evidence: Describing my kitchen. ENGLISH DOT WORKS 2. SENA.
Evidence: Describing my kitchen. ENGLISH DOT WORKS 2. SENA.
 
Estrategias competitivas básicas
Estrategias competitivas básicasEstrategias competitivas básicas
Estrategias competitivas básicas
 
Cápsula 1. estudios de mercado
Cápsula 1. estudios de mercadoCápsula 1. estudios de mercado
Cápsula 1. estudios de mercado
 
Rodriguez alvarez
Rodriguez alvarezRodriguez alvarez
Rodriguez alvarez
 
2. describing cities and places. ENGLISH DOT WORKS 2. SENA. semana 4 acitivda...
2. describing cities and places. ENGLISH DOT WORKS 2. SENA. semana 4 acitivda...2. describing cities and places. ENGLISH DOT WORKS 2. SENA. semana 4 acitivda...
2. describing cities and places. ENGLISH DOT WORKS 2. SENA. semana 4 acitivda...
 
Capacitacion y adiestramiento
Capacitacion y adiestramientoCapacitacion y adiestramiento
Capacitacion y adiestramiento
 
3.Evidence: Getting to Bogota.ENGLISH DOT WORKS 2. SENA.semana 4 actividad 3.
3.Evidence: Getting to Bogota.ENGLISH DOT WORKS 2. SENA.semana 4 actividad 3.3.Evidence: Getting to Bogota.ENGLISH DOT WORKS 2. SENA.semana 4 actividad 3.
3.Evidence: Getting to Bogota.ENGLISH DOT WORKS 2. SENA.semana 4 actividad 3.
 
Evidence: Going to the restaurant . ENGLISH DOT WORKS 2. SENA.
Evidence: Going to the restaurant . ENGLISH DOT WORKS 2. SENA.Evidence: Going to the restaurant . ENGLISH DOT WORKS 2. SENA.
Evidence: Going to the restaurant . ENGLISH DOT WORKS 2. SENA.
 
Evidence: I can’t believe it.ENGLISH DOT WORKS 2. semana 3 actividad 1.SENA.
Evidence: I can’t believe it.ENGLISH DOT WORKS 2. semana 3 actividad 1.SENA.Evidence: I can’t believe it.ENGLISH DOT WORKS 2. semana 3 actividad 1.SENA.
Evidence: I can’t believe it.ENGLISH DOT WORKS 2. semana 3 actividad 1.SENA.
 
Evidence: Memorable moments.ENGLISH DOT WORKS 2. SENA. semana 2 actividad 2.
Evidence: Memorable moments.ENGLISH DOT WORKS 2. SENA. semana 2 actividad 2.Evidence: Memorable moments.ENGLISH DOT WORKS 2. SENA. semana 2 actividad 2.
Evidence: Memorable moments.ENGLISH DOT WORKS 2. SENA. semana 2 actividad 2.
 
Evidence: Planning my trip. ENGLISH DOT WORKS 2. SENA. semana 4 actividad 1.
Evidence: Planning my trip. ENGLISH DOT WORKS 2. SENA. semana 4 actividad 1.Evidence: Planning my trip. ENGLISH DOT WORKS 2. SENA. semana 4 actividad 1.
Evidence: Planning my trip. ENGLISH DOT WORKS 2. SENA. semana 4 actividad 1.
 
Libro de-mantenimiento-industrial
Libro de-mantenimiento-industrialLibro de-mantenimiento-industrial
Libro de-mantenimiento-industrial
 
3. Your next holiday destination ACTIVIDAD 3 SEMANA 3 ENGLISH DOT WORKS 2.
3. Your next holiday destination ACTIVIDAD 3 SEMANA 3 ENGLISH DOT WORKS 2.3. Your next holiday destination ACTIVIDAD 3 SEMANA 3 ENGLISH DOT WORKS 2.
3. Your next holiday destination ACTIVIDAD 3 SEMANA 3 ENGLISH DOT WORKS 2.
 
C:\Fakepath\Christie
C:\Fakepath\ChristieC:\Fakepath\Christie
C:\Fakepath\Christie
 

Semelhante a Feature Selection for Document Ranking

Real Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine LearningReal Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine LearningIRJET Journal
 
Graph processing at scale using spark &amp; graph frames
Graph processing at scale using spark &amp; graph framesGraph processing at scale using spark &amp; graph frames
Graph processing at scale using spark &amp; graph framesRon Barabash
 
You Don't Have to Be a Data Scientist to Do Data Science
You Don't Have to Be a Data Scientist to Do Data ScienceYou Don't Have to Be a Data Scientist to Do Data Science
You Don't Have to Be a Data Scientist to Do Data ScienceCarmen Mardiros
 
Algorithms Lecture 1: Introduction to Algorithms
Algorithms Lecture 1: Introduction to AlgorithmsAlgorithms Lecture 1: Introduction to Algorithms
Algorithms Lecture 1: Introduction to AlgorithmsMohamed Loey
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataAbhishek M Shivalingaiah
 
Recommendation System
Recommendation SystemRecommendation System
Recommendation SystemAnamta Sayyed
 
IRJET-Fake Product Review Monitoring
IRJET-Fake Product Review MonitoringIRJET-Fake Product Review Monitoring
IRJET-Fake Product Review MonitoringIRJET Journal
 
Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação Gabriel Moreira
 
Algorithms Lecture 6: Searching Algorithms
Algorithms Lecture 6: Searching AlgorithmsAlgorithms Lecture 6: Searching Algorithms
Algorithms Lecture 6: Searching AlgorithmsMohamed Loey
 
Improving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph AlgorithmsImproving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph AlgorithmsNeo4j
 
Using Interactive Genetic Algorithm for Requirements Prioritization
Using Interactive Genetic Algorithm for Requirements PrioritizationUsing Interactive Genetic Algorithm for Requirements Prioritization
Using Interactive Genetic Algorithm for Requirements Prioritization Francis Palma
 
Combining machine learning and search through learning to rank
Combining machine learning and search through learning to rankCombining machine learning and search through learning to rank
Combining machine learning and search through learning to rankJettro Coenradie
 
Experiments on Design Pattern Discovery
Experiments on Design Pattern DiscoveryExperiments on Design Pattern Discovery
Experiments on Design Pattern DiscoveryTim Menzies
 
IRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and ChallengesIRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and ChallengesIRJET Journal
 
Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016Spencer Fox
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation SystemsRobin Reni
 
IRJET- Online Course Recommendation System
IRJET- Online Course Recommendation SystemIRJET- Online Course Recommendation System
IRJET- Online Course Recommendation SystemIRJET Journal
 
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...IRJET Journal
 

Semelhante a Feature Selection for Document Ranking (20)

Real Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine LearningReal Estate Investment Advising Using Machine Learning
Real Estate Investment Advising Using Machine Learning
 
Graph processing at scale using spark &amp; graph frames
Graph processing at scale using spark &amp; graph framesGraph processing at scale using spark &amp; graph frames
Graph processing at scale using spark &amp; graph frames
 
You Don't Have to Be a Data Scientist to Do Data Science
You Don't Have to Be a Data Scientist to Do Data ScienceYou Don't Have to Be a Data Scientist to Do Data Science
You Don't Have to Be a Data Scientist to Do Data Science
 
Algorithms Lecture 1: Introduction to Algorithms
Algorithms Lecture 1: Introduction to AlgorithmsAlgorithms Lecture 1: Introduction to Algorithms
Algorithms Lecture 1: Introduction to Algorithms
 
Cloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big DataCloudera Movies Data Science Project On Big Data
Cloudera Movies Data Science Project On Big Data
 
Recommendation System
Recommendation SystemRecommendation System
Recommendation System
 
UNEC__1683196273.pptx
UNEC__1683196273.pptxUNEC__1683196273.pptx
UNEC__1683196273.pptx
 
IRJET-Fake Product Review Monitoring
IRJET-Fake Product Review MonitoringIRJET-Fake Product Review Monitoring
IRJET-Fake Product Review Monitoring
 
Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação Sistemas de Recomendação sem Enrolação
Sistemas de Recomendação sem Enrolação
 
Algorithms Lecture 6: Searching Algorithms
Algorithms Lecture 6: Searching AlgorithmsAlgorithms Lecture 6: Searching Algorithms
Algorithms Lecture 6: Searching Algorithms
 
Improving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph AlgorithmsImproving Machine Learning using Graph Algorithms
Improving Machine Learning using Graph Algorithms
 
Using Interactive Genetic Algorithm for Requirements Prioritization
Using Interactive Genetic Algorithm for Requirements PrioritizationUsing Interactive Genetic Algorithm for Requirements Prioritization
Using Interactive Genetic Algorithm for Requirements Prioritization
 
Recommender Systems and Linked Open Data
Recommender Systems and Linked Open DataRecommender Systems and Linked Open Data
Recommender Systems and Linked Open Data
 
Combining machine learning and search through learning to rank
Combining machine learning and search through learning to rankCombining machine learning and search through learning to rank
Combining machine learning and search through learning to rank
 
Experiments on Design Pattern Discovery
Experiments on Design Pattern DiscoveryExperiments on Design Pattern Discovery
Experiments on Design Pattern Discovery
 
IRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and ChallengesIRJET- Machine Learning: Survey, Types and Challenges
IRJET- Machine Learning: Survey, Types and Challenges
 
Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016Introduction to R Short course Fall 2016
Introduction to R Short course Fall 2016
 
Recommendation Systems
Recommendation SystemsRecommendation Systems
Recommendation Systems
 
IRJET- Online Course Recommendation System
IRJET- Online Course Recommendation SystemIRJET- Online Course Recommendation System
IRJET- Online Course Recommendation System
 
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
Comparative Analysis of Machine Learning Models for Cricket Score and Win Pre...
 

Mais de Andrea Gigli

How organizations can become data-driven: three main rules
How organizations can become data-driven: three main rulesHow organizations can become data-driven: three main rules
How organizations can become data-driven: three main rulesAndrea Gigli
 
Equity Value for Startups.pdf
Equity Value for Startups.pdfEquity Value for Startups.pdf
Equity Value for Startups.pdfAndrea Gigli
 
Introduction to recommender systems
Introduction to recommender systemsIntroduction to recommender systems
Introduction to recommender systemsAndrea Gigli
 
Data Analytics per Manager
Data Analytics per ManagerData Analytics per Manager
Data Analytics per ManagerAndrea Gigli
 
Balance-sheet dynamics impact on FVA, MVA, KVA
Balance-sheet dynamics impact on FVA, MVA, KVABalance-sheet dynamics impact on FVA, MVA, KVA
Balance-sheet dynamics impact on FVA, MVA, KVAAndrea Gigli
 
Reasons behind XVAs
Reasons behind XVAs Reasons behind XVAs
Reasons behind XVAs Andrea Gigli
 
Recommendation Systems in banking and Financial Services
Recommendation Systems in banking and Financial ServicesRecommendation Systems in banking and Financial Services
Recommendation Systems in banking and Financial ServicesAndrea Gigli
 
Mine the Wine by Andrea Gigli
Mine the Wine by Andrea GigliMine the Wine by Andrea Gigli
Mine the Wine by Andrea GigliAndrea Gigli
 
Using R for Building a Simple and Effective Dashboard
Using R for Building a Simple and Effective DashboardUsing R for Building a Simple and Effective Dashboard
Using R for Building a Simple and Effective DashboardAndrea Gigli
 
Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...
Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...
Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...Andrea Gigli
 
Comparing Machine Learning Algorithms in Text Mining
Comparing Machine Learning Algorithms in Text MiningComparing Machine Learning Algorithms in Text Mining
Comparing Machine Learning Algorithms in Text MiningAndrea Gigli
 
Electricity Derivatives
Electricity DerivativesElectricity Derivatives
Electricity DerivativesAndrea Gigli
 
Crawling Tripadvisor Attracion Reviews - Italiano
Crawling Tripadvisor Attracion Reviews - ItalianoCrawling Tripadvisor Attracion Reviews - Italiano
Crawling Tripadvisor Attracion Reviews - ItalianoAndrea Gigli
 
Search Engine for World Recipes Expo 2015
Search Engine for World Recipes Expo 2015Search Engine for World Recipes Expo 2015
Search Engine for World Recipes Expo 2015Andrea Gigli
 
A Data Scientist Job Map Visualization Tool using Python, D3.js and MySQL
A Data Scientist Job Map Visualization Tool using Python, D3.js and MySQLA Data Scientist Job Map Visualization Tool using Python, D3.js and MySQL
A Data Scientist Job Map Visualization Tool using Python, D3.js and MySQLAndrea Gigli
 
Search Engine Query Suggestion Application
Search Engine Query Suggestion ApplicationSearch Engine Query Suggestion Application
Search Engine Query Suggestion ApplicationAndrea Gigli
 
From real to risk neutral probability measure for pricing and managing cva
From real to risk neutral probability measure for pricing and managing cvaFrom real to risk neutral probability measure for pricing and managing cva
From real to risk neutral probability measure for pricing and managing cvaAndrea Gigli
 
Startup Saturday Internet Festival 2014
Startup Saturday Internet Festival 2014Startup Saturday Internet Festival 2014
Startup Saturday Internet Festival 2014Andrea Gigli
 
Lean Methods for Business & Social Innovation
Lean Methods for Business & Social InnovationLean Methods for Business & Social Innovation
Lean Methods for Business & Social InnovationAndrea Gigli
 
Presentazione Startup Saturday Europe @ ParmaCamp2013
Presentazione Startup Saturday Europe @ ParmaCamp2013Presentazione Startup Saturday Europe @ ParmaCamp2013
Presentazione Startup Saturday Europe @ ParmaCamp2013Andrea Gigli
 

Mais de Andrea Gigli (20)

How organizations can become data-driven: three main rules
How organizations can become data-driven: three main rulesHow organizations can become data-driven: three main rules
How organizations can become data-driven: three main rules
 
Equity Value for Startups.pdf
Equity Value for Startups.pdfEquity Value for Startups.pdf
Equity Value for Startups.pdf
 
Introduction to recommender systems
Introduction to recommender systemsIntroduction to recommender systems
Introduction to recommender systems
 
Data Analytics per Manager
Data Analytics per ManagerData Analytics per Manager
Data Analytics per Manager
 
Balance-sheet dynamics impact on FVA, MVA, KVA
Balance-sheet dynamics impact on FVA, MVA, KVABalance-sheet dynamics impact on FVA, MVA, KVA
Balance-sheet dynamics impact on FVA, MVA, KVA
 
Reasons behind XVAs
Reasons behind XVAs Reasons behind XVAs
Reasons behind XVAs
 
Recommendation Systems in banking and Financial Services
Recommendation Systems in banking and Financial ServicesRecommendation Systems in banking and Financial Services
Recommendation Systems in banking and Financial Services
 
Mine the Wine by Andrea Gigli
Mine the Wine by Andrea GigliMine the Wine by Andrea Gigli
Mine the Wine by Andrea Gigli
 
Using R for Building a Simple and Effective Dashboard
Using R for Building a Simple and Effective DashboardUsing R for Building a Simple and Effective Dashboard
Using R for Building a Simple and Effective Dashboard
 
Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...
Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...
Impact of Valuation Adjustments (CVA, DVA, FVA, KVA) on Bank's Processes - An...
 
Comparing Machine Learning Algorithms in Text Mining
Comparing Machine Learning Algorithms in Text MiningComparing Machine Learning Algorithms in Text Mining
Comparing Machine Learning Algorithms in Text Mining
 
Electricity Derivatives
Electricity DerivativesElectricity Derivatives
Electricity Derivatives
 
Crawling Tripadvisor Attracion Reviews - Italiano
Crawling Tripadvisor Attracion Reviews - ItalianoCrawling Tripadvisor Attracion Reviews - Italiano
Crawling Tripadvisor Attracion Reviews - Italiano
 
Search Engine for World Recipes Expo 2015
Search Engine for World Recipes Expo 2015Search Engine for World Recipes Expo 2015
Search Engine for World Recipes Expo 2015
 
A Data Scientist Job Map Visualization Tool using Python, D3.js and MySQL
A Data Scientist Job Map Visualization Tool using Python, D3.js and MySQLA Data Scientist Job Map Visualization Tool using Python, D3.js and MySQL
A Data Scientist Job Map Visualization Tool using Python, D3.js and MySQL
 
Search Engine Query Suggestion Application
Search Engine Query Suggestion ApplicationSearch Engine Query Suggestion Application
Search Engine Query Suggestion Application
 
From real to risk neutral probability measure for pricing and managing cva
From real to risk neutral probability measure for pricing and managing cvaFrom real to risk neutral probability measure for pricing and managing cva
From real to risk neutral probability measure for pricing and managing cva
 
Startup Saturday Internet Festival 2014
Startup Saturday Internet Festival 2014Startup Saturday Internet Festival 2014
Startup Saturday Internet Festival 2014
 
Lean Methods for Business & Social Innovation
Lean Methods for Business & Social InnovationLean Methods for Business & Social Innovation
Lean Methods for Business & Social Innovation
 
Presentazione Startup Saturday Europe @ ParmaCamp2013
Presentazione Startup Saturday Europe @ ParmaCamp2013Presentazione Startup Saturday Europe @ ParmaCamp2013
Presentazione Startup Saturday Europe @ ParmaCamp2013
 

Último

Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introductionMaksud Ahmed
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfagholdier
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxcallscotland1987
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxAreebaZafar22
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.christianmathematics
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Jisc
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17Celine George
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxAmita Gupta
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 

Último (20)

Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Dyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptxDyslexia AI Workshop for Slideshare.pptx
Dyslexia AI Workshop for Slideshare.pptx
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 

Feature Selection for Document Ranking

  • 15. Proposed Protocol for Comparing Feature Selection Algorithms:
1. Measure the relevance of each feature.
2. Measure the similarity of each pair of features.
3. Select a feature subset using a feature selector.
4. Train the L2R model.
5. Measure the L2R model performance on the test set.
6. Compare the feature selection algorithms.
Repeat from step 3 for every feature selection algorithm, and repeat the whole procedure for different subset sizes.
  • 16. Competing Algorithms for feature selection. We developed the following algorithms:  Naïve Greedy search Algorithm for feature Selection (NGAS)  Naïve Greedy search Algorithm for feature Selection – Extended (NGAS-E)  Hierarchical Clustering search Algorithm for feature Selection (HCAS)
  • 18. Competing Algorithm for feature selection #1: NGAS The undirected graph is built and the set S of selected features is initialized.
  • 19. Competing Algorithm for feature selection #1: NGAS Assuming node 1 has the highest relevance, add it to S.
  • 20. Competing Algorithm for feature selection #1: NGAS Select the node with the lowest similarity to Node 1, say Node 7, and the one with the highest similarity to Node 7, say Node 5.
  • 21. Competing Algorithm for feature selection #1: NGAS Remove Node 1. Assuming Node 5 is the more relevant of Nodes 5 and 7, add it to S.
  • 22. Competing Algorithm for feature selection #1: NGAS Select the node with the lowest similarity to Node 5, say Node 2, and the one with the highest similarity to Node 2, say Node 3.
  • 23. Competing Algorithm for feature selection #1: NGAS Remove Node 5. Assuming Node 2 is the more relevant of Nodes 2 and 3, add it to S.
  • 24. Competing Algorithm for feature selection #1: NGAS Select the node with the lowest similarity to Node 2, say Node 4, and the one with the highest similarity to Node 4, say Node 8.
  • 25. Competing Algorithm for feature selection #1: NGAS Remove Node 2. Assuming Node 4 is the more relevant of Nodes 4 and 8, add it to S.
  • 26. Competing Algorithm for feature selection #1: NGAS Select the node with the lowest similarity to Node 4, say Node 6, and the one with the highest similarity to Node 6, say Node 7.
  • 27. Competing Algorithm for feature selection #1: NGAS Remove Node 4. Assuming Node 6 is the more relevant of Nodes 6 and 7, add it to S.
  • 28. Competing Algorithm for feature selection #1: NGAS Select the node with the lowest similarity to Node 6, say Node 3, and the one with the highest similarity to Node 3, say Node 8.
  • 29. Competing Algorithm for feature selection #1: NGAS Remove Node 6. Assuming Node 3 is the more relevant of Nodes 3 and 8, add it to S.
  • 30. Competing Algorithm for feature selection #1: NGAS Select the node with the lowest similarity to Node 3, say Node 8, and the one with the highest similarity to Node 8, say Node 7.
  • 31. Competing Algorithm for feature selection #1: NGAS Remove Node 3. Assuming Node 8 is the more relevant of Nodes 8 and 7, add it to S.
  • 32. Competing Algorithm for feature selection #1: NGAS Add the last node, 7, to S.
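
Read as an algorithm, the walkthrough above amounts to a few lines of code. The following is a minimal Python sketch of NGAS as we read it from the slides (function and variable names are ours; relevance is a length-F vector, similarity a symmetric F×F matrix):

```python
import numpy as np

def ngas(relevance, similarity):
    """Greedy NGAS ordering: starting from the most relevant feature,
    look at the remaining feature least similar to the current one (a)
    and at a's most similar remaining feature (b), then keep the more
    relevant of the two."""
    remaining = set(range(len(relevance)))
    current = int(np.argmax(relevance))       # most relevant feature first
    selected = [current]
    remaining.remove(current)
    while remaining:
        if len(remaining) == 1:
            selected.append(remaining.pop())  # add the last node
            break
        a = min(remaining, key=lambda j: similarity[current][j])
        b = max((j for j in remaining if j != a),
                key=lambda j: similarity[a][j])
        current = a if relevance[a] >= relevance[b] else b
        selected.append(current)
        remaining.remove(current)
    return selected   # a ranking of all features; keep the first k

# Example: 8 features with random relevance scores and similarities.
rng = np.random.default_rng(0)
rel = rng.random(8)
sim = rng.random((8, 8)); sim = (sim + sim.T) / 2
print(ngas(rel, sim))
```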
  • 33. Competing Algorithms for feature selection (the list of the three algorithms is repeated; the next slides walk through NGAS-E).
  • 34. Competing Algorithm for feature selection #2: NGAS-E (p=50%) The undirected graph is built and the set S of selected features is initialized.
  • 35. Competing Algorithm for feature selection #2: NGAS-E (p=50%) Assuming Node 1 has the highest relevance, add it to S.
  • 36. Competing Algorithm for feature selection #2: NGAS-E (p=50%) Select the ⌈7 × 50%⌉ nodes least similar to Node 1.
  • 37. Competing Algorithm for feature selection #2: NGAS-E (p=50%) Remove Node 1 from the graph. Among the selected nodes, add the one with the highest relevance (say Node 5) to S.
  • 38. Competing Algorithm for feature selection #2: NGAS-E (p=50%) Select the ⌈6 × 50%⌉ nodes least similar to Node 5.
  • 39. Competing Algorithm for feature selection #2: NGAS-E (p=50%) Remove Node 5 from the graph. Among the selected nodes, add the one with the highest relevance (say Node 3) to S.
  • 40. Competing Algorithm for feature selection #2: NGAS-E (p=50%) Select the ⌈5 × 50%⌉ nodes least similar to Node 3.
  • 41. Competing Algorithm for feature selection #2: NGAS-E (p=50%) Remove Node 3 from the graph. Among the selected nodes, add the one with the highest relevance (say Node 4) to S.
  • 42. Competing Algorithm for feature selection #2: NGAS-E (p=50%) Select the ⌈4 × 50%⌉ nodes least similar to Node 4.
  • 43. Competing Algorithm for feature selection #2: NGAS-E (p=50%) Remove Node 4 from the graph. Among the selected nodes, add the one with the highest relevance (say Node 6) to S.
  • 44. Competing Algorithm for feature selection #2: NGAS-E (p=50%) Select the ⌈3 × 50%⌉ nodes least similar to Node 6.
  • 45. Competing Algorithm for feature selection #2: NGAS-E (p=50%) Remove Node 6 from the graph. Among the selected nodes, add the one with the highest relevance (say Node 2) to S.
  • 46. Competing Algorithm for feature selection #2: NGAS-E (p=50%) Select the ⌈2 × 50%⌉ nodes least similar to Node 2.
  • 47. Competing Algorithm for feature selection #2: NGAS-E (p=50%) Remove Node 2 from the graph and add Node 8 to S.
  • 48. Competing Algorithm for feature selection #2: NGAS-E (p=50%) Remove Node 8 from the graph and add the last node, 7, to S.
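
Under the same conventions (names are ours; p is the fraction of remaining nodes kept as candidates at each step), a minimal sketch of NGAS-E:

```python
import math

def ngas_e(relevance, similarity, p=0.5):
    """NGAS-E: from the current feature, shortlist the ceil(p * remaining)
    least similar remaining features and move to the most relevant one."""
    remaining = set(range(len(relevance)))
    current = max(remaining, key=lambda j: relevance[j])
    selected = [current]
    remaining.remove(current)
    while remaining:
        k = math.ceil(len(remaining) * p)
        candidates = sorted(remaining, key=lambda j: similarity[current][j])[:k]
        current = max(candidates, key=lambda j: relevance[j])
        selected.append(current)
        remaining.remove(current)
    return selected
```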
  • 50. Competing Algorithms for feature selection (the list is repeated; the next slide presents HCAS).
  • 51. Competing Algorithm for feature selection #3: HCAS [Dendrogram figure: features are clustered hierarchically on their pairwise similarity; cutting the tree at different heights yields subsets of different sizes, e.g. 23, 11, 5 or 3 clusters.]
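
The slide shows only a dendrogram, so the sketch below fills in the obvious reading: features are clustered hierarchically on a dissimilarity derived from the similarity matrix, the tree is cut into the desired number of clusters, and one representative is kept per cluster. Keeping the most relevant member of each cluster is our assumption, as is the 1 − |similarity| dissimilarity; the "single"/"ward" variants in the results table map to the linkage method:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def hcas(relevance, similarity, n_features, method="single"):
    """HCAS sketch: hierarchically cluster features on 1 - |similarity|,
    cut the dendrogram into n_features clusters, and keep one
    representative per cluster (here: the most relevant member)."""
    dissim = 1.0 - np.abs(np.asarray(similarity, dtype=float))
    np.fill_diagonal(dissim, 0.0)
    Z = linkage(squareform(dissim, checks=False), method=method)
    labels = fcluster(Z, t=n_features, criterion="maxclust")
    rel = np.asarray(relevance, dtype=float)
    return [int(np.where(labels == c)[0][np.argmax(rel[labels == c])])
            for c in np.unique(labels)]
```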
  • 52. Outline  Machine Learning for Ranking  Proposed Feature Selection Algorithms (FSA) and Feature Selection Protocol  Application to Publicly Available Web Search Data
  • 53. Application to Web Search Engine Data  Bing Data: http://research.microsoft.com/en-us/projects/mslr/  Yahoo! Data: http://webscope.sandbox.yahoo.com

Yahoo! data (519 features):
            Train     Validation   Test
#queries    19,944    2,994        6,983
#urls       473,134   71,083       165,660

Bing data (136 features):
            Train     Validation   Test
#queries    18,919    6,306        6,306
#urls       723,412   235,259      241,521
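
Both collections come in the LETOR/SVMlight text format (label, qid:…, then feature:value pairs), so loading them might look like the sketch below (the file path is illustrative):

```python
from sklearn.datasets import load_svmlight_file

# Each row is one query-document pair; qid groups rows by query.
X, y, qid = load_svmlight_file("Fold1/train.txt", query_id=True)
print(X.shape, y[:5], qid[:5])
```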
  • 54. Proposed Protocol for Comparing Feature Selection Algorithms (six-step protocol repeated from slide 15; the next slides cover step 4, training the L2R model).
  • 55. Learning to Rank Algorithms: a timeline of major contributions. [Timeline figure, 2000–2010, grouping methods as pointwise, pairwise and listwise: LambdaMART, LambdaRank, CRR, IntervalRank, GBlend, NDCG Boost, BayesRank, BoltzRank, MPBoost, SortNet, SSRankBoost, RR, SoftRank, PermuRank, ListMLE, SVMmap, RankRLS, RankGP, RankCosine, QBRank, McRank, ListNet, GBRank, AdaRank, IR-SVM, FRank, RankNet, RankBoost, Pranking, RankSVM.]
  • 56. LambdaMART Model for LtR: Multiple Additive Regression Trees (MART) combined with the Lambda gradient.  Ensemble method: tree boosting  Works even when the target loss function is not differentiable  Sorting characteristic  Speed
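
The deck does not say which LambdaMART implementation was used. As an illustration only, LightGBM ships a LambdaMART-style ranker via its lambdarank objective; a minimal sketch on synthetic data:

```python
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))        # 1,000 query-document rows, 20 features
y = rng.integers(0, 5, size=1000)      # graded relevance labels 0..4
group = [50] * 20                       # 20 queries with 50 documents each

ranker = lgb.LGBMRanker(objective="lambdarank", n_estimators=300,
                        learning_rate=0.05)
ranker.fit(X, y, group=group)
scores = ranker.predict(X[:50])         # ranking scores for one query's documents
```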
  • 57. Proposed Protocol for Comparing Feature Selection Algorithms (protocol repeated; the next slides cover step 1, measuring the relevance of each feature).
  • 58. Feature Relevance  The relevance of a document is measured with a categorical variable (0, 1, 2, 3, 4), so we need metrics that are good at measuring "dependence" between discrete/continuous feature variables and a categorical label variable.  In the following we use:  Normalized Mutual Information (NMI)  Spearman coefficient (S)  Kendall's tau (K)  Average Group Variance (AGV)  One-Variable NDCG@10 (1VNDCG)
  • 59. Feature Relevance via Normalized Mutual Information  Mutual Information (MI) measures how much, on average, the realization of a random variable X tells us about the realization of the random variable Y, i.e. how much the entropy of Y, H(Y), is reduced by knowing the realization of X: MI(X,Y) = H(X) − H(X|Y) = H(X) + H(Y) − H(X,Y). The normalized version is NMI(X,Y) = MI(X,Y) / √(H(X) · H(Y)).
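
As an illustration, the NMI of a (discretized) feature against the relevance labels can be computed with scikit-learn; discretizing into quantile bins is our choice, and average_method="geometric" matches the √(H(X)·H(Y)) normalization above:

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score

rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=2000)               # relevance grades 0..4
feature = labels + rng.normal(scale=1.5, size=2000)  # a noisy, informative feature

bins = np.quantile(feature, np.linspace(0, 1, 21)[1:-1])
feature_disc = np.digitize(feature, bins)            # 20 quantile bins

nmi = normalized_mutual_info_score(labels, feature_disc,
                                   average_method="geometric")
print(round(nmi, 3))
```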
  • 60. Feature Relevance via Spearman's coefficient  Spearman's rank correlation coefficient is a non-parametric measure of statistical dependence between two random variables. It is given by ρ = 1 − 6 Σᵢ dᵢ² / (n(n² − 1)), where n is the sample size and dᵢ = rank(xᵢ) − rank(yᵢ).
  • 61. Feature Relevance via Kendall's tau  Kendall's tau is a measure of association defined on two ranking lists of length n. It is defined as τ = (n_c − n_d) / √[(n(n−1)/2 − n₁) · (n(n−1)/2 − n₂)], where n_c denotes the number of concordant pairs between the two lists, n_d the number of discordant pairs, n₁ = Σᵢ tᵢ(tᵢ−1)/2, n₂ = Σⱼ uⱼ(uⱼ−1)/2, tᵢ is the number of tied values in the i-th group of ties for the first list and uⱼ is the number of tied values in the j-th group of ties for the second list.
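
Both coefficients, with the tie corrections above, are available in SciPy (kendalltau defaults to the tau-b variant); a quick sketch:

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=2000)
feature = labels + rng.normal(scale=1.5, size=2000)

rho, _ = spearmanr(feature, labels)
tau, _ = kendalltau(feature, labels)   # tau-b, which corrects for ties
print(round(rho, 3), round(tau, 3))
```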
  • 62. Feature Relevance via Average Group Variance  Average Group Variance measures the discrimination power of a feature. The intuitive justification is that a feature is useful if it is capable of discriminating a small portion of the ordered scale from the rest, and features with a small variance are those which satisfy this property. AGV = 1 − [Σ_{g=1}^{5} n_g (x̄_g − x̄)²] / [Σᵢ (xᵢ − x̄)²], where n_g is the size of group g, x̄_g the sample mean of feature x in the g-th group and x̄ the whole-sample mean.
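
A direct transcription of the formula (names ours; one group per relevance grade):

```python
import numpy as np

def agv(x, labels):
    """Average Group Variance as on the slide: one minus the
    between-group sum of squares over the total sum of squares."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    grand_mean = x.mean()
    ssb = sum((labels == g).sum() * (x[labels == g].mean() - grand_mean) ** 2
              for g in np.unique(labels))
    sst = ((x - grand_mean) ** 2).sum()
    return 1.0 - ssb / sst
```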
  • 63. Feature Relevance via single-feature LambdaMART scoring  For each feature i we run LambdaMART on that feature alone and compute NDCG_{i,q}@10 for each query q.  The i-th feature's relevance is measured by averaging NDCG_{i,q}@10 over the whole query set: NDCG_i@10 = (1/|Q|) Σ_{q∈Q} NDCG_{i,q}@10
  • 64. How to Measure Ranking Performance on query i  Precision at k: P_i@k = (# relevant documents in top k results) / k  Average precision: AP_i = (1 / # relevant documents) · Σ_{k=1}^{|D|} P_i@k · 𝕀[document at rank k is relevant]  Discounted Cumulative Gain: DCG_i = Σ_{j=1}^{k} (2^{rel_{i,j}} − 1) / log₂(1 + rank_j)
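
In code, for a single query's ranked list of binary relevance flags (a minimal sketch, names ours):

```python
def precision_at_k(rels, k):
    """rels: binary relevance of the ranked results, top first."""
    return sum(rels[:k]) / k

def average_precision(rels):
    """Mean of P@k over the ranks where a relevant document appears."""
    hits, total = 0, 0.0
    for k, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0

print(precision_at_k([1, 0, 1, 1, 0], k=3))           # 2/3
print(round(average_precision([1, 0, 1, 1, 0]), 3))   # ~0.806
```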
  • 65. How to Measure Ranking Performance: Normalized DCG. Gains per relevance rating: Perfect 2⁵−1=31, Excellent 2⁴−1=15, Good 2³−1=7, Fair 2²−1=3, Bad 2¹−1=1.

DCG of the ranked list (rounded discount factors 1, 0.63, 0.5, 0.4):
Document     Gain   Cumulative Gain   Discounted Cumulative Gain
Document 1   31     31                31×1 = 31
Document 2   3      34                31 + 3×0.63 = 32.9
Document 3   7      41                32.9 + 7×0.5 = 36.4
Document 4   31     72                36.4 + 31×0.4 = 48.8

Normalization: divide DCG by the ideal DCG, obtained by re-sorting the documents by gain:
Document     Gain   Cumulative Gain   Discounted Cumulative Gain
Document 1   31     31                31×1 = 31
Document 4   31     62                31 + 31×0.63 = 50.53
Document 3   7      69                50.53 + 7×0.5 = 54.03
Document 2   3      72                54.03 + 3×0.4 = 55.23
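
The same computation as a sketch; with exact log discounts the example list scores NDCG ≈ 0.90 rather than the rounded values above:

```python
import numpy as np

def dcg_at_k(rels, k):
    """DCG with exponential gain: (2^rel - 1) / log2(1 + rank)."""
    rels = np.asarray(rels, dtype=float)[:k]
    discounts = np.log2(np.arange(2, rels.size + 2))   # log2(1 + rank)
    return np.sum((2**rels - 1) / discounts)

def ndcg_at_k(rels, k=10):
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0

# The slide's list: ratings 5, 2, 3, 5 -> gains 31, 3, 7, 31.
print(round(ndcg_at_k([5, 2, 3, 5], k=4), 3))
```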
  • 68. Choosing the Relevance Measure (1/2). FSA performance is measured using the average NDCG@10 obtained from LambdaMART on the test set.

NDCG@10 on the Yahoo test set, by feature-subset size:
          5%       10%      20%      30%      40%      50%      75%      100%
NMI       0.73398  0.75952  0.76241  0.7678   0.76912  0.769    0.77015  0.76935
AGV       0.7524   0.7548   0.76168  0.76493  0.76498  0.76717  0.76971  0.76935
S         0.74963  0.75396  0.76099  0.76398  0.7649   0.76753  0.77002  0.76935
K         0.75225  0.75291  0.76145  0.76304  0.7648   0.76673  0.76972  0.76935
1VNDCG    0.75246  0.75768  0.76452  0.76672  0.76823  0.77008  0.77027  0.76935

NDCG@10 on the Bing test set, by feature-subset size:
          5%       10%      20%      30%      40%      50%      75%      100%
NMI       0.38927  0.3978   0.41347  0.41539  0.44785  0.44966  0.45083  0.46336
AGV       0.32682  0.33168  0.36043  0.36976  0.37383  0.37612  0.43444  0.46336
S         0.32969  0.3346   0.3428   0.34592  0.36711  0.42475  0.42809  0.46336
K         0.32917  0.3346   0.34356  0.42124  0.42071  0.4245   0.42706  0.46336
1VNDCG    0.41633  0.42571  0.42413  0.42601  0.42757  0.43795  0.46222  0.46336
  • 69. Choosing the Relevance Measure (2/2)
  • 70. Proposed Protocol for Comparing Feature Selection Algorithms (protocol repeated; the next slides cover step 2, measuring the similarity of each pair of features).
  • 71. Feature Similarity  We used Spearman's rank coefficient for measuring feature similarity, since it is faster to compute than NMI, Kendall's tau and 1VNDCG.
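
Computing the full feature-similarity matrix is then a one-liner with SciPy (taking the absolute correlation as the similarity is our assumption):

```python
import numpy as np
from scipy.stats import spearmanr

# X: one row per query-document pair, one column per feature (synthetic here).
rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 20))

rho, _ = spearmanr(X)        # 20 x 20 matrix of pairwise Spearman correlations
similarity = np.abs(rho)     # assumption: similarity = |rho|
```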
  • 72. The FSA benchmark: Greedy Algorithm for feature Selection (GAS)
1. Build a complete undirected graph G₀, in which (a) each node represents the i-th feature, with weight wᵢ, and (b) each edge has weight eᵢ,ⱼ.
2. Let S₀ = ∅ be the set of selected features at step 0.
3. For i = 1, …, n:
   a) Select the node with the largest weight from Gᵢ₋₁; suppose it is the k-th node.
   b) Punish all the nodes connected to the k-th node: wⱼ ← wⱼ − 2·c·e_{k,j}, j ≠ k.
   c) Add the k-th node to Sᵢ₋₁.
   d) Remove the k-th node from Gᵢ₋₁.
4. Return Sₙ.
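
A minimal sketch of the benchmark as listed above (names ours; n_select is the subset size):

```python
def gas(relevance, similarity, n_select, c=0.01):
    """GAS: pick the feature with the largest current weight, then
    punish every remaining feature's weight in proportion to its
    similarity to the pick."""
    weights = list(relevance)
    remaining = set(range(len(weights)))
    selected = []
    for _ in range(min(n_select, len(weights))):
        k = max(remaining, key=lambda j: weights[j])
        selected.append(k)
        remaining.remove(k)
        for j in remaining:
            weights[j] -= 2 * c * similarity[k][j]
    return selected
```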
  • 73–75. Proposed Protocol for Comparing Feature Selection Algorithms (protocol slide repeated three times, each highlighting one of the remaining steps: train the L2R model, measure its performance on the test set, compare the feature selection algorithms).
  • 77. Significance Test using a Randomization Test (▲/▼: significantly better/worse than the GAS benchmark).

NDCG@10 on the Yahoo test set, by feature-subset size:
                   5%        10%      20%      30%      40%     50%     75%     100%
NGAS               0.7430▼   0.7601   0.7672   0.7717   0.7724  0.7759  0.7766  0.7753
NGAS-E, p = 0.8    0.7655    0.7666   0.7723   0.7742   0.7751  0.7759  0.776   0.7753
HCAS, "single"     0.7350▼   0.7635   0.7666   0.7738   0.7742  0.7754  0.7756  0.7753
HCAS, "ward"       0.7570▼   0.7626   0.7704   0.7743   0.7755  0.7763  0.7757  0.7753
GAS, c = 0.01      0.7628    0.7649   0.7671   0.773    0.7737  0.7737  0.7758  0.7753

NDCG@10 on the Bing test set, by feature-subset size:
                   5%        10%       20%       30%      40%     50%     75%     100%
NGAS               0.4011▼   0.4459    0.471     0.4739▼  0.4813  0.4837  0.4831  0.4863
NGAS-E, p = 0.05   0.4376▲   0.4528    0.4577▼   0.4825   0.4834  0.4845  0.4867  0.4863
HCAS, "single"     0.4423▲   0.4643▲   0.4870▲   0.4854   0.4848  0.4847  0.4853  0.4863
HCAS, "ward"       0.4289    0.4434▼   0.4820    0.4879   0.4853  0.4837  0.4870  0.4863
GAS, c = 0.01      0.4294    0.4515    0.4758    0.4848   0.4863  0.4860  0.4868  0.4863
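
The deck does not detail the test. A common reading for per-query IR metrics, sketched below under that assumption, is a paired sign-flip randomization test on the per-query NDCG@10 differences of two systems (names ours):

```python
import numpy as np

def randomization_test(ndcg_a, ndcg_b, n_iter=10000, seed=0):
    """Two-sided paired randomization test: randomly flip the sign of the
    per-query differences and count how often the permuted mean difference
    is at least as large as the observed one."""
    rng = np.random.default_rng(seed)
    diffs = np.asarray(ndcg_a, float) - np.asarray(ndcg_b, float)
    observed = abs(diffs.mean())
    signs = rng.choice([-1.0, 1.0], size=(n_iter, diffs.size))
    permuted = np.abs((signs * diffs).mean(axis=1))
    return float((permuted >= observed).mean())   # p-value
```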
  • 79. Conclusions  We designed three FSAs and applied them to the ranking of web search pages.  NGAS-E and HCAS perform as well as or better than the benchmark model.  HCAS and NGAS are very …  The proposed FSAs can be implemented independently of the L2R model.  The proposed FSAs can be applied to other ML contexts, to sorting problems and to model ensembling.