ECDL 2010 - Measuring Effectiveness of Geographic IR Systems in Digital Libraries: Evaluation and Case Study
1. ECDL 2010
6-10 September 2010
Measuring Effectiveness of Geographic IR Systems in Digital Libraries: Evaluation and Case Study
Damien Palacio, Guillaume Cabanac, Christian Sallaberry, Gilles Hubert
Damien Palacio - damien.palacio@univ-pau.fr
2. Outline
1. Motivation: Topical IR → Geographic IR. Hypothesis: GIRS > IRS
2. Context: IRS evaluation. Issue: current evaluation frameworks = partial
3. Contribution: GIRS evaluation framework
4. Experiments: Case study with PIV GIRS. Hypothesis validated
5. Conclusion and Future Works
4. 1. Motivation – Why Geographic IR?
Geographic Information Retrieval
➔ Query = ''trip around Glasgow in summer 2010''
➔ Search Engines
➔ Topical: term ∈ {trip, Glasgow, summer, 2010}
➔ Geographic: spatial ∈ {citiesNearGlasgow ...}, temporal ∈ {21 June .. 22 Sept 2010}, term ∈ {trip, Glasgow, summer, 2010}
➔ ≈ 1/6 Queries = Geographic Queries
➔ Excite (Sanderson et al., 2004)
➔ AOL (Gan et al., 2008)
➔ Yahoo! (Jones et al., 2008)
➔ A Current and Realistic Issue
5. 1. Motivation – Why Geographic IR?
A Geographic IRS: How Does It Work?
➔ 3 Dimensions to Process:
➔ Spatial, temporal and topical
➔ 1 Index per Dimension
➔ Topical: bag of words, vector space model, ...
➔ Spatial: named entity recognition, ...
➔ Temporal: named entity recognition, ...
6. 1. Motivation – Why Geographic IR?
A Geographic IRS: How Does It Work?
➔ Spatial Processing
7. 1. Motivation – Why Geographic IR?
A Geographic IRS: How Does It Work?
➔ Retrieval
➔ Usually by filtering (STEWARD, SPIRIT, CITER, …)
➔ Issue: Performance of GIRS vs. topical IRS
➔ Hypothesis: Geographic IRS better than topical IRS
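Retrieval by filtering can be read in several ways. The sketch below shows one plausible reading, as an assumption rather than a description of STEWARD, SPIRIT or CITER: the topical ranking is kept, and the spatial and temporal dimensions only decide whether a document stays in the list. The names `Hit` and `filter_by_dimensions` and the toy data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float  # topical retrieval score (higher is better)


def filter_by_dimensions(topical_hits, spatial_matches, temporal_matches):
    """Keep topical hits whose document also matches both other dimensions.

    The topical order is preserved; the spatial and temporal dimensions
    act only as yes/no filters (an assumption made for this sketch).
    """
    return [
        hit
        for hit in topical_hits
        if hit.doc_id in spatial_matches and hit.doc_id in temporal_matches
    ]


# Hypothetical toy data, not taken from the talk.
topical_hits = [Hit("d1", 0.9), Hit("d2", 0.7), Hit("d3", 0.4)]
spatial_matches = {"d1", "d3"}   # documents mentioning places near the query location
temporal_matches = {"d2", "d3"}  # documents mentioning dates inside the query period

filtered = filter_by_dimensions(topical_hits, spatial_matches, temporal_matches)
print([hit.doc_id for hit in filtered])
```

With filtering, a document that is strong topically but misses one dimension (d1 here) disappears entirely, which is one reason to compare such a system against a purely topical IRS.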
16. 2. Context and Issue: IRS Partial Evaluation
Evaluating an IR System
➔ System = efficiency + effectiveness
➔ Efficiency: computation time, storage needed
➔ Effectiveness: quality
➔ Studied in the Geo IR literature and the topical IR literature
➔ Effectiveness Evaluation
➔ Topical: TREC, CLEF, ...
➔ Temporal: TempEval
➔ Spatial: Bucher et al. (2005), GeoCLEF
➔ Spatial + temporal + topical: evaluation framework proposed
18. 3. Proposition – GIRS Evaluation Framework
Evaluation Framework for the 3 Dimensions (1/2)
➔ Goal: measuring GIRS quality
➔ Means: building on TREC framework (1992-)
➔ ''Cranfield'' methodology
➔ Test collection
➔ Corpus
➔ ≥ 25 Topics
➔ Qrels
➔ Measures: P@X, MAP, NDCG, ...
[Voorhees, 2007]
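As a reminder of how the binary-relevance measures named above are computed, here is a minimal sketch of P@X and MAP. The function names, topic ids and document ids are illustrative assumptions; they are not part of the TREC tooling.

```python
def precision_at(ranking, relevant, x):
    """P@X: fraction of the top-x retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:x] if doc in relevant) / x


def average_precision(ranking, relevant):
    """AP: mean of P@k over the ranks k at which a relevant document appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)


def mean_average_precision(runs, qrels):
    """MAP: AP averaged over all judged topics."""
    return sum(average_precision(runs[t], qrels[t]) for t in qrels) / len(qrels)


# Hypothetical toy run and binary qrels for two topics.
runs = {"t1": ["d1", "d2", "d3", "d4"], "t2": ["d5", "d6"]}
qrels = {"t1": {"d1", "d3"}, "t2": {"d6"}}

print(precision_at(runs["t1"], qrels["t1"], 2))
print(mean_average_precision(runs, qrels))
```

Both measures treat relevance as yes/no, so they cannot reward a document for satisfying more dimensions than another.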
21. 3. Proposition – GIRS Evaluation Framework
Evaluation Framework for the 3 Dimensions (2/2)
➔ TREC Framework Extension
➔ Test collection
➔ ≥ 25 Topics
➔ Corpus covering the 3 dimensions
➔ Gradual qrels
➔ + geographic resources
➔ About qrels …
➔ Relevance (doc, topic) ∈ {0;1;2;3;4}
➔ From 0 = no dimension satisfied to 4 = 3 dimensions + global topic satisfied
➔ Example with 3 dimensions: topic ''trip around Glasgow''; doc: trip + Bob born in Dumbarton
➔ Principle: ''the more satisfied dimensions there are, the better it is''
➔ Measure aware of gradual qrels: Normalized Discounted Cumulative Gain [Järvelin & Kekäläinen, 2002]
➔ By topic: NDCG for each topic
➔ Global: meanNDCG for the system
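To make the graded measure concrete, here is a minimal NDCG sketch over relevance values in {0;1;2;3;4}. It assumes the common gain / log2(rank + 1) discount, which the slides do not specify and which differs slightly from the original formulation of Järvelin & Kekäläinen. Function names and toy judgments are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain of graded relevance values given in rank order."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(ranking, qrels):
    """NDCG for one topic: DCG of the ranking divided by DCG of the ideal ordering."""
    gains = [qrels.get(doc, 0) for doc in ranking]  # unjudged documents count as 0
    ideal_dcg = dcg(sorted(qrels.values(), reverse=True))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(runs, qrels_by_topic):
    """meanNDCG for the system: NDCG averaged over all judged topics."""
    values = [ndcg(runs.get(t, []), q) for t, q in qrels_by_topic.items()]
    return sum(values) / len(values)


# Hypothetical graded judgments (0 = no dimension ... 4 = all dimensions + topic).
qrels_by_topic = {"t1": {"d1": 4, "d2": 2, "d3": 0}}
best_run = {"t1": ["d1", "d2", "d3"]}
worst_run = {"t1": ["d3", "d2", "d1"]}

print(mean_ndcg(best_run, qrels_by_topic))
print(mean_ndcg(worst_run, qrels_by_topic))
```

A ranking that places the documents satisfying the most dimensions first reaches 1.0; pushing them down the list lowers the score, which is the behaviour the principle above asks for.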
23. 4. Experiments – Case Study with PIV GIRS
Case Study: PIV System
➔ Indexing: 1 index per dimension
➔ Topical = Terrier IRS [Ounis et al., 2005]
➔ Spatial = map segmentation into tiles
➔ Temporal = timeline segmentation into tiles
➔ Retrieval
➔ Result document list for each index
➔ Results combination with CombMNZ [Fox & Shaw, 1993; Lee, 1997]
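The tile-based indexes can be pictured with a small sketch. It assumes a regular grid of one-degree tiles for the map and one-month slots for the timeline; the slides do not give PIV's actual tile sizes or data structures, so `spatial_tile`, `temporal_tile`, `build_index` and the approximate coordinates are illustrative only.

```python
import math
from collections import defaultdict


def spatial_tile(lon, lat, size_deg=1.0):
    """Map a point to the (column, row) of a regular grid of size_deg-wide tiles."""
    return (math.floor(lon / size_deg), math.floor(lat / size_deg))


def temporal_tile(year, month):
    """Map a date to a one-month slot on the timeline."""
    return year * 12 + (month - 1)


def build_index(doc_tiles):
    """Invert {doc_id: [tile, ...]} into {tile: {doc_id, ...}}."""
    index = defaultdict(set)
    for doc_id, tiles in doc_tiles.items():
        for tile in tiles:
            index[tile].add(doc_id)
    return index


# Approximate coordinates, for illustration only.
spatial_index = build_index({
    "d1": [spatial_tile(-4.25, 55.86)],  # a document mentioning Glasgow
    "d2": [spatial_tile(-4.57, 55.94)],  # a document mentioning Dumbarton
    "d3": [spatial_tile(2.35, 48.86)],   # a document mentioning Paris
})

query_tile = spatial_tile(-4.25, 55.86)
print(sorted(spatial_index[query_tile]))
print(temporal_tile(2010, 7))
```

Documents whose places fall in the same tile as the query location are retrieved together, so a query about Glasgow can reach a document that only mentions Dumbarton.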
24. 4. Experiments – Case Study with PIV GIRS
CombMNZ Principle [Fox & Shaw, 1993; Lee, 1997]
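The principle can be sketched in a few lines: each document's fused score is the sum of its normalized scores multiplied by the number of result lists that retrieved it. The sketch assumes min-max normalization per list and counts every list in which the document appears; the function names and the three toy result lists are hypothetical and do not come from PIV.

```python
def min_max_normalize(scores):
    """Rescale one result list's scores to [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_mnz(result_lists):
    """CombMNZ: (sum of normalized scores) x (number of lists retrieving the document)."""
    sums, counts = {}, {}
    for scores in result_lists:
        if not scores:
            continue  # an empty list contributes nothing
        for doc, s in min_max_normalize(scores).items():
            sums[doc] = sums.get(doc, 0.0) + s
            counts[doc] = counts.get(doc, 0) + 1
    fused = {doc: sums[doc] * counts[doc] for doc in sums}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Hypothetical result lists, one per index.
topical = {"d1": 12.0, "d2": 8.0, "d3": 4.0}
spatial = {"d2": 0.9, "d3": 0.3}
temporal = {"d2": 0.6, "d1": 0.2}

fused = comb_mnz([topical, spatial, temporal])
print(fused)
```

Here d2 is only second in the topical list but is retrieved by all three indexes, so the multiplier lifts it to the top of the fused ranking.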
27. 4. Experiments – Case Study with PIV GIRS
Case Study: MIDR_2010 collection
➔ Building Qrels: 12 volunteers (thanks!!!)
➔ 31 topics
➔ 5645 documents (= paragraphs)
➔ Qrels: relevance judgments ∈ {0;1;2;3;4}
➔ Map for tracking spatial information
28. 4. Experiments – Hypothesis Validated
Analysis of Collected Data
➔ IRS Evaluation: ResultsList × Qrels → trec_eval → NDCG
➔ Results: geographic IRS most effective (hypothesis validated)
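trec_eval reads the result list and the qrels as plain-text files. The sketch below writes both in the usual TREC layouts ("topic iteration docno relevance" for qrels, "topic Q0 docno rank score tag" for runs). The helper names, the run tag and the toy data are assumptions, not files from the MIDR_2010 collection.

```python
def qrels_lines(qrels):
    """Yield 'topic iteration docno relevance' lines (iteration is conventionally 0)."""
    for topic, judgments in qrels.items():
        for doc, rel in judgments.items():
            yield f"{topic} 0 {doc} {rel}"


def run_lines(runs, tag="my_run"):
    """Yield 'topic Q0 docno rank score tag' lines, ranked by decreasing score."""
    for topic, scores in runs.items():
        ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
        for rank, (doc, score) in enumerate(ranked, start=1):
            yield f"{topic} Q0 {doc} {rank} {score:.4f} {tag}"


# Hypothetical graded judgments and system scores for one topic.
qrels = {"1": {"d1": 4, "d2": 0}}
runs = {"1": {"d2": 0.3, "d1": 0.8}}

print("\n".join(qrels_lines(qrels)))
print("\n".join(run_lines(runs)))
```

With the two files saved, a call along the lines of `trec_eval -m ndcg qrels.txt run.txt` should report NDCG in recent trec_eval versions; the available measure names depend on the installed version, so check its help output.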
31. Evaluation Framework for Geographic IR Systems
Conclusions and Future Works (1/2)
➔ Evaluation Framework for Geographic IR Systems
➔ Reusable
➔ Generalizable to more dimensions: confidence, freshness, ... [Costa Pereira et al., 2009]
➔ No gradual relevance per dimension
➔ Case Study with PIV System
➔ Creation of a specific test collection (≥ 25 topics)
➔ French test collection
➔ Limited collection (number of documents)
32. Evaluation Framework for Geographic IR Systems
Conclusions and Future Works (2/2)
➔ Hypothesis Validated
➔ The 3 dimensions improve IR (+66.5%)
➔ Future Works
➔ More precise analysis: by query
➔ Quantify PIV improvements: various index combinations
➔ Organize a GIRS evaluation campaign: anyone interested?