WWW 2014
Seoul, April 8th
SNOW 2014 Data Challenge
Symeon Papadopoulos (CERTH)
David Corney (RGU)
Luca Aiello (Yahoo! Labs)
Overview of Challenge
• Goal: Detection of newsworthy topics in a large and
noisy set of tweets
• Topic: a news story represented by a headline + tags
+ representative tweets + representative images
(optional)
• Newsworthy: A topic that ends up being covered by
at least some major online news sources
• Topics are detected per timeslot (small equally-sized
time intervals)
• The number of topics per timeslot is capped at a fixed
maximum
#2
Challenge Activity Log
• Challenge definition (Dec 2013)
• Challenge toolkit and registration (Jan 20, 2014)
• Development dataset collection (Feb 3, 2014)
• Rehearsal dataset collection (Feb 17, 2014)
• Test dataset collection (Feb 25, 2014)
• Results submission (Mar 4, 2014)
• Paper submission (Mar 9, 2014)
• Results evaluation (Mar 5-18, 2014)
• Workshop (Apr 7, 2014)
#3
Some statistics
• Registered participants: 25
– India: 4, Belgium: 3, Germany: 3, UK: 3, Greece: 3,
Ireland: 2, USA: 2, France: 2, Italy: 1, Spain: 1, Russia: 1
• Participants that signed the Challenge agreement: 19
• Participants that submitted results: 11
• Participants that submitted papers: 9
#4
Evaluation Protocol
• Defined several evaluation criteria:
– Newsworthiness: precision, recall, F-score
– Readability: scale [1-5]
– Coherence: scale [1-5]
– Diversity: scale [1-5]
• List of reference topics
• Set up precise evaluation guidelines
• Blind evaluation (i.e. evaluator not aware of which
method a topic comes from) based on Web UI
• Participants submitted topics for 96 timeslots, but
manual evaluation covered only 5 sample timeslots
• Result validation and analysis
#5
Teams key
#6
Key  Team
A    UKON
B    IBCN
C    ITI
D    math-dyn
E    Insight
F    FUB-TORV
G    PILOTS
H    RGU
I    UoGMIR
J    EURECOM
K    SNOWBITS
References to the submitted papers will be
included in the overview paper in the
workshop proceedings.
Results – Reference topic recall
#7
Team  Recall  Rank
A     0.44    5
B     0.58    4
C     0.32    7
D     0.63    2
E     0.66    1
F     0.39    6
G     0.24    8
H     0.6     3
I     0.17    10
J     0.24    8
K     0.14    11
Recall is reported as a fraction of the 59 reference
topics. These were partitioned into three groups
(20, 20, 19), and each of the three evaluators manually
matched the participants' topics to the reference
topics assigned to them.
Evaluator pair     Correlation
Eval. 1 – Eval. 2  0.894913
Eval. 1 – Eval. 3  0.930247
Eval. 2 – Eval. 3  0.811976
Results – Pooled topic recall (1/2)
• Each evaluator independently evaluated the topics
of each participant as newsworthy or not
• Selected all topics that were marked as newsworthy
by at least two evaluators
• Manually extracted the unique topics (70 in total,
partially overlapping with reference topic list)
• Manually matched the correct topics of each participant
to the list of newsworthy topics
• Computed precision, recall and F-score
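The pooling and scoring steps above can be sketched in a few lines of Python. This is only an illustration: the function names and the toy `votes` data are hypothetical, and the formulas (precision = matched / total submitted, recall = unique matched / 70 pooled topics) are inferred from the columns of the pooled results table rather than stated on the slides.

```python
def pool_newsworthy(votes, min_votes=2):
    """Keep topics marked newsworthy by at least `min_votes` evaluators."""
    return [topic for topic, marks in votes.items() if sum(marks) >= min_votes]


def pooled_scores(matched, unique, total, pool_size=70):
    """Precision, recall and F-score for one team.

    Assumed definitions (inferred from the results table, not stated):
      precision = matched topics / total evaluated topics
      recall    = unique matched topics / size of the pooled topic list
    """
    precision = matched / total
    recall = unique / pool_size
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


# Toy evaluator judgements (hypothetical): one boolean per evaluator.
votes = {
    "topic-1": (True, True, False),
    "topic-2": (False, True, False),
    "topic-3": (True, True, True),
}
pooled = pool_newsworthy(votes)

# Team E's counts as reported in the pooled results table.
precision, recall, f_score = pooled_scores(matched=28, unique=25, total=50)
```

With team E's counts, the rounded outputs agree with the Prec, Rec and F-score values reported for that team, which is what suggests the assumed definitions.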
#8
Results – Pooled topic recall (2/2)
#9
Team  Matched  Unique  Total  Prec   Rec    F-score  Rank
A     13       13      27     0.481  0.186  0.268    6
B     12       12      23     0.522  0.171  0.258    7
C     22       15      50     0.44   0.214  0.288    4
D     18       14      39     0.462  0.2    0.279    5
E     28       25      50     0.56   0.357  0.436    1
F     4        2       15     0.267  0.029  0.052    10
G     4        4       10     0.4    0.057  0.099    9
H     19       17      49     0.388  0.243  0.299    3
I     36       15      45     0.8    0.214  0.338    2
J     1        1       8      0.125  0.014  0.027    11
K     8        7       10     0.8    0.1    0.178    8
Results - Readability
#10
Team  Readability  Rank
A     4.29         9
B     4.92         2
C     4.49         7
D     4.59         6
E     4.74         4
F     4.18         10
G     4.93         1
H     4.71         5
I     4.8          3
J     3.38         11
K     4.32         8
Evaluator pair     Correlation
Eval. 1 – Eval. 2  0.902124
Eval. 1 – Eval. 3  0.357733
Eval. 2 – Eval. 3  0.278632
Results - Coherence
#11
Team  Coherence  Rank
A     4.4        6
B     4.08       9
C     4.68       5
D     4.91       2
E     4.97       1
F     4.78       4
G     4.83       3
H     4.22       8
I     3.95       10
J     3.75       11
K     4.36       7
Evaluator pair     Correlation
Eval. 1 – Eval. 2  0.549512
Eval. 1 – Eval. 3  0.730684
Eval. 2 – Eval. 3  0.684426
Results - Diversity
#12
Team  Diversity  Rank
A     2.12       7
B     2.36       4
C     2.31       6
D     2.11       8
E     2.11       8
F     2          10
G     1.92       11
H     3.27       2
I     2.36       4
J     2.5        3
K     3.47       1
Evaluator pair     Correlation
Eval. 1 – Eval. 2  0.873365
Eval. 1 – Eval. 3  0.890415
Eval. 2 – Eval. 3  0.905915
Results – Image Relevance
#13
Team  Precision (%)  Rank
A     54.19          3
B     31.75          5
C     58.09          2
D     52.04          4
E     27.39          6
F     0              8
G     0              8
H     58.82          1
I     0              8
J     0              8
K     18.45          7
Evaluator pair     Correlation
Eval. 1 – Eval. 2  0.944946
Eval. 1 – Eval. 3  0.919469
Eval. 2 – Eval. 3  0.79596
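The Rank columns in the results tables appear to follow standard competition ranking: tied teams share a rank and the following ranks are skipped (for example, four teams with 0% image precision all rank 8th). The slides do not name the scheme, so the sketch below is an assumption, and `competition_ranks` is a hypothetical helper.

```python
def competition_ranks(scores):
    """Rank teams by score, highest first; tied teams share the best rank."""
    ordered = sorted(scores.values(), reverse=True)
    # The first position of a value in the descending list is its rank - 1,
    # so equal values get the same rank and later ranks are skipped.
    return {team: ordered.index(value) + 1 for team, value in scores.items()}


# Image relevance precision (%) per team, as reported on the slide.
image_precision = {
    "A": 54.19, "B": 31.75, "C": 58.09, "D": 52.04, "E": 27.39, "F": 0,
    "G": 0, "H": 58.82, "I": 0, "J": 0, "K": 18.45,
}
ranks = competition_ranks(image_precision)
```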
Results – Aggregate (1/2)
• For each criterion $C_i$, we computed the score of each
team relative to the best team for this criterion:
$C_i^*(\text{team}) = C_i(\text{team}) / \max_j C_i(\text{team}_j)$
• We then aggregated over the different normalized scores:
$C_{tot} = 0.25 \cdot C_{ref} \cdot C_{pool} + 0.25 \cdot C_{read} + 0.25 \cdot C_{coh} + 0.25 \cdot C_{div}$
where $C_{ref}$ is computed from the recall of reference
topics, $C_{pool}$ from the F-score of the pooled topics,
and $C_{read}$, $C_{coh}$ and $C_{div}$ from readability, coherence
and diversity respectively.
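As a rough check of the aggregation formula, the sketch below applies it to the per-criterion scores reported on the earlier slides. The `SCORES` layout and the `aggregate` function are illustrative choices, not the organizers' code. Because the inputs are the rounded values from the slides, the results can differ from the reported aggregate scores in the third decimal.

```python
# Per-team scores copied from the result slides, in the order:
# (reference recall, pooled F-score, readability, coherence, diversity)
CRITERIA = ("ref", "pool", "read", "coh", "div")
SCORES = {
    "A": (0.44, 0.268, 4.29, 4.40, 2.12),
    "B": (0.58, 0.258, 4.92, 4.08, 2.36),
    "C": (0.32, 0.288, 4.49, 4.68, 2.31),
    "D": (0.63, 0.279, 4.59, 4.91, 2.11),
    "E": (0.66, 0.436, 4.74, 4.97, 2.11),
    "F": (0.39, 0.052, 4.18, 4.78, 2.00),
    "G": (0.24, 0.099, 4.93, 4.83, 1.92),
    "H": (0.60, 0.299, 4.71, 4.22, 3.27),
    "I": (0.17, 0.338, 4.80, 3.95, 2.36),
    "J": (0.24, 0.027, 3.38, 3.75, 2.50),
    "K": (0.14, 0.178, 4.32, 4.36, 3.47),
}


def aggregate(scores):
    """Normalize each criterion by its best team, then combine per the formula."""
    best = [max(row[i] for row in scores.values()) for i in range(len(CRITERIA))]
    totals = {}
    for team, row in scores.items():
        ref, pool, read, coh, div = (value / top for value, top in zip(row, best))
        # The two newsworthiness terms are multiplied and share one 0.25 weight.
        totals[team] = 0.25 * ref * pool + 0.25 * read + 0.25 * coh + 0.25 * div
    return totals


totals = aggregate(SCORES)
top_three = sorted(totals, key=totals.get, reverse=True)[:3]
```

Multiplying the two newsworthiness terms means a team needs both good reference recall and a good pooled F-score to earn that quarter of the total.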
#14
Results – Aggregate (2/2)
#15
Team  Aggregate score  Rank
A     0.694            7
B     0.755            4
C     0.710            5
D     0.785            3
E     0.892            1
F     0.614            10
G     0.652            9
H     0.842            2
I     0.662            8
J     0.546            11
K     0.70987          6
We tried several alternative
aggregation scores. The top
three teams were the same!
Program
15:20-15:30: Carlos Martin-Dancausa and Ayse Goker: Real-time topic detection with
bursty n-grams.
16:00-16:20: Gopi Chand Nuttaki, Olfa Nasraoui, Behnoush Abdollahi, Mahsa Badami,
Wenlong Sun: Distributed LDA based topic modelling and topic agglomeration in a
latent space.
16:20-16:40: Steven van Canneyt, Matthias Feys, Steven Schockaert, Thomas
Demeester, Chris Develder, Bart Dhoedt: Detecting newsworthy topics in Twitter.
16:40-17:00: Georgiana Ifrim, Bichen Shi, Igor Brigadir: Event detection in Twitter
using aggressive filtering and hierarchical tweet clustering.
17:00-17:20: Gerard Burnside, Dimitrios Milioris, Philippe Jacquet: One day in Twitter:
Topic detection via joint complexity.
17:20-17:30: Georgios Petkos, Symeon Papadopoulos, Yiannis Kompatsiaris: Two-level
message clustering for topic detection in Twitter.
17:30-17:40: Winners’ announcement!
#16
Limitations – Lessons Learned
• Did not take into account time
– However, methods that produce a newsworthy topic earlier
should be rewarded
• Did not take into account image relevance
– since we considered it an optional field
• Coherence and diversity had extreme values in
numerous cases
– e.g. when a single relevant tweet was provided as
representative
• Evaluation turned out to be a very complex task!
• Assessing only five of the 96 timeslots is definitely a
compromise. Options: (a) use more evaluators or AMT,
(b) use simpler evaluation tasks
#17
Plan
• Release evaluation resources
– list of reference topics
– list of pooled newsworthy topics
– evaluation scores
• Papers
– SNOW Data Challenge paper
– Resubmission of participants’ papers with CEUR style
– Submission to CEUR-ws.org
• Open-source implementations?
• Further plans?
#18
Thank you!
#19

Mais conteúdo relacionado

Semelhante a SNOW 2014 Data Challenge

HTM 602 Cluster Analysis (Sept. 10, Suh-hee Choi)
HTM 602 Cluster Analysis (Sept. 10, Suh-hee Choi)HTM 602 Cluster Analysis (Sept. 10, Suh-hee Choi)
HTM 602 Cluster Analysis (Sept. 10, Suh-hee Choi)Suh-hee Choi
 
20200416_215843.jpg20200416_215856.jpg20200416_215918.jp.docx
20200416_215843.jpg20200416_215856.jpg20200416_215918.jp.docx20200416_215843.jpg20200416_215856.jpg20200416_215918.jp.docx
20200416_215843.jpg20200416_215856.jpg20200416_215918.jp.docxlorainedeserre
 
Quiz review and start of lesson 6
Quiz review and start of lesson 6Quiz review and start of lesson 6
Quiz review and start of lesson 6Erik Tjersland
 
REG IPF / ILD Working Group Meeting
REG IPF / ILD Working Group MeetingREG IPF / ILD Working Group Meeting
REG IPF / ILD Working Group MeetingZoe Mitchell
 
research in Animal sciences.doc
research in Animal sciences.docresearch in Animal sciences.doc
research in Animal sciences.docNigussuFekade
 
Grad Student Needs Assessment
Grad Student Needs AssessmentGrad Student Needs Assessment
Grad Student Needs Assessmentaldenlibrary
 
Paper Generation Process of PEC.pptx
Paper Generation Process of PEC.pptxPaper Generation Process of PEC.pptx
Paper Generation Process of PEC.pptxNasirMahmood976516
 
Paper Generation Process of PEC.pptx
Paper Generation Process of PEC.pptxPaper Generation Process of PEC.pptx
Paper Generation Process of PEC.pptxNasirMahmood976516
 
Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010dtsovaltzi
 
Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010dtsovaltzi
 
Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010dtsovaltzi
 
Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010dtsovaltzi
 
Keynote EARLI SIG17 The power of learning analytics: a need to move towards n...
Keynote EARLI SIG17 The power of learning analytics: a need to move towards n...Keynote EARLI SIG17 The power of learning analytics: a need to move towards n...
Keynote EARLI SIG17 The power of learning analytics: a need to move towards n...Bart Rienties
 
20200416_215843.jpg20200416_215856.jpg20200416_215918.jp
20200416_215843.jpg20200416_215856.jpg20200416_215918.jp20200416_215843.jpg20200416_215856.jpg20200416_215918.jp
20200416_215843.jpg20200416_215856.jpg20200416_215918.jpcargillfilberto
 
LearningQ: A Large-scale Dataset for Educational Question Generation (ICWSM 2...
LearningQ: A Large-scale Dataset for Educational Question Generation (ICWSM 2...LearningQ: A Large-scale Dataset for Educational Question Generation (ICWSM 2...
LearningQ: A Large-scale Dataset for Educational Question Generation (ICWSM 2...Guanliang Chen
 
Multilevel analysis of collaborative activities based on a Mobile Learning Sc...
Multilevel analysis of collaborative activities based on a Mobile Learning Sc...Multilevel analysis of collaborative activities based on a Mobile Learning Sc...
Multilevel analysis of collaborative activities based on a Mobile Learning Sc...Irene-Angelica Chounta
 
An Ensemble Model for Cross-Domain Polarity Classification on Twitter
An Ensemble Model for Cross-Domain Polarity Classification on TwitterAn Ensemble Model for Cross-Domain Polarity Classification on Twitter
An Ensemble Model for Cross-Domain Polarity Classification on TwitterSymeon Papadopoulos
 
ntcir14centre-overview
ntcir14centre-overviewntcir14centre-overview
ntcir14centre-overviewTetsuya Sakai
 

Semelhante a SNOW 2014 Data Challenge (20)

01 academic report writing iec 2011
01 academic report writing iec 201101 academic report writing iec 2011
01 academic report writing iec 2011
 
HTM 602 Cluster Analysis (Sept. 10, Suh-hee Choi)
HTM 602 Cluster Analysis (Sept. 10, Suh-hee Choi)HTM 602 Cluster Analysis (Sept. 10, Suh-hee Choi)
HTM 602 Cluster Analysis (Sept. 10, Suh-hee Choi)
 
20200416_215843.jpg20200416_215856.jpg20200416_215918.jp.docx
20200416_215843.jpg20200416_215856.jpg20200416_215918.jp.docx20200416_215843.jpg20200416_215856.jpg20200416_215918.jp.docx
20200416_215843.jpg20200416_215856.jpg20200416_215918.jp.docx
 
Quiz review and start of lesson 6
Quiz review and start of lesson 6Quiz review and start of lesson 6
Quiz review and start of lesson 6
 
REG IPF / ILD Working Group Meeting
REG IPF / ILD Working Group MeetingREG IPF / ILD Working Group Meeting
REG IPF / ILD Working Group Meeting
 
research in Animal sciences.doc
research in Animal sciences.docresearch in Animal sciences.doc
research in Animal sciences.doc
 
Analyzing navigation logs in MOOC: the Coursera case
Analyzing navigation logs in MOOC: the Coursera caseAnalyzing navigation logs in MOOC: the Coursera case
Analyzing navigation logs in MOOC: the Coursera case
 
Grad Student Needs Assessment
Grad Student Needs AssessmentGrad Student Needs Assessment
Grad Student Needs Assessment
 
Paper Generation Process of PEC.pptx
Paper Generation Process of PEC.pptxPaper Generation Process of PEC.pptx
Paper Generation Process of PEC.pptx
 
Paper Generation Process of PEC.pptx
Paper Generation Process of PEC.pptxPaper Generation Process of PEC.pptx
Paper Generation Process of PEC.pptx
 
Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010
 
Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010
 
Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010
 
Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010Tsovaltzi etal ectel2010
Tsovaltzi etal ectel2010
 
Keynote EARLI SIG17 The power of learning analytics: a need to move towards n...
Keynote EARLI SIG17 The power of learning analytics: a need to move towards n...Keynote EARLI SIG17 The power of learning analytics: a need to move towards n...
Keynote EARLI SIG17 The power of learning analytics: a need to move towards n...
 
20200416_215843.jpg20200416_215856.jpg20200416_215918.jp
20200416_215843.jpg20200416_215856.jpg20200416_215918.jp20200416_215843.jpg20200416_215856.jpg20200416_215918.jp
20200416_215843.jpg20200416_215856.jpg20200416_215918.jp
 
LearningQ: A Large-scale Dataset for Educational Question Generation (ICWSM 2...
LearningQ: A Large-scale Dataset for Educational Question Generation (ICWSM 2...LearningQ: A Large-scale Dataset for Educational Question Generation (ICWSM 2...
LearningQ: A Large-scale Dataset for Educational Question Generation (ICWSM 2...
 
Multilevel analysis of collaborative activities based on a Mobile Learning Sc...
Multilevel analysis of collaborative activities based on a Mobile Learning Sc...Multilevel analysis of collaborative activities based on a Mobile Learning Sc...
Multilevel analysis of collaborative activities based on a Mobile Learning Sc...
 
An Ensemble Model for Cross-Domain Polarity Classification on Twitter
An Ensemble Model for Cross-Domain Polarity Classification on TwitterAn Ensemble Model for Cross-Domain Polarity Classification on Twitter
An Ensemble Model for Cross-Domain Polarity Classification on Twitter
 
ntcir14centre-overview
ntcir14centre-overviewntcir14centre-overview
ntcir14centre-overview
 

Mais de Symeon Papadopoulos

DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...Symeon Papadopoulos
 
Deepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionSymeon Papadopoulos
 
Knowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationSymeon Papadopoulos
 
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Symeon Papadopoulos
 
COVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingCOVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingSymeon Papadopoulos
 
Similarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSimilarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSymeon Papadopoulos
 
Twitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualityTwitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualitySymeon Papadopoulos
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentSymeon Papadopoulos
 
Verifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetVerifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetSymeon Papadopoulos
 
A Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionA Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionSymeon Papadopoulos
 
Learning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterLearning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterSymeon Papadopoulos
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersSymeon Papadopoulos
 
Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Symeon Papadopoulos
 
Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Symeon Papadopoulos
 
In-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceIn-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceSymeon Papadopoulos
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Symeon Papadopoulos
 
Web and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsWeb and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsSymeon Papadopoulos
 
Predicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsPredicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsSymeon Papadopoulos
 
Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Symeon Papadopoulos
 

Mais de Symeon Papadopoulos (20)

DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
DeepFake Detection: Challenges, Progress and Hands-on Demonstration of Techno...
 
Deepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their DetectionDeepfakes: An Emerging Internet Threat and their Detection
Deepfakes: An Emerging Internet Threat and their Detection
 
Knowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering LocalizationKnowledge-based Fusion for Image Tampering Localization
Knowledge-based Fusion for Image Tampering Localization
 
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
Deepfake Detection: The Importance of Training Data Preprocessing and Practic...
 
COVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact TracingCOVID-19 Infodemic vs Contact Tracing
COVID-19 Infodemic vs Contact Tracing
 
Similarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia contentSimilarity-based retrieval of multimedia content
Similarity-based retrieval of multimedia content
 
Twitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air QualityTwitter-based Sensing of City-level Air Quality
Twitter-based Sensing of City-level Air Quality
 
Aggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media ContentAggregating and Analyzing the Context of Social Media Content
Aggregating and Analyzing the Context of Social Media Content
 
Verifying Multimedia Content on the Internet
Verifying Multimedia Content on the InternetVerifying Multimedia Content on the Internet
Verifying Multimedia Content on the Internet
 
A Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering DetectionA Web-based Service for Image Tampering Detection
A Web-based Service for Image Tampering Detection
 
Learning to detect Misleading Content on Twitter
Learning to detect Misleading Content on TwitterLearning to detect Misleading Content on Twitter
Learning to detect Misleading Content on Twitter
 
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN LayersNear-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
Near-Duplicate Video Retrieval by Aggregating Intermediate CNN Layers
 
Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016Verifying Multimedia Use at MediaEval 2016
Verifying Multimedia Use at MediaEval 2016
 
Multimedia Privacy
Multimedia PrivacyMultimedia Privacy
Multimedia Privacy
 
Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...Placing Images with Refined Language Models and Similarity Search with PCA-re...
Placing Images with Refined Language Models and Similarity Search with PCA-re...
 
In-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging PerformanceIn-depth Exploration of Geotagging Performance
In-depth Exploration of Geotagging Performance
 
Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...Perceived versus Actual Predictability of Personal Information in Social Netw...
Perceived versus Actual Predictability of Personal Information in Social Netw...
 
Web and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News ProfessionalsWeb and Social Media Image Forensics for News Professionals
Web and Social Media Image Forensics for News Professionals
 
Predicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online DiscussionsPredicting News Popularity by Mining Online Discussions
Predicting News Popularity by Mining Online Discussions
 
Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015Finding Diverse Social Images at MediaEval 2015
Finding Diverse Social Images at MediaEval 2015
 

Último

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 

Último (20)

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 

SNOW 2014 Data Challenge

  • 1. SNOW 2014 Data Challenge. WWW 2014, Seoul, April 8th. Symeon Papadopoulos (CERTH), David Corney (RGU), Luca Aiello (Yahoo! Labs)
  • 2. Overview of Challenge
      • Goal: Detection of newsworthy topics in a large and noisy set of tweets
      • Topic: a news story represented by a headline + tags + representative tweets + representative images (optional)
      • Newsworthy: A topic that ends up being covered by at least some major online news sources
      • Topics are detected per timeslot (small equally-sized time intervals)
      • We want a maximum number of topics per timeslot
  • 3. Challenge Activity Log
      • Challenge definition (Dec 2013)
      • Challenge toolkit and registration (Jan 20, 2014)
      • Development dataset collection (Feb 3, 2014)
      • Rehearsal dataset collection (Feb 17, 2014)
      • Test dataset collection (Feb 25, 2014)
      • Results submission (Mar 4, 2014)
      • Paper submission (Mar 9, 2014)
      • Results evaluation (Mar 5-18, 2014)
      • Workshop (Apr 7, 2014)
  • 4. Some statistics
      • Registered participants: 25
          – India: 4, Belgium: 3, Germany: 3, UK: 3, Greece: 3, Ireland: 2, USA: 2, France: 2, Italy: 1, Spain: 1, Russia: 1
      • Participants that signed the Challenge agreement: 19
      • Participants that submitted results: 11
      • Participants that submitted papers: 9
  • 5. Evaluation Protocol
      • Defined several evaluation criteria:
          – Newsworthiness: Precision/Recall, F-score
          – Readability: scale [1-5]
          – Coherence: scale [1-5]
          – Diversity: scale [1-5]
      • List of reference topics
      • Set up precise evaluation guidelines
      • Blind evaluation (i.e. evaluator not aware of which method a topic comes from) based on Web UI
      • Participants submitted topics for 96 timeslots, but manual evaluation happened for 5 sample timeslots.
      • Result validation and analysis
  • 6. Teams key
      Key  Team
      A    UKON
      B    IBCN
      C    ITI
      D    math-dyn
      E    Insight
      F    FUB-TORV
      G    PILOTS
      H    RGU
      I    UoGMIR
      J    EURECOM
      K    SNOWBITS
      References to the submitted papers will be included in the overview paper in the workshop proceedings.
  • 7. Results – Reference topic recall
      Team  Recall  Rank
      A     0.44    5
      B     0.58    4
      C     0.32    7
      D     0.63    2
      E     0.66    1
      F     0.39    6
      G     0.24    8
      H     0.6     3
      I     0.17    10
      J     0.24    8
      K     0.14    11
      Recall is reported as a fraction of the 59 reference topics. These were partitioned into three groups (20, 20, 19), and each of the three evaluators manually matched the participants' topics to the reference topics assigned to him.
      Eval. pair         Correlation
      Eval. 1 – Eval. 2  0.894913
      Eval. 1 – Eval. 3  0.930247
      Eval. 2 – Eval. 3  0.811976
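The Rank column in this and the following result tables appears to follow standard competition ranking: tied teams share a rank and the next rank is skipped (teams G and J are both ranked 8, team I is ranked 10). The slides do not state the rule, so the minimal Python sketch below is an assumption inferred from the table; the function name `competition_rank` is hypothetical.

```python
def competition_rank(scores):
    """Return a {team: rank} mapping where a higher score is better.

    Assumed rule (inferred from the tables, not stated in the slides):
    a team's rank is 1 plus the number of teams with a strictly higher
    score, so tied teams share a rank and the next rank is skipped.
    """
    return {
        team: 1 + sum(1 for other in scores.values() if other > score)
        for team, score in scores.items()
    }


# Reference topic recall per team, copied from the table above.
recall = {"A": 0.44, "B": 0.58, "C": 0.32, "D": 0.63, "E": 0.66, "F": 0.39,
          "G": 0.24, "H": 0.60, "I": 0.17, "J": 0.24, "K": 0.14}

ranks = competition_rank(recall)
for team in sorted(ranks, key=ranks.get):
    print(team, recall[team], ranks[team])
```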
  • 8. Results – Pooled topic recall (1/2)
      • Each evaluator independently evaluated the topics of each participant as newsworthy or not
      • Selected all topics that were marked as newsworthy by at least two evaluators
      • Manually extracted the unique topics (70 in total, partially overlapping with reference topic list)
      • Manually matched the correct topics of each participant to the list of newsworthy topics
      • Computed precision, recall and F-score
  • 9. Results – Pooled topic recall (2/2)
      Team  Matched  Unique  Total  Prec   Rec    F-score  Rank
      A     13       13      27     0.481  0.186  0.268    6
      B     12       12      23     0.522  0.171  0.258    7
      C     22       15      50     0.44   0.214  0.288    4
      D     18       14      39     0.462  0.2    0.279    5
      E     28       25      50     0.56   0.357  0.436    1
      F     4        2       15     0.267  0.029  0.052    10
      G     4        4       10     0.4    0.057  0.099    9
      H     19       17      49     0.388  0.243  0.299    3
      I     36       15      45     0.8    0.214  0.338    2
      J     1        1       8      0.125  0.014  0.027    11
      K     8        7       10     0.8    0.1    0.178    8
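The slides do not spell out how the Prec, Rec and F-score columns are derived from Matched, Unique and Total. The figures are consistent with precision = Matched / Total, recall = Unique / 70 (the size of the pooled topic list) and F-score as their harmonic mean. The Python sketch below rests on that inferred assumption; the function name `pooled_scores` is hypothetical.

```python
POOL_SIZE = 70  # unique pooled newsworthy topics, as stated on slide 8


def pooled_scores(matched, unique, total, pool_size=POOL_SIZE):
    """Return (precision, recall, f_score) for one team.

    Assumed definitions, inferred from the table rather than stated:
      precision = matched topics / total topics evaluated for the team
      recall    = unique pooled topics found / size of the pooled list
      f_score   = harmonic mean of precision and recall
    """
    precision = matched / total
    recall = unique / pool_size
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Team C's row from the table: Matched 22, Unique 15, Total 50.
p, r, f = pooled_scores(matched=22, unique=15, total=50)
print(f"precision={p:.3f} recall={r:.3f} f_score={f:.3f}")
```

Under this reading, a team can have more matched topics than unique ones (team I: 36 matched, 15 unique) when several of its submitted topics map to the same pooled topic, which raises precision without raising recall.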
  • 10. Results – Readability
      Team  Readability  Rank
      A     4.29         9
      B     4.92         2
      C     4.49         7
      D     4.59         6
      E     4.74         4
      F     4.18         10
      G     4.93         1
      H     4.71         5
      I     4.8          3
      J     3.38         11
      K     4.32         8
      Eval. pair         Correlation
      Eval. 1 – Eval. 2  0.902124
      Eval. 1 – Eval. 3  0.357733
      Eval. 2 – Eval. 3  0.278632
  • 11. Results – Coherence
      Team  Coherence  Rank
      A     4.4        6
      B     4.08       9
      C     4.68       5
      D     4.91       2
      E     4.97       1
      F     4.78       4
      G     4.83       3
      H     4.22       8
      I     3.95       10
      J     3.75       11
      K     4.36       7
      Eval. pair         Correlation
      Eval. 1 – Eval. 2  0.549512
      Eval. 1 – Eval. 3  0.730684
      Eval. 2 – Eval. 3  0.684426
  • 12. Results – Diversity
      Team  Diversity  Rank
      A     2.12       7
      B     2.36       4
      C     2.31       6
      D     2.11       8
      E     2.11       8
      F     2          10
      G     1.92       11
      H     3.27       2
      I     2.36       4
      J     2.5        3
      K     3.47       1
      Eval. pair         Correlation
      Eval. 1 – Eval. 2  0.873365
      Eval. 1 – Eval. 3  0.890415
      Eval. 2 – Eval. 3  0.905915
  • 13. Results – Image Relevance
      Team  Precision (%)  Rank
      A     54.19          3
      B     31.75          5
      C     58.09          2
      D     52.04          4
      E     27.39          6
      F     0              8
      G     0              8
      H     58.82          1
      I     0              8
      J     0              8
      K     18.45          7
      Eval. pair         Correlation
      Eval. 1 – Eval. 2  0.944946
      Eval. 1 – Eval. 3  0.919469
      Eval. 2 – Eval. 3  0.79596
  • 14. Results – Aggregate (1/2)
      • For each criterion C_i, we computed the score of each team relative to the best team for this criterion:
          C_i*(team) = C_i(team) / max_j C_i(team_j)
      • We then aggregated over the different normalized scores:
          C_tot = 0.25 * C_ref * C_pool + 0.25 * C_read + 0.25 * C_coh + 0.25 * C_div
        where C_ref is computed from the recall of reference topics, C_pool from the F-score of the pooled topics, and C_read, C_coh and C_div from readability, coherence and diversity respectively.
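As a worked illustration of the two formulas above, the following Python sketch normalizes each criterion by its best score and then combines the five normalized scores. The input figures are copied from the result slides for teams E, G, H and K; this subset is chosen because it contains the best team on every criterion, so the normalization matches the full 11-team computation. The function names `normalize` and `aggregate` are hypothetical, and the sketch is a reading of the slide, not the organizers' actual script.

```python
def normalize(values):
    """Divide every team's score by the best (maximum) score for the criterion."""
    best = max(values.values())
    return {team: value / best for team, value in values.items()}


def aggregate(ref_recall, pooled_f, readability, coherence, diversity):
    """C_tot = 0.25*C_ref*C_pool + 0.25*C_read + 0.25*C_coh + 0.25*C_div,
    where every C is the criterion score normalized by the best team."""
    c_ref = normalize(ref_recall)
    c_pool = normalize(pooled_f)
    c_read = normalize(readability)
    c_coh = normalize(coherence)
    c_div = normalize(diversity)
    return {
        team: 0.25 * c_ref[team] * c_pool[team]
              + 0.25 * c_read[team]
              + 0.25 * c_coh[team]
              + 0.25 * c_div[team]
        for team in ref_recall
    }


# Values copied from the result slides (teams E, G, H, K only).
ref_recall  = {"E": 0.66,  "G": 0.24,  "H": 0.60,  "K": 0.14}
pooled_f    = {"E": 0.436, "G": 0.099, "H": 0.299, "K": 0.178}
readability = {"E": 4.74,  "G": 4.93,  "H": 4.71,  "K": 4.32}
coherence   = {"E": 4.97,  "G": 4.83,  "H": 4.22,  "K": 4.36}
diversity   = {"E": 2.11,  "G": 1.92,  "H": 3.27,  "K": 3.47}

scores = aggregate(ref_recall, pooled_f, readability, coherence, diversity)
for team, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(team, round(score, 3))
```

Because the inputs are the rounded figures shown on the slides, the computed aggregates should land close to the published ones, with small differences in the third decimal place. Note that reference recall and pooled F-score are multiplied together, so a team needs to do well on both to earn the newsworthiness quarter of the score.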
  • 15. Results – Aggregate (2/2)
      Team  Aggregate score  Rank
      A     0.694            7
      B     0.755            4
      C     0.710            5
      D     0.785            3
      E     0.892            1
      F     0.614            10
      G     0.652            9
      H     0.842            2
      I     0.662            8
      J     0.546            11
      K     0.70987          6
      We also tried several alternative aggregation scores. The top three teams were the same!
  • 16. Program
      • 15:20-15:30: Carlos Martin-Dancausa and Ayse Goker: Real-time topic detection with bursty n-grams.
      • 16:00-16:20: Gopi Chand Nuttaki, Olfa Nasraoui, Behnoush Abdollahi, Mahsa Badami, Wenlong Sun: Distributed LDA based topic modelling and topic agglomeration in a latent space.
      • 16:20-16:40: Steven van Canneyt, Matthias Feys, Steven Schockaert, Thomas Demeester, Chris Develder, Bart Dhoedt: Detecting newsworthy topics in Twitter.
      • 16:40-17:00: Georgiana Ifrim, Bichen Shi, Igor Brigadir: Event detection in Twitter using aggressive filtering and hierarchical tweet clustering.
      • 17:00-17:20: Gerard Burnside, Dimitrios Milioris, Philippe Jacquet: One day in Twitter: Topic detection via joint complexity.
      • 17:20-17:30: Georgios Petkos, Symeon Papadopoulos, Yiannis Kompatsiaris: Two-level message clustering for topic detection in Twitter.
      • 17:30-17:40: Winners’ announcement!
  • 17. Limitations – Lessons Learned
      • Did not take into account time
          – However, methods that produce a newsworthy topic earlier should be rewarded
      • Did not take into account image relevance
          – since we considered it an optional field
      • Coherence and diversity had extreme values in numerous cases
          – e.g. when a single relevant tweet was provided as representative
      • Evaluation turned out to be a very complex task!
      • Assessing only five slots (out of the 96) is definitely a compromise: (a) consider use of more evaluators/AMT, (b) consider simpler evaluation tasks
  • 18. Plan
      • Release evaluation resources
          – list of reference topics
          – list of pooled newsworthy topics
          – evaluation scores
      • Papers
          – SNOW Data Challenge paper
          – Resubmission of participants’ papers with CEUR style
          – Submission to CEUR-ws.org
      • Open-source implementations?
      • Further plans?