The 2012 Social Event Detection Dataset
Symeon Papadopoulos (1), Emmanouil Schinas (1), Vasileios Mezaris (1),
Raphaël Troncy (2), Yiannis Kompatsiaris (1)

(1) CERTH-ITI, Thessaloniki, Greece
(2) EURECOM, Sophia Antipolis, France


Oslo, 28 Feb - 1 Mar 2013
SED2012 Overview
• Large collection (>160K) of CC-licensed Flickr
  photos and some of their metadata
• Event annotations for 149 target events (of
  specific categories and locations of interest)

• Primary use: Social event detection
  – Used in the context of MediaEval 2012 (SED task)
• Secondary uses: image geotagging,
  distractors in CBIR, city summarization
Dataset Overview
Flickr photo collection
• 167,332 photos
• 4,422 unique contributors
• Creative Commons licenses

Event Annotations
• Challenge 1: Technical events in Germany
• Challenge 2: Soccer events in Hamburg and Madrid
• Challenge 3: Indignados movement events in Madrid

Data Collection Process
• Flickr API: http://www.flickr.com/services/api/
• Used method flickr.photos.search with five
  geographical centres:
   Barcelona, Cologne, Hamburg, Hannover, Madrid
• Time period: Jan 2009 – Dec 2011
• All photos CC licensed
• 403 photos from the EventMedia collection
      R. Troncy, B. Malocha, and A. Fialho. Linking Events with Media. In 6th Intern.
      Conference on Semantic Systems (I-SEMANTICS), Graz, Austria, 2010
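
The collection step above could be approximated along the following lines. This is only a sketch of how such a query might be parameterised, not the authors' actual collection code: the city-centre coordinates, the 20 km radius, the list of extras and the helper name build_search_params are all assumptions made for illustration, since the slides give only the method, the five cities, the time period and the CC-license restriction.

```python
# Hypothetical sketch: builds query parameters for the Flickr REST method
# flickr.photos.search. No network request is made here.

FLICKR_REST_ENDPOINT = "https://api.flickr.com/services/rest/"

# Approximate city-centre coordinates (assumed; the slides do not state
# the exact centres or the search radius that were used).
CITY_CENTRES = {
    "Barcelona": (41.39, 2.17),
    "Cologne": (50.94, 6.96),
    "Hamburg": (53.55, 9.99),
    "Hannover": (52.37, 9.73),
    "Madrid": (40.42, -3.70),
}

# Flickr license ids 1-6 denote the Creative Commons licenses. Which of
# them the dataset authors actually requested is an assumption.
CC_LICENSE_IDS = "1,2,3,4,5,6"


def build_search_params(city, api_key, radius_km=20, page=1):
    """Return the query parameters for one page of a geo-bounded search."""
    lat, lon = CITY_CENTRES[city]
    return {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "lat": lat,
        "lon": lon,
        "radius": radius_km,          # Flickr caps the radius at 32 km
        "radius_units": "km",
        "min_taken_date": "2009-01-01 00:00:00",  # Jan 2009
        "max_taken_date": "2011-12-31 23:59:59",  # Dec 2011
        "license": CC_LICENSE_IDS,
        "has_geo": 1,                 # geotagged photos only
        "extras": "geo,tags,date_taken,owner_name,license",
        "per_page": 250,
        "page": page,
        "format": "json",
        "nojsoncallback": 1,
    }
```

The returned dictionary can be passed as the query string of a GET request to FLICKR_REST_ENDPOINT with any HTTP client, incrementing page until the reported number of pages is reached.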

Photo Distribution
[Figures: place distribution, yearly distribution, language distribution]

Dataset Collection Motivation
Selection of five cities (three German, two Spanish):
• Include a large amount of non-English text metadata (cf.
   language distribution table)
• Ensure existence of numerous events for the target types
• Include distractor images:
   – Challenge 2: Cologne and Hannover as distractors for Hamburg; Barcelona
     as distractor for Madrid
   – Challenge 3: Barcelona as distractor for Madrid
Selection of only geotagged photos:
• Ease of annotation
Selection of only CC-licensed photos:
• Reuse of collection for research

Tag Statistics (1/2)
[Figure: number of users using each tag]
• 51,611 unique tags
• Prevalence of location-specific tags
• Event-specific tags

Tag Statistics (2/2)
[Figures: tag usage charts; labelled tags: barcelona, spain, madrid]
• >20K photos have no tags
• 83.9% of photos have 10 or fewer tags
• >57% of tags appear once or twice
• >40K tags appear fewer than 10 times

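
Figures of this kind can be recomputed from the photo metadata with a few lines of code. The sketch below assumes that each photo's tags are available as a list of strings and that a tag "appears" once per photo that carries it; the function name tag_statistics and the input layout are hypothetical and not part of the dataset distribution.

```python
from collections import Counter


def tag_statistics(photo_tags):
    """Summarise tag usage for a list of per-photo tag lists."""
    # Count each tag once per photo, even if a photo repeats it.
    tag_counts = Counter(tag for tags in photo_tags for tag in set(tags))
    n_tags = len(tag_counts)
    n_photos = len(photo_tags)
    rare_tags = sum(1 for count in tag_counts.values() if count <= 2)
    few_tag_photos = sum(1 for tags in photo_tags if len(tags) <= 10)
    return {
        "unique_tags": n_tags,
        "untagged_photos": sum(1 for tags in photo_tags if not tags),
        "share_photos_with_at_most_10_tags":
            few_tag_photos / n_photos if n_photos else 0.0,
        "share_tags_appearing_once_or_twice":
            rare_tags / n_tags if n_tags else 0.0,
    }


# Toy input, invented for illustration only.
toy_photos = [
    ["madrid", "spain"],
    ["madrid", "sol"],
    [],
    ["madrid", "spain", "15m"],
]
toy_stats = tag_statistics(toy_photos)
```
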
User Statistics
[Figure: user contribution statistics]
• 60% of users contribute fewer than 10 photos
• The 30 most active users contribute ~30% of the dataset

Ground Truth Creation
• Manual annotation using CrEve
  – web-based annotation
  – two-round annotation by five annotators (three in the
    first round, two in the second)
  – interactive annotation (search & annotate)
  – each round terminated as soon as no new event-related
    photos were discovered
  – approximate effort: 100 person-hours
   C. Zigkolis, S. Papadopoulos, G. Filippou, Y. Kompatsiaris, A. Vakali. Collaborative Event
   Annotation in Tagged Photo Collections. Multimedia Tools & Applications, 2012


• Annotations for Challenge 1 enriched by EventMedia
  (403 photos featuring technical events in Germany)
Ground Truth Statistics (1/3)
[Figure: photos per event]
• 10 events are associated with >100 photos
• ~27% of events are associated with 1 or 2 photos

Ground Truth Statistics (2/3)
[Figures: users per event, event duration]
• 106 events are captured by single users
• 9 events are captured by more than 10 people
• The majority of events last for less than a day (typical for soccer)
• Erroneous timestamps in photos

Ground Truth Statistics (3/3)
Madrid events
[Figure with labelled locations: Santiago Bernabeu stadium, Puerta del Sol,
 Stadium of Butarque, Vicente Calderon stadium]

Technical Event Examples
[Example photos]
• PHP Unconf. 2010
• Gamescom 2009
• CeBIT 2010
• Convention Camp 2011

Soccer Event Examples
[Example photos]
• Real Madrid – Milan (2010)
• World Cup 2010
• St. Pauli – HSV (2010)
• Spain – Colombia (2011)

Indignados Event Examples
[Example photos]
• Inaugural march, 15 May
• Large gathering, 20 May
• Gathering, 15 Oct
• Demonstration, 17 Nov

Evaluation
• F-measure (macro), Precision, Recall
  – measure the goodness of the retrieved photos, but not how well
    they were clustered into events
• Normalized Mutual Information (NMI)
  – compares the automatically extracted clustering of
    photos into events with the ground truth
• Evaluation script is made available together
  with the dataset.
• Implementation of event detection available:
          http://mklab.iti.gr/project/sed2012_certh
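
For readers without the official evaluation script at hand, the two kinds of measure can be sketched as follows. This is an illustrative approximation, not the script distributed with the dataset: normalising the mutual information by the arithmetic mean of the two entropies is an assumption (the slides do not state the normalisation), and the function names and input layout (sets of photo ids, and dictionaries mapping photo id to event label) are hypothetical.

```python
import math
from collections import Counter


def precision_recall_f1(relevant, retrieved):
    """Set-based scores: were the right photos retrieved at all?"""
    relevant, retrieved = set(relevant), set(retrieved)
    true_positives = len(relevant & retrieved)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def entropy(counts, n):
    """Shannon entropy (natural log) of a label distribution."""
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def nmi(truth, predicted):
    """NMI between two photo-to-event assignments over their shared photos."""
    ids = [photo for photo in truth if photo in predicted]
    n = len(ids)
    if n == 0:
        return 0.0
    truth_counts = Counter(truth[p] for p in ids)
    pred_counts = Counter(predicted[p] for p in ids)
    joint_counts = Counter((truth[p], predicted[p]) for p in ids)
    mutual_info = sum(
        (c / n) * math.log(c * n / (truth_counts[t] * pred_counts[q]))
        for (t, q), c in joint_counts.items()
    )
    h_truth = entropy(truth_counts, n)
    h_pred = entropy(pred_counts, n)
    if h_truth == 0 and h_pred == 0:
        return 1.0  # both assignments put every photo in a single event
    # Assumed normalisation: arithmetic mean of the two entropies.
    return mutual_info / ((h_truth + h_pred) / 2)
```

The first function is blind to how retrieved photos are grouped, which is why a clustering-aware measure such as NMI is reported alongside it. A macro-averaged F-measure would take the unweighted mean of such F-scores over the units being averaged; which units those are is not specified in the slides.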
Questions
 @sympapadopoulos
 www.slideshare.net/sympapadopoulos

Editor's Notes

  1. Events with 1 or 2 photos are much harder to detect, e.g. by methods based on clustering.