~ Multimodal Video Classification ~

                            ARF (Austria-Romania-France) team


       Bogdan IONESCU*1,3              Ionuț MIRONICĂ1              Klaus SEYERLEHNER2
           bionescu@imag.pub.ro          imironica@imag.pub.ro               music@cp.jku.at

          Peter KNEES2                  Jan SCHLÜTER4                  Markus SCHEDL2
            peter.knees@jku.at            jan.schlueter@ofai.at           markus.schedl@jku.at

           Horia CUCU1                     Andi BUZO1                 Patrick LAMBERT3
            horia.cucu@upb.ro              andi.buzo@upb.ro           patrick.lambert@univ-savoie.fr


    *this work was partially supported under European Structural Funds EXCEL POSDRU/89/1.5/S/62557.
    1 University POLITEHNICA of Bucharest
    4 Austrian Research Institute for Artificial Intelligence
Presentation outline


          • The approach

          • Video content description

          • Experimental results

          • Conclusions and future work




MediaEval - Pisa, Italy, 4-5 October 2012
The approach
  > challenge: find a way to assign (genre) tags to unknown videos;
  > approach: machine learning paradigm;

  [diagram: a classifier is trained on labeled data (genre labels such as web,
  food, autos, …) and then labels the unlabeled data, turning the video
  database into a tagged video database]
The approach: classification
  > the entire process relies on the concept of “similarity” computed
  between content annotations (numeric features),

  > this year's focus is on:

       objective 1: go (truly) multimodal: visual, audio, and text;

       objective 2: test a broad range of classifiers and descriptor
       combinations;
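
A minimal sketch of similarity-based classification over fused multimodal descriptors. It assumes each modality is standardized and the descriptors are simply concatenated, and it uses a 1-nearest-neighbor rule as a stand-in classifier. The function names (`zscore`, `fuse`, `nearest_neighbor_predict`) and the toy data are hypothetical and are not taken from the slides.

```python
import numpy as np

def zscore(train, test):
    """Standardize each feature dimension using training statistics only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    return (train - mean) / std, (test - mean) / std

def fuse(modalities_train, modalities_test):
    """Normalize each modality, then concatenate the descriptors (assumed fusion scheme)."""
    pairs = [zscore(tr, te) for tr, te in zip(modalities_train, modalities_test)]
    return np.hstack([p[0] for p in pairs]), np.hstack([p[1] for p in pairs])

def nearest_neighbor_predict(train_x, train_y, test_x):
    """Label each test video with the genre of its most similar training video."""
    # Pairwise Euclidean distances; a smaller distance means a higher similarity.
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    return train_y[np.argmin(dists, axis=1)]

# Toy data (made up): 4 training videos, 2 test videos, three modalities.
visual_tr = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 0.8], [0.8, 0.9]])
audio_tr = np.array([[1.0], [1.2], [5.0], [5.5]])
text_tr = np.array([[0.0, 3.0], [0.0, 2.0], [4.0, 0.0], [3.0, 0.0]])
labels = np.array(["food", "food", "autos", "autos"])

visual_te = np.array([[0.15, 0.15], [0.85, 0.85]])
audio_te = np.array([[1.1], [5.2]])
text_te = np.array([[0.0, 2.5], [3.5, 0.0]])

train_x, test_x = fuse([visual_tr, audio_tr, text_tr],
                       [visual_te, audio_te, text_te])
predicted = nearest_neighbor_predict(train_x, labels, test_x)
print(predicted)
```

Standardizing per modality before concatenation keeps a modality with large numeric ranges from dominating the distance.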


Video content description - audio
   block-level audio features
  (also capture local temporal information)
    audio is processed in blocks (e.g. 50% overlapping), which are then
    summarized (average, median, variance, ...)

  • Spectral Pattern,
  ~ soundtrack’s timbre;
  • delta Spectral Pattern,
  ~ strength of onsets;
  • variance delta Spectral Pattern,
  ~ variation of the onset strength;
  • Logarithmic Fluctuation Pattern,
  ~ rhythmic aspects;
  • Correlation Pattern,
  ~ loudness changes;
  • Spectral Contrast Pattern,
  ~ ”toneness”;
  • Local Single Gaussian model,
  ~ timbral;
  • George Tzanetakis model,
  ~ timbral;

  [Klaus Seyerlehner et al., MIREX’11, USA]
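
The block-level idea can be sketched as follows, assuming a (bands x frames) spectrogram is already available. The function `block_summaries`, the block size, and the choice to flatten each block into a vector are illustrative assumptions; the actual patterns listed above each apply their own processing to a block before summarization.

```python
import numpy as np

def block_summaries(spectrogram, block_size, hop=None):
    """Cut a (bands x frames) spectrogram into overlapping blocks and summarize them."""
    if hop is None:
        hop = block_size // 2  # 50% overlap between consecutive blocks
    n_frames = spectrogram.shape[1]
    starts = range(0, n_frames - block_size + 1, hop)
    blocks = np.stack([spectrogram[:, s:s + block_size] for s in starts])
    # blocks has shape (n_blocks, bands, block_size); flatten each block to a vector.
    vectors = blocks.reshape(len(blocks), -1)
    summary = {
        "average": vectors.mean(axis=0),
        "median": np.median(vectors, axis=0),
        "variance": vectors.var(axis=0),
    }
    return summary, len(blocks)

# Toy spectrogram (made up): 3 frequency bands, 10 frames.
spec = np.arange(30, dtype=float).reshape(3, 10)
summary, n_blocks = block_summaries(spec, block_size=4)
print(n_blocks, summary["average"].shape)
```

Because every block spans several consecutive frames, the summarized vector keeps some local temporal structure that a per-frame average would lose.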

Video content description - audio
     standard audio features
    (audio frame-based)

  • Zero-Crossing Rate,
  • Linear Predictive Coefficients,
  • Line Spectral Pairs,
  • Mel-Frequency Cepstral Coefficients,
  • spectral centroid, flux, rolloff, and kurtosis,
  + variance of each feature over a certain window.

  global feature = mean & variance of the frame features f1, f2, …, fn over time

  [B. Mathieu et al., Yaafe toolbox, ISMIR’10, Netherlands]
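
A rough sketch of the frame-based scheme, using plain NumPy rather than the Yaafe toolbox cited above. Only two of the listed features are computed (zero-crossing rate and spectral centroid); the frame length, hop size, Hann window, and function names are assumptions made for illustration.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of equal length."""
    n = 1 + (len(signal) - frame_len) // hop
    return np.stack([signal[i * hop:i * hop + frame_len] for i in range(n)])

def zero_crossing_rate(frames):
    """Fraction of consecutive sample pairs whose sign differs, per frame."""
    signs = np.sign(frames)
    signs[signs == 0] = 1
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def spectral_centroid(frames, sample_rate):
    """Magnitude-weighted mean frequency of each (Hann-windowed) frame."""
    windowed = frames * np.hanning(frames.shape[1])
    mag = np.abs(np.fft.rfft(windowed, axis=1))
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sample_rate)
    return (mag * freqs).sum(axis=1) / np.maximum(mag.sum(axis=1), 1e-12)

def global_feature(signal, sample_rate, frame_len=1024, hop=512):
    """Global descriptor: mean and variance of the per-frame features over time."""
    frames = frame_signal(signal, frame_len, hop)
    per_frame = np.column_stack([zero_crossing_rate(frames),
                                 spectral_centroid(frames, sample_rate)])
    return np.concatenate([per_frame.mean(axis=0), per_frame.var(axis=0)])

# Toy signals (made up): one second of a low tone and of a high tone.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
feat_low = global_feature(np.sin(2 * np.pi * 200 * t), sample_rate)
feat_high = global_feature(np.sin(2 * np.pi * 1900 * t), sample_rate)
print(feat_low, feat_high)
```

The resulting vector is laid out as [mean ZCR, mean centroid, variance of ZCR, variance of centroid].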

Video content description - visual
   MPEG-7 & color/texture descriptors
  (visual frame-based)

  • Local Binary Pattern,
  • Autocorrelogram,
  • Color Coherence Vector,
  • Color Layout Pattern,
  • Edge Histogram,
  • Classic color histogram,
  • Scalable Color Descriptor,
  • Color moments.

  global feature = mean & dispersion & skewness & kurtosis & median &
  root mean square of the frame features f1, f2, …, fn over time

  [OpenCV toolbox, http://opencv.willowgarage.com]
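
The temporal aggregation can be sketched as below, assuming the per-frame descriptors are stacked into a (frames x dimensions) array. Reading "dispersion" as the standard deviation is an assumption, as is the function name `temporal_statistics`.

```python
import numpy as np

def temporal_statistics(frame_features):
    """Aggregate per-frame descriptors (frames x dims) into one global vector."""
    x = np.asarray(frame_features, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)                      # "dispersion", assumed to be the std
    safe_std = np.where(std > 0, std, 1.0)   # constant dimensions give z = 0
    z = (x - mean) / safe_std
    skewness = (z ** 3).mean(axis=0)
    kurtosis = (z ** 4).mean(axis=0)
    median = np.median(x, axis=0)
    rms = np.sqrt((x ** 2).mean(axis=0))
    return np.concatenate([mean, std, skewness, kurtosis, median, rms])

# Toy input (made up): 3 frames, 2-dimensional descriptor per frame.
frames = np.array([[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
descriptor = temporal_statistics(frames)
print(descriptor)
```

The output has six statistics per descriptor dimension, so a d-dimensional frame descriptor yields a 6d-dimensional global feature.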

Video content description - visual
   feature descriptors
  (visual frame-based)

  • Histogram of oriented Gradients (HoG)
  ~ counts occurrences of gradient orientation in localized portions of an
  image (20º per bin)

  • Harris corner detector

  • Speeded Up Robust Feature (SURF)

  [figure: feature points (e.g. Harris); image source http://www.ifp.illinois.edu/~yuhuang]

  [OpenCV toolbox, http://opencv.willowgarage.com]
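
A simplified NumPy sketch of the HoG idea with 20º bins. It assumes unsigned orientations in [0º, 180º), which gives 9 bins, magnitude-weighted votes, and per-cell normalization without block normalization. The cell size and function names are illustrative and do not reproduce the OpenCV implementation.

```python
import numpy as np

def orientation_histogram(cell, bin_width_deg=20):
    """Magnitude-weighted histogram of gradient orientations in one image cell."""
    cell = np.asarray(cell, dtype=float)
    gy, gx = np.gradient(cell)               # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0   # unsigned orientation
    n_bins = int(180 // bin_width_deg)
    bins = np.minimum((angle // bin_width_deg).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(), minlength=n_bins)
    total = hist.sum()
    return hist / total if total > 0 else hist

def hog(image, cell_size=8, bin_width_deg=20):
    """Concatenate the orientation histograms of all non-overlapping cells."""
    image = np.asarray(image, dtype=float)
    n_rows = image.shape[0] // cell_size
    n_cols = image.shape[1] // cell_size
    histograms = []
    for r in range(n_rows):
        for c in range(n_cols):
            cell = image[r * cell_size:(r + 1) * cell_size,
                         c * cell_size:(c + 1) * cell_size]
            histograms.append(orientation_histogram(cell, bin_width_deg))
    return np.concatenate(histograms)

# Toy images (made up): intensity ramps with a single gradient direction.
horizontal_ramp = np.tile(np.arange(16.0), (16, 1))  # grows left to right
vertical_ramp = horizontal_ramp.T                    # grows top to bottom
descriptor_h = hog(horizontal_ramp)
descriptor_v = hog(vertical_ramp)
print(descriptor_h[:9], descriptor_v[:9])
```

For the horizontal ramp all votes fall into the first bin (0º to 20º); for the vertical ramp they fall into the bin that contains 90º.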

Video content description - text
   TF-IDF descriptors
  (Term Frequency-Inverse Document Frequency)

  > text sources: ASR and metadata,

     1. remove XML markup,

     2. remove terms below the 5th percentile of the frequency distribution,

     3. select the term corpus: for each genre class, retain the m terms
     (e.g. m = 150 for ASR and 20 for metadata) with the highest χ2 values
     that occur more frequently than in the complement classes,

     4. represent each document by its TF-IDF values.
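
Steps 3 and 4 can be sketched in plain Python as follows. The sketch assumes documents are already tokenized and cleaned, uses document-level presence counts for the χ2 contingency table, and uses a basic tf × log(N/df) weighting; the slides do not specify these details. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter

def chi_square(n11, n10, n01, n00):
    """Chi-square statistic of a 2x2 term/class contingency table."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    return 0.0 if denom == 0 else n * (n11 * n00 - n10 * n01) ** 2 / denom

def select_terms(docs, labels, m):
    """Keep, for each genre, the m terms with the highest chi-square score."""
    vocab = sorted({t for d in docs for t in d})
    selected = set()
    for genre in sorted(set(labels)):
        in_class = [set(d) for d, l in zip(docs, labels) if l == genre]
        out_class = [set(d) for d, l in zip(docs, labels) if l != genre]
        scored = []
        for term in vocab:
            n11 = sum(term in d for d in in_class)    # genre docs with the term
            n01 = len(in_class) - n11                 # genre docs without it
            n10 = sum(term in d for d in out_class)   # other docs with the term
            n00 = len(out_class) - n10                # other docs without it
            # Keep only terms relatively more frequent in the genre than outside.
            if n11 / len(in_class) > n10 / max(len(out_class), 1):
                scored.append((chi_square(n11, n10, n01, n00), term))
        scored.sort(key=lambda s: (-s[0], s[1]))
        selected.update(term for _, term in scored[:m])
    return sorted(selected)

def tfidf(docs, vocab):
    """Represent each document by TF-IDF values over the selected terms."""
    n_docs = len(docs)
    df = {t: sum(t in d for d in docs) for t in vocab}
    idf = {t: math.log(n_docs / df[t]) if df[t] else 0.0 for t in vocab}
    vectors = []
    for d in docs:
        counts = Counter(d)
        total = max(len(d), 1)
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vectors

# Toy corpus (made up): two genres, two documents each.
docs = [["car", "engine", "video"], ["car", "wheel", "video"],
        ["recipe", "oven", "video"], ["recipe", "salt", "video"]]
labels = ["autos", "autos", "food", "food"]
vocab = select_terms(docs, labels, m=1)
vectors = tfidf(docs, vocab)
print(vocab, vectors[0])
```

The term "video" appears in every document, so it is not more frequent in any genre than in its complement and is never selected.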



Experimental results: devset (5,127 seq.)
  > classifiers from Weka (Bayes, lazy, functional, trees, etc.),
  > cross-validation (train 50% – test 50%),
  [chart: avg. F-score (over all genres)]

    - visual descriptors reach an F-score of 30%±10%,
    - using more visual descriptors is not more accurate than using a few,
    - best: LBP + CCV + histogram (F-score = 41.2%).
                                            [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]
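
The evaluation protocol can be sketched as below. The sketch assumes a random 50%/50% split and reads "avg. Fscore (over all genres)" as the unweighted mean of per-genre F-scores (macro average); both are assumptions, and the function names and labels are made up. The experiments themselves were run with Weka.

```python
import numpy as np

def split_half(n_items, seed=0):
    """Random 50%/50% split of item indices into train and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    half = n_items // 2
    return order[:half], order[half:]

def average_fscore(y_true, y_pred):
    """Unweighted mean of the per-genre F-scores."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for genre in np.unique(y_true):
        tp = np.sum((y_pred == genre) & (y_true == genre))
        fp = np.sum((y_pred == genre) & (y_true != genre))
        fn = np.sum((y_pred != genre) & (y_true == genre))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

train_idx, test_idx = split_half(5127)   # devset size from the slide title
score = average_fscore(["autos", "autos", "food", "food"],
                       ["autos", "food", "food", "food"])
print(len(train_idx), len(test_idx), score)
```

In the toy call the per-genre F-scores are 2/3 for autos and 4/5 for food, so the average is 11/15.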

Experimental results: devset (5,127 seq.)
  > cross-validation (train 50% – test 50%),


  [chart: avg. F-score (over all genres)]

     - audio is still better than visual (improvement ~6%),

     - the proposed block-based features are better than the standard ones (by ~10%),

                                            [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]

Experimental results: devset (5,127 seq.)
  > cross-validation (train 50% – test 50%),


  [chart: avg. F-score (over all genres)]

     - ASR from LIMSI is more representative than ASR from LIUM (~3%),

     - best performance: ASR LIMSI + metadata (F-score = 68%).

                                            [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]

Experimental results: devset (5,127 seq.)
  > cross-validation (train 50% – test 50%),


  [chart: avg. F-score (over all genres)]

     - audio-visual is close to text (ASR) for the automatic descriptors,

     - increasing the number of modalities increases the performance.

                                            [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]

Experimental results: official runs (9,550 seq.)
  > train on devset, test on testset (SVM linear),

  [chart of the official runs; annotations: MediaEval 2011 MAP 12% (left),
  MediaEval 2011 MAP 10.3% (right)]

     Run1: LBP + CCV + hist + audio block-based
     Run2: TF-IDF on ASR LIMSI
     Run3: audio block-based + LBP + CCV + hist + TF-IDF on ASR LIMSI
     Run4: audio block-based
     Run5: TF-IDF on metadata + ASR LIMSI




Experimental results: official runs (9,550 seq.)
  > genre MAP for Run 5: TF-IDF on ASR + metadata,
                  Run 1: visual + audio
  autos                             gaming   religion   environment
  52%                                71%      71%           50%
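
For reference, a small sketch of how per-genre average precision and MAP are commonly computed from ranked result lists. It assumes every relevant video appears somewhere in the ranking. The rankings below are made up and unrelated to the reported percentages, and the function names are hypothetical.

```python
def average_precision(ranked_relevance):
    """Average precision of one ranked list; entries are 1 (relevant) or 0."""
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank   # precision at this relevant position
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    """Mean of the per-genre average precisions."""
    return sum(average_precision(r) for r in rankings.values()) / len(rankings)

# Toy rankings (made up): relevance of the videos returned for each genre.
rankings = {"autos": [1, 0, 1], "gaming": [0, 1]}
per_genre = {genre: average_precision(r) for genre, r in rankings.items()}
overall = mean_average_precision(rankings)
print(per_genre, overall)
```

In the toy example the autos list scores (1/1 + 2/3) / 2 and the gaming list scores 1/2, so the mean is 2/3.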




Conclusions and future work
  > classification adapts to the corpus – changing the corpus will
  change the performance;
  > audio-visual descriptors are inherently limited;
  > how far can we go with ad-hoc classification without human
  intervention?

  > future work:
      more elaborate late fusion?
      pursue tests on the entire data set;
      perhaps a more elaborate Bag-of-Visual-Words.

    Acknowledgement: we would like to thank Prof. Fausto Giunchiglia and
    Prof. Nicu Sebe from University of Trento for their support.

thank you!
                       any questions?




MediaEval - Pisa, Italy, 4-5 October 2012   16/16
                                                17

Mais conteúdo relacionado

Destaque

GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012MediaEval2012
 
Brave New Task: User Account Matching
Brave New Task: User Account MatchingBrave New Task: User Account Matching
Brave New Task: User Account MatchingMediaEval2012
 
Como hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharonComo hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharonSharon Jimenez
 
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...MediaEval2012
 
Ghent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing TaskGhent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing TaskMediaEval2012
 
The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012MediaEval2012
 
KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual CuesKIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual CuesMediaEval2012
 
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...MediaEval2012
 
Activities for journalistic skills
Activities for journalistic skillsActivities for journalistic skills
Activities for journalistic skillsJNavarro0321
 
How Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-TaggingHow Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-TaggingMediaEval2012
 
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect TaskNII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect TaskMediaEval2012
 
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search TaskThe TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search TaskMediaEval2012
 
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVMTUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVMMediaEval2012
 
Intro totransportphenomenanew
Intro totransportphenomenanewIntro totransportphenomenanew
Intro totransportphenomenanewilovepurin
 
2010 Marketing Plan
2010 Marketing Plan2010 Marketing Plan
2010 Marketing PlanJPemberton15
 
6dicas– veda 4
6dicas– veda 46dicas– veda 4
6dicas– veda 4souzadea1
 
14 10 21_презентация сту
14 10 21_презентация сту14 10 21_презентация сту
14 10 21_презентация стуStanislav Litvinenko
 

Destaque (20)

10 ρ. δρακουλησ
10 ρ. δρακουλησ10 ρ. δρακουλησ
10 ρ. δρακουλησ
 
GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012GTTS System for the Spoken Web Search Task at MediaEval 2012
GTTS System for the Spoken Web Search Task at MediaEval 2012
 
Brave New Task: User Account Matching
Brave New Task: User Account MatchingBrave New Task: User Account Matching
Brave New Task: User Account Matching
 
Como hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharonComo hacer una pagina web en wix sharon
Como hacer una pagina web en wix sharon
 
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
The Shanghai-Hongkong Team at MediaEval2012: Violent Scene Detection Using Tr...
 
Ghent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing TaskGhent and Cardiff University at the 2012 Placing Task
Ghent and Cardiff University at the 2012 Placing Task
 
κειμενο
κειμενοκειμενο
κειμενο
 
The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012The L2F Spoken Web Search system for Mediaeval 2012
The L2F Spoken Web Search system for Mediaeval 2012
 
KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual CuesKIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
KIT at MediaEval 2012 – Content–based Genre Classification with Visual Cues
 
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
TUB @ MediaEval 2012 Tagging Task: Feature Selection Methods for Bag-of-(visu...
 
Papiloma humano
Papiloma humanoPapiloma humano
Papiloma humano
 
Activities for journalistic skills
Activities for journalistic skillsActivities for journalistic skills
Activities for journalistic skills
 
How Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-TaggingHow Spatial Segmentation improves the Multimodal Geo-Tagging
How Spatial Segmentation improves the Multimodal Geo-Tagging
 
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect TaskNII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
NII, Japan at MediaEval 2012 Violent Scenes Detection Affect Task
 
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search TaskThe TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
The TUM Cumulative DTW Approach for the Mediaeval 2012 Spoken Web Search Task
 
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVMTUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
TUKE MediaEval 2012: Spoken Web Search using DTW and Unsupervised SVM
 
Intro totransportphenomenanew
Intro totransportphenomenanewIntro totransportphenomenanew
Intro totransportphenomenanew
 
2010 Marketing Plan
2010 Marketing Plan2010 Marketing Plan
2010 Marketing Plan
 
6dicas– veda 4
6dicas– veda 46dicas– veda 4
6dicas– veda 4
 
14 10 21_презентация сту
14 10 21_презентация сту14 10 21_презентация сту
14 10 21_презентация сту
 

Semelhante a ARF @ MediaEval 2012: Multimodal Video Classification

Lec18 bag of_features
Lec18 bag of_featuresLec18 bag of_features
Lec18 bag of_featuresBo Li
 
Lecture 21 - Image Categorization - Computer Vision Spring2015
Lecture 21 - Image Categorization -  Computer Vision Spring2015Lecture 21 - Image Categorization -  Computer Vision Spring2015
Lecture 21 - Image Categorization - Computer Vision Spring2015Jia-Bin Huang
 
Color: from craft to computation
Color: from craft to computationColor: from craft to computation
Color: from craft to computationJan Morovic
 
Overview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging TaskOverview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging TaskMediaEval2012
 
Nema e newsletter
Nema e newsletterNema e newsletter
Nema e newsletterLeigh Smead
 
Experimental Media Voodoo™
Experimental Media Voodoo™Experimental Media Voodoo™
Experimental Media Voodoo™SkyRonDotOrg
 
Vdfp audio and video fingerprinting
Vdfp   audio and video fingerprintingVdfp   audio and video fingerprinting
Vdfp audio and video fingerprintingWietskevdHeuvel
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionZachary S. Brown
 
Fairfield High School Handout
Fairfield High School HandoutFairfield High School Handout
Fairfield High School HandoutKatherineHaratsis
 
Horst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
Horst Goes Pop - Wieviel Musikempfehlung braucht der MenschHorst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
Horst Goes Pop - Wieviel Musikempfehlung braucht der MenschStephan Baumann
 
Open archive islandora-channel-training
Open archive islandora-channel-trainingOpen archive islandora-channel-training
Open archive islandora-channel-trainingscottmertz
 

Semelhante a ARF @ MediaEval 2012: Multimodal Video Classification (14)

Speech recognition (dr. m. sabarimalai manikandan)
Speech recognition (dr. m. sabarimalai manikandan)Speech recognition (dr. m. sabarimalai manikandan)
Speech recognition (dr. m. sabarimalai manikandan)
 
Lec18 bag of_features
Lec18 bag of_featuresLec18 bag of_features
Lec18 bag of_features
 
Lecture 21 - Image Categorization - Computer Vision Spring2015
Lecture 21 - Image Categorization -  Computer Vision Spring2015Lecture 21 - Image Categorization -  Computer Vision Spring2015
Lecture 21 - Image Categorization - Computer Vision Spring2015
 
Color: from craft to computation
Color: from craft to computationColor: from craft to computation
Color: from craft to computation
 
Overview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging TaskOverview of the MediaEval 2012 Tagging Task
Overview of the MediaEval 2012 Tagging Task
 
Nema e newsletter
Nema e newsletterNema e newsletter
Nema e newsletter
 
Experimental Media Voodoo™
Experimental Media Voodoo™Experimental Media Voodoo™
Experimental Media Voodoo™
 
Vdfp audio and video fingerprinting
Vdfp   audio and video fingerprintingVdfp   audio and video fingerprinting
Vdfp audio and video fingerprinting
 
VAEs for multimodal disentanglement
VAEs for multimodal disentanglementVAEs for multimodal disentanglement
VAEs for multimodal disentanglement
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
 
Fairfield High School Handout
Fairfield High School HandoutFairfield High School Handout
Fairfield High School Handout
 
Dmk audioviz
Dmk audiovizDmk audioviz
Dmk audioviz
 
Horst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
Horst Goes Pop - Wieviel Musikempfehlung braucht der MenschHorst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
Horst Goes Pop - Wieviel Musikempfehlung braucht der Mensch
 
Open archive islandora-channel-training
Open archive islandora-channel-trainingOpen archive islandora-channel-training
Open archive islandora-channel-training
 

Mais de MediaEval2012

MediaEval 2012 Opening
MediaEval 2012 OpeningMediaEval 2012 Opening
MediaEval 2012 OpeningMediaEval2012
 
A Multimodal Approach for Video Geocoding
A Multimodal Approach for   Video Geocoding A Multimodal Approach for   Video Geocoding
A Multimodal Approach for Video Geocoding MediaEval2012
 
Brave New Task: Musiclef Multimodal Music Tagging
Brave New Task: Musiclef Multimodal Music TaggingBrave New Task: Musiclef Multimodal Music Tagging
Brave New Task: Musiclef Multimodal Music TaggingMediaEval2012
 
Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012MediaEval2012
 
CUNI at MediaEval 2012: Search and Hyperlinking Task
CUNI at MediaEval 2012: Search and Hyperlinking TaskCUNI at MediaEval 2012: Search and Hyperlinking Task
CUNI at MediaEval 2012: Search and Hyperlinking TaskMediaEval2012
 
DCU Search Runs at MediaEval 2012: Search and Hyperlinking Task
DCU Search Runs at MediaEval 2012: Search and Hyperlinking TaskDCU Search Runs at MediaEval 2012: Search and Hyperlinking Task
DCU Search Runs at MediaEval 2012: Search and Hyperlinking TaskMediaEval2012
 
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...
Ghent University-IBBT at MediaEval 2012 Search and Hyperlinking: Semantic Sim...MediaEval2012
 
The CLEF Initiative From 2010 to 2012 and Onwards
The CLEF Initiative From 2010 to 2012 and OnwardsThe CLEF Initiative From 2010 to 2012 and Onwards
The CLEF Initiative From 2010 to 2012 and OnwardsMediaEval2012
 
Overview of MediaEval 2012 Visual Privacy Task
Overview of MediaEval 2012 Visual Privacy TaskOverview of MediaEval 2012 Visual Privacy Task
Overview of MediaEval 2012 Visual Privacy TaskMediaEval2012
 
MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...
MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...
MediaEval 2012 Visual Privacy Task: Privacy and Intelligibility through Pixel...MediaEval2012
 
MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...
MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...
MediaEval 2012 Visual Privacy Task: Applying Transform-domain Scrambling to A...MediaEval2012
 
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...
Violent Scenes Detection with Large, Brute-forced Acoustic and Visual Feature...MediaEval2012
 
Technicolor/INRIA/Imperial College London at the MediaEval 2012 Violent Scene...
Technicolor/INRIA/Imperial College London at the MediaEval 2012 Violent Scene...Technicolor/INRIA/Imperial College London at the MediaEval 2012 Violent Scene...
Technicolor/INRIA/Imperial College London at the MediaEval 2012 Violent Scene...MediaEval2012
 
The MediaEval 2012 Affect Task: Violent Scenes Detectio
The MediaEval 2012 Affect Task: Violent Scenes DetectioThe MediaEval 2012 Affect Task: Violent Scenes Detectio
The MediaEval 2012 Affect Task: Violent Scenes DetectioMediaEval2012
 
LIG at MediaEval 2012 affect task: use of a generic method
LIG at MediaEval 2012 affect task: use of a generic methodLIG at MediaEval 2012 affect task: use of a generic method
LIG at MediaEval 2012 affect task: use of a generic methodMediaEval2012
 
Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...
Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...
Violence Detection in Video by Large Scale Multi-Scale Local Binary Pattern D...MediaEval2012
 
UNICAMP-UFMG at MediaEval 2012: Genre Tagging Task
UNICAMP-UFMG at MediaEval 2012: Genre Tagging TaskUNICAMP-UFMG at MediaEval 2012: Genre Tagging Task
UNICAMP-UFMG at MediaEval 2012: Genre Tagging TaskMediaEval2012
 
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012Telefonica Research System for the Spoken Web Search task at Mediaeval 2012
Telefonica Research System for the Spoken Web Search task at Mediaeval 2012MediaEval2012
 

Mais de MediaEval2012 (20)

MediaEval 2012 Opening
MediaEval 2012 OpeningMediaEval 2012 Opening
MediaEval 2012 Opening
 
Closing
ClosingClosing
Closing
 
A Multimodal Approach for Video Geocoding
A Multimodal Approach for   Video Geocoding A Multimodal Approach for   Video Geocoding
A Multimodal Approach for Video Geocoding
 
Brave New Task: Musiclef Multimodal Music Tagging
Brave New Task: Musiclef Multimodal Music TaggingBrave New Task: Musiclef Multimodal Music Tagging
Brave New Task: Musiclef Multimodal Music Tagging
 
Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012Search and Hyperlinking Task at MediaEval 2012
Search and Hyperlinking Task at MediaEval 2012
 
CUNI at MediaEval 2012: Search and Hyperlinking Task
CUNI at MediaEval 2012: Search and Hyperlinking TaskCUNI at MediaEval 2012: Search and Hyperlinking Task
CUNI at MediaEval 2012: Search and Hyperlinking Task
 
DCU Search Runs at MediaEval 2012: Search and Hyperlinking Task
DCU Search Runs at MediaEval 2012: Search and Hyperlinking TaskDCU Search Runs at MediaEval 2012: Search and Hyperlinking Task
ARF @ MediaEval 2012: Multimodal Video Classification

  • 1. ~ Multimodal Video Classification ~
      ARF (Austria-Romania-France) team
      Bogdan IONESCU*1,3 (bionescu@imag.pub.ro), Ionuț MIRONICĂ1 (imironica@imag.pub.ro), Klaus SEYERLEHNER2 (music@cp.jku.at),
      Peter KNEES2 (peter.knees@jku.at), Jan SCHLÜTER4 (jan.schlueter@ofai.at), Markus SCHEDL2 (markus.schedl@jku.at),
      Horia CUCU1 (horia.cucu@upb.ro), Andi BUZO1 (andi.buzo@upb.ro), Patrick LAMBERT3 (patrick.lambert@univ-savoie.fr)
      *this work was partially supported under European Structural Funds EXCEL POSDRU/89/1.5/S/62557.
      1 University POLITEHNICA of Bucharest; 4 Austrian Research Institute for Artificial Intelligence
  • 2. Presentation outline
      • The approach
      • Video content description
      • Experimental results
      • Conclusions and future work
      MediaEval - Pisa, Italy, 4-5 October 2012
  • 3. The approach
      > challenge: find a way to assign (genre) tags to unknown videos;
      > approach: machine learning paradigm.
      [figure: labeled data (genres such as web, food, autos, …) is used to train a classifier; the classifier turns the unlabeled video database into a tagged video database]
  • 4. The approach: classification
      > the entire process relies on the concept of “similarity” computed between content annotations (numeric features);
      > this year's focus is on:
          objective 1: go (truly) multimodal: visual, audio, text;
          objective 2: test a broad range of classifiers and descriptor combinations.
  • 5. Video content description - audio
      block-level audio features (capture also local temporal information) [Klaus Seyerlehner et al., MIREX’11, USA]
      • Spectral Pattern ~ soundtrack’s timbre;
      • delta Spectral Pattern ~ strength of onsets;
      • variance delta Spectral Pattern ~ variation of the onset strength;
      • Logarithmic Fluctuation Pattern ~ rhythmic aspects;
      • Correlation Pattern ~ loudness changes;
      • Spectral Contrast Pattern ~ ”toneness”;
      • Local Single Gaussian model ~ timbral;
      • George Tzanetakis model ~ timbral.
      [figure labels: e.g. 50% overlapping; average, median, variance, ...]
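The block-level idea on this slide can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the helper names `overlapping_blocks` and `summarize_blocks`, the toy frame values, and the choice of the median as summary are assumptions; the slide only mentions 50% overlap and summaries such as average, median and variance.

```python
from statistics import median

def overlapping_blocks(frames, block_size, hop):
    """Yield blocks of `block_size` consecutive frames, advancing by `hop` frames."""
    for start in range(0, len(frames) - block_size + 1, hop):
        yield frames[start:start + block_size]

def summarize_blocks(blocks, summary=median):
    """Flatten each block and summarize element-wise across all blocks."""
    flat = [[value for frame in block for value in frame] for block in blocks]
    return [summary(column) for column in zip(*flat)]

# Toy "spectrogram": 6 frames with 2 frequency bands each (made-up numbers).
frames = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [6, 60]]

# hop = block_size / 2 gives the 50% overlap mentioned on the slide.
blocks = list(overlapping_blocks(frames, block_size=4, hop=2))
song_vector = summarize_blocks(blocks)
```

Swapping `summary` for a mean or variance function gives the other summaries named in the figure labels.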
  • 6. Video content description - audio
      standard audio features (audio frame-based) [B. Mathieu et al., Yaafe toolbox, ISMIR’10, Netherlands]
      • Zero-Crossing Rate,
      • Linear Predictive Coefficients,
      • Line Spectral Pairs,
      • Mel-Frequency Cepstral Coefficients,
      • spectral centroid, flux, rolloff, and kurtosis,
      + mean & variance of each feature over a certain window.
      [figure: frame features f1, f2, …, fn over time; global feature = mean + variance (var{f2}, …, var{fn})]
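As a rough illustration of the frame-based scheme above, the sketch below computes one standard feature (Zero-Crossing Rate) per frame and then builds a global descriptor from the mean and variance over frames. It does not use the Yaafe toolbox; the function names, the toy signal and the use of the population variance are assumptions made for the example.

```python
from statistics import mean, pvariance

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def global_descriptor(frame_features):
    """Concatenate the per-feature means and variances taken over all frames."""
    columns = list(zip(*frame_features))
    return [mean(c) for c in columns] + [pvariance(c) for c in columns]

# Three toy audio frames of five samples each (made-up values).
frames = [
    [1.0, -1.0, 1.0, -1.0, 1.0],
    [1.0, 2.0, 3.0, 2.0, 1.0],
    [1.0, 1.0, -1.0, -1.0, 1.0],
]
per_frame = [[zero_crossing_rate(f)] for f in frames]  # one feature per frame
descriptor = global_descriptor(per_frame)              # [mean ZCR, variance of ZCR]
```

With more features per frame, the descriptor simply grows to all means followed by all variances.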
  • 7. Video content description - visual
      MPEG-7 & color/texture descriptors (visual frame-based) [OpenCV toolbox, http://opencv.willowgarage.com]
      • Local Binary Pattern,
      • Autocorrelogram,
      • Color Coherence Vector,
      • Color Layout Pattern,
      • Edge Histogram,
      • Classic color histogram,
      • Scalable Color Descriptor,
      • Color moments.
      [figure: frame features f1, f2, …, fn over time; global feature = mean & dispersion & skewness & kurtosis & median & root mean square]
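The six temporal statistics named in the figure could be computed along these lines. This is only a sketch: "dispersion" is assumed here to mean the population standard deviation, skewness and kurtosis are the plain (non-excess, uncorrected) standardized moments, and the names `temporal_statistics` and `global_descriptor` are hypothetical.

```python
import math
from statistics import mean, median, pstdev

def temporal_statistics(values):
    """Return [mean, dispersion, skewness, kurtosis, median, RMS] of one feature over time."""
    n = len(values)
    mu = mean(values)
    sigma = pstdev(values)  # assumption: dispersion = population standard deviation
    if sigma == 0:
        skew = kurt = 0.0   # constant feature: higher moments set to zero by convention
    else:
        skew = sum(((v - mu) / sigma) ** 3 for v in values) / n
        kurt = sum(((v - mu) / sigma) ** 4 for v in values) / n
    rms = math.sqrt(sum(v * v for v in values) / n)
    return [mu, sigma, skew, kurt, median(values), rms]

def global_descriptor(frame_features):
    """Apply the six statistics to every feature dimension and concatenate the results."""
    out = []
    for column in zip(*frame_features):
        out.extend(temporal_statistics(column))
    return out

# Five toy frames with two feature values each (made-up numbers).
frames = [[1, 2], [2, 2], [3, 2], [4, 2], [5, 2]]
descriptor = global_descriptor(frames)  # 2 features x 6 statistics = 12 values
```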
  • 8. Video content description - visual
      feature descriptors (visual frame-based) [OpenCV toolbox, http://opencv.willowgarage.com]
      • Histogram of oriented Gradients (HoG) ~ counts occurrences of gradient orientation in localized portions of an image (20º per bin);
      • Harris corner detector;
      • Speeded Up Robust Feature (SURF).
      [figure: feature points (e.g. Harris); image source http://www.ifp.illinois.edu/~yuhuang]
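To make the HoG bullet concrete, here is a small sketch of an orientation histogram for one image region with 20º bins. It is not the OpenCV implementation: it assumes unsigned orientations (0º to 180º, hence 9 bins), central-difference gradients, and magnitude-weighted votes rather than plain counts; `orientation_histogram` is a hypothetical name.

```python
import math

def orientation_histogram(image, bin_width_deg=20):
    """Magnitude-weighted histogram of unsigned gradient orientations for one region."""
    n_bins = 180 // bin_width_deg
    hist = [0.0] * n_bins
    rows, cols = len(image), len(image[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = image[y][x + 1] - image[y][x - 1]  # horizontal central difference
            gy = image[y + 1][x] - image[y - 1][x]  # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue  # flat pixel: no orientation to vote for
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width_deg) % n_bins] += magnitude
    return hist

# Toy 4x4 regions: intensity grows left-to-right, then top-to-bottom.
horizontal_ramp = [[x for x in range(4)] for _ in range(4)]
vertical_ramp = [[y] * 4 for y in range(4)]
h_hist = orientation_histogram(horizontal_ramp)  # all votes fall in the 0-20 degree bin
v_hist = orientation_histogram(vertical_ramp)    # all votes fall in the 80-100 degree bin
```

A full descriptor would compute such a histogram per cell and concatenate (and usually normalize) them.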
  • 9. Video content description - text
      TF-IDF descriptors (Term Frequency-Inverse Document Frequency)
      > text sources: ASR and metadata;
      1. remove XML markups;
      2. remove terms <5%-percentile of the frequency distribution;
      3. select term corpus: retaining for each genre class m terms (e.g. m = 150 for ASR and 20 for metadata) with the highest χ2 values that occur more frequently than in complement classes;
      4. for each document we represent the TF-IDF values.
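Steps 3 and 4 can be sketched as below. This is an illustrative reading, not the authors' code: the χ2 statistic is the standard 2x2 term/class contingency form computed on document counts, the TF-IDF weighting (relative term frequency times log(N/df)) is an assumed variant, the percentile filter of step 2 is omitted, and all names and the toy documents are hypothetical.

```python
import math
from collections import Counter

def chi_square(n11, n10, n01, n00):
    """Chi-square of a 2x2 table: term present/absent vs. document in/out of the genre."""
    n = n11 + n10 + n01 + n00
    denom = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    return 0.0 if denom == 0 else n * (n11 * n00 - n10 * n01) ** 2 / denom

def select_terms(docs, labels, genre, m):
    """Keep up to m terms with the highest chi-square that are more frequent inside the genre."""
    in_class = [set(d) for d, l in zip(docs, labels) if l == genre]
    out_class = [set(d) for d, l in zip(docs, labels) if l != genre]
    scored = []
    for term in set().union(*in_class, *out_class):
        n11 = sum(term in d for d in in_class)
        n10 = sum(term in d for d in out_class)
        # Compare document frequencies in the genre and in the complement classes.
        if n11 * len(out_class) > n10 * len(in_class):
            score = chi_square(n11, n10, len(in_class) - n11, len(out_class) - n10)
            scored.append((score, term))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:m]]

def tf_idf(docs, corpus_terms):
    """One TF-IDF vector per document, restricted to the selected term corpus."""
    df = Counter(term for d in docs for term in set(d))
    return [
        [Counter(d)[t] / max(len(d), 1) * math.log(len(docs) / df[t]) if df[t] else 0.0
         for t in corpus_terms]
        for d in docs
    ]

# Toy tokenized documents with genre labels (made-up data).
docs = [["engine", "car", "road"], ["car", "race", "engine"],
        ["recipe", "oven", "road"], ["oven", "recipe", "sugar"]]
labels = ["autos", "autos", "food", "food"]
selected = select_terms(docs, labels, "autos", m=2)
vectors = tf_idf(docs, selected)
```

In the full pipeline the per-genre term lists would be merged into one corpus before computing the document vectors.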
  • 10. Experimental results: devset (5,127 seq.)
      > classifiers from Weka (Bayes, lazy, functional, trees, etc.) [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/];
      > cross-validation (train 50% – test 50%), avg. Fscore (over all genres):
      - visual descriptors' capabilities: 30% ± 10%;
      - using more visual descriptors is not more accurate than using a few;
      - best: LBP + CCV + histogram (Fscore = 41.2%).
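The evaluation measure used on these slides, the F-score averaged over all genres, might be computed as in the sketch below. It assumes an unweighted (macro) average over the genres present in the ground truth, which the slides do not state explicitly; the labels are made up and the function names are hypothetical.

```python
def f_score(y_true, y_pred, genre):
    """F1 for one genre from true positives, false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == genre and p == genre for t, p in pairs)
    fp = sum(t != genre and p == genre for t, p in pairs)
    fn = sum(t == genre and p != genre for t, p in pairs)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)

def average_f_score(y_true, y_pred):
    """Unweighted mean of the per-genre F-scores (assumed averaging scheme)."""
    genres = sorted(set(y_true))
    return sum(f_score(y_true, y_pred, g) for g in genres) / len(genres)

# Toy ground truth and predictions for the held-out half of the data.
y_true = ["autos", "autos", "food", "food", "web", "web"]
y_pred = ["autos", "food", "food", "food", "web", "autos"]
score = average_f_score(y_true, y_pred)
```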
  • 11. Experimental results: devset (5,127 seq.)
      > cross-validation (train 50% – test 50%), avg. Fscore (over all genres) [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]:
      - audio is still better than visual (improvement of ~6%);
      - the proposed block-based audio features are better than the standard ones (by ~10%).
  • 12. Experimental results: devset (5,127 seq.)
      > cross-validation (train 50% – test 50%), avg. Fscore (over all genres) [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]:
      - ASR from LIMSI is more representative than ASR from LIUM (~3%);
      - best performance: ASR LIMSI + metadata (Fscore = 68%).
  • 13. Experimental results: devset (5,127 seq.)
      > cross-validation (train 50% – test 50%), avg. Fscore (over all genres) [Weka toolbox, http://www.cs.waikato.ac.nz/ml/weka/]:
      - among the automatic descriptors, audio-visual is close to text (ASR);
      - increasing the number of modalities increases the performance.
  • 14. Experimental results: official runs (9,550 seq.)
      > train on devset, test on testset (SVM linear);
      > chart annotations: MediaEval 2011 MAP 12%; MediaEval 2011 MAP 10.3%;
      > runs: Run1, Run2, Run3, Run4, Run5;
      [figure labels, column alignment lost: LBP+CCV+ TF-IDF on audio block-based + audio TF-IDF on hist + audio ASR LIMSI LBP + CCV + hist + block-based metadata + metadata block-based TF-IDF on ASR ASR LIMSI LIMSI]
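The official runs are reported in MAP (mean average precision). A minimal sketch of that measure follows, assuming each genre has a ranked list of video ids and a set of relevant ids, and that average precision is normalized by the number of relevant videos. The ids, genre names and function names are made up, and the official evaluation tool may differ in details.

```python
def average_precision(ranked_ids, relevant_ids):
    """Mean of precision@k taken at each rank k where a relevant video appears."""
    hits, precision_sum = 0, 0.0
    for rank, video_id in enumerate(ranked_ids, start=1):
        if video_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids) if relevant_ids else 0.0

def mean_average_precision(rankings, ground_truth):
    """Average the per-genre average precision over all genres in the ground truth."""
    scores = [average_precision(rankings[g], ground_truth[g]) for g in ground_truth]
    return sum(scores) / len(scores)

# Toy rankings per genre (best match first) and the truly relevant videos.
rankings = {"autos": ["v1", "v3", "v2", "v4"], "food": ["v4", "v1", "v2", "v3"]}
ground_truth = {"autos": {"v1", "v2"}, "food": {"v4"}}
map_score = mean_average_precision(rankings, ground_truth)
```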
  • 15. Experimental results: official runs (9,550 seq.)
      > genre MAP for Run 5 (TF-IDF on ASR + metadata) and Run 1 (visual + audio);
      [figure labels: genres autos, gaming, religion, environment; values 52%, 71%, 71%, 50%]
  • 16. Conclusions and future work
      > classification adapts to the corpus: changing the corpus will change the performance;
      > audio-visual descriptors are inherently limited;
      > how far can we go with ad-hoc classification without human intervention?
      > future work:
          - more elaborate late fusion?
          - pursue tests on the entire data set;
          - perhaps a more elaborate Bag-of-Visual-Words.
      Acknowledgement: we would like to thank Prof. Fausto Giunchiglia and Prof. Nicu Sebe from University of Trento for their support.
  • 17. Thank you! Any questions?