SlideShare uma empresa Scribd logo
1 de 22
Affective Multimodal Analysis
for the Media Industry
O. Ben-Ahmed and B. Huet
EURECOM
Sophia Antipolis, France
Motivation and Context
• Media Industry and Multimedia Retrieval
– Indexing and retrieval
– Annotation
– Summarization and Trailer creation
– Enrichments and hyperlinking
• NexGenTV (ANRT National):
– Multimodal analysis for enriching broadcast content
via second screen applications
• ANTRACT (ANR National):
– Mine video archive to provide information for historians
• MeMAD (H2020 EU):
– Provide new methods to translate Image and Sounds into
Words (for indexing, retrieving, repurposing and accessibility)
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 2
Predicting Media Interestingness (PMI)
– Automatically analyze media data
– Identify the most attractive content
• Data Driven /
Content based approaches
– gap between low-level features
and high-level human perception
• Our proposal
– Address PMI in association with
Media Genre
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 3
http://www.dailyherald.com/article/20110627/entlife/706279989/
Why Genre Inference for PMI
• Motivation
– Interestingness is highly correlated with data emotional content
– Affective representation of data content
• Hypothesis
– Emotional impact of movie genre can be a factor for interestingness of a video fragment
• Constraints
– Subjectivity of the task – Data collection issues
– Limited Dataset
• Method
– Mid-level representation based on media genre prediction for PMI
• Represent each video fragment/image as a distribution of genres
– Transfer Learning
• Using genre probability distribution to infer Media Fragment Interestingness
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 4
Our Framework
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 5
Media Fragment Genre
Media Fragment
Interestingness
Media Genre Prediction
• Visual Branch
– Deep CNN for features extraction
– LSTM for modelling temporal cue
• Audio Branch
– Deep features extraction : Soundnet
– SVM classifier
ResNet
Architecture
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 6
Media Genre Dataset
• Movie trailers dataset
– K S Sivaraman and G. Somappa. MovieScope
• 525 YouTube Videos over 4 Genres (available on IMDB)
– Extended with a 5th Genre
• Sci-Fi
Trailer
Genre
Training Test Total
Action 69 44 114
Drama 95 39 134
Horror 99 59 158
Romance 80 39 119
Sci-fi 72 35 107
Total 415 216 632
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 7
Visual Audio Audio-Visual
Action
Drama
Horror
Romance
Sci-fi
Visual Audio Audio-Visual
Action 2.5% 33,34% 17.92%
Drama 0% 17,78% 8.89%
Horror 0.37% 2.61% 1.49%
Romance 0% 3.87% 1.93%
Sci-fi 97.12% 42,40% 69,76%
Media Genre Prediction Example
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 8
Features/Metrics Precision
Sivaraman and Somappa. (4 genres) 0,85
One Frame-VGG + Soundnet 0,85
ResNet-LSTM + Soundnet 0,87
Interestingness Classification
• Features vectors
– Probability vector for the genre distribution for video fragments
• Classifier
– Binary SVM,
– Including assessors confidence scores
• Modality Analysis
– Audio genre vector
– Visual genre vector
– Audio-Visual genre vector
• Concatenated visual- and audio-based genre vectors probabilities
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 9
Media Interestingness Dataset
• Creative Commons licensed “Hollywood” Video
– 103 movie trailers and 4 continuous extracts
– Shot segmentation
– 7396 video segments for training and 2435 video segments
for testing
– Low level and mid level features
– Annotation (GT) performed by human assessors
• Interesting (1) or Not Interesting (2)
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET 10
Example 1: The longest ride
GENRE Action Drama Horror Romance Sci-fi
COVERAGE 9.02% 29.86% 6.25% 43.75% 11.11%
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 11
Example 1: The longest ride
GENRE Action Drama Horror Romance Sci-fi
COVERAGE 9.02% 29.86% 6.25% 43.75% 11.11%
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 12
Example 1: The longest ride
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 13
Predicted Interestingness= 1
Romance = 97,45%
Example 1: The longest ride
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 14
Predicted Interestingness= 0
Romance = 48,49%
Example 2: Tears of Steel
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 15
GENRE Action Drama Horror Romance Sci-fi
COVERAGE 18.37% 16.32% 6.12% 6.12% 53.06%
Example 2: Tears of Steel
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 16
GENRE Action Drama Horror Romance Sci-fi
COVERAGE 18.37% 16.32% 6.12% 6.12% 53.06%
Example 2: Tears of Steel
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 17
Predicted Interestingness= 1
Sci-fi =95,38%
Example 2: Tears of Steel
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 18
Predicted Interestingness= 0
Sci-fi= 53,67%
Experiments and Results
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 19
Conclusions and Future Work
• We proposed to train a Genre Recognition System as a mid-level representation
for Predicting Media Interestingness
• Deep Audio and Visual Features for Genre Recognition
• LSTM used to predict Genre over Video Shot duration
• Transfer Learning for Predicting Media Interestingness
• Best Results:
– 0.21 MAP (on test set) [Previous State of the Art 0.20 MAP]
• Audio brings limited additional information for PMI
• End-to-End joint learning (fine-tuning) of audio-visual features
• Extent mid-level representation with other features (Emotion, Valence/Arousal, etc.)
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 20
Recent Related Publications
• Ben-Ahmed, O., J. Wacker, A. Gaballo, B. Huet, EURECOM @MediaEval 2017: Media genre inference for
predicting media interestingness, MEDIAEVAL 2017, MediaEval Benchmarking Initiative for Multimedia
Evaluation, 13-15 September 2017, Dublin, Ireland
• Pini S., O. Ben-Ahmed, M. Cornia, L. Baraldi, R. Cucchiara, Rita; B. Huet, Modeling multimodal cues in a
deep learning-based framework for emotion recognition in the wild, ICMI 2017, 19th ACM International
Conference on Multimodal Interaction, November 13-17th, 2017, Glasgow, United Kingdom
• Smith, J. R., D. Joshi, B. Huet, W. Hsu, J. Cota, Harnessing A.I. for augmenting creativity: Application to
movie trailer creation, ACMMM 2017, 25th ACM Multimedia Conference, October 23-27, 2017, Mountain
View, CA, USA
• Tiwari S. N., N. Q. K. Duong, F. Lefebvre, C.-H. Demarty, B. Huet, L. Chevallier, Deep features for
multimodal emotion classification, on HAL
• Paleari, M., R. Chellali, B. Huet, Bimodal emotion recognition, ICSR 2010, International Conference on
Social Robotics, November 23-24, 2010, Singapore / Also published as LNCS Volume 6414/2010
• Mérialdo B. and B. Huet, Automatic video summarization, Book chapter in "Interactive Video, Algorithms
and Technologies" by Hammoud, Riad (Ed.), 2006, XVI, 250 p, ISBN: 3-540-33214-6
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET 21
Questions?
6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 22
Benoit Huet.
Thank you,

Mais conteúdo relacionado

Semelhante a Affective Multimodal Analysis for the Media Industry

SMPTE Mot. Imag. J-2015-Ricca-8-12
SMPTE Mot. Imag. J-2015-Ricca-8-12SMPTE Mot. Imag. J-2015-Ricca-8-12
SMPTE Mot. Imag. J-2015-Ricca-8-12
Aim Ricca
 
MA Thesis 2009 - Introduction & Conclusion - Lucy Jones
MA Thesis 2009 - Introduction & Conclusion - Lucy JonesMA Thesis 2009 - Introduction & Conclusion - Lucy Jones
MA Thesis 2009 - Introduction & Conclusion - Lucy Jones
Lucy Jones
 
Semantic Multimedia Remixing - MediaEval 2013 Search and Hyperlinking Task
Semantic Multimedia Remixing - MediaEval 2013 Search and Hyperlinking TaskSemantic Multimedia Remixing - MediaEval 2013 Search and Hyperlinking Task
Semantic Multimedia Remixing - MediaEval 2013 Search and Hyperlinking Task
MediaMixerCommunity
 
Statement of Interest (This is used to apply for Graduate Schoo.docx
Statement of Interest (This is used to apply for Graduate Schoo.docxStatement of Interest (This is used to apply for Graduate Schoo.docx
Statement of Interest (This is used to apply for Graduate Schoo.docx
rafaelaj1
 

Semelhante a Affective Multimodal Analysis for the Media Industry (20)

SMPTE Mot. Imag. J-2015-Ricca-8-12
SMPTE Mot. Imag. J-2015-Ricca-8-12SMPTE Mot. Imag. J-2015-Ricca-8-12
SMPTE Mot. Imag. J-2015-Ricca-8-12
 
MA Thesis 2009 - Introduction & Conclusion - Lucy Jones
MA Thesis 2009 - Introduction & Conclusion - Lucy JonesMA Thesis 2009 - Introduction & Conclusion - Lucy Jones
MA Thesis 2009 - Introduction & Conclusion - Lucy Jones
 
Business Analytics and Optimization Introduction
Business Analytics and Optimization IntroductionBusiness Analytics and Optimization Introduction
Business Analytics and Optimization Introduction
 
BDS LIFE MAGAZINE
BDS LIFE MAGAZINEBDS LIFE MAGAZINE
BDS LIFE MAGAZINE
 
Streaming videos are replacing traditional mode of entertainment
Streaming videos are replacing traditional mode of entertainmentStreaming videos are replacing traditional mode of entertainment
Streaming videos are replacing traditional mode of entertainment
 
Technology - proliferation of hardware and content
Technology - proliferation of hardware and contentTechnology - proliferation of hardware and content
Technology - proliferation of hardware and content
 
SC6 Workshop 1: What can big data do for you?
SC6 Workshop 1: What can big data do for you? SC6 Workshop 1: What can big data do for you?
SC6 Workshop 1: What can big data do for you?
 
Policy-driven Dynamic HTTP Adaptive Streaming Player Environment
Policy-driven Dynamic HTTP Adaptive Streaming Player EnvironmentPolicy-driven Dynamic HTTP Adaptive Streaming Player Environment
Policy-driven Dynamic HTTP Adaptive Streaming Player Environment
 
Top Conferences of Computer Science & Electronics, August 2017
Top Conferences of Computer Science & Electronics, August 2017Top Conferences of Computer Science & Electronics, August 2017
Top Conferences of Computer Science & Electronics, August 2017
 
Recommendation as Search: Reflections on Symmetry
Recommendation as Search: Reflections on SymmetryRecommendation as Search: Reflections on Symmetry
Recommendation as Search: Reflections on Symmetry
 
When textual and visual information join forces for multimedia retrieval
When textual and visual information join forces for multimedia retrievalWhen textual and visual information join forces for multimedia retrieval
When textual and visual information join forces for multimedia retrieval
 
Thinking the archives of 2020: Opportunitiws, priorities, Issues
Thinking the archives of 2020: Opportunitiws, priorities, IssuesThinking the archives of 2020: Opportunitiws, priorities, Issues
Thinking the archives of 2020: Opportunitiws, priorities, Issues
 
Python IEEE 2019 Projects List
Python IEEE 2019 Projects List Python IEEE 2019 Projects List
Python IEEE 2019 Projects List
 
Semantic Multimedia Remixing - MediaEval 2013 Search and Hyperlinking Task
Semantic Multimedia Remixing - MediaEval 2013 Search and Hyperlinking TaskSemantic Multimedia Remixing - MediaEval 2013 Search and Hyperlinking Task
Semantic Multimedia Remixing - MediaEval 2013 Search and Hyperlinking Task
 
IRJET- Movie Success Prediction using Popularity Factor from Social Media
IRJET- Movie Success Prediction using Popularity Factor from Social MediaIRJET- Movie Success Prediction using Popularity Factor from Social Media
IRJET- Movie Success Prediction using Popularity Factor from Social Media
 
IRJET- Review on Anti-Piracy Screening System
IRJET-  	  Review on Anti-Piracy Screening SystemIRJET-  	  Review on Anti-Piracy Screening System
IRJET- Review on Anti-Piracy Screening System
 
MHV'22 - Super-resolution Based Bitrate Adaptation for HTTP Adaptive Streamin...
MHV'22 - Super-resolution Based Bitrate Adaptation for HTTP Adaptive Streamin...MHV'22 - Super-resolution Based Bitrate Adaptation for HTTP Adaptive Streamin...
MHV'22 - Super-resolution Based Bitrate Adaptation for HTTP Adaptive Streamin...
 
Kanopy PDA Video Pilot
Kanopy PDA Video PilotKanopy PDA Video Pilot
Kanopy PDA Video Pilot
 
AIVS - AI, Industrial Data Space, and Innovation Transformation
AIVS - AI, Industrial Data Space, and Innovation TransformationAIVS - AI, Industrial Data Space, and Innovation Transformation
AIVS - AI, Industrial Data Space, and Innovation Transformation
 
Statement of Interest (This is used to apply for Graduate Schoo.docx
Statement of Interest (This is used to apply for Graduate Schoo.docxStatement of Interest (This is used to apply for Graduate Schoo.docx
Statement of Interest (This is used to apply for Graduate Schoo.docx
 

Mais de Benoit HUET

Multimedia Content Understanding: Bringing Context to Content
Multimedia Content Understanding: Bringing Context to ContentMultimedia Content Understanding: Bringing Context to Content
Multimedia Content Understanding: Bringing Context to Content
Benoit HUET
 

Mais de Benoit HUET (8)

NexGenTV: Providing Real-Time Insight during Political Debates in a Second Sc...
NexGenTV: Providing Real-Time Insight during Political Debates in a Second Sc...NexGenTV: Providing Real-Time Insight during Political Debates in a Second Sc...
NexGenTV: Providing Real-Time Insight during Political Debates in a Second Sc...
 
Convenient Discovery of Archived Video Using Audiovisual Hyperlinking
Convenient Discovery of Archived Video Using Audiovisual HyperlinkingConvenient Discovery of Archived Video Using Audiovisual Hyperlinking
Convenient Discovery of Archived Video Using Audiovisual Hyperlinking
 
Hyper Video Browser Search and Hyperlinking in Broadcast Media
Hyper Video Browser Search and Hyperlinking in Broadcast MediaHyper Video Browser Search and Hyperlinking in Broadcast Media
Hyper Video Browser Search and Hyperlinking in Broadcast Media
 
Multimedia Content Understanding: Bringing Context to Content
Multimedia Content Understanding: Bringing Context to ContentMultimedia Content Understanding: Bringing Context to Content
Multimedia Content Understanding: Bringing Context to Content
 
Mining the Web for Multimedia-based Enriching - Multimedia Hyperlinking and ...
Mining the Web for Multimedia-based Enriching - Multimedia Hyperlinking and ...Mining the Web for Multimedia-based Enriching - Multimedia Hyperlinking and ...
Mining the Web for Multimedia-based Enriching - Multimedia Hyperlinking and ...
 
LinkedTV @ MediaEval 2013 Search and Hyperlinking Task
LinkedTV @ MediaEval 2013 Search and Hyperlinking TaskLinkedTV @ MediaEval 2013 Search and Hyperlinking Task
LinkedTV @ MediaEval 2013 Search and Hyperlinking Task
 
Multimedia Data Collection using Social Media Analysis
Multimedia Data Collection using Social Media Analysis Multimedia Data Collection using Social Media Analysis
Multimedia Data Collection using Social Media Analysis
 
Wsm2011
Wsm2011Wsm2011
Wsm2011
 

Último

Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
ssuser79fe74
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Sérgio Sacani
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 

Último (20)

❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
Unit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 oUnit5-Cloud.pptx for lpu course cse121 o
Unit5-Cloud.pptx for lpu course cse121 o
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
IDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicineIDENTIFICATION OF THE LIVING- forensic medicine
IDENTIFICATION OF THE LIVING- forensic medicine
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)COMPUTING ANTI-DERIVATIVES(Integration by SUBSTITUTION)
COMPUTING ANTI-DERIVATIVES (Integration by SUBSTITUTION)
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 

Affective Multimodal Analysis for the Media Industry

  • 1. Affective Multimodal Analysis for the Media Industry O. Ben-Ahmed and B. Huet EURECOM Sophia Antipolis, France
  • 2. Motivation and Context • Media Industry and Multimedia Retrieval – Indexing and retrieval – Annotation – Summarization and Trailer creation – Enrichments and hyperlinking • NexGenTV (ANRT National): – Multimodal analysis for enriching broadcast content via second screen applications • ANTRACT (ANR National): – Mine video archive to provide information for historians • MeMAD (H2020 EU): – Provide new methods to translate Image and Sounds into Words (for indexing, retrieving, repurposing and accessibility) 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 2
  • 3. Predicting Media Interestingness (PMI) – Automatically analyze media data – Identify the most attractive content • Data Driven / Content based approaches – gap between low-level features and high-level human perception • Our proposal – Address PMI in association with Media Genre 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 3 http://www.dailyherald.com/article/20110627/entlife/706279989/
  • 4. Why Genre Inference for PMI • Motivation – Interestingness is highly correlated with data emotional content – Affective representation of data content • Hypothesis – Emotional impact of movie genre can be a factor for interestingness of a video fragment • Constraints – Subjectivity of the task – Data collection issues – Limited Dataset • Method – Mid-level representation based on media genre prediction for PMI • Represent each video fragment/image as a distribution of genres – Transfer Learning • Using genre probability distribution to infer Media Fragment Interestingness 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 4
  • 5. Our Framework 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 5 Media Fragment Genre Media Fragment Interestingness
  • 6. Media Genre Prediction • Visual Branch – Deep CNN for features extraction – LSTM for modelling temporal cue • Audio Branch – Deep features extraction : Soundnet – SVM classifier ResNet Architecture 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 6
  • 7. Media Genre Dataset • Movie trailers dataset – K S Sivaraman and G. Somappa. MovieScope • 525 YouTube Videos over 4 Genres (available on IMDB) – Extended with a 5th Genre • Sci-Fi Trailer Genre Training Test Total Action 69 44 114 Drama 95 39 134 Horror 99 59 158 Romance 80 39 119 Sci-fi 72 35 107 Total 415 216 632 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 7
  • 8. Visual Audio Audio-Visual Action Drama Horror Romance Sci-fi Visual Audio Audio-Visual Action 2.5% 33,34% 17.92% Drama 0% 17,78% 8.89% Horror 0.37% 2.61% 1.49% Romance 0% 3.87% 1.93% Sci-fi 97.12% 42,40% 69,76% Media Genre Prediction Example 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 8 Features/Metrics Precision Sivaraman and Somappa. (4 genres) 0,85 One Frame-VGG + Soundnet 0,85 ResNet-LSTM + Soundnet 0,87
  • 9. Interestingness Classification • Features vectors – Probability vector for the genre distribution for video fragments • Classifier – Binary SVM, – Including assessors confidence scores • Modality Analysis – Audio genre vector – Visual genre vector – Audio-Visual genre vector • Concatenated visual- and audio-based genre vectors probabilities 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 9
  • 10. Media Interestingness Dataset • Creative Commons licensed “Hollywood” Video – 103 movie trailers and 4 continuous extracts – Shot segmentation – 7396 video segments for training and 2435 video segments for testing – Low level and mid level features – Annotation (GT) performed by human assessors • Interesting (1) or Not Interesting (2) 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET 10
  • 11. Example 1: The longest ride GENRE Action Drama Horror Romance Sci-fi COVERAGE 9.02% 29.86% 6.25% 43.75% 11.11% 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 11
  • 12. Example 1: The longest ride GENRE Action Drama Horror Romance Sci-fi COVERAGE 9.02% 29.86% 6.25% 43.75% 11.11% 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 12
  • 13. Example 1: The longest ride 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 13 Predicted Interestingness= 1 Romance = 97,45%
  • 14. Example 1: The longest ride 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 14 Predicted Interestingness= 0 Romance = 48,49%
  • 15. Example 2: Tears of Steel 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 15 GENRE Action Drama Horror Romance Sci-fi COVERAGE 18.37% 16.32% 6.12% 6.12% 53.06%
  • 16. Example 2: Tears of Steel 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 16 GENRE Action Drama Horror Romance Sci-fi COVERAGE 18.37% 16.32% 6.12% 6.12% 53.06%
  • 17. Example 2: Tears of Steel 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 17 Predicted Interestingness= 1 Sci-fi =95,38%
  • 18. Example 2: Tears of Steel 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 18 Predicted Interestingness= 0 Sci-fi= 53,67%
  • 19. Experiments and Results 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 19
  • 20. Conclusions and Future Work • We proposed to train a Genre Recognition System as a mid-level representation for Predicting Media Interestingness • Deep Audio and Visual Features for Genre Recognition • LSTM used to predict Genre over Video Shot duration • Transfer Learning for Predicting Media Interestingness • Best Results: – 0.21 MAP (on test set) [Previous State of the Art 0.20 MAP] • Audio brings limited additional information for PMI • End-to-End joint learning (fine-tuning) of audio-visual features • Extent mid-level representation with other features (Emotion, Valence/Arousal, etc.) 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 20
  • 21. Recent Related Publications • Ben-Ahmed, O., J. Wacker, A. Gaballo, B. Huet, EURECOM @MediaEval 2017: Media genre inference for predicting media interestingness, MEDIAEVAL 2017, MediaEval Benchmarking Initiative for Multimedia Evaluation, 13-15 September 2017, Dublin, Ireland • Pini S., O. Ben-Ahmed, M. Cornia, L. Baraldi, R. Cucchiara, Rita; B. Huet, Modeling multimodal cues in a deep learning-based framework for emotion recognition in the wild, ICMI 2017, 19th ACM International Conference on Multimodal Interaction, November 13-17th, 2017, Glasgow, United Kingdom • Smith, J. R., D. Joshi, B. Huet, W. Hsu, J. Cota, Harnessing A.I. for augmenting creativity: Application to movie trailer creation, ACMMM 2017, 25th ACM Multimedia Conference, October 23-27, 2017, Mountain View, CA, USA • Tiwari S. N., N. Q. K. Duong, F. Lefebvre, C.-H. Demarty, B. Huet, L. Chevallier, Deep features for multimodal emotion classification, on HAL • Paleari, M., R. Chellali, B. Huet, Bimodal emotion recognition, ICSR 2010, International Conference on Social Robotics, November 23-24, 2010, Singapore / Also published as LNCS Volume 6414/2010 • Mérialdo B. and B. Huet, Automatic video summarization, Book chapter in "Interactive Video, Algorithms and Technologies" by Hammoud, Riad (Ed.), 2006, XVI, 250 p, ISBN: 3-540-33214-6 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET 21
  • 22. Questions? 6/27/2018 ICMR18- ACMMM TPC Workshop - B. HUET - p 22 Benoit Huet. Thank you,

Notas do Editor

  1. The media industry is constantly making use of affective signals whether in text, sound, image or moving images with the aim of attracting our attention and conveying a message or a story. In this presentation we will look at the analysis of audio-visual content for understanding its affective properties. We will start with proposing a semi supervised approach for identifying the genre of a media (action, drama, horror, etc..). We will then show how the genre of video segments can be used to determine its interestingness. There are many usage scenarios where such information about the content has value for the media editors and archivists. Beyond genre and interestingness, emotion recognition in videos in another important cue when understanding the content of audio-visual documents. For this a deep model combining three key component for recognizing human expression of emotions has been devised. It includes static as well as dynamic facial features and audio information. The approach was shown to perform well on the Emotion Recognition in the Wild 2017 challenge. Applications to past and ongoing research and industry projects will be used throughout to illustrate the presentation.
  2. Soleymani 2015 Visual Interestingness Van Gool 2013 Image Interestingness Predicting Media Interestingness (PMI) aims to automatically analyze media data and identify the most attractive content Most of the previous proposed Approaches for PMI are directly based on the multimedia content However those approaches suffer from the gap between low-level features and high-level human perception. Supported by recent research, we address PMI in association with media emotional content through Genre Inference
  3. Recent research proved that perceived interestingness is highly correlated with data emotional content Soleymani 2015 Visual Interestingness Van Gool 2013 Image Interestingness Hence, an affective representation of video content will be useful for identifying the most important parts in a movie. In this work, we hypothesize that the emotional impact of the movie genre can be a factor for the perceived interestingness of a video for a given viewer. Therefore, we adopt a mid-level representation based on media genre recognition. We propose to represent each sample as a distribution over genres (action, drama, horror, romance, sci-fi). For instance, a high confidence for the horror label inside the shot genre distribution could be perceived as more emotional (scary in this case). Therefore, this shot might be more characteristic and therefore more interesting than a neutral genre that could appear in any shot.
  4. About Soundnet : The Soundnet model is build by learning acoustic representation from large collections of unlabeled videos. The training is based on a student-teacher procedure which transfers discriminative visual knowledge from well established visual models (ImageNet, Places) into sound domain using unlabeled video as a bridge. Audio features extracted from the soundnet model achieved state-of-the-art performance when used as input to an SVM on two audio scene classification benchmarks.
  5. A shot from the devset of MediaEval set and its genre prediction for visual, audio and audio-visual features The video is here : https://drive.google.com/open?id=0B-Ebwt-Z3nNMNEFsbnNwRzdrazg The name of the film is After Earth Film
  6. Video 59 dans devset Déja la plupart des pics sont dans les genres romance et Drama car c un fil de drama et romance de base
  7. Video 59 dans devset Déja la plupart des pics sont dans les genres romance et Drama car c un fil de drama et romance de base
  8. Il ya 8 interssting shots dans le GT (['3-41', '42-69', '874-920', '1736-1754', '1772-1795', '2289-2329', '2746-2769', '3027-3047'] Je vais mettre deux shots intéressants ( je vous laisse le choix pour choisir un pour que je le mentionne sur le graphe avec l’etoile rouge) Index des vidéos rouges : 13 et 35 Et un shot non interss (index 85 (2477-2495))
  9. Il ya 8 interesting shots dans le GT (['3-41', '42-69', '874-920', '1736-1754', '1772-1795', '2289-2329', '2746-2769', '3027-3047'] les shots intérrassant sont : 9 ->(0.9744806289672852), 15 ->(0.9995414018630981) , 48 ->(0.9999010562896729), 66->(0.9726647734642029), 104->(0.9965728521347046), 110->(0.6256349086761475) 111->( 0.927717924118042), 141-> ((0.7384583353996277) 111, 104, 66, 48, 110, 15, 141, 9] Video 1 avec etoile rouge (3027-3047)---> index sur le graphe = 9 avec romance = 0.9744806289672852 Video 2 avec etoile rouge (2746-2769)---> index sur la graphe = 141 (0.7384583353996277) Ici je propose qu’on laisse que la video 1 avec etoile rouge ? Video avec etoile verte (2477-2495)--> index sur graphe =85 avec romance = 0.48481258749961853 Je vais mettre deux shots intéressants ( je vous laisse le choix pour choisir un pour que je le mentionne sur le graphe avec l’etoile rouge) Et un shot non interss (index 85 (2477-2495))
  10. Video 59 dans devset Déja la plupart des pics sont dans les genres romance et Drama car c un fil de drama et romance de base
  11. Video 59 dans devset Déja la plupart des pics sont dans les genres romance et Drama car c un fil de drama et romance de base
  12. Ok ! les shots intérrassant sont : 7(86.85%),10(99.98%),18 (95.22%) ,27(99.99%),35(95.38%), et 48(91.12%)
  13. Ok ! les shots intérrassant sont : 7(86.85%),10(99.98%),18 (95.22%) ,27(99.99%),35(95.38%), et 48(91.12%)
  14. Shots are too short for the audio to have impact?