Contextual modeling of audio signals
toward information retrieval
Samuel Kim Ph.D.
Given Zone, LLC
allthatsignal@gmail.com
http://allthatsignal.com
All that signal; for the people, by the people, of the people © Given Zone, LLC

Audio Information Retrieval

Motivation

Open Challenges
Heterogeneous …

Proposed approach
Context-based approach rather than content-based

Hypothesis
▪ An acoustic scene consists of a set of acoustic topics.
▪ Each acoustic topic has a probability distribution over acoustic words.
[Figure: acoustic scene, acoustic topics, acoustic words (signal characteristics)]

Problem formulation
Ball-picking problem (a.k.a. urn problem): Multinomial
What if the number of balls, i.e., , is random?
Conjugate prior of the multinomial: Dirichlet distribution
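
The urn formulation can be tried out numerically. The following is a minimal sketch, assuming a three-colour urn with a symmetric prior; the names `sample_dirichlet`, `draw_balls` and `alpha` are illustrative and do not come from the slides. The urn's composition is first drawn from a Dirichlet distribution, then balls are picked from a multinomial with that composition.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw one probability vector from Dirichlet(alpha) via normalised Gamma draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]

def draw_balls(proportions, n_draws, rng):
    """Multinomial draw: count how often each colour is picked in n_draws picks."""
    counts = [0] * len(proportions)
    for colour in rng.choices(range(len(proportions)), weights=proportions, k=n_draws):
        counts[colour] += 1
    return counts

rng = random.Random(0)
alpha = [1.0, 1.0, 1.0]                      # assumed symmetric prior over three colours
proportions = sample_dirichlet(alpha, rng)   # the urn's composition is itself random
counts = draw_balls(proportions, 100, rng)   # colour counts for 100 picks
```

Because the Dirichlet is conjugate to the multinomial, the posterior over the proportions after seeing `counts` is again a Dirichlet, with parameters `alpha[i] + counts[i]`.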

Latent Dirichlet Allocation (LDA)
▪ Graphical representation of LDA
[Figure: graphical model of LDA. Labels: Dirichlet parameter, topic distributions, topic, word, word distribution given topic, Dirichlet parameter]
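
For reference, the generative process that the LDA graphical model encodes can be written out as follows. The symbols are the usual textbook ones and are an assumption here, since the slide's own notation did not survive extraction: \alpha and \beta are the two Dirichlet parameters, \theta_d is the topic distribution of audio clip d, \phi_k is the word distribution of topic k, z_{d,n} is the topic of the n-th acoustic word and w_{d,n} is that word.

```latex
\begin{align*}
\phi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K \\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
z_{d,n} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d), & n &= 1,\dots,N_d \\
w_{d,n} \mid z_{d,n}, \phi &\sim \mathrm{Multinomial}(\phi_{z_{d,n}})
\end{align*}
```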

Acoustic Words

Approximation
▪ Infer/Model process
• Involves intractable computations, such as
▪ Approximation methods
• Gibbs sampling method [Steyvers 2007]
➢ A form of Markov Chain Monte Carlo (MCMC)
• Variational approximation [Blei2003], etc.
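
To make the Gibbs sampling option concrete, here is a minimal sketch of a collapsed Gibbs sampler for LDA in plain Python. It is an illustration under stated assumptions, not the implementation used in the talk: the function name `gibbs_lda`, the hyperparameter values and the toy data are all hypothetical, and documents are assumed to be lists of integer acoustic-word ids.

```python
import random

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                 # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]   # n(k, w)
    topic_total = [0] * n_topics                               # n(k)
    assignments = []
    # Give every word token a random initial topic.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                k = assignments[d][n]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # p(z = j | all other assignments), up to a constant factor.
                weights = [
                    (doc_topic[d][j] + alpha)
                    * (topic_word[j][w] + beta) / (topic_total[j] + vocab_size * beta)
                    for j in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed estimate of each document's topic distribution.
    theta = [
        [(c + alpha) / (len(doc) + n_topics * alpha) for c in row]
        for row, doc in zip(doc_topic, docs)
    ]
    return theta, topic_word

docs = [[0, 0, 1, 1, 0], [2, 3, 3, 2, 2], [0, 1, 2, 3]]   # toy acoustic-word sequences
theta, topic_word = gibbs_lda(docs, n_topics=2, vocab_size=4)
```

Each row of `theta` is the estimated topic distribution of one clip, which is the quantity a downstream classifier would consume.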

Interpretation
Latent Acoustic Topics
• Probabilistic assignment to individual topics (in the figure, the size represents the probability)
• Probabilistic (soft) clustering in terms of acoustic words' co-occurrence
Acoustic Words
• Discrete symbols of acoustic characteristics
• Play a role similar to words in text
[Figure: audio features, acoustic words, acoustic topics]

For Classification Applications
▪ Two-step learning
[Figure: training phase and test phase. Labels: training DB, acoustic topic model (unsupervised modeling), topic distribution probability, classifiers (multiclass SVM; supervised classifier), test signal]
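
The second step of the two-step scheme can be sketched as follows. This is a toy illustration only: the topic distributions are made-up numbers standing in for the output of the unsupervised topic model, and a nearest-centroid rule is swapped in for the multiclass SVM so that the example stays self-contained. The names `train_centroids` and `classify` are hypothetical.

```python
def train_centroids(topic_dists, labels):
    """Average the topic distributions of each class (stand-in for the SVM)."""
    sums, counts = {}, {}
    for dist, label in zip(topic_dists, labels):
        acc = sums.setdefault(label, [0.0] * len(dist))
        for i, p in enumerate(dist):
            acc[i] += p
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def classify(dist, centroids):
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(dist, centroids[label]))

# Hypothetical topic distributions, as the topic model (step 1) might produce.
train_dists = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]
train_labels = ["news", "news", "music", "music"]
centroids = train_centroids(train_dists, train_labels)
predicted = classify([0.7, 0.3], centroids)   # topic distribution of a test signal
```

The point of the split is that the topic model never sees the labels; only the classifier trained on the topic distributions does.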

Possible Applications
Content Identification
• Music Information Retrieval [Levi2009]
• Audio Fingerprinting [Kim2012]
Audio Scene Analysis
• Understanding auditory scene [Kim2009]
• Environmental sound classification [Kim2010]
User Modeling
• Behavioral Analysis [Kim2011]
• Emotion recognition [Kim2013]
Target application
Automatic classification of TV genre using audio content

Scenarios
▪ Off-line
• Assumes the system knows when the program starts and ends.
• Prior segmentation required.
▪ On-line
• Makes decisions without prior segmentation
➢ Every X seconds
➢ Online scene detection, etc.
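
The "every X seconds" on-line setting amounts to cutting the incoming stream into fixed-length windows and classifying each one. A minimal sketch, assuming non-overlapping windows (the slides do not say whether windows overlap) and a hypothetical helper name `fixed_windows`:

```python
def fixed_windows(total_seconds, window_seconds):
    """Return (start, end) times of consecutive, non-overlapping windows."""
    n_windows = int(total_seconds // window_seconds)
    return [(i * window_seconds, (i + 1) * window_seconds) for i in range(n_windows)]

# A 15-minute program cut into 6-second decision windows.
windows = fixed_windows(total_seconds=15 * 60, window_seconds=6)
```

Each window would then be converted to acoustic words and classified independently, without knowing where the program starts or ends.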

Scenarios
▪ Models are trained in an off-line manner
[Figure: training phase (training DB, acoustic topic model, topic distribution probability, classifiers (multiclass SVM)) and test phase (test signal, with a segmentation switch (On/Off) selecting between the off-line result and the on-line result)]

Dataset
▪ RAI dataset
• Providing a benchmarking test-bed (6-fold cv)
• Italian TV broadcast programs
• 7 genres
• 262 programs (15 min/pr.)
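
As a rough illustration of what 6-fold cross-validation over 262 programs looks like, the sketch below assigns program indices to folds round-robin. The assignment rule is an assumption (the slides do not say how the folds were formed or whether they were stratified by genre), and `k_fold_indices` is a hypothetical helper.

```python
def k_fold_indices(n_items, n_folds):
    """Assign item indices to folds round-robin; returns one index list per fold."""
    folds = [[] for _ in range(n_folds)]
    for i in range(n_items):
        folds[i % n_folds].append(i)
    return folds

folds = k_fold_indices(n_items=262, n_folds=6)
fold_sizes = [len(f) for f in folds]   # four folds of 44 programs and two of 43
```

In each round, one fold is held out for testing and the other five are used for training.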

Off-line classification
▪ Overall accuracy
• Comparison with conventional content-based approaches
▪ Competitive results using only audio content

Accuracy (%)   MLP* [2007]   MLP* [2009]   SVM** [2010]   GMM (64 mixtures)   ATM (64 topics, 2,048 words)
Audio only     -             -             86.6           93.6                94.3
Audio-visual   92.0          94.9          99.6           -                   -

* MLP: Multilayer Perceptron
** SVM: Support Vector Machine

[2007] M. Montagnuolo and A. Messina, “TV Genre Classification Using Multimodal Information and Multilayer Perceptrons,” LNAI 4733, 2007.
[2009] M. Montagnuolo and A. Messina, “Parallel neural networks for multimodal video genre classification,” Multimedia Tools and Applications, vol. 41, 2009.
[2010] H. Ekenel, et al., “Content-based Video Genre Classification Using Multiple Cues,” ACM, 2010.

Off-line classification
▪ Confusion matrix
• ATM
• GMM
[Figure: confusion matrices for ATM and GMM. Class labels: CT = Cartoon, CM = Commercial, FB = Football, MU = Music, NE = News, TS = Talk show, WF = Weather Forecast]

On-line classification
▪ Accuracy according to length of segments
[Figure: accuracy (%) versus segment length, time (s), from 0 to 6 s, for ATM and GMM; the accuracy axis spans 68 to 82]

On-line classification
▪ Per-class F-measure
[Figure: per-class F-measure for 1-second and 6-second segments]
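
For readers who want to reproduce a per-class F-measure from a confusion matrix, a minimal sketch follows. It assumes the balanced F1 variant (the slides do not state which F-measure was used) and the convention that rows are true classes and columns are predicted classes; the counts are made up.

```python
def per_class_f_measure(confusion):
    """confusion[i][j] = number of class-i items predicted as class j."""
    n = len(confusion)
    scores = []
    for c in range(n):
        true_pos = confusion[c][c]
        actual = sum(confusion[c])                          # row sum
        predicted = sum(confusion[r][c] for r in range(n))  # column sum
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores

confusion = [[8, 2], [1, 9]]   # made-up counts for two classes
scores = per_class_f_measure(confusion)
```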

Summary
▪ The genre of TV programs can be detected using only audio content
• Using a context-based approach
▪ On-line and off-line tasks
• Competitive results with conventional audio-visual approaches in off-line tasks
• ATM outperforms GMM if segments are long enough in on-line tasks

Conclusions
▪ Acoustic Topic Model (ATM)
• Captures contextual information of audio signals by modeling the co-occurrence of text-like audio signals
• Can be used in various classification applications in combination with a supervised classifier

Thank you very much

Editor's Notes

  1. Ambiguities in sound. Heterogeneous: a mixture of multiple sound sources. Dependency on context: similar audio content may represent different meanings depending on the surrounding sound.
  2. Acoustic words play a role similar to words in text: they transform audio signals into text-like signals. We have tried various strategies, such as ASR and onomatopoeic words, but MFCC-VQ rocks.
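
The MFCC-VQ idea in note 2 can be sketched as a nearest-codeword lookup: each feature frame is replaced by the index of its closest codebook entry, and that index is the acoustic word. The example below is a toy under stated assumptions: the 2-D vectors stand in for real MFCC frames, the three-entry codebook is hand-written rather than learned, and `quantize` is a hypothetical helper name.

```python
def quantize(frames, codebook):
    """Map each feature vector to the index of its nearest codeword."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [
        min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))
        for frame in frames
    ]

# Toy 2-D vectors standing in for MFCC frames; a real codebook would be learned.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [4.0, 6.0], [0.4, 0.4]]
acoustic_words = quantize(frames, codebook)   # one codeword index per frame
```

The resulting index sequence is what plays the role of a text document in the topic model.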