Audio Clip Classification

         Anvita Bajpai
      anvita@mailcity.com
Source:
http://www.hindu.com/thehindu/seta/2002/01/10/stories/2002011000080300.htm
Exploding information
● One hour of TV broadcast across the world amounts to 100 petabytes.
Source: http://www.sims.berkeley.edu/research/projects/how-much-info/summary.html#tv
Audio indexing
● Reasons for choosing audio data for study
    – Easier to process
    – Contains significant information
● Indexing – a method of organizing data for further search and retrieval
    – Example – book indexing
● Audio indexing – indexing non-text data using its audio part
Example of an audio indexing system




Source: J. Makhoul, F. Kubala, T. Leek, D. Liu, L. Nguyen, R. Schwartz, and A. Srivastava.
  “Speech and language technologies for audio indexing and retrieval”, in Proc. of the IEEE,
  88(8), pp. 1338-1353, 2000.
More examples of audio indexing tasks
● Spoken document retrieval
● Speaker identification
● Language identification
● Music classification
● Music/speech discrimination
● Audio classification
    – An important step in building an audio indexing system.
Levels of information in audio signal
● Subsegmental information
    – Related to excitation source characteristics
● Segmental information
    – Related to system / physiological characteristics
● Suprasegmental information
    – Related to behavioural characteristics of audio
Audio clip classification
● Closed set problem
● To classify a given audio clip into one of the following predefined categories
    – Advertisement
    – Cartoon
    – Cricket
    – Football
    – News
Issues in audio clip classification
● Feature extraction
    – Effective representation of data to capture all significant properties of audio for the task
    – Robust under various conditions
● Classification
    – Formulation of a distance measure and rule/models
         ● Training models for the task
         ● Testing – actual classification task
         ● Combining evidence from different systems
Missing component in existing approaches and its importance
● Features derived based on spectral analysis
    – Carry significant properties of audio data at segmental level
    – Miss information present at subsegmental and suprasegmental levels
● Perceptually significant information in linear prediction (LP) residual of signal
    – Complementary in nature to the spectral information
    – Subsegmental and suprasegmental information not being used in current systems
Presence of audio-specific information in LP residual
[Figure: original signal (Aa1.wav) and its LP residual (Aa_res.wav)]
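A minimal sketch of how an LP residual could be computed for one frame, using the autocorrelation method of linear prediction. The function names, the LP order of 12 and the small regularisation term are assumptions made for illustration; the slides do not state the analysis settings that were actually used.

```python
import numpy as np

def lp_coefficients(frame, order):
    """Estimate predictor coefficients a[1..order] (autocorrelation method),
    so that s[n] is predicted as sum_k a[k] * s[n-k]."""
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation values r[0..order] of the frame
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    # Tiny ridge term (assumed) to keep the system solvable for silent frames
    R += 1e-9 * (r[0] + 1e-12) * np.eye(order)
    return np.linalg.solve(R, r[1:])

def lp_residual(frame, order=12):
    """Residual e[n] = s[n] - sum_k a[k] * s[n-k] (prediction error)."""
    frame = np.asarray(frame, dtype=float)
    a = lp_coefficients(frame, order)
    residual = frame.copy()
    for k in range(1, order + 1):
        residual[k:] -= a[k - 1] * frame[:-k]
    return residual
```

For a signal that is well modelled by linear prediction, the residual carries much less energy than the signal itself; what remains is mostly excitation source information.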
Extracting audio-specific information from LP residual
  – LP residual
       ● May contain higher order correlation among samples
       ● This correlation is difficult to extract using standard signal processing and statistical techniques
  – Hence autoassociative neural network (AANN) models are proposed to capture information from the residual
       ● Used to capture features for the speaker recognition task
       ● Structure of network
            – 40L 48N 12N 48N 40L
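The network structure 40L 48N 12N 48N 40L can be sketched as a forward pass. This is an illustration only: it assumes that L denotes linear units and N nonlinear units (tanh is used here), that each input is a block of 40 residual samples matching the 40-unit input layer, and that weights start from a random initialisation. None of these details are confirmed by the slides.

```python
import numpy as np

LAYER_SIZES = [40, 48, 12, 48, 40]  # 40L 48N 12N 48N 40L

def init_aann(rng, sizes=LAYER_SIZES):
    """Random initial weights; one (W, b) pair per layer-to-layer connection."""
    return [
        (rng.standard_normal((n_in, n_out)) / np.sqrt(n_in), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def aann_forward(params, x):
    """Map input blocks to their reconstructions through the 12-unit bottleneck."""
    h = np.asarray(x, dtype=float)
    last = len(params) - 1
    for i, (W, b) in enumerate(params):
        h = h @ W + b
        if i < last:
            h = np.tanh(h)  # nonlinear hidden units; the output layer stays linear
    return h

def reconstruction_error(params, x):
    """Squared error between each input block and its reconstruction."""
    x = np.asarray(x, dtype=float)
    return np.sum((aann_forward(params, x) - x) ** 2, axis=-1)

# Usage with random stand-in data (not real residual blocks)
rng = np.random.default_rng(0)
params = init_aann(rng)
blocks = rng.standard_normal((10, 40))
errors = reconstruction_error(params, blocks)
```

Training would adjust the weights to minimise this reconstruction error on blocks from one audio component, so that a low error on a test block suggests the block resembles that component.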
Use of audio component knowledge
● Audio category
    – Composed of one or more audio components
● Audio component
    – Specific to an audio category
● Six components chosen for study
    – Music
    – Speech – Conversational, Cartoon, Clean
    – Noise – Football, Cricket
Training phase of AANN models
● Trained one AANN model for each of the six components
● Models trained for 2000 epochs
[Figure: AANN training error curve]
Testing phase
 (confidence scores output of 6 AANN models for a news test clip)
[Figure: (a) for a segment of the clip, (b) expanded version of the same. Total duration of the test clip is 10 sec]
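One way a confidence score could be derived from an AANN's reconstruction error is sketched below. The mapping exp(-E), with E the squared error normalised by the input energy, is an assumption chosen for illustration; the slides do not give the formula that was used.

```python
import numpy as np

def confidence_scores(blocks, reconstructions):
    """Per-block confidence in (0, 1]: 1 for a perfect reconstruction,
    smaller as the normalised squared error grows (assumed mapping)."""
    blocks = np.asarray(blocks, dtype=float)
    reconstructions = np.asarray(reconstructions, dtype=float)
    err = np.sum((blocks - reconstructions) ** 2, axis=1)
    energy = np.sum(blocks ** 2, axis=1) + 1e-12  # guard against silent blocks
    return np.exp(-err / energy)
```

Applying this to the output of each of the six component models would give six score sequences for a test clip, one per model.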
Work flow diagram
[Figure: work flow diagram; surviving label: "(of 6 components)"]
MLP – Multilayer perceptron
MLP for decision making task
● MLP used to capture the audio-specific information captured by the AANN models, as it is
    – Suitable for pattern recognition tasks
    – Able to form complex decision surfaces using discriminative learning algorithms
● Structure of MLP used – 6L 24N 12N 5N
Confidence scores output of 6 component AANN models (Contd...)
[Figure: MLP with a 6-node input layer (M, S1, S2, S3, N1, N2), hidden layers of 24 and 12 nodes, and a 5-node output layer (A, C, K, F, N) giving the audio category]
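The decision stage with structure 6L 24N 12N 5N can be sketched as follows. The tanh activation, the random weights and the argmax decision rule are assumptions for illustration; a real system would use trained weights. The component and category names follow the labels in the diagram.

```python
import numpy as np

COMPONENTS = ["M", "S1", "S2", "S3", "N1", "N2"]  # inputs: one score per AANN model
CATEGORIES = ["Advertisement", "Cartoon", "Cricket", "Football", "News"]
MLP_SIZES = [6, 24, 12, 5]  # 6L 24N 12N 5N

def init_mlp(rng, sizes=MLP_SIZES):
    """Random (untrained) weights, used here only to show the shapes involved."""
    return [
        (rng.standard_normal((n_in, n_out)) / np.sqrt(n_in), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def mlp_forward(params, scores):
    """Propagate six confidence scores through the nonlinear layers."""
    h = np.asarray(scores, dtype=float)  # linear input layer: scores passed as-is
    for W, b in params:
        h = np.tanh(h @ W + b)
    return h

def classify(params, scores):
    """Pick the category whose output unit is largest (assumed decision rule)."""
    return CATEGORIES[int(np.argmax(mlp_forward(params, scores)))]

# Usage with stand-in scores
params = init_mlp(np.random.default_rng(0))
label = classify(params, [0.8, 0.3, 0.2, 0.4, 0.1, 0.1])
```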
Classification results
Audio class        % of clips correctly classified
                   DB1        DB2
Advertisement      83.00%     43.50%
Cartoon            88.00%     45.50%
Cricket            86.00%     38.50%
Football           90.50%     75.50%
News               85.50%     63.30%
Average            86.60%     53.26%

DB1 – Data collected from a single TV channel; contains 200 clips, 40 of each category
DB2 – Data collected across all broadcast channels; contains 1659 clips:
      Adv. – 226, Cartoon – 208, Cricket – 318, Football – 600, News – 306
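The reported averages are consistent with an unweighted mean over the five classes, which a short check can confirm. The dictionary and function names are illustrative; the figures are copied from the table above.

```python
# Per-class accuracies (%) of the LP residual-based system, from the table
DB1 = {"Advertisement": 83.0, "Cartoon": 88.0, "Cricket": 86.0,
       "Football": 90.5, "News": 85.5}
DB2 = {"Advertisement": 43.5, "Cartoon": 45.5, "Cricket": 38.5,
       "Football": 75.5, "News": 63.3}

def mean_accuracy(per_class):
    """Unweighted mean over classes (each class counts equally)."""
    return sum(per_class.values()) / len(per_class)

db1_average = round(mean_accuracy(DB1), 2)
db2_average = round(mean_accuracy(DB2), 2)
```

Because the DB2 classes differ in size, a mean weighted by clip counts would give a different figure from the unweighted one.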
Classification results for spectral features-based system [1]
Audio class      % of clips correctly classified
                 Spectral features-based system    LP residual-based system
                 DB1        DB2                    DB1        DB2
Advertisement    85.00%     65.00%                 83.00%     43.50%
Cartoon          90.00%     75.00%                 88.00%     45.50%
Cricket          90.00%     65.00%                 86.00%     38.50%
Football         92.50%     40.00%                 90.50%     75.50%
News             87.50%     65.30%                 85.50%     63.30%

Average          89.00%     62.06%                 86.60%     53.26%

Ref. [1] Gaurav Aggarwal, Features for Audio Indexing, M.Tech. report, CSE Dept., IIT Madras, Apr. 2002
Classification results from source, spectral features-based systems
[Figure: diagram relating A, System 1 and System 2]
A – All test audio clips (DB2)
System 1 – clips recognised using spectral features-based system
System 2 – clips recognised using excitation source (LP residual) based system
Results of combined (subsegmental and segmental) system for DB2
Audio class      % of clips correctly classified
                 Spectral   LP residual   Abstract-level   Rank + measurement-level
                 based      based         combination      combination
Advertisement    65.00%     43.50%        83.00%           92.47%
Cartoon          75.00%     45.50%        92.00%           98.55%
Cricket          65.00%     38.50%        87.50%           88.67%
Football         40.00%     75.50%        87.00%           91.16%
News             65.30%     63.30%        86.30%           95.10%

Average          62.06%     53.26%        87.25%           93.18%
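Two generic ways of combining the outputs of two classifiers are sketched below: a measurement-level sum rule on per-class scores, and a rank-level Borda count. These are textbook illustrations under the assumption that each system emits one non-negative score per class. They are not the specific combination rules behind the figures in the table, which the slides do not describe.

```python
import numpy as np

def measurement_level(scores_a, scores_b):
    """Sum rule: normalise each system's scores to sum to 1, add, take argmax."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    a = a / a.sum()
    b = b / b.sum()
    return int(np.argmax(a + b))

def rank_level(scores_a, scores_b):
    """Borda count: each system gives a class points equal to its rank
    (0 for its lowest-scoring class); the class with most points wins."""
    def ranks(scores):
        return np.argsort(np.argsort(np.asarray(scores, dtype=float)))
    return int(np.argmax(ranks(scores_a) + ranks(scores_b)))
```

When the two systems make different errors, as source and spectral features appear to here, such combinations can recover clips that either system alone misclassifies.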
Suprasegmental information in Hilbert
envelope of LP residual of audio signal
Suprasegmental information in LP
   residual for audio clip classification




Autocorrelation samples of Hilbert envelope of LP residual for 5 audio classes
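A minimal sketch of how the Hilbert envelope of a residual and its autocorrelation sequence could be computed. The analytic signal is built with an FFT; removing the mean before the autocorrelation and normalising by the zero-lag value are assumptions made here, since the slides do not give these details.

```python
import numpy as np

def hilbert_envelope(x):
    """Magnitude of the analytic signal of x, computed via the FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.fft(x)
    # Keep DC (and Nyquist), double positive frequencies, zero negative ones
    weights = np.zeros(n)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        weights[1 : n // 2] = 2.0
    else:
        weights[1 : (n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))

def normalized_autocorrelation(x, max_lag):
    """Autocorrelation for lags 0..max_lag, scaled so that lag 0 equals 1.
    Assumes x is not constant (otherwise the zero-lag value is 0)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # assumed: remove the mean of the non-negative envelope
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(max_lag + 1)])
    return r / r[0]
```

Peaks in this sequence at non-zero lags indicate periodicity in the envelope, which is the kind of longer-term structure referred to as suprasegmental information.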
Statistics of autocorrelation sequence




Correction – the statistics shown here are of the autocorrelation sequence peaks of the Hilbert envelope (HE), not of the LP residual
Statistics of autocorrelation sequence
Scope of future work
● Extending the framework to other audio indexing applications
● Exploring methods to add suprasegmental information to the combined system
● (though far away..) Building a multimedia indexing system
Summary and conclusions
● Need to organize audio data because of its large volume and its need in real-life applications
● Presence of audio-specific information in LP residual
● AANN model's ability to capture subsegmental information from the residual for the task
● Use of MLP for decision making using the information captured by AANN
● Complementary nature of source information to the system information
● Presence of audio-specific suprasegmental information in LP residual
Major contributions
● Extraction of audio-specific information from LP residual using NN models
● Showing the complementary nature of source and system information for the audio clip classification task
● Showing the presence of audio-specific suprasegmental information in LP residual
References
1.   T. Zhang and C.-C. J. Kuo, “Content-based classification and retrieval of audio,” in Conference on
     Advanced Signal Processing Algorithms, Architectures, and Implementations VIII, San Diego,
     California, July 1998, vol. 3461 of Proc. of SPIE.
2.   J. Makhoul, F. Kubala, T. Leek, D. Liu, L. Nguyen, R. Schwartz, and A. Srivastava, “Speech and
     language technologies for audio indexing and retrieval,” in Proc. of the IEEE, 88(8), pp. 1338-1353,
     2000.
3.   Y. Wang, Z. Liu, and J. Huang, “Multimedia Content Analysis using Audio and Visual Clues,”
     IEEE SP Magazine, 17(6), Nov. 2000.
4.   M.A. Kramer, “Nonlinear principal component analysis using autoassociative neural networks,”
     AIChE Journal, vol. 37, pp. 233-243, Feb. 1991.
5.   J. Makhoul, “Linear prediction: A tutorial review,” in Proc. IEEE, vol. 63, pp. 561-580, 1975.
6.   B. Yegnanarayana, S.R.M. Prasanna, and K.S. Rao, “Speech enhancement using excitation source
     information,” in Proc. Int. Conf. Acoust., Speech, Signal Processing, Orlando, FL, USA, May 2002.
7.   S.R.M. Prasanna, Ch.S. Gupta, and B. Yegnanarayana, “Autoassociative neural network models for
     speaker verification using source features,” in Proc. Sixth Int. Conf. Cognitive Neural Systems,
     Boston University, Boston, USA, May-June 2002.
8.   B. Yegnanarayana, Artificial Neural Networks, Prentice Hall of India, New Delhi, 1999.
Related publications
1.   Anvita Bajpai and B. Yegnanarayana, “Audio Clip Classification using LP
     Residual and Neural Networks Models”, European Signal and Image
     Processing Conference (EUSIPCO-2004), Vienna, Austria, 6-10 September
     2004
2.   Anvita Bajpai and B. Yegnanarayana, “Exploring Features for Audio
     Indexing using LP Residual and AANN Models”, accepted for The 17th
     International FLAIRS Conference (FLAIRS - 2004), Miami Beach, Florida,
     17-19 May 2004.
3.   Anvita Bajpai and B. Yegnanarayana, “Exploring Features for Audio Clip
     Classification using LP Residual and Neural Networks Models”,
     International Conference on Intelligent Signal and Image Processing
     (ICISIP-2004), Chennai, India, 4-7 January 2004
4.   Gaurav Aggarwal, Anvita Bajpai and B. Yegnanarayana, “Exploring
     Features for Audio Indexing”, in Indian Research Scholar Seminar
     (IRIS-2002), Indian Institute of Science, Bangalore, India, March 2002
The following are extra slides, not part of
the main presentation
Effect of # of epochs used for AANN training
[Figure: confidence scores output of 6 AANN models for a news test clip]
● Even well-trained humans don't always react the way they were trained.
    – Source: www.computer.org/computer/homepage/0103/random/r1014.pdf, by Bob Colwell
Classification of audio using spectral features
  • Extraction of features – based on
    – Volume
        ● Standard deviation and dynamic range of volume, volume undulation, 4Hz modulation energy
    – Zero Crossing Rate (ZCR)
        ● Standard deviation of ZCR, silence-nonsilence ratio
    – Pitch
        ● Pitch contour, pitch standard deviation, similar pitch ratio, pitch-nonpitch ratio
    – Spectrum
        ● Frequency centroid, bandwidth, ratio of energy in various frequency sub-bands
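A few of the listed features can be sketched with short frame-level computations. The frame length, hop size and the exact definitions used here (RMS as volume, dynamic range as (max - min) / max, power-weighted mean frequency as centroid) are assumptions for illustration and may differ from those used in the spectral features-based system.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (rows)."""
    x = np.asarray(x, dtype=float)
    starts = range(0, len(x) - frame_len + 1, hop)
    return np.array([x[s : s + frame_len] for s in starts])

def volume(frames):
    """Root-mean-square amplitude of each frame."""
    return np.sqrt(np.mean(frames ** 2, axis=1))

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs in each frame that change sign."""
    signs = np.sign(frames)
    signs[signs == 0] = 1  # treat exact zeros as positive
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def frequency_centroid(frames, sample_rate):
    """Power-weighted mean frequency of each frame, in Hz."""
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sample_rate)
    return (power @ freqs) / (power.sum(axis=1) + 1e-12)

def clip_features(x, sample_rate, frame_len=256, hop=128):
    """Clip-level statistics built from the frame-level measurements."""
    frames = frame_signal(x, frame_len, hop)
    v = volume(frames)
    z = zero_crossing_rate(frames)
    return {
        "volume_std": float(v.std()),
        "volume_dynamic_range": float((v.max() - v.min()) / (v.max() + 1e-12)),
        "zcr_std": float(z.std()),
        "mean_frequency_centroid": float(frequency_centroid(frames, sample_rate).mean()),
    }
```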
Features for Categorization of Audio Clips (4Hz modulation energy)
[Figure: panels for Cricket, Football and News]
Features for Categorization of Audio Clips (Similar Pitch Ratio) (Contd..)
[Figure: panels for Cricket, Football and News]
Importance of Task Dependent Feature (Standard deviation of ZCR)
[Figure: panels for Speaker 1, Speaker 2 and Music]

 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Anvita Audio Classification Presentation

  • 1. Audio Clip Classification Anvita Bajpai anvita@mailcity.com
  • 3. Exploding information
      ● One hour of TV broadcast across the world amounts to 100 petabytes.
      Source: http://www.sims.berkeley.edu/research/projects/how-much-info/summary.html#tv
  • 4. Audio indexing
      ● Reasons for choosing audio data for study
          – Easier to process
          – Contains significant information
      ● Indexing: a method of organizing data for further search and retrieval
          – Example: book indexing
      ● Audio indexing: indexing non-text data using its audio part
  • 5. Example of an audio indexing system Source: J. Makhoul, F. Kubala, T. Leek, D. Liu, L. Nguyen, R. Schwartz, and A. Srivastava. “Speech and language technologies for audio indexing and retrieval”, in Proc. of the IEEE, 88(8), pp. 1338-1353, 2000.
  • 6. More examples of audio indexing tasks
      ● Spoken document retrieval
      ● Speaker identification
      ● Language identification
      ● Music classification
      ● Music/speech discrimination
      ● Audio classification
          – An important step in building an audio indexing system
  • 7. Levels of information in audio signal
      ● Subsegmental information
          – Related to excitation source characteristics
      ● Segmental information
          – Related to system / physiological characteristics
      ● Suprasegmental information
          – Related to behavioural characteristics of audio
  • 8. Audio clip classification
      ● Closed set problem
      ● To classify a given audio clip into one of the following predefined categories
          – Advertisement
          – Cartoon
          – Cricket
          – Football
          – News
  • 9. Issues in audio clip classification
      ● Feature extraction
          – Effective representation of data to capture all significant properties of audio for the task
          – Robust under various conditions
      ● Classification
          – Formulation of a distance measure and rule/models
      ● Training models for the task
      ● Testing: the actual classification task
      ● Combining evidence from different systems
  • 10. Missing component in existing approaches and its importance
      ● Features derived based on spectral analysis
          – Carry significant properties of audio data at segmental level
          – Miss information present at subsegmental and suprasegmental levels
      ● Perceptually significant information in linear prediction (LP) residual of signal
          – Complementary in nature to the spectral information
          – Subsegmental and suprasegmental information not being used in current systems
  • 11. Presence of audio-specific information in LP residual
      [Audio examples: Original (Aa1.wav), Residual (Aa_res.wav)]
  • 12. Extracting audio-specific information from LP residual
      ● LP residual may contain higher-order correlation among samples
      ● It is difficult to extract it using standard signal processing and statistical techniques
          – Hence autoassociative neural network (AANN) models are proposed to capture information from the residual
      ● Used to capture features for the speaker recognition task
      ● Structure of network
          – 40L 48N 12N 48N 40L
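The LP residual referred to on this slide is the error left after predicting each sample from the previous ones. The sketch below shows one standard way to obtain it (autocorrelation method with the Levinson-Durbin recursion). The function names, the per-frame processing and the prediction order are illustrative assumptions; the slides do not state which LP order or analysis method was used.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation r[0..max_lag] of one frame (no windowing, for brevity)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations.

    Returns coefficients a[1..order] such that x[n] is predicted as
    sum(a[k] * x[n - k] for k in 1..order).
    """
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # degenerate frame (e.g. all zeros): stop early
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]


def lp_residual(frame, order):
    """Return (LP coefficients, residual) for one frame.

    The residual is the sample minus its linear prediction from the
    previous `order` samples (samples before the frame count as zero).
    """
    coeffs = levinson_durbin(autocorrelation(frame, order), order)
    residual = []
    for n, sample in enumerate(frame):
        predicted = sum(coeffs[k] * frame[n - 1 - k]
                        for k in range(order) if n - 1 - k >= 0)
        residual.append(sample - predicted)
    return coeffs, residual
```

For a signal that is well described by linear prediction, most of the energy is removed and what remains is dominated by the excitation, which is the part the slides propose to model with AANNs.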
  • 13. Use of audio component knowledge
      ● Audio category
          – Composed of one or more audio components
      ● Audio component
          – Specific to an audio category
      ● Six components chosen for study
          – Music
          – Speech: conversational, cartoon, clean
          – Noise: football, cricket
  • 14. Training phase of AANN models
      ● Trained one AANN model for each of the six components
      ● Models trained for 2000 epochs
      [Figure: AANN training error curve]
  • 15. Testing phase
      [Figure: confidence scores output by the 6 AANN models for a news test clip: (a) for a segment of the clip, (b) expanded version of the same. Total duration of the test clip is 10 sec]
  • 16. Work flow diagram (of 6 components) MLP – Multilayer perceptron
  • 17. MLP for decision making task
      ● MLP for capturing audio-specific information captured by AANN, as it is
          – Suitable for pattern recognition tasks
          – Able to form complex decision surfaces by using discriminative learning algorithms
      ● Structure of MLP used: 6L 24N 12N 5N
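  • 18. Contd...
      [Figure: MLP diagram. Input (IP) layer of 6 nodes fed with the confidence scores output by the 6 component AANN models (M, S1, S2, S3, N1, N2); hidden layers of 24 and 12 nodes; output (OP) layer of 5 nodes giving the audio category (A, C, K, F, N)]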
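The layer notation used in the slides (40L 48N 12N 48N 40L for the AANN, 6L 24N 12N 5N for the MLP) lists the number of units per layer, with L marking linear and N marking nonlinear units. A minimal forward-pass sketch of both structures follows. The weights are random and untrained, tanh is an assumed choice of nonlinearity, and treating the 40 AANN inputs as a block of 40 residual samples is also an assumption; only the layer sizes come from the slides.

```python
import math
import random


def make_layers(sizes, seed=0):
    """Random (untrained) weights for consecutive layer sizes; illustration only."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, nonlinear, x):
    """Propagate x; nonlinear[i] tells whether weight layer i feeds tanh units."""
    for (weights, biases), use_tanh in zip(layers, nonlinear):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if use_tanh:
            x = [math.tanh(v) for v in x]
    return x


# AANN 40L 48N 12N 48N 40L: linear input and output units, nonlinear hidden units.
aann = make_layers([40, 48, 12, 48, 40], seed=1)
aann_nonlinear = [True, True, True, False]

# MLP 6L 24N 12N 5N: linear input units, nonlinear hidden and output units.
mlp = make_layers([6, 24, 12, 5], seed=2)
mlp_nonlinear = [True, True, True]


def reconstruction_error(block):
    """Squared error between a 40-value block and its AANN reconstruction."""
    out = forward(aann, aann_nonlinear, block)
    return sum((o - b) ** 2 for o, b in zip(out, block))


# Six per-component scores in, five category activations out.
category_activations = forward(mlp, mlp_nonlinear, [0.5] * 6)
```

In a trained system, a low reconstruction error from one component's AANN would indicate that the input resembles that component; how the error is mapped to the confidence scores fed to the MLP is not specified in the slides.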
  • 19. Classification results
      % of clips correctly classified:

      Audio class      DB1       DB2
      Advertisement    83.00%    43.50%
      Cartoon          88.00%    45.50%
      Cricket          86.00%    38.50%
      Football         90.50%    75.50%
      News             85.50%    63.30%
      Average          86.60%    53.26%

      DB1: data collected from a single TV channel; contains 200 clips, 40 of each category
      DB2: data collected across all broadcast channels; contains 1659 clips (Advertisement 226, Cartoon 208, Cricket 318, Football 600, News 306)
  • 20. Classification results for spectral features-based system [1]
      % of clips correctly classified:

      Audio class      Spectral DB1   Spectral DB2   LP residual DB1   LP residual DB2
      Advertisement    85.00%         65.00%         83.00%            43.50%
      Cartoon          90.00%         75.00%         88.00%            45.50%
      Cricket          90.00%         65.00%         86.00%            38.50%
      Football         92.50%         40.00%         90.50%            75.60%
      News             87.50%         65.30%         85.50%            63.30%
      Average          89.00%         62.06%         86.60%            53.26%

      Ref. [1] Gaurav Aggarwal, Features for Audio Indexing, M.Tech. report, CSE Dept., IIT Madras, Apr. 2002
  • 21. Classification results from source and spectral features-based systems
      [Figure: sets A, System 1 and System 2]
      A: all test audio clips (DB2)
      System 1: clips recognised using the spectral features-based system
      System 2: clips recognised using the excitation source (LP residual) based system
  • 22. Results of combined (subsegmental and segmental) system for DB2
      % of clips correctly classified in systems:

      Audio class      Spectral based   LP residual based   Abstract level combination   Rank+measurement level combination
      Advertisement    65.00%           43.50%              83.00%                       92.47%
      Cartoon          75.00%           45.50%              92.00%                       98.55%
      Cricket          65.00%           38.50%              87.50%                       88.67%
      Football         40.00%           75.60%              87.00%                       91.16%
      News             65.30%           63.30%              86.30%                       95.10%
      Average          62.06%           53.26%              87.25%                       93.18%
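The slides name the combination levels but do not spell out the rules behind them. As a rough illustration of what rank-level and measurement-level combination can look like, the sketch below uses two textbook rules: a Borda count over per-system rankings and a sum of normalized scores. Both rules, the function names and the example scores are assumptions chosen for illustration, not the method used to produce the table above.

```python
CATEGORIES = ["Advertisement", "Cartoon", "Cricket", "Football", "News"]


def normalize(scores):
    """Scale one system's scores so they sum to 1."""
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def measurement_level(score_sets):
    """Sum rule: add the normalized per-category scores of every system."""
    combined = {c: 0.0 for c in CATEGORIES}
    for scores in score_sets:
        for c, s in normalize(scores).items():
            combined[c] += s
    return max(CATEGORIES, key=lambda c: combined[c])


def rank_level(score_sets):
    """Borda count: each system gives more points to categories it ranks higher."""
    points = {c: 0 for c in CATEGORIES}
    for scores in score_sets:
        ordered = sorted(CATEGORIES, key=lambda c: scores[c])  # worst first
        for rank, c in enumerate(ordered):
            points[c] += rank
    return max(CATEGORIES, key=lambda c: points[c])


# Hypothetical scores for one clip from the two systems (made-up numbers).
spectral_scores = {"Advertisement": 0.40, "Cartoon": 0.35, "Cricket": 0.05,
                   "Football": 0.08, "News": 0.12}
residual_scores = {"Advertisement": 0.12, "Cartoon": 0.50, "Cricket": 0.08,
                   "Football": 0.10, "News": 0.20}

by_scores = measurement_level([spectral_scores, residual_scores])
by_ranks = rank_level([spectral_scores, residual_scores])
```

In this made-up case the spectral system alone would pick Advertisement, while both combination rules pick Cartoon, which shows how a second system with complementary evidence can change the decision.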
  • 23. Suprasegmental information in Hilbert envelope of LP residual of audio signal
  • 24. Suprasegmental information in LP residual for audio clip classification
      [Figure: autocorrelation samples of Hilbert envelope of LP residual for 5 audio classes]
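The quantity plotted on this slide can be computed in two steps: take the Hilbert envelope of a sequence (here that would be the LP residual), then autocorrelate the envelope. The sketch below is a minimal standard-library version, assuming the usual frequency-domain construction of the analytic signal. The naive DFT is only practical for short frames, and the function names and normalization are illustrative choices.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; fine for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def hilbert_envelope(x):
    """Magnitude of the analytic signal x + j*H{x}."""
    n = len(x)
    spectrum = dft(x)
    # Keep DC (and Nyquist for even n), double positive frequencies,
    # zero out negative frequencies.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        for k in range(1, n // 2):
            gain[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            gain[k] = 2.0
    analytic = dft([s * g for s, g in zip(spectrum, gain)], inverse=True)
    return [abs(v) for v in analytic]


def normalized_autocorrelation(x, max_lag):
    """Autocorrelation of the mean-removed sequence, scaled so lag 0 equals 1."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    r0 = sum(v * v for v in centered)
    return [sum(centered[i] * centered[i + lag]
                for i in range(len(x) - lag)) / r0
            for lag in range(max_lag + 1)]
```

A typical use would be `normalized_autocorrelation(hilbert_envelope(residual_frame), max_lag)`, where `residual_frame` and `max_lag` are placeholders for a frame of LP residual and the longest lag of interest.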
  • 25. Statistics of autocorrelation sequence
      Correction: the statistics shown here are for the autocorrelation sequence peaks of the Hilbert envelope (HE), not of the LP residual
  • 27. Scope of future work
      ● Extending the framework for other audio indexing applications
      ● Exploring methods to add suprasegmental information to the combined system
      ● (Though far away..) Building a multimedia indexing system
  • 28. Summary and conclusions
      ● Need to organize audio data because of its large volume and need in real-life applications
      ● Presence of audio-specific information in LP residual
      ● AANN model's ability to capture subsegmental information from residual for the task
      ● Use of MLP for decision making using the information captured by AANN
      ● Complementary nature of source information to the system information
      ● Presence of audio-specific suprasegmental information in LP residual
  • 29. Major contributions
      ● Extraction of audio-specific information from LP residual using NN models
      ● Showing the complementary nature of source and system information for the audio clip classification task
      ● Showing the presence of audio-specific suprasegmental information in LP residual
  • 30. References
      1. T. Zhang and C.-C. J. Kuo, “Content-based classification and retrieval of audio,” in Conference on Advanced Signal Processing Algorithms, Architectures, and Implementations VIII, San Diego, California, July 1998, vol. 3461 of Proc. of SPIE.
      2. J. Makhoul, F. Kubala, T. Leek, D. Liu, L. Nguyen, R. Schwartz, and A. Srivastava, “Speech and language technologies for audio indexing and retrieval,” in Proc. of the IEEE, 88(8), pp. 1338-1353, 2000.
      3. Y. Wang, Z. Liu, and J. Huang, “Multimedia Content Analysis using Audio and Visual Clues,” IEEE SP Magazine, 17(6), Nov. 2000.
      4. M.A. Kramer, “Nonlinear principal component analysis using autoassociative neural networks,” AIChE Journal, vol. 37, pp. 233-243, Feb. 1991.
      5. J. Makhoul, “Linear prediction: A tutorial review,” in Proc. IEEE, vol. 63, pp. 561-580, 1975.
      6. B. Yegnanarayana, S.R.M. Prasanna, and K.S. Rao, “Speech enhancement using excitation source information,” in Proc. Int. Conf. Acoust., Speech, Signal Processing, Orlando, FL, USA, May 2002.
      7. S.R.M. Prasanna, Ch.S. Gupta, and B. Yegnanarayana, “Autoassociative neural network models for speaker verification using source features,” in Proc. Sixth Int. Conf. Cognitive Neural Systems, Boston University, Boston, USA, May-June 2002.
      8. B. Yegnanarayana, Artificial Neural Networks, Prentice Hall of India, New Delhi, 1999.
  • 31. Related publications
      1. Anvita Bajpai and B. Yegnanarayana, “Audio Clip Classification using LP Residual and Neural Networks Models”, European Signal and Image Processing Conference (EUSIPCO-2004), Vienna, Austria, 6-10 September 2004.
      2. Anvita Bajpai and B. Yegnanarayana, “Exploring Features for Audio Indexing using LP Residual and AANN Models”, accepted for The 17th International FLAIRS Conference (FLAIRS-2004), Miami Beach, Florida, 17-19 May 2004.
      3. Anvita Bajpai and B. Yegnanarayana, “Exploring Features for Audio Clip Classification using LP Residual and Neural Networks Models”, International Conference on Intelligent Signal and Image Processing (ICISIP-2004), Chennai, India, 4-7 January 2004.
      4. Gaurav Aggarwal, Anvita Bajpai and B. Yegnanarayana, “Exploring Features for Audio Indexing”, in Indian Research Scholar Seminar (IRIS-2002), Indian Institute of Science, Bangalore, India, March 2002.
  • 32. The following are extra slides, not part of the main presentation
  • 33. Effect of number of epochs used for AANN training
      [Figure: confidence scores output by the 6 AANN models for a news test clip]
  • 34. Even well-trained humans don't always react the way they were trained.
      Source: www.computer.org/computer/homepage/0103/random/r1014.pdf, by Bob Colwell
  • 35. Classification of audio using spectral features
      ● Extraction of features based on
          – Volume: standard deviation and dynamic range of volume, volume undulation, 4 Hz modulation energy
          – Zero crossing rate (ZCR): standard deviation of ZCR, silence-nonsilence ratio
          – Pitch: pitch contour, pitch standard deviation, similar pitch ratio, pitch-nonpitch ratio
          – Spectrum: frequency centroid, bandwidth, ratio of energy in various frequency sub-bands
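Several of the clip-level features listed on this slide are statistics of simple per-frame measurements. The sketch below computes three of them (standard deviation of volume, dynamic range of volume, standard deviation of ZCR). The frame size, hop size, the use of RMS amplitude as volume and the (max - min) / max definition of dynamic range are assumptions made for illustration; the slides do not define them.

```python
import math


def frames(signal, size, hop):
    """Split a signal into overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def volume(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def std(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def clip_features(signal, size=256, hop=128):
    """Clip-level statistics of frame-level volume and ZCR."""
    fs = frames(signal, size, hop)
    vols = [volume(f) for f in fs]
    zcrs = [zero_crossing_rate(f) for f in fs]
    peak = max(vols)
    return {
        "volume_std": std(vols),
        "volume_dynamic_range": (peak - min(vols)) / peak if peak > 0 else 0.0,
        "zcr_std": std(zcrs),
    }
```

A steady tone gives near-zero values for all three statistics, while material that alternates between loud and quiet passages or between voiced and unvoiced sounds gives larger ones.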
  • 36. Features for Categorization of Audio Clips (4 Hz modulation energy)
      [Figure panels: Cricket, Football, News]
  • 37. Features for Categorization of Audio Clips (Similar Pitch Ratio) (Contd..)
      [Figure panels: Cricket, Football, News]
  • 38. Importance of Task Dependent Feature (Standard deviation of ZCR)
      [Figure panels: Speaker 1, Speaker 2, Music]