SlideShare a Scribd company logo
1 of 26
Download to read offline
Identifying and Annotating Speakers
in Phone Conversations
Identifying and Annotating Speakers
in Phone Conversations
Yuriy Guts
DataRobot
Yuriy Guts
DataRobot
SPEAKER A SPEAKER B SPEAKER A SPEAKER C SPEAKER B
No prior knowledge about the speakers whatsoever.
1. Extract compact metadata from bulky multimedia sources.
2. Enable information retrieval queries for audio content.
3. Provide training data for upstream modeling projects:
○ Speaker identification
○ Speech recognition
○ Emotion analysis
Speaker Diarization: Why Do It?
Cocktail Party Problem? ICA To The Rescue?
Nope. We have mono signal, so # input mixtures < # desired separated outputs
Source Types & Characteristics
Broadcast News Meetings Phone Calls
Uninterrupted speech : Longer segments Shorter segments Shorter segments
Speaker overlap : Negligible High Moderate
Background conditions :
Diverse: music, jingles,
background events
Uniform Uniform (but can be noisy)
Dominant speaker : Yes (anchor) Unknown Unknown
Number of speakers : Unknown Unknown Unknown (but usually 2)
Original publication: http://www.slideshare.net/ParthePandit/parthepandit10d070009ddpthesis-53767213
100 hrs of recorded customer support calls.
● Mono WAV files, 8 kHz.
● Some files (55 hrs) are human-labeled:
Input Data
0000.00,0005.63,agent,Female
0005.63,0017.13,customer,Female
0017.13,0027.76,agent,Female
0027.76,0034.94,customer,Female
0034.94,0035.89,agent,Female
0035.89,0044.50,silence,None
0044.50,0046.20,unrelated,None
0046.20,0049.10,customer,Female
0049.10,0050.14,agent,Female
0050.14,0060.99,silence,None
0060.99,0066.60,agent,Female
0066.60,0080.12,customer,Female
1. Audio durations are huge.
20 sec min, 12 min avg, 2 hours max.
2. Lots of waiting on the line.
Meaningful speech takes only ~55% of the time.
3. Inconsistent connection quality, lots of crackling and background noise.
Particular Challenges of Support Calls
1. Non-speech activity cutoff.
Overall Approach for Speaker Diarization
For phone conversations, a simple
RMSE / ZCR threshold works fine.
Otherwise, build a supervised VAD.
2. Changepoint candidate detection.
3. Segment recombination (clustering).
Process Raw Spectrogram?
MFCC (Mel-Frequency Cepstral Coefficients)
MFCC
Sliding window (0.5–3 sec duration)
Statistical hypothesis testing:
Are two adjacent windows better modeled by one distribution or two?
Changepoint Detection
Models: GMMs over MFCCs
Log-likelihood of the data points given the model
Bayesian Information Criterion (BIC)
Number of parameters of the model
Number of data points the model was trained on
Penalty term
Lower BIC = better fit
A positive value indicates dissimilarity between the audio windows.
Can use a threshold on to detect changepoints.
Delta-BIC Distance Metric
def gmm_delta_bic(gmm1, gmm2, X1, X2):
gmm_combined = sklearn.mixture.GaussianMixture(
n_components=NUM_GAUSSIANS_PER_GMM,
covariance_type="full",
random_state=42,
max_iter=200,
verbose=0
)
X_combined = np.vstack([X1, X2])
gmm_combined.fit(X_combined)
bic_combined = gmm_combined.bic(X_combined)
bic_gmm1 = gmm1.bic(X1)
bic_gmm2 = gmm2.bic(X2)
return bic_combined - bic_gmm1 - bic_gmm2
Can also use KL-Divergence for multivariate single-Gaussian GMMs:
KL-Divergence
Will be faster, but can produce more noisy changepoints.
Clustering
Perform hierarchical
agglomerative clustering (HAC)
based on until the
stopping criterion is met.
Clustering
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
Clustering
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
Clustering
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
Clustering
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
Clustering
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
Clustering
0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
1. Non-speech activity cutoff by RMSE threshold, analysis length = 0.5 sec.
2. Modeling 0.5 sec segments with single-Gaussian GMMs over 19-dimensional
MFCCs. MFCC analysis length = 30 ms, frame hop length = 10 ms.
3. Potential changepoint detection based on KL-Divergence.
4. HAC based on ΔBIC with a stopping threshold (or prior number of speakers).
Final System
DER (Diarization Error Rate)
Evaluation
DER = Missed Speech Time % + False Alarm Time % + Speaker Error Time %
Evaluation
Can use Hungarian algorithm
to find the best match between
ground truth labels and
hypothesis labels.
Achieved avg. DER: 16.3%
(2.7% missed, 5% false alarm,
8.6% wrong speaker)
State of the art: 14.6–16.1%
depending on source type.
Time: 2x–5x faster than RT
(HAC has quadratic complexity)

More Related Content

Similar to DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонных разговорах _Юрий Гуц

Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...
Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...
Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...Ankit Shah
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementNAVER Engineering
 
03 image transformations_i
03 image transformations_i03 image transformations_i
03 image transformations_iankit_ppt
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習台灣資料科學年會
 
Lecture 8 audio compression
Lecture 8 audio compressionLecture 8 audio compression
Lecture 8 audio compressionMr SMAK
 
Lecture 8 audio compression
Lecture 8 audio compressionLecture 8 audio compression
Lecture 8 audio compressionMr SMAK
 
A computer vision approach to speech enhancement
A computer vision approach to speech enhancementA computer vision approach to speech enhancement
A computer vision approach to speech enhancementRamin Anushiravani
 
ICLR 2 papers review in signal processing domain
ICLR 2 papers review in signal processing domain ICLR 2 papers review in signal processing domain
ICLR 2 papers review in signal processing domain June-Woo Kim
 
VII Compression Introduction
VII Compression IntroductionVII Compression Introduction
VII Compression Introductionsangusajjan
 
Audio Mastering
Audio MasteringAudio Mastering
Audio MasteringJoe Nasr
 
Gbm.more GBM in H2O
Gbm.more GBM in H2OGbm.more GBM in H2O
Gbm.more GBM in H2OSri Ambati
 

Similar to DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонных разговорах _Юрий Гуц (20)

Speaker Segmentation (2006)
Speaker Segmentation (2006)Speaker Segmentation (2006)
Speaker Segmentation (2006)
 
Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...
Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...
Dcase2016 oral presentation - Experiments on DCASE 2016: Acoustic Scene Class...
 
SEGAN: Speech Enhancement Generative Adversarial Network
SEGAN: Speech Enhancement Generative Adversarial NetworkSEGAN: Speech Enhancement Generative Adversarial Network
SEGAN: Speech Enhancement Generative Adversarial Network
 
MPEG/Audio Compression
MPEG/Audio CompressionMPEG/Audio Compression
MPEG/Audio Compression
 
Final presentation
Final presentationFinal presentation
Final presentation
 
Deep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech EnhancementDeep Learning Based Voice Activity Detection and Speech Enhancement
Deep Learning Based Voice Activity Detection and Speech Enhancement
 
Thesis
ThesisThesis
Thesis
 
Speech coding techniques
Speech coding techniquesSpeech coding techniques
Speech coding techniques
 
Soundpres
SoundpresSoundpres
Soundpres
 
Audio Mixing Console
Audio Mixing ConsoleAudio Mixing Console
Audio Mixing Console
 
03 image transformations_i
03 image transformations_i03 image transformations_i
03 image transformations_i
 
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
[DSC 2016] 系列活動:李宏毅 / 一天搞懂深度學習
 
Lec5 Compression
Lec5 CompressionLec5 Compression
Lec5 Compression
 
Lecture 8 audio compression
Lecture 8 audio compressionLecture 8 audio compression
Lecture 8 audio compression
 
Lecture 8 audio compression
Lecture 8 audio compressionLecture 8 audio compression
Lecture 8 audio compression
 
A computer vision approach to speech enhancement
A computer vision approach to speech enhancementA computer vision approach to speech enhancement
A computer vision approach to speech enhancement
 
ICLR 2 papers review in signal processing domain
ICLR 2 papers review in signal processing domain ICLR 2 papers review in signal processing domain
ICLR 2 papers review in signal processing domain
 
VII Compression Introduction
VII Compression IntroductionVII Compression Introduction
VII Compression Introduction
 
Audio Mastering
Audio MasteringAudio Mastering
Audio Mastering
 
Gbm.more GBM in H2O
Gbm.more GBM in H2OGbm.more GBM in H2O
Gbm.more GBM in H2O
 

More from GeeksLab Odessa

DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...GeeksLab Odessa
 
DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...
DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...
DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...GeeksLab Odessa
 
DataScience Lab 2017_Блиц-доклад_Турский Виктор
DataScience Lab 2017_Блиц-доклад_Турский ВикторDataScience Lab 2017_Блиц-доклад_Турский Виктор
DataScience Lab 2017_Блиц-доклад_Турский ВикторGeeksLab Odessa
 
DataScience Lab 2017_Обзор методов детекции лиц на изображение
DataScience Lab 2017_Обзор методов детекции лиц на изображениеDataScience Lab 2017_Обзор методов детекции лиц на изображение
DataScience Lab 2017_Обзор методов детекции лиц на изображениеGeeksLab Odessa
 
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...GeeksLab Odessa
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладGeeksLab Odessa
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладGeeksLab Odessa
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладGeeksLab Odessa
 
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...GeeksLab Odessa
 
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...GeeksLab Odessa
 
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко GeeksLab Odessa
 
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...GeeksLab Odessa
 
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...GeeksLab Odessa
 
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...GeeksLab Odessa
 
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...GeeksLab Odessa
 
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...GeeksLab Odessa
 
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот GeeksLab Odessa
 
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...GeeksLab Odessa
 
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js GeeksLab Odessa
 
JS Lab2017_Redux: время двигаться дальше?_Екатерина Лизогубова
JS Lab2017_Redux: время двигаться дальше?_Екатерина ЛизогубоваJS Lab2017_Redux: время двигаться дальше?_Екатерина Лизогубова
JS Lab2017_Redux: время двигаться дальше?_Екатерина ЛизогубоваGeeksLab Odessa
 

More from GeeksLab Odessa (20)

DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
DataScience Lab2017_Коррекция геометрических искажений оптических спутниковых...
 
DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...
DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...
DataScience Lab 2017_Kappa Architecture: How to implement a real-time streami...
 
DataScience Lab 2017_Блиц-доклад_Турский Виктор
DataScience Lab 2017_Блиц-доклад_Турский ВикторDataScience Lab 2017_Блиц-доклад_Турский Виктор
DataScience Lab 2017_Блиц-доклад_Турский Виктор
 
DataScience Lab 2017_Обзор методов детекции лиц на изображение
DataScience Lab 2017_Обзор методов детекции лиц на изображениеDataScience Lab 2017_Обзор методов детекции лиц на изображение
DataScience Lab 2017_Обзор методов детекции лиц на изображение
 
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
DataScienceLab2017_Сходство пациентов: вычистка дубликатов и предсказание про...
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
 
DataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-докладDataScienceLab2017_Блиц-доклад
DataScienceLab2017_Блиц-доклад
 
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
DataScienceLab2017_Cервинг моделей, построенных на больших данных с помощью A...
 
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
DataScienceLab2017_BioVec: Word2Vec в задачах анализа геномных данных и биоин...
 
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
DataScienceLab2017_Data Sciences и Big Data в Телекоме_Александр Саенко
 
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
DataScienceLab2017_Высокопроизводительные вычислительные возможности для сист...
 
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
DataScience Lab 2017_Мониторинг модных трендов с помощью глубокого обучения и...
 
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
DataScience Lab 2017_From bag of texts to bag of clusters_Терпиль Евгений / П...
 
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
DataScience Lab 2017_Графические вероятностные модели для принятия решений в ...
 
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
DataScienceLab2017_Оптимизация гиперпараметров машинного обучения при помощи ...
 
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
DataScienceLab2017_Как знать всё о покупателях (или почти всё)?_Дарина Перемот
 
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
JS Lab 2017_Mapbox GL: как работают современные интерактивные карты_Владимир ...
 
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
JS Lab2017_Под микроскопом: блеск и нищета микросервисов на node.js
 
JS Lab2017_Redux: время двигаться дальше?_Екатерина Лизогубова
JS Lab2017_Redux: время двигаться дальше?_Екатерина ЛизогубоваJS Lab2017_Redux: время двигаться дальше?_Екатерина Лизогубова
JS Lab2017_Redux: время двигаться дальше?_Екатерина Лизогубова
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 

Recently uploaded (20)

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 

DataScience Lab 2017_Кто здесь? Автоматическая разметка спикеров на телефонных разговорах _Юрий Гуц

  • 1. Identifying and Annotating Speakers in Phone Conversations Identifying and Annotating Speakers in Phone Conversations Yuriy Guts DataRobot Yuriy Guts DataRobot
  • 2. SPEAKER A SPEAKER B SPEAKER A SPEAKER C SPEAKER B No prior knowledge about the speakers whatsoever.
  • 3. 1. Extract compact metadata from bulky multimedia sources. 2. Enable information retrieval queries for audio content. 3. Provide training data for upstream modeling projects: ○ Speaker identification ○ Speech recognition ○ Emotion analysis Speaker Diarization: Why Do It?
  • 4. Cocktail Party Problem? ICA To The Rescue? Nope. We have mono signal, so # input mixtures < # desired separated outputs
  • 5. Source Types & Characteristics Broadcast News Meetings Phone Calls Uninterrupted speech : Longer segments Shorter segments Shorter segments Speaker overlap : Negligible High Moderate Background conditions : Diverse: music, jingles, background events Uniform Uniform (but can be noisy) Dominant speaker : Yes (anchor) Unknown Unknown Number of speakers : Unknown Unknown Unknown (but usually 2) Original publication: http://www.slideshare.net/ParthePandit/parthepandit10d070009ddpthesis-53767213
  • 6. 100 hrs of recorded customer support calls. ● Mono WAV files, 8 kHz. ● Some files (55 hrs) are human-labeled: Input Data 0000.00,0005.63,agent,Female 0005.63,0017.13,customer,Female 0017.13,0027.76,agent,Female 0027.76,0034.94,customer,Female 0034.94,0035.89,agent,Female 0035.89,0044.50,silence,None 0044.50,0046.20,unrelated,None 0046.20,0049.10,customer,Female 0049.10,0050.14,agent,Female 0050.14,0060.99,silence,None 0060.99,0066.60,agent,Female 0066.60,0080.12,customer,Female
  • 7. 1. Audio durations are huge. 20 sec min, 12 min avg, 2 hours max. 2. Lots of waiting on the line. Meaningful speech takes only ~55% of the time. 3. Inconsistent connection quality, lots of crackling and background noise. Particular Challenges of Support Calls
  • 8. 1. Non-speech activity cutoff. Overall Approach for Speaker Diarization For phone conversations, a simple RMSE / ZCR threshold works fine. Otherwise, build a supervised VAD. 2. Changepoint candidate detection. 3. Segment recombination (clustering).
  • 10. MFCC (Mel-Frequency Cepstral Coefficients) MFCC
  • 11. Sliding window (0.5–3 sec duration) Statistical hypothesis testing: Are two adjacent windows better modeled by one distribution or two? Changepoint Detection
  • 13. Log-likelihood of the data points given the model Bayesian Information Criterion (BIC) Number of parameters of the model Number of data points the model was trained on Penalty term Lower BIC = better fit
  • 14. A positive value indicates dissimilarity between the audio windows. Can use a threshold on to detect changepoints. Delta-BIC Distance Metric
  • 15. def gmm_delta_bic(gmm1, gmm2, X1, X2): gmm_combined = sklearn.mixture.GaussianMixture( n_components=NUM_GAUSSIANS_PER_GMM, covariance_type="full", random_state=42, max_iter=200, verbose=0 ) X_combined = np.vstack([X1, X2]) gmm_combined.fit(X_combined) bic_combined = gmm_combined.bic(X_combined) bic_gmm1 = gmm1.bic(X1) bic_gmm2 = gmm2.bic(X2) return bic_combined - bic_gmm1 - bic_gmm2
  • 16. Can also use KL-Divergence for multivariate single-Gaussian GMMs: KL-Divergence Will be faster, but can produce more noisy changepoints.
  • 17. Clustering Perform hierarchical agglomerative clustering (HAC) based on until the stopping criterion is met.
  • 18. Clustering 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
  • 19. Clustering 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
  • 20. Clustering 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
  • 21. Clustering 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
  • 22. Clustering 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
  • 23. Clustering 0.0 0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0
  • 24. 1. Non-speech activity cutoff by RMSE threshold, analysis length = 0.5 sec. 2. Modeling 0.5 sec segments with single-Gaussian GMMs over 19-dimensional MFCCs. MFCC analysis length = 30 ms, frame hop length = 10 ms. 3. Potential changepoint detection based on KL-Divergence. 4. HAC based on ΔBIC with a stopping threshold (or prior number of speakers). Final System
  • 25. DER (Diarization Error Rate) Evaluation DER = Missed Speech Time % + False Alarm Time % + Speaker Error Time %
  • 26. Evaluation Can use Hungarian algorithm to find the best match between ground truth labels and hypothesis labels. Achieved avg. DER: 16.3% (2.7% missed, 5% false alarm, 8.6% wrong speaker) State of the art: 14.6–16.1% depending on source type. Time: 2x–5x faster than RT (HAC has quadratic complexity)