Kakao's Speech Processing Technology

How does Kakao Mini do speech recognition?
How does it do speech synthesis?
How will speech technology evolve?
노재근 (jack.roh)
Kakao Corp. (Speech Processing Part)

INDEX
➤ Hey Kakao!
➤ Short history about speech recognition technology at Kakao
➤ Why is that so difficult?
➤ Where are we?
➤ Dive into Kakao Speech Tech
➤ Echo cancellation
➤ Combi wake up
➤ Far-field, noisy audio
➤ Training
➤ Speaker Verification
➤ TTS
➤ Services using the Kakao Speech Engine
➤ Wrap up
➤ Q&A

Hey Kakao!
➤ A.I. speaker
➤ What is A.I.?
➤ How do people imagine it?
➤ What will be the fundamental technology of the future?
➤ Speech Recognition and Speech Synthesis!

Short history about speech recognition technology at Kakao
➤ Kakao Speech Engine
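➤ 2010. 6 Launched Korea's first voice search service
➤ 2012. 12 Acquired Dialoid, a speech recognition company
➤ 2013. Provided speech recognition solutions to LG U+ and KT
➤ 2013. Implemented a DNN-based speech recognizer
➤ 2014. 2 Released the Newtone speech recognition API
➤ 2014. 6 Released the Newtone talk speech synthesis API
➤ 2015 ~ 2017 Applied the speech engine to KakaoMap, KakaoNavi, Kakao T, Melon, Cheez, KakaoBus, KakaoMetro, the Brunch app, and others
➤ 2017. 1 Newtone API offered free for up to 20,000 requests per day
➤ 2017. 7 Kakao speech recognition engine installed in the Hyundai Genesis G70
➤ 2017. 11 Official launch of Kakao Mini
➤ 2018 Migrating KakaoNavi and Hyundai/Kia vehicles to the Kakao i engine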

Why is that so difficult?
➤ How do humans interact through speech?
➤ Sight & Hearing
➤ Context awareness
➤ Assumption: the listener knows the vocabulary and its pronunciation
➤ 바베큐
➤ 바비큐
➤ 버니케어
➤ 아뎅큐
➤ 화네큐
➤ 아니큐
➤ 안힉혀
➤ 바늘키워
➤ 다내끼여
➤ 화낼티여
➤ 화낼끼여
➤ ..
➤ ..

Where are we?
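[Figure: comparison chart by domain (Daum app, Daum Map, SMS, navigation, dictation); legend includes G and kakao; surviving values: 30.9, 22.2, 26.3]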

Where are we?
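1. Music
2. Small talk
3. Alarm
4. Speaker
5. Weather
6. Time
7. Radio
8. People
9. News
10. System
11. Fortune
12. Podcast
13. Stocks
14. Date
15. Help
16. Schedule
17. Exchange rates
18. Memo
19. Lotto
20. Traffic information
21. Sports bot
22. Ordering
23. Content
24. TV
25. Language learning
26. Real-time trending searches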

DIVE INTO KAKAO SPEECH TECH
➤ Echo cancellation
➤ Combi wake up
➤ Far-field, noisy audio
➤ Training
➤ Speaker Verification
➤ TTS
➤ What's next?

ECHO CANCELLATION
Wake words: 카카오야 (Kakao-ya), 헤이카카오 (Hey Kakao)

Combi wake up (2)
➤ Cloud-based wake-up verification
➤ Why?
➤ How?

Far-field, noisy audio
➤ Re-record near-field data at a far distance using a mouth simulator
➤ Room Impulse Response (RIR)
➤ Convolution with various RIRs
➤ Adding various noise recordings
➤ 2,000 (RIR) * 300 (noise) = 600,000 conditions
➤ 600,000 conditions * randomly sampled utterances from the current training set
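$$h[n] = \sum_{i=0}^{I-1} \frac{r^{g_i}}{d_i}\,\delta\!\left[n - \left[\frac{d_i f_s}{c_0}\right]\right]$$
(i: image source index, r: reflection coefficient, g_i: number of reflections, d_i: distance to the microphone, f_s: sampling rate, c_0: speed of sound; the inner brackets denote rounding to the nearest sample)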
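The augmentation steps on this slide (convolve with an RIR, then add noise) can be sketched roughly as follows. This is a minimal illustration with NumPy, not Kakao's pipeline: the function name `simulate_far_field`, the `snr_db` parameter, and the synthetic signals are all assumptions made for the example.

```python
import numpy as np

def simulate_far_field(clean, rir, noise, snr_db):
    """Convolve clean speech with an RIR, then add noise at a target SNR (in dB)."""
    # Reverberation: convolve with the room impulse response, keep the original length.
    reverberant = np.convolve(clean, rir)[: len(clean)]
    # Repeat or trim the noise so it covers the whole utterance.
    reps = int(np.ceil(len(reverberant) / len(noise)))
    noise = np.tile(noise, reps)[: len(reverberant)]
    # Scale the noise so that speech power / noise power matches the requested SNR.
    speech_power = np.mean(reverberant ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return reverberant + gain * noise

# Synthetic stand-ins (assumed): 1 s of "speech" at 16 kHz, a two-tap RIR, 0.5 s of noise.
rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
rir = np.zeros(800)
rir[0] = 1.0     # direct path
rir[400] = 0.5   # one reflection arriving 25 ms later
noise = rng.standard_normal(8000)
noisy = simulate_far_field(clean, rir, noise, snr_db=10.0)
```

Repeating this over every (RIR, noise) pair is what multiplies a training set into the 600,000 conditions quoted above.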

Training
➤ PM : pronunciation model = G2P
➤ AM : Acoustic Model
➤ LM : Language Model
$$\hat{W} = \arg\max_{W} P(W \mid O) = \arg\max_{W} \frac{P(O \mid W)\,P(W)}{P(O)}$$
(Decoding: the arg max over W; AM: P(O|W); LM: P(W))
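As a toy illustration of the decision rule above, the sketch below picks the transcript that maximizes log P(O|W) + log P(W). The candidate sentences, their scores, and the `lm_weight` parameter are hypothetical values invented for this example; a real decoder searches a far larger hypothesis space.

```python
# Hypothetical log scores for two candidate transcripts W of the same audio O.
am_log_prob = {"recognize speech": -12.0, "wreck a nice beach": -11.5}  # log P(O|W)
lm_log_prob = {"recognize speech": -4.0, "wreck a nice beach": -9.0}    # log P(W)

def decode(candidates, am, lm, lm_weight=1.0):
    """Return the candidate with the highest combined AM + LM log score."""
    # P(O) is identical for every W, so it drops out of the arg max.
    return max(candidates, key=lambda w: am[w] + lm_weight * lm[w])

best = decode(list(am_log_prob), am_log_prob, lm_log_prob)
```

Here the acoustic model slightly prefers the second candidate, but the language model score outweighs that difference, so `best` is the first one.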

Training (2)
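➤ AM
➤ Learns the diverse pronunciation characteristics of a large, unspecified population of speakers
➤ Even for the same sentence, the speech signal differs by speaker, environment, and so on
➤ Number of possible values for 1 second of audio:
$$2^{16{,}000 \times 2 \times 8} = 2^{256{,}000} \approx 10^{76{,}800}$$
➤ Data covering diverse speakers, environments, and vocabulary must be reflected
➤ An initial model falls short of covering the real service environment
➤ Data from the environment where the service is actually used must be fed back into training
➤ Kakao's acoustic model training data is about 20,000 hours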
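The count of possible one-second signals on this slide can be checked with a few lines of arithmetic. The reading of 16,000 × 2 × 8 as 16 kHz sampling with 2 bytes (16 bits) per sample is an assumption; the slide only gives the numbers.

```python
import math

sample_rate = 16_000   # samples per second (assumed meaning of 16,000)
bytes_per_sample = 2   # assumed 16-bit PCM
bits = sample_rate * bytes_per_sample * 8   # bits in one second of audio

# 2**bits possible signals; convert the exponent to base 10.
decimal_exponent = bits * math.log10(2)
```

With log10(2) ≈ 0.30103 the exponent comes out near 77,064; rounding log10(2) to 0.3 gives the 76,800 quoted on the slide. Either way the point stands: the space of possible signals is far too large to enumerate, so the model has to generalize from data.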

Training (3)
➤ LM
➤ N-gram
➤ 10 TB Data
➤ The LM is rebuilt every day (8 hours) using Spark parallel computing
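To make the N-gram idea concrete, here is a minimal single-machine bigram model with maximum-likelihood estimates. It is only a sketch: the slide's LM is built with Spark over about 10 TB of text, whereas the function names `train_bigram` and `bigram_prob` and the three-sentence corpus here are hypothetical, and no smoothing is applied.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])                  # every token that has a successor
        bigrams.update(zip(tokens[:-1], tokens[1:]))  # (previous word, word) pairs
    return unigrams, bigrams

def bigram_prob(unigrams, bigrams, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for an unseen history."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

corpus = ["play some music", "play the radio", "stop the music"]
uni, bi = train_bigram(corpus)
p = bigram_prob(uni, bi, "play", "some")
```

Counting is naturally parallel (count per shard, then sum the counters), which is the kind of workload a framework such as Spark distributes.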

Speaker Verification
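➤ Enrollment uses only a short sample, from a few seconds to a few tens of seconds
➤ Distinguishing speakers enables personalized services
➤ Music / news recommendation
➤ Sending and receiving KakaoTalk messages
➤ Payment and security: ordering, money transfer
➤ Challenging Point
➤ When the voice changes, for example because of a cold
➤ When background noise is severe
➤ When a TV or other people's voices are present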
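[Diagram] Enrollment: speech signal → feature extraction → speaker information → speaker model training
[Diagram] Recognition: speech signal → feature extraction → speaker recognition (against the speaker model)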
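The enrollment and recognition flow on this slide can be sketched as follows, assuming each utterance has already been turned into a fixed-length feature vector (an embedding). The feature extractor itself is not shown, and the functions `enroll` and `verify`, the 0.7 threshold, and the random vectors are hypothetical choices for illustration; the slide does not say which model or scoring Kakao uses.

```python
import numpy as np

def enroll(utterance_embeddings):
    """Average the enrollment embeddings into one unit-length speaker model."""
    model = np.mean(utterance_embeddings, axis=0)
    return model / np.linalg.norm(model)

def verify(speaker_model, test_embedding, threshold=0.7):
    """Accept if the cosine similarity reaches the (hypothetical) threshold."""
    test = test_embedding / np.linalg.norm(test_embedding)
    score = float(np.dot(speaker_model, test))
    return score >= threshold, score

# Synthetic stand-ins (assumed): one "voice" vector plus small per-utterance variation.
rng = np.random.default_rng(0)
voice = rng.standard_normal(64)
enrollment = np.stack([voice + 0.1 * rng.standard_normal(64) for _ in range(3)])
model = enroll(enrollment)

same_ok, same_score = verify(model, voice + 0.1 * rng.standard_normal(64))
other_ok, other_score = verify(model, rng.standard_normal(64))
```

The challenging cases listed above (a cold, heavy noise, competing voices) all push the test embedding away from the enrolled model, which lowers the score and makes the threshold choice a trade-off between false accepts and false rejects.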

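TTS
➤ Unit Selection, HTS
What is speech synthesis?
Technology that analyzes text and converts it into speech
Concatenative (unit selection) synthesis
• Joins segments of real recorded speech
• High quality / high cost
• Hard to change speaking style
Statistical parametric synthesis
• Synthesizes speech from parameters
• Small DB / continuity / stable
• Easy to change speaking style
https://speech-api.kakao.com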

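TTS (2)
➤ DNN
➤ Seq2seq-based end-to-end training produces more natural synthesized speech
➤ Generative modeling for high-quality speech synthesis
➤ Low-cost speech synthesis from a small speech DB
https://speech-api.kakao.com
Developing synthesis technology using deep learning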
[Diagram] Tacotron (Seq2seq + Attention): input text → Text Encoder (ConvNet, RNN) → {key, value} → Attention ← query from Decoder (RNN) → Spectrogram → Spectrogram Inversion
[Diagram] WaveNet: previous samples → Conv → stacked dilated layers → mixture of distributions → predicted sample
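The query and {key, value} labels in the Tacotron diagram refer to an attention step, which the sketch below illustrates with plain dot-product attention in NumPy. This is a simplification chosen for brevity and is not claimed to be the attention variant Tacotron or Kakao uses; the function `attend` and the toy matrices are assumptions.

```python
import numpy as np

def attend(query, keys, values):
    """Dot-product attention: weight each value by how well its key matches the query."""
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())   # numerically stable softmax
    weights /= weights.sum()
    return weights @ values, weights

# Toy encoder output for 5 input characters, 8 dimensions each (assumed values).
keys = np.eye(5, 8)                                 # one key per input character
values = np.arange(40, dtype=float).reshape(5, 8)   # one value per input character
query = 4.0 * keys[2]                               # decoder state resembling the third key

context, weights = attend(query, keys, values)
```

At each decoder step the weights indicate which part of the input text the decoder is "reading" while it emits the next spectrogram frame; here most of the weight falls on the third character.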

Services using the Kakao Speech Engine

Wrap up
➤ Speech technology is essential in the A.I. era!
➤ Speech recognition has improved a lot!
➤ Kakao is leading in this field!
➤ We also keep up with recent technology in this field!
➤ E2E ASR, DNN-based TTS, transfer learning (style transfer)
➤ A.I. is coming!
➤ A.I. speaker, smart home, smart car, smart robot?
➤ Expect great things, and attempt great things!
