2. Introduction
What is Machine Translation (MT)
What is Computer Assisted Translation (CAT)
Technologies adopted in CAT and MT
Technologies adopted in Automatic Speech Recognition (ASR)
ASR and CAT integration
Translation Quality Evaluation
New Trends in Machine Translation
Conclusion
Overview
3. Introduction
Research has been undertaken to implement translation tools that work without a
human translator, to increase translation quality and to reduce the
human post-editing needed
Speech is the most natural, practical, simple and efficient method of
human communication
Integrating speech into the translation process is a wide research area
It can also serve as a solution for blind people to communicate across different
languages
4. Sub categories of Computational Linguistics
Computational linguistics is the branch of linguistics in which the techniques of computer
science are applied to the analysis and synthesis of language and speech
Machine Translation is accomplished by feeding a text to a computer algorithm that
automatically translates it into another language; that is, there is no human
involvement
Computer Assisted Translation is human translation carried out with the aid of
computerized tools
What is MT & CAT
6. A rule-based machine translation system consists of a collection of rules called
grammar rules, a lexicon, and software programs to process the rules
It focuses on syntactic, semantic and morphological details of both the source language
and the target language when translating
• Syntactic – how words are grammatically arranged in sentences, as in spoken
communication
• Semantic – how words are meaningfully arranged in sentences
• Morphological – the internal structure of words in the source and target languages
Rule Based Machine Translation
7. Rule Based Machine Translation (continued..)
Structure of the Rule Based Machine Translation system
• A tree structure is used to represent the structure of the sentence
• A typical English sentence consists of two major parts as the noun phrase (NP) &
the verb phrase (VP)
• These two can be further divided
• Following are the rules to represent a simple grammar
S -> NP VP
VP -> V NP
NP -> Name
NP -> ART N
where S stands for sentence, NP for noun phrase, VP for verb phrase, V for verb, N for noun and ART for article
• Example : “Saman was happy” can be written in logical form as
(<PAST HAPPY> (NAME “Saman”))
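The four grammar rules above can be sketched as a tiny top-down recognizer. This is a minimal illustration, not a real RBMT component; the lexicon and test sentences are made up for the example.

```python
# The simple grammar from the slide: S -> NP VP, VP -> V NP,
# NP -> Name, NP -> ART N, plus a toy lexicon (illustrative only).
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["Name"], ["ART", "N"]],
}
LEXICON = {
    "Name": {"Saman"},
    "V":    {"saw"},
    "ART":  {"the"},
    "N":    {"dog"},
}

def parse(symbol, words, pos=0):
    """Return the end position if `symbol` derives words[pos:end], else None."""
    if symbol in LEXICON:  # terminal category: match one word
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            return pos + 1
        return None
    for rhs in GRAMMAR.get(symbol, []):  # try each rewrite rule in turn
        p = pos
        for sym in rhs:
            p = parse(sym, words, p)
            if p is None:
                break
        else:
            return p
    return None

def accepts(sentence):
    words = sentence.split()
    return parse("S", words) == len(words)

print(accepts("Saman saw the dog"))  # True
print(accepts("saw the dog"))        # False: no NP before the verb phrase
```

A full RBMT system would build the tree structure explicitly and attach semantic and morphological information to each node; this sketch only checks whether a sentence fits the grammar.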
8. Translation is based on previous example translations obtained as results of experiments
Uses knowledge sources to support the translation process
Requires bilingual content
Two types
• Statistical Machine Translation
• Example Based Machine Translation
Empirical Based Machine Translation
9. Statistical Machine Translation
Translations based on statistical models
Statistical Translation Model – Learned from Bilingual Data (TM)
• Probabilistic mapping of equivalences between source-language words and phrases and
target-language words and phrases, learned through unsupervised Expectation-Maximization
(EM) training and a word- and phrase-alignment process
• Generates many possible translations
• Includes finite-state models such as finite-state transducers, alignment models and
phrase-based models
Statistical Language Model – Learned from Monolingual Target Language Data
• Probabilistic model of relative fluency and general usage patterns in the target language
• Based on n-gram model
• The target language model selects the “best” translations from a list of possible candidates
• Candidates are stored in an N-best list
• Concept of re-ranking
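The interplay of the two models above can be sketched as noisy-channel rescoring: the translation model proposes candidates, and the combined score log P(e) + log P(f|e) picks the best one. The candidate list and probabilities below are made-up illustrations, not real model output.

```python
import math

# Toy N-best list: (candidate target sentence e, model probabilities).
# "lm" stands in for the language model P(e), "tm" for the translation
# model P(f|e); all numbers are invented for illustration.
candidates = [
    ("the house is small", {"lm": 0.020, "tm": 0.30}),
    ("small is the house", {"lm": 0.002, "tm": 0.35}),
    ("the home is little", {"lm": 0.010, "tm": 0.10}),
]

def score(probs):
    # Sum in log space to avoid underflow when multiplying many probabilities.
    return math.log(probs["lm"]) + math.log(probs["tm"])

best = max(candidates, key=lambda c: score(c[1]))
print(best[0])  # "the house is small"
```

Note how the second candidate has the higher translation probability but loses overall because the language model judges it disfluent; this is the re-ranking idea mentioned above.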
10. • N-gram is a contiguous sequence of n items from a given sequence of text or speech
• When the items are words, n-grams are also called shingles
• An n-gram of size 1 is a “unigram”, size 2 a “bigram”, size 3 a “trigram”, and so on
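The n-gram definition above can be shown in a few lines; the sentence is an arbitrary illustration.

```python
# Extract contiguous n-grams from a list of word tokens.
def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

words = "the cat sat on the mat".split()
print(ngrams(words, 1))  # unigrams: ('the',), ('cat',), ...
print(ngrams(words, 2))  # bigrams: ('the', 'cat'), ('cat', 'sat'), ...
```

An n-gram language model estimates the probability of each word from counts of such sequences in a monolingual corpus.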
Advantages : More efficient use of human and data resources; the disadvantages of the rule-based approach are eliminated
Disadvantages : Corpus creation can be costly; errors are hard to predict and fix
Examples : SYSTRAN, ART, METEO, LOGOS, Anusaarka, TC-Star, Google Translate
Statistical Machine Translation (continued..)
11. What is Speech Recognition ?
Speech Recognition is the translation of spoken words into text
Also called “Automatic Speech Recognition” (ASR), “Computer Speech
Recognition” or just “Speech To Text” (STT)
13. Machine Learning paradigms for speech recognition
• Hidden Markov Model
• Discriminative Learning
• Structured Sequence Learning
• Bayesian Learning
• Adaptive Learning
• Multi-task Learning
• Active Learning
14. Speech recognition systems
• Google Speech API
• Cloud Speech API
• Microsoft cognitive services – Bing speech API
• API.AI
• Speechmatics
• Vocapia Speech to Text API
• Kaldi
• iSpeech
• Baidu
• Siri
• Hound
• Google Now
15. Integration of ASR & MT
Four approaches
Word graph product – Separate large word graphs are generated for the ASR and MT
systems, and their product is taken using the composition operation from
automata theory
ASR-constrained search – The n-gram language model of the phrase-based
MT system is replaced with the ASR word graph
Adapted Language Model – The MT system is improved by adapting its
language model to the ASR output
MT-Derived Language Model – The ASR word graph is rescored with a
language model derived from the MT system
Examples : TELNET, IBM 1,2 Models , SEECAT
16. Combined ASR / SMT Model
e* = argmax_e { P(e) . P(f|e) . P(x|e) }
where e is the target-language text, f is the source-language text and x is the speech signal;
P(e) is the language model, P(f|e) the translation model and P(x|e) the acoustic model
17. Loose Integration & Tight Integration approaches
Loose Integration approach
• P(e) has 2 components
• PS(e) – characterizes those aspects of language that can be acquired from large
text corpora in the target language
• PM(e) – represents the effects that can be acquired from the source language text
Linear interpolation: P(e) = (λM) PM(e) + (λS) PS(e)
Assumption :- these 2 models are independent, which gives the log-linear (product) form
P(e) = PM(e)^λM . PS(e)^λS
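The two combination forms above can be compared numerically. The probabilities and weights λM, λS below are invented for illustration; real systems tune the weights on held-out data.

```python
# Two ways of combining the MT-derived model PM(e) and the
# target-corpus model PS(e); all numbers are made up.
p_m, p_s = 0.02, 0.05          # PM(e), PS(e) for some hypothesis e
lam_m, lam_s = 0.4, 0.6        # interpolation / log-linear weights

interpolated = lam_m * p_m + lam_s * p_s       # mixture (sum) form
log_linear = (p_m ** lam_m) * (p_s ** lam_s)   # product form

print(round(interpolated, 4))  # 0.038
print(round(log_linear, 4))
```

The product form corresponds to adding weighted log-probabilities, which is how the tight-integration score on the next slide is built.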
18. Tight Integration approach
• Involves using SMT to re-evaluate ASR hypotheses
• Each string hypothesis appearing in the ASR N-best list is rescored using the translation
probability, P(f|e), obtained from the SMT system
• The score for each string is a log-linear combination of the acoustic, language model &
translation model probabilities
e* = argmax_e { (λ1) log P(e) + (λ2) log P(f|e) + (λ3) log P(x|e) }
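The rescoring formula above can be sketched directly. The hypotheses, probabilities and weights are made-up illustrations, not output of a real ASR or SMT system.

```python
import math

# Log-linear weights λ1, λ2, λ3 for the language, translation and
# acoustic models (invented values).
weights = {"lm": 1.0, "tm": 0.8, "am": 1.2}

# Toy ASR N-best list: (hypothesis e, P(e), P(f|e), P(x|e)).
nbest = [
    ("recognize speech",   0.030, 0.40, 0.20),
    ("wreck a nice beach", 0.004, 0.05, 0.25),
]

def log_linear(p_lm, p_tm, p_am):
    return (weights["lm"] * math.log(p_lm)
            + weights["tm"] * math.log(p_tm)
            + weights["am"] * math.log(p_am))

best = max(nbest, key=lambda h: log_linear(h[1], h[2], h[3]))
print(best[0])  # "recognize speech"
```

Here the second hypothesis scores slightly better acoustically, but the language and translation models pull the combined score toward the first one.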
20. New trends in Machine Translation
• Neural MT
• Google Translate -> phrase-based: breaks an input sentence into words and phrases to be
translated largely independently
• GNMT -> considers the entire input sentence as a unit for translation
• Maps the meaning of a sentence into a fixed-length vector representation and then generates a
translation based on that vector
• Advantages over phrase-based Google Translate:
Requires fewer engineering design choices
Easier to build and train
Small memory footprint
Generalizes well to long sequences
22. Neural MT of Microsoft
[Figure: Microsoft's NMT architecture – stacked encoder layers feeding an attention layer; 500- and 1000-dimension vectors; final output matrix]
23. ASR – MT System of Microsoft
[Figure: pipeline – Automatic Speech Recognition -> True Text -> Machine Translation -> Text To Speech]
24. MT in 2017 (Some popular and publicly available commercial systems)
• Microsoft
• Microsoft Adapted
• Systran Neural
• SDL
• SDL Adapted
• Lilt
• Lilt Adapted
• Lilt Interactive
• A/B Testing
• Convergence of TM and MT Systems
• Adoption of MT for instant web publishing
• NMT + SMT/RBMT approach
• NMT for mobile devices
25. Deep learning in MT
Attention Mechanism
• NMT translates the whole sentence at once
• Able to translate very long sentences
• Attention retains a memory of source hidden states (like a random access memory)
• Compares target and source hidden states
• Learns both translation & alignment
• Local attention & Global attention
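The comparison of target and source hidden states can be sketched as dot-product (global) attention. The vectors below are tiny made-up illustrations, not real encoder states.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# One hidden state per source word (toy 2-dimensional vectors).
source_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
target_state = [0.9, 0.1]  # current decoder hidden state

scores = [dot(target_state, h) for h in source_states]  # compare states
weights = softmax(scores)          # soft alignment over source words
context = [sum(w * h[i] for w, h in zip(weights, source_states))
           for i in range(len(target_state))]
print([round(w, 3) for w in weights])
```

The weights form a learned soft alignment, which is why attention is said to learn translation and alignment jointly; the context vector is what the decoder consumes at each step.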
26. Conclusion
Implemented models are still at a developing stage
Statistical approaches and language models have been the most popular so far
Many speech recognition tools have been developed
Concepts of AI & ML can be integrated
Future works
Approaches to minimize the human post-editing needed in translation
Increase the quality of available translation tools
Enhance the effectiveness of speech recognition paradigms
A method of communication across languages for blind people, so that they can give a
speech input and receive responses in their own language by means of speech-to-text conversion