National Conference on Translation, Dept. of Hindi, University of Kerala, February 23-24- 2010

                                   Man to Machine
                     A tutorial on the art of Machine Translation
                                                           Jaganadh G
                                                    jaganadhg@gmail.com
                                              http://jaganadhg.freeflux.net/blog


1 Introduction
           Machine Translation (MT) is a sub-field of computational linguistics that investigates the
use of computer software to translate text or speech from one natural language to another. It is
interesting to think about an MT system that can translate literary works from one language to
another. To enjoy the novel 'Anything for you, ma'am'1, one would just feed the novel into an MT
system and get it translated into one's own language. Such MT systems are expected to break the
language barrier. MT can help us overcome the technological barrier too. The rapid developments in
Information and Communication Technology (ICT) have led to an overflow of information on the
internet, but this information is available only in a very small subset of languages and is not
reachable for a significant portion of users. This phenomenon is called the "digital divide". A lot of
information is available on the internet in English, but the same information may not be available
in our vernaculars such as Hindi or Malayalam. In India, only 3% of the population can understand
English2, so achievements in MT research and development are of great significance for a country
like India. In short, MT helps the world to be united both intellectually and culturally. Achieving
this requires a great deal of work in language and linguistics as well as in computer science. The
present tutorial is an introduction to the art of MT. The material is compiled from already published
literature in the field; the main sources are listed in the reference section. The tutorial is only a
theoretical overview of the field.

1.1. History of MT

          The history of MT starts in the early 1950s, although some hypothetical concepts existed
before that period. In the 17th century, the philosophers Leibniz and Descartes put forward proposals
for codes which could relate words between languages, but these proposals remained theoretical.
The first proposal for developing MT was put forward in 1949 by Warren Weaver, a researcher at the
Rockefeller Foundation3. Within a few years, actual research in MT started at many universities in
the United States. The first public demonstration of an MT system was held on 7th January 1954 at
the head office of IBM. It is known as the 'Georgetown-IBM experiment'. The system was a kind of toy
system, with a vocabulary of just 250 words, translating just 49 carefully selected Russian sentences
into English. Many institutions in the US were very active in MT research and development, and the
government was very supportive. In 1964 the US government constituted a committee, the Automatic
Language Processing Advisory Committee (ALPAC), to evaluate the progress in MT research. The
committee concluded that MT was more expensive, less accurate and slower than human translation
and that, despite the expense, MT was not likely to reach the quality of a human translator in the
near future. It recommended, however, that tools to aid translators, such as automatic dictionaries,
be developed and that research in Computational Linguistics (CL) be continued. The report had a deep
impact on MT researchers, and MT research was abandoned for a short period. But the field rose
again like a phoenix, and significant developments have followed. MT research is very active in
Indian Languages (IL) too.


1   http://www.raheja.org/
2   Sinha, R.M.K and A. Jain, 2003, 'AnglaHindi: An English to Hindi machine translation system', Proceedings of the
    MT SUMMIT IX, New Orleans, LA, pp.23-27.
3   http://en.wikipedia.org/wiki/History_of_machine_translation

2 Machine Translation
          Translation can be defined as the act or process of rendering text from one language into
another. We know that producing a high quality translation is difficult even for human translators.
A translator should possess knowledge of the Source Language (SL) and the Target Language (TL),
including their grammar and culture. Even with all such knowledge, we cannot ensure that a person
can produce a high quality translation, because natural language is ambiguous. Even the term
'translation' has four different meanings in different contexts. So selecting the right word meaning
while translating from SL to TL requires knowledge of the context. Let's see how this can be made
possible with computers.

2.1 Approaches in MT

         Approaches in MT can be classified into four categories:
             1) Direct MT
             2) Rule-based MT
             3) Corpus-based MT
             4) Knowledge-based MT
         Machine Translation
             |-- Direct MT
             |-- Rule-based MT
             |       |-- Transfer-based MT
             |       |-- Interlingua-based MT
             |-- Corpus-based MT
             |       |-- Example-based MT
             |       |-- Statistical MT
             |-- Knowledge-based MT

                                    Fig.1. Machine Translation Approaches

         Each of the approaches mentioned above has its own advantages and disadvantages.
A brief note on each approach is given below.

2.1.1 Direct Machine Translation

          As the name suggests, direct MT systems provide direct translation. No intermediate
representation or complex architecture is involved. The approach carries out word-by-word
translation with the help of a bilingual dictionary, usually followed by some minimal syntactic
rearrangement. It involves little analysis of the SL text and no parsing, and relies mainly on the
quality of the bilingual dictionary. The general flow of a direct MT system is:

               1) Remove morphological inflection from the SL words
               2) Look up a bilingual dictionary to get the corresponding TL words
               3) Perform the necessary syntactic rearrangements




         SL text -> Morphological analysis -> SL words -> Bilingual dictionary lookup -> TL words
                 -> Syntactic rearrangement -> TL text

         (The bilingual dictionary lookup consults the SL-TL dictionary.)

                               Figure 2. Direct Machine Translation System


         Consider the example 'Sita slept in the garden'. Let's see how it will be translated to Hindi
with a direct MT system.

         Input (English sentence)                 -                    Sita slept in the garden.
         Word translation                         -                    सीता सोयी में बाग
         Syntactic rearrangement                  -                    सीता बाग में सोयी ।
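
The three steps of the direct approach can be sketched in a few lines of Python. This is only a minimal illustration for the example sentence: the lexicon tables (STEMS, POS, ARTICLES, BILINGUAL) and the two rearrangement rules are assumptions made for this sketch, not part of any real system.

```python
# Toy resources; every entry is a hypothetical stand-in for a real lexicon.
STEMS = {"slept": "sleep"}                      # 1) morphological analysis
POS = {"sleep": "V", "in": "P"}                 # part of speech of SL stems
ARTICLES = {"the", "a", "an"}                   # dropped: no Hindi counterpart
# For brevity the dictionary maps the stem straight to the inflected Hindi
# form used in the example (a real system would generate the inflection).
BILINGUAL = {"Sita": "सीता", "sleep": "सोयी", "in": "में", "garden": "बाग"}


def analyse(sentence):
    """Step 1: split the SL sentence and strip inflection from each word."""
    words = sentence.rstrip(".").split()
    return [STEMS.get(w, w) for w in words if w.lower() not in ARTICLES]


def lookup(stems):
    """Step 2: word-by-word bilingual dictionary lookup."""
    return [(s, BILINGUAL.get(s, s)) for s in stems]


def rearrange(pairs):
    """Step 3: turn prepositions into postpositions, move the verb to the end."""
    out, verbs, i = [], [], 0
    while i < len(pairs):
        stem, word = pairs[i]
        if POS.get(stem) == "V":
            verbs.append(word)
        elif POS.get(stem) == "P" and i + 1 < len(pairs):
            out.extend([pairs[i + 1][1], word])  # noun first, then postposition
            i += 1                               # skip the noun just consumed
        else:
            out.append(word)
        i += 1
    return out + verbs


pairs = lookup(analyse("Sita slept in the garden."))
word_translation = " ".join(word for _, word in pairs)
rearranged = " ".join(rearrange(pairs))
print(word_translation)
print(rearranged)
```

The sketch has no notion of sentence structure; the two hard-coded rules only happen to be enough for this sentence, which is exactly the weakness of the direct approach.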

        Besides simple word translation and reordering, suffix handling and preposition handling are
needed to make the translation acceptable. This is called idiomatization.

Consider the example:

         English Sentence                         -           The boy gave the girl a flower.
         Word Translation                         -           लड़का दिया लड़की एक फूल
         Syntactic rearrangement                  -           लड़का लड़की एक फूल दिया
         Idiomatization                           -           लड़के ने लड़की को एक फूल दिया।

         Modification of the verb and adjective according to the gender of the subject is also required
if the TL has such constraints. In languages like Hindi, such grammatical phenomena have to be
taken care of to produce a quality translation.

         E.g.

         English Sentence              -          She saw stars in the sky.
         Word Translation              -          वो देखा तारे में आसमान
         Syntactic rearrangement       -          वो आसमान में तारे देखी
         Idiomatization                -          उसने आसमान में तारे देखे ।

          Attaining good quality with direct MT is very difficult if the SL and TL do not share similar
syntactic and morphological phenomena. For a Hindi to English or English to Hindi translation
system, such word-by-word replacement and idiomatization will not, in general, produce
understandable MT output. Such MT output is called 'word salad'.

The major limitations of this MT approach are:
    1) It does not consider the structure of the sentence or the relationships between words.
    2) There is no attempt to disambiguate word senses. The majority of words in natural language
        are ambiguous. For example, the Hindi word खाना as a verb denotes the activity of eating.
        When it is preceded by an adjective, as in बड़ा खाना, it is a noun and the meaning changes
        completely.
    3) No adaptability - a system developed for a particular language pair will not be
        suitable for another language pair.

2.1.2. Rule-based MT(RBMT)

         The rule-based approach is considerably more advanced than the direct MT approach. The
system relies on hand-made linguistic rules to perform the translation. There are two types of
rule-based MT: 1) transfer-based MT and 2) interlingua-based MT.

2.1.2.a. Transfer-based MT

        In this approach the SL text is analyzed to produce a representation, which is then
transformed into a representation that matches the rules of the target language. This requires an
understanding of the differences between the SL and the TL. A typical flow of transfer-based MT is:
             1) Analysis - analyze the SL text [syntactically].
             2) Transfer - transfer the SL syntactic structure to the TL syntactic structure.
             3) Generation - generate the TL text with defined rules.


         SL text -> Analysis -> SL representation -> Transfer -> TL representation -> Synthesis -> TL text

         (Analysis uses the SL grammar, Transfer uses the SL-TL dictionary, Synthesis uses the TL grammar.)

                                Figure 3. Diagram of transfer-based MT


We can work through the system with our previous example 'Sita slept in the garden'.

         Input                      - Sita slept in the garden

         Analysis output            - (S (NP (NNP Sita)) (VP (VBD slept) (PP (IN in) (NP
                                           (DT the) (NN garden)))))

         After syntactic transfer   - (S (NP (NNP Sita)) (VP (PP (NP (DT the) (NN
                                           garden)) (IN in)) (VBD slept)))

         Hindi lexicalization       - (S (NP (NNP सीता)) (VP (PP (NP (NN बाग)) (IN
                                           में)) (VBD सोयी)))

         Hindi sentence             - सीता बाग में सोयी ।
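
The transfer and lexicalization steps of this example can be sketched in Python as follows. The parse tree is written by hand as nested tuples (a real system would obtain it from a parser), and the two reordering rules and the LEXICON table are assumptions made only for this illustration.

```python
# Hand-written parse of 'Sita slept in the garden' as nested tuples:
# (label, child, child, ...) for phrases and (tag, word) for words.
TREE = ("S",
        ("NP", ("NNP", "Sita")),
        ("VP", ("VBD", "slept"),
               ("PP", ("IN", "in"),
                      ("NP", ("DT", "the"), ("NN", "garden")))))

# Hypothetical transfer lexicon; words without an entry (the article) are dropped.
LEXICON = {"Sita": "सीता", "slept": "सोयी", "in": "में", "garden": "बाग"}


def is_word(node):
    """A (tag, word) pair has a plain string as its only child."""
    return isinstance(node[1], str)


def transfer(node):
    """Reorder SL structure into TL structure: verb-final VP, postpositional PP."""
    if is_word(node):
        return node
    label, *children = node
    children = [transfer(c) for c in children]
    if label == "VP":      # move verbs after their complements
        children = ([c for c in children if not c[0].startswith("VB")]
                    + [c for c in children if c[0].startswith("VB")])
    elif label == "PP":    # preposition becomes a postposition
        children = ([c for c in children if c[0] != "IN"]
                    + [c for c in children if c[0] == "IN"])
    return (label, *children)


def lexicalize(node):
    """Replace SL words by TL words; return None for words to be dropped."""
    if is_word(node):
        word = LEXICON.get(node[1])
        return (node[0], word) if word else None
    label, *children = node
    kept = [k for k in (lexicalize(c) for c in children) if k is not None]
    return (label, *kept)


def leaves(node):
    """Read the words off the tree from left to right."""
    if len(node) < 2:
        return []
    if is_word(node):
        return [node[1]]
    return [w for child in node[1:] for w in leaves(child)]


transferred = transfer(TREE)
hindi = " ".join(leaves(lexicalize(transferred)))
print(transferred)
print(hindi)
```

Because analysis, transfer and generation are separate functions, only the transfer rules and the lexicon would need to change for another target language, which mirrors the modularity of the approach.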



         The main advantage of the system is its modular structure: the analysis of the SL text is
independent of the TL text generator. Another notable advantage of the system is its capability to
resolve word sense ambiguity at the lexical level. For example, the English word 'book' falls into
two part-of-speech (POS) categories, i.e. noun and verb. This approach can handle such lexical
ambiguity to a certain extent. The major disadvantage of the system is its limited adaptability or
extensibility to a group of language pairs. If we are trying to develop systems for English to Hindi
and Malayalam to Hindi, we need two SL analyzers.

2.1.2.b Interlingua-based MT

         In the interlingua-based approach, the SL text is converted into a language-independent
meaning representation called an 'interlingua'. From this interlingual representation, the TL text
can be generated. In short, translation in this approach is a two-stage process: analysis and
synthesis.


         SL text -> Analysis -> Interlingua representation -> TL synthesis -> TL text

                                Figure 4. Model of interlingua-based MT

          The flow of the system is clear from the diagram above. The system receives the input
and performs SL analysis, which is SL specific. The effort required to develop an interlingua-based
machine translation system is much greater than for the transfer-based approach. The major source
of difficulty in using this approach is defining a universal and abstract interlingual
representation. A sample interlingua representation for the sentence 'Sita slept in the garden' is
given below.

                   (*sleep
                     (tense past)
                     (mood declarative)
                     (punctuation period)
                     (subject (*Sita
                               (number singular)))
                     (location (*garden
                                (reference definite)
                                (number singular))))

          Sample interlingua for the sentence 'Sita slept in the garden'
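
To make the two-stage idea concrete, the sample interlingua can be held in a Python dictionary and handed to separate generators, one per target language. This is a minimal sketch: the dictionary layout, the two tiny lexicons and the sentence templates are assumptions for illustration, and gender agreement of the Hindi verb is simply hard-coded in the lexicon.

```python
# The sample interlingua as a nested dictionary (layout is an assumption).
INTERLINGUA = {
    "event": "sleep",
    "tense": "past",
    "mood": "declarative",
    "subject": {"concept": "Sita", "number": "singular"},
    "location": {"concept": "garden", "reference": "definite", "number": "singular"},
}

# Hypothetical target-language lexicons, keyed by concept (and tense for verbs).
ENGLISH = {("sleep", "past"): "slept", "Sita": "Sita", "garden": "garden"}
HINDI = {("sleep", "past"): "सोयी", "Sita": "सीता", "garden": "बाग"}


def generate_english(il):
    """TL synthesis for English: subject, verb, then prepositional phrase."""
    location = il["location"]
    article = "the" if location["reference"] == "definite" else "a"
    noun = ENGLISH[location["concept"]]
    verb = ENGLISH[(il["event"], il["tense"])]
    return f"{ENGLISH[il['subject']['concept']]} {verb} in {article} {noun}."


def generate_hindi(il):
    """TL synthesis for Hindi: subject, place + postposition, verb last."""
    subject = HINDI[il["subject"]["concept"]]
    place = HINDI[il["location"]["concept"]]
    verb = HINDI[(il["event"], il["tense"])]
    return f"{subject} {place} में {verb} ।"


english = generate_english(INTERLINGUA)
hindi = generate_hindi(INTERLINGUA)
print(english)
print(hindi)
```

Neither generator looks at the source sentence; each works only from the interlingua, so adding a new target language means adding one generator rather than one module per language pair.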

2.1.3. Corpus Based MT

          A corpus is a large collection of text or speech in a language. In recent years there has
been increased interest in corpus-based MT systems, because they need less effort from
language/linguistic experts. On the other hand, they require a large amount of sentence-aligned
parallel corpus. The corpus-based approach can be classified into two types:
1) statistical MT (SMT) and 2) example-based MT (EBMT).
2.1.3.a. SMT

        SMT is inspired by the noisy channel model used in Automatic Speech Recognition (ASR).
In the noisy channel model, the channel introduces noise which makes it difficult to recognize the
input word. A recognition system based on this idea builds a model of the channel to identify how
it modifies the input, and uses that model to recover the original word.

          An SMT system scores a TL sentence T, given an SL sentence S, as the product of the
translation probability P(S|T) and the TL probability P(T). The translation probability P(S|T)
accounts for the adequacy of the translated content, whereas P(T) accounts for the fluency of the
target construction. The basic view behind SMT is that every sentence in a language has a possible
translation in another language, and that a sentence in one language can be translated into another
language in many ways. The choice among them is translator specific.


         Language model P(T) -> T -> Translation model P(S|T) -> S

         S -> Decoder -> T

                         Figure 4. Noisy channel model for English to Hindi MT


          Let us consider the example of an English to Hindi SMT system. Every Hindi sentence h is a
possible translation of an English sentence e. The probability that 'गाय घास खाता है ।' is the
translation of 'Murthy eats apple' is low compared to the probability of 'रवि खाना खाता है' being the
translation of the sentence. Every pair of sentences (e, h) is assigned a probability P(h|e), which
is the probability that a translator, when presented with the English sentence e, will produce h as
its Hindi translation. We can assume that when a native speaker of Hindi produces an English
sentence, he has a Hindi sentence in mind and translates it into English mentally. The goal of SMT
is to find the sentence h that the native speaker had in mind when he produced e. The noisy channel
model can be described as

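          P(h|e) = P(e,h) / P(e) = P(h) x P(e|h) / P(e)

          Since P(e) is the same for every candidate h, the best translation is the sentence h that
maximizes P(h) x P(e|h).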


         The two components of SMT are the Language Model (LM) and the Translation Model (TM). A
language model gives the probability of a sentence. These probabilities are calculated with
N-gram4 techniques. The translation model helps to compute the conditional probability P(e|h). It is
trained from a parallel corpus of English/Hindi sentence pairs. This section is just a bird's eye
view of SMT techniques; due to time constraints, it concludes with these introductory remarks.
Some Free and Open Source Software (FOSS) tools are now available to experiment with SMT
techniques5.
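
The interplay of the two models can be illustrated with a toy decoder in Python. Everything in this sketch is an assumption made for illustration: the three-sentence Hindi corpus, the hand-set word translation table T, the add-one smoothed bigram language model, and the simplified word-based translation score. Real systems learn these quantities from large parallel corpora and search a far larger space of candidates.

```python
import math
from collections import Counter

# Hypothetical toy data: a tiny Hindi corpus for the language model and a
# hand-set table T[(english_word, hindi_word)] approximating P(english | hindi).
CORPUS = ["रवि खाना खाता है", "गाय घास खाता है", "रवि पानी पीता है"]
T = {("Ravi", "रवि"): 0.9, ("eats", "खाता"): 0.8, ("food", "खाना"): 0.8}


def train_bigram_lm(sentences):
    """Collect the context and bigram counts needed to estimate P(h)."""
    contexts, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens[1:])
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)


def lm_logprob(h, lm):
    """log P(h) under an add-one smoothed bigram model (fluency)."""
    contexts, bigrams, v = lm
    tokens = ["<s>"] + h.split() + ["</s>"]
    return sum(math.log((bigrams[a, b] + 1) / (contexts[a] + v))
               for a, b in zip(tokens, tokens[1:]))


def tm_logprob(e, h, floor=1e-6):
    """log P(e|h): each English word is explained by the Hindi words (adequacy)."""
    h_words = h.split()
    total = 0.0
    for e_word in e.split():
        p = sum(T.get((e_word, h_word), 0.0) for h_word in h_words) / len(h_words)
        total += math.log(max(p, floor))   # floor avoids log(0) for unseen pairs
    return total


def decode(e, candidates, lm):
    """Return the candidate h that maximizes log P(h) + log P(e|h)."""
    return max(candidates, key=lambda h: lm_logprob(h, lm) + tm_logprob(e, h))


lm = train_bigram_lm(CORPUS)
fluent, unrelated = CORPUS[0], CORPUS[1]
scrambled = " ".join(reversed(fluent.split()))   # same words, bad order
best = decode("Ravi eats food", [scrambled, unrelated, fluent], lm)
print(best)
```

The scrambled candidate contains the right words, so the translation model cannot tell it apart from the fluent one; the language model penalizes it. The unrelated candidate is fluent Hindi but is rejected by the translation model.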

2.1.3.b. Example-based MT(EBMT)

         The EBMT system uses past translation examples to generate the translation for a given SL
text. EBMT systems maintain an example-base consisting of translation examples between the source
and target languages. When an SL sentence is given to the system, the system retrieves a similar SL
sentence and its translation from the example-base. Then it adapts the example to generate the TL
sentence for the input sentence. EBMT rests on the idea that similar sentences will have similar
translations. The system has two main modules: 1) retrieval and 2) adaptation.

4   http://en.wikipedia.org/wiki/N-gram
5   www.apertium.org
    www.statmt.org/moses



         SL sentence -> Retrieval -> Example patterns -> Adaptation -> TL sentence

         (Retrieval uses the example base; Adaptation uses adaptation rules and the SL-TL dictionary.)

                                              Figure 5. Example based MT

         The task of the retrieval module is to retrieve translation examples from the stored
example-base. This module tries to retrieve the example from the base which is most similar to the
input sentence. The adaptation module is responsible for carrying out the necessary modifications
in the retrieved example to generate the TL sentence. This modification may involve addition,
deletion or replacement of morphological words, constituent words or suffixes.

         Let us elaborate the concept with the help of an example. Consider English-Hindi translation
for the following input sentence:

         Input     -                       Santhosh is writing a letter.

         Example base -                    Vikram wrote a poem.                               (1)
                                           Anand is writing.                                  (2)
                                           Ravi is writing an essay.                          (3)
                                           Mukesh writes a Malayalam poem.                    (4)


         Selection by the retriever
                            Ravi is writing an essay.
                            रवि एक उपन्यास लिख रहा है ।

         Using this retrieved pair, the system will replace Ravi with Santhosh and उपन्यास with पत्र
in the TL translation.
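
The retrieval and adaptation modules for this example can be sketched in Python. The similarity measure (word overlap after treating 'an' as 'a'), the word-by-word adaptation rule and the small DICTIONARY are assumptions chosen for this illustration; the Hindi sides of examples (1), (2) and (4) are illustrative renderings added for the sketch and do not come from the example above.

```python
# Example base from the text; Hindi sides other than (3) are illustrative.
EXAMPLE_BASE = [
    ("Vikram wrote a poem.", "विक्रम ने एक कविता लिखी।"),
    ("Anand is writing.", "आनंद लिख रहा है।"),
    ("Ravi is writing an essay.", "रवि एक उपन्यास लिख रहा है।"),
    ("Mukesh writes a Malayalam poem.", "मुकेश एक मलयालम कविता लिखता है।"),
]
# Hypothetical SL-TL dictionary used by the adaptation step.
DICTIONARY = {"ravi": "रवि", "santhosh": "संतोष", "essay": "उपन्यास", "letter": "पत्र"}


def tokens(sentence):
    """Lowercase the words and treat the articles 'a' and 'an' as the same."""
    words = sentence.lower().rstrip(".").split()
    return ["a" if w == "an" else w for w in words]


def similarity(a, b):
    """Word overlap (Jaccard coefficient) between two SL sentences."""
    sa, sb = set(tokens(a)), set(tokens(b))
    return len(sa & sb) / len(sa | sb)


def retrieve(sentence, base):
    """Retrieval: pick the example whose SL side is most similar to the input."""
    return max(base, key=lambda pair: similarity(sentence, pair[0]))


def adapt(sentence, example):
    """Adaptation: substitute the TL words for every SL word that differs.

    Assumes the input and the retrieved SL sentence have the same length.
    """
    source, target = example
    for new, old in zip(tokens(sentence), tokens(source)):
        if new != old:
            target = target.replace(DICTIONARY[old], DICTIONARY[new])
    return target


query = "Santhosh is writing a letter."
example = retrieve(query, EXAMPLE_BASE)
translation = adapt(query, example)
print(example[0])
print(translation)
```

With plain word overlap and no article normalization, the shorter example (2) would score higher than example (3), which shows how sensitive retrieval is to the choice of similarity measure.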

2.1.4 Knowledge-based MT(KBMT)

         The MT systems which we have seen so far use morphological, syntactic or, to some extent,
semantic knowledge to translate SL text into TL. Even though the interlingua-based system uses some
sort of semantics, its central concept is syntactic analysis. Semantics-based language analysis was
introduced by Artificial Intelligence (AI) researchers. This approach requires a large amount of
ontological and lexical knowledge. The KBMT approach includes semantic parsing, lexical
decomposition into semantic networks, and resolution of ambiguities and uncertainties by reference
to a knowledge-base.


           person ::= ('person'
                    ('isa' creature)
                    ('agent-of' (Eat, Drink, Move, Attack, Love ....))
                    ('consists-of' (Hand, Foot, ....)))

           computer-user ::= ('computer-user'
                            ('isa' person)
                            ('agent-of' (+(Operate)))
                            ('subworld' computer-world))
                     Example of an ontology for KBMT system
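
One way to read this ontology is as a set of frames with inheritance along the 'isa' link, where '+' adds to the inherited actions. The Python sketch below follows that reading; the dictionary layout, the empty 'creature' frame and the function names are assumptions made for illustration, and only the actions actually listed above are included.

```python
# Frames from the ontology above; 'creature' is only named there, so its
# frame is left empty here (an assumption of this sketch).
ONTOLOGY = {
    "creature": {"isa": None, "agent_of": set()},
    "person": {"isa": "creature",
               "agent_of": {"Eat", "Drink", "Move", "Attack", "Love"}},
    "computer-user": {"isa": "person", "agent_of": {"Operate"}},
}


def actions(concept):
    """All actions a concept can be the agent of, including inherited ones."""
    result = set()
    while concept is not None:
        frame = ONTOLOGY[concept]
        result |= frame["agent_of"]
        concept = frame["isa"]        # climb the 'isa' hierarchy
    return result


def can_be_agent(concept, action):
    """True if the ontology allows `concept` to perform `action`."""
    return action in actions(concept)


print(sorted(actions("computer-user")))
print(can_be_agent("person", "Operate"))
```

A KBMT system could use such checks to reject readings in which the subject of a verb is not a permitted agent of the corresponding action.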


3. Machine Translation Evaluation

         Many online MT systems are available to the general public. One of the most famous
online MT services is Google Translate6. Have you ever tried its Hindi to English or English to
Hindi translation service? If not, just try it out and have fun!

           Evaluation of MT is as hard a task as developing an MT system, if not harder. Why is MT
evaluation crucial? Because what a consumer expects from a commercial MT product is high quality
translation. The aim of MT evaluation is to measure how accurately an MT system can handle the
phenomena involved in translation from SL to TL. Suppose you give the sentence 'I like milk' as
input to an MT system and it produces मैं दूध जैसा हूं instead of मुझे दूध पसंद है. What will your
reaction be? You will surely say that the MT system is useless! An MT system may translate this
sentence into Hindi in the following ways:
                    मुझे दूध पसंद है
                    दूध मुझे पसंद है
                    मैं दूध जैसा हूं
Except for the third translation, all are acceptable.

         Many MT evaluation techniques have been developed by researchers. Among them the
BLEU7, METEOR8 and NIST9 metrics are widely used. These are automatic MT evaluation methods.
Besides these, the most effective method is human evaluation, but its disadvantage is that it is
time consuming and costly. The automatic metrics are not equally effective for all language pairs.
The adaptability of the BLEU metric to English to Indian language translation is under study, and
some results and observations are already available10.
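
To give a feel for how an automatic metric behaves, the Python sketch below computes a simplified sentence-level BLEU score (clipped n-gram precision up to bigrams, with a brevity penalty) for the three candidate translations of 'I like milk'. It is a minimal illustration: choosing bigrams as the maximum order and a single reference are assumptions of this sketch, while standard BLEU is usually computed over a whole test set with n-grams up to length four.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=2):
    """Simplified sentence-level BLEU with clipped n-gram precision."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        max_ref = Counter()
        for ref in refs:                       # highest count seen in any reference
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0                         # no overlap at this n-gram order
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty against the reference closest in length to the candidate.
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


reference = "मुझे दूध पसंद है"
words = reference.split()
reordered = " ".join([words[1], words[0], words[2], words[3]])  # second candidate
wrong = "मैं दूध जैसा हूं"                                          # third candidate

scores = {c: bleu(c, [reference]) for c in (reference, reordered, wrong)}
for sentence, score in scores.items():
    print(f"{score:.3f}  {sentence}")
```

The acceptable but reordered second candidate receives a noticeably lower score than the exact match, which illustrates why such metrics need care for relatively free word order languages like Hindi.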

4 MT Research in India

        MT research in India started in the 1970s and the beginning of the 1980s. The major
projects in MT system development are carried out at IIT Kanpur, Central University of Hyderabad,
IIIT Hyderabad, AU-KBC Research Center Chennai, C-DAC, IISC Kolkatta and Tamil Virtual University
Thanjavur. The earliest systems developed for English to Hindi are the Anglabharati and Anusaarak
systems from IIT Kanpur. A list of MT projects in India is given below.
6    http://translate.google.com/
7    http://en.wikipedia.org/wiki/Bilingual_evaluation_understudy
8    http://en.wikipedia.org/wiki/METEOR
9    http://en.wikipedia.org/wiki/NIST_(metric)
10   http://www.cse.iitb.ac.in/~pb/papers/icon07-bleu.pdf


Name of the MT Project    Name of R&D center                                        Language pair
Anglabharati              IIT Kanpur                                                English to Indian languages
Anubharati                IIT Kanpur                                                Indian languages to English
Anusaarak                 IIT Kanpur, Central Univ. of Hyderabad, IIIT Hyderabad    English to Hindi, IL to IL
MaTra                     C-DAC Mumbai                                              English to Indian languages
Mantra                    C-DAC Pune                                                English to Hindi
UNL based MT              IIT Bombay                                                English to Hindi, Marathi
Tamil Hindi anusaarak     AU-KBC Chennai                                            Tamil to Hindi
English Tamil MT          AU-KBC Chennai                                            English to Tamil
Shakti                    IIIT Hyderabad                                            English to Hindi
Sampark                   IIIT Hyderabad                                            IL to IL


Beyond these projects, industry giants like IBM and Microsoft are also engaged in English to Hindi
MT system development.

5. References

[1] Natural Language Processing and Information Retrieval, Tanveer Siddiqui, U S Tiwary, Oxford
University Press, Delhi, India, 2008.
[2] Speech and Language Processing, Daniel Jurafsky and James H. Martin, Prentice Hall, 2009.
[3] Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schütze,
MIT Press, Cambridge, MA, May 1999.
[4] Statistical MT tutorial, www.isi.edu/natural-language/mt/wkbk.rtf, Accessed on 12-02-2010.
[5] Automatic Translation of Languages, http://www.mt-archive.info/Bar-Hillel-1960.pdf, Accessed
on 15-02-2010.
[6] An Introduction to Machine Translation, http://www.hutchinsweb.me.uk/IntroMT-TOC.htm,
Accessed on 01-02-2010.


Note: Some of the examples and diagrams used in this document are directly
adapted from the book Natural Language Processing and Information Retrieval [1].
Some modifications were made in certain examples.

 
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorDynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
 
Language translation english to hindi
Language translation english to hindiLanguage translation english to hindi
Language translation english to hindi
 
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
 
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliterator
 
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachPunjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
 
Survey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi LanguageSurvey on Indian CLIR and MT systems in Marathi Language
Survey on Indian CLIR and MT systems in Marathi Language
 
A performance of svm with modified lesk approach for word sense disambiguatio...
A performance of svm with modified lesk approach for word sense disambiguatio...A performance of svm with modified lesk approach for word sense disambiguatio...
A performance of svm with modified lesk approach for word sense disambiguatio...
 
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathi
 
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathi
 
A Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to MarathiA Novel Approach for Rule Based Translation of English to Marathi
A Novel Approach for Rule Based Translation of English to Marathi
 
Role of language engineering to preserve endangered languages
Role of language engineering to preserve endangered languagesRole of language engineering to preserve endangered languages
Role of language engineering to preserve endangered languages
 

Mais de Jaganadh Gopinadhan

Practical Natural Language Processing
Practical Natural Language ProcessingPractical Natural Language Processing
Practical Natural Language Processing
Jaganadh Gopinadhan
 
Natural Language Processing with Per
Natural Language Processing with PerNatural Language Processing with Per
Natural Language Processing with Per
Jaganadh Gopinadhan
 
Indian Language Spellchecker Development for OpenOffice.org
Indian Language Spellchecker Development for OpenOffice.org Indian Language Spellchecker Development for OpenOffice.org
Indian Language Spellchecker Development for OpenOffice.org
Jaganadh Gopinadhan
 
Sanskrit and Computational Linguistic
Sanskrit and Computational Linguistic Sanskrit and Computational Linguistic
Sanskrit and Computational Linguistic
Jaganadh Gopinadhan
 

Mais de Jaganadh Gopinadhan (19)

Introduction to Sentiment Analysis
Introduction to Sentiment AnalysisIntroduction to Sentiment Analysis
Introduction to Sentiment Analysis
 
Elements of Text Mining Part - I
Elements of Text Mining Part - IElements of Text Mining Part - I
Elements of Text Mining Part - I
 
Practical Natural Language Processing
Practical Natural Language ProcessingPractical Natural Language Processing
Practical Natural Language Processing
 
Natural Language Processing with Per
Natural Language Processing with PerNatural Language Processing with Per
Natural Language Processing with Per
 
Indian Language Spellchecker Development for OpenOffice.org
Indian Language Spellchecker Development for OpenOffice.org Indian Language Spellchecker Development for OpenOffice.org
Indian Language Spellchecker Development for OpenOffice.org
 
Sanskrit and Computational Linguistic
Sanskrit and Computational Linguistic Sanskrit and Computational Linguistic
Sanskrit and Computational Linguistic
 
Linguistic localization framework for Ooo
Linguistic localization framework for OooLinguistic localization framework for Ooo
Linguistic localization framework for Ooo
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Ilucbe python v1.2
Ilucbe python v1.2Ilucbe python v1.2
Ilucbe python v1.2
 
Success Factor
Success Factor Success Factor
Success Factor
 
ntroduction to GNU/Linux Linux Installation and Basic Commands
ntroduction to GNU/Linux Linux Installation and Basic Commands ntroduction to GNU/Linux Linux Installation and Basic Commands
ntroduction to GNU/Linux Linux Installation and Basic Commands
 
Introduction to Free and Open Source Software
Introduction to Free and Open Source Software Introduction to Free and Open Source Software
Introduction to Free and Open Source Software
 
Opinion Mining and Sentiment Analysis Issues and Challenges
Opinion Mining and Sentiment Analysis Issues and Challenges Opinion Mining and Sentiment Analysis Issues and Challenges
Opinion Mining and Sentiment Analysis Issues and Challenges
 
What they think about my brand/product ?!?!? An Introduction to Sentiment Ana...
What they think about my brand/product ?!?!? An Introduction to Sentiment Ana...What they think about my brand/product ?!?!? An Introduction to Sentiment Ana...
What they think about my brand/product ?!?!? An Introduction to Sentiment Ana...
 
Tools andTechnologies for Large Scale Data Mining
Tools andTechnologies for Large Scale Data Mining Tools andTechnologies for Large Scale Data Mining
Tools andTechnologies for Large Scale Data Mining
 
Practical Natural Language Processing From Theory to Industrial Applications
Practical Natural Language Processing From Theory to Industrial Applications Practical Natural Language Processing From Theory to Industrial Applications
Practical Natural Language Processing From Theory to Industrial Applications
 
Mahout Tutorial FOSSMEET NITC
Mahout Tutorial FOSSMEET NITCMahout Tutorial FOSSMEET NITC
Mahout Tutorial FOSSMEET NITC
 
Practical Machine Learning
Practical Machine Learning Practical Machine Learning
Practical Machine Learning
 
Will Foss get me a Job?
Will Foss get me a Job?Will Foss get me a Job?
Will Foss get me a Job?
 

A tutorial on Machine Translation

National Conference on Translation, Dept. of Hindi, University of Kerala, February 23-24, 2010

Man to Machine
A tutorial on the art of Machine Translation
Jaganadh G
jaganadhg@gmail.com
http://jaganadhg.freeflux.net/blog

1 Introduction

Machine Translation (MT) is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another. It is interesting to think about an MT system that can translate literary works from one language to another. To enjoy the novel 'Anything for you, ma'am'¹, one would just feed the novel into an MT system and get it translated into one's own language. Such MT systems are supposed to break the language barrier.

MT can help us to overcome the technological barrier too. The rapid developments in Information and Communication Technology (ICT) have led to an overflow of information through the internet. But this information is available only in a very small subset of languages, so it is not reachable for a significant portion of people. This phenomenon is called the "digital divide". A lot of information is available on the internet in English, but the same information may not be available in our vernaculars such as Hindi or Malayalam. In India only 3% of the population can understand English². In a country like India, achievements in MT research and development therefore have great significance. In short, MT helps the world to be united both intellectually and culturally. To achieve this we have to do a lot of work in language and linguistics as well as in computer science.

The present tutorial is an introduction to the art of MT. The material is compiled with the help of already published literature in the field; the main sources are mentioned in the reference section. The tutorial is only a theoretical overview of the field.

1.1. History of MT

The history of MT starts in the early 1950s, but some hypothetical concepts existed before that period. In the 17th century two philosophers, Leibniz and Descartes, put forward proposals for codes which could relate words between languages. These proposals remained theory only. The first proposal for developing MT was put forward in 1949 by Warren Weaver, a researcher at the Rockefeller Foundation³. A few years later actual research in MT started at many universities in the United States. The first public demonstration of an MT system was held on 7 January 1954 at the head office of IBM. It is known as the 'Georgetown-IBM experiment'. The system was a kind of toy system, with a vocabulary of just 250 words, translating just 49 carefully selected Russian sentences into English.

Many institutions in the US were very active in MT research, and the government was very supportive of it. In 1964 the US government constituted a committee, the Automatic Language Processing Advisory Committee (ALPAC), to evaluate the progress in MT research. The committee concluded that MT was more expensive, less accurate and slower than human translation, and that, despite the expense, MT was not likely to reach the quality of a human translator in the near future. It recommended, however, that tools such as automatic dictionaries be developed to aid translators, and that research in Computational Linguistics (CL) be continued. The report had a deep impact on MT researchers, and MT research was abandoned for a short period. But the field rose again like a phoenix and has seen significant developments since. MT research is very active in Indian Languages (IL) too.

1 http://www.raheja.org/
2 Sinha, R.M.K and A. Jain, 2003, 'AnglaHindi: An English to Hindi machine translation system', Proceedings of the MT SUMMIT IX, New Orleans, LA, pp. 23-27.
3 http://en.wikipedia.org/wiki/History_of_machine_translation
2 Machine Translation

Translation can be defined as the act or process of translating, especially from one language into another. Producing high quality translation is difficult even for human translators. A translator should possess knowledge of the Source Language (SL) and the Target Language (TL), including their grammar and culture. Even someone who possesses all this knowledge cannot be guaranteed to produce a high quality translation, because natural language is ambiguous. Even the term 'translation' has four different meanings in different contexts. So selecting the right word meaning while translating from SL to TL requires knowledge of the context. Let us see how this can be made possible with computers.

2.1 Approaches in MT

Approaches in MT can be classified into four categories:
1) Direct MT
2) Rule-based MT
3) Corpus-based MT
4) Knowledge-based MT

Machine Translation
    Direct MT
    Rule-based MT
        Transfer-based MT
        Interlingua-based MT
    Corpus-based MT
        Example-based MT
        Statistical MT
    Knowledge-based MT
Fig. 1. Machine Translation Approaches

Each of the approaches mentioned above has its own advantages and disadvantages. A brief note on each approach is given below.

2.1.1 Direct Machine Translation

As the name suggests, direct MT systems provide direct translation. No intermediate representation or complex architecture is involved. The system carries out word-by-word translation with the help of a bilingual dictionary, usually followed by some minimal syntactic rearrangement. It involves little analysis of the SL text and no parsing, and relies mainly on the quality of the bilingual dictionary.

The general flow of a direct MT system is:
1) Remove morphological inflection from the SL words
2) Look up a bilingual dictionary to get the corresponding TL words
3) Perform the necessary syntactic rearrangements
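The three steps above can be sketched in a few lines of Python. This is only a toy illustration, not the design of any real system: the lexicon, the lemma table and the single reordering rule (move the preposition after its noun, move the verb to the end) are assumptions made up for the one sentence 'Sita slept in the garden'. For simplicity the dictionary stores the already inflected Hindi verb form.

```python
# Toy direct MT (English -> Hindi). All tables below are hypothetical and
# cover only the example sentence "Sita slept in the garden."

# Step 1 resource: inflected SL form -> base form.
LEMMAS = {"slept": "sleep"}

# Step 2 resource: tiny bilingual dictionary. "the" maps to an empty string
# because Hindi has no articles. The verb is stored already inflected.
BILINGUAL = {"sita": "सीता", "sleep": "सोयी", "in": "में", "garden": "बाग", "the": ""}

VERBS = {"sleep"}
PREPOSITIONS = {"in"}


def direct_translate(sentence):
    """Return (word-by-word translation, rearranged translation)."""
    words = sentence.rstrip(".").lower().split()

    # 1) Remove morphological inflection from the SL words.
    lemmas = [LEMMAS.get(w, w) for w in words]

    # 2) Bilingual dictionary lookup; drop words with an empty translation.
    pairs = [(lemma, BILINGUAL.get(lemma, lemma)) for lemma in lemmas]
    pairs = [(lemma, tl) for lemma, tl in pairs if tl]
    word_by_word = " ".join(tl for _, tl in pairs)

    # 3) Syntactic rearrangement: preposition becomes a postposition,
    #    and the verb moves to the end of the sentence.
    verbs = [tl for lemma, tl in pairs if lemma in VERBS]
    rest = [(lemma, tl) for lemma, tl in pairs if lemma not in VERBS]
    reordered = []
    i = 0
    while i < len(rest):
        lemma, tl = rest[i]
        if lemma in PREPOSITIONS and i + 1 < len(rest):
            reordered += [rest[i + 1][1], tl]  # noun first, then postposition
            i += 2
        else:
            reordered.append(tl)
            i += 1
    return word_by_word, " ".join(reordered + verbs)


word_by_word, rearranged = direct_translate("Sita slept in the garden.")
print(word_by_word)
print(rearranged)
```

Even this small sketch shows why the approach depends so heavily on the dictionary: any word or word order not anticipated by the tables passes through unchanged.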
SL text -> Morphological analysis -> SL words -> Bilingual dictionary lookup (SL-TL dictionary) -> TL words -> Syntactic rearrangement -> TL text
Figure 2. Direct Machine Translation System

Consider the example 'Sita slept in the garden'. Let us see how it will be translated into Hindi by a direct MT system.

Input (English sentence) - Sita slept in the garden.
Word translation - सीता सोयी में बाग
Syntactic rearrangement - सीता बाग में सोयी ।

Besides simple word translation and reordering, suffix handling and preposition handling are needed to make the translation acceptable. This is called idiomatization. Consider the example:

English sentence - The boy gave the girl a flower.
Word translation - लड़का दिया लड़की एक फूल
Syntactic rearrangement - लड़का लड़की एक फूल दिया
Idiomatization - लड़के ने लड़की को एक फूल दिया।

Modification of the verb and adjective according to gender is also required if the TL has such constraints. In languages like Hindi such grammatical phenomena have to be taken care of to produce a quality translation. E.g.

English sentence - She saw stars in the sky.
Word translation - वो देखा तारे में आसमान
Syntactic rearrangement - वो आसमान में तारे देखी
Idiomatization - उसने आसमान में तारे देखे।

Attaining such quality with direct MT is very difficult if the SL and TL do not share similar syntactic and morphological phenomena. For a Hindi to English or English to Hindi translation system, word-by-word replacement and idiomatization alone will often not produce understandable output. Such MT output is called 'word salad'. The major limitations of this MT approach are:
1) It does not consider the structure of the sentence or the relationships between words.
2) There is no attempt to disambiguate word senses. The majority of words in natural language are ambiguous. For example, the Hindi word खाना as a verb denotes the activity of eating, but when it is preceded by an adjective, as in बड़ा खाना, it is a noun and the meaning changes completely.
3) No adaptability. A system developed for a particular language pair will not be suitable for another language pair.

2.1.2. Rule-based MT (RBMT)

The rule-based approach is considerably more advanced than the direct MT approach. The system relies on hand-made linguistic rules to perform the MT process. There are two types of rule-based MT: 1) transfer-based MT and 2) interlingua-based MT.

2.1.2.a. Transfer-based MT

In this approach the SL text is analyzed to produce a representation, which is then transformed to match the rules of the target language. It requires an understanding of the differences between the SL and the TL. A typical flow of transfer-based MT is:
1) Analysis - syntactic analysis of the SL text.
2) Transfer - transfer the SL syntactic structure to a TL syntactic structure.
3) Generation - generate the TL text with defined rules.

SL text -> Analysis (SL grammar) -> SL representation -> Transfer (SL-TL dictionary) -> TL representation -> Synthesis (TL grammar) -> TL text
Figure 3. Diagram of transfer-based MT

We can work through the system with our previous example 'Sita slept in the garden'.

Input - Sita slept in the garden
Analysis output - (S (NP (NNP Sita)) (VP (VBD slept) (PP (IN in) (NP (DT the) (NN garden)))))
After syntactic transfer - (S (NP (NNP Sita)) (VP (PP (NP (DT the) (NN garden)) (IN in)) (VBD slept)))
Hindi lexicalization - (S (NP (NNP सीता)) (VP (PP (NP (NN बाग)) (IN में)) (VBD सोयी)))
Hindi sentence - सीता बाग में सोयी ।

The main advantage of the system is its modular structure. Analysis of the SL text is independent of the TL text generator. Another notable advantage is its capability to resolve lexical-level ambiguity. For example, the English word 'book' falls into two part-of-speech (POS) categories, noun and verb. This approach can handle such lexical ambiguity to a certain extent. The major disadvantage of the system is its limited adaptability or extensibility to a group of language pairs. If we are trying to develop systems for English to Hindi and Malayalam to Hindi, we need two SL analyzers.

2.1.2.b Interlingua-based MT

In the interlingua-based approach, the SL text is converted into a language-independent meaning representation called an 'interlingua'. From this interlingual representation the TL text can be generated. In short, translation in this approach is a two-stage process: analysis and synthesis.

SL text -> Analysis -> Interlingua representation -> TL synthesis -> TL text
Figure 4. Model of interlingua-based MT

The flow of the system is clear from the diagram above. The system receives the input and performs SL analysis, which is SL specific. The effort required to develop an interlingua-based machine translation system is much greater than for the transfer-based approach. The major source of difficulty is defining a universal and abstract interlingual representation. A sample interlingua representation for the sentence 'Sita slept in the garden' is given below.

(*sleep
    (tense past)
    (mood declarative)
    (punctuation period)
    (subject (*Sita (number singular)))
    (location (*garden (reference definite) (number singular))))
Sample interlingua for the sentence 'Sita slept in the garden'

2.1.3. Corpus-based MT

A corpus is a large collection of text or speech in a language. In recent years there has been increased interest in corpus-based MT systems.
This is because they need less effort from language and linguistic experts. On the other hand, they require a large sentence-aligned parallel corpus. The corpus-based approach can be divided into 1) statistical MT (SMT) and 2) example-based MT (EBMT).

2.1.3.a. SMT

SMT is inspired by the noisy channel model used in Automatic Speech Recognition (ASR). The noisy channel introduces noise which makes it difficult to recognize the input word. A recognition system based on this idea builds a model of the channel to identify how it modifies the input and to recover the original word. An SMT system scores a TL sentence T, given an SL sentence S, by the product of the translation probability P(S|T) and the TL probability P(T). The translation probability P(S|T) accounts for the adequacy of the translated content, whereas P(T) accounts for the fluency of the target construction. The basic view behind SMT is that every sentence in one language is a possible translation of any sentence in another language, and that a sentence can be translated into another language in many ways. The choice among them is specific to the translator.

Language model P(T) -> T -> Translation model P(S|T) -> S -> Decoder -> T
Figure 4. Noisy channel model for English to Hindi MT

Let us consider the example of an English to Hindi SMT system. Every Hindi sentence h is a possible translation of an English sentence e. The probability that 'गाय घास खाती है।' is a translation of 'Murthy eats apple' is low compared with the probability of 'रवि खाना खाता है।' being its translation. Every pair of sentences (e, h) is assigned a probability P(h|e), which is the probability that a translator, when presented with the English sentence e, will produce h as its Hindi translation. We can assume that when a native speaker of Hindi produces an English sentence, he has a Hindi sentence in mind and translates it into English mentally. The goal of SMT is to find the sentence h that the native speaker had in mind when he produced e. The noisy channel model can be described as

P(h|e) = P(e,h) / P(e) = P(h) x P(e|h) / P(e)

Since P(e) is the same for every candidate h, the best translation is the h that maximizes P(h) x P(e|h).

The two components in SMT are the Language Model (LM) and the Translation Model (TM). A language model gives the probability of a sentence, P(h). These probabilities are calculated with N-gram⁴ techniques. The translation model helps to compute the conditional probability P(e|h). It is trained from a parallel corpus of English/Hindi sentence pairs.
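The decision rule of the noisy channel model, choosing the h that maximizes P(h) x P(e|h), can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the candidate sentences, the probability values and the function name `decode` are invented, and a real decoder searches a far larger space using models trained from corpora.

```python
import math

# Hypothetical language model P(h): fluency of each candidate Hindi sentence.
# All numbers are made up for illustration.
LANGUAGE_MODEL = {
    "रवि खाना खाता है": 0.004,   # fluent word order
    "रवि खाता खाना है": 0.0001,  # same words, unnatural order
    "गाय घास खाती है": 0.003,    # fluent, but about something else
}

# Hypothetical translation model P(e|h): adequacy of h as a source for e.
TRANSLATION_MODEL = {
    ("Ravi eats food", "रवि खाना खाता है"): 0.3,
    ("Ravi eats food", "रवि खाता खाना है"): 0.3,
    ("Ravi eats food", "गाय घास खाती है"): 0.0001,
}


def decode(e):
    """Return the candidate h maximizing P(h) * P(e|h).

    P(e) is ignored because it is constant across candidates.
    Log probabilities are summed to avoid numeric underflow.
    """
    def score(h):
        p_e_given_h = TRANSLATION_MODEL.get((e, h), 1e-12)  # floor for unseen pairs
        return math.log(LANGUAGE_MODEL[h]) + math.log(p_e_given_h)

    return max(LANGUAGE_MODEL, key=score)


print(decode("Ravi eats food"))
```

In this toy setup the translation model cannot distinguish the two word orders, so the language model decides between them, while the translation model rules out the fluent but unrelated sentence. This mirrors the division of labour between adequacy and fluency described above.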
This section is just a bird's-eye view of SMT techniques. Due to time constraints the section concludes with these introductory remarks. Some Free and Open Source Software (FOSS) tools are now available for experimenting with SMT techniques⁵.

4 http://en.wikipedia.org/wiki/N-gram
5 www.apertium.org www.statmt.org/moses

2.1.3.b. Example-based MT (EBMT)

An EBMT system uses past translation examples to generate the translation for a given SL text. EBMT systems maintain an example base consisting of translation examples between the source and target languages. When an SL sentence is given to the system, it retrieves a similar SL sentence and its translation from the example base. Then it adapts the example to generate the TL sentence for the input sentence. EBMT rests on the idea that similar sentences will have similar translations. The system has two main modules: 1) retrieval and 2) adaptation.

SL sentence -> Retrieval (example base) -> Example patterns -> Adaptation (adaptation rules / SL-TL dictionary) -> TL sentence
Figure 5. Example-based MT

The task of the retrieval module is to retrieve translation examples from the stored example base. This module tries to retrieve the example that is most similar to the input sentence. The adaptation module is responsible for carrying out the necessary modifications to the retrieved example to generate the TL sentence. This modification may involve addition, deletion or insertion of morphological words, constituent words or suffixes. Let us elaborate the concept with the help of an example. Consider English to Hindi translation for the following input sentence:

Input - Santhosh is writing a letter.
Example base -
Vikram wrote a poem. (1)
Anand is writing. (2)
Ravi is writing an essay. (3)
Mukesh writes a Malayalam poem. (4)
Selection by the retriever -
Ravi is writing an essay
रवि एक उपन्यास लिख रहा है।

Using this retrieved pair the system will replace Ravi with Santhosh and उपन्यास with पत्र in the TL translation.

2.1.4 Knowledge-based MT (KBMT)

The MT systems we have seen so far use morphological knowledge, syntactic knowledge or some amount of semantic knowledge to translate SL text into TL. Even though the interlingua-based system uses some sort of semantics, its central concept is syntactic analysis. Semantics-based language analysis was introduced by Artificial Intelligence (AI) researchers. This approach requires a large amount of ontological and lexical knowledge. The KBMT approach includes semantic parsing, lexical decomposition into semantic networks, and resolution of ambiguities and uncertainties by reference to a knowledge base.
person ::= ('person' ('isa' creature)
                     ('agent-of' (Eat, Drink, Move, Attack, Love ....))
                     ('consists-of' (Hand, Foot, ....)))
computer-user ::= ('computer-user' ('isa' person)
                                   ('agent-of' (+(Operate)))
                                   ('subworld' computer-world))
Example of an ontology for a KBMT system

3. Machine Translation Evaluation

Many online MT systems are available to the general public. One of the most famous online MT services is Google Translate⁶. Have you ever tried its Hindi to English or English to Hindi translation service? If not, just try it out and have fun!

Evaluating MT is as hard a task as developing an MT system; at least equal effort is required. Why is MT evaluation crucial? Because what a consumer expects from a commercial MT product is high quality translation. The aim of MT evaluation is to measure how accurately an MT system can handle the phenomena involved in translation from SL to TL. Suppose you give the sentence 'I like milk' as input to an MT system and it produces मैं दूध जैसा हूं instead of मुझे दूध पसंद है. What will your reaction be? You will certainly say that the MT system is useless! An MT system may translate this sentence into Hindi in the following ways:

मुझे दूध पसंद है
दूध मुझे पसंद है
मैं दूध जैसा हूं

Except for the third translation, all of these are acceptable. Researchers have developed many MT evaluation techniques. Among them the BLEU⁷, METEOR⁸ and NIST⁹ metrics are widely used. These are automatic MT evaluation methods. Besides these, an effective method is human evaluation, but its disadvantage is that it is time consuming and costly. The automatic metrics are not equally effective for all language pairs. The suitability of the BLEU metric for English to Indian language translation is under study, and some results and observations are already available¹⁰.
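The core idea behind BLEU-style automatic metrics, counting how many n-grams of the MT output also occur in human reference translations, can be sketched with the 'I like milk' example. The sketch below computes only the clipped ("modified") n-gram precision. It is a simplification rather than the full BLEU metric, which also combines several n-gram orders and applies a brevity penalty. The function names are chosen here for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against reference translations.

    Each candidate n-gram is credited at most as many times as it occurs
    in the reference where it is most frequent.
    """
    cand_counts = Counter(ngrams(candidate.split(), n))
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref.split(), n)).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


references = ["मुझे दूध पसंद है", "दूध मुझे पसंद है"]
good = "मुझे दूध पसंद है"
bad = "मैं दूध जैसा हूं"

print(modified_precision(good, references, 1))  # 1.0
print(modified_precision(bad, references, 1))   # 0.25
print(modified_precision(bad, references, 2))   # 0.0
```

The unacceptable output shares only the word दूध with the references, so its unigram precision is 1/4 and its bigram precision is zero, while the acceptable output matches a reference completely.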
4 MT Research in India

MT research in India started in the 1970s and the early 1980s. The major MT system development projects are carried out at IIT Kanpur, Central University of Hyderabad, IIIT Hyderabad, AU-KBC Research Center Chennai, C-DAC, IISC Kolkatta and Tamil Virtual University Thanjavur. The earliest systems developed for English to Hindi are the Anglabharati and Anusaarak systems from IIT Kanpur. A list of MT projects in India is given below.

6 http://translate.google.com/
7 http://en.wikipedia.org/wiki/Bilingual_evaluation_understudy
8 http://en.wikipedia.org/wiki/METEOR
9 http://en.wikipedia.org/wiki/NIST_(metric)
10 http://www.cse.iitb.ac.in/~pb/papers/icon07-bleu.pdf
Name of the MT Project    Name of R&D center                        Language pair
Anglabharati              IIT Kanpur                                English to Indian languages
Anubharati                IIT Kanpur                                Indian Language to English
Anusaarak                 IIT Kanpur, Central Univ. of Hyderabad,   English to Hindi, IL to IL
                          IIIT Hyderabad
MaTra                     C-DAC Mumbai                              English to Indian Languages
Mantra                    C-DAC Pune                                English to Hindi
UNL based MT              IIT Bombay                                English to Hindi, Marathi
Tamil Hindi anusaarak     AU-KBC Chennai                            Tamil to Hindi
English Tamil MT          AU-KBC Chennai                            English to Tamil
Shakti                    IIIT Hyderabad                            English to Hindi
Sampark                   IIIT Hyderabad                            IL to IL

Beyond these projects, industry giants like IBM and Microsoft are also engaged in English to Hindi MT system development.

5. References

[1] Natural Language Processing and Information Retrieval, Tanveer Siddiqui, U S Tiwary, Oxford University Press, Delhi, India, 2008.
[2] Speech and Language Processing, Daniel Jurafsky and James H. Martin, Prentice Hall, 2009.
[3] Foundations of Statistical Natural Language Processing, Chris Manning and Hinrich Schütze, MIT Press, Cambridge, MA, May 1999.
[4] Statistical MT tutorial, www.isi.edu/natural-language/mt/wkbk.rtf, Accessed on 12-02-2010.
[5] Automatic Translation of Languages, http://www.mt-archive.info/Bar-Hillel-1960.pdf, Accessed on 15-02-2010.
[6] An Introduction to Machine Translation, http://www.hutchinsweb.me.uk/IntroMT-TOC.htm, Accessed on 01-02-2010.

Note: Some of the examples and diagrams used in this document are adapted from the book Natural Language Processing and Information Retrieval [1]. Modifications were made in certain examples.