Ryan Turner
[COM-4450.001 | ORTIZ]
Machine Translation
FROM HUMAN TRANSLATION TO AUTOMATIC LANGUAGE TRANSLATION
Introduction
Language is a sophisticated use of symbols and oral speech by which the human species forms a complex system of communication. This ability is particular to humans, as no other animal on the planet can create such a complex system of communication and understanding. Language has created and defined what we know as human culture and society. “In annuals of Anthropology, language is considered as a primary tool for studying the culture of a civilization, what we speak influences what we think, what we feel and what we believe.” (Ashraf, 25). The importance of language to any given culture is immense, and because there are over 5,000 different dialects and languages¹ in the world, there must be a system of understanding not only to communicate with different cultures but also to understand their history. This means that the only way to interact with other cultures is to understand and translate a vast number of languages into the primary language of the observer or communicator. As globalization increases, it is essential to have a system of accurate translation between languages to further relationships in both the political and business spheres. Human translators require a great deal of training and knowledge of different cultures to translate accurately. This is a viable option in many circumstances, but internet sharing and communication have produced such an influx of information and documents in so many languages that a faster and more accessible option than human translators alone is needed. Automatic language translation, or machine translation (MT), is a leading technology in which computer programs analyze language structures and source texts to create a translation into a target language with little to no human interaction. Machine translation tools available on the internet, such as Babel Fish, Google Translate, Babylon, and StarDict, are only capable of giving rough translations without human editing, because the technology has yet to catch up with the complexity of many languages. Understanding how language systems work, and the related terminology, is the first step toward understanding how MT works. The most commonly used MT systems will then be described, with their advantages and disadvantages, to show where the technology still needs to advance. By highlighting the disadvantages of these MT systems and the complexity of translating languages in general, possible solutions will be explained for a future technology that is more accurate and requires less human interaction.
¹ Roughly 6,500 languages are spoken, but about 2,000 of those languages have fewer than 1,000 speakers.
Language and Translation Terminology
In order to understand how Machine Translation, or Automatic Language Translation, works, one must first understand how language is constructed and what the human translation process consists of. Human language generally consists of two main parts: a lexicon and a grammar, or set of rules. A lexicon is essentially the knowledge of words and their meanings. Grammar is the set of rules that allows those words to be combined into meaningful, coherent sentences. The translation process involves understanding, or decoding, the meaning of the source text both lexically and grammatically, and then re-coding this meaning into the target language. This re-coding must follow the lexicon and grammatical rules of the target language. The complexity of translation lies in the fact that many languages do not share similar grammar rules, and their lexicons do not map onto the same meanings. The grammatical properties that must be considered include word classes (nouns, verbs, adjectives, pronouns, prepositions, etc.), the functions of words, their case markings, and their gender. To understand a language, one must also know how it is structured and how its grammatical rules are applied. This requires in-depth knowledge of the culture the language comes from, so that there is an understanding of its semantics², syntax³, idioms⁴, and ambiguous words that only have meaning in context. Many languages have words with several meanings that can only be translated correctly given the context of the rest of the sentence. Given the complexity of translating a language, many human translators are only able to translate from a few source languages into even fewer target languages. To address this problem, MT has been evolving so as to reduce human involvement in translation and make accurate translation between hundreds of languages much easier and faster.
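As a rough illustration of how context decides the meaning of an ambiguous word, the Python sketch below chooses between two Spanish translations of the English word bank: banco (the financial sense) and orilla (the riverbank). The mini-lexicon, its cue words, and the function name translate_ambiguous are hypothetical, and counting shared context words is a deliberately simplified heuristic rather than the method of any real MT system.

```python
# Hypothetical mini-lexicon: each Spanish sense of the ambiguous English word
# "bank" is paired with context words that signal that sense.
SENSES = {
    "bank": [
        ("banco", {"money", "loan", "account", "deposit", "cash"}),
        ("orilla", {"river", "water", "fishing", "shore", "boat"}),
    ],
}


def translate_ambiguous(word: str, sentence: str) -> str:
    """Pick the sense whose cue words overlap most with the sentence."""
    context = set(sentence.lower().replace(".", "").split())
    # max() keeps the first candidate on ties, so the first listed sense is the default.
    best, _ = max(SENSES[word], key=lambda sense: len(sense[1] & context))
    return best


print(translate_ambiguous("bank", "She opened an account at the bank."))  # banco
print(translate_ambiguous("bank", "We went fishing on the bank of the river."))  # orilla
```

Real systems weigh far richer context, but the principle is the same: the surrounding words, not the ambiguous word alone, select the translation.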
Rule-Based Machine Translation
There are a few main MT systems used to translate language automatically, one of which is Rule-Based Machine Translation (RBMT). RBMT combines three different approaches to translation: transfer-based, interlingual, and dictionary-based machine translation. Interlingual and dictionary-based systems are used most often in RBMT, unless the source and target languages share no interlingual standard. Interlingual machine translation first translates the source language into an independent representation that is separate from any particular language. This standard, language-independent representation is then translated into the target language. If the source and target languages do not share an interlingual representation, the source text is instead translated into an intermediate representation of the meaning of the sentence, and from there it is transferred to the target language through dictionary-based translation. The difference between transfer-based and interlingual translation is that the interlingual system has a standard operational method, which rules out certain language pairings, while transfer-based translation uses only an intermediate representation of the source language. If a language pair has an interlingual translation, the overall RBMT will be more accurate. Once the source text has been translated through either the transfer-based or the interlingual method, that intermediate or independent representation goes through dictionary-based translation to produce the final target-language text. Dictionary-based translation is as simple as translating the source language word for word into the target language. If the source text were only dictionary-translated, without either transfer-based or interlingual translation, the result would miss all the grammatical, syntactic, and semantic rules, as well as the target language's idioms and morphemes⁵ (Wilks, 29). RBMT is well suited to consistent and predictable translations that follow the grammatical rules of the language pair. Its disadvantages include a lack of fluency, and its translations rarely catch exceptions to the rules of a given language (Ashraf, 28).
² The study of linguistic development by classifying and examining changes in meaning and form.
³ The study of the patterns of formation of sentences and phrases from words.
⁴ An expression whose meaning is not predictable from the usual meanings of its constituent elements, as kick the bucket or hang one's head, or from the general grammatical rules of a language, as the table round for the round table, and that is not a constituent of a larger expression of like characteristics.
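The difference between pure dictionary-based translation and a rule-based pipeline can be sketched in a few lines of Python. The example assumes a small hypothetical English-to-Spanish lexicon and a single transfer rule (English ADJ NOUN becomes Spanish NOUN ADJ); it follows the analysis, transfer, and dictionary-lookup steps described above, but it is a toy, not the design of any actual RBMT engine.

```python
# Hypothetical English -> Spanish lexicon: word -> (part of speech, translation).
LEXICON = {
    "the": ("DET", "el"),
    "red": ("ADJ", "rojo"),
    "big": ("ADJ", "grande"),
    "car": ("NOUN", "coche"),
    "dog": ("NOUN", "perro"),
}


def dictionary_only(sentence: str) -> str:
    """Word-for-word lookup: no grammar, so English word order is kept."""
    return " ".join(LEXICON[word][1] for word in sentence.split())


def transfer(tagged: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Assumed transfer rule: English ADJ NOUN becomes Spanish NOUN ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # move past the pair that was just swapped
        else:
            i += 1
    return out


def rule_based(sentence: str) -> str:
    # 1. Analysis: tag each source word with its part of speech.
    tagged = [(word, LEXICON[word][0]) for word in sentence.split()]
    # 2. Transfer: reorder the words to follow target-language grammar.
    reordered = transfer(tagged)
    # 3. Generation: dictionary-based lookup of each reordered word.
    return " ".join(LEXICON[word][1] for word, _ in reordered)


print(dictionary_only("the red car"))  # el rojo coche
print(rule_based("the red car"))       # el coche rojo
```

The sketch ignores gender and number agreement (it lists only masculine nouns so the article el stays correct), which is the kind of grammatical knowledge a real rule base must encode by hand.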
Statistical-Based Machine Translation
Another method of Machine Translation is Statistical-Based Machine Translation (SBMT), which analyzes words and sentences using statistics. This methodology uses statistical data drawn from a bilingual corpus to create a translation (Ashraf, 28). A text corpus is a very large, structured collection of stored and processed texts used for statistical analysis. These analyses are based on how words and phrases occur in each language and on the patterns that govern the source and target languages. However, SBMT can only work if there is enough data to support the language pair, that is, the source and target language: enough data about each language must be stored and analyzed to create a translation. “Building statistical translation models is a fast process, however, the innovation depends intensely on existing multilingual corpora. At least 2 million words for a particular space and considerably more for general dialect are needed” (Ashraf, 28). The problem with SBMT is that, for the system to be accurate, there must be an existing set of analyzed linguistic data, and processing that data is very CPU-dependent. SBMT works from a probability distribution: for a given source sentence, it chooses the target-language translation with the highest probability. In other words, a source text is translated according to how probable each candidate translation is in the target language, and the result is reliable only if analyzed data for the language pair is present and that probability is high enough. SBMT also requires the statistical data to fall within the domain of the translator's inquiry; if the language pair does not have enough data, the translation can be quite unpredictable. Another problem with SBMT is that the system does not actually know the grammatical, syntactic, or semantic rules of either language; it merely relies on large amounts of statistical information drawn from a bilingual corpus. The two great strengths of SBMT are that, given enough stored and analyzed data for the language pair, its translations from source to target language are very fluent and good at “catching exceptions to rules” (Ashraf, 28).
⁵ Any of the minimal grammatical units of a language, each constituting a word or meaningful part of a word, that cannot be divided into smaller independent grammatical parts, as the, write, or the -ed of waited.
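The probabilistic choice SBMT makes is often written in the standard noisy-channel form: for a source sentence f, choose the target sentence e that maximizes P(f | e) × P(e), where P(f | e) is a translation model and P(e) is a language model, both estimated from corpora. The Python sketch below applies that formula to the Spanish phrase casa blanca with three candidate English translations; every probability in it is invented for illustration, not taken from a real corpus.

```python
import math

# Invented translation-model scores P(f | e) for the Spanish source f = "casa blanca".
TRANSLATION_MODEL = {
    "white house": 0.4,
    "house white": 0.4,  # same words as above, so this model cannot rank the two orders
    "white home": 0.2,
}

# Invented bigram language model P(word | previous word) from English text.
BIGRAMS = {
    ("<s>", "white"): 0.2,
    ("<s>", "house"): 0.1,
    ("white", "house"): 0.3,
    ("white", "home"): 0.05,
    ("house", "white"): 0.01,
}
UNSEEN = 1e-6  # fallback probability for bigrams never observed


def lm_log_prob(sentence: str) -> float:
    """log P(e) under the bigram model, starting from a sentence-start marker."""
    words = ["<s>"] + sentence.split()
    return sum(math.log(BIGRAMS.get(pair, UNSEEN)) for pair in zip(words, words[1:]))


def best_translation(candidates: dict[str, float]) -> str:
    """Return argmax_e P(f | e) * P(e), computed as a sum of logs."""
    return max(candidates, key=lambda e: math.log(candidates[e]) + lm_log_prob(e))


print(best_translation(TRANSLATION_MODEL))  # white house
```

The translation model alone cannot separate white house from house white, since both use the same words; the language model, estimated only from target-language text, is what makes the fluent ordering win. This is one way to see why SBMT output tends to be fluent when enough data exists.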
Example-Based Machine Translation
Example-Based Machine Translation (EBMT) is very similar to SBMT in that it compares the two languages of a language pair. The difference is that EBMT performs no probability analysis; instead, it relies on sentences previously translated between the two languages. The bilingual corpus therefore serves as a source of analogies, and prior translated data must exist in the system before a new translation between source and target language can be created. EBMT works by analogy with previously translated material at the level of phrases rather than complete sentences (Daybelge, 296). To translate a complete sentence from the source to the target language, the system must first find previous translations of phrases within that sentence. Once these phrases are found, they are put together to form the full translation. The inherent problem with EBMT is that the server being used must hold a log of previously translated sentences that are similar to the desired translation. There must not only be many previous translations for the proposed language pair, but also translations whose meaning is very close to that of the new source text. For this reason, this system of Machine Translation is limited to the very few languages that can be accurately translated with it.
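A minimal sketch of the phrase-matching idea, assuming a hypothetical example base of four previously translated English-to-Spanish phrases and a simple greedy longest-match strategy (a simplification chosen for brevity, not the ranking method of Daybelge and Cicekli):

```python
from typing import Optional

# Hypothetical example base of previously translated English -> Spanish phrases.
EXAMPLES = {
    "i would like": "me gustaría",
    "a cup of coffee": "una taza de café",
    "a cup of tea": "una taza de té",
    "please": "por favor",
}


def ebmt_translate(sentence: str) -> Optional[str]:
    """Cover the sentence left to right with the longest stored phrases."""
    words = sentence.lower().split()
    output, i = [], 0
    while i < len(words):
        # Try the longest span starting at position i first.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in EXAMPLES:
                output.append(EXAMPLES[phrase])
                i = j
                break
        else:
            return None  # no stored example matches the words at position i
    return " ".join(output)


print(ebmt_translate("I would like a cup of tea please"))
# me gustaría una taza de té por favor
print(ebmt_translate("I would like a glass of water"))  # None
```

The second call returns None because no stored example covers a glass of water, which mirrors the limitation described above: without close prior translations, EBMT has nothing to build from.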
Hybrid Machine Translation
Each of the machine translation systems mentioned above has its advantages and disadvantages, which leaves much to be desired in the accuracy and knowledge of Machine Translation. This dissatisfaction with existing techniques has led to what is called Hybrid Machine Translation (HMT). HMT combines multiple machine translation systems into one engine, in the hope that each system will make up for the disadvantages of the others. The most common form of HMT combines RBMT and SBMT. One such method, called Statistical Rule Generation, first derives grammatical rules statistically from its database, provided it has enough information about the language pair. This statistical method aims to mimic a rule-based method of translation by analyzing the data and forming rules of grammar, syntax, etc. for the language pair. Unlike SBMT, this method follows the rules it has generated even when the translation between the pair is ambiguous, so its output often lacks fluency and misses exceptions to rules within languages. For this reason, this hybrid method can only create accurate translations if the rules of the two languages are similar or the languages share a close etymological background⁶.
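The following Python sketch illustrates the idea of generating a rule statistically and then applying it rigidly. The tiny aligned corpus, the majority-vote rule, and the names generate_rule and apply_rule are assumptions made for illustration; the source does not specify how Statistical Rule Generation derives its rules.

```python
from collections import Counter

# Hypothetical word-aligned corpus: English ADJ NOUN phrases, their Spanish
# translations, and the Spanish word order observed in each pair.
ALIGNED = [
    ("red car", "coche rojo", "NOUN-ADJ"),
    ("white house", "casa blanca", "NOUN-ADJ"),
    ("black cat", "gato negro", "NOUN-ADJ"),
    ("good man", "buen hombre", "ADJ-NOUN"),  # an exception to the usual order
]

# Hypothetical word dictionary used when the generated rule is applied.
WORDS = {"green": "verde", "tree": "árbol", "good": "buen", "man": "hombre"}


def generate_rule(corpus):
    """Derive a single ordering rule from corpus statistics by majority vote."""
    counts = Counter(order for _, _, order in corpus)
    return counts.most_common(1)[0][0]


def apply_rule(rule, adj, noun):
    """Apply the generated rule rigidly, with no room for exceptions."""
    a, n = WORDS[adj], WORDS[noun]
    return f"{n} {a}" if rule == "NOUN-ADJ" else f"{a} {n}"


rule = generate_rule(ALIGNED)
print(rule)                               # NOUN-ADJ
print(apply_rule(rule, "green", "tree"))  # árbol verde
print(apply_rule(rule, "good", "man"))    # hombre buen (ungrammatical)
```

Because the learned rule has no room for exceptions, it reorders good man into the ungrammatical hombre buen (the short form buen only occurs before the noun), even though the corpus itself contains the correct buen hombre.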
The most accurate form of hybrid machine translation is the multi-pass system. In this method the source text is processed multiple times through both RBMT and SBMT. The process starts with RBMT, a stage often called pre-processing, in which the source text is analyzed according to the rules of its language. The system then creates either an independent representation (interlingual) or, if the language pair does not share an interlingual representation, an intermediate understanding of the sentence (transfer). This rule-based translation is then passed through a statistical machine translation system that analyzes both the RBMT pre-processed translation and the information in its database about the original language pair. This spares the RBMT system from having to produce the final translation from its interlingual or transfer-based representation on its own, and allows two forms of statistical analysis to create a more accurate translation. The multi-pass system does require the RBMT and SBMT databases to work together coherently, which in turn requires more disk space and CPU usage. HMT may need more processing power and information to create automatic language translations, but combining the two most popular translation methods has produced the most reliable and accurate translations to date.
⁶ A chronological account of the birth and development of a particular word or element of a word, often delineating its spread from one language to another and its evolving changes in form and meaning.
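The two passes can be sketched in Python as follows, under heavily simplified assumptions: a hypothetical Spanish-to-English dictionary that lists several options per word, one transfer rule for the rule-based pass, and invented English bigram counts standing in for the statistical database. The rule-based pass fixes the word order; the statistical pass only chooses among the drafts it produces.

```python
from itertools import product

# Hypothetical Spanish -> English dictionary: word -> (part of speech, options).
DICTIONARY = {
    "casa": ("NOUN", ["house", "home"]),
    "grande": ("ADJ", ["big", "large", "great"]),
}

# Invented English bigram counts standing in for the statistical database.
BIGRAM_COUNTS = {
    ("big", "house"): 40,
    ("large", "house"): 25,
    ("big", "home"): 3,
    ("great", "house"): 2,
}


def rule_based_pass(sentence):
    """Pass 1: analysis and transfer, keeping every dictionary option."""
    tagged = [DICTIONARY[word] for word in sentence.split()]
    # Assumed transfer rule: Spanish NOUN ADJ becomes English ADJ NOUN.
    for i in range(len(tagged) - 1):
        if tagged[i][0] == "NOUN" and tagged[i + 1][0] == "ADJ":
            tagged[i], tagged[i + 1] = tagged[i + 1], tagged[i]
    # Expand every combination of word choices into a draft translation.
    return [list(choice) for choice in product(*(options for _, options in tagged))]


def statistical_pass(drafts):
    """Pass 2: keep the draft whose bigrams occur most often in the corpus."""
    def score(words):
        return sum(BIGRAM_COUNTS.get(pair, 0) for pair in zip(words, words[1:]))
    return " ".join(max(drafts, key=score))


drafts = rule_based_pass("casa grande")
print(len(drafts))               # 6
print(statistical_pass(drafts))  # big house
```

Splitting the work this way lets the rules guarantee the target-language word order while corpus statistics pick the most natural wording, roughly the division of labor described above.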
Conclusion
Machine translation and automatic language translation systems are becoming much more accurate, with less human involvement, as technology evolves and data collection continues to improve. The complexity of language and the number of different languages, dialects, and cultures in the world make automatic language translation very hard, but because communication between cultures matters so much for politics, business, and learning, there is a growing need for a system that helps people around the world connect. The ultimate goal is to have absolutely no human involvement in translating one language into another, whether text to text or speech to speech (Grap, 12.6). This would mean that any text scanned into a computer or found on the internet could be automatically translated into any language in the world, with the full meaning of the source language preserved in the target language. With this type of technology, humans would be able to bridge the communication gap between cultures, which would in turn advance the knowledge and understanding of the human species in general.
Bibliography
Ashraf, Neeha, and Manzoor Ahmad. "Machine Translation Techniques and Their Comparative Study." International Journal of Computer Applications 125.7 (2015): 25-31. Web.
Daybelge, Turhan, and Ilyas Cicekli. "A Ranking Method for Example Based Machine Translation Results by Learning from User Feedback." Applied Intelligence 35.2 (2010): 296-321. Web.
Grap, Hannah. "Automated language translation ... a solution to public sector communication requirements." Summit Magazine Sept. 2009. General OneFile. Web. 4 May 2016.
Park, Eun-Jin, Oh-Woog Kwon, Kangil Kim, and Young-Kil Kim. "Classification-Based Approach for Hybridizing Statistical and Rule-Based Machine Translation." ETRI Journal 37.3 (2015): 541-50. Web.
Wilks, Yorick. Machine Translation: Its Scope and Limits. New York: Springer, 2008. Print.

Mais conteúdo relacionado

Mais procurados

INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
kevig
 
Script to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latestScript to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latest
Jaganadh Gopinadhan
 

Mais procurados (20)

Hindi –tamil text translation
Hindi –tamil text translationHindi –tamil text translation
Hindi –tamil text translation
 
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATIONA ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
 
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
 
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
A New Approach: Automatically Identify Naming Word from Bengali Sentence for ...
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
NLP
NLPNLP
NLP
 
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
 
Script to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latestScript to Sentiment : on future of Language TechnologyMysore latest
Script to Sentiment : on future of Language TechnologyMysore latest
 
Nlp
NlpNlp
Nlp
 
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
 
Machine Translation
Machine TranslationMachine Translation
Machine Translation
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Hidden markov model based part of speech tagger for sinhala language
Hidden markov model based part of speech tagger for sinhala languageHidden markov model based part of speech tagger for sinhala language
Hidden markov model based part of speech tagger for sinhala language
 
Natural Language Processing glossary for Coders
Natural Language Processing glossary for CodersNatural Language Processing glossary for Coders
Natural Language Processing glossary for Coders
 
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
International Journal on Natural Language Computing (IJNLC) Vol. 4, No.2,Apri...
 
Machine translation
Machine translationMachine translation
Machine translation
 
Translation
TranslationTranslation
Translation
 
17dk0601652nd
17dk0601652nd17dk0601652nd
17dk0601652nd
 

Semelhante a ReseachPaper

Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorDynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Waqas Tariq
 
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
kevig
 
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliterator
Shashank Shisodia
 

Semelhante a ReseachPaper (20)

Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
Applying Rule-Based Maximum Matching Approach for Verb Phrase Identification ...
 
Design and Development of Morphological Analyzer for Tigrigna Verbs using Hyb...
Design and Development of Morphological Analyzer for Tigrigna Verbs using Hyb...Design and Development of Morphological Analyzer for Tigrigna Verbs using Hyb...
Design and Development of Morphological Analyzer for Tigrigna Verbs using Hyb...
 
Building of Database for English-Azerbaijani Machine Translation Expert System
Building of Database for English-Azerbaijani Machine Translation Expert SystemBuilding of Database for English-Azerbaijani Machine Translation Expert System
Building of Database for English-Azerbaijani Machine Translation Expert System
 
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISHHANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
HANDLING CHALLENGES IN RULE BASED MACHINE TRANSLATION FROM MARATHI TO ENGLISH
 
DESIGN AND DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR TIGRIGNA VERBS USING HYB...
DESIGN AND DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR TIGRIGNA VERBS USING HYB...DESIGN AND DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR TIGRIGNA VERBS USING HYB...
DESIGN AND DEVELOPMENT OF MORPHOLOGICAL ANALYZER FOR TIGRIGNA VERBS USING HYB...
 
Machine Translation Approaches and Design Aspects
Machine Translation Approaches and Design AspectsMachine Translation Approaches and Design Aspects
Machine Translation Approaches and Design Aspects
 
Pxc3898474
Pxc3898474Pxc3898474
Pxc3898474
 
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text EditorDynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
Dynamic Construction of Telugu Speech Corpus for Voice Enabled Text Editor
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
 
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGESA SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
 
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
CONSTRUCTION OF AMHARIC-ARABIC PARALLEL TEXT CORPUS FOR NEURAL MACHINE TRANSL...
 
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
Construction of Amharic-arabic Parallel Text Corpus for Neural Machine Transl...
 
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
 
B0340710
B0340710B0340710
B0340710
 
SMT3
SMT3SMT3
SMT3
 
Implementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large DictionaryImplementation of Marathi Language Speech Databases for Large Dictionary
Implementation of Marathi Language Speech Databases for Large Dictionary
 
Substitution Error Analysis for Improving the Word Accuracy in Telugu Langua...
Substitution Error Analysis for Improving the Word Accuracy in  Telugu Langua...Substitution Error Analysis for Improving the Word Accuracy in  Telugu Langua...
Substitution Error Analysis for Improving the Word Accuracy in Telugu Langua...
 
E-Translation
E-TranslationE-Translation
E-Translation
 
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliterator
 

ReseachPaper

  • 1. Ryan Turner [COM-4450.001 | [ORTIZ] Machine Translation FROM HUMAN TRANSLATION TO AUTOMATIC LANGUAGE TRANSLATION
  • 2. 2 Introduction Language is a sophisticated usage of symbols and oral speech used by the human species to form a complex system of communication among one another. This ability is particular to the human species as no other animal on the planet has the capability to create such a complex system of communication and understanding. Language has created and defined what we know as human culture and society. “In annuals of Anthropology, language is considered as a primary tool for studying the culture of a civilization, what we speak influences what we think, what we feel and what we believe.” (Ashraf, 25). The importance of language to any given culture is immense and as there are over 5000 different dialects and languages1 in the world there must be a system of understanding to not only communicate with different cultures but to also understand their history. This means that the only way to interact with other cultures is to understand and translate a vast amount of languages to whatever the primary language of the observer or communicator understands. As globalization increases it is efficacious that there is a system of accurate translations between languages to further the relationships of both the political and business specters. Human translators require a lot of training and knowledge of different cultures to be able to translate accurately. This is a viable option in many circumstances but with such an influx of information and documents in so many languages due to the advancement of internet sharing and communicating, there needs to be a faster and more easily accessible option then just human translators. Automatic language translation systems or machine translation (MT) is a leading technology of which computer programs analyze language structures and source texts to create a translation to a target language with little to no human interaction. Such machine 1 Roughly 6,500 spoken languages but about 2,000 of those languages fewer than 1,000 speakers
  • 3. 3 translation tools available on the internet are Babel Fish, Google Translate, Babylon and StarDict which are only capable of giving rough translations without human editing. This is because the technology has still yet to advance to the complexity of many different languages. Understanding how language systems work and the related terminology is the first thing to understanding how MT works. Then the most commonly used MT systems will be described with their advantages and disadvantages in translation so that there can be a better understanding where the technology still needs to advance. By highlighting the disadvantages of these MT systems and the complexity of translating languages in general, possible solutions will by explained for a future technology that is more accurate, with less human interaction. Language and Translation Terminology In order to understand how Machine Translation or Automatic Language Translation works it must be understood how language is constructed and what the human translation process consists of. Human language generally consists of two main parts: a lexicon and a form of grammar or set of rules. A lexicon basically is the knowledge of words and knowing the meaning of such words. Grammar is a set of rules that allow human language to combine those words from the knowledge of lexicons into a meaningful or coherent sentences. The translation process involves understanding or decoding the meaning of the source text, both lexically and grammatically, and then re-coding this meaning into the target language. This re-coding of the language must follow the lexical understanding and grammatical rules of that target language. The complexity in translation lies in the fact that many languages do not have similar grammar rules or lexicons that follow the same meanings. Grammar rules that must be considered are that of the types of words (nouns, verbs, adjectives, pronouns, prepositions, etc.), functions of the
  • 4. 4 words, case markings of the words and finally the gender of the words. In order to understand a language one must also know how a language is structured and how the rules of grammar are applied. For one to do this there must be an in-depth knowledge of the culture the language comes from so that there is an understanding of its semantics2 , syntax3 , idioms4 and ambiguous words that only have meaning given context. Many languages have one word with many meanings but can only be translated given the context of the rest of the sentence. Given the complexity of translating a language, many human translators are only able to translate few source languages into even fewer target languages. For this problem MT has been evolving and advancing so as to mediate the human involvement in translation and make it much easier and faster for accurate translations of hundreds of languages. Rule-Based Machine Translation There are a few main MT systems used to automatically translate language, one of them is called Rule-Based Machine Translation (RBMT). RBMT is a combination of three different systems of translation which include transfer-based, interlingual and dictionary based machine translation. Interlingual and dictionary based machine translation systems are often used the most in RBTM unless the target language from the source has no interlingual standard. Interlingual machine translation originally translates the source language into an independent language separate from any other language. 
Then, from this standard independent language, it is transferred into the target language. If the source and target languages do not share an interlingual, independent language, the source language is instead translated into an intermediate representation of the meaning of the sentence (transfer-based translation). From there it is transferred to the target language through dictionary-based translation. The difference between transfer-based and interlingual translation is that the interlingual system has a standard operational method, which prohibits certain language pairings, while transfer-based translation uses only an intermediate representation of the source language. If the language pair has an interlingual translation, the overall RBMT will be more accurate. Once the source language has been translated through either transfer-based or interlingual translation, the intermediate or independent language then goes through dictionary-based translation to produce the final text in the target language. Dictionary-based translation simply translates the source language word for word into the target language. If the source language were only dictionary-translated, without transfer-based or interlingual translation, the translation would miss all the grammatical, syntactic and semantic rules as well as the target language's idioms and morphemes5 (Wilks, 29). RBMT is good at producing consistent and predictable translations that follow the grammatical rules of the language pair. The disadvantages of RBMT are a lack of fluency and the fact that its translations rarely catch exceptions to rules in any given language (Ashraf, 28).
2 The study of linguistic development by classifying and examining changes in meaning and form.
3 The study of the patterns of formation of sentences and phrases from words.
4 An expression whose meaning is not predictable from the usual meanings of its constituent elements, as kick the bucket or hang one's head, or from the general grammatical rules of a language, as the table round for the round table, and that is not a constituent of a larger expression of like characteristics.
Statistical-Based Machine Translation
Another method of Machine Translation is Statistical-Based Machine Translation (SBMT), which analyzes words and sentences using statistics.
This is a methodology that uses statistical data drawn from bilingual corpora to create a translation (Ashraf, 28). A text corpus is a very large, structured set of stored and processed texts used for statistical analysis. These statistical analyses are based on occurrences in the language and on the rules applied to the source and target languages. However, SBMT is only possible if there is enough data to support the language pair, both source and target. This means that enough data on each language must be stored and analyzed to create a translation. “Building statistical translation models is a fast process, however, the innovation depends intensely on existing multilingual corpora. At least 2 million words for a particular space and considerably more for general dialect are needed” (Ashraf, 28). The problem with SBMT is that, for its translations to be accurate, it needs an existing set of analyzed linguistic data, and processing that data is very CPU-dependent. SBMT works from a probability distribution: a source sentence is translated into the target-language sentence that is most probable given the source, if and only if data for the language pair is available and the probability is high enough. SBMT requires the statistical data to be within the domain of the translator's inquiry, and if the language pair does not have enough data the translation can be quite unpredictable. Another problem with SBMT is that the system does not actually know the grammatical, syntactic, semantic or other rules of the languages, but relies entirely on large amounts of statistical information drawn from bilingual corpora. The two great strengths of SBMT are that, given enough stored and analyzed data for the language pair, its translations are very fluent and good at “catching exceptions to rules” (Ashraf, 28).
5 Any of the minimal grammatical units of a language, each constituting a word or meaningful part of a word, that cannot be divided into smaller independent grammatical parts, as the, write, or the -ed of waited.
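The probability-based idea above can be made concrete with a small sketch. It uses IBM Model 1, a classic expectation-maximization (EM) method from early statistical MT, to estimate how likely each English word is to translate each Spanish word, and then translates word by word using the most probable choice. This is an illustrative stand-in, not the method used by any tool named in this paper: the three-sentence corpus and the function names are hypothetical, and real SBMT systems add phrase tables, reordering and target-language models trained on millions of words.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: Spanish source sentences with English translations.
CORPUS = [
    ("el perro", "the dog"),
    ("el libro", "the book"),
    ("un libro", "a book"),
]


def train_ibm_model1(corpus, iterations=20):
    """Estimate word-translation probabilities t(e | f) with EM (IBM Model 1, no NULL word)."""
    pairs = [(f.split(), e.split()) for f, e in corpus]
    e_vocab = {e for _, es in pairs for e in es}
    # Start with every target word equally likely for every source word.
    t = defaultdict(lambda: 1.0 / len(e_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(e, f)
        total = defaultdict(float)  # expected counts c(f)
        # E-step: spread each target word over the source words it might align to.
        for fs, es in pairs:
            for e in es:
                z = sum(t[(e, f)] for f in fs)
                for f in fs:
                    share = t[(e, f)] / z
                    count[(e, f)] += share
                    total[f] += share
        # M-step: re-estimate t(e | f) from the expected counts.
        t = defaultdict(float, {(e, f): c / total[f] for (e, f), c in count.items()})
    return t


def translate(sentence, t):
    """Translate word by word, choosing the most probable target word for each source word."""
    output = []
    for f in sentence.split():
        candidates = {e: p for (e, src), p in t.items() if src == f}
        # Source words never seen in the corpus are copied through unchanged.
        output.append(max(candidates, key=candidates.get) if candidates else f)
    return " ".join(output)


if __name__ == "__main__":
    t = train_ibm_model1(CORPUS)
    print(translate("un perro", t))  # prints: a dog (a pairing never seen in CORPUS)
```

Even though "un perro" never appears in the toy corpus, the learned probabilities let the sketch produce "a dog". It also shows the weaknesses described above: the model has no grammar, so it cannot reorder words, and a word missing from its data (such as "gato") is simply copied through untranslated.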
Example-Based Machine Translation
Example-Based Machine Translation (EBMT) is very similar to SBMT in that it compares the two languages of the pair. The difference is that EBMT performs no probability analysis; instead, it relies on sentences previously translated between the language pair. The bilingual corpus therefore works by analogy, and previously translated data must already exist in the system before a new translation between the source and target languages can be created. EBMT draws these analogies from previously translated phrases rather than complete sentences (Daybelge, 296). When translating a complete sentence from the source to the target language, the system must first find previous translations of phrases within the sentence. Once these phrases are found, they are put together to form the full translation. The inherent problem with EBMT is that the server being used must hold a log of previously translated sentences that are similar to the intended translation. In other words, there must not only be many previous translations between the language pair, but also translations whose meaning is very close to that of the new source text. For this reason, EBMT can accurately translate only a limited number of languages.
Hybrid Machine Translation
All of the machine translation systems mentioned above have advantages and disadvantages, which leaves much to be desired when it comes to the accuracy and
knowledge of machine translation. This dissatisfaction with existing translation techniques has led to what is called Hybrid Machine Translation (HMT) (). HMT combines multiple machine translation systems into one engine, in the hope that each system will make up for the disadvantages of the others. The most common HMT methods combine RBMT and SBMT. One such method, Statistical Rule Generation, first performs a statistical analysis of the grammatical rules within its database, provided it has enough information on the language pair. This statistical method aims to mimic a rule-based method by analyzing data and thereby forming the grammar, syntax and other rules for the language pair. Unlike SBMT, this method follows the rules it has generated even when the translation is ambiguous, which often leads to a lack of fluency and missed exceptions to rules. For this reason, this hybrid method can only create accurate translations if the rules of the two languages are similar or the languages share a close etymological background6.
The most accurate hybrid machine translation approach is the multi-pass system. In this method the source language is processed multiple times, through both RBMT and SBMT. The process starts with RBMT, often called the pre-process, in which the source language is analyzed for all the rules common to that language. It then creates either an independent language (interlingual) or, if the language pair does not share an interlingua, an intermediate representation of the sentence (transfer-based). This rule-based translation is then passed through a statistical machine translation system that analyzes both the RBMT pre-processed translation and the information in its database on the original language pair.
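The two passes can be illustrated with a greatly simplified sketch. The first pass is rule-based: a hand-written rule moves an adjective in front of the noun it follows (Spanish noun-adjective order to English adjective-noun order), and a small dictionary supplies every possible sense of each word. The second pass is statistical: a bigram language model with add-one smoothing, trained on a few English sentences, picks the most fluent candidate. The lexicon, tags, corpus and function names are all hypothetical, and reducing the statistical stage to a language model is a simplification; real multi-pass systems use full statistical translation models trained on large bilingual corpora.

```python
import itertools
import math
from collections import Counter

# Hypothetical part-of-speech tags and bilingual lexicon (Spanish -> English senses).
TAGS = {"el": "DET", "banco": "NOUN", "verde": "ADJ"}
LEXICON = {"el": ["the"], "banco": ["bank", "bench"], "verde": ["green"]}

# Hypothetical English text standing in for the statistical system's database.
TARGET_CORPUS = [
    "the bank is open",
    "we sat on the green bench",
    "the green bench is old",
]


def rule_based_pass(sentence):
    """First pass: apply a reordering rule, then expand every dictionary sense."""
    words = sentence.split()
    i = 0
    while i < len(words) - 1:
        # Rule: Spanish NOUN ADJ becomes English ADJ NOUN.
        if TAGS.get(words[i]) == "NOUN" and TAGS.get(words[i + 1]) == "ADJ":
            words[i], words[i + 1] = words[i + 1], words[i]
            i += 2
        else:
            i += 1
    # Keep all senses so the statistical pass can choose among them.
    options = [LEXICON.get(w, [w]) for w in words]
    return [" ".join(choice) for choice in itertools.product(*options)]


def train_bigram_lm(sentences):
    """Return a function giving the add-one-smoothed bigram log-probability of a sentence."""
    bigrams, history, vocab = Counter(), Counter(), set()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            history[prev] += 1
    v = len(vocab)

    def log_prob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return sum(
            math.log((bigrams[(p, c)] + 1) / (history[p] + v))
            for p, c in zip(tokens, tokens[1:])
        )

    return log_prob


def hybrid_translate(sentence, log_prob):
    """Second pass: the statistical model picks the most fluent rule-based candidate."""
    return max(rule_based_pass(sentence), key=log_prob)


if __name__ == "__main__":
    lm = train_bigram_lm(TARGET_CORPUS)
    print(hybrid_translate("el banco", lm))
    print(hybrid_translate("el banco verde", lm))
```

With this toy data the rule-based pass alone cannot decide between the two dictionary senses of "banco"; the statistical pass chooses "bank" for "el banco" but "bench" for "el banco verde", because "green bench" occurs in its training text.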
Passing the RBMT output through this statistical stage means that the RBMT system does not have to complete the translation from the interlingual or transfer-based representation on its own, and it allows two forms of statistical analysis to produce a more accurate translation. This multi-pass system requires the RBMT and SBMT databases to work together, which in turn requires more disk space and CPU usage. HMT may need more processing power and information to create automatic translations, but combining the two most popular translation methods has produced the most reliable and accurate translations to date ().
6 A chronological account of the birth and development of a particular word or element of a word, often delineating its spread from one language to another and its evolving changes in form and meaning.
Conclusion
Machine translation and automatic language translation systems are becoming much more accurate, with less human involvement, as technology evolves and data collection improves. The complexity of language and the number of different languages, dialects and cultures in the world make automatic translation very hard, but because communication between cultures, in politics, in business and among learners is so important, it has become necessary to create a system that helps people around the world connect. The ultimate goal is to have no human involvement at all in translating one language into another, whether text to text or speech to speech (Grap, 12.6). This would mean that any text scanned into a computer or found on the internet could be translated automatically into any language in the world, with the full meaning of the source language carried into the target language. With this type of technology, humans could bridge the communication gap between cultures, which would advance the knowledge and understanding of the human species in general.
Bibliography
Ashraf, Neeha, and Manzoor Ahmad. "Machine Translation Techniques and Their Comparative Study." International Journal of Computer Applications 125.7 (2015): 25-31. Web.
Daybelge, Turhan, and Ilyas Cicekli. "A Ranking Method for Example Based Machine Translation Results by Learning from User Feedback." Applied Intelligence 35.2 (2010): 296-321. Web.
Grap, Hannah. "Automated language translation ... a solution to public sector communication requirements." Summit Magazine Sept. 2009. General OneFile. Web. 4 May 2016.
Park, Eun-Jin, Oh-Woog Kwon, Kangil Kim, and Young-Kil Kim. "Classification-Based Approach for Hybridizing Statistical and Rule-Based Machine Translation." ETRI Journal 37.3 (2015): 541-50. Web.
Wilks, Yorick. Machine Translation: Its Scope and Limits. New York: Springer, 2008. Print.