ReseachPaper

Ryan Turner
COM-4450.001 | ORTIZ
Machine Translation
FROM HUMAN TRANSLATION TO AUTOMATIC
LANGUAGE TRANSLATION

Introduction
Language is a sophisticated system of symbols and speech that humans use to communicate
with one another. This ability is particular to the human species, as no other animal on the
planet has the capability to create such a complex system of communication and
understanding. Language has created and defined what we know as human culture and society.
“In annuals of Anthropology, language is considered as a primary tool for studying the culture
of a civilization, what we speak influences what we think, what we feel and what we believe”
(Ashraf, 25). The importance of language to any given culture is
immense, and because there are over 5,000 different dialects and languages¹ in the world, there
must be a system of understanding that makes it possible not only to communicate with different
cultures but also to understand their history. The only way to interact with other cultures is to
understand their languages and translate them into the primary language of the observer or
communicator. As globalization increases, it is essential that there be a system of accurate
translation between languages to further relationships in both the political and business
sectors. Human translators require a great deal of training and knowledge of different cultures
to be able to translate accurately. This is a viable option in many circumstances, but with such
an influx of information and documents in so many languages due to the advancement of
internet sharing and communication, there needs to be a faster and more easily accessible
option than human translators alone. Automatic language translation, or machine translation
(MT), is a technology in which computer programs analyze language structures and source texts to
create a translation into a target language with little to no human interaction. Machine
translation tools available on the internet include Babel Fish, Google Translate, Babylon and
StarDict, which are only capable of giving rough translations without human editing. This is
because the technology has yet to catch up with the complexity of many different languages.
Understanding how language systems work, along with the related terminology, is the first step
toward understanding how MT works. The most commonly used MT systems will then be
described, with their advantages and disadvantages in translation, so that it is clearer where the
technology still needs to advance. By highlighting the disadvantages of these MT systems and
the complexity of translating languages in general, possible solutions will be explained for a
future technology that is more accurate and requires less human interaction.

¹ Roughly 6,500 spoken languages, but about 2,000 of those languages have fewer than 1,000 speakers.
Language and Translation Terminology
To understand how machine translation, or automatic language translation, works, one must
first understand how language is constructed and what the human translation process consists
of. Human language generally consists of two main parts: a lexicon and a grammar, or set of
rules. A lexicon is the knowledge of words and their meanings. Grammar is the set of rules that
allows a speaker to combine words from the lexicon into meaningful, coherent sentences. The
translation process involves decoding the meaning of the source text, both lexically and
grammatically, and then re-coding this meaning into the target language. This re-coding must
follow the lexicon and grammatical rules of the target language. The complexity of translation
lies in the fact that many languages do not have similar grammar
rules or lexicons whose words carry the same meanings. The grammar rules that must be
considered include the types of words (nouns, verbs, adjectives, pronouns, prepositions, etc.),
the functions of the words, their case markings and, finally, their gender. To understand a
language, one must also know how it is structured and how its rules of grammar are applied.
This requires in-depth knowledge of the culture the language comes from, so that there is an
understanding of its semantics², syntax³, idioms⁴ and ambiguous words that only have meaning
in context. Many languages have single words with many meanings, which can only be
translated given the context of the rest of the sentence. Given the complexity of translating a
language, many human translators are only able to translate a few source languages into even
fewer target languages. To address this problem, MT has been evolving to reduce the human
involvement in translation and to make accurate translation across hundreds of languages much
easier and faster.

² The study of linguistic development by classifying and examining changes in meaning and form.
³ The study of the patterns of formation of sentences and phrases from words.
⁴ An expression whose meaning is not predictable from the usual meanings of its constituent elements, as kick the bucket or hang one's head, or from the general grammatical rules of a language, as the table round for the round table, and that is not a constituent of a larger expression of like characteristics.

Rule-Based Machine Translation
There are a few main MT systems used to translate language automatically. One of them is
Rule-Based Machine Translation (RBMT). RBMT is a combination of three different systems of
translation: transfer-based, interlingual and dictionary-based machine translation. Interlingual
and dictionary-based machine translation are used most often in RBMT, unless the language
pairing has no interlingual standard. Interlingual machine translation first translates the source
language into an independent language separate from any other language. From this standard
independent language it is then transferred to a translation in the target language. If the source
language and the target language do not have
or share the interlingual, independent language, then the source language is first translated into
an intermediate representation of the meaning of the sentence. From here it is transferred to the
target language through dictionary-based translation. The difference between transfer-based and
interlingual translation is that the interlingual system has a standard operational method, which
rules out certain language pairings, while transfer-based translation uses only an intermediate
representation of the source language. If the language pairing has an interlingual translation, the
overall RBMT will be more accurate. Once the source language has been translated through
either the transfer-based or the interlingual method, the intermediate or independent language
goes through dictionary-based translation to produce the final text in the target language.
Dictionary-based translation is as simple as translating the source language word for word into
the target language. If the source language were translated by dictionary alone, without either
transfer-based or interlingual translation, the translation would miss all the grammatical,
syntactic and semantic rules, as well as the target language’s idioms and morphemes⁵ (Wilks,
29). RBMT is good for consistent and predictable translations that follow the grammatical rules
of the language pairing. Its disadvantages include a lack of fluency and the fact that its
translations rarely catch exceptions to the rules of any given language (Ashraf, 28).
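The gap between dictionary-only translation and translation that first applies a structural rule can be sketched in a few lines of Python. This is only a toy illustration: the English-to-Spanish lexicon, the word classes and the single reordering rule are hypothetical and cover nothing beyond the example sentence.

```python
# Toy illustration only: the lexicon and the single reordering rule are
# hypothetical and cover just the words used in this example.
LEXICON = {"the": "el", "red": "rojo", "car": "coche"}
ADJECTIVES = {"red"}
NOUNS = {"car"}


def dictionary_translate(sentence):
    """Word-for-word lookup; unknown words are passed through unchanged."""
    return " ".join(LEXICON.get(word, word) for word in sentence.lower().split())


def transfer_translate(sentence):
    """Apply one structural rule (adjective + noun -> noun + adjective), then look words up."""
    words = sentence.lower().split()
    reordered = []
    i = 0
    while i < len(words):
        if i + 1 < len(words) and words[i] in ADJECTIVES and words[i + 1] in NOUNS:
            reordered.extend([words[i + 1], words[i]])  # swap the adjective-noun pair
            i += 2
        else:
            reordered.append(words[i])
            i += 1
    return " ".join(LEXICON.get(word, word) for word in reordered)


print(dictionary_translate("the red car"))  # el rojo coche
print(transfer_translate("the red car"))    # el coche rojo
```

The word-for-word output keeps English word order, while the version with one hand-written rule places the adjective after the noun. A real RBMT system would need many such rules for every language pairing, which is consistent with the predictability and the missed exceptions described above.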

⁵ Any of the minimal grammatical units of a language, each constituting a word or meaningful part of a word, that cannot be divided into smaller independent grammatical parts, as the, write, or the -ed of waited.

Statistical-Based Machine Translation
Another method of machine translation is Statistical-Based Machine Translation (SBMT),
which analyzes words and sentences using a system of statistics. It is a methodology that uses
statistical data to create a translation, drawing on bilingual
corpora (Ashraf, 28). A text corpus is a very large, structured set of stored and processed texts
used for statistical analysis. These statistical analyses are based on the occurrences of words
and phrases in each language and on the rules that apply to the source and target languages.
The SBMT method can only work, however, if there is enough data to support both languages
in the pairing, source and target. This means that enough data concerning each language must
be stored and analyzed to create a translation. “Building statistical translation models is a fast
process, however, the innovation depends intensely on existing multilingual corpora. At least 2
million words for a particular space and considerably more for general dialect are needed”
(Ashraf, 28). The problem with SBMT is that, to be very accurate, it needs an existing set of
analyzed linguistic data, and processing that data is very CPU-dependent. SBMT works from a
probability distribution: for a given source text, it selects the target-language text that has the
highest probability of carrying the same meaning. In other words, through statistical analysis the
source text is translated according to the probability that a candidate translation occurs in the
target language, but only if analyzed data for the language pairing is present and the probability
is high enough. SBMT requires the statistical data to be within the domain of the translator’s
inquiry, and if the language pairing does not have enough data the translation can be quite
unpredictable. Another problem with SBMT is that the system does not actually know the
grammatical, syntactic or semantic rules of either language but relies merely on large amounts
of statistical information from a bilingual corpus. The two great strengths of SBMT are that,
given enough stored and analyzed information about the language pairing, its translations from
source to target language are very fluent and good at “catching exceptions to rules”
(Ashraf, 28).
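The idea of choosing a translation by probability, and only when enough data exists, can be sketched as follows. This is a minimal sketch under stated assumptions: the table of probabilities and the cut-off value are invented for illustration, standing in for figures that a real system would estimate from a large bilingual corpus.

```python
# Hypothetical probabilities, as if estimated from a bilingual corpus.
# Each source word maps to candidate target words with P(target | source).
TRANSLATION_TABLE = {
    "bank": {"banco": 0.7, "orilla": 0.3},
    "river": {"río": 0.9, "arroyo": 0.1},
}
MIN_PROBABILITY = 0.5  # assumed cut-off for "high enough"


def translate_word(word):
    """Return the most probable target word, or None when data is missing or too weak."""
    candidates = TRANSLATION_TABLE.get(word)
    if not candidates:
        return None  # the corpus holds no data for this word
    best = max(candidates, key=candidates.get)
    return best if candidates[best] >= MIN_PROBABILITY else None


print(translate_word("bank"))   # banco
print(translate_word("cloud"))  # None
```

Note that the sketch returns "banco" for "bank" whether the sentence is about money or about a river. It has no grammatical or semantic rules to consult, only counts, which mirrors the limitation described above.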

Example-Based Machine Translation
Example-Based Machine Translation (EBMT) is very similar to SBMT in that it
compares the language pairings of the translation. The difference is that the EBMT system
performs no probability analysis but instead relies on previously translated sentences between
the language pairing. The bilingual corpus serves as a source of analogies, so prior translated
data must exist in the system before a new translation between source and target language can
be created. The EBMT system works by analogy with previously translated language pairings,
matching phrases rather than complete sentences (Daybelge, 296). When translating a complete
sentence from source to target language, there must first be data from a previous translation of
certain phrases within the sentence. Once these phrases are found in the system, they are put
together to form the full translation from source to target language. The inherent problem with
EBMT is that there has to be a data log, within the server being used, that holds previously
translated sentences similar to the translation being sought. That is, there must not only be
many previous translations between the proposed language pairing, but also translations
between the two languages that are very close in meaning to the text being translated. For this
reason, this system of machine translation is limited to the very few languages that it can
translate accurately.
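The phrase-matching step described above can be sketched with a small store of earlier translations. The stored phrase pairs are hypothetical, and the greedy longest-match strategy is an assumption made for the sake of a short example, not the method of any particular EBMT system.

```python
# Hypothetical translation memory: phrases that were translated before.
EXAMPLES = {
    ("good", "morning"): "buenos días",
    ("my", "friend"): "mi amigo",
    ("thank", "you"): "gracias",
}
MAX_PHRASE_LEN = max(len(phrase) for phrase in EXAMPLES)


def translate_by_example(sentence):
    """Greedy longest-match over stored phrases; None if any part has no prior example."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest stored phrase first, then shorter ones.
        for length in range(min(MAX_PHRASE_LEN, len(words) - i), 0, -1):
            phrase = tuple(words[i:i + length])
            if phrase in EXAMPLES:
                output.append(EXAMPLES[phrase])
                i += length
                break
        else:
            return None  # no stored example covers this part of the sentence
    return " ".join(output)


print(translate_by_example("good morning my friend"))  # buenos días mi amigo
print(translate_by_example("good evening my friend"))  # None
```

The second call fails because no earlier translation contains "good evening", which illustrates why the approach depends so heavily on having similar prior translations on hand.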
Hybrid Machine Translation
Each of the machine translation systems described above has its advantages and
disadvantages, which leaves much to be desired when it comes to the accuracy and
knowledge of machine translation. This dissatisfaction with individual translation techniques
has led to what is called Hybrid Machine Translation (HMT) (). HMT is the technique of
combining multiple machine translation systems into one engine, in the hope that each system
makes up for the disadvantages of the other. The most common HMT methods combine RBMT
and SBMT. One such method, Statistical Rule Generation, first performs a statistical analysis of
the grammatical rules in its database, provided it has enough information on the language
pairing. This statistical method aims to mimic a rule-based method of translation by analyzing
the data and deriving from it rules of grammar, syntax, etc. for the language pairing. Unlike
SBMT, this method follows the rules it has generated even in cases of ambiguity in the
language pairing, so its translations often lack fluency and miss exceptions to the rules of the
languages. For this reason, this hybrid method is only capable of creating accurate translations
if the rules of the two languages are similar or the languages share a close etymological
background⁶.

⁶ A chronological account of the birth and development of a particular word or element of a word, often delineating its spread from one language to another and its evolving changes in form and meaning.

The most accurate hybrid machine translation method is the Multi-pass system of
translation. In this method the source language is processed multiple times through both RBMT
and SBMT. The process starts with the RBMT stage, often called the pre-process, in which the
source language is analyzed for all of the rules common to that language. It then creates either
an independent language (interlingual) or, if the language pairing does not share an interlingual,
an intermediate understanding of the sentence (transfer). This rule-based translation is then
passed through a statistical machine translation system that analyzes both the RBMT
pre-processed translation and the information in its database on the original language pairing.
This relieves the RBMT system of having to translate out of the interlingual or transfer-based
representation and allows two forms of statistical analysis to create a
more accurate translation. The Multi-pass system of machine translation does require the
databases of both RBMT and SBMT to work together, which in turn requires more disk space
and CPU usage. HMT may need more processing power and information to create automatic
language translations, but combining the two most popular methods of translation has allowed
for the most reliable and accurate translations to date ().
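A heavily simplified sketch of the two-stage flow described above is given below. It assumes a rule-based stage that only proposes dictionary-approved candidates for each word and a statistical stage that chooses among them using counts of adjacent word pairs. The lexicon, the counts and the function names are all hypothetical, and the interlingual or transfer representation of a full RBMT stage is omitted.

```python
# Stage 1 data (rule-based): hypothetical lexicon listing every allowed translation.
LEXICON = {"the": ["el"], "bank": ["banco", "orilla"], "closes": ["cierra"]}

# Stage 2 data (statistical): hypothetical counts of adjacent word pairs
# as they might be observed in a target-language corpus.
BIGRAM_COUNTS = {
    ("el", "banco"): 40,
    ("banco", "cierra"): 12,
    ("orilla", "cierra"): 1,
}


def rule_based_pass(sentence):
    """Pre-process: map each source word to its list of rule-approved candidates."""
    return [LEXICON.get(word, [word]) for word in sentence.lower().split()]


def statistical_pass(candidates):
    """Pick, word by word, the candidate seen most often after the previous choice."""
    chosen = []
    for options in candidates:
        previous = chosen[-1] if chosen else None
        best = max(options, key=lambda option: BIGRAM_COUNTS.get((previous, option), 0))
        chosen.append(best)
    return " ".join(chosen)


def multi_pass_translate(sentence):
    """Run the rule-based pre-process first, then the statistical selection."""
    return statistical_pass(rule_based_pass(sentence))


print(multi_pass_translate("the bank closes"))  # el banco cierra
```

Both data sets have to be available at once, which is a small-scale picture of why the Multi-pass approach needs more storage and processing than either method alone.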
Conclusion
Machine translation and automatic language translation systems are becoming much more
accurate, with less human involvement, as technology evolves and data collection continues to
improve. The complexity of language and the number of different languages, dialects and
cultures in the world make automatic language translation very hard, but because of the
importance of communication between cultures, politics, business and learners, it has become a
necessity to create a system that helps people around the world connect. The ultimate goal is to
have absolutely no human involvement in translating from one language to another, whether
from text to text or speech to speech (Grap, 12.6). This would mean that any text scanned into a
computer or found on the internet could be automatically translated into any language in the
world, with the full meaning of the source language preserved in the target language. With this
type of technology, humans would be able to bridge the communication gap between cultures,
which would in turn advance the knowledge and understanding of the human species in
general.

Bibliography
Ashraf, Neeha, and Manzoor Ahmad. "Machine Translation Techniques and Their Comparative
Study." International Journal of Computer Applications 125.7 (2015): 25-31. Web.
Daybelge, Turhan, and Ilyas Cicekli. "A Ranking Method for Example Based Machine
Translation Results by Learning from User Feedback." Applied Intelligence 35.2 (2010):
296-321. Web.
Grap, Hannah. "Automated language translation ... a solution to public sector communication
requirements." Summit Magazine Sept. 2009. General OneFile. Web. 4 May 2016.
Park, Eun-Jin, Oh-Woog Kwon, Kangil Kim, and Young-Kil Kim. "Classification-Based
Approach for Hybridizing Statistical and Rule-Based Machine Translation." ETRI Journal 37.3
(2015): 541-50. Web.
Wilks, Yorick. Machine Translation: Its Scope and Limits. New York: Springer, 2008. Print.
