New Tools and Resources to Support Machine Translation
1. Anabela Barreiro
barreiro_anabela@hotmail.com
FLUP & CLUP-Linguateca
New York University
New Tools and Resources to Support
Machine Translation
Mestrado em Tradução Jurídica e Empresarial (Master's in Legal and Business Translation)
Anabela Barreiro, Lisboa, 8 January 2008
3. Human Translation vs Machine Translation
A clear distinction in objectives and purpose must be established
between human translation and machine translation!
•They use different methods
•They apply to different types of texts
•They serve different purposes
•They face different barriers
•They are NOT in competition!
4. Human Translation
Professional translation requires:
•in-depth knowledge of the source language and native
proficiency in the target language
•above-average writing skills
•insight into the sociocultural aspects of the
source and target languages
•knowledge of the grammar of the two languages, their
writing conventions, and the situational and cultural context
•in the case of scientific and technical translation, knowledge
of the subject matter, including the terminology of the
field or knowledge domain.
5. Human Translation
Translation theory has long dealt with controversial
issues:
•whether to privilege meaning over form
•the visibility or invisibility of the translator
•whether to be faithful to the author or to make the text
accessible to the reader (and to which kind of reader)
•whether to give value to the source language culture (foreignise) or
make the text suitable for the target language culture
(domesticate)
•whether to allow languages/cultures with more impact to
predominate over languages/cultures with less impact, or to be
creative, etc.
6. Human Translation
The most important aspect of translation is defining the
purpose of each translation, which depends on the
characteristics of each text.
… and defining paraphrasing capabilities.
7. Human Translation: Types of Texts
A certain subjectivity and distance from the source
language text is allowed in the translation of literary texts for the
sake of preserving the artistic and aesthetic aspects of the
target language text [Hermans, 1985] [Landers, 2001].
Literary translation may be considered an ART [Leighton,
1990] [Weaver, 2002], where the translator has more freedom
of expression.
8. Human Translation: Types of Texts
Technical, commercial, and legal translators, like the
authors of the original texts, are more restrained in their use of
language, and they need to be precise and convey the exact
meaning of the original text.
Technical texts are not meant to be beautiful but rather
to be informative, instructive and explanatory. Their main
function is to be clear, so the easier they are to read, the better
they are understood.
Technical translation may be regarded as a CRAFT
[Newmark, 1988] [Biguenet & Schulte, 1989] for which both
technical and linguistic competence are essential, while creativity
and vagueness are prohibited.
9. Machine Translation
With more translation being performed by machines,
new challenges confront the field, theoretical traditions are
shaken, and the need to rethink the status of translation
becomes more evident. Of all automated applications, machine
translation most compels us to reconsider the nature of translation.
ART and CRAFT are NOT appropriate concepts for
machine translation, because it must necessarily rely on
linguistics and computer science.
10. Machine Translation
1- Automated translation of text or speech from one natural
language into another
2- An important tool that assists human translators
3- It has become available to the general public in the last few
years due to:
• more sophisticated computers
• continuous development of software capabilities
• the Internet boom
12. Machine Translation Bottlenecks
1.Complexity of language
2.Ambiguity of language
3.Wordiness (related to text quality)
13. Machine Translation: Limitations
• Delivering high-quality machine translation of certain
types of texts and complex linguistic phenomena is difficult
• Humour, sarcasm, and other human feelings expressed
through sophisticated linguistic expression are difficult to grasp
• Extra-sentential, extra-textual, and extra-linguistic
information (problems of culture or context) is difficult to handle,
because knowledge of the world cannot be assumed
• Anaphora resolution is difficult
14. Machine Translation Linguistic Challenges
1.Homography
2.Cross-language phenomena (lexical divergences, idioms,
and cross-language syntactic transformations, such as
passives)
3.Identification of named entities
4.Capacity to deal with long sentences and wordiness
5.Unusual alterations to the order of words in the target
language
6.Building enhanced dictionaries and grammars to recognize and
translate multiword expressions
15. Machine Translation Linguistic Challenges: Examples
• Handling of ellipsis
advanced ambiguity problems – related to anaphora
O João visitou muitos países do mundo. A Maria não visitou nenhum.
=> João has visited many countries in the world. Maria hasn’t visited any.
16. Machine Translation Linguistic Challenges: Examples
• Common-noun nuance resolution / homography
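(1) ele não quis tomar partido de ninguém (he did not want to take anyone's side)
(2) ele é um bom partido (he is a good match)
(3) ele tirou partido da situação (he took advantage of the situation)
(4) ele pertence a esse partido (político) (he belongs to that (political) party)
(5) o copo está partido (the glass is broken)
(6) já esteve em melhor partido (it has been in better shape)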
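In each sentence above, the intended sense of partido is signalled by the neighbouring words rather than by the word itself. A minimal sketch, assuming a small hand-written table of collocation patterns (the patterns and English glosses are illustrative assumptions, not Port4NooJ data), picks the first pattern that matches and otherwise falls back to the political-party sense. The stems in the patterns (`tom\w*`, `tir\w*`) are a crude stand-in for the lemma-based matching a dictionary-driven system would use.

```python
import re

# Illustrative cue patterns for Portuguese "partido"; each maps to an English sense.
# Stems such as tom\w* (tomar, tomou, ...) crudely approximate lemma matching.
CUES = [
    (re.compile(r"\btom\w* partido\b"), "take sides"),
    (re.compile(r"\bbom partido\b"), "good match"),
    (re.compile(r"\btir\w* partido\b"), "take advantage"),
    (re.compile(r"\b(?:está|estava|estão)\s+partid[oa]s?\b"), "broken"),
    (re.compile(r"\bem melhor partido\b"), "better shape"),
]
DEFAULT_SENSE = "party (political)"


def disambiguate_partido(sentence: str) -> str:
    """Return the sense of 'partido' signalled by the first matching cue."""
    text = sentence.lower()
    for pattern, sense in CUES:
        if pattern.search(text):
            return sense
    return DEFAULT_SENSE


if __name__ == "__main__":
    for s in ["ele tirou partido da situação", "o copo está partido"]:
        print(s, "->", disambiguate_partido(s))
```

Because the first match wins, more specific patterns should be listed before more general ones if they can overlap.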
17. Machine Translation Linguistic Challenges: Examples
Portuguese to English
Source: Francisco Vieira adianta ainda que está a fazer um esforço no sentido de tomar uma decisão ainda esta semana, definindo se avança ou não para uma candidatura à RTLRS. [CdP]
(Francisco Vieira adds that he is making an effort to reach a decision this week, deciding whether or not to run for the RTLRS.)
FreeTranslation: Francisco Scallop advances even if is it do an effort in the sense of take a decision still this week, defined advances or not for a candidacy to the RTLRS.
WorldLingo: advances despite he is to make an effort in the direction to still take a decision this week, defining if he advances or he does not stop a candidacy to the RTLRS.

English to Portuguese
Source: I can't make a decision about anything these days. [Compara]
Google: Eu não posso fazer a uma decisão sobre qualquer coisa estes dias.
Amikai: que eu não posso fazer para uma decisão sobre qualquer coisa estes dias.
FreeTranslation: Eu não posso tomar uma decisão sobre algo estes dias.
Babelfish: Eu não posso fazer a uma decisão sobre qualquer coisa estes dias.
WorldLingo: Eu no posso fazer a uma deciso sobre qualquer coisa estes dias.
E-Translation Server: Não posso tomar uma decisão sobre qualquer coisa estes dias.
18. Multiword Expressions: Support Verb Constructions
A support verb construction (also called a predicate noun construction)
is a multiword expression containing a verb with weak semantic value
and a noun that is the predicate of the sentence.
Predicate nouns can be:
•morphologically related to a verb
fazer uma apresentação de = apresentar (make a presentation of = to present)
pay a visit to = to visit
•autonomous
fazer um mestrado - *mestrar (do a master's degree; there is no verb *mestrar)
have fun - *to fun
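The morphologically related case is what makes automatic paraphrasing possible: if the dictionary links a predicate noun to its verb, the whole construction can be rewritten as that verb. A minimal sketch, assuming the sentence has already been lemmatized and assuming a hypothetical lookup table keyed on (support verb, predicate noun), replaces "support verb + article + predicate noun" with the related verb lemma. Autonomous nouns such as mestrado have no entry and are left unchanged.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface form, lemma)

# Hypothetical lexicon: (support verb lemma, predicate noun lemma) -> lexical verb lemma.
SVC_LEXICON = {
    ("fazer", "apresentação"): "apresentar",
    ("tomar", "decisão"): "decidir",
}
DETERMINER_LEMMAS = {"um", "o"}  # indefinite and definite articles


def paraphrase_svcs(tokens: List[Token]) -> List[str]:
    """Replace 'support verb + determiner + predicate noun' with the related verb.

    Works on lemmas and returns lemmas; re-inflecting the verb is out of scope.
    """
    out: List[str] = []
    i = 0
    while i < len(tokens):
        window = tokens[i:i + 3]
        if len(window) == 3 and window[1][1] in DETERMINER_LEMMAS:
            verb = SVC_LEXICON.get((window[0][1], window[2][1]))
            if verb is not None:
                out.append(verb)
                i += 3
                continue
        out.append(tokens[i][1])
        i += 1
    return out
```

For example, the lemmas of "Ele fez uma apresentação do projeto" come out as `["ele", "apresentar", "de", "projeto"]`. Keying the table on lemmas rather than surface forms lets one entry cover fez, faz, fará, and so on.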
19. Main Objectives
1.Build a body of lexical, syntactic and semantic knowledge
around support verb constructions
2.Apply this linguistic knowledge to paraphrasing
3.Improve machine translation
20. Outcome: Resources
Port4NooJ
•an open-source, ontology-driven Portuguese linguistic
system that integrates a bilingual extension for
Portuguese-English machine translation
DicTUM
•Dicionário de Termos e Unidades Multipalavra
•a Dictionary of Multiword Expressions
21. Outcome: Tools
ReWriter
•a monolingual paraphraser that pre-edits texts by
paraphrasing them
•Portuguese version: ReEscreve
ParaMT
•a bilingual/multilingual paraphraser to be integrated in
machine translation systems
22. Resources
Port4NooJ - Publicly available at:
http://www.nooj4nlp.net
http://www.linguateca.pt/Repositorio/Port4Nooj/
Based on:
•NooJ linguistic environment (http://www.nooj4nlp.net/)
•OpenLogos English-Portuguese dictionary (http://logos-os.dfki.de/)
OpenLogos is an open-source derivative of the Logos Machine Translation System
Data Used
•COMPARA (http://www.linguateca.pt/COMPARA)
•METRA (http://www.linguateca.pt/metra)
•Other corpora
23. Port4NooJ Dictionaries
Entry structure (FLX appears only in variable entries):
Lemma,Part of Speech+FLX=Inflectional Paradigm+Syntactic-Semantic Attributes+EN=English Transfer

Sample of the dictionary of Biomedical Terms
HIV,N+FLX=PORTUGAL+AB+state+IMMUN+EN=HIV
doença maníaco-depressiva,N+FLX=CASA+AB+state+MH+EN=manic-depressive disorder
doença bipolar,N+FLX=CASA+AB+state+MH+EN=bipolar disorder
asma,N+FLX=CASA+AB+state+PULM+EN=asthma

Sample of the dictionary of Proper Names
Amesterdão,N+PL+city+EN=Amsterdam
Estados Unidos da América,N+PL+coun+EN=United States of America
África,N+PL+cont+EN=Africa
Extremo Oriente,N+PL+othprop+EN=Far East
Mediterrâneo,N+FLX=ANO+PL+water+EN=Mediterranean
Alpes Peninos,N+FLX=ALPES+PL+othprop+EN=Pennine Alps
ONU,N+AN+org+EN=UN

General dictionary sample representing all PoS, variable and invariable forms
mesa,N+FLX=CASA+CO+surf+EN=table
cair,V+FLX=ATRAIR+INMO+IntoType+EN=fall
holandês,A+FLX=INGLÊS+AN+lang+EN=Dutch
actualmente,ADV+FLX=FACILMENTE+TEMP+punc+pres+EN=nowadays
alguém,PRO+IMPERS+INDEF+EN=somebody
porque,RELINT+why+EN=why
e,CONJ+JOIN+EN=and
durante,PREP+TEMP+EN=during
cada,DET+IMPERS+INDEF+SG+EN=each
terceiro,NUM+ord+EN=one third

Sample of invariable compounds in the general dictionary
a curto prazo,ADV+TEMP+EN=in the short run
a favor de,PREP+CAUS+EN=in favor of
cada um,PRO+INDEF+SG+EN=each one
de quem,INT+ThatType+EN=whose
quem quer que seja,REL+WhateverType+EN=whoever
além disso,CONJ+COOR+EN=besides
um quarto,NUM+frac+EN=one fourth

Sample of the dictionary of Terms and Multiword Expressions (DicTUM)
adro da igreja,N+FLX=MENINO+PL+encl+EN=churchyard
cabo de vassoura,N+FLX=MENINO+COtool+EN=broomstick
bebida alcoólica,N+FLX=CASA+MA+liqu+EN=alcoholic drink+UNAMB
bebida alcoólica,N+FLX=CASA+MA+liqu+EN=booze+slang
cor de laranja,A+NAV+Apred+EN=orange
sul-americano,A+FLX=ALTO+AN+des+EN=South American
a curto prazo,ADV+LocTime+TEMP+EN=in the short run
fora de serviço,ADV+STAT+phr+EN=out of order
há muito tempo,ADV+LocTime+TEMP+puncpast+EN=a long time ago
isto é,CONJ+COOR+EN=i.e.
já não,CONJ+COOR+EN=no longer
mesmo assim,CONJ+SUB+EN=even so
juntamente com,PREP+ASSOC+EN=along with
à direita de,PREP+Loc+AT+EN=at the right of
em conformidade com,PREP+ALOG+EN=in congruence with
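Every entry above follows the same layout: a lemma, a comma, then "+"-separated fields, some bare codes and some KEY=VALUE pairs. A minimal sketch of a parser for that layout, assuming only what the samples show (NooJ reads these files itself; this hypothetical `parse_entry` only illustrates the field structure and ignores any further syntax the real format may allow):

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entry:
    lemma: str
    pos: str
    flx: Optional[str] = None      # inflectional paradigm (FLX=...)
    english: Optional[str] = None  # English transfer (EN=...)
    codes: List[str] = field(default_factory=list)            # bare codes, e.g. CO, surf
    properties: Dict[str, str] = field(default_factory=dict)  # other KEY=VALUE pairs


def parse_entry(line: str) -> Entry:
    """Parse 'lemma,POS+feature+...' into its fields."""
    lemma, _, rest = line.partition(",")
    pos, *features = rest.split("+")
    entry = Entry(lemma=lemma, pos=pos)
    for feat in features:
        key, sep, value = feat.partition("=")
        if not sep:
            entry.codes.append(feat)
        elif key == "FLX":
            entry.flx = value
        elif key == "EN":
            entry.english = value
        else:
            entry.properties[key] = value
    return entry
```

Splitting on "+" keeps multiword lemmas and transfers such as "in the short run" intact, and codes that follow the transfer (UNAMB, slang) are still collected as codes.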
24. Port4NooJ Dictionaries
Sample of terms
classified as Information
+ Instructional/legal
26. Syntactic-Semantic Ontology
Noun Supersets: concrete, mass, animate, place, information, abstract, process (intr), process (tr), measure, time, aspective
Sets and Subsets of the CONCRETE Noun Superset
functionals: receptacles, bearing surfaces, links/bridges, thresholds/focal points/barriers, conduits, fasteners, devices/tools, cloth things, structural elements, concretizations of verbals, concretizations of mass nouns, undifferentiated functionals, product/brand names
agentives: software, vehicles, meters, machines/systems, communication agents, concrete chemical agents, undifferentiated agentives
natural things: minute flora, plants, trees, trees/wood, miscellaneous natural things
other concrete sets*: impulses/lights, blemishes/marks, edibles (non-mass), edibles/color, classifiers, amorphous, atomistic, undifferentiated concrete things
*With one exception, these sets have no subsets.
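A hierarchy like this lets a rule stated once for a set, for example a verb that selects a concrete object, apply to every subset below it. A minimal sketch, assuming the ontology is stored as a plain Python dictionary (a hypothetical representation; the slide does not show how Port4NooJ stores it) and including only three of the CONCRETE sets, finds the path from the superset down to a given set or subset:

```python
from typing import Dict, List, Optional

# Fragment of the CONCRETE superset (set -> subsets), transcribed from the slide.
# "other concrete sets" is omitted: its members are sets in their own right.
CONCRETE: Dict[str, List[str]] = {
    "functionals": ["receptacles", "bearing surfaces", "links/bridges", "conduits",
                    "fasteners", "devices/tools", "cloth things", "structural elements"],
    "agentives": ["software", "vehicles", "meters", "machines/systems",
                  "communication agents", "concrete chemical agents"],
    "natural things": ["minute flora", "plants", "trees", "trees/wood",
                       "miscellaneous natural things"],
}


def path_to(name: str) -> Optional[List[str]]:
    """Return the path from the CONCRETE superset down to `name`, or None."""
    for set_name, subsets in CONCRETE.items():
        if name == set_name:
            return ["concrete", set_name]
        if name in subsets:
            return ["concrete", set_name, name]
    return None


def is_a(name: str, ancestor: str) -> bool:
    """True if `ancestor` lies on the path to `name` (a node counts as its own ancestor)."""
    path = path_to(name)
    return path is not None and ancestor in path
```

With this, `is_a("bearing surfaces", "concrete")` holds, so a noun tagged as a bearing surface (mesa, for instance) satisfies any constraint stated for concrete nouns.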
27. Syntactic-Semantic Ontology
Category | Mnemonic | Examples in English | Examples in Portuguese
agentives | CO+undagt | See subsets | See subsets
software | CO+soft | routine | rotina, ficheiro
concrete chemical agents | CO+chem | catalyst, warhead | ácido sulfúrico
machines/systems | CO+mach | battery, camera | máquina fotográfica
vehicles | CO+vehic | truck, ship | automóvel
meters | CO+meter | clock, gauge | manómetro
communication agents | CO+comm | radio, radar | rádio
functionals | CO+undfunc | trinket, ornament | ornamento
devices/tools | CO+tool | pliers | alicate
fasteners | CO+fast | nail, tendon | prego
bearing surfaces | CO+surf | table, shelf | mesa
receptacles | CO+recp | bottle, barrel | garrafa
conduits | CO+cond | chute, artery | artéria
thresholds/focal points/barriers | CO+barr | wall, door | porta
links/bridges | CO+link | circuit, nerve | circuito
cloth things | CO+cloth | shirt, blanket | camisola
structural elements | CO+struc | spar, bone | osso
concretizations of verbals | CO+verb | threading |
concretizations of mass nouns | CO+mass | acid lining |
product/brand names | CO+brand | Windows NT | Windows NT
natural things | CO+nat | See subsets | See subsets
minute flora | CO+flora | algae, spore | alga
plants | CO+plant | rose, weed | erva
trees | CO+tree | apple, willow | macieira
trees/wood | CO+trwd | oak, maple | carvalho
misc. natural things | CO+mnat | pebble, iceberg | iceberg
edibles (non-mass) | CO+ednm | pork chop | costoleta
edibles/color | CO+edcol | orange, cherry | laranja
impulses/lights | CO+light | lamp, beam | lâmpada
blemishes/marks | CO+blem | scratch, freckle | sarda
classifiers | CO+class | element | elemento
amorphous | CO+amor | breeze, tide | brisa
atomistic | CO+atom | electron, atom | átomo
undifferentiated | CO+obj | trifle, curio |
Categories of CONCRETE nouns
28. ME - MEASURE Noun Sets and Subsets
Sets and Subsets | Mnemonics (= SynSem) | Examples
abstract concepts measured by unit | ME+abs | humidity, length
discrete measurable concepts | ME+dis | sum, increment
units of measure | ME+unit | See subsets
units of weight | ME+unit+wt | ounce, pound
units of velocity | ME+unit+vel | mph, megahertz
units of volume measure | ME+unit+vol | gallon, liter
units of temperature | ME+unit+temp | degrees celsius
units of energy/force | ME+unit+ener | watt, horsepower
measurement systems | ME+unit+sys | fahrenheit, kelvin
units of duration | ME+unit+dur | hour, minute, year
specialized units of measure | ME+unit+spec | oersted, ohm, phon
units of money/value | ME+unit+value | dollar, euro, forint
units of linear/area measure | ME+unit+lin | inch, yard, mile
general undifferentiated measure | ME+undif | degree, gross, share
Syntactic-Semantic Ontology: Categories of MEASURE nouns
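The mnemonics in this table read from general to specific (ME, then ME+unit, then ME+unit+wt), so a query for a general class can be answered by component-wise prefix matching. A minimal sketch, assuming that reading of the codes and using a subset of the table's examples (the `subsumes` and `nouns_under` helpers are hypothetical):

```python
from typing import Dict, List, Tuple

# Example nouns per SynSem mnemonic, transcribed from the table (subset).
EXAMPLES: Dict[str, List[str]] = {
    "ME+abs": ["humidity", "length"],
    "ME+dis": ["sum", "increment"],
    "ME+unit+wt": ["ounce", "pound"],
    "ME+unit+vol": ["gallon", "liter"],
    "ME+unit+dur": ["hour", "minute", "year"],
    "ME+unit+value": ["dollar", "euro", "forint"],
}


def components(code: str) -> Tuple[str, ...]:
    """Split a mnemonic such as 'ME+unit+wt' into its parts."""
    return tuple(code.split("+"))


def subsumes(general: str, specific: str) -> bool:
    """True if `specific` refines `general`, i.e. starts with all of its components."""
    g, s = components(general), components(specific)
    return s[:len(g)] == g


def nouns_under(code: str) -> List[str]:
    """Collect the example nouns of every mnemonic that `code` subsumes."""
    return [noun for mnemonic, nouns in EXAMPLES.items()
            if subsumes(code, mnemonic) for noun in nouns]
```

Comparing whole components rather than raw string prefixes keeps a code such as ME+unit from accidentally matching a longer, unrelated component that merely starts with "unit".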
30. Paraphrasing and Translation Grammars
Translation and bilingual paraphrasing of simple sentences
Graph to translate simple sentences
31. Verb entries:
• Identification of derivational paradigms for nominalizations
(annotation NDRV) and predicate adjectives (annotation ADRV)
• Link to the derived noun’s support verbs and to the adjective’s
copula verbs (annotation VSUP and annotation VCOP)
adaptar,V+FLX=FALAR+Aux=1+INOP57+Subset=132+EN=adapt+VSUP=fazer+DRV=NDRV00:CANÇÃO
azedar,V+FLX=LIMPAR+Aux=1+OBJTRundif98+Subset=740+EN=sour+VCOP=estar+DRV=ADRV00:ALTO
Explicit Marking of Derivation and Support Verb
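These annotations carry everything needed to generate a paraphrase pair from a single verb entry. A minimal sketch, assuming stand-in suffix rules (-ar to -ação for nouns, -ar to -o for adjectives) in place of the NDRV00 and ADRV00 paradigms, whose actual definitions are not shown on the slide; the `paraphrase_pair` helper is hypothetical:

```python
from typing import Dict, Optional, Tuple


def parse_features(entry: str) -> Dict[str, str]:
    """Split a dictionary line into its lemma and KEY=VALUE properties."""
    lemma, _, rest = entry.partition(",")
    props = {"LEMMA": lemma}
    for feat in rest.split("+"):
        key, sep, value = feat.partition("=")
        if sep:  # bare codes such as V or INOP57 are ignored here
            props[key] = value
    return props


def paraphrase_pair(entry: str) -> Optional[Tuple[str, str]]:
    """Return (construction, verb) for -ar verbs carrying VSUP+NDRV or VCOP+ADRV."""
    props = parse_features(entry)
    verb = props["LEMMA"]
    drv = props.get("DRV", "")
    if not verb.endswith("ar"):
        return None  # the stand-in derivation rules below only cover -ar verbs
    stem = verb[:-2]
    if drv.startswith("NDRV") and "VSUP" in props:
        # Assumed rule: -ar -> -ação; these nouns are feminine, hence "uma".
        return f"{props['VSUP']} uma {stem}ação", verb
    if drv.startswith("ADRV") and "VCOP" in props:
        # Assumed rule: -ar -> -o (masculine singular adjective).
        return f"{props['VCOP']} {stem}o", verb
    return None
```

Applied to the two entries above, this yields ("fazer uma adaptação", "adaptar") and ("estar azedo", "azedar"). The design point is that the support or copula verb is read from the entry (VSUP, VCOP) rather than hard-coded.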
32. Adjective entries:
• Identification of derivational paradigms for adverbializations
(annotation AVDRV)
literal,A+FLX=PRINCIPAL+IN+symb+EN=literal+DRV=AVDRV00:LITERALMENTE
Autonomous predicate nouns:
• Identification of autonomous predicate nouns (annotation
Npred)
• Identification of a semantically related verb
curso,N+FLX=ANO+Npred+IN+inst+EN=course+VSUP=tirar+VRB=estudar+NPrep=de+Det=um
Explicit Marking of Derivation and Semantic Verb Association
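For an autonomous predicate noun there is no morphologically related verb, so the entry names a semantically related one instead (VRB). A minimal sketch, assuming the attribute names shown in the curso entry (VSUP, Det, NPrep, VRB) and a hypothetical `npred_paraphrase` helper, assembles both sides of the paraphrase directly from those attributes:

```python
from typing import Dict, Set, Tuple


def parse_entry(entry: str) -> Tuple[str, Dict[str, str], Set[str]]:
    """Return (lemma, KEY=VALUE properties, bare codes) for one dictionary line."""
    lemma, _, rest = entry.partition(",")
    props: Dict[str, str] = {}
    codes: Set[str] = set()
    for feat in rest.split("+"):
        key, sep, value = feat.partition("=")
        if sep:
            props[key] = value
        else:
            codes.add(feat)
    return lemma, props, codes


def npred_paraphrase(entry: str, complement: str) -> Tuple[str, str]:
    """Build (support verb construction, related verb phrase) for an Npred entry."""
    lemma, props, codes = parse_entry(entry)
    needed = ("VSUP", "Det", "NPrep", "VRB")
    if "Npred" not in codes or not all(k in props for k in needed):
        raise ValueError(f"{lemma!r} lacks the attributes needed for this paraphrase")
    svc = f"{props['VSUP']} {props['Det']} {lemma} {props['NPrep']} {complement}"
    return svc, f"{props['VRB']} {complement}"
```

With the curso entry and the complement "Direito", the result is ("tirar um curso de Direito", "estudar Direito"). Entries without Npred, such as literal, raise an error instead of producing a malformed phrase.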
33. ReWriter: a Monolingual Standalone Paraphraser
Recognition and monolingual paraphrasing
of support verb constructions
(support verb construction / morphologically related lexical verb)
35. ReWriter: Application - Interface
Interactive ReWriter
for word processing applications
such as text editing
36. ReWriter: Application - Interface
Interactive ReWriter
for word processing applications
such as text editing
37. ReWriter: Application - Interface
38. ReWriter: Application - Interface
39. ReWriter: Application - Interface
40. ReWriter: Extensibility
1.Applications to General Language
2.Applications to Technical Language
41. ReWriter: Extensibility - Examples
[Paraphrasing adverbials]
à volta da órbita ≡ periorbital (popular versus technical)
around the orbit of the eye ≡ periorbital
[Paraphrasing relative clauses - into adjectival past participles]
N0 que têm sido escritos ≡ N0 que foram descritos ≡ N0 escritos
N0 that have been written ≡ N0 that were described ≡ N0 written
[Paraphrasing if clauses]
se for necessário ≡ se necessário
if it is necessary ≡ if necessary
42. ReWriter: Extensibility - Examples
[Paraphrasing coordinated noun phrases - conjoining
or disjoining]
recursos linguísticos para o ensino e para a investigação
=> ?linguistic resources for teaching and for research
≡ recursos linguísticos para o ensino e a investigação
=> linguistic resources for teaching and research
[Paraphrasing subjunctive clauses - into infinitives]
pedimos o favor que confirme a sua participação
=> *we ask the favor that you confirm your attendance
≡ pedimos o favor de confirmar a sua participação
=> *we ask the favor of confirming your attendance
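The if-clause, coordination, and subjunctive examples are local rewrites that can be approximated with surface patterns. A minimal sketch using Python regular expressions, assuming hypothetical patterns and a two-entry subjunctive-to-infinitive table (the slides do not say how ReWriter implements these rules; the tools are built on Port4NooJ resources, not on this code):

```python
import re

# Hypothetical subjunctive -> infinitive table; a real system would derive this
# from its morphological dictionary.
SUBJ_TO_INF = {"confirme": "confirmar", "envie": "enviar"}


def _subjunctive_to_infinitive(m) -> str:
    infinitive = SUBJ_TO_INF.get(m.group(1))
    return f"o favor de {infinitive}" if infinitive else m.group(0)


RULES = [
    # [if clauses] "se for necessário" -> "se necessário"
    (re.compile(r"\bse for (necessário|possível)\b"), r"se \1"),
    # [coordinated noun phrases] drop the repeated preposition
    (re.compile(r"\bpara o (\w+) e para a (\w+)\b"), r"para o \1 e a \2"),
    # [subjunctive clauses] "o favor que confirme" -> "o favor de confirmar"
    (re.compile(r"\bo favor que (\w+)\b"), _subjunctive_to_infinitive),
]


def rewrite(sentence: str) -> str:
    """Apply each rewrite rule in turn and return the paraphrased sentence."""
    for pattern, replacement in RULES:
        sentence = pattern.sub(replacement, sentence)
    return sentence
```

Unknown subjunctive forms are returned unchanged rather than guessed, which keeps the rewriter conservative when its lexicon has no answer.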
43. ReWriter: Extensibility - Examples
[Paraphrasing marked-up constructions]
se a necessidade do utilizador é criar um texto em linguagem controlada
=> ?if the end-user need is to create controlled language text
≡ se o utilizador necessita de criar um texto em linguagem controlada
=> if the end-user needs to create controlled language text
[Paraphrasing of vague and undefined or null subject sentences]
(whenever the real subject/actor is known)
[-] houve um grito na rua [N-PRON]/≡ alguém gritou na rua
=> there was shouting in the street [N-PRON]/≡ someone shouted in the street
44. ReWriter: Extensibility - Examples
[Paraphrasing passives - whenever suitable]
Esse livro foi escrito por Saramago em 2008 ≡ Saramago escreveu
esse livro em 2008
That book was written by Saramago in 2008 ≡ Saramago wrote that book in 2008
Florida foi atingida por um tornado ≡ Um tornado atingiu a Florida
Florida was hit by a tornado ≡ A tornado hit Florida
O carro foi roubado ≡ Alguém roubou o carro
The car was stolen ≡ Someone stole the car
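Once a passive clause has been analysed into patient, participle, agent, and trailing adjunct, the active version is a reordering plus a change of verb form; an agentless passive gets an indefinite subject, as in the last example. A minimal sketch, assuming that analysis has already been done, assuming a singular agent (third person singular preterite), and using a hypothetical participle table in place of real morphology. It does not decide whether the rewrite is suitable, which the slide leaves to judgement:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical participle -> 3rd person singular preterite table; a real system
# would take this from its morphological dictionary.
PARTICIPLE_TO_PRETERITE = {"escrito": "escreveu", "atingida": "atingiu", "roubado": "roubou"}
DETERMINERS = {"o", "a", "os", "as", "esse", "essa", "um", "uma"}


@dataclass
class Passive:
    """A passive clause that has already been analysed into its parts."""
    patient: str                 # grammatical subject of the passive
    participle: str
    agent: Optional[str] = None  # the "por ..." phrase, if any
    adjunct: str = ""            # trailing material such as "em 2008"


def _as_object(np: str) -> str:
    """Lower-case a leading determiner so the phrase can follow the verb."""
    first, sep, rest = np.partition(" ")
    if first.lower() in DETERMINERS:
        return first.lower() + sep + rest
    return np


def to_active(p: Passive) -> str:
    """Rewrite the passive in the active voice; agentless passives get 'alguém'."""
    subject = p.agent if p.agent else "alguém"
    words = [subject[0].upper() + subject[1:],
             PARTICIPLE_TO_PRETERITE[p.participle],
             _as_object(p.patient)]
    if p.adjunct:
        words.append(p.adjunct)
    return " ".join(words)
```

For example, `to_active(Passive("O carro", "roubado"))` gives "Alguém roubou o carro", matching the last pair on the slide.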
45. ParaMT: a Bilingual/Multilingual Paraphraser for MT
Recognition and bilingual paraphrasing of support verb constructions
(Portuguese support verb construction / corresponding English verb)
46. Preliminary Quantitative Results
Verb | SVC Recognition Precision | SVC Recognition Recall | SVC Paraphrasing Precision
Pôr | 73/73 - 100% | 73/100 - 73% | 72/73 - 98.6%
Tomar | 75/75 - 100% | 75/100 - 75% | 68/73 - 93.1%
Ter | 65/65 - 100% | 65/100 - 65% | 59/65 - 90.7%
Dar | 57/60 - 95% | 57/100 - 57% | 46/51 - 90.1%
Fazer | 43/45 - 95.5% | 43/100 - 43% | 40/45 - 88.8%
Average | 62.6/63.6 - 98.4% | 62.6/100 - 62.6% | 57/61 - 93.4%
Evaluation of recognition and paraphrasing of support verb constructions: 500 sentences, 100 for each elementary support verb
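The averages can be recomputed from the per-verb counts. A minimal sketch, assuming each of the 100 test sentences per verb contains one target support verb construction (so recall is correct recognitions divided by 100), sums the counts over all five verbs. This reproduces the 98.4% recognition precision and 62.6% recall; for paraphrasing it gives 285/307, about 92.8%, slightly below the 93.4% in the table, which divides 57 by 61 rather than by the unrounded average of 61.4. The per-verb percentages in the table are truncated rather than rounded (68/73 is 93.15%, shown as 93.1%).

```python
from typing import Dict, Tuple

# Counts from the evaluation table, per elementary support verb:
# (SVCs correctly recognized, SVCs recognized, correct paraphrases, paraphrases produced)
RESULTS: Dict[str, Tuple[int, int, int, int]] = {
    "Pôr": (73, 73, 72, 73),
    "Tomar": (75, 75, 68, 73),
    "Ter": (65, 65, 59, 65),
    "Dar": (57, 60, 46, 51),
    "Fazer": (43, 45, 40, 45),
}
SENTENCES_PER_VERB = 100  # assumed to contain one target SVC each


def percentages(counts: Tuple[int, ...], sentences: int) -> Tuple[float, float, float]:
    """Return (recognition precision, recognition recall, paraphrasing precision) in %."""
    rec_ok, rec_total, para_ok, para_total = counts
    return (100 * rec_ok / rec_total,
            100 * rec_ok / sentences,
            100 * para_ok / para_total)


def pooled() -> Tuple[float, float, float]:
    """Micro-average: sum the counts over all verbs, then compute the ratios."""
    totals = tuple(sum(c[i] for c in RESULTS.values()) for i in range(4))
    return percentages(totals, SENTENCES_PER_VERB * len(RESULTS))


if __name__ == "__main__":
    for verb, counts in RESULTS.items():
        p, r, pp = percentages(counts, SENTENCES_PER_VERB)
        print(f"{verb}: {p:.2f}% / {r:.2f}% / {pp:.2f}%")
    print("pooled: %.2f%% / %.2f%% / %.2f%%" % pooled())
```

Pooling the raw counts (a micro-average) avoids rounding intermediate averages; a macro-average of the five per-verb percentages would give yet another figure, so it is worth stating which one a table reports.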
47. Conclusions
Linguistic knowledge applied to a machine
translation system improves its output quality.
Effective results from linguistically based research
on paraphrases can save substantial effort and
resources employed by machine translation systems
48. Thank you for your attention!
Acknowledgements
This work was partly supported by grant SFRH/BD/14076/2003
from Fundação para a Ciência e a Tecnologia, co-financed by
POSI and partly by Fundação para a Computação Científica
Nacional.
Speaker Notes
Good afternoon! My name is Anabela Barreiro and I am a PhD student working on MT. I am affiliated with Universidade do Porto-Linguateca and New York University. My interest in MT grew out of more than seven years of work on a commercial MT system. In this presentation, I will introduce ParaMT, a paraphraser applied to machine translation, which I developed during my research work.
Outline: first, an introduction distinguishing human translation (HT) from machine translation (MT); then, the resources and tools developed within the scope of my PhD research.
Human translation cannot be replaced by machine translation, at least not until there are breakthroughs that overcome machine translation's restriction to sentence-level translation, and breakthroughs in artificial intelligence.
Some facts about machine translation: For most of human history, translation was an exclusively human activity. Until recently, machine translation was accessible only to a very restricted niche of the market, and computer-aided translation was used only by professional translators.
Despite the availability of funding and many talented researchers worldwide, most efforts to build cost-effective, industrial-strength, high-quality machine translation have fallen short of their goals since the first attempts in the 1950s. Successful machine translation has been difficult to achieve because of two major hurdles: the complexity and the ambiguity of language.
Human translators use ingenuity and skill to artfully reproduce feeling and sound. Human beings can easily and intuitively retrieve information from even distant parts of the text, or from extra-textual knowledge. This is the big advantage human translators have over machines and one of the reasons why human and machine translation do not compete. Human translation can easily solve most ambiguity problems, as well as problems of culture and context that machine translation cannot. Human use of language assumes knowledge of the world or of user-centric particular worlds, something that is difficult or impossible to program into a machine. Machine translation functions best in situations where the writer cannot assume shared knowledge of the world. Domain-specific contexts and controlled language make machine translation reliable, demonstrating the advantages of a scientific approach to language.
More challenging grey areas for machine translation are anaphora resolution, common-noun nuance resolution, and the handling of ellipsis, which constitute a class of more advanced ambiguity problems. To solve them, analysis must go beyond the sentence level to the extra-sentential (discourse) level, because they involve referential associations between utterances in neighboring clauses or elsewhere in the text. These are typical problems in machine translation, and they often produce errors.
"bom partido" can also be considered a compound, and "tirar partido de" a fixed or semi-fixed expression.
A support verb construction is defined as a predicate noun construction containing a main verb with weak semantic value. Support verb constructions are an area where statistical methods tend to "trap" systems: if statistical systems are not sensitive to these constructions, the result may be misleading translations. Linguistic knowledge about support verb constructions can provide a statistical system with special training data that could correct this problem.
So, driven by the desire to see better results, I set these main objectives: [read objectives 1, 2, and 3].
The outcome of this work was a set of new resources (the Port4NooJ system) and two new automated tools that recognize multiword expressions and generate paraphrases of them: ReWriter and ParaMT. Both paraphrasers are based on Port4NooJ resources.
In any language processing application, the linguistic resources are the foundation. In machine translation especially, they are the driving force of the translation process. Port4NooJ is built on two original sources: the NooJ linguistic environment and the OpenLogos lexical resources. Linguateca’s resources were also used.
The system includes several dictionaries. The structure of the dictionary is XXX
I will skip this slide on the inflectional and derivational descriptions.
This slide presents a local grammar for the analysis and recognition of constructions with elementary support verbs, together with the monolingual paraphrasing shown in the concordance. Side by side, we find the SVC on the left and an equivalent lexical verb on the right.
This slide shows another concordance, this time for the recognition and paraphrasing of constructions with elementary support verbs that co-occur with predicate nouns from the biomedical domain. On the left is the SVC, and on the right its paraphrases: an equivalent lexical verb or a stylistic variant of the construction, built either with a non-elementary support verb such as efectuar or realizar, or with a construction of the type "sujeitar-se a" or "submeter-se a" (to undergo) when the subject of the SVC must be a patient.
One of the possible applications of ReWriter is its interactive use in word processing. Paraphrasing can work much like the synonym function in text editors.
The concordance in this slide illustrates the bilingual PT-EN recognition and paraphrasing of SVCs. On the left is the SVC in Portuguese and on the right an equivalent lexical verb in English.