4th intensive summer school on
Natural Language Processing
Bilingual Terminology Mining
Estelle Delpech
30th November, 2010

About me
● Estelle Delpech
● Research engineer at Lingua et Machina, France
  ● CAT tools provider
  ● ed(at)lingua-et-machina(dot)com
  ● www.lingua-et-machina.com
● PhD candidate at LINA, France
  ● TALN team: specialises in NLP
  ● estelle.delpech(at)univ-nantes(dot)fr

Presentation outline
● About terms, terminology, terminology mining
● Term Extraction
● Term Alignment

What is a term?
● Classical definition:
  ● "unequivocal expression of a concept within a technical domain"
● Traces back to 1930: Eugen Wüster, « General Theory of Terminology »
● Specialized language is / should be unambiguous
[Figure: Ogden's semiotic triangle relating concept, term and referent]

What is a term?
● Classical terminology challenged in the 1990s by:
  ● sociolinguistics
  ● corpus-based linguistics
  ● computational terminology
● Observe terms in texts:
  ● there is variation, polysemy
  ● concepts evolve over time
  ● no clear-cut border between specialized and general languages

What is a term?
● The definition of « term » depends on the application / audience of the terminology
● Domain expert:
  ● unit of knowledge
● Information retrieval:
  ● descriptors for indexing
● Translation:
  ● word or phrase that:
    ● is not part of general language
    ● translates differently in a particular domain
  ● can be:
    ● noun, adjective, verb
    ● noun phrase, verb phrase, etc.

What is a terminology?
● Set of terms + terminological records
● Terminological record:
  ● Part of speech
  ● Frequency
  ● Variants
  ● Contexts
● Relations between terms / concepts:
  ● Hypernymy: cat is a sort of animal
  ● Meronymy: head is part of body
● Bilingual terminology:
  ● Translation relations

http://www.termiumplus.gc.ca/
Where do you find terms?
● In specialized texts:
  ● Research papers on breast cancer
  ● Plane crash reports
● Corpus building:
  ● important to gather texts following a well-defined domain / theme

Bilingual terminology mining (1)
[Figure: pipeline. Specialized texts in each language → term extraction (data mining) → terms in each language → term alignment → bilingual terminology → terminology management software]

Bilingual terminology mining (2)
[Figure: pipeline. Specialized texts → synchronized term extraction and alignment → terms in each language → bilingual terminology → terminology management software]

Presentation outline
● About terms, terminology, terminology mining
● Term Extraction
● Term Alignment

Term extraction: semi-supervised process
● The notion of term is « slippery »
● The same lexical unit may or may not be considered as a term depending on:
  ● Audience
  ● Domain
  ● Application
● Term extractors extract candidate terms:
  ● Frequent in texts of a given domain
    ● HER2 gene
  ● Look like terms: well-formed phrase
    ● human cell lines
  ● Group of words that frequently occur together
    ● to compile a program
L'Homme, 2004

Term extraction: semi-supervised, lexico-semantic process
[Figure: workflow with the stages: texts / specialized texts, term extractor, candidate terms, automatic indexing, manual selection, terms, terminology, concepts]
 Termhood  clues (1) : Frequency

● Term occurs frequently in specialized texts
  ● the higher, the better?
● Comparison with general language:
  ● Does the term occur more frequently than expected in general language?
● Compute significance tests:
  ● e.g. χ² (chi-square)
L'Homme, 2004

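A minimal sketch of the significance test mentioned on this slide, assuming a Pearson chi-square on a 2x2 table (the term vs. all other words, in a specialized vs. a general corpus). The function name and every count below are hypothetical illustrations, not figures from the talk.

```python
def chi_square_2x2(term_spec, total_spec, term_gen, total_gen):
    """Pearson chi-square for term frequency in a specialized vs. a general corpus.

    term_spec / term_gen: occurrences of the candidate term in each corpus.
    total_spec / total_gen: total number of words in each corpus.
    """
    a = term_spec                # term, specialized corpus
    b = total_spec - term_spec   # other words, specialized corpus
    c = term_gen                 # term, general corpus
    d = total_gen - term_gen     # other words, general corpus
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator if denominator else 0.0


# Hypothetical counts: 50 occurrences per 1000 words in the specialized
# corpus against 5 per 1000 words in the general corpus.
SCORE = chi_square_2x2(50, 1000, 5, 1000)
```

The statistic only says that the two frequencies differ; whether the term is over-represented in the specialized corpus still has to be read from the relative frequencies themselves.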
Termhood clues (2): form
● A term is a well-formed phrase
  ● ...HER2/neu oncogenes are members of...
● Match morpho-syntactic patterns
  ● Ex: NOUN + NOUN
● Many patterns:
  ● NOUN PREP DET NOUN
    ● alteration of the gene
  ● NOUN PREP NOUN COORD ADJ NOUN
    ● susceptibility to breast and ovarian cancer
  ● NOUN NOUN NOUN NOUN NOUN
    ● human breast cancer cell lines

Termhood clues (2): form
● Preprocessing:
  ● Tokenization
  ● Lemmatisation
  ● POS tagging
… HER-2/neu oncogenes are members of …

Token      POS   Lemma
HER-2/neu  NOUN  HER-2/neu
oncogenes  NOUN  oncogene
are        VERB  be
members    NOUN  member
of         PREP  of

Identification of Syntactic Patterns
● Patterns expressed as regular expressions / finite state automata
[Figure: finite state automaton with a START state and NOUN / PREP transitions]
NOUN (PREP? NOUN)?
● NOUN: gene
● NOUN NOUN: HER2 gene
● NOUN PREP NOUN: member of family

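The pattern NOUN (PREP? NOUN)? can be sketched as an ordinary regular expression over a string of POS codes. This is only an illustration under stated assumptions: the one-letter tag encoding, the function name and the trailing token "family" (added so that the NOUN PREP NOUN case appears) are not from the slides, and the input is assumed to be already lemmatised and tagged.

```python
import re

# One character per POS tag, so that a tag sequence becomes a plain string.
TAG_CODES = {"NOUN": "N", "PREP": "P"}   # any other tag is encoded as "x"
PATTERN = re.compile(r"N(?:P?N)?")       # NOUN (PREP? NOUN)?
MAX_LEN = 3                              # longest tag sequence the pattern accepts


def candidate_terms(tagged):
    """Return every lemma sequence whose tags match the pattern exactly."""
    codes = "".join(TAG_CODES.get(pos, "x") for _, pos in tagged)
    found = []
    for start in range(len(codes)):
        last_end = min(len(codes), start + MAX_LEN)
        for end in range(start + 1, last_end + 1):
            if PATTERN.fullmatch(codes, start, end):
                found.append(" ".join(lemma for lemma, _ in tagged[start:end]))
    return found


# (lemma, POS) pairs; "family" is a hypothetical continuation of the sentence.
TAGGED = [
    ("HER-2/neu", "NOUN"),
    ("oncogene", "NOUN"),
    ("be", "VERB"),
    ("member", "NOUN"),
    ("of", "PREP"),
    ("family", "NOUN"),
]
CANDIDATES = candidate_terms(TAGGED)
```

Enumerating every span, rather than keeping only the longest match, keeps nested candidates such as "oncogene" inside "HER-2/neu oncogene"; a real extractor would then filter them with the frequency and association clues.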
Termhood clues (3): word association
● Significant co-occurrences are good clues for termhood:
  ● … breast cancer …
  ● … breast remains …
  ● … alternative cancer …
● Must take into account:
  ● number of times the two words co-occur
  ● number of times word A occurs
  ● number of times word B occurs

Measure for co-occurrence significance
● Mutual Information

\[ MI(a, b) = \log_2 \frac{P(a, b)}{P(a) \cdot P(b)} \]
\[ P(a, b) = \mathrm{nbocc}(a, b) / N \]
\[ P(a) = \mathrm{nbocc}(a) / N \]
N = total number of words in the corpus

Pair (a b)           nbocc(a, b)   nbocc(a)        nbocc(b)         MI
invasive carcinoma   20            invasive: 30    carcinoma: 20    9.7
cancer means         50            cancer: 800     means: 800       1.69

● Remarkable attraction between "invasive" and "carcinoma" despite a relatively low number of co-occurrences
Church and Hanks, 1990
L'Homme, 2004

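The mutual information formula can be sketched in a few lines. The corpus size N is not given on the slide, so the value used here is a hypothetical placeholder and the resulting scores are not meant to reproduce the slide's figures; only the ordering of the two pairs is of interest.

```python
import math


def mutual_information(cooc_ab, occ_a, occ_b, n_words):
    """MI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), probabilities estimated from counts."""
    p_ab = cooc_ab / n_words
    p_a = occ_a / n_words
    p_b = occ_b / n_words
    return math.log2(p_ab / (p_a * p_b))


N_WORDS = 25_000  # hypothetical corpus size, not stated in the talk

MI_INVASIVE_CARCINOMA = mutual_information(20, 30, 20, N_WORDS)
MI_CANCER_MEANS = mutual_information(50, 800, 800, N_WORDS)
```

Whatever N is chosen, "invasive carcinoma" scores higher than "cancer means": its 20 co-occurrences account for most occurrences of both words, while 50 co-occurrences are few for two words that each occur 800 times.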
Presentation outline
● About terms, terminology, terminology mining
● Term Extraction
● Term Alignment
  ● in parallel corpora
  ● in comparable corpora

Parallel and comparable corpora
● Parallel corpora
  ● Source text and target texts are translations
  ● Reduce search space little by little
    ● First sentences
    ● Then terms
● Comparable corpora
  ● Not translations, but very similar in topic
  ● Good proportion of term translations
  ● Search space:
    ● All terms of target corpus

Sentence alignment (1)
● Gale and Church (1993)'s hypothesis:
  ● Translated sentences have roughly the same length
  ● Probability P(S,T) that sentence S translates into T is based on the length difference
● Improvements: use a seed lexicon
  ● Probability P(S,T) is based on the number of words in common
Gale and Church, 1993

Sentence alignment (2)
● Compute probabilities for all pairs (S,T)
● Build matrix where M(i,j) contains the probability that sentence i translates to sentence j

i \ j   0      1      2      ...    n
0       0.89   0.56   0.2    ...    ...
1       0.45   0.9    0.1    ...    ...
2       ...    0.23   0.9    0.3    ...
...     ...    ...    0.44   0.76   ...
m       ...    ...    ...    ...    0.88

Gale and Church, 1993

Sentence alignment (2)
● Use dynamic programming to find the best "path", i.e. the best alignments
[Figure: the probability matrix M(i,j), shown again]
Gale and Church, 1993

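The dynamic programming step can be sketched as a search for the best monotone path through the matrix M. This is a simplified illustration, not Gale and Church's actual cost model: it assumes one-to-one alignments only, lets sentences stay unaligned, and maximises the sum of probabilities. The function name and the small matrix are hypothetical.

```python
def best_path(matrix):
    """Best monotone 1-1 alignment through a probability matrix (list of rows)."""
    m, n = len(matrix), len(matrix[0])
    # score[i][j]: best total using the first i source and first j target sentences
    score = [[0.0] * (n + 1) for _ in range(m + 1)]
    back = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            options = [
                (score[i - 1][j - 1] + matrix[i - 1][j - 1], "align"),
                (score[i - 1][j], "skip_source"),
                (score[i][j - 1], "skip_target"),
            ]
            # max() returns the first best option, so ties favour "align"
            score[i][j], back[i][j] = max(options, key=lambda option: option[0])
    # Follow the back-pointers from the bottom-right corner.
    pairs = []
    i, j = m, n
    while i > 0 and j > 0:
        move = back[i][j]
        if move == "align":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_source":
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


# Hypothetical 3 x 3 matrix with a clear diagonal.
EXAMPLE = [
    [0.89, 0.56, 0.20],
    [0.45, 0.90, 0.10],
    [0.10, 0.23, 0.90],
]
ALIGNMENT = best_path(EXAMPLE)
```

Because the path must be monotone, crossing alignments are excluded; that constraint is what makes the search tractable compared with testing every combination of pairs.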
Sub-sentence alignment: AnyMalign (Lardilleux, 2010)
● AnyMalign is a sub-sentential aligner
  ● Aligns words and groups of words for MT translation tables
  ● Aligned groups of words:
    ● more or less like statistical collocations
    ● possible to find term patterns in these groups of words
Lardilleux et al., 2010

AnyMalign (Lardilleux, 2010)
● Algorithm is based on « perfect alignments »:
  ● words or groups of words that occur in exactly the same aligned sentences

a d ↔ A D
b ↔ B
b ↔ C
a e ↔ A DD

a ↔ A is a perfect alignment
Lardilleux et al., 2010

AnyMalign (Lardilleux, 2010)
● How to get more « perfect alignments »?
  ● with smaller corpora
● How to get smaller corpora?
  ● randomly select sub-corpora from your corpus

a d ↔ A D
b ↔ B
b ↔ C
a e ↔ A DD

[Figure: the four sentence pairs, with sub-corpus 1 and sub-corpus 2 marked]
Sub-corpus 1: b ↔ B
Sub-corpus 2: a ↔ A
Lardilleux et al., 2010

AnyMalign (Lardilleux, 2010)
● Complements of perfect alignments are likely to be good alignments too:

a d ↔ A D
b ↔ B
b ↔ C
a e ↔ A DD

● Perfect alignment
  a ↔ A
● Complements
  d ↔ D
  e ↔ DD
Lardilleux et al., 2010

AnyMalign (Lardilleux, 2010)
● Process: iteratively extract random samples of random size from your corpus
● Extract « perfect alignments » and their complements
● The same alignment can occur several times
● Count, for each alignment, the number of times it occurs
Lardilleux et al., 2010

AnyMalign (Lardilleux, 2010)
● Output:
  ● alignments sorted by descending number of occurrences
● Alignment probability:

\[ P(S \mid T) = \frac{C(S, T)}{C(T)} \]

S = source group of words
T = target group of words
C(S,T) = number of times S was aligned with T
C(T) = number of times T appears in an alignment
Lardilleux et al., 2010
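The sampling procedure can be sketched on the four-sentence toy corpus. This is a rough illustration of the idea (perfect alignments, complements, counting over random sub-corpora, then P(S|T) = C(S,T) / C(T)), not the AnyMalign implementation: the tokenisation of the toy corpus, the function names, the number of samples and the alphabetical ordering of words inside a group are all assumptions.

```python
import random
from collections import Counter, defaultdict

# Toy corpus; tokens are assumed to be separated by spaces.
CORPUS = [("a d", "A D"), ("b", "B"), ("b", "C"), ("a e", "A DD")]


def perfect_alignments(pairs):
    """Group source and target words that occur in exactly the same sentence pairs."""
    by_profile = defaultdict(lambda: (set(), set()))
    occurrences = defaultdict(set)
    for idx, (src, tgt) in enumerate(pairs):
        for word in src.split():
            occurrences[(0, word)].add(idx)
        for word in tgt.split():
            occurrences[(1, word)].add(idx)
    for (side, word), where in occurrences.items():
        by_profile[frozenset(where)][side].add(word)
    return [(profile, tuple(sorted(s)), tuple(sorted(t)))
            for profile, (s, t) in by_profile.items() if s and t]


def with_complements(pairs):
    """Perfect alignments plus what remains of each sentence pair once they are removed."""
    found = []
    for profile, s_words, t_words in perfect_alignments(pairs):
        found.append((" ".join(s_words), " ".join(t_words)))
        for idx in profile:
            src, tgt = pairs[idx]
            rest_s = [w for w in src.split() if w not in s_words]
            rest_t = [w for w in tgt.split() if w not in t_words]
            if rest_s and rest_t:
                found.append((" ".join(rest_s), " ".join(rest_t)))
    return found


def sample_alignments(corpus, n_samples=200, seed=0):
    """Count alignments over random sub-corpora of random size."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_samples):
        size = rng.randint(1, len(corpus))
        counts.update(with_complements(rng.sample(corpus, size)))
    return counts


def translation_probability(counts, source, target):
    """P(S|T) = C(S,T) / C(T)."""
    c_t = sum(c for (_, t), c in counts.items() if t == target)
    return counts[(source, target)] / c_t if c_t else 0.0


COUNTS = sample_alignments(CORPUS)
```

On the full corpus b never aligns, because it occurs with both B and C; b ↔ B only shows up in sub-corpora that leave out the third sentence pair, which is why sampling small sub-corpora yields more alignments.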
AnyMalign (Lardilleux, 2010)
Advantages:
● can perform alignment with more than 2 languages at the same time
  ● 1 language → statistical collocations
● Extracts and aligns non-contiguous sequences of words
  ● to give something up
  ● to let someone down
● No a priori expectations on terms
  ● Sometimes a term in the source language is not translated by a term
  ● Terms = what you can align
Lardilleux et al., 2010

AnyMalign (Lardilleux, 2010)
● Word groups are not grammatical phrases:
  ● "that sample sentences and", "exchange format fitted for the"
  ● but not: "sample sentences", "exchange format"
● Solutions:
  ● find term patterns
  ● use heuristics
  ● trim stop words
Lardilleux et al., 2010

Presentation outline
● About terms, terminology, terminology mining
● Term Extraction
● Term Alignment
  ● in parallel corpora
  ● in comparable corpora

Advantages of comparable corpora
● More available
  ● new languages
  ● new language pairs
  ● new topics / domains
● Less expensive to build
● More natural
  ● data was produced spontaneously
  ● no influence from source text

Contextual approach
● Based on distributional linguistics (Z. Harris)
  ● Words with similar meaning appear in similar contexts
● If source and target words have similar contexts, they might be translations
  ● Compute contexts for each source and target word
  ● Compare contexts
  ● Find the most similar contexts

Contextual approach
● Representation of the context of a given word with a vector:
  ● Head word + collocates
● Vector associates the « head » word with its most frequent collocates + some indication of the strength of association between head word and collocates
[Figure: context vector of the head word "drink" with collocates mouth, beer, water, glass, ...]
Building context vector for « drink »
● Collocates: words occurring within a distance of n words from the head

  is variety of reasons to drink plenty of water each day
  simple as a glass of drinking water be the key to the
  popular in Japan today to drink water from glass after waking

● (drink, water) = 3
● (drink, glass) = 2
● (drink, Japan) = 1
● (drink, reason) = 1
● (drink, plenty) = 1
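Counting collocates inside a window can be sketched as follows. The window size (n = 3), the small stop-word list and the hand-lemmatised input ("reasons" → "reason", "drinking" → "drink") are assumptions chosen for the illustration; the talk does not state them.

```python
from collections import Counter

# Hypothetical stop-word list, just large enough for the three example sentences.
STOP_WORDS = {"a", "after", "as", "be", "each", "from", "in", "is", "of", "the", "to"}


def context_vector(head, sentences, window=3):
    """Count the content words found within `window` tokens of each occurrence of `head`."""
    counts = Counter()
    for tokens in sentences:
        for i, token in enumerate(tokens):
            if token != head:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] not in STOP_WORDS:
                    counts[tokens[j]] += 1
    return counts


# The three example sentences, lemmatised by hand.
SENTENCES = [
    "is variety of reason to drink plenty of water each day".split(),
    "simple as a glass of drink water be the key to the".split(),
    "popular in Japan today to drink water from glass after waking".split(),
]
VECTOR = context_vector("drink", SENTENCES)
```

Under these assumptions the counts for water, glass, Japan, reason and plenty agree with the slide; "today" also enters the vector with a count of 1, which a different window or stop-word list would change.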
Normalized co-occurrence frequency
● Normalization: use a measure like MI or the log-likelihood ratio to counteract the influence of high-frequency words
● Ex: log-likelihood ratio
  ● 1000 cooc. in corpus
  ● (drink, x) = 75 cooc.
  ● (water, y) = 75 cooc.
  ● (drink, water) = 25 cooc.

           water   ¬ water   total
drink      50      25        75
¬ drink    25      900       925
total      75      925       1000

Dunning, 1993

Log-likelihood ratio
● Contingency table:

           water   ¬ water   total
drink      a       b         e
¬ drink    c       d         h
total      f       g         N

\[ \text{log-likelihood ratio}(\text{water}, \text{drink}) = a \log a + b \log b + c \log c + d \log d + N \log N - e \log e - f \log f - g \log g - h \log h \]

● log-likelihood ratio (drink, water) = 45.05
Dunning, 1993
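The contingency-table formula translates directly into code. The sketch below assumes natural logarithms and the convention 0 · log 0 = 0, neither of which is stated on the slide, and the example counts are an illustrative reading of the bullet figures (25 joint co-occurrences, 75 for each word, 1000 in total) rather than a reproduction of the slide's result.

```python
import math


def xlogx(x):
    """x * log(x), with the usual convention that 0 * log(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def log_likelihood(a, b, c, d):
    """Score of a 2x2 contingency table with cells a, b (first row) and c, d (second row)."""
    e, h = a + b, c + d        # row totals
    f, g = a + c, b + d        # column totals
    n = a + b + c + d
    return (xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d) + xlogx(n)
            - xlogx(e) - xlogx(f) - xlogx(g) - xlogx(h))


# Illustrative cells: a = 25, b = c = 75 - 25 = 50, d = 1000 - 125 = 875.
SCORE = log_likelihood(25, 50, 50, 875)
```

The score is close to zero when the two words are independent and grows with the strength of their association. Dunning's G² statistic is commonly reported as twice this quantity.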

Context vector comparison
● Compute context vectors for words in source and target corpus
● How to compare word contexts in different languages?
[Figure: context vector of English "drink" (collocates: mouth, beer, glass, water, ...) beside the context vector of a Thai word with its Thai collocates]
Rapp 1995 ; Fung 1997

Context vector comparison
● Use seed lexicon to map collocates
[Figure: collocates of English "drink" (mouth, beer, glass, water, ...) mapped to Thai collocates through a Thai-English seed lexicon]
Rapp 1995 ; Fung 1997

Context vector comparison
● Measuring context similarity of words a and b = measuring the cosine of the angle between the vector of a and the vector of b

\[ \cos(a, b) = \frac{\sum_{c \in a \cup b} w(c, a) \cdot w(c, b)}{\sqrt{\sum_{c \in a} w(c, a)^2 \cdot \sum_{c \in b} w(c, b)^2}} \]

c ∈ x = collocate in vector of x
w(c, x) = weight of association of collocate c with head x

● Select the top 1, 10 or 20 closest words as candidate translations
Rapp 1995 ; Fung 1997
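Translating a context vector through the seed lexicon and ranking target words by cosine can be sketched as below. All data are hypothetical: French stands in for the target language, the weights are raw counts rather than normalised association scores, and the function names are illustrative.

```python
import math


def cosine(vec_a, vec_b):
    """Cosine between two sparse vectors stored as {collocate: weight} dicts."""
    # Collocates missing from one vector have weight 0, so only shared ones contribute.
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[c] * vec_b[c] for c in shared)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def translate_vector(vec, seed_lexicon):
    """Map source collocates to the target language; collocates not in the lexicon are dropped."""
    translated = {}
    for collocate, weight in vec.items():
        if collocate in seed_lexicon:
            target = seed_lexicon[collocate]
            translated[target] = translated.get(target, 0.0) + weight
    return translated


def rank_candidates(source_vec, target_vectors, seed_lexicon, top_n=10):
    """Return the top_n target words whose context is closest to the translated source context."""
    translated = translate_vector(source_vec, seed_lexicon)
    scored = [(cosine(translated, vec), word) for word, vec in target_vectors.items()]
    return [word for _, word in sorted(scored, reverse=True)[:top_n]]


# Hypothetical data.
SEED_LEXICON = {"water": "eau", "glass": "verre", "beer": "bière"}
DRINK = {"water": 3, "glass": 2, "mouth": 1}
TARGET_VECTORS = {
    "boire": {"eau": 3, "verre": 2, "bière": 1},
    "manger": {"pain": 3, "verre": 1},
}
RANKING = rank_candidates(DRINK, TARGET_VECTORS, SEED_LEXICON)
```

Collocates missing from the seed lexicon ("mouth" here) are simply lost, which is one reason the coverage of the lexicon weighs so heavily on the quality of the ranking.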
Contextual approach: improvements
● Using syntactic collocates
● Improving dictionary with cognates, transliterations, other dictionaries
● Give more weight to « anchor words »
  ● cognates, transliterations
  ● frequent, monosemous
● Filter with part-of-speech
● Favor reciprocal translations
[Figure: source words a, b, c, d linked to target words a', b', c', d']
Chiao and Zweigenbaum, 2002
Sadat et al., 2003
Gamallo and Campos, 2005
Koehn and Knight, 2002
Prochasson, 2010

Variant to direct translation of vector
● « Interlingual » translation
● Translate the n closest words instead of the context vector
● Seed lexicon: some mappings between source and target words
[Figure: SOURCE and TARGET word spaces connected through the seed lexicon]
Déjean and Gaussier, 2002

Variant to direct translation of vector
● To translate term T:
  ● Find the n closest words
  ● these closest words are in the lexicon
[Figure: SOURCE and TARGET word spaces connected through the seed lexicon]
Déjean and Gaussier, 2002

Variant to direct translation of vector
● Find the target term which is the closest to the n closest words
[Figure: SOURCE and TARGET word spaces connected through the seed lexicon]
Déjean and Gaussier, 2002

Variant to direct translation of vector
● « Interlingual » approach
● Translate closest words instead of direct context
[Figure: SOURCE and TARGET word spaces]
Déjean and Gaussier, 2002

Adaptation to multi-word terms
● Context vector: union of the vectors of each word of the term
[Figure: context vectors of "energy" and "drink" (collocates such as beer, mouth, glass, strong, ...) merged into one vector for "energy drink"]
Morin et al., 2004
Morin and Daille, 2009

Evaluation
● Precision on TopN candidates
● 50% on Top20: the correct translation is in the top 20 best candidates for 50% of source terms

Unit type           Corpus                          Precision
Single-word units   big, general language corpus    80%
Multi-word units    small, specialized corpus       60%
Multi-word terms    small, specialized corpus       42%

● big = hundreds of millions of words
● small = 100 thousand to one million words
Morin and Daille, 2010
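Precision on the top N candidates can be computed as sketched below. The function name, the one-reference-translation-per-term simplification and the toy data are assumptions for illustration only.

```python
def top_n_precision(ranked_candidates, reference, n=20):
    """Share of source terms whose reference translation is among the first n candidates.

    ranked_candidates: {source term: [candidate translations, best first]}
    reference: {source term: expected translation}
    """
    if not ranked_candidates:
        return 0.0
    hits = sum(
        1
        for term, candidates in ranked_candidates.items()
        if term in reference and reference[term] in candidates[:n]
    )
    return hits / len(ranked_candidates)


# Hypothetical rankings for two source terms.
RANKED = {
    "breast cancer": ["cancer du sein", "cancer", "tumeur"],
    "cell line": ["cellule", "tissu", "lignée cellulaire"],
}
REFERENCE = {"breast cancer": "cancer du sein", "cell line": "lignée cellulaire"}

PRECISION_TOP1 = top_n_precision(RANKED, REFERENCE, n=1)
PRECISION_TOP3 = top_n_precision(RANKED, REFERENCE, n=3)
```

A larger N always gives an equal or higher score, so a figure such as "50% on Top20" is only comparable with results reported for the same N.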

Why is it so difficult?
● translation might not be present
● target term has not been extracted
● polysemous words: non-discriminating, fuzzy vector
● low-frequency words: non-significant vector
● translation has different usage in target language
● big search space: all words of target corpus
→ cannot be fully automatic
→ semi-supervised term alignment

4th Franco-Thai Workshop 2010
intensive summer school on
Natural Language Processing
Thank you
ed(a)lingua-et-machina.com

54

Mais conteúdo relacionado

Mais procurados

Word representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2VecWord representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2Vecananth
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyMarina Santini
 
Portuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and HowPortuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and HowValeria de Paiva
 
Yves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLPYves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLPHendrik D'Oosterlinck
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...Seth Grimes
 
A general method applicable to the search for anglicisms in russian social ne...
A general method applicable to the search for anglicisms in russian social ne...A general method applicable to the search for anglicisms in russian social ne...
A general method applicable to the search for anglicisms in russian social ne...Ilia Karpov
 
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...Lifeng (Aaron) Han
 
A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...Francisco Manuel Rangel Pardo
 
Introduction to Text Mining and Topic Modelling
Introduction to Text Mining and Topic ModellingIntroduction to Text Mining and Topic Modelling
Introduction to Text Mining and Topic ModellingDavid Paule
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational SemanticsMarina Santini
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language ProcessingSebastian Ruder
 
OUTDATED Text Mining 4/5: Text Classification
OUTDATED Text Mining 4/5: Text ClassificationOUTDATED Text Mining 4/5: Text Classification
OUTDATED Text Mining 4/5: Text ClassificationFlorian Leitner
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine TranslationRIILP
 
Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...Francisco Manuel Rangel Pardo
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role LabelingMarina Santini
 

Mais procurados (19)

L1
L1L1
L1
 
Word representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2VecWord representation: SVD, LSA, Word2Vec
Word representation: SVD, LSA, Word2Vec
 
Lecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language TechnologyLecture 1: Semantic Analysis in Language Technology
Lecture 1: Semantic Analysis in Language Technology
 
Portuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and HowPortuguese Linguistic Tools: What, Why and How
Portuguese Linguistic Tools: What, Why and How
 
Yves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLPYves Peirsman - Deep Learning for NLP
Yves Peirsman - Deep Learning for NLP
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
 
A general method applicable to the search for anglicisms in russian social ne...
A general method applicable to the search for anglicisms in russian social ne...A general method applicable to the search for anglicisms in russian social ne...
A general method applicable to the search for anglicisms in russian social ne...
 
Ijetcas14 575
Ijetcas14 575Ijetcas14 575
Ijetcas14 575
 
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
pptphrase-tagset-mapping-for-french-and-english-treebanks-and-its-application...
 
Moses
MosesMoses
Moses
 
A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...A Low Dimensionality Representation for Language Variety Identification (CICL...
A Low Dimensionality Representation for Language Variety Identification (CICL...
 
Introduction to Text Mining and Topic Modelling
Introduction to Text Mining and Topic ModellingIntroduction to Text Mining and Topic Modelling
Introduction to Text Mining and Topic Modelling
 
Lecture 2: Computational Semantics
Lecture 2: Computational SemanticsLecture 2: Computational Semantics
Lecture 2: Computational Semantics
 
Frontiers of Natural Language Processing
Frontiers of Natural Language ProcessingFrontiers of Natural Language Processing
Frontiers of Natural Language Processing
 
OUTDATED Text Mining 4/5: Text Classification
OUTDATED Text Mining 4/5: Text ClassificationOUTDATED Text Mining 4/5: Text Classification
OUTDATED Text Mining 4/5: Text Classification
 
7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation7. Trevor Cohn (usfd) Statistical Machine Translation
7. Trevor Cohn (usfd) Statistical Machine Translation
 
Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...Language Variety Identification using Distributed Representations of Words an...
Language Variety Identification using Distributed Representations of Words an...
 
Lecture: Word Senses
Lecture: Word SensesLecture: Word Senses
Lecture: Word Senses
 
Semantic Role Labeling
Semantic Role LabelingSemantic Role Labeling
Semantic Role Labeling
 

Destaque

Applicative evaluation of bilingual terminologies
Applicative evaluation of bilingual terminologiesApplicative evaluation of bilingual terminologies
Applicative evaluation of bilingual terminologiesEstelle Delpech
 
Extraction of domain-specific bilingual lexicon from comparable corpora: comp...
Extraction of domain-specific bilingual lexicon from comparable corpora: comp...Extraction of domain-specific bilingual lexicon from comparable corpora: comp...
Extraction of domain-specific bilingual lexicon from comparable corpora: comp...Estelle Delpech
 
Michael Bloodgood - 2017 - Acquisition of Translation Lexicons for Historical...
Michael Bloodgood - 2017 - Acquisition of Translation Lexicons for Historical...Michael Bloodgood - 2017 - Acquisition of Translation Lexicons for Historical...
Michael Bloodgood - 2017 - Acquisition of Translation Lexicons for Historical...Association for Computational Linguistics
 
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...Estelle Delpech
 
Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...
Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...
Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...Association for Computational Linguistics
 
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchangeDealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchangeEstelle Delpech
 
Chelo Vargas-Sierra
Chelo Vargas-SierraChelo Vargas-Sierra
Chelo Vargas-SierraChelo Vargas
 
Enriching Transliteration Lexicon Using Automatic Transliteration Extraction
Enriching Transliteration Lexicon Using Automatic Transliteration ExtractionEnriching Transliteration Lexicon Using Automatic Transliteration Extraction
Enriching Transliteration Lexicon Using Automatic Transliteration ExtractionSarvnaz Karimi
 
A cognitive view of the bilingual lexicon
A cognitive view of the bilingual lexiconA cognitive view of the bilingual lexicon
A cognitive view of the bilingual lexiconİrem Tümer
 
Challenges in the linguistic exploitation of specialized republishable web co...
Challenges in the linguistic exploitation of specialized republishable web co...Challenges in the linguistic exploitation of specialized republishable web co...
Challenges in the linguistic exploitation of specialized republishable web co...Adrien Barbaresi
 
Macro economische analyse van brazilië
Macro economische analyse van braziliëMacro economische analyse van brazilië
Macro economische analyse van braziliëJan-Willem Lammens
 
Parallel text extraction from multimodal comparable corpora
Parallel text extraction from multimodal comparable corporaParallel text extraction from multimodal comparable corpora
Parallel text extraction from multimodal comparable corporaHaithem Afli
 
Embedded Human Computation for Knowledge Extraction and Evaluation
Embedded Human Computation for Knowledge Extraction and EvaluationEmbedded Human Computation for Knowledge Extraction and Evaluation
Embedded Human Computation for Knowledge Extraction and EvaluationwebLyzard technology
 
Bilingual Terminology Extraction based on Translation Patterns
Bilingual Terminology Extraction based on Translation PatternsBilingual Terminology Extraction based on Translation Patterns
Bilingual Terminology Extraction based on Translation PatternsAlberto Simões
 
Cross-lingual ontology lexicalisation, translation and information extraction...
Cross-lingual ontology lexicalisation, translation and information extraction...Cross-lingual ontology lexicalisation, translation and information extraction...
Cross-lingual ontology lexicalisation, translation and information extraction...Tobias Wunner
 
Philippe Langlais - 2017 - Users and Data: The Two Neglected Children of Bili...
Philippe Langlais - 2017 - Users and Data: The Two Neglected Children of Bili...Philippe Langlais - 2017 - Users and Data: The Two Neglected Children of Bili...
Philippe Langlais - 2017 - Users and Data: The Two Neglected Children of Bili...Association for Computational Linguistics
 
Word Formation in English
Word Formation in EnglishWord Formation in English
Word Formation in Englishteflang
 

Destaque (17)

Applicative evaluation of bilingual terminologies
Applicative evaluation of bilingual terminologiesApplicative evaluation of bilingual terminologies
Applicative evaluation of bilingual terminologies
 
Extraction of domain-specific bilingual lexicon from comparable corpora: comp...
Extraction of domain-specific bilingual lexicon from comparable corpora: comp...Extraction of domain-specific bilingual lexicon from comparable corpora: comp...
Extraction of domain-specific bilingual lexicon from comparable corpora: comp...
 
Michael Bloodgood - 2017 - Acquisition of Translation Lexicons for Historical...
Michael Bloodgood - 2017 - Acquisition of Translation Lexicons for Historical...Michael Bloodgood - 2017 - Acquisition of Translation Lexicons for Historical...
Michael Bloodgood - 2017 - Acquisition of Translation Lexicons for Historical...
 
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...
Identification of Fertile Translations in Comparable Corpora: a Morpho-Compos...
 
Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...
Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...
Meng Zhang - 2017 - Adversarial Training for Unsupervised Bilingual Lexicon I...
 
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchangeDealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
Dealing with Lexicon Acquired from Comparable Corpora: post-edition and exchange
 
Chelo Vargas-Sierra
Chelo Vargas-SierraChelo Vargas-Sierra
Chelo Vargas-Sierra
 
Enriching Transliteration Lexicon Using Automatic Transliteration Extraction
Enriching Transliteration Lexicon Using Automatic Transliteration ExtractionEnriching Transliteration Lexicon Using Automatic Transliteration Extraction
Enriching Transliteration Lexicon Using Automatic Transliteration Extraction
 
A cognitive view of the bilingual lexicon
A cognitive view of the bilingual lexiconA cognitive view of the bilingual lexicon
A cognitive view of the bilingual lexicon
 
Challenges in the linguistic exploitation of specialized republishable web co...
Challenges in the linguistic exploitation of specialized republishable web co...Challenges in the linguistic exploitation of specialized republishable web co...
Challenges in the linguistic exploitation of specialized republishable web co...
 
Macro economische analyse van brazilië
Macro economische analyse van braziliëMacro economische analyse van brazilië
Macro economische analyse van brazilië
 
Parallel text extraction from multimodal comparable corpora
Parallel text extraction from multimodal comparable corporaParallel text extraction from multimodal comparable corpora
Parallel text extraction from multimodal comparable corpora
 
Embedded Human Computation for Knowledge Extraction and Evaluation
Embedded Human Computation for Knowledge Extraction and EvaluationEmbedded Human Computation for Knowledge Extraction and Evaluation
Embedded Human Computation for Knowledge Extraction and Evaluation
 
Bilingual Terminology Extraction based on Translation Patterns
Bilingual Terminology Extraction based on Translation PatternsBilingual Terminology Extraction based on Translation Patterns
Bilingual Terminology Extraction based on Translation Patterns
 
Cross-lingual ontology lexicalisation, translation and information extraction...
Cross-lingual ontology lexicalisation, translation and information extraction...Cross-lingual ontology lexicalisation, translation and information extraction...
Cross-lingual ontology lexicalisation, translation and information extraction...
 
Philippe Langlais - 2017 - Users and Data: The Two Neglected Children of Bili...
Philippe Langlais - 2017 - Users and Data: The Two Neglected Children of Bili...Philippe Langlais - 2017 - Users and Data: The Two Neglected Children of Bili...
Philippe Langlais - 2017 - Users and Data: The Two Neglected Children of Bili...
 
Word Formation in English
Word Formation in EnglishWord Formation in English
Word Formation in English
 

Bilingual terminology mining

  • 1. 4th intensive summer school on Natural Language Processing: Bilingual Terminology Mining. Estelle Delpech, 30th November, 2010
  • 2. About me
    ● Estelle Delpech
    ● Research engineer at Lingua et Machina, France
      ● CAT tools provider
      ● ed(at)lingua-et-machina(dot)com
      ● www.lingua-et-machina.com
    ● Ph.D. candidate at LINA, France
      ● TALN team: specialises in NLP
      ● estelle.delpech(at)univ-nantes(dot)fr
  • 3. Presentation outline
    ● About terms, terminology, terminology mining
    ● Term Extraction
    ● Term Alignment
  • 4. Presentation outline
    ● About terms, terminology, terminology mining
    ● Term Extraction
    ● Term Alignment
  • 5. What is a term?
    ● Classical definition: “unequivocal expression of a concept within a technical domain”
    ● Traces back to 1930: Eugen Wüster, “General Theory of Terminology”
    ● Specialized language is / should be unambiguous
    ● Figure: Ogden semiotic triangle (concept, term, referent)
  • 6. What is a term?
    ● Classical terminology was challenged in the 1990s by:
      ● sociolinguistics
      ● corpus-based linguistics
      ● computational terminology
    ● Observing terms in texts shows that:
      ● there is variation and polysemy
      ● concepts evolve over time
      ● there is no clear-cut border between specialized and general language
  • 7. What is a term?
    ● The definition of “term” depends on the application / audience of the terminology
    ● Domain expert: unit of knowledge
    ● Information retrieval: descriptors for indexing
    ● Translation: a word or phrase that
      ● is not part of general language
      ● translates differently in a particular domain
      ● can be a noun, adjective or verb, or a noun phrase, verb phrase, etc.
  • 8. What is a terminology?
    ● Set of terms + terminological records
    ● Terminological record:
      ● Part of speech
      ● Frequency
      ● Variants
      ● Contexts
    ● Relations between terms / concepts:
      ● Hypernymy: cat is a sort of animal
      ● Meronymy: head is part of body
    ● Bilingual terminology:
      ● Translation relations
  • 10. Where do you find terms?
    ● In specialized texts:
      ● Research papers on breast cancer
      ● Plane crash reports
    ● Corpus building:
      ● it is important to gather texts that follow a well-defined domain / theme
  • 11. Bilingual terminology mining (1)
    ● Diagram: specialized texts → term extraction (data mining) → terms / terms → term alignment → bilingual terminology → terminology management software
  • 12. Bilingual terminology mining (2)
    ● Diagram: specialized texts → synchronized term extraction and alignment → terms / terms → bilingual terminology → terminology management software
  • 13. Presentation outline
    ● About terms, terminology, terminology mining
    ● Term Extraction
    ● Term Alignment
  • 14. Term extraction: semi-supervised process (L'Homme, 2004)
    ● The notion of term is “slippery”
    ● The same lexical unit may or may not be considered a term depending on:
      ● Audience
      ● Domain
      ● Application
    ● Term extractors extract candidate terms, i.e. units that:
      ● are frequent in texts of a given domain, e.g. HER2 gene
      ● look like terms (well-formed phrases), e.g. human cell lines
      ● are groups of words that frequently occur together, e.g. to compile a program
  • 15. Term extraction: semi-supervised, lexico-semantic process
    ● Diagram (labels only): texts; specialized texts; term extractor; candidate terms; automatic indexing; candidate terms; manual selection; terms; terms; terminology; concepts
  • 16. Termhood clues (1): frequency (L'Homme, 2004)
    ● A term occurs frequently in specialized texts
      ● the higher the frequency, the better?
    ● Comparison with general language:
      ● does the term occur more frequently than expected in general language?
    ● Compute significance tests:
      ● e.g. chi-square (χ²)
  • 17. Termhood clues (2): form
    ● A term is a well-formed phrase
      ● ...HER2/neu oncogenes are members of...
    ● Match morpho-syntactic patterns
      ● e.g. NOUN + NOUN
    ● Many patterns exist:
      ● NOUN PREP DET NOUN: alternation of the gene
      ● NOUN PREP NOUN COORD ADJ NOUN: susceptibility to breast and ovarian cancer
      ● NOUN NOUN NOUN NOUN NOUN: human breast cancer cell lines
  • 18. Termhood clues (2): form
    ● Preprocessing:
      ● Tokenization
      ● Lemmatisation
      ● POS tagging
    ● Example: “… HER-2/neu oncogenes are members of …”
        Token:  HER-2/neu  oncogenes  are   members  of
        POS:    NOUN       NOUN       VERB  NOUN     PREP
        Lemma:  HER-2/neu  oncogene   be    member   of
  • 19. Identification of Syntactic Patterns
    ● Patterns are expressed as regular expressions / finite-state automata
    ● Example pattern: NOUN (PREP? NOUN)?
      ● NOUN: gene
      ● NOUN NOUN: HER2 gene
      ● NOUN PREP NOUN: member of family
    ● Figure: finite-state automaton for this pattern (labels: START, NOUN, PREP)
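The pattern on this slide can be tried out with an ordinary regular expression applied to the sequence of POS tags. The sketch below is only an illustration: the function name `candidate_terms`, the space-joined tag encoding, and the input format (a list of `(lemma, tag)` pairs from some upstream tagger) are assumptions, not part of the original slides.

```python
import re

# NOUN (PREP? NOUN)? written over a space-separated string of POS tags.
# \b keeps the match on whole tags (so "PRONOUN" would not count as NOUN).
TERM_PATTERN = re.compile(r"\bNOUN(?: (?:PREP )?NOUN)?\b")


def candidate_terms(tagged):
    """Return the lemma sequences whose POS tags match TERM_PATTERN.

    `tagged` is a list of (lemma, tag) pairs, assumed to come from a
    tokenizer / lemmatizer / POS tagger run beforehand.
    """
    tags = " ".join(tag for _, tag in tagged)
    terms = []
    for match in TERM_PATTERN.finditer(tags):
        # Number of spaces before the match = index of the first matched token.
        first = tags[:match.start()].count(" ")
        length = match.group().count(" ") + 1
        terms.append(" ".join(lemma for lemma, _ in tagged[first:first + length]))
    return terms


# The tagged sentence from the preprocessing slide.
sentence = [("HER-2/neu", "NOUN"), ("oncogene", "NOUN"), ("be", "VERB"),
            ("member", "NOUN"), ("of", "PREP")]
print(candidate_terms(sentence))
```

Because `finditer` returns non-overlapping, leftmost matches, a longer candidate hides the shorter ones nested inside it; a real extractor may want to keep both.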
  • 20. Termhood clues (3): word association
    ● Significant co-occurrences are good clues for termhood:
      ● … breast cancer …
      ● … breast remains …
      ● … alternative cancer …
    ● Must take into account:
      ● the number of times the two words co-occur
      ● the number of times word A occurs
      ● the number of times word B occurs
  • 21. Measure for co-occurrence significance (Church and Hanks, 1990; L'Homme, 2004)
    ● Mutual Information: $MI(a,b) = \log_2 \frac{P(a,b)}{P(a) \cdot P(b)}$
      ● $P(a,b) = nbocc(a,b) / N$
      ● $P(a) = nbocc(a) / N$
      ● $N$ = total number of words in the corpus
    ● Example:
        Pair                Co-occurrences  Word A         Word B         MI
        invasive carcinoma  20              invasive: 30   carcinoma: 20  9.7
        cancer means        50              cancer: 800    means: 800     1.69
    ● Remarkable attraction between “invasive” and “carcinoma” despite a relatively low number of co-occurrences
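As a minimal sketch of the mutual information formula on this slide, the function below computes MI from raw counts. The corpus size `N` is not given in the slides, so the value used here is a made-up assumption and the resulting scores are not expected to reproduce the slide's figures; only the ordering of the two pairs is of interest.

```python
import math


def mutual_information(n_ab, n_a, n_b, n_total):
    """MI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), with probabilities
    estimated as relative frequencies over n_total words."""
    p_ab = n_ab / n_total
    p_a = n_a / n_total
    p_b = n_b / n_total
    return math.log2(p_ab / (p_a * p_b))


# Hypothetical corpus size: the slides do not state N.
N = 40_000

mi_invasive_carcinoma = mutual_information(20, 30, 20, N)
mi_cancer_means = mutual_information(50, 800, 800, N)

# The rarer but more exclusive pair gets the higher score.
print(mi_invasive_carcinoma > mi_cancer_means)
```

Whatever the value of `N`, the gap between the two scores stays the same, because `N` contributes the same additive constant to both.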
  • 22. Presentation outline
    ● About terms, terminology, terminology mining
    ● Term Extraction
    ● Term Alignment
  • 23. Presentation outline
    ● About terms, terminology, terminology mining
    ● Term Extraction
    ● Term Alignment
      ● in parallel corpora
      ● in comparable corpora
  • 24. Parallel and comparable corpora
    ● Parallel corpora
      ● Source and target texts are translations of each other
      ● Reduce the search space step by step:
        ● first align sentences
        ● then align terms
    ● Comparable corpora
      ● Not translations, but very similar in topic
      ● Contain a good proportion of term translations
      ● Search space: all terms of the target corpus
  • 25. Sentence alignment (1)
    ● Gale and Church (1993)'s hypothesis:
      ● Translated sentences have roughly the same length
      ● The probability P(S,T) that sentence S translates into T is based on the length difference
    ● Improvements: use a seed lexicon
      ● The probability P(S,T) is based on the number of words in common
  • 26. Sentence alignment (2) (Gale and Church, 1993)
    ● Compute probabilities for all pairs (S,T)
    ● Build a matrix where M(i,j) contains the probability that sentence i translates to sentence j

            0     1     2     ...   n
      0     0.89  0.56  0.2   ...   ...
      1     0.45  0.9   0.1   ...   ...
      2     ...   0.23  0.9   0.3   ...
      ...   ...   ...   0.44  0.76  ...
      m     ...   ...   ...   ...   0.88
  • 27. Sentence alignment (2) (Gale and Church, 1993)
    ● Use dynamic programming to find the best “path”, i.e. the best alignments

            0     1     2     ...   n
      0     0.89  0.56  0.2   ...   ...
      1     0.45  0.9   0.1   ...   ...
      2     ...   0.23  0.9   0.3   ...
      ...   ...   ...   0.44  0.76  ...
      m     ...   ...   ...   ...   0.88
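The "best path" search can be sketched with a small dynamic program. This is a deliberately simplified illustration, not Gale and Church's algorithm: it only allows 1-to-1 matches plus skipping a sentence on either side, and the function name `best_alignment`, the skip penalty, and the 0.05 filler cell in the example matrix are all assumptions.

```python
import math


def best_alignment(prob, skip_logp=math.log(0.05)):
    """Return the monotone list of (i, j) pairs maximising the sum of
    log M(i, j), where leaving a sentence unaligned costs skip_logp."""
    m, n = len(prob), len(prob[0])
    neg_inf = float("-inf")
    score = [[neg_inf] * (n + 1) for _ in range(m + 1)]
    back = [[None] * (n + 1) for _ in range(m + 1)]
    score[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            moves = []
            if i > 0 and j > 0 and prob[i - 1][j - 1] > 0:
                moves.append((score[i - 1][j - 1] + math.log(prob[i - 1][j - 1]), "match"))
            if i > 0:
                moves.append((score[i - 1][j] + skip_logp, "skip_source"))
            if j > 0:
                moves.append((score[i][j - 1] + skip_logp, "skip_target"))
            if moves:
                score[i][j], back[i][j] = max(moves)
    # Walk back from the bottom-right corner to recover the path.
    pairs, i, j = [], m, n
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_source":
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


# Top-left corner of the slide's matrix; 0.05 replaces an elided cell (made up).
matrix = [[0.89, 0.56, 0.2],
          [0.45, 0.9, 0.1],
          [0.05, 0.23, 0.9]]
print(best_alignment(matrix))
```

A fuller implementation would also score 2-to-1, 1-to-2 and 2-to-2 groupings, which this sketch leaves out for brevity.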
  • 28. Sub-sentence alignment: AnyMalign (Lardilleux et al., 2010)
    ● AnyMalign is a sub-sentential aligner
      ● Aligns words and groups of words for MT translation tables
    ● Aligned groups of words:
      ● are more or less like statistical collocations
      ● can be searched for term patterns
  • 29. AnyMalign (Lardilleux et al., 2010)
    ● The algorithm is based on “perfect alignments”:
      ● words or groups of words that occur in exactly the same aligned sentences
    ● Example corpus: ad ↔ AD; b ↔ B; b ↔ C; a e ↔ A DD
      ● a ↔ A is a perfect alignment
  • 30. AnyMalign (Lardilleux et al., 2010)
    ● How to get more “perfect alignments”?
      ● with smaller corpora
    ● How to get smaller corpora?
      ● randomly select subcorpora from your corpus
    ● Example corpus: ad ↔ AD; b ↔ B; b ↔ C; a e ↔ A DD
      ● Subcorpus 1: b ↔ B
      ● Subcorpus 2: a ↔ A
  • 31. AnyMalign (Lardilleux et al., 2010)
    ● Complements of perfect alignments are likely to be good alignments too
    ● Example corpus: ad ↔ AD; b ↔ B; b ↔ C; a e ↔ A DD
      ● Perfect alignment: a ↔ A
      ● Complements: d ↔ D; e ↔ DD
  • 32. AnyMalign (Lardilleux et al., 2010)
    ● Process:
      ● Iteratively extract random samples of random size from your corpus
      ● Extract “perfect alignments” and their complements
      ● The same alignment can occur several times
      ● Count, for each alignment, the number of times it occurs
  • 33. AnyMalign (Lardilleux et al., 2010)
    ● Output: alignments sorted by descending number of occurrences
    ● Alignment probability: $P(S \mid T) = \frac{C(S,T)}{C(T)}$
      ● S = source group of words
      ● T = target group of words
      ● C(S,T) = number of times S was aligned with T
      ● C(T) = number of times T appears in an alignment
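The alignment probability above is a plain relative frequency, which the following sketch computes from a list of observed alignments. The function name `alignment_probabilities` and the toy list of observations are hypothetical; they do not come from AnyMalign itself.

```python
from collections import Counter


def alignment_probabilities(alignments):
    """Map each (source, target) pair to P(S|T) = C(S,T) / C(T).

    `alignments` lists every alignment produced over all sampled
    subcorpora, with repetitions.
    """
    pair_counts = Counter(alignments)
    target_counts = Counter(target for _, target in alignments)
    return {
        (source, target): count / target_counts[target]
        for (source, target), count in pair_counts.items()
    }


# Made-up observations: "a" was aligned with "A" three times, "d" once.
observed = [("a", "A"), ("a", "A"), ("a", "A"), ("d", "A"), ("b", "B"), ("b", "C")]
probabilities = alignment_probabilities(observed)

# Sort by descending count-based probability, as in the slide's output.
for pair, p in sorted(probabilities.items(), key=lambda item: -item[1]):
    print(pair, round(p, 2))
```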
  • 34. AnyMalign (Lardilleux et al., 2010)
    ● Advantages:
      ● Can perform alignment with more than 2 languages at the same time
        ● 1 language → statistical collocations
      ● Extracts and aligns non-contiguous sequences of words
        ● to give something up
        ● to let someone down
      ● No a priori expectations on terms
        ● Sometimes a term in the source language is not translated by a term
        ● Terms = what you can align
  • 35. AnyMalign (Lardilleux, 2010)
      ● Word groups are not grammatical phrases:
          ● that sample sentences and
          ● exchange format fitted for the
          ● but not
      ● Solutions:
          ● find term patterns
          ● use heuristics
          ● trim stop words
      ● Expected result: sample sentences, exchange format
      Lardilleux et al., 2010
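The « trim stop words » heuristic can be illustrated with a short sketch. The stop word list below is a small hypothetical one chosen for the example, and `trim_stop_words` is not a function of AnyMalign.

```python
# Hypothetical stop word list, for illustration only.
STOP_WORDS = {"that", "and", "for", "the", "but", "not"}

def trim_stop_words(words):
    """Remove stop words from both edges of a word group, keeping inner words."""
    start, end = 0, len(words)
    while start < end and words[start] in STOP_WORDS:
        start += 1
    while end > start and words[end - 1] in STOP_WORDS:
        end -= 1
    return words[start:end]

print(trim_stop_words("that sample sentences and".split()))
print(trim_stop_words("exchange format fitted for the".split()))
print(trim_stop_words("but not".split()))
```

A group such as "that sample sentences and" is reduced to "sample sentences", and "but not" disappears entirely. Edge trimming alone leaves "exchange format fitted"; a term pattern (for instance, keeping only noun sequences) would be needed to reach "exchange format".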
  • 36. Presentation outline
      ● About terms, terminology, terminology mining
      ● Term Extraction
      ● Term Alignment
          ● in parallel corpora
          ● in comparable corpora
  • 37. Advantages of comparable corpora
      ● More available
          ● new languages
          ● new language pairs
          ● new topics / domains
      ● Less expensive to build
      ● More natural
          ● data was produced spontaneously
          ● no influence from a source text
  • 38. Contextual approach
      ● Based on distributional linguistics (Z. Harris)
          ● Words with similar meanings appear in similar contexts
      ● If source and target words have similar contexts, they might be translations
          ● Compute contexts for each source and target word
          ● Compare contexts
          ● Find the most similar contexts
  • 39. Contextual approach
      ● Representation of the context of a given word with a vector:
          ● Head word + collocates
      ● The vector associates the « head » word with its most frequent collocates, plus some indication of the strength of association between head word and collocates
      [Figure: head word « drink » linked to its collocates mouth, beer, water, glass, ...]
  • 40. Building context vector for « drink »
      ● Collocates: words occurring within a distance of n words from the head
      ● Example contexts:
          ● is variety of reasons to drink plenty of water each day
          ● simple as a glass of drinking water be the key to the
          ● popular in Japan today to drink water from glass after waking
      ● Cooccurrence counts:
          ● (drink, water) = 3
          ● (drink, glass) = 2
          ● (drink, Japan) = 1
          ● (drink, reason) = 1
          ● (drink, plenty) = 1
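The counts on this slide can be reproduced with a small window-based sketch. Several things here are assumptions for the sake of the example: a window of n = 3 words on each side, a tiny hand-made stop word list, and a two-entry lemma table standing in for a real lemmatizer.

```python
from collections import Counter

# Hypothetical resources, for illustration only.
STOP_WORDS = {"is", "of", "to", "as", "a", "be", "the", "in", "from", "each", "after"}
LEMMAS = {"reasons": "reason", "drinking": "drink"}

def context_vector(head, sentences, n=3):
    """Count content words found within n tokens of each occurrence of the head."""
    counts = Counter()
    for sentence in sentences:
        tokens = [LEMMAS.get(t.lower(), t.lower()) for t in sentence.split()]
        for i, token in enumerate(tokens):
            if token != head:
                continue
            window = tokens[max(0, i - n):i] + tokens[i + 1:i + 1 + n]
            counts.update(w for w in window if w not in STOP_WORDS)
    return counts

sentences = [
    "is variety of reasons to drink plenty of water each day",
    "simple as a glass of drinking water be the key to the",
    "popular in Japan today to drink water from glass after waking",
]
vector = context_vector("drink", sentences)
print(vector.most_common())
```

With these settings the sketch finds water 3 times, glass twice, and reason, plenty and japan once each, as on the slide. The word "today" also falls inside the window and gets a count of 1, which the slide does not list.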
  • 41. Normalized cooccurrence frequency
      ● Normalization: use a measure such as mutual information (MI) or the log-likelihood ratio to counteract the influence of high-frequency words
      ● Example: log-likelihood ratio (Dunning, 1993)
          ● 1000 cooc. in corpus
          ● (drink, x) = 75 cooc.
          ● (water, y) = 75 cooc.
          ● (drink, water) = 50 cooc.
      ● Contingency table:
                     water   ¬ water   total
          drink         50        25      75
          ¬ drink       25       900     925
          total         75       925    1000
  • 42. Log-likelihood ratio
      ● Contingency table:
                     water   ¬ water   total
          drink          a         b       e
          ¬ drink        c         d       h
          total          f         g       N
      ● $\mathrm{LLR}(\mathit{water}, \mathit{drink}) = a \log a + b \log b + c \log c + d \log d + N \log N - e \log e - f \log f - g \log g - h \log h$
      ● Log-likelihood ratio (drink, water) = 45.05
      Dunning, 1993
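The formula can be checked numerically with the cells of the drink / water contingency table (a = 50, b = 25, c = 25, d = 900). The sketch below assumes base-10 logarithms, which the slide does not state; that choice gives about 45.04, close to the 45.05 reported. The function names are hypothetical.

```python
import math

def xlogx(x):
    """x * log10(x), with the usual convention that 0 * log(0) = 0."""
    return x * math.log10(x) if x > 0 else 0.0

def log_likelihood_ratio(a, b, c, d):
    """Log-likelihood ratio of a 2x2 contingency table, as written on the slide."""
    e, h = a + b, c + d  # row totals
    f, g = a + c, b + d  # column totals
    n = a + b + c + d    # grand total
    return (xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d) + xlogx(n)
            - xlogx(e) - xlogx(f) - xlogx(g) - xlogx(h))

score = log_likelihood_ratio(50, 25, 25, 900)
print(round(score, 2))
```

Dunning's statistic is usually stated as twice this sum with natural logarithms. The base and the factor of 2 only rescale the score, so the ranking of collocates is unchanged.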
  • 43. Context vector comparison
      ● Compute context vectors for words in the source and target corpora
      ● How to compare word contexts in different languages?
      [Figure: context vector of « drink » (mouth, beer, water, glass, ...) beside the context vector of a Thai word with its Thai collocates]
      Rapp 1995; Fung 1997
  • 44. Context vector comparison
      ● Use a seed lexicon to map collocates
      [Figure: the English and Thai context vectors, with their collocates linked through a Thai-English seed lexicon]
      Rapp 1995; Fung 1997
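Mapping collocates through a seed lexicon can be sketched as below. The example uses a toy French-English lexicon rather than the Thai-English one of the slide, and both the function name and the weights are hypothetical. Collocates missing from the lexicon are simply dropped, which is one possible policy among others.

```python
def translate_vector(vector, seed_lexicon):
    """Map each collocate to its target-language translations, keeping the weights."""
    translated = {}
    for collocate, weight in vector.items():
        for target in seed_lexicon.get(collocate, []):  # unknown collocates are dropped
            translated[target] = translated.get(target, 0.0) + weight
    return translated

# Hypothetical context vector of a French head word and a toy seed lexicon.
source_vector = {"eau": 45.0, "verre": 12.5, "bouche": 3.0}
seed_lexicon = {"eau": ["water"], "verre": ["glass"]}

print(translate_vector(source_vector, seed_lexicon))
```

The translated vector lives in the target language and can then be compared directly with the context vectors of target words.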
  • 45. Context vector comparison
      ● Measuring the context similarity of words a and b = measuring the cosine of the angle between the vector of a and the vector of b
      ● $\cos(a, b) = \frac{\sum_{c \in a \cup b} w(c,a) \cdot w(c,b)}{\sqrt{\sum_{c \in a} w(c,a)^2 \cdot \sum_{c \in b} w(c,b)^2}}$
          ● $c \in x$ = collocate in the vector of x
          ● $w(c, x)$ = weight of association of collocate c with head x
      ● Select the 1, 10 or 20 closest words as candidate translations
      Rapp 1995; Fung 1997
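A minimal sketch of the cosine comparison and of the selection of the closest candidates follows. Vectors are plain dictionaries from collocate to weight; the toy vectors and the function names are assumptions made for the example.

```python
import math

def cosine(vec_a, vec_b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * vec_b[c] for c, w in vec_a.items() if c in vec_b)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_candidates(source_vector, target_vectors, n=10):
    """Return the n target words whose vectors are closest to the source vector."""
    scored = sorted(
        ((cosine(source_vector, vec), word) for word, vec in target_vectors.items()),
        reverse=True,
    )
    return [word for _, word in scored[:n]]

# Hypothetical translated source vector and target vectors.
source_vector = {"water": 3, "glass": 2}
target_vectors = {
    "drink": {"water": 3, "glass": 2, "beer": 1},
    "eat": {"bread": 4, "glass": 1},
    "sleep": {"bed": 5},
}
print(top_candidates(source_vector, target_vectors, n=2))
```

Collocates absent from one of the two vectors contribute nothing to the numerator, so summing over shared collocates is equivalent to summing over the union.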
  • 46. Contextual approach: improvements
      ● Use syntactic collocates
      ● Improve the dictionary with cognates, transliterations, other dictionaries
      ● Give more weight to « anchor words »
          ● cognates, transliterations
          ● frequent, monosemous words
      ● Filter with part-of-speech
      ● Favor reciprocal translations
      [Figure: source words a, b, c, d linked to target words a', b', c', d']
      Chiao and Zweigenbaum, 2002; Sadat et al., 2003; Gamallo and Campos, 2005; Koehn and Knight, 2002; Prochasson, 2010
  • 47. Variant to direct translation of the context vector
      ● « Interlingual » translation
      ● Translate the n closest words instead of the context vector
      ● Seed lexicon: some mappings between source and target words
      [Figure: SOURCE and TARGET word spaces connected by the seed lexicon]
      Déjean and Gaussier, 2002
  • 48. Variant to direct translation of the context vector
      ● To translate term T:
          ● Find its n closest words
          ● These closest words are in the seed lexicon
      [Figure: SOURCE and TARGET word spaces connected by the seed lexicon]
      Déjean and Gaussier, 2002
  • 49. Variant to direct translation of the context vector
      ● Find the target term that is closest to the translations of the n closest words
      [Figure: SOURCE and TARGET word spaces connected by the seed lexicon]
      Déjean and Gaussier, 2002
  • 50. Variant to direct translation of the context vector
      ● « Interlingual » approach
      ● Translate the closest words instead of the direct context
      Déjean and Gaussier, 2002
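The « interlingual » variant described on the last four slides can be sketched as follows. This is a deliberately simplified illustration and not the scoring of Déjean and Gaussier (2002): the score used here (similarity to a lexicon entry in the source space multiplied by similarity to its translation in the target space, summed over the n closest entries) is an assumption, as are the function names and the toy French-English data.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[c] for c, w in u.items() if c in v)
    norm = math.sqrt(sum(w * w for w in u.values()) * sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0

def translate_via_closest(term, source_vectors, target_vectors, seed_lexicon,
                          candidates, n=2):
    """Pick the candidate closest to the translations of the term's n closest words."""
    # Step 1: the n lexicon entries closest to the term, measured in the source space.
    neighbours = sorted(
        ((cosine(source_vectors[term], source_vectors[s]), s) for s in seed_lexicon),
        reverse=True,
    )[:n]

    # Step 2: score each candidate against the translations of those entries.
    def score(candidate):
        return sum(
            sim * cosine(target_vectors[candidate], target_vectors[seed_lexicon[s]])
            for sim, s in neighbours
        )

    return max(candidates, key=score)

# Hypothetical toy data.
source_vectors = {
    "boisson": {"soif": 3, "bouteille": 2},
    "eau": {"soif": 2, "bouteille": 3},
    "pain": {"four": 4, "farine": 2},
}
target_vectors = {
    "water": {"thirst": 2, "bottle": 3},
    "bread": {"oven": 4, "flour": 2},
    "drink": {"thirst": 3, "bottle": 2},
    "cake": {"oven": 3, "flour": 3},
}
seed_lexicon = {"eau": "water", "pain": "bread"}

best = translate_via_closest("boisson", source_vectors, target_vectors,
                             seed_lexicon, candidates=["drink", "cake"])
print(best)
```

Note that no context vector is translated here: source vectors are only compared with source vectors, and target vectors with target vectors. The seed lexicon is the only bridge between the two languages.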
  • 51. Adaptation to multi-word terms
      ● Context vector: union of the vectors of each word of the term
      [Figure: the vector of « energy » (strong, ...) and the vector of « drink » (beer, glass, mouth, ...) merged into the vector of « energy drink » (beer, glass, mouth, strong, ...)]
      Morin et al., 2004; Morin and Daille, 2009
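The union of word vectors can be sketched with `collections.Counter`. The slide does not say how the weights of collocates shared by several words are combined; summing them is an assumption of this example, and the weights themselves are made up.

```python
from collections import Counter

def multiword_vector(term, word_vectors):
    """Context vector of a multi-word term: union of the vectors of its words."""
    combined = Counter()
    for word in term.split():
        combined.update(word_vectors.get(word, {}))  # shared collocates are summed
    return combined

# Hypothetical single-word context vectors.
word_vectors = {
    "energy": {"strong": 2},
    "drink": {"beer": 2, "glass": 3, "mouth": 1},
}
vector = multiword_vector("energy drink", word_vectors)
print(sorted(vector))
```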
  • 52. Evaluation
      ● Precision on TopN candidates
          ● 50% on Top20: the correct translation is among the 20 best candidates for 50% of source terms
      ● Results:
          Units               Corpus                          Precision
          Single word units   big, general language corpus    80%
          Multi-word units    small, specialized corpus       60%
          Multi-word terms    small, specialized corpus       42%
      ● big = hundreds of millions of words
      ● small = one hundred thousand to one million words
      Morin and Daille, 2010
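Precision on TopN candidates can be computed as in the sketch below. The ranked candidate lists and the reference translations are invented for illustration, and `precision_at_n` is a hypothetical helper name.

```python
def precision_at_n(ranked_candidates, reference, n=20):
    """Share of source terms whose reference translation is among the top n candidates."""
    hits = sum(
        1
        for term, candidates in ranked_candidates.items()
        if reference[term] in candidates[:n]
    )
    return hits / len(ranked_candidates)

# Hypothetical system output (best candidate first) and reference translations.
ranked_candidates = {
    "eau": ["water", "wine"],
    "pain": ["cake", "oven", "bread"],
}
reference = {"eau": "water", "pain": "bread"}

print(precision_at_n(ranked_candidates, reference, n=2))
print(precision_at_n(ranked_candidates, reference, n=3))
```

With n = 2 only "eau" is translated correctly (precision 0.5); with n = 3 the correct translation of "pain" enters the list and precision reaches 1.0.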
  • 53. Why is it so difficult?
      ● The translation might not be present
      ● The target term has not been extracted
      ● Polysemous words: non-discriminating, fuzzy vectors
      ● Low-frequency words: non-significant vectors
      ● The translation has a different usage in the target language
      ● Big search space: all the words of the target corpus
      → cannot be fully automatic
      → semi-supervised term alignment
  • 54. 4th Franco-Thai Workshop 2010: intensive summer school on Natural Language Processing
      Thank you
      ed(a)lingua-et-machina.com