How to expand your NLP solution to new
languages using transfer learning
Lena Shakurova
shakurova@textkernel.nl
Beata Nyari, Chao Li, Mihai Rotaru
2019-05-12
What this talk is about
You have an NLP solution for several languages
You want to support more languages
No training data, a lot of raw data
How to expand your solution to new languages using
transfer learning?
CV parsing
Textkernel and CV Parsing
1. Matching people and jobs
2. CV parsing - core of
Textkernel software
3. We solve CV parsing in
three stages
CV Parsing: three stages
1. Section segmentation: Personal section, Experience section, Education section
2. Item segmentation: Experience 1, Experience 2
3. Phrase extraction (today's presentation): Organisation, University, Job title, Degree, Location
CV Parsing
• Formulated as a sequence labeling task
(similar to NER)
• BiLSTM + CRF with pre-trained word
embeddings
Bidirectional LSTM sequence labeling
Huang et al. (2015) Bidirectional LSTM-CRF Models for Sequence Tagging
Pydata London 2018: RNN sequence labeling for document parsing in
Tensorflow
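[Figure: BiLSTM-CRF tagger. Words feed into Embeddings, then Bi-LSTM layers, then a CRF layer with transitions A_{0,1}, A_{1,2}, ..., A_{i-1,i}, A_{i,i+1}, ..., A_{n-1,n}, which produces the Output tags]
Words:  CV  Fiona   Lee     City  Paris       Phone  0965...
Output: O   B-Name  I-Name  O     B-Location  O      B-Phone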
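The B-/I-/O tags in the figure can be turned back into extracted phrases with a few lines of code. The sketch below is an illustration only, not Textkernel's implementation; the function name `bio_to_spans` is hypothetical, and the example sentence is the one from the slide.

```python
from typing import List, Tuple


def bio_to_spans(words: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO tags into (label, phrase) pairs."""
    spans = []
    label, phrase = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            # A new phrase starts; close the previous one if it is open.
            if label is not None:
                spans.append((label, " ".join(phrase)))
            label, phrase = tag[2:], [word]
        elif tag.startswith("I-") and label == tag[2:]:
            # Continuation of the currently open phrase.
            phrase.append(word)
        else:
            # "O" (or an inconsistent I- tag) closes any open phrase.
            if label is not None:
                spans.append((label, " ".join(phrase)))
            label, phrase = None, []
    if label is not None:
        spans.append((label, " ".join(phrase)))
    return spans


words = ["CV", "Fiona", "Lee", "City", "Paris", "Phone", "0965..."]
tags = ["O", "B-Name", "I-Name", "O", "B-Location", "O", "B-Phone"]
spans = bio_to_spans(words, tags)
print(spans)
# [('Name', 'Fiona Lee'), ('Location', 'Paris'), ('Phone', '0965...')]
```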
Our starting point
Issue
• 20 languages
• Separate models for separate languages
• New languages (100+) lack labeled data
Proposed solution: multilingual model
• Implement models for new languages as fast as possible
• Improve performance on low-resource languages (using transfer learning and cross-lingual embeddings)
Transfer learning with cross-lingual embeddings
Cross-lingual embeddings
• Semantically similar words in the same
language are close together
• Translation-equivalent words in
different languages are close together
Pre-trained
embeddings
MUSE
• 30 languages in a shared space
• already give good results
Open source
alignment code
Bilingual:
- Vecmap
Multilingual:
- Multilingual Fasttext
- UMWE
- CCA:
github.com/gallantlab/pyrcca
In our research we used CCA
Canonical correlation analysis (CCA)
• Train monolingual word embeddings
• Learn the transformation matrices
using a bilingual dictionary
• Map the monolingual spaces into one
shared semantic space in such a way
that translation pairs are maximally
correlated
Faruqui, M., & Dyer, C. (2014)
[Diagram: the English space Σ and the German space Ω, each restricted to the words of the bilingual dictionary, are used to learn the transformation matrices V (English) and W (German). V maps Σ to the transformed English space Σ*, and W maps Ω to the transformed German space Ω*. Σ* and Ω* together form the shared space.]
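The three CCA steps can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' code (the slides point to pyrcca for that): `fit_cca` is a hypothetical helper, the embeddings are random toy data, and rows are word vectors, so the shared-space vectors are computed as `Σ V` and `Ω W`.

```python
import numpy as np


def fit_cca(X, Y, k, reg=1e-8):
    """Classical CCA via whitening and SVD.

    X, Y: (n_pairs, dim) arrays holding the vectors of dictionary word pairs.
    Returns the column means, the projection matrices V and W, and the top-k
    canonical correlations.
    """
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mean_x, Y - mean_y
    n = X.shape[0]
    # Covariance matrices, with a small ridge term for numerical stability.
    cov_xx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    cov_yy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    cov_xy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(cov):
        vals, vecs = np.linalg.eigh(cov)
        return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

    kx, ky = inv_sqrt(cov_xx), inv_sqrt(cov_yy)
    u, corrs, vt = np.linalg.svd(kx @ cov_xy @ ky)
    V = kx @ u[:, :k]      # transformation matrix for the first language
    W = ky @ vt.T[:, :k]   # transformation matrix for the second language
    return mean_x, mean_y, V, W, corrs[:k]


# Toy data: "German" vectors are a rotated, slightly noisy copy of "English".
rng = np.random.default_rng(0)
n_pairs, dim = 200, 5
sigma = rng.normal(size=(n_pairs, dim))
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
omega = sigma @ rotation + 0.01 * rng.normal(size=(n_pairs, dim))

mean_s, mean_o, V, W, corrs = fit_cca(sigma, omega, k=dim)
sigma_star = (sigma - mean_s) @ V   # transformed English space
omega_star = (omega - mean_o) @ W   # transformed German space
print(corrs.round(3))
```

On real data, `sigma` and `omega` would hold the monolingual vectors of the dictionary's translation pairs, and the learned `V` and `W` would then be applied to the full vocabularies.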
Canonical correlation analysis (CCA)
• Ω* and Σ* lie in the same space
• Ω* can be projected into the English
embedding space Σ using the inverse
of V:
$\Omega^{**} = V^{-1}\,\Omega^{*}$
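A small numeric check of this projection, written with word vectors as columns so that it matches the formula on the slide. The matrices here are random stand-ins rather than learned CCA matrices, and the sketch assumes V is square and invertible. If the shared space had fewer dimensions than the original one, a pseudo-inverse such as `np.linalg.pinv` would be one possible substitute, which is an assumption of this sketch and not something the slides state.

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_words = 4, 3

# Stand-ins for the learned transformation matrices (orthogonal, so invertible).
V, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
W, _ = np.linalg.qr(rng.normal(size=(dim, dim)))

# English vectors as columns, and German vectors constructed so that each
# German word coincides with its English translation in the shared space.
sigma = rng.normal(size=(dim, n_words))
omega = np.linalg.inv(W) @ V @ sigma

omega_star = W @ omega                        # German words in the shared space
omega_2star = np.linalg.inv(V) @ omega_star   # projected into the English space

# Perfectly aligned pairs land exactly on their English translations.
print(np.allclose(omega_2star, sigma))
```

Projecting into the original English space means a model trained on English embeddings can consume the projected German vectors without retraining the embedding layer.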
Cross-lingual embeddings
[Figure: English words (developer, manager, engineer) and German words (entwickler, leiter, ingenieur) plotted in one space, with translation pairs linked through the bilingual dictionary]
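In a shared space, the nearest English neighbour of a German word should be its translation. The sketch below shows that lookup with cosine similarity; the three-dimensional vectors are invented for illustration and are not taken from the talk.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(vector, space):
    """Return the word in `space` whose vector is most similar to `vector`."""
    return max(space, key=lambda word: cosine(vector, space[word]))


# Toy vectors (made up): both languages already live in one shared space.
english = {
    "developer": [0.9, 0.1, 0.0],
    "manager": [0.1, 0.9, 0.1],
    "engineer": [0.7, 0.1, 0.6],
}
german = {
    "entwickler": [0.85, 0.15, 0.05],
    "leiter": [0.15, 0.85, 0.1],
    "ingenieur": [0.65, 0.15, 0.65],
}

matches = {de: nearest(vec, english) for de, vec in german.items()}
print(matches)
# {'entwickler': 'developer', 'leiter': 'manager', 'ingenieur': 'engineer'}
```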
Zero-shot parsing
• Training: English train data + English embeddings -> Trained model
• Testing: Projected German embeddings -> Parsed German CV
Joint training
• Training: English train data + German train data, English embeddings + Projected German embeddings -> Trained model
• Testing: Projected German embeddings -> Parsed German CV
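The difference between the two regimes is only in which data and which embedding tables are used at training time. The sketch below makes that concrete with a deliberately tiny stand-in model: a nearest-centroid classifier replaces the BiLSTM-CRF, and all words, labels and vectors are invented for illustration.

```python
from collections import defaultdict


def train_centroids(examples, embeddings):
    """examples: list of (word, label). Returns label -> mean embedding."""
    sums, counts = {}, defaultdict(int)
    for word, label in examples:
        vec = embeddings[word]
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in total] for lab, total in sums.items()}


def predict(word, centroids, embeddings):
    """Label of the centroid closest to the word's embedding."""
    vec = embeddings[word]

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(vec, centroid))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


# Toy embeddings: the German table is assumed to be already projected
# into the English space.
english_emb = {"developer": [1.0, 0.0], "engineer": [0.9, 0.1],
               "google": [0.0, 1.0], "siemens": [0.1, 0.9]}
projected_german_emb = {"entwickler": [0.95, 0.05], "bosch": [0.05, 0.95]}

english_train = [("developer", "JobTitle"), ("engineer", "JobTitle"),
                 ("google", "Organisation"), ("siemens", "Organisation")]
german_train = [("entwickler", "JobTitle")]

# Zero-shot: train on English only, test on German with projected embeddings.
zero_shot_model = train_centroids(english_train, english_emb)
zero_shot_pred = predict("bosch", zero_shot_model, projected_german_emb)

# Joint training: train on English and German together over both tables.
joint_emb = {**english_emb, **projected_german_emb}
joint_model = train_centroids(english_train + german_train, joint_emb)
joint_pred = predict("bosch", joint_model, joint_emb)

print(zero_shot_pred, joint_pred)  # Organisation Organisation
```

Merging the two tables into one dictionary works here because the toy vocabularies do not overlap; with real vocabularies, words spelled identically in both languages would need language-specific keys.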
Experiments
Tweaking the bilingual dictionary and experiments with no German data
Experimental setup
Task:
• Parse German CVs
• Extract job title and
organisation
Embeddings:
• Trained on domain data
• Word2vec
• CCA
Does transfer learning work for us?
How does the bilingual dictionary influence downstream
performance?
Experimental setup
Training data per setting:
• Zero shot: 3700 English
• Joint training: 3700 English + 200 German, 3700 English + 500 German, 3700 English + 1300 German
Does transfer learning work?
[Chart labels: 75.8 (zero-shot parsing), +4.1, +0.2]
Monolingual
• More German data -> better
performance
Cross-lingual
• Zero-shot parsing works
• Gain from transfer learning
• The more data we have, the
smaller the cross-lingual gain
How to construct a bilingual dictionary?
1. Use ready-made bilingual
dictionaries:
○ Internet Dictionary Project (IDP)
○ MUSE
■ 110 bilingual dictionaries
■ Created for the development and
evaluation of cross-lingual word
embeddings
2. Construct your own:
• Use domain data
[Diagram: Choose English words (frequency, size, filtering) -> Translate into German (Google Translate, Yandex Translate)]
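The "construct your own" route can be sketched as a short pipeline: count word frequencies in domain text, keep the top-k words, translate them, and filter. Everything below is an assumption made for illustration. The translators are local stub dictionaries rather than calls to Google Translate or Yandex Translate, and the filtering rule (keep a pair only when both translators agree) is one plausible reading of "Filtering", not a rule stated in the talk.

```python
from collections import Counter

# Stub translators (hypothetical): a real pipeline would call a translation
# service here instead of looking words up in a local dictionary.
GOOGLE_STUB = {"developer": "entwickler", "manager": "leiter",
               "engineer": "ingenieur", "team": "team"}
YANDEX_STUB = {"developer": "entwickler", "manager": "manager",
               "engineer": "ingenieur", "team": "team"}


def google_stub(word):
    return GOOGLE_STUB.get(word)


def yandex_stub(word):
    return YANDEX_STUB.get(word)


def build_bilingual_dictionary(corpus_tokens, translators, top_k):
    """Pick the top_k most frequent alphabetic words and translate them.

    A pair is kept only if every translator returns the same translation
    (assumed filtering rule).
    """
    counts = Counter(tok.lower() for tok in corpus_tokens if tok.isalpha())
    pairs = []
    for word, _ in counts.most_common(top_k):
        translations = [translate(word) for translate in translators]
        if None not in translations and len(set(translations)) == 1:
            pairs.append((word, translations[0]))
    return pairs


tokens = ("Developer developer developer engineer engineer "
          "manager manager team 2019").split()
pairs = build_bilingual_dictionary(tokens, [google_stub, yandex_stub], top_k=3)
print(pairs)
# [('developer', 'entwickler'), ('engineer', 'ingenieur')]
```

In this toy run "manager" is dropped because the two stubs disagree, and "team" falls outside the top 3 by frequency.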
Source of data: IDP vs. MUSE vs. CV
Using a bilingual dictionary built from domain data boosts performance
[Chart values in chart order; the CV vocabulary dictionary is highlighted as best]
Zero-shot parsing: 61.5, 72.1, 75.8
Joint training: 79.5, 80.4, 81.1
Frequency: top vs. less frequent words
Using a bilingual dictionary with top-frequency words boosts
performance
[Chart values in chart order; the top frequent words are highlighted as best]
Zero-shot parsing: 65.6, 75.7
Joint training: 80.1, 80.7
Size of bilingual dictionary: 1k vs. 5k vs. 10k
A bigger bilingual dictionary boosts performance
[Chart values in chart order; 5k / 10k are highlighted as best]
Zero-shot parsing: 70.4, 76.3, 76.6
Joint training: 80.0, 81.1, 81.4
Bilingual dictionary: what did we learn?
Best practices for constructing a bilingual dictionary:
1. Domain words
2. Frequent words
3. Of size 5k or 10k
The less training data you have available, the more attention you
need to pay to the bilingual dictionary.
Examples
[Figures: top 20 closest words for words from the bilingual dictionary; top 20 closest words for words outside of the bilingual dictionary (job titles, locations, header words, English names); "Persoenliche" in the held-out set vs. "Persoenliche" in the bilingual dictionary]
Other languages?
• Dutch to English
• Slavic languages
Dutch to English
• Zero-shot parsing works
• Gain from transfer
learning
• The more data we have,
the smaller the cross-lingual gain
[Chart labels, Dutch: 79.1 (zero-shot parsing), +6.3, +1.3]
Slavic languages: on Russian
[Chart labels, Czech: 84.3 (zero-shot parsing), +2.3, +0.1]
[Chart labels, Polish: 81.9 (zero-shot parsing), +1.4, +0.5]
What did we learn?
Summary
• Transfer learning works
• Pretty good results on zero-shot parsing
• The cross-lingual gain shrinks as we add more data from the target
language
• The quality of the bilingual dictionary affects end-task
performance
• The less training data you have available, the more attention you
need to pay to the bilingual dictionary
• Use the top 5k most frequent words in your domain corpora

How to expand your nlp solution to new languages using transfer learning

  • 1. How to expand your NLP solution to new languages using transfer learning Lena Shakurova shakurova@textkernel.nl Beata Nyari, Chao Li, Mihai Rotaru 2019-05-12
  • 2. What this talk is about You have an NLP solution for several languages You want to support more languages No training data, a lot of raw data How to expand your solution to new languages using transfer learning?
  • 4. Textkernel and CV Parsing 1. Matching people and jobs 2. CV parsing - core of Textkernel software 3. We solve CV parsing in three stages
  • 5. CV Parsing Section segmentation Personal section Experience section Education section
  • 7. CV Parsing Organisation University Job title Degree Job title Organisation Location Location Phrase extraction Today’s presentation
  • 8. CV Parsing • Formulated as a sequence labeling (Similar to NER) • BiLSTM + CRF with pre-trained word embeddings
  • 9. Bidirectional LSTM sequence labeling Huang et al. (2015) Bidirectional LSTM-CRF Models for Sequence Tagging Pydata London 2018: RNN sequence labeling for document parsing in Tensorflow
  • 10. Bi-LSTM layers Embeddings O B-Name O B-LocationI-Name O B-PhoneOutput CV Fiona City ParisLee Phone 0965...Words CRF A0,1A0 A1,2 A... Ai-1,i Ai,i+1 A... An- 1,n
  • 12. Issue Proposed solution Multilingual model • Implement models for new languages as fast as possible • Improve performance on low-resource languages (using transfer learning and cross-lingual embeddings) • 20 languages • Separate models for separate languages • New languages (100+) lack labeled data
  • 13. Transfer learning with cross-lingual embeddings
  • 14. Cross-lingual embeddings • semantically similar words in the same language nearby • translationally equivalent words in different languages nearby
  • 15. Pre-trained embeddings MUSE • 30 languages in a shared space • already give good results Open source alignment code Bilingual: - Vecmap Multilingual: - Multilingual Fasttext - UMWE - CCA: github.com/gallantlab/pyrcca In our research we used
  • 16. Canonical correlation analysis (CCA) • Train monolingual word embeddings • Learn the transformation matrices using a bilingual dictionary • Map the monolingual spaces into one shared semantic space so that translation pairs are maximally correlated. Faruqui, M., & Dyer, C. (2014). [Diagram legend: Σ = English space; Ω = German space; the bilingual dictionary links Σ and Ω; V = transformation matrix (English); W = transformation matrix (German); Σ* = transformed English space; Ω* = transformed German space; Σ* and Ω* form the shared space]
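The steps on this slide can be sketched with plain NumPy. This is a textbook CCA (whitening followed by an SVD), not necessarily the implementation used in the talk. It assumes one word per row, with row i of `sigma` (English) and row i of `omega` (German) forming one bilingual dictionary pair. The function names, the small `reg` term, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def inv_sqrt(cov):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def fit_cca(sigma, omega, reg=1e-8):
    """Return V (English), W (German) and the canonical correlations."""
    n = sigma.shape[0]
    x = sigma - sigma.mean(axis=0)
    y = omega - omega.mean(axis=0)
    cxx = x.T @ x / (n - 1) + reg * np.eye(x.shape[1])
    cyy = y.T @ y / (n - 1) + reg * np.eye(y.shape[1])
    cxy = x.T @ y / (n - 1)
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    # Singular values of the whitened cross-covariance are the correlations.
    u, corrs, vt = np.linalg.svd(wx @ cxy @ wy, full_matrices=False)
    return wx @ u, wy @ vt.T, corrs


# Synthetic stand-in for dictionary pairs: "German" is a rotated, noisy "English".
rng = np.random.default_rng(0)
sigma = rng.normal(size=(200, 5))
rotation, _ = np.linalg.qr(rng.normal(size=(5, 5)))
omega = sigma @ rotation + 0.01 * rng.normal(size=(200, 5))

V, W, corrs = fit_cca(sigma, omega)
sigma_star = (sigma - sigma.mean(axis=0)) @ V  # transformed English space
omega_star = (omega - omega.mean(axis=0)) @ W  # transformed German space
print(corrs)
```

Because the synthetic German vectors are almost an exact rotation of the English ones, all canonical correlations come out close to 1. With real embeddings they would drop off across dimensions.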
  • 17. Canonical correlation analysis (CCA) • Ω* and Σ* lie in the same space • Ω* can be projected into the English embedding space Σ using the inverse of V: Ω** = V⁻¹ Ω* [Diagram legend as on slide 16]
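A small sketch of the projection step, under the assumption that embeddings are stored one word per row, so the transformations multiply on the right (Ω* = Ω W) and the inverse of V is applied on the right as well. The slide writes the same operation as V⁻¹ Ω*, which corresponds to the column-vector convention. The matrices here are random stand-ins and `project_to_english` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4

# Random, well-conditioned stand-ins for the learned transformation matrices.
V = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))  # English
W = np.eye(dim) + 0.1 * rng.normal(size=(dim, dim))  # German

# Toy data built so that translation pairs coincide in the shared space.
sigma = rng.normal(size=(6, dim))        # English vectors
omega = sigma @ V @ np.linalg.inv(W)     # German vectors with omega @ W == sigma @ V


def project_to_english(omega, W, V):
    """Map German vectors into the shared space, then back into English space."""
    omega_star = omega @ W                # shared space
    return omega_star @ np.linalg.inv(V)  # English space


omega_projected = project_to_english(omega, W, V)
print(np.allclose(omega_projected, sigma))
```

Projecting into the original English space means an English model can keep using its existing English embeddings unchanged. If V is not square (for example after keeping only the top canonical dimensions), `np.linalg.pinv` is one possible substitute for the inverse.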
  • 19. Zero-shot parsing vs. joint training. Zero-shot parsing: Training: English train data + English embeddings -> Trained model. Testing: Projected German embeddings -> Trained model -> Parsed German CV. Joint training: Training: English train data + German train data, English embeddings + Projected German embeddings -> Trained model. Testing: Projected German embeddings -> Trained model -> Parsed German CV
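A toy illustration of the two setups, not the BiLSTM-CRF from the talk. A nearest-centroid classifier stands in for the tagger, and the 2-dimensional vectors, words, and labels are invented for the example. The point is that the model only sees vectors, so German words projected into the English space can be fed to a model trained on English.

```python
import numpy as np

# Invented vectors: English embeddings and German embeddings already
# projected into the English space.
english = {"engineer": np.array([1.0, 0.0]), "university": np.array([0.0, 1.0])}
german = {
    "ingenieur": np.array([0.9, 0.1]),
    "universität": np.array([0.1, 0.9]),
    "entwickler": np.array([0.8, 0.2]),
    "firma": np.array([0.2, 0.8]),
}


def vectorize(examples, table):
    """Turn (word, label) pairs into a feature matrix and a label list."""
    X = np.stack([table[word] for word, _ in examples])
    y = [label for _, label in examples]
    return X, y


def fit_centroids(X, y):
    """Stand-in 'model': one mean vector per label."""
    centroids = {}
    for label in sorted(set(y)):
        mask = np.array([l == label for l in y])
        centroids[label] = X[mask].mean(axis=0)
    return centroids


def predict(centroids, X):
    labels = list(centroids)
    C = np.stack([centroids[l] for l in labels])
    dists = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    return [labels[i] for i in dists.argmin(axis=1)]


en_train = [("engineer", "JobTitle"), ("university", "Organisation")]
de_train = [("entwickler", "JobTitle"), ("firma", "Organisation")]
de_test = [("ingenieur", "JobTitle"), ("universität", "Organisation")]

X_en, y_en = vectorize(en_train, english)
X_de, y_de = vectorize(de_train, german)
X_test, y_test = vectorize(de_test, german)

# Zero-shot: train on English only, test on projected German.
zero_shot_pred = predict(fit_centroids(X_en, y_en), X_test)

# Joint training: train on English plus German, test on projected German.
joint_model = fit_centroids(np.vstack([X_en, X_de]), y_en + y_de)
joint_pred = predict(joint_model, X_test)
print(zero_shot_pred, joint_pred)
```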
  • 20. Experiments Tweaking bilingual dictionary and experiments with no German data
  • 21. Experimental setup. Task: • Parse German • Extract job title and organisation. Embeddings: • Trained on domain data • Word2vec • CCA. Questions: Does transfer learning work for us? How does the bilingual dictionary influence downstream performance?
  • 23. Does transfer learning work? Monolingual • More German data -> better performance. Cross-lingual • Zero-shot parsing works • Gain from transfer learning • The more data we have, the smaller the cross-lingual gain. [Chart labels: Zero-shot parsing 75.8; gains +4.1, +0.2]
  • 24. How to construct a bilingual dictionary? 1. Use ready bilingual dictionaries: ○ Internet Dictionary Project (IDP) ○ MUSE ■ 110 bilingual dictionaries ■ Created for the development and evaluation of cross-lingual word embeddings. 2. Construct your own: • Use domain data. [Diagram: Choose English words (Frequency, Size, Filtering) -> Translate into German (Google Translate, Yandex Translate)]
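The "construct your own" pipeline on this slide could look roughly like the sketch below: count words in the domain corpus, keep the most frequent ones up to a target size, filter, and translate. The filtering rule (alphabetic tokens of a minimum length) is an assumption, since the slide does not say which filters were used. `translate` is a hypothetical callable standing in for a translation service; no real translation API is called here.

```python
from collections import Counter


def build_bilingual_dictionary(corpus_tokens, translate, size=5000, min_len=2):
    """Return up to `size` (english, german) pairs, most frequent words first.

    `translate` is any callable mapping an English word to a German word,
    or to None when no translation is available (hypothetical interface).
    """
    counts = Counter(token.lower() for token in corpus_tokens)
    pairs = []
    for word, _ in counts.most_common():
        if len(pairs) >= size:
            break
        # Assumed filter: skip numbers, punctuation and very short tokens.
        if len(word) < min_len or not word.isalpha():
            continue
        german = translate(word)
        if german:
            pairs.append((word, german))
    return pairs


# Toy domain corpus and a toy lookup table in place of a translation service.
tokens = "manager manager manager engineer engineer 2019 teacher".split()
toy_translations = {"manager": "manager", "engineer": "ingenieur", "teacher": "lehrer"}

top2 = build_bilingual_dictionary(tokens, toy_translations.get, size=2)
everything = build_bilingual_dictionary(tokens, toy_translations.get, size=10)
print(top2)
```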
  • 25. Source of data: IDP vs. MUSE vs. CV. Using a bilingual dictionary built from domain data boosts performance. [Chart values in slide order: Zero-shot parsing 61.5, 72.1, 75.8; Joint training 79.5, 80.4, 81.1; best in both: CV vocabulary]
  • 26. Frequency: top vs. less frequent words. Using a bilingual dictionary with top-frequency words boosts performance. [Chart values in slide order: Zero-shot parsing 65.6, 75.7; Joint training 80.1, 80.7; best in both: top frequent]
  • 27. Size of bilingual dictionary: 1k vs. 5k vs. 10k. A bigger bilingual dictionary boosts performance. [Chart values in slide order: Zero-shot parsing 70.4, 76.3, 76.6; Joint training 80.0, 81.1, 81.4; best in both: 5k / 10k]
  • 28. Bilingual dictionary: what did we learn? Best practices for constructing a bilingual dictionary: 1. Domain words 2. Frequent words 3. Size of 5k or 10k. The less training data you have available, the more attention you need to pay to the bilingual dictionary.
  • 30. Words from bilingual dictionary Top 20 closest words
  • 31. Words outside of bilingual dictionary Job titles Top 20 closest words Locations
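A "top 20 closest words" list like the ones on these slides is typically produced by ranking the shared-space vocabulary by cosine similarity to a query word. The sketch below assumes that approach. The tiny mixed English/German vocabulary and its 2-dimensional vectors are invented, and `closest_words` is a hypothetical helper name.

```python
import numpy as np


def closest_words(query_vec, vocab, matrix, k=20):
    """Return the k vocabulary entries most cosine-similar to query_vec."""
    normed = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    query = query_vec / np.linalg.norm(query_vec)
    sims = normed @ query
    top = np.argsort(-sims)[:k]
    return [(vocab[i], float(sims[i])) for i in top]


# Invented shared-space vectors for a mixed English/German vocabulary.
vocab = ["manager", "leiter", "berlin", "london"]
matrix = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])

neighbours = closest_words(matrix[0], vocab, matrix, k=2)
print(neighbours)
```

If the alignment works, a word's neighbours include its translation and related words from both languages, for words inside and outside the bilingual dictionary.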
  • 32. Header words English names Persoenliche in heldout set Persoenliche in bilingual dictionary
  • 33. Other languages? Dutch to English Slavic languages
  • 34. Dutch to English • Zero-shot parsing works • Gain from transfer learning • The more data we have, the smaller the cross-lingual gain. [Chart labels, Dutch: Zero-shot parsing 79.1; gains +6.3, +1.3]
  • 35. Slavic languages: on Russian. [Chart labels in slide order: 84.3, +2.3, +0.1 and 81.9, +1.4, +0.5; panels: Czech, Polish; each panel marked "Zero-shot parsing"]
  • 36. What did we learn?
  • 37. Summary • Transfer learning works • Zero-shot parsing gives pretty good results • The cross-lingual gain shrinks as we add more data from the target language • The quality of the bilingual dictionary affects end-task performance • The less training data you have available, the more attention you need to pay to the bilingual dictionary • Use the top 5k most frequent words in your domain corpora