Cross-Language Information Retrieval
University of Arizona

Sumin Byeon

Overview
[Diagram: the Korean query "안드로이드 이메일 암호화" goes through the matching algorithm, which uses the bilingual corpus database, and comes out as "Android email encryption"; Google Search then returns results in English.]

Background
• Corpus - a collection of written text made up of single words, multiple words, or even phrases and sentences
• Comparable corpus - a collection of text from pairs of languages referring to the same domain [1]; stored as (source text, target text) pairs
• N-gram - an n-character or n-word slice of a longer string [2]. Here, "n-gram" refers to n-character slices. We use 4-grams (four-grams, or quad-grams)
• Source language - the language of the original phrases
• Target language - the language into which cross-language information retrieval (CLIR) translates the original phrases

[1]: Picchi, Eugenio, and Carol Peters. Cross-Language Information Retrieval: A System for Comparable Corpus Querying. Vol. 2. N.p.: Springer US, 1998. Print. 1387-5264.
[2]: Cavnar, William B., and John M. Trenkle. "N-Gram-Based Text Categorization." (1994) Print.

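The character n-gram definition above can be illustrated with a minimal sketch. The function name `char_ngrams` is hypothetical, and showing spaces as "_" is an assumption taken from the notation used on the later slides (e.g., "bal_").

```python
def char_ngrams(text: str, n: int = 4) -> list[tuple[int, str]]:
    """Return (offset, slice) pairs for every n-character slice of text."""
    # Assumption: spaces are written as "_" to mirror the slide notation.
    normalized = text.replace(" ", "_")
    # A string shorter than n has no n-character slices, so the result is empty.
    return [(i, normalized[i:i + n]) for i in range(len(normalized) - n + 1)]


grams = char_ngrams("global variable")
print(grams[3])  # (3, 'bal_')
print(grams[8])  # (8, 'aria')
```

With n = 4, a 15-character phrase yields 12 overlapping quad-grams, one per starting offset.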
Motivation
• Users want to acquire information even when it is not sufficiently available in their native language
• Survey results have shown that people have a higher foreign-language proficiency level in reading than in writing
• CLIR may bridge the gap between users' desire to obtain information and the unavailability or under-availability of such information in their native language

Goals
• Allow users to query for domain-specific (i.e., computer science and software engineering) information in their native language
• Present relevant search results in the target language, that is, the language in which the largest amount of information is available

Components
• Domain-specific bilingual corpus extraction from multiple sources
• Corpus indexing
• Querying and string matching

Corpus Extraction

Corpus Indexing
(S, T) -> (i1, h1), (i2, h2), …, (in, hn)

• Quad-grams (k=4)
• Fingerprint overlapping is okay, although it is not the most space-efficient way

Example fingerprints, written as index: quad-gram (hash), with "_" standing for a space:
Java (자바) - 0: Java (20451)
global variable (전역 변수) - 3: bal_ (14870), 8: aria (14269)
example (예제) - 1: xamp (20451)

[Chart: frequency histogram; y-axis "Frequency" from 0 to 50000, x-axis from 1 to 103.]

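The mapping (S, T) -> (i1, h1), …, (in, hn) can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not name the hash function or say which quad-grams are kept, so this sketch fingerprints every quad-gram and uses a 16-bit slice of CRC-32 as a stand-in hash. The names `fingerprints` and `build_index` are hypothetical, and the resulting hash values will not match the numbers shown on the slide.

```python
import zlib
from collections import defaultdict

K = 4  # quad-grams, as stated on the slide


def fingerprints(text, k=K):
    """Yield (offset, hash) for each k-character slice of text."""
    normalized = text.replace(" ", "_")
    for i in range(len(normalized) - k + 1):
        # Assumption: a 16-bit slice of CRC-32 stands in for the unspecified hash.
        yield i, zlib.crc32(normalized[i:i + k].encode("utf-8")) & 0xFFFF


def build_index(pairs):
    """Map each fingerprint hash to the (source, target, offset) entries that contain it."""
    index = defaultdict(list)
    for source, target in pairs:
        for offset, h in fingerprints(source):
            index[h].append((source, target, offset))
    return index


corpus = [("Java", "자바"), ("global variable", "전역 변수"), ("example", "예제")]
index = build_index(corpus)
```

Storing every overlapping fingerprint is simple but, as the slide notes, not the most space-efficient choice.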
Querying & Matching
Query: Java global variable example

Query fingerprints matched against the index (index: quad-gram (hash)):
0: Java (20451) -> 0: Java (20451) in "Java" (자바)
1: ava_ (24085)
…
8: bal_ (14870) -> 3: bal_ (14870) in "global variable" (전역 변수)
…
13: aria (14269) -> 8: aria (14269) in "global variable" (전역 변수)
…
22: xamp (20451) -> 1: xamp (20451) in "example" (예제)

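The lookup shown on this slide, where query fingerprints are compared with indexed fingerprints, might look like the sketch below. It is not the author's implementation: the hash (a 16-bit slice of CRC-32), the decision to fingerprint every quad-gram, the text verification step that guards against hash collisions, and the names `fingerprints` and `match` are all assumptions made for illustration.

```python
import zlib
from collections import Counter, defaultdict


def fingerprints(text, k=4):
    """Return (offset, hash) for each k-character slice; spaces are shown as "_"."""
    normalized = text.replace(" ", "_")
    return [(i, zlib.crc32(normalized[i:i + k].encode("utf-8")) & 0xFFFF)
            for i in range(len(normalized) - k + 1)]


def match(query, pairs):
    """Count, per corpus pair and query position, how many fingerprints line up."""
    index = defaultdict(list)
    for source, target in pairs:
        for offset, h in fingerprints(source):
            index[h].append((source, target, offset))

    hits = Counter()
    for q_offset, h in fingerprints(query):
        for source, target, offset in index.get(h, ()):
            start = q_offset - offset  # where the source phrase would begin in the query
            # Assumption: verify the actual text so hash collisions are not counted.
            if start >= 0 and query[start:start + len(source)] == source:
                hits[(start, source, target)] += 1
    return hits


corpus = [("Java", "자바"), ("global variable", "전역 변수"), ("example", "예제")]
for (start, source, target), count in sorted(match("Java global variable example", corpus).items()):
    print(start, source, target, count)
```

The offset difference between a query fingerprint and an indexed fingerprint gives the position where the source phrase starts in the query, which is why "bal_" at query index 8 lines up with index 3 of "global variable".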
Multiple Candidates
• Longest match first
• Confidence: how many times does this comparable corpus pair appear in a set of documents?
• Outcome of matching depends on the domain of the documents stored in the database

Candidates for "global variable" (index: quad-gram (hash)):
global variable - 전역 변수 - 3: bal_ (14870), 8: aria (14269)
global - 세계적인 - 0: loba (25848)
variable - 변수 - 1: aria (14269)
variable - 가변적인 - 1: aria (14269)

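One way to read the confidence rule above is to keep, for each source phrase, a count of how often each candidate pair was seen, and to prefer the candidate with the highest count. The sketch below assumes exactly that; the function name `best_translation` is hypothetical, and the counts are made-up illustrative numbers, not data from the slides.

```python
def best_translation(source, candidates):
    """Pick the target with the highest confidence; return None when there is no candidate."""
    options = candidates.get(source, {})
    if not options:
        return None
    return max(options, key=options.get)


# Hypothetical counts, as if gathered from a computer-science document collection.
cs_counts = {
    "global variable": {"전역 변수": 17},
    "global": {"세계적인": 5},
    "variable": {"변수": 42, "가변적인": 3},
}

print(best_translation("variable", cs_counts))  # 변수
```

Because the counts come from the stored documents, a collection from a different domain could rank 가변적인 above 변수, which is the domain dependence the slide points out.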
Indexing and Querying Recap

Query: 자바 전역 변수 예제

자바 : Java
전역 : transfer
전역 : all parts (of)
전역 변수 : global variable
변수 : variable
예제 : example

Result: Java global variable example

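The recap can be traced with a small longest-match-first sketch. It assumes the query is split on spaces and that each dictionary entry has a single chosen translation; the two candidates listed for 전역 are collapsed to one arbitrarily here. The function name `translate_query` is hypothetical.

```python
def translate_query(query, dictionary):
    """Translate a space-separated query, always trying the longest phrase first."""
    tokens = query.split()
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest span starting at token i, then shrink it one token at a time.
        for j in range(len(tokens), i, -1):
            phrase = " ".join(tokens[i:j])
            if phrase in dictionary:
                output.append(dictionary[phrase])
                i = j
                break
        else:
            # No entry covers this token, so it is passed through untranslated.
            output.append(tokens[i])
            i += 1
    return " ".join(output)


dictionary = {
    "자바": "Java",
    "전역": "all parts (of)",  # arbitrary pick among the listed candidates
    "전역 변수": "global variable",
    "변수": "variable",
    "예제": "example",
}

print(translate_query("자바 전역 변수 예제", dictionary))  # Java global variable example
```

Trying the two-word phrase 전역 변수 before the single words is what keeps the result as "global variable" instead of "all parts (of) variable".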
Relationship with Content Addressability

Query: 자바 전역 변수 예제
자바 -> Java
전역 변수 -> global variable
예제 -> example

Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Quisque id Java tristique nunc. Vestibulum sit amet tortor
ullamcorper, pretium augue ac, facilisis quam. Ut convallis
suscipit mauris, at porta erat vulputate in. Nulla vitae
consectetur risus. global variable Aenean justo risus, mollis
sed condimentum sed, sagittis eget nisl. Phasellus sem leo,
commodo at dignissim vitae, ullamcorper nec metus. Proin
pretium porta lectus nec example pulvinar. Nulla non
elementum nisi, vel hendrerit quam. Curabitur bibendum
lobortis tincidunt. Proin vel velit porta, tempus ligula a,
interdum leo. Aenean lorem nibh, facilisis ut porta sit amet,
ornare quis ligula.

Evaluation
• Matching
  • Did it translate all the search terms to the target language properly?
  • Did it preserve domain-specific information?
• Searching
  • Hit ratio: # of relevant web pages / # of results on the first page
  • Total number of search results

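The hit ratio defined above is a simple quotient. A minimal sketch, with the hypothetical function name `hit_ratio` and input checks that are assumptions of this example:

```python
def hit_ratio(relevant: int, shown: int) -> float:
    """Hit ratio = number of relevant web pages / number of results on the first page."""
    if shown <= 0:
        raise ValueError("the first page must contain at least one result")
    if not 0 <= relevant <= shown:
        raise ValueError("relevant must be between 0 and shown")
    return relevant / shown


# For example, 6 relevant pages among 10 first-page results.
print(hit_ratio(6, 10))  # 0.6
```

The total number of search results is reported alongside the ratio as a separate figure and is not part of this calculation.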
Evaluation
• 재귀 열거 집합 - recursively enumerable sets
  • (3/3, 1/1)
• 배낭 문제 시간 복잡도 - 배낭 issue the time complexity
  • (3/4, 1/2)
• 가상화를 통한 데이터센터 에너지 효율 극대화 - through virtualization datacenter energy efficiency maximization
  • (7/7, 4/4)

Evaluation
• Query in source language “재귀 열거 집합”
  • (6/10, 15,300)
• Query in target language “recursively enumerable sets”
  • (10/10, 105,000)
• Google Translate result “Set of recursive enumeration”
  • (10/10, 1,990,000)

Evaluation
• Query in source language “배낭 문제 시간 복잡도”
  • (10/10, 31,200)
• Query in target language “배낭 issue time complexity”
  • (2/6, 2,270)
• Google Translate result “Knapsack problem, the time complexity”
  • (10/10, 206,000)

Evaluation
• Query in source language “가상화를 통한 데이터센터 에너지 효율 극대화”
  • (5/10, 36,100)
• Query in target language “through virtualization datacenter energy efficiency maximization”
  • (8/10, 264,000)
• Google Translate result “Maximize energy efficiency through data center virtualization”
  • (10/10, 284,000)

Conclusion & Future Work
• Preliminary results look satisfactory
• Machine-translation-based CLIR appears to be more useful in many cases
• Evaluation factors may not reflect the actual quality of the system
• Labor-intensive evaluation process - need for an automated evaluation
• Fuzzy matching based on lexical information (e.g., call, calls)
• Fuzzy matching based on semantic information (e.g., maximize, maximizing, maximization, maximum)

Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Cross-Language Information Retrieval

  • 3. Background
    • Corpus - a collection of written text: a single word, multiple words, or even phrases and sentences
    • Comparable corpus - a collection of text from pairs of languages referring to the same domain[1]; a (source text, target text) pair
    • N-gram - an n-character or n-word slice of a longer string[2]. We use the term n-gram for n-character slices, and we use 4-grams (four-grams or quad-grams)
    • Source language - the language of the original phrases
    • Target language - the language into which CLIR translates the original phrases
    [1]: Picchi, Eugenio, and Carol Peters. Cross-Language Information Retrieval: A System for Comparable Corpus Querying. Vol. 2. N.p.: Springer US, 1998. Print. 1387-5264.
    [2]: Cavnar, William B., and John M. Trenkle. "N-Gram-Based Text Categorization." (1994) Print.
  • 4. Motivation
    • People want to acquire information even when it is not sufficiently available in their native language
    • A survey has shown that people have a higher foreign language proficiency level in reading than in writing
    • CLIR may bridge the gap between their desire to obtain information and the unavailability or under-availability of such information in their native language
  • 5. Goals
    • Allow users to query for domain-specific (i.e., computer science and software engineering) information in their native language
    • Present relevant search results in the target language, that is, the language in which the largest amount of information is available
  • 6. Components
    • Domain-specific bilingual corpus extraction from multiple sources
    • Corpus indexing
    • Querying and string matching
  • 8. Corpus Indexing
    • (S, T) -> (i1, h1), (i2, h2), …, (in, hn)
    • Quad-grams (k=4)
    • Fingerprint overlapping is okay, although it is not the most space-efficient way
    • Example index entries, written as position: quad-gram (hash), with "_" standing for a space:
      • Java / 자바 - 0: Java (20451)
      • global variable / 전역 변수 - 3: bal_ (14870), 8: aria (14269)
      • example / 예제 - 1: xamp (20451)
    • [Figure: histogram with y-axis "Frequency" (0 to 50,000) and x-axis values from 1 to 103]
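
The indexing step above can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the slides do not name the hash function, so `fingerprint` is a hypothetical 16-bit hash built on `zlib.crc32` and will not reproduce the hash values shown on the slide. The sketch also keeps every quad-gram, whereas the slide appears to store only a selection of them; `quadgrams` and `build_index` are illustrative names.

```python
import zlib
from collections import defaultdict

K = 4  # quad-grams (k=4), as on the slide


def fingerprint(gram):
    # Hypothetical 16-bit hash; the real hash function is not given in the slides.
    return zlib.crc32(gram.encode("utf-8")) & 0xFFFF


def quadgrams(text, k=K):
    """Return (position, gram, hash) for every k-character slice of text."""
    normalized = text.replace(" ", "_")  # the slide writes spaces as "_"
    return [
        (i, normalized[i:i + k], fingerprint(normalized[i:i + k]))
        for i in range(len(normalized) - k + 1)
    ]


def build_index(pairs, k=K):
    """Map each hash to the (position, S, T) entries that produced it."""
    index = defaultdict(list)
    for s, t in pairs:
        for position, _gram, h in quadgrams(s, k):
            index[h].append((position, s, t))
    return index
```

With this normalization, `quadgrams("global variable")` yields `bal_` at position 3 and `aria` at position 8, which agrees with the positions listed on the slide.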
  • 9. Querying & Matching
    • Query: Java global variable example
    • Query fingerprints, written as position: quad-gram (hash):
      • 0: Java (20451)
      • 1: ava_ (24085)
      • …
      • 8: bal_ (14870)
      • …
      • 13: aria (14269)
      • …
      • 22: xamp (20451)
    • Matching index entries:
      • Java / 자바 - 0: Java (20451)
      • global variable / 전역 변수 - 3: bal_ (14870), 8: aria (14269)
      • example / 예제 - 1: xamp (20451)
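
A hedged sketch of the lookup on this slide: fingerprint the query, use each hash to pull candidate entries from the index, then confirm each candidate with a plain substring check so that hash collisions cannot produce false matches. The hash (`fp`) is again a hypothetical stand-in, and the verification step is an assumption, since the slides do not say how candidates are confirmed.

```python
import zlib
from collections import defaultdict

K = 4  # quad-grams


def grams(text):
    """Return (position, gram) for every K-character slice, spaces as "_"."""
    s = text.replace(" ", "_")
    return [(i, s[i:i + K]) for i in range(len(s) - K + 1)]


def fp(gram):
    # Hypothetical 16-bit hash; not the hash used on the slides.
    return zlib.crc32(gram.encode("utf-8")) & 0xFFFF


def build_index(pairs):
    """Map each quad-gram hash to the (indexed text, paired text) entries."""
    index = defaultdict(set)
    for indexed_text, paired_text in pairs:
        for _pos, g in grams(indexed_text):
            index[fp(g)].add((indexed_text, paired_text))
    return index


def find_candidates(query, index):
    """Return sorted (start, indexed text, paired text) matches in the query."""
    normalized = query.replace(" ", "_")
    found = {}
    for _pos, g in grams(query):
        for indexed_text, paired_text in index.get(fp(g), ()):
            # Verify the candidate: the whole entry must occur in the query.
            start = normalized.find(indexed_text.replace(" ", "_"))
            if start != -1:
                found[(start, indexed_text)] = paired_text
    return sorted((start, src, tgt) for (start, src), tgt in found.items())
```

Indexing the three entries from the slide and querying with `"Java global variable example"` returns the three pairs at query offsets 0, 5, and 21.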
  • 10. Multiple Candidates
    • Longest match first
    • Confidence: how many times does this comparable corpus pair appear in a set of documents?
    • Outcome of matching depends on the domain of the documents stored in the database
    • Candidates for "global variable", written as position: quad-gram (hash):
      • global variable / 전역 변수 - 3: bal_ (14870), 8: aria (14269)
      • global / 세계적인 - 0: loba (25848)
      • variable / 변수 - 1: aria (14269)
      • variable / 가변적인 - 1: aria (14269)
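
One way to read "longest match first" together with the confidence rule is a greedy pass over the query tokens: try the longest phrase starting at the current token, and when a phrase has several candidate translations, take the one with the highest confidence. The sketch below assumes exactly that; the confidence counts in the usage example are invented for illustration, and leaving unknown tokens untranslated is an assumption as well.

```python
def translate(query, candidates):
    """Greedy longest-match-first translation.

    candidates maps a source phrase to a list of (target, confidence) tuples,
    where confidence is how often the pair was seen in the document set.
    """
    tokens = query.split()
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest phrase starting at token i, then shorter ones.
        for length in range(len(tokens) - i, 0, -1):
            phrase = " ".join(tokens[i:i + length])
            options = candidates.get(phrase)
            if options:
                best_target, _confidence = max(options, key=lambda o: o[1])
                output.append(best_target)
                i += length
                break
        else:
            # No entry for this token: keep it untranslated (an assumption).
            output.append(tokens[i])
            i += 1
    return " ".join(output)
```

For example, with `candidates = {"global variable": [("전역 변수", 12)], "global": [("세계적인", 30)], "variable": [("변수", 25), ("가변적인", 4)]}`, the query `"global variable"` resolves to the two-word entry rather than to the two single-word entries, regardless of their higher counts.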
  • 11. Indexing and Querying Recap
    • Query: 자바 전역 변수 예제
    • Candidate entries:
      • 자바 : Java
      • 전역 : transfer
      • 전역 : all parts (of)
      • 전역 변수 : global variable
      • 변수 : variable
      • 예제 : example
    • Result: Java global variable example
  • 12. Relationship with Content Addressability
    • Query: 자바 전역 변수 예제
    • 자바 -> Java, 전역 변수 -> global variable, 예제 -> example
    • Sample document containing the translated terms:
      Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quisque id Java tristique nunc. Vestibulum sit amet tortor ullamcorper, pretium augue ac, facilisis quam. Ut convallis suscipit mauris, at porta erat vulputate in. Nulla vitae consectetur risus. global variable Aenean justo risus, mollis sed condimentum sed, sagittis eget nisl. Phasellus sem leo, commodo at dignissim vitae, ullamcorper nec metus. Proin pretium porta lectus nec example pulvinar. Nulla non elementum nisi, vel hendrerit quam. Curabitur bibendum lobortis tincidunt. Proin vel velit porta, tempus ligula a, interdum leo. Aenean lorem nibh, facilisis ut porta sit amet, ornare quis ligula.
  • 13. Evaluation
    • Matching
      • Did it translate all the search terms to the target language properly?
      • Did it preserve domain-specific information?
    • Searching
      • Hit ratio: # of relevant web pages / # of results on the first page
      • Total number of search results
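
As a small worked example of the hit ratio defined above, the following sketch divides the number of relevant pages by the number of results on the first page. The function name `hit_ratio` is illustrative, and judging which pages count as relevant remains a manual step.

```python
def hit_ratio(relevant, first_page_results):
    """Hit ratio = relevant web pages / results shown on the first page."""
    if first_page_results <= 0:
        raise ValueError("first_page_results must be positive")
    if not 0 <= relevant <= first_page_results:
        raise ValueError("relevant must be between 0 and first_page_results")
    return relevant / first_page_results
```

A first page with 6 relevant pages out of 10 results gives a hit ratio of 0.6.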
  • 14. Evaluation
    • 재귀 열거 집합 - recursively enumerable sets
      • (3/3, 1/1)
    • 배낭 문제 시간 복잡도 - 배낭 issue the time complexity
      • (3/4, 1/2)
    • 가상화를 통한 데이터센터 에너지 효율 극대화 - through virtualization datacenter energy efficiency maximization
      • (7/7, 4/4)
  • 15. Evaluation
    • Query in source language “재귀 열거 집합”
      • (6/10, 15,300)
    • Query in target language “recursively enumerable sets”
      • (10/10, 105,000)
    • Google Translate result “Set of recursive enumeration”
      • (10/10, 1,990,000)
  • 16. Evaluation
    • Query in source language “배낭 문제 시간 복잡도”
      • (10/10, 31,200)
    • Query in target language “배낭 issue time complexity”
      • (2/6, 2,270)
    • Google Translate result “Knapsack problem, the time complexity”
      • (10/10, 206,000)
  • 17. Evaluation
    • Query in source language “가상화를 통한 데이터센터 에너지 효율 극대화”
      • (5/10, 36,100)
    • Query in target language “through virtualization datacenter energy efficiency maximization”
      • (8/10, 264,000)
    • Google Translate result “Maximize energy efficiency through data center virtualization”
      • (10/10, 284,000)
  • 18. Conclusion & Future Work
    • Preliminary results look satisfactory
    • Machine-translation-based CLIR appears to be more useful in many cases
    • Evaluation factors may not reflect the actual quality of the system
    • The evaluation process is labor-intensive; an automated evaluation is needed
    • Fuzzy matching based on lexical information (e.g., call, calls)
    • Fuzzy matching based on semantic information (e.g., maximize, maximizing, maximization, maximum)