Reciprocal Enrichment between Wikipedia and Machine Translators
Mikel Iturbe
Slides of the talk given at Wikimania 2010 in Gdańsk, Poland.
1. Reciprocal Enrichment between Wikipedia and Machine Translators. OpenMT2 project. Mikel Iturbe. Wikimania 2010, Gdańsk, Poland
2. Languages in Wikipedia
3. Distribution of Wikipedia articles by language: English, German, French, Polish, Italian, Japanese, Spanish, Dutch, Other
4. Less than 1% of languages have more than 50% of articles
5. Can we ease good article creation?
6. How can we boost article creation in minority languages?
7. OpenMT2 project: http://ixa.si.ehu.es/openmt2/
8. What is it?
9. EHU, UPC and Basque Wikipedians
10. Funded by the Spanish government
11. Free
12. Hybrid machine translation and advanced evaluation system
13. Hybrid?
14. Rule-based MT + statistical post-editing
15. The aim: to teach the existing MT to correct its own mistakes when translating
16. Using Wikipedia
17. How?
18. (1)
19. Translate using rule-based Matxin/Opentrad: http://opentrad.com/
20. 100 long articles, es → eu
21. (2)
22. Correct Basque output manually
23. (3)
24. Analyze logs
25. (4)
26. Make improvements to the MT system
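The "analyze logs" step (3) can be pictured with a small sketch. The slides do not describe the project's log format or its analysis tooling, so everything here is an assumption made for illustration: the log is taken to be a list of (MT output, manually corrected) sentence pairs, the helper names `correction_pairs` and `count_corrections` are hypothetical, and the toy sentences are English rather than Basque. The idea is only to show how frequent human corrections could be surfaced as candidates for improving the MT system.

```python
from collections import Counter
from difflib import SequenceMatcher


def correction_pairs(mt_output, corrected):
    """Yield (machine_phrase, human_phrase) for each span the editor replaced.

    Hypothetical helper: compares the two sentences word by word and keeps
    only the spans marked as replacements (insertions/deletions are ignored).
    """
    mt_tokens = mt_output.split()
    fixed_tokens = corrected.split()
    matcher = SequenceMatcher(a=mt_tokens, b=fixed_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            yield " ".join(mt_tokens[i1:i2]), " ".join(fixed_tokens[j1:j2])


def count_corrections(log):
    """Count how often each (machine_phrase, human_phrase) correction occurs."""
    counts = Counter()
    for mt_output, corrected in log:
        counts.update(correction_pairs(mt_output, corrected))
    return counts


# Toy, assumed log entries: (MT output, manually corrected output).
log = [
    ("it's own mistakes", "its own mistakes"),
    ("the system checks it's output", "the system checks its output"),
    ("a rulebased system", "a rule-based system"),
]

counts = count_corrections(log)
for (machine, human), n in counts.most_common():
    print(f"{n}x  {machine!r} -> {human!r}")
```

Corrections that recur across many articles are the ones most worth feeding back into the system, whether as new rules or as training pairs for a statistical post-editor.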
27.
28. Final test and results
29. Tools
30. Google Translator Toolkit
31. Specific help for Wikipedia. Not free software
32. OmegaT: http://omegat.org
33. Suitable to do the job. Free software
34. What's in?
35. 100 new and good articles for the Basque Wikipedia
36. Provide research material
37. Walk towards an MT system that can be used in our Wikipedia
38. Thank you.
39. Image credits: Aurélio A. Heckert (source), David Vignoni (source), Wilfredor (source), Tango project & Arkanosis (source), OmegaT project (source)
40. Contact. Email: mikel@hamahiru.org. User page: http://eu.wikipedia.org/wiki/Lankide:Janfri. Address: http://hamahiru.org/media/wikimania2010.pdf
41. Text licensed under CC BY-SA 3.0; images maintain their original licenses