SlideShare uma empresa Scribd logo
1 de 41
Baixar para ler offline
Reciprocal Enrichment 
    between Wikipedia and 
     Machine Translators
         OpenMT­2 project

             Mikel Iturbe
           Wikimania 2010 
           Gdańsk, Poland 



                   
languages in 
      wikipedia

           
Distribution of wikipedia
      articles by language
                                English
                                German
                                French
                                Polish
                                Italian
                                Japanese
                                Spanish
                                Dutch
                                Other




                 
Less than 1% of 
     languages have 
    more than 50% of 
         articles 
             
Can we ease good 
    article creation?  

              
How can we boost 
    article creation in 
          minority 
       languages?
              
OpenMT­2 project
     http://ixa.si.ehu.es/openmt2/



                    
What is it?

          
EHU, UPC and 
Basque wikipedians

         
Funded by the 
      Spanish 
     government
           
Free    

        
Hybrid Machine 
      Translation and 
    advanced evaluation 
          system
              
Hybrid?

        
Rule-based MT
                +
    Statistical post-editing

                
The aim: To teach the 
     existing MT to correct 
    it's own mistakes when 
           translating 
                
Using wikipedia

            
How?

       
(1)

      
Translate using 
      rule­based 
    Matxin­Opentrad
       http://opentrad.com/

                 
100 long articles
      es         eu

             
(2)

      
Correct Basque 
    output manually
            
(3)

      
Analyze logs

          
(4)

      
Make 
    improvements to 
     the MT system
            
     
Final test and 
       results

            
Tools

       
Google translator 
        toolkit

             
Specific help for wikipedia
            Not Free Software



                     
OmegaT
    http://omegat.org



             
Suitable to do the job
           Free software



                
What's in?

          
100 new and good 
      articles for the 
    Basque Wikipedia
              
Provide research 
        material

             
Walk towards a MT 
     system that can be 
    used in our wikipedia

               
Thank you.

         
Aurélio A. Heckert (source), David Vignoni (source), 
    Wilfredor (source), Tango project & Arkanosis (source) 
    , OmegaT project (source)




                      Image credits 
                               
e­mail: mikel@hamahiru.org

    User page: http://eu.wikipedia.org/wiki/Lankide:Janfri

    Address: http://hamahiru.org/media/wikimania2010.pdf



                                              contact 
                                    
Text licensed under
       cc­by­sa 3.0
    images maintain their original licenses


                        

Mais conteúdo relacionado

Semelhante a Reciprocal Enrichment between Wikipedia and Machine Translators

TraduXio project - Cosi10
TraduXio project - Cosi10TraduXio project - Cosi10
TraduXio project - Cosi10PhilippeLacour
 
Learning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology EngineeringLearning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology Engineeringbutest
 
M&L 2012 - Translectures: tackling the translation issue in a cost effective ...
M&L 2012 - Translectures: tackling the translation issue in a cost effective ...M&L 2012 - Translectures: tackling the translation issue in a cost effective ...
M&L 2012 - Translectures: tackling the translation issue in a cost effective ...Media & Learning Conference
 
The META-NET Strategic Research Agenda for Multilingual Europe 2020
The META-NET Strategic Research Agenda for Multilingual Europe 2020The META-NET Strategic Research Agenda for Multilingual Europe 2020
The META-NET Strategic Research Agenda for Multilingual Europe 2020Georg Rehm
 
Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...
Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...
Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...Anja Jentzsch
 
eLanguage.net: Shifting the paradigm in Linguistics
eLanguage.net: Shifting the paradigm in LinguisticseLanguage.net: Shifting the paradigm in Linguistics
eLanguage.net: Shifting the paradigm in LinguisticsCornelius Puschmann
 
Community SUmmit: Legal & Licensing / Tools for developers to ensure legal in...
Community SUmmit: Legal & Licensing / Tools for developers to ensure legal in...Community SUmmit: Legal & Licensing / Tools for developers to ensure legal in...
Community SUmmit: Legal & Licensing / Tools for developers to ensure legal in...Paris Open Source Summit
 
Anton Kasyanov, Introduction to Python, Lecture1
Anton Kasyanov, Introduction to Python, Lecture1Anton Kasyanov, Introduction to Python, Lecture1
Anton Kasyanov, Introduction to Python, Lecture1Anton Kasyanov
 
Tools for developers to ensure legal integrity of their code - Antelink OWF
Tools for developers to ensure legal integrity of their code - Antelink OWFTools for developers to ensure legal integrity of their code - Antelink OWF
Tools for developers to ensure legal integrity of their code - Antelink OWFAntelink
 
Wikipedia : Workshop
Wikipedia : WorkshopWikipedia : Workshop
Wikipedia : WorkshopNIFT
 
Why to Choose Python for Data Science Master.pptx
Why to Choose Python for Data Science Master.pptxWhy to Choose Python for Data Science Master.pptx
Why to Choose Python for Data Science Master.pptxHGLLearn
 
Traduco: A collaborative web-based CAT environment for the interpretation and...
Traduco: A collaborative web-based CAT environment for the interpretation and...Traduco: A collaborative web-based CAT environment for the interpretation and...
Traduco: A collaborative web-based CAT environment for the interpretation and...antonellarose
 
Improving writing aids, the community way
Improving writing aids, the community wayImproving writing aids, the community way
Improving writing aids, the community wayAlexandro Colorado
 
Models and tools for aggregating and annotating content on ECLAP
Models and tools for aggregating and annotating content on ECLAPModels and tools for aggregating and annotating content on ECLAP
Models and tools for aggregating and annotating content on ECLAPPaolo Nesi
 
EMMA presentation - Alfons Juan - Language technologies for Education: recent...
EMMA presentation - Alfons Juan - Language technologies for Education: recent...EMMA presentation - Alfons Juan - Language technologies for Education: recent...
EMMA presentation - Alfons Juan - Language technologies for Education: recent...EUmoocs
 

Semelhante a Reciprocal Enrichment between Wikipedia and Machine Translators (20)

TraduXio project - Cosi10
TraduXio project - Cosi10TraduXio project - Cosi10
TraduXio project - Cosi10
 
Learning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology EngineeringLearning and Text Analysis for Ontology Engineering
Learning and Text Analysis for Ontology Engineering
 
M&L 2012 - Translectures: tackling the translation issue in a cost effective ...
M&L 2012 - Translectures: tackling the translation issue in a cost effective ...M&L 2012 - Translectures: tackling the translation issue in a cost effective ...
M&L 2012 - Translectures: tackling the translation issue in a cost effective ...
 
The META-NET Strategic Research Agenda for Multilingual Europe 2020
The META-NET Strategic Research Agenda for Multilingual Europe 2020The META-NET Strategic Research Agenda for Multilingual Europe 2020
The META-NET Strategic Research Agenda for Multilingual Europe 2020
 
Organising a GLAM wiki
Organising a GLAM wikiOrganising a GLAM wiki
Organising a GLAM wiki
 
Niatalk24jan10
Niatalk24jan10Niatalk24jan10
Niatalk24jan10
 
LIASCD_carriero
LIASCD_carrieroLIASCD_carriero
LIASCD_carriero
 
Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...
Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...
Wikidata - The free knowledge base that anyone can edit (1st Linked Data Meet...
 
eLanguage.net: Shifting the paradigm in Linguistics
eLanguage.net: Shifting the paradigm in LinguisticseLanguage.net: Shifting the paradigm in Linguistics
eLanguage.net: Shifting the paradigm in Linguistics
 
Olf2016
Olf2016Olf2016
Olf2016
 
Community SUmmit: Legal & Licensing / Tools for developers to ensure legal in...
Community SUmmit: Legal & Licensing / Tools for developers to ensure legal in...Community SUmmit: Legal & Licensing / Tools for developers to ensure legal in...
Community SUmmit: Legal & Licensing / Tools for developers to ensure legal in...
 
Anton Kasyanov, Introduction to Python, Lecture1
Anton Kasyanov, Introduction to Python, Lecture1Anton Kasyanov, Introduction to Python, Lecture1
Anton Kasyanov, Introduction to Python, Lecture1
 
Tools for developers to ensure legal integrity of their code - Antelink OWF
Tools for developers to ensure legal integrity of their code - Antelink OWFTools for developers to ensure legal integrity of their code - Antelink OWF
Tools for developers to ensure legal integrity of their code - Antelink OWF
 
Wikipedia : Workshop
Wikipedia : WorkshopWikipedia : Workshop
Wikipedia : Workshop
 
Why to Choose Python for Data Science Master.pptx
Why to Choose Python for Data Science Master.pptxWhy to Choose Python for Data Science Master.pptx
Why to Choose Python for Data Science Master.pptx
 
Presentation OntoCommons Workshop March 2021
Presentation OntoCommons Workshop March 2021Presentation OntoCommons Workshop March 2021
Presentation OntoCommons Workshop March 2021
 
Traduco: A collaborative web-based CAT environment for the interpretation and...
Traduco: A collaborative web-based CAT environment for the interpretation and...Traduco: A collaborative web-based CAT environment for the interpretation and...
Traduco: A collaborative web-based CAT environment for the interpretation and...
 
Improving writing aids, the community way
Improving writing aids, the community wayImproving writing aids, the community way
Improving writing aids, the community way
 
Models and tools for aggregating and annotating content on ECLAP
Models and tools for aggregating and annotating content on ECLAPModels and tools for aggregating and annotating content on ECLAP
Models and tools for aggregating and annotating content on ECLAP
 
EMMA presentation - Alfons Juan - Language technologies for Education: recent...
EMMA presentation - Alfons Juan - Language technologies for Education: recent...EMMA presentation - Alfons Juan - Language technologies for Education: recent...
EMMA presentation - Alfons Juan - Language technologies for Education: recent...
 

Último

Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxFIDO Alliance
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTopCSSGallery
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?Mark Billinghurst
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Patrick Viafore
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingScyllaDB
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentationyogeshlabana357357
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGDSC PJATK
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctBrainSell Technologies
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform EngineeringMarcus Vechiato
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxFIDO Alliance
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsLeah Henrickson
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024Stephen Perrenod
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Hiroshi SHIBATA
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data SciencePaolo Missier
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireExakis Nelite
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024Lorenzo Miniero
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Paige Cruz
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxFIDO Alliance
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPTiSEO AI
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewDianaGray10
 

Último (20)

Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptxHarnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
Harnessing Passkeys in the Battle Against AI-Powered Cyber Threats.pptx
 
Top 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development CompaniesTop 10 CodeIgniter Development Companies
Top 10 CodeIgniter Development Companies
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024Extensible Python: Robustness through Addition - PyCon 2024
Extensible Python: Robustness through Addition - PyCon 2024
 
Event-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream ProcessingEvent-Driven Architecture Masterclass: Challenges in Stream Processing
Event-Driven Architecture Masterclass: Challenges in Stream Processing
 
AI mind or machine power point presentation
AI mind or machine power point presentationAI mind or machine power point presentation
AI mind or machine power point presentation
 
Google I/O Extended 2024 Warsaw
Google I/O Extended 2024 WarsawGoogle I/O Extended 2024 Warsaw
Google I/O Extended 2024 Warsaw
 
ERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage IntacctERP Contender Series: Acumatica vs. Sage Intacct
ERP Contender Series: Acumatica vs. Sage Intacct
 
Working together SRE & Platform Engineering
Working together SRE & Platform EngineeringWorking together SRE & Platform Engineering
Working together SRE & Platform Engineering
 
Introduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptxIntroduction to FIDO Authentication and Passkeys.pptx
Introduction to FIDO Authentication and Passkeys.pptx
 
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on ThanabotsContinuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
Continuing Bonds Through AI: A Hermeneutic Reflection on Thanabots
 
TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024TopCryptoSupers 12thReport OrionX May2024
TopCryptoSupers 12thReport OrionX May2024
 
Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024Long journey of Ruby Standard library at RubyKaigi 2024
Long journey of Ruby Standard library at RubyKaigi 2024
 
Design and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data ScienceDesign and Development of a Provenance Capture Platform for Data Science
Design and Development of a Provenance Capture Platform for Data Science
 
Microsoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - QuestionnaireMicrosoft CSP Briefing Pre-Engagement - Questionnaire
Microsoft CSP Briefing Pre-Engagement - Questionnaire
 
WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024WebRTC and SIP not just audio and video @ OpenSIPS 2024
WebRTC and SIP not just audio and video @ OpenSIPS 2024
 
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
Observability Concepts EVERY Developer Should Know (DevOpsDays Seattle)
 
ADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptxADP Passwordless Journey Case Study.pptx
ADP Passwordless Journey Case Study.pptx
 
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
1111 ChatGPT Prompts PDF Free Download - Prompts for ChatGPT
 
UiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overviewUiPath manufacturing technology benefits and AI overview
UiPath manufacturing technology benefits and AI overview
 

Reciprocal Enrichment between Wikipedia and Machine Translators