Supporting languages...
all of them
        Amir Aharoni
       Gerard Meijssen
Language technology
• The Wikimedia Foundation aims to share the sum of all knowledge
  o All knowledge, with everyone
  o All knowledge is distributed over all languages and cultures
• To do this we need to support information in over 7000 languages
  o Information can be translated, but we need to support sources in any language to provide provenance
• Every language has its own requirements
Language technology II
• To write a language you need a script
• A script and its characters are encoded in Unicode
  o Unicode is a standard and can support any script
• However,
  o several scripts are not yet in Unicode
  o even a language like Polish or Slovenian does not have all its requirements in the Latin script supported (old Polish or old Slovenian, for instance)
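As a rough illustration of what "being in Unicode" means in practice, the sketch below uses Python's standard unicodedata module to check whether a code point is assigned and to print its name. The helper names is_assigned and describe are made up for this example, and the answer depends on the Unicode version bundled with the Python interpreter, so treat it as a sketch rather than an authoritative check.

```python
import unicodedata


def is_assigned(ch: str) -> bool:
    """Return True if the code point is assigned in this Python's Unicode data.

    General category "Cn" means "not assigned".
    """
    return unicodedata.category(ch) != "Cn"


def describe(ch: str) -> str:
    """Return the code point, general category and name of a character."""
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {name}"


if __name__ == "__main__":
    # A Polish letter that is encoded in the Latin script.
    print(describe("\u0142"))
    # A code point that has no character assigned to it.
    print(describe("\u0378"), is_assigned("\u0378"))
```

A character that a language needs but that fails such a check cannot be stored or exchanged reliably, which is the gap the slide refers to.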
Language technology III
• To support languages in MediaWiki,
  o we need support for their scripts
  o we need to know how numbers and dates are written
• But also
  o we support plural in our software
  o we support gender in our software
  o we support formality in our software
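Plural support is a good example of why each language has its own requirements. The sketch below is not MediaWiki's implementation; it is a minimal Python illustration, with made-up function names, of how the plural category for a whole number differs between English (two forms) and Polish (three forms for integers). It assumes non-negative integers and ignores fractions.

```python
def plural_category_en(n: int) -> str:
    """English: 'one' for exactly 1, otherwise 'other'."""
    return "one" if n == 1 else "other"


def plural_category_pl(n: int) -> str:
    """Polish, integers only: 'one', 'few' or 'many'."""
    if n == 1:
        return "one"
    # 2-4, 22-24, 32-34, ... but not 12-14
    if n % 10 in (2, 3, 4) and n % 100 not in (12, 13, 14):
        return "few"
    return "many"


# Polish forms of the word "file", used only as sample data.
FORMS_PL = {"one": "plik", "few": "pliki", "many": "plików"}


def format_count_pl(n: int) -> str:
    """Pick the Polish word form that matches the number."""
    return f"{n} {FORMS_PL[plural_category_pl(n)]}"


if __name__ == "__main__":
    for n in (1, 2, 5, 12, 22):
        print(format_count_pl(n))
```

A message written with only a singular and a plural form cannot be translated correctly into Polish, so the software has to let translators supply as many forms as the language needs.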
Language technology IV
• We support a language on our servers, but...
• is it supported on the reader's device?
  o can the end user actually see the characters?
  o can the end user actually write the characters?

We support webfonts and input methods...
• for a small selection of languages
• using only freely licensed Unicode fonts
Language standards
We use language tools that are standards
We use language data from standards
• ISO 639-3 to know which languages to support
• Unicode for scripts
• CLDR for information about languages
• the US keyboard layout - it's just practical
Language standards II
The language standards support only a few languages
• MediaWiki supports more languages than CLDR does
• Not all scripts are defined in Unicode yet
• Not all characters are defined in Unicode yet
• Many people do not use the US keyboard layout
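When a standard has no data for a language, software usually has to borrow data from a related language. The sketch below shows one common way to do that, a fallback chain, in Python. The tables, the language codes chosen and the function names are all hypothetical sample data for illustration; they are not taken from CLDR or from MediaWiki.

```python
# Hypothetical locale data: only two languages have an entry.
DECIMAL_SEPARATOR = {"en": ".", "nl": ","}

# Hypothetical fallback table: a language without data points to
# a language whose conventions are the closest known match.
FALLBACK = {"zea": "nl"}


def decimal_separator(lang: str) -> str:
    """Follow the fallback chain until data is found; default to English."""
    seen = set()
    while lang not in DECIMAL_SEPARATOR:
        if lang in seen or lang not in FALLBACK:
            return DECIMAL_SEPARATOR["en"]
        seen.add(lang)  # guards against a cycle in the fallback table
        lang = FALLBACK[lang]
    return DECIMAL_SEPARATOR[lang]


def format_decimal(value: float, lang: str) -> str:
    """Format a number with two decimals using the language's separator."""
    return f"{value:.2f}".replace(".", decimal_separator(lang))


if __name__ == "__main__":
    print(format_decimal(3.5, "nl"))
    print(format_decimal(3.5, "zea"))
    print(format_decimal(3.5, "xyz"))
```

A fallback is only a stopgap: the borrowed data may be wrong for the language, which is why the missing information still has to be added to the standards themselves.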
Language standards III
To "fix" the standards we need people to amend and extend the information in the standards
• language teams are organized at translatewiki.net
• we communicate with standards organisations
• we communicate about our needs
As always, it is about the sum of all knowledge; this exists only in all languages
