Twitter International
by Matt Sanford
Agenda:
* Who am I
* Some business-y talk about popularity outside of the US
* Some quick notes on our translation process
* Technical details on what’s hard about non-English text

What’s on Tap
• Hashtag for Questions: #chirpintl
• Who’s this guy?
• Twitter’s popularity outside of the US
• Twitter’s Current & Future Translation Tools
• Non-English Tweet Handling
   • Extraction and Auto-linking with Twitter Text
   • Character Counting
   • Invalid Tweet Text
Matt Sanford / @mzsanford
• Joined Twitter from Summize (Twitter Search)
• Worked on Search and Platform
   • Search by language, search refresh bar
   • Original OAuth implementer at Twitter
• Now tech lead of the International team
   • Working on translation tools and non-US features
   • Standardized character counting
   • Author of Open Source Twitter Text libraries

Short bio slide. Helpful when it comes to Q&A time.
Before I cover any technical details I wanted to give a little information on why people using the Twitter Platform should be interested in International. The best way to do that is with numbers …

International Business.
Why Bother With International?
International: 60% & Growing
[Chart: share of Twitter accounts outside the US, 0% to 100%, June 2009 to March 2010]

Bam. 60% of all Twitter accounts are non-US. We crossed the 50% mark in September of 2009.

A big part of this is Japan, where we’re quite popular. Another big part was the new translation efforts we launched. Spanish especially has been well received.
Attendees vs. Users
[Pie charts. Chirp Attendees: US 83%, Non-US 17%. Twitter Accounts: US vs. International]
A good example of Twitter International is Chile. Translating didn’t create an explosion in Twitter usage. What created an explosion was a need for faster information.

Case Study: Chile
We’re There When People Need Us.
Twitter Signups in Chile
We’re There When People Need Us.
[Chart: Twitter signups in Chile, February 21st to March 2nd]

Tweet (10:50 AM Mar 2nd via web):
URGENTE en Constitución apareció IVAN LARA DE 8 AÑOS QUE ESTÁ ABANDONADO en esa ciudad...busca parientes en todo Chile favor copiar y pegar
Translation shown on the slide: Urgent. In Constitucion an eight-year old boy named Ivan Lara showed up alone. He's looking for his family
As opposed to the event inflection we saw in Chile, in Japan we’ve seen long term, sustained growth. We’ve also been dedicating resources to some local-specific features.

Case Study: Japan
Not Godzilla Big, But We’re Working On It.
Daily Tweeters in Japan
More Users Are Good. More Engaged Users Are Better.
[Chart: daily Tweeters in Japan, July ‘09 to April ‘10]
Japanese Mobile Follow Me
Take Advantage of Existing Behavior
Photo: flickr.com/cogdog
Photo: flickr.com/netwalkerz
Since translation is a big part of what we’re working on, I want to cover that a little bit.

Like all Twitter features, we rely on user need to help define what we do. We could have paid translators, but we felt that having users participate in the process was important. That led us to our current crowd-sourcing model …

Translation Tool
Present & Future
2,600
Participating Translators
And we plan to more than double that number very soon when we send out more invites.
3,500
Strings to Translate
3,600
Strings to Translate
480,000
Translations
Staggering passion and participation from the community.
Translation Tools
today
• Volunteer crowd-sourcing
   • Augmented by in-house people
• Built-in to twitter.com
• Provides context during translation
   • Significantly higher quality
   • Social game dynamics
• Database backed and heavily cached
   • Edits are launched in ~2 hours
• Multiple levels of voting
   • Helps prevent abuse
   • Built-in proofing system

On context: Point out the arrow versus the list-view of other sites. Also: suggestions
On deploy: unaided
Post-slide note: We’ll be rolling out changes very soon that focus on consensus over new translations.
Translation Tools
tomorrow
• We’ve released some common terms on the API wiki
   • So you can benefit from our translation work
   • To help with consistency across clients
• We’re hoping to provide even more data in the future
   • More languages. More strings. More ease.
• New translation UI changes coming soon

On releasing translations: We made this a goal and covered it in the translation agreement. Let me know after this talk what would help you.
Up until now we’ve covered more general Twitter topics. Now we’re going to talk about some of the more complicated topics. Most international issues boil down to things you think are simple turning out to be deceptively hard to get right. Things like:

* Parsing Tweets (and what’s so hard about it)
* Counting characters (and why it’s not that simple)
* Tweet text that we cannot accept (today)

Engineering Topics
Yeah, It’s Complicated.
Twitter Text Libraries
• Provides extraction and auto-linking
   • @user, @user/list, #hashtag, URLs
• Open Source*
• Available in Ruby and Java from Twitter
• Conformance Testing Data
   • Modeled after the Unicode conformance suite
   • YAML description of test cases for any language
   • Assurance that you meet the same standards
• Many non-English test cases

* http://twitter.com/about/opensource and on github

Rather than re-implement these common features we recommend using the Open Source libraries we help maintain.

If you’re not using Ruby or Java: We provide a cross-language test suite so you can implement the same rules in another language.
Twitter Text: Japanese Linking
Issues not encountered in English:
• Additional punctuation characters
   • \s in many languages ignores U+3000 (the ideographic space, ‘　’)
   • Full-width punctuation forms:
      • @ versus ＠
      • # versus ＃
• No spaces between words
   My homepage is http://twitter.com
                        http://twitter.com

Quick tour of the issues the Twitter Text libraries handle in Japanese that many previous libraries didn’t handle.

The lack of word spaces is a fundamental issue when it comes to parsing Tweets.
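The Japanese linking issues on this slide can be illustrated with a small Python sketch. This is not the Twitter Text implementation: the patterns, the 1 to 20 character username limit, and the function names are simplified assumptions made up for this example. It only shows why full-width signs, U+3000, and missing word spaces need explicit handling.

```python
import re

# Assumption: simplified stand-ins, NOT the real Twitter Text rules.
AT_SIGNS = "@\uFF20"    # ASCII @ and full-width commercial at (U+FF20)
HASH_SIGNS = "#\uFF03"  # ASCII # and full-width number sign (U+FF03)

# Hypothetical username rule: 1 to 20 ASCII letters, digits or underscores.
MENTION_RE = re.compile(rf"[{AT_SIGNS}]([A-Za-z0-9_]{{1,20}})")
# \w matches Unicode letters in Python 3, so Japanese hashtags are captured.
HASHTAG_RE = re.compile(rf"[{HASH_SIGNS}](\w+)")
# Whitespace is listed explicitly so that U+3000 is covered even in
# regex engines whose \s would skip it.
SPACE_RE = re.compile("[ \t\r\n\u3000]+")


def extract_mentions(text):
    """Return usernames that follow an ASCII or full-width at sign."""
    return MENTION_RE.findall(text)


def extract_hashtags(text):
    """Return hashtag bodies that follow an ASCII or full-width hash sign."""
    return HASHTAG_RE.findall(text)


def split_words(text):
    """Split on ASCII whitespace and the ideographic space."""
    return [word for word in SPACE_RE.split(text) if word]


if __name__ == "__main__":
    print(extract_mentions("hello \uFF20mzsanford and @twitter"))
    print(extract_hashtags("\uFF03chirpintl #日本語"))
    # No space after the hashtag: the naive pattern swallows the next word.
    print(extract_hashtags("今日は#twitterです"))
    print(split_words("foo\u3000bar baz"))
```

The third call shows the word-space problem: because nothing separates the hashtag from the following text, this naive pattern returns "twitterです" rather than "twitter", which is the kind of case a shared conformance suite is meant to pin down.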
Character counting
Unicode FTW!

Don’t count bytes
   U+5473 (味)
   UTF-8: 0xE5 0x91 0xB3 (3 bytes)
   UTF-16: 0x54 0x73 (2 bytes)
   Human: 1 character

Don’t even count Unicode code points
   e (U+0065) + combining acute accent (U+0301) = é {U+0065, U+0301}
   OR
   é (U+00E9)

We count the shortest representation*
* Unicode NFC form. See: http://unicode.org/reports/tr15/
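The counting rule on this slide can be sketched in Python with the standard `unicodedata` module. The function name `tweet_length` is made up for this example, and counting code points after NFC normalization is an assumption about how "shortest representation" translates into code, not a statement of Twitter's exact implementation.

```python
import unicodedata


def tweet_length(text: str) -> int:
    """Count code points after NFC normalization (assumed counting rule)."""
    return len(unicodedata.normalize("NFC", text))


if __name__ == "__main__":
    kanji = "\u5473"
    # Byte counts depend on the encoding, so they are the wrong thing to count.
    print(len(kanji.encode("utf-8")))      # 3 bytes
    print(len(kanji.encode("utf-16-be")))  # 2 bytes
    print(tweet_length(kanji))             # 1 character

    decomposed = "e\u0301"   # e + combining acute accent
    precomposed = "\u00e9"   # single code point for the same letter
    print(len(decomposed))            # 2 code points before normalization
    print(tweet_length(decomposed))   # 1 after NFC
    print(tweet_length(precomposed))  # 1
```

Normalizing before counting means both spellings of é cost the same, so the result does not depend on which form a client or keyboard happens to send.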
Invalid Tweet Text
For a variety of reasons

Disallowed on Purpose
• Byte order Marks (not needed since we only accept UTF-8): U+FFFE & U+FEFF
• Reserved Unicode Special: U+FFFF
• Directional Change Characters (they allow complicated phishing attacks)*: U+202A, U+202B, U+202C, U+202D & U+202E

Disallowed Due to Technical Limitations
• Characters outside of the Basic Multilingual Plane (BMP)
   • That means all Unicode code points above U+FFFF
   • Some Unicode 5 Kanji, many ancient writing systems and things like musical symbols.
• We’re actively working on the move from MySQL to Cassandra, which will solve this.

Slide on characters that Twitter does not allow in a Tweet.

We purposely disallow those that have no meaning in the context of a Tweet, or that have security implications.

We also have a technical limitation in MySQL that disallows certain characters. It’s fixed in MySQL 6 but we’ll be moving to Cassandra.

* Unicode Security Considerations: http://www.unicode.org/reports/tr36
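A client could pre-check text against the rules listed on this slide before posting. The sketch below is a hypothetical helper, not Twitter's validation code: the function names are invented for this example, and the code point list is copied from the slide as it stood at the time of the talk.

```python
# Code points the slide lists as disallowed on purpose.
DISALLOWED_ON_PURPOSE = {
    0xFFFE, 0xFEFF,                          # byte order marks
    0xFFFF,                                  # reserved Unicode special
    0x202A, 0x202B, 0x202C, 0x202D, 0x202E,  # directional change characters
}

BMP_MAX = 0xFFFF  # anything above this is outside the Basic Multilingual Plane


def invalid_code_points(text):
    """Return the offending code points in order, formatted as U+XXXX."""
    return [
        f"U+{ord(ch):04X}"
        for ch in text
        if ord(ch) in DISALLOWED_ON_PURPOSE or ord(ch) > BMP_MAX
    ]


def is_valid_tweet_text(text):
    """True when the text contains none of the disallowed code points."""
    return not invalid_code_points(text)


if __name__ == "__main__":
    print(is_valid_tweet_text("hello \u5473"))
    print(invalid_code_points("abc\u202Edef"))  # right-to-left override
    print(invalid_code_points("\U0001D11E"))    # musical symbol, outside the BMP
```

Returning the list of offending code points, rather than only a boolean, lets a client tell the user which character caused the rejection.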
Questions & Answers
Here To Help.

Chirp 2010: Twitter International

  • 3. Agenda: * Who am I * Some business-y talk about popularity outside of the US What’s on Tap * Some quick notes on our translation process * Technical details on what’s hard about non-English text • Hashtag for Questions: #chirpintl • Who’s this guy? • Twitter’s popularity outside of the US • Twitter’s Current & Future Translation Tools • Non-English Tweet Handling • Extraction and Auto-linking with Twitter Text • Character Counting • Invalid Tweet Text
  • 4. Matt Sanford / @mzsanford • Joined Twitter from Summize (Twitter Search) • Worked on Search and Platform Short bio slide. Helpful when it comes to Q&A time. • Search by language, search refresh bar • Original OAuth implementer at Twitter • Now tech lead of the International team • Working on translation tools and non-US features • Standardized character counting • Author of Open Source Twitter Text libraries
  • 5. International Business. Why Bother With International? Speaker note: Before I cover any technical details I wanted to give a little information on why people using the Twitter Platform should be interested in International. The best way to do that is with numbers …
  • 6. International: 60% & Growing Chart axes: 0% to 100%; June 2009, September 2009, December 2009, March 2010. Speaker note: Bam. 60% of all Twitter accounts are non-US … We crossed the 50% mark in September of 2009. A big part of this is Japan, where we’re quite popular. Another big part was the new translation efforts we launched. Spanish especially has been well received.
  • 7. Attendees vs. Users Chirp Attendees: US 83%, Non-US 17%. Twitter Accounts: US vs. International.
  • 8. Case Study: Chile We’re There When People Need Us. Speaker note: A good example of Twitter International is Chile. Translating didn’t create an explosion in Twitter usage. What created an explosion was a need for faster information.
  • 9. Twitter Signups in Chile We’re There When People Need Us. February 21st February 24th February 27th March 2nd
  • 10. Twitter Signups in Chile We’re There When People Need Us. February 21st February 24th February 27th March 2nd Tweet: “URGENTE en Constitución apareció IVAN LARA DE 8 AÑOS QUE ESTÁ ABANDONADO en esa ciudad...busca parientes en todo Chile favor copiar y pegar” 10:50 AM Mar 2nd via web Translation: Urgent. In Constitucion an eight-year old boy named Ivan Lara showed up alone. He's looking for his family
  • 11. Case Study: Japan Not Godzilla Big, But We’re Working On It. Speaker note: As opposed to the event inflection we saw in Chile, in Japan we’ve seen long term, sustained growth. We’ve also been dedicating resources to some local-specific features.
  • 12. Daily Tweeters in Japan More Users Are Good. More Engaged Users Are Better. July ‘09 October ‘09 January ‘10 April ‘10
  • 13. Japanese Mobile Follow Me Take Advantage of Existing Behavior
  • 14. Japanese Mobile Follow Me Take Advantage of Existing Behavior Photo: flickr.com/cogdog
  • 15. Japanese Mobile Follow Me Take Advantage of Existing Behavior Photo: flickr.com/cogdog Photo: flickr.com/netwalkerz
  • 16. Japanese Mobile Follow Me Take Advantage of Existing Behavior Photo: flickr.com/cogdog Photo: flickr.com/netwalkerz
  • 17. Translation Tool Present & Future Speaker note: Since translation is a big part of what we’re working on I want to cover that a little bit. Like all Twitter features we rely on user need to help define what we do. We could have paid translators but we felt like having users participate in the process was important. That led us to our current crowd-sourcing model …
  • 18. 2,600 Participating Translators And we plan to more than double that number very soon when we send out more invites.
  • 21. 480,000 Translations Staggering passion and participation from the community.
  • 22. Translation Tools today • Volunteer crowd-sourcing • Augmented by in-house people • Built-in to twitter.com • Provides context during translation • Significantly higher quality • Social game dynamics • Database backed and heavily cached • Edits are launched in ~2 hours • Multiple levels of voting • Helps prevent abuse • Built-in proofing system Speaker note: On context: Point out the arrow versus the list-view of other sites. Also: suggestions. On deploy: unaided. Post-slide note: We’ll be rolling out changes very soon that focus on consensus over new translations.
  • 23. Translation Tools tomorrow • We’ve released some common terms on the API wiki • So you can benefit from our translation work • To help with consistency across clients • We’re hoping to provide even more data in the future • More languages. More strings. More ease. • New translation UI changes coming soon On releasing translations: We made this a goal and covered it in the translation agreement. Let me know after this talk what would help you.
  • 24. Engineering Topics Yeah, It’s Complicated. Speaker note: Up until now we’ve covered more general Twitter topics. Now we’re going to talk about some of the more complicated topics. Most international issues boil down to things you think are simple turning out to be deceptively hard to get right. Things like: * Parsing tweets (and what’s so hard about it) * Counting characters (and why it’s not that simple) * Tweet text that we cannot accept (today)
  • 25. Twitter Text Libraries • Provides extraction and auto-linking • @user, @user/list, #hashtag, URLs • Open Source* • Available in Ruby and Java from Twitter • Conformance Testing Data • Modeled after the Unicode conformance suite • YAML description of test cases for any language • Assurance that you meet the same standards • Many non-English test cases * http://twitter.com/about/opensource and on github Speaker note: Rather than re-implement these common features we recommend using the Open Source libraries we help maintain. If you’re not using Ruby or Java: We provide a cross-language test suite so you can implement the same rules in another language.
  • 26. Twitter Text: Japanese Linking Issues not encountered in English: • Additional punctuation characters • \s in many languages ignores U+3000 (ideographic space) • Full-width punctuation forms: • @ versus ＠ • # versus ＃ • No spaces between words Speaker note: Quick tour of the issues the Twitter Text libraries handle in Japanese that many previous libraries didn’t handle. The lack of word spaces is a fundamental issue when it comes to parsing Tweets.
  • 27. Twitter Text: Japanese Linking Issues not encountered in English: • Additional punctuation characters • \s in many languages ignores U+3000 (ideographic space) • Full-width punctuation forms: • @ versus ＠ • # versus ＃ • No spaces between words My homepage is http://twitter.com http://twitter.com Speaker note: Quick tour of the issues the Twitter Text libraries handle in Japanese that many previous libraries didn’t handle. The lack of word spaces is a fundamental issue when it comes to parsing Tweets.
  • 29. Character counting Unicode FTW! Don’t count bytes: U+5473 is UTF-8: 0xE5 0x91 0xB3 (3 bytes), UTF-16: 0x54 0x73 (2 bytes), Human: 1 character
  • 30. Character counting Unicode FTW! Don’t count bytes: U+5473 is UTF-8: 0xE5 0x91 0xB3 (3 bytes), UTF-16: 0x54 0x73 (2 bytes), Human: 1 character Don’t even count Unicode code points: é can be e + combining acute accent {U+0065, U+0301} OR the single code point é U+00E9
  • 31. Character counting Unicode FTW! Don’t count bytes: U+5473 is UTF-8: 0xE5 0x91 0xB3 (3 bytes), UTF-16: 0x54 0x73 (2 bytes), Human: 1 character Don’t even count Unicode code points: é can be e + combining acute accent {U+0065, U+0301} OR the single code point é U+00E9 We count the shortest representation* * Unicode NFC form. See: http://unicode.org/reports/tr15/
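The three counting approaches on slides 29 to 31 can be compared with Python's standard `unicodedata` module. This is a minimal sketch; the helper name `nfc_length` is made up for illustration and is not part of any Twitter library.

```python
import unicodedata

kanji = "\u5473"  # the single character used on the slide

utf8_bytes = len(kanji.encode("utf-8"))       # byte count in UTF-8
utf16_bytes = len(kanji.encode("utf-16-be"))  # byte count in UTF-16 (big-endian, no BOM)
code_points = len(kanji)                      # Python's len() counts code points

decomposed = "e\u0301"  # 'e' followed by COMBINING ACUTE ACCENT
composed = "\u00e9"     # precomposed LATIN SMALL LETTER E WITH ACUTE


def nfc_length(text: str) -> int:
    """Count code points after NFC normalization (hypothetical helper)."""
    return len(unicodedata.normalize("NFC", text))


print(utf8_bytes, utf16_bytes, code_points)
print(len(decomposed), len(composed))
print(nfc_length(decomposed), nfc_length(composed))
```

After normalization both spellings of é have the same length, so the count no longer depends on how the client happened to encode the text. NFC only helps where a precomposed form exists; combining sequences without one still count as more than one code point.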
  • 32. Invalid Tweet Text For a variety of reasons Disallowed on Purpose • Byte order Marks (not needed since we only accept UTF-8): U+FFFE & U+FEFF • Reserved Unicode Special: U+FFFF • Directional Change Characters (they allow complicated phishing attacks)*: U+202A, U+202B, U+202C, U+202D & U+202E Disallowed Due to Technical Limitations • Characters outside of the Basic Multilingual Plane (BMP) • That means all Unicode code points above U+FFFF • Some Unicode 5 Kanji, many ancient writing systems and things like musical symbols. • We’re actively working on the move from MySQL to Cassandra, which will solve this. * Unicode Security Considerations: http://www.unicode.org/reports/tr36 Speaker note: Slide on characters that Twitter does not allow in a Tweet. We purposely disallow those that have no meaning in the context of a Tweet, or that have security implications. We also have a technical limitation in MySQL that disallows certain characters. It’s fixed in MySQL 6 but we’ll be moving to Cassandra.
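A client could pre-check text against the list on slide 32 before posting. The sketch below simply encodes the code points named on the slide as they stood at the time of the talk; the function name `find_invalid_code_points` is hypothetical, and this is not Twitter's server-side validation.

```python
# Code points the slide lists as disallowed on purpose.
DISALLOWED_ON_PURPOSE = {
    0xFFFE, 0xFEFF,                          # byte order marks
    0xFFFF,                                  # reserved Unicode special
    0x202A, 0x202B, 0x202C, 0x202D, 0x202E,  # directional change characters
}

# Highest code point in the Basic Multilingual Plane; anything above it
# falls under the technical limitation described on the slide.
BMP_MAX = 0xFFFF


def find_invalid_code_points(text):
    """Return 'U+XXXX' labels for each code point the slide says is rejected."""
    invalid = []
    for ch in text:
        cp = ord(ch)
        if cp in DISALLOWED_ON_PURPOSE or cp > BMP_MAX:
            invalid.append(f"U+{cp:04X}")
    return invalid


print(find_invalid_code_points("hello"))
print(find_invalid_code_points("abc\u202edef"))
print(find_invalid_code_points("\U0001D11E"))  # a musical symbol outside the BMP
```

Returning the offending code points, rather than a bare true/false, lets a client tell the user which character to remove.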

Editor's Notes

  1. Holding pattern.
  2. Title Slide. • “let’s get to it”
  3. • Who am I? • Some business-y talk for the entrepreneurs in the group • Notes on how we’ve gone about translation • Engineering challenges for the coders in the group
  4. • From Summize • Search, platform (might remember API Group) • Original OAuth (sorry) • International (translation, char counting, twitter-text)
  5. • Before the technical stuff a little info on why you should be interested. • the main reason: Users
  6. • You might have seen the blog post on international growth. • Passed 50% not long after the team formed • In large part: Japan, translation
  7. • Take advantage of these markets. • Due to a slew of factors dev is mainly US (Twitter, EN, etc) but that does not mean it’s not looking outward
  8. • Case in point, Chile • Translating alone was not a big jump • But we had set the stage. When the need for faster information arose we were there
  9-10. • You can see the Earthquake effect clearly. Not shown here is that signups have remained higher than pre-quake levels. • What’s great isn’t the users, but the utility [click] • This tweet for example. • It’s not what someone had for breakfast, but solving a real communication problem.
  11. • Unlike the event-driven growth in Chile, Japan is a long-term steady growth • We’ve been dedicating resources and working on more local features
  12. • Rather than users I want to highlight daily unique ‘Tweeters’ (people who tweet) - We’ve been working as much on adding people as increasing the utility to those people - Done this via a new mobile site matching Japanese expectations, along with email/photoposting • The red dot here is the ‘follow me’ feature on the mobile site. It’s not the sole cause of the uptake but it’s helped. • I’d like to take a moment and explain that … [next]
  13-17. • We’ve done a bunch of features on the JP mobile site (Yoshi, Sean), one of those is the ‘follow me’ flow. • This is something that people can learn from: We took advantage of existing user behavior, even though it’s not a behavior in the US. We use the QR-code. • QR-codes are big in Japan [click] … like this one on a sign. Goes to the store site • People are so used to this they use it for context like [click] these real estate listings • We used this existing behavior [click] to let people share their ‘contact info’ in the form of their twitter profile. - Like ‘Bump’ on the iPhone but it works on all handsets in Japan and is immediately evident to users.
  18. • Translation is a big part of what we do, and we do it a little differently • Like all features we turn to users for feedback. Could have paid, would have been cheaper, but would not have had community feedback • Crowd-source, like open source for data. We had a great group … [next]
  19. Of more than 2,600 translators. - Soon to send out more invites. Planning to make it open to anyone later this year.
  20-21. Twitter isn’t just 200 labels. Settings, about pages, features, etc. [click] and more features every day.
  22. Those 2,600 translators have been so passionate it just blows me away. As of today they’ve contributed 480k translations
  23. • We augmented with a wonderful group in-house (shoutout) • Built the tool into twitter.com, provides context (see pointer) for quality, social game dynamic in jump-around prompt (see counter) • DB backed with cache, no-deploy launching. • Multi-level voting
  24. We’ve released translations of the most common terms on the wiki so you can use them. We want to provide even more help, let us know how. New translation UI upcoming (not of too much interest, other than more data)
  25. Engineering topics. Not complete but most i18n topics boil down to things that are easy 99% of the time and very hard 1% of the time. We’ll cover parsing tweets, counting characters, and invalid tweet text
  26. Twitter-text libs. - Extract, autolink - Open Source Ruby and Java. Also following community ports to Python and PHP (though PHP could use some love). Look forward to more. - Conformance data: Unicode, YAML, assurance, non-EN test cases • A good example of the 1% issues we handle in the libs are Japanese Tweets …[next]
  27. • Punctuation: \s sucks in most languages. Full-width @ and # (if you want more info on this let me know afterward.) • No spaces between words. Turns out, we assume a lot [click] - http://\S+ does not work.
  28-44. • When your product is char based, it matters. Some issues are obvious, some not. • [click] Don’t count bytes. You knew that. • [click] Don’t count code points. That’s news to many people. • We try to count what a person would call a char, where possible. So, we [click] use the shortest.
  45. • Two types of things we don’t allow. On purpose, technical limitation. • On Purpose: BOM (not utf-16), reserved, dir change (security, layout is not at home in a Tweet) • Limitations of MySQL (<v6) prevent some chars. (small set of Kanji, musical symbols, ancient scripts)