Hybrid method
for modeless Japanese input
using N-gram based binary classification
and dictionary
Yukino Ikegami
Setsuo Tsuruta
2014/01/20
Necessity of Japanese Input Method
• Japanese has many characters
– Kana
• Hiragana
– 81 characters, e.g. いろはにほへと
• Katakana
– 81 characters, e.g. イロハニホヘト
– Kanji (Chinese characters)
• More than 6,000 characters, e.g. 以呂波仁保反止
• These characters cannot be typed directly on a keyboard
→ A Japanese input method (converting alphabet to
Japanese characters) is necessary
If every Japanese character were assigned
its own key…
• Too many keys!
• Japanese input method is necessary
Japanese Input Method
-Roman to Kana-Kanji Converter-
• Flow
1. Receive the romanized alphabet input
2. Convert the romanized input
into Kana using a Roman-to-Kana table
3. Convert Kana into Kanji (if necessary)
①n e k o d e s u
②ねこです
③猫です
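The three steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny `ROMAN_TO_KANA` and `KANA_TO_KANJI` tables and the function names are assumptions made up for this example, and a real input method uses a full table plus candidate selection for Kanji.

```python
# Hypothetical, deliberately tiny tables (assumptions for illustration only).
ROMAN_TO_KANA = {"ne": "ね", "ko": "こ", "de": "で", "su": "す", "ha": "は"}
KANA_TO_KANJI = {"ねこ": "猫"}


def roman_to_kana(text, table=ROMAN_TO_KANA, max_len=3):
    """Step 2: greedy longest-match conversion of romanized input to Kana."""
    out = []
    i = 0
    while i < len(text):
        for size in range(max_len, 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No table entry starts here: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def kana_to_kanji(kana, table=KANA_TO_KANJI):
    """Step 3: toy dictionary lookup (real IMEs rank several candidates)."""
    for reading, kanji in table.items():
        kana = kana.replace(reading, kanji)
    return kana


romaji = "nekodesu"               # step 1: received input
kana = roman_to_kana(romaji)      # step 2
kanji = kana_to_kanji(kana)       # step 3
print(kana, kanji)
```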
Problems
on Japanese Input Method
• The user must switch input modes between
Japanese and ASCII
e.g. To input ‘あれは8Byteです’ (That is 8Byte):
areha [Return] [ASCII Mode] 8Byte [Japanese Mode] desu
([ASCII Mode] and [Japanese Mode] are mode switches)
• Switching is cumbersome!
Adding Terms to the Dictionary
to Address the Mode-Switching Problem
• Add terms from other languages to the
dictionary of a conventional input method
editor (IME)
• Shortcomings
– New terms are created continuously
– Homograph problem
Related Work
• Modeless Pinyin-Chinese Input [Chen et al. 2000]
– Converts alphabet (Pinyin) to Chinese
– Uses only word-surface features for classification
• Type-Any [Ehara et al. 2009]
– Converts alphabet to any language
– Requires pressing a delimiter key when converting
– Uses only word-surface features for classification
Approach
-Modeless Japanese Input Method-
• Switch input mode automatically
1. Generate a discriminative model with a Support Vector
Machine (SVM)
– the model uses multiple n-gram features
2. Use the discriminative model to decide whether each
segment of the alphabet sequence is Kana or not
– e.g.
nekohacatdesu → nekoha / cat / desu → ねこはcatです
(Japanese / English / Japanese)
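To illustrate step 2, the sketch below groups per-character decisions into segments. It assumes the classifier has already labeled every character; the label string here is written by hand as a stand-in for model output, and the label "A" for ASCII and the function name are assumptions for this example ("K" for Kana follows the deck's history feature).

```python
from itertools import groupby


def group_segments(text, labels):
    """Merge consecutive characters that share a label into one segment."""
    if len(text) != len(labels):
        raise ValueError("text and labels must have the same length")
    segments = []
    pos = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((text[pos:pos + length], label))
        pos += length
    return segments


text = "nekohacatdesu"
labels = "KKKKKKAAAKKKK"   # hand-written stand-in for classifier output
segments = group_segments(text, labels)
print(segments)
```

Segments labeled "K" would then go through Kana conversion, while "A" segments are left as typed.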
Main flow of
Modeless Japanese Input Method
1. User input (alphabet sequence)
2. For each character in the user input, the Discriminative Model and the Non-Japanese dictionary decide: is the character still ASCII?
– True: the character stays ASCII
– False: Kana conversion
3. System response (Kana & alphabet sequence)
Flow of
Generating Discriminative Model
1. Load texts
– e.g. 猫はcatです
2. Kanji to Kana
– Using a Japanese morphological analyzer (MeCab)
– ネコハcatデス
3. Kana to ASCII
– Using a Kana-to-ASCII table (the one used by Google Japanese Input)
– nekohacatdesu
4. ASCII to n-gram
– Character-surface: ne, ek, nek, ko, eko, oh, koh, ha, oha...
– Character-type: LL, LL, LLL, LL, LLL, LL, LLL...
– History: KK, KK, KKK, KK, KKK, KKK...
5. n-gram to ID
– 1, 3, 4, 13, 22...
6. Describe as binary model
– 1:1, 3:1, 4:1, 13:1, 32:1...
7. Learning on SVM
– 1.344, 0.691, 0.023, -1.398...
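The n-gram, ID, and binary-model steps can be sketched as below. This is an illustration under assumptions: the function names are made up, IDs are assigned in order of first appearance (so they will not match the IDs on the slide), and the output line only resembles the sparse `index:value` notation shown above. SVM training itself is left out because it needs an external library.

```python
def ngrams_ending_at_each_char(text, sizes=(2, 3)):
    """List the 2-grams and 3-grams that end at each character position."""
    grams = []
    for end in range(1, len(text) + 1):
        for n in sizes:
            if end - n >= 0:
                grams.append(text[end - n:end])
    return grams


def assign_ids(grams, vocab):
    """Map each n-gram to an integer ID, adding unseen n-grams to vocab."""
    return [vocab.setdefault(g, len(vocab) + 1) for g in grams]


def to_binary_line(ids):
    """Describe a feature set as sorted 'id:1' pairs (binary features)."""
    return " ".join(f"{i}:1" for i in sorted(set(ids)))


vocab = {}
grams = ngrams_ending_at_each_char("nekoha")
ids = assign_ids(grams, vocab)
line = to_binary_line(ids)
print(grams)
print(line)
```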
n-gram Features
あ れ は 8 B y t e
a r e h a 8 B y t e
(with n-gram upper limit n = 2, window size m = 2, focus point x_i = the 2nd “a”)
• Character-Surface
– Substrings before and after the focus point
– e.g. -2/ha -1/a8 0/8B 1/By
• Character-Type
– Upper-case (U), lower-case (L), number (N), and
symbol (S)
– e.g. -2/LL -1/LN 0/NU 1/UL
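The example above can be reproduced with the following sketch. The offset convention (offset k starts the 2-gram at position focus + k + 1) is an assumption inferred from the slide's example, not a definition given in the deck; the function names are hypothetical, and only 2-grams are emitted, as in the example.

```python
def char_type(ch):
    """Classify a character as Upper, Lower, Number, or Symbol."""
    if ch.isupper():
        return "U"
    if ch.islower():
        return "L"
    if ch.isdigit():
        return "N"
    return "S"


def window_features(text, focus, n=2, m=2):
    """Collect surface and type n-grams at offsets -m .. m-1 around focus."""
    surface, types = [], []
    for offset in range(-m, m):
        start = focus + offset + 1   # assumed offset convention
        gram = text[start:start + n]
        if start < 0 or len(gram) < n:
            continue                 # window runs past the string
        surface.append(f"{offset}/{gram}")
        types.append(f"{offset}/" + "".join(char_type(c) for c in gram))
    return surface, types


# Focus point: the 2nd "a" of "areha8Byte", i.e. index 4.
surface, types = window_features("areha8Byte", focus=4)
print(surface)
print(types)
```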
Generating Non-Japanese Dictionary
• Words that never appear in Japanese-only text
– More than 5 characters long
– Contain a substring that cannot be converted to Kana
• Source
– Corpus of Contemporary American English (COCA)
– Japanese Wikipedia article title list
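A rough sketch of the two word-level conditions follows. It rests on several assumptions that the deck does not confirm: both conditions are treated as required together, "cannot be converted to Kana" is approximated as "cannot be fully split into romaji syllables", and the small `SYLLABLES` set is a made-up subset of a real Roman-to-Kana table. The check against Japanese-only text is not modeled.

```python
VOWELS = "aiueo"
CONSONANTS = "kstnhmyrwgzdbp"
# Hypothetical, incomplete syllable inventory (assumption for illustration).
SYLLABLES = (
    set(VOWELS)
    | {c + v for c in CONSONANTS for v in VOWELS}
    | {"n", "shi", "chi", "tsu", "fu", "ji"}
)


def convertible_to_kana(word):
    """True if the whole word can be split into known romaji syllables."""
    word = word.lower()
    reachable = [True] + [False] * len(word)
    for end in range(1, len(word) + 1):
        for size in (1, 2, 3):
            start = end - size
            if start >= 0 and reachable[start] and word[start:end] in SYLLABLES:
                reachable[end] = True
                break
    return reachable[-1]


def is_non_japanese_entry(word, min_length=5):
    """Assumed rule: longer than min_length AND not convertible to Kana."""
    return len(word) > min_length and not convertible_to_kana(word)


candidates = ["keyboard", "tsunami", "cat"]
entries = [w for w in candidates if is_non_japanese_entry(w)]
print(entries)
```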
Compare with Conventional IME
• Conventional method
areha [Return] [Alphabet Mode] 8Byte [Japanese Mode] desu
([Alphabet Mode] and [Japanese Mode] are mode switches)
Typing: 17
• Modeless Japanese input method
areha8Bytedesu
Typing: 14
• The number of keystrokes decreases
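The two keystroke counts can be checked with a short sketch. It assumes that each bracketed key and each typed character counts as one keystroke and that Shift is not counted; the function name is hypothetical.

```python
import re


def count_keystrokes(sequence):
    """Count bracketed special keys and single characters as one key each."""
    tokens = re.findall(r"\[[^\]]+\]|\S", sequence)
    return len(tokens)


conventional = "areha [Return][Alphabet Mode] 8Byte [Japanese Mode] desu"
modeless = "areha8Bytedesu"
conventional_keys = count_keystrokes(conventional)
modeless_keys = count_keystrokes(modeless)
print(conventional_keys, modeless_keys)
```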
Datasets
used in Evaluation Experiment
• Model generation & method evaluation
– Balanced Corpus of Contemporary Written
Japanese (BCCWJ)
• books, magazines, blogs, government documents, and
others
• Non-Japanese dictionary source
– COCA
– Japanese Wikipedia article title list
Criteria
Results of Evaluation
• The proposed method outperforms the baseline

                   Baseline            Proposed method
                   (Char. surface      (Char. {surface, type}
                   n-gram)             n-gram & Dictionary)
Kana Precision     .998                .999
ASCII Precision    .989                .996
Kana Recall        .993                .998
ASCII Recall       .780                .884
Kana F1-measure    .953                .968
ASCII F1-measure   .858                .924
User test
• The proposed method outperforms the conventional method

Person No.        1      2      3      4      5      6      7      8      9      …
Conventional IME  18.18  17.89  15.4   12.71  11.09  10.18  11.42  12.38  10.48  …
Proposed method   13.34  14.68  9.88   12.23  6.03   7.00   11.03  11.37  10.30  …

• Participants: 4 females and 7 males
• Task: input example sentences (chat, mail, technical text)
Summary
• Switching input modes is cumbersome
• Hybrid Modeless Japanese Input Method
– Automatically switches input mode between
Japanese and ASCII
– Uses an n-gram feature model for discrimination
• character-{surface, type}
– Outperforms conventional methods
