Kotonoha: An Example Sentence Based Spaced Repetition System
1. Kotonoha: An Example Sentence Based Spaced Repetition System
Arseny Tolmachev, Sadao Kurohashi
Kyoto University
Graduate School of Informatics
D1
2017-03-15
2. Background: Learning words with flashcards
• Lexical knowledge is crucial for language learning
• Mostly self-learning
• E.g. the Japanese Language Proficiency Test level N1 requires knowing about 10,000 words
Flashcards: a method of organizing information that can be formulated in question-answer form, for learning
Figure: a flashcard with a Question side and an Answer side
4. Spaced Repetition: Software
• One of the first implementations:
https://www.supermemo.com
• The most popular SRS: Anki
http://ankisrs.net/
• And many more
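To make the idea of spaced repetition concrete, here is a minimal scheduling sketch that loosely follows the published SM-2 algorithm from the SuperMemo family. The slides do not say which scheduling algorithm Kotonoha uses, so this is an illustration only; the names `Card` and `review` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Card:
    """Scheduling state of one flashcard (hypothetical structure)."""
    repetitions: int = 0   # successful reviews in a row
    interval: int = 0      # days until the next review
    ease: float = 2.5      # SM-2 easiness factor, never below 1.3


def review(card: Card, quality: int) -> Card:
    """Return the new state after a review graded 0 (forgot) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality < 3:
        # Failed recall: restart the sequence, keep the easiness factor.
        return Card(repetitions=0, interval=1, ease=card.ease)
    repetitions = card.repetitions + 1
    if repetitions == 1:
        interval = 1
    elif repetitions == 2:
        interval = 6
    else:
        interval = round(card.interval * card.ease)
    # SM-2 easiness update: harder answers lower the factor.
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    return Card(repetitions, interval, max(1.3, card.ease + delta))
```

Each successful review pushes the next one further away, which is what lets a learner keep thousands of words in rotation with a bounded daily workload.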
5. Japanese (Word) Learning Tools
• Most of them are for elementary/beginner learners
• Hiragana/katakana
• Fixed word lists/lessons
• Tools for advanced learners are scarce
• Anki
• Several more
6. Motivation: Context
• We use words in context: with other words
• Contextual word usage differs from language to language
Real-life example:
• バスは乗客を拾った ("The bus picked up the passengers")
Non-canonical usage in Japanese, OK in Russian
• バスは乗客を乗せた ("The bus took the passengers on board")
Canonical usage in Japanese
7. Flashcard problems
• Creating flashcards from scratch is time-consuming
• Need to fill in all the information
• Possibly find example sentences somewhere
• Premade decks do not work as well as manually created ones
• Matter of UI and system implementation
• Lack of context
• Especially in questions
• Card content is usually fixed (e.g. only one context)
8. Kotonoha SRS
• Web (responsive)
• Plus mobile apps (planned)
• Flashcards
• Spaced Repetition
• Intermediate+
Features
• Example sentences
• In question cards
• Batch operation
• Japanese-oriented
https://kotonoha.ws
9. Kotonoha: Usage Pattern
• Find new words
• Reading books, classes, assignments
• Add words into the system
• Kotonoha makes it easy to add new words
• Repeat flashcards
• E.g. 100 cards every day
• Learn word usage too:
Kotonoha shows a new example at each repetition
• Build a rich vocabulary (in the long term)
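The "new example each repetition" behaviour can be sketched as a simple rotation over the sentences attached to a card. This is a hedged illustration, not Kotonoha's actual logic: the `WordCard` class and its fields are hypothetical, and a real system might pick the next sentence by other criteria.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordCard:
    """A flashcard with several example sentences (hypothetical structure)."""
    word: str
    examples: List[str]
    shown: int = 0  # number of repetitions shown so far

    def next_question(self) -> str:
        """Return the example for this repetition and advance the counter."""
        if not self.examples:
            return self.word  # no context available: fall back to the bare word
        sentence = self.examples[self.shown % len(self.examples)]
        self.shown += 1
        return sentence


card = WordCard("走る", ["走っている子供を見た", "遊びに走る若者"])
first = card.next_question()   # first example
second = card.next_question()  # a different example on the next repetition
```

Cycling through the examples means the learner meets the word in a different context on consecutive reviews instead of memorizing one fixed sentence.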
10. Kotonoha: Adding words
Batch operation:
• Words are added in lists
• Kotonoha fills in readings and glosses from dictionaries (JMDict, Warodai)
• Kotonoha assigns example sentences
• Easy to learn the words you want
• Word was already added
• Word was not added before
• Report that you forgot the word
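The batch workflow above can be sketched as follows. This is a minimal sketch under stated assumptions: `TOY_DICTIONARY` is a tiny hypothetical stand-in for a real dictionary such as JMDict, and `add_words` with its status strings is invented for illustration rather than taken from Kotonoha.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for a dictionary: word -> (reading, gloss)
TOY_DICTIONARY: Dict[str, Tuple[str, str]] = {
    "走る": ("はしる", "to run"),
    "拾う": ("ひろう", "to pick up"),
}


def add_words(words: List[str], known: Set[str]) -> List[dict]:
    """Classify each word of a batch and fill in its reading and gloss."""
    report = []
    for word in words:
        entry = TOY_DICTIONARY.get(word)
        if word in known:
            status = "already added"
        elif entry is None:
            status = "not in dictionary"
        else:
            status = "new"
            known.add(word)  # later duplicates in the batch count as added
        reading, gloss = entry if entry else (None, None)
        report.append(
            {"word": word, "status": status, "reading": reading, "gloss": gloss}
        )
    return report


known_words = {"走る"}
report = add_words(["走る", "拾う", "ほげ"], known_words)
```

The learner only supplies the list of words; the report distinguishes words that were already added from new ones, matching the two states listed above.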
11. Kotonoha: Adding words (2)
• Check what gets into flashcards
• Recommendations: words that use the same characters
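One simple way to recommend words that use the same characters is to rank vocabulary entries by the number of kanji they share with the word being added. The slides do not describe how Kotonoha computes its recommendations, so the functions `kanji_of` and `recommend` below are hypothetical, and the kanji test only covers the basic CJK Unified Ideographs block.

```python
from typing import List, Set


def kanji_of(word: str) -> Set[str]:
    """Characters of `word` in the CJK Unified Ideographs block (U+4E00..U+9FFF)."""
    return {ch for ch in word if "\u4e00" <= ch <= "\u9fff"}


def recommend(word: str, vocabulary: List[str], limit: int = 5) -> List[str]:
    """Rank vocabulary words by how many kanji they share with `word`."""
    target = kanji_of(word)
    scored = [
        (len(target & kanji_of(other)), other)
        for other in vocabulary
        if other != word
    ]
    # Keep only words with at least one shared kanji; sorted() is stable,
    # so ties stay in vocabulary order.
    ranked = sorted((item for item in scored if item[0] > 0), key=lambda item: -item[0])
    return [other for _, other in ranked[:limit]]


suggestions = recommend("乗客", ["乗せる", "拾う", "乗客数", "旅客"])
```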
14. Example sentences
• Automatically extracted from web corpus
• Tatoeba corpus is small and not very diverse
• Consider a set of sentences for a target word
• Three aspects: Value, Diversity, Coverage
• Intrinsic Value (for a single sentence)
• Not garbage, such as a fragment of a longer sentence
• Representative usage of the target word
• Understandable by a learner
• Grammatical
• Diversity (for a sentence set)
• Different usages of target, distinct words
• Coverage: acquire usages of rare words and rare senses
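The trade-off between value and diversity can be illustrated with a greedy selection that repeatedly takes the candidate with the best value minus a penalty for overlapping with already chosen sentences. This is a generic sketch, not the authors' selection method: the value scores, the Jaccard overlap measure, and the `penalty` parameter are all assumptions, and sentences are assumed to be tokenized already.

```python
from typing import List, Sequence, Tuple

Candidate = Tuple[float, Sequence[str]]  # (intrinsic value score, tokens)


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    """Word overlap between two token lists, from 0.0 to 1.0."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def select(candidates: List[Candidate], k: int, penalty: float = 1.0) -> List[Candidate]:
    """Greedily pick up to k candidates, trading value against redundancy."""
    chosen: List[Candidate] = []
    remaining = list(candidates)
    while remaining and len(chosen) < k:
        def gain(c: Candidate) -> float:
            overlap = max((jaccard(c[1], s[1]) for s in chosen), default=0.0)
            return c[0] - penalty * overlap
        best = max(remaining, key=gain)
        chosen.append(best)
        remaining.remove(best)
    return chosen


a = (0.9, ["子供", "が", "走る"])
b = (0.85, ["子供", "が", "走る", "よ"])
c = (0.6, ["悪事", "千里", "を", "走る"])
picked = select([a, b, c], k=2)
```

In this toy run the near-duplicate `b` loses to the lower-valued but different `c`, which is the behaviour the diversity criterion asks for.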
15. Example sentence extraction overview
Pipeline: Raw Corpus → Preprocessing (analyze and index) → Search → Selection
• Query: 走る
• Search: solving the coverage problem
• Search engine:
• Distributed
• Handles huge corpora
• Uses lexical dependency information
• Prefers sentences with rich syntactic structure near the target
• Example candidates (~10k sentences)
• Selection: dealing with value and diversity
• High-quality sentences (~10-15)
Example sentences for 走る:
• 私は走るのすき
• 走っている子供を見た
• …
• 遊びに走る若者
• 酒に走りたい気持ち
• …
• 悪事千里を走る
Details are out of scope of this presentation
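As a rough picture of the "analyze and index, then search" stages, the sketch below builds a toy inverted index from lemmas to sentence ids and looks up candidates for a query. It is only an assumption-laden illustration: the real engine is distributed and uses lexical dependency information, neither of which is modelled here, and the corpus is assumed to be lemmatized beforehand by a morphological analyzer.

```python
from collections import defaultdict
from typing import Dict, List


def build_index(corpus: List[List[str]]) -> Dict[str, List[int]]:
    """Map each lemma to the ids of the sentences that contain it."""
    index: Dict[str, List[int]] = defaultdict(list)
    for sentence_id, lemmas in enumerate(corpus):
        for lemma in set(lemmas):  # count each sentence once per lemma
            index[lemma].append(sentence_id)
    return index


def search(index: Dict[str, List[int]], query: str, limit: int = 10_000) -> List[int]:
    """Return up to `limit` candidate sentence ids for the query lemma."""
    return index.get(query, [])[:limit]


# Toy corpus of pre-lemmatized sentences (hypothetical analyzer output)
corpus = [
    ["私", "は", "走る", "の", "好き"],
    ["本", "を", "読む"],
    ["悪事", "千里", "を", "走る"],
]
index = build_index(corpus)
candidates = search(index, "走る")
```

The candidate ids would then be passed to a selection step that narrows roughly ten thousand candidates down to ten or fifteen sentences.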
20. Example sentence evaluation
This is an idea; there are no results yet
• Show different sentences to learners of a similar level
• Assumption 1: good example sentences help learners remember words
• Assumption 2: we can use confidence to judge the educational quality of a sentence
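Under Assumption 2, one possible analysis is to average the confidence recorded for each sentence and compare sentences by that average. The sketch below is hypothetical: the slides report no results and do not define a confidence scale, so the integer scores and the `rank_sentences` function are assumptions, and matching learners by level is left out.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def rank_sentences(reviews: Iterable[Tuple[str, int]]) -> List[Tuple[str, float]]:
    """Average confidence per sentence id, highest first."""
    by_sentence: Dict[str, List[int]] = defaultdict(list)
    for sentence_id, confidence in reviews:
        by_sentence[sentence_id].append(confidence)
    averages = [(sid, mean(values)) for sid, values in by_sentence.items()]
    return sorted(averages, key=lambda item: item[1], reverse=True)


# (sentence id, confidence) pairs on a hypothetical 1-5 scale
reviews = [("s1", 4), ("s2", 2), ("s1", 5), ("s2", 3)]
ranking = rank_sentences(reviews)
```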
21. Collecting NLP training data
Kotonoha can be a useful source of NLP training data for:
• Reading estimation
• Word sense disambiguation
Learners have an interest in getting this information right
Presently only reporting is implemented
22. Implementation problems
• Three segmentation standards in one package
• Flashcards are mostly JMDict-based
• And its entries are rather inconsistent
• On the other hand, it is not a segmentation dictionary
• Example sentence extraction uses JUMAN/KNP pipeline
• Reading estimation is done using KyTea/UniDic
• And resources for reading-annotated Japanese are extremely scarce :(
• Because of this
• Some example sentence coverage problems
• Reading estimation errors
23. Kotonoha: Present
• Available: https://kotonoha.ws
• Open source (core SRS)
• https://github.com/kotonoha/server
• (Very low volume) open beta test
• Will try to increase the user base in the following months
• Potential users (learners of Japanese) are very welcome!
• Side note:
• https://github.com/kotonoha/akane
• Scala library for JUMAN/KNP/KyTea and more