Building a phonics engine for automated text guidance
1. Building a Phonics Engine for
Automated Text Guidance
Dominik Lukeš
Dyslexia Action
Chris Litsas
NTUA
www.ilearnrw.eu
2. Outline
• Struggling readers' needs
• Linguistic background
• Phonics engine need
• Phonics engine specification
• Phonics engine implementation
• Phonics engine applications
• Next steps
3. Needs of dyslexic people
• Identifying the syllables in a word
• Recognising the structure of words (stem,
prefix, suffix)
• Highlighting typical or repeated patterns of
English orthography
• Identifying phoneme/grapheme
correspondence
• Learning the pronunciation of a word
• Learning the meaning of a word
4. Linguistic background
Dearest creature in creation
Studying English pronunciation,
I will teach you in my verse
Sounds like corpse, corps, horse and worse.
Though the difference seems little,
We say actual, but victual,
Seat, sweat, chaste, caste, Leigh, eight, height,
Put, nut, granite, and unite.
Gerard Nolst Trenité - The Chaos (1920)
5. Linguistic background
• tough, though, through, bough, thought,
cough, hiccough
• hosp.i.tal vs. hos.pit.al
• kitt.en vs. kit.ten
• walked, stopped, faked, tried
• exgirlfriend vs. exigent vs. exit
• English vs. Greek
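One reading of the walked / stopped / faked / tried example is that the suffix -ed attaches in four different ways: plain addition, consonant doubling, dropping a final e, and changing y to i. The sketch below shows how such cases could be told apart. Only SUFFIX_ADD appears in the phonic dictionary entry; the other type names and the classify_suffix function are hypothetical.

```python
def classify_suffix(stem, form, suffix):
    """Guess how `suffix` was attached to `stem` to produce `form`.

    Only SUFFIX_ADD is a label taken from the slides; the other
    labels are illustrative assumptions.
    """
    if form == stem + suffix:
        return "SUFFIX_ADD"            # walk + ed -> walked
    if stem.endswith("e") and form == stem[:-1] + suffix:
        return "SUFFIX_DROP"           # fake + ed -> faked (final e dropped)
    if stem and form == stem + stem[-1] + suffix:
        return "SUFFIX_DOUBLE"         # stop + ed -> stopped (consonant doubled)
    if stem.endswith("y") and form == stem[:-1] + "i" + suffix:
        return "SUFFIX_CHANGE"         # try + ed -> tried (y changed to i)
    return "SUFFIX_UNKNOWN"


pairs = [("walk", "walked"), ("stop", "stopped"),
         ("fake", "faked"), ("try", "tried")]
for stem, form in pairs:
    print(form, classify_suffix(stem, form, "ed"))
```

Irregular forms (for example went) fall through to SUFFIX_UNKNOWN and would need a lookup table rather than a rule.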
6. Phonics engine need
• Finding all examples of ‘a’ spelled to rhyme
with ‘hay’ in a text or a corpus.
• Sorting words by their phoneme/grapheme
ratio.
• Identifying appropriate syllable boundaries in
the written form of a multi-syllable word
based on knowledge of the syllable
boundaries in pronunciation
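The first two needs above can be sketched as queries over a list of grapheme/phoneme pairs per word. The tiny LEXICON, the function names, and the definition of the ratio as phonemes per letter are all assumptions made for illustration, not the project's actual data or API.

```python
# Hypothetical mini-lexicon: word -> list of (grapheme, phoneme) pairs.
LEXICON = {
    "hay":   [("h", "h"), ("ay", "eɪ")],
    "baby":  [("b", "b"), ("a", "eɪ"), ("b", "b"), ("y", "i")],
    "paper": [("p", "p"), ("a", "eɪ"), ("p", "p"), ("er", "ə")],
    "cat":   [("c", "k"), ("a", "æ"), ("t", "t")],
    "eight": [("eigh", "eɪ"), ("t", "t")],
}


def find_words(lexicon, grapheme, phoneme):
    """Return words in which `grapheme` is pronounced as `phoneme`."""
    return [word for word, pairs in lexicon.items()
            if (grapheme, phoneme) in pairs]


def phoneme_letter_ratio(word, pairs):
    """Phonemes per letter (assumes one phoneme per pair)."""
    return len(pairs) / len(word)


def sort_by_ratio(lexicon):
    """Words ordered from fewest to most phonemes per letter."""
    return sorted(lexicon, key=lambda w: phoneme_letter_ratio(w, lexicon[w]))


print(find_words(LEXICON, "a", "eɪ"))   # 'a' pronounced as in 'hay'
print(sort_by_ratio(LEXICON))
```

A low ratio flags words such as eight, where several letters stand for a single sound.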
7. Phonics engine specification
• provide automated guidance to students and
teachers reading texts (using highlighting as
well as explicit information)
• generate more extensive word lists for
practice activities within the serious games
• provide information about word structure to
the game engine
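Guidance by highlighting could, for instance, mark every grapheme that spells a chosen phoneme. This is a minimal sketch that assumes the word is already available as a list of (grapheme, phoneme) pairs; the highlight function and the bracket markers are hypothetical.

```python
def highlight(mapping, phoneme, open_mark="[", close_mark="]"):
    """Rebuild the word, wrapping graphemes that spell `phoneme` in markers."""
    parts = []
    for grapheme, sound in mapping:
        if sound == phoneme:
            parts.append(open_mark + grapheme + close_mark)
        else:
            parts.append(grapheme)
    return "".join(parts)


# Pairs for "feelings", following the phonic dictionary example.
feelings = [("f", "f"), ("ee", "iː"), ("l", "l"),
            ("i", "ɪ"), ("ng", "ŋ"), ("s", "z")]
print(highlight(feelings, "iː"))  # f[ee]lings
```

In a reader application the markers would presumably be replaced by colour or styling rather than brackets.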
11. Phonic dictionary
Word form: feelings
Related stem: feeling
Pronunciation: ˈfiː.lɪŋz
Phoneme/Grapheme Mapping: f-f,ee-iː,l-l,i-ɪ,ng-ŋ,s-z
Orthographic syllabification: fee.lings
Number of letters: 8
Number of phonemes: 6
Number of syllables: 2
Frequency band: 4
Suffix type: SUFFIX_ADD
Suffix form: s
Prefix type: PREFIX_NONE
Prefix form: NULL
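An entry like this could be held in a small record type, with the letter, phoneme, and syllable counts derived from the other fields rather than stored. The class and field names below are assumptions for illustration; the values are the ones shown for "feelings".

```python
from dataclasses import dataclass
from typing import List, Tuple


def parse_mapping(text: str) -> List[Tuple[str, str]]:
    """Parse 'f-f,ee-iː,...' into (grapheme, phoneme) pairs."""
    pairs = []
    for item in text.split(","):
        grapheme, phoneme = item.split("-", 1)
        pairs.append((grapheme, phoneme))
    return pairs


@dataclass
class PhonicEntry:
    word: str
    stem: str
    pronunciation: str
    mapping: List[Tuple[str, str]]
    syllabification: str          # orthographic, dot-separated
    frequency_band: int
    suffix_type: str
    suffix_form: str

    @property
    def n_letters(self) -> int:
        return len(self.word)

    @property
    def n_phonemes(self) -> int:
        # Assumes each pair carries exactly one phoneme.
        return len(self.mapping)

    @property
    def n_syllables(self) -> int:
        return self.syllabification.count(".") + 1


entry = PhonicEntry(
    word="feelings", stem="feeling", pronunciation="ˈfiː.lɪŋz",
    mapping=parse_mapping("f-f,ee-iː,l-l,i-ɪ,ng-ŋ,s-z"),
    syllabification="fee.lings", frequency_band=4,
    suffix_type="SUFFIX_ADD", suffix_form="s",
)
print(entry.n_letters, entry.n_phonemes, entry.n_syllables)  # 8 6 2
```

Deriving the counts keeps them consistent with the mapping, and joining the graphemes back together gives a cheap sanity check on each entry.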
12. Building the phonic dictionary
• 5,000 most frequent words based on COCA
• Generated derived forms by reversing
Hunspell
• Used an online tool to generate pronunciations
• Created rules for matching pronunciation with
spelling patterns
• Created rules for displaying
• Marked suffixes and prefixes and their types
• Adjusted frequencies
• Manual fine-tuning (lots of regex)
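Matching pronunciation with spelling patterns can be illustrated with a small backtracking aligner. This is a sketch, not the project's actual rules: the RULES table and the align function are hypothetical and cover only the graphemes needed for one example.

```python
# Hypothetical rule table: grapheme -> phonemes it may spell.
RULES = {
    "f": {"f"}, "ee": {"iː"}, "e": {"e", "iː"}, "l": {"l"},
    "i": {"ɪ", "aɪ"}, "ng": {"ŋ"}, "n": {"n"}, "g": {"g"},
    "s": {"s", "z"},
}
MAX_LEN = max(len(grapheme) for grapheme in RULES)


def align(word, phonemes):
    """Pair each phoneme with a grapheme; return None if no alignment exists.

    Longer graphemes are tried first, backtracking on failure.
    """
    if not word and not phonemes:
        return []
    if not word or not phonemes:
        return None
    for size in range(min(MAX_LEN, len(word)), 0, -1):
        grapheme = word[:size]
        if phonemes[0] in RULES.get(grapheme, ()):
            rest = align(word[size:], phonemes[1:])
            if rest is not None:
                return [(grapheme, phonemes[0])] + rest
    return None


pairs = align("feelings", ["f", "iː", "l", "ɪ", "ŋ", "z"])
print(",".join(g + "-" + p for g, p in pairs))
# f-f,ee-iː,l-l,i-ɪ,ng-ŋ,s-z
```

Words that return None point at gaps in the rule table, which is where manual fine-tuning would come in.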
13. Phonics engine applications
• Phonics aware reader
• Game support – generating word lists
• Game support – provide word structure
• Game support – link word structures to profile
• Text classification tool
• Online text annotation tool
21. Next steps
• Bigger dictionary with more information on
words
• Fine-tuning of lookup routines
• More sophisticated highlighting routines
• More sophisticated NLP
– PoS
– Sentence structure
– Semantics
• WordNet, FrameNet
• Named Entities
• Collocations