Estimating Dyslexia in the Web

Ricardo Baeza-Yates                      Luz Rello

Yahoo! Research &                        Web Research and
Web Research Group,                      NLP Groups
Pompeu Fabra University,                 Pompeu Fabra University,
Barcelona, Spain                         Barcelona, Spain




                   W4A 2011, Hyderabad
Outline




                       — What

                       — Why
                                                    to distinguish dyslexic errors
                       — How                        to build a sample
                                                    to measure dyslexia

                       — Results



Ricardo Baeza-Yates and Luz Rello   W4A 2011, Hyderabad          Estimating Dyslexia in the Web
What

                                    Dyslexia is a neurologically-based disorder which
         Dyslexia                   interferes with the acquisition and processing of
                                    language. It manifests itself with difficulties in
                                    receptive and expressive language, including
                                    phonological processing, in reading, writing, spelling
          (The Boder’s Test         and handwriting and sometimes in arithmetic.
          of Reading-Spelling
          Patterns)                                            (Committee of Members Orton
                                                               Dyslexia Society. Definition of
                                                               Dyslexia, 1994.)

                                    The largest of the three subtypes of dyslexia that
         Dysphonetic                the author presents. Dysphonetic dyslexia is
         dyslexia                   viewed as a disability in associating symbols with
                                    sounds. The misspellings typical of this disorder
                                    are due to phonetic inaccuracy.         (Boder &
                                                                          Jarrico, 1982)

Why


                                    There is a universal neuro-cognitive basis for
                                    dyslexia.
                                                                   (Paulesu et al. 2001)


                                    Its manifestations are culture-specific due to
        All languages               different orthographies.
                                                                            (Alegria, 2006)


                                    English has a deep orthography: the mapping
                                    between letters, speech sounds, and whole-word
                                    sounds is often highly ambiguous. Manifestations
                                    of dyslexia are therefore more widespread in
                                    English than in languages with a transparent
                                    (shallow) orthography.
                                                                      (Paulesu et al. 2001)

Why




                               Researchers estimate that 10-17% of the population
                               in the U.S.A. has dyslexia and only 30% of dyslexics
                               have trouble with reversing letters and numbers. On
                               the other hand, the level of dyslexia in other regions
                               such as Europe or China is lower.
         Frequent
                                                                       (H. Meng et al., 2005)




                               There are around 38 million dyslexics in Europe.

                                                                      (Ruiz del Árbol, 2008)




Why


                             Detecting the presence of dyslexic texts in the Web helps us
                             to gauge the real impact of dyslexia in the Web as well as
                             to assess the value of dyslexic-accessible practices.


         Useful              The cited studies agree that applying dyslexic-accessible
                             practices also improves readability for non-dyslexic
                             users and for users with other disabilities such as
                             low vision.                    (McCarthy & Swierenga, 2010)
                                                                           (Evett & Brown, 2005)

                             Spelling error rates have proven to be a useful index for
                             website content quality.
                                                                      (Gelman & Barletta, 2008)




Why



                               Estimating dyslexia in a group of web pages depending
                               on their domain.
                                                                  (Ringlstetter et al. 2006)



             Novel




                               This is the first attempt to estimate the amount of
                               texts containing English dyslexic errors in the Web.




How

                               Two examples of dyslexic texts


      There seams to be some confusetion. Althrow
      he rembers the situartion, he is not clear on
      detailes. With regard to deleteing parts,
      could you advice me of the excat nature of the
      promblem and I will investgate it imeaditly.



                                                        I halve a spelling chequer
                                                        It cam with my pea see
                                                        Eye now I’ve gut the spilling rite
                                                        Its plane fore al too sea ...
                                                        Its latter prefect awl the weigh
                                                        My chequer tolled mi sew.
     (Pedler, 2007)

How

            How many kinds of errors can be produced by a dyslexic?


                                    Simple errors             53%
                                    Multi errors              39%
                                    Word boundary errors       8%
                                                             ——
                                                             100%
              dyslexic
              errors                Real-word errors          17%
                                    Non-word errors           83%
                                                             ——
                                                             100%

                                    First letter errors       5%
                                                                    (Pedler, 2007)




How

                         How many kinds of errors in the Web?

         1. Dyslexic errors: errors of the kinds commonly made by dyslexics, e.g.
         unfinished words or letters, omitted words, and inconsistent spaces between
         words and letters (Vellutino, 1979). Example: *reiecve instead of receive.

         2. Regular spelling errors produced by non-impaired native English speakers,
         such as transposition errors, e.g. *recieve.

         3. Regular typos caused by the adjacency of letters on the keyboard, e.g. *teceive.

         4. OCR errors, due to letters of similar shape, such as *ieceive.

         5. Errors made by non-native speakers who use English as a foreign
         language. For example, *receibe is a typical error made by Spanish learners of
         English, since in Spanish the graphemes ‘b’ and ‘v’ are both pronounced as /b/,
         and the phoneme /v/ does not exist in the standard Spanish phonemic system.

How

                                       Selection criteria

    To avoid the overlap of dyslexic errors and other errors:

                 — We consider only words written by dyslexics containing multi-
                 errors, that is, the dyslexic word differs from the intended correct
                 word by more than one letter. For example, the dyslexic word
                 *konwlegde from knowledge.


    To avoid the overlap of dyslexic errors and real words:

                 — Errors which coincide with other existing words in English are
                 omitted, e.g. *trust when the intended word is truth.

                 — Errors which result in a proper name are also filtered out; for
                 instance, *wirries, a misspelling of worries, is also a proper name.
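The selection criteria above can be sketched in code. This is only an illustration, not the authors' implementation: it assumes that "differs by more than one letter" can be approximated by a Levenshtein distance of at least 2, and the names `edit_distance`, `keep_candidate`, `lexicon` and `proper_names` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def keep_candidate(error: str, intended: str,
                   lexicon: set, proper_names: set) -> bool:
    """Apply the three filters from the slide to one candidate error."""
    word = error.lower()
    # Assumption: a multi-error is at least two edits from the intended word.
    if edit_distance(word, intended.lower()) < 2:
        return False
    # Drop errors that are real English words or proper names.
    if word in lexicon or word in proper_names:
        return False
    return True


# Hypothetical, tiny word lists used only for this example.
lexicon = {"trust", "truth", "knowledge", "worries"}
proper_names = {"wirries"}

print(keep_candidate("konwlegde", "knowledge", lexicon, proper_names))  # True
print(keep_candidate("trust", "truth", lexicon, proper_names))          # False
print(keep_candidate("wirries", "worries", lexicon, proper_names))      # False
```

Plain Levenshtein distance counts a swap of two adjacent letters as two edits, so a word such as *worires is still treated as a multi-error under this assumption.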


How

                                     Selection criteria

     — All the dyslexic spelling errors are extracted from samples of text written by adults
     with diagnosed dyslexia (taken from a corpus compiled for this purpose) and from the
     literature (Pedler, 2007).

     — Among the dyslexic errors, we take into account those which include the letters
     that cause the most confusion among dyslexic individuals, such as ‘b’, ‘d’, ‘p’, ‘m’, ‘n’,
     ‘u’ and ‘w’, together with other similar-looking letters. For instance, reversals of
     similar letters, such as ‘b’ and ‘d’, are especially frequent (Deloche et al. 1982),
     e.g. *impossidle when the intended word is impossible.


     — Errors due to homophone confusion, that is, confusion between words with a similar
     pronunciation such as witch and which (Pedler, 2007), are not selected, even though
     15% of the dyslexic errors in a corpus of dyslexic texts involved homophone confusion.


How

                   Sample D, an example for the word comparison


      1. Dyslexic error:            *comaprsion.

      2. Spelling errors:           *comparision, *conparison and *coparison.

      3. Typos:                     *vomparison, *xomparison, *cimparison, *cpmparison,
                                    *conparison, *co,parison, *comoarison, *com[arison,
                                    *comprison, *compsrison, *compaeison, *compatison,
                                    *comparuson, *comparoson, *compariaon,*comparidon,
                                    *comparisin, *comparispn, *comparisob and *comparisom.

      4. OCR errors:                *compaiison and *comparisom.

      5. Non-native speaker      *comparition and *comparizon.
      errors:
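The substitution typos in class 3 follow a regular pattern: each letter is replaced by the key immediately to its left or right. The sketch below generates such variants; it assumes a US QWERTY layout, and the names `QWERTY_ROWS`, `horizontal_neighbours` and `adjacency_typos` are hypothetical. It is not taken from the authors' tooling.

```python
# Assumed layout: the three letter rows of a US QWERTY keyboard,
# including the punctuation keys that border the letters.
QWERTY_ROWS = ["qwertyuiop[", "asdfghjkl;", "zxcvbnm,"]


def horizontal_neighbours(ch: str) -> list:
    """Return the keys directly left and right of ch on its row."""
    for row in QWERTY_ROWS:
        idx = row.find(ch)
        if idx != -1:
            return [row[j] for j in (idx - 1, idx + 1) if 0 <= j < len(row)]
    return []  # character is not on the modelled rows


def adjacency_typos(word: str) -> list:
    """Replace each letter in turn by one of its horizontal neighbours."""
    typos = []
    for i, ch in enumerate(word):
        for neighbour in horizontal_neighbours(ch):
            typo = word[:i] + neighbour + word[i + 1:]
            if typo not in typos:
                typos.append(typo)
    return typos


for typo in adjacency_typos("comparison"):
    print("*" + typo)
```

For comparison this yields 19 variants, each of which appears in the typo list above. The remaining listed form, *comprison, is not a neighbour substitution and is not produced by this sketch.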

How


                                    Sample D, dyslexic errors


                          comparison                          *comaprsion
                          understanding                       *understangind
                          knowledge                           *knwolegde
                          impossible                          *inpossbile
                          tomorrow                            *torromow
                          worries                             *worires
                          explain                             *exaplin
                          interesting                         *intersenting
                          situation                           *situartion
                          confusion                           *confusetion


How

                              Estimating Dyslexia in the Web


           — Let us define:

                   f : fraction of Web pages with lexical errors.
                   d : fraction of dyslexic errors among all lexical errors.

           — Then, the fraction of Web pages with dyslexia is f × d.



           — We find a lower bound for f and d, to obtain a lower bound for the
           fraction of dyslexic pages in the Web.




How

                              Estimating Dyslexia in the Web




          — We use the main search engines (Bing, Google and Yahoo!)
          to estimate the document frequency of a word.

          — Each of the words in our list is searched only in English web
          pages to avoid cases of wrong words that may have a meaning
          in other languages.




How

                              Estimating Dyslexia in the Web



       — We bound the relative fraction of documents with lexical errors, f, by
       using a sample of frequent words that appear in most documents,
       usually called stopwords in information retrieval, and searching for
       their misspellings (*becuase, *trhough, etc.).

       — We use the largest relative fraction of misspellings over all these words to
       estimate f, as we cannot assume that all of them appear in different pages.

       — To bound d we do the same frequency search with a sample of
       non-frequent words (Sample D) where we can distinguish the different types
       of errors without ambiguity.
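Taking the maximum rather than the sum keeps the estimate a lower bound: the pages containing any one misspelling are a subset of the pages with lexical errors, whereas summing could count the same page several times. A minimal sketch follows, assuming that "relative fraction" means hits for the misspelling divided by hits for the correct word. The function names and the counts are hypothetical placeholders, not measured data.

```python
def relative_fraction(df_misspelled: int, df_correct: int) -> float:
    """Assumed definition: misspelling hits relative to correct-word hits."""
    return df_misspelled / df_correct


def lower_bound_f(counts: dict) -> float:
    """Largest relative fraction over all sampled stopwords.

    counts maps a stopword to (hits for its misspelling, hits for the word).
    """
    return max(relative_fraction(m, c) for m, c in counts.values())


# Placeholder document frequencies, for illustration only.
placeholder_counts = {
    "because": (3, 1000),   # e.g. hits for *becuase vs. because
    "through": (1, 1000),   # e.g. hits for *trhough vs. through
}

print(f"f >= {lower_bound_f(placeholder_counts):.2%}")
```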



Results

                              Estimating Dyslexia in the Web




                       [Table: range of percentages and average for the
                                different error classes.]

             We use the real document frequencies of the terms from one of
             the search engines to validate the results obtained, finding very
             similar results.



Results

                              Estimating Dyslexia in the Web


             — From Sample D, the percentage of dyslexic errors among all
             lexical errors is very low, with an average of 0.67%.

             — From Pedler (2007), only 39% of dyslexic errors are multi-errors.

             — Since Sample D contains only multi-errors, this implies that the
             measured value should be corrected to d/0.39; we round the factor
             1/0.39 ≈ 2.56 up to 3.

             — f is at least 0.27%, obtained from the misspelling *becuase.

             — Then, we can estimate d as 3 × 0.67% = 2.01%.

             — The lower bound for dyslexia in the Web is f × d ≈ 0.005%.
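The arithmetic behind these figures can be checked in a few lines. The sketch below uses only the numbers stated on the slide; the function and variable names are hypothetical.

```python
def dyslexia_lower_bound(f: float, d_multi: float, correction: float):
    """Return (d, f * d), where d is the corrected share of dyslexic errors."""
    d = d_multi * correction
    return d, f * d


# Values from the slide: f = 0.27%, multi-error share measured as 0.67%,
# and a correction factor of 3 (the exact factor would be 1 / 0.39).
d, bound = dyslexia_lower_bound(f=0.0027, d_multi=0.0067, correction=3)

print(f"exact correction factor: {1 / 0.39:.2f}")  # 2.56
print(f"d = {d:.2%}")                              # 2.01%
print(f"f x d = {bound:.4%}")                      # 0.0054%
```

The product 0.0054% is reported on the slide as 0.005%.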


Conclusions




         • The amount of dyslexic text in the Web is not as large as it could
         be. This suggests that the widespread use of spell checkers
         ameliorates dyslexia in the Web.



         • Particular words can be used to detect dyslexic texts, and hence
         dyslexic users. This can be used to improve Web accessibility as
         well as future spell checkers or other tools targeted to dyslexic users.




Conclusions


        • Since this is the first attempt to estimate text written by dyslexic
        individuals in the Web, a comparison with previous work is not possible.



        • Previous research on dyslexia reveals that error frequency is related
        to word length (Pedler, 2007). Short words such as there, where, form,
        etc. are misspelled much more frequently in dyslexic texts than long words
        like the ones used in our experiments. Hence, we can obtain a better estimate
        by using a larger sample of stopwords as well as long dyslexic words.



        • As a byproduct we have found that other types of errors are much more
        frequent in the Web and this can be used to assess the quality of Web
        text.


On-going Work


        New methodology.
                  Sample enlarged to 50 words.
                  Real data extracted from a leading search engine.
                  Up-down/Left-right typos.
                  New lower bound: 0.8 % (16 times better).




                        [Table: range of percentages and average for the
                                 different error classes.]


Future Work




             1 — Identification of dyslexic errors. Dyslexia diagnosis.

             2 — NLP techniques for making text more accessible for
             dyslexic users.

             3 — Web quality estimation (Gelman & Barletta, 2008),
             across countries, domains and social media.








                             Zank u beri mach




Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Ricardo Baeza-Yates, Luz Rello - Estimating Dyslexia in the Web - W4A - 2011

  • 1. Estimating Dyslexia in the Web. Ricardo Baeza-Yates (Yahoo! Research & Web Research Group, Pompeu Fabra University, Barcelona, Spain) and Luz Rello (Web Research and NLP Groups, Pompeu Fabra University, Barcelona, Spain). W4A 2011, Hyderabad
  • 2. Outline: What; Why; How (to distinguish dyslexic errors, to build a sample, to measure dyslexia); Results. Ricardo Baeza-Yates and Luz Rello W4A 2011, Hyderabad Estimating Dyslexia in the Web
  • 3. What. Dyslexia: Dyslexia is a neurologically-based disorder which interferes with the acquisition and processing of language. It manifests itself with difficulties in receptive and expressive language, including phonological processing, in reading, writing, spelling and handwriting and sometimes in arithmetic. (Committee of Members Orton Dyslexia Society. Definition of Dyslexia, 1994.) Dysphonetic dyslexia (The Boder's Test of Reading-Spelling Patterns): The largest of the three subtypes of dyslexia that the author presents. Dysphonetic dyslexia is viewed as a disability in associating symbols with sounds. The misspellings typical of this disorder are due to phonetic inaccuracy. (Boder & Jarrico, 1982)
  • 4. Why. All languages: There is a universal neuro-cognitive basis for dyslexia. (Paulesu et al. 2001) Its manifestations are culture-specific due to different orthographies. (Alegria, 2006) English is a language with deep orthography: the mapping between letters, speech sounds, and whole-word sounds is often highly ambiguous, and therefore dyslexic examples are more widespread than in other languages with transparent or shallow orthography. (Paulesu et al. 2001)
  • 5. Why. Frequent: Researchers estimate that 10-17% of the population in the U.S.A. has dyslexia and only 30% of dyslexics have trouble with reversing letters and numbers. On the other hand, the level of dyslexia in other regions such as Europe or China is lower. (H. Meng et al., 2005) There are around 38 million dyslexics in Europe. (Ruiz del Árbol, 2008)
  • 6. Why. Useful: Detecting the presence of dyslexic texts in the Web helps us to know the real impact of dyslexia in the Web as well as to value dyslexic-accessible practices. There is a common agreement in these studies that the application of dyslexic-accessible practices also benefits readability for non-dyslexic users as well as for other users with disabilities such as low vision. (McCarthy & Swierenga, 2010) (Evett & Brown, 2005) Spelling error rates have proven to be a useful index of website content quality. (Gelman & Barletta, 2008)
  • 7. Why. Novel: Estimating dyslexia in a group of web pages depending on their domain. (Ringlstetter et al. 2006) This is the first attempt to estimate the amount of texts containing English dyslexic errors in the Web.
  • 8. How. Two examples of dyslexic texts. First example: "There seams to be some confusetion. Althrow he rembers the situartion, he is not clear on z detailes. With regard to deleteing parts, could you advice me of the excat nature of the promblem and I will investgate it imeaditly." Second example: "I halve a spelling chequer It cam with my pea see Eye now I’ve gut the spilling rite Its plane fore al too sea ... I ts latter prefect awl the weigh My chequer tolled mi sew." (Pedler, 2007)
  • 9. How. How many kinds of errors can be produced by a dyslexic? By number of errors per word: simple errors 53%, multi errors 39%, word boundary errors 8% (100% of dyslexic errors). By resulting word: real-word errors 17%, non-word errors 83% (100%). First letter errors: 5%. (Pedler, 2007)
  • 10. How. How many kinds of errors in the Web? 1. Dyslexic errors: errors of the kinds commonly made by dyslexics, e.g. unfinished words or letters, omitted words, inconsistent spaces between words and letters (Vellutino, 1979). For example, *reiecve instead of receive. 2. Regular spelling errors produced by non-impaired native English individuals, such as the transposition error, e.g. *recieve. 3. Regular typos caused by the adjacency of letters on the keyboard, e.g. *teceive. 4. OCR errors, due to letters of similar shape, such as *ieceive. 5. Errors made by non-native speakers who use English as a foreign language. For example, *receibe is a typical error made by Spanish learners of English, since the graphemes ‘b’ and ‘v’ are pronounced as /b/, and the phoneme /v/ does not exist in the standard Spanish phonemic system.
  • 11. How. Selection criteria. To avoid the overlap of dyslexic errors and other errors: we consider only words written by dyslexics containing multi-errors, that is, the dyslexic word differs from the intended correct word by more than one letter. For example, the dyslexic word *konwlegde from knowledge. To avoid the overlap of dyslexic errors and real words: errors which coincide with other existing words in English are omitted, e.g. *trust where the intended word is truth. Errors which result in a proper name are also filtered; for instance, the typo *wirries from worries is also a proper name.
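The selection criteria on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' procedure: "differs by more than one letter" is approximated by an edit distance above 1 in which an adjacent transposition counts as a single edit (optimal string alignment), and the tiny `LEXICON` set is a hypothetical stand-in for a real English dictionary and proper-name list.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


# Hypothetical stand-in for a dictionary of real words and proper names.
LEXICON = {"trust", "truth", "knowledge", "receive"}


def is_candidate(error: str, intended: str, lexicon=LEXICON) -> bool:
    """Keep only multi-errors that do not collide with an existing word."""
    return osa_distance(error, intended) > 1 and error not in lexicon


print(is_candidate("konwlegde", "knowledge"))  # multi-error, not a real word
print(is_candidate("trust", "truth"))          # collides with a real word
print(is_candidate("recieve", "receive"))      # single transposition only
```

Counting a transposition as one edit is a design choice made here so that a regular spelling error such as *recieve stays in the single-error class.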
  • 12. How. Selection criteria. All the dyslexic spelling errors are extracted from samples of text written by adults with diagnosed dyslexia (extracted from a corpus compiled for this purpose) and from the literature (Pedler, 2007). Among the dyslexic errors, we take into account the ones which include the letters that produce more confusion among dyslexic individuals, such as ‘b’, ‘d’, ‘p’, ‘m’, ‘n’, ‘u’ and ‘w’, together with other similar-looking letters. For instance, it is especially frequent to find reversals of similar letters, such as ‘b’ and ‘d’ (Deloche et al. 1982), e.g. *impossidle where the intended word is impossible. Errors due to homophone confusion, that is, words which have a similar pronunciation (Pedler, 2007), are not selected, even though 15% of the dyslexic errors presented homophone confusion in a corpus of dyslexic texts (witch and which).
  • 13. How. Sample D, an example for the word comparison. 1. Dyslexic error: *comaprsion. 2. Spelling errors: *comparision, *conparison and *coparison. 3. Typos: *vomparison, *xomparison, *cimparison, *cpmparison, *conparison, *co,parison, *comoarison, *com[arison, *comprison, *compsrison, *compaeison, *compatison, *comparuson, *comparoson, *compariaon, *comparidon, *comparisin, *comparispn, *comparisob and *comparisom. 4. OCR errors: *compaiison and *comparisom. 5. Non-native speakers' errors: *comparition and *comparizon.
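Most of the typos listed for comparison replace one letter with its left or right neighbour on the keyboard. A minimal sketch of how such candidates could be enumerated follows; the row strings assume a US QWERTY layout, and the function names are hypothetical rather than taken from the talk.

```python
# Assumed US QWERTY rows, including the punctuation keys that
# appear in the slide's examples ("[" and ",").
QWERTY_ROWS = ["qwertyuiop[", "asdfghjkl;", "zxcvbnm,"]


def left_right_neighbors(ch: str) -> list:
    """Return the keys immediately left and right of ch on its row."""
    for row in QWERTY_ROWS:
        pos = row.find(ch)
        if pos != -1:
            return [row[j] for j in (pos - 1, pos + 1) if 0 <= j < len(row)]
    return []


def adjacency_typos(word: str) -> list:
    """All single-substitution typos using left/right keyboard neighbours."""
    typos = []
    for i, ch in enumerate(word):
        for neighbor in left_right_neighbors(ch):
            typos.append(word[:i] + neighbor + word[i + 1:])
    return typos


typos = adjacency_typos("comparison")
print(len(typos))
print(typos[:4])
```

Only substitutions are generated, so a form that drops a letter is outside what this sketch covers.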
  • 14. How. Sample D, dyslexic errors (intended word, dyslexic error): comparison, *comaprsion; understanding, *understangind; knowledge, *knwolegde; impossible, *inpossbile; tomorrow, *torromow; worries, *worires; explain, *exaplin; interesting, *intersenting; situation, *situartion; confusion, *confusetion.
  • 15. How. Estimating Dyslexia in the Web. Let us define f, the fraction of Web pages with lexical errors, and d, the fraction of dyslexic errors among all lexical errors. Then the fraction of Web pages with dyslexia is f × d. We find a lower bound for f and for d, to obtain a lower bound for the fraction of dyslexic pages in the Web.
  • 16. How. Estimating Dyslexia in the Web. We use the main search engines (Bing, Google and Yahoo!) to estimate the document frequency of a word. Each of the words in our list is searched only in English web pages, to avoid cases of wrong words that may have a meaning in another language.
  • 17. How. Estimating Dyslexia in the Web. We bound the relative fraction of documents with lexical errors, f, by using a sample of frequent words that appear in most documents, usually called stopwords in information retrieval, and their misspellings (becuase, trhough, etc.). We use the largest relative fraction of misspellings over all these words to estimate f, as we cannot assume that all of them appear in different pages. To bound d we do the same frequency search with a sample of non-frequent words (Sample D), where we can distinguish the different types of errors without ambiguity.
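The bound on f described in this slide can be sketched as follows. The sketch assumes that the "relative fraction" of a misspelling is its document frequency divided by that of the correct word, which the slide does not spell out. The counts in `EXAMPLE_COUNTS` are invented purely for illustration and are not search engine data.

```python
def relative_fraction(df_correct: int, df_misspelled: int) -> float:
    """Document frequency of the misspelling relative to the correct word."""
    if df_correct <= 0:
        raise ValueError("document frequency of the correct word must be positive")
    return df_misspelled / df_correct


def lower_bound_f(counts: dict) -> float:
    """Take the largest relative fraction over all stopwords.

    Summing the fractions would assume every misspelling occurs on a
    different page, so the maximum is the safe lower bound.
    """
    return max(relative_fraction(c, m) for c, m in counts.values())


# Hypothetical (correct word df, misspelling df) pairs, for illustration only.
EXAMPLE_COUNTS = {
    "because": (1_000_000, 2_000),  # misspelling: becuase
    "through": (800_000, 400),      # misspelling: trhough
}

print(lower_bound_f(EXAMPLE_COUNTS))
```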
  • 18. Results. Estimating Dyslexia in the Web. Range of percentages and average for the different error classes. We use the real document frequencies of the terms from one of the search engines to validate the results obtained, finding very similar results.
  • 19. Results. Estimating Dyslexia in the Web. From Sample D, the percentage of dyslexic errors among all lexical errors is very low, with an average of 0.67%. From Pedler (2007), only 39% of dyslexic errors are multi-errors. This implies that the lower bound is at least d/0.39, but we can safely use a factor of 3 to correct this fact. We have that f is at least 0.27%, from the word becuase. Then, we can estimate d as 2.01%. The lower bound for dyslexia in the Web is 0.005%.
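The arithmetic behind these figures can be retraced in a few lines of Python. All input values come from the slide; the function and parameter names are hypothetical.

```python
def dyslexia_lower_bound(multi_error_share: float, correction: float, f: float):
    """Return (d, f * d) given the measured multi-error share of dyslexic
    errors, the correction factor for non-multi errors, and the bound on f."""
    d = multi_error_share * correction
    return d, f * d


d, bound = dyslexia_lower_bound(
    multi_error_share=0.67 / 100,  # average from Sample D
    correction=3,                  # factor used on the slide
    f=0.27 / 100,                  # from the word becuase
)

print(f"d = {d:.2%}")
print(f"f x d = {bound:.4%}")
```

The product is about 0.0054%, which the slide reports as 0.005%.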
  • 20. Conclusions. The amount of dyslexic text in the Web is not as large as it could be. This suggests that the widespread use of spell checkers ameliorates dyslexia in the Web. Particular words can be used to detect dyslexic texts, and hence dyslexic users. This can be used to improve Web accessibility as well as future spell checkers or other tools targeted at dyslexic users.
  • 21. Conclusions. Since this is the first attempt to estimate text written by dyslexic individuals in the Web, a comparison with previous work is not possible. Previous research on dyslexia reveals that error frequency is related to word length (Pedler, 2007). Short words such as there, where, form, etc. are misspelled much more frequently in dyslexic texts than long words like the ones used in our experiments. Hence, we can obtain a better estimate by using a larger sample of stopwords as well as long dyslexic words. As a byproduct, we have found that other types of errors are much more frequent in the Web, and this can be used to assess the quality of Web text.
  • 22. On-going Work. New methodology. Sample enlarged to 50 words. Real data extracted from a leading search engine. Up-down/Left-right typos. New lower bound: 0.8 % (16 times better). Range of percentages and average for the different error classes.
  • 23. Future Work. 1. Identification of dyslexic errors. Dyslexia diagnosis. 2. NLP techniques for making text more accessible for dyslexic users. 3. Web quality estimation (Gelman & Barletta, 2008), across countries, domains and social media.
  • 24. Zank u beri mach