Supporting the Authoring Process with Linguistic Software
Melanie Siegel
The authoring process – and
where it needs support
challenges for correctness
• time pressure
• non-native writing
• not enough capacity for careful proofreading

automatic support possibilities
• spell checking
• grammar checking
The authoring process – and
where it needs support
challenges for understandability and readability
• authors are experts in the subject and the language – users often are not

automatic support possibilities

• style checking
The authoring process – and
where it needs support

challenges for consistency and corporate wording
• guidelines for corporate wording exist – in a large document on the shelf
• terminology lists exist – in an Excel sheet somewhere in the file system
• distributed writing

automatic support possibilities
• terminology checking
• sentence clustering
The authoring process – and
where it needs support

challenges for translatability
• authors write without having the translation process in
  mind
• lexical, syntactic and semantic ambiguity
• translation costs depend on translation memory matches

automatic support possibilities
• style checking
• terminology checking

• tokenization
• POS-tagging
• morphology
• dictionary
• error dictionary
tokenization

• Close the door of our XYZ car.
  token classes: capital word, lower word, space, dot_EOS

• 花子が本を読んだ。
  花子 | が | 本 | を | 読ん | だ | 。
  token classes: Kanji, Hiragana, dot_EOS

• based on rules and lists of abbreviations
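The rule-and-list approach for the English example can be sketched in a few lines of Python. This is a minimal illustration, not the tokenizer used in the talk: the abbreviation list and the function names are assumptions, and Japanese segmentation (which needs a dictionary) is not covered.

```python
import re

# Hypothetical abbreviation list; a real system would use a much larger one.
ABBREVIATIONS = {"e.g.", "i.e.", "etc.", "approx."}


def classify(token):
    """Assign one of the coarse token classes named on the slide."""
    if token.isspace():
        return "space"
    if token in ABBREVIATIONS:
        return "abbreviation"
    if token == ".":
        return "dot_EOS"
    if token[0].isupper():
        return "capital word"
    if token[0].islower():
        return "lower word"
    return "other"


def tokenize(text):
    """Split on whitespace (keeping it) and detach a sentence-final dot,
    unless the chunk is a known abbreviation."""
    tokens = []
    for chunk in re.split(r"(\s+)", text):
        if not chunk:
            continue
        if chunk.endswith(".") and len(chunk) > 1 and chunk not in ABBREVIATIONS:
            tokens.extend([chunk[:-1], "."])
        else:
            tokens.append(chunk)
    return [(tok, classify(tok)) for tok in tokens]


for token, token_class in tokenize("Close the door of our XYZ car."):
    print(repr(token), token_class)
```

The abbreviation list is what keeps the dot in "e.g." from being read as an end-of-sentence marker.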
POS tagging

  Close  the  door  of    our   XYZ  car
  V      DET  N     PREP  PRON  NE   N

• XML and attribute value structures
• statistical methods
• large dictionaries
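As a rough illustration of dictionary-based tagging with attribute-value output, the sketch below looks each token up in a tiny lexicon. The lexicon, the fallback rule for unknown capitalized words, and the function name are assumptions; it is not a statistical tagger and ignores ambiguity (for example "close" as adjective or verb).

```python
# Hypothetical one-tag-per-word lexicon covering only the example sentence.
LEXICON = {
    "close": "V",
    "the": "DET",
    "door": "N",
    "of": "PREP",
    "our": "PRON",
    "car": "N",
}


def tag(tokens):
    """Return one attribute-value structure (a dict) per token."""
    tagged = []
    for tok in tokens:
        pos = LEXICON.get(tok.lower())
        if pos is None:
            # Assumption: unknown capitalized words are treated as named entities.
            pos = "NE" if tok[:1].isupper() else "UNKNOWN"
        tagged.append({"token": tok, "pos": pos})
    return tagged


for entry in tag("Close the door of our XYZ car".split()):
    print(entry)
```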
morphology
• Close the door of our XYZ car.

  Close
    Lemma: close
    Tense: present_imp
    Person: third
    Number: singular

  car
    Lemma: car
    Number: singular
    Case: nominative_accusative

• based on dictionaries, rules for inflection and derivation
dictionary
• words unknown to the standard NLP system

  http://wiki.openoffice.org/wiki/Documentation/

language analysis
• words are defined in a dictionary
• anything not in the dictionary is an error
• high recall, low precision (depending on the domain)

error analysis
• errors are defined
• unknown words that are not defined as errors are term candidates
• based on words and rules
• consider terminology
• high precision, recall is dependent on data work
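The difference between the two approaches can be shown on a single sentence. The sketch below is illustrative only: the word lists are tiny hypothetical stand-ins, and "SWASSNet" plays the role of a product name that a general dictionary does not know.

```python
# Hypothetical miniature resources for illustration.
DICTIONARY = {"the", "document", "uses", "a", "style", "sheet", "at", "beginning"}
ERROR_DICTIONARY = {"begginning": "beginning", "stylesheet": "style sheet"}


def language_analysis(tokens):
    """Dictionary approach: anything not in the dictionary is an error."""
    return [t for t in tokens if t.lower() not in DICTIONARY]


def error_analysis(tokens):
    """Error approach: only defined errors are flagged; other unknown
    words become term candidates instead of false alarms."""
    errors = {t: ERROR_DICTIONARY[t.lower()] for t in tokens
              if t.lower() in ERROR_DICTIONARY}
    candidates = [t for t in tokens
                  if t.lower() not in DICTIONARY
                  and t.lower() not in ERROR_DICTIONARY]
    return errors, candidates


tokens = "The SWASSNet document uses a stylesheet at the begginning".split()
print(language_analysis(tokens))
print(error_analysis(tokens))
```

The dictionary approach flags the product name together with the real misspellings, which is the low precision the slide refers to; the error approach flags only what it has been told about, so its recall depends on how much data work went into the error list.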
error dictionary

•   stylesheet → style sheet
•   begginning → beginning
•   beleive → believe
•   definately → definitely
•   gotta → have to
•   hided → hid|hidden|hides
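Entries like these can be loaded into a lookup table and applied to text. The following is a sketch under assumptions: the plain-text entry format with `->` and `|` and both function names are invented for illustration and do not reflect any particular tool's storage format.

```python
import re

# Entries from the slide, written in a hypothetical "wrong -> right|alt" format.
RAW_ENTRIES = """\
stylesheet -> style sheet
begginning -> beginning
beleive -> believe
definately -> definitely
gotta -> have to
hided -> hid|hidden|hides
"""


def load_error_dictionary(raw):
    """Parse the entries into {error: [suggestion, ...]}."""
    entries = {}
    for line in raw.splitlines():
        if not line.strip():
            continue
        wrong, _, right = line.partition("->")
        entries[wrong.strip()] = [s.strip() for s in right.split("|")]
    return entries


def suggest(text, entries):
    """Return (offset, matched word, suggestions) for every known error."""
    findings = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()
        if word in entries:
            findings.append((match.start(), match.group(), entries[word]))
    return findings


entries = load_error_dictionary(RAW_ENTRIES)
print(suggest("I definately hided it.", entries))
```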
why work on terminology?
•   avoid false alarms in spelling
•   consistency
•   less ambiguity
•   translatability
•   corporate wording

ultimate goal:
1 term - 1 meaning - 1 translation
reality: variants

•   web server – web-server
•   upload protection – upload-protection
•   timeout – time out
•   Reset – ReSet
•   sub station – sub-station
term variants
 – orthographic variants
   - hyphen, blank, case: term bank, termbank
 – semi-orthographic variants
   - number : 6-digit, six-digit
   - trademark : MyCompany™, MyCompany
 – syntactic variants
   - preposition: oil level, level of oil
   - gerund/noun : call center, calling center
 – synonyms
   - “classical”: vehicle, car
 – language-specific variants
   (e.g. Fugenelemente DE, Katakana JA)
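Orthographic variants (hyphen, blank, case) and the trademark sign can be detected by mapping each term to a normalized key and grouping terms that share a key. This is a simple sketch with assumed function names; it does not handle number, syntactic, synonym or language-specific variants, and it may merge terms that differ in meaning.

```python
import re
from collections import defaultdict


def variant_key(term):
    """Drop blanks, hyphens and the trademark sign, then lowercase."""
    return re.sub(r"[\s\-™]+", "", term).lower()


def group_variants(terms):
    """Group terms whose normalized keys coincide; keep only real variant sets."""
    groups = defaultdict(set)
    for term in terms:
        groups[variant_key(term)].add(term)
    return {key: sorted(members) for key, members in groups.items()
            if len(members) > 1}


terms = ["web server", "web-server", "timeout", "time out",
         "Reset", "ReSet", "MyCompany™", "MyCompany", "car"]
for key, members in group_variants(terms).items():
    print(key, members)
```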
how to get consistent terminology
• author/company defines the term bank

• list of deprecated terms
  deprecated term: vehicle
  approved term: car

• list of approved terms
  → automatic identification of variants
  approved term: SWASSNet User
  deprecated terms: SWASSNet user, SWASS-Net User
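A term bank of this kind can be applied with a straightforward lookup. The sketch below assumes a dictionary that maps each deprecated term to its approved term; the structure and function name are illustrative. Matching is case-sensitive on purpose, since "SWASSNet user" and "SWASSNet User" differ only in case, and inflected forms such as "vehicles" are not caught without morphology.

```python
import re

# Hypothetical term bank built from the slide's examples.
TERM_BANK = {
    "vehicle": "car",
    "SWASSNet user": "SWASSNet User",
    "SWASS-Net User": "SWASSNet User",
}


def check_terms(text, term_bank=TERM_BANK):
    """Return (offset, deprecated term, approved term) for each hit."""
    findings = []
    for deprecated, approved in term_bank.items():
        pattern = r"\b" + re.escape(deprecated) + r"\b"
        for match in re.finditer(pattern, text):
            findings.append((match.start(), deprecated, approved))
    return sorted(findings)


print(check_terms("Each SWASSNet user parks the vehicle."))
```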
terminology and spelling
NLP for terminology
• NLP methods for term extraction
   – corpus analysis (morphology, POS, NER)
   – information extraction (potential product names)
   – ontologies (e.g. semantic groups)

• NLP methods for setting up a term database
   – morphology (finding the base form)
   – POS

• NLP methods for term checking
   – variants
   – similar words
   – inflection
approaches to grammar checking

descriptive grammar
• definition of correct grammar
  • e.g. HPSG, LFG, chunk-grammar, statistical grammars
  • anything that's not analyzable must be a grammar error
  • preconditions:
    • grammar with large coverage
    • large dictionaries
    • robust, but not too robust parsing
    • efficient parsing methods
  • high recall, low precision

error grammar
• implementation of grammar errors
  • preconditions:
    • work with error corpora
    • error grammar with a high number of error types
  • "deepness" of analysis varies with the type of error to be described
  • high precision, recall is based on the number of rules
grammar rules, examples

• subject-verb agreement:
 – Check if instructions are programmed in
   such a way that a scan never finish.
 – When the operations is completed, the
   return to home completes.
grammar rules, examples

• a/an distinction:
  – a isolating transformer
  – an program
• wrong verb form:
  – it cannot communicates with them
  – IP can be automatically get
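The a/an examples lend themselves to a small error rule. The sketch below is a deliberate simplification: it decides by the first letter of the following word, whereas the real distinction depends on pronunciation ("an hour", "a user"), so a production rule would need an exception list or phonetic information. The function name is an assumption.

```python
import re

VOWEL_LETTERS = "aeiou"


def check_articles(text):
    """Return (found phrase, suggested phrase) for a/an mismatches,
    judged by the first letter of the following word only."""
    findings = []
    for match in re.finditer(r"\b(a|an)\s+([A-Za-z]+)", text, flags=re.IGNORECASE):
        article, word = match.group(1), match.group(2)
        expected = "an" if word[0].lower() in VOWEL_LETTERS else "a"
        if article.lower() != expected:
            findings.append((match.group(0), f"{expected} {word}"))
    return findings


print(check_articles("Connect a isolating transformer and an program."))
```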
example grammar rule*
•   write_words_together

     – @can ::= [ TOK "^(can)$"
     –      MORPH.READING.MCAT "^Verb$" ];

     – The application can not start.
     – The application can tomorrow not start.

     – TRIGGER(80) == @can^1 [@adv]* 'not'^2
     –   -> ($can, $not)
     –   -> { mark: $can, $not;
     –       suggest: $can -> '', $not -> 'cannot';
     –     }

     – Branch circuits can not only minimize system damage but can interrupt the flow of fault
       current

     – NEG_EV(40) == $can 'not' 'only' @verbInf []* 'but';



                                                                    * implemented in Acrolinx
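The logic of this rule (a trigger pattern plus negative evidence that suppresses it) can be approximated in Python. This is not Acrolinx rule syntax and not its implementation: the adverb list is a hypothetical stand-in for @adv, there is no check that "can" is a verb or that an infinitive follows "only", and the numeric arguments of TRIGGER and NEG_EV are not modelled.

```python
import re

# Hypothetical stand-in for the @adv word class.
ADVERBS = {"tomorrow", "now", "currently", "always"}


def check_can_not(sentence):
    """Return (index of 'can', index of 'not') pairs that should be
    written together as 'cannot'."""
    tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", sentence)]
    findings = []
    for i, tok in enumerate(tokens):
        if tok != "can":
            continue
        # Trigger: 'can', optional adverbs, then 'not'.
        j = i + 1
        while j < len(tokens) and tokens[j] in ADVERBS:
            j += 1
        if j >= len(tokens) or tokens[j] != "not":
            continue
        # Negative evidence: 'can not only ... but' is a correct construction.
        if j == i + 1 and tokens[j + 1:j + 2] == ["only"] and "but" in tokens[j + 2:]:
            continue
        findings.append((i, j))
    return findings


print(check_can_not("The application can not start."))
print(check_can_not("The application can tomorrow not start."))
print(check_can_not("Branch circuits can not only minimize system damage "
                    "but can interrupt the flow of fault current"))
```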
style - controlled language
• controlled languages
   • AeroSpace and Defence Industries Association of Europe (ASD)
     ASD-STE100 (simplified English)
   • Caterpillar Technical English (CTE)

• disadvantages:
   • very restrictive
   • low acceptance by users
style – moderately controlled
language
• rules define errors (like grammar rules)
• rules (and instructional information) are
  defined by authors
• implementation in authoring support systems
• high acceptance
• good usability
style guidelines

• different for different usages
  – text type
     • (e.g., press release – technical documentation)
  – domain
     • (e.g., software – machines)
  – readers
     • (e.g., end users – service personnel)
  – authors
     • (e.g., Germans tend to write long sentences)
style rule examples*: best practice

• avoid_latin_expressions
• avoid_modal_verbs
• avoid_passive
• avoid_split_infinitives
• avoid_subjunctive
• use_serial_comma
• use_comma_after_introductory_phrase
• spell_out_numerals

                                  * style rules implemented in Acrolinx
style rule examples: company

• use_units_consistently
• abbreviate_currency
• COMPANY_trademark
• do_not_refer_to_COMPANY_intranet
• add_tag_to_UI_string
• avoid_trademark_as_noun
• avoid_articles_in_title
style rule examples: MT pre-editing

• avoid_nested_sentences

• avoid_ing_words

• keep_two_verb_parts_together

• avoid_parenthetical_expressions


dependent on the MT system and language pair
automatic suggestions for style rules

  – replacement of words or phrases
  – replacement with the correct capitalization (uppercase or lowercase)
  – replacement of words using the correct inflection
  – generation of whole sentences (e.g. passive –
    active) requires semantic analysis and generation
    and is therefore not (yet) possible
example style rule*
• avoid_future_tense

• /* Example: „.. It will be necessary .." */

• TRIGGER (80) == @will^1 [-@comma]* @verbInf^2
•             ->($will, $verbInf)
•             -> { mark : $will, $verbInf;}

• /* Example: „.. The router services will be offered in the future
  .." */

• NEG_EV(40) == $will []* @in @det @time;

                                                * implemented in Acrolinx
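The same trigger and negative-evidence pattern can be sketched in Python for the future-tense rule. Again this only approximates the idea and is not Acrolinx syntax: the word sets are hypothetical stand-ins for @verbInf, @det and @time, and the numeric arguments of TRIGGER and NEG_EV are not modelled.

```python
import re

# Hypothetical stand-ins for the word classes used in the rule.
INFINITIVES = {"be", "start", "offer", "send"}
DETERMINERS = {"the", "a", "an", "this"}
TIME_NOUNS = {"future", "morning", "evening"}


def check_future_tense(sentence):
    """Return ('will', infinitive) pairs to mark, unless an explicit
    'in <det> <time>' expression follows."""
    tokens = re.findall(r"[A-Za-z]+|,", sentence)
    lowered = [t.lower() for t in tokens]
    findings = []
    for i, tok in enumerate(lowered):
        if tok != "will":
            continue
        # Trigger: 'will', any non-comma tokens, then an infinitive.
        j = i + 1
        while j < len(lowered) and lowered[j] != "," and lowered[j] not in INFINITIVES:
            j += 1
        if j >= len(lowered) or lowered[j] == ",":
            continue
        # Negative evidence: 'will ... in the future' refers to real future time.
        rest = lowered[i + 1:]
        explicit_time = any(
            rest[k] == "in" and rest[k + 1] in DETERMINERS and rest[k + 2] in TIME_NOUNS
            for k in range(len(rest) - 2)
        )
        if explicit_time:
            continue
        findings.append((tokens[i], tokens[j]))
    return findings


print(check_future_tense("It will be necessary to restart."))
print(check_future_tense("The router services will be offered in the future."))
```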
consistent phrasing
• Use the same phrase for the same meaning.

• Examples:
   – Congratulations on acquiring your new wearable digital
     audio player
   – Congratulations, you have acquired your new wearable
     digital audio player!
   – Dear Customer, congratulations on purchasing the new
     wearable digital audio player!
Acrolinx intelligent reuse™
[Figure: Acrolinx server with the components Terminology, Writing Standards, Grammar & Spelling, Intelligent Reuse and Reuse Repository. Further labels: Content / Translation repository, micro-clustering, Clusters, review and release, redundancy and quality, filters.]

Example sentences shown in the figure include:
  – the cat sat on the mat
  – The cat sat on the mat
  – the cat sat on the mat.
  – the cat sat on the carpet
  – the cat sat on the doormat
  – the cat sat on the malt
  – The cat ate on the mat
  – The cat slept on the sofa
  – The dog sat on the rug
  – The fish swam in the red sea.
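The slides do not describe how the clustering works internally. As a stand-in, the sketch below groups sentences greedily by the Jaccard similarity of their word sets; the similarity measure, the threshold of 0.6 and the function names are all assumptions chosen for illustration.

```python
import re


def word_set(sentence):
    """Lowercased set of alphabetic words, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", sentence.lower()))


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(sentences, threshold=0.6):
    """Greedy clustering: a sentence joins the first cluster whose first
    member is similar enough, otherwise it starts a new cluster."""
    clusters = []
    for sentence in sentences:
        for members in clusters:
            if jaccard(word_set(sentence), word_set(members[0])) >= threshold:
                members.append(sentence)
                break
        else:
            clusters.append([sentence])
    return clusters


sentences = [
    "the cat sat on the mat",
    "The cat sat on the mat.",
    "the cat sat on the carpet",
    "The fish swam in the red sea.",
]
for members in cluster(sentences):
    print(members)
```

Sentences that differ only in case or punctuation end up with identical word sets, which is why they fall into the same cluster as redundant variants of one phrase.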
DEMO
checking OpenOffice documentation
correctness
understandability
consistency
translatability
summary

• The authoring process is challenging
  – correctness
  – consistency
  – understandability
  – translatability
• It can be effectively supported by NLP-enhanced tools
Thank you!


                  Melanie Siegel
Hochschule Darmstadt – University of Applied Sciences

              melanie.siegel@h-da.de
