1. Automatic input editing
2. Automatic segmentation
3. Syntactical analysis
4. Transformation with output editing
   • Characteristics of Japanese text:
    › No spaces between words
    › Mixed use of kana and kanji
   • Thus, the input text must be cut into its components automatically
   • However, to keep the dictionary from growing too large:
    › Rules can be imposed on how input texts are written, admitting only:
       • kana texts, in which no kanji are used
       • kana-kanji texts, in which kanji are used wherever possible according to the official directives on the use of kana and kanji
    › This is “pre-editing”
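The pre-editing rule above can be sketched as a check on which of the two permitted input types a sentence belongs to. This is a minimal illustration under stated assumptions, not part of Kuno's system: the helper names are hypothetical, kanji detection is simplified to the CJK Unified Ideographs block (U+4E00 to U+9FFF), and the second condition (kanji used wherever the official directives allow) is not modeled.

```python
# Standard Unicode block ranges; the function names are hypothetical.
HIRAGANA = range(0x3040, 0x30A0)
KATAKANA = range(0x30A0, 0x3100)
CJK_UNIFIED = range(0x4E00, 0xA000)  # simplification: ignores extension blocks


def is_kana(ch: str) -> bool:
    return ord(ch) in HIRAGANA or ord(ch) in KATAKANA


def is_kanji(ch: str) -> bool:
    return ord(ch) in CJK_UNIFIED


def classify_input(text: str) -> str:
    """Return 'kana' if no kanji occur in text, otherwise 'kana-kanji'.

    Whether the kanji follow the official directives is NOT checked here.
    """
    return "kana-kanji" if any(is_kanji(ch) for ch in text) else "kana"
```

For instance, the all-hiragana ねこがねずみをころした is classified as a kana text, while 猫が鼠を殺した is a kana-kanji text.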
   • Each kana is romanized
    › The romanization preserves a one-to-one correspondence between kana and their Roman-letter equivalents
    › Words are easier to analyze in Roman letters than in kana, because there are:
       • fewer varieties of suffixes
       • fewer rules governing which suffixes may combine with canonical stems
       • fewer possibilities of homographic verbal stems
   • Each kanji is replaced with an irreducible unit token
    › No kanji contains more than one “morpheme”, so a kanji token never needs to be split
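The advantage of romanization for stem analysis can be illustrated with consonant-stem verbs. The sketch below assumes a Kunrei-shiki-style table (ち → ti, つ → tu) that keeps one Roman spelling per kana; the slides do not name a scheme, and the table covers only the kana used here. In kana, the forms of 待つ share only the stem ま and take five kana suffixes of their own, while in Roman letters they share the stem mat- and take the same five vowel suffixes as kak- (書く). Kanji are kept as single indivisible tokens; the `<書>` token format and all function names are hypothetical.

```python
from typing import List, Tuple

# Partial kana -> Roman table in a Kunrei-shiki style (ち -> ti, つ -> tu):
# one spelling per kana. Assumption: the slides do not name a scheme.
KANA_TO_ROMAN = {
    "か": "ka", "き": "ki", "く": "ku", "け": "ke", "こ": "ko",
    "た": "ta", "ち": "ti", "つ": "tu", "て": "te", "と": "to",
    "ま": "ma",
}


def is_kanji(ch: str) -> bool:
    return 0x4E00 <= ord(ch) <= 0x9FFF  # CJK Unified Ideographs (simplified)


def romanize(kana: str) -> str:
    return "".join(KANA_TO_ROMAN[ch] for ch in kana)


def to_tokens(text: str) -> List[str]:
    """Kana become Roman letters; each kanji becomes one indivisible token."""
    tokens: List[str] = []
    for ch in text:
        if is_kanji(ch):
            tokens.append(f"<{ch}>")  # hypothetical token format
        else:
            tokens.extend(KANA_TO_ROMAN[ch])  # one token per Roman letter
    return tokens


def split_stem(forms: List[str]) -> Tuple[str, set]:
    """Longest common prefix of the forms, plus the set of remaining suffixes."""
    stem = forms[0]
    for form in forms[1:]:
        n = 0
        while n < min(len(stem), len(form)) and stem[n] == form[n]:
            n += 1
        stem = stem[:n]
    return stem, {form[len(stem):] for form in forms}


MATU = ["また", "まち", "まつ", "まて", "まと"]  # forms of 待つ, written in kana
KAKU = ["かか", "かき", "かく", "かけ", "かこ"]  # forms of 書く, written in kana
```

With a Hepburn-style table (ち → chi, つ → tsu) the same forms would romanize as mata, machi, matsu, mate, mato, and the common stem would shrink to ma-, which is why a one-to-one scheme matters here.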
   • Segmentation of a continuous run of tokens
    › Based on the following expectations:
       • Auxiliary items are shorter and fewer in number than dictionary items
       • No problem is caused by:
         ◦ assuming that every “phrase” in a sentence begins with a dictionary item
         ◦ including “prefixes” in the category of dictionary items
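Under these expectations, segmentation can be framed as a search for a split in which every phrase is one dictionary item followed by zero or more auxiliary items. The sketch below is a hypothetical backtracking segmenter over a romanized string, not Kuno's procedure; the toy lexicon (neko, nezumi, koros-) and auxiliaries (ga, o, ita) are assumptions, and prefixes would simply be listed in the dictionary set.

```python
from functools import lru_cache
from typing import List, Optional, Set, Tuple

Phrase = Tuple[str, ...]


def segment(text: str, dictionary: Set[str],
            auxiliaries: Set[str]) -> Optional[List[List[str]]]:
    """Split text into phrases of one dictionary item plus auxiliaries."""
    aux_longest_first = sorted(auxiliaries, key=len, reverse=True)

    @lru_cache(maxsize=None)
    def phrases_from(i: int) -> Optional[Tuple[Phrase, ...]]:
        if i == len(text):
            return ()
        # Every phrase must begin with a dictionary item; try the longest first.
        for end in range(len(text), i, -1):
            if text[i:end] in dictionary:
                rest = auxiliaries_from(end)
                if rest is not None:
                    aux, following = rest
                    return ((text[i:end],) + aux,) + following
        return None

    @lru_cache(maxsize=None)
    def auxiliaries_from(j: int) -> Optional[Tuple[Phrase, Tuple[Phrase, ...]]]:
        # Consume zero or more auxiliary items, then segment what is left.
        for aux in aux_longest_first:
            if text.startswith(aux, j):
                rest = auxiliaries_from(j + len(aux))
                if rest is not None:
                    more, following = rest
                    return (aux,) + more, following
        following = phrases_from(j)
        return None if following is None else ((), following)

    result = phrases_from(0)
    return None if result is None else [list(p) for p in result]


# Toy lexicon (hypothetical): canonical stems and a few short auxiliaries.
DICTIONARY = {"neko", "nezumi", "koros"}
AUXILIARIES = {"ga", "o", "ita"}
```

For the kana text ねこがねずみをころした (“the cat killed the mouse”), romanized as nekoganezumiokorosita, `segment` returns `[["neko", "ga"], ["nezumi", "o"], ["koros", "ita"]]`. A run that starts with an auxiliary, such as gakoros, has no valid split, in line with the assumption that every phrase begins with a dictionary item.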
   • Predictive analysis
    › Originally developed by Rhodes
   • A peculiarity of Japanese:
    › It is more convenient to start the analysis from the end of the sentence, because:
       • only a limited set of words can occupy the final position in a sentence
       • particles showing case, prepositional, or conjunctional relationships always follow the words, phrases, or clauses to which they are attached
       • attributive words, phrases, and clauses always stand before the substantives they modify
   • Each word in a sentence is assigned:
    › the prediction that it has fulfilled
    › a linkage number, showing which word predicted it
    › a group number, showing which clause of the sentence it belongs to
   • Another peculiarity of Japanese:
    › The subject of a sentence is very often omitted
   • Hence, in this analysis:
    › Predictions of the subject marker and the relative subject marker are essential
   • Example: ネズミがネコを殺した話は私を驚かせた. (“The story that the mouse killed the cat surprised me.”)
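The analysis of the example can be sketched as a right-to-left pass over a stack of predictions, in the spirit of the predictive analysis described above. Everything here is a simplified assumption rather than Kuno's grammar: a hand-made lexicon covering only the example, a single analysis path, and hypothetical role names SUBJ_MK (subject marker), REL_SUBJ_MK (relative subject marker), OBJ_MK (object marker), SUBJ/REL_SUBJ (subject master), OBJ (object master), and PRED/REL_PRED (predicate head and relative predicate head). Each word receives the prediction it fulfilled, a linkage number (the index of the word that made the prediction), and a group number (its clause). Marker predictions are optional, so an omitted subject just leaves an unfulfilled optional prediction.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical mini-lexicon covering only the example: surface form -> category.
LEXICON = {
    "ネズミ": "N", "ネコ": "N", "話": "N", "私": "N",
    "殺した": "V", "驚かせた": "V",
    "が": "GA", "は": "WA", "を": "WO",
}

# Word categories that can fulfill each (hypothetically named) prediction.
FULFILLED_BY = {
    "PRED": {"V"}, "REL_PRED": {"V"},
    "SUBJ_MK": {"GA", "WA"}, "REL_SUBJ_MK": {"GA"}, "OBJ_MK": {"WO"},
    "SUBJ": {"N"}, "REL_SUBJ": {"N"}, "OBJ": {"N"},
}
MASTER_OF = {"SUBJ_MK": "SUBJ", "REL_SUBJ_MK": "REL_SUBJ", "OBJ_MK": "OBJ"}


@dataclass
class Prediction:
    role: str
    optional: bool          # optional predictions may go unfulfilled
    linkage: Optional[int]  # index of the word that made this prediction
    group: int              # clause that the predicted word will belong to


@dataclass
class Assignment:
    word: str
    role: str               # the prediction this word fulfilled
    linkage: Optional[int]
    group: int


def analyze(words: List[str]) -> List[Assignment]:
    # Start at the END of the sentence: only a predicate can stand there.
    stack = [Prediction("PRED", False, None, 1)]
    next_group = 2
    result: List[Optional[Assignment]] = [None] * len(words)
    for i in range(len(words) - 1, -1, -1):
        category = LEXICON[words[i]]
        while True:
            if not stack:
                raise ValueError(f"nothing predicts {words[i]!r}")
            pred = stack.pop()
            if category in FULFILLED_BY[pred.role]:
                break
            if not pred.optional:
                raise ValueError(f"{words[i]!r} cannot fulfill {pred.role}")
            # An optional prediction went unfulfilled (e.g. omitted subject): drop it.
        group = pred.group
        if pred.role == "REL_PRED":  # an attributive clause starts a new group
            group, next_group = next_group, next_group + 1
        result[i] = Assignment(words[i], pred.role, pred.linkage, group)
        # Predict what may stand to the LEFT; the last prediction pushed is tried first.
        if pred.role in ("PRED", "REL_PRED"):
            subj_mk = "SUBJ_MK" if pred.role == "PRED" else "REL_SUBJ_MK"
            stack.append(Prediction(subj_mk, True, i, group))
            stack.append(Prediction("OBJ_MK", True, i, group))
        elif pred.role in MASTER_OF:
            # A particle follows the word it attaches to, so predict that word.
            stack.append(Prediction(MASTER_OF[pred.role], False, i, group))
        else:
            # A noun may be preceded by an attributive clause that modifies it.
            stack.append(Prediction("REL_PRED", True, i, group))
    if any(not p.optional for p in stack):
        raise ValueError("an obligatory prediction was left unfulfilled")
    return result
```

For the example, the relative clause ネズミがネコを殺した ends up in group 2 and the main clause 話は私を驚かせた in group 1. A subjectless sentence such as ネコを殺した is still accepted, while a sequence ending in a particle is rejected, because only a predicate can fulfill the initial prediction.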
   • This stage, transformation with output editing, deals with the synthesis of the target language (TL)
   • Brief explanation:
    › Words with the same group number are gathered together
    › The word order is transformed
   • Concretely:
    › Subject markers, object markers, and relative subject markers are omitted
    › Within each group, the subject master or relative subject master comes first,
    › followed by the predicate head or relative predicate head,
    › and then by the object master
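The reordering rules above can be sketched on the example sentence. The input triples (word, role, group) are one plausible analysis, not output quoted from the paper; the role labels and the English glosses are hypothetical word-for-word equivalents for display. Articles, the relative pronoun, and placing the relative clause after its head noun belong to output editing that this sketch does not attempt.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# One plausible analysis of the example, as (word, role, group) triples.
# Role labels are hypothetical names for the slide's terms.
ANALYZED: List[Tuple[str, str, int]] = [
    ("ネズミ", "REL_SUBJ", 2), ("が", "REL_SUBJ_MK", 2), ("ネコ", "OBJ", 2),
    ("を", "OBJ_MK", 2), ("殺した", "REL_PRED", 2),
    ("話", "SUBJ", 1), ("は", "SUBJ_MK", 1), ("私", "OBJ", 1),
    ("を", "OBJ_MK", 1), ("驚かせた", "PRED", 1),
]

# Subject, object and relative subject markers are omitted from the output.
MARKERS = {"SUBJ_MK", "REL_SUBJ_MK", "OBJ_MK"}
# Within a group: subject master first, then predicate head, then object master.
POSITION = {"SUBJ": 0, "REL_SUBJ": 0, "PRED": 1, "REL_PRED": 1, "OBJ": 2}
# Hypothetical word-for-word English glosses, for display only.
GLOSS = {"ネズミ": "mouse", "ネコ": "cat", "殺した": "killed",
         "話": "story", "私": "me", "驚かせた": "surprised"}


def transform(analyzed: List[Tuple[str, str, int]]) -> Dict[int, List[str]]:
    groups: Dict[int, List[Tuple[int, str]]] = defaultdict(list)
    for word, role, group in analyzed:  # gather words by group number
        if role not in MARKERS:
            groups[group].append((POSITION[role], word))
    # sorted() is stable, so words with the same position keep source order
    return {g: [w for _, w in sorted(items, key=lambda item: item[0])]
            for g, items in sorted(groups.items())}


def gloss(groups: Dict[int, List[str]]) -> Dict[int, List[str]]:
    return {g: [GLOSS[w] for w in words] for g, words in groups.items()}
```

Glossed, the result is group 1: story surprised me, and group 2: mouse killed cat, i.e. subject, verb, object order within each clause.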
   • Readings in Machine Translation
    › Edited by Sergei Nirenburg, Harold Somers, and Yorick Wilks
    › The MIT Press

Approach to Japanese-English automatic translation, by Susumu Kuno
