Shallow Parser With Input From a Transliterator
B.Tech Project
Shashank (10503883)
Harshit Goel (10103559)
Project Mentor: Ms. Parmeet Kaur
Contents
• Introduction
• Literature Review
• Problem Statement
• Plan of Action
• System Architecture
• Flow Chart
• Findings and Conclusion
• References
Introduction
• Shallow Parser
• Morphological Analyzer
• Transliteration
Shallow Parser
• Shallow parsing (also called chunking or "light parsing") is an analysis of a sentence that identifies its constituents (noun groups, verbs, verb groups, etc.) but specifies neither their internal structure nor their role in the main sentence.
• It is a technique widely used in natural language processing, and is similar in concept to lexical analysis for computer languages.
Why Shallow Parser?
A "parser" is a system that transforms sentences (strings of characters) into a representation that describes the groupings of words (phrases) and their relations (e.g. subject and object). The representation of choice for such information is a syntactic tree in which nodes refer to phrases, word categories, or words, and links refer to the relations between these objects.
Difference Between Shallow Parser and POS Tagger
• A full parser turns the sentence into a tree whose leaves hold POS tags (which correspond to the words in the sentence), while the rest of the tree tells you exactly how these words join together to make the overall sentence.
• For example, an adjective and a noun might combine into a noun phrase, which might combine with another adjective to form another noun phrase (e.g. "quick brown fox"). The exact way the pieces combine depends on the parser in question.
• A shallow parser, or "chunker", sits somewhere between a POS tagger and a full parser. A plain POS tagger is very fast but does not give you enough information, and a full-blown parser is slow and gives you too much. A POS tagger can be thought of as a parser that returns only the bottom-most tier of the parse tree.
• A chunker can be thought of as a parser that returns some other tier of the parse tree instead. Sometimes you just need to know that a group of words together forms a noun phrase, and you do not care about the sub-structure of the tree within those words (i.e. which words are adjectives, determiners, nouns, etc., and how they combine). In such cases you can use a chunker to get exactly the information you need instead of spending time generating the full parse tree for the sentence.
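To make the contrast concrete, here is a minimal sketch of a chunker that runs on top of POS-tagger output. It assumes Penn-Treebank-style tags (DT, JJ, NN) and one hand-written noun-phrase pattern (optional determiner, any adjectives, one or more nouns); the function name and the pattern are illustrative assumptions, not the project's implementation.

```python
def chunk_noun_phrases(tagged):
    """Group (word, tag) pairs into flat NP chunks; other words stay as-is.

    Pattern assumed for a noun phrase: DT? JJ* NN+
    """
    chunks = []
    i = 0
    while i < len(tagged):
        j = i
        if tagged[j][1] == "DT":          # optional determiner
            j += 1
        while j < len(tagged) and tagged[j][1] == "JJ":   # any adjectives
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] == "NN":   # one or more nouns
            k += 1
        if k > j:
            # At least one noun was found: emit the whole span as one NP.
            chunks.append(("NP", [word for word, _ in tagged[i:k]]))
            i = k
        else:
            # No noun phrase starts here: pass the word through with its tag.
            chunks.append((tagged[i][1], [tagged[i][0]]))
            i += 1
    return chunks


# Output of a (hypothetical) POS tagger: the bottom tier of the parse tree.
tagged_sentence = [
    ("the", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VB"), ("over", "IN"),
    ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]
print(chunk_noun_phrases(tagged_sentence))
```

The chunker reports that "the quick brown fox" is a noun phrase without saying how the adjectives attach inside it, which is the information a full parser would add.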
Morphology
Morphology is the part of linguistics that deals with the study of words, their internal structure and, in part, their meanings. In computational terms, it involves identifying the stem of a word from its full word form. A morpheme is the smallest unit that carries meaning or fulfills a grammatical function.
Morphological Analysis and Models
• Morphological analysis
Morphological analysis is the process of providing grammatical information about a word, given its suffix.
• Models
There are three principal approaches to morphology, each of which models word structure in a different way:
  • Morpheme-based morphology, also known as the Item-and-Arrangement approach.
  • Lexeme-based morphology, also known as the Item-and-Process approach.
  • Word-based morphology, also known as the Word-and-Paradigm approach.
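As a rough illustration of deriving grammatical information from a suffix, the sketch below applies a tiny hand-written suffix table to romanized Hindi nouns. The table, the romanization (long vowels doubled) and the function name `analyze_by_suffix` are assumptions made for this example. Real Hindi suffixes are ambiguous (for instance, "-e" can also mark the oblique singular), which this toy table ignores.

```python
# (surface suffix, ending of the dictionary form, grammatical features)
# Hypothetical toy rules for nouns like ladkaa (boy) and ladkii (girl).
SUFFIX_RULES = [
    ("iyaan", "ii", {"gender": "feminine", "number": "plural"}),
    ("ii", "ii", {"gender": "feminine", "number": "singular"}),
    ("aa", "aa", {"gender": "masculine", "number": "singular"}),
    ("e", "aa", {"gender": "masculine", "number": "plural"}),
]


def analyze_by_suffix(word):
    """Return the lemma and features for the longest matching suffix, or None."""
    rules = sorted(SUFFIX_RULES, key=lambda rule: len(rule[0]), reverse=True)
    for suffix, lemma_ending, features in rules:
        if word.endswith(suffix) and len(word) > len(suffix):
            lemma = word[: -len(suffix)] + lemma_ending
            return {"lemma": lemma, **features}
    return None  # no rule applies


for form in ["ladkaa", "ladke", "ladkii", "ladkiyaan"]:
    print(form, analyze_by_suffix(form))
```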
Morphological Analyzer
• A morphological analyzer is a program that analyzes the morphology of an input word; it detects the morphemes in a text.
• We currently consider two types of morphological analyzers for Indian languages:
1. Phrase-level morphological analyzer
2. Word-level morphological analyzer
Transliteration
• Transliteration is the conversion of a text from one script to another.
• For instance:
kaay kam karato = काय कम करतो
kyaa chal rahaa hai = क्या चल रहा है
• Transliteration can form an essential part of transcription, which converts text from one writing system into another. Transliteration is not concerned with representing the phonemics of the original; it represents the characters of the source script.
Literature Survey: Summary
• We researched the project in detail through research papers, blogs and other internet sources. There are various approaches to developing morphological analyzers, such as the Finite State Automata (FSA) approach, the Two-Level Morphology approach, the Finite State Transducer (FST) approach, stemming algorithms, the corpus-based approach, DAWG (Directed Acyclic Word Graph) and the paradigm-based approach. Of these, the FST-based approach is the most efficient for developing a morphological analyzer for Hindi, which is a highly inflectional language.
• There are several approaches to constructing a shallow parser, such as chunker-based, HMM-based and memory-based shallow parsers, shallow parsers based on conditional random fields, and shallow parsers based on the Winnow algorithm. Among these, the approach based on conditional random fields is reported to be the most efficient and flexible. Shallow parsers are essential tools for various NLP applications because they provide a useful partial analysis of natural-language text while avoiding the complexity inherent in a complete parser. Thus, shallow parsers are important for applications that require only a syntactic analysis of the sentence and do not require the relationships between its chunks. Such applications include automatic text summarization, speech-to-speech translation systems and text mining.
Literature Survey: Summary
• Many cultures around the world use different scripts to represent their languages. By transliterating, people can make their languages more accessible to people who do not understand their scripts. For example, to someone who knows only the Roman alphabet, the name محمد is incomprehensible. However, when it is transliterated as Muhammad, readers of the Roman alphabet understand that it refers to the Muslim prophet Muhammad.
• A transliterator therefore helps non-native speakers type a Hindi phrase in Roman script on any keyboard, and so provides the input for the shallow parser.
Problem Statement
• We intend to develop a shallow parser for the Hindi language and an FST-based morphological analyzer, which can be used as tools for building more application-specific tools such as automatic text summarizers and speech-to-speech translators. The key objective of the project is to provide the shallow parser and morphological analyzer as open-source software.
• We also want to develop a simple tool that converts Roman script to Indic (Devanagari) script. Most keyboards have an English layout, so typing directly in an Indic script is difficult, whereas writing Hindi in Roman script is easy. This motivates a tool for Linux that makes it easy to write Hindi text.
Plan of Action
1. Transliteration
2. Lexicon Generator
3. Morphological Analyzer
4. Shallow Parsing
1. Transliterator
Figure: Block diagram of the transliteration process
It is a simple tool that converts Roman script to Indic (Devanagari) script. Most keyboards have an English layout, so typing directly in an Indic script is difficult, whereas writing Hindi in Roman script is easy. This motivates a tool for Linux that makes it easy to write Hindi text.
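The idea can be sketched as a greedy longest-match mapping from Roman letter groups to Devanagari characters. The mapping tables below are a small, assumed subset (long vowels written doubled, as in "kyaa"), and the function name `transliterate` is hypothetical; the block diagram of the actual tool is not reproduced here, so this is only one plausible way to do it.

```python
# Assumed Roman-to-Devanagari tables (a small subset, for illustration only).
CONSONANTS = {
    "k": "क", "kh": "ख", "g": "ग", "ch": "च", "j": "ज", "t": "त",
    "d": "द", "n": "न", "p": "प", "b": "ब", "m": "म", "y": "य",
    "r": "र", "l": "ल", "v": "व", "s": "स", "sh": "श", "h": "ह",
}
# Each vowel maps to (independent letter, dependent vowel sign after a consonant).
VOWELS = {
    "a": ("अ", ""), "aa": ("आ", "ा"), "i": ("इ", "ि"), "ii": ("ई", "ी"),
    "u": ("उ", "ु"), "uu": ("ऊ", "ू"), "e": ("ए", "े"), "ai": ("ऐ", "ै"),
    "o": ("ओ", "ो"), "au": ("औ", "ौ"),
}
VIRAMA = "्"  # joins two consonants with no vowel between them


def transliterate(text):
    """Convert romanized Hindi to Devanagari using greedy longest match."""
    text = text.lower()
    out = []
    after_consonant = False  # True when the last output was a bare consonant
    i = 0
    while i < len(text):
        for size in (2, 1):  # try two-letter groups before single letters
            chunk = text[i:i + size]
            if chunk in CONSONANTS:
                if after_consonant:
                    out.append(VIRAMA)  # consonant cluster, e.g. k + y
                out.append(CONSONANTS[chunk])
                after_consonant = True
                break
            if chunk in VOWELS:
                independent, sign = VOWELS[chunk]
                out.append(sign if after_consonant else independent)
                after_consonant = False
                break
        else:
            out.append(text[i])  # spaces, punctuation, unknown letters
            after_consonant = False
        i += size
    return "".join(out)


print(transliterate("kyaa chal rahaa hai"))
print(transliterate("kaay kam karato"))
```

Under this strict scheme a long vowel must be typed doubled ("raam" rather than "ram"); handling informal spellings would need extra rules or a dictionary lookup.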
2. Lexicon Generator
Figure: Block diagram of lexicon generation
The corpus is processed in three steps to extract the words. First, the words are extracted from the sentences of the given corpus. Next, duplicate words are removed so that only unique words remain. Finally, the words are sorted, which makes manual processing, such as classifying the words, easier. The lexicon files for each word class are organized according to the inflection and derivation types of the words.
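The three steps can be sketched in a few lines. The sample sentences, the punctuation set and the function name `build_lexicon` are assumptions for illustration; words are split on whitespace rather than with a `\w` regular expression because Devanagari vowel signs are combining marks that `\w` does not match.

```python
PUNCTUATION = ".,!?;:\"'()।|"  # assumed set; includes the Devanagari danda


def build_lexicon(sentences):
    """Extract words, drop duplicates, and sort them."""
    # Step 1: extract the words from each sentence.
    words = []
    for sentence in sentences:
        for token in sentence.split():
            word = token.strip(PUNCTUATION)
            if word:
                words.append(word)
    # Step 2: remove duplicates. Step 3: sort (by Unicode code point).
    return sorted(set(words))


corpus = ["राम स्कूल जाता है।", "सीता स्कूल जाती है।"]
for word in build_lexicon(corpus):
    print(word)
```

Sorting by code point is the simplest choice; a dictionary-style ordering for Hindi would need a locale-aware collation.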
3. Morphological Analyzer
Figure: Architecture of the morphological processor
The analyzer takes a word in its surface form as input and produces the grammatical structure of the word, i.e. its lexical form. The generator works in the opposite direction: it takes the grammatical structure of a word (the lexical form) as input and produces the corresponding surface form.
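The two directions can be illustrated with one shared table that is read either way, which is the property an FST provides. The sketch below is a plain table lookup, not an FST implementation; the lexical-form notation ("lemma+N+Number+Case"), the two-word lexicon and the romanization are assumptions for this example. The endings are those of Hindi masculine nouns ending in -aa.

```python
# Endings for masculine nouns in -aa, keyed by (number, case).
ENDINGS = {
    ("Sg", "Dir"): "aa",
    ("Pl", "Dir"): "e",
    ("Sg", "Obl"): "e",
    ("Pl", "Obl"): "on",
}
LEXICON = {"ladkaa", "kamraa"}  # boy, room (hypothetical mini-lexicon)


def generate(lexical):
    """Lexical form -> surface form, e.g. 'ladkaa+N+Pl+Obl' -> 'ladkon'."""
    lemma, pos, number, case = lexical.split("+")
    if lemma not in LEXICON or pos != "N":
        raise ValueError(f"unknown lexical form: {lexical}")
    return lemma[:-2] + ENDINGS[(number, case)]


def analyze(surface):
    """Surface form -> all lexical forms that generate it (may be several)."""
    analyses = []
    for lemma in sorted(LEXICON):
        for (number, case), ending in ENDINGS.items():
            if lemma[:-2] + ending == surface:
                analyses.append(f"{lemma}+N+{number}+{case}")
    return analyses


print(generate("ladkaa+N+Pl+Obl"))
print(analyze("ladke"))  # ambiguous: direct plural or oblique singular
```

Because analysis is defined as the inverse of generation, every analysis returned for a surface form generates that same form again.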
4. Shallow Parsing by CFG
• A context-free grammar (CFG) is a 4-tuple <N, E, R, S>:
  • N is a set of non-terminals (e.g. N = {S, NP, VP, PP, Noun, Verb, ...})
  • E is a set of terminals (e.g. E = {In, the, popular, mythology, computer, is, a, mathematics, machine})
  • R is a set of rules, each rewriting one non-terminal as a sequence of terminals and non-terminals
  • S is the start symbol (sentence), a member of N
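A small sketch may help show how the four components fit together. It represents the 4-tuple as a Python data class and checks whether a word sequence is derivable from S using the CYK algorithm, which requires rules in Chomsky normal form (each rule rewrites to two non-terminals or one terminal). The toy English grammar and all names are assumptions for illustration; the project's hand-crafted Hindi rules are not given in the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    nonterminals: frozenset   # N
    terminals: frozenset      # E
    rules: tuple              # R: pairs (lhs, rhs), rhs is a tuple of symbols
    start: str                # S


TOY_GRAMMAR = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "Noun", "Verb"}),
    terminals=frozenset({"the", "a", "computer", "machine", "is"}),
    rules=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "Noun")),
        ("VP", ("Verb", "NP")),
        ("Det", ("the",)), ("Det", ("a",)),
        ("Noun", ("computer",)), ("Noun", ("machine",)),
        ("Verb", ("is",)),
    ),
    start="S",
)


def recognizes(grammar, words):
    """CYK recognition: True if the start symbol derives the word sequence."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the non-terminals that derive words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i + 1] = {lhs for lhs, rhs in grammar.rules if rhs == (word,)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # split point between the two halves
                for lhs, rhs in grammar.rules:
                    if (len(rhs) == 2 and rhs[0] in table[i][k]
                            and rhs[1] in table[k][j]):
                        table[i][j].add(lhs)
    return grammar.start in table[0][n]


print(recognizes(TOY_GRAMMAR, "the computer is a machine".split()))
print(recognizes(TOY_GRAMMAR, "computer the is".split()))
```

For shallow parsing, only the phrase-level entries of the table (such as NP and VP spans) would be reported, rather than the full derivation.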
System Architecture
Flow Chart
Input: Ram School Jaata Hai.
Transliterator
Output 1: राम स्कूल जाता है।
Shallow Parser
Output 2: NP NP VP
NP – Noun Phrase
VP – Verb Phrase
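The second stage of the flow can be sketched for this one sentence as follows. The four-word lexicon, the coarse tags (NOUN, VERB, AUX) and the two chunking rules are assumptions chosen to reproduce the output shown above; they are not the project's grammar.

```python
# Hypothetical mini-lexicon mapping Devanagari words to coarse POS tags.
LEXICON = {"राम": "NOUN", "स्कूल": "NOUN", "जाता": "VERB", "है": "AUX"}


def shallow_parse(sentence):
    """Return the sequence of chunk labels for a Devanagari sentence."""
    tokens = sentence.replace("।", " ").split()   # drop the danda, split words
    tags = [LEXICON.get(token, "UNK") for token in tokens]
    chunks = []
    i = 0
    while i < len(tags):
        if tags[i] == "NOUN":
            chunks.append("NP")        # assumed rule: each noun is its own NP
            i += 1
        elif tags[i] == "VERB":
            i += 1
            while i < len(tags) and tags[i] == "AUX":
                i += 1                 # assumed rule: verb plus auxiliaries = VP
            chunks.append("VP")
        else:
            chunks.append(tags[i])     # leave anything else unchunked
            i += 1
    return chunks


print(" ".join(shallow_parse("राम स्कूल जाता है।")))
```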
Findings and Conclusion
• It is challenging to translate names and technical terms across languages with different alphabets and sound inventories. These items are commonly transliterated, i.e. replaced with approximate phonetic equivalents. An efficient shallow parser for Hindi is needed to build a full-blown parser.
• Since proper nouns and technical terms, which need phonetic translation, are part of most text documents, transliteration is an important problem to study.
• We found only a few shallow parsers for Hindi.
• We analysed different approaches to creating a shallow parser.
• Parsing by CFG is the approach we used.
• This approach is labour-intensive because the rules are crafted manually.
References
• "Transliterated Search using Syllabification Approach" by Hardik Joshi, Apurva Bhatt and Honey Patel
• "Transliteration Systems Across Indian Languages Using Parallel Corpora" by Rishabh Srivastava and Riyaz Ahmad Bhat
• "Semi-Supervised Learning of Hindi Morphology" by Teena Bajaj and Parteek Bhatia
• "Phonetically Rich Hindi Sentence Corpus for Creation of Speech Database" by Vishal Chourasia, Samudravijaya K and Manohar Chandwani
