Enviar pesquisa
Carregar
Ijetcas14 458
•
1 gostou
•
217 visualizações
Iasir Journals
Seguir
Educação
Tecnologia
Denunciar
Compartilhar
Denunciar
Compartilhar
1 de 4
Baixar agora
Baixar para ler offline
Recomendados
551 466-472
551 466-472
idescitation
Ey4301913917
Ey4301913917
IJERA Editor
Myanmar named entity corpus and its use in syllable-based neural named entity...
Myanmar named entity corpus and its use in syllable-based neural named entity...
IJECEIAES
Welcome to International Journal of Engineering Research and Development (IJERD)
Welcome to International Journal of Engineering Research and Development (IJERD)
IJERD Editor
Phonetic Recognition In Words For Persian Text To Speech Systems
Phonetic Recognition In Words For Persian Text To Speech Systems
paperpublications3
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
kevig
Natural language-processing
Natural language-processing
Hareem Naz
Tutorial - Speech Synthesis System
Tutorial - Speech Synthesis System
IJERA Editor
Mais conteúdo relacionado
Mais procurados
Segmentation Words for Speech Synthesis in Persian Language Based On Silence
Segmentation Words for Speech Synthesis in Persian Language Based On Silence
paperpublications3
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
kevig
Natural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine Learning
IJMER
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
ijnlc
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
kevig
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
kevig
Kannada Phonemes to Speech Dictionary: Statistical Approach
Kannada Phonemes to Speech Dictionary: Statistical Approach
IJERA Editor
Natural Language Processing from Object Automation
Natural Language Processing from Object Automation
Object Automation
Hidden markov model based part of speech tagger for sinhala language
Hidden markov model based part of speech tagger for sinhala language
ijnlc
NLPinAAC
NLPinAAC
Divya Sugumar
Transliteration by orthography or phonology for hindi and marathi to english ...
Transliteration by orthography or phonology for hindi and marathi to english ...
ijnlc
Sipij040305SPEECH EVALUATION WITH SPECIAL FOCUS ON CHILDREN SUFFERING FROM AP...
Sipij040305SPEECH EVALUATION WITH SPECIAL FOCUS ON CHILDREN SUFFERING FROM AP...
sipij
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
IOSR Journals
Translation Equivalence of Person Reference found in the Subtitle of Harry Po...
Translation Equivalence of Person Reference found in the Subtitle of Harry Po...
Eny Parina
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
csandit
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
kevig
Mais procurados
(16)
Segmentation Words for Speech Synthesis in Persian Language Based On Silence
Segmentation Words for Speech Synthesis in Persian Language Based On Silence
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
EXTRACTING LINGUISTIC SPEECH PATTERNS OF JAPANESE FICTIONAL CHARACTERS USING ...
Natural Language Ambiguity and its Effect on Machine Learning
Natural Language Ambiguity and its Effect on Machine Learning
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
INTEGRATION OF PHONOTACTIC FEATURES FOR LANGUAGE IDENTIFICATION ON CODE-SWITC...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
MORPHOLOGICAL ANALYZER USING THE BILSTM MODEL ONLY FOR JAPANESE HIRAGANA SENT...
Kannada Phonemes to Speech Dictionary: Statistical Approach
Kannada Phonemes to Speech Dictionary: Statistical Approach
Natural Language Processing from Object Automation
Natural Language Processing from Object Automation
Hidden markov model based part of speech tagger for sinhala language
Hidden markov model based part of speech tagger for sinhala language
NLPinAAC
NLPinAAC
Transliteration by orthography or phonology for hindi and marathi to english ...
Transliteration by orthography or phonology for hindi and marathi to english ...
Sipij040305SPEECH EVALUATION WITH SPECIAL FOCUS ON CHILDREN SUFFERING FROM AP...
Sipij040305SPEECH EVALUATION WITH SPECIAL FOCUS ON CHILDREN SUFFERING FROM AP...
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Implementation of English-Text to Marathi-Speech (ETMS) Synthesizer
Translation Equivalence of Person Reference found in the Subtitle of Harry Po...
Translation Equivalence of Person Reference found in the Subtitle of Harry Po...
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A SURVEY OF GRAMMAR CHECKERS FOR NATURAL LANGUAGES
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
A ROBUST THREE-STAGE HYBRID FRAMEWORK FOR ENGLISH TO BANGLA TRANSLITERATION
Destaque
Ijetcas14 516
Ijetcas14 516
Iasir Journals
Ijebea14 229
Ijebea14 229
Iasir Journals
Aijrfans14 293
Aijrfans14 293
Iasir Journals
Ijetcas14 468
Ijetcas14 468
Iasir Journals
Ijetcas14 472
Ijetcas14 472
Iasir Journals
Ijebea14 275
Ijebea14 275
Iasir Journals
Ijetcas14 580
Ijetcas14 580
Iasir Journals
Ijetcas14 483
Ijetcas14 483
Iasir Journals
Destaque
(8)
Ijetcas14 516
Ijetcas14 516
Ijebea14 229
Ijebea14 229
Aijrfans14 293
Aijrfans14 293
Ijetcas14 468
Ijetcas14 468
Ijetcas14 472
Ijetcas14 472
Ijebea14 275
Ijebea14 275
Ijetcas14 580
Ijetcas14 580
Ijetcas14 483
Ijetcas14 483
Semelhante a Ijetcas14 458
Nlp
Nlp
Nishanthini Mary
REPORT.doc
REPORT.doc
IswaryaPurushothaman1
Nlp ambiguity presentation
Nlp ambiguity presentation
Gurram Poorna Prudhvi
Natural language processing
Natural language processing
Saurav Aryal
Artificial Intelligence_NLP
Artificial Intelligence_NLP
ThenmozhiK5
AI UNIT-3 FINAL (1).pptx
AI UNIT-3 FINAL (1).pptx
prakashvs7
Natural Language Processing: State of The Art, Current Trends and Challenges
Natural Language Processing: State of The Art, Current Trends and Challenges
antonellarose
Technology Has Taken Over Students Learning And Communication
Technology Has Taken Over Students Learning And Communication
Lindsey Rivera
5. phases of nlp
5. phases of nlp
monircse2
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
kevig
Processing of Written Language
Processing of Written Language
Home and School
Natural Language Processing
Natural Language Processing
Mariana Soffer
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliterator
Shashank Shisodia
Nlp (1)
Nlp (1)
SubramanianMuthusamy3
Processing Written English
Processing Written English
Ruel Montefolka
Natural language processing PPT presentation
Natural language processing PPT presentation
Sai Mohith
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptx
SHIBDASDUTTA
Natural language processing
Natural language processing
prashantdahake
Natural Language Processing
Natural Language Processing
Rishikese MR
NLP
NLP
Shallote Dsouza
Semelhante a Ijetcas14 458
(20)
Nlp
Nlp
REPORT.doc
REPORT.doc
Nlp ambiguity presentation
Nlp ambiguity presentation
Natural language processing
Natural language processing
Artificial Intelligence_NLP
Artificial Intelligence_NLP
AI UNIT-3 FINAL (1).pptx
AI UNIT-3 FINAL (1).pptx
Natural Language Processing: State of The Art, Current Trends and Challenges
Natural Language Processing: State of The Art, Current Trends and Challenges
Technology Has Taken Over Students Learning And Communication
Technology Has Taken Over Students Learning And Communication
5. phases of nlp
5. phases of nlp
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
OPTIMIZE THE LEARNING RATE OF NEURAL ARCHITECTURE IN MYANMAR STEMMER
Processing of Written Language
Processing of Written Language
Natural Language Processing
Natural Language Processing
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliterator
Nlp (1)
Nlp (1)
Processing Written English
Processing Written English
Natural language processing PPT presentation
Natural language processing PPT presentation
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptx
Natural language processing
Natural language processing
Natural Language Processing
Natural Language Processing
NLP
NLP
Mais de Iasir Journals
ijetcas14 650
ijetcas14 650
Iasir Journals
Ijetcas14 648
Ijetcas14 648
Iasir Journals
Ijetcas14 647
Ijetcas14 647
Iasir Journals
Ijetcas14 643
Ijetcas14 643
Iasir Journals
Ijetcas14 641
Ijetcas14 641
Iasir Journals
Ijetcas14 639
Ijetcas14 639
Iasir Journals
Ijetcas14 632
Ijetcas14 632
Iasir Journals
Ijetcas14 624
Ijetcas14 624
Iasir Journals
Ijetcas14 619
Ijetcas14 619
Iasir Journals
Ijetcas14 615
Ijetcas14 615
Iasir Journals
Ijetcas14 608
Ijetcas14 608
Iasir Journals
Ijetcas14 605
Ijetcas14 605
Iasir Journals
Ijetcas14 604
Ijetcas14 604
Iasir Journals
Ijetcas14 598
Ijetcas14 598
Iasir Journals
Ijetcas14 594
Ijetcas14 594
Iasir Journals
Ijetcas14 593
Ijetcas14 593
Iasir Journals
Ijetcas14 591
Ijetcas14 591
Iasir Journals
Ijetcas14 589
Ijetcas14 589
Iasir Journals
Ijetcas14 585
Ijetcas14 585
Iasir Journals
Ijetcas14 584
Ijetcas14 584
Iasir Journals
Mais de Iasir Journals
(20)
ijetcas14 650
ijetcas14 650
Ijetcas14 648
Ijetcas14 648
Ijetcas14 647
Ijetcas14 647
Ijetcas14 643
Ijetcas14 643
Ijetcas14 641
Ijetcas14 641
Ijetcas14 639
Ijetcas14 639
Ijetcas14 632
Ijetcas14 632
Ijetcas14 624
Ijetcas14 624
Ijetcas14 619
Ijetcas14 619
Ijetcas14 615
Ijetcas14 615
Ijetcas14 608
Ijetcas14 608
Ijetcas14 605
Ijetcas14 605
Ijetcas14 604
Ijetcas14 604
Ijetcas14 598
Ijetcas14 598
Ijetcas14 594
Ijetcas14 594
Ijetcas14 593
Ijetcas14 593
Ijetcas14 591
Ijetcas14 591
Ijetcas14 589
Ijetcas14 589
Ijetcas14 585
Ijetcas14 585
Ijetcas14 584
Ijetcas14 584
Último
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
Yu Kanazawa / Osaka University
How to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 Sales
Celine George
Optical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptx
Purva Nikam
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
iammrhaywood
What is the Future of QuickBooks DeskTop?
What is the Future of QuickBooks DeskTop?
TechSoup
10 Topics For MBA Project Report [HR].pdf
10 Topics For MBA Project Report [HR].pdf
Jayanti Pande
Quality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICE
Sayali Powar
A gentle introduction to Artificial Intelligence
A gentle introduction to Artificial Intelligence
Apostolos Syropoulos
How to Send Emails From Odoo 17 Using Code
How to Send Emails From Odoo 17 Using Code
Celine George
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptx
Dr. Asif Anas
How to Create a Toggle Button in Odoo 17
How to Create a Toggle Button in Odoo 17
Celine George
Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.
raviapr7
March 2024 Directors Meeting, Division of Student Affairs and Academic Support
March 2024 Directors Meeting, Division of Student Affairs and Academic Support
University of South Carolina Division of Student Affairs and Academic Support
Prelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quiz
Conquiztadors- the Quiz Society of Sri Venkateswara College
Protein Structure - threading Protein modelling pptx
Protein Structure - threading Protein modelling pptx
vidhisharma994099
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
Nguyen Thanh Tu Collection
CapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptx
CapitolTechU
Department of Health Compounder Question Solution 2022.pdf
Department of Health Compounder Question Solution 2022.pdf
MohonDas
3.26.24 Race, the Draft, and the Vietnam War.pptx
3.26.24 Race, the Draft, and the Vietnam War.pptx
mary850239
Education and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptx
raviapr7
Último
(20)
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
P4C x ELT = P4ELT: Its Theoretical Background (Kanazawa, 2024 March).pdf
How to Manage Cross-Selling in Odoo 17 Sales
How to Manage Cross-Selling in Odoo 17 Sales
Optical Fibre and It's Applications.pptx
Optical Fibre and It's Applications.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
AUDIENCE THEORY -- FANDOM -- JENKINS.pptx
What is the Future of QuickBooks DeskTop?
What is the Future of QuickBooks DeskTop?
10 Topics For MBA Project Report [HR].pdf
10 Topics For MBA Project Report [HR].pdf
Quality Assurance_GOOD LABORATORY PRACTICE
Quality Assurance_GOOD LABORATORY PRACTICE
A gentle introduction to Artificial Intelligence
A gentle introduction to Artificial Intelligence
How to Send Emails From Odoo 17 Using Code
How to Send Emails From Odoo 17 Using Code
Ultra structure and life cycle of Plasmodium.pptx
Ultra structure and life cycle of Plasmodium.pptx
How to Create a Toggle Button in Odoo 17
How to Create a Toggle Button in Odoo 17
Drug Information Services- DIC and Sources.
Drug Information Services- DIC and Sources.
March 2024 Directors Meeting, Division of Student Affairs and Academic Support
March 2024 Directors Meeting, Division of Student Affairs and Academic Support
Prelims of Kant get Marx 2.0: a general politics quiz
Prelims of Kant get Marx 2.0: a general politics quiz
Protein Structure - threading Protein modelling pptx
Protein Structure - threading Protein modelling pptx
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CHUYÊN ĐỀ DẠY THÊM TIẾNG ANH LỚP 11 - GLOBAL SUCCESS - NĂM HỌC 2023-2024 - HK...
CapTechU Doctoral Presentation -March 2024 slides.pptx
CapTechU Doctoral Presentation -March 2024 slides.pptx
Department of Health Compounder Question Solution 2022.pdf
Department of Health Compounder Question Solution 2022.pdf
3.26.24 Race, the Draft, and the Vietnam War.pptx
3.26.24 Race, the Draft, and the Vietnam War.pptx
Education and training program in the hospital APR.pptx
Education and training program in the hospital APR.pptx
Ijetcas14 458
1.
International Association of
Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) . International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS) www.iasir.net IJETCAS 14- 458; © 2014, IJETCAS All Rights Reserved Page 437 ISSN (Print): 2279-0047 ISSN (Online): 2279-0055 Natural Language Processing Basic Tasks for Text Mining Dhanashree K. Thakur Department of Computer Engineering, Dr. Babasaheb Ambedkar Technological University, Lonere, Mangaon, 402 103, Dist: Raigad, Maharashtra, India _________________________________________________________________________________________ Abstract: Natural Language processing (NLP) is a field of computer science and linguistics concerned with the interactions between computers and human (natural) languages. NLP is among the most heavily research areas in computer science today. Historically, NLP has been concerned with developing machine translation and natural language generation systems. For processing natural language, we have to go through several phases such as from meaning of each word with its utterance to the meaning of whole sentence to paragraph. This paper contain different phases, tasks, applications and challenges of NLP and there use in real life. _________________________________________________________________________________________ I. Introduction Language is the primary means of communication used by humans. It is a tool we use to express the greater part of our ideas and emotions. It shapes thoughts, has a structure, and carries meaning. Learning new concepts and expressing ideas through them is so natural. But there must be some kind of representation in our mind, of the content of language. Natural languages are languages used by human to communicate with each other like English, Hindi, and Turkish other than computer languages such as C, C++, java, etc. The basic goal of Natural language Processing is to enable a person to communicate with a computer in a language that they use in their everyday life. Natural Language Processing (NLP) means processing of natural language text using computational models which uses different algorithms to solve different tasks of that language which is applied to computer machine or useful to make software which behaves with human like human being. It means that we can use these tools or machines by giving natural language commands. So, we can say that it is a field of computer science, human- computer interaction, Artificial Intelligence, Linguistics, Text Mining, etc. II. Phases of NLP It is not an easy task to teach a person or computer a natural language. The main problems are syntax, and understanding context to determine the meaning of a word. To interpret even simple phrases requires a vast amount of knowledge. In NLP, we have to consider text form of the Language and the content of it as Knowledge. Hence, to process a language means to process the content of it. As computer is not able to understand natural language, methods are developed to map its content in formal language. The Language and speech community on the other hand, considers a Language as a set of sounds, conveys meaning to listener. Therefore, to process natural language, we have to go through number of phases. Each phase contain particular ambiguity which is nothing but one thing has two meanings and in our NLP task, computer must understand exact meaning of that thing otherwise whole meaning of that paragraph may be change. A. Phonetics and Phonology Phonetics deals with the physical building blocks of a language sound system. E.g. sounds of 'k', ’t’ and 'e' in 'kite'. Phonology deals with organization of speech sounds within a language. E.g. (1) different k sounds in kite verses coat. (2) different t and p sounds in top verses pot. B. Lexical Analysis The simplest level which involve analysis of words. Word-level processing requires morphological Knowledge, i.e. Knowledge about structure and formation of words from basic units called morphemes. The rules for forming words from morphemes are language specific. It is also concerned with derivation of new words from existing ones, e.g. light-house (formed from light & house). C. Syntactic Analysis The next level which considers a sequence of words as a unit, usually a sentence, and its structure. It decomposes the sentence into word and identifies how they relate to each other. Not every sequence of words results in a sentence. For example, 'I went to the market' is valid sentence while 'went the market to' is not. Similarly, 'She is going to the market' is valid, but 'She are going to the market' is not. Thus, this level of analysis required detailed knowledge about rules of grammar. D. Semantic Analysis Semantic is associated with the meaning of language. Semantic analysis map natural language sentences into some representation of meaning. Defining meaning of components is difficult, as grammatically valid sentences can be meaningless. Example is, 'Colorless green ideas sleep furiously'. The sentence is syntactically correct but
2.
Dhanashree K. Thakur,
International Journal of Emerging Technologies in Computational and Applied Sciences, 8(5), March-May, 2014, pp. 437-440 IJETCAS 14- 458; © 2014, IJETCAS All Rights Reserved Page 438 semantically anomalous. Unfortunately, many words have several meanings, for example, the word ‘diamond’ might have the following set of meanings: (1) A geometrical shape with four equal sides. (2) A baseball field (3) An extremely hard and valuable gemstone To select the correct meaning for the word 'diamond' in the sentence "John saw Susan diamond shimmering from across the room." It is necessary to know that neither geometrical shapes nor baseball fields shimmer, whereas gemstones do. The concept of meaning is quite broad. A word can have number of possible meanings associated with it. But in a given context, only one of the meanings participates. The process of determining the correct meaning of an individual word is call word sense disambiguation or lexical disambiguation. It is done by associating, with each word in the lexicon, information about the contexts in which each of the words senses may appear. E. Discourse Analysis It is higher level of analysis attempt to interpret the structure and meaning of even larger units, e.g. at the paragraph and document level, in terms of words, phrases, clusters and sentences. It requires resolution of anaphoric reference and identification of discourse structure. That is knowledge of how the meaning of sentence is determined by preceding sentence. E.g. how a pronoun refers to the preceding noun and how to determine function of a sentence in a text. There are many important relationships that may hold between phrases and parts of their discourse context. Consider "`Bill had a red balloon. John wanted it."' The word 'it' should be identified as referring to the red balloon. References such as this are call anaphoric or anaphora. F. Pragmatic Analysis The highest level of processing is pragmatic analysis which deals with the purposeful use of sentences in situations. It requires knowledge of the world, i.e. knowledge that extends beyond the contents of the text. This is an additional stage of analysis concerned with the pragmatic use of the language. This is important in the understanding of texts and dialogues. For example, suppose teacher giving a lecture and forgot to bring book. So, he asked to one student to go in his cabin and to see if his book is there or not. Student went and came without bringing book and say that his book is there. It means that student cannot understand the situation. III. NLP Activities To natural language processing contain different basic tasks or activities required to process any language which are done by computer programming languages such as java, python, etc. These basic tasks of NLP are explained here. A. Tokenizer Tokenization is the process of breaking a stream of text into words, phrases, symbols, or other meaningful elements called tokens. The list of tokens becomes input for further processing such as parsing or text mining. For a language like English, this is fairly trivial, since words are usually separated by spaces. However, some written languages like Chinese and Japanese do not mark word boundaries in such a fashion, and in those languages text segmentation is a significant task requiring knowledge of the vocabulary and morphology of words in the language. E.g. Suppose a sentence is "Sony sang a song." Then tokens are 'Sony', 'sang', 'a', 'song' '.' and 3 space tokens. B. Sentence Splitter It segments the text into sentences. It assigns annotations of type split to sentence boundaries in text based on punctuation. A question mark or exclamation mark always ends a sentence. A period followed by an upper-case letter generally ends a sentence, but there are a number of exceptions. For example, if the period is part of an abbreviated title (e.g. "Mr.", "Gen."), it does not end of the sentence. A period following a single capitalized letter is assumed to be a person's initial, and is not considered the end of a sentence. Sentence Detector can detect that a punctuation character marks at the end of a sentence or not. The sample text below should be segmented into its sentences. Ex. Suppose text is "My friends are Sony and Sangita. Sony sang a song. She is very happy today." Then output is in three separate strings: “My friends are Sony and Sangita.” "Sony sang a song." "She is very happy today." C. POS Tagger It assigns parts of speech such as noun, verb, pronoun, preposition, adverb, adjective, etc. to each word in a sentence. The sequence of words in natural language sentence and specified tag sets is input to tagging algorithm. Number of tags in a tag set varies substantially. The Penn Treebank tag set contains 45 tags while C7 uses 164. Output is single best POS tag for each word. Ex. Suppose input sentence is "Sandesh sang a song." Then output is: Sandesh_NP sang_VBD a_ DT song_NN. Here NP for proper noun, VBD for verb in past tense, DT for determiner, NN for noun [3].
3.
Dhanashree K. Thakur,
International Journal of Emerging Technologies in Computational and Applied Sciences, 8(5), March-May, 2014, pp. 437-440 IJETCAS 14- 458; © 2014, IJETCAS All Rights Reserved Page 439 D. Named Entity Recognition Named Entity Recognition (NER) is a technology for recognizing proper nouns (entities) in text and associating them with the appropriate types. Common types in NER systems are location, person name, date, city, organization, etc. It uses Dictionary, lexicon or Gazetteer. E.g. Suppose input sentence is "Lalita sang a song. She is friend of Sangita. She lives in Mumbai." Then output is Lalita-person, Sangita-person, Mumbai-city. E. Corefernce Resolution In linguistics, coreference (sometimes written co-reference) occurs when two or more expressions in a text refer to the same person or thing; they have the same referent. Coreference is the main concept underlying binding phenomena in the field of syntax. The theory of binding explores the syntactic relationship that exists between coreferential expressions in sentences and texts. E.g. "Saniya and I bought a camera. We like capturing nature scenes." Here, 'We' refers to 'I' and 'Saniya'. Another E.g. is: "Neha bought a printer. She is printing now." Here, 'She' refers to 'Neha' not a 'printer'. Another E.g. is 'Neha bought a printer. It is printing now.' Here, 'It' refers to 'printer'. F. Chunker It divides a text in syntactically correlated parts of words, like noun groups, verb groups, but does not specify their internal structure nor their role in the main sentence. Chunking is an alternative to parsing that provides a partial syntactic structure of a sentence, with a limited tree depth, as opposed to parsing. E.g. Suppose input sentence is "Amey, Soniya and Tejas sang a song.", then chunking is: [NP Amey, Soniya and Tejas] [VP sang] [NP a song]. Here, NP is Noun Phrase and VP is Verb Phrase. G. Parser A parser is a software component that takes input data (frequently text) and builds a data structure often some kind of parse tree, abstract syntax tree or other hierarchical structure giving a structural representation of the input, checking for correct syntax in the process. Parsing or syntactic analysis is the process of analyzing a string of symbols, either in natural language or in computer languages, according to the rules of a formal grammar. Example is, "My dog also likes eating sausage.” after parsing, it gives output as: (S (NP (PRP My) (NN dog)) ((ADVP (RB also)) (VP (VBZ likes))) (VP (VBG eating) (NP (NN sausage) ))) Here, S-Sentence; PRP-Personal Pronoun; RB0–Particle; VB -Verb, gerund or present participle; VBG–Verb, 3rd person singular present; VBZ-Verb, third person singular present; NN-Noun, singular or mass. H. Document Reset Some tools can take large number of documents for annotations. So they may tend to employ weak referencing in the JVM, thus overriding garbage collection. Consequently there is need to explicitly delete resources mostly when using these documents. This has two consequences: 1) Old annotations residing in memory from a previous document may be taken into account when processing the next document. 2) The JVM heap will begin to fill up with old annotation sets generated from previous documents which can severely impact on performance. Hence the purpose of the Document Reset PR is to clean out any previous annotations, essentially releasing the annotation set objects for garbage collection. IV. Applications of NLP There are huge amounts of data in Internet, at least 20 billion pages. Applications for processing large amounts of texts require NLP expertise. Machine translation is the first application area of NLP. A. Speech Recognition This is the process of mapping acoustic speech signals to a set of words. The difficulties occur due to the wide variations in pronunciation of words (e.g. dear and deer). Example is to get flight information or book a hotel over the phone. B. Speech Synthesis Speech Synthesis refers to the automatic production of speech (utterance of natural language sentences) from text. Such systems can read out mails on telephone or even read out storybook. In order to generate utterances, text has to be processed. C. Information Extraction It captures and outputs factual information contained within a document and responds to the users information need. Example is to discover names of people and event they participate in, from a document. D. Information Retrieval
4.
Dhanashree K. Thakur,
International Journal of Emerging Technologies in Computational and Applied Sciences, 8(5), March-May, 2014, pp. 437-440 IJETCAS 14- 458; © 2014, IJETCAS All Rights Reserved Page 440 This is concerned with identifying documents relevant to user's query. It is activity of obtaining information resources relevant to an information need from a collection of information resources. Many universities and public libraries use IR systems to provide access to books, journals and other documents. Web search engines are most visible IR applications. E. Machine Translation This refers to the automatic translation of text from one human language to another. In order to carry out this translation, it is necessary to have an understanding of words and phrases, grammars of the both languages involved, semantics of languages and world knowledge. F. Natural Language Interfaces to Databases Natural language interfaces allow querying a structured database using natural language sentences. Intelligent databases systems are systems that manage information in natural way, making that information easy to access and use. This system translates a natural language sentence to database query. Anyone can gather information from the database by using such systems. G. Question Answering System Question Answering system find answers to natural language questions in a text collection or database. It requires not only precise analysis of questions and portions of texts but also semantic as well as background knowledge to answer certain type of question. H. Text Summarization This deals with the creation of summaries of documents. It generates a short biography from one or more news articles. Automatic summarization is the process of reducing a text document with a computer program in order to create summary that retains the most important points of the original document. As the problem of information overload is grown, and as quantity of data has increased, so has interest in text summarization. V. Some successful NLP Systems A. ELIZA (Weizenbaum 1966) It uses syntactic patterns to mimic human conversation with the user. Here is a sample conversation ELIZA: Hello, I am ELIZA. How may I help you? User: I am feeling a little bit sleepy. ELIZA: How long have you been feeling a little bit sleepy? User: For almost half an hour. ELIZA: Please go on. B. SysTran (System Translation) The first SysTran machine translation system was developed in 1969 for Russian-English translation. It also provided the first online machine translation service called Babel Fish, which is used by AltaVista search engines for handling translation requests from users. C. TAUM METEO This is natural language generation system used in Canada to generate weather reports. It accepts daily weather data and generates weather reports in English and French. D. SHRDLU (Winogard 1972) This is natural language understanding system that simulates actions of robot in a block word domain. The user can ask the robot to manipulate the blocks, to tell the blocks configuration, and to explain its reasoning. E. LUNAR (Woods 1977) This was an early question answering system that answered questions about moon rock. F. Google Translator Now-a-days, Google translator is one of the pioneer applications supporting a number of languages to translate from one to another. Although, it has been successfully implemented for many languages, but many languages still in developing phase. The others translators e.g. Yahoo Babel Fish, SDL Free Translation, Systran Language Translation etc. support multi language translation like Danish, English, Chinese, Italian, Japanese, French, Greek, Korean etc. VI. Conclusion Human languages are interesting & challenging and therefore NLP is one of the most comprehensive and most challenging tasks. NLP is among the most heavily researched areas in Computer Sciences today. NLP also has been part of the ongoing research in the field of information extraction. NLP programs can carry out a number of very interesting tasks. These programs have impacts on the way we communicate. Various tools are available for different NLP tasks. It is very difficult to develop perfect NLP System. So, development in this area is helpful for individual human, different companies and research areas. References [1] Tanveer Siddiqui, U. S. Tiwary, Natural Language Processing and Information Retrieval. [2] http://en.wikipedia.org/wiki/ Naturallanguageprocessing [3] https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html [4] http://www-nlp.stanford.edu/ [5] http://en.wikipedia.org/wiki/TSext_mining
Baixar agora