SlideShare a Scribd company logo
1 of 7
Download to read offline
Module
          13
Natural Language
      Processing
        Version 2 CSE IIT, Kharagpur
13.1 Instructional Objective
•   The students should understand the necessity of natural language processing in
    building an intelligent system
•   Students should understand the difference between natural and formal language and
    the difficulty in processing the former
•   Students should understand the ambiguities that arise in natural language processing
•   Students should understand the language information required like like
        o Phonology
        o Morphology
        o Syntax
        o Semantic
        o Discourse
        o World knowledge
•   Students should understand the steps involved in natural language understanding and
    generation
•   The student should be familiar with basic language processing operations like
        o Morphological analysis
        o Parts-of-Speech tagging
        o Lexical processing
        o Semantic processing
        o Knowledge representation

At the end of this lesson the student should be able to do the following:
    • Design the processing steps required for a NLP task
    • Implement the processing techniques.




                                                            Version 2 CSE IIT, Kharagpur
Lesson
        40
Issues in NLP
    Version 2 CSE IIT, Kharagpur
13.1 Natural Language Processing
Natural Language Processing (NLP) is the process of computer analysis of input provided
in a human language (natural language), and conversion of this input into a useful form of
representation.

The field of NLP is primarily concerned with getting computers to perform useful and
interesting tasks with human languages. The field of NLP is secondarily concerned with
helping us come to a better understanding of human language.

   •   The input/output of a NLP system can be:
           – written text
           – speech
   •   We will mostly concerned with written text (not speech).
   •   To process written text, we need:
           – lexical, syntactic, semantic knowledge about the language
           – discourse information, real world knowledge
   •   To process spoken language, we need everything required to process written text,
       plus the challenges of speech recognition and speech synthesis.

There are two components of NLP.

   •   Natural Language Understanding
          – Mapping the given input in the natural language into a useful
             representation.
          – Different level of analysis required:
             morphological analysis,
             syntactic analysis,
             semantic analysis,
             discourse analysis, …
   •   Natural Language Generation
          – Producing output in the natural language from some internal
             representation.
          – Different level of synthesis required:
             deep planning (what to say),
             syntactic generation
   •   NL Understanding is much harder than NL Generation. But, still both of them are
       hard.

The difficulty in NL understanding arises from the following facts:

   •   Natural language is extremely rich in form and structure, and very ambiguous.
          – How to represent meaning,
          – Which structures map to which meaning structures.
   •   One input can mean many different things. Ambiguity can be at different levels.

                                                           Version 2 CSE IIT, Kharagpur
–    Lexical (word level) ambiguity -- different meanings of words
          –    Syntactic ambiguity -- different ways to parse the sentence
          –    Interpreting partial information -- how to interpret pronouns
          –    Contextual information -- context of the sentence may affect the meaning
               of that sentence.
   •   Many input can mean the same thing.
   •   Interaction among components of the input is not clear.

The following language related information are useful in NLP:

   •   Phonology – concerns how words are related to the sounds that realize them.

   •   Morphology – concerns how words are constructed from more        basic meaning
       units called morphemes. A morpheme is the primitive unit of meaning in a
       language.

   •   Syntax – concerns how can be put together to form correct sentences and
       determines what structural role each word plays in the sentence and what phrases
       are subparts of other phrases.

   •   Semantics – concerns what words mean and how these meaning combine in
       sentences to form sentence meaning. The study of context-independent meaning.

   •   Pragmatics – concerns how sentences are used in different situations and how
       use affects the interpretation of the sentence.

   •   Discourse – concerns how the immediately preceding sentences affect the
       interpretation of the next sentence. For example, interpreting pronouns and
       interpreting the temporal aspects of the information.

   •   World Knowledge – includes general knowledge about the world. What each
       language user must know about the other’s beliefs and goals.


13.1.1 Ambiguity

I made her duck.

   •   How many different interpretations does this sentence have?
   •   What are the reasons for the ambiguity?
   •   The categories of knowledge of language can be thought of as ambiguity
       resolving components.
   •   How can each ambiguous piece be resolved?
   •   Does speech input make the sentence even more ambiguous?
           – Yes – deciding word boundaries
   •   Some interpretations of : I made her duck.

                                                         Version 2 CSE IIT, Kharagpur
1. I cooked duck for her.
           2. I cooked duck belonging to her.
           3. I created a toy duck which she owns.
           4. I caused her to quickly lower her head or body.
           5. I used magic and turned her into a duck.
   •   duck – morphologically and syntactically ambiguous:
               noun or verb.
   •   her – syntactically ambiguous: dative or possessive.
   •   make – semantically ambiguous: cook or create.
   •   make – syntactically ambiguous:
           – Transitive – takes a direct object. => 2
           – Di-transitive – takes two objects. => 5
           – Takes a direct object and a verb. => 4

Ambiguities are resolved using the following methods.

   •   models and algorithms are introduced to resolve ambiguities at different levels.
   •   part-of-speech tagging -- Deciding whether duck is verb or noun.
   •   word-sense disambiguation -- Deciding whether make is create or cook.
   •   lexical disambiguation -- Resolution of part-of-speech and        word-sense
       ambiguities are two important kinds of lexical disambiguation.
   •   syntactic ambiguity -- her duck is an example of syntactic ambiguity, and can be
       addressed by probabilistic parsing.

13.1.2 Models to represent Linguistic Knowledge

   •   We will use certain formalisms (models) to represent the required linguistic
       knowledge.
   •   State Machines -- FSAs, FSTs, HMMs, ATNs, RTNs
   •   Formal Rule Systems -- Context Free Grammars, Unification Grammars,
       Probabilistic CFGs.
   •   Logic-based Formalisms -- first order predicate logic, some higher order logic.
   •   Models of Uncertainty -- Bayesian probability theory.

13.1.3 Algorithms to Manipulate Linguistic Knowledge

   •   We will use algorithms to manipulate the models of linguistic knowledge to
       produce the desired behavior.
   •   Most of the algorithms we will study are transducers and parsers.
           – These algorithms construct some structure based on their input.
   •   Since the language is ambiguous at all levels,
       these algorithms are never simple processes.
   •   Categories of most algorithms that will be used can fall into following categories.
           – state space search
           – dynamic programming


                                                           Version 2 CSE IIT, Kharagpur
13.2 Natural Language Understanding
The steps in natural language understanding are as follows:

           Words

Morphological Analysis

           Morphologically analyzed words (another step: POS tagging)

Syntactic Analysis

           Syntactic Structure

Semantic Analysis

           Context-independent meaning representation

Discourse Processing

            Final meaning representation




                                                          Version 2 CSE IIT, Kharagpur

More Related Content

What's hot

Natural language processing
Natural language processingNatural language processing
Natural language processingBasha Chand
 
Natural language processing
Natural language processingNatural language processing
Natural language processingKarenVacca
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingSaurabh Kaushik
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingYasir Khan
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)Kuppusamy P
 
Natural language processing
Natural language processingNatural language processing
Natural language processingYogendra Tamang
 
Hidden markov model based part of speech tagger for sinhala language
Hidden markov model based part of speech tagger for sinhala languageHidden markov model based part of speech tagger for sinhala language
Hidden markov model based part of speech tagger for sinhala languageijnlc
 
Analysing interlanguage: how do we know what learners know?
Analysing interlanguage: how do we know what learners know?Analysing interlanguage: how do we know what learners know?
Analysing interlanguage: how do we know what learners know?Pun Yanut
 
Processing Written English
Processing Written EnglishProcessing Written English
Processing Written EnglishRuel Montefolka
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4DigiGurukul
 
Spotting The Difference–Machine Versus Human Translation
Spotting The Difference–Machine Versus Human TranslationSpotting The Difference–Machine Versus Human Translation
Spotting The Difference–Machine Versus Human TranslationUlatus
 
Warnikchow - SAIT - 0529
Warnikchow - SAIT - 0529Warnikchow - SAIT - 0529
Warnikchow - SAIT - 0529WarNik Chow
 
Roger t bell
Roger t bellRoger t bell
Roger t bellnobedi12
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognitionArif A.
 
Natural Language Processing seminar review
Natural Language Processing seminar review Natural Language Processing seminar review
Natural Language Processing seminar review Jayneel Vora
 

What's hot (20)

Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Natural language processing (nlp)
Natural language processing (nlp)Natural language processing (nlp)
Natural language processing (nlp)
 
Natural language processing
Natural language processingNatural language processing
Natural language processing
 
Hidden markov model based part of speech tagger for sinhala language
Hidden markov model based part of speech tagger for sinhala languageHidden markov model based part of speech tagger for sinhala language
Hidden markov model based part of speech tagger for sinhala language
 
Analysing interlanguage: how do we know what learners know?
Analysing interlanguage: how do we know what learners know?Analysing interlanguage: how do we know what learners know?
Analysing interlanguage: how do we know what learners know?
 
Processing Written English
Processing Written EnglishProcessing Written English
Processing Written English
 
Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4Artificial Intelligence Notes Unit 4
Artificial Intelligence Notes Unit 4
 
NLP_KASHK: Introduction
NLP_KASHK: Introduction NLP_KASHK: Introduction
NLP_KASHK: Introduction
 
Spotting The Difference–Machine Versus Human Translation
Spotting The Difference–Machine Versus Human TranslationSpotting The Difference–Machine Versus Human Translation
Spotting The Difference–Machine Versus Human Translation
 
ReseachPaper
ReseachPaperReseachPaper
ReseachPaper
 
Warnikchow - SAIT - 0529
Warnikchow - SAIT - 0529Warnikchow - SAIT - 0529
Warnikchow - SAIT - 0529
 
Roger t bell
Roger t bellRoger t bell
Roger t bell
 
Psycolinguistic
PsycolinguisticPsycolinguistic
Psycolinguistic
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Natural Language Processing seminar review
Natural Language Processing seminar review Natural Language Processing seminar review
Natural Language Processing seminar review
 
Interlanguage7777
Interlanguage7777Interlanguage7777
Interlanguage7777
 

Similar to Lesson 40

CNN for NLP using text analysis by using deep learning
CNN for NLP using text analysis by using deep learningCNN for NLP using text analysis by using deep learning
CNN for NLP using text analysis by using deep learningKv Sagar
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxSHIBDASDUTTA
 
natural language processing help at myassignmenthelp.net
natural language processing  help at myassignmenthelp.netnatural language processing  help at myassignmenthelp.net
natural language processing help at myassignmenthelp.netwww.myassignmenthelp.net
 
Natural Language Processing Course in AI
Natural Language Processing Course in AINatural Language Processing Course in AI
Natural Language Processing Course in AISATHYANARAYANAKB
 
nlp-01.pptxvvvffffffvvvvvfeddeeddffffffffff
nlp-01.pptxvvvffffffvvvvvfeddeeddffffffffffnlp-01.pptxvvvffffffvvvvvfeddeeddffffffffff
nlp-01.pptxvvvffffffvvvvvfeddeeddffffffffffSushantVyas1
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)Abdullah al Mamun
 
LARG-20010118-Natasha e wejkwrlkwr klwrlknrklnr k.ppt
LARG-20010118-Natasha e wejkwrlkwr klwrlknrklnr k.pptLARG-20010118-Natasha e wejkwrlkwr klwrlknrklnr k.ppt
LARG-20010118-Natasha e wejkwrlkwr klwrlknrklnr k.pptMonsefJraid
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptOlusolaTop
 
Teaching reading
Teaching readingTeaching reading
Teaching readingDanny Guo
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingRishikese MR
 
Sla glossary
Sla glossarySla glossary
Sla glossarygyhsarah
 
Eng19 week 6 (aural comprehension instruction2)
Eng19 week 6 (aural comprehension instruction2)Eng19 week 6 (aural comprehension instruction2)
Eng19 week 6 (aural comprehension instruction2)leolita
 
Transformational grammar
Transformational grammarTransformational grammar
Transformational grammarJack Feng
 

Similar to Lesson 40 (20)

L1 nlp intro
L1 nlp introL1 nlp intro
L1 nlp intro
 
CNN for NLP using text analysis by using deep learning
CNN for NLP using text analysis by using deep learningCNN for NLP using text analysis by using deep learning
CNN for NLP using text analysis by using deep learning
 
Natural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptxNatural Language Processing (NLP).pptx
Natural Language Processing (NLP).pptx
 
AI Lesson 41
AI Lesson 41AI Lesson 41
AI Lesson 41
 
natural language processing help at myassignmenthelp.net
natural language processing  help at myassignmenthelp.netnatural language processing  help at myassignmenthelp.net
natural language processing help at myassignmenthelp.net
 
AI - natural language processing
AI - natural language processingAI - natural language processing
AI - natural language processing
 
Natural Language Processing Course in AI
Natural Language Processing Course in AINatural Language Processing Course in AI
Natural Language Processing Course in AI
 
1 Introduction.ppt
1 Introduction.ppt1 Introduction.ppt
1 Introduction.ppt
 
nlp-01.pptxvvvffffffvvvvvfeddeeddffffffffff
nlp-01.pptxvvvffffffvvvvvfeddeeddffffffffffnlp-01.pptxvvvffffffvvvvvfeddeeddffffffffff
nlp-01.pptxvvvffffffvvvvvfeddeeddffffffffff
 
Lesson 41.pdf
Lesson 41.pdfLesson 41.pdf
Lesson 41.pdf
 
English for Specific Purposes
English for Specific PurposesEnglish for Specific Purposes
English for Specific Purposes
 
Ch9
Ch9Ch9
Ch9
 
Natural Language Processing (NLP)
Natural Language Processing (NLP)Natural Language Processing (NLP)
Natural Language Processing (NLP)
 
LARG-20010118-Natasha e wejkwrlkwr klwrlknrklnr k.ppt
LARG-20010118-Natasha e wejkwrlkwr klwrlknrklnr k.pptLARG-20010118-Natasha e wejkwrlkwr klwrlknrklnr k.ppt
LARG-20010118-Natasha e wejkwrlkwr klwrlknrklnr k.ppt
 
NLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.pptNLP introduced and in 47 slides Lecture 1.ppt
NLP introduced and in 47 slides Lecture 1.ppt
 
Teaching reading
Teaching readingTeaching reading
Teaching reading
 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
 
Sla glossary
Sla glossarySla glossary
Sla glossary
 
Eng19 week 6 (aural comprehension instruction2)
Eng19 week 6 (aural comprehension instruction2)Eng19 week 6 (aural comprehension instruction2)
Eng19 week 6 (aural comprehension instruction2)
 
Transformational grammar
Transformational grammarTransformational grammar
Transformational grammar
 

More from Avijit Kumar (20)

Lesson 18
Lesson 18Lesson 18
Lesson 18
 
Lesson 19
Lesson 19Lesson 19
Lesson 19
 
Lesson 20
Lesson 20Lesson 20
Lesson 20
 
Lesson 21
Lesson 21Lesson 21
Lesson 21
 
Lesson 23
Lesson 23Lesson 23
Lesson 23
 
Lesson 25
Lesson 25Lesson 25
Lesson 25
 
Lesson 24
Lesson 24Lesson 24
Lesson 24
 
Lesson 22
Lesson 22Lesson 22
Lesson 22
 
Lesson 26
Lesson 26Lesson 26
Lesson 26
 
Lesson 27
Lesson 27Lesson 27
Lesson 27
 
Lesson 28
Lesson 28Lesson 28
Lesson 28
 
Lesson 29
Lesson 29Lesson 29
Lesson 29
 
Lesson 30
Lesson 30Lesson 30
Lesson 30
 
Lesson 31
Lesson 31Lesson 31
Lesson 31
 
Lesson 32
Lesson 32Lesson 32
Lesson 32
 
Lesson 33
Lesson 33Lesson 33
Lesson 33
 
Lesson 36
Lesson 36Lesson 36
Lesson 36
 
Lesson 35
Lesson 35Lesson 35
Lesson 35
 
Lesson 37
Lesson 37Lesson 37
Lesson 37
 
Lesson 39
Lesson 39Lesson 39
Lesson 39
 

Recently uploaded

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 

Recently uploaded (20)

Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 

Lesson 40

  • 1. Module 13 Natural Language Processing Version 2 CSE IIT, Kharagpur
  • 2. 13.1 Instructional Objective • The students should understand the necessity of natural language processing in building an intelligent system • Students should understand the difference between natural and formal language and the difficulty in processing the former • Students should understand the ambiguities that arise in natural language processing • Students should understand the language information required like like o Phonology o Morphology o Syntax o Semantic o Discourse o World knowledge • Students should understand the steps involved in natural language understanding and generation • The student should be familiar with basic language processing operations like o Morphological analysis o Parts-of-Speech tagging o Lexical processing o Semantic processing o Knowledge representation At the end of this lesson the student should be able to do the following: • Design the processing steps required for a NLP task • Implement the processing techniques. Version 2 CSE IIT, Kharagpur
  • 3. Lesson 40 Issues in NLP Version 2 CSE IIT, Kharagpur
  • 4. 13.1 Natural Language Processing Natural Language Processing (NLP) is the process of computer analysis of input provided in a human language (natural language), and conversion of this input into a useful form of representation. The field of NLP is primarily concerned with getting computers to perform useful and interesting tasks with human languages. The field of NLP is secondarily concerned with helping us come to a better understanding of human language. • The input/output of a NLP system can be: – written text – speech • We will mostly concerned with written text (not speech). • To process written text, we need: – lexical, syntactic, semantic knowledge about the language – discourse information, real world knowledge • To process spoken language, we need everything required to process written text, plus the challenges of speech recognition and speech synthesis. There are two components of NLP. • Natural Language Understanding – Mapping the given input in the natural language into a useful representation. – Different level of analysis required: morphological analysis, syntactic analysis, semantic analysis, discourse analysis, … • Natural Language Generation – Producing output in the natural language from some internal representation. – Different level of synthesis required: deep planning (what to say), syntactic generation • NL Understanding is much harder than NL Generation. But, still both of them are hard. The difficulty in NL understanding arises from the following facts: • Natural language is extremely rich in form and structure, and very ambiguous. – How to represent meaning, – Which structures map to which meaning structures. • One input can mean many different things. Ambiguity can be at different levels. Version 2 CSE IIT, Kharagpur
  • 5. Lexical (word level) ambiguity -- different meanings of words – Syntactic ambiguity -- different ways to parse the sentence – Interpreting partial information -- how to interpret pronouns – Contextual information -- context of the sentence may affect the meaning of that sentence. • Many input can mean the same thing. • Interaction among components of the input is not clear. The following language related information are useful in NLP: • Phonology – concerns how words are related to the sounds that realize them. • Morphology – concerns how words are constructed from more basic meaning units called morphemes. A morpheme is the primitive unit of meaning in a language. • Syntax – concerns how can be put together to form correct sentences and determines what structural role each word plays in the sentence and what phrases are subparts of other phrases. • Semantics – concerns what words mean and how these meaning combine in sentences to form sentence meaning. The study of context-independent meaning. • Pragmatics – concerns how sentences are used in different situations and how use affects the interpretation of the sentence. • Discourse – concerns how the immediately preceding sentences affect the interpretation of the next sentence. For example, interpreting pronouns and interpreting the temporal aspects of the information. • World Knowledge – includes general knowledge about the world. What each language user must know about the other’s beliefs and goals. 13.1.1 Ambiguity I made her duck. • How many different interpretations does this sentence have? • What are the reasons for the ambiguity? • The categories of knowledge of language can be thought of as ambiguity resolving components. • How can each ambiguous piece be resolved? • Does speech input make the sentence even more ambiguous? – Yes – deciding word boundaries • Some interpretations of : I made her duck. Version 2 CSE IIT, Kharagpur
  • 6. 1. I cooked duck for her. 2. I cooked duck belonging to her. 3. I created a toy duck which she owns. 4. I caused her to quickly lower her head or body. 5. I used magic and turned her into a duck. • duck – morphologically and syntactically ambiguous: noun or verb. • her – syntactically ambiguous: dative or possessive. • make – semantically ambiguous: cook or create. • make – syntactically ambiguous: – Transitive – takes a direct object. => 2 – Di-transitive – takes two objects. => 5 – Takes a direct object and a verb. => 4 Ambiguities are resolved using the following methods. • models and algorithms are introduced to resolve ambiguities at different levels. • part-of-speech tagging -- Deciding whether duck is verb or noun. • word-sense disambiguation -- Deciding whether make is create or cook. • lexical disambiguation -- Resolution of part-of-speech and word-sense ambiguities are two important kinds of lexical disambiguation. • syntactic ambiguity -- her duck is an example of syntactic ambiguity, and can be addressed by probabilistic parsing. 13.1.2 Models to represent Linguistic Knowledge • We will use certain formalisms (models) to represent the required linguistic knowledge. • State Machines -- FSAs, FSTs, HMMs, ATNs, RTNs • Formal Rule Systems -- Context Free Grammars, Unification Grammars, Probabilistic CFGs. • Logic-based Formalisms -- first order predicate logic, some higher order logic. • Models of Uncertainty -- Bayesian probability theory. 13.1.3 Algorithms to Manipulate Linguistic Knowledge • We will use algorithms to manipulate the models of linguistic knowledge to produce the desired behavior. • Most of the algorithms we will study are transducers and parsers. – These algorithms construct some structure based on their input. • Since the language is ambiguous at all levels, these algorithms are never simple processes. • Categories of most algorithms that will be used can fall into following categories. – state space search – dynamic programming Version 2 CSE IIT, Kharagpur
  • 7. 13.2 Natural Language Understanding The steps in natural language understanding are as follows: Words Morphological Analysis Morphologically analyzed words (another step: POS tagging) Syntactic Analysis Syntactic Structure Semantic Analysis Context-independent meaning representation Discourse Processing Final meaning representation Version 2 CSE IIT, Kharagpur