CHALLENGES IN BUILDING
NATURAL LANGUAGE PROCESSING
APPLICATIONS FOR
नेपाली LANGUAGE
- Chandan Goopta
Unicode number: U+0915
HTML-code: &#2325; (renders as क)
NATURAL LANGUAGE PROCESSING
NLP Task | English | Indic Languages | Nepali
Machine Translation | Very Good | Good | Very Poor (Google/Microsoft)
Named Entity Recognition | Very Good | Fair | None (little groundwork)
Optical Character Recognition | Very Good | Poor | Very Poor
POS Tagging | Good | Poor | Very Poor
Sentiment Analysis | Very Good | Fair | Poor (work ongoing)
Speech Recognition | Good | Poor | None (Google's work in progress)
What So Far?
SENTIMENT ANALYSIS
• Chunking | Sentence Chunker
• Tagging | POS Tagger
• Resources | SentiWordNet, Subjectivity WordList
• Machine Learning | Corpus, Tagged Samples
Build Everything from Scratch
OR
I CAN USE ENGLISH LANGUAGE RESOURCES FOR NEPALI
SENTIMENT ANALYSIS
• Chunking | Sentence Chunker
• Tagging | POS Tagger
• Resources | SentiWordNet, Subjectivity WordList
• Machine Learning | Corpus, Tagged Samples
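The idea of reusing English resources for Nepali sentiment analysis can be sketched roughly as follows. This is only a toy illustration, not the presenter's method: the bilingual dictionary and the polarity lexicon are tiny hypothetical stand-ins (a real system would need a proper Nepali-English dictionary and a resource such as SentiWordNet), and the names `NEPALI_TO_ENGLISH`, `ENGLISH_POLARITY`, `tokenize`, and `sentiment_score` are all assumptions made for this example.

```python
# Toy stand-ins, NOT real resources: every entry below is hypothetical
# and exists only to make the sketch runnable.
GOOD = "राम्रो"    # "good"
BAD = "नराम्रो"    # "bad"

NEPALI_TO_ENGLISH = {GOOD: "good", BAD: "bad"}

# Stand-in for an English polarity lexicon (e.g. scores derived from SentiWordNet).
ENGLISH_POLARITY = {"good": 1.0, "bad": -1.0}

PUNCTUATION = "।,.!?"  # includes the Devanagari danda


def tokenize(sentence):
    """Split on whitespace and strip sentence punctuation from each token."""
    tokens = (tok.strip(PUNCTUATION) for tok in sentence.split())
    return [tok for tok in tokens if tok]


def sentiment_score(sentence):
    """Sum the English polarity of every Nepali token that can be translated."""
    score = 0.0
    for token in tokenize(sentence):
        english = NEPALI_TO_ENGLISH.get(token)        # None if the word is unknown
        score += ENGLISH_POLARITY.get(english, 0.0)   # unknown words count as neutral
    return score
```

Word-by-word lookup ignores negation, inflection, and word sense, which is exactly where the chunker, POS tagger, and tagged corpus listed above would come in.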
I am like Others are Like Professors are Like
BACK TO CHALLENGES
• Unicode Rendering in Dev-tools
• Lack of Resources
• Very Little Previous Work/Research
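When a tool cannot render Devanagari, one workaround is to inspect the text by code point instead of by glyph. The sketch below uses only Python's standard `unicodedata` and `html` modules; the helper name `describe` and the constants are assumptions made for this example.

```python
import html
import unicodedata


def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


# The letter from the title slide, written as an escape so it survives
# editors and terminals that cannot display the script.
KA = "\u0915"

# A numeric HTML reference is the code point written in decimal.
KA_ENTITY = f"&#{ord(KA)};"

# "Nepali" in Devanagari, spelled out code point by code point.
NEPALI = "\u0928\u0947\u092a\u093e\u0932\u0940"

if __name__ == "__main__":
    for code, name in describe(NEPALI):
        print(code, name)
    # Round trip: the entity decodes back to the same character.
    print(KA_ENTITY, html.unescape(KA_ENTITY) == KA)
```

Note that the six code points in `NEPALI` display as only three visual clusters, because vowel signs combine with the preceding consonant. This is one reason string lengths and cursor positions can look wrong in tools that are not script-aware.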
WHY PYTHON?
“I have the students learn Python in our undergraduate and graduate Semantic Web courses. Why? Because basically there's nothing else with the flexibility and as many web libraries.”
–Prof. James A. Hendler, University of Maryland
WHY PYTHON?
• NLTK, although not the most efficient implementation, provides a lot of awesome tools to quickly prototype a hypothesis
Source: Quora
WHY PYTHON?
• Scipy + Numpy: Everything that isn't in NLTK is definitely in these libraries. If you want to use more advanced algorithms like Latent Semantic Indexing or Latent Dirichlet Allocation, Python has libraries to do that.
Source: Quora
WHY PYTHON?
• Python has really great XML/HTML parsing libraries such as Beautiful Soup and Scrape.py.
You can use these libraries to quickly scrape the web and generate large data sets to improve the performance of your models (because let's face it, big data trumps complexity).
Source: Quora
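As a rough illustration of pulling text out of a web page to build a data set, here is a minimal sketch. It deliberately uses the standard library's `html.parser` instead of Beautiful Soup so that it runs without extra installs; the class `TextExtractor`, the function `extract_text`, and the sample markup are all assumptions made for this example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIPPED_TAGS = ("script", "style")

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html_source):
    """Return the non-empty text fragments found in an HTML string."""
    parser = TextExtractor()
    parser.feed(html_source)
    parser.close()
    return parser.chunks


# Hypothetical page content; a real scraper would download this first.
SAMPLE = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>नेपाली</h1><p>Hello world</p></body></html>"
)

if __name__ == "__main__":
    print(extract_text(SAMPLE))
```

For messy real-world pages, a dedicated library such as Beautiful Soup is usually the more forgiving choice. The standard-library parser is used here only to keep the sketch self-contained.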
WHY PYTHON?
• Python has great web frameworks like Django/Pylons/Tornado.
If you invent a revolutionary sarcasm detector that can predict trends in the stock market, you can quickly integrate it into a web service, make millions, and buy a large island in a third-world country.
Source: Quora
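To give a feel for how little code it takes to put a model behind a web service, here is a minimal sketch written as a plain WSGI application, the interface that frameworks such as Django build on. It is not tied to any of the frameworks named above; `classify` is a hypothetical placeholder for a real model, and the `text` query parameter is an assumption made for this example.

```python
import json
from urllib.parse import parse_qs


def classify(text):
    """Hypothetical stand-in for a trained model: a trivial keyword rule."""
    return "positive" if "good" in text.lower().split() else "neutral"


def app(environ, start_response):
    """WSGI entry point: read ?text=... and return the label as JSON."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    text = query.get("text", [""])[0]
    body = json.dumps(
        {"text": text, "label": classify(text)}, ensure_ascii=False
    ).encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "application/json; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]
```

For local experiments, the app can be served with the standard library, for example `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`, and then queried at `/?text=good+movie`.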
WHY PYTHON?
• Consider your other options: It would not make sense to use a compiled language like C++/Java for this type of work unless you needed to increase performance (computational speed, not model accuracy).
As far as I can tell, Ruby is completely useless for any Machine Learning, Data Mining, or Natural Language Processing task. Maybe you could use Lisp, but at this point, Python has a larger ecosystem.
Source: Quora
THANK YOU
