Introduction to Natural
Language Processing (NLP)
Presented By Wing Chan
Agenda
• What is NLP?
• How NLP works
• NLP Applications
• Project Demo - Switch
What's NLP?
• NLP stands for Natural Language Processing.
• It is a process that tries to understand or generate pieces of human language. NLP often uses AI techniques such as machine learning.
• It converts unstructured human data, such as text documents and voice recordings, into structured and meaningful knowledge. This is an important part of text mining / text analysis.
Basic Concepts in NLP
How NLP Works
An NLP pipeline typically performs tasks such as:
• Entity recognition - extracting entities from text documents
• Topic analysis - extracting features and topics
• Sentiment analysis - analyzing text for positive and negative feelings
• Classification - classifying documents
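Two of the tasks above, entity recognition and sentiment analysis, can be sketched with plain lookup tables. This is a toy illustration only: the word lists, labels and function names below are hypothetical, and real pipelines usually rely on trained statistical models rather than fixed dictionaries.

```python
import re

# Hypothetical lookup tables; a real pipeline would use a trained model instead.
GAZETTEER = {
    "google": "ORG",
    "netflix": "ORG",
    "chicago": "LOCATION",
}
POSITIVE = {"great", "good", "love"}
NEGATIVE = {"bad", "poor", "hate"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def find_entities(text):
    """Return (token, label) pairs for tokens found in the gazetteer."""
    return [(tok, GAZETTEER[tok]) for tok in tokenize(text) if tok in GAZETTEER]


def sentiment_score(text):
    """Count positive words minus negative words; above zero leans positive."""
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


sample = "I love the new Google office in Chicago, but the commute is bad."
entities = find_entities(sample)
score = sentiment_score(sample)
print(entities, score)
```

The sample sentence contains one positive and one negative word, so its score is neutral, which shows why simple word counting is only a starting point.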
List of common NLP Algorithms
NLP Applications
• Spell checkers and spam filters
• Recommendation systems, e.g. Netflix
• Voice assistants, e.g. Alexa, Siri
• Online search, e.g. Google
• Language translation, e.g. Google Translate
Project Demo - Switch
• Switch is a comprehensive job search engine built on content from Twitter. It lets job seekers search for relevant job postings submitted by Twitter users. We collect tweets related to job postings using Twitter's API and process them with Natural Language Processing (NLP) techniques to return relevant job results to our users.
• Website: http://switch-ui.s3-website.us-east-2.amazonaws.com
Technologies we used
• Twitter APIs - We used the Twitter API to collect data from Twitter.
• Amazon Web Services (AWS) - We used AWS to host our services. The project uses several AWS services, including Lambda functions, DynamoDB, Elasticsearch, S3 and CloudWatch Events.
• Python libraries - Tweepy, spaCy, Pandas, NumPy, re, json, requests
• Web development framework - We used React to build the end-user web portal.
• Other tools - Google Colab (Jupyter notebooks), Trello (Kanban board).
NLP techniques we used
• Named Entity Recognition - an information extraction technique that locates entities in unstructured text (e.g. tweet messages) and classifies them into predefined categories such as organization name and geographic location.
• Naive Bayes Classification - a simple text classification technique that builds a classifier by applying Bayes' theorem, assuming the features are independent of each other, to labeled training data. We used it to identify and discard irrelevant tweets in this project. It is a supervised learning method.
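The Naive Bayes idea can be sketched in a few lines of plain Python. This is a minimal multinomial Naive Bayes with Laplace smoothing, not the project's actual classifier: the class name, the tokenizer and the six training tweets are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep word-like tokens, hashtags and mentions."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.doc_counts[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # log P(label) plus the sum of log P(token | label)
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled tweets: "job" is relevant, "other" should be discarded.
texts = [
    "We are hiring a software engineer in Chicago, apply now",
    "Job opening: data analyst wanted, apply today",
    "Hiring nurses for our new clinic, send your resume",
    "Great job on the game last night team",
    "I love this new song so much",
    "Traffic is terrible today, what a day",
]
labels = ["job", "job", "job", "other", "other", "other"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("Now hiring a data engineer, apply with your resume"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a single unseen word from zeroing out a class.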
Architecture Diagram
Lessons learned
• Data exploration is important - be sure to understand your data by sampling it first.
• The default tweet message was truncated to 140 chars. We needed to request the full text message (up to 280 chars) instead.
• Valuable field values were sometimes missing, e.g. screen name (@interstate_batteries).
• Collecting good training data is hard.
• Understand your platform's limitations.
• AWS Lambda deployment package limit - 50 MB (zipped, for direct upload).
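The truncation issue above can be handled with a small helper that looks for the full text before falling back to the short field. This is a sketch under assumptions: it takes a tweet as a parsed JSON dict and relies on the field names `full_text`, `extended_tweet` and `retweeted_status` as used by Twitter API v1.1 payloads; the function name is hypothetical and the project's own handling may differ.

```python
def get_full_text(tweet):
    """Return the untruncated text of a tweet given as a dict (parsed JSON)."""
    # For retweets, the original tweet carries the complete text.
    if "retweeted_status" in tweet:
        return get_full_text(tweet["retweeted_status"])
    # Extended mode (tweet_mode="extended") puts the text in "full_text".
    if "full_text" in tweet:
        return tweet["full_text"]
    # Compatibility-mode payloads nest the text under "extended_tweet".
    extended = tweet.get("extended_tweet")
    if extended and "full_text" in extended:
        return extended["full_text"]
    # Fall back to the possibly truncated "text" field.
    return tweet.get("text", "")


# Hypothetical payload resembling a truncated tweet with an extended section.
example = {
    "text": "We are hiring! Apply at...",
    "extended_tweet": {"full_text": "We are hiring! Apply at our careers page today."},
}
print(get_full_text(example))
```

Returning the original tweet's text for retweets drops the "RT @user:" prefix, which suits text classification but may not suit every use.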
References
• Switch Web Portal: http://switch-ui.s3-website.us-east-2.amazonaws.com/
• Switch Source Code and Documentation: https://lab.textdata.org/wingkc2/switch
• Presentation Video: https://youtu.be/loMG4rdGXkg
