Real World NLP and ML
Devin Bost
Software Architect
devin.bost@imaginelearning.com
Questions welcome during
presentation
NLP
ML
Sentiment analysis
Automated essay scoring
Content summarization
Chatbots
Information retrieval
Cluster analysis
Language neural networks
Language translation
AI Big Data
http://eduardomagrani.com/en/we-are-big-data-new-technologies-and-personal-data-management/
Everything is
NLP
ML
Sentiment analysis
Automated essay scoring
Content summarization
Chatbots
Information retrieval
Cluster analysis
Language neural networks
Document categorization
AI Big Data
https://openi.nlm.nih.gov/detailedresult.php?img=PMC2841207_1471-2105-11-101-2&req=4
Penn Treebank example:
Meta-analysis of studies: Burns, G. A., Feng, D., & Hovy, E. (2008). Intelligent
approaches to mining the primary research literature:
techniques, systems, and examples. In Computational
Intelligence in Medical Informatics (pp. 17-50). Springer,
Berlin, Heidelberg. Retrieved from:
http://www.academia.edu/download/30797420/burns_feng_hovy_comp_intel-final.pdf
https://medium.com/@athif.shaffy/one-hot-encoding-of-text-b69124bef0a7
One-hot vectors:
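A minimal sketch of one-hot encoding, assuming a tiny made-up sentence as the whole corpus. The helper names `build_vocabulary` and `one_hot` are hypothetical and not from any library. Each word gets a vector as long as the vocabulary, with a single 1 in it, so every pair of distinct words looks equally unrelated.

```python
def build_vocabulary(tokens):
    """Map each distinct token to an integer index, in order of first appearance."""
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


def one_hot(token, vocab):
    """Return a list with a single 1 at the token's index and 0 everywhere else."""
    vector = [0] * len(vocab)
    vector[vocab[token]] = 1
    return vector


# Made-up example sentence; a real vocabulary would be far larger.
tokens = "the cat sat on the mat".split()
vocab = build_vocabulary(tokens)
vectors = {token: one_hot(token, vocab) for token in vocab}

# The dot product of any two different one-hot vectors is 0: no notion of similarity.
dot_cat_mat = sum(a * b for a, b in zip(vectors["cat"], vectors["mat"]))
```

The vector length grows with the vocabulary, which is why this representation becomes very large and sparse on real text.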
https://www.kdnuggets.com/2017/04/must-know-curse-dimensionality.html
The curse of dimensionality:
Statistical word embeddings: Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases
and their compositionality. In Advances in neural information processing systems (pp. 3111-3119). At:
https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
Cited by over 9,804 papers according to Google Scholar, as of 10/22/2018
Based on statistical relationships between words:
https://www.coursera.org/lecture/intro-to-deep-learning/word-embeddings-dhzl5
Images from: https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/
https://www.packtpub.com/mapt/book/big_data_and_business_intelligence/9781788398060/8/ch08lvl1sec56/mapping-with-word2vec-embeddings
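The linear relationships that embeddings capture can be illustrated with a toy sketch. The three-dimensional vectors below are hand-picked for illustration and are NOT trained embeddings; real ones are learned from text and have hundreds of dimensions. The function names `cosine` and `analogy` are hypothetical.

```python
import math

# Hand-picked toy vectors (an assumption for illustration, not learned from data).
toy_embeddings = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.0, 0.1, 0.1],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, embeddings):
    """Answer 'a is to b as c is to ?' with the word nearest to b - a + c."""
    target = [vb - va + vc
              for va, vb, vc in zip(embeddings[a], embeddings[b], embeddings[c])]
    candidates = (w for w in embeddings if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(embeddings[w], target))


# "man is to king as woman is to ?"
answer = analogy("man", "king", "woman", toy_embeddings)
```

With trained embeddings (for example from Gensim), the same vector arithmetic is applied to learned vectors instead of hand-written ones.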
SwiftKey:
So what are bigrams?
Examples of less useful bigrams:
Of the
what is
they are
to the
way to
hey you
Examples of useful bigrams:
New York
West Virginia
Imagine Learning
Imagine Math
Microsoft Office
Neural network
Ping pong
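One common way to separate pairs like "New York" from pairs like "to the" is pointwise mutual information (PMI). This is a technique swapped in here for illustration; the slides do not say which method was used. PMI compares how often two words appear together with how often they would if they were independent. The sentence and the `bigram_pmi` helper below are made up for the sketch.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Score adjacent word pairs by PMI, keeping pairs seen at least min_count times."""
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    n_words = len(tokens)
    n_pairs = max(n_words - 1, 1)
    scores = {}
    for (first, second), count in pair_counts.items():
        if count < min_count:
            continue  # rare pairs get inflated PMI, so drop them
        p_pair = count / n_pairs
        p_first = word_counts[first] / n_words
        p_second = word_counts[second] / n_words
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


# Made-up sentence in which "new york" and "to the" both occur twice.
tokens = ("we flew to new york and drove to the hotel "
          "in new york near to the park").split()
scores = bigram_pmi(tokens)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Here "to" also appears next to other words, so "to the" scores lower than "new york", whose words only ever appear together.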
The problem with student chat data:
Top 10 bigrams:
1. need help
2. back need
3. nice day
4. help nice
5. click back
6. please come
7. hear voice
8. type please
9. problem ask
10. ask find
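Pairs such as "back need" and "help nice" are not natural phrases, which suggests that stop words were removed before the words were paired. That is an assumption; the slides do not describe the preprocessing. A minimal sketch of such a pipeline, using made-up messages and a tiny hypothetical stop-word list:

```python
from collections import Counter

# Hypothetical stop-word list for illustration; real lists are much longer.
STOP_WORDS = {"if", "you", "the", "a", "to", "and", "i", "on", "have", "me"}


def content_bigrams(message):
    """Lowercase, strip punctuation, drop stop words, then pair adjacent survivors."""
    words = [w.strip(".,!?").lower() for w in message.split()]
    kept = [w for w in words if w and w not in STOP_WORDS]
    return list(zip(kept, kept[1:]))


# Made-up messages in the style of scripted tutor chat.
messages = [
    "Click back if you need help.",
    "Click back if you need help. Have a nice day!",
]
counts = Counter(pair for m in messages for pair in content_bigrams(m))
top_pairs = counts.most_common(3)
```

Once the stop words are gone, words that were never adjacent ("back ... need", "help ... nice") end up paired, and a few repeated canned phrases dominate the counts.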
http://playground.tensorflow.org/
Chatbot:
• Luong, M. T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based
neural machine translation. arXiv preprint arXiv:1508.04025. At:
https://arxiv.org/pdf/1508.04025
• Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly
learning to align and translate. arXiv preprint arXiv:1409.0473. At:
https://arxiv.org/pdf/1409.0473
• Next attempt with chatbots.
• Added context.
• Validation score improved!...
• Problem is that model validation scores have different meanings
when the model changes
• Key point: Ensure that your application allows your accuracy to be
imperfect.
Neural networks: Very good at detecting patterns, but
they don’t always beat less complex ML models (e.g.
Naïve Bayes, XGBoost)
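To show how small a simpler baseline can be, here is a from-scratch sketch of a multinomial Naïve Bayes text classifier with add-one smoothing. It is not the presenter's model; the class name `NaiveBayesText`, the labels, and the training sentences are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for word in text.lower().split():
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Made-up training data for illustration only.
train_texts = [
    "i need help with this problem",
    "please help me i am stuck",
    "thanks have a nice day",
    "great thanks that was nice",
]
train_labels = ["help_request", "help_request", "closing", "closing"]

model = NaiveBayesText().fit(train_texts, train_labels)
prediction = model.predict("can you help me with a problem")
```

A baseline like this trains in milliseconds, so it is a cheap reference point before investing in a neural model.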
The data volume paradigm:
Most common cases
https://blog.easysol.net/building-ai-applications/
https://people.duke.edu/~ccc14/sta-663/CUDAPython.html
Question analysis project
Using NLTK named entity extraction
. . .
• Without data, you have no machine learning!
• Should be obvious, right?
• You’d be surprised.
https://medium.com/data-ops/the-data-lake-is-a-design-pattern-888323323c66
Data pantry:
https://www.pinterest.com/pin/424956914822695344/?lp=true
Great software… but now what? (The problem LinkedIn experienced.)
The solution
How to make it cost effective:
[Architecture diagram, steps (1) through (10). Components shown: Client devices/browsers; API Gateway; Lambda, Auth.; Kinesis data stream; Kinesis analytics or Flink/Spark-Streaming on EMR; Lambda, Proactive Intervention; IoT Core; S3 storage; CloudWatch logging.]
$8/year per 1,000,000 events + cost of analytics
Real-time streaming predictive analytics
https://analyse.kmi.open.ac.uk/
Clues that you have an organizational or
architectural problem:
Excuse #1: But all of our developers are so constantly busy that
we will never get around to making those changes!
Implication: But we have so much technical debt, we
spend all of our time putting out fires!
Image cropped from: https://www.flickr.com/photos/41284017@N08/9599182665
From: http://gis.nwcg.gov/gist_2004/logos/federal_logos.html
Excuse #2: We have all of the data that we
need!
Implication: We are so unwilling
to take a look at the reality of our problem
that we have no idea how bad it really is.
Excuse #3: It’s really not
that important. We have
higher priorities.
Implication: We think
we’re so right 100% of
the time that no data
could possibly ever tell us
that we’re ever wrong.
Or, we don’t make
mistakes (only our
developers do).
https://www.recruiter.com/i/does-a-worker%E2%80%99s-personal-life-affect-your-brand/fingers-pointing-blame-to-man/
Excuse #4: We make our decisions based on our instincts and
gut feelings.
Implication: We’re so unwilling to have our
assumptions challenged that we don’t want to think about the
idea that additional data could make our instincts even better.
https://medium.com/@vaidoshia/building-my-own-design-gut-instinct-f7f773d6d608
Excuse #5: That’s nice, but that doesn’t apply to us.
Implication: I live in my own little world where truth
doesn’t apply to me.
https://www.deviantart.com/bluejennybird/art/my-own-planet-159966933
Excuse #6: That would be too expensive.
Implication: We’re at least 5 years behind on what big
data technologies and cloud services can offer.
What’s a serverless
function?
What’s an event
stream?
[picture of a person
getting rained on by a
cloud] http://i.telegraph.co.uk/multimedia/archive/01244/appleimac1984_1244597i.jpg
Excuse #7: We don’t have time for that.
Implication: We’re so busy chasing the carrot in front of our faces that we probably won’t notice if our
competitors knock us out of the market until it’s too late.
https://www.derekhuether.com/blog/2010/11/12/chasing-the-carrot
https://forum.slowtwitch.com/forum/Slowtwitch_Forums_C1/Triathlon_Forum_F1/What%27s_the_average_first_year_out_of_pocket%3F_P5797700/
Excuse #8: We need to make use of our existing technologies.
Implication: We can’t bear the thought that we have been wasting
our investments in outdated technologies. Or, we don’t think
this effort is important enough to justify our investment. (See
excuses 1-7.)
Excuse #9: It would be too hard to maintain.
Implication: I don’t know what “serverless” means. Is that part of
“The Cloud”?
https://www.thoughtco.com/types-of-clouds-recognize-in-the-sky-4025569
Python libraries for exploring word embeddings include:
• Gensim: https://radimrehurek.com/gensim/tutorial.html
• SpaCy: https://spacy.io/usage/spacy-101
• NLTK: https://www.nltk.org
• CoreNLP: https://stanfordnlp.github.io/CoreNLP/

Real World NLP, ML, and Big Data

Editor's Notes

  1. Unfortunately, when people think of big data, they often think of this: Massive amounts of data.
  2. But the reality is that big data is everywhere. Everything that can potentially collect data should be considered. Data can still be considered Big Data if the variety is high, such as if many different data sources are involved.
  3. Considering that big data is all inclusive, where then does NLP fit into this landscape?
  4. Natural language processing (NLP) can be used to extract features from human language. Our goal is usually to gain deeper insight into what is actually being said by using a computational approach that allows us to detect patterns or gain insights in an automated manner.
  5. What are
  6. Extracted terms can be mapped to domain-specific ontologies. An ontology is like a word map. Ontologies can be industry specific or can be broad. Either way, they allow us to attach additional meaning to our original data. In Big Data, we call this enrichment.
  7. It is common to use what are called one-hot word vectors to represent the words in the data. They are very commonly used with neural network models, such as the models used for Neural Machine Translation (NMT).
  8. Unfortunately, this can result in what we call The Curse of Dimensionality. This problem results from the very large number of dimensions needed to model a language with one-hot vectors, because every distinct word in the dictionary adds a dimension. For example, neural machine translation (NMT) models can have millions of dimensions or more, depending on the size of the dictionary used.
  9. A very influential method was published in 2013 by some very bright researchers who developed a dimensionality reduction technique that creates what we call “word embeddings.” These embeddings capture statistical relationships between each word and the words that frequently occur near it. This method allows us to replace one-hot vectors that have millions of dimensions and contain no context with dense vectors that have a few hundred dimensions and contain very rich context.
  10. As a consequence, the word embedding is a vector-space representation that results from the dimensionality reduction.
  11. Because the model is a linear space, it allows us to represent relationships like this:
  12. The linear features of word embeddings are particularly useful for building neural network models for languages.
  13. The latest version of SwiftKey uses a neural network to predict text to accelerate typing on a mobile device.
  14. Bigrams are pairs of adjacent words in a dataset. Bigrams are most useful when the pair carries a distinct meaning that the individual words do not.
  15. Any useful bigrams?… (Ignore the b character at the start of the string.)
  16. Any useful bigrams?… (Ignore the b character at the start of the string.)
  17. Here are some good libraries for experimenting with word embeddings and natural language processing.