The document discusses natural language processing and machine learning techniques including sentiment analysis, automated essay scoring, content summarization, chatbots, information retrieval, cluster analysis, language neural networks, and language translation. It provides examples and links to resources on topics such as word embeddings, one-hot encoding, the curse of dimensionality, neural networks, and building chatbots. Key points include designing applications to tolerate imperfect model accuracy and recognizing that without data, no machine learning is possible.
3. NLP
ML
Sentiment analysis
Automated essay scoring
Content summarization
Chatbots
Information retrieval
Cluster analysis
Language neural networks
Language translation
AI Big Data
6. NLP
ML
Sentiment analysis
Automated essay scoring
Content summarization
Chatbots
Information retrieval
Cluster analysis
Language neural networks
Document categorization
AI Big Data
8. Meta-analysis of studies: Burns, G. A., Feng, D., & Hovy, E. (2008). Intelligent approaches to mining the primary research literature: techniques, systems, and examples. In Computational Intelligence in Medical Informatics (pp. 17-50). Springer, Berlin, Heidelberg. Retrieved from: http://www.academia.edu/download/30797420/burns_feng_hovy_comp_intel-final.pdf
12. Statistical word embeddings: Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems (pp. 3111-3119). At: https://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf
Cited by over 9,804 papers according to Google Scholar, as of 10/22/2018.
Based on statistical relationships between words: https://www.coursera.org/lecture/intro-to-deep-learning/word-embeddings-dhzl5
17. So what are bigrams?
Examples of less useful bigrams:
of the
what is
they are
to the
way to
hey you
Examples of useful bigrams:
New York
West Virginia
Imagine Learning
Imagine Math
Microsoft Office
Neural network
Ping pong
20. Top 10 bigrams:
1. need help
2. back need
3. nice day
4. help nice
5. click back
6. please come
7. hear voice
8. type please
9. problem ask
10. ask find
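To make the bigram lists above concrete, here is a minimal sketch of extracting and ranking bigrams with NLTK's collocation tools; the sample utterances are made up for illustration and are not the data behind the lists above.

# Minimal bigram-extraction sketch with NLTK (hypothetical sample utterances).
from nltk.collocations import BigramCollocationFinder, BigramAssocMeasures

utterances = [
    "i need help with my account",
    "please come back i need help",
    "have a nice day",
]

# Simple whitespace tokenization keeps the example self-contained.
tokens = [word for utterance in utterances for word in utterance.split()]

finder = BigramCollocationFinder.from_words(tokens)
measures = BigramAssocMeasures()

# Rank bigrams by raw frequency; PMI or likelihood ratio are common alternatives
# for surfacing "useful" bigrams such as named entities.
for bigram, score in finder.score_ngrams(measures.raw_freq)[:10]:
    print(bigram, round(score, 3))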
24. Chatbot:
• Luong, M. T., Pham, H., & Manning, C. D. (2015). Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025. At: https://arxiv.org/pdf/1508.04025
• Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473. At: https://arxiv.org/pdf/1409.0473
27. • Validation score improved!...
• The problem is that validation scores have different meanings once the model itself changes.
31. • Key point: Ensure that your application is designed to tolerate imperfect model accuracy.
32. Neural networks: very good at detecting patterns, but they don't always beat less complex ML models (e.g. Naïve Bayes, XGBoost).
The data volume paradigm: most common cases
https://blog.easysol.net/building-ai-applications/
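As a hedged illustration of that point, the sketch below trains a simple Naive Bayes baseline with scikit-learn; the texts and labels are placeholders, and the idea is simply that any more complex (e.g. neural) model should be compared against a baseline like this before assuming it is worth the extra cost.

# Minimal Naive Bayes text-classification baseline (placeholder data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

texts = [
    "i need help with my password",
    "my account is locked please help",
    "help i cannot log in",
    "have a nice day",
    "thanks everything works now",
    "please come back later",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = support request, 0 = other (illustrative only)

baseline = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
scores = cross_val_score(baseline, texts, labels, cv=3)
print("baseline accuracy:", scores.mean())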
54. Clues that you have an organizational or
architectural problem:
55. Excuse #1: But all of our developers are so constantly busy that we will never get around to making those changes!
Implication: But we have so much technical debt, we spend all of our time putting out fires!
Image cropped from: https://www.flickr.com/photos/41284017@N08/9599182665
From: http://gis.nwcg.gov/gist_2004/logos/federal_logos.html
56. Excuse #2: We have all of the data that we need!
Implication: We are so unwilling to take a look at the reality of our problem that we have no idea how bad it really is.
57. Excuse #3: It’s really not that important. We have higher priorities.
Implication: We think we’re so right 100% of the time that no data could possibly ever tell us that we’re wrong. Or, we don’t make mistakes (only our developers do).
https://www.recruiter.com/i/does-a-worker%E2%80%99s-personal-life-affect-your-brand/fingers-pointing-blame-to-man/
58. Excuse #4: We make our decisions based on our instincts and gut feelings.
Implication: We’re so unwilling to have our assumptions challenged that we don’t want to think about the idea that additional data could make our instincts even better.
https://medium.com/@vaidoshia/building-my-own-design-gut-instinct-f7f773d6d608
59. Excuse #5: That’s nice, but that doesn’t apply to us.
Implication: I live in my own little world where truth doesn’t apply to me.
https://www.deviantart.com/bluejennybird/art/my-own-planet-159966933
60. Excuse #6: That would be too expensive.
Implication: We’re at least 5 years behind on what big data technologies and cloud services can offer.
What’s a serverless function? What’s an event stream?
[picture of a person getting rained on by a cloud]
http://i.telegraph.co.uk/multimedia/archive/01244/appleimac1984_1244597i.jpg
61. Excuse #7: We don’t have time for that.
Implication: We’re so busy chasing the carrot in front of our faces that we probably won’t notice if our competitors knock us out of the market until it’s too late.
https://www.derekhuether.com/blog/2010/11/12/chasing-the-carrot
https://forum.slowtwitch.com/forum/Slowtwitch_Forums_C1/Triathlon_Forum_F1/What%27s_the_average_first_year_out_of_pocket%3F_P5797700/
62. Excuse #8: We need to make use of our existing technologies.
Implication: We can’t bear the thought that we have been wasting our investments in outdated technologies. Or, we don’t think this effort is important enough to justify our investment. (See excuses 1-7.)
63. Excuse #9: It would be too hard to maintain.
Implication: I don’t know what “serverless” means. Is that part of “The Cloud”?
https://www.thoughtco.com/types-of-clouds-recognize-in-the-sky-4025569
65. Python libraries for exploring word embeddings include:
• Gensim: https://radimrehurek.com/gensim/tutorial.html
• SpaCy: https://spacy.io/usage/spacy-101
• NLTK: https://www.nltk.org
• CoreNLP: https://stanfordnlp.github.io/CoreNLP/
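As one way to start with the libraries above, here is a minimal Gensim sketch that trains a tiny Word2Vec model and queries it; the toy corpus is a placeholder, and real embeddings require far more text (or pretrained vectors).

# Minimal word-embedding sketch with Gensim (toy corpus; parameter names follow Gensim 4.x).
from gensim.models import Word2Vec

sentences = [
    ["students", "need", "help", "with", "math"],
    ["imagine", "math", "helps", "students", "learn"],
    ["neural", "networks", "learn", "word", "embeddings"],
    ["word", "embeddings", "capture", "word", "context"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=100, seed=1)

print(model.wv["math"].shape)                 # one dense 50-dimensional vector per word
print(model.wv.most_similar("math", topn=3))  # nearest neighbours in embedding space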
Editor's Notes
Unfortunately, when people think of big data, they often think of this: Massive amounts of data.
But the reality is that big data is everywhere. Everything that can potentially collect data should be considered. Data can still be considered Big Data if the variety is high, such as if many different data sources are involved.
Considering that big data is all inclusive, where then does NLP fit into this landscape?
Natural language processing (NLP) can be used to extract features from human language. Our goal is usually to gain deeper insight into what is actually being said by using a computational approach that allows us to detect patterns or gain insights in an automated manner.
What are
Extracted terms can be mapped to domain-specific ontologies. An ontology is like a word map. Ontologies can be industry specific or can be broad. Either way, they allow us to attach additional meaning to our original data. In Big Data, we call this enrichment.
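A minimal sketch of that enrichment step, assuming a hypothetical domain ontology held as a simple lookup table (the terms and metadata fields are illustrative only):

# Hypothetical enrichment: attach ontology metadata to extracted terms.
ontology = {
    "fractions": {"subject": "math", "grade_band": "3-5"},
    "photosynthesis": {"subject": "science", "grade_band": "6-8"},
}

extracted_terms = ["fractions", "photosynthesis", "recess"]

enriched = [
    {"term": term, **ontology.get(term, {"subject": "unknown", "grade_band": "unknown"})}
    for term in extracted_terms
]
print(enriched)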
It is common to use what are called one-hot word vectors to represent the words in the data. They are very commonly used with neural network models, such as the models used for Neural Machine Translation (NMT).
Unfortunately, this can result in what we call The Curse of Dimensionality. This is a problem that results from the high number of dimensions that are represented by modeling languages. For example, for neural machine translation (NMT) models used to translate languages, it is common to have millions or even billions of dimensions, depending on the size of the dictionary used.
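A minimal sketch of one-hot word vectors, using a toy vocabulary to show why the dimensionality grows with the size of the dictionary:

# One-hot word vectors: one dimension per vocabulary word (toy vocabulary).
import numpy as np

vocabulary = ["the", "cat", "sat", "on", "mat"]
index = {word: i for i, word in enumerate(vocabulary)}

def one_hot(word):
    vector = np.zeros(len(vocabulary))
    vector[index[word]] = 1.0
    return vector

print(one_hot("cat"))        # [0. 1. 0. 0. 0.]
print(one_hot("cat").size)   # dimensionality equals the vocabulary size,
                             # so a large dictionary means huge, sparse vectors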
A very influential method was developed in 2013 by some very bright researchers who discovered a dimensionality reduction technique that creates what we call “word embeddings.” These embeddings represent statistical relationships between words and the words they frequently co-occur next to. This method allows us to replace one-hot vectors with millions of context-free dimensions by dense vectors of only a few hundred dimensions that carry very rich context.
As a consequence, a word embedding is a compact vector-space representation of a word produced by this dimensionality reduction.
Because the model is a linear space, it allows us to represent relationships like this:
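A minimal sketch of that kind of linear relationship, using pretrained GloVe vectors loaded through Gensim's downloader (the model name and the king/queen example are illustrative, not taken from the slides):

# Analogy arithmetic in embedding space: king - man + woman lands near queen.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # downloads the vectors on first use

result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically [('queen', ~0.77)]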
The linear features of word embeddings are particularly useful for building neural network models for languages.
The latest version of SwiftKey uses a neural network to predict text to accelerate typing on a mobile device.
Bigrams are pairs of words that co-occur in a dataset. Bigrams are most useful when the combined pair carries a distinct meaning of its own.
Any useful bigrams?… (Ignore the b character at the start of the string.)
Here are some good libraries for experimenting with word embeddings and natural language processing.