Global Analytics: Text, Speech, Sentiment, and Sense
1. Global Analytics: Text, Speech,
Sentiment, and Sense
Seth Grimes
Alta Plana Corporation
@sethgrimes
December 4, 2014
2. Global Analytics: Text, Speech, Sentiment, and Sense
“Reading from Text is a Hard Problem”
Thus the Orb he roam'd
With narrow search; and with inspection deep
Consider'd every Creature, which of all
Most opportune might serve his Wiles.
-- John Milton, Paradise Lost
LT-Accelerate – 4 December, 2014
Eugène
Delacroix,
St. Michael
Defeats the
Devil
3. Global Analytics: Text, Speech, Sentiment, and Sense
Data Space,
Indexing
Search
Analysis
Intent,
Goals
Context
5. Global Analytics: Text, Speech, Sentiment, and Sense
Analytics is the systematic application of
algorithmic methods that derive and deliver
information, typically expressed
quantitatively, whether in the form of
indicators, tables, visualizations, or models.
• Systematic means formal & repeatable.
• Algorithmic contrasts with heuristic.
Analytics creates and/or applies models.
6. Global Analytics: Text, Speech, Sentiment, and Sense
Models make the unstructured computable.
http://www.tropicalisland.de/NYC_New_York_Brooklyn_Bridge_from_World_Trade_Center_b.jpg
$x(t) = t$
$y(t) = \tfrac{1}{2}\, a \left( e^{t/a} + e^{-t/a} \right) = a \cosh(t/a)$
http://en.wikipedia.org/wiki/Seven_Bridges_of_K%C3%B6nigsberg
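As a small illustration of a model making a physical shape computable, the sketch below evaluates the catenary from this slide in both its exponential and hyperbolic-cosine forms and checks that they agree. The function names and the sample value of the parameter a are assumptions chosen for illustration, not part of the talk.

```python
import math


def catenary_exp(t: float, a: float) -> float:
    """Catenary height written with exponentials: (a/2)(e^(t/a) + e^(-t/a))."""
    return 0.5 * a * (math.exp(t / a) + math.exp(-t / a))


def catenary_cosh(t: float, a: float) -> float:
    """The same curve written as a * cosh(t/a)."""
    return a * math.cosh(t / a)


if __name__ == "__main__":
    a = 2.0  # hypothetical shape parameter
    for t in (-3.0, -1.0, 0.0, 1.0, 3.0):
        y_exp = catenary_exp(t, a)
        y_cosh = catenary_cosh(t, a)
        # Both forms describe one curve, so they should match to rounding error.
        assert math.isclose(y_exp, y_cosh)
        print(f"t={t:+.1f}  y={y_cosh:.4f}")
```

At t = 0 the curve reaches its lowest point, where y equals a.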
7. Global Analytics: Text, Speech, Sentiment, and Sense
Sixty+ years of analysis & modelling
progress:
Text
Numbers
Patterns & Insights
Connections
Interactions
8.
9. Document input and processing
Knowledge handling is key
Desk Set (1957): Computer engineer
Richard Sumner (Spencer Tracy)
and television network librarian
Bunny Watson (Katharine Hepburn)
and the "electronic brain" EMERAC.
Hans Peter Luhn
“A Business Intelligence System”
IBM Journal, October 1958
10. Global Analytics: Text, Speech, Sentiment, and Sense
Luhn’s analysis of Messengers of the Nervous System, a Scientific American article
http://wordle.net, applied to a Luhn-cited NY Times article
“Statistical information derived from word frequency and distribution is
used by the machine to compute a relative measure of significance, first for
individual words and then for sentences. Sentences scoring highest in
significance are extracted and printed out to become the auto-abstract.”
H.P. Luhn, The Automatic Creation of Literature Abstracts, IBM Journal, 1958.
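The idea in the quotation can be sketched in a few lines of Python: count word frequencies, score each sentence from the frequencies of its words, and print the top-scoring sentences as the abstract. This is a simplified sketch under stated assumptions, not Luhn's procedure: his paper scores clusters of significant words within a sentence, whereas this version swaps in a plain average of word frequencies. The stopword list, the regular expressions, and the function names are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was",
             "for", "by", "on", "it", "that"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def significant_words(sentence: str) -> list[str]:
    """Lowercased word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]


def auto_abstract(text: str, top_n: int = 2) -> list[str]:
    """Return the top_n highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    tokens = [significant_words(s) for s in sentences]
    # Word significance: frequency across the whole document.
    freq = Counter(w for toks in tokens for w in toks)

    def score(toks: list[str]) -> float:
        # Sentence significance: average frequency of its significant words.
        return sum(freq[w] for w in toks) / len(toks) if toks else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokens[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:top_n])]


if __name__ == "__main__":
    sample = ("Nerve cells send signals. "
              "Nerve signals travel along nerve fibers. "
              "The weather was pleasant.")
    for sentence in auto_abstract(sample, top_n=1):
        print(sentence)
```

Averaging rather than summing keeps long sentences from winning on length alone, which is one of several reasonable design choices here.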
11. Global Analytics: Text, Speech, Sentiment, and Sense
“This rather unsophisticated argument on
‘significance’ avoids such linguistic
implications as grammar and syntax... No
attention is paid to the logical and semantic
relationships the author has established.”
-- H.P. Luhn
~ 2004-5
12.
13.
14.
15. Global Analytics: Text, Speech, Sentiment, and Sense
Patterns, Insights & Connections
~ 2009-12
16. Global Analytics: Text, Speech, Sentiment, and Sense
… also commonly explored via dashboards.
17. Global Analytics: Text, Speech, Sentiment, and Sense
What information?
Do you currently need (or expect to need) to extract or analyze...
[Bar chart: share of respondents answering "Current" and "Expect" for each information type, on a 0% to 100% axis]
• Topics and themes
• Sentiment, opinions, attitudes, emotions,…
• Relationships and/or facts
• Named entities – people, companies, …
• Concepts, that is, abstract groups of entities
• Metadata such as document author,…
• Other entities – phone numbers, part/product …
• Semantic annotations
• Events
Source: Text Analytics 2014, http://altaplana.com/TA2014
18. Global Analytics: Text, Speech, Sentiment, and Sense
Emotion and outcomes
19. Global Analytics: Text, Speech, Sentiment, and Sense
“The share rise in users who selected Arabic…coincided with much of the civil unrest… in Middle Eastern countries.”
http://bits.blogs.nytimes.com/2014/03/09/the-languages-of-twitter-users/
20. Global Analytics: Text, Speech, Sentiment, and Sense
Non-English language support?
[Bar chart: share of respondents needing support for each language, "Current" and "Within 2 years", on a 0% to 60% axis]
• Arabic
• Bahasa Indonesia or Malay
• Chinese
• Dutch
• French
• German
• Greek
• Hindi, Urdu, Bengali, Punjabi, or other…
• Italian
• Japanese
• Korean
• Polish
• Portuguese
• Russian
• Scandinavian or Baltic
• Spanish
• Turkish or Turkic
• Other African
• Other Arabic script (including Urdu,…
• Other East Asian
• Other European or Slavic/Cyrillic
• Other
Source: Text Analytics 2014, http://altaplana.com/TA2014
21. Global Analytics: Text, Speech, Sentiment, and Sense
Beyond text
Audio including speech
Images
Video
IoT
http://www.geekosystem.com/facebook-face-recognition/
http://www.sciencedirect.com/science/article/pii/S0167639312000118
http://flylib.com/books/en/2.495.1.54/1/
22. Global Analytics: Text, Speech, Sentiment, and Sense
http://searchuserinterfaces.com/
Sensemaking
“It is convenient to divide the entire information access process into two main components: information retrieval through searching and browsing, and analysis and synthesis of results. This broader process is often referred to in the literature as sensemaking.
Sensemaking refers to an iterative process of formulating a conceptual representation from a large volume of information.”
– Marti Hearst, 2009
23. Global Analytics: Text, Speech, Sentiment, and Sense
Challenges
Context
Interaction
Narrative and discourse
Correlation, integration, and synthesis
Sentiment++: Mood, opinions, emotions, intent
Question answering
Dialog, storytelling
Cross-lingual / “omni-channel” implementation
…
Prescription, autonomy
…
Singularity?
24. Global Analytics: Text, Speech, Sentiment, and Sense
Opportunity enablers
• The API economy: on-demand, via-API Web services
• Cloud deployment and service delivery, enabling rapid deployment
• Data aggregation and enrichment. Examples: Gnip, DataSift, Spinn3r, and Moreover
• Growth hacking
• Knowledge graphs
• Machine learning: supervised, unsupervised, active, deep
• Open source
• Platforms and frameworks. Examples: UIMA, GATE… Salesforce, QlikView… Python, R
25. Global Analytics: Text, Speech, Sentiment, and Sense
Where to?