In this lecture, we will look at why emoji are important and the reasons behind their increase in popularity, how emoji meanings are generated/assigned, how to calculate emoji similarity, and how to disambiguate emoji meanings.
Analyzing Emoji in Text
1. Analyzing Emoji in Text
Research Scientist, Holler.io, San Mateo, CA.
sanjaya@holler.io | http://sanjw.org/ | @sanjrockz
SANJAYA WIJERATNE
BAX-423 Big Data Analytics
GUEST LECTURE AT THE GRADUATE SCHOOL OF MANAGEMENT OF THE UNIVERSITY OF CALIFORNIA, DAVIS, 24TH/25TH APRIL, 2020.
2. Meet Your Instructor
► Research Scientist at Holler.io
► Work on NLP
► Academic Background
► Education - Ph.D. in Computer Science and Engineering
► Research Interest - Emoji/Text Processing, NLU
► My Journey So Far
► I’m from Sri Lanka -> B.Sc. in IT (University of Moratuwa,
Sri Lanka) -> ~2 years as a Software Engineer, 7.5 years
as a GRA/TA at Wright State University
4/19/2020BAX-423 Big Data Analytics, UC Davis
3. How I Started Working with Emoji
Emoji Chain      Gang Usage   Non-Gang Usage
[emoji image]    32.25%       1.14%
[emoji image]    53%          1.71%
Anthropology 189:001, UC Berkeley
Image Source – https://arxiv.org/pdf/1610.09516.pdf
5. Emoji = Picture Character
► Introduced by Shigetaka Kurita in 1999
► Unicode started supporting the emoji character set in 2010
► Emoji are not emoticons such as :-) and :-(
6. Why Has Emoji Usage Increased?
8. A Few Open Emoji Research Problems Related to Text Processing
► Challenges in interpreting the meaning of an
emoji in a message context
► Emoji similarity
► Emoji sense disambiguation
► Emoji prediction
► Emoji-based retrieval and search
11. Emoji Semantics
► Emoji are inherently designed with no rigid
semantics
► Emoji do not have a grammar; thus, emoji cannot
be used as a language on their own
► How are emoji meanings assigned?
► Initially, by the emoji creators
► Later, by the users
12. How Do Emoji Get Their Meanings?
► Emoji creators submit possible emoji meanings in
their proposals
► Once accepted, these will be available in the
Unicode Common Locale Data Repository (CLDR) at
https://www.unicode.org/cldr/charts/latest/annotations/other.html
13. How Do Emoji Get Their Meanings?
► When people replace words using emoji (logographic use)
► Homonymy relations in languages (e.g., 👁 – eye & I)
Image Source – https://goo.gl/rjS1hX
*Actual social media content
14. Getting the Emoji Meanings
Image Source – http://emojinet.knoesis.org
15. EmojiNet
Image Source – https://arxiv.org/pdf/1707.04652.pdf
17. Emoji Similarity Problem
► “Measuring the semantic similarity of emoji such
that the measure reflects the likeness of their
meaning, interpretation or intended use.”
[Wijeratne et al., 2017]
18. Notion of Emoji Similarity
► The notion of emoji similarity is broad
► Pixel-based Emoji Similarity
► Meaning-based Emoji Similarity
20. Distributional Semantics
► Finds semantic properties of linguistic items (words)
based on their distribution in a large corpus
► Based on the Distributional Hypothesis (Harris, 1954)
► Words that are used and occur in the same contexts tend to
purport similar meanings
► We use large text corpora with emoji to learn
distributional semantics of emoji, which reveals
relationships among emoji
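As a rough illustration of the distributional idea, the sketch below builds co-occurrence count vectors from a tiny toy corpus and compares them with cosine similarity. The corpus, the function names, and the choice of a whole message as the context window are assumptions made for this example; real experiments would use a large corpus and trained embeddings.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus (hypothetical messages, already tokenized; emoji are ordinary tokens).
corpus = [
    ["happy", "birthday", "🎂", "🎉"],
    ["happy", "birthday", "🎉"],
    ["birthday", "party", "🎂", "🎉"],
    ["so", "sad", "today", "😢"],
    ["sad", "news", "😢"],
]

def cooccurrence_vectors(messages):
    """Map each token to counts of the other tokens that share a message with it."""
    vectors = defaultdict(Counter)
    for tokens in messages:
        for i, target in enumerate(tokens):
            for j, context in enumerate(tokens):
                if i != j:
                    vectors[target][context] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vectors = cooccurrence_vectors(corpus)
sim_cake_party = cosine(vectors["🎂"], vectors["🎉"])  # share birthday contexts
sim_cake_cry = cosine(vectors["🎂"], vectors["😢"])    # share no contexts
```

In this toy corpus the two celebration emoji appear in overlapping contexts and so score higher than the cake and crying-face pair, which never share a context word.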
21. Learning Emoji Embeddings
► Learn distributional semantics of words as word
embeddings using two corpora (Tweets and
Google News)
► Convert the words in emoji meanings to vectors
using word embeddings (emoji embeddings)
► Evaluate the similarity (distance) of emoji in the
embedding space using EmoSim508, a new
dataset with 508 emoji pairs
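The steps above can be sketched as follows. This is a minimal illustration, not the trained models: the 3-dimensional word vectors and the sense words per emoji are made-up values, and combining the word vectors by averaging is an assumption of this sketch.

```python
from math import sqrt

# Hypothetical 3-dimensional word embeddings; real ones would be learned
# from a tweet corpus or Google News.
word_vectors = {
    "laugh": [0.9, 0.1, 0.0],
    "joy":   [0.8, 0.2, 0.1],
    "funny": [0.9, 0.0, 0.1],
    "cry":   [0.1, 0.9, 0.0],
    "sad":   [0.0, 0.8, 0.2],
}

# Hypothetical words taken from each emoji's meaning.
emoji_sense_words = {
    "😂": ["laugh", "joy", "funny"],
    "😀": ["joy", "funny"],
    "😢": ["cry", "sad"],
}

def emoji_embedding(words, vectors):
    """Average the vectors of the known words describing an emoji (assumed combination rule)."""
    known = [vectors[w] for w in words if w in vectors]
    if not known:
        raise ValueError("no known words for this emoji")
    dim = len(known[0])
    return [sum(v[d] for v in known) / len(known) for d in range(dim)]

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

embeddings = {e: emoji_embedding(ws, word_vectors) for e, ws in emoji_sense_words.items()}
sim_similar = cosine(embeddings["😂"], embeddings["😀"])
sim_different = cosine(embeddings["😂"], embeddings["😢"])
```

With these toy values, the two laughing emoji end up close together in the embedding space, while the crying face is far from both.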
23. Ground Truth Data Creation
► Most frequently occurring
emoji pairs from a 110M
Twitter dataset with emoji
► Each emoji pair was
evaluated for similarity and
relatedness by 10 human
users
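One way to find frequently occurring emoji pairs is sketched below, assuming that a "pair" means two different emoji appearing in the same tweet. The tweets, the small emoji inventory, and the function name are hypothetical; the sketch also only handles emoji that are a single code point.

```python
from collections import Counter
from itertools import combinations

# Hypothetical emoji inventory (single code point emoji only).
EMOJI = {"😂", "😭", "😍", "🔥", "💯"}

# Hypothetical tweets.
tweets = [
    "so funny 😂😭",
    "😂😭 I can't",
    "love this 😍🔥",
    "😂 😭 😍",
]

def emoji_pair_counts(texts, emoji_set):
    """Count unordered pairs of distinct emoji that co-occur in a tweet."""
    counts = Counter()
    for text in texts:
        present = sorted({ch for ch in text if ch in emoji_set})
        counts.update(combinations(present, 2))
    return counts

pair_counts = emoji_pair_counts(tweets, EMOJI)
top_pairs = pair_counts.most_common(3)  # candidates to send to human raters
```

Sorting the emoji before pairing keeps each unordered pair under a single key, so (A, B) and (B, A) are not counted separately.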
24. Intrinsic Evaluation
► Using four different emoji definitions
(Sense_Desc., Sense_Label, Sense_Def.,
Sense_All) and two corpora (Twitter and Google
News), we trained eight emoji embedding
models for each emoji
► We calculated emoji similarity of the 508 emoji
pairs using each embedding model
25. Intrinsic Evaluation Cont.
► Using Spearman’s Rank Correlation Coefficient
(Spearman’s ρ), we compared the similarity
rankings of each model with ground truth data
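For readers unfamiliar with Spearman's ρ, the sketch below computes it as the Pearson correlation of the two rank lists, with tied values sharing their average rank. The human and model scores are made-up numbers for five emoji pairs, and the function names are chosen for this example only.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical scores for five emoji pairs.
human_scores = [3.8, 0.4, 2.1, 1.0, 3.0]       # mean human similarity ratings
model_scores = [0.91, 0.12, 0.35, 0.40, 0.77]  # cosine similarity from one model
rho = spearman_rho(human_scores, model_scores)
```

A value of ρ close to 1 means the model ranks the emoji pairs in nearly the same order as the human raters, regardless of the absolute scale of either score.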
26. Extrinsic Evaluation
► We tested our emoji embedding models using a
sentiment analysis baseline
► Our baseline had 12,920 English tweets, 2,295 of
which contained emoji
► All words in the tweets were replaced with their
corresponding word embeddings, and emoji were
replaced with the learned emoji embeddings
28. Key Takeaways
► Combining emoji sense knowledge with
distributional semantics could improve the emoji
embedding models
► Longer sense definitions are not suitable for emoji
similarity experiments
30. Emoji Sense Disambiguation Problem
Image Source – https://goo.gl/rjS1hX
*Actual social media content (emoji senses shown: I, Look)
► “The ability to identify the meaning of an emoji in the context of a
message in a computational manner” [Wijeratne et al., 2017].
31. Emoji Sense Disambiguation
► Currently, no labeled datasets are available to solve
emoji sense disambiguation in a supervised setting
32. Emoji Sense Disambiguation Cont.
► We selected the 25 most commonly misunderstood
emoji and selected 50 tweets for each emoji
► Used the Simplified Lesk algorithm for disambiguation
► Context words were learned for each emoji sense
definition using Twitter and Google News-based word
embedding models
► Twitter-based embeddings outperform others
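The core of Simplified Lesk can be sketched in a few lines: choose the sense whose context words overlap most with the words of the message. The sense inventory below is hand-written and hypothetical; in the setup described above, the context words would instead come from sense definitions expanded with word embedding models.

```python
def simplified_lesk(message_tokens, sense_context_words):
    """Return the sense whose context words overlap most with the message.

    Ties go to the sense listed first in sense_context_words.
    """
    message = {token.lower() for token in message_tokens}
    best_sense, best_overlap = None, -1
    for sense, context in sense_context_words.items():
        overlap = len(message & {word.lower() for word in context})
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical context words for two senses of the eye emoji.
eye_senses = {
    "look": {"look", "see", "watch", "check", "this"},
    "I": {"me", "my", "am", "love", "you"},
}

sense_a = simplified_lesk(["👁", "love", "you"], eye_senses)
sense_b = simplified_lesk(["👁", "check", "this", "out"], eye_senses)
```

Here the first message overlaps only with the "I" context words and the second only with the "look" context words, so the same emoji is assigned a different sense in each message.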
33. Results and Takeaways
► Tools designed for processing well-formed text do not
work well on ill-formed text
► Sense disambiguation accuracy increases with the
number of context words used
35. Recap
► We looked at
► Why it is important to do emoji analysis
► How emoji get their meanings
► How to calculate emoji similarity
► How to disambiguate the meaning of an emoji
37. References
► Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. A Semantics-Based Measure of Emoji Similarity. In 2017 IEEE/WIC/ACM International Conference on Web Intelligence (Web Intelligence 2017). Leipzig, Germany; 2017.
► Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. EmojiNet: An Open Service and API for Emoji Sense Discovery. In 11th International AAAI Conference on Web and Social Media (ICWSM 2017). Montreal, Canada; 2017.
► Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. EmojiNet: Building a Machine Readable Sense Inventory for Emoji. In 8th International Conference on Social Informatics (SocInfo 2016). Bellevue, WA, USA; 2016.
► Lakshika Balasuriya, Sanjaya Wijeratne, Derek Doran, Amit Sheth. Finding Street Gang Members on Twitter. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016). San Francisco, CA, USA; 2016.