Analyzing Emoji in Text
Research Scientist, Holler.io, San Mateo, CA.
sanjaya@holler.io | http://sanjw.org/ | @sanjrockz
SANJAYA WIJERATNE
BAX-423 Big Data Analytics
GUEST LECTURE AT THE GRADUATE SCHOOL OF MANAGEMENT OF THE UNIVERSITY OF CALIFORNIA, DAVIS, 24TH/25TH APRIL, 2020.
Meet Your Instructor
► Research Scientist at Holler.io
► Work on NLP
► Academic Background
► Education - Ph.D. in Computer Science and Engineering
► Research Interest - Emoji/Text Processing, NLU
► My Journey So Far
► I’m from Sri Lanka -> B.Sc. in IT (University of Moratuwa, Sri Lanka) -> ~2 years as a Software Engineer -> 7.5 years as a GRA/TA at Wright State University
4/19/2020 | BAX-423 Big Data Analytics, UC Davis
How I Started Working with Emoji
Emoji Chain            Gang Usage    Non-Gang Usage
[emoji chain image]    32.25%        1.14%
[emoji chain image]    53%           1.71%
Anthropology 189:001, UC Berkeley
Image Source – https://arxiv.org/pdf/1610.09516.pdf
Why Study Emoji?
Emoji = Picture Character
► Introduced by Shigetaka Kurita in 1999
► Unicode started supporting the emoji character set in 2010
► Emoji are not emoticons (e.g., :-) and :-( are emoticons)
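The difference between emoji and emoticons can be seen at the character level: an emoji is a Unicode character with its own code point and name, whereas an emoticon is a sequence of ordinary punctuation marks. A minimal Python sketch using the standard `unicodedata` module (the helper name `describe` is ours, for illustration only):

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]

emoji = "\U0001F602"   # one emoji character
emoticon = ":-)"       # three ordinary ASCII characters

print(describe(emoji))
print(describe(emoticon))
```

The emoji comes back as a single named character, while the emoticon comes back as three separate punctuation characters.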
Why Emoji Usage Increased?
Emoji Usage Statistics
A Few Open Emoji Research Problems Related to Text Processing
► Challenges in interpreting the meaning of an emoji in a message context
► Emoji similarity
► Emoji sense disambiguation
► Emoji prediction
► Emoji-based retrieval and search
How Do Emoji Get Their Meanings?
Emoji Semantics
► Emoji are inherently designed with no rigid semantics
► Emoji do not have a grammar; thus, emoji cannot be used as a language on their own
► How are emoji meanings assigned?
► Initially, by the emoji creators
► Later, by the users
How Do Emoji Get Their Meanings?
► Emoji creators submit possible emoji meanings in their proposals
► Once accepted, these become available in the Unicode Common Locale Data Repository (CLDR) at https://www.unicode.org/cldr/charts/latest/annotations/other.html
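CLDR also distributes these annotations as XML files, where each emoji has a keyword list and a short name (marked `type="tts"`). The sketch below parses a small inline snippet written in that style; the keyword lists in `SAMPLE` are illustrative and not copied from a specific CLDR release, and `load_annotations` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative snippet in the style of a CLDR annotation file.
# The keyword lists are examples, not taken from a specific CLDR release.
SAMPLE = """
<ldml>
  <annotations>
    <annotation cp="\U0001F600">face | grin | grinning face</annotation>
    <annotation cp="\U0001F600" type="tts">grinning face</annotation>
    <annotation cp="\U0001F441">body | eye</annotation>
    <annotation cp="\U0001F441" type="tts">eye</annotation>
  </annotations>
</ldml>
"""

def load_annotations(xml_text):
    """Map each emoji to its short name and its list of keywords."""
    result = {}
    for node in ET.fromstring(xml_text.strip()).iter("annotation"):
        entry = result.setdefault(node.get("cp"), {"name": None, "keywords": []})
        if node.get("type") == "tts":
            entry["name"] = node.text.strip()
        else:
            entry["keywords"] = [k.strip() for k in node.text.split("|")]
    return result

annotations = load_annotations(SAMPLE)
print(annotations["\U0001F441"])
```

These creator-assigned names and keywords are only the starting point; the meanings that users add later are not captured in this data.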
How Do Emoji Get Their Meanings?
► When people replace words with emoji (logographic use)
► Homonymy relations in languages (e.g., the eye emoji standing for both "eye" and "I")
Image Source – https://goo.gl/rjS1hX (actual social media content)
Getting the Emoji Meanings
Image Source – http://emojinet.knoesis.org
EmojiNet
Image Source – https://arxiv.org/pdf/1707.04652.pdf
Emoji Similarity Problem
► “Measuring the semantic similarity of emoji such that the measure reflects the likeness of their meaning, interpretation or intended use.” [Wijeratne et al., 2017]
Notion of Emoji Similarity
► Notion of emoji similarity is broad
► Pixel-based Emoji Similarity
► Meaning-based Emoji Similarity
Representing Emoji Meaning
Distributional Semantics
► Finds semantic properties of linguistic items (words) based on their distribution in a large corpus
► Based on Distributional Hypothesis (Harris, 1954)
► Words that are used and occur in the same contexts tend to purport similar meanings
► We use large text corpora with emoji to learn distributional semantics of emoji, which reveals relationships among emoji
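The distributional idea can be illustrated with plain co-occurrence counts. In the sketch below, the toy corpus, the window size, and the function names are all assumptions made for illustration; real experiments would use a large tweet corpus and learned embeddings rather than raw counts.

```python
from collections import Counter, defaultdict
from math import sqrt

# Toy corpus, invented for illustration; tokens are whitespace-separated.
corpus = [
    "so funny 😂 lol",
    "that was funny 🤣 lol",
    "so sad 😢 today",
    "feeling sad 😭 today",
]

def cooccurrence_vectors(sentences, window=2):
    """Count, for each token, the tokens seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

vecs = cooccurrence_vectors(corpus)
print(cosine(vecs["😂"], vecs["🤣"]))  # emoji that share contexts
print(cosine(vecs["😂"], vecs["😢"]))  # emoji with mostly different contexts
```

On this toy data, the two laughing emoji share more context words than the laughing and crying emoji do, so the first score is higher than the second.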
Learning Emoji Embeddings
► Learn distributional semantics of words as word embeddings using two corpora (Tweets and Google News)
► Convert the words in emoji meanings to vectors using word embeddings (emoji embeddings)
► Evaluate the similarity (distance) of emoji in the embedding space using EmoSim508, a new dataset with 508 emoji pairs
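A minimal sketch of the second and third steps, assuming that an emoji embedding is formed by averaging the word vectors of the words in its meaning. Averaging is one simple choice and is an assumption of this sketch; the word vectors and the sense words below are invented toy values, not taken from EmojiNet or from any trained model.

```python
from math import sqrt

# Toy 3-dimensional word vectors, invented for illustration. A real setup
# would load vectors trained on tweets or Google News instead.
word_vectors = {
    "laugh": [0.9, 0.1, 0.0],
    "funny": [0.8, 0.2, 0.1],
    "joy":   [0.7, 0.3, 0.0],
    "cry":   [0.1, 0.9, 0.2],
    "sad":   [0.0, 0.8, 0.3],
}

# Hypothetical sense words for three emoji (not taken from EmojiNet).
emoji_meanings = {
    "😂": ["laugh", "funny", "joy"],
    "🤣": ["laugh", "funny"],
    "😢": ["cry", "sad"],
}

def embed(words, vectors):
    """Average the vectors of the known words; assumes at least one is known."""
    known = [vectors[w] for w in words if w in vectors]
    return [sum(dim) / len(known) for dim in zip(*known)]

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

emoji_vectors = {e: embed(ws, word_vectors) for e, ws in emoji_meanings.items()}
print(cosine(emoji_vectors["😂"], emoji_vectors["🤣"]))
print(cosine(emoji_vectors["😂"], emoji_vectors["😢"]))
```

Similarity scores computed this way for each emoji pair are what would then be compared against the human ratings in a dataset such as EmoSim508.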
Ground Truth Data Creation
► Most frequently occurring emoji pairs from a 110M Twitter dataset with emoji
► Each emoji pair was rated for similarity and relatedness by 10 human users
Intrinsic Evaluation
► Using four different emoji definitions (Sense_Desc., Sense_Label, Sense_Def., Sense_All) and two corpora (Twitter and Google News), we trained eight emoji embedding models for each emoji
► We calculated emoji similarity of the 508 emoji pairs using each embedding model
Intrinsic Evaluation Cont.
► Using Spearman’s Rank Correlation Coefficient (Spearman’s ρ), we compared the similarity rankings of each model with ground truth data
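Spearman's ρ is the Pearson correlation of the two rank vectors. The sketch below implements it in plain Python with average ranks for ties; the human ratings and model scores are invented numbers, and the function names are ours. It assumes neither list is constant, since the denominator would then be zero.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the rank vectors of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Invented numbers: human ratings vs. one model's cosine scores for 5 pairs.
human = [3.8, 1.2, 2.5, 0.4, 3.1]
model = [0.91, 0.45, 0.40, 0.10, 0.77]
print(spearman_rho(human, model))
```

A model whose similarity ranking matches the human ranking more closely gets a ρ nearer to 1, which is how the embedding models can be compared against each other.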
Extrinsic Evaluation
► We tested our emoji embedding models using a sentiment analysis baseline
► Our baseline had 12,920 English tweets, and 2,295 of them had emoji
► All words in the tweets were replaced with their corresponding word embeddings, and emoji were replaced with the learned emoji embeddings
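The replacement step can be sketched as a token lookup against two tables. Mean-pooling the vectors into one fixed-length feature vector per tweet is an assumption of this sketch, as are the toy vectors and the name `tweet_features`; the slide does not say how the vectors were fed to the sentiment classifier.

```python
# Toy lookup tables, invented for illustration.
word_vectors = {"great": [0.9, 0.1], "day": [0.5, 0.5], "awful": [0.1, 0.9]}
emoji_vectors = {"😂": [0.8, 0.2], "😢": [0.2, 0.8]}

def tweet_features(tweet, word_vectors, emoji_vectors):
    """Replace each token with its vector and mean-pool into one feature vector.

    Tokens found in neither table are skipped; returns None if nothing matched.
    """
    rows = []
    for token in tweet.lower().split():
        if token in emoji_vectors:
            rows.append(emoji_vectors[token])
        elif token in word_vectors:
            rows.append(word_vectors[token])
    if not rows:
        return None
    return [sum(col) / len(rows) for col in zip(*rows)]

print(tweet_features("Great day 😂", word_vectors, emoji_vectors))
```

The resulting vectors could then be passed to any standard classifier, with the emoji table swapped out to compare one emoji embedding model against another.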
Extrinsic Evaluation Cont.
Key Takeaways
► Combining emoji sense knowledge with distributional semantics could improve the emoji embedding models
► Longer sense definitions are not suitable for emoji similarity experiments
Emoji Sense Disambiguation
Emoji Sense Disambiguation Problem
Image Source – https://goo.gl/rjS1hX (actual social media content; figure labels: I, Look)
► “The ability to identify the meaning of an emoji in the context of a message in a computational manner” [Wijeratne et al., 2017].
Emoji Sense Disambiguation
► Currently, no labeled datasets are available for solving emoji sense disambiguation in a supervised setting
Emoji Sense Disambiguation Cont.
► We selected the 25 most commonly misunderstood emoji and 50 tweets for each emoji
► Used the Simplified Lesk algorithm for disambiguation
► Context words were learned for each emoji sense definition using Twitter and Google News-based word embedding models
► Twitter-based embeddings outperform others
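Simplified Lesk picks the sense whose describing words overlap most with the words around the target. A minimal sketch, assuming each sense is represented as a set of words; the sense inventory for the eye emoji below is hypothetical and not taken from EmojiNet, and in the setup described above those word sets would be expanded with context words learned from embedding models.

```python
def simplified_lesk(context_words, sense_words):
    """Return the sense whose word set overlaps most with the context.

    sense_words maps a sense label to a set of words describing that sense.
    Ties are broken by the order of the senses in the mapping.
    """
    context = set(context_words)
    best_sense, best_overlap = None, -1
    for sense, words in sense_words.items():
        overlap = len(context & words)
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical sense inventory for the eye emoji (not taken from EmojiNet).
eye_senses = {
    "eye (noun)": {"eye", "face", "vision", "see", "sight"},
    "look (verb)": {"look", "watch", "see", "stare", "check"},
    "I (pronoun)": {"i", "me", "my", "myself", "am"},
}

tweet = "check this out and watch till the end"
print(simplified_lesk(tweet.lower().split(), eye_senses))
```

Because the method relies only on word overlap, richer word sets per sense give it more chances to match the tweet's wording.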
Results and Takeaways
► Tools designed for processing well-formed text will not work well when used on ill-formed text
► Sense disambiguation accuracy increases as the number of context words used increases
What Did We Learn?
Recap
► We looked at
► Why it is important to do emoji analysis
► How emoji get their meanings
► How to calculate emoji similarity
► How to disambiguate the meaning of an emoji
Acknowledgements
Collaborators
Prof. Amit Sheth
University of South Carolina
Prof. Derek Doran
Wright State University
Lakshika Balasuriya
(Gracenote Inc.)
Funding
References
► Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. A Semantics-Based Measure of Emoji Similarity. In 2017 IEEE/WIC/ACM International Conference on Web Intelligence (Web Intelligence 2017). Leipzig, Germany; 2017.
► Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. EmojiNet: An Open Service and API for Emoji Sense Discovery. In 11th International AAAI Conference on Web and Social Media (ICWSM 2017). Montreal, Canada; 2017.
► Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. EmojiNet: Building a Machine Readable Sense Inventory for Emoji. In 8th International Conference on Social Informatics (SocInfo 2016). Bellevue, WA, USA; 2016.
► Lakshika Balasuriya, Sanjaya Wijeratne, Derek Doran, Amit Sheth. Finding Street Gang Members on Twitter. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016). San Francisco, CA, USA; 2016.
Thank You!
SANJAYA@HOLLER.IO | HTTP://SANJW.ORG/ | @SANJROCKZ

More Related Content

What's hot

Creator IoT Framework
Creator IoT FrameworkCreator IoT Framework
Creator IoT FrameworkPaul Evans
 
Heap management in Compiler Construction
Heap management in Compiler ConstructionHeap management in Compiler Construction
Heap management in Compiler ConstructionMuhammad Haroon
 
AI-powered real-time video analytics for defence sector
AI-powered real-time video analytics for defence sectorAI-powered real-time video analytics for defence sector
AI-powered real-time video analytics for defence sectorAccubits Technologies
 
iOS Architecture
iOS ArchitectureiOS Architecture
iOS ArchitectureJacky Lian
 
Rotor machine,subsitution technique
Rotor machine,subsitution techniqueRotor machine,subsitution technique
Rotor machine,subsitution techniquekirupasuchi1996
 
Android Application Development
Android Application DevelopmentAndroid Application Development
Android Application DevelopmentBenny Skogberg
 
Automatic answer checker
Automatic answer checkerAutomatic answer checker
Automatic answer checkerYesu Raj
 
1. Introduction to IoT
1. Introduction to IoT1. Introduction to IoT
1. Introduction to IoTAbhishek Das
 
Unit 1 - mobile computing introduction
Unit 1 - mobile computing introductionUnit 1 - mobile computing introduction
Unit 1 - mobile computing introductionVintesh Patel
 
FACTORS INFLUENCING THE ADOPTION OF E-GOVERNMENT SERVICES IN PAKISTAN
FACTORS INFLUENCING THE ADOPTION OF E-GOVERNMENT SERVICES IN PAKISTANFACTORS INFLUENCING THE ADOPTION OF E-GOVERNMENT SERVICES IN PAKISTAN
FACTORS INFLUENCING THE ADOPTION OF E-GOVERNMENT SERVICES IN PAKISTANMuhammad Ahmad
 
Hand gesture recognition system(FYP REPORT)
Hand gesture recognition system(FYP REPORT)Hand gesture recognition system(FYP REPORT)
Hand gesture recognition system(FYP REPORT)Afnan Rehman
 
GPSBUS211-Edge Intelligence for IoT Applications
GPSBUS211-Edge Intelligence for IoT ApplicationsGPSBUS211-Edge Intelligence for IoT Applications
GPSBUS211-Edge Intelligence for IoT ApplicationsAmazon Web Services
 
Internet of everything ppt
Internet of everything pptInternet of everything ppt
Internet of everything pptLavanya Sharma
 
Android Application Component: BroadcastReceiver Tutorial
Android Application Component: BroadcastReceiver TutorialAndroid Application Component: BroadcastReceiver Tutorial
Android Application Component: BroadcastReceiver TutorialAhsanul Karim
 

What's hot (20)

Creator IoT Framework
Creator IoT FrameworkCreator IoT Framework
Creator IoT Framework
 
Heap management in Compiler Construction
Heap management in Compiler ConstructionHeap management in Compiler Construction
Heap management in Compiler Construction
 
AI-powered real-time video analytics for defence sector
AI-powered real-time video analytics for defence sectorAI-powered real-time video analytics for defence sector
AI-powered real-time video analytics for defence sector
 
iOS Architecture
iOS ArchitectureiOS Architecture
iOS Architecture
 
Python for IoT
Python for IoTPython for IoT
Python for IoT
 
Android Media player
Android Media playerAndroid Media player
Android Media player
 
Rotor machine,subsitution technique
Rotor machine,subsitution techniqueRotor machine,subsitution technique
Rotor machine,subsitution technique
 
Android Application Development
Android Application DevelopmentAndroid Application Development
Android Application Development
 
Automatic answer checker
Automatic answer checkerAutomatic answer checker
Automatic answer checker
 
Multimedia Streaming Architecture
Multimedia Streaming ArchitectureMultimedia Streaming Architecture
Multimedia Streaming Architecture
 
1. Introduction to IoT
1. Introduction to IoT1. Introduction to IoT
1. Introduction to IoT
 
Unit 1 - mobile computing introduction
Unit 1 - mobile computing introductionUnit 1 - mobile computing introduction
Unit 1 - mobile computing introduction
 
FACTORS INFLUENCING THE ADOPTION OF E-GOVERNMENT SERVICES IN PAKISTAN
FACTORS INFLUENCING THE ADOPTION OF E-GOVERNMENT SERVICES IN PAKISTANFACTORS INFLUENCING THE ADOPTION OF E-GOVERNMENT SERVICES IN PAKISTAN
FACTORS INFLUENCING THE ADOPTION OF E-GOVERNMENT SERVICES IN PAKISTAN
 
Hand gesture recognition system(FYP REPORT)
Hand gesture recognition system(FYP REPORT)Hand gesture recognition system(FYP REPORT)
Hand gesture recognition system(FYP REPORT)
 
Ios operating system
Ios operating systemIos operating system
Ios operating system
 
GPSBUS211-Edge Intelligence for IoT Applications
GPSBUS211-Edge Intelligence for IoT ApplicationsGPSBUS211-Edge Intelligence for IoT Applications
GPSBUS211-Edge Intelligence for IoT Applications
 
IoT architecture
IoT architectureIoT architecture
IoT architecture
 
Internet of everything ppt
Internet of everything pptInternet of everything ppt
Internet of everything ppt
 
Android Application Component: BroadcastReceiver Tutorial
Android Application Component: BroadcastReceiver TutorialAndroid Application Component: BroadcastReceiver Tutorial
Android Application Component: BroadcastReceiver Tutorial
 
Honeypots
HoneypotsHoneypots
Honeypots
 

Similar to Analyzing Emoji in Text

A Framework to Understand Emoji Meaning: Similarity and Sense Disambiguation ...
A Framework to Understand Emoji Meaning: Similarity and Sense Disambiguation ...A Framework to Understand Emoji Meaning: Similarity and Sense Disambiguation ...
A Framework to Understand Emoji Meaning: Similarity and Sense Disambiguation ...Artificial Intelligence Institute at UofSC
 
Tech with tn_curr_no_cartoon
Tech with tn_curr_no_cartoonTech with tn_curr_no_cartoon
Tech with tn_curr_no_cartoonJan Coley
 
Online social network analysis with machine learning techniques
Online social network analysis with machine learning techniquesOnline social network analysis with machine learning techniques
Online social network analysis with machine learning techniquesHari KC
 
THE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNING
THE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNINGTHE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNING
THE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNINGIRJET Journal
 
Hate Speech Identification Using Machine Learning
Hate Speech Identification Using Machine LearningHate Speech Identification Using Machine Learning
Hate Speech Identification Using Machine LearningIRJET Journal
 
Mattmiddaghscatterplotppt
MattmiddaghscatterplotpptMattmiddaghscatterplotppt
Mattmiddaghscatterplotpptmattmidd
 
12101-56982-3-PB.pdf
12101-56982-3-PB.pdf12101-56982-3-PB.pdf
12101-56982-3-PB.pdfSyauqiRahmat1
 
Big social data analytics - social network analysis
Big social data analytics - social network analysis Big social data analytics - social network analysis
Big social data analytics - social network analysis Jari Jussila
 
Y7 Game Design Technologies Program
Y7 Game Design Technologies ProgramY7 Game Design Technologies Program
Y7 Game Design Technologies ProgramJoanne Villis
 
Maemura_WARCnet_Developing Datasheets for Archived Web Datasets.pdf
Maemura_WARCnet_Developing Datasheets for Archived Web Datasets.pdfMaemura_WARCnet_Developing Datasheets for Archived Web Datasets.pdf
Maemura_WARCnet_Developing Datasheets for Archived Web Datasets.pdfWARCnet
 
Twitter sentimentanalysis report
Twitter sentimentanalysis reportTwitter sentimentanalysis report
Twitter sentimentanalysis reportSavio Aberneithie
 
Doctoral student discussion forum on MOOCs
Doctoral student discussion forum on MOOCsDoctoral student discussion forum on MOOCs
Doctoral student discussion forum on MOOCsGuanliang Chen
 
Profiling User Interests on the Social Semantic Web
Profiling User Interests on the Social Semantic WebProfiling User Interests on the Social Semantic Web
Profiling User Interests on the Social Semantic WebFabrizio Orlandi
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisFarida Vis
 
Analyzing Sentiment Of Movie Reviews In Bangla By Applying Machine Learning T...
Analyzing Sentiment Of Movie Reviews In Bangla By Applying Machine Learning T...Analyzing Sentiment Of Movie Reviews In Bangla By Applying Machine Learning T...
Analyzing Sentiment Of Movie Reviews In Bangla By Applying Machine Learning T...Andrew Parish
 
Automatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring Tool
Automatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring ToolAutomatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring Tool
Automatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring ToolLaurie Smith
 

Similar to Analyzing Emoji in Text (20)

Improving Emoji Understanding Tasks using EmojiNet – A Mini-Tutorial
Improving Emoji Understanding Tasks using EmojiNet – A Mini-TutorialImproving Emoji Understanding Tasks using EmojiNet – A Mini-Tutorial
Improving Emoji Understanding Tasks using EmojiNet – A Mini-Tutorial
 
A Framework to Understand Emoji Meaning: Similarity and Sense Disambiguation ...
A Framework to Understand Emoji Meaning: Similarity and Sense Disambiguation ...A Framework to Understand Emoji Meaning: Similarity and Sense Disambiguation ...
A Framework to Understand Emoji Meaning: Similarity and Sense Disambiguation ...
 
EmojiNet: An Open Service and API for Emoji Sense Discovery
EmojiNet: An Open Service and API for Emoji Sense DiscoveryEmojiNet: An Open Service and API for Emoji Sense Discovery
EmojiNet: An Open Service and API for Emoji Sense Discovery
 
A Semantics-Based Measure of Emoji Similarity
A Semantics-Based Measure of Emoji SimilarityA Semantics-Based Measure of Emoji Similarity
A Semantics-Based Measure of Emoji Similarity
 
Tech with tn_curr_no_cartoon
Tech with tn_curr_no_cartoonTech with tn_curr_no_cartoon
Tech with tn_curr_no_cartoon
 
Online social network analysis with machine learning techniques
Online social network analysis with machine learning techniquesOnline social network analysis with machine learning techniques
Online social network analysis with machine learning techniques
 
THE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNING
THE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNINGTHE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNING
THE ANALYSIS FOR CUSTOMER REVIEWS THROUGH TWEETS, BASED ON DEEP LEARNING
 
Hate Speech Identification Using Machine Learning
Hate Speech Identification Using Machine LearningHate Speech Identification Using Machine Learning
Hate Speech Identification Using Machine Learning
 
Mattmiddaghscatterplotppt
MattmiddaghscatterplotpptMattmiddaghscatterplotppt
Mattmiddaghscatterplotppt
 
12101-56982-3-PB.pdf
12101-56982-3-PB.pdf12101-56982-3-PB.pdf
12101-56982-3-PB.pdf
 
Big social data analytics - social network analysis
Big social data analytics - social network analysis Big social data analytics - social network analysis
Big social data analytics - social network analysis
 
Y7 Game Design Technologies Program
Y7 Game Design Technologies ProgramY7 Game Design Technologies Program
Y7 Game Design Technologies Program
 
Maemura_WARCnet_Developing Datasheets for Archived Web Datasets.pdf
Maemura_WARCnet_Developing Datasheets for Archived Web Datasets.pdfMaemura_WARCnet_Developing Datasheets for Archived Web Datasets.pdf
Maemura_WARCnet_Developing Datasheets for Archived Web Datasets.pdf
 
Shushmoji app for Younglings developers 2021
Shushmoji app for Younglings developers 2021Shushmoji app for Younglings developers 2021
Shushmoji app for Younglings developers 2021
 
Twitter sentimentanalysis report
Twitter sentimentanalysis reportTwitter sentimentanalysis report
Twitter sentimentanalysis report
 
Doctoral student discussion forum on MOOCs
Doctoral student discussion forum on MOOCsDoctoral student discussion forum on MOOCs
Doctoral student discussion forum on MOOCs
 
Profiling User Interests on the Social Semantic Web
Profiling User Interests on the Social Semantic WebProfiling User Interests on the Social Semantic Web
Profiling User Interests on the Social Semantic Web
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media Analysis
 
Analyzing Sentiment Of Movie Reviews In Bangla By Applying Machine Learning T...
Analyzing Sentiment Of Movie Reviews In Bangla By Applying Machine Learning T...Analyzing Sentiment Of Movie Reviews In Bangla By Applying Machine Learning T...
Analyzing Sentiment Of Movie Reviews In Bangla By Applying Machine Learning T...
 
Automatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring Tool
Automatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring ToolAutomatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring Tool
Automatic Movie Rating By Using Twitter Sentiment Analysis And Monitoring Tool
 

Recently uploaded

Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3JemimahLaneBuaron
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...Sapna Thakur
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajanpragatimahajan3
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxShobhayan Kirtania
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room servicediscovermytutordmt
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...Pooja Nehwal
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 

Recently uploaded (20)

Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3Q4-W6-Restating Informational Text Grade 3
Q4-W6-Restating Informational Text Grade 3
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptxINDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
INDIA QUIZ 2024 RLAC DELHI UNIVERSITY.pptx
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
The byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptxThe byproduct of sericulture in different industries.pptx
The byproduct of sericulture in different industries.pptx
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
9548086042 for call girls in Indira Nagar with room service
9548086042  for call girls in Indira Nagar  with room service9548086042  for call girls in Indira Nagar  with room service
9548086042 for call girls in Indira Nagar with room service
 
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...Russian Call Girls in Andheri Airport Mumbai WhatsApp  9167673311 💞 Full Nigh...
Russian Call Girls in Andheri Airport Mumbai WhatsApp 9167673311 💞 Full Nigh...
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 

Analyzing Emoji in Text

  • 1. Analyzing Emoji in Text Research Scientist, Holler.io, San Mateo, CA. sanjaya@holler.io | http://sanjw.org/ | @sanjrockz SANJAYA WIJERATNE BAX-423 Big Data Analytics GUEST LECTURE AT THE GRADUATE SCHOOL OF MANAGEMENT OF THE UNIVERSITY OF CALIFORNIA, DAVIS, 24TH /25TH APRIL, 2020.
  • 2. Meet Your Instructor ► Research Scientist at Holler.io ► Work on NLP ► Academic Background ► Education - Ph.D. in Computer Science and Engineering ► Research Interest - Emoji/Test Processing, NLU ► My Journey So Far ► I’m from Sri Lanka -> B.Sc. in IT (University of Moratuwa, Sri Lanka) -> ~2 years as a Software Engineer, 7.5 years as a GRA/TA at Wright State University 4/19/2020BAX-423 Big Data Analytics, UC Davis 2
  • 3. Emoji Chain Gang Usage Non-Gang Usage 32.25% 1.14% 53% 1.71% How I Started Working with Emoji Anthropology 189:001, UC Berkeley 3 Image Source – https://arxiv.org/pdf/1610.09516.pdf 4/19/2020
  • 5. Emoji = Picture Character 5 ► Introduced by Shigetaka Kurita in 1999 4/19/2020BAX-423 Big Data Analytics, UC Davis ► Unicode staterted supporting emoji character set in 2010 ► Emoji are not emoticons. Eg. :-), :-(
  • 6. Why Emoji Usage Increased? 4/19/2020BAX-423 Big Data Analytics, UC Davis 6
  • 7. Emoji Usage Statistics 4/19/2020BAX-423 Big Data Analytics, UC Davis 7
  • 8. A Few Open Emoji Research Problems related to Text Processing ► Challenges in interpreting the meaning of an emoji in a message context ► Emoji similarity ► Emoji sense disambiguation ► Emoji prediction ► Emoji-based retrieval and search 4/19/2020BAX-423 Big Data Analytics, UC Davis 8
  • 9. A Few Open Emoji Research Problems related to Text Processing ► Challenges in interpreting the meaning of an emoji in a message context ► Emoji similarity ► Emoji sense disambiguation ► Emoji prediction ► Emoji-based retrieval and search 4/19/2020BAX-423 Big Data Analytics, UC Davis 9
  • 10. How Emoji get their Meanings?
  • 11. Emoji Semantics ► Emoji are inherently designed with no rigid semantics ► Emoji does not have a grammar, thus, emoji cannot be used as a language on its own ► How emoji meanings are assigned? ► Initially, by the emoji creators ► Later, by the users 11 4/19/2020BAX-423 Big Data Analytics, UC Davis
  • 12. How Do Emoji Get Their Meanings? ► Emoji creators submit possible emoji meanings in their proposals ► Once accepted, these are made available in the Unicode Common Locale Data Repository (CLDR) at https://www.unicode.org/cldr/charts/latest/annotations/other.html
  • 13. How Do Emoji Get Their Meanings? ► When people replace words using emoji (logographic) ► Homonymy relations in languages (e.g., eye & I) Image Source – https://goo.gl/rjS1hX (*actual social media content)
  • 14. Getting the Emoji Meanings Image Source – http://emojinet.knoesis.org
  • 15. EmojiNet Image Source – https://arxiv.org/pdf/1707.04652.pdf
  • 17. Emoji Similarity Problem ► “Measuring the semantic similarity of emoji such that the measure reflects the likeness of their meaning, interpretation or intended use.” [Wijeratne et al., 2017]
  • 18. Notion of Emoji Similarity ► The notion of emoji similarity is broad ► Pixel-based Emoji Similarity ► Meaning-based Emoji Similarity
  • 19. Representing Emoji Meaning
  • 20. Distributional Semantics ► Finds semantic properties of linguistic items (words) based on their distribution in a large corpus ► Based on Distributional Hypothesis (Harris, 1954) ► Words that are used and occur in the same contexts tend to purport similar meanings ► We use large text corpora with emoji to learn distributional semantics of emoji, which reveals relationships among emoji
  • 21. Learning Emoji Embeddings ► Learn distributional semantics of words as word embeddings using two corpora (Tweets and Google News) ► Convert the words in emoji meanings to vectors using word embeddings (emoji embeddings) ► Evaluate the similarity (distance) of emoji in the embedding space using EmoSim508, a new dataset with 508 emoji pairs
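The steps on this slide can be sketched in a few lines. The sketch below assumes that an emoji embedding is formed by averaging the word vectors of the words in the emoji's meaning, and that similarity is measured with cosine similarity; the slide itself does not state either choice. The tiny `WORD_VECTORS` table and the word lists are made-up toy values, not the Twitter or Google News embeddings used in the lecture.

```python
import math

# Toy 3-dimensional word vectors; made-up values for illustration only.
WORD_VECTORS = {
    "face": [0.9, 0.1, 0.0],
    "smile": [0.8, 0.3, 0.1],
    "happy": [0.7, 0.4, 0.1],
    "sad": [-0.6, 0.5, 0.2],
    "cry": [-0.7, 0.4, 0.3],
}

def embed_meaning(words):
    """Average the vectors of the known words in an emoji's meaning."""
    known = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not known:
        raise ValueError("no known words in the emoji meaning")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical meanings for three emoji, each given as a list of words.
grinning = embed_meaning(["happy", "smile", "face"])
smiling = embed_meaning(["smile", "face"])
crying = embed_meaning(["sad", "cry", "face"])

sim_close = cosine_similarity(grinning, smiling)
sim_far = cosine_similarity(grinning, crying)
print(sim_close, sim_far)
```

With these toy vectors the two smiling emoji end up closer to each other than to the crying one, which is the kind of relationship the embedding space is meant to capture.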
  • 22. Representing Emoji Meaning
  • 23. Ground Truth Data Creation ► Most frequently occurring emoji pairs from a 110M Twitter dataset with emoji ► Ten human users rated each emoji pair for similarity and relatedness
  • 24. Intrinsic Evaluation ► Using four different emoji definitions (Sense_Desc., Sense_Label, Sense_Def., Sense_All) and two corpora (Twitter and Google News), we trained eight emoji embedding models for each emoji ► We calculated emoji similarity of the 508 emoji pairs using each embedding model
  • 25. Intrinsic Evaluation Cont. ► Using Spearman’s Rank Correlation Coefficient (Spearman’s ρ), we compared the similarity rankings of each model with ground truth data
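As a rough illustration of this comparison, Spearman’s ρ can be computed as the Pearson correlation of the two rank lists, with tied values sharing their average rank. The ratings and model scores below are invented toy numbers for five emoji pairs, not values from EmoSim508.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented toy data: human ratings vs. one model's similarity scores.
human_ratings = [3.8, 0.5, 2.1, 1.0, 3.0]
model_scores = [0.91, 0.12, 0.40, 0.55, 0.75]

rho = spearman_rho(human_ratings, model_scores)
print(rho)
```

Only the ordering of the pairs matters here, so a model whose similarity scores live on a different scale from the human ratings can still score well.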
  • 26. Extrinsic Evaluation ► We tested our emoji embedding models using a sentiment analysis baseline ► Our baseline had 12,920 English tweets, and 2,295 of them had emoji ► All words in the tweets were replaced with their corresponding word embeddings and emoji were replaced with emoji embeddings learned
  • 27. Extrinsic Evaluation Cont.
  • 28. Key Takeaways ► Combining emoji sense knowledge with distributional semantics could improve the emoji embedding models ► Longer sense definitions are not suitable for emoji similarity experiments
  • 30. Emoji Sense Disambiguation Problem ► “The ability to identify the meaning of an emoji in the context of a message in a computational manner” [Wijeratne et al., 2017]. Image Source – https://goo.gl/rjS1hX (*actual social media content; image labels: “I”, “Look”)
  • 31. Emoji Sense Disambiguation ► Currently, no labeled datasets are available to solve emoji sense disambiguation in a supervised setting
  • 32. Emoji Sense Disambiguation Cont. ► We selected the 25 most commonly misunderstood emoji and selected 50 tweets for each emoji ► Used the Simplified Lesk algorithm for disambiguation ► Context words were learned for each emoji sense definition using Twitter and Google News-based word embedding models ► Twitter-based embeddings outperform others
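The core idea of Simplified Lesk is to pick the sense whose definition shares the most words with the message context. The sketch below uses plain word overlap only; it does not expand the sense definitions with context words learned from word embeddings as described on the slide. The sense labels, definitions, and stopword list are hypothetical examples, not entries from EmojiNet.

```python
# Small illustrative stopword list; an assumption, not from the lecture.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "is", "i", "my"}

def tokenize(text):
    """Lowercase, strip basic punctuation, and drop stopwords."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return words - STOPWORDS - {""}

def simplified_lesk(context, senses):
    """Return the sense label whose definition overlaps most with the context.

    Ties are broken in favor of the sense listed first.
    """
    context_words = tokenize(context)
    best_label, best_overlap = None, -1
    for label, definition in senses.items():
        overlap = len(context_words & tokenize(definition))
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label

# Hypothetical sense inventory for a fire emoji.
FIRE_SENSES = {
    "fire (noun)": "flames heat and smoke from something burning",
    "hot (adjective)": "very attractive stylish or excellent",
}

sense_a = simplified_lesk("smoke from the burning building", FIRE_SENSES)
sense_b = simplified_lesk("that outfit is stylish and excellent", FIRE_SENSES)
print(sense_a, sense_b)
```

Short tweets often share no words with a short definition, which is one reason to enlarge each sense's word set before computing overlap.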
  • 33. Results and Takeaways ► Tools designed for processing well-formed text do not work well on ill-formed text ► Sense disambiguation accuracy increases as the number of context words increases
  • 34. What Did We Learn?
  • 35. Recap ► We looked at ► Why it is important to do emoji analysis ► How emoji get their meanings ► How to calculate emoji similarity ► How to disambiguate the meaning of an emoji
  • 36. Acknowledgements ► Collaborators: Prof. Amit Sheth (University of South Carolina), Prof. Derek Doran (Wright State University), Lakshika Balasuriya (Gracenote Inc.) ► Funding
  • 37. References ► Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. A Semantics-Based Measure of Emoji Similarity. In 2017 IEEE/WIC/ACM International Conference on Web Intelligence (Web Intelligence 2017). Leipzig, Germany; 2017. ► Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. EmojiNet: An Open Service and API for Emoji Sense Discovery. In 11th International AAAI Conference on Web and Social Media (ICWSM 2017). Montreal, Canada; 2017. ► Sanjaya Wijeratne, Lakshika Balasuriya, Amit Sheth, Derek Doran. EmojiNet: Building a Machine Readable Sense Inventory for Emoji. In 8th International Conference on Social Informatics (SocInfo 2016). Bellevue, WA, USA; 2016. ► Lakshika Balasuriya, Sanjaya Wijeratne, Derek Doran, Amit Sheth. Finding Street Gang Members on Twitter. In The 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016). San Francisco, CA, USA; 2016.
  • 38. Thank You! SANJAYA@HOLLER.IO | HTTP://SANJW.ORG/ | @SANJROCKZ