Livestreaming platforms enable content producers, or streamers, to broadcast creative content to a potentially large viewer base. Chatrooms form an integral part of such platforms, enabling viewers to interact both with the streamer and amongst themselves. Streams with high engagement (many viewers and many active chatters) are typically considered engaging, often promoted to end users by recommendation algorithms, and exposed to better monetization opportunities via revenue share from platform advertising, viewer donations, and third-party sponsorships. Given such incentives, some streamers resort to fraudulent means to inflate perceived engagement by simulating chatter via fake “chatbots”, which can be purchased from online marketplaces. This inorganic engagement can negatively influence recommendations, hurt streamer and viewer trust in the platform, and harm monetization for honest streamers. In this study, we tackle the novel problem of automating the detection of chatbots on livestreaming platforms. To this end, we first formalize the livestreaming chatbot detection problem and characterize differences between botted and genuine chatter behaviour observed in a real-world livestreaming chatter dataset collected from Twitch.tv. We then propose the SHERLOCK and BOTHUNT methods, which posit a two-stage approach of first detecting chatbotted streams and subsequently detecting the constituent chatbots. Finally, we demonstrate effectiveness on both real and synthetic data: to this end, we propose a novel strategy for collecting a labeled, synthetic chatter dataset (typically unavailable) from such platforms, enabling evaluation of the proposed detection approaches against chatbot behaviors with varying signatures. The SHERLOCK approach achieves 97% precision/recall on the real-world dataset and over 80% F1 score across most simulated attack settings, while BOTHUNT achieves 86% accuracy on the real-world dataset and 93% accuracy across all attack settings. This thesis is a timely contribution to the area of computer science, specifically combating astroturfing, needed to mitigate the spread of fraudulent bot users on livestreaming platforms. The results from this thesis can be used to build real-world solutions to mitigate the spread of untrustworthy or botted streams, fake users, etc. on livestreaming platforms in the future.
7. Chatbot Marketplace
There are numerous chatbot services that can be used by streamers. The streamer can manipulate various features of the chatbots being delivered:
● number of bots,
● text of messages,
● inter-message delays.
Chatbotting is user/customer-customizable.
9. Adverse Effects & Challenges
Effects of Fraud
● Loss of Users: Honest streamers are not recommended, and they leave.
● Loss of Trust: Viewers and streamers lose trust in the platform.
● Financial Loss: The platform loses money to fraud.
Challenges
● Noisy Text: Ill-formed sentences, spelling mistakes, etc.
● User-Controlled Fraud: Customization gives fraudsters control to evade detection.
● Lack of Ground Truth.
10. Related Work
Text-Based Methods
● Classification in IM (Yahoo chat) settings.
● Detecting chatbots based on links (spam).
Gap: bots on livestreams aim to increase chatter, not to generate money through spam URLs.
Bot Detection
● Graph-based factorization.
● Random-walk based.
● FLOCK.
Gap: not focusing on chatbots specifically. Closest is FLOCK, which focuses on identifying viewbots on livestreaming platforms.
Astroturfing on Social Media
● Twitter bot detection.
● Facebook bot detection, etc.
Gap: not focusing on chatbots.
11. This Work
In this work, we provide the following contributions:
1. Problem Formulation
(Formalize the chatbot detection problem)
2. Dataset Collection and Characterization
(Real livestreaming chatlog data and various attack data)
3. Proposed Frameworks (SHERLOCK and BOTHUNT)
(Two-stage approaches to detect chatbotted streams and then bots in those streams)
14. Problem Statement
Stage 1
Given a set of streams, and messages on those streams, IDENTIFY chatbotted streams.
Stage 2
Given a chatbotted stream, and chatters on that stream, LABEL each chatter as bot or not.
19. Initial Observations
Inter-Message Delays (IMDs)
Chatbotted streams have a higher IMD than genuine streams.
Chatbots have a consistent IMD, showing that they are automated.
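The IMD observation can be checked with a short sketch: given one chatter's sorted message timestamps, compute the inter-message delays and their spread; a near-zero spread suggests automation. This is an illustrative sketch, not the thesis implementation.

```python
import statistics

def imd_profile(timestamps):
    """Mean and population std. dev. of one chatter's inter-message
    delays (IMDs); near-zero deviation suggests automation."""
    imds = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return statistics.mean(imds), statistics.pstdev(imds)

# A bot posting exactly every 10 s vs. a bursty human chatter.
bot_mean, bot_std = imd_profile([0, 10, 20, 30, 40])
human_mean, human_std = imd_profile([0, 3, 25, 28, 90])
```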
22. Proposed Framework: SHERLOCK
Stage 1
Given a set of streams, and messages on those streams, IDENTIFY chatbotted streams.
Stage 2
Given a chatbotted stream, and chatters on that stream, LABEL each chatter as bot or not.
23. SHERLOCK: Stage 1
Feature Set
● Modeled as a classification task.
● Based on 3 feature categories found useful in preliminary analysis:
○ Number of Messages
○ IMD Quantiles (60th, 70th, 80th, 90th percentiles)
○ Number of Windows
● Corresponding to each feature category, we generate features from its per-user distribution in the stream:
○ #Messages, #Windows - 3 features related to their skewness
○ IMD Quantiles - 4 features (one per percentile)
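A rough sketch of the Stage-1 feature construction, assuming per-user message timestamps as input; the feature categories (per-user message counts, IMD quantiles) follow the slides, while the exact summary statistics and the omission of the #Windows category are simplifications for illustration:

```python
import numpy as np

def stream_features(user_msg_times):
    """Illustrative sketch of SHERLOCK Stage-1 stream-level features
    (not the thesis code). user_msg_times maps user id -> sorted
    message timestamps for one stream."""
    msg_counts = np.array([len(t) for t in user_msg_times.values()], dtype=float)

    # Per-user 80th-percentile IMD as one representative quantile feature.
    imd_q80 = [np.percentile(np.diff(t), 80)
               for t in user_msg_times.values() if len(t) >= 2]
    imd_q80 = np.array(imd_q80) if imd_q80 else np.array([0.0])

    def dist_stats(x):
        # Summarize a per-user distribution: mean, std, and skewness.
        m, s = x.mean(), x.std()
        skew = ((x - m) ** 3).mean() / s ** 3 if s > 0 else 0.0
        return [float(m), float(s), float(skew)]

    return dist_stats(msg_counts) + dist_stats(imd_q80)

feats = stream_features({"u1": [0, 5, 10, 15], "u2": [1, 2], "u3": [7]})
```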
25. SHERLOCK: Stage 2
● Semi-supervised learning approach.
● Sensitive to the seeds chosen.
● We follow a heuristic-based approach to identify seeds.
● Basic idea:
○ Level 1 clustering - cluster on mean IMD and #Messages (most discriminative).
○ Exonerating users based on subscription status.
○ Level 2 clustering - based on synchronicity of IMD entropy and #Windows entropy.
26. SHERLOCK: Stage 2
● The seeds mark users as genuine or bot with high confidence.
○ We generate a KNN graph between all the users.
● Graph-based label propagation.
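The seed-then-propagate idea can be sketched as follows; the KNN graph and the (mean IMD, #messages) feature space follow the slides, but the propagation scheme itself is a simplified illustration rather than the thesis implementation:

```python
import numpy as np

def knn_label_propagation(X, seed_labels, k=3, iters=30):
    """Illustrative sketch of SHERLOCK Stage 2 (not the thesis code):
    propagate high-confidence seed labels over a k-nearest-neighbour
    graph of users. seed_labels: 0 = genuine, 1 = bot, -1 = unlabeled."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # no self-edges
    nbrs = np.argsort(d, axis=1)[:, :k]       # k nearest neighbours per user

    scores = np.where(seed_labels == -1, 0.5, seed_labels).astype(float)
    clamped = seed_labels != -1
    for _ in range(iters):
        new = scores[nbrs].mean(axis=1)       # adopt mean neighbour score
        new[clamped] = scores[clamped]        # seeds stay fixed
        scores = new
    return (scores > 0.5).astype(int)

# Toy example: two tight clusters in (mean IMD, #messages) space,
# one seed per cluster; labels spread to the remaining users.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
seeds = np.array([0, -1, -1, -1, 1, -1, -1, -1])
pred = knn_label_propagation(X, seeds)
```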
28. Experiments
Baselines
● SSC (Supervised Spam Classifier)1
○ Originally used for Twitter spammer detection; customized to detect bots.
○ Features: max, min, median of #words, #characters, #URLs, and IMDs.
○ Supervised algorithm.
● SynchroTrap2
○ Group-based method to detect bots.
○ Constructs a graph in which similar nodes (by IMD and #messages per window) are connected.
○ Clusters the matrix into 2 groups; the cluster empirically containing the most bots is marked as the bot cluster.
○ We apply it on each stream.
1. Benevenuto et al., 2010 2. Yang et al., 2014
29. Experiments
Real World Results
● Dataset: 183 streams, 24 botted, 159 genuine.
● SHERLOCK
○ Stage 1: 5-fold CV (98.3% accuracy).
○ Stage 2: Run only on streams marked as botted in Stage 1.
● SSC - all users, 5-fold CV.
● SynchroTrap - on all streams.
[Figure: Stage II performance]
SHERLOCK achieves strong performance on real-world data.
30. Experiments
Synthetic Dataset Generation
● Dataset
○ [Botted Stream] Paid a chatbotting service to run on an empty stream.
○ Used the texts obtained and their inter-arrival times as the chatbot set.
○ [Real Stream] Collected data from random streams of different durations.
○ [Corrupted Stream] Merged the random and chatbotted streams.
● Noise Models
○ Controlled Chatters (CC): Fix the number of chatters, vary the delay.
○ Rapid Increase (RI): Start with a small number of bots and a large delay; then increase the number and decrease the delay.
○ Gradual Increase (GI): Same as RI, but more uniform.
○ Organic Growth (OG): Increase the number of bots over time; for each increase, gradually decrease the delay.
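The Organic Growth model, for instance, could be simulated along these lines; all parameter names and the exact schedule are assumptions for illustration:

```python
import random

def organic_growth_attack(duration_s=600, start_bots=2, max_bots=20,
                          start_delay=30.0, min_delay=5.0, seed=0):
    """Illustrative generator for the Organic Growth (OG) noise model:
    the bot count rises phase by phase, and within each phase the
    inter-message delay gradually shrinks. Not the thesis implementation."""
    rng = random.Random(seed)
    events = []                                  # (timestamp, bot_id) pairs
    n_phases = max_bots - start_bots + 1
    t_step = duration_s / n_phases
    for phase, n_bots in enumerate(range(start_bots, max_bots + 1)):
        t0, t1 = phase * t_step, (phase + 1) * t_step
        for bot in range(n_bots):
            t = t0 + rng.uniform(0, 1)
            while t < t1:
                events.append((t, bot))
                # Delay shrinks linearly from start_delay to min_delay in a phase.
                frac = (t - t0) / (t1 - t0)
                t += rng.expovariate(1.0 / (start_delay + (min_delay - start_delay) * frac))
    events.sort()
    return events

events = organic_growth_attack()
```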
32. Experiments
Synthetic Dataset Results
[Figure: results at 40%, 60%, and 80% chatbots]
The OG model is the most challenging for SHERLOCK: performance is worst under the OG attack model.
34. Experiments
Synthetic Dataset Results
[Figure: results at 40%, 60%, and 80% chatbots]
The more chatbots there are, the easier they are to discover (higher SNR): performance improves with the fraction of chatbots.
37. Problem Statement
Stage 1
Given a set of streams, and messages on those streams, IDENTIFY chatbotted streams using TEXTUAL features.
Stage 2
Given a chatbotted stream, and chatters on that stream, LABEL each chatter as bot or not using TEXTUAL features.
39. Initial Observations
Inter-Message Delays (IMDs)
● Chatbotted streams have a higher IMD than genuine streams.
● Chatbots have a consistent IMD, showing that they are automated.
40. Initial Observations
Message Lengths (MLs)
● Bot users post sampled messages at predetermined timestamps.
● Bot users' messages are redundant, unlike genuine users' messages, which have variable MLs.
41. Initial Observations
Message Content
● Genuine users converse with both bot and real users, and tag users in comments.
● Bot users post sampled messages provided by the streamer.
● Bot users might tag other bot users rather than real ones, since tagging real users requires dynamic behaviour.
43. Proposed Framework: BOTHUNT
Stage 1
Given a set of streams, and messages on those streams, IDENTIFY chatbotted streams using TEXTUAL features.
Stage 2
Given a chatbotted stream, and chatters on that stream, LABEL each chatter as bot or not using TEXTUAL features.
44. BOTHUNT: Stage 1
Feature Set
● Modeled as a classification task.
● Based on 2 feature categories found useful in preliminary analysis:
○ IMD Corrected Conditional Entropy (CCE) Quantiles (25th, 50th, 75th percentiles)
○ ML Corrected Conditional Entropy (CCE) Quantiles (25th, 50th, 75th percentiles)
● Corresponding to each feature category, we generate features from its per-user distribution in the stream:
○ IMD CCE Quantiles - 3 features
○ ML CCE Quantiles - 3 features
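Corrected conditional entropy over a binned sequence can be sketched as below, following the commonly used formulation CCE(m) = CE(m) + perc(m) · H(1), where perc(m) is the fraction of length-m patterns occurring exactly once; this is an illustrative re-implementation, not the thesis code:

```python
import math
import random
from collections import Counter

def corrected_conditional_entropy(seq, m):
    """Corrected conditional entropy (CCE) of a binned sequence.
    Low CCE indicates a regular, bot-like process."""
    def block_entropy(k):
        blocks = Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
        total = sum(blocks.values())
        return -sum(c / total * math.log(c / total) for c in blocks.values())

    ce = block_entropy(m) - block_entropy(m - 1)      # conditional entropy estimate
    blocks_m = Counter(tuple(seq[i:i + m]) for i in range(len(seq) - m + 1))
    perc = sum(1 for c in blocks_m.values() if c == 1) / sum(blocks_m.values())
    return ce + perc * block_entropy(1)               # correction term

# A perfectly regular (bot-like) sequence vs. a varied (human-like) one.
bot_cce = corrected_conditional_entropy([1] * 50, m=2)
rng = random.Random(0)
human_cce = corrected_conditional_entropy([rng.randrange(4) for _ in range(200)], m=2)
```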
46. BOTHUNT: Stage 2
● Semi-supervised learning approach.
● Features: IMD CCE, ML CCE, conversational features, user-tag features, and channel followers.
● We follow a heuristic-based approach to identify seeds.
● Basic idea:
○ Clustering - based on synchronicity of IMD CCE and ML CCE.
○ Exonerating users based on subscription status.
47. BOTHUNT: Stage 2
● The seeds mark users as genuine or bot with high confidence.
○ We generate a graph between all the users, using conversational cues and user tags as features.
● Graph-based label propagation.
49. Experiments
Baselines
● Revised SSC (Supervised Spam Classifier)1
○ Originally used for Twitter spammer detection; customized to detect bots.
○ Features: max, min, mean, median of #words, #characters, #URLs, and IMDs.
○ Supervised algorithm.
● Revised SynchroTrap2
○ Group-based method to detect bots.
○ Constructs a graph in which similar nodes (by ML bins, #characters bins, user mentions, URLs used) are connected.
○ Clusters the matrix into 2 groups; the cluster empirically containing the most bots is marked as the bot cluster.
○ We apply it on each stream.
1. Benevenuto et al., 2010 2. Yang et al., 2014
50. Experiments
Baselines
● User Text Similarity (UTS) Model1
○ User-level supervised approach.
○ Uses content- and graph-based features for users in the Twitter setting.
○ Measures the duplication of all messages posted by a user using Levenshtein distance.
○ Min, max, mean, and median of the distances act as content-based features.
○ Channel followers act as the graph-based feature.
1. Wang et al., 2010
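The UTS content features can be sketched as follows; the Levenshtein routine is standard dynamic programming, and the summary statistics follow the slide, but this is an illustrative sketch rather than the baseline's original code:

```python
import itertools
import statistics

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def duplication_features(messages):
    """Min/max/mean/median pairwise edit distance over one user's
    messages; values near zero indicate bot-like repetitive posting."""
    dists = [levenshtein(a, b) for a, b in itertools.combinations(messages, 2)]
    return min(dists), max(dists), statistics.mean(dists), statistics.median(dists)

bot_feats = duplication_features(["gg wp", "gg wp", "gg wp"])
human_feats = duplication_features(["hello chat", "lol nice play", "anyone here?"])
```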
51. Experiments
Real World Results
● Dataset: 183 streams, 24 botted, 159 genuine.
● BOTHUNT
○ Stage 1: 5-fold CV (86.04% accuracy).
○ Stage 2: Run only on streams marked as botted in Stage 1.
● Revised SSC - all users, 5-fold CV.
● Revised SynchroTrap - on all streams.
● UTS - all users, 5-fold CV.
[Figure: BOTHUNT Stage II performance]
BOTHUNT achieves strong performance on real-world data.
53. Experiments
Stage 2
We obtained an average accuracy of around 85% across all labeled chatbotted streams, irrespective of the attack model.
54. Conclusions
● Proposed and formulated the problem of detecting chatbots on livestreaming platforms.
● Collected real-world data, and provided a method for generating realistic synthetic datasets for experimentation.
● Proposed two-stage approaches (SHERLOCK & BOTHUNT) for detecting chatbotted streams and users.
● With real-world and synthetic experiments, we showed that:
○ SHERLOCK and BOTHUNT are effective in detecting chatbots, with up to 97% precision and 85% accuracy, respectively.
○ They are robust under a variety of noise models, obtaining up to a 0.68 F1 score (SHERLOCK) and 66% accuracy (BOTHUNT) under the most challenging scenario.
○ Both stages are scalable - linear in the number of streams and the number of users.
55. Conclusions
● The method is generic and applicable to other platforms; the only limitation arises if bot service providers for platforms other than Twitch offer different attack settings.
● Next, we will merge the features of SHERLOCK and BOTHUNT to check the efficacy of the resulting framework.
56. Relevant publication
● Jain, S., Niranjan, D., Lamba, H., and Kumaraguru, P. Characterizing and Detecting Livestreaming Chatbots. ASONAM 2019. Full paper: http://precog.iiitd.edu.in/pubs/Characterizing_and_Detecting_Livestreaming_Chatbots-ASONAM19.pdf
57. Acknowledgements
Grateful to Neil Shah, Hemank Lamba and Dipankar Niranjan for collaborating with me on this work.
Thankful to Dr. Ponnurangam Kumaraguru for his inputs and suggestions.
Thankful to Dr. Manish Shrivatsava and Prof. Kamakshi Prasad for evaluating the work and giving valuable inputs.