Livestreaming platforms enable content producers, or streamers, to broadcast creative content to a potentially large viewer base. Chatrooms form an integral part of such platforms, enabling viewers to interact both with the streamer and amongst themselves. Streams with high engagement (many viewers and many active chatters) are typically considered engaging, often promoted to end users by recommendation algorithms, and exposed to better monetization opportunities via revenue share from platform advertising, viewer donations, and third-party sponsorships. Given such incentives, some streamers resort to fraudulent means to inflate perceived engagement by simulating chatter via fake “chatbots”, which can be purchased from online marketplaces. This inorganic engagement can negatively influence recommendations, hurt streamer and viewer trust in the platform, and harm monetization for honest streamers. In this study, we tackle the novel problem of automating the detection of chatbots on livestreaming platforms. To this end, we first formalize the livestreaming chatbot detection problem and characterize differences between botted and genuine chatter behaviour observed in a real-world livestreaming chatter dataset collected from Twitch.tv. We then propose the SHERLOCK and BOTHUNT methods, which posit a two-stage approach of first detecting chatbotted streams and subsequently detecting the constituent chatbots. Finally, we demonstrate effectiveness on both real and synthetic data: to this end, we propose a novel strategy for collecting a labeled, synthetic chatter dataset (typically unavailable) from such platforms, enabling evaluation of the proposed detection approaches against chatbot behaviors with varying signatures. The SHERLOCK approach achieves 97% precision/recall on the real-world dataset and over 80% F1 score across most simulated attack settings, while BOTHUNT achieves 86% accuracy on the real-world dataset and 93% accuracy across all attack settings. This thesis is a timely contribution to the area of computer science, specifically combating astroturfing, needed to mitigate the spread of fraudulent bot users on livestreaming platforms. The results from this thesis can be used to build real-world solutions to mitigate the spread of untrustworthy or botted streams, fake users, etc. on livestreaming platforms in the future.
7. Chatbot Marketplace
There are numerous chatbot services that can be used by streamers. The streamer can manipulate various features of the chatbots being delivered:
● number of bots,
● text of messages,
● inter-message delays.
Chatbotting is user/customer-customizable.
9. Adverse Effects & Challenges
Effects of Fraud
● Loss of Users: Honest streamers are not recommended, and they leave.
● Loss of Trust: Viewers and streamers lose trust in the platform.
● Financial Loss: The platform loses money to fraud.
Challenges
● Noisy Text: Ill-formed sentences, spelling mistakes, etc.
● User-Controlled Fraud: Customization gives fraudsters control to evade detection.
● Lack of Ground Truth.
10. Related Work
Text-Based Methods
● Classification in IM (Yahoo chat) settings.
● Detecting chatbots based on links (spam).
Gap: bots on livestreams aim to increase chatter, not to generate money through spam URLs.
Bot Detection
● Graph-based factorization.
● Random-walk based.
● FLOCK.
Gap: not focusing on chatbots specifically. Closest is FLOCK, which focuses on identifying viewbots on livestreaming platforms.
Astroturfing on Social Media
● Twitter bot detection.
● Facebook bot detection, etc.
Gap: not focusing on chatbots.
11. This Work
In this work, we provide the following contributions:
1. Problem Formulation
(Formalize the chatbot detection problem)
2. Dataset Collection and Characterization
(Real livestreaming chatlog data and various attack data)
3. Proposed Frameworks (SHERLOCK and BOTHUNT)
(Two-stage approaches to detect chatbotted streams and then bots in those streams)
14. Problem Statement
Stage 1
Given a set of streams, and messages on those streams, IDENTIFY chatbotted streams.
Stage 2
Given a chatbotted stream, and chatters on that stream, LABEL each chatter as bot or not.
19. Initial Observations
Inter-Message Delays (IMDs)
Chatbotted streams have a higher IMD than genuine streams.
Chatbots have a consistent IMD, showing that they are automated.
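The IMD observation can be checked with a short sketch: given one chatter's sorted message timestamps, compute the inter-message delays and their spread; a near-zero spread suggests automation. This is an illustrative sketch, not the thesis implementation.

```python
import statistics

def imd_profile(timestamps):
    """Mean and population std. dev. of one chatter's inter-message
    delays (IMDs); near-zero deviation suggests automation."""
    imds = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return statistics.mean(imds), statistics.pstdev(imds)

# A bot posting exactly every 10 s vs. a bursty human chatter.
bot_mean, bot_std = imd_profile([0, 10, 20, 30, 40])
human_mean, human_std = imd_profile([0, 3, 25, 28, 90])
```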
22. Proposed Framework: SHERLOCK
Stage 1
Given a set of streams, and messages on those streams, IDENTIFY chatbotted streams.
Stage 2
Given a chatbotted stream, and chatters on that stream, LABEL each chatter as bot or not.
23. SHERLOCK: Stage 1
Feature Set
● Modeled as a classification task.
● Based on 3 feature categories found useful in preliminary analysis:
○ Number of Messages
○ IMD Quantiles (60th, 70th, 80th, 90th percentiles)
○ Number of Windows
● Corresponding to each feature category, we generate features from its per-user distribution in the stream:
○ #Messages, #Windows - 3 features related to their skewness
○ IMD Quantiles - 4 features (one per percentile)
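A rough sketch of the Stage-1 feature construction, assuming per-user message timestamps as input; the feature categories (per-user message counts, IMD quantiles) follow the slides, while the exact summary statistics and the omission of the #Windows category are simplifications for illustration:

```python
import numpy as np

def stream_features(user_msg_times):
    """Illustrative sketch of SHERLOCK Stage-1 stream-level features
    (not the thesis code). user_msg_times maps user id -> sorted
    message timestamps for one stream."""
    msg_counts = np.array([len(t) for t in user_msg_times.values()], dtype=float)

    # Per-user 80th-percentile IMD as one representative quantile feature.
    imd_q80 = [np.percentile(np.diff(t), 80)
               for t in user_msg_times.values() if len(t) >= 2]
    imd_q80 = np.array(imd_q80) if imd_q80 else np.array([0.0])

    def dist_stats(x):
        # Summarize a per-user distribution: mean, std, and skewness.
        m, s = x.mean(), x.std()
        skew = ((x - m) ** 3).mean() / s ** 3 if s > 0 else 0.0
        return [float(m), float(s), float(skew)]

    return dist_stats(msg_counts) + dist_stats(imd_q80)

feats = stream_features({"u1": [0, 5, 10, 15], "u2": [1, 2], "u3": [7]})
```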
25. SHERLOCK: Stage 2
● Semi-supervised learning approach.
● Sensitive to the seeds chosen.
● We follow a heuristic-based approach to identify seeds.
● Basic idea:
○ Level 1 clustering - cluster on mean IMD and #Messages (most discriminative).
○ Exonerating users based on subscription status.
○ Level 2 clustering - based on synchronicity of IMD entropy and #Windows entropy.
26. SHERLOCK: Stage 2
● The seeds mark users as genuine or bot with high confidence.
○ We generate a KNN graph between all the users.
● Graph-based label propagation.
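The seed-then-propagate idea can be sketched as follows; the KNN graph and the (mean IMD, #messages) feature space follow the slides, but the propagation scheme itself is a simplified illustration rather than the thesis implementation:

```python
import numpy as np

def knn_label_propagation(X, seed_labels, k=3, iters=30):
    """Illustrative sketch of SHERLOCK Stage 2 (not the thesis code):
    propagate high-confidence seed labels over a k-nearest-neighbour
    graph of users. seed_labels: 0 = genuine, 1 = bot, -1 = unlabeled."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # no self-edges
    nbrs = np.argsort(d, axis=1)[:, :k]       # k nearest neighbours per user

    scores = np.where(seed_labels == -1, 0.5, seed_labels).astype(float)
    clamped = seed_labels != -1
    for _ in range(iters):
        new = scores[nbrs].mean(axis=1)       # adopt mean neighbour score
        new[clamped] = scores[clamped]        # seeds stay fixed
        scores = new
    return (scores > 0.5).astype(int)

# Toy example: two tight clusters in (mean IMD, #messages) space,
# one seed per cluster; labels spread to the remaining users.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
seeds = np.array([0, -1, -1, -1, 1, -1, -1, -1])
pred = knn_label_propagation(X, seeds)
```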
28. Experiments
Baselines
● SSC (Supervised Spam Classifier)1
○ Originally used for Twitter spammer detection; customized to detect bots.
○ Features: max, min, median of #words, #characters, #URLs, and IMDs.
○ Supervised algorithm.
● SynchroTrap2
○ Group-based method to detect bots.
○ Constructs a graph in which similar nodes (by IMD and #messages per window) are connected.
○ Clusters the matrix into 2 groups; the cluster empirically containing the most bots is marked as the bot cluster.
○ We apply it on each stream.
1. Benevenuto et al., 2010 2. Yang et al., 2014
29. Experiments
Real World Results
● Dataset: 183 streams, 24 botted, 159 genuine.
● SHERLOCK
○ Stage 1: 5-fold CV (98.3% accuracy).
○ Stage 2: Run only on streams marked as botted in Stage 1.
● SSC - all users, 5-fold CV.
● SynchroTrap - on all streams.
[Figure: Stage II performance]
SHERLOCK achieves strong performance on real-world data.
30. Experiments
Synthetic Dataset Generation
● Dataset
○ [Botted Stream] Paid a chatbotting service to run on an empty stream.
○ Used the texts obtained and their inter-arrival times as the chatbot set.
○ [Real Stream] Collected data from random streams of different durations.
○ [Corrupted Stream] Merged the random and chatbotted streams.
● Noise Models
○ Controlled Chatters (CC): Fix the number of chatters, vary the delay.
○ Rapid Increase (RI): Start with a small number of bots and a large delay; then increase the number and decrease the delay.
○ Gradual Increase (GI): Same as RI, but more uniform.
○ Organic Growth (OG): Increase the number of bots over time; for each increase, gradually decrease the delay.
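The Organic Growth model, for instance, could be simulated along these lines; all parameter names and the exact schedule are assumptions for illustration:

```python
import random

def organic_growth_attack(duration_s=600, start_bots=2, max_bots=20,
                          start_delay=30.0, min_delay=5.0, seed=0):
    """Illustrative generator for the Organic Growth (OG) noise model:
    the bot count rises phase by phase, and within each phase the
    inter-message delay gradually shrinks. Not the thesis implementation."""
    rng = random.Random(seed)
    events = []                                  # (timestamp, bot_id) pairs
    n_phases = max_bots - start_bots + 1
    t_step = duration_s / n_phases
    for phase, n_bots in enumerate(range(start_bots, max_bots + 1)):
        t0, t1 = phase * t_step, (phase + 1) * t_step
        for bot in range(n_bots):
            t = t0 + rng.uniform(0, 1)
            while t < t1:
                events.append((t, bot))
                # Delay shrinks linearly from start_delay to min_delay in a phase.
                frac = (t - t0) / (t1 - t0)
                t += rng.expovariate(1.0 / (start_delay + (min_delay - start_delay) * frac))
    events.sort()
    return events

events = organic_growth_attack()
```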
32. Experiments
Synthetic Dataset Results
[Figure: results at 40%, 60%, and 80% chatbots]
The OG model is the most challenging for SHERLOCK: performance is worst under the OG attack model.
34. Experiments
Synthetic Dataset Results
[Figure: results at 40%, 60%, and 80% chatbots]
The more chatbots there are, the easier they are to discover (higher SNR): performance improves with the fraction of chatbots.
37. Problem Statement
Stage 1
Given a set of streams, and messages on those streams, IDENTIFY chatbotted streams using TEXTUAL features.
Stage 2
Given a chatbotted stream, and chatters on that stream, LABEL each chatter as bot or not using TEXTUAL features.
39. Initial Observations
Inter-Message Delays (IMDs)
● Chatbotted streams have a higher IMD than genuine streams.
● Chatbots have a consistent IMD, showing that they are automated.
40. Initial Observations
Message Lengths (MLs)
● Bot users post sampled messages at predetermined timestamps.
● Bot users' messages are redundant, unlike genuine users' messages, which have variable MLs.
41. Initial Observations
Message Content
● Genuine users converse with both bot and real users, and tag users in comments.
● Bot users post sampled messages provided by the streamer.
● Bot users might tag other bot users rather than real ones, since tagging real users requires dynamic behaviour.
43. Proposed Framework: BOTHUNT
Stage 1
Given a set of streams, and messages on those streams, IDENTIFY chatbotted streams using TEXTUAL features.
Stage 2
Given a chatbotted stream, and chatters on that stream, LABEL each chatter as bot or not using TEXTUAL features.
44. BOTHUNT: Stage 1
Feature Set
● Modeled as a classification task.
● Based on 2 feature categories found useful in preliminary analysis:
○ IMD Corrected Conditional Entropy (CCE) Quantiles (25th, 50th, 75th percentiles)
○ ML Corrected Conditional Entropy (CCE) Quantiles (25th, 50th, 75th percentiles)
● Corresponding to each feature category, we generate features from its per-user distribution in the stream:
○ IMD CCE Quantiles - 3 features
○ ML CCE Quantiles - 3 features
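Corrected conditional entropy over a binned sequence can be sketched as below, following the commonly used formulation CCE(m) = CE(m) + perc(m) · H(1), where perc(m) is the fraction of length-m patterns occurring exactly once; this is an illustrative re-implementation, not the thesis code:

```python
import math
import random
from collections import Counter

def corrected_conditional_entropy(seq, m):
    """Corrected conditional entropy (CCE) of a binned sequence.
    Low CCE indicates a regular, bot-like process."""
    def block_entropy(k):
        blocks = Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
        total = sum(blocks.values())
        return -sum(c / total * math.log(c / total) for c in blocks.values())

    ce = block_entropy(m) - block_entropy(m - 1)      # conditional entropy estimate
    blocks_m = Counter(tuple(seq[i:i + m]) for i in range(len(seq) - m + 1))
    perc = sum(1 for c in blocks_m.values() if c == 1) / sum(blocks_m.values())
    return ce + perc * block_entropy(1)               # correction term

# A perfectly regular (bot-like) sequence vs. a varied (human-like) one.
bot_cce = corrected_conditional_entropy([1] * 50, m=2)
rng = random.Random(0)
human_cce = corrected_conditional_entropy([rng.randrange(4) for _ in range(200)], m=2)
```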
46. BOTHUNT: Stage 2
● Semi-supervised learning approach.
● Features: IMD CCE, ML CCE, conversational features, user-tag features, and channel followers.
● We follow a heuristic-based approach to identify seeds.
● Basic idea:
○ Clustering - based on synchronicity of IMD CCE and ML CCE.
○ Exonerating users based on subscription status.
47. BOTHUNT: Stage 2
● The seeds mark users as genuine or bot with high confidence.
○ We generate a graph between all the users, using conversational cues and user tags as features.
● Graph-based label propagation.
49. Experiments
Baselines
● Revised SSC (Supervised Spam Classifier)1
○ Originally used for Twitter spammer detection; customized to detect bots.
○ Features: max, min, mean, median of #words, #characters, #URLs, and IMDs.
○ Supervised algorithm.
● Revised SynchroTrap2
○ Group-based method to detect bots.
○ Constructs a graph in which similar nodes (by ML bins, #characters bins, user mentions, URLs used) are connected.
○ Clusters the matrix into 2 groups; the cluster empirically containing the most bots is marked as the bot cluster.
○ We apply it on each stream.
1. Benevenuto et al., 2010 2. Yang et al., 2014
50. Experiments
Baselines
● User Text Similarity (UTS) Model1
○ User-level supervised approach.
○ Uses content- and graph-based features for users in the Twitter setting.
○ Measures the duplication of all messages posted by a user using Levenshtein distance.
○ Min, max, mean, and median of the distances act as content-based features.
○ Channel followers act as the graph-based feature.
1. Wang et al., 2010
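The UTS content features can be sketched as follows; the Levenshtein routine is standard dynamic programming, and the summary statistics follow the slide, but this is an illustrative sketch rather than the baseline's original code:

```python
import itertools
import statistics

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def duplication_features(messages):
    """Min/max/mean/median pairwise edit distance over one user's
    messages; values near zero indicate bot-like repetitive posting."""
    dists = [levenshtein(a, b) for a, b in itertools.combinations(messages, 2)]
    return min(dists), max(dists), statistics.mean(dists), statistics.median(dists)

bot_feats = duplication_features(["gg wp", "gg wp", "gg wp"])
human_feats = duplication_features(["hello chat", "lol nice play", "anyone here?"])
```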
51. Experiments
Real World Results
● Dataset: 183 streams, 24 botted, 159 genuine.
● BOTHUNT
○ Stage 1: 5-fold CV (86.04% accuracy).
○ Stage 2: Run only on streams marked as botted in Stage 1.
● Revised SSC - all users, 5-fold CV.
● Revised SynchroTrap - on all streams.
● UTS - all users, 5-fold CV.
[Figure: BOTHUNT Stage II performance]
BOTHUNT achieves strong performance on real-world data.
53. Experiments
Stage 2
We obtained an average accuracy of around 85% across all labeled chatbotted streams, irrespective of the attack model.
54. Conclusions
● Proposed and formulated the problem of detecting chatbots on livestreaming platforms.
● Collected real-world data, and provided a method for generating realistic synthetic datasets for experimentation.
● Proposed two-stage approaches (SHERLOCK & BOTHUNT) for detecting chatbotted streams and users.
● With real-world and synthetic experiments, we showed that:
○ SHERLOCK and BOTHUNT are effective in detecting chatbots, with up to 97% precision and 85% accuracy, respectively.
○ They are robust under a variety of noise models, obtaining up to a 0.68 F1 score (SHERLOCK) and 66% accuracy (BOTHUNT) under the most challenging scenario.
○ Both stages are scalable - linear in the number of streams and the number of users.
55. Conclusions
● The method is generic and applicable to other platforms; the only limitation arises if bot service providers for platforms other than Twitch offer different attack settings.
● Next, we will merge the features of SHERLOCK and BOTHUNT to check the efficacy of the resulting framework.
56. Relevant publication
● Jain, S., Niranjan, D., Lamba, H., and Kumaraguru, P. Characterizing and Detecting Livestreaming Chatbots. ASONAM 2019. Full paper: http://precog.iiitd.edu.in/pubs/Characterizing_and_Detecting_Livestreaming_Chatbots-ASONAM19.pdf
57. Acknowledgements
Grateful to Neil Shah, Hemank Lamba and Dipankar Niranjan for collaborating with me on this work.
Thankful to Dr. Ponnurangam Kumaraguru for his inputs and suggestions.
Thankful to Dr. Manish Shrivatsava and Prof. Kamakshi Prasad for evaluating the work and giving valuable inputs.