SlideShare uma empresa Scribd logo
1 de 58
Baixar para ler offline
Characterizing and Detecting
Livestreaming Chatbots
SHREYA JAIN
(shreya.jain@research.iiit.ac.in)
1
Motivation
15M Daily Active Users.
2.2-3.2M Broadcasters.
44B Minutes Watched
per Month.
4K Streams per day.
1.17M Broadcasters
10B Minutes Watched
per Month
1.2M Active Users.
200M Broadcasts
2
Livestreaming Platforms
Livestreamer
Livestreaming Platform
ChatsVideo/Content
Audience
Watch the
Stream
Participate in
chat
3
Twitch Live Video/Content
4
Chatroom
Audience/ Chatters
Stream Engagement
Number
of Views
Views
Chats
Number of
Messages
Influences
Recommendation
1.
2.
3.
Recommended
to end-users
5
Stream Monetization
Livestreamer
Higher engagement → Higher revenue.
This gives streamers enough reasons to manipulate the system 6
Chatbot Marketplace
There are numerous chatbot
services that can be used by
the streamers.
The streamer can manipulate
various features of the
chatbots being delivered…
● number of bots,
● text of messages,
● inter-message delays
Chatbotting is user/customer-customizable. 7
Example of Chatbotted Stream
8
Adverse Effects & Challenges
Effect of Fraud Challenges
● Loss of Users: Honest streamers are
not recommended; and they leave.
● Loss of Trust: Viewers and streamers
lose trust in the platform.
● Financial Loss: Platform loses money
to fraud
● Noisy Text: Ill-formed sentences,
spelling mistakes, etc.
● User-Controlled Fraud:
Customization gives control to
avoid fraud.
● Lack of Ground Truth.
9
Related Work
Text-Based Methods Bot Detection Astroturfing on Social Media
● Classification in IM
(Yahoo chat)
settings.
● Detecting
chatbots based on
links (spam).
● Graph-based
factorization
● Random-walk
based.
● FLOCK
● Twitter bot
detection
● Facebook bot
detection etc.
Bots on livestream want to
increase chatter - not generate
money through spam URLS.
Not focusing on chatbots
specifically.
Not focusing on chatbots.
Closest is FLOCK, that focuses
on identifying viewbots on
livestreaming platforms.
10
This Work
In this work, we provide the following contributions
1. Problem Formulation
(Formalize the chatbot detection problem)
2. Dataset Collection and Characterization
(Real livestreaming chatlog data and various attack data)
3. Proposed Frameworks (SHERLOCK and BOTHUNT)
(Two-stage approaches to detect chatbotted streams and then
bots in those streams) 11
Outline
● Approach 1
○ Problem Statement
○ Data Description
○ Initial Observations
○ Proposed Framework
○ Experiments
● Approach 2
○ Problem Statement
○ Initial Observations
○ Proposed Framework
○ Experiments
12
Outline
● Approach 1
○ Problem Statement
○ Data Description
○ Initial Observations
○ Proposed Framework
○ Experiments
● Approach 2
○ Problem Statement
○ Initial Observations
○ Proposed Framework
○ Experiments
13
Problem Statement
Stage 1
Given a set of streams, and message on those
Streams, IDENTIFY chatbotted streams.
Stage 2
Given a chatbotted stream, and chatters on
that stream, LABEL each chatter as bot or
not.
14
Outline
● Approach 1
○ Problem Statement
○ Data Description
○ Initial Observations
○ Proposed Framework
○ Experiments
● Approach 2
○ Problem Statement
○ Initial Observations
○ Proposed Framework
○ Experiments
15
Data Description
● 690 random streams
● 439K messages
● Aug-Oct 2018
● Manual annotation:
○ 183 streams
○ Based on text, # messages & user
metadata.
● Discovered 24 botted streams.
● Annotation was time-consuming.
○ Not scalable
16
Outline
● Approach 1
○ Problem Statement
○ Data Description
○ Initial Observations
○ Proposed Framework
○ Experiments
● Approach 2
○ Problem Statement
○ Initial Observations
○ Proposed Framework
○ Experiments
17
Initial Observations
Message Frequency
Chatbots tend to post more messages than genuine users, with most chatbots
posting messages with similar frequency. 18
Initial Observations
Inter-Message Delays (IMDs)
Chatbotted streams have a higher IMD than genuine streams.
Chatbots have a consistent IMD showing that they are automated. 19
Initial Observations
Message Spread
Chatbots’ message distribution is more spread out, and on average, they post
consistently throughout the stream. 20
Outline
● Approach 1
○ Problem Statement
○ Data Description
○ Initial Observations
○ Proposed Framework
○ Experiments
● Approach 2
○ Problem Statement
○ Initial Observations
○ Proposed Framework
○ Experiments
21
Proposed Framework: SHERLOCK
Stage 1
Given a set of streams, and message on those
Streams, IDENTIFY chatbotted streams.
Stage 2
Given a chatbotted stream, and chatters on
that stream, LABEL each chatter as bot or not.
22
SHERLOCK: Stage 1
Feature Set
● Modeled as Classification Task.
● Based on 3 categories found useful from preliminary analysis
○ Number of Messages
○ IMD Quantiles (60, 70, 80, 90th quantile)
○ Number of Windows
● Corresponding to each feature category, we generate
features from their distribution in the stream per user:
○ #Messages, #Windows - 3 features related to their skewness
○ IMD Quantiles - 4 features, percentile ranging from 50 23
Proposed Framework: SHERLOCK
Stage 1
Given a set of streams, and message on those
Streams, IDENTIFY chatbotted streams.
Stage 2
Given a chatbotted stream, and chatters on
that stream, LABEL each chatter as bot or not.
24
SHERLOCK: Stage 2
Stage 2
● Semi-Supervised Learning Approach.
● Sensitive to the seeds chosen.
● We follow a heuristic based approach to identify seeds.
● Basic Idea:
○ Level 1 Clustering - cluster on mean IMD and #Messages (Most Discriminative)
○ Exonerating based on subscription status.
○ Level 2 clustering - based on synchronicity on entropy IMD and entropy
#Windows
25
SHERLOCK: Stage 2
Stage 2
● On obtaining seeds - marking users as genuine, and bots with
high confidence.
○ We generate a KNN graph between all the users.
● Graph-based Label Propagation.
26
Outline
● Approach 1
○ Problem Statement
○ Data Description
○ Initial Observations
○ Proposed Framework
○ Experiments
● Approach 2
○ Problem Statement
○ Initial Observations
○ Proposed Framework
○ Experiments
27
Experiments
Baselines
● SSC (Supervised Spam Classifier)1
○ Originally, used for Twitter spammer detection. Customized to detect bots.
○ Features: max, min, median of #words, characters, URLs and IMDs
○ Supervised Algorithm
● SynchroTrap2
○ Group-based method to detect bots.
○ Construct graph such that similar nodes (IMD and #messages per window) are connected.
○ Cluster matrix into 2 groups. Empirically, highest bots cluster is marked as bot.
○ We apply on each stream.
1. Benevenuto et al.,2010 2. Yang et al.,2014
28
Experiments
Real World Results
● Dataset:183 Streams, 24 Botted, 159 Genuine
● SHERLOCK
○ Stage 1: 5-fold CV (98.3% accuracy)
○ Stage 2: Ran only on
streams marked as bots
in Stage1.
● SSC - All Users, 5-fold CV
● SynchroTrap - On all streams.
SHERLOCK achieves strong performance on real-world data.
Stage II Performance
29
Experiments
Synthetic Dataset Generation
● Dataset
○ [Botted Stream] Paid a chatbotting service to run on empty stream.
○ Used the texts obtained and inter-arrival times as chatbot set.
○ [Real Stream] Collected data from random streams of different durations.
○ [Corrupted Stream] Merged the random and chatbotted streams.
● Noise Models
○ Controlled Chatters (CC): Fix number of chatters, vary delay.
○ Rapid Increase (RI): Small number, large delay => increase number, decrease delay.
○ Gradual Increase (GI): Same as RI, but more uniform.
○ Organic Growth (OG): Increase number over time. For each increase, gradually
decrease delay.
30
Experiments
Synthetic Dataset Results
SHERLOCK (Stage 1) is robust under various noise models.
31
Experiments
Synthetic Dataset Results
40% chatbots 60% chatbots 80% chatbots
OG Model is most challenging for SHERLOCK.
Performance is worst
for OG attack model.
32
Experiments
Synthetic Dataset Results
40% chatbots 60% chatbots 80% chatbots
Duration impacts SHERLOCK minimally.
Effect of Duration is
minimal.
33
Experiments
Synthetic Dataset Results
40% chatbots 60% chatbots 80% chatbots
More the chatbots, easier to discover (Higher SNR)
More the chatbots, better the
performance.
34
Scalability
SHERLOCK is scalable for both stages.
Stage 1 Stage 2
35
Outline
● Approach 1
○ Problem Statement
○ Data Description
○ Initial Observations
○ Proposed Framework
○ Experiments
● Approach 2
○ Problem Statement
○ Initial Observations
○ Proposed Framework
○ Experiments
36
Problem Statement
Stage 1
Given a set of streams, and message on those Streams, IDENTIFY
chatbotted streams using TEXTUAL features.
Stage 2
Given a chatbotted stream, and chatters on that stream, LABEL
each chatter as bot or not using TEXTUAL features.
37
Outline
● Approach 1
○ Problem Statement
○ Data Description
○ Initial Observations
○ Proposed Framework
○ Experiments
● Approach 2
○ Problem Statement
○ Initial Observations
○ Proposed Framework
○ Experiments
38
Initial Observations
39
Inter Message Delays (IMDs)
● Chatbotted streams have a higher IMD
than genuine streams.
● Chatbots have a consistent IMD showing
that they are automated.
Initial Observations
40
Message Lengths (MLs)
● Bot users post sampled messages at
decided timestamp
● Bot users messages are redundant
unlike Real Genuine users with variable
MLs
Initial Observations
Message Content
● Genuine users converse with bot and real users, tag users in
comments.
● Bot users post sampled messages provided by streamer.
● Bot users might tag bot users not real ones as it is dynamic.
41
Outline
● Approach 1
○ Problem Statement
○ Data Description
○ Initial Observations
○ Proposed Framework
○ Experiments
● Approach 2
○ Problem Statement
○ Initial Observations
○ Proposed Framework
○ Experiments
42
Proposed Framework : BOTHUNT
Stage 1
Given a set of streams, and message on those Streams, IDENTIFY
chatbotted streams using TEXTUAL features.
Stage 2
Given a chatbotted stream, and chatters on that stream, LABEL
each chatter as bot or not using TEXTUAL features.
43
BOTHUNT : Stage 1
Feature Set
● Modeled as Classification Task.
● Based on 2 categories found useful from preliminary analysis
○ IMD Corrected Conditional Entropy Quantiles (25, 50, 75th quantile)
○ ML Corrected Conditional Entropy Quantiles (25, 50, 75th quantile)
● Corresponding to each feature category, we generate features
from their distribution in the stream per user:
○ IMD Quantiles - 3 features
○ ML Quantiles - 3 features
44
Proposed Framework : BOTHUNT
Stage 1
Given a set of streams, and message on those Streams, IDENTIFY
chatbotted streams using TEXTUAL features.
Stage 2
Given a chatbotted stream, and chatters on that stream, LABEL
each chatter as bot or not using TEXTUAL features.
45
BOTHUNT : Stage 2
Stage 2
● Semi-Supervised Learning Approach.
● IMDs CCE, MLs CCE, Conversational Features, User Tag
Features and Channel Followers as features.
● We follow a heuristic based approach to identify seeds.
● Basic Idea:
○ Clustering - based on synchronicity on CCE of IMD and CCE of ML
○ Exonerating based on subscription status.
46
BOTHUNT : Stage 2
Stage 2
● On obtaining seeds - marking users as genuine, and bots with
high confidence.
○ We generate a graph, using conversational cues and user
tags as features, between all the users.
● Graph-based Label Propagation.
47
Outline
● Approach 1
○ Problem Statement
○ Data Description
○ Initial Observations
○ Proposed Framework
○ Experiments
● Approach 2
○ Problem Statement
○ Initial Observations
○ Proposed Framework
○ Experiments
48
Experiments
Baselines
● Revised SSC (Supervised Spam Classifier)1
○ Originally, used for Twitter spammer detection. Customized to detect bots.
○ Features: max, min, mean, median of #words, characters, URLs and IMDs
○ Supervised Algorithm
● Revised SynchroTrap2
○ Group-based method to detect bots.
○ Construct graph such that similar nodes (ML bins, # characters bins, user mentions, URLs used)
are connected.
○ Cluster matrix into 2 groups. Empirically, highest bots cluster is marked as bot.
○ We apply on each stream.
1. Benevenuto et al.,2010 2. Yang et al.,2014 49
Experiments
Baselines
● User Text Similarity (UTS) Model1
○ User-level Supervised approach
○ Uses Content and Graph based features for user in Twitter setting
○ Measure duplicacy of all messages posted by a user using Levenshtein distance
○ Min , Max, Mean, Median of distance acts as content based feature
○ Channel followers acts as Graph based feature
1. Wang et al., 2010
50
Experiments
Real World Results
● Dataset:183 Streams, 24 Botted, 159 Genuine
● BOTHUNT Stage II Performance
○ Stage 1: 5-fold CV(86.04% accuracy)
○ Stage 2: Ran only on
streams marked as bots
in Stage1.
● Revised SSC - All Users, 5-fold CV
● Revised SynchroTrap - On all streams.
● UTS - All users, 5-fold CV
BOTHUNT achieves strong performance on real-world data.
51
Experiments
Synthetic Dataset Results
BOTHUNT (Stage 1) is robust under various noise models.
52
Experiments
Stage 2
Obtained an average accuracy of around 85% across all labeled chatbotted streams irrespective
of type of attack model.
53
Conclusions
● Proposed and formulated the problem of detecting chatbots on livestreaming platforms.
● Collected real-world data, and provided a method for generating realistic synthetic
datasets for experimentation.
● Proposed a two-stage approach (SHERLOCK & BOTHUNT) for detecting chatbotted
streams and users.
● With real world and synthetic experiments, we showed that:
○ SHERLOCK and BOTHUNT are efficient in detecting chatbots with upto 97%
precision & 85% accuracy respectively.
○ Robust under a variety of noise models, obtaining upto 0.68 F-1 score(SHERLOCK)
and 66% accuracy (BOTHUNT) under the most challenging scenario.
○ Scalable for both stages - Linear in number of streams and number of users.
54
Conclusions
● The method is generic, applicable to other platforms, the only limitation is if bot
service provider for platforms other than Twitch offers different attack settings.
● Next we will merge the features SHERLOCK and BOTHUNT to check the efficacy of
the resulted framework.
55
Relevant publication
● Jain, S., Niranjan, D., Lamba, H., and Kumaraguru, P. Characterizing and Detecting
Livestreaming Chatbots. ASONAM 2019. Full paper:
http://precog.iiitd.edu.in/pubs/Characterizing_and_Detecting_Livestreaming_Chatbo
ts-ASONAM19.pdf
56
Acknowledgements
57
Grateful to Neil Shah, Hemank Lamba and Dipankar Niranjan for collaborating with
me on this work
Thankful to Dr. Ponnurangam Kumaraguru for his inputs and suggestions
Thankful to Dr. Manish Shrivatsava and Prof. Kamakshi Prasad for evaluating the
work and giving valuable inputs.
Thank You!
58

Mais conteúdo relacionado

Semelhante a Characterizing and Detecting Livestreaming Chatbots

Building Conclave: a decentralized, real-time collaborative text editor
Building Conclave: a decentralized, real-time collaborative text editorBuilding Conclave: a decentralized, real-time collaborative text editor
Building Conclave: a decentralized, real-time collaborative text editorSun-Li Beatteay
 
LTTng-UST: Efficient System-Wide User-Space Tracing
LTTng-UST: Efficient System-Wide User-Space TracingLTTng-UST: Efficient System-Wide User-Space Tracing
LTTng-UST: Efficient System-Wide User-Space TracingChristian Babeux
 
Challenges and experiences with IPTV from a network point of view
Challenges and experiences with IPTV from a network point of viewChallenges and experiences with IPTV from a network point of view
Challenges and experiences with IPTV from a network point of viewbrouer
 
Дмитрий Копляров , Потокобезопасные сигналы в C++
Дмитрий Копляров , Потокобезопасные сигналы в C++Дмитрий Копляров , Потокобезопасные сигналы в C++
Дмитрий Копляров , Потокобезопасные сигналы в C++Sergey Platonov
 
Measuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online CommunitiesMeasuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online CommunitiesMatthew Rowe
 
Comparison of mqtt and coap protocol
Comparison of mqtt and coap protocolComparison of mqtt and coap protocol
Comparison of mqtt and coap protocolYUSUF HUMAYUN
 
Jaswanth-PPT.pptx
Jaswanth-PPT.pptxJaswanth-PPT.pptx
Jaswanth-PPT.pptxreenarocky
 
Policy-Driven Dynamic HTTP Adaptive Streaming Player Environment
Policy-Driven Dynamic HTTP Adaptive Streaming Player EnvironmentPolicy-Driven Dynamic HTTP Adaptive Streaming Player Environment
Policy-Driven Dynamic HTTP Adaptive Streaming Player EnvironmentMinh Nguyen
 
2 years into drinking the Microservice kool-aid (Fact and Fiction)
2 years into drinking the Microservice kool-aid (Fact and Fiction)2 years into drinking the Microservice kool-aid (Fact and Fiction)
2 years into drinking the Microservice kool-aid (Fact and Fiction)roblund
 
Proof-of-Stake & Its Improvements (San Francisco Bitcoin Devs Hackathon)
Proof-of-Stake & Its Improvements (San Francisco Bitcoin Devs Hackathon)Proof-of-Stake & Its Improvements (San Francisco Bitcoin Devs Hackathon)
Proof-of-Stake & Its Improvements (San Francisco Bitcoin Devs Hackathon)Alex Chepurnoy
 
BigDecimal: Avoid rounding errors on decimals in JavaScript (Node.TLV 2020)
BigDecimal: Avoid rounding errors on decimals in JavaScript (Node.TLV 2020)BigDecimal: Avoid rounding errors on decimals in JavaScript (Node.TLV 2020)
BigDecimal: Avoid rounding errors on decimals in JavaScript (Node.TLV 2020)Igalia
 
Machine Learning Based Botnet Detection
Machine Learning Based Botnet DetectionMachine Learning Based Botnet Detection
Machine Learning Based Botnet Detectionbutest
 
Big Decimal: Avoid Rounding Errors on Decimals in JavaScript
Big Decimal: Avoid Rounding Errors on Decimals in JavaScriptBig Decimal: Avoid Rounding Errors on Decimals in JavaScript
Big Decimal: Avoid Rounding Errors on Decimals in JavaScriptIgalia
 
A Personalized Software Assistant Framework To Achieve User Goals
A Personalized Software Assistant Framework To Achieve User GoalsA Personalized Software Assistant Framework To Achieve User Goals
A Personalized Software Assistant Framework To Achieve User GoalsPradeep K. Venkatesh
 
Policy-Driven Dynamic HTTP Adaptive Streaming Player Environment
Policy-Driven Dynamic HTTP Adaptive Streaming Player EnvironmentPolicy-Driven Dynamic HTTP Adaptive Streaming Player Environment
Policy-Driven Dynamic HTTP Adaptive Streaming Player EnvironmentAlpen-Adria-Universität
 
A Taxonomy of Botnet Detection Approaches
A Taxonomy of Botnet Detection ApproachesA Taxonomy of Botnet Detection Approaches
A Taxonomy of Botnet Detection ApproachesFabrizio Farinacci
 

Semelhante a Characterizing and Detecting Livestreaming Chatbots (20)

Building chat bot
Building chat botBuilding chat bot
Building chat bot
 
Building Conclave: a decentralized, real-time collaborative text editor
Building Conclave: a decentralized, real-time collaborative text editorBuilding Conclave: a decentralized, real-time collaborative text editor
Building Conclave: a decentralized, real-time collaborative text editor
 
LTTng-UST: Efficient System-Wide User-Space Tracing
LTTng-UST: Efficient System-Wide User-Space TracingLTTng-UST: Efficient System-Wide User-Space Tracing
LTTng-UST: Efficient System-Wide User-Space Tracing
 
Challenges and experiences with IPTV from a network point of view
Challenges and experiences with IPTV from a network point of viewChallenges and experiences with IPTV from a network point of view
Challenges and experiences with IPTV from a network point of view
 
Дмитрий Копляров , Потокобезопасные сигналы в C++
Дмитрий Копляров , Потокобезопасные сигналы в C++Дмитрий Копляров , Потокобезопасные сигналы в C++
Дмитрий Копляров , Потокобезопасные сигналы в C++
 
Measuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online CommunitiesMeasuring the Topical Specificity of Online Communities
Measuring the Topical Specificity of Online Communities
 
Comparison of mqtt and coap protocol
Comparison of mqtt and coap protocolComparison of mqtt and coap protocol
Comparison of mqtt and coap protocol
 
Jaswanth-PPT.pptx
Jaswanth-PPT.pptxJaswanth-PPT.pptx
Jaswanth-PPT.pptx
 
Deep MIML Network
Deep MIML NetworkDeep MIML Network
Deep MIML Network
 
Chapter04
Chapter04Chapter04
Chapter04
 
Policy-Driven Dynamic HTTP Adaptive Streaming Player Environment
Policy-Driven Dynamic HTTP Adaptive Streaming Player EnvironmentPolicy-Driven Dynamic HTTP Adaptive Streaming Player Environment
Policy-Driven Dynamic HTTP Adaptive Streaming Player Environment
 
2 years into drinking the Microservice kool-aid (Fact and Fiction)
2 years into drinking the Microservice kool-aid (Fact and Fiction)2 years into drinking the Microservice kool-aid (Fact and Fiction)
2 years into drinking the Microservice kool-aid (Fact and Fiction)
 
Proof-of-Stake & Its Improvements (San Francisco Bitcoin Devs Hackathon)
Proof-of-Stake & Its Improvements (San Francisco Bitcoin Devs Hackathon)Proof-of-Stake & Its Improvements (San Francisco Bitcoin Devs Hackathon)
Proof-of-Stake & Its Improvements (San Francisco Bitcoin Devs Hackathon)
 
BigDecimal: Avoid rounding errors on decimals in JavaScript (Node.TLV 2020)
BigDecimal: Avoid rounding errors on decimals in JavaScript (Node.TLV 2020)BigDecimal: Avoid rounding errors on decimals in JavaScript (Node.TLV 2020)
BigDecimal: Avoid rounding errors on decimals in JavaScript (Node.TLV 2020)
 
TorCoin_slides
TorCoin_slidesTorCoin_slides
TorCoin_slides
 
Machine Learning Based Botnet Detection
Machine Learning Based Botnet DetectionMachine Learning Based Botnet Detection
Machine Learning Based Botnet Detection
 
Big Decimal: Avoid Rounding Errors on Decimals in JavaScript
Big Decimal: Avoid Rounding Errors on Decimals in JavaScriptBig Decimal: Avoid Rounding Errors on Decimals in JavaScript
Big Decimal: Avoid Rounding Errors on Decimals in JavaScript
 
A Personalized Software Assistant Framework To Achieve User Goals
A Personalized Software Assistant Framework To Achieve User GoalsA Personalized Software Assistant Framework To Achieve User Goals
A Personalized Software Assistant Framework To Achieve User Goals
 
Policy-Driven Dynamic HTTP Adaptive Streaming Player Environment
Policy-Driven Dynamic HTTP Adaptive Streaming Player EnvironmentPolicy-Driven Dynamic HTTP Adaptive Streaming Player Environment
Policy-Driven Dynamic HTTP Adaptive Streaming Player Environment
 
A Taxonomy of Botnet Detection Approaches
A Taxonomy of Botnet Detection ApproachesA Taxonomy of Botnet Detection Approaches
A Taxonomy of Botnet Detection Approaches
 

Mais de IIIT Hyderabad

Responsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT BombayResponsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT BombayIIIT Hyderabad
 
International Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success storiesInternational Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success storiesIIIT Hyderabad
 
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBiasResponsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBiasIIIT Hyderabad
 
Identify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake NewsIdentify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake NewsIIIT Hyderabad
 
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafetyData Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafetyIIIT Hyderabad
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...IIIT Hyderabad
 
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityBeyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityIIIT Hyderabad
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias...
Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...
Data Science for Social Good: #LegalNLP #AlgorithmicBias...IIIT Hyderabad
 
How to Write a (Good) Research Paper
How to Write a (Good) Research Paper How to Write a (Good) Research Paper
How to Write a (Good) Research Paper IIIT Hyderabad
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBiasData Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBiasIIIT Hyderabad
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in IndiaIIIT Hyderabad
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in IndiaIIIT Hyderabad
 
Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...IIIT Hyderabad
 
Privacy. Winter School on “Topics in Digital Trust”. IIT Bombay
Privacy. Winter School on “Topics in Digital Trust”. IIT BombayPrivacy. Winter School on “Topics in Digital Trust”. IIT Bombay
Privacy. Winter School on “Topics in Digital Trust”. IIT BombayIIIT Hyderabad
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...IIIT Hyderabad
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...IIIT Hyderabad
 
Leveraging Social Media for Financial Advice
Leveraging Social Media for Financial AdviceLeveraging Social Media for Financial Advice
Leveraging Social Media for Financial AdviceIIIT Hyderabad
 
Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...IIIT Hyderabad
 
A Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian LanguagesA Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian LanguagesIIIT Hyderabad
 

Mais de IIIT Hyderabad (20)

Responsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT BombayResponsible & Safe AI Systems at ACM India ROCS at IIT Bombay
Responsible & Safe AI Systems at ACM India ROCS at IIT Bombay
 
International Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success storiesInternational Collaboration: Experiences, Challenges, Success stories
International Collaboration: Experiences, Challenges, Success stories
 
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBiasResponsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
Responsible & Safe AI: #LegalBias #Inconsistency #BiasinLLMs #MultiModalBias
 
Identify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake NewsIdentify, Inspect and Intervene Multimodal Fake News
Identify, Inspect and Intervene Multimodal Fake News
 
#ChatGPT #ResponsibleAI
#ChatGPT #ResponsibleAI#ChatGPT #ResponsibleAI
#ChatGPT #ResponsibleAI
 
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafetyData Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafety
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...
 
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityBeyond the Surface: A Computational Exploration of Linguistic Ambiguity
Beyond the Surface: A Computational Exploration of Linguistic Ambiguity
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias...
Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...Data Science for Social Good:                      #LegalNLP #AlgorithmicBias...
Data Science for Social Good: #LegalNLP #AlgorithmicBias...
 
How to Write a (Good) Research Paper
How to Write a (Good) Research Paper How to Write a (Good) Research Paper
How to Write a (Good) Research Paper
 
Data Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBiasData Science for Social Good: #LegalNLP #AlgorithmicBias
Data Science for Social Good: #LegalNLP #AlgorithmicBias
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in India
 
Social Computing Research in India
Social Computing Research in IndiaSocial Computing Research in India
Social Computing Research in India
 
Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...Modeling Online User Interactions and their Offline effects on Socio-Technica...
Modeling Online User Interactions and their Offline effects on Socio-Technica...
 
Privacy. Winter School on “Topics in Digital Trust”. IIT Bombay
Privacy. Winter School on “Topics in Digital Trust”. IIT BombayPrivacy. Winter School on “Topics in Digital Trust”. IIT Bombay
Privacy. Winter School on “Topics in Digital Trust”. IIT Bombay
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...
 
It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...It is our choices, Harry, that show what we truly are, far more than our abil...
It is our choices, Harry, that show what we truly are, far more than our abil...
 
Leveraging Social Media for Financial Advice
Leveraging Social Media for Financial AdviceLeveraging Social Media for Financial Advice
Leveraging Social Media for Financial Advice
 
Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...Development of Stress Induction and Detection System to Study its Effect on B...
Development of Stress Induction and Detection System to Study its Effect on B...
 
A Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian LanguagesA Framework for Automatic Question Answering in Indian Languages
A Framework for Automatic Question Answering in Indian Languages
 

Último

VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...RajaP95
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 

Último (20)

VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
IMPLICATIONS OF THE ABOVE HOLISTIC UNDERSTANDING OF HARMONY ON PROFESSIONAL E...
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 

Characterizing and Detecting Livestreaming Chatbots

  • 1. Characterizing and Detecting Livestreaming Chatbots SHREYA JAIN (shreya.jain@research.iiit.ac.in) 1
  • 2. Motivation 15M Daily Active Users. 2.2-3.2M Broadcasters. 44B Minutes Watched per Month. 4K Streams per day. 1.17M Broadcasters 10B Minutes Watched per Month 1.2M Active Users. 200M Broadcasts 2
  • 5. Stream Engagement Number of Views Views Chats Number of Messages Influences Recommendation 1. 2. 3. Recommended to end-users 5
  • 6. Stream Monetization Livestreamer Higher engagement → Higher revenue. This gives streamers enough reasons to manipulate the system 6
  • 7. Chatbot Marketplace There are numerous chatbot services that can be used by the streamers. The streamer can manipulate various features of the chatbots being delivered… ● number of bots, ● text of messages, ● inter-message delays Chatbotting is user/customer-customizable. 7
  • 9. Adverse Effects & Challenges Effect of Fraud Challenges ● Loss of Users: Honest streamers are not recommended; and they leave. ● Loss of Trust: Viewers and streamers lose trust in the platform. ● Financial Loss: Platform loses money to fraud ● Noisy Text: Ill-formed sentences, spelling mistakes, etc. ● User-Controlled Fraud: Customization gives control to avoid fraud. ● Lack of Ground Truth. 9
  • 10. Related Work Text-Based Methods Bot Detection Astroturfing on Social Media ● Classification in IM (Yahoo chat) settings. ● Detecting chatbots based on links (spam). ● Graph-based factorization ● Random-walk based. ● FLOCK ● Twitter bot detection ● Facebook bot detection etc. Bots on livestream want to increase chatter - not generate money through spam URLS. Not focusing on chatbots specifically. Not focusing on chatbots. Closest is FLOCK, that focuses on identifying viewbots on livestreaming platforms. 10
  • 11. This Work In this work, we provide the following contributions 1. Problem Formulation (Formalize the chatbot detection problem) 2. Dataset Collection and Characterization (Real livestreaming chatlog data and various attack data) 3. Proposed Frameworks (SHERLOCK and BOTHUNT) (Two-stage approaches to detect chatbotted streams and then bots in those streams) 11
  • 12. Outline ● Approach 1 ○ Problem Statement ○ Data Description ○ Initial Observations ○ Proposed Framework ○ Experiments ● Approach 2 ○ Problem Statement ○ Initial Observations ○ Proposed Framework ○ Experiments 12
  • 13. Outline ● Approach 1 ○ Problem Statement ○ Data Description ○ Initial Observations ○ Proposed Framework ○ Experiments ● Approach 2 ○ Problem Statement ○ Initial Observations ○ Proposed Framework ○ Experiments 13
  • 14. Problem Statement Stage 1 Given a set of streams, and message on those Streams, IDENTIFY chatbotted streams. Stage 2 Given a chatbotted stream, and chatters on that stream, LABEL each chatter as bot or not. 14
  • 15. Outline ● Approach 1 ○ Problem Statement ○ Data Description ○ Initial Observations ○ Proposed Framework ○ Experiments ● Approach 2 ○ Problem Statement ○ Initial Observations ○ Proposed Framework ○ Experiments 15
  • 16. Data Description ● 690 random streams ● 439K messages ● Aug-Oct 2018 ● Manual annotation: ○ 183 streams ○ Based on text, # messages & user metadata. ● Discovered 24 botted streams. ● Annotation was time-consuming. ○ Not scalable 16
  • 17. Outline ● Approach 1 ○ Problem Statement ○ Data Description ○ Initial Observations ○ Proposed Framework ○ Experiments ● Approach 2 ○ Problem Statement ○ Initial Observations ○ Proposed Framework ○ Experiments 17
  • 18. Initial Observations Message Frequency Chatbots tend to post more messages than genuine users, with most chatbots posting messages with similar frequency. 18
  • 19. Initial Observations Inter-Message Delays (IMDs) Chatbotted streams have a higher IMD than genuine streams. Chatbots have a consistent IMD showing that they are automated. 19
  • 20. Initial Observations Message Spread Chatbots’ message distribution is more spread out, and on average, they post consistently throughout the stream. 20
  • 21. Outline ● Approach 1 ○ Problem Statement ○ Data Description ○ Initial Observations ○ Proposed Framework ○ Experiments ● Approach 2 ○ Problem Statement ○ Initial Observations ○ Proposed Framework ○ Experiments 21
  • 22. Proposed Framework: SHERLOCK Stage 1 Given a set of streams, and message on those Streams, IDENTIFY chatbotted streams. Stage 2 Given a chatbotted stream, and chatters on that stream, LABEL each chatter as bot or not. 22
  • 23. SHERLOCK: Stage 1 Feature Set ● Modeled as Classification Task. ● Based on 3 categories found useful from preliminary analysis ○ Number of Messages ○ IMD Quantiles (60, 70, 80, 90th quantile) ○ Number of Windows ● Corresponding to each feature category, we generate features from their distribution in the stream per user: ○ #Messages, #Windows - 3 features related to their skewness ○ IMD Quantiles - 4 features, percentile ranging from 50 23
  • 24. Proposed Framework: SHERLOCK Stage 1 Given a set of streams, and message on those Streams, IDENTIFY chatbotted streams. Stage 2 Given a chatbotted stream, and chatters on that stream, LABEL each chatter as bot or not. 24
  • 25. SHERLOCK: Stage 2 Stage 2 ● Semi-Supervised Learning Approach. ● Sensitive to the seeds chosen. ● We follow a heuristic based approach to identify seeds. ● Basic Idea: ○ Level 1 Clustering - cluster on mean IMD and #Messages (Most Discriminative) ○ Exonerating based on subscription status. ○ Level 2 clustering - based on synchronicity on entropy IMD and entropy #Windows 25
  • 26. SHERLOCK: Stage 2 Stage 2 ● On obtaining seeds - marking users as genuine, and bots with high confidence. ○ We generate a KNN graph between all the users. ● Graph-based Label Propagation. 26
  • 27. Outline ● Approach 1 ○ Problem Statement ○ Data Description ○ Initial Observations ○ Proposed Framework ○ Experiments ● Approach 2 ○ Problem Statement ○ Initial Observations ○ Proposed Framework ○ Experiments 27
  • 28. Experiments Baselines ● SSC (Supervised Spam Classifier)1 ○ Originally, used for Twitter spammer detection. Customized to detect bots. ○ Features: max, min, median of #words, characters, URLs and IMDs ○ Supervised Algorithm ● SynchroTrap2 ○ Group-based method to detect bots. ○ Construct graph such that similar nodes (IMD and #messages per window) are connected. ○ Cluster matrix into 2 groups. Empirically, highest bots cluster is marked as bot. ○ We apply on each stream. 1. Benevenuto et al.,2010 2. Yang et al.,2014 28
  • 29. Experiments Real World Results ● Dataset:183 Streams, 24 Botted, 159 Genuine ● SHERLOCK ○ Stage 1: 5-fold CV (98.3% accuracy) ○ Stage 2: Ran only on streams marked as bots in Stage1. ● SSC - All Users, 5-fold CV ● SynchroTrap - On all streams. SHERLOCK achieves strong performance on real-world data. Stage II Performance 29
  • 30. Experiments Synthetic Dataset Generation ● Dataset ○ [Botted Stream] Paid a chatbotting service to run on empty stream. ○ Used the texts obtained and inter-arrival times as chatbot set. ○ [Real Stream] Collected data from random streams of different durations. ○ [Corrupted Stream] Merged the random and chatbotted streams. ● Noise Models ○ Controlled Chatters (CC): Fix number of chatters, vary delay. ○ Rapid Increase (RI): Small number, large delay => increase number, decrease delay. ○ Gradual Increase (GI): Same as RI, but more uniform. ○ Organic Growth (OG): Increase number over time. For each increase, gradually decrease delay. 30
  • 31. Experiments Synthetic Dataset Results SHERLOCK (Stage 1) is robust under various noise models. 31
  • 32. Experiments Synthetic Dataset Results 40% chatbots 60% chatbots 80% chatbots OG Model is most challenging for SHERLOCK. Performance is worst for OG attack model. 32
  • 33. Experiments Synthetic Dataset Results 40% chatbots 60% chatbots 80% chatbots Duration impacts SHERLOCK minimally. Effect of Duration is minimal. 33
  • 34. Experiments Synthetic Dataset Results 40% chatbots 60% chatbots 80% chatbots More the chatbots, easier to discover (Higher SNR) More the chatbots, better the performance. 34
  • 35. Scalability SHERLOCK is scalable for both stages. Stage 1 Stage 2 35
  • 36. Outline ● Approach 1 ○ Problem Statement ○ Data Description ○ Initial Observations ○ Proposed Framework ○ Experiments ● Approach 2 ○ Problem Statement ○ Initial Observations ○ Proposed Framework ○ Experiments 36
  • 37. Problem Statement Stage 1 Given a set of streams, and message on those Streams, IDENTIFY chatbotted streams using TEXTUAL features. Stage 2 Given a chatbotted stream, and chatters on that stream, LABEL each chatter as bot or not using TEXTUAL features. 37
  • 38. Outline ● Approach 1 ○ Problem Statement ○ Data Description ○ Initial Observations ○ Proposed Framework ○ Experiments ● Approach 2 ○ Problem Statement ○ Initial Observations ○ Proposed Framework ○ Experiments 38
  • 39. Initial Observations 39 Inter Message Delays (IMDs) ● Chatbotted streams have a higher IMD than genuine streams. ● Chatbots have a consistent IMD showing that they are automated.
  • 40. Initial Observations 40 Message Lengths (MLs) ● Bot users post sampled messages at decided timestamp ● Bot users messages are redundant unlike Real Genuine users with variable MLs
  • 41. Initial Observations Message Content ● Genuine users converse with bot and real users, tag users in comments. ● Bot users post sampled messages provided by streamer. ● Bot users might tag bot users not real ones as it is dynamic. 41
  • 42. Outline ● Approach 1 ○ Problem Statement ○ Data Description ○ Initial Observations ○ Proposed Framework ○ Experiments ● Approach 2 ○ Problem Statement ○ Initial Observations ○ Proposed Framework ○ Experiments 42
  • 43. Proposed Framework : BOTHUNT Stage 1 Given a set of streams, and message on those Streams, IDENTIFY chatbotted streams using TEXTUAL features. Stage 2 Given a chatbotted stream, and chatters on that stream, LABEL each chatter as bot or not using TEXTUAL features. 43
  • 44. BOTHUNT : Stage 1 Feature Set ● Modeled as Classification Task. ● Based on 2 categories found useful from preliminary analysis ○ IMD Corrected Conditional Entropy Quantiles (25, 50, 75th quantile) ○ ML Corrected Conditional Entropy Quantiles (25, 50, 75th quantile) ● Corresponding to each feature category, we generate features from their distribution in the stream per user: ○ IMD Quantiles - 3 features ○ ML Quantiles - 3 features 44
  • 45. Proposed Framework : BOTHUNT Stage 1 Given a set of streams, and message on those Streams, IDENTIFY chatbotted streams using TEXTUAL features. Stage 2 Given a chatbotted stream, and chatters on that stream, LABEL each chatter as bot or not using TEXTUAL features. 45
  • 46. BOTHUNT : Stage 2 Stage 2 ● Semi-Supervised Learning Approach. ● IMDs CCE, MLs CCE, Conversational Features, User Tag Features and Channel Followers as features. ● We follow a heuristic based approach to identify seeds. ● Basic Idea: ○ Clustering - based on synchronicity on CCE of IMD and CCE of ML ○ Exonerating based on subscription status. 46
  • 47. BOTHUNT : Stage 2 Stage 2 ● On obtaining seeds - marking users as genuine, and bots with high confidence. ○ We generate a graph, using conversational cues and user tags as features, between all the users. ● Graph-based Label Propagation. 47
  • 48. Outline ● Approach 1 ○ Problem Statement ○ Data Description ○ Initial Observations ○ Proposed Framework ○ Experiments ● Approach 2 ○ Problem Statement ○ Initial Observations ○ Proposed Framework ○ Experiments 48
  • 49. Experiments Baselines ● Revised SSC (Supervised Spam Classifier)1 ○ Originally, used for Twitter spammer detection. Customized to detect bots. ○ Features: max, min, mean, median of #words, characters, URLs and IMDs ○ Supervised Algorithm ● Revised SynchroTrap2 ○ Group-based method to detect bots. ○ Construct graph such that similar nodes (ML bins, # characters bins, user mentions, URLs used) are connected. ○ Cluster matrix into 2 groups. Empirically, highest bots cluster is marked as bot. ○ We apply on each stream. 1. Benevenuto et al.,2010 2. Yang et al.,2014 49
  • 50. Experiments Baselines ● User Text Similarity (UTS) Model1 ○ User-level Supervised approach ○ Uses Content and Graph based features for user in Twitter setting ○ Measure duplicacy of all messages posted by a user using Levenshtein distance ○ Min , Max, Mean, Median of distance acts as content based feature ○ Channel followers acts as Graph based feature 1. Wang et al., 2010 50
  • 51. Experiments Real World Results ● Dataset:183 Streams, 24 Botted, 159 Genuine ● BOTHUNT Stage II Performance ○ Stage 1: 5-fold CV(86.04% accuracy) ○ Stage 2: Ran only on streams marked as bots in Stage1. ● Revised SSC - All Users, 5-fold CV ● Revised SynchroTrap - On all streams. ● UTS - All users, 5-fold CV BOTHUNT achieves strong performance on real-world data. 51
  • 52. Experiments Synthetic Dataset Results BOTHUNT (Stage 1) is robust under various noise models. 52
  • 53. Experiments Stage 2 Obtained an average accuracy of around 85% across all labeled chatbotted streams irrespective of type of attack model. 53
  • 54. Conclusions ● Proposed and formulated the problem of detecting chatbots on livestreaming platforms. ● Collected real-world data, and provided a method for generating realistic synthetic datasets for experimentation. ● Proposed a two-stage approach (SHERLOCK & BOTHUNT) for detecting chatbotted streams and users. ● With real world and synthetic experiments, we showed that: ○ SHERLOCK and BOTHUNT are efficient in detecting chatbots with upto 97% precision & 85% accuracy respectively. ○ Robust under a variety of noise models, obtaining upto 0.68 F-1 score(SHERLOCK) and 66% accuracy (BOTHUNT) under the most challenging scenario. ○ Scalable for both stages - Linear in number of streams and number of users. 54
  • 55. Conclusions ● The method is generic, applicable to other platforms, the only limitation is if bot service provider for platforms other than Twitch offers different attack settings. ● Next we will merge the features SHERLOCK and BOTHUNT to check the efficacy of the resulted framework. 55
  • 56. Relevant publication ● Jain, S., Niranjan, D., Lamba, H., and Kumaraguru, P. Characterizing and Detecting Livestreaming Chatbots. ASONAM 2019. Full paper: http://precog.iiitd.edu.in/pubs/Characterizing_and_Detecting_Livestreaming_Chatbo ts-ASONAM19.pdf 56
  • 57. Acknowledgements 57 Grateful to Neil Shah, Hemank Lamba and Dipankar Niranjan for collaborating with me on this work Thankful to Dr. Ponnurangam Kumaraguru for his inputs and suggestions Thankful to Dr. Manish Shrivatsava and Prof. Kamakshi Prasad for evaluating the work and giving valuable inputs.