Mentions of Security
Vulnerabilities on Reddit,
Twitter and GitHub
Sameera Horawalavithana*,
Abhishek Bhattacharjee, Renhao Liu, Nazim Choudhury,
Lawrence O. Hall, Adriana Iamnitchi
University of South Florida
IEEE/WIC/ACM International Conference on Web Intelligence, Thessaloniki, Greece
Security Vulnerabilities
❏ Identified by CVE (Common Vulnerabilities and Exposures) identifiers:
❏ Each publicly known security vulnerability is uniquely identified by an ID of the form CVE-YYYY-NNNN
❏ Formally recorded in the National Vulnerability Database (NVD)
❏ “U.S. government repository of
standards based vulnerability
management data represented
using the Security Content
Automation Protocol (SCAP)”
❏ Discussed on social media
Figure: CVEs published in NVD over time.
Research Questions
1) What is the relationship between
mentions of security
vulnerabilities as posted on
Twitter, Reddit and GitHub?
2) Can the software development
activities in GitHub be predicted
from the discussions on Reddit
and Twitter?
Outline
❏ Dataset
❏ Data analysis
❏ CVE mentions in Reddit and Twitter
❏ CVE mentions in GitHub actions
❏ Predicting GitHub activities by using Reddit and Twitter activity signals
❏ Summary
Datasets
❏ Two social-media platforms: Reddit and
Twitter
❏ One software collaborative platform:
GitHub
❏ 18 months of records: March 2016 to August 2017
❏ Data filtered with the regular expression CVE-\d{4}-\d{4} to match CVE identifiers appearing in Reddit posts and comments, Twitter tweets and replies, and GitHub event descriptions
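A minimal Python sketch of this filtering step, assuming plain-text message bodies. CVE_PATTERN is the expression from the slide; CVE_PATTERN_WIDE and extract_cves are hypothetical names used only for illustration. Since 2014, CVE sequence numbers may have more than four digits, so the slide's pattern keeps only the first four digits of longer IDs; the wider variant is our assumption, not the authors' filter.

```python
import re

# Pattern from the slide: "CVE-", a four-digit year, "-", four sequence digits.
CVE_PATTERN = re.compile(r"CVE-\d{4}-\d{4}")

# Hypothetical variant (assumption, not the authors' filter): sequence numbers
# may have more than four digits, so accept four or more.
CVE_PATTERN_WIDE = re.compile(r"CVE-\d{4}-\d{4,}")


def extract_cves(text, pattern=CVE_PATTERN):
    """Return the distinct CVE identifiers in text, in first-seen order."""
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(pattern.findall(text)))
```

For example, on "Patch for CVE-2015-1805 and CVE-2017-1000112 is out" the slide's pattern yields CVE-2017-1000 for the second ID, while the wider variant recovers CVE-2017-1000112.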
RQ1: What is the relationship between mentions of security
vulnerabilities as posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume
of security vulnerability mentions?
CVE Mentions in Reddit and Twitter (1)
❏ 10,257 CVE identifiers mentioned in our Reddit/Twitter dataset
❏ 95% of CVE identifiers are mentioned only on Twitter
❏ 0.5% of CVE identifiers are mentioned only on Reddit
❏ 4.5% are mentioned on both platforms
More security vulnerabilities are discussed on Twitter
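The three percentages above partition the set of all distinct CVE identifiers seen on either platform. A sketch of that computation; platform_overlap is a hypothetical helper, and the IDs in the example are toy inputs, not the study's data.

```python
def platform_overlap(twitter_ids, reddit_ids):
    """Share of all distinct CVE IDs seen only on Twitter, only on Reddit, or on both."""
    twitter, reddit = set(twitter_ids), set(reddit_ids)
    union = twitter | reddit
    if not union:
        return {"twitter_only": 0.0, "reddit_only": 0.0, "both": 0.0}
    n = len(union)
    return {
        "twitter_only": len(twitter - reddit) / n,  # set difference
        "reddit_only": len(reddit - twitter) / n,
        "both": len(twitter & reddit) / n,          # intersection
    }
```

For example, with Twitter IDs {CVE-2017-0144, CVE-2017-5638, CVE-2016-5195} and Reddit IDs {CVE-2016-5195, CVE-2017-0001}, the shares are 0.5 Twitter-only, 0.25 Reddit-only and 0.25 both; the three shares always sum to 1.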
RQ1: What is the relationship between mentions of security
vulnerabilities as posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
CVE Mentions in Reddit and Twitter (2)
Figure panels: Reddit, Twitter
Both platforms show a peak in the mentions of CVE identifiers near their
public disclosure
❏ Day 0 represents the NVD public disclosure date
❏ The publication date of each message (post/tweet) is measured relative to the NVD public disclosure date of the mentioned CVE identifier
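A sketch of the day-offset computation behind these plots, assuming message timestamps and NVD dates share one time zone (e.g., UTC); the function names and the (cve_id, timestamp) input shape are hypothetical.

```python
from collections import Counter
from datetime import date, datetime


def days_from_disclosure(message_time: datetime, nvd_published: date) -> int:
    """Whole-day offset of a message from the CVE's NVD publication date.

    Day 0 is the disclosure date; negative values mean the message came first.
    Assumes both values are in the same time zone.
    """
    return (message_time.date() - nvd_published).days


def offset_histogram(messages, nvd_dates):
    """Count messages per day offset.

    messages: iterable of (cve_id, timestamp) pairs.
    nvd_dates: mapping from cve_id to its NVD publication date; CVEs without
    a known date are skipped.
    """
    return Counter(
        days_from_disclosure(ts, nvd_dates[cve])
        for cve, ts in messages
        if cve in nvd_dates
    )
```

A message posted at 14:30 on 5 March 2017 about a CVE published on 10 March 2017 falls on day -5.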
CVE Mentions in Reddit and Twitter (3)
Figure panels: Reddit, Twitter
❏ Timing of social-media messages, broken down by Reddit subreddit and Twitter hashtag
Of the CVE identifiers discussed on Reddit, the majority are discussed before public disclosure
RQ1: What is the relationship between mentions of security
vulnerabilities as posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
c. How does the severity of the security vulnerabilities affect the
timing of vulnerability mentions on the two platforms?
CVE Mentions in Reddit and Twitter (4)
❏ Timing of social-media messages
with respect to the severity of
mentioned security vulnerabilities
❏ We identified bot-driven communities using the textual description of each subreddit
❏ We used BotHunter to detect Twitter bot users
Early discussions related to high-severity CVE identifiers occur on Reddit
RQ1: What is the relationship between mentions of security vulnerabilities as
posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
c. How does the severity of the security vulnerabilities affect the timing of
vulnerability mentions on the two platforms?
d. How do CVE mentions spread over social-media platforms?
CVE Mentions in Reddit and Twitter (5)
❏ Three Cascade Types
❏ Before (completed): cascades start and end before the public disclosure day of the
mentioned CVE
❏ Before (not completed): cascades start before the public disclosure day, but continue
after the public disclosure day of the mentioned CVE
❏ After: cascades start and end after the public disclosure day of the mentioned CVE
Reddit discussions go viral before the CVE public disclosure, while Twitter re-shares emerge after it
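The three cascade types depend only on the dates of a cascade's first and last messages relative to the disclosure day. A sketch, assuming (our convention, not stated on the slide) that a message posted on the disclosure day itself does not count as "before"; cascade_type and its labels are hypothetical names.

```python
from datetime import date


def cascade_type(start: date, end: date, disclosure: date) -> str:
    """Classify a cascade by its first and last message dates.

    Assumption: a message on the disclosure day counts as not-before.
    """
    if end < start:
        raise ValueError("cascade ends before it starts")
    if start < disclosure:
        # Started early: did it also finish before disclosure?
        return "before_completed" if end < disclosure else "before_not_completed"
    return "after"
```

With disclosure on 10 January 2017, a cascade running 1 to 5 January is "before_completed", one running 1 to 15 January is "before_not_completed", and one starting on 10 January is "after".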
RQ1: What is the relationship between mentions of security vulnerabilities as
posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
c. How does the severity of the security vulnerabilities affect the timing of
vulnerability mentions on the two platforms?
d. How do CVE mentions spread over social-media platforms?
e. What types of sentiments fuel these discussions?
CVE Mentions in Reddit and Twitter (6)
● Uncertainty analysis of Reddit
comments
○ Used a pre-trained machine learning model (Yu et al. [1]) to classify whether a comment is certain or uncertain about the subject of the conversation
● Reaction types of Twitter replies
○ Used a pre-trained machine learning model (Glenski et al. [2]) to classify each reply as an answer, elaboration, question, appreciation, negative reaction, or agreement
1. Ning Yu and Graham Horwood. 2018. Veracity Enriched Event Extraction. In 2018 International Workshop on Social Sensing (SocialSens). 3–3.
2. Maria Glenski, Tim Weninger, and Svitlana Volkova. 2018. Identifying and Understanding User Reactions to Deceptive and Trusted Social News Sources. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 176–181.
Reddit has more “certain” comments; the majority of Twitter replies are classified as “elaboration”, followed by “answer”, both before and after public disclosure
RQ1: What is the relationship between mentions of security vulnerabilities as
posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
c. How does the severity of the security vulnerabilities affect the timing of
vulnerability mentions on the two platforms?
d. How do CVE mentions spread over social-media platforms?
e. How does GitHub activity depend on the public disclosure of
security vulnerabilities?
CVE Mentions in GitHub Events (1)
❏ 10,502 CVE identifiers
mentioned in GitHub Events
❏ Overlap with the CVE identifiers mentioned on the social-media platforms:
❏ 40% with Twitter
❏ 3% with Reddit
Moderate overlap between CVE identifiers subject to software development and those mentioned on Twitter
CVE Mentions in GitHub Events (2)
❏ The majority of GitHub events mention only one CVE identifier
❏ One CVE identifier (CVE-2015-1805) is mentioned in more than 3,000 GitHub events
❏ CVE-2015-1805 was published in NVD around August 2015
❏ We noticed an increased volume of related GitHub activities in early 2016
❏ What really happened?
RQ1: What is the relationship between mentions of security
vulnerabilities as posted on Twitter, Reddit and GitHub?
a. How do social media platforms compare in terms of the volume of
security vulnerability mentions?
b. To what extent are named vulnerabilities discussed on public
channels before the official disclosure day?
c. How does the severity of the security vulnerabilities affect the timing of
vulnerability mentions on the two platforms?
d. How do CVE mentions spread over social-media platforms?
e. How does GitHub activity depend on the public disclosure of
security vulnerabilities?
f. How does GitHub activity correlate with the number of CVEs
for the most vulnerable repositories?
CVE Mentions in GitHub Events (3)
❏ We selected the two most vulnerable repositories with respect to the number of associated CVE identifiers
❏ We show the pattern across monthly time series: the number of mentioned CVEs and three GitHub event types (Forks, Watches and Push Events)
❏ We compute the Dynamic Time Warping (DTW) distance to measure the similarity between each GitHub event time series and the CVE time series
Push Events most closely follow the pattern of CVE mentions
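A sketch of the DTW comparison. The slide does not specify the DTW variant, cost function or normalization, so this assumes classic unconstrained DTW with absolute-difference cost, plus optional z-normalization so that CVE counts and event counts on different scales can be compared; both function names are hypothetical.

```python
import math


def z_normalize(xs):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [0.0] * len(xs) if std == 0 else [(x - mean) / std for x in xs]


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    if not a or not b:
        raise ValueError("both series must be non-empty")
    n, m = len(a), len(b)
    # cost[i][j]: minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # advance in a, repeat b[j-1]
                cost[i][j - 1],      # advance in b, repeat a[i-1]
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

Under this sketch, one would compute dtw_distance(z_normalize(monthly_cves), z_normalize(monthly_pushes)) per event type and treat the smallest distance as the closest match. Because warping can repeat points, a series and a time-shifted copy of it can have distance 0.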
RQ2: Can the software development activities in GitHub be predicted
from the discussions on Reddit and Twitter?
Predicting GitHub Activities
A GitHub event is defined as (U, R, E_p, T_h), where
❏ U: user
❏ R: repository
❏ E_p: type of action (PushEvent, PullRequestEvent, IssuesEvent, ForkEvent, WatchEvent, CommitEvent, ReleaseEvent)
❏ T_h: the event timestamp in hours
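One way to represent the (U, R, E_p, T_h) tuple in code, as a sketch: GitHubEvent and make_event are hypothetical names, the action types are the ones listed on the slide, and T_h is modeled by truncating a timestamp to the hour.

```python
from datetime import datetime
from typing import NamedTuple

# Action types listed on the slide.
EVENT_TYPES = frozenset({
    "PushEvent", "PullRequestEvent", "IssuesEvent", "ForkEvent",
    "WatchEvent", "CommitEvent", "ReleaseEvent",
})


class GitHubEvent(NamedTuple):
    user: str         # U
    repo: str         # R
    event_type: str   # E_p
    hour: datetime    # T_h: timestamp truncated to the hour


def make_event(user: str, repo: str, event_type: str, timestamp: datetime) -> GitHubEvent:
    """Build an event tuple, rejecting action types not in EVENT_TYPES."""
    if event_type not in EVENT_TYPES:
        raise ValueError(f"unknown event type: {event_type}")
    hour = timestamp.replace(minute=0, second=0, microsecond=0)
    return GitHubEvent(user, repo, event_type, hour)
```

Truncating to the hour makes events that fall in the same hour compare equal on T_h, which matches the hourly granularity of the prediction target.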
Figure: training/testing timeline. In both periods, Reddit and Twitter activity provide the features and GitHub events are the target.
❏ Training: January 2017 to May 2017 (June and July 2017 as validation data)
❏ Testing: August 2017
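The timeline amounts to a date-based split. A sketch using the month ranges from the figure, treated as inclusive calendar ranges (our reading); SPLITS and split_of are hypothetical names.

```python
from datetime import date
from typing import Optional

# Month ranges read off the timeline, inclusive on both ends.
SPLITS = [
    ("train", date(2017, 1, 1), date(2017, 5, 31)),
    ("validation", date(2017, 6, 1), date(2017, 7, 31)),
    ("test", date(2017, 8, 1), date(2017, 8, 31)),
]


def split_of(day: date) -> Optional[str]:
    """Return the split a calendar day belongs to, or None if it is unused."""
    for name, first, last in SPLITS:
        if first <= day <= last:
            return name
    return None
```

Splitting by time rather than at random keeps every test day later than all training days, so the model is never trained on information from the period it predicts.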
Predicting GitHub Activities: Features and Approach
❏ Reddit time-series features
❏ Daily count of posts
❏ Daily count of active authors
❏ Daily count of active subreddits
❏ Daily count of comments
❏ Twitter time-series features
❏ Daily count of tweets
❏ Daily count of tweeting users
❏ Daily count of retweets
❏ Daily count of retweeting users
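A sketch of how the four Reddit features could be aggregated from raw records; the Twitter features follow the same pattern. The record layout (keys kind, author, subreddit, created) and the function name are hypothetical, and counting an author as active if they posted or commented on that day is our assumption.

```python
from collections import defaultdict
from datetime import datetime


def reddit_daily_features(records):
    """Aggregate Reddit records into daily feature counts.

    Each record is a dict with 'kind' ('post' or 'comment'), 'author',
    'subreddit' and 'created' (a datetime). Returns {date: {feature: value}}.
    """
    posts = defaultdict(int)
    comments = defaultdict(int)
    authors = defaultdict(set)      # assumption: posting or commenting counts
    subreddits = defaultdict(set)
    for rec in records:
        day = rec["created"].date()
        if rec["kind"] == "post":
            posts[day] += 1
        elif rec["kind"] == "comment":
            comments[day] += 1
        else:
            raise ValueError(f"unexpected record kind: {rec['kind']}")
        authors[day].add(rec["author"])
        subreddits[day].add(rec["subreddit"])
    return {
        day: {
            "posts": posts.get(day, 0),
            "comments": comments.get(day, 0),
            "active_authors": len(authors[day]),
            "active_subreddits": len(subreddits[day]),
        }
        for day in sorted(authors)
    }


# Toy records for illustration only (not the study's data).
SAMPLE = [
    {"kind": "post", "author": "u1", "subreddit": "netsec", "created": datetime(2017, 1, 2, 9)},
    {"kind": "comment", "author": "u2", "subreddit": "netsec", "created": datetime(2017, 1, 2, 10)},
    {"kind": "comment", "author": "u1", "subreddit": "sysadmin", "created": datetime(2017, 1, 3, 8)},
]
```

Days with no activity are absent from the result; a model input would typically fill them with zeros before feeding the series to the network.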
Figure: modeling approach
❏ Reddit/Twitter time-series features → NN → number of GitHub events in a day
❏ Likelihood of a user performing an action on a repository in an hour → LSTM → hourly GitHub activities of a user on a repository
Predicting Longitudinal User Activity at Fine Time Granularity in Online Collaborative Platforms. Renhao Liu, Frederick Mubang, Lawrence Hall*, Sameera Horawalavithana, Adriana Iamnitchi, John Skvoretz. IEEE International Conference on Systems, Man, and Cybernetics (SMC), Bari, Italy, 2019
Predicting GitHub Activities: Results
Results (two panels):
❏ JS-divergence: 0.0020, R²: 0.6067
❏ JS-divergence: 0.0029, R²: 0.6300
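For reference, a sketch of the two reported metrics. The evaluation code was provided by PNNL and the slide states neither which distributions the JS-divergence compares nor the log base, so base 2 (bounding the divergence in [0, 1]) and the standard coefficient of determination are our assumptions; the function names are hypothetical.

```python
import math


def js_divergence(p, q, base=2.0):
    """Jensen-Shannon divergence between two discrete distributions.

    Inputs are non-negative weights of equal length; each is normalized first.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    sp, sq = sum(p), sum(q)
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture distribution

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i >= x_i / 2 > 0 otherwise.
        return sum(a * math.log(a / b, base) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (can be negative)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot
```

A JS-divergence near 0 means the predicted and observed distributions nearly coincide, while R² measures how much of the day-to-day variance in event counts the predictions explain.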
Predicting GitHub Activities: Relevance
❏ Why is predicting GitHub activities
important?
❏ GitHub hosts many exploits and patches related to CVE identifiers
❏ Predictions might reflect the
software development activities of
an attacker who develops an exploit
❏ Predictions can be used to estimate
the availability of a patch related to
a security vulnerability
Reddit/Twitter features are helpful for predicting the number of GitHub events.
It is more difficult to predict the
identity of a user and the repository
in an event.
Summary
❏ We characterized a use-case scenario where diverse online platforms are
interconnected such that the activities in one platform can be predicted based
on the activities in the others.
Practical implications of our findings:
❏ Advance or calibrate security alert tools based on information from multiple
social media platforms.
❏ Better coordinate software development activities with the lessons learned
from social-media information
Acknowledgements
❏ Funded by DARPA SocialSim Program and the Air Force Research
Laboratory
❏ Data: Leidos, Netanomics
❏ Evaluation code provided by Pacific Northwest National Laboratory
Sameera Horawalavithana*
(sameera1@mail.usf.edu)
Check out our project @SocialSim
Backup
Thank you.
Related Work
❏ Different types of security vulnerability information are available on Twitter (Syed et al., Sauerwein et al.)
❏ Description of Vulnerabilities (e.g., URLs to security mailing lists, expert blogs, etc.)
❏ Demonstration of Exploits (e.g., URLs to YouTube videos)
❏ Unofficial proposals of countermeasures (e.g., URLs to security blogs describing unofficial
patches)
❏ Announcement of patch releases (e.g., URLs to official blog posts by vendors)
❏ Automatically discovering security threats from independent platforms.
❏ E.g., Twitter, the Dark Web (Sapienza et al.), security blogs (Mittal et al.), etc.
 
Statistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdfStatistics For Management by Richard I. Levin 8ed.pdf
Statistics For Management by Richard I. Levin 8ed.pdf
 
DATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etcDATA ANALYSIS using various data sets like shoping data set etc
DATA ANALYSIS using various data sets like shoping data set etc
 
Non Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdfNon Text Magic Studio Magic Design for Presentations L&P.pdf
Non Text Magic Studio Magic Design for Presentations L&P.pdf
 
2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use2023 Survey Shows Dip in High School E-Cigarette Use
2023 Survey Shows Dip in High School E-Cigarette Use
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 

Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub

  • 1. Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub Sameera Horawalavithana*, Abhishek Bhattacharjee, Renhao Liu, Nazim Choudhury, Lawrence O. Hall, Adriana Iamnitchi University of South Florida IEEE/WIC/ACM International Conference on Web Intelligence, Thessaloniki, Greece
  • 2. Security Vulnerabilities ❏ Identified by CVE (Common Vulnerabilities and Exposures) identifiers: ❏ Each publicly known security vulnerability is uniquely identified by an ID of the form CVE-YYYY-NNNN (the sequence number NNNN has four or more digits) ❏ Formally recorded in the National Vulnerability Database (NVD) ❏ “U.S. government repository of standards based vulnerability management data represented using the Security Content Automation Protocol (SCAP)” ❏ Discussed on social media (Figure: CVEs published in NVD over time.)
  • 3. Research Questions 1) What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub? 2) Can the software development activities in GitHub be predicted from the discussions on Reddit and Twitter?
  • 4. Outline ❏ Dataset ❏ Data analysis ❏ CVE mentions in Reddit and Twitter ❏ CVE mentions in GitHub events ❏ Predicting GitHub activities by using Reddit and Twitter activity signals ❏ Summary
  • 5. Datasets ❏ Two social-media platforms: Reddit and Twitter ❏ One software collaboration platform: GitHub ❏ 18 months of records: March 2016 to August 2017 ❏ Data filtered with the regular expression CVE-\d{4}-\d{4} to match CVE identifiers appearing in Reddit posts and comments, Twitter tweets and replies, and GitHub event descriptions
  • 6. RQ1: What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub? a. How do social media platforms compare in terms of the volume of security vulnerability mentions?
  • 7. CVE Mentions in Reddit and Twitter (1) ❏ 10,257 CVE identifiers mentioned in our Reddit/Twitter dataset ❏ 95% of CVE identifiers are mentioned only on Twitter ❏ 0.5% are mentioned only on Reddit ❏ 4.5% are mentioned on both platforms. More security vulnerabilities are discussed on Twitter.
  • 8. RQ1: What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub? a. How do social media platforms compare in terms of the volume of security vulnerability mentions? b. To what extent are named vulnerabilities discussed on public channels before the official disclosure day?
  • 9. CVE Mentions in Reddit and Twitter (2) (Figure panels: Reddit, Twitter.) Both platforms show a peak in mentions of CVE identifiers near their public disclosure. ❏ Day 0 represents the NVD public disclosure date ❏ The publication date of each message (post/tweet) is measured relative to the NVD public disclosure date of the CVE identifier it mentions
  • 10. CVE Mentions in Reddit and Twitter (3) (Figure panels: Reddit, Twitter.) ❏ Timing of social-media messages by Reddit subreddit and Twitter hashtag. Of the CVE identifiers discussed on Reddit, the majority are discussed before public disclosure.
  • 11. RQ1: What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub? a. How do social media platforms compare in terms of the volume of security vulnerability mentions? b. To what extent are named vulnerabilities discussed on public channels before the official disclosure day? c. How does the severity of the security vulnerabilities affect the timing of vulnerability mentions on the two platforms?
  • 12. CVE Mentions in Reddit and Twitter (4) ❏ Timing of social-media messages with respect to the severity of the mentioned security vulnerabilities ❏ We identified bot-driven communities using the textual description of each subreddit ❏ We used BotHunter to detect Twitter bot users. Early discussions of high-severity CVE identifiers occur on Reddit.
  • 13. RQ1: What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub? a. How do social media platforms compare in terms of the volume of security vulnerability mentions? b. To what extent are named vulnerabilities discussed on public channels before the official disclosure day? c. How does the severity of the security vulnerabilities affect the timing of vulnerability mentions on the two platforms? d. How do CVE mentions spread over social-media platforms?
  • 14. CVE Mentions in Reddit and Twitter (5) ❏ Three cascade types: ❏ Before (completed): cascades start and end before the public disclosure day of the mentioned CVE ❏ Before (not completed): cascades start before the public disclosure day but continue after it ❏ After: cascades start after the public disclosure day of the mentioned CVE. Reddit discussions are viral before CVE public disclosure; Twitter re-shares emerge after CVE public disclosure.
  • 15. RQ1: What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub? a. How do social media platforms compare in terms of the volume of security vulnerability mentions? b. To what extent are named vulnerabilities discussed on public channels before the official disclosure day? c. How does the severity of the security vulnerabilities affect the timing of vulnerability mentions on the two platforms? d. How do CVE mentions spread over social-media platforms? e. What types of sentiments fuel these discussions?
  • 16. CVE Mentions in Reddit and Twitter (6) ● Uncertainty analysis of Reddit comments ○ Used a pre-trained machine learning model (Yu and Horwood [1]) to classify whether a comment is certain or uncertain about the subject of the conversation ● Reaction types of Twitter replies ○ Used a pre-trained machine learning model (Glenski et al. [2]) to classify each reply as an answer, elaboration, question, appreciation, negative reaction, or agreement. Reddit has more “certain” comments than “uncertain” ones; the majority of Twitter replies are classified as “elaboration”, followed by “answer”, both before and after public disclosure. [1] Ning Yu and Graham Horwood. 2018. Veracity Enriched Event Extraction. In 2018 International Workshop on Social Sensing (SocialSens). 3–3. [2] Maria Glenski, Tim Weninger, and Svitlana Volkova. 2018. Identifying and Understanding User Reactions to Deceptive and Trusted Social News Sources. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 176–181.
  • 17. RQ1: What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub? a. How do social media platforms compare in terms of the volume of security vulnerability mentions? b. To what extent are named vulnerabilities discussed on public channels before the official disclosure day? c. How does the severity of the security vulnerabilities affect the timing of vulnerability mentions on the two platforms? d. How do CVE mentions spread over social-media platforms? e. How does GitHub activity depend on the public disclosure of security vulnerabilities?
  • 18. CVE Mentions in GitHub Events (1) ❏ 10,502 CVE identifiers mentioned in GitHub events ❏ Overlap with the CVE identifiers mentioned on the social-media platforms: ❏ 40% with Twitter ❏ 3% with Reddit. CVE identifiers that are the subject of software development overlap moderately with those mentioned on Twitter.
  • 19. CVE Mentions in GitHub Events (2) ❏ The majority of GitHub events mention only one CVE identifier ❏ One CVE identifier (CVE-2015-1805) is mentioned in more than 3,000 GitHub events ❏ CVE-2015-1805 was published in NVD around August 2015 ❏ We noticed an increased volume of related GitHub activity in early 2016 ❏ What really happened?
  • 20. RQ1: What is the relationship between mentions of security vulnerabilities as posted on Twitter, Reddit and GitHub? a. How do social media platforms compare in terms of the volume of security vulnerability mentions? b. To what extent are named vulnerabilities discussed on public channels before the official disclosure day? c. How does the severity of the security vulnerabilities affect the timing of vulnerability mentions on the two platforms? d. How do CVE mentions spread over social-media platforms? e. How does GitHub activity depend on the public disclosure of security vulnerabilities? f. How does GitHub activity correlate with the number of CVEs for the most vulnerable repositories?
  • 21. CVE Mentions in GitHub Events (3) ❏ We selected the two most vulnerable repositories, ranked by the number of associated CVE identifiers ❏ We compare the monthly number of mentioned CVEs with three GitHub event time series: Forks, Watches and Push Events ❏ We use Dynamic Time Warping (DTW) to measure the similarity between each GitHub event time series and the CVE time series. Push Events follow the pattern of CVE mentions most closely.
  • 22. RQ2: Can the software development activities in GitHub be predicted from the discussions on Reddit and Twitter?
  • 23. Predicting GitHub Activities ❏ A GitHub event is defined as (U, R, E_p, T_h), where ❏ U: user ❏ R: repository ❏ E_p: type of action (PushEvent, PullRequestEvent, IssuesEvent, ForkEvent, WatchEvent, CommitEvent, ReleaseEvent) ❏ T_h: the event timestamp in hours ❏ Data split (timeline figure): Reddit and Twitter activity provide the features and GitHub events the prediction target ❏ Training: January 2017 to May 2017 ❏ Validation: June and July 2017 ❏ Testing: August 2017
  • 24. Predicting GitHub Activities: Features and Approach ❏ Reddit time-series features ❏ Daily count of posts ❏ Daily count of active authors ❏ Daily count of active subreddits ❏ Daily count of comments ❏ Twitter time-series features ❏ Daily count of tweets ❏ Daily count of tweeting users ❏ Daily count of retweets ❏ Daily count of retweeting users ❏ Approach (diagram): Reddit/Twitter time-series features → neural network (NN) → number of GitHub events in a day; likelihood of a user performing an action on a repository in an hour → LSTM → hourly GitHub activities of a user on a repository. Reference: Renhao Liu, Frederick Mubang, Lawrence Hall*, Sameera Horawalavithana, Adriana Iamnitchi, John Skvoretz. Predicting Longitudinal User Activity at Fine Time Granularity in Online Collaborative Platforms. IEEE International Conference on Systems, Man, and Cybernetics (SMC), Bari, Italy, 2019.
  • 25. Predicting GitHub Activities: Results ❏ Two result panels report JS-divergence = 0.0020 with R² = 0.6067, and JS-divergence = 0.0029 with R² = 0.6300
  • 26. Predicting GitHub Activities: Relevance ❏ Why is predicting GitHub activities important? ❏ GitHub hosts many exploits and patches related to CVE identifiers ❏ Predictions might reflect the software development activities of an attacker who develops an exploit ❏ Predictions can be used to estimate the availability of a patch for a security vulnerability. Reddit/Twitter features are helpful for predicting the number of GitHub events; it is more difficult to predict the identity of the user and the repository in an event.
  • 27. Summary ❏ We characterized a use-case scenario where diverse online platforms are interconnected such that the activities on one platform can be predicted from the activities on the others. Practical implications of our findings: ❏ Advance or calibrate security alert tools using information from multiple social media platforms. ❏ Better coordinate software development activities using lessons learned from social-media information.
  • 28. Acknowledgements ❏ Funded by the DARPA SocialSim Program and the Air Force Research Laboratory ❏ Data: Leidos, Netanomics ❏ Evaluation code provided by Pacific Northwest National Laboratory
  • 29. Mentions of Security Vulnerabilities on Reddit, Twitter and GitHub Sameera Horawalavithana* (sameera1@mail.usf.edu) Check out our project @SocialSim
  • 32. Related Work ❏ Different types of security vulnerability information available on Twitter (Syed et al., Sauerwein et al.) ❏ Description of vulnerabilities (e.g., URLs to security mailing lists, expert blogs, etc.) ❏ Demonstration of exploits (e.g., URLs to YouTube videos) ❏ Unofficial proposals of countermeasures (e.g., URLs to security blogs describing unofficial patches) ❏ Announcement of patch releases (e.g., URLs to official blog posts by vendors) ❏ Automatically discovering security threats from independent platforms ❏ E.g., Twitter, the Dark Web (Sapienza et al.), security blogs (Mittal et al.), etc.
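The CVE filter on slide 5 (Datasets) can be sketched in Python as shown below. This is a minimal illustration and not the authors' pipeline. `CVE_PATTERN` and `extract_cve_ids` are hypothetical names. One assumption departs from the slide. CVE sequence numbers may have more than four digits, so the sketch uses `\d{4,}` with word boundaries. The slide's `CVE-\d{4}-\d{4}` would cut an identifier such as CVE-2017-1000117 down to CVE-2017-1000.

```python
import re

# Assumption: CVE sequence numbers can have four or more digits, so this sketch
# uses \d{4,} (the slide's pattern is CVE-\d{4}-\d{4}). Word boundaries stop the
# match from starting or ending inside a longer token.
CVE_PATTERN = re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE)


def extract_cve_ids(text: str) -> set[str]:
    """Return the distinct CVE identifiers in one message, normalized to upper case."""
    return {match.upper() for match in CVE_PATTERN.findall(text)}


if __name__ == "__main__":
    message = "Patch for CVE-2017-5638 and cve-2017-1000117 is out"
    print(sorted(extract_cve_ids(message)))  # ['CVE-2017-1000117', 'CVE-2017-5638']
```

Matching is case-insensitive and results are upper-cased. As a result, `cve-2017-5638` and `CVE-2017-5638` count as a single identifier. This duplicate handling is itself an assumption.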
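Slide 7 splits CVE mentions into Twitter only, Reddit only, and both. This breakdown is a set partition over the distinct identifiers seen on each platform. The sketch below assumes the identifiers have already been extracted per platform. `platform_partition` is a hypothetical helper, and the toy IDs are not from the dataset.

```python
def platform_partition(reddit_ids: set[str], twitter_ids: set[str]) -> dict[str, float]:
    """Share of all distinct CVE IDs seen only on Twitter, only on Reddit, or on both."""
    all_ids = reddit_ids | twitter_ids
    if not all_ids:
        return {"twitter_only": 0.0, "reddit_only": 0.0, "both": 0.0}
    total = len(all_ids)
    return {
        "twitter_only": len(twitter_ids - reddit_ids) / total,
        "reddit_only": len(reddit_ids - twitter_ids) / total,
        "both": len(reddit_ids & twitter_ids) / total,
    }


if __name__ == "__main__":
    # Toy identifiers, not data from the study.
    reddit = {"CVE-A", "CVE-B"}
    twitter = {"CVE-B", "CVE-C", "CVE-D"}
    print(platform_partition(reddit, twitter))
    # {'twitter_only': 0.5, 'reddit_only': 0.25, 'both': 0.25}
```

The three shares always sum to 1 because every identifier falls into exactly one group.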
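Slide 9 measures each message's date relative to the NVD disclosure date, with day 0 as the disclosure day. The sketch below shows that alignment and a per-day histogram. It assumes the disclosure date is the NVD "published" date, already looked up per CVE. The function names are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import Iterable


def days_from_disclosure(message_date: date, nvd_published: date) -> int:
    """Day offset of a message relative to the NVD disclosure of the CVE it mentions.

    Negative values are before disclosure, 0 is the disclosure day (day 0),
    positive values are after it.
    """
    return (message_date - nvd_published).days


def relative_day_histogram(
    mentions: Iterable[tuple[str, date]], disclosure: dict[str, date]
) -> Counter:
    """Count messages per relative day; mentions of CVEs without a known NVD date are skipped."""
    return Counter(
        days_from_disclosure(message_date, disclosure[cve_id])
        for cve_id, message_date in mentions
        if cve_id in disclosure
    )


if __name__ == "__main__":
    nvd = {"CVE-X": date(2017, 3, 10)}  # toy disclosure date
    msgs = [("CVE-X", date(2017, 3, 1)), ("CVE-X", date(2017, 3, 10))]
    print(relative_day_histogram(msgs, nvd))  # Counter({-9: 1, 0: 1})
```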
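Slide 12 relates message timing to vulnerability severity, but it does not say how severity was bucketed. The sketch below assumes the CVSS v3.x qualitative rating scale: None, Low, Medium, High, Critical. It computes, per rating, the fraction of messages posted before day 0. The names are hypothetical. The bot filtering mentioned on the slide (subreddit descriptions, BotHunter) is not shown.

```python
from collections import defaultdict
from typing import Iterable


def cvss3_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative severity rating (assumed scale)."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"CVSS score out of range: {score}")
    if score == 0.0:
        return "None"
    if score < 4.0:
        return "Low"
    if score < 7.0:
        return "Medium"
    if score < 9.0:
        return "High"
    return "Critical"


def early_share_by_severity(messages: Iterable[tuple[float, int]]) -> dict[str, float]:
    """Fraction of messages posted before day 0, per severity rating.

    Each message is (cvss_base_score, days_from_disclosure) for the CVE it mentions.
    """
    totals: dict[str, int] = defaultdict(int)
    early: dict[str, int] = defaultdict(int)
    for score, day in messages:
        severity = cvss3_severity(score)
        totals[severity] += 1
        if day < 0:
            early[severity] += 1
    return {severity: early[severity] / totals[severity] for severity in totals}
```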
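The three cascade types on slide 14 depend only on when a cascade starts and ends relative to the disclosure day. The sketch below takes a cascade's message dates as day offsets from NVD disclosure. One boundary case is an assumption, since the slide does not settle it: the disclosure day itself (day 0) counts as "after". `cascade_type` is a hypothetical name.

```python
def cascade_type(message_days: list[int]) -> str:
    """Classify a cascade from the day offsets (relative to NVD disclosure) of its messages.

    Assumption: day 0 (the disclosure day itself) counts as "after".
    """
    if not message_days:
        raise ValueError("a cascade needs at least one message")
    start, end = min(message_days), max(message_days)
    if start >= 0:
        return "after"
    if end < 0:
        return "before (completed)"
    return "before (not completed)"


if __name__ == "__main__":
    print(cascade_type([-5, -2]))     # before (completed)
    print(cascade_type([-3, 0, 4]))   # before (not completed)
    print(cascade_type([1, 6]))       # after
```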
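Slide 19 makes two counting claims. Most events mention one CVE, and CVE-2015-1805 appears in more than 3,000 events. Both come from two simple tallies over the CVE IDs extracted from each GitHub event description. The sketch below assumes that extraction has already happened. The input is toy data and `mention_stats` is a hypothetical helper.

```python
from collections import Counter


def mention_stats(events: list[set[str]]) -> tuple[Counter, Counter]:
    """Summarize CVE mentions across GitHub events.

    `events` holds the set of CVE IDs extracted from each event description.
    Returns (number of events per count of IDs mentioned, number of events per CVE ID).
    """
    ids_per_event = Counter(len(ids) for ids in events)
    events_per_cve = Counter(cve_id for ids in events for cve_id in ids)
    return ids_per_event, events_per_cve


if __name__ == "__main__":
    toy_events = [{"CVE-2015-1805"}, {"CVE-2015-1805", "CVE-2017-0144"}, {"CVE-2015-1805"}]
    per_event, per_cve = mention_stats(toy_events)
    print(per_event.most_common(1))  # [(1, 2)]
    print(per_cve.most_common(1))    # [('CVE-2015-1805', 3)]
```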
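Slide 21 compares time series with Dynamic Time Warping (DTW). The sketch below uses the classic O(n·m) dynamic-programming form with an absolute-difference cost. The slide does not name a DTW variant, so two choices here are assumptions. First, the absolute-difference cost. Second, z-normalizing each series so that CVE counts and event volumes share a scale. The function names are hypothetical.

```python
import math


def dtw_distance(a: list[float], b: list[float]) -> float:
    """Classic dynamic time warping distance with absolute-difference cost, O(len(a)*len(b))."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest warping path: match, or repeat a point of either series.
            cost[i][j] = step + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    return cost[n][m]


def z_normalize(xs: list[float]) -> list[float]:
    """Rescale to zero mean and unit variance so series with different volumes are comparable."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [0.0] * len(xs) if std == 0 else [(x - mean) / std for x in xs]


def closest_series(cve_counts: list[float], event_series: dict[str, list[float]]) -> str:
    """Name of the GitHub event series with the smallest DTW distance to the CVE series."""
    target = z_normalize(cve_counts)
    return min(event_series, key=lambda name: dtw_distance(target, z_normalize(event_series[name])))
```

In the slide's setting, `event_series` would hold the monthly Forks, Watches and Push Events series for one repository. A smaller distance means a closer match.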
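Slide 23 defines an event as the tuple (U, R, E_p, T_h) and gives a time-based data split. Both can be written down directly, as in the sketch below. The dataclass and its field names are hypothetical. The flooring of timestamps to the hour is an assumed reading of "timestamp in hours".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class GitHubEvent:
    """One (U, R, E_p, T_h) tuple; field names are hypothetical."""
    user: str          # U
    repo: str          # R
    action: str        # E_p, e.g. "PushEvent"
    time_h: datetime   # T_h, timestamp truncated to the hour


def make_event(user: str, repo: str, action: str, ts: datetime) -> GitHubEvent:
    """Build an event, flooring the timestamp to hourly granularity."""
    return GitHubEvent(user, repo, action, ts.replace(minute=0, second=0, microsecond=0))


def split_period(ts: datetime) -> Optional[str]:
    """Assign a timestamp to the 2017 training/validation/testing periods on slide 23."""
    if ts.year != 2017:
        return None
    if 1 <= ts.month <= 5:
        return "train"
    if ts.month in (6, 7):
        return "validation"
    if ts.month == 8:
        return "test"
    return None
```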
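The Reddit features on slide 24 are daily aggregates of raw records. The sketch below builds the four Reddit series. It assumes each record is a (timestamp, author, subreddit, kind) tuple. It also assumes an author or subreddit counts as "active" on a day if it has any post or comment that day. The record layout and function name are hypothetical. The Twitter features would follow the same pattern, and the NN and LSTM models are not shown.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Iterable


def daily_reddit_features(
    records: Iterable[tuple[datetime, str, str, str]],
) -> dict[date, tuple[int, int, int, int]]:
    """Daily (posts, active authors, active subreddits, comments) from Reddit records.

    Each record is (timestamp, author, subreddit, kind) with kind "post" or "comment".
    Assumption: an author or subreddit is "active" on a day if it has any post or comment that day.
    """
    posts: dict[date, int] = defaultdict(int)
    comments: dict[date, int] = defaultdict(int)
    authors: dict[date, set[str]] = defaultdict(set)
    subreddits: dict[date, set[str]] = defaultdict(set)
    for ts, author, subreddit, kind in records:
        day = ts.date()
        if kind == "post":
            posts[day] += 1
        elif kind == "comment":
            comments[day] += 1
        else:
            raise ValueError(f"unknown record kind: {kind}")
        authors[day].add(author)
        subreddits[day].add(subreddit)
    return {
        day: (posts[day], len(authors[day]), len(subreddits[day]), comments[day])
        for day in sorted(authors)
    }
```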
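Slide 25 reports Jensen-Shannon (JS) divergence and R², but not how they were computed. The evaluation code came from Pacific Northwest National Laboratory. The sketch below uses the standard definitions of both metrics. Two choices are assumptions: the log base (2 here) and how observed and predicted GitHub activity are binned into distributions. Its numbers are therefore not directly comparable to those on the slide.

```python
import math


def js_divergence(p: list[float], q: list[float], base: float = 2.0) -> float:
    """Jensen-Shannon divergence between two discrete distributions.

    Inputs are non-negative weights over the same bins; each is normalized to sum to 1.
    With base 2 the result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    sp, sq = sum(p), sum(q)
    if sp <= 0 or sq <= 0:
        raise ValueError("each distribution needs positive total mass")
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: list[float], y: list[float]) -> float:
        # Terms with x_i == 0 contribute 0; where x_i > 0, the mixture y_i is also > 0.
        return sum(a * math.log(a / b, base) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - f) ** 2 for t, f in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined for a constant target series")
    return 1.0 - ss_res / ss_tot
```

A JS divergence near 0 means the predicted and observed distributions nearly coincide. An R² near 1 means the predicted daily counts explain most of the variance in the observed ones.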