Bob Rudis • Managing Principal & Senior Data Scientist
bob@rudis.net
From Data to Decision Makers
A Behind the Scenes Look at Building The
Most Respected Report In Cybersecurity
ABOUT ME
(Briefly)
• DBIR team manager/author (more on this in a bit)
• Former cyber risk director for a Fortune 100
insurance company
• Serial #rstats Tweeter (@hrbrmstr), blogger
(rud.is/b & @ddsecblog) & regular helper on
StackOverflow
• Author of and contributor to 14 CRAN packages
• Co-author of Data-Driven Security (@ddsecbook)
• Co-host of the Data-Driven Security Podcast
(@ddsecpodcast)
• Die-hard ggplot2 advocate, widgeteer, heavily
addicted cartographer & shameless user of the
forward assignment operator ←4EVA→
WHAT IS THE DBIR?
The Verizon Data Breach
Investigations Report (DBIR)
“The Verizon Data Breach Investigations Report
(DBIR) is an annual publication that provides
analysis of information security incidents, with a
specific focus on data breaches.”
http://searchsecurity.techtarget.com/definition/Verizon-Data-Breach-Investigations-Report-DBIR
verizonenterprise.com/DBIR
WHO IS THE DBIR?
Wade Baker Dave Hylender Marc Spitler Jay Jacobs
Kevin Thompson Suzanne Widup Bhaskar Karambelkar Gabriel Bassett
The DBIR
• Started in 2008
• Cited by virtually every other cybersecurity report
by the 3❡
• Read by individual contributors up through senior
leadership at virtually every global enterprise
• A lot of fun to work on
#RSAC
#DBIR
Contributors per report year (chart)
2008  2009  2010  2011  2012  2013  2014  2015
   1     1     2     3     6    18    50    70
WHAT DOES THIS HAVE TO
DO WITH R?
200,000
Vocabulary for
Event
Recording and
Incident
Sharing
veriscommunity.net
vcdb.org
verisr
github.com/vz-risk/verisr
library(verisr)
# jsondir: path to a directory of VERIS-format JSON incident files
vcdb <- json2veris(jsondir)
summary(vcdb) # too big to show
getenum(vcdb, "actor")
##       enum   x
## 1 external 955
## 2 internal 535
## 3  partner 100
## 4  unknown  85
getenum(vcdb, "actor", add.n=TRUE, add.freq=TRUE)
##       enum   x    n  freq
## 1 external 955 1643 0.581
## 2 internal 535 1643 0.326
## 3  partner 100 1643 0.061
## 4  unknown  85 1643 0.052
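The `freq` column in the output above appears to be each count divided by `n`. A minimal base-R sketch reproduces it from the numbers printed on the slide. The data frame is typed in by hand for illustration and is not produced by verisr.

```r
# Counts copied from the getenum() output above; built by hand for illustration.
actors <- data.frame(
  enum = c("external", "internal", "partner", "unknown"),
  x    = c(955, 535, 100, 85),
  stringsAsFactors = FALSE
)
n <- 1643  # value shown in the `n` column of the slide output

# freq is each count as a share of n, rounded to three places
actors$n    <- n
actors$freq <- round(actors$x / n, 3)
print(actors)

# The counts add up to more than n, so the categories must overlap
sum(actors$x) > n
```

The counts sum to 1,675, which is more than 1,643, so some incidents evidently fall under more than one actor category and the frequencies should not be expected to sum to 1.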
vz-risk.github.io/dbir/2015/19/
• 200m successful vulnerability exploits across 20,000 enterprises
• 170m malware events across over 10,000 enterprises
• 6 months of malware traffic data from 30+m mobile devices
• Live botnet traffic from compromised organizations
• Millions of Indicators of Compromise
• Details of all Denial of Service activity for 2014
PUTTING IT ALL TOGETHER
Getting the data
PUTTING IT ALL TOGETHER
Creating, organizing and sharing analyses
.R .Rmd .json .Rdata
1. Assign areas to each researcher
2. For “standard VERIS” analyses, generate reports from core Rmd
3. Have “Findings Review” collaborative meetings where we peer-review the work
4. (Repeat step 3 after refinement of findings)
5. Decide on final sections for the report and assign authors
6. Add rough draft visualizations to the findings
7. Lock in content
8. Refine visualizations
9. Finalize text content
10. Work with Marketing & Graphics
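Step 2 in the list above, generating reports from a core Rmd, could be sketched as a loop over research areas. This is only an illustration under assumptions: the template name `core.Rmd`, the area names and the `area` parameter are hypothetical, and the slides do not show the team's actual setup. The renderer is passed in as an argument so the loop can be tried without a real template.

```r
# Build one report per research area from a shared template.
# `render` defaults to rmarkdown::render but can be swapped out (e.g. for a dry run).
render_area_reports <- function(areas, template, out_dir, render = rmarkdown::render) {
  vapply(areas, function(area) {
    out_file <- paste0(area, ".html")
    render(
      input       = template,
      output_file = out_file,
      output_dir  = out_dir,
      params      = list(area = area),  # the template would need a matching `params:` entry
      envir       = new.env()           # keep each report's variables separate
    )
    file.path(out_dir, out_file)
  }, character(1))
}

# Dry run with a stand-in renderer that only reports what it would do
dry_run <- function(input, output_file, ...) message("would render ", output_file)
planned <- render_area_reports(c("actors", "actions"), "core.Rmd", tempdir(), render = dry_run)
print(planned)
```

Dropping the `render = dry_run` argument would call `rmarkdown::render()` for real, which requires the template file to exist.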
FIGURATIVELY SPEAKING
• Create one “Master Rmd” for all
visualization figures using canned data from
outputs of analyses, having one master
(giant) HTML document version and multiple
individual PDF versions to give to the
creative staff to work with
Why PDF? Complex ggplot2 SVGs crash
Illustrator and the fonts are horrible (they
get converted to polygons).
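Writing one figure to its own PDF can be sketched with `ggsave()`. The data, file name and dimensions below are placeholders chosen for illustration (the values reuse the actor frequencies from the verisr slide), not the settings used for the report.

```r
library(ggplot2)

# Small stand-in for the "canned" output of an analysis
actors <- data.frame(
  enum = c("external", "internal", "partner", "unknown"),
  freq = c(0.581, 0.326, 0.061, 0.052)
)

p <- ggplot(actors, aes(x = reorder(enum, freq), y = freq)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Share of incidents") +
  theme_minimal()

# One PDF per figure, sized in inches, ready to hand off
out_file <- file.path(tempdir(), "figure-actors.pdf")
ggsave(out_file, plot = p, width = 6, height = 3, units = "in")
```

If font handling matters for the hand-off, passing `device = cairo_pdf` to `ggsave()` is one option worth trying where Cairo is available.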
• When you decide you want to use a figure
from the analysis, spend the time to make it
look as amazing (and final) as possible to
save $$, save time down the road and
avoid seeing your creations on @wtfviz
LESSONS LEARNED
R Markdown (Rmd) makes it super
amazingly awesomely easy to
document, iterate, modify & share
analyses.
Spinning (knitr::spin) is cool too.
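For readers who have not met spinning: `knitr::spin()` turns a plain R script with specially marked comments into an R Markdown document. A minimal sketch, with a made-up four-line script passed in as text so that no files are needed:

```r
library(knitr)

# A plain R script: lines starting with #' become prose, the rest becomes code chunks
script <- c(
  "#' # Actor breakdown",
  "#' Counts of incidents by actor type.",
  "x <- c(external = 955, internal = 535)",
  "x / 1643"
)

# knit = FALSE stops after the conversion and returns the R Markdown source
rmd <- spin(text = script, knit = FALSE)
cat(rmd)
```

With a script on disk, `spin("analysis.R")` (file name hypothetical) would convert and knit it in one step.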
ggplot2 makes it super amazingly
awesomely straightforward to make
“camera ready” visualizations
(PDF vs SVG)
Do not upgrade your analysis stack or
experiment with RStudio during the
core analysis phase
Packages (even for analyses) > loosely
connected documents and scripts
Source code control & data version
control are extremely important
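One lightweight way to notice when a data file has changed between runs is to keep a checksum manifest next to the analysis. The sketch below uses `tools::md5sum()` from base R. The helper name `checksum_manifest` and the file name are made up for illustration, and this stands in for, rather than describes, whatever data versioning the team used.

```r
# Return a data frame of file names and MD5 checksums for a set of data files
checksum_manifest <- function(paths) {
  sums <- tools::md5sum(paths)
  data.frame(file = basename(paths), md5 = unname(sums), stringsAsFactors = FALSE)
}

# Demonstration with a throwaway file (stand-in for a real .json or .Rdata input)
data_file <- file.path(tempdir(), "incidents.json")
writeLines('{"actor": "external"}', data_file)
before <- checksum_manifest(data_file)

# Any change to the data changes the checksum
writeLines('{"actor": "internal"}', data_file)
after <- checksum_manifest(data_file)
before$md5 != after$md5
```

Committing the manifest alongside the code ties each version of an analysis to the exact data it ran against.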
A fellow researcher must be able to
reproduce your analyses with the same
data & Rmd and understand your
reasoning in the annotation
Freezing or at least recording versions
of packages you use may be vitally
important to your ability to reproduce
your work at a later date (store them in
version control with analyses or perhaps
embed in a container like Docker)
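Recording versions can be as simple as writing them to a small file that is committed with the analysis. A minimal sketch in base R follows. The helper `record_versions` and the file name are hypothetical, and the two base packages are used only so the example runs anywhere.

```r
# Write the versions of the given packages to a CSV that can be committed
record_versions <- function(pkgs, path) {
  versions <- vapply(pkgs, function(p) as.character(utils::packageVersion(p)), character(1))
  manifest <- data.frame(package = pkgs, version = unname(versions), stringsAsFactors = FALSE)
  utils::write.csv(manifest, path, row.names = FALSE)
  invisible(manifest)
}

manifest_file <- file.path(tempdir(), "package-versions.csv")
manifest <- record_versions(c("stats", "utils"), manifest_file)
print(manifest)
```

In a real project the package list would name whatever the analysis loads, and `sessionInfo()` gives a fuller snapshot that includes the R version and platform.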
ABOUT THE COVER
• @vzdbir
• dbir@verizon.com
• verizonenterprise.com/dbir
• veriscommunity.net
• vcdb.org
• github.com/vz-risk
• @wadebaker
• @davehylender
• @marc_spitler
• @bfist
• @jayjacobs
• @SuzanneWidup
• @bhaskar_vk
• @gdbassett
• @hrbrmstr


Effective Applications of the R Language

Editor's Notes

  1. What is the DBIR? It may cause you some concern that before the DBIR (and, if I’m being honest with everyone, even since it) most decisions in cybersecurity have not been made through what most of you here would call “data science”. Most things in cyber are based on expert opinions, often unencumbered by facts.
  2. But I can’t talk about how we use R for the DBIR without first introducing you to the “we”.
  3. I may be the only one standing up here talking about the DBIR, but it’s a team effort. The spiffy looking gentleman in the upper left is Wade Baker. He started this whole thing and we affectionately refer to him as the godfather of the DBIR. We do not perform author attribution in the report proper, since the process of publishing this report involves so many individuals that we would need scrolling movie credits and would still inevitably leave someone off. However, if we exclude internal marketing and production management, these folks are the ones who put their hearts and souls into the analyses, visualizations and main report production.
  4. The DBIR’s history goes back to 2008, when the first one was published. Back then, it was composed of transcribed incidents from the Verizon Incident Response team. The data was crunched in Excel and I’m embarrassed to admit just how riddled with pie charts that inaugural issue was. However, it was the first report in cybersecurity that provided real data about the actors who commit cybercrimes, the actions they take when committing cybercrime, the assets those actions were taken against and the attributes of the impacted data elements. It’s valued because it’s not a survey conducted by a vendor with a vested interest in the outcome but is actual real data provided by many (now 70) contributors.
  5. So, speaking of contributors… As the DBIR evolved, more contributors came on, which made using Excel a bit difficult. This is where R comes into play. The covers you see here aren’t just pretty pictures (though they are pretty pictures). We’ll cover more about the covers (heh) in a bit.
  6. When the number of incidents was small (in the low hundreds), working with the data in Excel was fairly straightforward. But, thanks to regular contributions from the Secret Service, the Department of Health and Human Services and over 60 other global organizations, the number of incidents in the corpus has now hit 200K. Now, Excel can handle 200K rows, but there are some things that make the analysis a bit trickier.
  7. The VERIS (Vocabulary for Event Recording and Incident Sharing) Framework is a taxonomy that standardizes how security incidents are described and categorized. The schema and record format are JSON, organized into the Actor/Action/Asset/Attribute categories I mentioned earlier. You can see real incidents encoded in this format over at vcdb.org, where we have a corpus of public breaches encoded.
  8. If we limit this to just the top-level categories there are 315 top-level combinations, but it’s possible to record multiple actors, actions, assets and attributes per incident. Think of a phishing incident where there’s a phishing email that then causes social engineering that eventually has malware deployed on a system, with an actor then looking for other systems to break into and eventually steal data from or corrupt data on. It’s possible to actually have over 2,000 enumeration details associated with an incident. Given the nested structure of JSON and the limitations in Excel, the decision to move to R was not a tough one to make.
  9. To help standardize the analysis of the incident records, the verisr package was created (by Jay). While not on CRAN, the package is available on the VZ RISK team’s GitHub repository (with a forked copy in Jay’s GitHub) and can be used to analyze VCDB incidents or incidents that organizations encode in VERIS format. It makes heavy use of the data.table package. One interesting fact is that the incident and breach corpus fits on a cheesy thumb drive that you might get at a vendor booth at a tech conference. We have an entire chapter on VERIS and verisr in our book Data-Driven Security, with examples of how to use the package to analyze different aspects of incidents.
  10. Here’s a small example of what the verisr pkg can do. There are many helper functions that make it easy to slice & dice the data for any given analysis.
  11. With verisr and a corpus of breach data to work with, you can do things like compare the time it takes an attacker to compromise an organization vs the time it takes an org to discover a breach. Unlike the vast majority of security reports, which would be glad to declare victory at 2014 having the narrowest gap between compromise and discovery, the trend lines paint a slightly different picture. Whenever ggplot2 was used to make a chart I’ve included the sticker on the page. Every visualization you see in this presentation and in the report is 99% ggplot2. Font issues (more on that in a bit) and some required style guide restrictions (how legends appear, for example) make up the 1%. We probably saved $12-15K (based on the hourly rate) in post-production costs by providing high quality & pre-styled charts to the production team.
  12. And, because VERIS uses the North American Industry Classification System (NAICS) for granular recording of which industry an org is in, you can do really cool things like cluster incidents by selected enumeration details at a broad or discrete level. This particular chart encodes # of incidents in a given industry as circle size and places industries with similar attack profiles closer together. We usually look at the industries at the 2-digit NAICS level (since the report has a broad audience) but for this particular analysis we wanted to see if industries further down the NAICS tree were clustered within their higher level category or across them. The exercise was pretty illuminating and you can look for yourself in the report or hit the URL on this page. We exported the data from R and made an interactive D3 visualization that you can explore.
  13. Last year, we used a number of clustering techniques to try to classify the breaches into categories. It’s virtually impossible to do this by hand anymore, and the analysis ended up putting each incident into one of nine buckets. When we looked at the core attributes of each bucket we were able to give each an easy-to-remember name, since the VERIS enumerations that made up each category made sense to the domain experts performing the analysis. We dubbed them the Nefarious Nine (+1, which lumps the ones that had no classification into a catchall category). By doing this we are also able to provide a snapshot into a current and multi-year view. This heatmap attempts to show the most prevalent pattern in a given industry (with a 3 year history) and lets you compare across industries to spot similarities and differences. We’re working on a way to do this for more discrete NAICS codes without making the chart gibberish (it may have to be an interactive version to accomplish that though).
  14. This year we had an opportunity to do more than work on breaches. A handful of new partners (vendors and service) provided over 12TB of incident and vulnerability data to us to analyze and make part of the report.
  15. Note the mistake about using density plots
  16. Note the mistake about using density plots
  17. We encode our own incidents and others encode their incidents into a SurveyGizmo form. Yes. SurveyGizmo. It’s cheaper than building and maintaining an app (we tried!), is more creator-friendly than Google Forms, has built-in user management, has an API (more on that in a bit) and is fine from a security standpoint, since we have codenames for all participants and uniquely identifying components of an incident are forbidden from being entered. We sometimes have to fly to an org to help them encode and transport incidents in locked briefcases (no handcuffs I’m afraid). We use node.js for JSON schema validation for each record and to do some minor cleanup of each incident (if necessary). We can use V8 now to keep all that activity in R. For some of the new, large data we received, we ended up having to use PostgreSQL and Elasticsearch, and most of it was downloaded across secure internet connections.
  18. We used an internal GitLab instance on an annoyingly secure private network enclave as the authoritative source for the JSON incident records and to hold the R scripts and Rmd files for analysis. The VCDB incidents are on GitHub (go play!) so we used that as well. We were keeping a leaderboard for GitHub incident encoding at one point as well. We used Slack for virtually all team collaboration (which is one reason I wrote the slackr package) and used GPG tools to share anything remotely sensitive. We also received alerts about SLA issues both for outages and survey completion times from SurveyGizmo (their API is pretty decent). We used Room.co for video chats since it has more secure point-to-point websockets and Google Hangouts records everything even if you didn’t ask it to. All the analyses were done in RStudio cuz RStudio & Kevin Ushey (et al) rock.
  19. Notice I did not say easy
  20. There is a cover contest each year where we usually add hidden text to one of the covers (with pictorial clues on the cover and some in the text of the report) that sends folks on a cryptographic and puzzle-infused scavenger hunt to eventually learn where to send a coded message to. The first 3 folks or teams to do so win prizes (like iPads) or can have a donation made in their name to a charity of their choice. The 2014 cover is the first time there was an actual data-driven cover completely generated in R. There’s an explanation on the back of the 2014 report that talks about the clustering used there. The base was done in ggplot2 and igraph was used to generate the graphs on top (layered by hand in Illustrator). It’s 100% driven by data from that year’s report and shows the universe of breaches quite nicely IMO.
  21. We based this year’s cover on Joy Division’s “Unknown Pleasures” album cover. The cover was entirely generated with ggplot2, with only minor editing by the graphics team.
  22. Rather than use hidden text, we used R to encode bits onto the back as “waveforms”. The winning teams ended up transcribing the bits by hand. I have R code that can read the PDF-encoded lines and determine 1/0 from them (like 4 lines of R). The bits make what looks like gibberish unless you recognize what bit.ly short URLs look like after the slash.
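
The last step of the cover puzzle in note 22, turning transcribed bits back into text, can be sketched as follows. The notes do not say how the bits were packed, so this assumes 8-bit ASCII, most significant bit first; the token in the example is made up and is not the real cover data. The original decoder was written in R; Python is used here only for illustration.

```python
def bits_to_text(bits: str) -> str:
    """Decode a string of '0'/'1' characters as 8-bit ASCII, MSB first (assumed encoding)."""
    if len(bits) % 8 != 0:
        raise ValueError("expected a multiple of 8 bits")
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))


def text_to_bits(text: str) -> str:
    """Inverse helper for ASCII text, used here only to build a made-up example."""
    return "".join(format(ord(c), "08b") for c in text)


if __name__ == "__main__":
    example_bits = text_to_bits("Ab3xYz9")  # hypothetical short-URL-style token
    print(example_bits)
    print(bits_to_text(example_bits))
```

The decoded string only looks meaningful once it is read as the path of a short URL, which matches the hint in the note.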
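
Note 8's point about nested records is easier to see with a concrete shape. The sketch below flattens one made-up incident, loosely modeled on the Actor/Action/Asset/Attribute layout, into dotted enumeration names. The record is illustrative and not schema-accurate, and `flatten` is a hypothetical helper, not part of verisr; Python stands in for the team's R tooling.

```python
def flatten(node, prefix=""):
    """Yield a dotted enumeration name for every leaf value in a nested record."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for item in node:
            yield from flatten(item, prefix)
    else:
        yield f"{prefix}.{node}"


# Made-up phishing incident with several actions recorded at once.
incident = {
    "actor": {"external": {"variety": ["Organized crime"]}},
    "action": {
        "social": {"variety": ["Phishing"], "vector": ["Email"]},
        "malware": {"variety": ["Backdoor", "Export data"]},
    },
    "asset": {"assets": [{"variety": "S - Database"}]},
    "attribute": {"confidentiality": {"data": [{"variety": "Credentials"}]}},
}

if __name__ == "__main__":
    for name in flatten(incident):
        print(name)
```

Even this small record yields seven enumeration details, and each list can hold more entries, which is why one row per incident in a spreadsheet stops being practical.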