1. Bob Rudis • Managing Principal & Senior Data Scientist
bob@rudis.net
From Data to Decision Makers
A Behind the Scenes Look at Building The
Most Respected Report In Cybersecurity
2.
ABOUT ME
(Briefly)
3.
• DBIR team manager/author (more on this in a bit)
• Former cyber risk director for a Fortune 100
insurance company
• Serial #rstats Tweeter (@hrbrmstr), blogger
(rud.is/b & @ddsecblog) & regular helper on
StackOverflow
• Author of and contributor to 14 CRAN packages
• Co-author of Data-Driven Security (@ddsecbook)
• Co-host of the Data-Driven Security Podcast
(@ddsecpodcast)
• Die-hard ggplot2 advocate, widgeteer, heavily
addicted cartographer & shameless user of the
forward assignment operator ←4EVA→
4.
WHAT IS THE DBIR?
5.
The Verizon Data Breach
Investigations Report (DBIR)
“The Verizon Data Breach Investigations Report
(DBIR) is an annual publication that provides
analysis of information security incidents, with a
specific focus on data breaches.”
http://searchsecurity.techtarget.com/definition/Verizon-Data-Breach-Investigations-Report-DBIR
verizonenterprise.com/DBIR
6.
WHO IS THE DBIR?
7.
Wade Baker Dave Hylender Marc Spitler Jay Jacobs
Kevin Thompson Suzanne Widup Bhaskar Karambelkar Gabriel Bassett
8.
The DBIR
• Started in 2008
• Cited by virtually every other cybersecurity report by the third paragraph
• Read by individual contributors up through senior
leadership at virtually every global enterprise
• A lot of fun to work on
9.
[Chart: DBIR contributing organizations by year — 2008: 1, 2009: 1, 2010: 2, 2011: 3, 2012: 6, 2013: 18, 2014: 50, 2015: 70]
10.
WHAT DOES THIS HAVE TO
DO WITH R?
11.
200,000
12.
Vocabulary for
Event
Recording and
Incident
Sharing
veriscommunity.net
vcdb.org
13.
14.
verisr
github.com/vz-risk/verisr
15.
library(verisr)
vcdb <- json2veris(jsondir)
summary(vcdb) # too big to show
getenum(vcdb, "actor")
## enum x
## 1 external 955
## 2 internal 535
## 3 partner 100
## 4 unknown 85
getenum(vcdb, "actor", add.n=TRUE, add.freq=TRUE)
## enum x n freq
## 1 external 955 1643 0.581
## 2 internal 535 1643 0.326
## 3 partner 100 1643 0.061
## 4 unknown 85 1643 0.052
16.
17.
vz-risk.github.io/dbir/2015/19/
18.
19.
20.
• 200m successful vulnerability exploits across 20,000 enterprises
• 170m malware events across over 10,000 enterprises
• 6 months of malware traffic data from 30+m mobile devices
• Live botnet traffic from compromised organizations
• Millions of Indicators of Compromise
• Details of all Denial of Service activity for 2014
21.
22.
PUTTING IT ALL TOGETHER
Getting the data
23.
24.
PUTTING IT ALL TOGETHER
Creating, organizing and sharing analyses
25.
.R .Rmd .json .Rdata
26.
1. Assign areas to each researcher
2. For “standard VERIS” analyses, generate reports from core Rmd
3. Have “Findings Review” collaborative meetings where we peer-review the work
4. (Repeat step 3 after refinement of findings)
5. Decide on final sections for the report and assign authors
6. Add rough draft visualizations to the findings
7. Lock in content
8. Refine visualizations
9. Finalize text content
10. Work with Marketing & Graphics
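Step 2 above (generating reports from a core Rmd) can be sketched with rmarkdown's parameterized reports. The template name `dbir-core.Rmd` and the `section` parameter are illustrative, not the team's actual files:

```r
library(rmarkdown)

# Hypothetical core template: dbir-core.Rmd declares a parameter in
# its YAML header, e.g.
#   params:
#     section: "actors"
sections <- c("actors", "actions", "assets", "attributes")

for (s in sections) {
  render(
    "dbir-core.Rmd",              # shared analysis template
    params = list(section = s),   # which VERIS area to report on
    output_file = sprintf("findings-%s.html", s)
  )
}
```

Each researcher then gets a rendered findings document for their assigned area from the same source template.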
27.
FIGURATIVELY SPEAKING
28.
• Create one “Master Rmd” for all
visualization figures using canned data from
outputs of analyses, having one master
(giant) HTML document version and multiple
individual PDF versions to give to the
creative staff to work with
Why PDF? Complex ggplot2 SVGs crash
Illustrator and the fonts are horrible (they
get converted to polygons).
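One way to hand Illustrator a PDF with real, embedded fonts (rather than text converted to polygons) is the Cairo PDF device; the figure and file name here are illustrative:

```r
library(ggplot2)

p <- ggplot(mtcars, aes(factor(cyl))) +
  geom_bar() +
  theme_minimal(base_family = "Helvetica")

# cairo_pdf embeds fonts instead of outlining them, which keeps the
# file editable for the creative staff
ggsave("figure-01.pdf", p, device = cairo_pdf, width = 6, height = 4)
```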
29.
• When you decide you want to use a figure
from the analysis spend the time to make it
look as amazing (and final) as possible to
save $$, save time down the road and to
avoid seeing your creations on @wtfviz
30.
LESSONS LEARNED
31.
R Markdown (Rmd) makes it super
amazingly awesomely easy to
document, iterate, modify & share
analyses.
spinning is cool too.
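"Spinning" refers to knitr::spin(), which turns a plain R script with roxygen-style comments into a report; a minimal sketch (the file name and chunk are made up):

```r
# analysis.R -- a "spinnable" plain R script:
#   #' lines become markdown prose
#   #+ lines become knitr chunk options

#' ## Actor overview
#' A quick look at the actor enumeration.

#+ actor-plot, fig.width=6
plot(pressure)  # stand-in for a real analysis chunk

# From the console, one call turns the script into a report:
knitr::spin("analysis.R")
```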
32.
ggplot2 makes it super amazingly
awesomely straightforward to make
“camera ready” visualizations
(PDF vs SVG)
33.
Do not upgrade your analysis stack or
experiment with RStudio during the
core analysis phase
34.
Packages (even for analyses) > loosely
connected documents and scripts
35.
Source code control & data version
control are extremely important
36.
A fellow researcher must be able to
reproduce your analyses with the same
data & Rmd and understand your
reasoning in the annotation
37.
Freezing or at least recording versions
of packages you use may be vitally
important to your ability to reproduce
at a later date (store them in version
control with analyses or perhaps
embed in a container like Docker)
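A lightweight version of "recording versions" (packrat/checkpoint-style tools or a Docker image are heavier alternatives) is simply to snapshot the attached package versions into a file kept under version control; the file name is illustrative:

```r
# Write the version of every attached (non-base) package to a file
# that lives in version control next to the Rmd analyses
si <- sessionInfo()
pkgs <- vapply(si$otherPkgs, function(p) p$Version, character(1))
writeLines(sprintf("%s %s", names(pkgs), pkgs), "package-versions.txt")
```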
38.
ABOUT THE COVER
39.
40.
41.
What is the DBIR? It may cause you some concern that before the DBIR (and, if I’m being honest with everyone, even since) most decisions in cybersecurity are not made through what most of you here would call “data science”. Most things in cyber are based on expert opinions, often unencumbered by facts.
But I can’t talk about how we use R for the DBIR without first introducing you to the “we”.
I may be the only one standing up here talking about the DBIR, but it’s a team effort. The spiffy-looking gentleman in the upper left is Wade Baker. He started this whole thing and we affectionately refer to him as the godfather of the DBIR. We do not perform author attribution in the report proper, since publishing it involves so many individuals that we would need scrolling movie credits and would inevitably leave someone off. However, if we exclude internal marketing and production management, these folks are the ones who put their hearts and souls into the analyses, visualizations and main report production.
The DBIR’s history goes back to 2008, when the first one was published. Back then, it was made up of transcribed incidents from the Verizon Incident Response team. The data was crunched in Excel, and I’m embarrassed to admit just how riddled with pie charts that inaugural issue was. However, it was the first report in cybersecurity that provided real data about the actors who commit cybercrimes, the actions they take when committing cybercrime, the assets those actions were taken against and the attributes of the impacted data elements. It’s valued since it’s not a survey conducted by a vendor with a vested interest in the outcome but actual, real data provided by many (now 70) contributors.
So, speaking of contributors…as the DBIR evolved, more contributors came on board, which made using Excel a bit difficult. This is where R comes into play. The covers you see here aren’t just pretty pictures (though they are pretty pictures). We’ll cover more about the covers (heh) in a bit.
When the number of incidents was small (in the low hundreds), working with the data in Excel was fairly straightforward. But, thanks to regular contributions from the Secret Service, the Department of Health and Human Services and over 60 other global organizations, the number of incidents in the corpus has now hit 200K. Excel can handle 200K rows, but there are some things that make the analysis a bit trickier.
The VERIS (Vocabulary for Event Recording and Incident Sharing) framework is a taxonomy that standardizes how security incidents are described and categorized. The schema and record format is JSON, organized into the Actor/Action/Asset/Attribute categories I mentioned earlier. You can see real incidents encoded in this format over at vcdb.org, where we have a corpus of public breaches encoded.
If we limit this to just the top-level categories, there are 315 top-level combinations, but it’s possible to record multiple actors, actions, assets and attributes per incident. Think of a phishing incident where a phishing email leads to social engineering, malware eventually gets deployed on a system, and an actor then looks for other systems to break into and eventually steals or corrupts data. It’s possible to have over 2,000 enumeration details associated with a single incident. Given the nested structure of JSON and the limitations of Excel, the decision to move to R was not a tough one to make.
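The nesting that trips up Excel is easy to see with jsonlite; this record is a made-up illustration in the spirit of VERIS, not an actual schema excerpt:

```r
library(jsonlite)

# Illustrative (not schema-exact) incident: note that multiple
# actions hang off a single incident record
incident <- fromJSON('{
  "actor":  { "external": { "variety": ["Organized crime"] } },
  "action": {
    "social":  { "variety": ["Phishing"] },
    "malware": { "variety": ["Backdoor"] }
  },
  "asset":  { "assets": [ { "variety": "S - Database" } ] }
}')

names(incident$action)  # more than one action per incident
```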
To help standardize the analysis of the incident records, the verisr package was created (by Jay). While not on CRAN, the package is available in the VZ RISK team’s GitHub repository (with a forked copy in Jay’s GitHub account) and can be used to analyze VCDB incidents or incidents that organizations encode in VERIS format. It makes heavy use of the data.table package. One interesting fact: the incident and breach corpus fits on a cheesy thumb drive that you might get at a vendor booth at a tech conference. We have an entire chapter on VERIS and verisr in our book Data-Driven Security, with examples of how to use the package to analyze different aspects of incidents.
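Since verisr isn't on CRAN, it installs straight from the GitHub repository mentioned above (a standard devtools pattern, not a step the talk spells out):

```r
# install.packages("devtools")  # if not already installed
devtools::install_github("vz-risk/verisr")

library(verisr)
# json2veris() then builds an analysis object from a directory of
# VERIS-format JSON incident files (see the earlier slide example)
```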
Here’s a small example of what the verisr package can do. There are many helper functions that make it easy to slice & dice the data for any given analysis.
With verisr and a corpus of breach data to work with, you can do things like compare the time it takes an attacker to compromise an organization vs the time it takes an org to discover a breach. Unlike the vast majority of security reports, which would be glad to declare victory at 2014 having the closest compromise-versus-discovery gap yet, the trend lines paint a slightly different picture.
Whenever ggplot2 was used to make a chart I’ve included the sticker on the page. Every visualization you see in this presentation and in the report is 99% ggplot2. Font issues (more on that in a bit) and some required style guide restrictions (how legends appear, for example) make up the 1%. We probably saved $12-15K (based on the hourly rate) in post-production costs by providing high quality & pre-styled charts to the production team.
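A "camera ready" chart like those can be sketched from the getenum() output shown on slide 15; the theme and color choices here are illustrative, not the report's actual style guide:

```r
library(ggplot2)

# Frequencies from the earlier getenum(vcdb, "actor", ...) example
actors <- data.frame(
  enum = c("external", "internal", "partner", "unknown"),
  freq = c(0.581, 0.326, 0.061, 0.052)
)

ggplot(actors, aes(x = reorder(enum, freq), y = freq)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_y_continuous(labels = scales::percent) +
  labs(x = NULL, y = NULL, title = "Threat actors (% of incidents)") +
  theme_minimal()
```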
And, because VERIS uses the North American Industry Classification System (NAICS) for granular recording of what industry an org is in, you can do really cool things like cluster incidents by selected enumeration details at a broad or discrete level. This particular chart encodes the number of incidents in a given industry as circle size and clusters incidents with similar attack profiles closer together. We usually look at industries at the 2-digit NAICS level (since the report has a broad audience), but for this particular analysis we wanted to see whether industries further down the NAICS tree clustered within their higher-level category or across categories. The exercise was pretty illuminating; you can look for yourself in the report or hit the URL on this page. We exported the data from R and made an interactive D3 visualization that you can explore.
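The broad shape of that analysis (distance between industry attack profiles, projected to 2-D, handed to D3) can be sketched as follows; the counts and NAICS codes are toy data, not report figures:

```r
library(jsonlite)

# Toy industry-by-enumeration incident counts (rows: 2-digit NAICS)
profiles <- matrix(
  c(40,  5, 10,
    35,  8, 12,
     2, 60,  9,
     4, 55, 11),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("52", "51", "62", "61"),
                  c("hacking", "error", "misuse"))
)

# Normalize to proportions so industry size doesn't dominate distance
prop <- profiles / rowSums(profiles)
xy <- cmdscale(dist(prop), k = 2)   # classical MDS down to 2-D

# Hand the coordinates (plus incident counts for circle size) to D3
out <- data.frame(naics = rownames(xy), x = xy[, 1], y = xy[, 2],
                  n = rowSums(profiles))
writeLines(toJSON(out), "industry-clusters.json")
```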
Last year, we used a number of clustering techniques to classify the breaches into categories. It’s virtually impossible to do this by hand anymore, and the analysis ended up putting each incident into one of nine buckets. When we looked at the core attributes of each bucket we were able to give them easy-to-remember names, since the VERIS enumerations that made up each category made sense to the domain experts performing the analysis. We dubbed them the Nefarious Nine (+1, which lumps the incidents that had no classification into a catch-all category). By doing this we are also able to provide both a current snapshot and a multi-year view. This heatmap shows the most prevalent pattern in a given industry (with a 3-year history) and lets you compare across industries to spot similarities and differences. We’re working on a way to do this for more discrete NAICS codes without making the chart gibberish (it may have to be an interactive version to accomplish that, though).
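A pattern-by-industry heatmap of that sort is a geom_tile() in ggplot2; the industries, patterns and frequencies below are made up for illustration:

```r
library(ggplot2)

# Made-up pattern-by-industry frequencies (not report data)
d <- expand.grid(
  industry = c("Finance", "Healthcare", "Retail"),
  pattern  = c("POS Intrusion", "Web App", "Insider Misuse")
)
d$freq <- c(5, 2, 60, 40, 10, 15, 8, 55, 6) / 100

ggplot(d, aes(pattern, industry, fill = freq)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "#fee8c8", high = "#e34a33",
                      labels = scales::percent) +
  labs(x = NULL, y = NULL, fill = "Freq") +
  theme_minimal()
```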
This year we had an opportunity to do more than work on breaches. A handful of new partners (vendors and service providers) provided over 12TB of incident and vulnerability data for us to analyze and make part of the report.
Note the mistake about using density plots
We encode our own incidents, and others code their incidents into a Survey Gizmo form. Yes, Survey Gizmo. It’s cheaper than building and maintaining an app (we tried!), is more creator-friendly than Google Forms, has built-in user management, has an API (more on that in a bit) and is fine from a security standpoint since we have codenames for all participants and uniquely identifying components of an incident are forbidden from being entered. We sometimes have to fly to an org to help them encode incidents and transport them in locked briefcases (no handcuffs, I’m afraid). We use node.js for JSON schema validation of each record and to do some minor cleanup of each incident (if necessary). We can use V8 now to keep all of that activity in R.
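The V8 package embeds a JavaScript engine in R, so schema checks that used to run in node.js can stay inside the analysis pipeline. The `validate` function below is a stand-in for a real JSON-schema validator library you would load with `ct$source()`:

```r
library(V8)

ct <- v8()  # spin up an embedded JavaScript context

# In practice you'd ct$source() a JSON-schema validator plus the
# VERIS schema; this stand-in just checks a required section exists
ct$eval("
  function validate(incident) {
    return incident.hasOwnProperty('action');
  }
")

ct$call("validate", list(action = list(malware = list())))  # should be TRUE
```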
For some of the new, large data we received, we ended up using PostgreSQL and Elasticsearch, and most of it was downloaded across secure internet connections.
We used an internal GitLab instance on an annoyingly secure private network enclave as the source of authority for the JSON incident records and to hold the R scripts and Rmd files for analysis. The VCDB incidents are on GitHub (go play!), so we used that as well. We were keeping a leaderboard for GitHub incident encoding at one point, too.
We used Slack for virtually all team collaboration (which is one reason I wrote the slackr package) and used GPG tools to share anything remotely sensitive. We also received alerts about SLA issues, both for outages and survey completion times, from SurveyGizmo (their API is pretty decent).
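Posting analysis output to the team channel with slackr looks roughly like this; the config file path is slackr's documented default, and `vcdb` refers to the object from the earlier verisr example:

```r
library(slackr)

# One-time setup reads the API token/channel from a config file
# (kept out of version control, of course)
slackr_setup(config_file = "~/.slackr")

# Post the printed output of an R expression straight to the channel
slackr(summary(vcdb))
```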
We used Room.co for video chats since it has more secure point-to-point websockets and Google Hangouts records everything even if you didn’t ask it to.
All the analyses were done in RStudio cuz RStudio & Kevin Ushey (et al.) rock.
Notice I did not say easy
There is a cover contest each year where we usually add hidden text to one of the covers (with pictorial clues on the cover and some in the text of the report) that sends folks on a cryptographic, puzzle-infused scavenger hunt to eventually figure out where to send a coded message to. The first 3 folks or teams to do so win prizes (like iPads) or can have a donation made in their name to a charity of their choice. The 2014 cover is the first time there was an actual data-driven cover completely generated in R. There’s an explanation on the back of the 2014 report that talks about the clustering used there. The base was done in ggplot2 and igraph was used to generate the graphs on top (layered by hand in Illustrator). It’s 100% driven by data from that year’s report and shows the universe of breaches quite nicely, IMO.
We based this year’s cover on Joy Division’s “Unknown Pleasures” album cover. It was entirely generated with ggplot2, with only minor editing by the graphics team.
Rather than use hidden text, we used R to encode bits onto the back as “waveforms”. The winning teams ended up transcribing the bits by hand. I have R code that can read the PDF encoded lines and determine 1/0 from it (like 4 lines of R). The bits make what look like gibberish unless you recognize what bitly short urls look like after the slash.
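The bit-encoding trick can be sketched in a few lines of ggplot2: a long segment for 1, a short one for 0. The actual cover's encoding details aren't given here, so this is purely illustrative:

```r
library(ggplot2)

bits <- c(1, 0, 1, 1, 0, 0, 1, 0)   # illustrative message fragment

d <- data.frame(
  x   = seq_along(bits),
  len = ifelse(bits == 1, 1.0, 0.4)  # 1 -> long tick, 0 -> short tick
)

# Draw the "waveform": vertical segments with no axes or gridlines
ggplot(d) +
  geom_segment(aes(x = x, xend = x, y = 0, yend = len)) +
  theme_void()
```

Decoding goes the other way: measure each extracted line's length and threshold it back to a 1 or a 0.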