By Tiago Henriques, Filipa Rodrigues
Florentino Bexiga, Ana Barbosa
I, for one, welcome our
new Cyber Overlords!
An introduction to the use of
data science in cybersecurity
WHO ARE WE?
MACHINE LEARNING AND CYBERSECURITY
IMAGE WORKFLOW
IMAGE ANALYSIS IN DETAIL
DATA VISUALISATION
Agenda
Tiago is the CEO and Data Necromancer at
BinaryEdge. He still gets to meddle in the
intersection of data science and cybersecurity
by providing his team with lovely problems that
they solve on a daily basis.
Tiago Henriques
Presenter
Florentino is the Data MacGyver at
BinaryEdge. On a daily basis he needs to
deploy infrastructure used to analyse big
and realtime data. When not doing that, he
can be found creating models to analyse
data. Give him an orange, he’ll give you a
skynet. Why an orange you ask? He’s
hungry and likes oranges, there!
Florentino Bexiga
Presenter
Filipa is the Data Diva at BinaryEdge. She
dances the macarena with numbers to get
them to tell her all their dirty secrets.
Filipa Rodrigues
Presenter
Ana is the Data Ferret at BinaryEdge.
She is small and hides between the 110th
and 111th characters of the ASCII code to
see and show data from the unique
perspective of someone who can’t reach
the box of cookies stored on top of the
capital 'I'.
Ana Barbosa
Presenter
Earlier today
BinaryEdge
HACKING
SKILLS
SECURITY DOMAIN
EXPERTISE
STATISTICS
KNOWLEDGE
MACHINE
LEARNING
TRADITIONAL
RESEARCH
DANGER
ZONE!
DATA
SCIENCE
Source: Data-Driven Security: Analysis, visualisation and Dashboards (adapted)
How we got here....
200 port scans of the entire internet / month
1,400,000,000 scanning events/ month *
746,000 torrents monitored and increasing
1,362,225,600 torrent events/ month
* at a minimum
Worldwide distribution of IPs running services
Number of IPs found (map legend):
>= 1,000,000
100,000 < #found < 1,000,000
10,000 < #found <= 100,000
1,000 < #found <= 10,000
100 < #found <= 1,000
<= 100
Map IPv4 addresses to Hilbert curves
% of coverage (legend): 0% to 100% in steps of 10%
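The Hilbert-curve mapping named on this slide can be sketched as follows. This is a minimal illustration, not BinaryEdge's implementation: it assumes each /24 prefix becomes one cell of a 4096 x 4096 grid, and the helper names `d2xy` and `ip_to_cell` are hypothetical. The distance-to-coordinate conversion is the usual iterative Hilbert-curve routine.

```python
import ipaddress


def d2xy(order, d):
    """Map distance d along a Hilbert curve to (x, y) on a 2**order square grid."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so the sub-curves join up end to end.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def ip_to_cell(ip, prefix_len=24):
    """Return the grid cell for the prefix containing `ip` (assumed: one cell per /24)."""
    if prefix_len % 2 or not 0 < prefix_len <= 32:
        raise ValueError("prefix_len must be an even number between 2 and 32")
    index = int(ipaddress.IPv4Address(ip)) >> (32 - prefix_len)
    return d2xy(prefix_len // 2, index)


cell_a = ip_to_cell("10.0.0.0")
cell_b = ip_to_cell("10.0.1.0")  # the next /24 lands in a neighbouring cell
```

The point of the curve is locality: prefixes that are numerically adjacent end up in adjacent cells, so allocated blocks show up as contiguous regions on the map.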
Data Science & Machine Learning
How many IP addresses did job X have vs. job Y?
What is the average duration of the scans?
Can we extract more from all the screenshots we get?
Can we have a more optimized job distribution?
We can only identify X% of services because we’re
using static signatures. Can we do better?
Can we find similar images?
MULTIPLE WILD QUESTIONS APPEAR... ...ONE COMMON ANSWER
DATA SCIENCE & MACHINE LEARNING
Data Science & Machine Learning
DATA SCIENCE MACHINE LEARNING
INITIAL ANALYSIS AND CLEAN UP
EXPLORATORY DATA ANALYSIS
DATA VISUALISATION
KNOWLEDGE DISCOVERY
CLASSIFICATION
CLUSTERING
SIMILARITY MATCHING
REGRESSION
IDENTIFICATION
Problems and Limitations of
Machine Learning in CyberSecurity
Lots of adversarial scenarios: attacks on the classifiers go against the foundations of
machine learning
Prediction: scenarios and data are too volatile, and there are not enough proper sources of data
Lack of data, in both quantity and quality, to train models
Good use cases
ANTIVIRUS: Further work needs to be done, but machine learning would allow antivirus to move from a static, signature-based system to a much improved dynamic, learning-based system.
PATTERN DETECTION/OUTLIER DETECTION (IDS/IPS): If a computer is hacked, certain behaviours will change; if data is constantly monitored and fed into a system, the hack could be detected.
SOURCE CODE ANALYSIS: Detection of vulnerable patterns during development.
INTERNAL ATTACKERS: Sentiment analysis applied to the emails, tweets and social networks of employees.
ANTI-SPAM
SMARTER FUZZERS
metadata
files people
photos
family&friends
behaviour
social
search
company
registration
ip address
url address
news
forums
sub-reddits
internal
external
phone
email
linked urls
likes
topics
BGP
AS
whois
AS membership
AS peer
list of IPs
shared
infrastructure
co-hosted
sites
contact
geolocation
office
locations
social
networks
phone
portscan
dns
torrents
binaryedge.io 2016
domains
AXFR
MX records
screenshots
web
services
http https
webserver
framework
headers
cookies
certificate
configuration
authorities
entities
SMB
VNC
RDP
users
apps
files
peers torrent name
OCR
SW
banners
image
classifier
vulnerabilities
data points
Torrent Correlation
China or Military
Data correlation
Turkish IP
DEMO
At PixelsCamp
Microservices (REST API)
PORT WORD TAG FACE COUNTRY LOGO IP
Scan
SCAN GENERATES EVENTS
DOES IT GENERATE A SCREENSHOT?
YES: STORE THE IMAGE FILE ON THE CLOUD, THEN GENERATE A NOTIFICATION THAT A NEW IMAGE WAS UPLOADED
NO: FINISH
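The scan step above can be sketched in a few lines. This is only an illustration under stated assumptions: a dict stands in for the cloud storage, a `queue.Queue` stands in for the notification channel, and the event fields (`ip`, `port`, `screenshot`) and the function name `handle_scan_event` are hypothetical, not taken from the slides.

```python
import queue


def handle_scan_event(event, image_store, notifications):
    """Store the screenshot of a scan event, if any, and announce the upload."""
    screenshot = event.get("screenshot")
    if screenshot is None:
        return False  # NO branch: nothing to store, finish

    # YES branch: store the image file (a dict stands in for cloud storage)...
    key = f"{event['ip']}:{event['port']}"
    image_store[key] = screenshot
    # ...then notify downstream stages that a new image was uploaded.
    notifications.put({"type": "image_uploaded", "key": key})
    return True


image_store = {}
notifications = queue.Queue()
handle_scan_event(
    {"ip": "192.0.2.10", "port": 5900, "screenshot": b"\x89PNG"},
    image_store,
    notifications,
)
handle_scan_event({"ip": "192.0.2.11", "port": 22}, image_store, notifications)
```

Decoupling storage from notification means the image stages that follow only ever need to read from a queue.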
Image Workflow
INITIALIZER FILTER LOGO DETECTION
FACE DETECTION
OPTICAL CHARACTER
RECOGNITION (OCR)
PULL MESSAGE FROM QUEUE
IS THERE A NEW IMAGE?
YES: DECRYPT AND STORE IMAGE METADATA ON A DATABASE, THEN GENERATE IMAGE SIGNATURE FOR SIMILARITY COMPARISON
NO: FINISH
MESSAGE QUEUE
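The slides do not say which image signature is generated, so the sketch below swaps in a simple average hash as one possible choice. It assumes the image has already been reduced to a small grayscale grid (here 8 x 8, values 0 to 255); the function names are hypothetical.

```python
def average_hash(gray):
    """Build a bit signature: 1 where a pixel is brighter than the image mean."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the differing bits; a small distance suggests similar images."""
    return bin(a ^ b).count("1")


# A synthetic 8 x 8 gradient, a slightly brighter copy, and its negative.
base = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
brighter = [[p + 3 for p in row] for row in base]
inverted = [[255 - p for p in row] for row in base]

same = hamming_distance(average_hash(base), average_hash(brighter))
different = hamming_distance(average_hash(base), average_hash(inverted))
```

Because the signature is a fixed-size integer, comparing two screenshots costs one XOR and a bit count, which is what makes similarity search over many images practical.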
Image Workflow
PULL MESSAGE FROM QUEUE
DOES THE IMAGE HAVE ANY INFORMATION? (PERFORM SIMPLE ENTROPY FILTERING)
YES: MESSAGE QUEUE
NO: FINISH
PULL MESSAGE
FROM QUEUE
ENHANCE IMAGE WITH
APPLICATION OF SOME FILTERS
RUN FACE AND LOGO DETECTION
AND OCR ALGORITHMS
STORE RESULTS
IN DATABASE
PERFORM ADDITIONAL
ACTIONS WITH THE RESULTS
Image Workflow
Image Workflow
[{"BreachDate": "2013-10-04",
  "DataClasses": ["Email addresses", "Password hints", "Passwords", "Usernames"],
  "Title": "Adobe",
  "IsActive": true,
  "Description": "In October 2013, 153 million Adobe accounts were breached with each containing an internal ID, username, email, <em>encrypted</em> password and a password hint in plain text. The password cryptography was poorly done and <a href=\"http://stricture-group.com/files/adobe-top100.txt\" target=\"_blank\">many were quickly resolved back to plain text</a>. The unencrypted hints also <a href=\"http://www.troyhunt.com/2013/11/adobe-credentials-and-serious.html\" target=\"_blank\">disclosed much about the passwords</a> adding further to the risk that hundreds of millions of Adobe customers already faced.",
  "Domain": "adobe.com",
  "AddedDate": "2013-12-04T00:00:00Z",
  "PwnCount": 152445165,
  "IsRetired": false,
  "IsVerified": true,
  "LogoType": "svg",
  "IsSensitive": false,
  "Name": "Adobe"}]
Email
DataLeak API
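A response shaped like the record above can be condensed into the few fields an analyst usually wants first. The sketch below is an assumption-laden illustration: it works on a trimmed copy of the sample record rather than calling any real endpoint, and the function name `summarise_breaches` is hypothetical.

```python
import json
from datetime import date

# Trimmed copy of the sample record; field names are the ones shown on the slide.
SAMPLE = """[{"Name": "Adobe", "Domain": "adobe.com", "BreachDate": "2013-10-04",
"PwnCount": 152445165, "IsVerified": true,
"DataClasses": ["Email addresses", "Password hints", "Passwords", "Usernames"]}]"""


def summarise_breaches(payload):
    """Return one small summary dict per breach record in a JSON payload."""
    summaries = []
    for record in json.loads(payload):
        summaries.append({
            "name": record["Name"],
            "breach_date": date.fromisoformat(record["BreachDate"]),
            "accounts": record["PwnCount"],
            # Flag the breaches where passwords were among the leaked data.
            "passwords_exposed": "Passwords" in record["DataClasses"],
        })
    return summaries


summaries = summarise_breaches(SAMPLE)
```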
Image Workflow
Shannon’s Entropy
Example images: Entropy = 0.00 bits, Entropy ~ 0.03 bits, Entropy ~ 2.13 bits
Filter
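The entropy filter can be sketched with Shannon's formula, H = sum of p * log2(1/p) over the distinct pixel values. This is a minimal version that takes a flat list of pixel values; the 0.5-bit threshold and the function names are assumptions for illustration, not values from the slides.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of the distribution of pixel values."""
    total = len(pixels)
    if total == 0:
        return 0.0
    counts = Counter(pixels)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def has_information(pixels, threshold=0.5):
    """Keep an image only if its entropy reaches the (assumed) threshold."""
    return shannon_entropy(pixels) >= threshold


blank = [0] * 100                 # a single colour: 0 bits
two_tone = [0, 255] * 50          # two equally likely values: 1 bit
four_tone = [0, 85, 170, 255] * 25  # four equally likely values: 2 bits
```

A blank or near-blank screenshot scores close to zero and is dropped before the more expensive detection stages run.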
Data Visualization
EXPLORATION REPRESENTATION DETAILS TOOLS FINISHING UP
“a multidisciplinary recipe of art, science, math, technology, and many other interesting ingredients.”
Andy Kirk, “Data Visualization: a successful design process”
Exploration
DATA TYPE: Hierarchical, Relational, Temporal, Spatial, Categorical
RELEVANCE: What is the most interesting? What is most important? Audience’s profile. What is the most relevant information in the context?
FILTER: Show all values or just a few? Define periods? Define a threshold?
Data Visualization
Representation
Experimentation is important
Conceive ideas
Storyboarding
Do multiple iterations
Prototype
Test
design can be used in the future
Data Visualization
How many open ports does an IP have?
Number of IPs with X open ports
Axis labels: port, Number of IPs
69,543,915 25,436,974 7,008,108 3,475,472 1,287,446 1,043,331
951,629 854,817 789,515 759,115 490,290 288,885
266,827 257,105 219,025 198,898 186,286 141,474
Representation
Distribution of IP addresses running encrypted and unencrypted services
MARKS: Points, Lines, Areas
ATTRIBUTES: Position, Size/ Color, Connections/ Patterns
REPRESENT RECORDS
EMPHASIZE THE MOST IMPORTANT ASPECTS OF THE DATA
HTTP (on port 80): 51,467,779 IPs running HTTP services
HTTPS (on port 443): 28,671,263 IPs running HTTPS services
HTTP & HTTPS: 16,519,503 IPs running both HTTP and HTTPS services
Data Visualization
Representation
PRECISION IN DESIGN
Geometric Calculations
Truncated axis
Scales
MAKE IT UNDERSTANDABLE
Reference lines
Markers
MAKE IT APPEALING
Minimise the clutter
Priority: preserve function
Top 10 Web Servers for the Web
Most common web servers found on port 80 (counts; chart axis in millions)
Apache httpd: 11,493,552
AkamaiGHost: 8,361,080
Microsoft IIS httpd: 4,843,769
nginx: 3,860,883
lighttpd: 2,031,741
Huawei HG532e ADSL modem http admin: 1,539,629
Microsoft HTTPAPI httpd: 952,300
Technicolor DSL modem http admin: 699,202
Mbedthis-Appweb: 694,393
micro_httpd: 678,657
Representation
Consider different design solutions
DATA TYPE
CONDITION
Hierarchical
Relational
Temporal
Spatial
Categorical
CVSS: Common Vulnerability Scoring System
CVSS SCORES (SEVERITY)
LOW: 0.0 to 4.0
MEDIUM: 4.0 to 7.0
HIGH: 7.0 to 10.0
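The severity scale on this slide amounts to a small lookup. The sketch below uses the slide's boundaries (0.0, 4.0, 7.0, 10.0) and assumes, as in the CVSS v2 qualitative ranges, that a boundary score belongs to the higher band (4.0 is MEDIUM, 7.0 is HIGH); the function name is hypothetical.

```python
def cvss_severity(score):
    """Map a CVSS base score (0.0 to 10.0) to LOW, MEDIUM or HIGH."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score < 4.0:
        return "LOW"
    if score < 7.0:
        return "MEDIUM"
    return "HIGH"


labels = [cvss_severity(s) for s in (2.1, 5.0, 9.3)]
```

A categorical band like this is what lets a continuous score be encoded with a small, fixed colour palette in a chart.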
Data Visualization
CVE
Identifier
Number
References
Description
CVE: Common Vulnerabilities and Exposures
Email Protocols
Overview of protocols used for email, according to encryption used
SERVICE: COUNT
POP3 (unencrypted): 4,572,161
POP3S (encrypted): 3,742,289
SMTP (unencrypted): 3,531,071
SMTPS (encrypted): 2,971,159
IMAP (unencrypted): 4,131,737
IMAPS (encrypted): 3,703,364
Total ENCRYPTED: 10,416,812
Total UNENCRYPTED: 12,234,969
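The two totals on this slide follow directly from the six per-service counts, which is easy to check. The sketch below uses only the numbers shown; the variable and function names are hypothetical.

```python
# Per-service counts as shown on the slide.
COUNTS = {
    "POP3": 4_572_161, "POP3S": 3_742_289,
    "SMTP": 3_531_071, "SMTPS": 2_971_159,
    "IMAP": 4_131_737, "IMAPS": 3_703_364,
}
ENCRYPTED_SERVICES = {"POP3S", "SMTPS", "IMAPS"}


def encryption_totals(counts):
    """Sum the counts into encrypted and unencrypted groups."""
    totals = {"encrypted": 0, "unencrypted": 0}
    for service, count in counts.items():
        group = "encrypted" if service in ENCRYPTED_SERVICES else "unencrypted"
        totals[group] += count
    return totals


totals = encryption_totals(COUNTS)
encrypted_share = totals["encrypted"] / sum(totals.values())
```

The sums reproduce the slide's 10,416,812 encrypted and 12,234,969 unencrypted, so roughly 46% of the email services found were running the encrypted variant.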
Big Data Technologies
Changes in amount of data exposed without security
MongoDB Memcached Redis 2 TB
Aug 2015: 644.3 TB in total (619.8 TB, 13.2 TB, 11.3 TB)
Jan 2016: 724.7 TB in total (710.9 TB, 12.0 TB, 1.8 TB)
July 2016: 627.7 TB in total (598.7 TB, 27.5 TB, 1.5 TB)
Heartbleed
Countries with the highest number of IPs vulnerable to Heartbleed
United States: 23,649
China: 6,790
Germany: 6,382
France: 5,622
Russia: 5,264
Republic of Korea: 4,564
United Kingdom: 3,459
Netherlands: 2,779
Italy: 2,508
Japan: 2,484
Data Visualization
VNC wordcloud: login, windows, edition, 2016, delete, ctrl, server, press, microsoft, system, welcome, your, help, file, linux, google, kernel, from, ubuntu
Details
ANNOTATION
Titles and subtitles
Labels
Legends
TYPOGRAPHY
Use fonts that are easy to read
Don’t use fonts that are considered sloppy
SSH Banners
SSH-2.0-OpenSSH_5.3
SSH-2.0-OpenSSH_6.6.1p1
SSH-2.0-OpenSSH_6.6.1
SSH-2.0-OpenSSH_4.3
SSH-2.0-OpenSSH_6.0p1
SSH-2.0-OpenSSH_6.7p1
SSH-2.0-dropbear_2014.63
SSH-2.0-OpenSSH_5.5p1
SSH-2.0-ROSSSH
SSH-2.0-OpenSSH_5.9p1
202,361
352,978
436,700
449,570
462,616
537,667
555,779
604,579
1,501,749
2,632,270
count
banner
Most common SSH Banners found
Data Visualization
Details
COLOR
Legibility
Functional purpose
Salience
Consistency
Color Blindness
COMPOSITION
Chart size/ orientation
Alignments
SSH Key Lengths
Most common key lengths found
Key length: count
1040: 641,719
1032: 186,070
4096: 13,845
1024: 5,068,711
2048: 3,740,593
512: 9,064
2056: 7,830
2064: 6,265
1016: 6,212
768: 4,755
Data Visualization
Tools
BALANCE
Automation
Programming Language
to create plots
Fine-tuning in Illustrator
(make it better for the audience)
Hand-editing process
Human error
Originality
Automated Analysis
Illustrator (or other tool) to
create visualization solution
Human error
Data Visualization
DOCUMENT EVERY STEP OF THE PROCESS
Calculations
Choices of visualisations
Choices of data points
REVIEW EVERYTHING
What could have been done differently?
What could be better?
TAKE CONSTRUCTIVE FEEDBACK
Even if it means starting over
A visualization can be used in the future
Data Visualization
INTERNET
SECURITY
EXPOSURE
2016
BinaryEdge.io
Be Ready. Be Safe. Be Secure.
ise.binaryedge.io
THE SCIENCE
BEHIND THE DATA
CREATED BY
BINARYEDGE

Mais conteúdo relacionado

Mais procurados

Infragard atlanta ulf mattsson - cloud security - regulations and data prot...
Infragard atlanta   ulf mattsson - cloud security - regulations and data prot...Infragard atlanta   ulf mattsson - cloud security - regulations and data prot...
Infragard atlanta ulf mattsson - cloud security - regulations and data prot...Ulf Mattsson
 
Emerging Data Privacy and Security for Cloud
Emerging Data Privacy and Security for CloudEmerging Data Privacy and Security for Cloud
Emerging Data Privacy and Security for CloudUlf Mattsson
 
F5 networks the_expectation_of_ssl_everywhere
F5 networks the_expectation_of_ssl_everywhereF5 networks the_expectation_of_ssl_everywhere
F5 networks the_expectation_of_ssl_everywhereF5 Networks
 
What I Learned at RSAC 2020
What I Learned at RSAC 2020What I Learned at RSAC 2020
What I Learned at RSAC 2020Ulf Mattsson
 
What i learned at gartner summit 2019
What i learned at gartner summit 2019What i learned at gartner summit 2019
What i learned at gartner summit 2019Ulf Mattsson
 
Next generation data protection and security for oracle users - gdpr blockc...
Next generation data protection and security for oracle users   - gdpr blockc...Next generation data protection and security for oracle users   - gdpr blockc...
Next generation data protection and security for oracle users - gdpr blockc...Ulf Mattsson
 
Institucional proofpoint
Institucional proofpointInstitucional proofpoint
Institucional proofpointvoliverio
 
State of the ATT&CK - ATT&CKcon Power Hour
State of the ATT&CK - ATT&CKcon Power HourState of the ATT&CK - ATT&CKcon Power Hour
State of the ATT&CK - ATT&CKcon Power HourAdam Pennington
 
Jun 15 privacy in the cloud at financial institutions at the object managemen...
Jun 15 privacy in the cloud at financial institutions at the object managemen...Jun 15 privacy in the cloud at financial institutions at the object managemen...
Jun 15 privacy in the cloud at financial institutions at the object managemen...Ulf Mattsson
 
Emerging application and data protection for multi cloud
Emerging application and data protection for multi cloudEmerging application and data protection for multi cloud
Emerging application and data protection for multi cloudUlf Mattsson
 
Securing data today and in the future - Oracle NYC
Securing data today and in the future - Oracle NYCSecuring data today and in the future - Oracle NYC
Securing data today and in the future - Oracle NYCUlf Mattsson
 
[EMC] Source Code Protection
[EMC] Source Code Protection[EMC] Source Code Protection
[EMC] Source Code ProtectionPerforce
 
What I learned from RSAC 2019
What I learned from RSAC 2019What I learned from RSAC 2019
What I learned from RSAC 2019Ulf Mattsson
 
Becoming a Yogi on Mac ATT&CK with OceanLotus Postures
Becoming a Yogi on Mac ATT&CKwith OceanLotus PosturesBecoming a Yogi on Mac ATT&CKwith OceanLotus Postures
Becoming a Yogi on Mac ATT&CK with OceanLotus PosturesAdam Pennington
 
The past, present, and future of big data security
The past, present, and future of big data securityThe past, present, and future of big data security
The past, present, and future of big data securityUlf Mattsson
 
Information Security Risk Management
Information Security Risk ManagementInformation Security Risk Management
Information Security Risk Managementipspat
 
Data centric security key to digital business success - ulf mattsson - bright...
Data centric security key to digital business success - ulf mattsson - bright...Data centric security key to digital business success - ulf mattsson - bright...
Data centric security key to digital business success - ulf mattsson - bright...Ulf Mattsson
 
ISSA Atlanta - Emerging application and data protection for multi cloud
ISSA Atlanta - Emerging application and data protection for multi cloudISSA Atlanta - Emerging application and data protection for multi cloud
ISSA Atlanta - Emerging application and data protection for multi cloudUlf Mattsson
 

Mais procurados (20)

Infragard atlanta ulf mattsson - cloud security - regulations and data prot...
Infragard atlanta   ulf mattsson - cloud security - regulations and data prot...Infragard atlanta   ulf mattsson - cloud security - regulations and data prot...
Infragard atlanta ulf mattsson - cloud security - regulations and data prot...
 
Hacking 05 2011
Hacking 05 2011Hacking 05 2011
Hacking 05 2011
 
Emerging Data Privacy and Security for Cloud
Emerging Data Privacy and Security for CloudEmerging Data Privacy and Security for Cloud
Emerging Data Privacy and Security for Cloud
 
F5 networks the_expectation_of_ssl_everywhere
F5 networks the_expectation_of_ssl_everywhereF5 networks the_expectation_of_ssl_everywhere
F5 networks the_expectation_of_ssl_everywhere
 
What I Learned at RSAC 2020
What I Learned at RSAC 2020What I Learned at RSAC 2020
What I Learned at RSAC 2020
 
What i learned at gartner summit 2019
What i learned at gartner summit 2019What i learned at gartner summit 2019
What i learned at gartner summit 2019
 
Next generation data protection and security for oracle users - gdpr blockc...
Next generation data protection and security for oracle users   - gdpr blockc...Next generation data protection and security for oracle users   - gdpr blockc...
Next generation data protection and security for oracle users - gdpr blockc...
 
Institucional proofpoint
Institucional proofpointInstitucional proofpoint
Institucional proofpoint
 
State of the ATT&CK - ATT&CKcon Power Hour
State of the ATT&CK - ATT&CKcon Power HourState of the ATT&CK - ATT&CKcon Power Hour
State of the ATT&CK - ATT&CKcon Power Hour
 
Jun 15 privacy in the cloud at financial institutions at the object managemen...
Jun 15 privacy in the cloud at financial institutions at the object managemen...Jun 15 privacy in the cloud at financial institutions at the object managemen...
Jun 15 privacy in the cloud at financial institutions at the object managemen...
 
Emerging application and data protection for multi cloud
Emerging application and data protection for multi cloudEmerging application and data protection for multi cloud
Emerging application and data protection for multi cloud
 
Securing data today and in the future - Oracle NYC
Securing data today and in the future - Oracle NYCSecuring data today and in the future - Oracle NYC
Securing data today and in the future - Oracle NYC
 
[EMC] Source Code Protection
[EMC] Source Code Protection[EMC] Source Code Protection
[EMC] Source Code Protection
 
What I learned from RSAC 2019
What I learned from RSAC 2019What I learned from RSAC 2019
What I learned from RSAC 2019
 
Becoming a Yogi on Mac ATT&CK with OceanLotus Postures
Becoming a Yogi on Mac ATT&CKwith OceanLotus PosturesBecoming a Yogi on Mac ATT&CKwith OceanLotus Postures
Becoming a Yogi on Mac ATT&CK with OceanLotus Postures
 
The past, present, and future of big data security
The past, present, and future of big data securityThe past, present, and future of big data security
The past, present, and future of big data security
 
Information Security Risk Management
Information Security Risk ManagementInformation Security Risk Management
Information Security Risk Management
 
Data centric security key to digital business success - ulf mattsson - bright...
Data centric security key to digital business success - ulf mattsson - bright...Data centric security key to digital business success - ulf mattsson - bright...
Data centric security key to digital business success - ulf mattsson - bright...
 
ISSA Atlanta - Emerging application and data protection for multi cloud
ISSA Atlanta - Emerging application and data protection for multi cloudISSA Atlanta - Emerging application and data protection for multi cloud
ISSA Atlanta - Emerging application and data protection for multi cloud
 
BO2K Byline
BO2K BylineBO2K Byline
BO2K Byline
 

Semelhante a I FOR ONE WELCOME OUR NEW CYBER OVERLORDS! AN INTRODUCTION TO THE USE OF MACHINE LEARNING IN CYBERSECURITY

From Info Science to Data Science & Smart Nation
From Info Science to Data Science & Smart Nation From Info Science to Data Science & Smart Nation
From Info Science to Data Science & Smart Nation CK Toh
 
RDBMS to Graph Webinar
RDBMS to Graph WebinarRDBMS to Graph Webinar
RDBMS to Graph WebinarNeo4j
 
The Other AI: How Semantic Reasoning Automates Security Analysis
The Other AI: How Semantic Reasoning Automates Security AnalysisThe Other AI: How Semantic Reasoning Automates Security Analysis
The Other AI: How Semantic Reasoning Automates Security AnalysisAnton Goncharov
 
Tour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligenceTour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligenceAlex Danvy
 
Internet of Things (IoT) - in the cloud or rather on-premises?
Internet of Things (IoT) - in the cloud or rather on-premises?Internet of Things (IoT) - in the cloud or rather on-premises?
Internet of Things (IoT) - in the cloud or rather on-premises?Guido Schmutz
 
Alessandro Ferrari - Smart City, Mixed Reality, Self-Driving Cars & Neural Co...
Alessandro Ferrari - Smart City, Mixed Reality, Self-Driving Cars & Neural Co...Alessandro Ferrari - Smart City, Mixed Reality, Self-Driving Cars & Neural Co...
Alessandro Ferrari - Smart City, Mixed Reality, Self-Driving Cars & Neural Co...Codemotion
 
Big Data LDN 2017: Big Impact with Big Data
Big Data LDN 2017: Big Impact with Big DataBig Data LDN 2017: Big Impact with Big Data
Big Data LDN 2017: Big Impact with Big DataMatt Stubbs
 
Xuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent ApplicationsXuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent ApplicationsMachine Learning Prague
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsClusterpoint
 
Graph Database Use Cases - StampedeCon 2015
Graph Database Use Cases - StampedeCon 2015Graph Database Use Cases - StampedeCon 2015
Graph Database Use Cases - StampedeCon 2015StampedeCon
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use CasesMax De Marzi
 
Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud ...
Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud ...Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud ...
Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud ...Priyanka Aash
 
Connected devices microsoft
Connected devices microsoftConnected devices microsoft
Connected devices microsoftArif Shafique
 
Luiz eduardo. introduction to mobile snitch
Luiz eduardo. introduction to mobile snitchLuiz eduardo. introduction to mobile snitch
Luiz eduardo. introduction to mobile snitchYury Chemerkin
 
RSA2015: Securing the Internet of Things
RSA2015: Securing the Internet of ThingsRSA2015: Securing the Internet of Things
RSA2015: Securing the Internet of ThingsDaniel Miessler
 
Brief Intro to Data Visualisation
Brief Intro to Data VisualisationBrief Intro to Data Visualisation
Brief Intro to Data VisualisationRi Liu
 
Alitora Innovation Networks
Alitora Innovation NetworksAlitora Innovation Networks
Alitora Innovation Networksalitora
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesNeo4j
 

Semelhante a I FOR ONE WELCOME OUR NEW CYBER OVERLORDS! AN INTRODUCTION TO THE USE OF MACHINE LEARNING IN CYBERSECURITY (20)

From Info Science to Data Science & Smart Nation
From Info Science to Data Science & Smart Nation From Info Science to Data Science & Smart Nation
From Info Science to Data Science & Smart Nation
 
RDBMS to Graph Webinar
RDBMS to Graph WebinarRDBMS to Graph Webinar
RDBMS to Graph Webinar
 
The Other AI: How Semantic Reasoning Automates Security Analysis
The Other AI: How Semantic Reasoning Automates Security AnalysisThe Other AI: How Semantic Reasoning Automates Security Analysis
The Other AI: How Semantic Reasoning Automates Security Analysis
 
AI pitch SSideri
 AI pitch SSideri  AI pitch SSideri
AI pitch SSideri
 
Tour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligenceTour de France Azure PaaS 6/7 Ajouter de l'intelligence
Tour de France Azure PaaS 6/7 Ajouter de l'intelligence
 
Internet of Things (IoT) - in the cloud or rather on-premises?
Internet of Things (IoT) - in the cloud or rather on-premises?Internet of Things (IoT) - in the cloud or rather on-premises?
Internet of Things (IoT) - in the cloud or rather on-premises?
 
Alessandro Ferrari - Smart City, Mixed Reality, Self-Driving Cars & Neural Co...
Alessandro Ferrari - Smart City, Mixed Reality, Self-Driving Cars & Neural Co...Alessandro Ferrari - Smart City, Mixed Reality, Self-Driving Cars & Neural Co...
Alessandro Ferrari - Smart City, Mixed Reality, Self-Driving Cars & Neural Co...
 
Big Data LDN 2017: Big Impact with Big Data
Big Data LDN 2017: Big Impact with Big DataBig Data LDN 2017: Big Impact with Big Data
Big Data LDN 2017: Big Impact with Big Data
 
Xuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent ApplicationsXuedong Huang - Deep Learning and Intelligent Applications
Xuedong Huang - Deep Learning and Intelligent Applications
 
High-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutionsHigh-performance database technology for rock-solid IoT solutions
High-performance database technology for rock-solid IoT solutions
 
Graph Database Use Cases - StampedeCon 2015
Graph Database Use Cases - StampedeCon 2015Graph Database Use Cases - StampedeCon 2015
Graph Database Use Cases - StampedeCon 2015
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
 
Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud ...
Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud ...Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud ...
Transfer Learning: Repurposing ML Algorithms from Different Domains to Cloud ...
 
Connected devices microsoft
Connected devices microsoftConnected devices microsoft
Connected devices microsoft
 
Luiz eduardo. introduction to mobile snitch
Luiz eduardo. introduction to mobile snitchLuiz eduardo. introduction to mobile snitch
Luiz eduardo. introduction to mobile snitch
 
RSA2015: Securing the Internet of Things
RSA2015: Securing the Internet of ThingsRSA2015: Securing the Internet of Things
RSA2015: Securing the Internet of Things
 
Tfm slides
Tfm slidesTfm slides
Tfm slides
 
Brief Intro to Data Visualisation
Brief Intro to Data VisualisationBrief Intro to Data Visualisation
Brief Intro to Data Visualisation
 
Alitora Innovation Networks
Alitora Innovation NetworksAlitora Innovation Networks
Alitora Innovation Networks
 
Intro to Neo4j and Graph Databases
Intro to Neo4j and Graph DatabasesIntro to Neo4j and Graph Databases
Intro to Neo4j and Graph Databases
 

Mais de Tiago Henriques

BSides Lisbon 2023 - AI in Cybersecurity.pdf
BSides Lisbon 2023 - AI in Cybersecurity.pdfBSides Lisbon 2023 - AI in Cybersecurity.pdf
BSides Lisbon 2023 - AI in Cybersecurity.pdfTiago Henriques
 
Codebits 2014 - Secure Coding - Gamification and automation for the win
Codebits 2014 - Secure Coding - Gamification and automation for the winCodebits 2014 - Secure Coding - Gamification and automation for the win
Codebits 2014 - Secure Coding - Gamification and automation for the winTiago Henriques
 
Presentation Brucon - Anubisnetworks and PTCoresec
Presentation Brucon - Anubisnetworks and PTCoresecPresentation Brucon - Anubisnetworks and PTCoresec
Presentation Brucon - Anubisnetworks and PTCoresecTiago Henriques
 
Confraria 28-feb-2013 mesa redonda
Confraria 28-feb-2013 mesa redondaConfraria 28-feb-2013 mesa redonda
Confraria 28-feb-2013 mesa redondaTiago Henriques
 
How to dominate a country
How to dominate a countryHow to dominate a country
How to dominate a countryTiago Henriques
 
Country domination - Causing chaos and wrecking havoc
Country domination - Causing chaos and wrecking havocCountry domination - Causing chaos and wrecking havoc
Country domination - Causing chaos and wrecking havocTiago Henriques
 
(Mis)trusting and (ab)using ssh
(Mis)trusting and (ab)using ssh(Mis)trusting and (ab)using ssh
(Mis)trusting and (ab)using sshTiago Henriques
 
Secure coding - Balgan - Tiago Henriques
Secure coding - Balgan - Tiago HenriquesSecure coding - Balgan - Tiago Henriques
Secure coding - Balgan - Tiago HenriquesTiago Henriques
 
Vulnerability, exploit to metasploit
Vulnerability, exploit to metasploitVulnerability, exploit to metasploit
Vulnerability, exploit to metasploitTiago Henriques
 
Practical exploitation and social engineering
Practical exploitation and social engineeringPractical exploitation and social engineering
Practical exploitation and social engineeringTiago Henriques
 

I FOR ONE WELCOME OUR NEW CYBER OVERLORDS! AN INTRODUCTION TO THE USE OF MACHINE LEARNING IN CYBERSECURITY

  • 1. I, for one, welcome our new Cyber Overlords! An introduction to the use of data science in cybersecurity. By Tiago Henriques, Filipa Rodrigues, Florentino Bexiga, Ana Barbosa
  • 2. Agenda: WHO ARE WE?; MACHINE LEARNING AND CYBERSECURITY; IMAGE WORKFLOW; IMAGE ANALYSIS IN DETAIL; DATA VISUALISATION
  • 3. Tiago Henriques, Presenter. Tiago is the CEO and Data Necromancer at BinaryEdge; however, he gets to meddle in the intersection of data science and cybersecurity by providing his team with lovely problems that they solve on a daily basis.
  • 4. Florentino Bexiga, Presenter. Florentino is the Data MacGyver at BinaryEdge. On a daily basis he needs to deploy infrastructure used to analyse big and realtime data. When not doing that, he can be found creating models to analyse data. Give him an orange, he’ll give you a skynet. Why an orange, you ask? He’s hungry and likes oranges, there!
  • 5. Filipa Rodrigues, Presenter. Filipa is the Data Diva at BinaryEdge; she dances the macarena with numbers to get them to tell her all their dirty secrets.
  • 6. Ana Barbosa, Presenter. Ana is the Data Ferret at BinaryEdge. She is small and hides between the 110th and 111th characters of the ASCII code to see and show data from the unique perspective of someone who can’t reach the box of cookies stored on top of the capital 'I'.
  • 9. How we got here.... 200 port scans of the entire internet / month; 1,400,000,000 scanning events / month (at a minimum); 746,000 torrents monitored and increasing; 1,362,225,600 torrent events / month
  • 10. Worldwide distribution of IPs running services. Legend, number of IPs found: <= 100; 100 < #found <= 1,000; 1,000 < #found <= 10,000; 10,000 < #found <= 100,000; 100,000 < #found < 1,000,000; >= 1,000,000
  • 11. Map IPv4 addresses to Hilbert curves. Colour scale: % of coverage, from 0% to 100% in steps of 10%
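One way to read slide 11: each IPv4 address (or address block) is treated as a distance along a Hilbert curve and converted to an (x, y) cell, so that numerically adjacent networks stay adjacent on the map. The sketch below uses the standard distance-to-coordinate conversion. The function names and the choice of one cell per /24 block are assumptions made for illustration; the slides do not say which resolution was used.

```python
import ipaddress


def d2xy(order, d):
    """Convert distance d along a Hilbert curve of the given order
    (grid side 2**order) into (x, y) grid coordinates."""
    x = y = 0
    t = d
    s = 1
    side = 1 << order
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate/flip the quadrant so the sub-curves join up.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def ip_to_xy(ip, prefix_bits=24):
    """Map an IPv4 address to a cell, one cell per /prefix_bits block.

    prefix_bits=24 (an assumed default) gives a 4096 x 4096 grid.
    """
    if prefix_bits % 2 or not 2 <= prefix_bits <= 32:
        raise ValueError("prefix_bits must be an even number between 2 and 32")
    block = int(ipaddress.IPv4Address(ip)) >> (32 - prefix_bits)
    return d2xy(prefix_bits // 2, block)


if __name__ == "__main__":
    for address in ("8.8.8.8", "8.8.9.1", "192.0.2.1"):
        print(address, ip_to_xy(address))
```

Neighbouring /24 blocks always land in neighbouring cells, which is the property that makes this layout useful for coverage maps.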
  • 12. Data Science & Machine Learning. MULTIPLE WILD QUESTIONS APPEAR... How many IP addresses did job X have vs. job Y? What is the average duration of the scans? Can we extract more from all the screenshots we get? Can we have a more optimized job distribution? We can only identify X% of services because we’re using static signatures; can we do better? Can we find similar images? ...ONE COMMON ANSWER: DATA SCIENCE & MACHINE LEARNING
  • 13. Data Science & Machine Learning. DATA SCIENCE: INITIAL ANALYSIS AND CLEAN UP; EXPLORATORY DATA ANALYSIS; DATA VISUALISATION; KNOWLEDGE DISCOVERY. MACHINE LEARNING: CLASSIFICATION; CLUSTERING; SIMILARITY MATCHING; REGRESSION; IDENTIFICATION
  • 14. Problems and Limitations of Machine Learning in CyberSecurity. Lots of adversarial scenarios: attacks on the classifiers go against the foundation of machine learning. Prediction: scenarios and data are too volatile, and there are not enough proper sources of data. Lack of data, in quantity and quality, to train models.
  • 15. Good use cases. PATTERN DETECTION/OUTLIER DETECTION (IDS/IPS): if a computer is hacked, certain behaviors will change; if data is constantly monitored and fed into a system, the hack could be detected. ANTIVIRUS: further work needs to be done, but this will allow antivirus to move from a static, signature-based system to a much improved dynamic, learning-based system. ANTI-SPAM. SMARTER FUZZERS. SOURCE CODE ANALYSIS: detection of vulnerable patterns during development. INTERNAL ATTACKERS: sentiment analysis applied to emails, tweets and social networks of employees.
  • 16. metadata files people photos family&friends behaviour social search company registration ip address url address news forums sub-reddits internal external phone email linked urls likes topics BGP AS whois AS membership AS peer list of IPs shared infrastructure co-hosted sites contact geolocation office locations social networks phone portscan dns torrents binaryedge.io2016 domains AXFR MX records screenshots web services http https webserver framework headers cookies certificate configuration authorities entities SMB VNC RDP users appsfiles peers torrent name OCR SW banners image classifier vulnerabilities data points
  • 22. DEMO
  • 25. metadata files people photos family&friends behaviour social search company registration ip address url address news forums sub-reddits internal external phone email linked urls likes topics BGP AS whois AS membership AS peer list of IPs shared infrastructure co-hosted sites contact geolocation office locations social networks phone portscan dns torrents binaryedge.io2016 domains AXFR MX records screenshots web services http https webserver framework headers cookies certificate configuration authorities entities SMB VNC RDP users appsfiles peers torrent name OCR SW banners image classifier vulnerabilities data points
  • 26. MICROSERVICES (REST API): PORT, WORD, TAG, FACE, COUNTRY, LOGO, IP
  • 27. Scan. SCAN GENERATES EVENTS. DOES IT GENERATE A SCREENSHOT? YES: STORE THE IMAGE FILE ON THE CLOUD; GENERATE A NOTIFICATION THAT NEW IMAGE WAS UPLOADED; FINISH. NO: FINISH.
  • 28. Image Workflow: INITIALIZER, FILTER, LOGO DETECTION, FACE DETECTION, OPTICAL CHARACTER RECOGNITION (OCR)
  • 29. Image Workflow. PULL MESSAGE FROM QUEUE. IS THERE A NEW IMAGE? YES: DECRYPT AND STORE IMAGE METADATA ON A DATABASE; GENERATE IMAGE SIGNATURE FOR SIMILARITY COMPARISON; MESSAGE QUEUE. NO: FINISH.
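The slides do not say how the image signature for similarity comparison is computed. As one common possibility, the sketch below uses an average hash: the image is reduced to an 8 x 8 grid of mean brightness values, each cell becomes one bit depending on whether it is brighter than the overall mean, and two signatures are compared by Hamming distance. The function names, the 8 x 8 size and the plain list-of-rows grayscale input are all assumptions made to keep the example self-contained.

```python
def average_hash(pixels, hash_size=8):
    """Return an integer signature for a grayscale image.

    pixels is a list of rows, each a list of brightness values (0-255).
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    for cy in range(hash_size):
        y0 = cy * height // hash_size
        y1 = max(y0 + 1, (cy + 1) * height // hash_size)
        for cx in range(hash_size):
            x0 = cx * width // hash_size
            x1 = max(x0 + 1, (cx + 1) * width // hash_size)
            # Mean brightness of the block of pixels covered by this cell.
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    gradient = [[x * 16 for x in range(16)] for _ in range(16)]
    brighter = [[value + 3 for value in row] for row in gradient]
    inverted = [[255 - value for value in row] for row in gradient]
    print(hamming_distance(average_hash(gradient), average_hash(brighter)))
    print(hamming_distance(average_hash(gradient), average_hash(inverted)))
```

A small distance suggests near-duplicate screenshots; what counts as "small" would have to be tuned on real data.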
  • 30. Image Workflow. PULL MESSAGE FROM QUEUE. PERFORM SIMPLE ENTROPY FILTERING. DOES THE IMAGE HAVE ANY INFORMATION? YES: MESSAGE QUEUE. NO: FINISH.
  • 31. Image Workflow. PULL MESSAGE FROM QUEUE; ENHANCE IMAGE WITH APPLICATION OF SOME FILTERS; RUN FACE AND LOGO DETECTION AND OCR ALGORITHMS; STORE RESULTS IN DATABASE; PERFORM ADDITIONAL ACTIONS WITH THE RESULTS.
  • 32. Image Workflow. Email DataLeak API: [{"BreachDate": "2013-10-04", "DataClasses": ["Email addresses", "Password hints", "Passwords", "Usernames"], "Title": "Adobe", "IsActive": true, "Description": "In October 2013, 153 million Adobe accounts were breached with each containing an internal ID, username, email, <em>encrypted</em> password and a password hint in plain text. The password cryptography was poorly done and <a href=\"http://stricture-group.com/files/adobe-top100.txt\" target=\"_blank\">many were quickly resolved back to plain text</a>. The unencrypted hints also <a href=\"http://www.troyhunt.com/2013/11/adobe-credentials-and-serious.html\" target=\"_blank\">disclosed much about the passwords</a> adding further to the risk that hundreds of millions of Adobe customers already faced.", "Domain": "adobe.com", "AddedDate": "2013-12-04T00:00:00Z", "PwnCount": 152445165, "IsRetired": false, "IsVerified": true, "LogoType": "svg", "IsSensitive": false, "Name": "Adobe"}]
  • 33. Image Workflow: INITIALIZER, FILTER, LOGO DETECTION, FACE DETECTION, OPTICAL CHARACTER RECOGNITION (OCR)
  • 34. Filter: Shannon’s Entropy. Three example images: Entropy = 0.00 bits; Entropy ~ 0.03 bits; Entropy ~ 2.13 bits
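The entropy filter on slides 30 and 34 can be sketched as the Shannon entropy of the pixel-value distribution, H = -sum(p_i * log2(p_i)): a blank, single-colour screenshot scores 0 bits and carries no information worth passing on to the detection stages. The threshold and the function names below are assumptions; the slides do not state which cut-off is used.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of a flat sequence of pixel values."""
    total = len(values)
    counts = Counter(values)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def has_information(values, threshold=0.1):
    """Keep an image only if its entropy exceeds the threshold.

    The default threshold is a placeholder, not a value from the talk.
    """
    return shannon_entropy(values) > threshold


if __name__ == "__main__":
    blank = [0] * 10_000                     # single-colour screenshot
    almost_blank = [0] * 9_999 + [255]       # one stray pixel
    two_tone = [0, 255] * 5_000              # half black, half white
    for name, image in (("blank", blank),
                        ("almost blank", almost_blank),
                        ("two tone", two_tone)):
        print(name, round(shannon_entropy(image), 4), has_information(image))
```

In practice the pixel values would come from a decoded screenshot (for example, its grayscale histogram) rather than from hand-built lists.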
  • 35. Data Visualization: “a multidisciplinary recipe of art, science, math, technology, and many other interesting ingredients.” Andy Kirk, “Data Visualization: a successful design process”. Stages: EXPLORATION, REPRESENTATION, DETAILS, TOOLS, FINISHING UP
  • 36. Data Visualization: Exploration. DATA TYPE: Hierarchical, Relational, Temporal, Spatial, Categorical. RELEVANCE: What is the most interesting? What is most important? Audience’s profile. What is the most relevant information in the context? FILTER: Show all values or just a few? Define periods? Define a threshold?
  • 37. Data Visualization: Representation. Experimentation is important: conceive ideas; storyboarding; do multiple iterations; prototype; test; design can be used in the future. Chart: How many open ports does an IP have? Number of IPs with X open ports (axes: port, number of IPs). Values as shown: 69,543,915; 25,436,974; 7,008,108; 3,475,472; 1,287,446; 1,043,331; 951,629; 854,817; 789,515; 759,115; 490,290; 288,885; 266,827; 257,105; 219,025; 198,898; 186,286; 141,474
  • 38. Data Visualization: Representation. MARKS: Points, Areas, Lines. ATTRIBUTES: Position, Connections/Patterns, Size/Color. REPRESENT RECORDS. EMPHASIZE THE MOST IMPORTANT ASPECTS OF THE DATA. Chart: Distribution of IP addresses running encrypted and unencrypted services. HTTP (on port 80): 51,467,779 IPs running HTTP services. HTTPS (on port 443): 28,671,263 IPs running HTTPS services. HTTP & HTTPS: 16,519,503 IPs running both HTTP and HTTPS services.
  • 39. Data Visualization: Representation. PRECISION IN DESIGN: Geometric Calculations; Truncated axis; Scales. MAKE IT UNDERSTANDABLE: Reference lines; Markers. MAKE IT APPEALING: Minimise the clutter; Priority: preserve function. Chart: Top 10 Web Servers for the Web, most common web servers found on port 80 (axis: 0 to 12 millions). Servers as listed: Apache httpd; AkamaiGHost; Microsoft IIS httpd; nginx; lighttpd; Huawei HG532e ADSL modem http admin; Microsoft HTTPAPI httpd; Technicolor DSL modem http admin; Mbedthis-Appweb; micro_httpd. Counts as listed: 11,493,552; 8,361,080; 4,843,769; 3,860,883; 2,031,741; 1,539,629; 952,300; 699,202; 694,393; 678,657
  • 40. Data Visualization: Representation. Consider different design solutions. DATA TYPE CONDITION: Hierarchical, Relational, Temporal, Spatial, Categorical. Chart: CVSS SCORES (CVSS: Common Vulnerability Scoring System). SEVERITY scale: LOW from 0.0 to 4.0, MEDIUM from 4.0 to 7.0, HIGH from 7.0 to 10.0
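The severity scale on slide 40 amounts to a small bucketing function. The slide only shows the break points 0.0, 4.0, 7.0 and 10.0; how a score that falls exactly on a break point is classified is an assumption here, following the CVSS v2 qualitative ratings (Low 0.0 to 3.9, Medium 4.0 to 6.9, High 7.0 to 10.0). The function name is hypothetical.

```python
def cvss_severity(score):
    """Bucket a CVSS base score into LOW / MEDIUM / HIGH.

    Boundary handling (4.0 is MEDIUM, 7.0 is HIGH) is an assumption based
    on the CVSS v2 qualitative ratings, not something stated on the slide.
    """
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS scores range from 0.0 to 10.0")
    if score < 4.0:
        return "LOW"
    if score < 7.0:
        return "MEDIUM"
    return "HIGH"


if __name__ == "__main__":
    for score in (2.1, 4.0, 6.8, 7.5, 10.0):
        print(score, cvss_severity(score))
```

Bucketing first and plotting second keeps the colour legend of a severity chart down to three categories.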
  • 41. Data Visualization: Representation. Consider different design solutions. DATA TYPE CONDITION: Hierarchical, Relational, Temporal, Spatial, Categorical. Chart: CVE (Common Vulnerabilities and Exposures): CVE Identifier Number; References; Description
  • 42. Data Visualization: Representation. Consider different design solutions. DATA TYPE CONDITION: Hierarchical, Relational, Temporal, Spatial, Categorical. Chart: Email Protocols, overview of protocols used for email, according to encryption used (ENCRYPTED vs. UNENCRYPTED, by SERVICE and COUNT). Services: POP3, POP3S, SMTP, SMTPS, IMAP, IMAPS. Values as shown: 4,572,161; 3,742,289; 3,531,071; 2,971,159; 4,131,737; 3,703,364; 10,416,812; 12,234,969
  • 43. Data Visualization: Representation. Consider different design solutions. DATA TYPE CONDITION: Hierarchical, Relational, Temporal, Spatial, Categorical. Chart: Big Data Technologies, changes in amount of data exposed without security (MongoDB, Memcached, Redis; Aug 2015, Jan 2016, July 2016). Values as shown: 2 TB; 644.3 TB; 724.7 TB; 627.7 TB; 13.2 TB; 11.3 TB; 710.9 TB; 12.0 TB; 598.7 TB; 27.5 TB; 1.5 TB; 1.8 TB; 619.8 TB
  • 44. Data Visualization: Representation. Consider different design solutions. DATA TYPE CONDITION: Hierarchical, Relational, Temporal, Spatial, Categorical. Chart: Heartbleed, countries with higher number of IPs vulnerable to Heartbleed: Russia 5,264; Republic of Korea 4,564; China 6,790; United States 23,649; Italy 2,508; Germany 6,382; France 5,622; Netherlands 2,779; United Kingdom 3,459; Japan 2,484
  • 45. Data Visualization. VNC wordcloud: login, windows, edition, 2016, delete, ctrl, server, press, microsoft, system, welcome, your, help, file, linux, google, kernel, from, ubuntu
  • 46. Data Visualization: Details. ANNOTATION: Titles and subtitles; Labels; Legends. TYPOGRAPHY: Use fonts that are easy to read; Don’t use fonts that are considered sloppy. Chart: SSH Banners, most common SSH banners found (axes: banner, count). Banners as listed: SSH-2.0-OpenSSH_5.3; SSH-2.0-OpenSSH_6.6.1p1; SSH-2.0-OpenSSH_6.6.1; SSH-2.0-OpenSSH_4.3; SSH-2.0-OpenSSH_6.0p1; SSH-2.0-OpenSSH_6.7p1; SSH-2.0-dropbear_2014.63; SSH-2.0-OpenSSH_5.5p1; SSH-2.0-ROSSSH; SSH-2.0-OpenSSH_5.9p1. Counts as listed: 202,361; 352,978; 436,700; 449,570; 462,616; 537,667; 555,779; 604,579; 1,501,749; 2,632,270
  • 47. Data Visualization: Details. ANNOTATION: Titles and subtitles; Labels; Legends. TYPOGRAPHY: Use fonts that are easy to read; Don’t use fonts that are considered sloppy. Chart (same data as the previous slide). Banners as listed: SSH-2.0-OpenSSH_5.3; SSH-2.0-OpenSSH_6.6.1p1; SSH-2.0-OpenSSH_6.6.1; SSH-2.0-OpenSSH_4.3; SSH-2.0-OpenSSH_6.0p1; SSH-2.0-OpenSSH_6.7p1; SSH-2.0-dropbear_2014.63; SSH-2.0-OpenSSH_5.5p1; SSH-2.0-ROSSSH; SSH-2.0-OpenSSH_5.9p1. Counts as listed: 202,361; 352,978; 436,700; 449,570; 462,616; 537,667; 555,779; 604,579; 1,501,749; 2,632,270
  • 48. Data Visualization: Details. COLOR: Legibility; Functional purpose; Salience; Consistency; Color Blindness. COMPOSITION: Chart size/orientation; Alignments. Chart: SSH Key Lengths, most common key lengths found (key length: count): 1040: 641,719; 1032: 186,070; 4096: 13,845; 1024: 5,068,711; 2048: 3,740,593; 512: 9,064; 2056: 7,830; 2064: 6,265; 1016: 6,212; 768: 4,755
  • 49. Data Visualization: Tools. BALANCE. Automation: programming language to create plots; fine tuning in Illustrator (make it better for the audience). Hand-editing process: human error; originality. Automated analysis: Illustrator (or other tool) to create visualization solution; human error.
  • 50. Data Visualization: Finishing up. DOCUMENT EVERY STEP OF THE PROCESS: Calculations; Choices of visualisations; Choices of data points. REVIEW EVERYTHING: What could have been done differently? What could be better? TAKE CONSTRUCTIVE FEEDBACK: Even if it means starting over. A visualization can be used in the future.
  • 52. THE SCIENCE BEHIND THE DATA CREATED BY BINARYEDGE