
BinaryEdge - Security Data Metrics and Measurements at Scale - BSidesLisbon 2015


In this presentation we talk about the research we are doing combining Machine learning techniques with Cybersecurity data.

Published in: Data & Analytics

  1. BE ready. BE safe. BE secure. BSides Lisbon 2015: Security data metrics and measurements at scale
  2. binaryedge.io: Who
     TIAGO HENRIQUES, CEO and Founder @ BinaryEdge
     • BSc Software Engineering / University of Brighton
     • MSc Computer Security and Forensics / University of Bedfordshire
     • 8 years' experience in information security consultancy, leadership and research
     TIAGO MARTINS, CTO and Co-Founder @ BinaryEdge
     • BSc and MSc Computer Science / University of Lisbon
     • 7 years' experience developing real-time systems and high-volume data processing
     ROBERTO BARBOSA, COO and Head of Data Science @ BinaryEdge
     • More than 20 years in the IT sector
     • Ex-engineer at Sun Microsystems
     • Former Philip Morris corporate auditor
     • Expert in high scalability and availability in the finance sector (UBS, Citigroup and Leonteq) and a mobile startup
  3. Where (Europe)
     • Switzerland: Headquarters, Market, Management
     • Portugal: Workforce
     • United Kingdom: Market, Workforce
  4. What: Machine Learning, Data Mining, security data
  5. NEW COMMODITY: DATA IS THE NEW OIL
  6. NEW CURRENCY
  7. DATA BUSINESS MODEL
     • collect, supply: Gather and sell raw data
     • store, host: Hold onto someone else's data for them
     • filter, refine: Strip out problematic records or data fields, or release interesting data subsets
     • enhance, enrich: Blend in other datasets to create a new and interesting picture
     • simplify, access: Help people cherry-pick the data they want in the format they prefer
     • consult, advise: Provide guidance on others' data efforts
  8. ORGANISATION
  9. BECOMING EXPONENTIAL ORGANIZATION
     • MASSIVE TRANSFORMATIVE PURPOSE (MTP)
     • IDEAS (LEFT BRAIN: order, control, stability): Interfaces, Dashboards, Experimentation, Autonomy, Social
     • SCALE (RIGHT BRAIN: creativity, growth, uncertainty): Staff on Demand, Community and Crowd, Algorithms, Lease Assets, Engagement
     • MARKET
  10. ORGANISATION RELATIONSHIP: Business Operations Product Security DEV data science UI sysops data agent UX backend product owner frontend mobile information retrieval Internet of Things Internet of Money Internet of People Internet of Content machine learning math/stats design Tester QA quality support security experts security experts audits devops marketing human resources finance sales sales rep sales rep assistant assistant social media consulting
  11. DATA ARCHITECTURE DESIGN
      Goals | Requirements | Results
      • Easy to understand | understandability | Simple Architecture
      • Easy to extend | extensibility | Loosely Coupled Services
      • Easy to change | changeability | Built for replacement
      • Easy to replace | replaceability | Self-dependency
      • Easy to deploy | deployability | Immutability
      • Easy to scale | scalability | Responsibility Segregation
      • Easy to recover | resilience | Decoupling and Isolation
      • Easy to connect | uniform interface | API based
      • Easy to afford | cost efficient | On-demand computing
  12. PRODUCT IMPROVEMENTS 2015 (roadmap chart)
      • Axes: phase, importance (less, average, more, greater), effort, milestones
      • Timeline: July, August, September, October, November, December 2015
      • Phases: POC, data, process, analysis, intel (data, information, knowledge, intelligence)
      • Items: sensor agent (minions), backend, feed API, User API, data visualization, scalability, machine learning, threat classification, deep learning, storage & archiving, search & classification, data analytics, image processing, predictive analytics
  13. ENGINEERING
  14. Metrics collection at large scale
      • Very young startup
      • No legacy to maintain
      • Lots of experience in the team
      • Lots of technologies to pick from
      • Micro service based approach
      • Technologies? Architecture? Prototype? But where to start?
  15. Metrics collection at large scale
  16. Architecture
      • Focus on architecture: simple, resilient, scalable
      • Replaceable components
      • Technology independent
  17. architecture overview
  18. architecture - job request
      • API oriented: HTTP API, command line clients, third-party APIs
      • Modules: Python, NodeJS, Go
      • Job types: Data Collection, Data Processing / Analytics
  19. architecture - job execution
      • Agents listen for work in channels
      • Multiple types of agents
      • Agent technologies: Go, Python, NodeJS, Scala, Java
      • Job control: RabbitMQ, NSQ, Redis, Apollo
  20. architecture - job execution. Messaging, broker: ActiveMQ, NATS, Kafka, Kestrel, NSQ, RabbitMQ, Redis, QPID, HornetQ, Apollo (http://bravenewgeek.com/dissecting-message-queues/)
  21. architecture - job execution. Messaging, brokerless: zeromq, nanomsg (http://bravenewgeek.com/dissecting-message-queues/)
  22. architecture - job execution. Messaging, cloud: Amazon, Microsoft, Google, realtime.co, …
  23. architecture - data enrichment
      • Agents can feed other agents
      • Different types of enrichment: clean data, process data, alarms
  24. architecture - store
      • All information is stored: raw data and processed data
      • Geolocation of information
      • Encrypted data for each client
      • Data storage cloud services: Amazon S3, Amazon DynamoDB, Azure DocumentDB, Azure Storage, Google Cloud Storage, Google BigQuery, Rackspace Cloud Files, Constant Cloud Storage, Skylable, RunAbove
      • Database solutions: MongoDB, ElasticSearch, Cassandra, Riak, Lucene
  25. architecture - serving
      • Delivering data: realtime streaming, storage for analytics, API, raw
      • Data analytics: Kibana, InfluxDB, Druid
  26. architecture - serving
      • Data processing: Apache Spark, Hadoop, Amazon Kinesis
      • Data intelligence: Amazon Machine Learning/EMR, Google Prediction API, Azure Machine Learning
  27. agents / minions
      • Our agents are very simple: simple tasks, easy to maintain and adapt
      • Agents can be located/run anywhere: geo distribution, clouds, dedicated servers, Raspberry Pis in Tiago Henriques' dual gbit connection
  28. monitoring: New Relic, Logentries, Server Density, CloudWatch, Grafana, logstash
  29. deployment: Ansible, Puppet, Docker, Saltstack, etcd (saltstack.com)
  30. Machine Learning
  31. CHALLENGES IN DATA MINING
      • Modelling large scale networks
      • Discovery of threats
      • Network dynamics and cyberattacks
      • Privacy preservation in data mining
  32. ip address url address linked urls internal external Company registration email people phone social search photos family & friend behavior news forums sub-reddits topics likes metadata BGP co-hosted sites shared infrastructure AS membership AS Peer list of IPs AS whois contact geolocation phone social networks office locations portscan services web http https certificate configuration authorities entities web server framework headers cookies screenshots dns domains AXFR MX records banners image classifier threat SMB VNC RDP files files apps SW users OCR data points contingency irrelevant malware © 2015 binaryedge.io
  33. Machine Learning techniques
      • Artificial Neural Network (ANN)
      • Support Vector Machine (SVM)
      • Decision trees
      • Bayesian networks (BNs)
      • K-Nearest Neighbour (KNN)
      • Hidden Markov Model (HMM)
  34. Machine Learning - Why? Classification, detection, clustering, automation, correlation, prediction, analysis
  35. Measurements on our own data
      • Support: indicates which percentage of data on storage shows correlation
      • Confidence: indicates probability of our assumption being correct
  36. Improving our own data
      • Kalman Filter
      • AdaBoost (Adaptive Boosting)
      • SAMPLE 1, SAMPLE 1.2, SAMPLE 1.3, SAMPLE 1.4: DEEPER DATA POINT SUPPORT, MANUAL CLASSIFICATION
      • Weight 6/10: Portscan
      • Weight 3/10: Geolocation
      • Weight 5/10: OCR screenshot
      • Weight 9/10: Previous known correct data
  37. DATA chain: Collection, Data Processing, ML of Data, Report, Storage
  38. CYBER INNOVATION LOOP (diagram): observe, orient, decide, act
      • Inputs: SECURITY FEEDS, REAL WORLD DATA (RWD), MODELS
      • Observations feed forward to orientation (cultural identity, new information, previous experience, analysis & synthesis), then to decisions and action
      • Feedback and guide & control paths lead back to observation; action and observation both interact with the environment
  39. CYBER INNOVATION LOOP: data and knowledge
      • Cycle: facts, INFORMATION, HYPOTHESIS, directives
      • Classification knowledge transforms fact to information
      • Assessment knowledge transforms information to hypothesis
      • Resolution knowledge transforms hypothesis to directive
      • Enactment knowledge transforms directive to fact
  40. CYBERSECURITY DATA SCIENCE (diagram): agents; REAL WORLD DATA (RWD): telemetry, sensor data, contextual data, historical data; data engineering; features; MODELS: recommender, classifier, social, threat, fraud; REAL TIME PREDICTIONS AND DECISIONS; INTELLIGENCE; VALUE; custom dashboard
  41. DEMO
  42. DEMO
  43. DEMO
  44. DEMO
  45. contingency, irrelevant, threat, safe. BE ready. BE safe. BE secure.
      BINARYEDGE.IO
      Finsterrütistrasse 4, 8134 Adliswil, Zurich, Switzerland
      +41 78 632 32 90
      Email: th@binaryedge.io
      www.binaryedge.io
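
Slides 19 and 23 say that agents listen for work in channels and that agents can feed other agents. The deck does not show how this is wired up, so the following is only a minimal in-process sketch: Python's standard `queue.Queue` stands in for a real message broker such as RabbitMQ or NSQ, and the names `agent`, `collect`, `clean`, the job fields and the banner string are all hypothetical, not taken from BinaryEdge's code.

```python
import queue
import threading

STOP = object()  # sentinel telling an agent to shut down (illustrative convention)


def agent(inbox, outbox, work):
    """Listen on `inbox`, apply `work` to each job, publish results to `outbox`."""
    while True:
        job = inbox.get()
        if job is STOP:
            outbox.put(STOP)  # propagate shutdown to the next agent in the chain
            break
        outbox.put(work(job))


def collect(job):
    # Hypothetical data-collection step: pretend we grabbed a service banner.
    return {"target": job["target"], "banner": "  SSH-2.0-Example  "}


def clean(record):
    # Hypothetical enrichment step: normalise the raw banner.
    return {**record, "banner": record["banner"].strip()}


# Three channels: job requests, raw data, enriched data.
jobs, raw, enriched = queue.Queue(), queue.Queue(), queue.Queue()

collector = threading.Thread(target=agent, args=(jobs, raw, collect))
cleaner = threading.Thread(target=agent, args=(raw, enriched, clean))
collector.start()
cleaner.start()

# Submit one job (192.0.2.10 is a documentation-only address) and then stop.
jobs.put({"type": "data_collection", "target": "192.0.2.10"})
jobs.put(STOP)
collector.join()
cleaner.join()

# Drain the final channel.
results = []
while True:
    item = enriched.get()
    if item is STOP:
        break
    results.append(item)

print(results)
```

Because each agent only knows its inbox and outbox, an agent can be swapped or rewritten in another language without touching its neighbours, which is in the spirit of the "replaceable components" goal on slide 16.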
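
Slide 35 names two measurements, support and confidence, without giving formulas. Assuming they follow the usual association-rule definitions (an assumption, since the deck does not confirm it), a small sketch could look like this. The records and feature labels below are invented for illustration only.

```python
def support(records, items):
    """Fraction of records that contain every item in `items`."""
    if not records:
        return 0.0
    hits = sum(1 for record in records if items <= record)
    return hits / len(records)


def confidence(records, antecedent, consequent):
    """Among records containing `antecedent`, the fraction also containing `consequent`."""
    base = sum(1 for record in records if antecedent <= record)
    if base == 0:
        return 0.0
    both = sum(1 for record in records if (antecedent | consequent) <= record)
    return both / base


# Hypothetical observations: each record is the set of features seen on one host.
records = [
    {"service:vnc", "auth:none"},
    {"service:vnc"},
    {"service:ssh"},
    {"service:vnc", "auth:none"},
]

# Rule under test: "a host exposing VNC has no authentication".
rule_support = support(records, {"service:vnc", "auth:none"})
rule_confidence = confidence(records, {"service:vnc"}, {"auth:none"})

print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Here support is 2 of 4 records and confidence is 2 of the 3 VNC records, matching the slide's reading of support as the share of stored data showing the correlation and confidence as the probability that the assumption holds.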
