Anomaly Detection at Scale
1. ANOMALY DETECTION AT SCALE:
A CYBERSECURITY STREAMING DATA PIPELINE USING KAFKA AND AKKA CLUSTERING
O'Reilly Security Conference NYC, November 2, 2016
Jeff Henrikson
Groovescale
http://www.groovescale.com
4. Why build predictive models?
Models continue to do useful work after humans stop looking
Models are based on assumptions
Only humans can make assumptions
5. INTRUSION DETECTION
1) Log Data
2) Configure rules
3) Human awareness examines alarms and logs
4) Quick action taken (e.g. deauthorize)
5) Re-authorize once human awareness deems longer-term mitigation is adequate
Sometimes for high-confidence rules we allow 2) to trigger 4) without human intervention
7. HOW CAN A SKILLED PERSON'S AWARENESS BE MORE EFFECTIVELY GUIDED?
1) Matching of network behavior against localized rules
2) Predictive modeling of the aggregate network behavior
Hypothesis: Let's see if 2 is better.
8. "AI": Artificial Intelligence
"IA": Intelligence Augmented
From "Building practical AI systems", Adam Cheyer (Siri, Sentient, and Viv Labs), Strata 2016
9. INTRUSION DETECTION TOOLS AS "INTELLIGENCE AUGMENTED"
Intruders are trying to evade detection.
Let's not worry about making the human protector of the network go away. That is probably not possible, given that intruders respond evasively.
13. NETFLOW (V5) BASICS
Attributes:
Source/Destination IP
Source/Destination Port
Input interface
Metrics: Number of Packets, Sum of Bytes, Start Time, End Time.
IPv4 only
https://nsrc.org/workshops/2015/sanog25-nmm-tutorial/materials/netflow.pdf
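As a rough illustration of how the attributes and metrics above fit together, the following Python sketch groups packets into flows. It is a simplification under stated assumptions: the `Packet` type is hypothetical (not a real pcap library type), the key uses only the attributes listed on the slide, and flow expiry timeouts are ignored.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, NamedTuple


class FlowKey(NamedTuple):
    """Attributes that identify a flow (only those listed on the slide)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    input_iface: int


@dataclass
class FlowMetrics:
    """Metrics accumulated per flow."""
    packets: int = 0
    total_bytes: int = 0
    start_time: float = 0.0
    end_time: float = 0.0


class Packet(NamedTuple):
    """Hypothetical decoded packet header, for illustration only."""
    key: FlowKey
    size_bytes: int
    capture_time: float  # capture time from the data stream, not wall clock


def aggregate(packets: Iterable[Packet]) -> Dict[FlowKey, FlowMetrics]:
    """Fold packets into per-flow packet counts, byte sums, and time bounds."""
    flows: Dict[FlowKey, FlowMetrics] = {}
    for p in packets:
        m = flows.get(p.key)
        if m is None:
            m = FlowMetrics(start_time=p.capture_time, end_time=p.capture_time)
            flows[p.key] = m
        m.packets += 1
        m.total_bytes += p.size_bytes
        # Packets may arrive out of order, so track min and max explicitly.
        m.start_time = min(m.start_time, p.capture_time)
        m.end_time = max(m.end_time, p.capture_time)
    return flows
```

Calling `aggregate` on two packets that share a key yields one flow whose packet count is 2 and whose byte sum is the total of both packet sizes.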
14. Functional Requirements
Produce netflow from PCAP
Score netflow for anomalies
Control the number of anomalous events brought to the human expert's attention
19. EXTERNAL DESIGN
System coupling:
Do not prescribe deploying kafka upstream or downstream
(Which Kafka version? Which language binding?)
External APIs:
Ingress HTTP POST octet encoding
Egress HTTP GET Long Polling
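The long-polling egress can be pictured with a small in-memory Python sketch. It models only the blocking semantics a long-poll GET handler might have (wait until an event exists or a timeout passes, then return whatever is available); it does not involve HTTP, and the class and method names are hypothetical rather than taken from the talk.

```python
import queue
from typing import Any, List


class LongPollBuffer:
    """In-memory stand-in for the egress side of a long-polling endpoint."""

    def __init__(self) -> None:
        self._events: "queue.Queue[Any]" = queue.Queue()

    def publish(self, event: Any) -> None:
        """Called by the pipeline when an anomaly is ready for delivery."""
        self._events.put(event)

    def poll(self, timeout_s: float) -> List[Any]:
        """Block until at least one event is available or the timeout expires.

        Returns every event queued at that moment, or an empty list on
        timeout, after which the client would simply issue another request.
        """
        try:
            first = self._events.get(timeout=timeout_s)
        except queue.Empty:
            return []
        batch = [first]
        while True:
            try:
                batch.append(self._events.get_nowait())
            except queue.Empty:
                return batch
```

One appeal of this style of coupling is that the client needs nothing beyond an HTTP client and a retry loop, which fits the slide's goal of not prescribing Kafka on either side.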
21. INTERNAL DESIGN
Record state only in:
Kafka
Pcap temporary files on local fs
Need to write block id to EFH and dedupe for sums to be correct in the presence of retries
Prefer late delivery to dropping data
Prefer reading capture time in data stream to wall clock time
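The dedupe requirement above can be sketched in Python as an accumulator that ignores a block it has already counted. This is a minimal illustration, assuming each retried block arrives with the same block id; the in-memory set is an assumption made for brevity, whereas the slide says durable state lives only in Kafka and temporary pcap files.

```python
from typing import Hashable, Set


class DedupingSum:
    """Sum that stays correct when the same block is delivered more than once."""

    def __init__(self) -> None:
        self._seen: Set[Hashable] = set()  # block ids already counted
        self.total = 0

    def add_block(self, block_id: Hashable, value: int) -> bool:
        """Add value once per block id; return False for a duplicate delivery."""
        if block_id in self._seen:
            return False
        self._seen.add(block_id)
        self.total += value
        return True
```

With at-least-once delivery, a retry after a lost acknowledgement would otherwise count the same bytes twice; keying on the block id turns the retry into a no-op. A real deployment would also need to bound the set of remembered ids, which this sketch does not attempt.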
22. Akka-cluster in one slide:
Framework for Actor-based concurrency
Program in Scala or Java
Akka-cluster is more general than map reduce or data pipelines
Makes using local and remote resources work the same way
23. MINIMUM VIABLE PREDICTIVE MODEL
1) Take Netflow metrics: sum(bytes), sum(packets), count
2) For each metric, compute mean and variance
3) Emit an "anomaly" when signal exceeds (mean + 3.0*sqrt(variance))
Meets minimum requirement: controls the number of events brought to the human expert's attention
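The three steps above can be sketched in Python for a single metric. This is an illustration under assumptions, not the talk's implementation: it computes mean and variance incrementally with Welford's algorithm, uses population variance, scores each value against the statistics of the values seen before it, requires two prior observations before scoring, and folds anomalous values back into the statistics. The names `RunningStats` and `detect` are hypothetical.

```python
import math
from typing import Iterable, List


class RunningStats:
    """Incremental mean and population variance (Welford's algorithm)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        return self._m2 / self.n if self.n > 0 else 0.0


def detect(values: Iterable[float], k: float = 3.0) -> List[int]:
    """Return indices of values exceeding mean + k * sqrt(variance)."""
    stats = RunningStats()
    anomalies: List[int] = []
    for i, x in enumerate(values):
        # Assumption: skip scoring until two prior observations exist.
        if stats.n >= 2 and x > stats.mean + k * math.sqrt(stats.variance):
            anomalies.append(i)
        stats.update(x)  # assumption: anomalies still update the statistics
    return anomalies
```

Raising `k` lowers the number of events emitted, which is the knob that controls how many anomalies reach the human expert. One instance of `RunningStats` per metric (sum of bytes, sum of packets, count) would cover step 1.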
24. EXERCISE FOR THE READER
Model for periodicity:
Ihler et al, Adaptive Event Detection with Time–Varying Poisson Processes, ACM SIGKDD 2006
http://www.datalab.uci.edu/papers/event_detection_kdd06.pdf
26. RESULTS
Qualitatively, users can find relevant anomalies in a reasonably sized stream
System operates reliably
Numbers are correct within assumptions
28. SO WHY KAFKA VS ANY OTHER STREAMING COMPONENT?
https://databaseline.wordpress.com/2016/03/12/an-overview-of-apache-streaming-technologies/comment-page-1/
30. STREAMING DATA LITERATURE:
"A data entity is created by one module, is passed from module to module until it is no longer needed and is then destroyed. . . . Punched card accounting systems exemplify this environment."
J. P. Morrison, "Data Stream Linkage Mechanism", IBM Systems Journal, 1978.
http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=45DED06EC91474F5938A9E05CC3D5A61?doi=10.1.1.89.2601&rep=rep1&type=pdf
31. BIND ARCHITECTURAL COUPLINGS EARLY SO THAT ARCHITECTURAL COMPONENTS CAN BE CHOSEN WITH AMPLE EVIDENCE
Examples of components:
which database
which streaming engine
Examples of couplings:
format of data (e.g. newline delimited json)
how to notify
how to checkpoint
32. HTTP COUPLING: WINS
Win #1: Can't get access to pcap over API
Win #2: Only RHEL-distributed reqs (perl-core, curl) required for ingress
Win #3: Upgrade kafka when improved
33. HTTP COUPLING: WIN #3: UPGRADE WHEN READY
Kafka Version                0.9.0    0.10.0.1    0.10.1.0
Partition by Hash            x        x           x
Write timestamp to message            x           x
Read seek by timestamp                            x
36. FAVOR INTEGRATION TESTING OVER UNIT TESTING
Ingress, egress have optional flag placebo={true,false}. Default to true.
Every deployment simulates low-volume placebo sinks and sources.
Transmit heartbeats when each component is sure to have made forward progress.
37. ON EVALUATING FAULT TOLERANCE AND SCALABILITY
My smart buddy
LinkedIn runs it in production
The NSA
Can we do better?
38. ON EVALUATING FAULT TOLERANCE AND SCALABILITY:
The idea:
Create linked containers for app
Use tc to tell netfilter to drop and/or delay packets
Run simulated data source
39. ON EVALUATING FAULT TOLERANCE AND SCALABILITY:
Hands on create container:
docker run -it --rm ubuntu:14.04.2 bash
Hands on with the container:
root@07e330775e98:/# apt-get update && apt-get install -y ethtool
root@07e330775e98:/# ethtool -S eth0
NIC statistics:
peer_ifindex: 875
Hands on with the host:
(docker-machine's boot2docker has tc built-in)
# find the host-side veth whose ifindex matches peer_ifindex, keeping only the interface name
dev=$(ip -o link | awk -F': ' '$1 == 875 {print $2}' | cut -d@ -f1)
# use "add" instead of "change" the first time, when no netem qdisc is installed yet
tc qdisc change dev $dev root netem delay 100ms 20ms distribution normal
tc qdisc change dev eth0 root netem loss 0.1%
41. Myth: Code should always go into docker containers through an image
Alternative:
docker run -v $dirSrc:$dirSrc # to convey source code
docker exec # to restart program
43. Myth: A docker image is something that came from a Dockerfile:
Alternative
docker run
ansible-playbook -c local
docker commit
45. RECOMMENDED READING
I Heart Logs, Jay Kreps (creator of Kafka)
https://www.amazon.com/Heart-Logs-Stream-Processing-Integration/dp/1491909382
Akka in Action, Roestenburg et al
Released Sept 30, 2016
https://www.amazon.com/Akka-Action-Raymond-Roestenburg/dp/1617291013
Scala for the Impatient, 1e, Cay Horstmann
Second edition coming December 2016
https://www.amazon.com/Scala-Impatient-Cay-S-Horstmann/dp/0321774094
46. READINGS ON LOW LATENCY DATA ENGINEERING (ORGANIZED BY COMMUNITY)
Community | Title | URL
Reactive | The Reactive Manifesto | http://www.reactivemanifesto.org/
Reactive | Reactive Streams | http://www.reactive-streams.org/
Kafka | I Heart Logs, Jay Kreps, 2014 | https://www.amazon.com/Heart-Logs-Stream-Processing-Integration/dp/1491909382
Kafka | Kafka: The Definitive Guide, prerelease/2017 | https://www.amazon.com/Kafka-Definitive-Real-time-stream-processing/dp/1491936169
NiFi | The core concepts of NiFi | http://nifi.apache.org/docs/nifi-docs/html/overview.html#the-core-concepts-of-nifi
Flow Based Programming | Flow-Based Programming, J. Paul Morrison, 2010 | https://www.amazon.com/Flow-Based-Programming-2nd-Application-Development/dp/1451542321
Storm | Big Data, Nathan Marz, 2015 | https://www.amazon.com/Big-Data-Principles-practices-scalable/dp/1617290343