SlideShare uma empresa Scribd logo
1 de 47
OpenSOC
The Open Security Operations
Center
for
Analyzing 1.2 Million Network Packets per Second
in Real TimeJames Sirota,
Big Data Architect
Cisco Security Solutions Practice
jsirota@cisco.com
Sheetal Dolas
Principal Architect
Hortonworks
sheetal@hortonworks.com
June 3, 2014
2
 Problem Statement & Business Case for OpenSOC
 Solution Architecture and Design
 Best Practices and Lessons Learned
 Q & A
Over Next Few Minutes
3
Business Case
4
fatalism:
It's no longer if or when you get hacked,
but the assumption is that you've
already been hacked,
with a focus on minimizing the
damage.”
Source: Dark Reading / Security’s New
Reality: Assume The Worst
5
Breaches Happen in Hours…
But Go Undetected for Months or Even Years
Source: 2013 Data Breach Investigations
Report
Seconds Minutes Hours Days Weeks Months Years
Initial Attack to Initial
Compromise
10% 75% 12% 2% 0% 1% 1%
Initial Compromise
to Data Exfiltration
8% 38% 14% 25% 8% 8% 0%
Initial Compromise
to Discovery
0% 0% 2% 13% 29% 54% 2%
Discovery to
Containment/
Restoration 0% 1% 9% 32% 38% 17% 4%
Timespan of events by percent of breaches
In 60% of
breaches, data
is stolen in hours
54% of breaches
are not discovered for
months
6
Cisco Global Cloud Index
Source: 2014 Cisco Global Cloud Index
7
Introducing OpenSOC
Intersection of Big Data and Security Analytics
Multi Petabyte Storage
Interactive Query
Real-Time Search
Scalable Stream
Processing
Unstructured Data
Data Access Control
Scalable Compute
OpenSOC
Real-Time Alerts
Anomaly Detection
Data
Correlation
Rules and Reports
Predictive Modeling
UI and Applications
Big Data
Platform
Hadoop
8
OpenSOC Journey
Sept 2013
First Prototype
Dec 2013
Hortonworks
joins the
project
March 2014
Platform
development
finished
Sept 2014
General
Availability
May 2014
CR Work off
April 2014
First beta test
at customer
site
9
Solution Architecture &
Design
10
OpenSOC Conceptual Architecture
Raw Network Stream
Network Metadata
Stream
Netflow
Syslog
Raw Application Logs
Other Streaming
Telemetry
HiveHBase
Raw Packet
Store
Long-Term
Store
Elastic Search
Real-Time
Index
Network Packet
Mining and
PCAP
Reconstruction
Log Mining and
Analytics
Big Data
Exploration,
Predictive
Modeling
Applications + Analyst Tools
Parse+Format
Enrich
Alert
Threat Intelligence
Feeds
Enrichment Data
11
 Raw Network Packet Capture, Store, Traffic Reconstruction
 Telemetry Ingest, Enrichment and Real-Time Rules-Based
Alerts
 Real-Time Telemetry Search and Cross-Telemetry Matching
 Automated Reports, Anomaly Detection and Anomaly
Alerts
Key Functional Capabilities
12
 Fully-Backed by Cisco and Used Internally for Multiple
Customers
 Free, Open Source and Apache Licensed
 Built on Highly-Scalable and Proven Platforms (Hadoop,
Kafka, Storm)
 Extensible and Pluggable Design
 Flexible Deployment Model (On-Premise or Cloud)
 Centralize your processes, people and data
The OpenSOC Advantage
13
OpenSOC Deployment at Cisco
Hardware footprint (40u)
 14 Data Nodes (UCS C240 M3)
 3 Cluster Control Nodes (UCS C220
M3)
 2 ESX Hypervisor Hosts (UCS C220
M3)
 1 PCAP Processor (UCS C220 M3 +
Napatech NIC)
 2 SourceFire Threat alert processors
 1 Anue Network Traffic splitter
 1 Router
 1 48 Port 10GE Switch
Software Stack
HDP 2.1
Kafka 0.8
Elastic Search 1.1
MySQL 5.5
14
OpenSOC - Stitching Things Together
AccessMessaging SystemData CollectionSource Systems StorageReal Time Processing
StormKafka
B Topic
N Topic
Elastic
Search
Index
Web
Services
Search
PCAP
Reconstruction
HBase
PCAP Table
Analytic
Tools
R / Python
Power Pivot
Tableau
Hive
Raw Data
ORC
Passive
Tap
PCAP Topic
DPI Topic
A Topic
Telemetry
Sources
Syslog
HTTP
File System
Other
Flume
Agent A
Agent B
Agent N
B Topology
N Topology
A Topology
PCAP
Traffic
Replicato
r
PCAP
Topology
DPI Topology
15
OpenSOC - Stitching Things Together
AccessMessaging SystemData CollectionSource Systems StorageReal Time Processing
StormKafka
B Topic
N Topic
Elastic
Search
Index
Web
Services
Search
PCAP
Reconstruction
HBase
PCAP Table
Analytic
Tools
R / Python
Power Pivot
Tableau
Hive
Raw Data
ORC
Passive
Tap
PCAP Topic
DPI Topic
A Topic
Telemetry
Sources
Syslog
HTTP
File System
Other
Flume
Agent A
Agent B
Agent N
B Topology
N Topology
A Topology
PCAP
Traffic
Replicato
r
Deeper
Look
PCAP
Topology
DPI Topology
16
PCAP Topology
StorageReal Time Processing
Storm
Elastic Search
Index
HBase
PCAP Table
Hive
Raw Data
ORC
Kafka
Spout
Parse
r
Bolt
HDFS
Bolt
HBas
e
Bolt
ES
Bolt
17
DPI Topology & Telemetry Enrichment
StorageReal Time Processing
Storm
Elastic Search
Index
HBase
PCAP Table
Hive
Raw Data
ORC
Kafka
Spout
Parse
r Bolt
GEO
Enric
h
Whoi
s
Enric
h
CIF
Enric
h
HDF
S
Bolt
ES
Bolt
18
Enrichments
Parse
r
Bolt
GEO
Enrich
RAW
Message
{
“msg_key1”: “msg value1”,
“src_ip”: “10.20.30.40”,
“dest_ip”: “20.30.40.50”,
“domain”: “mydomain.com”
}
Who
Is
Enrich
"geo":[ {"region":"CA",
"postalCode":"95134",
"areaCode":"408",
"metroCode":"807",
"longitude":-121.946,
"latitude":37.425,
"locId":4522,
"city":"San Jose",
"country":"US"
}]
CIF
Enrich
"whois":[ {
"OrgId":"CISCOS",
"Parent":"NET-144-0-0-0-0",
"OrgAbuseName":"Cisco Systems Inc",
"RegDate":"1991-01-171991-01-17",
"OrgName":"Cisco Systems",
"Address":"170 West Tasman Drive",
"NetType":"Direct Assignment"
} ],
“cif”:”Yes”
Enriched
Message
Cache
MySQL
Geo Lite
Data
Cache
HBase
Who Is Data
Cache
HBase
CIF Data
19
Applications: Telemetry Matching and DPI
Step1: Search
Step2: Match
Step3: Analyze
Step4: Build PCAP
20
Integration with Analytics Tools
Dashboards Reports
21
Best Practices
and
Lessons Learned
22
Journey Towards Highly
Scalable Application
23
Kafka Tuning
24
This is where we began
25
Some code optimizations and increased
parallelism
26
 Is Disk I/O heavy
 Kafka 0.8+ supports replication and JBOD
 Better performance compared to RAID
 Parallelism is largely driven by number of disks and partitions per topic
 Key configuration parameters:
 num.io.threads - Keep it at least equal to number of disks provided to
Kafka
 num.network.threads - adjust it based on number of concurrent
producers, consumers and replication factor
Kafka Tuning
27
After Kafka Tuning
28
Bottleneck Isolation, Resource Profiling,
Load Balancing
29
HBase Tuning
30
This is where we began
31
 Row Key design is critical (gets or scans or both?)
 Keys with IP Addresses
 Standard IP addresses have only two variations of the first character : 1 & 2
 Minimum key length will be 7 characters and max 15 with a typical average of 12
 Subnet range scans become difficult – range of 90 to 220 excludes 112
 IP converted to hex (10.20.30.40 => 0a141e28)
 gives 16 variations of first key character
 consistently 8 character key
 Easy to search for subnet ranges
Row Key Design
32
Experiments with Row Key
33
 Know your data
 Auto split under high workload can result into hotspots and split storms
 Understand your data and presplit the regions
 Identify how many regions a RS can have to perform optimally. Use the
formula below
(RS memory)*(total memstore fraction)/((memstore size)*(# column families))
Region Splits
34
With Region Pre-Splits
35
 Enable Micro Batching (client side buffer)
 Smart shuffle/grouping in storm
 Understand your data and situationally exploit various WAL options
 Watch for many minor compactions
 For heavy ‘write’ workload Increase hbase.hstore.blockingStoreFiles (we
used 200)
Know Your Application
36
And Finally
37
Kafka Spout
38
 Parallelism is controlled by number of partitions per topic
 Set Kafka spout parallelism equal to number of partitions in
topic
 Other key parameters that drive performance
 fetchSizeBytes
 bufferSizeBytes
Kafka Spout
39
Mysteriously Missing Data
40
 A bug in Kafka spout that used to miss out some partitions
and loose data
 It is now fixed and available from Hortonworks repository (
http://repo.hortonworks.com/content/repositories/releases/org/apache/
storm/storm-Kafka )
Mysteriously Missing Data Root Cause
41
Storm
42
 Every small thing counts at scale
 Even simple string operations can slowdown throughput when
executed on millions of Tuples
Storm
43
 Error handling is critical
 Poorly handled errors can lead to topology failure and eventually loss
of data (or data duplication)
Storm
44
 Tune & Scale individual spout and bolts before performance
testing/tuning entire topology
 Write your own simple data generator spouts and no-op bolts
 Making as many things configurable as possible helps a lot
Storm
45
 When it comes to Hadoop…partner up
 Separate the hype from the opportunity
 Start small then scale up
 Design Iteratively
 It doesn’t work unless you have proven it at scale
 Keep an eye on ROI
Lessons Learned
46
How can you contribute?
 Technology Partner Program – contribute developers to
join the Cisco and Hortonworks team
Looking for Community Partners
Cisco + Hortonworks + Community Support for OpenSOC
Thank you!
We are hiring:
jsirota@cisco.com
sheetal@hortonworks.com

Mais conteúdo relacionado

Mais procurados

Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explainedconfluent
 
Log management with ELK
Log management with ELKLog management with ELK
Log management with ELKGeert Pante
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersDatabricks
 
Elastic stack Presentation
Elastic stack PresentationElastic stack Presentation
Elastic stack PresentationAmr Alaa Yassen
 
Centralised logging with ELK stack
Centralised logging with ELK stackCentralised logging with ELK stack
Centralised logging with ELK stackSimon Hanmer
 
Splunk Webinar: Full-Stack End-to-End SAP-Monitoring mit Splunk
Splunk Webinar: Full-Stack End-to-End SAP-Monitoring mit SplunkSplunk Webinar: Full-Stack End-to-End SAP-Monitoring mit Splunk
Splunk Webinar: Full-Stack End-to-End SAP-Monitoring mit SplunkSplunk
 
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...Databricks
 
Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3SANG WON PARK
 
Capacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB ClusterCapacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB ClusterMongoDB
 
Continuous Compliance and DevSecOps
Continuous Compliance and DevSecOpsContinuous Compliance and DevSecOps
Continuous Compliance and DevSecOpsPuppet
 
Why your Spark Job is Failing
Why your Spark Job is FailingWhy your Spark Job is Failing
Why your Spark Job is FailingDataWorks Summit
 
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...Edureka!
 
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and LogstashKeeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and LogstashAmazon Web Services
 
RabbitMQ vs Apache Kafka Part II Webinar
RabbitMQ vs Apache Kafka Part II WebinarRabbitMQ vs Apache Kafka Part II Webinar
RabbitMQ vs Apache Kafka Part II WebinarErlang Solutions
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안SANG WON PARK
 
Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips confluent
 
SAST (Static Application Security Testing) vs. SCA (Software Composition Anal...
SAST (Static Application Security Testing) vs. SCA (Software Composition Anal...SAST (Static Application Security Testing) vs. SCA (Software Composition Anal...
SAST (Static Application Security Testing) vs. SCA (Software Composition Anal...WhiteSource
 

Mais procurados (20)

Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 
Elk - An introduction
Elk - An introductionElk - An introduction
Elk - An introduction
 
Log management with ELK
Log management with ELKLog management with ELK
Log management with ELK
 
Spark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production usersSpark Summit EU 2015: Lessons from 300+ production users
Spark Summit EU 2015: Lessons from 300+ production users
 
Elastic stack Presentation
Elastic stack PresentationElastic stack Presentation
Elastic stack Presentation
 
Centralised logging with ELK stack
Centralised logging with ELK stackCentralised logging with ELK stack
Centralised logging with ELK stack
 
Splunk Webinar: Full-Stack End-to-End SAP-Monitoring mit Splunk
Splunk Webinar: Full-Stack End-to-End SAP-Monitoring mit SplunkSplunk Webinar: Full-Stack End-to-End SAP-Monitoring mit Splunk
Splunk Webinar: Full-Stack End-to-End SAP-Monitoring mit Splunk
 
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
Accelerating Spark SQL Workloads to 50X Performance with Apache Arrow-Based F...
 
Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3Apache kafka performance(latency)_benchmark_v0.3
Apache kafka performance(latency)_benchmark_v0.3
 
Capacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB ClusterCapacity Planning For Your Growing MongoDB Cluster
Capacity Planning For Your Growing MongoDB Cluster
 
Continuous Compliance and DevSecOps
Continuous Compliance and DevSecOpsContinuous Compliance and DevSecOps
Continuous Compliance and DevSecOps
 
Why your Spark Job is Failing
Why your Spark Job is FailingWhy your Spark Job is Failing
Why your Spark Job is Failing
 
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
What Is ELK Stack | ELK Tutorial For Beginners | Elasticsearch Kibana | ELK S...
 
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and LogstashKeeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
Keeping Up with the ELK Stack: Elasticsearch, Kibana, Beats, and Logstash
 
RabbitMQ vs Apache Kafka Part II Webinar
RabbitMQ vs Apache Kafka Part II WebinarRabbitMQ vs Apache Kafka Part II Webinar
RabbitMQ vs Apache Kafka Part II Webinar
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
 
Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips Kafka Security 101 and Real-World Tips
Kafka Security 101 and Real-World Tips
 
Zero-Trust SASE DevSecOps
Zero-Trust SASE DevSecOpsZero-Trust SASE DevSecOps
Zero-Trust SASE DevSecOps
 
SAST (Static Application Security Testing) vs. SCA (Software Composition Anal...
SAST (Static Application Security Testing) vs. SCA (Software Composition Anal...SAST (Static Application Security Testing) vs. SCA (Software Composition Anal...
SAST (Static Application Security Testing) vs. SCA (Software Composition Anal...
 
SAP HANA on POWER9 systems
SAP HANA on POWER9 systemsSAP HANA on POWER9 systems
SAP HANA on POWER9 systems
 

Semelhante a Cisco OpenSOC

Open Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOCOpen Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOCSheetal Dolas
 
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...confluent
 
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...confluent
 
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...confluent
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 
DefCon 2012 - Rooting SOHO Routers
DefCon 2012 - Rooting SOHO RoutersDefCon 2012 - Rooting SOHO Routers
DefCon 2012 - Rooting SOHO RoutersMichael Smith
 
Anomaly Detection at Scale
Anomaly Detection at ScaleAnomaly Detection at Scale
Anomaly Detection at ScaleJeff Henrikson
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonSpark Summit
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Extreme Network Performance with Hazelcast on Torusware
Extreme Network Performance with Hazelcast on ToruswareExtreme Network Performance with Hazelcast on Torusware
Extreme Network Performance with Hazelcast on ToruswareHazelcast
 
The Next AMPLab: Real-Time, Intelligent, and Secure Computing
The Next AMPLab: Real-Time, Intelligent, and Secure ComputingThe Next AMPLab: Real-Time, Intelligent, and Secure Computing
The Next AMPLab: Real-Time, Intelligent, and Secure ComputingSpark Summit
 
Devoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en basDevoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en basFlorent Ramiere
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle CoherenceBen Stopford
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weitingWei Ting Chen
 
Data Pipelines and Telephony Fraud Detection Using Machine Learning
Data Pipelines and Telephony Fraud Detection Using Machine Learning Data Pipelines and Telephony Fraud Detection Using Machine Learning
Data Pipelines and Telephony Fraud Detection Using Machine Learning Eugene
 
InfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard
InfluxEnterprise Architecture Patterns by Tim Hall & Sam DillardInfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard
InfluxEnterprise Architecture Patterns by Tim Hall & Sam DillardInfluxData
 
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...InfluxData
 

Semelhante a Cisco OpenSOC (20)

Open Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOCOpen Security Operations Center - OpenSOC
Open Security Operations Center - OpenSOC
 
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
Scaling Security on 100s of Millions of Mobile Devices Using Apache Kafka® an...
 
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eri...
 
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi, Imply ...
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
optimizing_ceph_flash
optimizing_ceph_flashoptimizing_ceph_flash
optimizing_ceph_flash
 
DefCon 2012 - Rooting SOHO Routers
DefCon 2012 - Rooting SOHO RoutersDefCon 2012 - Rooting SOHO Routers
DefCon 2012 - Rooting SOHO Routers
 
Anomaly Detection at Scale
Anomaly Detection at ScaleAnomaly Detection at Scale
Anomaly Detection at Scale
 
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena EdelsonStreaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
Streaming Analytics with Spark, Kafka, Cassandra and Akka by Helena Edelson
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Extreme Network Performance with Hazelcast on Torusware
Extreme Network Performance with Hazelcast on ToruswareExtreme Network Performance with Hazelcast on Torusware
Extreme Network Performance with Hazelcast on Torusware
 
The Next AMPLab: Real-Time, Intelligent, and Secure Computing
The Next AMPLab: Real-Time, Intelligent, and Secure ComputingThe Next AMPLab: Real-Time, Intelligent, and Secure Computing
The Next AMPLab: Real-Time, Intelligent, and Secure Computing
 
Devoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en basDevoxx university - Kafka de haut en bas
Devoxx university - Kafka de haut en bas
 
moveMountainIEEE
moveMountainIEEEmoveMountainIEEE
moveMountainIEEE
 
Data Grids with Oracle Coherence
Data Grids with Oracle CoherenceData Grids with Oracle Coherence
Data Grids with Oracle Coherence
 
20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting20150704 benchmark and user experience in sahara weiting
20150704 benchmark and user experience in sahara weiting
 
Data Pipelines and Telephony Fraud Detection Using Machine Learning
Data Pipelines and Telephony Fraud Detection Using Machine Learning Data Pipelines and Telephony Fraud Detection Using Machine Learning
Data Pipelines and Telephony Fraud Detection Using Machine Learning
 
InfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard
InfluxEnterprise Architecture Patterns by Tim Hall & Sam DillardInfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard
InfluxEnterprise Architecture Patterns by Tim Hall & Sam Dillard
 
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
InfluxEnterprise Architectural Patterns by Dean Sheehan, Senior Director, Pre...
 
IoT meets Big Data
IoT meets Big DataIoT meets Big Data
IoT meets Big Data
 

Último

APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 

Último (20)

APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 

Cisco OpenSOC

  • 1. OpenSOC The Open Security Operations Center for Analyzing 1.2 Million Network Packets per Second in Real TimeJames Sirota, Big Data Architect Cisco Security Solutions Practice jsirota@cisco.com Sheetal Dolas Principal Architect Hortonworks sheetal@hortonworks.com June 3, 2014
  • 2. 2  Problem Statement & Business Case for OpenSOC  Solution Architecture and Design  Best Practices and Lessons Learned  Q & A Over Next Few Minutes
  • 4. 4 fatalism: It's no longer if or when you get hacked, but the assumption is that you've already been hacked, with a focus on minimizing the damage.” Source: Dark Reading / Security’s New Reality: Assume The Worst
  • 5. 5 Breaches Happen in Hours… But Go Undetected for Months or Even Years Source: 2013 Data Breach Investigations Report Seconds Minutes Hours Days Weeks Months Years Initial Attack to Initial Compromise 10% 75% 12% 2% 0% 1% 1% Initial Compromise to Data Exfiltration 8% 38% 14% 25% 8% 8% 0% Initial Compromise to Discovery 0% 0% 2% 13% 29% 54% 2% Discovery to Containment/ Restoration 0% 1% 9% 32% 38% 17% 4% Timespan of events by percent of breaches In 60% of breaches, data is stolen in hours 54% of breaches are not discovered for months
  • 6. 6 Cisco Global Cloud Index Source: 2014 Cisco Global Cloud Index
  • 7. 7 Introducing OpenSOC Intersection of Big Data and Security Analytics Multi Petabyte Storage Interactive Query Real-Time Search Scalable Stream Processing Unstructured Data Data Access Control Scalable Compute OpenSOC Real-Time Alerts Anomaly Detection Data Correlation Rules and Reports Predictive Modeling UI and Applications Big Data Platform Hadoop
  • 8. 8 OpenSOC Journey Sept 2013 First Prototype Dec 2013 Hortonworks joins the project March 2014 Platform development finished Sept 2014 General Availability May 2014 CR Work off April 2014 First beta test at customer site
  • 10. 10 OpenSOC Conceptual Architecture Raw Network Stream Network Metadata Stream Netflow Syslog Raw Application Logs Other Streaming Telemetry HiveHBase Raw Packet Store Long-Term Store Elastic Search Real-Time Index Network Packet Mining and PCAP Reconstruction Log Mining and Analytics Big Data Exploration, Predictive Modeling Applications + Analyst Tools Parse+Format Enrich Alert Threat Intelligence Feeds Enrichment Data
  • 11. 11  Raw Network Packet Capture, Store, Traffic Reconstruction  Telemetry Ingest, Enrichment and Real-Time Rules-Based Alerts  Real-Time Telemetry Search and Cross-Telemetry Matching  Automated Reports, Anomaly Detection and Anomaly Alerts Key Functional Capabilities
  • 12. 12  Fully-Backed by Cisco and Used Internally for Multiple Customers  Free, Open Source and Apache Licensed  Built on Highly-Scalable and Proven Platforms (Hadoop, Kafka, Storm)  Extensible and Pluggable Design  Flexible Deployment Model (On-Premise or Cloud)  Centralize your processes, people and data The OpenSOC Advantage
  • 13. 13 OpenSOC Deployment at Cisco Hardware footprint (40u)  14 Data Nodes (UCS C240 M3)  3 Cluster Control Nodes (UCS C220 M3)  2 ESX Hypervisor Hosts (UCS C220 M3)  1 PCAP Processor (UCS C220 M3 + Napatech NIC)  2 SourceFire Threat alert processors  1 Anue Network Traffic splitter  1 Router  1 48 Port 10GE Switch Software Stack HDP 2.1 Kafka 0.8 Elastic Search 1.1 MySQL 5.5
  • 14. 14 OpenSOC - Stitching Things Together AccessMessaging SystemData CollectionSource Systems StorageReal Time Processing StormKafka B Topic N Topic Elastic Search Index Web Services Search PCAP Reconstruction HBase PCAP Table Analytic Tools R / Python Power Pivot Tableau Hive Raw Data ORC Passive Tap PCAP Topic DPI Topic A Topic Telemetry Sources Syslog HTTP File System Other Flume Agent A Agent B Agent N B Topology N Topology A Topology PCAP Traffic Replicato r PCAP Topology DPI Topology
  • 15. 15 OpenSOC - Stitching Things Together AccessMessaging SystemData CollectionSource Systems StorageReal Time Processing StormKafka B Topic N Topic Elastic Search Index Web Services Search PCAP Reconstruction HBase PCAP Table Analytic Tools R / Python Power Pivot Tableau Hive Raw Data ORC Passive Tap PCAP Topic DPI Topic A Topic Telemetry Sources Syslog HTTP File System Other Flume Agent A Agent B Agent N B Topology N Topology A Topology PCAP Traffic Replicato r Deeper Look PCAP Topology DPI Topology
  • 16. 16 PCAP Topology StorageReal Time Processing Storm Elastic Search Index HBase PCAP Table Hive Raw Data ORC Kafka Spout Parse r Bolt HDFS Bolt HBas e Bolt ES Bolt
  • 17. 17 DPI Topology & Telemetry Enrichment StorageReal Time Processing Storm Elastic Search Index HBase PCAP Table Hive Raw Data ORC Kafka Spout Parse r Bolt GEO Enric h Whoi s Enric h CIF Enric h HDF S Bolt ES Bolt
  • 18. 18 Enrichments Parse r Bolt GEO Enrich RAW Message { “msg_key1”: “msg value1”, “src_ip”: “10.20.30.40”, “dest_ip”: “20.30.40.50”, “domain”: “mydomain.com” } Who Is Enrich "geo":[ {"region":"CA", "postalCode":"95134", "areaCode":"408", "metroCode":"807", "longitude":-121.946, "latitude":37.425, "locId":4522, "city":"San Jose", "country":"US" }] CIF Enrich "whois":[ { "OrgId":"CISCOS", "Parent":"NET-144-0-0-0-0", "OrgAbuseName":"Cisco Systems Inc", "RegDate":"1991-01-171991-01-17", "OrgName":"Cisco Systems", "Address":"170 West Tasman Drive", "NetType":"Direct Assignment" } ], “cif”:”Yes” Enriched Message Cache MySQL Geo Lite Data Cache HBase Who Is Data Cache HBase CIF Data
  • 19. 19 Applications: Telemetry Matching and DPI Step1: Search Step2: Match Step3: Analyze Step4: Build PCAP
  • 20. 20 Integration with Analytics Tools Dashboards Reports
  • 24. 24 This is where we began
  • 25. 25 Some code optimizations and increased parallelism
  • 26. 26  Is Disk I/O heavy  Kafka 0.8+ supports replication and JBOD  Better performance compared to RAID  Parallelism is largely driven by number of disks and partitions per topic  Key configuration parameters:  num.io.threads - Keep it at least equal to number of disks provided to Kafka  num.network.threads - adjust it based on number of concurrent producers, consumers and replication factor Kafka Tuning
  • 28. 28 Bottleneck Isolation, Resource Profiling, Load Balancing
  • 30. 30 This is where we began
  • 31. 31  Row Key design is critical (gets or scans or both?)  Keys with IP Addresses  Standard IP addresses have only two variations of the first character : 1 & 2  Minimum key length will be 7 characters and max 15 with a typical average of 12  Subnet range scans become difficult – range of 90 to 220 excludes 112  IP converted to hex (10.20.30.40 => 0a141e28)  gives 16 variations of first key character  consistently 8 character key  Easy to search for subnet ranges Row Key Design
  • 33. 33  Know your data  Auto split under high workload can result into hotspots and split storms  Understand your data and presplit the regions  Identify how many regions a RS can have to perform optimally. Use the formula below (RS memory)*(total memstore fraction)/((memstore size)*(# column families)) Region Splits
  • 35. 35  Enable Micro Batching (client side buffer)  Smart shuffle/grouping in storm  Understand your data and situationally exploit various WAL options  Watch for many minor compactions  For heavy ‘write’ workload Increase hbase.hstore.blockingStoreFiles (we used 200) Know Your Application
  • 38. 38  Parallelism is controlled by number of partitions per topic  Set Kafka spout parallelism equal to number of partitions in topic  Other key parameters that drive performance  fetchSizeBytes  bufferSizeBytes Kafka Spout
  • 40. 40  A bug in Kafka spout that used to miss out some partitions and loose data  It is now fixed and available from Hortonworks repository ( http://repo.hortonworks.com/content/repositories/releases/org/apache/ storm/storm-Kafka ) Mysteriously Missing Data Root Cause
  • 42. 42  Every small thing counts at scale  Even simple string operations can slowdown throughput when executed on millions of Tuples Storm
  • 43. 43  Error handling is critical  Poorly handled errors can lead to topology failure and eventually loss of data (or data duplication) Storm
  • 44. 44  Tune & Scale individual spout and bolts before performance testing/tuning entire topology  Write your own simple data generator spouts and no-op bolts  Making as many things configurable as possible helps a lot Storm
  • 45. 45  When it comes to Hadoop…partner up  Separate the hype from the opportunity  Start small then scale up  Design Iteratively  It doesn’t work unless you have proven it at scale  Keep an eye on ROI Lessons Learned
  • 46. 46 How can you contribute?  Technology Partner Program – contribute developers to join the Cisco and Hortonworks team Looking for Community Partners Cisco + Hortonworks + Community Support for OpenSOC
  • 47. Thank you! We are hiring: jsirota@cisco.com sheetal@hortonworks.com

Notas do Editor

  1. In Storm bolts shuffle group based on regions so that each HBase bolt gets data mostly for one or two regions and minimizes RS trips In case of DoS attack situations where actual packet are very small 20-60 bytes and individual packets are not very critical for analysis, skip WAL
  2. In Storm bolts shuffle group based on regions so that each HBase bolt gets data mostly for one or two regions and minimizes RS trips In case of DoS attack situations where actual packet are very small 20-60 bytes and individual packets are not very critical for analysis, skip WAL
  3. Frequent minor compactions reduce the overall throughput of system. For ‘write’ heavy workload reduce frequency of minor compactions by increasing hbase.hstore.blockingStoreFiles (we used 200)
  4. Frequent minor compactions reduce the overall throughput of system. For ‘write’ heavy workload reduce frequency of minor compactions by increasing hbase.hstore.blockingStoreFiles (we used 200)
  5. Frequent minor compactions reduce the overall throughput of system. For ‘write’ heavy workload reduce frequency of minor compactions by increasing hbase.hstore.blockingStoreFiles (we used 200)