Tame your router data
with Apache Kafka and Apache Druid
Rachel Pedreschi
rachel.pedreschi@imply.io
Eric Graham
eric.graham@imply.io
Tell ‘em what you are gonna tell ‘em
- The Who? Intro to your (slightly) nervous speakers
- The Why? What is the problem?
- The How? Introducing the OSS stack to solve all the world’s ills
- The Demo. So much demo.
The Who
Eric Graham.
The Man, The Legend. The one who wrote the paper that got us accepted to this conference.

Rachel Pedreschi.
Mostly Overhead. The one who wrote the abstract that got us accepted to this conference.
Part of the problem - The Data

Streaming Telemetry
- A recent advancement to replace SNMP. Provides a streaming (push) interface instead of the older pull model, giving network operators a much quicker response to deviations.
- Tools (examples): Telegraf, pipeline, sflowd
- Used to collect metrics on interface stats, CPU, memory, disk space and more.

Flow
- Detailed network analysis of TCP/IP flows through routers, switches and firewalls. Flow data includes src/dst MAC, src/dst IP, protocol, src/dst port, in/out interface ID, TCP flags, TOS, BGP information, bytes/packets and more.
- Tools (examples): PMACCT, Cento, NiFi/NFDump
- Get detailed information on TCP/IP packets.

Syslog
- System logs for routers and switches.
- Tools (examples): Syslog-ng
- Textual information on what's going on.

Augmentation
- Routing, DNS and usernames make visibility that much clearer.
- Tools (examples): KSQL, Kafka Streams, lookup tables, BGP routing
- Clearer visibility to make rapid decisions.
Let’s make the data part of the solution!
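To make the Flow fields above concrete, here is a minimal Python sketch of one flow record serialized as a JSON Kafka message value. The `FlowRecord` class, its field names and the `to_kafka_value` helper are hypothetical illustrations, not PMACCT's (or any collector's) actual output schema; the addresses and ASN come from documentation ranges.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FlowRecord:
    """Hypothetical flat flow record; field names are illustrative only."""
    timestamp: str    # ISO-8601 event time
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    protocol: str     # e.g. "tcp" or "udp"
    src_port: int
    dst_port: int
    in_iface: int     # input interface ID
    out_iface: int    # output interface ID
    tcp_flags: str
    tos: int          # type of service byte
    dst_as: int       # one piece of BGP information: destination ASN
    bytes: int
    packets: int


def to_kafka_value(record: FlowRecord) -> bytes:
    """Serialize one flow record as a UTF-8 JSON message value for a Kafka topic."""
    return json.dumps(asdict(record), sort_keys=True).encode("utf-8")


flow = FlowRecord(
    timestamp="2019-05-13T10:00:00Z",
    src_mac="00:11:22:33:44:55", dst_mac="66:77:88:99:aa:bb",
    src_ip="192.0.2.10", dst_ip="198.51.100.20",
    protocol="tcp", src_port=51514, dst_port=443,
    in_iface=1, out_iface=2, tcp_flags="ACK", tos=0,
    dst_as=64500, bytes=1500, packets=3,
)
value = to_kafka_value(flow)
print(json.loads(value)["dst_port"])  # 443
```

Keeping the record flat means each field can map directly to one column (dimension or metric) downstream.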
OSS to the rescue!
Network analytics pipeline
Streaming architectures are true-to-life and enable faster decision cycles.
Confidential. Do not redistribute.
[Diagram: routers, switches, firewalls and hosts send Syslog, BGP and Flow data to Ingest; the data is enhanced (hostname mapping, microservice name, application name, routing lookups) before it reaches the Application.]
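The "Enhance the data" step can be pictured as lookups applied to each record on its way through. A minimal Python sketch, assuming hypothetical in-memory tables for hostname mapping, application name and routing lookups (the talk mentions KSQL, Kafka Streams and lookup tables for this; none of their APIs are shown here). The routing lookup is a plain longest-prefix match over a small dict; a real routing table would need a trie or radix tree.

```python
import ipaddress
from typing import Optional

# Hypothetical lookup tables (illustrative values only).
HOSTNAMES = {"192.0.2.10": "web-01", "198.51.100.20": "db-01"}
APPLICATIONS = {"web-01": "checkout", "db-01": "orders-db"}
# Hypothetical routing table: prefix -> ASN the traffic is routed toward.
ROUTES = {
    ipaddress.ip_network("198.51.100.0/24"): 64501,
    ipaddress.ip_network("198.51.0.0/16"): 64502,
    ipaddress.ip_network("0.0.0.0/0"): 64500,  # default route via an upstream
}


def longest_prefix_match(ip: str) -> Optional[int]:
    """Return the ASN of the most specific prefix containing `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    best_net, best_asn = None, None
    for net, asn in ROUTES.items():
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_asn = net, asn
    return best_asn


def enrich(flow: dict) -> dict:
    """Return a copy of `flow` with host, application and ASN added for each side."""
    out = dict(flow)  # keep the raw IPs so they are still there when you need them
    for side in ("src", "dst"):
        ip = flow[f"{side}_ip"]
        host = HOSTNAMES.get(ip, ip)  # fall back to the IP when no name is known
        out[f"{side}_host"] = host
        out[f"{side}_app"] = APPLICATIONS.get(host, "unknown")
        out[f"{side}_asn"] = longest_prefix_match(ip)
    return out


enriched = enrich({"src_ip": "192.0.2.10", "dst_ip": "198.51.100.20", "bytes": 1500})
print(enriched["dst_host"], enriched["dst_app"], enriched["dst_asn"])  # db-01 orders-db 64501
```

Adding names next to the raw fields (rather than replacing them) is what lets dashboards show names while still letting you filter on IPs.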
The Answer: Apache Kafka and Apache Druid
- Both are built for modern data architectures.
- Both handle data at scale (the largest Druid cluster has over 2,000 servers and 50 PB of raw data).
- Full redundancy.
- Druid was developed for real-time analytics.
- Both work in harmony, helping you get answers fast.
What the heck is Apache Druid and Why Should I Care?
The 90s: data warehouses and data marts
Tightly coupled architecture with limited flexibility.
[Diagram: data sources → ETL (processing) → data warehouse (store and compute) → analytics: reporting, data mining, querying]
The 2000s - present: data lakes
Separation of storage and compute enables flexibility in tools.
[Diagram: data sources, data lake (storage), MapReduce and ELT, data warehouse, ML/AI engine, search system, reporting and analytics]
The Now: data rivers
Streaming architectures enable faster decision cycles.
[Diagram: data sources → message bus → data lake and streaming OLAP]
The problem
Typical Big Data++ Challenges
- Scale: when data is large, we need a lot of servers
- Speed: aiming for sub-second response times
- Complexity: too fine-grained to precompute
- High dimensionality: tens or hundreds of dimensions
- Concurrency: many users and tenants
- Freshness: load from streams
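Some back-of-envelope arithmetic shows why "too fine-grained to precompute" and "high dimensionality" go together: with d dimensions there are 2^d possible GROUP BY combinations, and the finest one can have as many rows as the product of the dimension cardinalities. A Python sketch with assumed cardinalities (illustrative, not measured):

```python
import math
from typing import Iterable

# Assumed cardinalities for a handful of flow dimensions (not measurements).
CARDINALITY = {
    "src_ip": 50_000,
    "dst_ip": 50_000,
    "dst_port": 1_000,
    "protocol": 5,
    "in_iface": 48,
}


def group_by_combinations(num_dims: int) -> int:
    """Every subset of the dimensions is a possible GROUP BY: 2**d of them."""
    return 2 ** num_dims


def finest_grain_cells(cardinalities: Iterable[int]) -> int:
    """Upper bound on rows when grouping by all dimensions at once."""
    return math.prod(cardinalities)


def all_cube_cells(cardinalities: Iterable[int]) -> int:
    """Upper bound on cells across every GROUP BY: the sum over subsets equals prod(c + 1)."""
    return math.prod(c + 1 for c in cardinalities)


print(group_by_combinations(10))                 # 1024
print(finest_grain_cells(CARDINALITY.values()))  # 600000000000000
```

Even these five dimensions allow up to 6 × 10^14 finest-grain cells, which is one reason systems in this space aggregate at query time instead of precomputing every cube.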
What were the options?
Search platform
- Real-time ingestion
- Flexible schema
- Full text search
OLAP
- Batch ingestion
- Efficient storage
- Fast analytic queries
Timeseries database
- Optimized storage for time-based datasets
- Time-based functions
Combining all three: a high-performance analytics database for event-driven data.
These guys have played a Druid…
Source: http://druid.io/druid-powered.html and imply.io
+ many more!
Gratuitous Customer Quote
From Yahoo:
“The performance is great ... some of the tables that we have internally in Druid have billions and billions of events in them, and we’re scanning them in under a second.”
Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-its-hadoop-stuff.html
Shall we take a look?
curl -X POST -H 'Content-Type: application/json' \
  -d @supervisor-spec.json \
  http://localhost:8090/druid/indexer/v1/supervisor
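The curl command expects a `supervisor-spec.json` file. Here is a minimal Python sketch that builds one and prepares the equivalent POST without sending it. It assumes a recent Druid release that accepts `inputFormat` in `ioConfig` (older releases used a `parser` block inside `dataSchema` instead); the topic, datasource, broker address and field names are hypothetical.

```python
import json
import urllib.request

# Overlord endpoint used by the curl command.
SUPERVISOR_URL = "http://localhost:8090/druid/indexer/v1/supervisor"


def kafka_supervisor_spec(topic: str, datasource: str, bootstrap_servers: str) -> dict:
    """Minimal Kafka supervisor spec in the newer (inputFormat) layout; names are hypothetical."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {
                    "dimensions": ["src_ip", "dst_ip", "protocol", "src_port", "dst_port"]
                },
                "metricsSpec": [
                    {"type": "count", "name": "count"},
                    {"type": "longSum", "name": "bytes", "fieldName": "bytes"},
                    {"type": "longSum", "name": "packets", "fieldName": "packets"},
                ],
                # Rollup: one stored row per minute per dimension combination.
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "MINUTE",
                    "rollup": True,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "taskCount": 1,
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


def submit_request(spec: dict, url: str = SUPERVISOR_URL) -> urllib.request.Request:
    """Build the same POST the curl command sends; pass it to urlopen() to submit."""
    return urllib.request.Request(
        url,
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


spec = kafka_supervisor_spec("flows", "flows", "localhost:9092")
spec_json = json.dumps(spec, indent=2)  # save as supervisor-spec.json for curl
req = submit_request(spec)
# urllib.request.urlopen(req)  # only with a running Druid Overlord
```

Writing `spec_json` to `supervisor-spec.json` and running the curl command should be equivalent to calling `urlopen(req)`.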
Use Cases
Use Case: Network troubleshooting
- Dashboards that combine logs, flow and SNMP data (a single pane of glass) for quick cross-dataset visualizations.
- Visualize spikes and dips and easily filter on specific data.
- Enhance the data to show names instead of IP/MAC addresses, while keeping the IPs available when you need them.
- Dashboards that surface the most common areas of interest.
- Alert notifications for threshold breaches or deviations from normal.
- Is it the network or the application? Enhanced datasets provide quick answers.
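The two alert styles in the last bullets, a fixed threshold and a deviation from normal, can be sketched over a per-minute series (for example bytes per minute returned by a query). The deviation check here is a plain rolling z-score over a trailing window, chosen for illustration; the window size and 3-sigma cutoff are assumptions, and the deck does not say which method its alerting used.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def threshold_breaches(series: Sequence[float], limit: float) -> List[int]:
    """Indexes where the value exceeds a fixed limit."""
    return [i for i, v in enumerate(series) if v > limit]


def deviation_alerts(series: Sequence[float], window: int = 5, z: float = 3.0) -> List[int]:
    """Indexes more than `z` standard deviations away from the trailing-window mean."""
    alerts = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu, sigma = mean(history), pstdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if series[i] != mu:
                alerts.append(i)
        elif abs(series[i] - mu) / sigma > z:
            alerts.append(i)
    return alerts


bytes_per_minute = [100, 102, 98, 101, 99, 100, 400, 101, 97]
print(threshold_breaches(bytes_per_minute, limit=300))  # [6]
print(deviation_alerts(bytes_per_minute))               # [6]
```

A fixed limit needs someone to pick the number; the deviation check adapts to each series but can be fooled for a while after a spike enters its window.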
Use Case: DDoS and security
- Visualize spikes and dips and easily filter on specific data (geo, attack vectors, known bad actors).
- DDoS-specific alerting (UDP bad ports, TCP flags, number of unique IPs, overall increase).
- Hooks to multiple notification channels for always-on notifications.
- Webhooks for integration with back-office systems.
- Easily drill down into
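Three of the DDoS signals listed above (UDP bad ports, TCP flags, number of unique IPs) can be checked over one window of flow records. A minimal Python sketch; the suspect-port list, the `"S"` encoding for SYN-without-ACK, the thresholds and the record fields are all assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Assumed examples of UDP source ports common in reflection/amplification attacks
# (chargen, DNS, NTP, SSDP, memcached); tune this list for your own network.
SUSPECT_UDP_SRC_PORTS = {19, 53, 123, 1900, 11211}


def ddos_alerts(flows: Iterable[dict], max_unique_sources: int,
                max_syn_only: int) -> List[str]:
    """Return alert strings for one window of flow records."""
    unique_sources: Dict[str, Set[str]] = defaultdict(set)
    syn_only: Dict[str, int] = defaultdict(int)
    alerts: List[str] = []
    for f in flows:
        dst = f["dst_ip"]
        unique_sources[dst].add(f["src_ip"])
        if f["protocol"] == "udp" and f["src_port"] in SUSPECT_UDP_SRC_PORTS:
            alerts.append(f"udp-bad-port:{f['src_port']}->{dst}")
        # Hypothetical flag encoding: "S" means SYN set without ACK.
        if f["protocol"] == "tcp" and f.get("tcp_flags") == "S":
            syn_only[dst] += 1
    for dst in sorted(unique_sources):
        if len(unique_sources[dst]) > max_unique_sources:
            alerts.append(f"unique-sources:{len(unique_sources[dst])}->{dst}")
        if syn_only[dst] > max_syn_only:
            alerts.append(f"syn-only:{syn_only[dst]}->{dst}")
    return alerts


window = [
    {"src_ip": f"203.0.113.{i}", "dst_ip": "198.51.100.20", "protocol": "tcp",
     "src_port": 40000 + i, "tcp_flags": "S"}
    for i in range(5)
] + [{"src_ip": "192.0.2.99", "dst_ip": "198.51.100.30", "protocol": "udp",
      "src_port": 11211}]
for alert in ddos_alerts(window, max_unique_sources=3, max_syn_only=3):
    print(alert)
```

The "overall increase" signal fits the rolling-deviation style of check rather than a per-window rule like these.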
Use Case: BGP Analytics
- PMACCT can collect and add BGP information by peering with a BGP speaker.
- Use Kafka KSQL or Kafka Streams to augment data with BGP information.
- Visualize the BGP AS_PATH (where your traffic is going across the Internet).
- Who are your top transit or peering partners?
- Top source and destination ASNs.
- Top BGP communities.
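Once flows carry BGP fields, "top transit or peering partners" and "top destination ASNs" reduce to a group-by over the AS_PATH. A minimal in-memory Python sketch, assuming a plain space-separated AS_PATH string (AS_SETs and other encodings are not handled) and hypothetical field names; in a Kafka + Druid pipeline you would more likely run this aggregation as a Druid query. Source ASNs can be ranked the same way from source-side fields.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def parse_as_path(as_path: str) -> List[int]:
    """Assumes a plain space-separated AS_PATH such as "64501 64510 64520"."""
    return [int(asn) for asn in as_path.split()]


def top_asns(flows: Iterable[dict], n: int = 3) -> Tuple[list, list]:
    """Rank neighbor (first-hop) ASNs and destination (origin) ASNs by bytes."""
    neighbor: Counter = Counter()     # first AS in the path: the transit/peer you hand traffic to
    destination: Counter = Counter()  # last AS in the path: where the prefix originates
    for f in flows:
        path = parse_as_path(f["as_path"])
        if not path:  # empty path, e.g. traffic that stays inside your own AS
            continue
        neighbor[path[0]] += f["bytes"]
        destination[path[-1]] += f["bytes"]
    return neighbor.most_common(n), destination.most_common(n)


flows = [
    {"as_path": "64501 64510 64520", "bytes": 1000},
    {"as_path": "64501 64511", "bytes": 500},
    {"as_path": "64502 64520", "bytes": 700},
    {"as_path": "", "bytes": 50},
]
neighbors, destinations = top_asns(flows)
print(neighbors)     # [(64501, 1500), (64502, 700)]
print(destinations)  # [(64520, 1700), (64511, 500)]
```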
Download
Druid community site (current): http://druid.io/
Druid community site (new): https://druid.apache.org/
Imply distribution: https://imply.io/get-started
Contribute
https://github.com/apache/druid
Stay in touch
Follow the Druid project on Twitter! @druidio
Join the community! http://druid.io/community
Come by our booth for a Druid t-shirt and to learn more!
Thank you!
Hold for applause…

How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eric Graham, Imply Data) Kafka Summit London 2019
