How To Use Kafka and Druid to Tame Your Router Data (Rachel Pedreschi and Eric Graham, Imply Data) Kafka Summit London 2019

Do you know who is knocking on your network’s door? Have new regulations left you scratching your head over how to handle what is happening in your network? Network flow data helps answer many questions across a multitude of use cases, including network security, performance, capacity planning, routing, operational troubleshooting and more. Modern streaming data pipelines need tools that can scale to meet the demands of service providers while still answering difficult questions quickly. Beyond stream processing, the data needs to be stored in a redundant, operationally focused database that provides fast, reliable answers to critical questions. Together, Kafka and Druid create such a pipeline.

In this talk, Eric Graham and Rachel Pedreschi will discuss these pipelines and cover the following topics: network flow use cases and why this data is important; reference architectures from production systems at a major international bank; why Kafka, Druid and other OSS tools suit network flows; and a demo of one such system.

  1. Tame your router data with Apache Kafka and Apache Druid
     Rachel Pedreschi (rachel.pedreschi@imply.io)
     Eric Graham (eric.graham@imply.io)
  2. Tell ’em what you are gonna tell ’em
     • The Who? Intro to your (slightly) nervous speakers
     • The Why? What is the problem?
     • The How? Introducing the OSS stack to solve all the world’s ills
     • The Demo. So much demo.
  3. The Who
     Eric Graham. The Man, The Legend. The one who wrote the paper that got us accepted to this conference.
     Rachel Pedreschi. Mostly Overhead. The one who wrote the abstract that got us accepted to this conference.
  5. Part of the problem - The Data
     • Streaming telemetry: a recent advancement to replace SNMP. It provides a streaming interface instead of the older pull model, giving network operators a much quicker response to deviations. Used to collect metrics on interface stats, CPU, memory, disk space and more. Tool examples: Telegraf, pipeline, sflowd.
     • Flow: detailed network analysis of TCP/IP flows through routers, switches and firewalls. Flow data includes src/dst MAC, src/dst IP, protocol, src/dst port, in/out interface ID, TCP flags, TOS, BGP information, bytes/packets and more. Gives detailed information on TCP/IP packets. Tool examples: PMACCT, Cento, NiFi/NFDump.
     • Syslog: system logs for routers and switches; textual information on what’s going on. Tool examples: Syslog-ng.
     • Augmentation: routing, DNS and usernames make visibility that much clearer, for rapid decisions. Tool examples: KSQL, KStream, lookup tables, BGP routing.
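As a rough illustration of the flow fields listed on this slide, here is a minimal Python sketch of one flow record serialized as JSON, a common (but not the only) format for Kafka message values. The class name, its field names and the choice of JSON are hypothetical; collectors such as PMACCT emit their own field names and formats.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class FlowRecord:
    """One flow record with a subset of the fields named on the slide.

    Field names are hypothetical, not any collector's real schema.
    """
    timestamp: str      # ISO 8601 time the flow was observed
    src_mac: str
    dst_mac: str
    src_ip: str
    dst_ip: str
    protocol: int       # IP protocol number, e.g. 6 = TCP, 17 = UDP
    src_port: int
    dst_port: int
    in_iface: int       # input interface ID
    out_iface: int      # output interface ID
    tcp_flags: int      # bitwise OR of the TCP flags seen in the flow
    tos: int            # IP type-of-service byte
    byte_count: int
    packet_count: int

    def to_kafka_value(self) -> bytes:
        """Serialize the record as UTF-8 JSON for use as a Kafka message value."""
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_kafka_value(cls, value: bytes) -> "FlowRecord":
        """Rebuild a record from the JSON bytes produced by to_kafka_value."""
        return cls(**json.loads(value.decode("utf-8")))
```

A producer would send rec.to_kafka_value() as the message value; keeping one flat JSON object per flow makes the fields easy for a downstream consumer such as Druid to map to columns.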
  6. Let’s make the data part of the solution!
  7. OSS to the rescue!
  8. Network analytics pipeline
     Streaming architectures are true-to-life and enable faster decision cycles.
     [Diagram labels: Routers, Switches, Firewalls, Hosts; Syslog; BGP, Flow; Ingest; Enhance the data (Hostname mapping, Microservice name, Application name, Routing lookups); Application]
     Confidential. Do not redistribute.
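The "Enhance the data" step can be pictured as joining each flow against lookup tables. Slide 5 names KSQL, KStream and lookup tables as tools for this; the sketch below shows the same idea in plain Python instead, with hypothetical in-memory tables for hostname mapping, application names and a longest-prefix-match routing lookup. All table contents and output field names are made up for illustration.

```python
import ipaddress
from typing import Optional

# Hypothetical lookup tables; in a real pipeline these might come from DNS,
# an inventory system or BGP, and the join might run in KSQL or Kafka Streams.
HOSTNAMES = {"10.0.0.1": "web-01", "10.0.0.2": "db-01"}
APPLICATIONS = {"web-01": "storefront", "db-01": "orders-db"}
ROUTES = {  # prefix -> hypothetical route label
    "10.0.0.0/8": "internal",
    "10.1.0.0/16": "datacenter-b",
    "0.0.0.0/0": "internet",
}
_ROUTE_TABLE = [(ipaddress.ip_network(p), label) for p, label in ROUTES.items()]


def route_lookup(ip: str) -> Optional[str]:
    """Return the label of the most specific (longest) prefix containing ip."""
    addr = ipaddress.ip_address(ip)
    best = None
    for net, label in _ROUTE_TABLE:
        # Membership is False when address and network are different IP versions.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, label)
    return best[1] if best else None


def enrich(flow: dict) -> dict:
    """Return a copy of flow with names added alongside the raw IPs."""
    out = dict(flow)  # keep the original IPs for when you need them
    for side in ("src", "dst"):
        ip = flow[f"{side}_ip"]
        host = HOSTNAMES.get(ip)
        out[f"{side}_host"] = host
        out[f"{side}_app"] = APPLICATIONS.get(host) if host else None
        out[f"{side}_route"] = route_lookup(ip)
    return out
```

Keeping the raw IPs next to the added names matches the troubleshooting goal of showing names by default while still having the addresses available.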
  9. The Answer: Apache Kafka and Apache Druid
     • Both are built for modern data architectures.
     • Both can handle data at scale (the largest Druid cluster has over 2,000 servers and 50 PB of raw data).
     • Full redundancy.
     • Druid was developed for real-time analytics.
     • Both work in harmony to help get answers fast.
  11. What the heck is Apache Druid, and why should I care?
  16. The 90s: data warehouses and data marts
     Tightly coupled architecture with limited flexibility.
     [Diagram labels: Data sources; ETL; Data warehouse (processing; store and compute); Analytics (reporting, data mining, querying)]
  19. The 2000s to the present: data lakes
     Separation of storage and compute enables flexibility in tools.
     [Diagram labels: Data sources; Data lake storage; MapReduce; ELT; Data warehouse; ML/AI engine; Search system; Reporting and analytics]
  21. The Now: data rivers
     Streaming architectures enable faster decision cycles.
     [Diagram labels: Data sources; Message bus; Data lake; Streaming OLAP]
  22. The problem
  23. The problem
  24. Typical Big Data++ Challenges
     • Scale: when data is large, we need a lot of servers
     • Speed: aiming for sub-second response time
     • Complexity: too much fine-grained detail to precompute
     • High dimensionality: tens or hundreds of dimensions
     • Concurrency: many users and tenants
     • Freshness: load from streams
  25. What were the options?
     • Search platform: real-time ingestion, flexible schema, full-text search
     • OLAP: batch ingestion, efficient storage, fast analytic queries
     • Timeseries database: optimized storage for time-based datasets, time-based functions
  26. (Same three options, combined)
     • Search platform: real-time ingestion, flexible schema, full-text search
     • OLAP: batch ingestion, efficient storage, fast analytic queries
     • Timeseries database: optimized storage for time-based datasets, time-based functions
     At the intersection: high-performance analytics database for event-driven data
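One way to picture "efficient storage" plus "time-based functions" for event data is time-bucketed aggregation. Druid offers ingestion-time rollup, which collapses rows sharing the same truncated timestamp and dimension values into one row of aggregated metrics. The Python sketch below mimics that idea with one-minute buckets; it is an illustration of the concept, not Druid's implementation, and the field names are hypothetical. It assumes ISO 8601 timestamps that carry a zone (such as a trailing Z).

```python
from collections import defaultdict
from datetime import datetime, timezone


def truncate_to_minute(ts: str) -> str:
    """Floor a zone-aware ISO 8601 timestamp to the start of its minute, in UTC."""
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)
    return dt.replace(second=0, microsecond=0).strftime("%Y-%m-%dT%H:%M:%SZ")


def rollup(events, dimensions):
    """Group events by (minute, dimension values); count rows and sum bytes.

    Mirrors the idea of rollup with a one-minute query granularity.
    """
    acc = defaultdict(lambda: {"count": 0, "bytes": 0})
    for ev in events:
        key = (truncate_to_minute(ev["timestamp"]),) + tuple(ev[d] for d in dimensions)
        acc[key]["count"] += 1
        acc[key]["bytes"] += ev["bytes"]
    rows = []
    for key, metrics in sorted(acc.items()):
        # "__time" is the name Druid uses for its primary timestamp column.
        rows.append({"__time": key[0], **dict(zip(dimensions, key[1:])), **metrics})
    return rows
```

The trade-off is that rows within a bucket can no longer be told apart, so rollup suits dashboards and aggregates; detail that must stay queryable has to remain a dimension.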
  28. These guys have played a Druid…
     + many more!
     Source: http://druid.io/druid-powered.html and imply.io
  29. Gratuitous Customer Quote
     From Yahoo: “The performance is great ... some of the tables that we have internally in Druid have billions and billions of events in them, and we’re scanning them in under a second.”
     Source: https://www.infoworld.com/article/2949168/hadoop/yahoo-struts-its-hadoop-stuff.html
  30. Shall we take a look?
  31. Network analytics pipeline
     Streaming architectures are true-to-life and enable faster decision cycles.
     [Same diagram as slide 8]
  32. Submitting the Kafka ingestion supervisor spec to Druid:
     curl -X POST -H 'Content-Type: application/json' \
       -d @supervisor-spec.json \
       http://localhost:8090/druid/indexer/v1/supervisor
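The slide does not show the contents of supervisor-spec.json. The Python sketch below builds one plausible minimal Kafka supervisor spec and prepares the same POST the curl command sends to the Overlord. It assumes the spec layout of recent Druid releases (timestampSpec and dimensionsSpec directly in dataSchema, inputFormat in ioConfig); Druid versions from 2019 used a parser/parseSpec block instead, so check the documentation for your version. The topic, datasource, column names and broker address are hypothetical.

```python
import json
import urllib.request


def build_supervisor_spec(topic, datasource, bootstrap_servers):
    """Build a minimal Kafka ingestion supervisor spec as a dict.

    Column names are hypothetical and must match the JSON fields in the topic.
    """
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {
                    "dimensions": ["src_ip", "dst_ip", "src_port", "dst_port", "protocol"]
                },
                "metricsSpec": [
                    {"type": "count", "name": "count"},
                    {"type": "longSum", "name": "bytes", "fieldName": "bytes"},
                    {"type": "longSum", "name": "packets", "fieldName": "packets"},
                ],
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "MINUTE",
                    "rollup": True,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


def submit_request(spec, overlord="http://localhost:8090"):
    """Prepare (but do not send) the POST that the curl command performs."""
    return urllib.request.Request(
        overlord + "/druid/indexer/v1/supervisor",
        data=json.dumps(spec).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To reproduce the slide's workflow, write json.dumps(build_supervisor_spec("netflow", "netflow", "localhost:9092"), indent=2) to supervisor-spec.json and run the curl command, or pass submit_request(spec) to urllib.request.urlopen against a running Overlord.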
  34. Use Cases
  35. Use Case: Network troubleshooting
  36. Use Case: Network troubleshooting
     • Dashboards that include logs, flow and SNMP (a single pane of glass) for quick cross-dataset visualizations.
     • Visualize spikes and dips and easily filter on specific data.
     • Enhance the data to show names rather than IP/MAC addresses, while still getting the IPs when you need them.
     • Dashboards showing the most common areas of interest.
     • Alerting notifications for threshold breaches or deviations from normal.
     • Is it the network or the application? Enhanced datasets provide quick answers.
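The alerting bullet distinguishes two triggers: a fixed threshold breach and a deviation from normal. A minimal Python sketch of both, assuming "normal" is approximated by the mean and standard deviation of recent values (a simple z-score rule; the defaults are hypothetical, and real alerting tools offer more robust baselines):

```python
import statistics


def check_alert(history, current, threshold=None, z_limit=3.0):
    """Return the reasons, if any, to alert on the current value.

    threshold: optional fixed upper bound ("threshold breach").
    z_limit: how many standard deviations from the historical mean
             counts as a "deviation from normal" (hypothetical default).
    """
    reasons = []
    if threshold is not None and current > threshold:
        reasons.append("threshold")
    if len(history) >= 2:  # stdev needs at least two samples
        mean = statistics.mean(history)
        stdev = statistics.stdev(history)
        if stdev > 0 and abs(current - mean) / stdev > z_limit:
            reasons.append("deviation")
    return reasons
```

The deviation rule catches unusual dips as well as spikes, which a single upper threshold cannot.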
  37. Use Case: DDoS and security
     • Visualize spikes and dips and easily filter on specific data (geo, attack vectors, known bad actors).
     • DDoS-specific alerting (UDP bad ports, TCP flags, number of unique IPs, overall increase).
     • Hooks to multiple notification channels for always-on notifications.
     • Webhooks for integration with back-office systems.
     • Easily drill down into…
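The DDoS alerting signals on this slide (UDP bad ports, TCP flags, number of unique IPs, overall increase) can be computed per destination over a window of flow records. The Python sketch below is illustrative only: the "bad port" set (common UDP reflection sources such as DNS, NTP, SSDP and memcached), the SYN-without-ACK rule and the field names are assumptions, not a complete detector.

```python
from collections import defaultdict

TCP_SYN, TCP_ACK = 0x02, 0x10
# Hypothetical list of UDP source ports often abused for reflection/amplification
# (DNS 53, NTP 123, SSDP 1900, memcached 11211); tune for your own network.
UDP_BAD_PORTS = {53, 123, 1900, 11211}


def ddos_signals(flows, unique_src_limit=1000):
    """Summarize per-destination DDoS indicators for one time window of flows."""
    per_dst = defaultdict(lambda: {"srcs": set(), "syn_only": 0, "udp_bad": 0, "bytes": 0})
    for f in flows:
        d = per_dst[f["dst_ip"]]
        d["srcs"].add(f["src_ip"])
        d["bytes"] += f["bytes"]
        # TCP flows with SYN set but no ACK suggest a SYN flood.
        if f["protocol"] == 6 and f["tcp_flags"] & TCP_SYN and not f["tcp_flags"] & TCP_ACK:
            d["syn_only"] += 1
        # UDP traffic arriving from a reflection-prone source port.
        if f["protocol"] == 17 and f["src_port"] in UDP_BAD_PORTS:
            d["udp_bad"] += 1
    report = {}
    for dst, d in per_dst.items():
        report[dst] = {
            "unique_srcs": len(d["srcs"]),
            "syn_only_flows": d["syn_only"],
            "udp_bad_port_flows": d["udp_bad"],
            "bytes": d["bytes"],
            "too_many_sources": len(d["srcs"]) > unique_src_limit,
        }
    return report
```

Comparing each window's report against earlier windows gives the "overall increase" signal, and the same per-destination counts can feed the threshold and deviation checks used for general alerting.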
  38. Use Case: BGP Analytics
     • PMACCT can collect and add BGP information by peering with a BGP speaker.
     • Use KSQL or Kafka Streams to augment the data with BGP information.
     • Visualize the BGP AS_PATH (where your traffic is going across the Internet).
     • Who are your top transit or peering partners?
     • Top source and destination ASNs.
     • Top BGP communities.
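Once flows carry BGP attributes, the "top" questions on this slide reduce to ranking by traffic volume. A minimal Python sketch, assuming each flow has hypothetical src_as, dst_as and as_path fields (as_path as a list of ASNs, nearest neighbor first). Because BGP prepends ASNs, the first AS in the path is the neighbor your traffic leaves through, which is a reasonable proxy for the transit or peering partner.

```python
from collections import Counter


def top_asns(flows, n=3):
    """Rank source ASNs, destination ASNs and neighbor ASNs by bytes.

    Field names are hypothetical, not PMACCT's actual output schema.
    """
    src, dst, neighbor = Counter(), Counter(), Counter()
    for f in flows:
        src[f["src_as"]] += f["bytes"]
        dst[f["dst_as"]] += f["bytes"]
        if f["as_path"]:
            # Leftmost AS in the path is the directly connected neighbor.
            neighbor[f["as_path"][0]] += f["bytes"]
    return {
        "top_src_as": src.most_common(n),
        "top_dst_as": dst.most_common(n),
        "top_transit_or_peer": neighbor.most_common(n),
    }
```

In a Druid-backed setup the same ranking would normally be a group-by query over the ingested data rather than client-side code; the sketch only shows the logic.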
  39. Download
     • Druid community site (current): http://druid.io/
     • Druid community site (new): https://druid.apache.org/
     • Imply distribution: https://imply.io/get-started
  40. Contribute: https://github.com/apache/druid
  41. Stay in touch
     • Follow the Druid project on Twitter: @druidio
     • Join the community! http://druid.io/community
     • Come by our booth for a Druid t-shirt and to learn more!
  42. Thank you! Hold for applause…
