Hard Metrics
- Latency: < 40 ms, ideally < 16 ms
- Throughput: 2,000 events/second
- Durability: no loss; every message gets exactly one response
- Availability: 99.5% uptime (1.83 days of downtime/year); ideally 99.999% uptime (5.26 minutes of downtime/year)
- Scalability: can add resources and still meet the latency requirements
- Integration: transparently connected to existing systems – hardware, messaging, HDFS

Soft Metrics
- Open source: all components licensed as open source
- Extensibility: rules can be updated; the model is regularly refreshed
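The availability targets above translate directly into annual downtime budgets; a quick sketch of the arithmetic (the leap-adjusted 365.25-day year is an assumption):

```python
# Convert an availability target into an annual downtime budget.
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumes an average (leap-adjusted) year

def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of allowed downtime per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR

print(downtime_minutes_per_year(0.995) / (24 * 60))  # ~1.83 days for 99.5%
print(downtime_minutes_per_year(0.99999))            # ~5.26 minutes for "five nines"
```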
Durability
• Two physically independent pipelines on the same cluster processing identical data
• For each tuple, we take the best-case time across the two pipelines
– 39 records out of 5.2M exceeded 16 ms in both pipelines
– 173 out of 5.2M exceeded 16 ms in one pipeline but succeeded in the other
• 99.99925% success rate – “five nines”
• Average latency of 0.0981 ms
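The dual-pipeline measurement can be sketched as taking, per tuple, the minimum latency across the two pipelines and scoring it against the 16 ms budget (the latency lists here are hypothetical, not the production data):

```python
# Per-tuple best-case latency across two independent pipelines:
# a tuple succeeds if EITHER pipeline answers within the SLA.
SLA_MS = 16.0

def success_rate(lat_a, lat_b, sla_ms=SLA_MS):
    """Fraction of tuples whose best-case latency meets the SLA."""
    best = [min(a, b) for a, b in zip(lat_a, lat_b)]
    ok = sum(1 for t in best if t <= sla_ms)
    return ok / len(best)

# Hypothetical per-tuple latencies (ms) for the same four tuples:
pipeline_a = [0.09, 18.0, 0.10, 25.0]
pipeline_b = [0.08, 0.11, 17.5, 30.0]
print(success_rate(pipeline_a, pipeline_b))  # 0.75: only the last tuple missed in both
```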
Talk about the next-generation real-time decisioning system
Batch systems
Scaling out is challenging, but there are no strict time requirements
Cannot operate on one piece of data at a time
Large latency
Mainframes
High performance systems
Highly specialized
Very expensive, but extremely durable
Open source
Customize code
Grow project organically
Not going to get the same level of performance
Availability is crucial
Downtime measures what fraction of the time the system will be unavailable
Measured in “nines” – five nines is about 5.26 minutes of downtime per year
Free
More flexible
Can build yourself
Future-proof design
What we do today
A credit card is swiped; we make a decision on it based on a model
A bunch of inputs produces a decision
Executed on a mainframe
Simple features
Aggregate features
Need a notion of state
Need an in-memory DB, due to the 100 ms disk latency
Advanced features
Joining multiple sources
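Aggregate features need per-card state held in memory; a minimal sketch of such state (the feature names and the one-hour window are assumptions, not the production schema):

```python
# In-memory per-card state for aggregate features, e.g. a rolling
# count/sum of transactions within a time window.
from collections import defaultdict, deque

WINDOW_S = 3600  # hypothetical 1-hour window

class CardAggregates:
    def __init__(self, window_s=WINDOW_S):
        self.window_s = window_s
        self.events = defaultdict(deque)  # card_id -> deque of (ts, amount)

    def update(self, card_id, ts, amount):
        q = self.events[card_id]
        q.append((ts, amount))
        # Evict events that fell out of the window.
        while q and q[0][0] <= ts - self.window_s:
            q.popleft()
        # Aggregate features derived from in-memory state only.
        return {"txn_count": len(q), "txn_sum": sum(a for _, a in q)}

agg = CardAggregates()
agg.update("card-1", ts=100, amount=20.0)
print(agg.update("card-1", ts=200, amount=5.0))  # {'txn_count': 2, 'txn_sum': 25.0}
```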
Current situation
- Proprietary solution, slow refresh, costs a lot of money
- Data is stale, models are stale, quality of decisions is low, throwing money away
A swamp – we don’t own anything. Mordor is dark and terrifying. We’re just throwing money at the swamp.
- How do we get to a better place?
Prove that whatever we are doing is valid for the use case – we have to convince business stakeholders that we have the right solution.
- Need rigor
Open source
Keep options open; don’t get stuck with a proprietary solution
Take our learnings and give them back to the community
Future-proof what we do
InfoSphere – built for the NSA; handles video and audio traffic on a global scale
Hazelcast – GridGain / Ignite
Onyx – streaming platform built in Clojure
Feedzai – proprietary Storm/Cassandra stack
Performance – durability, availability, latency
Enterprise-ready – a measure of confidence
Roadmap – private conversations with the folks behind the platforms
Community – how vibrant the community is, how broad the adoption is
We chose Apex, and I’ll go ahead and explain why
Not true streaming – micro-batching
100s of ms of latency are unavoidable, even with a small window size
Great for offline ETL, and we use it for computing some of our slow-moving features
Spark Streaming is a nonstarter due to micro-batching and the lack of dynamic DAG reconfiguration
Failure detection through acks, which cannot be configured lower than 1 second
Replay from source: for durability, you need to have separate applications
Resource sharing is not great on Storm, which is why Twitter built Heron
Fast
Easy to use
Mature
***********
Failures are not independent, Nimbus, no dynamic topologies, 1-second ack, resource usage
Community stagnating – only Hortonworks
Roadmap: still working to bring it to the enterprise (integration with YARN, elastic topologies, high availability)
How does Flink handle failures? It injects barriers/markers into the stream and checkpoints against them to disk
Replay from the last global checkpoint – still need to load from disk
Resetting the whole pipeline on failure means we can’t have truly independent pipelines
Baidu is running a 1000-node cluster
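The barrier idea can be illustrated with a toy single-operator simulation: state is snapshotted whenever a barrier arrives, and on failure the operator restores the last snapshot and replays everything after that barrier. This is an illustrative sketch, not Flink code:

```python
# Toy barrier-based checkpointing: BARRIER markers in the stream trigger
# a state snapshot; recovery restores the last snapshot and replays the
# events that came after the last barrier.
BARRIER = object()

def run(stream, crash_at=None):
    total = 0                      # operator state: a running sum
    snapshot, replay_from = 0, 0   # last checkpoint and its stream position
    for i, item in enumerate(stream):
        if crash_at is not None and i == crash_at:
            # Failure: reset to the last global checkpoint...
            total = snapshot
            # ...and replay from the source (still a disk/source reload).
            for later in stream[replay_from:]:
                if later is not BARRIER:
                    total += later
            return total
        if item is BARRIER:
            snapshot, replay_from = total, i + 1
        else:
            total += item
    return total

stream = [1, 2, BARRIER, 3, 4, BARRIER, 5]
assert run(stream) == run(stream, crash_at=4)  # same result after recovery
```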
Easy to Use
Fast
Support for SQL-like queries
----- Meeting Notes (10/27/15 10:14) -----
Reset to upstream data source, Shared JVM, No dynamic topologies
Young community
Roadmap – fine grained fault tolerance, in-memory store integration, off-heap memory, full SQL
Brings us to Apex
Veterans from Yahoo! Finance and Hadoop
Built for enterprise stability and durability before performance
*****************
Phu Hoang – CEO and co-founder; was head of engineering at Yahoo!
Amol Kekre – led Yahoo! Finance and YARN
Chetan – lead architect from Yahoo! Finance
Thomas Weise – Hadoop veteran from Yahoo!
Dynamically modify topologies on a live system without affecting performance
Also supports dynamic partitioning when data gets skewed
Fine-grained control of locality for streams
thread, container, node, rack
Can define both affinity and anti-affinity
Independent pipelines deployed in the same application. The buffer server offers fine-grained state persistence, so the whole topology doesn’t need to be reset
Different pieces of data (partitions) go through the same logical operator by mapping it to physical instances of the operator
Independence of partitions
Auto-scaling (for both throughput and latency)
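Mapping a logical operator to physical partitions typically hashes a key so that each partition owns a disjoint slice of the cards and needs no shared state; a minimal sketch (hash-mod routing and the partition count are assumptions about the partitioner):

```python
# Route tuples to physical instances of one logical operator by key,
# so partitions stay independent (no shared state between them).
import hashlib

NUM_PARTITIONS = 4  # hypothetical number of physical instances

def partition_for(card_id: str, n: int = NUM_PARTITIONS) -> int:
    """Stable hash of the key -> physical partition index."""
    digest = hashlib.sha256(card_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n

# Every tuple for the same card lands on the same partition.
assert partition_for("card-42") == partition_for("card-42")
print({c: partition_for(c) for c in ["card-1", "card-2", "card-3"]})
```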
10-node cluster
- If either pipeline responds within 16 ms, we have won