
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic Databases




3 Things to Learn About:

* How Apache Kudu enables users to do more than ever before with their Analytic and Operational Databases
* How Cloudera has built two versatile databases to help our customers tackle their hardest problems
* How the addition of Apache Kudu to this mix will enable new use cases around real-time analytics, internet of things, time series data, and more

Published in: Software




  1. © Cloudera, Inc. All rights reserved. Apache Kudu Webinar Series: Extending the Capabilities of Cloudera’s Operational and Analytic Databases. Alex Gutow & Ryan Lippert | Cloudera
  2. Kudu Webinar Series
     Part 1: Lambda Architectures – Simplified by Apache Kudu. A look into the potential trouble involved with a lambda architecture, and how Apache Kudu can dramatically simplify real-time analytics.
     Part 2: Extending the Capabilities of Operational and Analytical Databases. An examination of how Apache Kudu expands the set of use cases that Cloudera’s Operational and Analytical databases can handle.
     Part 3: Data-in-Motion: Unlock the Value of Real-Time Data. Forrester will discuss their research into real-time data pipelines and analytics, and Cloudera will discuss how
     https://www.cloudera.com/about-cloudera/events/webinars/kudu-webinar-series.html
  3. Updateable Analytic Storage: Simple real-time analytics and updates with Apache Kudu
     Kudu: Storage for fast analytics on fast data
     • Simplified architecture for building real-time analytic applications
     • Designed for next-generation hardware for faster analytic performance across frameworks
     • Native Hadoop storage engine
     Flexibility for the right tools for the right use case in one platform
     • Only analytic database for big data with Kudu + Impala
     • Simple real-time applications with Kudu + Spark
     Use cases
     • Time series data
     • Machine data analytics
     • Online reporting
     [Diagram: Cloudera platform stack. Integrate: structured (Sqoop), unstructured (Kafka, Flume). Process, analyze, serve: batch (Spark, Hive, Pig, MapReduce), stream (Spark), SQL (Impala), search (Solr), other (Kite). Unified services: resource management (YARN), security (Sentry, RecordService). Store: filesystem (HDFS), relational (Kudu), NoSQL (HBase), other (object store).]
  4. Filling the Analytic Gap
     Modern analytic applications often require complex data flow & difficult integration work to move data between HBase & HDFS. Kudu fills the gap.
     [Diagram: storage engines plotted by Pace of Data against Pace of Analysis, with axis labels Unchanging, Fast Changing / Frequent Updates, Append-Only, and Real-Time. Regions: HDFS (fast scans, analytics and processing of stored data); HBase (fast on-line updates & data serving); arbitrary storage (active archive); and the Analytic Gap, filled by Kudu (fast analytics on fast-changing or frequently updated data).]
  5. Better Together: Kudu Benefits from Integration with the Apache Ecosystem
     Spark – Stream Processing for Kudu
     • Open standard for real-time stream processing
     • Effective for automating decision processes and machine learning
     • Use cases include: time series data & machine data analytics
     Impala – High-Performance BI & SQL for Kudu
     • Open standard for interactive SQL queries
     • Powers analytic database workloads with flexibility, scale, and open architecture
     • Use cases include: time series data & online reporting
  6. Apache Kudu Availability
     • Operational Database: Data-driven applications to deliver real-time insights.
     • Analytic Database: Explore, analyze, and understand all your data.
     • Data Engineering: Process data, develop and serve predictive models.
  7. Kudu for the Operational Database: Expand Addressable Use Cases
  8. Operational Database
     • Web-Scale Data Depot: Durable, low-latency storage for web applications, message stores, and mission-critical operational activities.
     • Complex Event Processing: Identifying meaningful events based on multiple data streams and taking action.
     • Model Scoring/Serving: Use data and current/past events to score and serve the likelihood of subsequent events.
  9. Cloudera’s Operational Database
     Storage
     • HBase: Fast/random reads and writes via a high-performance, distributed NoSQL data store
     • Kudu: Fast analytics on fast data with a relational structure
     Processing/Exploration
     • Cloudera Search: Faceted, text-based search for data exploration and democratization
     • Spark: Powerful and flexible processing, streaming, and SQL
     Unique Components
     • Multi-Storage, Multi-Environment
     • Encryption, Key Trustee (Navigator)
     • Storage & Governance
  10. Kudu Keeps Your Business Operational: Machine Data Analytics
      Workload: Inserts, scans, lookups
      Real-time data inserts, combined with the ability to analyze trends, help identify potential problems. Kudu identifies trouble through:
      • Unlimited storage, yielding better historic trend analysis
      • Fast inserts to enable an up-to-date network view
      • Fast scans to identify/flag undesired states for remedy
      Examples: Network threat detection; network health monitoring; application performance monitoring
  11. Kudu Increases the Value of Time Series Data
      Workload: Inserts, updates, scans, lookups
      Time series data is most valuable if you can analyze it to change outcomes in real time. Kudu simultaneously enables:
      • Time series data inserted/updated as it arrives
      • Analytic scans to find trends on fresh time series data
      • Lookups to quickly visit the point in time where an event occurred
      Examples: Streaming market data; IoT; fraud detection & prevention; risk monitoring; connected cars
  12. Operational DB: Real-Time Architecture. Driving the Model Through Machine Learning
      [Diagram: Data Sources, Kafka, Spark Streaming, Kudu, Spark MLlib, Application; further labels: Individual Session, Full Model/Learning, Genesis, Spark.]
      Steps: 1 Event Occurs; 2 Messaging; 3 Stream Processing; 4 Land in RDBMS; 5 Apply ML Libraries
  13. Operational DB: Real-Time Architecture. MLlib & K-Means: Defining Microsegments via Machine Learning
      [Figure: four Height vs. Weight scatter plots (panels 1 to 4), with cluster labels S, M, L; then XS, S, M, L, XL; then "Near Custom ?".]
  14. Operational DB: Real-Time Architecture. Determining the Next Best Action
      [Diagram: Data Sources, Kafka, Spark Streaming, Kudu, Spark MLlib, Application; further labels: Individual Session, Full Model/Learning, Genesis, Spark.]
      Steps: 1 Data Processed; 2 Request Processed/Kudu Queried; 3 Results Returned; 4 Results Processed; 5 Processed Data Returned
  15. Operational DB: Real-Time Architecture. Determining the Next Best Action
      Step 1: Data Processed. Apache Spark processes the data from the event (IoT, clickstream, markets, etc.), which potentially involves keeping a running list of the last X number of events.
      Step 2: Request Processed/Kudu Queried. A Spark application uses the data gathered in step 1 to query Kudu’s database in a predefined manner to look for similar patterns defined via machine learning.
      Step 3: Kudu Results Returned. Kudu returns the results from the query in step 2 back to Spark to determine what needs to be returned to the application.
      Step 4: Results Processed. Spark associates the results from Kudu with the information stored from the current event to determine the next step to feed back to the application.
      Step 5: Processed Data Returned. The machine-generated, best possible outcome is prescribed and served to the application.
  16. Operational DB: Cybersecurity Use Case. Discovering APT in Your Network
      [Diagram: Data Sources, Kafka, Spark Streaming, Kudu, Spark MLlib, Application; further labels: Individual Session, Rogue User, Spark, Full Model/Learning.]
      • Data request sent for stream processing
      • Data cleaned/ordered/processed, then delivered to Kudu for modelling
      • Access verified, initial data delivered, subsequent requests aggregated and compared to standard user/role behavior
      Illustrative; models will likely have >2 dimensions.
  17. Kudu for the Analytic Database: Enabling Real-Time Updates
  18. Analytic Database
      • Self-Service BI & Data: More data of all types is being tapped for analytics, across environments.
      • Real-Time Analysis: Open up new possibilities for real-time insights as data changes.
      • Converged Workloads: BI & analytics are critical but only tell part of the story. Get more value by sharing data across workloads.
  19. Cloudera’s Analytic Database
      • Navigator Optimizer: Identify, offload, & optimize workloads to Hadoop
      • Hue: Intelligent SQL editor
      • Navigator: Audit, lineage, encryption, key management, & policy lifecycles
      • BI Partners: Integration with the leading BI tools
      • Impala: Interactive query engine for BI & SQL analytics
      • Hive-on-Spark: Large-scale ETL & batch processing engine
      • Kudu: Data storage for fast & changing data
      • Multi-Storage, Multi-Environment
  20. Anatomy of an Analytic Database: Cloudera Decoupled by Design
      • Monolithic Analytic Database: Query Engine, Storage Engine, Catalog
      • Modern Analytic Database: Query Engine (Impala), Catalog (HMS), Storage (Kudu), Storage (S3), Storage (HDFS)
  21. Key Benefits: An analytic database designed for Hadoop
      • High-Performance BI and SQL Analytics
      • Flexibility for Data and Use Case Variety
      • Cost-effective Scale for Today and Tomorrow
      • Go Beyond SQL with an Open Architecture
  22. Handle Time Series Data in Real-Time: Real-time analytics on live data
      Workload: Inserts, updates, scans, lookups
      Enables:
      • Time series data inserted/updated on arrival
      • Analytic scans to find trends on fresh data
      • Point-in-time lookups to quickly find where an event occurred
      Examples: Streaming market data, IoT, fraud detection & prevention, risk monitoring, connected cars
  23. More Versatility in Online Reporting: Remove the limits of online reporting
      Workload: Inserts, updates, scans, lookups
      Enables:
      • Always-on, unlimited storage, eliminating archival needs
      • Fast inserts/updates to keep data fresh
      • Fast lookups and analytic scans with one data store
      Examples: Operational data store
  24. Fast Analytics with Updates (pre-Kudu): Complexity & Latency
      [Diagram labels: Incoming Data (Messaging System), HBase, New Partition, Most Recent Partition, Historic Data, Parquet File, Reporting Request, Impala on HDFS.]
      Have we accumulated enough data? If so, reorganize the HBase file into Parquet:
      • Wait for running operations to complete
      • Define new Impala partition referencing the newly written Parquet file
      Considerations:
      • How do I handle failure during this process?
      • How often do I reorganize incoming streaming data into a format appropriate for reporting?
      • When reporting, how do I see data that has not yet been reorganized?
      • How do I ensure that important jobs aren’t interrupted by maintenance?
  25. Real-Time Analytics Today (with Kudu): Simpler Architecture, Superior Performance
      [Diagram: Incoming Data (Messaging System) and Reporting Request both go to Impala on Kudu.]
  26. Credit Card Processing System (Lower Business Risks)
      Re-architected system to meet critical latency requirements for fraud detection
      • Would have been cost-prohibitive & slower with legacy system
      • Single platform for real-time (RT) fraud detection alerts and near-real-time (NRT) executive monitoring dashboards
      • Achieved <2s response time for SQL queries
  27. Healthcare Services Provider
      Needed simplified system for more and faster analysis
      • Understand trends better with more data & detect/respond to anomalies faster
      • Single platform for both analytics and operational reporting
      • Met compliance requirements
  28. Demo
  29. Connected Car Demo Architecture
      [Diagram components: Data Generator, Kafka, Spark Streaming, Kafka, Kudu, Impala.]
      Record fields:
      • Time
      • VIN
      • Miles
      • xAccel
      • yAccel
      • zAccel
      • Speed
      • Brakes
      • LaneDeparture
      • Signal
      • CollisionDetected
      • HazardDetected
      • Latitude
      • Longitude
  30. Data is Transforming Business
      • Drive Customer Insights
      • Improve Product & Services Efficiency
      • Lower Business Risks
      Modern data architecture: Data Science & Engineering, Analytic Database, Operational Database
  31. Next Steps
      Join us on March 8th to learn more about data in motion: https://www.cloudera.com/about-cloudera/events/webinars/kudu-webinar-series.html
      Get Started with Kudu & Cloudera
      • www.cloudera.com/downloads
      • https://blog.cloudera.com/?s=kudu
      Start Contributing to Kudu
      • http://kudu.apache.org/
  32. Thank you. See you on March 8th!
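
Slide 13 names K-Means as the way microsegments are defined from height and weight. The slides use Spark MLlib for this; the sketch below swaps in a plain-Python version of Lloyd's algorithm so the idea can be run without a cluster. The sample measurements, the function names `kmeans` and `assign`, and the choice to seed centroids with the first `k` points are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, k, iterations=20):
    """Lloyd's algorithm; the first k points seed the centroids (an assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i, members in enumerate(clusters):
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[i])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids


def assign(point, centroids):
    """Return the index of the microsegment whose centroid is closest to point."""
    return min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))


# Hypothetical (height cm, weight kg) measurements.
people = [(160, 55), (185, 90), (162, 58), (183, 88), (158, 52), (188, 95)]
segments = kmeans(people, k=2)
```

Raising `k` splits the same data into finer segments, which is the progression the figure on slide 13 suggests (from S/M/L toward near-custom sizing).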
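
The five steps on slide 15 can be sketched as a small loop. This is not Spark or Kudu code: a `deque` stands in for the running list of the last X events, and a dictionary named `PATTERN_TABLE` stands in for a Kudu table of patterns learned offline. The event names, actions, and the `Session` class are hypothetical.

```python
from collections import deque

# Hypothetical stand-in for a Kudu table: learned event pattern -> recommended action.
PATTERN_TABLE = {
    ("view", "view", "cart"): "offer_discount",
    ("cart", "cart", "exit"): "send_reminder",
}


class Session:
    def __init__(self, window=3, lookup=PATTERN_TABLE.get):
        self.recent = deque(maxlen=window)  # running list of the last X events
        self.lookup = lookup                # stand-in for the predefined Kudu query

    def next_best_action(self, event):
        # Step 1: process the incoming event and update the running window.
        self.recent.append(event)
        # Step 2: build the predefined query from the window and send it.
        pattern = tuple(self.recent)
        # Step 3: the store returns a match, or None when no pattern fits.
        match = self.lookup(pattern)
        # Step 4: combine the result with the current event.
        action = match if match is not None else "no_action"
        # Step 5: return the prescribed outcome to the application.
        return {"event": event, "action": action}


session = Session()
results = [session.next_best_action(e) for e in ["view", "view", "cart", "cart"]]
```

Passing `lookup` as a parameter keeps the session logic independent of the store, so a real client query could replace the dictionary without touching the five steps.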
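
Slide 24's pre-Kudu flow has two stores and a reorganization step between them, which is where its questions about failure and unseen data come from. The following sketch models that flow with in-memory lists only: `recent` stands in for the HBase staging data and `partitions` for the Parquet files behind Impala partitions. The class name, the threshold, and the row values are assumptions.

```python
class PreKuduPipeline:
    def __init__(self, flush_threshold=3):
        self.recent = []       # stand-in for rows staged in HBase
        self.partitions = []   # stand-in for Parquet files, one per Impala partition
        self.flush_threshold = flush_threshold

    def ingest(self, row):
        self.recent.append(row)
        # "Have we accumulated enough data?"
        if len(self.recent) >= self.flush_threshold:
            self.reorganize()

    def reorganize(self):
        # Write the staged rows out as a new immutable partition, then clear staging.
        self.partitions.append(tuple(self.recent))
        self.recent = []

    def report(self):
        # Reporting must read both stores, or rows not yet reorganized are invisible.
        historic = [row for part in self.partitions for row in part]
        return historic + self.recent


pipeline = PreKuduPipeline(flush_threshold=3)
for reading in range(4):
    pipeline.ingest(reading)
```

Every method here beyond `ingest` exists only because the data lives in two places; slide 25's single "Impala on Kudu" box is the architecture in which that bookkeeping is no longer the application's job.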
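
The record fields listed on slide 29 can be written down as a typed message. The field names below follow the slide; every type, the millisecond unit for `time`, the JSON encoding, and the `(vin, time)` key are assumptions, since the deck does not state them.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class CarEvent:
    # Names follow slide 29; all types are assumed for illustration.
    time: int                 # epoch milliseconds (assumed unit)
    vin: str
    miles: float
    x_accel: float
    y_accel: float
    z_accel: float
    speed: float
    brakes: bool
    lane_departure: bool
    signal: bool
    collision_detected: bool
    hazard_detected: bool
    latitude: float
    longitude: float

    def key(self):
        """Hypothetical row key: one row per vehicle per timestamp."""
        return (self.vin, self.time)

    def to_message(self) -> bytes:
        """Encode as JSON bytes, e.g. for a message queue payload."""
        return json.dumps(asdict(self)).encode("utf-8")

    @classmethod
    def from_message(cls, payload: bytes) -> "CarEvent":
        """Decode a payload produced by to_message."""
        return cls(**json.loads(payload.decode("utf-8")))


sample = CarEvent(
    time=1000, vin="VIN123", miles=42.5,
    x_accel=0.1, y_accel=-0.2, z_accel=9.8, speed=61.0,
    brakes=False, lane_departure=False, signal=True,
    collision_detected=False, hazard_detected=False,
    latitude=37.4, longitude=-122.1,
)
decoded = CarEvent.from_message(sample.to_message())
```

A key of vehicle plus timestamp suits the time series workload from slides 11 and 22: a repeated reading for the same car and instant maps to the same row, so it can be treated as an update rather than a duplicate.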
