Apache Kafka With Spark Structured Streaming With Emma Liu, Nitin Saksena, Ram Dhakne | Current 2022

  1. Kafka with Spark Structured Streaming and Beyond: Building Real-Time Data Processing and Analytics with Databricks. Speakers: Emma Liu, Staff Product Manager, Databricks; Nitin Saksena, Sr. Director of Omni Channel Architecture, Albertsons Companies; Ram Dhakne, Staff Solutions Engineer, Confluent. ©2022 Databricks Inc. All rights reserved.
  2. Databricks, the Lakehouse Company. Global adoption: over 7,000 customers, from F500 to unicorns. Inventor and pioneer of the data lakehouse. Gartner-recognized leader in both Database Management Systems and Data Science and Machine Learning Platforms. Creator of highly successful OSS data projects: Delta Lake, Apache Spark, and MLflow. Raised over $3B in investment. 3,000+ employees across the globe.
  3. Databricks Lakehouse Platform. Simple: unify your data warehousing and AI use cases on a single platform. Multicloud: one consistent data platform across clouds. Open: built on open source and open standards. Workloads: Data Warehousing, Data Engineering, Data Science and ML, Data Streaming. Unity Catalog: fine-grained governance for data and AI. Delta Lake: data reliability and performance. Cloud Data Lake: all structured and unstructured data.
  4. Databricks thrives within your modern data stack. Diagram labels: Data Governance, Data Warehousing, Data Engineering, Data Science and ML, Data Streaming, BI and Dashboards, Machine Learning, Data Science, Consulting & SI Partners, Data Pipelines, Data Ingestion, Unity Catalog, Delta Lake, Cloud Data Lake.
  5. Speaker: Benyue (Emma) Liu, Staff Product Manager, Data Ingestion @ Databricks. Previous experience: Product Manager at TigerGraph and MarkLogic; Software Engineer at Oracle. Domain focus: Cloud Computing, DBaaS, Multi-Model & Graph Databases, Developer/DBA Tools, Containerization. MS in the Technology and Policy Program at MIT; BS in General Engineering at Harvey Mudd College.
  6. Topics: Spark Structured Streaming; Project Lightspeed; Kafka and Spark Structured Streaming. Lakehouse Platform diagram labels: Data Warehousing, Data Engineering, Data Science and ML, Data Streaming, Developer Experience, Unified Governance and Security, Data Reliability and Performance, Cloud Data Lake (all structured and unstructured data).
  7. Streaming Data: continuously generated and unbounded data. Examples: machine and application logs, clickstreams, mobile and IoT data, DB change data feeds, application events. The vast majority of the data in the world is streaming data!
  8. Stream Processing. Traditional processing is one-off and bounded (1: data source, 2: processing). Stream processing is continuous and unbounded (data source, processing).
  9. Stream Processing in the Lakehouse. Data sources: Databases, Streaming Sources, Cloud Object Stores, SaaS Applications, NoSQL, On-premises Systems. Ingest: Auto Loader. Transform and improve quality: stream/batch processing with Spark, Structured Streaming (Photon). Serve / Share. Data consumers: BI Reporting, Dashboarding, Data Science, Machine Learning, Operational Apps. Orchestrate all processes, Monitor & Govern: Delta Live Tables, Jobs.
  10. Technical Advantages: a more intuitive way of capturing and processing continuous and unbounded data; lower latency for time-sensitive applications and use cases; better fault tolerance through checkpointing; higher compute utilization and scalability through continuous and incremental processing.
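The checkpointing and incremental-processing points on slide 10 can be illustrated with a small, Spark-free sketch. This is not how Structured Streaming is implemented; it is a minimal model, assuming a hypothetical `CheckpointedReader` class and a JSON file as the checkpoint, that shows why a restarted job resumes from its last committed offset instead of reprocessing everything.

```python
import json
import os
import tempfile
from typing import Callable, List


class CheckpointedReader:
    """Hypothetical helper: processes only records added since the last commit."""

    def __init__(self, checkpoint_path: str) -> None:
        self.checkpoint_path = checkpoint_path

    def _load_offset(self) -> int:
        # No checkpoint file yet means this is the first run: start at offset 0.
        if not os.path.exists(self.checkpoint_path):
            return 0
        with open(self.checkpoint_path) as f:
            return json.load(f)["next_offset"]

    def _commit_offset(self, offset: int) -> None:
        # Write to a temp file and rename, so a crash never leaves a half-written checkpoint.
        tmp_path = self.checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_offset": offset}, f)
        os.replace(tmp_path, self.checkpoint_path)

    def run_micro_batch(self, log: List[str], sink: Callable[[List[str]], None]) -> int:
        """Send unprocessed records to the sink, then commit the new offset."""
        start = self._load_offset()
        batch = log[start:]
        if batch:
            sink(batch)
            self._commit_offset(start + len(batch))
        return len(batch)


def demo() -> List[str]:
    """Simulates a restart: a fresh reader instance picks up where the last one stopped."""
    output: List[str] = []
    with tempfile.TemporaryDirectory() as tmp_dir:
        checkpoint = os.path.join(tmp_dir, "checkpoint.json")
        log = ["e1", "e2"]
        CheckpointedReader(checkpoint).run_micro_batch(log, output.extend)
        log.append("e3")  # new data arrives while the job is down
        CheckpointedReader(checkpoint).run_micro_batch(log, output.extend)
    return output
```

Because the offset is committed only after the sink call, a crash between the two steps replays the batch on restart, which gives at-least-once delivery in this sketch.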
  11. Business Benefits. BI and SQL Analytics: fresher and faster insights; quicker and better business decisions. Data Engineering: sooner availability of cleaned data; more business use cases. Data Science and ML: more frequent model updates and inference; better model efficacy. Event-Driven Applications: faster customized response and action; better and differentiated customer experience.
  12. Project Lightspeed
  13. Project Lightspeed: faster and simpler stream processing. Predictable Low Latency: target reduction in tail latency by up to 2x. Enhanced Functionality: advanced capabilities for processing data with new operators and easy-to-use APIs. Operations & Troubleshooting: simplifying deployment, operations, monitoring, and troubleshooting. Connectors & Ecosystem: improving ecosystem support for connectors and for authentication and authorization features.
  14. Apache Kafka & Apache Spark Structured Streaming
  15. Apache Kafka and Apache Spark Structured Streaming. Diagram labels: Apache Kafka, Apache Kafka Connector for Spark Structured Streaming, Structured Streaming. End-to-end open source pipeline.
  16. Apache Kafka Connector for Spark Structured Streaming. Diagram labels: Apache Kafka Connector for Spark Structured Streaming, Structured Streaming.
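For readers who want to see what the connector on slide 16 looks like in code, here is a hedged PySpark sketch that is not taken from the talk. The broker address, topic name, and paths are placeholders. The Delta sink assumes Delta Lake is available on the cluster, as it is on Databricks. `pyspark` is imported inside the function so the option builder can be used without a Spark installation.

```python
from typing import Dict


def kafka_source_options(bootstrap_servers: str, topic: str) -> Dict[str, str]:
    """Builds options for Spark's built-in `kafka` streaming source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,  # placeholder broker list
        "subscribe": topic,                            # placeholder topic name
        "startingOffsets": "earliest",                 # read the topic from the beginning on first start
    }


def start_kafka_to_delta(spark, options: Dict[str, str], checkpoint_dir: str, output_path: str):
    """Starts a streaming query that copies Kafka records into a Delta table.

    `spark` is an existing SparkSession; the Kafka connector package and
    Delta Lake are assumed to be on the classpath.
    """
    from pyspark.sql.functions import col  # lazy import: only needed when Spark is present

    raw = spark.readStream.format("kafka").options(**options).load()

    # The Kafka source exposes key and value as binary; cast them to strings.
    events = raw.select(
        col("key").cast("string").alias("key"),
        col("value").cast("string").alias("value"),
        col("timestamp"),
    )

    return (
        events.writeStream.format("delta")
        .option("checkpointLocation", checkpoint_dir)  # enables restart from the last committed offsets
        .outputMode("append")
        .start(output_path)
    )
```

A call might look like `start_kafka_to_delta(spark, kafka_source_options("broker:9092", "offers"), "/tmp/chk", "/tmp/offers_delta")`, where every argument is illustrative.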
  17. Apache Kafka and Databricks Auto Loader. Diagram labels: Cloud Storage (S3, ADLS, GCS), Auto Loader, Structured Streaming.
  18. Confluent Cloud: Databricks Delta Lake Sink Connector. Diagram labels: Delta Lake Sink Connector, Cloud Storage.
  19. Latency, Convenience, Cost: choose the right latency, convenience, and cost tradeoff for each specific use case.
  20. Supercharge Lakehouse with Streaming Data: Databricks and Confluent
  21. Albertsons
  22. Building Real-Time Data Processing and Real-Time Analytics. Nitin Saksena, Senior Director of Omni Channel Architecture, Albertsons Companies.
  23. ALBERTSONS COMPANIES. Albertsons Companies is one of the largest food and drug retailers in the country. Locally great, nationally strong.
  24. Our Purpose: to bring people together around the joys of food and to inspire well-being.
  25. Albertsons: locally great, nationally strong. Environment, Social and Governance (ESG) efforts are focused on Planet, People, Product, and Community.
  26. Nitin Saksena, Senior Director of Omni Channel Architecture, Albertsons. Nitin leads the Enterprise Architecture Team across eCommerce, Digital Shopping Experience, Marketing and Media Collective, Merchandising, and Pharmacy. He has 19+ years of industry experience, predominantly in retail. Nitin has architected several key initiatives in Loyalty, Offer Execution and Redemption, Order Management, and Customer Support. His area of interest is solving complex business problems with simple solutions.
  27. Technology Transformation and Strategic Priorities
  28. Technology Transformation, Key Tenets. High Performance Team: strengthen talent with an agile and high-energy global team. Omni-Channel Experiences: build easy, frictionless, and differentiated omni-channel experiences. Modern Technology Platform: create agile, scalable, and reliable technology with platform modernization. Intelligent Data: maximize value for our customers and employees with intelligent data. Productivity: leverage technology to drive productivity in the enterprise. Outcome: a strong technology foundation to support accelerated business growth.
  29. Technology Strategic Priorities DRIVERS Intelligent Agility Cognitive Realtime Connected Autonomous Foundation Modernization & Migration to Public Cloud Enterprise Data Platform Network Modernization InfoSecurity Global People & Process Optimization Competitive Digital & eCommerce Optimized Promotions Forecasting & Replenishment Run Great Stores Smart Stores Supply Chain of the Future Scaled & Differentiated eCommerce Strategic Data Health & Wellness Expansion Loyalty Programs Business Priorities EASY & FRICTIONLESS EXCITING & INNOVATIVE DIGITAL PENETRATION LONG TERM CUSTOMER RELATIONSHIPS
  30. Architecture Deep Dive
  31. Real-Time Data Processing using Databricks. Distributing offers and customer clips to each store in near real time: distribution of offers to the relevant stores; distribution of clips in near real time to the frequently shopped store; an improved store checkout experience. Real-time business reporting and dashboards: supply chain order forecast; warehouse order management and delivery; labor forecasting and management; demand planning; inventory management. Inventory updates from 2,200 stores to the cloud services in near real time: maintain visibility into the health of the stores; well-managed out-of-stock and substitution recommendations for digital customers. Ingesting transactions in near real time to the Data Hub: generate near-real-time dashboards for associates and the business; feed data in near real time to data models for in-session hyper-personalization.
  32. Real-Time Data Processing using Databricks in an Event-Driven Architecture. Problem statement: the Merchandising team creates offers and promotions; offers are created at the banner, division, or individual store level; these offers are to be presented to our customers; more importantly, these offers are sent to the stores across the country; customers view these offers on digital channels and clip them; these clips are sent to frequently shopped stores in real time; clipped offers are applied to transactions online and in stores; stores and online systems are notified of the redeemed offers. Diagram labels: Stores, Offers, Clips.
  33. Real-Time Data Processing using Databricks. Diagram labels: Offer Management System, API, Master Data, Stores, Offer Expansion and Distribution, Clips Distribution to relevant store.
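The offer expansion and clip distribution steps named on slides 32 and 33 can be sketched in plain Python. This is an illustration only and not Albertsons' implementation. The `Store` and `Offer` types, the level names, and the sample store data are all hypothetical, chosen to mirror the banner, division, and store levels in the problem statement.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Store:
    store_id: str
    division: str
    banner: str


@dataclass(frozen=True)
class Offer:
    offer_id: str
    level: str   # hypothetical values: "banner", "division", or "store"
    target: str  # banner name, division name, or store id, depending on level


# Which Store attribute each offer level is matched against.
_LEVEL_TO_ATTR = {"banner": "banner", "division": "division", "store": "store_id"}


def expand_offer(offer: Offer, stores: List[Store]) -> List[str]:
    """Expands an offer into the sorted ids of every store it applies to."""
    if offer.level not in _LEVEL_TO_ATTR:
        raise ValueError(f"unknown offer level: {offer.level}")
    attr = _LEVEL_TO_ATTR[offer.level]
    return sorted(s.store_id for s in stores if getattr(s, attr) == offer.target)


def route_clip(offer: Offer, frequent_stores: List[str], stores: List[Store]) -> List[str]:
    """Returns the customer's frequently shopped stores where the clipped offer is valid."""
    eligible = set(expand_offer(offer, stores))
    return [store_id for store_id in frequent_stores if store_id in eligible]


# Made-up master data for illustration.
STORES = [
    Store("s1", "west", "banner_a"),
    Store("s2", "west", "banner_b"),
    Store("s3", "east", "banner_a"),
]
```

In a streaming pipeline, each incoming offer or clip event would pass through functions like these, with the store master data joined in as a slowly changing lookup.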
  34. Real-Time Analytics using Databricks. Diagram labels: API, Operational Database, Event Database, Derived Metrics, Dashboard Application. Steps: extract metadata from business events; cross-reference the metadata with other event metadata; derive the metrics from the event cross-references.
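The three steps on slide 34 (extract metadata, cross-reference, derive metrics) can be illustrated with a small sketch. The event shape and the redemption-rate metric are assumptions made for this example; the talk does not specify either.

```python
from typing import Dict, Iterable, Set, Tuple


def extract_metadata(event: Dict[str, str]) -> Tuple[str, str, str]:
    """Step 1: keep only the fields needed for cross-referencing (hypothetical schema)."""
    return event["type"], event["customer_id"], event["offer_id"]


def redemption_rate(events: Iterable[Dict[str, str]]) -> float:
    """Steps 2 and 3: match clip events to redeem events, then derive a metric.

    Returns the share of clipped (customer, offer) pairs that were later redeemed.
    """
    clipped: Set[Tuple[str, str]] = set()
    redeemed: Set[Tuple[str, str]] = set()
    for event in events:
        event_type, customer_id, offer_id = extract_metadata(event)
        if event_type == "clip":
            clipped.add((customer_id, offer_id))
        elif event_type == "redeem":
            redeemed.add((customer_id, offer_id))
    if not clipped:
        return 0.0
    # Cross-reference: a clip counts as redeemed only if the same pair appears in both sets.
    return len(clipped & redeemed) / len(clipped)


# Made-up events for illustration.
SAMPLE_EVENTS = [
    {"type": "clip", "customer_id": "c1", "offer_id": "o1"},
    {"type": "clip", "customer_id": "c2", "offer_id": "o1"},
    {"type": "redeem", "customer_id": "c1", "offer_id": "o1"},
]
```

A dashboard application would then read metrics like this one from the store of derived metrics rather than recomputing them per request.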
  35. Future Use Cases with Real-Time Data Processing. Hyper-personalization of content (offers, recipes): in-session recommendation using runtime data models; next best action. Reduced out-of-stock picks and improved substitution by using real-time inventory signals.
  36. Databricks & Confluent Partnership
  37. Speaker: Ram Dhakne. Ram works as a Staff Solutions Engineer at Confluent. He has a wide array of experience in NoSQL databases, filesystems, distributed systems, and Apache Kafka. He supports industry verticals ranging from large financial services, retail, and healthcare to telecom and utilities companies on their digital modernization journey. His current interest is helping customers adopt real-time event streaming technologies using Kafka. As a part-time hobby, he has authored two children's books.
  38. Modern data platforms are powering multiple real-time apps across industries. Retail (drive consumer analytics and streamline operations): Inventory Management; Personalized Promotions; Product Development & Introduction; Sentiment Analysis; Supply Chain Logistics; Systems of Scale for High Traffic Periods. Healthcare (provide patients better choices and doctors better insight): Connected Health Records; Data Confidentiality & Accessibility; Dynamic Staff Allocation Optimization; Integrated Treatment; Proactive Patient Care; Real-Time Monitoring. Banking (combat fraud and remain competitive): Capital Management; Early-On Fraud Detection; Market Risk Recognition & Investigation; Preventive Regulatory Scanning; Real-Time What-If Analysis; Trade Flow Monitoring. Automotive (amplify vehicle intelligence and safety): Advanced Navigation; Environmental Factor Processing; Fleet Management; Predictive Maintenance; Threat Detection & Real-Time Response; Traffic Distribution Optimization.
  40. Multiple deployment models. Confluent Platform, the enterprise distribution of Apache Kafka: self-managed software; deploy on any platform, on-prem or cloud. Confluent Cloud, Apache Kafka re-engineered for the cloud: fully managed service; available on the leading public clouds.
  41. From migration to modernization: how Confluent and Databricks accelerate your journey to real-time analytics in the cloud.
  42. Modernize your data platform with Confluent + Databricks. Reduce total cost of ownership: process data in stream to lower DW costs; lower data pipeline TCO with a fully managed cloud service. Power new analytics and apps: link on-prem and cloud for easier data movement across environments, with real-time event streaming to modernize your analytics and applications. Get more data to and from your DW: break data silos and easily connect your data warehouse to popular sources and sinks using Confluent's 120+ pre-built connectors. Diagram legend: real-time connections and streams.
  43. Best-in-class solution for real-time analytics, delivered at scale with the speed, security, and reliability required by enterprises. Confluent: bring data from everywhere to everywhere; multi-cloud native; open source Kafka, hardened for enterprise. Databricks: single platform for BI, AI, and ML on all data; multi-cloud native; open source (Spark, MLflow, Delta Lake). Multiple integrations: Managed Connectors (for near-real-time data delivery); Spark Structured Streaming (for real-time data delivery).
  44. Next steps. Databricks: get started and try Databricks for free at databricks.com/try-databricks; visit the Databricks booth. Confluent: get started and try Confluent for free with $400 in free credits at confluent.io/confluent-cloud; visit the Confluent booth.
  45. Q&A
  47. Thank you!