Data Engineer's Lunch #81: Reverse ETL Tools for Modern Data Platforms

  1. Modern Open Data Platform: Cool Open Source Tools. Crafting your Dream Stack with the Open Data Platform Playbook. Rahul Xavier Singh, Anant Corporation. Data Engineer’s Lunch / Anant Webinar, 11/07/2022
  2. Agenda: Playbook, Design, Framework, Approach, ETL / Reverse ETL, Customer Data Platforms, Components, DataOps
  3. We help platform owners reach beyond their potential to serve a global customer base that demands Everything, Now.
  4. We design with our Playbook, build with our Framework, and manage platforms with our Approach so our clients Think & Grow Big.
  5. Customer Success
  6. Challenge. Diagram labels: Business, Platform, Playbook, Framework, Approach, Technology, Management, Solutions, [Data] Services Catalog, Fully Managed Service Subscriptions. We offer Professional Services to engineer Solutions, and Managed Services to clients where it makes sense, after an Assessment.
  7. Modern Technology is Disconnected. Businesses want to: create value, get the customer, deliver the value, get paid. Source: https://chiefmartec.com/2020/04/marketing-technology-landscape-2020-martech-5000/
  8. Most Users Just Want / Need to … FIND, DISCOVER, FILTER, ANALYZE, VISUALIZE, MEASURE, ACT, USE, SHARE
  9. Business / Platform Dream. Enterprise Consciousness: People, Processes, Information, and Systems, Connected / Synchronized. Business has been chasing this dream for a while. As technologies improve, this becomes more accessible. Image Source: Digital Business Technology Platforms, Gartner 2016
  10. Going Beyond “Reactive Manifesto” / 12 Factor
      - Current Business Information is available to People in the swiftest way possible within the bounds of reasonable costs.
      - Business Information is generally available to the enterprise, siloed only by security and governance.
      - Data platforms make use of appropriate resources for hot vs. cold, raw vs. enhanced data.
      - Data platforms are always available and redundant, always trying to achieve an RPO/RTO of zero.
      References: https://12factor.net/, https://www.reactivemanifesto.org/
      Diagram labels: Project Information, Client Service Information, Corporate Guides, Collaborative Documents, Assets & Files, Corporate Assets, Unified User Experience
  11. Challenges of Managing Data Platforms in a Growing Enterprise
  12. Optimized Core enabled Business Modularity. This process needs to be done in sequence; otherwise we end up having to redo the work.
  13. Phases of Business Modularity: Business Silos, Standardized Platform, Optimized Core, Business Modularity
  14. Generic Data Platform Operations
  15. Modern Open Data Platform
  16. Design: Contexts, Responsibilities, Approach, Framework, Tools
  17. So Many Different “Modern Stacks”? Lots of “reference” architectures are available. They tend not to think about the speed layer since they are focusing on batch. What about SPEED?
  18. How do you choose from the landscape? Lots and lots of components in the Data & AI Landscape. Which ones are the right ones for your business?
  19. Playbook for Modern Open Data Platform
      Platform Design, Evaluate Framework
      - Cloud: Public, Private, Hybrid
      - Data: Data:Object, Data:Stream, Data:Table, Data:Index, Processor:Batch, Processor:Stream
      - DataOps: ETL/ELT/EtLT, Reverse ETL, Orchestration
      - DevOps: Infrastructure as Code, Systems Automation, Application CICD
      - Architecture (Design): Cloud, Data, DevOps, DataOps
      - Engineering: Configuration, Scripting, Programming
      - Operation: Setup / Deploy, Monitoring/Alerts, Administration
      - User Experience: No-Code/Low Code Apps/Form Builders, Automatic API Generator/Platform, Customer App/API Framework
      Execute Approach
      - Discovery (Inventory): People, Process, Information (Objects), Systems (Apps)
  20. Modern Enterprise Canvas. Flow labels: Workflow Approval, Customer Acquisition, Customer Payment, Customer Information, Business Information, Billing Information. Tools: Zoho App Creator, Unbounce, Zoho CRM, Stripe, Zapier. Contexts: People, Process, Information, Systems. Responsibility Areas: Products & Services, Sales & Marketing, Operations & Infrastructure, Research & Development, Finance & Accounting, Leadership & Management
  21. Modern Enterprise Canvas. Contexts: People, Process, Information, Systems. Responsibility Areas: Customer, Users, Business, Product Owners, Engineering, Developers, Operations, Administrators
  22. Framework
  23. Framework: Distributed, Realtime, Extendable / Open, Automated, Monitored / Managed
  24. Public Cloud Native - Amazon
  25. Public Cloud Native - Microsoft
  26. Public Cloud Native - Google
  27. Cool Tools: Optimizing Distributed Data with Cloud vs. Open Core with Open Source Tools
  28. Open Core Distributed Data Platforms. Building a globally distributed, real-time platform requires distributed, real-time technologies. Here are some. Which ones should you choose?
  29. Open Core Data Modernization / Automation / Integration. In addition to vastly scalable tools, there are also modern innovations that can help teams automate and maximize human capital by making data platform management easier.
  30. Framework Components
      ● Major Components
        ○ Persistent Queues (RAM/BUS)
        ○ Queue Processing & Compute (CPU)
        ○ Persistent Storage (DISK/RAM)
        ○ Reporting Engine (Display)
        ○ Orchestration Framework (Motherboard)
        ○ Scheduler (Operating System)
      ● Strategies
        ○ Cloud Native on Google
        ○ Self-Managed Open Source
        ○ Self-Managed Commercial Source
        ○ Managed Commercial Source
      Customers want options, so we decided to create a Framework that can scale with whatever Infrastructure and Software strategy they want to use.
  31. Framework
  32. Approach
  33. Approach: Setup, Training, Administration, Configuration, Knowledge
  34. Approach
  35. Sample STACK Outline
      Diagram labels: Framework, Platform, Components, Resources, Platform, Setup, Training, Administration, Configuration, Knowledge
      ● Components
        ○ Infrastructure
          ■ Source / Git
          ■ Github
          ■ Gitlab
          ■ Cloud / Public
          ■ AWS
          ■ Azure
          ■ GCP
          ■ DO
          ■ Orchestration
          ■ Terraform
          ■ Terraform / Atlantis
          ■ Configuration
          ■ Ansible
          ■ Ansible / AWX / Semaphore
        ○ Compute
          ■ Datastax / Spark
          ■ Datastax / Livy
          ■ Databricks
        ○ Data / Open Core
          ■ Datastax Enterprise
          ■ Cassandra
          ■ Search / Solr
          ■ Graph
          ■ Confluent Platform
        ○ Data / Cloud
          ■ Datastax / Astra
          ■ Confluent Cloud
        ○ Data / Open Source
          ■ Cassandra
          ■ Kafka
          ■ Elassandra
          ■ YugaByte
          ■ Scylla
          ■ Pulsar
        ○ Application
          ■ Airflow
          ■ Airbyte
          ■ Kafka Streams
          ■ Jupyter
          ■ Redash
          ■ Metabase
          ■ Superset
          ■ Zeppelin
  36. Use Case: Standard Data Fabric
  37. How Distributed Data Helps Drive Enterprise Consciousness. XDCR: Cross datacenter replication is the ultimate data fabric. Resilience, performance, availability, and scale. Made widely available by Cassandra and Couchbase.
  38. Modern Open Data Platform + Cool Database = Data Fabric. One cluster, many workloads. With any other “Data Warehouse”, this would be problematic. With Cassandra, this is a core feature.
  39. How YugaByteDB allows us to go further… All the benefits of XDCR and: - More Data Density at High Speed - YCQL queries to support non-relational / C* CQL-like queries - YSQL queries to support relational / SQL queries - Transactions/Consistency - …
  40. Let’s Get Data into a Database - Easier Today. Open Source: - Airbyte / RudderStack make ETL easier and are open source - Kafka Connect / Pulsar IO can convert ETL into Streaming ETL. SaaS/PaaS: - SaaS like Stitch/HevoData - Supported versions of Airbyte/RudderStack
  41. Once It’s There, Serve It, Do More Processing. Open Source: - Flink / Spark / Kafka Streams can be used to save Analytics / ML processed data. - Hasura can help serve data as GraphQL; PostgREST can expose REST APIs.
  42. Let’s send it back via Reverse ETL! “Reverse ETL is the process of copying data from a warehouse into business applications like CRM, analytics, and marketing automation software. You perform this process by using a reverse ETL tool that integrates with your data source and your business SaaS tools.” - Segment Blog, https://segment.com/blog/reverse-etl/ Open Source: - Grouparoo, Airbyte, and RudderStack are free. Others are paid. - You can always use Kafka Connect / Pulsar IO to send data back also.
  43. Let’s put it all together now - ONE DATA FABRIC. Cassandra isn’t the only database to do XDCR that can enable multiple workloads. Yugabyte also offers a PostgreSQL compliant layer.
  44. Key Takeaways for Open Data Platforms
      - Identify the Objectives: identify the objectives so that you know what success looks like.
      - Prioritize DevOps / DataOps: DevOps / DataOps combined with a true agile approach allows you to iterate your platform quickly.
      - Document the STACK: put the data into a distributed data store that supports SQL/CQL, and possibly archive it into Parquet/Iceberg (historical data).
      - Don’t reinvent the wheel: get the data out to your Systems using “Reverse ETL” tools. Use open tools that are well supported.
  45. Thank you and Dream Big. Hire us: Design Workshops, Innovation Sprints, Service Catalog. Anant.us: Read our Playbook, Join our Mailing List, Read up on Data Platforms, Watch our Videos, Download Examples
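
The "One cluster, many workloads" idea on the data fabric slides rests on per-datacenter replication. In Cassandra this is configured per keyspace with `NetworkTopologyStrategy`. The helper below is only a sketch that builds such a CQL statement as a string; the function name, the keyspace name, and the datacenter names (`operational`, `analytics`) are hypothetical and not taken from the talk.

```python
def create_keyspace_cql(keyspace: str, replicas_per_dc: dict[str, int]) -> str:
    """Build a CREATE KEYSPACE statement that replicates data to each datacenter.

    replicas_per_dc maps a datacenter name to its replication factor.
    The names used in the example call are illustrative assumptions.
    """
    if not replicas_per_dc:
        raise ValueError("at least one datacenter is required")
    # Sort for a stable, reproducible statement.
    options = ", ".join(
        f"'{dc}': {rf}" for dc, rf in sorted(replicas_per_dc.items())
    )
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


if __name__ == "__main__":
    # One logical cluster, two workloads isolated by datacenter.
    print(create_keyspace_cql("fabric", {"operational": 3, "analytics": 2}))
```

With a layout like this, transactional clients could connect to one datacenter while batch or analytics jobs read from the other, and replication keeps both in sync.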
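
The reverse ETL slide describes copying data from a warehouse into business applications such as a CRM. The following is a minimal sketch of that flow, not the approach of any tool named in the deck: an in-memory SQLite table stands in for the warehouse, and `FakeCrm` is a hypothetical stand-in for a SaaS API client. The table, column, and method names are all assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class FakeCrm:
    """Hypothetical stand-in for a SaaS CRM client; keeps contacts in memory."""
    contacts: dict = field(default_factory=dict)

    def upsert_contact(self, email: str, properties: dict) -> None:
        # Upsert keyed on email so that re-running the sync is idempotent.
        self.contacts.setdefault(email, {}).update(properties)


def build_warehouse() -> sqlite3.Connection:
    """Create an in-memory table standing in for a warehouse model."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer_metrics ("
        "email TEXT PRIMARY KEY, lifetime_value REAL, plan TEXT)"
    )
    conn.executemany(
        "INSERT INTO customer_metrics VALUES (?, ?, ?)",
        [("ada@example.com", 1200.0, "pro"), ("bob@example.com", 80.0, "free")],
    )
    conn.commit()
    return conn


def reverse_etl(conn: sqlite3.Connection, crm: FakeCrm) -> int:
    """Copy each warehouse row into the CRM; return the number of rows synced."""
    rows = conn.execute(
        "SELECT email, lifetime_value, plan FROM customer_metrics"
    )
    synced = 0
    for email, lifetime_value, plan in rows:
        crm.upsert_contact(email, {"lifetime_value": lifetime_value, "plan": plan})
        synced += 1
    return synced


if __name__ == "__main__":
    crm = FakeCrm()
    print(reverse_etl(build_warehouse(), crm), "contacts synced")
```

A production tool would add what this sketch leaves out: incremental change detection, batching, rate limiting, and retries against the destination API.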

Editor's Notes

  1. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  5. Challenge: Currently the components are broken up into different vendors and parts. Similar to building a computer every time for every client.