This document discusses building a modern open data platform using open source tools. It introduces Anant Corporation and their playbook, framework, and approach for designing data platforms. Various open source tools are presented for building distributed, real-time data platforms including Cassandra, Kafka, Airflow, and more. The document provides an overview of how to choose the right tools to optimize core capabilities, achieve business modularity, and connect business information systems.
Modern Open Data Platform: Cool Open Source Tools Crafting your Dream Stack
1. Modern Open Data Platform: Cool Open Source Tools
Crafting your Dream Stack with the Open Data Platform Playbook
Rahul Xavier Singh, Anant Corporation
Data Engineer’s Lunch / Anant Webinar 11/07/2022
7. Modern Technology is Disconnected
https://chiefmartec.com/2020/04/marketing-technology-landscape-2020-martech-5000/
Businesses want to:
- Create value
- Get the customer
- Deliver the value
- Get paid
8. Most Users Just Want / Need to …
FIND
DISCOVER
FILTER
ANALYZE
VISUALIZE
MEASURE
ACT
USE
SHARE
9. Business / Platform Dream
Enterprise Consciousness:
- People
- Processes
- Information
- Systems
Connected / Synchronized.
Business has been chasing this dream for a while. As technologies improve, it becomes more accessible.
Image Source: Digital Business Technology Platforms, Gartner 2016
10. Going Beyond “Reactive Manifesto” / 12 Factor
References: https://12factor.net/, https://www.reactivemanifesto.org/
- Current Business Information is available to People in the swiftest way possible within the bounds of reasonable costs.
- Business Information is generally available to the enterprise, siloed only by security and governance.
- Data platforms make use of appropriate resources for hot vs. cold, raw vs. enhanced data.
- Data platforms are always available and redundant, always trying to achieve an RPO/RTO of zero.
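The hot vs. cold and raw vs. enhanced distinction can be sketched as a small placement rule. This is only an illustration: the `Dataset` fields, the 30-day cut-off, and the returned labels are assumptions, not part of the playbook.

```python
from dataclasses import dataclass

# Hypothetical cut-off: data touched within this many days counts as "hot".
HOT_MAX_AGE_DAYS = 30


@dataclass(frozen=True)
class Dataset:
    name: str
    age_days: int   # days since the data was last written or read
    enhanced: bool  # True once the raw data has been cleaned or enriched


def placement(ds: Dataset) -> str:
    """Return a 'temperature/refinement' label used to pick a storage resource."""
    temperature = "hot" if ds.age_days <= HOT_MAX_AGE_DAYS else "cold"
    refinement = "enhanced" if ds.enhanced else "raw"
    return f"{temperature}/{refinement}"


orders = Dataset(name="orders", age_days=3, enhanced=True)
old_clicks = Dataset(name="clickstream_archive", age_days=900, enhanced=False)
print(placement(orders), placement(old_clicks))
```

A real platform would map each label to a concrete resource, for example a distributed database for hot data and object storage for cold data.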
[Diagram: Unified User Experience across Project Information, Client Service Information, Corporate Guides, Collaborative Documents, Assets & Files, Corporate Assets]
17. So Many Different “Modern Stacks”?
Lots of “reference” architectures are available. They tend not to think about the speed layer since they focus on batch. What about SPEED?
18. How do you choose from the landscape?
There are lots and lots of components in the Data & AI Landscape. Which ones are the right ones for your business?
19. Playbook for Modern Open Data Platform
Platform Design Evaluate Framework
Cloud
- Public
- Private
- Hybrid
Data
- Data:Object
- Data:Stream
- Data:Table
- Data:Index
- Processor:Batch
- Processor:Stream
DataOps
- ETL/ELT/EtLT
- Reverse ETL
- Orchestration
DevOps
- Infrastructure as Code
- Systems Automation
- Application CICD
Architecture (Design)
- Cloud
- Data
- DevOps
- DataOps
Engineering
- Configuration
- Scripting
- Programming
Operation
- Setup / Deploy
- Monitoring/Alerts
- Administration
User Experience
- No-Code/Low Code Apps/Form Builders
- Automatic API Generator/Platform
- Customer App/API Framework
Execute Approach
Discovery (Inventory)
- People
- Process
- Information (Objects)
- Systems (Apps)
20. Modern Enterprise Canvas
[Diagram labels: Workflow Approval, Customer Acquisition, Customer Payment, Customer Information, Business Information, Billing Information; Zoho App Creator, Unbounce, Zoho CRM, Stripe, Zapier]
Contexts
- People
- Process
- Information
- Systems
Responsibility Areas
- Products & Services
- Sales & Marketing
- Operations & Infrastructure
- Research & Development
- Finance & Accounting
- Leadership & Management
21. Modern Enterprise Canvas
Contexts
- People
- Process
- Information
- Systems
Responsibility Areas
- Customer
- Users
- Business
- Product Owners
- Engineering
- Developers
- Operations
- Administrators
28. Open Core Distributed Data Platforms
To create globally distributed, real-time platforms, you need to build on distributed real-time technologies. Here are some. Which ones should you choose?
29. Open Core Data Modernization / Automation / Integration
In addition to vastly scalable tools, there are also modern innovations that can help teams automate and maximize human capital by making data platform management easier.
30. Framework Components
● Major Components
○ Persistent Queues (RAM/BUS)
○ Queue Processing & Compute (CPU)
○ Persistent Storage (DISK/RAM)
○ Reporting Engine (Display)
○ Orchestration Framework (Motherboard)
○ Scheduler (Operating System)
● Strategies
○ Cloud Native on Google
○ Self-Managed Open Source
○ Self-Managed Commercial Source
○ Managed Commercial Source
Customers want options, so we decided to create a Framework that can scale with whatever Infrastructure and Software strategy they want to use.
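One way to picture the framework is as a checklist that any chosen stack has to fill, whatever the strategy. The sketch below is a hypothetical illustration: the component names come from the slide, but the tool-to-component assignments are assumptions made only for the example.

```python
from enum import Enum


class Component(Enum):
    """Major framework components, with the slide's computer analogy."""
    PERSISTENT_QUEUE = "Persistent Queues (RAM/BUS)"
    QUEUE_PROCESSING = "Queue Processing & Compute (CPU)"
    PERSISTENT_STORAGE = "Persistent Storage (DISK/RAM)"
    REPORTING_ENGINE = "Reporting Engine (Display)"
    ORCHESTRATION = "Orchestration Framework (Motherboard)"
    SCHEDULER = "Scheduler (Operating System)"


def missing_components(stack: dict[Component, str]) -> list[Component]:
    """Return the components that have no tool assigned yet, in slide order."""
    return [c for c in Component if not stack.get(c)]


# Assumed, partial "Self-Managed Open Source" stack; assignments are illustrative.
stack = {
    Component.PERSISTENT_QUEUE: "Kafka",
    Component.QUEUE_PROCESSING: "Spark",
    Component.PERSISTENT_STORAGE: "Cassandra",
    Component.SCHEDULER: "Airflow",
}

for component in missing_components(stack):
    print("still to choose:", component.value)
```

The same checklist can be reused for each strategy, so a managed commercial stack and a self-managed open source stack can be compared component by component.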
37. How Distributed Data Helps Drive Enterprise Consciousness
XDCR: Cross-datacenter replication is the ultimate data fabric: resilience, performance, availability, and scale. Made widely available by Cassandra and Couchbase.
38. Modern Open Data Platform + Cool Database = Data Fabric
One cluster, many workloads. With any other “Data Warehouse”, this would be problematic. With Cassandra, this is a core feature.
39. How YugabyteDB allows us to go further…
All the benefits of XDCR and …
- More Data Density at High Speed
- YCQL Queries to support Non-Relational / C* CQL-like queries
- YSQL Queries to support Relational / SQL Queries
- Transactions/Consistency
- …
40. Let’s Get Data into a Database - Easier Today
Open Source:
- Airbyte / RudderStack make ETL easier and are open source
- Kafka Connect / Pulsar IO can convert ETL into Streaming ETL
SaaS/PaaS:
- SaaS like Stitch/HevoData
- Supported versions of Airbyte/RudderStack
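The difference between batch ETL and streaming ETL can be sketched in plain Python. This is not Kafka Connect or Pulsar IO configuration; the record fields and function names are hypothetical and only show that the same transform can run over a whole batch or one record at a time.

```python
from typing import Iterable, Iterator


def transform(record: dict) -> dict:
    """Normalize one record; the same logic serves both modes."""
    return {"email": record["email"].strip().lower(), "plan": record["plan"]}


def batch_etl(source: list[dict]) -> list[dict]:
    """Batch: extract the whole set, transform it, then hand it over in one go."""
    return [transform(r) for r in source]


def streaming_etl(source: Iterable[dict]) -> Iterator[dict]:
    """Streaming: transform and emit each record as soon as it arrives."""
    for record in source:
        yield transform(record)


rows = [
    {"email": "  Ada@Example.com ", "plan": "pro"},
    {"email": "bob@example.com", "plan": "free"},
]

# Each record is available downstream before the next one is read.
for loaded in streaming_etl(iter(rows)):
    print(loaded)
```

In a connector-based setup, the iterable would be replaced by a topic or source connector and the loop body by a sink, but the per-record shape stays the same.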
41. Once It’s There, Serve It, Do More Processing
Open Source:
- Flink / Spark / Kafka Streams can be used to save Analytics / ML processed data.
- Hasura can help serve data as GraphQL; PostgREST can expose REST APIs.
42. Let’s send it back via Reverse ETL!
Open Source:
- Grouparoo / Airbyte / RudderStack are free. Others are paid.
- You can always use Kafka Connect / Pulsar IO to send data back also.
“Reverse ETL is the process of copying data from a warehouse into business applications like CRM, analytics, and marketing automation software. You perform this process by using a reverse ETL tool that integrates with your data source and your business SaaS tools.”
- Segment Blog, https://segment.com/blog/reverse-etl/
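A minimal sketch of the reverse ETL idea, under loud assumptions: an in-memory SQLite database stands in for the warehouse, a plain dictionary stands in for the CRM, and the `customer_metrics` table and `lifetime_value` field are invented for the example. A real tool would call the business application's API instead.

```python
import sqlite3


def reverse_etl(conn: sqlite3.Connection, crm: dict[str, dict]) -> int:
    """Copy per-customer metrics from the warehouse into CRM records."""
    rows = conn.execute(
        "SELECT email, lifetime_value FROM customer_metrics"
    ).fetchall()
    for email, lifetime_value in rows:
        # Create the CRM record if it is missing, then set the synced field.
        crm.setdefault(email, {})["lifetime_value"] = lifetime_value
    return len(rows)


# Stand-in warehouse with one hypothetical metrics table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_metrics (email TEXT PRIMARY KEY, lifetime_value REAL)"
)
conn.executemany(
    "INSERT INTO customer_metrics VALUES (?, ?)",
    [("a@example.com", 1200.0), ("b@example.com", 310.5)],
)

# Stand-in CRM keyed by email.
crm = {"a@example.com": {"name": "Ada"}}
synced = reverse_etl(conn, crm)
print(synced, crm)
```

The sync only sets the one field it owns, so data already held by the business application (here, `name`) is left alone.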
43. Let’s put it all together now - ONE DATA FABRIC
Cassandra isn’t the only database that does XDCR and can enable multiple workloads. Yugabyte also offers a PostgreSQL-compatible layer.
44. Key Takeaways for Open Data Platforms
Don’t reinvent the wheel.
Prioritize DevOps / DataOps
Document the STACK
Identify the Objectives
- Identify the objectives so that you know what success looks like.
- DevOps / DataOps combined with a true agile approach allows you to iterate on your platform quickly.
- Put the data into a distributed data store that supports SQL/CQL, and possibly archive it into Parquet/Iceberg (historical data).
- Get the data out to your Systems using “Reverse ETL” tools.
Use open tools that are well supported
45. Thank you and Dream Big.
Hire us
- Design Workshops
- Innovation Sprints
- Service Catalog
Anant.us
- Read our Playbook
- Join our Mailing List
- Read up on Data Platforms
- Watch our Videos
- Download Examples
Editor's Notes
What makes a good story?
Once you get good at it, presenting becomes easy.
Shared stories with people we’ve bonded with (community for example).
This format is not good for Metastories.
Challenge
Currently the components are broken up into different vendors and parts.
Similar to building a computer every time for every client.