Modern Open Data Platform:
Cool Open Source Tools
Crafting your Dream Stack with the Open Data Platform
Playbook
Rahul Xavier Singh Anant Corporation
Data Engineer’s Lunch / Anant Webinar 11/07/2022
Playbook
Design
Framework
Approach
ETL / Reverse ETL
Customer Data Platforms
Components
DataOps
Agenda
We help platform owners
reach beyond their potential
to serve a global customer
base that demands
Everything, Now.
We design with our
Playbook, build with our
Framework, and manage
platforms with our Approach
so our clients
Think & Grow Big.
Customer Success
Challenge
Business
Platform
Playbook
Framework
Approach
Technology
Management
Solutions
[Data] Services Catalog
Fully Managed Service
Subscriptions
We offer Professional Services to engineer Solutions and
offer Managed Services to clients where it makes sense, after an
Assessment
Modern Technology is Disconnected
https://chiefmartec.com/2020/04/marketing-technology-landscape-2020-martech-5000/
Businesses want to:
- Create value
- Get the customer
- Deliver the value
- Get paid
Most Users Just Want / Need to …
FIND
DISCOVER
FILTER
ANALYZE
VISUALIZE
MEASURE
ACT
USE
SHARE
Business / Platform Dream
Enterprise
Consciousness:
- People
- Processes
- Information
- Systems
Connected /
Synchronized.
Business has been chasing
this dream for a while. As
technologies improve, this
becomes more accessible.
Image Source: Digital Business
Technology Platforms, Gartner 2016
Going Beyond “Reactive Manifesto” / 12 Factor
References: https://12factor.net/, https://www.reactivemanifesto.org/
- Current Business Information is
available to People in the swiftest
way possible within the bounds of
reasonable costs.
- Business Information is generally
available to the enterprise, siloed
only by security and governance.
- Data platforms make use of
appropriate resources for hot vs.
cold, raw vs. enhanced data.
- Data platforms are always
available and redundant, always
trying to achieve an RPO/RTO of
zero.
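The hot vs. cold principle above can be sketched as a simple placement rule. This is only an illustration: the 30-day window, the tier names, and the `choose_tier` helper are assumptions, not something the deck specifies.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical cutoff: data touched within the last 30 days counts as "hot".
HOT_WINDOW = timedelta(days=30)


def choose_tier(last_accessed: datetime, now: datetime) -> str:
    """Return "hot" for recently used data and "cold" for everything else."""
    return "hot" if now - last_accessed <= HOT_WINDOW else "cold"


now = datetime(2022, 11, 7, tzinfo=timezone.utc)
recent = choose_tier(now - timedelta(days=2), now)    # recently read data
stale = choose_tier(now - timedelta(days=400), now)   # rarely read data
```

In practice the rule would more likely key on access frequency and cost per tier, but the shape of the decision is the same.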
Project
Information
Client
Service
Information
Corporate
Guides
Collaborative
Documents
Assets
& Files
Corporate
Assets
Unified User Experience
Challenges of
Managing Data
Platforms in a
Growing Enterprise
Optimized Core enabled Business Modularity
This process needs
to be done in
sequence. Otherwise
we end up having to
redo the work.
Business
Silos
Standardized
Platform
Optimized
Core
Business
Modularity
Phases of Business Modularity
Generic Data Platform Operations
Modern
Open Data Platform
Design
Contexts
Responsibilities
Approach
Framework
Tools
So Many Different “Modern Stacks”?
Many “reference” architectures are
available. Most focus on batch and
neglect the speed layer.
What about SPEED?
How do you choose from the landscape?
Lots and lots of components in the
Data & AI Landscape. Which ones are
the right ones for your business?
Playbook for Modern Open Data Platform
Platform Design Evaluate Framework
Cloud
- Public
- Private
- Hybrid
Data
- Data:Object
- Data:Stream
- Data:Table
- Data:Index
- Processor:Batch
- Processor:Stream
DataOps
- ETL/ELT/EtLT
- Reverse ETL
- Orchestration
DevOps
- Infrastructure as
Code
- Systems
Automation
- Application CICD
Architecture (Design)
- Cloud
- Data
- DevOps
- DataOps
Engineering
- Configuration
- Scripting
- Programming
Operation
- Setup / Deploy
- Monitoring/Alerts
- Administration
User Experience
- No-Code/Low Code Apps/Form Builders
- Automatic API Generator/Platform
- Customer App/API Framework
Execute Approach
Discovery (Inventory)
- People
- Process
- Information (Objects)
- Systems (Apps)
Modern Enterprise Canvas
Workflow
Approval
Customer
Acquisition Customer
Payment
Customer
Information
Customer
Information
Customer
Information
Business
Information
Billing
Information
Zoho App
Creator
Unbounce
Zoho CRM Stripe
Zapier
Contexts
- People
- Process
- Information
- Systems
Responsibility Areas
- Products & Services
- Sales & Marketing
- Operations &
Infrastructure
- Research &
Development
- Finance &
Accounting
- Leadership &
Management
Modern Enterprise Canvas
Contexts
- People
- Process
- Information
- Systems
Responsibility Areas
- Customer
- Users
- Business
- Product Owners
- Engineering
- Developers
- Operations
- Administrators
Framework
Framework
Distributed
Realtime
Extendable / Open
Automated
Monitored / Managed
Public Cloud Native - Amazon
Public Cloud Native - Microsoft
Public Cloud Native - Google
Cool Tools:
Optimizing Distributed Data
with Cloud vs. Open Core with
Open Source Tools
Open Core Distributed Data Platforms
To create globally distributed, real-time platforms, you
need distributed, real-time technologies to build your
platform. Here are some. Which ones should you choose?
Open Core
Data Modernization / Automation / Integration
In addition to vastly scalable tools, there are also modern
innovations that can help teams automate and maximize
human capital by making data platform management easier.
Framework Components
● Major Components
○ Persistent Queues (RAM/BUS)
○ Queue Processing & Compute (CPU)
○ Persistent Storage (DISK/RAM)
○ Reporting Engine (Display)
○ Orchestration Framework (Motherboard)
○ Scheduler (Operating System)
● Strategies
○ Cloud Native on Google
○ Self-Managed Open Source
○ Self-Managed Commercial Source
○ Managed Commercial Source
Customers want options, so we decided to
create a Framework that can scale with
whatever Infrastructure and Software strategy
they want to use.
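One way to make the component list above concrete is a small stack descriptor that checks whether every role has a tool assigned. This is a minimal sketch: the `Stack` class, the role keys, and the tool-to-role mapping are assumptions for illustration. The deck names the tools but does not prescribe this assignment.

```python
from dataclasses import dataclass

# Component roles taken from the "Major Components" list above.
REQUIRED_ROLES = (
    "persistent_queue",
    "queue_processing",
    "persistent_storage",
    "reporting_engine",
    "orchestration",
    "scheduler",
)


@dataclass
class Stack:
    strategy: str  # e.g. "Self-Managed Open Source"
    tools: dict    # role -> tool name

    def missing_roles(self) -> list:
        """Roles from REQUIRED_ROLES that have no tool assigned yet."""
        return [role for role in REQUIRED_ROLES if role not in self.tools]


# Illustrative assignment only (assumption, not the deck's mapping).
open_source = Stack(
    strategy="Self-Managed Open Source",
    tools={
        "persistent_queue": "Kafka",
        "queue_processing": "Spark",
        "persistent_storage": "Cassandra",
        "reporting_engine": "Superset",
        "scheduler": "Airflow",
    },
)

print(open_source.missing_roles())  # ['orchestration']
```

Keeping the roles fixed and swapping only the `tools` mapping is what lets the same outline describe a cloud-native, self-managed, or managed commercial strategy.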
Framework
Approach
Approach
Setup
Training
Administration
Configuration
Knowledge
Approach
Sample STACK Outline
Framework
Platform
Components
Resources
Platform
Setup
Training
Administration
Configuration
Knowledge
● Components
○ Infrastructure
■ Source / Git
■ Github
■ Gitlab
■ Cloud / Public
■ AWS
■ Azure
■ GCP
■ DO
■ Orchestration
■ Terraform
■ Terraform / Atlantis
■ Configuration
■ Ansible
■ Ansible / AWX / Semaphore
○ Compute
■ DataStax / Spark
■ DataStax / Livy
■ Databricks
○ Data / Open Core
■ DataStax Enterprise
■ Cassandra
■ Search / Solr
■ Graph
■ Confluent Platform
○ Data / Cloud
■ DataStax / Astra
■ Confluent Cloud
○ Data / Open Source
■ Cassandra
■ Kafka
■ Elassandra
■ YugabyteDB
■ Scylla
■ Pulsar
○ Application
■ Airflow
■ Airbyte
■ Kafka Streams
■ Jupyter
■ Redash
■ Metabase
■ Superset
■ Zeppelin
Use Case:
Standard Data Fabric
How Distributed Data Helps Drive Enterprise
Consciousness
XDCR: Cross-datacenter
replication is the
ultimate data fabric,
providing resilience,
performance,
availability, and scale.
Made widely available
by Cassandra and
Couchbase.
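In Cassandra, cross-datacenter replication is declared per keyspace through `NetworkTopologyStrategy`. The sketch below only builds the CQL statement as a string. The `keyspace_cql` helper, the keyspace name, and the datacenter names are hypothetical, and real datacenter names must match what the cluster's snitch reports.

```python
def keyspace_cql(name: str, replicas_per_dc: dict) -> str:
    """Build a CREATE KEYSPACE statement that replicates to every listed datacenter."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    parts += [f"'{dc}': {count}" for dc, count in replicas_per_dc.items()]
    return (
        "CREATE KEYSPACE IF NOT EXISTS " + name
        + " WITH replication = {" + ", ".join(parts) + "};"
    )


# Hypothetical datacenter names; three replicas in each region.
statement = keyspace_cql("fabric", {"us_east": 3, "eu_west": 3})
print(statement)
```

Running the resulting statement through `cqlsh` or a driver session would keep three copies of each row in both datacenters, which is the property the slide calls a data fabric.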
Modern Open Data Platform + Cool Database = Data Fabric
One cluster, many workloads.
With any other “Data Warehouse”,
this would be problematic. With
Cassandra, this is a core feature.
How YugabyteDB allows us to go further…
All the benefits of XDCR and ….
- More Data Density at High
Speed
- YCQL to support
non-relational, Cassandra
(C*) CQL-like queries.
- YSQL to support
relational / SQL queries
- Transactions/Consistency
- …
Let’s Get Data into a Database - Easier Today
Open Source:
- Airbyte / RudderStack
make ETL easier and
are open source
- Kafka Connect / Pulsar
IO can turn ETL into
streaming ETL
SaaS/PaaS:
- SaaS like Stitch/HevoData
- Supported versions of Airbyte/RudderStack
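The move from ETL to streaming ETL mentioned above can be illustrated without any particular tool. The sketch below does not use Kafka Connect or Pulsar IO; it is a plain-Python illustration, with hypothetical record fields, of the same transform running over a whole batch versus record by record.

```python
from typing import Iterable, Iterator


def transform(record: dict) -> dict:
    """Shared transform step: normalize the email field."""
    return {**record, "email": record["email"].strip().lower()}


def batch_etl(source: list) -> list:
    """Batch style: extract everything, transform, then load in one go."""
    return [transform(r) for r in source]


def streaming_etl(source: Iterable[dict]) -> Iterator[dict]:
    """Streaming style: transform and emit each record as it arrives."""
    for record in source:
        yield transform(record)


rows = [{"id": 1, "email": " A@Example.com "}, {"id": 2, "email": "b@example.com"}]
stream = streaming_etl(iter(rows))
first = next(stream)  # available before the second record is read
```

The transform is identical in both paths. What changes is when results become available, which is the latency gap a connector framework closes.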
Once It’s There, Serve It, Do More Processing
Open Source:
- Flink / Spark / Kafka
Streams can be used
to process data and
save analytics / ML
results.
- Hasura can help
serve data as
GraphQL; PostgREST
can expose REST
APIs.
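To show what serving the data looks like from a client's side, here is a hedged sketch that builds a PostgREST read URL and a GraphQL request body of the kind Hasura accepts. The table name, columns, host, and port are assumptions, and nothing is sent over the network.

```python
import json
from urllib.parse import urlencode

# Hypothetical table, used only for illustration.
TABLE = "customers"


def postgrest_url(base: str, table: str, filters: dict, columns: list, limit: int) -> str:
    """Build a PostgREST read URL using its column=operator.value filter syntax."""
    params = {"select": ",".join(columns), **filters, "limit": limit}
    return f"{base}/{table}?{urlencode(params, safe=',')}"


def hasura_payload(table: str, columns: list, limit: int) -> str:
    """Build the JSON body for a GraphQL query against a table Hasura tracks."""
    query = f"query {{ {table}(limit: {limit}) {{ {' '.join(columns)} }} }}"
    return json.dumps({"query": query})


url = postgrest_url("http://localhost:3000", TABLE, {"status": "eq.active"}, ["id", "name"], 10)
body = hasura_payload(TABLE, ["id", "name"], 10)
print(url)
print(body)
```

Both tools derive the API from the database schema, so the client only needs to know table and column names. The GraphQL body would typically be POSTed to the server's GraphQL endpoint.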
Let’s send it back via Reverse ETL!
Open Source:
- Grouparoo, Airbyte, and
RudderStack are free.
Others are paid.
- You can also use
Kafka Connect /
Pulsar IO to send data
back.
“Reverse ETL is the process of copying data from a warehouse into business applications like
CRM, analytics, and marketing automation software. You perform this process by using a
reverse ETL tool that integrates with your data source and your business SaaS tools.”
- Segment Blog, https://segment.com/blog/reverse-etl/
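The definition above can be sketched as a diff-and-push step. This is a minimal illustration, not how any of the named tools work internally: the `plan_sync` helper, the field names, and the in-memory "warehouse" and "CRM" are all hypothetical stand-ins.

```python
def plan_sync(warehouse_rows: list, crm_state: dict) -> list:
    """Return the CRM updates needed so the CRM matches the warehouse."""
    updates = []
    for row in warehouse_rows:
        desired = {"lifetime_value": row["ltv"], "segment": row["segment"]}
        # Only emit an update when the CRM record is missing or out of date.
        if crm_state.get(row["customer_id"]) != desired:
            updates.append({"id": row["customer_id"], "fields": desired})
    return updates


warehouse_rows = [
    {"customer_id": "c1", "ltv": 1200, "segment": "gold"},
    {"customer_id": "c2", "ltv": 80, "segment": "bronze"},
]
crm_state = {"c1": {"lifetime_value": 1200, "segment": "gold"}}

updates = plan_sync(warehouse_rows, crm_state)
print(updates)
```

Sending only the changed records keeps calls to the SaaS API low, which matters because those APIs are usually rate limited.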
Let’s put it all together now - ONE DATA FABRIC
Cassandra isn’t the only database
whose XDCR can enable multiple
workloads.
YugabyteDB also offers a
PostgreSQL-compatible layer (YSQL).
Key Takeaways for Open Data Platforms
Don’t reinvent the wheel.
Prioritize DevOps / DataOps
Document the STACK
Identify the Objectives
- Identify the objectives so that you
know what success looks like.
- DevOps / DataOps combined with a
true agile approach allows you to
iterate your platform quickly.
- Put the data into a distributed data
store that supports SQL/CQL, and
possibly archive it into
Parquet/Iceberg (historical data)
- Get the data out to your Systems
using “Reverse ETL” tools.
Use open tools that are well
supported
Thank you and Dream Big.
Hire us
- Design Workshops
- Innovation Sprints
- Service Catalog
Anant.us
- Read our Playbook
- Join our Mailing List
- Read up on Data Platforms
- Watch our Videos
- Download Examples

Mais conteúdo relacionado

Mais procurados

Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesDATAVERSITY
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data MeshLibbySchulze
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?DATAVERSITY
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardParis Data Engineers !
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation Brett VanderPlaats
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data GovernanceTuba Yaman Him
 
Apache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, ConfluentApache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, ConfluentHostedbyConfluent
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...Databricks
 
Rahat Yasir: Enterprise Data & AI Strategy & Platform Designing
Rahat Yasir: Enterprise Data & AI Strategy & Platform DesigningRahat Yasir: Enterprise Data & AI Strategy & Platform Designing
Rahat Yasir: Enterprise Data & AI Strategy & Platform DesigningLviv Startup Club
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for DinnerKent Graziano
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lakeMykola Zerniuk
 

Mais procurados (20)

Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Data Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & ApproachesData Lake Architecture – Modern Strategies & Approaches
Data Lake Architecture – Modern Strategies & Approaches
 
Time to Talk about Data Mesh
Time to Talk about Data MeshTime to Talk about Data Mesh
Time to Talk about Data Mesh
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin AmbardDelta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
Delta Lake OSS: Create reliable and performant Data Lake by Quentin Ambard
 
Zero to Snowflake Presentation
Zero to Snowflake Presentation Zero to Snowflake Presentation
Zero to Snowflake Presentation
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Data Mesh
Data MeshData Mesh
Data Mesh
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Data Quality & Data Governance
Data Quality & Data GovernanceData Quality & Data Governance
Data Quality & Data Governance
 
Apache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, ConfluentApache Kafka and the Data Mesh | Michael Noll, Confluent
Apache Kafka and the Data Mesh | Michael Noll, Confluent
 
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
The Modern Data Team for the Modern Data Stack: dbt and the Role of the Analy...
 
Rahat Yasir: Enterprise Data & AI Strategy & Platform Designing
Rahat Yasir: Enterprise Data & AI Strategy & Platform DesigningRahat Yasir: Enterprise Data & AI Strategy & Platform Designing
Rahat Yasir: Enterprise Data & AI Strategy & Platform Designing
 
Building your Datalake on AWS
Building your Datalake on AWSBuilding your Datalake on AWS
Building your Datalake on AWS
 
Data Mesh for Dinner
Data Mesh for DinnerData Mesh for Dinner
Data Mesh for Dinner
 
Intro to databricks delta lake
 Intro to databricks delta lake Intro to databricks delta lake
Intro to databricks delta lake
 

Semelhante a Modern Open Data Platform: Cool Open Source Tools Crafting your Dream Stack

Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsScyllaDB
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Anant Corporation
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessAnant Corporation
 
Digital Reinvention by NRB
Digital Reinvention by NRBDigital Reinvention by NRB
Digital Reinvention by NRBWilliam Poos
 
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...UA DevOps Conference
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyNeo4j
 
Bigdata.sunil_6+yearsExp
Bigdata.sunil_6+yearsExpBigdata.sunil_6+yearsExp
Bigdata.sunil_6+yearsExpbigdata sunil
 
Red hat infrastructure for analytics
Red hat infrastructure for analyticsRed hat infrastructure for analytics
Red hat infrastructure for analyticsKyle Bader
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...DataKitchen
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyNeo4j
 
Informix warehouse and accelerator overview
Informix warehouse and accelerator overviewInformix warehouse and accelerator overview
Informix warehouse and accelerator overviewKeshav Murthy
 
Managing data analytics in a hybrid cloud
Managing data analytics in a hybrid cloudManaging data analytics in a hybrid cloud
Managing data analytics in a hybrid cloudKaran Singh
 
OFF SHORE RECRUITER TRAINING
OFF SHORE RECRUITER TRAININGOFF SHORE RECRUITER TRAINING
OFF SHORE RECRUITER TRAININGsatish_kumar646
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyYaroslav Tkachenko
 
Cloud Computing Architecture Primer
Cloud Computing Architecture PrimerCloud Computing Architecture Primer
Cloud Computing Architecture PrimerIlham Ahmed
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like ProductsVMware Tanzu
 
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabasePowering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabaseKinetica
 

Semelhante a Modern Open Data Platform: Cool Open Source Tools Crafting your Dream Stack (20)

Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data Platforms
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise ConsciousnessData Engineer's Lunch #60: Series - Developing Enterprise Consciousness
Data Engineer's Lunch #60: Series - Developing Enterprise Consciousness
 
Digital Reinvention by NRB
Digital Reinvention by NRBDigital Reinvention by NRB
Digital Reinvention by NRB
 
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
ЯРОСЛАВ РАВЛІНКО «Data Science at scale. Next generation data processing plat...
 
Neo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael MooreNeo4j GraphTour New York_EY Presentation_Michael Moore
Neo4j GraphTour New York_EY Presentation_Michael Moore
 
Sql 2016 2017 full
Sql 2016   2017 fullSql 2016   2017 full
Sql 2016 2017 full
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Bigdata.sunil_6+yearsExp
Bigdata.sunil_6+yearsExpBigdata.sunil_6+yearsExp
Bigdata.sunil_6+yearsExp
 
Sql 2017 net raf
Sql 2017  net rafSql 2017  net raf
Sql 2017 net raf
 
Red hat infrastructure for analytics
Red hat infrastructure for analyticsRed hat infrastructure for analytics
Red hat infrastructure for analytics
 
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
Open Data Science Conference Big Data Infrastructure – Introduction to Hadoop...
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Informix warehouse and accelerator overview
Informix warehouse and accelerator overviewInformix warehouse and accelerator overview
Informix warehouse and accelerator overview
 
Managing data analytics in a hybrid cloud
Managing data analytics in a hybrid cloudManaging data analytics in a hybrid cloud
Managing data analytics in a hybrid cloud
 
OFF SHORE RECRUITER TRAINING
OFF SHORE RECRUITER TRAININGOFF SHORE RECRUITER TRAINING
OFF SHORE RECRUITER TRAINING
 
Apache Flink Adoption at Shopify
Apache Flink Adoption at ShopifyApache Flink Adoption at Shopify
Apache Flink Adoption at Shopify
 
Cloud Computing Architecture Primer
Cloud Computing Architecture PrimerCloud Computing Architecture Primer
Cloud Computing Architecture Primer
 
Running Data Platforms Like Products
Running Data Platforms Like ProductsRunning Data Platforms Like Products
Running Data Platforms Like Products
 
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU DatabasePowering Real-Time Big Data Analytics with a Next-Gen GPU Database
Powering Real-Time Big Data Analytics with a Next-Gen GPU Database
 

Mais de Anant Corporation

QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137Anant Corporation
 
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdfKono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdfAnant Corporation
 
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache PinotData Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache PinotAnant Corporation
 
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...Anant Corporation
 
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAutomate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPT
Automate your Job and Business with ChatGPT #3 - Fundamentals of LLM/GPTAnant Corporation
 
Episode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapEpisode 2: The LLM / GPT / AI Prompt / Data Engineer Roadmap
Episode 2: The LLM / GPT / AI Prompt / Data Engineer RoadmapAnant Corporation
 
Machine Learning Orchestration with Airflow
Machine Learning Orchestration with AirflowMachine Learning Orchestration with Airflow
Machine Learning Orchestration with AirflowAnant Corporation
 
Cassandra Lunch 130: Recap of Cassandra Forward Talks
Cassandra Lunch 130: Recap of Cassandra Forward TalksCassandra Lunch 130: Recap of Cassandra Forward Talks
Cassandra Lunch 130: Recap of Cassandra Forward TalksAnant Corporation
 
Data Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionData Engineer's Lunch 90: Migrating SQL Data with Arcion
Data Engineer's Lunch 90: Migrating SQL Data with ArcionAnant Corporation
 
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...Anant Corporation
 
Cassandra Lunch 129: What’s New: Apache Cassandra 4.1+ Features & Future
Cassandra Lunch 129: What’s New:  Apache Cassandra 4.1+ Features & FutureCassandra Lunch 129: What’s New:  Apache Cassandra 4.1+ Features & Future
Cassandra Lunch 129: What’s New: Apache Cassandra 4.1+ Features & FutureAnant Corporation
 
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...
Data Engineer's Lunch #86: Building Real-Time Applications at Scale: A Case S...Anant Corporation
 
Data Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackData Engineer's Lunch #85: Designing a Modern Data Stack
Data Engineer's Lunch #85: Designing a Modern Data StackAnant Corporation
 
Data Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergData Engineer's Lunch #83: Strategies for Migration to Apache Iceberg
Data Engineer's Lunch #83: Strategies for Migration to Apache IcebergAnant Corporation
 
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOpsApache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOps
Apache Cassandra Lunch 120: Apache Cassandra Monitoring Made Easy with AxonOpsAnant Corporation
 
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraApache Cassandra Lunch 119: Desktop GUI Tools for Apache Cassandra
Apache Cassandra Lunch 119: Desktop GUI Tools for Apache CassandraAnant Corporation
 
Data Engineer’s Lunch #67: Machine Learning - Feature Selection
Data Engineer’s Lunch #67: Machine Learning - Feature SelectionData Engineer’s Lunch #67: Machine Learning - Feature Selection
Data Engineer’s Lunch #67: Machine Learning - Feature SelectionAnant Corporation
 
Data Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource ManagersData Engineer's Lunch #80: Apache Spark Resource Managers
Data Engineer's Lunch #80: Apache Spark Resource ManagersAnant Corporation
 

Mais de Anant Corporation (20)

QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
QLoRA Fine-Tuning on Cassandra Link Data Set (1/2) Cassandra Lunch 137
 
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdfKono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
Kono.IntelCraft.Weekly.AI.LLM.Landscape.2024.02.28.pdf
 
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache PinotData Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
Data Engineer's Lunch 96: Intro to Real Time Analytics Using Apache Pinot
 
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...
NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...NoCode, Data & AI LLM Inside Bootcamp: Episode 6 - Design Patterns: Retrieval...


Modern Open Data Platform: Cool Open Source Tools Crafting your Dream Stack

  • 1. Modern Open Data Platform : Cool Open Source Tools Crafting your Dream Stack with the Open Data Platform Playbook Rahul Xavier Singh Anant Corporation Data Engineer’s Lunch / Anant Webinar 11/07/2022
  • 2. Playbook Design Framework Approach ETL / Reverse ETL Customer Data Platforms Components DataOps Agenda
  • 3. We help platform owners reach beyond their potential to serve a global customer base that demands Everything, Now.
  • 4. We design with our Playbook, build with our Framework, and manage platforms with our Approach so our clients Think & Grow Big.
  • 6. Challenge Business Platform Playbook Framework Approach Technology Management Solutions [Data] Services Catalog Fully Managed Service Subscriptions We offer Professional Services to engineer Solutions and offer Managed Services to clients where it makes sense, after an Assessment
  • 7. Modern Technology is Disconnected https://chiefmartec.com/2020/04/marketing-technology-landscape-2020-martech-5000/ Businesses want to: - Create value - Get the customer - Deliver the value - Get paid
  • 8. Most Users Just Want / Need to … FIND DISCOVER FILTER ANALYZE VISUALIZE MEASURE ACT USE SHARE
  • 9. Business / Platform Dream Enterprise Consciousness: - People - Processes - Information - Systems Connected / Synchronized. Business has been chasing this dream for a while. As technologies improve, this becomes more accessible. Image Source: Digital Business Technology Platforms, Gartner 2016
  • 10. Going Beyond “Reactive Manifesto” / 12 Factor References: https://12factor.net/, https://www.reactivemanifesto.org/ - Current Business Information is available to People in the swiftest way possible within the bounds of reasonable costs. - Business Information is generally available to the enterprise, siloed only by security and governance. - Data platforms make use of appropriate resources for hot vs. cold, raw vs. enhanced data. - Data platforms are always available, redundant, always trying to achieve an RPO/RTO of zero. Project Information Client Service Information Corporate Guides Collaborative Documents Assets & Files Corporate Assets Unified User Experience
  • 11. Challenges of Managing Data Platforms in a Growing Enterprise
  • 12. Optimized Core enabled Business Modularity This process needs to be done in sequence. Otherwise we end up having to redo the work.
  • 17. So Many Different “Modern Stacks”? Lots of “reference” architectures available. They tend not to think about the speed layer since they are focusing on batch. What about SPEED?
  • 18. How do you choose from the landscape? Lots and lots of components in the Data & AI Landscape. Which ones are the right ones for your business?
  • 19. Playbook for Modern Open Data Platform Platform Design Evaluate Framework Cloud - Public - Private - Hybrid Data - Data:Object - Data:Stream - Data:Table - Data:Index - Processor:Batch - Processor:Stream DataOps - ETL/ELT/EtLT - Reverse ETL - Orchestration DevOps - Infrastructure as Code - Systems Automation - Application CICD Architecture (Design) - Cloud - Data - DevOps - DataOps Engineering - Configuration - Scripting - Programming Operation - Setup / Deploy - Monitoring/Alerts - Administration User Experience - No-Code/Low Code Apps/Form Builders - Automatic API Generator/Platform - Customer App/API Framework Execute Approach Discovery (Inventory) - People - Process - Information (Objects) - Systems (Apps)
  • 20. Modern Enterprise Canvas Workflow Approval Customer Acquisition Customer Payment Customer Information Customer Information Customer Information Business Information Billing Information Zoho App Creator Unbounce Zoho CRM Stripe Zapier Contexts - People - Process - Information - Systems Responsibility Areas - Products & Services - Sales & Marketing - Operations & Infrastructure - Research & Development - Finance & Accounting - Leadership & Management
  • 21. Modern Enterprise Canvas Contexts - People - Process - Information - Systems Responsibility Areas - Customer - Users - Business - Product Owners - Engineering - Developers - Operations - Administrators
  • 25. Public Cloud Native - Microsoft
  • 27. Cool Tools: Optimizing Distributed Data with Cloud vs. Open Core with Open Source Tools
  • 28. Open Core Distributed Data Platforms To create globally distributed and real time platforms, we need to use distributed realtime technologies to build your platform. Here are some. Which ones should you choose?
  • 29. Open Core Data Modernization / Automation / Integration In addition to vastly scalable tools, there are also modern innovations that can help teams automate and maximize human capital by making data platform management easier.
  • 30. Framework Components
      ● Major Components
          ○ Persistent Queues (RAM/BUS)
          ○ Queue Processing & Compute (CPU)
          ○ Persistent Storage (DISK/RAM)
          ○ Reporting Engine (Display)
          ○ Orchestration Framework (Motherboard)
          ○ Scheduler (Operating System)
      ● Strategies
          ○ Cloud Native on Google
          ○ Self-Managed Open Source
          ○ Self-Managed Commercial Source
          ○ Managed Commercial Source
      Customers want options, so we decided to create a Framework that can scale with whatever Infrastructure and Software strategy they want to use.
  • 35. Sample STACK Outline Framework Platform Components Resources Platform Setup Training Administration Configuration Knowledge
      ● Components
          ○ Infrastructure
              ■ Source / Git: Github, Gitlab
              ■ Cloud / Public: AWS, Azure, GCP, DO
              ■ Orchestration: Terraform, Terraform / Atlantis
              ■ Configuration: Ansible, Ansible / AWX / Semaphore
          ○ Compute
              ■ Datastax / Spark
              ■ Datastax / Livy
              ■ Databricks
          ○ Data / Open Core
              ■ Datastax Enterprise
              ■ Cassandra
              ■ Search / Solr
              ■ Graph
              ■ Confluent Platform
          ○ Data / Cloud
              ■ Datastax / Astra
              ■ Confluent Cloud
          ○ Data / Open Source
              ■ Cassandra
              ■ Kafka
              ■ Elassandra
              ■ YugaByte
              ■ Scylla
              ■ Pulsar
          ○ Application
              ■ Airflow
              ■ Airbyte
              ■ Kafka Streams
              ■ Jupyter
              ■ Redash
              ■ Metabase
              ■ Superset
              ■ Zeppelin
  • 37. How Distributed Data Helps Drive Enterprise Consciousness XDCR: Cross datacenter replication is the ultimate data fabric. Resilience, performance, availability, and scale. Made widely available by Cassandra and Couchbase.
  • 38. Modern Open Data Platform + Cool Database = Data Fabric One cluster, many workloads. With any other “Data Warehouse”, this would be problematic. With Cassandra, this is a core feature.
  • 39. How YugaByteDB allows us to go further… All the benefits of XDCR and … - More Data Density at High Speed - YCQL Queries to support Non Relational / C* CQL like queries - YSQL Queries to support Relational / SQL Queries - Transactions/Consistency - …
  • 40. Let’s Get Data into a Database - Easier Today Open Source: - Airbyte / RudderStack make ETL easier and are open source - Kafka Connect / Pulsar IO can convert ETL into Streaming ETL SaaS/PaaS: - SaaS like Stitch/HevoData - Supported versions of Airbyte/RudderStack
  • 41. Once It’s There, Serve It, Do More Processing Open Source: - Flink / Spark / Kafka Streams can be used to save Analytics / ML processed data. - Hasura can help serve data as GraphQL, PostgREST can expose REST APIs.
  • 42. Let’s send it back via Reverse ETL! “Reverse ETL is the process of copying data from a warehouse into business applications like CRM, analytics, and marketing automation software. You perform this process by using a reverse ETL tool that integrates with your data source and your business SaaS tools.” - Segment Blog https://segment.com/blog/reverse-etl/ Open Source: - Grouparoo / Airbyte, RudderStack are free. Others are paid. - You can always use Kafka Connect / Pulsar IO to send data back also.
  • 43. Let’s put it all together now - ONE DATA FABRIC Cassandra isn’t the only database to do XDCR that can enable multiple workloads. Yugabyte also offers a PostgreSQL-compatible layer.
  • 44. Key Takeaways for Open Data Platforms Don’t reinvent the wheel. Prioritize DevOps / DataOps. Document the STACK. Identify the Objectives. - Identify the objectives so that you know what success looks like. - DevOps / DataOps combined with a true agile approach allows you to iterate your platform quickly. - Put the data into a distributed data store that supports SQL/CQL, and possibly archive it into Parquet/Iceberg (historical data). - Get the data out to your Systems using “Reverse ETL” tools. Use open tools that are well supported.
  • 45. Thank you and Dream Big. Hire us - Design Workshops - Innovation Sprints - Service Catalog Anant.us - Read our Playbook - Join our Mailing List - Read up on Data Platforms - Watch our Videos - Download Examples
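
The flow described in slides 40 to 42 (land data in a store, process it, then copy the result back into a business application via reverse ETL) can be illustrated with a minimal sketch. This is not how Airbyte, RudderStack, or Grouparoo work internally; it only shows the shape of the process. It assumes an in-memory SQLite database as a stand-in for the warehouse, and `FakeCrm`, `load_orders`, `build_customer_metrics`, and `reverse_etl` are hypothetical names invented for this example.

```python
import sqlite3


def load_orders(conn, rows):
    """ETL step: land raw (email, amount) rows in the stand-in warehouse."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (customer_email TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()


def build_customer_metrics(conn):
    """Processing step: aggregate orders into one record per customer."""
    cur = conn.execute(
        "SELECT customer_email, COUNT(*), SUM(amount) "
        "FROM orders GROUP BY customer_email ORDER BY customer_email"
    )
    return [
        {"email": email, "order_count": count, "lifetime_value": total}
        for email, count, total in cur
    ]


class FakeCrm:
    """Hypothetical business application; a real one would sit behind an API."""

    def __init__(self):
        self.contacts = {}

    def upsert_contact(self, email, fields):
        # Create the contact if missing, otherwise merge in the new fields.
        self.contacts.setdefault(email, {}).update(fields)


def reverse_etl(conn, crm):
    """Reverse ETL step: copy warehouse metrics into the business application."""
    synced = 0
    for row in build_customer_metrics(conn):
        crm.upsert_contact(
            row["email"],
            {
                "order_count": row["order_count"],
                "lifetime_value": row["lifetime_value"],
            },
        )
        synced += 1
    return synced


def run_demo():
    conn = sqlite3.connect(":memory:")
    load_orders(
        conn,
        [
            ("a@example.com", 10.0),
            ("b@example.com", 5.0),
            ("a@example.com", 2.5),
        ],
    )
    crm = FakeCrm()
    count = reverse_etl(conn, crm)
    return count, crm.contacts


if __name__ == "__main__":
    print(run_demo())
```

Writing the sync as an upsert keyed on email keeps it idempotent: rerunning it after new orders arrive updates existing contacts rather than duplicating them, which is the property most reverse ETL tools aim to provide.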

Editor's Notes

  1-4, 6-8, 13-21. What makes a good story? Once you get good at it, presenting becomes easy. Shared stories with people we’ve bonded with (community for example). This format is not good for Metastories.
  5, 9-12. Challenge: Currently the components are broken up into different vendors and parts. Similar to building a computer every time for every client.