Data Platform Architecture Principles and Evaluation Criteria
Pooja Kelgaonkar, Senior Data Architect at Rackspace Technology
Pooja Kelgaonkar
■ Senior Data Architect - GCP, Snowflake
■ Specialist in Data Modernization Implementations
■ Expertise in “Data” domain
■ Learner, Tech Blogger, Tech evangelist
■ Reading, listening to Indian classical music
Agenda
■ Architecture Principles
■ Data Modernization
■ Data Platform Offerings
■ Evaluation Criteria
■ Sample Use Case - Evaluation Comparison
Data Architecture Principles
Framework Pillars
01 Security
● Access Management & Controls
● Data Protection - Encryption, Data Masking
● Compliance - ISO, HIPAA, PCI DSS
● Data Governance
02 Scalability
● Horizontal Scaling
● Vertical Scaling
● Auto Scaling
03 Availability
● Reliability
● Resiliency of System
● Availability - System Uptime
04 Efficiency
● Performance Efficiency
● Cost Efficiency - Cost Optimized
05 Operational Excellence
● Serviceability - Easy Operations & Maintenance
● Maintenance - Data Pipeline Maintenance
● Reduced Ops Activities & Cost
Data Modernization Journey
■ Cloud Migration / Adoption - 5Rs of transformation
■ Rehost, Refactor, Revise, Rebuild and Replace
01 Data Discovery
Analysis of the existing data architecture and system design; evaluating the need for and requirements of the new data system.
02 Data Architecture & Assessment
Designing the new data platform; assessment of data modelling, data governance and security.
03 Data Architecture & Engineering
Data platform implementation, data pipeline development and enhancement POCs. Designing the end-to-end cycle.
04 Data Migration & Pipeline Development/Conversion
Actual pipeline development and conversion on the new platform. Implementing, testing and validating pipelines and data on the new platform.
05 Go Live & DataOps
Soft launch / early cut-off to integrate with other systems and obtain sign-off from business users. Implementing operations for the new platform and the modified pipelines.
Design Framework Pillars & Considerations
Teams
Architects, Engineering, Operations
Who?
When? How?
Business Drive
Technology Drive
Management & Engagement Drive
What?
End User SLAs Assessment & SLA Setting
System Assessment & Technology Evaluations
Signed Up Services vs Open Source vs Hybrid Evaluations
Business Assessment
Technology Evangelist
Sign Up for Services
Business Teams
Data Platform Offerings
Cloud Native
Public cloud providers offer a range of services for implementing a data platform: DB / DW / Data Lake / Data Mesh / SQL / NoSQL etc.
AWS
● AWS Glue
● EMR
● Kinesis, OpenSearch
● RDS, Aurora
● Redshift
● DynamoDB, DocumentDB
Azure
● Azure Data Factory
● HDInsight
● Azure Stream Analytics
● Azure SQL, Managed SQL
● SQL Server, PostgreSQL
● MariaDB, Cosmos DB, Managed Cassandra
GCP
● Dataflow, Data Fusion
● Dataproc
● Pub/Sub, Stream
● Cloud SQL, Cloud Spanner
● BigQuery, BigLake
● Bigtable, Firestore, Memorystore
Data on Cloud - Common Offerings
There are a variety of offerings for implementing a data platform and designing data pipelines using native and open source services.
Data Platform - Evaluation Criteria (Assessment Phase)
Evaluation - Pre-Requisites
Evaluation Criteria
Existing/Cross Application Platform to be Evaluated
● Platform Offerings - Existing Support Tier / Billing Plans
● Managed/Native Services / BYOL Services
● Existing System Licenses, Integrators - BI, OPS Tools
New Platforms to be Explored
● Platform Offerings - Probable Platform to be Evaluated, Cost Comparisons Done?
● Managed/Native Services / BYOL Services
● Existing System Licenses, Integrators - BI, OPS Tools
Specific Evaluation or Open Evaluation to Select Best Fit for Given Use Case
Evaluation - Inputs
■ Capex vs Opex
■ % of Data Storage vs Scan vs Processed
■ Compute vs Storage Utilization
■ Data Challenges
■ System Challenges
Evaluation - Checkpoints
1 Data Operations & Business Critical Requirements
● Data Pipeline Management - Monitoring & Operations
● Business Requirements - 24x7 Monitoring vs SLAs
● Critical Applications - Availability & SLAs
2 Data Analytics Checkpoint
● Data Analytics - BI Tooling
● Predictive Analytics - Algorithms, Tools, Libraries Used
● AI/ML Use Cases - Customer Facing vs In-House
● Enterprise vs Cloud Native
3 Business Checkpoint
● Data Availability - SLAs
● End User Agreements
● Business Requirements - Specific to Tooling
● Existing Cost Utilization
● Performance Ratio - Current vs Expected
● Modernization Drive
4 Data Processing Checkpoint
● Target Systems Integrations
● Data Usage - Hot Data vs Cold Data
● Data Stored vs Data Processed vs Reads
● Data Pipelines - Batch vs Streaming
● Data Pipeline Complexity - S/M/C/VC (Simple / Medium / Complex / Very Complex)
● Data Pipeline Scheduling - Tools, Cron Jobs, Native Schedulers, Event Based
● ETL vs ELT Requirements
5 Data Platform Checkpoint
● Type of Data - Structured, Semi-Structured, Unstructured
● Sources of Data - Files, DBs, IoT, Devices, APIs
● Consumers of Data - Users vs System
● Frequency of Data - Batch, Real-Time
● Data Storage - Active vs Passive
● Data Modelling - Schema, Tables, DB Objects
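The "Hot Data vs Cold Data" and "Active vs Passive" checkpoints can be approached with a simple usage-based split. The sketch below is only an illustration: the 90-day threshold, the table names and the dates are all assumptions, not values from the deck.

```python
from datetime import date, timedelta

# Hypothetical threshold: tables not read in the last 90 days count as cold.
COLD_AFTER_DAYS = 90


def classify_tables(last_read, today, cold_after_days=COLD_AFTER_DAYS):
    """Split table names into hot and cold lists by their last read date."""
    cutoff = today - timedelta(days=cold_after_days)
    hot = sorted(name for name, read_on in last_read.items() if read_on >= cutoff)
    cold = sorted(name for name, read_on in last_read.items() if read_on < cutoff)
    return hot, cold


# Illustrative inventory; table names and dates are made up.
inventory = {
    "sales_daily": date(2024, 5, 30),
    "returns_2019": date(2023, 1, 15),
    "customer_dim": date(2024, 5, 1),
}
hot, cold = classify_tables(inventory, today=date(2024, 6, 1))
print("hot:", hot)
print("cold:", cold)
```

In practice the last-read dates would come from the existing warehouse's query logs, and the threshold would be agreed with business users.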
Evaluation - Metrics
Checkpoint | Category | Metrics
Data Checkpoint | Data Integrators | No of Sources; No of Target; No of Specific Systems; Total Storage Volume; Daily Delta Volume
Data Checkpoint | Data Modelling | Frequency of Schema Evolution; No of Objects; % of NoSQL Objects; % of PL SQL Objects
Data Processing | Data Pipelines | No of S/M/C Jobs; No of External Functions Integrated (Java/Python/SQL); No of ETL Jobs (Tool Based); No of Compute Intense Jobs; No of Storage Intense Jobs
Business Checkpoint | Operations | No of Times SLA Challenged; No of End Users Affected
Business Checkpoint | Reliability | No of Times Data Compromised; No of DR Activities; No of End Users Impacted
Business Checkpoint | Performance Efficiency | Total Batch Time; No of Times Batch SLA Impacted; No of End User Reports; No of End Users/Consumers; No of Poor Performing Reports/Queries
Business Checkpoint | Cost Utilization | Overall Billing (Capex); Total Operations, Maintenance Cost
Data Operations | Monitoring | No of Support Team Members; No of Monitoring Dashboards
Data Analytics | Analytics | No of ML Jobs/Algorithms; ML Integrators
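One way to use the metrics listed above during an assessment is as a checklist of values still to be collected. The following is a minimal sketch under that assumption: only a subset of the metrics is included, and the `missing_metrics` helper and the sample values are hypothetical.

```python
# Checkpoint -> category -> metric names, taken from a subset of the metrics above.
REQUIRED_METRICS = {
    "Data Checkpoint": {
        "Data Integrators": [
            "No of Sources",
            "No of Target",
            "Total Storage Volume",
            "Daily Delta Volume",
        ],
        "Data Modelling": ["No of Objects"],
    },
    "Data Processing": {
        "Data Pipelines": ["No of ETL Jobs (Tool Based)"],
    },
}


def missing_metrics(collected, required=REQUIRED_METRICS):
    """Return (checkpoint, category, metric) tuples that have no value yet."""
    gaps = []
    for checkpoint, categories in required.items():
        for category, metrics in categories.items():
            for metric in metrics:
                if collected.get(metric) is None:
                    gaps.append((checkpoint, category, metric))
    return gaps


# Hypothetical assessment in progress; the values are placeholders.
collected = {"No of Sources": 12, "No of Target": 3, "Total Storage Volume": "120 TB"}
gaps = missing_metrics(collected)
for gap in gaps:
    print(" / ".join(gap))
```

The output lists the metrics that still need to be gathered before the platforms can be compared.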
Data Platform - Evaluation Use Case
Evaluation - Inputs
■ Domain - Retail, DW - Teradata, ETL - DataStage
■ Platform - Recently signed up for Google Cloud Platform
■ Data Platform - Evaluate GCP services to set up the data warehouse platform
■ DW Size - 120 TB (70 TB Active + 50 TB Passive)
■ Daily Volume - 1 TB (80% Batch + 20% Streaming)
■ Data - Structured & Semi-Structured (JSON, XML)
■ Data Pipelines - Mostly ELT - DataStage to Teradata (landing layer), Teradata SQL to transform data
■ Data Analytics - Tableau Reports - Customer Reports
■ Enterprise Scheduler - Control-M, Ticketing Tool - JIRA, Alerting via Slack, Email
■ Monitoring Dashboards, 24x7 Support Team
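The sizing inputs above translate into a few ratios that feed the storage and pipeline decisions. This is a small worked calculation using only the figures stated in the use case; the variable names are illustrative.

```python
# Figures from the use case inputs.
DW_TOTAL_TB = 120
ACTIVE_TB = 70
PASSIVE_TB = 50
DAILY_TB = 1.0
BATCH_SHARE = 0.80
STREAM_SHARE = 0.20

# Sanity check: active and passive storage add up to the stated total.
assert ACTIVE_TB + PASSIVE_TB == DW_TOTAL_TB

active_pct = round(100 * ACTIVE_TB / DW_TOTAL_TB, 1)
passive_pct = round(100 * PASSIVE_TB / DW_TOTAL_TB, 1)
batch_tb_per_day = DAILY_TB * BATCH_SHARE
stream_tb_per_day = DAILY_TB * STREAM_SHARE

print(f"Active storage:  {active_pct}% of the warehouse")
print(f"Passive storage: {passive_pct}% of the warehouse")
print(f"Daily batch load:     {batch_tb_per_day} TB")
print(f"Daily streaming load: {stream_tb_per_day} TB")
```

Roughly 58% of the warehouse is active and 42% passive, which is why the split between active and passive storage matters for the final recommendation.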
DW - Google BigQuery vs Azure Synapse
BigQuery Synapse Observations
1 Data Platform Checkpoint
● Supports More Than 90% of Requirements
● SaaS Offering, Cloud Managed
● Very Well Integrated
2 Data Processing Checkpoint
● Native Drivers to Support Batch & Stream
● Highest Data Processing Speed
● Storage vs Compute - Scaling In and Out
● Automatic Scaling, Performance Efficient
3 Data Analytics Checkpoint
● Can Be Integrated With Any BI Tools
● Support AI/ML Libraries and Jobs
● Performance Efficient - Data Processing, Scanning
4 Business Checkpoint
● High Availability
● Automatic Failover, No DR Required
● Performance & Cost Efficient
● Pay as You Go vs Commitment Comparison Based on Overall Usage
5 Data Operations
● Customized Logging & Monitoring
● Native vs Customized Dashboards
● Integration With Various Alerting, Messaging Tools
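The "Pay as You Go vs Commitment" comparison can be reduced to a break-even calculation on monthly usage. The sketch below assumes a simplified model (a per-TB on-demand rate against a flat monthly commitment fee); the rate and fee are placeholders, not actual vendor prices.

```python
def break_even_tb(on_demand_rate_per_tb, monthly_commitment_cost):
    """Monthly usage (TB) at which both pricing models cost the same."""
    return monthly_commitment_cost / on_demand_rate_per_tb


def cheaper_plan(tb_per_month, on_demand_rate_per_tb, monthly_commitment_cost):
    """Return the cheaper plan name and its monthly cost for a given usage."""
    on_demand_cost = tb_per_month * on_demand_rate_per_tb
    if on_demand_cost <= monthly_commitment_cost:
        return "pay-as-you-go", on_demand_cost
    return "commitment", monthly_commitment_cost


# Placeholder prices for illustration only.
RATE_PER_TB = 5.0
COMMITMENT_FEE = 2000.0

print(break_even_tb(RATE_PER_TB, COMMITMENT_FEE))      # usage where costs match
print(cheaper_plan(100, RATE_PER_TB, COMMITMENT_FEE))  # low usage
print(cheaper_plan(600, RATE_PER_TB, COMMITMENT_FEE))  # high usage
```

With real price lists plugged in, the same comparison shows whether overall usage justifies a commitment.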
Evaluation - Final Report
DW
Approach 1: BigQuery
Approach 2: BigQuery
ETL + ELT Pipelines
Approach 1: Modify DataStage (DS) jobs to use the BigQuery (BQ) connector to load data to the BQ landing layer.
Approach 2: Convert DS load jobs to BQ load jobs that pull data from the source and load it to BQ (this depends on the types of source systems and the integration complexity).
Data Storage
Approach 1: Store active data in BQ native tables with roll-up policies and store passive datasets on the GCS layer, depending on the usage of the tables. External tables can be built on the GCS datasets.
Approach 2: Same as Approach 1.
Data Analytics
Approach 1: Tableau connections can be replaced with BQ connections.
Approach 2: Same as Approach 1.
Data Pipeline Scheduler & Maintenance
Approach 1: Control-M can be used to trigger the pipelines; orchestration can be done using Composer. Existing ticketing and alerting tools can be used as is.
Approach 2: Same as Approach 1.
BigQuery was selected after the evaluation. The decision rests on the initial sign-up to GCP and on the ratio of active to passive storage. Azure Synapse can offer the same capabilities; however, the choice here is business and enterprise driven.
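The recommendation to keep passive datasets on GCS and expose them through external tables could look roughly like the following. The helper only builds a BigQuery `CREATE EXTERNAL TABLE` statement as text; the dataset, table and bucket names and the Parquet format are assumptions for illustration.

```python
def external_table_ddl(dataset, table, gcs_uri, file_format="PARQUET"):
    """Build a BigQuery CREATE EXTERNAL TABLE statement for files kept on GCS."""
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{dataset}.{table}`\n"
        f"OPTIONS (\n"
        f"  format = '{file_format}',\n"
        f"  uris = ['{gcs_uri}']\n"
        f");"
    )


# Hypothetical names; replace with the real dataset, table and bucket path.
ddl = external_table_ddl(
    "retail_dw",
    "sales_history",
    "gs://example-passive-bucket/sales_history/*.parquet",
)
print(ddl)
```

The generated statement can be run in the BigQuery console or through any client; active tables would stay as native BigQuery tables.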
Thank You
Stay in Touch
Pooja Kelgaonkar
poojakelgaonkar@gmail.com & pooja.kelgaonkar@rackspace.com
www.linkedin.com/in/poojakelgaonkar
poojakelgaonkar.medium.com

More Related Content

What's hot

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Tristan Baker
 
How to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleHow to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleDATAVERSITY
 
Architecting Modern Data Platforms
Architecting Modern Data PlatformsArchitecting Modern Data Platforms
Architecting Modern Data PlatformsAnkit Rathi
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureDatabricks
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...DATAVERSITY
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationDenodo
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture DesignKujambu Murugesan
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?DATAVERSITY
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Adopting Multi-Cloud Services with Confidence
Adopting Multi-Cloud Services with ConfidenceAdopting Multi-Cloud Services with Confidence
Adopting Multi-Cloud Services with ConfidenceKevin Hakanson
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationDATAVERSITY
 
Business Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected ApproachBusiness Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected ApproachDATAVERSITY
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks DeltaDatabricks
 
Enterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesEnterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesLars E Martinsson
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?DATAVERSITY
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as ProductDATAVERSITY
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks FundamentalsDalibor Wijas
 

What's hot (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
 
How to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at ScaleHow to Use a Semantic Layer to Deliver Actionable Insights at Scale
How to Use a Semantic Layer to Deliver Actionable Insights at Scale
 
Architecting Modern Data Platforms
Architecting Modern Data PlatformsArchitecting Modern Data Platforms
Architecting Modern Data Platforms
 
Modernizing to a Cloud Data Architecture
Modernizing to a Cloud Data ArchitectureModernizing to a Cloud Data Architecture
Modernizing to a Cloud Data Architecture
 
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
Data Architecture, Solution Architecture, Platform Architecture — What’s the ...
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?Data Warehouse or Data Lake, Which Do I Choose?
Data Warehouse or Data Lake, Which Do I Choose?
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Adopting Multi-Cloud Services with Confidence
Adopting Multi-Cloud Services with ConfidenceAdopting Multi-Cloud Services with Confidence
Adopting Multi-Cloud Services with Confidence
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
 
Business Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected ApproachBusiness Intelligence & Data Analytics– An Architected Approach
Business Intelligence & Data Analytics– An Architected Approach
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Enterprise Data Architecture Deliverables
Enterprise Data Architecture DeliverablesEnterprise Data Architecture Deliverables
Enterprise Data Architecture Deliverables
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
 
Databricks Fundamentals
Databricks FundamentalsDatabricks Fundamentals
Databricks Fundamentals
 

Similar to Data Platform Architecture Principles and Evaluation Criteria

Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Group
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
 
Cloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisCloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisAlex Henthorn-Iwane
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataAshnikbiz
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Precisely
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Web Services
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data AnalyticsAmazon Web Services
 
Big Data application - OSS / BSS
Big Data application - OSS / BSSBig Data application - OSS / BSS
Big Data application - OSS / BSSKeyur Thakore
 
rough-work.pptx
rough-work.pptxrough-work.pptx
rough-work.pptxsharpan
 
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Jaroslav Gergic
 
IPC Data Analysis and Extraction
IPC Data Analysis and ExtractionIPC Data Analysis and Extraction
IPC Data Analysis and Extractionpzybrick
 
Acme data engineering case study
Acme data engineering case studyAcme data engineering case study
Acme data engineering case studyMukul Sood
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsCloudera, Inc.
 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse OptimizationCloudera, Inc.
 
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresOPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresKangaroot
 

Similar to Data Platform Architecture Principles and Evaluation Criteria (20)

Skilwise Big data
Skilwise Big dataSkilwise Big data
Skilwise Big data
 
Skillwise Big Data part 2
Skillwise Big Data part 2Skillwise Big Data part 2
Skillwise Big Data part 2
 
Hadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data WarehouseHadoop and Your Enterprise Data Warehouse
Hadoop and Your Enterprise Data Warehouse
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
Cloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow AnalysisCloud-Scale BGP and NetFlow Analysis
Cloud-Scale BGP and NetFlow Analysis
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Transform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big DataTransform your DBMS to drive engagement innovation with Big Data
Transform your DBMS to drive engagement innovation with Big Data
 
Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?Which Change Data Capture Strategy is Right for You?
Which Change Data Capture Strategy is Right for You?
 
Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.Amazon Redshift with Full 360 Inc.
Amazon Redshift with Full 360 Inc.
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data Analytics
 
Big Data application - OSS / BSS
Big Data application - OSS / BSSBig Data application - OSS / BSS
Big Data application - OSS / BSS
 
rough-work.pptx
rough-work.pptxrough-work.pptx
rough-work.pptx
 
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
Big Data Pipeline for Analytics at Scale @ FIT CVUT 2014
 
IPC Data Analysis and Extraction
IPC Data Analysis and ExtractionIPC Data Analysis and Extraction
IPC Data Analysis and Extraction
 
Acme data engineering case study
Acme data engineering case studyAcme data engineering case study
Acme data engineering case study
 
How Data Drives Business at Choice Hotels
How Data Drives Business at Choice HotelsHow Data Drives Business at Choice Hotels
How Data Drives Business at Choice Hotels
 
Database Freedom | AWS Floor28
Database Freedom | AWS Floor28Database Freedom | AWS Floor28
Database Freedom | AWS Floor28
 
Data Warehouse Optimization
Data Warehouse OptimizationData Warehouse Optimization
Data Warehouse Optimization
 
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT InfrastructuresOPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
OPEN'17_4_Postgres: The Centerpiece for Modernising IT Infrastructures
 
Big Data + PeopleSoft = BIG WIN!
Big Data + PeopleSoft = BIG WIN!Big Data + PeopleSoft = BIG WIN!
Big Data + PeopleSoft = BIG WIN!
 

More from ScyllaDB

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLScyllaDB
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Beyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBBeyond Linear Scaling: A New Path for Performance with ScyllaDB
Beyond Linear Scaling: A New Path for Performance with ScyllaDBScyllaDB
 
Dissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasDissecting Real-World Database Performance Dilemmas
Dissecting Real-World Database Performance DilemmasScyllaDB
 
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...Database Performance at Scale Masterclass: Workload Characteristics by Felipe...
Database Performance at Scale Masterclass: Workload Characteristics by Felipe...ScyllaDB
 
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...
Database Performance at Scale Masterclass: Database Internals by Pavel Emelya...ScyllaDB
 
Database Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaDatabase Performance at Scale Masterclass: Driver Strategies by Piotr Sarna
Database Performance at Scale Masterclass: Driver Strategies by Piotr SarnaScyllaDB
 
Replacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBReplacing Your Cache with ScyllaDB
Replacing Your Cache with ScyllaDBScyllaDB
 
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityPowering Real-Time Apps with ScyllaDB_ Low Latency & Linear Scalability
Powering Real-Time Apps with ScyllaDB_ Low Latency & Linear ScalabilityScyllaDB
 
7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptx7 Reasons Not to Put an External Cache in Front of Your Database.pptx
7 Reasons Not to Put an External Cache in Front of Your Database.pptxScyllaDB
 
Getting the most out of ScyllaDB
Getting the most out of ScyllaDBGetting the most out of ScyllaDB
Getting the most out of ScyllaDBScyllaDB
 
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationNoSQL Database Migration Masterclass - Session 2: The Anatomy of a Migration
NoSQL Database Migration Masterclass - Session 2: The Anatomy of a MigrationScyllaDB
 
NoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsNoSQL Database Migration Masterclass - Session 3: Migration Logistics
NoSQL Database Migration Masterclass - Session 3: Migration LogisticsScyllaDB
 
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesNoSQL Data Migration Masterclass - Session 1 Migration Strategies and Challenges
NoSQL Data Migration Masterclass - Session 1 Migration Strategies and ChallengesScyllaDB
 
ScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB Virtual Workshop
ScyllaDB Virtual WorkshopScyllaDB
 
DBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsDBaaS in the Real World: Risks, Rewards & Tradeoffs
DBaaS in the Real World: Risks, Rewards & TradeoffsScyllaDB
 
Build Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBBuild Low-Latency Applications in Rust on ScyllaDB
Build Low-Latency Applications in Rust on ScyllaDBScyllaDB
 
NoSQL Data Modeling 101
NoSQL Data Modeling 101NoSQL Data Modeling 101
NoSQL Data Modeling 101ScyllaDB
 

More from ScyllaDB (20)

Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
What Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQLWhat Developers Need to Unlearn for High Performance NoSQL
What Developers Need to Unlearn for High Performance NoSQL
 
Low Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & PitfallsLow Latency at Extreme Scale: Proven Practices & Pitfalls
Low Latency at Extreme Scale: Proven Practices & Pitfalls
 
Dissecting Real-World Database Performance Dilemmas
Data Platform Architecture Principles and Evaluation Criteria

  • 1. Data Platform Architecture Principles and Evaluation Criteria Pooja Kelgaonkar, Senior Data Architect at Rackspace Technology
  • 2. Pooja Kelgaonkar
      ■ Senior Data Architect - GCP, Snowflake
      ■ Specialist in Data Modernization Implementations
      ■ Expertise in the “Data” domain
      ■ Learner, Tech Blogger, Tech Evangelist
      ■ Reading, listening to Indian classical music
  • 3. Agenda
      ■ Architecture Principles
      ■ Data Modernization
      ■ Data Platform Offerings
      ■ Evaluation Criteria
      ■ Sample Use Case - Evaluation Comparison
  • 5. Framework Pillars
      01 Security
        ● Access Management & Controls
        ● Data Protection - Encryption, Data Masking
        ● Compliance - ISO, HIPAA, PCI DSS
        ● Data Governance
      02 Scalability
        ● Horizontal Scaling
        ● Vertical Scaling
        ● Auto Scaling
      03 Availability
        ● Reliability
        ● Resiliency of System
        ● Availability - System Uptime
      04 Efficiency
        ● Performance Efficiency
        ● Cost Efficiency - Cost Optimized
      05 Operational Excellence
        ● Serviceability - Easy Operations & Maintenance
        ● Maintenance - Data Pipeline Maintenance
        ● Reduced Ops Activities & Cost
  • 6. Data Modernization Journey
      ■ Cloud Migration / Adoption - 5Rs of transformation
      ■ Rehost, Refactor, Revise, Rebuild and Replace
      Stages:
        ● Data Discovery: Analysis of the existing data architecture and system design; evaluating the need for, and the requirements of, the new data system.
        ● Data Architecture & Assessment: Designing the new data platform; assessment of data modelling, data governance and security.
        ● Data Architecture & Engineering: Data platform implementation, data pipeline development and enhancement POCs. Designing the end-to-end cycle.
        ● Data Migration & Pipeline Development/Conversion: Actual pipeline development and conversion on the new platform. Implementing, testing and validating pipelines and data on the new platform.
        ● Go Live & DataOps: Soft launch / early cutover to integrate with other systems and obtain sign-off from business users. Implementing operations for the new platform and the modified pipelines.
  • 7. Design Framework Pillars & Considerations Teams Architects Engineering Operations Who? When? How? Business Drive Technology Drive Management & Engagement Drive What? End User SLAs Assessment & SLA Setting System Assessment & Technology Evaluations Signed Up Services vs Open Source vs Hybrid Evaluations Business Assessment Technology Evangelist Sign Up for Services Business Teams
  • 9. Cloud Native
      Public cloud providers have various offerings for implementing a data platform: DB / DW / Data Lake / Data Mesh / SQL / NoSQL etc.
      AWS
        ● AWS Glue
        ● EMR
        ● Kinesis, OpenSearch
        ● RDS, Aurora
        ● Redshift
        ● DynamoDB, DocumentDB
      Azure
        ● Azure Data Factory
        ● HDInsight
        ● Azure Stream Analytics
        ● Azure SQL, Managed SQL
        ● SQL Server, PostgreSQL
        ● MariaDB, Cosmos DB, Managed Cassandra
      GCP
        ● Dataflow, Data Fusion
        ● Dataproc
        ● Pub/Sub, Stream
        ● Cloud SQL, Cloud Spanner
        ● BigQuery, BigLake
        ● Bigtable, Firestore, Memorystore
  • 10. Data on Cloud - Common Offerings
      There are a variety of offerings to implement a data platform and design data pipelines using native and open source services.
  • 11. Data Platform - Evaluation Criteria (Assessment Phase)
  • 12. Evaluation - Pre-Requisites
      Evaluation Criteria
        Existing/Cross Application Platform to be Evaluated
          ● Platform Offerings: Existing Support Tier / Billing Plans
          ● Managed/Native Services / BYOL Services: Existing System Licenses, Integrators - BI, OPS tools
        New Platforms to be Explored
          ● Platform Offerings: Probable Platform to be Evaluated, Cost Comparisons Done?
          ● Managed/Native Services / BYOL Services: Existing System Licenses, Integrators - BI, OPS tools
      Specific Evaluation or Open Evaluation to Select Best Fit for Given Use Case
  • 13. Evaluation - Inputs
      ● Capex vs Opex
      ● % of Data Stored vs Scanned vs Processed
      ● Compute vs Storage Utilization %
      ● Data Challenges
      ● System Challenges
  • 14. Evaluation - Checkpoints
      1 Data Operations & Business Critical Requirements
        ● Data Pipeline Management - Monitoring & Operations
        ● Business Requirements - 24x7 Monitoring vs SLAs
        ● Critical Applications - Availability & SLAs
      2 Data Analytics Checkpoint
        ● Data Analytics - BI Tooling
        ● Predictive Analytics - Algorithms, Tools, Libraries Used
        ● AI/ML Use Cases - Customer Facing vs In-House
        ● Enterprise vs Cloud Native
      3 Business Checkpoint
        ● Data Availability - SLAs
        ● End User Agreements
        ● Business Requirements - Specific to Tooling
        ● Existing Cost Utilization
        ● Performance Ratio - Current vs Expected
        ● Modernization Drive
      4 Data Processing Checkpoint
        ● Target Systems Integrations
        ● Data Usage - Hot Data vs Cold Data
        ● Data Stored vs Data Processed vs Reads
        ● Data Pipelines - Batch vs Streaming
        ● Data Pipeline Complexity - S/M/C/VC
        ● Data Pipeline Scheduling - Tools, Cron Jobs, Native Schedulers, Event Based
        ● ETL vs ELT Requirements
      5 Data Platform Checkpoint
        ● Type of Data - Structured, Semi-Structured, Unstructured
        ● Sources of Data - Files, DBs, IoT, Devices, APIs
        ● Consumers of Data - Users vs System
        ● Frequency of Data - Batch, Real-Time
        ● Data Storage - Active vs Passive
        ● Data Modelling - Schema, Tables, DB Objects
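One way to turn the five checkpoints above into a comparable number per candidate platform is a weighted score. The sketch below is only an illustration: the weights, the 0 to 5 rating scale, and the two generic platforms are assumptions, not values from the slides.

```python
from typing import Dict

# Hypothetical weights per checkpoint (assumption, not from the deck).
# They are chosen to sum to 1.0 so the result stays on the rating scale.
WEIGHTS: Dict[str, float] = {
    "data_operations": 0.20,
    "data_analytics": 0.15,
    "business": 0.25,
    "data_processing": 0.20,
    "data_platform": 0.20,
}


def weighted_score(scores: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Combine per-checkpoint ratings (0 to 5) into one weighted score."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing checkpoint scores: {sorted(missing)}")
    return sum(weights[name] * scores[name] for name in weights)


# Illustrative ratings for two unnamed candidates (made-up numbers).
platform_a = {"data_operations": 4, "data_analytics": 3, "business": 5,
              "data_processing": 4, "data_platform": 4}
platform_b = {"data_operations": 3, "data_analytics": 4, "business": 3,
              "data_processing": 4, "data_platform": 5}

if __name__ == "__main__":
    for name, scores in (("A", platform_a), ("B", platform_b)):
        print(f"Platform {name}: {weighted_score(scores):.2f}")
```

The weights are where business priorities enter the evaluation, so they would normally be agreed with stakeholders before any platform is rated.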
  • 15. Evaluation - Metrics (Checkpoint / Category / Metrics)
      Data Checkpoint
        Data Integrators
          ● No. of Sources
          ● No. of Targets
          ● No. of Specific Systems
          ● Total Storage Volume
          ● Daily Delta Volume
        Data Modelling
          ● Frequency of Schema Evolution
          ● No. of Objects
          ● % of NoSQL Objects
          ● % of PL/SQL Objects
      Data Processing
        Data Pipelines
          ● No. of S/M/C Jobs
          ● No. of External Functions Integrated (Java/Python/SQL)
          ● No. of ETL Jobs (Tool Based)
          ● No. of Compute-Intensive Jobs
          ● No. of Storage-Intensive Jobs
      Business Checkpoint
        Operations
          ● No. of Times SLA Challenged
          ● No. of End Users Affected
        Reliability
          ● No. of Times Data Compromised
          ● No. of DR Activities
          ● No. of End Users Impacted
        Performance Efficiency
          ● Total Batch Time
          ● No. of Times Batch SLA Impacted
          ● No. of End User Reports
          ● No. of End Users/Consumers
          ● No. of Poor Performing Reports/Queries
        Cost Utilization
          ● Overall Billing (Capex)
          ● Total Operations and Maintenance Cost
      Data Operations
        Monitoring
          ● No. of Support Team Members
          ● No. of Monitoring Dashboards
      Data Analytics
        Analytics
          ● No. of ML Jobs/Algorithms
          ● ML Integrators
  • 17. Evaluation - Pre-Requisites
      Evaluation Criteria
        Existing/Cross Application Platform to be Evaluated
          ● Platform Offerings: Existing Support Tier / Billing Plans
          ● Managed/Native Services / BYOL Services: Existing System Licenses, Integrators - BI, OPS tools
        New Platforms to be Explored
          ● Platform Offerings: Probable Platform to be Evaluated, Cost Comparisons Done?
          ● Managed/Native Services / BYOL Services: Existing System Licenses, Integrators - BI, OPS tools
      Specific Evaluation or Open Evaluation to Select Best Fit for Given Use Case
  • 18. Evaluation - Inputs
      ■ Domain - Retail, DW - Teradata, ETL - DataStage
      ■ Platform - Recently Signed Up for Google Cloud Platform
      ■ Data Platform - Evaluate GCP Services to Set Up Data Warehouse Platform
      ■ DW Size - 120 TB (70 TB Active + 50 TB Passive)
      ■ Daily Volume - 1 TB (80% Batch + 20% Streaming)
      ■ Data - Structured & Semi-Structured (JSON, XML)
      ■ Data Pipelines - Mostly ELT - DataStage to Teradata (Landing Layer), Teradata SQL to Transform Data
      ■ Data Analytics - Tableau Reports - Customer Reports
      ■ Enterprise Scheduler - Control-M, Ticketing Tool - JIRA, Alerting via Slack, Email
      ■ Monitoring Dashboards, 24x7 Support Team
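The sizing inputs above translate directly into the storage and ingestion ratios used later in the evaluation. A minimal sketch of that arithmetic follows; the yearly growth figure is an assumption that the 1 TB daily volume stays constant and nothing is purged, which the slides do not state.

```python
# Figures taken from the use-case inputs.
TOTAL_TB = 120.0
ACTIVE_TB = 70.0
PASSIVE_TB = 50.0
DAILY_TB = 1.0
BATCH_SHARE = 0.80
STREAM_SHARE = 0.20

# Share of the warehouse that is actively queried vs rarely touched.
active_share = ACTIVE_TB / TOTAL_TB
passive_share = PASSIVE_TB / TOTAL_TB

# Daily ingestion split between batch and streaming pipelines.
daily_batch_tb = DAILY_TB * BATCH_SHARE
daily_stream_tb = DAILY_TB * STREAM_SHARE

# Assumption: constant daily volume for a year, with no retention or purge.
yearly_growth_tb = DAILY_TB * 365

if __name__ == "__main__":
    print(f"Active storage share:  {active_share:.1%}")
    print(f"Passive storage share: {passive_share:.1%}")
    print(f"Daily batch volume:    {daily_batch_tb:.1f} TB")
    print(f"Daily stream volume:   {daily_stream_tb:.1f} TB")
    print(f"Yearly growth (assumed constant): {yearly_growth_tb:.0f} TB")
```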
  • 19. DW - Google BigQuery vs Azure Synapse
      Comparison columns: BigQuery, Synapse, Observations
      1 Data Platform Checkpoint
        ● Supports More Than 90% of Requirements
        ● SaaS Offering, Cloud Managed
        ● Very Well Integrated
      2 Data Processing Checkpoint
        ● Native Drivers to Support Batch & Stream
        ● Highest Data Processing Speed
        ● Storage vs Compute - Scaling In and Out
        ● Automatic Scaling, Performance Efficient
      3 Data Analytics Checkpoint
        ● Can Be Integrated With Any BI Tools
        ● Supports AI/ML Libraries and Jobs
        ● Performance Efficient - Data Processing, Scanning
      4 Business Checkpoint
        ● High Availability
        ● Automatic Failover, No DR Required
        ● Performance & Cost Efficient
        ● Pay as You Go vs Commitment Comparison Based on Overall Usage
      5 Data Operations
        ● Customized Logging & Monitoring
        ● Native vs Customized Dashboards
        ● Integration With Various Alerting, Messaging Tools
  • 20. Evaluation - Final Report
      DW
        ● Approach 1: BigQuery
        ● Approach 2: BigQuery
      ETL + ELT Pipelines
        ● Approach 1: Modify the DataStage (DS) jobs to use the BQ connector to load data into the BQ landing layer.
        ● Approach 2: Convert the DS load jobs to BQ load jobs that pull data from the source and load it into BQ (this depends on the types of source systems and the integration complexity).
      Data Storage
        ● Both approaches: Store active data in BQ native tables with roll-up policies, and store passive datasets on the GCS layer depending on the usage of the tables. External tables can be built on the GCS datasets.
      Data Analytics
        ● Both approaches: Tableau connections can be replaced with BQ connections.
      Data Pipeline Scheduler & Maintenance
        ● Both approaches: Control-M can be used to trigger the pipelines, and orchestration can be done using Composer. Existing ticketing and alerting tools can be used as is.
      BigQuery was selected after the evaluation. The choice rests on the initial sign-up to GCP and on the ratio of active to passive storage. Azure Synapse can offer the same capabilities; however, the choices here are business and enterprise driven.
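The storage recommendation in the report above (active data in BigQuery native tables, passive data on GCS behind external tables) can be sketched as a simple placement rule. The table names, the `is_active` flag, and the per-table sizes below are hypothetical; only the 70 TB / 50 TB split comes from the use case, and the sketch does not call any Google Cloud API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TableInfo:
    name: str        # hypothetical table name
    size_tb: float   # table size in TB
    is_active: bool  # True if the table is queried regularly


def plan_placement(table: TableInfo) -> str:
    """Active tables go to BigQuery native storage; passive ones go to GCS
    and are exposed through external tables."""
    return "bigquery_native" if table.is_active else "gcs_external_table"


def summarize(tables: List[TableInfo]) -> Dict[str, float]:
    """Total TB assigned to each storage target."""
    totals: Dict[str, float] = {"bigquery_native": 0.0,
                                "gcs_external_table": 0.0}
    for table in tables:
        totals[plan_placement(table)] += table.size_tb
    return totals


# Hypothetical inventory that adds up to the 70 TB active / 50 TB passive split.
inventory = [
    TableInfo("sales_current", 40.0, True),
    TableInfo("customer_current", 30.0, True),
    TableInfo("sales_archive", 50.0, False),
]

if __name__ == "__main__":
    for target, size in summarize(inventory).items():
        print(f"{target}: {size:.0f} TB")
```

In practice the active/passive flag would come from query logs or table usage statistics rather than being set by hand.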
  • 21. Thank You - Stay in Touch
      Pooja Kelgaonkar
      poojakelgaonkar@gmail.com & pooja.kelgaonkar@rackspace.com
      www.linkedin.com/in/poojakelgaonkar
      poojakelgaonkar.medium.com