The Evolution of the Data
Platform and What It Means
to Enterprise Analytic
Strategy
Presented by: William McKnight
President, McKnight Consulting Group
williammcknight
www.mcknightcg.com
(214) 514-1444
AtScale Helps Customers Modernize their Analytics Infrastructure
[Figure: customers grouped by sector: Financial Services, Healthcare, CPG/Manufacturing, Retail, Other]
Where AtScale Fits in the New Analytics Stack
LAYER (FUNCTION)
• CONSUMPTION (VISUALIZATION, ANALYSIS, REPORTING)
• SEMANTIC LAYER (QUERY ACCESS, FILTERING, MASKING, AUDITING)
• PREPARED DATA (DATA PROCESSING, MODELING)
• RAW DATA (DATA STORAGE, ENCRYPTION)
• DATA TRANSFORMATION (ETL, MERGING, AGGREGATION)
COMPONENT
• BI Tools
• AI/ML Tools
• Applications
• Multi-dimensional Engine
• Data Governance Engine
• Virtualization Engine
• Data Warehouse
• File Access Engine
• ETL Engine
• File System (Data Lake)
• Data Catalog
AtScale Proof Points
Mitigate Risks Associated with Data & Analytics
● Tyson Foods migrated from HDP to Redshift in 1 click
● UHG used offshore resources while protecting PII
● Home Depot supports internal & external users
One Consistent/Compliant View of Business Metrics & Definitions
● Cigna runs PMPM calculations across all dimensions
● Wayfair migrated to GCP while keeping OLAP
● Kohl’s can now deliver inventory management to LOB
Control Complexity & Cost of Analytics
● Home Depot serves 10x more analysts for same cost
● BOL.com reduced their BigQuery costs by 91%
● Wayfair increased their data retention by 5x
Accelerate Data Driven Decisions at Scale
● Koch Industries improved query performance by 40x
● Kohl’s eliminated a 9-month new data prep cycle
● BOL.com delivers 93% of queries in under 1 second
AtScale: How We Do It
Universal Semantic Layer
✓ Supports SQL and OLAP
✓ One location to define business metrics
✓ Abstracts physical location & format
Intelligent Data Virtualization
✓ Works with “Live” data
✓ Supports on-premise & Cloud
✓ Automates performance management
Autonomous Data Engineering
✓ Automates data engineering tasks
✓ Proactively tunes performance
✓ Reduces cloud costs & makes them predictable
Security & Governance
✓ Extends security frameworks to cloud
✓ Inherits security of data platform
✓ Universal row & column level security
AtScale’s Adaptive Analytics Fabric
[Figure: architecture diagram with the following labels, top to bottom.
Business Intelligence & Analytics Tools
Adaptive Analytics Fabric Interfaces: API (REST), SQL (JDBC / ODBC), MDX (XMLA)
Visual Data Modeler; Logical Security Layer
Metadata Repository: Semantic Model / Data Catalog / Data Lineage Map
Multi-Dimensional Engine; Virtualized Acceleration Structures; Distributed Query Engine; New AI Engine
Data Abstraction Layer
Physical Security & Governance Layer
Big Data Platforms & Engines]
TPC-DS 10TB Benchmark: Improvements with AtScale

Improvement Factor with AtScale
Test                   BigQuery     Redshift      Snowflake    Synapse      Databricks
Query Performance (1)  8x Faster    12x Faster    6x Faster    3x Faster    12x Faster
User Concurrency (2)   20x Faster   61x Faster    16x Faster   9x Faster    86x Faster
Improved ROI (3)       10x Better   2.6x Better   4x Better    2x Better    11x Better
Complexity (4)         76% less complex SQL queries

1. Elapsed time for executing 1 query five times
2. Elapsed time executing 1 (x5), 5, 25, 50 queries
3. Compute costs for cluster time (Redshift, Snowflake) or bytes read (BigQuery) for user concurrency test
4. Complexity score for SQL queries for number of: functions, operations, tables, objects & subqueries (AtScale = 258, TPC-DS = 1,057)
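
The 76% figure follows from the two complexity scores quoted in footnote 4. A minimal sketch of that arithmetic, assuming "less complex" means the relative reduction from the TPC-DS score to the AtScale score (the function name is illustrative, not from the benchmark):

```python
# Complexity scores quoted in footnote 4 of the benchmark slide.
ATSCALE_SCORE = 258
TPCDS_SCORE = 1057


def complexity_reduction(baseline: int, improved: int) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return 1 - improved / baseline


reduction = complexity_reduction(TPCDS_SCORE, ATSCALE_SCORE)
print(f"{reduction:.0%} less complex")  # prints "76% less complex"
```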
William McKnight
President, McKnight Consulting Group
• Frequent keynote speaker and trainer internationally
• Consulted to many Global 1000 companies
• Hundreds of articles, blogs, white papers, field tests, etc.
in publication
• Focused on delivering business value and solving business
problems utilizing proven, streamlined approaches to
information management
• Former Database Engineer, Fortune 50 Information
Technology executive and Ernst & Young Entrepreneur of
the Year Finalist
• Owner/consultant: Data strategy and implementation
consulting firm
• 25+ years of information management and data
experience
McKnight Consulting Group Offerings
Strategy
• Trusted Advisor
• Action Plans
• Roadmaps
• Tool Selections
• Program Management
Training
• Classes
• Workshops
Implementation
• Data/Data Warehousing/Business Intelligence/Analytics
• Master Data Management
• Governance/Quality
• Big Data
All Data Under
Management
Data is Under Management when it is…
• In a leverageable platform
• In an appropriate platform for its profile and usage
• With high non-functionals (availability, performance, scalability, stability, durability, security)
• Captured at the most granular level
• At a data quality standard (as defined by Data Governance)
AI Data
• Call center recordings and chat logs
• Streaming sensor data, historical maintenance records and
search logs
• Customer account data and purchase history
• Email response metrics
• Product catalogs and data sheets
• Public references
• YouTube video content audio tracks
• User website behaviors
• Sentiment analysis, user-generated content, social graph data,
and other external data sources
The Relational Database Data Page
© McKnight Consulting Group, 2010
[Figure: a data page made up of a Page Header, Records, Row IDs, and a Page Footer. Sample records:]
1120 Aris Doug Johnson Practice Director 206-555-5636 doug.johnson@aris.com
1121 Stolt Offshore MS Ltd Craig Lennox Director 226-5555-1269 craig.lennox@stoltoffshore.com
1122 Medtronic, Inc. Mark Smith Principal Database Administrator 763-555-2557 mark.smith@medtronic.com
Columnar Orientation
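
The contrast between the row-oriented data page and columnar orientation can be sketched in a few lines. This is an illustration only, using three fields from the sample records; the variable names are assumptions, and real engines add compression, paging, and metadata on top of this idea.

```python
# Three sample records reduced to (id, company, contact) for illustration.
rows = [
    (1120, "Aris", "Doug Johnson"),
    (1121, "Stolt Offshore MS Ltd", "Craig Lennox"),
    (1122, "Medtronic, Inc.", "Mark Smith"),
]

# Row orientation: the fields of each record are stored together.
row_store = list(rows)

# Columnar orientation: the values of each column are stored together.
column_names = ("id", "company", "contact")
column_store = {
    name: [record[i] for record in rows]
    for i, name in enumerate(column_names)
}

# A query on one column reads only that column's values.
print(column_store["company"])
```

A query that needs only `company` scans one list in the column store, whereas the row store has to visit every full record.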
Distributed File Systems
[Figure: five nodes (Node 1 through Node 5) holding three copies each of Block 1, Block 2, and Block 3]
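
The figure shows three blocks, each stored three times across five nodes. A minimal sketch of that idea, assuming a simple round-robin placement (the function `place_blocks` is hypothetical, and production file systems use more elaborate placement policies than this):

```python
from collections import defaultdict


def place_blocks(num_blocks: int, nodes: list[str], replicas: int) -> dict[str, list[int]]:
    """Assign each block to `replicas` distinct nodes, round-robin."""
    if replicas > len(nodes):
        raise ValueError("replicas cannot exceed the number of nodes")
    placement: dict[str, list[int]] = defaultdict(list)
    cursor = 0
    for block in range(1, num_blocks + 1):
        # Consecutive cursor positions guarantee distinct nodes per block.
        for _ in range(replicas):
            placement[nodes[cursor % len(nodes)]].append(block)
            cursor += 1
    return dict(placement)


nodes = [f"Node {i}" for i in range(1, 6)]
layout = place_blocks(num_blocks=3, nodes=nodes, replicas=3)
for node in nodes:
    print(node, layout.get(node, []))
```

Because every block lives on three different nodes, losing any single node leaves at least two copies of each block available.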
Hadoop Sequence File and Parquet File
Cloud Storage
Data Scientist Workbench and Data Warehouse Staging
[Figure: data flow diagram with the following labels: OLTP Systems (ERP, CRM, Supply Chain, MDM, …); Stream or Batch Updates; DI; Data Lake; Data Scientists; Data Warehouse; Data Mart; Real-Time, Event-Driven Apps]
Graph Databases
[Figure: graph diagram with two vertices labeled "Bridge vertex"]
• Subject: John R Peterson Predicate: Knows Object: Frank T Smith
• Subject: Triple #1 Predicate: Confidence Percent Object: 70
• Subject: Triple #1 Predicate: Provenance Object: Mary L Jones
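
The three statements above form one base triple plus two triples that describe it. A minimal sketch, assuming a plain in-memory representation (the `Triple` class and the `about` helper are illustrative, not a graph database API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: object
    predicate: str
    obj: object


# Triple #1 from the slide.
knows = Triple("John R Peterson", "Knows", "Frank T Smith")

# Statements about Triple #1: the triple itself is the subject.
confidence = Triple(knows, "Confidence Percent", 70)
provenance = Triple(knows, "Provenance", "Mary L Jones")

store = [knows, confidence, provenance]


def about(target, triples):
    """Return {predicate: object} for every triple whose subject is `target`."""
    return {t.predicate: t.obj for t in triples if t.subject == target}


print(about(knows, store))
# prints {'Confidence Percent': 70, 'Provenance': 'Mary L Jones'}
```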
Data Warehouses, Data Marts, Data Lakes, Big Data Stores
[Figure: "Increasing Probability that Platform Selection Leads to Success". Levels shown: Best Category and Top Tool Picked; Best Category Picked; Top 2 Category Picked; The No Decision Platform. Scale marks: 80%, 70%, 60%, 50%.]
3 Major Decisions (and Price)
• Decision #1: The Data Store Type
– The largest factor for distinguishing between databases and file-based scale-out system utilization is the data profile. The latter is best for data that fits the loose label of 'unstructured' (or semi-structured) data, while more traditional data -- and smaller volumes of all data -- still belong in a relational database.
• Decision #2: Data Store Placement
– You must also decide where to place your data store -- on-premises or in the cloud (and which cloud). In the past, the only clear choice for most organizations was on-premises data. However, the costs of scale are gnawing away at the notion that this remains the best approach for a data platform. For more on why databases are moving to the cloud, please read this article.
• Decision #3: The Workload Architecture
– Finally, you must keep in mind the distinction between operational and analytical workloads. Short transactional requests and more complex (often longer) analytics requests demand different architectures. Analytics databases, though quite diverse, are the preferred platforms for the analytics workload.
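
The first and third decisions above can be sketched as simple rules. This is a rough illustration, not a selection method from the slides: the `Workload` fields, the `large_volume` flag, and the returned labels are all assumptions, and placement (Decision #2) is left as an input because the slides give no rule for it.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    unstructured: bool   # data fits the loose label of unstructured or semi-structured
    large_volume: bool   # hypothetical flag; the slides do not define "large"
    analytical: bool     # analytical rather than operational requests
    placement: str       # "on-premises" or a named cloud, chosen by the organization


def store_type(w: Workload) -> str:
    """Decision #1: file-based scale-out system vs. relational database."""
    if w.unstructured and w.large_volume:
        return "file-based scale-out system"
    # Traditional data, and smaller volumes of all data, stay relational.
    return "relational database"


def workload_architecture(w: Workload) -> str:
    """Decision #3: analytical vs. operational architecture."""
    return "analytics database" if w.analytical else "operational database"


clickstream = Workload(unstructured=True, large_volume=True,
                       analytical=True, placement="cloud")
print(store_type(clickstream), "/", workload_architecture(clickstream))
```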
What is the Data Platform for?
• Operational Database
• Operational Real-Time
• Operational Big Data
• Operational Data Hub
• Master Data Management
• A Data Warehouse
• A Data Mart
– Dependent
– Independent
• A Data Lake
• Analytic Big Data Application
• Archive Storage
• A Staging Area
Analytics Reference Architecture
[Figure: reference architecture diagram with the following labels.
Sources: Logs (Apps, Web, Devices); User tracking; Operational Metrics; Sensors; Transactional/Context Data; OLTP/ODS; Offload data
Streaming: Raw Data Topics (JSON, AVRO); Processed Data Topics; Stream Processing
Movement: ETL or EL with T in Spark; Batch; Low Latency; Reach through or ETL/ELT or Stream Processing
Targets: Applications; Files; In-database analytics; Data Warehouse; Q (queries)]
Data Warehousing
• Data Warehouses (still) have a lower
total cost of ownership than data
marts
• A data warehouse is a SHARED
platform
– Build once, use many
– Access at Data Warehouse
– Access by creating a mart off the DW
• Still A LOT cheaper than building from scratch
“… a subject-oriented, integrated, non-volatile, time-variant collection of data, organized to support management needs.” (Bill Inmon)
Data Warehouses Have Flavors
● The Customer Experience Transformation Data Warehouse focuses on
customer attributes and touchpoints to improve the value of
customers.
● The Asset Maximization with IoT Data Warehouse deals with the high
volume of edge data tracking the physical assets of the organization.
● The Operational Extension Data Warehouse supports company
operations directly with real-time analytics.
● The Risk Management Data Warehouse supports the ever-growing
compliance and reporting requirements and corporate risk.
● The Finance Modernization Data Warehouse handles the voluminous
financial reporting and ensures the bottom line is considered in every
aspect of the business.
● The Product Innovation Data Warehouse delivers all product-related
information into the decisions of the product life cycle.
Cloud Analytic Databases
What is a Node?
• Azure SQL Data Warehouse is scaled by Data Warehouse Units (DWUs), which are bundled combinations of CPU, memory, and I/O. According to Microsoft, DWUs are “abstract, normalized measures of compute resources and performance.”
• Amazon Redshift uses EC2-like instances with tightly coupled compute and storage, each of which is a “node” in a more conventional sense.
• Snowflake “nodes” are loosely defined as a measure of virtual compute resources. Their architecture is described as “a hybrid of traditional shared-disk database architectures and shared-nothing database architectures.” Thus, it is difficult to infer what a “node” actually is.
• Google BigQuery does not use the concept of a node at all, but instead refers to “slots” as “a unit of computational capacity required to execute SQL queries,” which is also a vague and abstract concept to the average user.
Costing the Platform
• For many platforms, you pay for compute resources as a function of time
– The hourly rate can vary slightly by region
– You may also choose the hourly rate based on certain enterprise features you need
– Also, you need to add the separate storage charge to store the terabytes of data (compressed)
• This is also expressed per hour, although it is substantially less than compute
• Alternatively, some cloud vendors have consumption-based pricing models, where instead of paying by the hour, you pay by the byte processed
– You would multiply the terabytes of data by the on-demand dollars per terabyte pricing
– There is also a cost-per-hour flat rate where you would need to calculate how long it would take to run your queries to completion
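
The two pricing models can be compared with a short calculation. This is a sketch under stated assumptions: the function names and every rate below are hypothetical placeholders, not any vendor's actual prices.

```python
def time_based_cost(compute_rate_per_hour: float, hours: float,
                    storage_tb: float, storage_rate_per_tb_hour: float) -> float:
    """Compute charge plus the separate storage charge, both billed by the hour."""
    compute = compute_rate_per_hour * hours
    storage = storage_tb * storage_rate_per_tb_hour * hours
    return compute + storage


def consumption_cost(tb_processed: float, price_per_tb: float) -> float:
    """On-demand charge: terabytes processed times dollars per terabyte."""
    return tb_processed * price_per_tb


# Hypothetical figures for illustration only.
monthly = time_based_cost(compute_rate_per_hour=10.0, hours=730,
                          storage_tb=10, storage_rate_per_tb_hour=0.03)
on_demand = consumption_cost(tb_processed=50, price_per_tb=5.0)

print(f"time-based:        ${monthly:,.2f}")
print(f"consumption-based: ${on_demand:,.2f}")
```

With real rates substituted in, the same two functions show how the comparison shifts as query volume or cluster uptime changes.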
Total Cost of Ownership is More Than Just Cloud Costs
• Autonomous Administration
• Lack of Platform Features Leads to Increased Configuration and Management
– stored procedures, referential integrity and uniqueness capabilities
– mission critical options for backup and disaster recovery, which typically includes a standby database
– full ANSI-SQL compliance
• Performance
Pricing Gotchas: Scale Out Impact on Cost
• If an additional identical cluster is deployed to handle the additional user queries, the cost doubles for the time period the additional cluster is up and running
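
The doubling effect is easy to quantify. A minimal sketch, assuming a flat hourly rate per cluster (the rate and hours below are hypothetical):

```python
def cluster_cost(rate_per_hour: float, base_hours: float,
                 extra_cluster_hours: float, extra_clusters: int = 1) -> float:
    """Cost of one base cluster plus identical clusters added for part of the period."""
    return rate_per_hour * (base_hours + extra_clusters * extra_cluster_hours)


RATE = 16.0  # hypothetical dollars per cluster-hour

no_scale_out = cluster_cost(RATE, base_hours=24, extra_cluster_hours=0)
busy_hours = cluster_cost(RATE, base_hours=24, extra_cluster_hours=8)
all_day = cluster_cost(RATE, base_hours=24, extra_cluster_hours=24)

print(no_scale_out, busy_hours, all_day)  # prints 384.0 512.0 768.0
```

The daily bill doubles only if the second cluster runs all day; with eight busy hours it rises by a third.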
Pricing Gotchas: Memory Pressure on Scale Out Compute
• Whenever a data warehouse does not have enough memory to build a join hash table and keep it in memory, it has to spill it to disk
– This is costly in terms of performance, because the DBMS has to do double work writing, sorting, and reading the hash table information all on disk rather than in memory
• If you want to provision a medium-sized cluster and let it scale up to two medium clusters during the busy hours to handle the higher concurrency, a large JOIN would still spill to disk on one of the clusters, because each cluster has only a medium cluster's memory
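
The memory-pressure point can be illustrated with a simple check. This sketch assumes a query's hash table must fit in the memory of the single cluster that runs it; the memory sizes are hypothetical.

```python
def join_spills(hash_table_gb: float, cluster_memory_gb: float) -> bool:
    """True when the join hash table does not fit in one cluster's memory."""
    return hash_table_gb > cluster_memory_gb


MEDIUM_MEMORY_GB = 256   # hypothetical memory of a medium cluster
LARGE_MEMORY_GB = 512    # hypothetical memory of a large cluster
hash_table_gb = 400      # hypothetical size of a large JOIN's hash table

# Scaling out: more medium clusters add concurrency, not memory per query.
for clusters in (1, 2):
    spills = join_spills(hash_table_gb, MEDIUM_MEMORY_GB)
    print(f"{clusters} medium cluster(s): spills to disk = {spills}")

# Scaling up: a larger cluster gives the join more memory.
print("1 large cluster: spills to disk =",
      join_spills(hash_table_gb, LARGE_MEMORY_GB))
```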
Pricing Gotchas: Appropriate I/O is Expensive
• In the cloud, storage is offered by media type: solid state (SSD) or spinning hard drive (HDD)
• SSD is also offered in tiers
– Azure has Standard and Premium
– AWS has general purpose (gp2) and provisioned (io1) SSD tiers.
Data Integration
Data Virtualization
“The right answer is not always to centralize the data. Data Virtualization will be of utmost importance as the ‘perpetual short-term’ solution to the need.”
[Figure: Enterprise Data Virtualization spanning Data Warehouses, Marts & Cubes, Operational Data Stores, Transactional Sources, File Systems, and Big Data]
Capabilities for Data Integration for Enterprise
Data
• Comprehensive Native Connectivity
• Multi-Latency Data Ingestion
• Data Integration (in ETL, ELT, Streaming)
• Data Quality and Data Governance
• Data Cataloging and Metadata Management
• Enterprise Trust, Enterprise Scale (or Class)
• AI Intelligence and Automation
• Ecosystem and Multi-cloud
Enterprise Data Integration Options

Data Integration Options       Project Technical Environments Recommended For Consideration   Project Scope
Heterogeneous:
  Cloudera                     Any                                                            Any
  IBM                          Any                                                            Any
  Informatica                  Any                                                            Any
  Talend                       Any                                                            Any
Specialist:
  AWS (Glue)                   Environments on AWS with core of Redshift, EMR                 Any
  Azure (Azure Data Factory)   Environments on Azure with core of Synapse, HDInsight          Any
  Fivetran                     Any                                                            Contained scope
  Google                       Environments on GCP with core of BQ, Dataproc                  Any
  Matillion                    Any                                                            Contained scope
  Oracle                       Environments with Oracle database                              Any
  SAP                          SAP-only environments                                          SAP projects
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 

ADV Slides: The Evolution of the Data Platform and What It Means to Enterprise Analytic Strategy

  • 1. The Evolution of the Data Platform and What It Means to Enterprise Analytic Strategy Presented by: William McKnight President, McKnight Consulting Group williammcknight www.mcknightcg.com (214) 514-1444
  • 2. AtScale Helps Customers Modernize their Analytics Infrastructure. Industries: Financial Services, Healthcare, CPG/Manufacturing, Retail, Other
  • 3. Where AtScale Fits in the New Analytics Stack
      Layers (function): Consumption (visualization, analysis, reporting); Semantic Layer (query access, filtering, masking, auditing); Prepared Data (data processing, modeling); Raw Data (data storage, encryption); Data Transformation (ETL, merging, aggregation)
      Components: BI Tools, AI/ML Tools, Applications, Multi-dimensional Engine, Data Governance Engine, Virtualization Engine, Data Warehouse, File Access Engine, ETL Engine, File System (Data Lake), Data Catalog
  • 4. AtScale Proof Points
      Mitigate Risks Associated with Data & Analytics
        ● Tyson Foods migrated from HDP to Redshift in 1 click
        ● UHG used offshore resources while protecting PII
        ● Home Depot supports internal & external users
      One Consistent/Compliant View of Business Metrics & Definitions
        ● Cigna runs PMPM calculations across all dimensions
        ● Wayfair migrated to GCP while keeping OLAP
        ● Kohl’s can now deliver inventory management to LOB
      Control Complexity & Cost of Analytics
        ● Home Depot serves 10x more analysts for same cost
        ● BOL.com reduced their BigQuery costs by 91%
        ● Wayfair increased their data retention by 5x
      Accelerate Data Driven Decisions at Scale
        ● Koch Industries improved query performance by 40x
        ● Kohl’s eliminated a 9 month new data prep cycle
        ● BOL.com delivers 93% of queries in under 1 second
  • 5. AtScale: How We Do It
      Universal Semantic Layer
        ✓ Supports SQL and OLAP
        ✓ One location to define business metrics
        ✓ Abstracts physical location & format
      Intelligent Data Virtualization
        ✓ Works with “Live” data
        ✓ Supports on-premise & Cloud
        ✓ Automates performance management
      Autonomous Data Engineering
        ✓ Automates data engineering tasks
        ✓ Proactively tunes performance
        ✓ Reduces & makes cloud costs predictable
      Security & Governance
        ✓ Extends security frameworks to cloud
        ✓ Inherits security of data platform
        ✓ Universal row & column level security
  • 6. AtScale’s Adaptive Analytics Fabric (architecture diagram). Diagram labels: Business Intelligence & Analytics Tools; Adaptive Analytics Fabric Interfaces: API (REST), SQL (JDBC / ODBC), MDX (XMLA); Visual Data Modeler; Logical Security Layer; Metadata Repository; Semantic Model / Data Catalog / Data Lineage Map; Multi-Dimensional Engine; Virtualized Acceleration Structures; Data Abstraction Layer; Distributed Query Engine; New AI Engine; Physical Security & Governance Layer; Big Data Platforms & Engines
  • 7. TPC-DS 10TB Benchmark: Improvements with AtScale
      Improvement factor with AtScale, by test and platform:
      Test | BigQuery | Redshift | Snowflake | Synapse | Databricks
      Query Performance (1) | 8x Faster | 12x Faster | 6x Faster | 3x Faster | 12x Faster
      User Concurrency (2) | 20x Faster | 61x Faster | 16x Faster | 9x Faster | 86x Faster
      Improved ROI (3) | 10x Better | 2.6x Better | 4x Better | 2x Better | 11x Better
      Complexity (4) | 76% less complex SQL queries (all platforms)
      Notes:
      1. Elapsed time for executing 1 query five times
      2. Elapsed time executing 1 (x5), 5, 25, 50 queries
      3. Compute costs for cluster time (Redshift, Snowflake) or bytes read (BigQuery) for user concurrency test
      4. Complexity score for SQL queries for number of: functions, operations, tables, objects & subqueries (AtScale = 258, TPC-DS = 1,057)
  • 8. William McKnight, President, McKnight Consulting Group • Frequent keynote speaker and trainer internationally • Consulted to many Global 1000 companies • Hundreds of articles, blogs, white papers, field tests, etc. in publication • Focused on delivering business value and solving business problems utilizing proven, streamlined approaches to information management • Former Database Engineer, Fortune 50 Information Technology executive and Ernst & Young Entrepreneur of the Year Finalist • Owner/consultant: Data strategy and implementation consulting firm • 25+ years of information management and data experience
  • 9. McKnight Consulting Group Offerings
      Strategy: Trusted Advisor; Action Plans; Roadmaps; Tool Selections; Program Management
      Training: Classes; Workshops
      Implementation: Data/Data Warehousing/Business Intelligence/Analytics; Master Data Management; Governance/Quality; Big Data Implementation
  • 11. Data is Under Management when it is… • In a leverageable platform • In an appropriate platform for its profile and usage • Supported by high non-functionals (availability, performance, scalability, stability, durability, security) • Captured at the most granular level • At a data quality standard (as defined by Data Governance)
  • 12. AI Data • Call center recordings and chat logs • Streaming sensor data, historical maintenance records and search logs • Customer account data and purchase history • Email response metrics • Product catalogs and data sheets • Public references • YouTube video content audio tracks • User website behaviors • Sentiment analysis, user-generated content, social graph data, and other external data sources
  • 13.
  • 14. The Relational Database Data Page (© McKnight Consulting Group, 2010). Page layout: Page Header; Records; Row IDs; Page Footer. Sample records:
      1120 | Aris | Doug Johnson | Practice Director | 206-555-5636 | doug.johnson@aris.com
      1121 | Stolt Offshore MS Ltd | Craig Lennox | Director | 226-5555-1269 | craig.lennox@stoltoffshore.com
      1122 | Medtronic, Inc. | Mark Smith | Principal Database Administrator | 763-555-2557 | mark.smith@medtronic.com
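The page layout on this slide (header, records, row IDs, footer) can be sketched as a toy data structure. This is a minimal illustration and not any particular DBMS's on-disk format: the class name `DataPage`, the 8,192-byte page size, the 96-byte header and the 2-byte slot size are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field

SLOT_BYTES = 2  # assumed size of one row-ID (slot) entry


@dataclass
class DataPage:
    """Toy model of a fixed-size data page: header, records, row-ID array."""
    page_id: int
    size: int = 8192          # assumed page size in bytes
    header_size: int = 96     # assumed header size in bytes
    data: bytearray = field(default_factory=bytearray)   # record area
    row_offsets: list = field(default_factory=list)      # the "Row IDs" area

    def free_space(self) -> int:
        """Bytes left after the header, the records and the row-ID array."""
        used = self.header_size + len(self.data) + SLOT_BYTES * len(self.row_offsets)
        return self.size - used

    def insert(self, record: bytes) -> int:
        """Append a record and return its row ID (slot number) on this page."""
        if len(record) + SLOT_BYTES > self.free_space():
            raise ValueError("page full")
        # The row ID entry stores where the record starts within the page.
        self.row_offsets.append(self.header_size + len(self.data))
        self.data.extend(record)
        return len(self.row_offsets) - 1

    def read(self, row_id: int) -> bytes:
        """Fetch a record by looking up its offset in the row-ID array."""
        start = self.row_offsets[row_id] - self.header_size
        if row_id + 1 < len(self.row_offsets):
            end = self.row_offsets[row_id + 1] - self.header_size
        else:
            end = len(self.data)
        return bytes(self.data[start:end])


page = DataPage(page_id=1)
rid = page.insert(b"1120|Aris|Doug Johnson|Practice Director")
print(rid, page.read(rid), page.free_space())
```

The point of the row-ID array is indirection: a reader finds a record through its slot, so the record bytes can move within the page without changing the row ID.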
  • 16. Distributed File Systems (diagram: five nodes, Node 1 through Node 5, holding three copies each of Block 1, Block 2 and Block 3)
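The diagram's idea, each block stored on several different nodes, can be sketched as follows. The round-robin placement and the function names `place_blocks` and `available_blocks` are assumptions for illustration; real distributed file systems use more elaborate placement policies (rack awareness, for example).

```python
from itertools import combinations


def place_blocks(blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {node: [] for node in nodes}
    for start, block in enumerate(blocks):
        for i in range(replication):
            # Consecutive nodes (wrapping around) each get one copy.
            node = nodes[(start + i) % len(nodes)]
            placement[node].append(block)
    return placement


def available_blocks(placement, failed_nodes):
    """Blocks that still have at least one copy on a surviving node."""
    return {
        block
        for node, held in placement.items()
        if node not in failed_nodes
        for block in held
    }


nodes = [f"Node {i}" for i in range(1, 6)]
blocks = ["Block 1", "Block 2", "Block 3"]
placement = place_blocks(blocks, nodes)

# With three copies per block, losing any two nodes loses no data.
for failed in combinations(nodes, 2):
    assert available_blocks(placement, set(failed)) == set(blocks)
print(placement)
```

With a replication factor of three, a block only becomes unavailable if all three nodes holding it fail at the same time.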
  • 17. Hadoop Sequence File and Parquet File
  • 18. Cloud Storage: Data Scientist Workbench and Data Warehouse Staging (diagram). Diagram labels: OLTP Systems; ERP; CRM; Supply Chain; MDM; …; Data Lake; Data Scientists; Data Warehouse; Data Mart; Stream or Batch Updates; DI; Real-Time, Event-Driven Apps
  • 19. Graph Databases (diagram: two bridge vertices)
      • Subject: John R Peterson; Predicate: Knows; Object: Frank T Smith
      • Subject: Triple #1; Predicate: Confidence Percent; Object: 70
      • Subject: Triple #1; Predicate: Provenance; Object: Mary L Jones
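The three triples on this slide show a statement plus two statements about that statement (its confidence and its provenance). A minimal sketch of that idea, assuming a plain in-memory list as the "store" and a hypothetical helper `facts_about`; a real graph database would expose its own data model and query language.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: object   # a name, or another Triple when describing a statement
    predicate: str
    obj: object


# Triple #1: the base statement.
t1 = Triple("John R Peterson", "Knows", "Frank T Smith")
# Triples whose subject is Triple #1 itself.
t2 = Triple(t1, "Confidence Percent", 70)
t3 = Triple(t1, "Provenance", "Mary L Jones")

store = [t1, t2, t3]


def facts_about(store, subject):
    """Collect predicate -> object for every triple with this subject."""
    return {t.predicate: t.obj for t in store if t.subject == subject}


print(facts_about(store, "John R Peterson"))
print(facts_about(store, t1))
```

Using the triple itself as a subject is what lets the store record how much a statement is trusted and where it came from, without changing the statement.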
  • 20. Data Warehouses, Data Marts, Data Lakes, Big Data Stores
  • 21. Increasing Probability that Platform Selection Leads to Success (chart): Best Category and Top Tool Picked (80%); Best Category Picked (70%); Top 2 Category Picked (60%); The No Decision Platform (50%)
  • 22. 3 Major Decisions (and Price)
      • Decision #1: The Data Store Type – The data profile is the largest factor in choosing between a database and a file-based scale-out system. The latter is best for data that fits the loose label of 'unstructured' (or semi-structured) data, while more traditional data, and smaller volumes of all data, still belong in a relational database.
      • Decision #2: Data Store Placement – You must also decide where to place your data store: on-premises or in the cloud (and which cloud). In the past, on-premises was the only clear choice for most organizations. However, the costs of scale are gnawing away at the notion that this remains the best approach for a data platform.
      • Decision #3: The Workload Architecture – Finally, keep in mind the distinction between operational and analytical workloads. Short transactional requests and more complex (often longer) analytics requests demand different architectures. Analytics databases, though quite diverse, are the preferred platforms for the analytics workload.
  • 23. What is the Data Platform for? • Operational Database • Operational Real-Time • Operational Big Data • Operational Data Hub • Master Data Management • A Data Warehouse • A Data Mart – Dependent – Independent • A Data Lake • Analytic Big Data Application • Archive Storage • A Staging Area
  • 24. Analytics Reference Architecture (diagram). Diagram labels: Logs (Apps, Web, Devices); User tracking; Operational Metrics; Offload data; Raw Data Topics (JSON, AVRO); Processed Data Topics; Sensors and/or Transactional/Context Data; OLTP/ODS; ETL, or EL with T in Spark; Batch; Low Latency; Applications; Files; In-database analytics; Reach through or ETL/ELT; Stream Processing; Q; Data Warehouse
  • 25. Data Warehousing • Data Warehouses (still) have a lower total cost of ownership than data marts • A data warehouse is a SHARED platform – Build once, use many – Access at Data Warehouse – Access by creating a mart off the DW • Still A LOT cheaper than building from scratch • “… a subject-oriented, integrated, non-volatile, time-variant collection of data, organized to support management needs.” (Bill Inmon)
  • 26. Data Warehouses Have Flavors
      ● The Customer Experience Transformation Data Warehouse focuses on customer attributes and touchpoints to improve the value of customers.
      ● The Asset Maximization with IoT Data Warehouse deals with the high volume of edge data tracking the physical assets of the organization.
      ● The Operational Extension Data Warehouse supports company operations directly with real-time analytics.
      ● The Risk Management Data Warehouse supports the ever-growing compliance and reporting requirements and corporate risk.
      ● The Finance Modernization Data Warehouse handles the voluminous financial reporting and ensures the bottom line is considered in every aspect of the business.
      ● The Product Innovation Data Warehouse delivers all product-related information into the decisions of the product life cycle.
  • 28. What is a Node?
      • Azure SQL Data Warehouse is scaled by Data Warehouse Units (DWUs), which are bundled combinations of CPU, memory, and I/O. According to Microsoft, DWUs are “abstract, normalized measures of compute resources and performance.”
      • Amazon Redshift uses EC2-like instances with tightly coupled compute and storage, each of which is a “node” in a more conventional sense.
      • Snowflake “nodes” are loosely defined as a measure of virtual compute resources. Their architecture is described as “a hybrid of traditional shared-disk database architectures and shared-nothing database architectures.” Thus, it is difficult to infer what a “node” actually is.
      • Google BigQuery does not use the concept of a node at all, but instead refers to “slots” as “a unit of computational capacity required to execute SQL queries,” which is also a vague and abstract concept to the average user.
  • 29. Costing the Platform
      • On many platforms, you pay for compute resources as a function of time
        – The hourly rate can vary slightly by region
        – The hourly rate may also depend on the enterprise features you need
        – You also need to add the separate storage charge for the terabytes of (compressed) data stored
      • The storage charge is also expressed per hour, although it is substantially less than compute
      • Alternatively, some cloud vendors have consumption-based pricing models, where instead of paying by the hour, you pay by the byte processed
        – You would multiply the terabytes of data processed by the on-demand price per terabyte
        – There is also a cost-per-hour flat rate, for which you would need to calculate how long it would take to run your queries to completion
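The two pricing models on this slide reduce to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical and every rate is a made-up placeholder, not a quote from any vendor's price list.

```python
def time_based_cost(compute_rate_per_hour, compute_hours,
                    storage_tb, storage_rate_per_tb_hour, storage_hours):
    """Pay-by-the-hour model: compute time plus a separate storage charge."""
    compute = compute_rate_per_hour * compute_hours
    storage = storage_tb * storage_rate_per_tb_hour * storage_hours
    return compute + storage


def consumption_cost(tb_processed, price_per_tb):
    """Pay-by-the-byte model: terabytes processed times the on-demand price."""
    return tb_processed * price_per_tb


def flat_rate_cost(flat_rate_per_hour, hours_to_complete_queries):
    """Flat-rate model: depends on how long the queries take to finish."""
    return flat_rate_per_hour * hours_to_complete_queries


# Placeholder inputs (assumptions for illustration only).
print(time_based_cost(16.0, 200, storage_tb=10,
                      storage_rate_per_tb_hour=0.03125, storage_hours=720))
print(consumption_cost(tb_processed=500, price_per_tb=5.0))
print(flat_rate_cost(flat_rate_per_hour=20.0, hours_to_complete_queries=40))
```

Comparing platforms fairly means running the same workload through each formula, since the models charge for different things: elapsed time in one case, bytes processed in the other.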
  • 30. Total Cost of Ownership is More Than Just Cloud Costs
      • Autonomous Administration
      • Lack of Platform Features Leads to Increased Configuration and Management
        – stored procedures, referential integrity and uniqueness capabilities
        – mission critical options for backup and disaster recovery, which typically includes a standby database
        – full ANSI-SQL compliance
      • Performance
  • 31. Pricing Gotchas: Scale Out Impact on Cost
      • If an additional identical cluster is deployed to handle the additional user queries, the cost doubles for the time period the additional cluster is up and running
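The doubling described on this slide can be worked through numerically. A minimal sketch, assuming a single hourly rate per cluster and a hypothetical function name `scale_out_cost`; the rate and hours are placeholders.

```python
def scale_out_cost(rate_per_hour, base_hours, extra_clusters=0, extra_hours=0):
    """Total cost when identical extra clusters run for part of the period."""
    base = rate_per_hour * base_hours
    extra = rate_per_hour * extra_clusters * extra_hours
    return base + extra


# One cluster for a 24-hour day, at an assumed rate of 10 per hour.
single = scale_out_cost(10.0, 24)
# The same day with one identical extra cluster during 8 busy hours.
scaled = scale_out_cost(10.0, 24, extra_clusters=1, extra_hours=8)

print(single, scaled)
# During the 8 busy hours the spend rate is 20 per hour instead of 10.
```

The daily bill rises by a third in this example, but within the busy window the hourly spend is exactly double.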
  • 32. Pricing Gotchas: Memory Pressure on Scale Out Compute
      • Whenever a data warehouse does not have enough memory to build a join hash table and keep it in memory, it has to spill it to disk
        – This is costly in terms of performance, because the DBMS has to do double work writing, sorting, and reading the hash table information on disk rather than in memory
      • If you provision a medium-sized cluster and let it scale out to two medium clusters during the busy hours to handle the higher concurrency, a large JOIN would still spill to disk on whichever cluster runs it, because scaling out adds clusters without adding memory to any one of them
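The memory-pressure point can be illustrated with a deliberately simplified model. It assumes a query runs entirely on one cluster and that the whole join hash table must fit in that cluster's memory; the function name `join_spills` and the sizes in gigabytes are hypothetical.

```python
def join_spills(hash_table_gb, memory_per_cluster_gb):
    """True if the join hash table cannot be held in one cluster's memory."""
    return hash_table_gb > memory_per_cluster_gb


MEDIUM_GB = 64    # assumed memory of a medium cluster
LARGE_GB = 128    # assumed memory of a large cluster
big_join_gb = 100  # assumed hash table size for a large JOIN

# Scaling out: two medium clusters still offer only 64 GB to any one query.
scaled_out = [MEDIUM_GB, MEDIUM_GB]
print([join_spills(big_join_gb, mem) for mem in scaled_out])

# Scaling up: one large cluster can hold the hash table in memory.
print(join_spills(big_join_gb, LARGE_GB))
```

Under these assumptions, scale-out helps concurrency but not a single memory-hungry query, which is why the cluster size has to be chosen with the largest joins in mind.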
  • 33. Pricing Gotchas: Appropriate I/O is Expensive
      • In the cloud, storage is offered by media type: solid state (SSD) or spinning hard drive (HDD)
      • SSD is also offered in tiers
        – Azure has Standard and Premium
        – AWS has general purpose (gp2) and provisioned (io1) SSD tiers
  • 35. Data Virtualization: “The right answer is not always to centralize the data. Data Virtualization will be of utmost importance as the ‘perpetual short-term’ solution to the need.” Diagram: Enterprise Data Virtualization over Data Warehouses, Marts & Cubes, Operational Data Stores, Transactional Sources, File Systems, Big Data
  • 36. Capabilities for Data Integration for Enterprise Data • Comprehensive Native Connectivity • Multi-Latency Data Ingestion • Data Integration (in ETL, ELT, Streaming) • Data Quality and Data Governance • Data Cataloging and Metadata Management • Enterprise Trust, Enterprise Scale (or Class) • AI Intelligence and Automation • Ecosystem and Multi-cloud
  • 38. Data Integration Options
      Vendor | Project Technical Environments Recommended For Consideration | Project Scope
      Heterogeneous:
      Cloudera | Any | Any
      IBM | Any | Any
      Informatica | Any | Any
      Talend | Any | Any
      Specialist:
      AWS (Glue) | Environments on AWS with core of Redshift, EMR | Any
      Azure (Azure Data Factory) | Environments on Azure with core of Synapse, HDInsight | Any
      FiveTran | Any | Contained scope
      Google | Environments on GCP with core of BQ, DataProc | Any
      Matillion | Any | Contained scope
      Oracle | Environments with Oracle database | Any
      SAP | SAP-only environments | SAP projects
  • 39. The Evolution of the Data Platform and What It Means to Enterprise Analytic Strategy Presented by: William McKnight President, McKnight Consulting Group williammcknight www.mcknightcg.com (214) 514-1444