Companies are looking for a single database engine that can address all their varied needs: transactional through analytical workloads; structured, semi-structured, and unstructured data; and graph, document, text search, column, key-value, wide-column, and relational data stores. They want all of this on a single platform, without the latency of data transformation and replication. They are looking for the ultimate database nirvana.
The term hybrid transaction/analytical processing (HTAP), coined by Gartner, perhaps comes closest to describing this concept. 451 Research uses the terms convergence or converged data platform, and the terms multi-model and unified are also used. But can such a nirvana be achieved? Some database vendors claim to have achieved it already. In this talk we discuss the following challenges on the path to this nirvana, so that you can assess how accurate those claims are:
· What is needed for a single query engine to support all workloads?
· What does it take for that single query engine to support multiple storage engines, each serving a different need?
· Can a single query engine support all data models?
· Can it provide enterprise-caliber capabilities?
Attendees evaluating query and storage engines will benefit from understanding the key considerations when picking an engine for their target workloads. Developers working on such engines can better understand the capabilities they need to provide in order to run workloads that span the HTAP spectrum.
1. In search of database nirvana
The challenges of delivering Hybrid Transaction/Analytical Processing
Rohit Jain, CTO – 2016
rohit.jain@esgyn.com
(C) Copyright 2015 Esgyn Corporation Esgyn Confidential
2. Agenda
The swinging database pendulum
Hybrid Transaction/Analytical Processing (HTAP) Workloads
Query versus storage engines
The challenges of HTAP
◦ Single query engine for all workloads
◦ Supporting multiple storage engines
◦ Same data model for all workloads
◦ Enterprise-caliber capabilities
Conclusion
3. The swinging database pendulum
(Figure: the pendulum swings from RDBMS to NoSQL)
RDBMS challenges with Big Data
• High TCO
• Lack of elastic scalability
• Did not meet performance requirements
• No support for semi-structured & unstructured data
• Inability to parallelize user code
• No schema flexibility
• Too complex for simple needs
Enter NoSQL – polyglot programming & persistence
• Key value stores
• Wide column stores (Bigtable)
• Document stores
• Text search
• Graph databases
• Column stores
4. The swinging database pendulum
(Figure: the pendulum swings from NoSQL back to SQL)
But enterprises wanted SQL
• Skills prevalent
• Existing tools & applications
• Transaction support often useful
• More efficient when joins needed
• Easier than coding MapReduce
• Merit in rigor of pre-defining columns
• Uniform metadata across applications
But still …
• Too many languages, interfaces, & data structures
• Too much gluing of technologies together
• Compatibility issues between different versions
• No end-to-end view of workload performance
• Support contracts with multiple vendors
• Too many skills required to develop and manage
• Too much data movement
• No single solution for varied interfaces & use cases
5. Hybrid Transaction/Analytical Processing (HTAP) Workloads
OLTP
• Mostly transactional
• Sub-second response
• Customer experience
• Large update volume
• Online updates
• No historical data
• High concurrency
• Scales linearly
• Normalized data model
• Custom applications or third-party solutions
• Keyed updates/queries
• Mostly SMP; MPP for web-scale
ODS
• Can be transactional
• Sub-second to seconds
• Customer experience or business internal
• Low update volume
• Batch to streaming feeds from OLTP
• Some historical data
• Low concurrency if internal, high otherwise
• Near-linear scale
• Normalized data model
• Custom apps/3rd party
• Keyed queries
• Mostly MPP
BI
• Non-transactional
• Seconds to minutes
• Business internal
• No direct updates
• Batch to streaming feeds from OLTP/ODS
• Historical data
• Low to high concurrency
• Less linear in scale
• Dimensional data model
• BI, OLAP, ROLAP tools – reporting and dashboards
• Ad hoc and scheduled queries and large extracts
• Mostly MPP
Analytics
• Non-transactional
• Minutes to hours
• Business internal
• No direct updates
• Batch/aggregates from BI
• Historical and big data
• Low concurrency
• Complex queries, nonlinear scale
• Columnar store
• Analytical tools
• Ad hoc queries; analytics in database
• Mostly MPP
OLTP and ODS: essential to operate the business. BI and Analytics: to improve performance of the company.
6. Query versus storage engines
(Figure: a Hadoop cluster in which a single query engine serves operational, business intelligence, and analytics workloads; an in-memory layer is also shown)
Query Engine
• Allow clients to connect & submit queries
• Distribute connections across cluster
• Compile query
• Execute query
• Return results of query to client
Storage Engine
• Storage structure
• Partitioning
• Automatic data repartitioning
• Select columns
• Select rows based on predicates
• Caching writes and reads
• Clustering by key
• Fast access paths or filtering
• Transactional support
• Replication
• Compression & encryption
• Mixed workload support
• Bulk data ingest/extract
• Indexing
• Colocation or node locality
• Data governance
• Security
• Disaster recovery
• Backup, archive, restore
• Multi-temperature data support
7. The challenges of HTAP:
Single query engine for all workloads
Data structure – key support, clustering, partitioning
Statistics
Predicates on non-leading or non-key columns
Indexes and materialized views
Degree of parallelism
Reducing the search space
Join type
Data flow and access
Mixed workload
Feature support
8. The challenges of HTAP:
Single query engine for all workloads
(Figure: Tables A and B, partitioned and non-partitioned; salting / partitioning (hash, range, …) on a salt key; rows clustered by primary key; multi-column clustering key)
9. The challenges of HTAP:
Single query engine for all workloads
Equal-height histograms
• Unique Entry Count
• Lowest and highest values
• Multiple key / join column cardinalities
• Sampling for fast stats updates
• Incremental update stats
• Skew – equal height histograms
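The sketch below shows, under simple assumptions (integer column values, rows spread evenly over the distinct values inside an interval), how an equal-height histogram is built and how its per-interval unique entry count yields an equality estimate. `Interval`, `build_equal_height`, and `estimate_equal_rows` are hypothetical names, not the engine's API.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Interval:
    low: int   # lowest value in the interval
    high: int  # highest value in the interval
    rows: int  # rows in the interval (roughly equal across intervals)
    uec: int   # unique entry count within the interval


def build_equal_height(values: Sequence[int], num_intervals: int) -> List[Interval]:
    """Split the sorted column values into intervals holding about the same
    number of rows; a single value is never split across two intervals."""
    data = sorted(values)
    n = len(data)
    intervals: List[Interval] = []
    start = 0
    for i in range(num_intervals):
        end = (i + 1) * n // num_intervals
        # Extend the boundary past any run of equal values.
        while start < end < n and data[end] == data[end - 1]:
            end += 1
        if end <= start:
            continue  # a frequent value already consumed this interval's share
        chunk = data[start:end]
        intervals.append(Interval(chunk[0], chunk[-1], len(chunk), len(set(chunk))))
        start = end
    return intervals


def estimate_equal_rows(hist: List[Interval], value: int) -> float:
    """Estimated rows where column == value, assuming rows are spread evenly
    over the distinct values inside an interval."""
    for iv in hist:
        if iv.low <= value <= iv.high:
            return iv.rows / iv.uec
    return 0.0
```

With 50 copies of the value 1 among 100 rows, the value 1 fills an interval of its own and is estimated at 50 rows, while every other value is estimated at about 1 row: equal-height intervals make skew visible to the optimizer.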
10. The challenges of HTAP:
Single query engine for all workloads
Skew Buster
(Figure: elapsed time reduced from 80 minutes to 2 minutes with Skew Buster)
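The slide gives only the name Skew Buster and a before/after timing. One common way to defeat join skew, sketched here as an assumption rather than as Esgyn's implementation, is to take the frequent join values from the histograms, deal outer rows carrying those values round-robin across all consumers, and broadcast the matching inner rows, while all other rows are hash partitioned as usual. `hash_part`, `route_outer`, `route_inner`, and `parallel_hash_join` are hypothetical names; CRC-32 stands in for the engine's hash function.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Row = Tuple[int, str]  # (join key, payload)


def hash_part(key: int, n: int) -> int:
    """Deterministic hash partitioning of a join key onto n consumers."""
    return zlib.crc32(str(key).encode()) % n


def route_outer(rows: Iterable[Row], n: int, skewed: Set[int]) -> List[List[Row]]:
    """Hash-partition the outer (probe) side, but deal rows carrying a skewed
    key round-robin so no single consumer receives all of them."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    next_part = 0
    for row in rows:
        if row[0] in skewed:
            parts[next_part].append(row)
            next_part = (next_part + 1) % n
        else:
            parts[hash_part(row[0], n)].append(row)
    return parts


def route_inner(rows: Iterable[Row], n: int, skewed: Set[int]) -> List[List[Row]]:
    """Hash-partition the inner (build) side, but broadcast rows carrying a
    skewed key so every consumer can match its share of the outer rows."""
    parts: List[List[Row]] = [[] for _ in range(n)]
    for row in rows:
        if row[0] in skewed:
            for part in parts:
                part.append(row)
        else:
            parts[hash_part(row[0], n)].append(row)
    return parts


def parallel_hash_join(outer: List[Row], inner: List[Row], n: int,
                       skewed: Set[int]) -> List[Tuple[int, str, str]]:
    results: List[Tuple[int, str, str]] = []
    for o_part, i_part in zip(route_outer(outer, n, skewed),
                              route_inner(inner, n, skewed)):
        # Each consumer builds a hash table on its inner rows and probes it.
        table: Dict[int, List[str]] = defaultdict(list)
        for key, payload in i_part:
            table[key].append(payload)
        for key, payload in o_part:
            for match in table.get(key, []):
                results.append((key, payload, match))
    return results
```

Every outer row still meets each matching inner row exactly once, but with four consumers 100 outer rows sharing a hot key are split 25 per consumer instead of all landing on one.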
11. The challenges of HTAP:
Single query engine for all workloads
Week        Item  Store  …
01/07/2016  1     1      …
01/07/2016  1     3      …
01/07/2016  1     5      …
01/07/2016  2     34     …
01/07/2016  3     13     …
01/07/2016  3     3      …
01/07/2016  4     2      …
01/07/2016  4     4      …
01/14/2016  1     2      …
01/14/2016  1     4      …
01/14/2016  1     5      …
01/14/2016  1     35     …
01/14/2016  3     1      …
01/14/2016  3     20     …
Where are the rows with item = 1 and stores 2 through 5?
• Use of various statistics to generate an efficient plan
• Sequence of column access for column stores
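For the slide's question, with a clustering key of (Week, Item, Store) and no predicate on the leading column Week, a full scan can be avoided by probing one narrow key range per distinct Week value; this is often called a skip scan, and a generalized form is known as MDAM (multidimensional access method). A minimal sketch over the slide's rows, assuming weeks are stored as ISO dates so that string order equals date order; `skip_scan` is a hypothetical name.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Key = Tuple[str, int, int]  # clustering key (week, item, store); week as ISO date


def skip_scan(rows: List[Key], item: int, store_lo: int, store_hi: int) -> List[Key]:
    """Return rows with item == item and store_lo <= store <= store_hi.

    The leading key column (week) has no predicate, so instead of a full scan
    we probe one narrow key range per distinct week value."""
    rows = sorted(rows)  # stand-in for data already clustered by key
    out: List[Key] = []
    i = 0
    while i < len(rows):
        week = rows[i][0]
        lo = bisect_left(rows, (week, item, store_lo), i)
        hi = bisect_right(rows, (week, item, store_hi), lo)
        out.extend(rows[lo:hi])
        # Jump past every remaining row of this week to the next week value.
        i = bisect_right(rows, (week, float("inf"), float("inf")), hi)
    return out
```

On the slide's data this takes two probes, one per week, and returns the five rows with item 1 and stores 2 to 5. The technique pays off when the leading column has few distinct values, which is exactly what column statistics tell the optimizer.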
12. The challenges of HTAP:
Single query engine for all workloads
Indexes
• Kinds of indexes and how they are leveraged
• Unique index
• Transactional consistency with base table
• Impact on updates
• Updates during bulk loads
Materialized Views
• Synchronous and asynchronous maintenance
• Overhead of maintenance
• Automatic query rewrite
• User defined materialized views
13. The challenges of HTAP:
Single query engine for all workloads
Serial vs parallel plans
(Figure: a client application connects to a master. In a serial plan the master accesses HBase regions 1 to 5 directly, using filters and coprocessors; in a multi-fragment parallel plan the master fans out to ESPs on nodes 1 to n, connected by Ethernet, with each HBase region stored on HDFS)
14. The challenges of HTAP:
Single query engine for all workloads
(Figure: queries Qry1 through Qry7)
15. The challenges of HTAP:
Single query engine for all workloads
• Optimizer technology, e.g., Cascades, used by Apache Trafodion and Microsoft SQL Server
• Query plan caching for operational workloads
• Query plan cache management
• Extensibility of the optimizer to evolve with varied workloads
• Recognizing query patterns, such as star joins
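Plan caching matters most for operational workloads, where the same short statements arrive over and over with different constants. A minimal sketch, assuming plans are keyed on the statement text after whitespace and case normalization and literal replacement, with least-recently-used eviction as the cache management policy. The regular expression is a simplification, not a SQL parser, and `PlanCache` and `parameterize` are hypothetical names.

```python
import re
from collections import OrderedDict
from typing import Callable, List, Tuple

# Quoted strings or numeric literals; a simplification, not a SQL parser.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def parameterize(sql: str) -> Tuple[str, List[str]]:
    """Normalize whitespace and case, and replace literals with '?', so
    statements that differ only in their constants share one cache key."""
    params = _LITERAL.findall(sql)
    key = _LITERAL.sub("?", " ".join(sql.split()).lower())
    return key, params


class PlanCache:
    """LRU cache of compiled plans, keyed by parameterized statement text."""

    def __init__(self, capacity: int, compile_fn: Callable[[str], str]):
        self.capacity = capacity      # must be at least 1
        self.compile_fn = compile_fn  # stands in for the (expensive) optimizer
        self.plans: "OrderedDict[str, str]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_plan(self, sql: str) -> Tuple[str, List[str]]:
        key, params = parameterize(sql)
        if key in self.plans:
            self.plans.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.plans[key] = self.compile_fn(key)
            if len(self.plans) > self.capacity:
                self.plans.popitem(last=False)  # evict least recently used
        return self.plans[key], params
```

`SELECT * FROM orders WHERE id = 42` and `select *  from orders where id = 7` map to the same key, so the second statement reuses the first one's plan and only the parameter list differs.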
16. The challenges of HTAP:
Single query engine for all workloads
Adaptive and parallel joins
• Nested join
• Probe cache for nested join
• Merge join
• Matching partition join
• Repartitioned hash join
• Replication by broadcast hash join
• Inner / outer child broadcast
• Dimensional schema star join
• Inner join
• Left join
• Right join
• Full outer join
• Self join
Cost premiums for nested joins or serial plans
17. The challenges of HTAP:
Single query engine for all workloads
(Figure: compute the cost of a plan from the execution environment, physical properties, and estimates of cardinality, distribution, and correlation; evaluate risk from the confidence in those estimates and the plan's sensitivity to them; then apply a risk adjustment that weighs benefit against risk)
Risk premiums
• Nested join 20%
• Merge join 10%
• Serial plan 5%
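The risk premiums on the slide can be read as surcharges on a plan's estimated cost: plans that are very sensitive to estimation errors, such as nested joins and serial plans, win only if their estimated advantage outweighs the risk. A minimal sketch, assuming the premiums multiply the estimated cost (the slide gives the percentages but not how they are combined); `Plan`, `risk_adjusted_cost`, and `choose_plan` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List

# Premiums taken from the slide; applying them as multiplicative surcharges
# on the estimated cost is an assumption of this sketch.
RISK_PREMIUMS = {"nested_join": 0.20, "merge_join": 0.10, "serial_plan": 0.05}


@dataclass
class Plan:
    name: str
    estimated_cost: float
    risky_choices: List[str] = field(default_factory=list)


def risk_adjusted_cost(plan: Plan) -> float:
    """Estimated cost with one surcharge per risky choice the plan makes."""
    cost = plan.estimated_cost
    for choice in plan.risky_choices:
        cost *= 1.0 + RISK_PREMIUMS[choice]
    return cost


def choose_plan(plans: List[Plan]) -> Plan:
    """Pick the plan with the lowest cost after risk premiums."""
    return min(plans, key=risk_adjusted_cost)
```

A serial nested-join plan estimated at 100 is charged 100 × 1.20 × 1.05 = 126 and loses to a parallel hash-join plan estimated at 115; estimated at 80, it is charged 100.8 and still wins.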
18. The challenges of HTAP:
Single query engine for all workloads
(Figure: two scans feed a join, which feeds a group by)
• Data flow architecture
• No materialization of intermediate results
• Graceful overflow to disk for large memory operations
• Efficiencies such as pre-fetch
• Fast path for operational workloads
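Python generators give a compact picture of the data flow architecture: each operator pulls rows from its child on demand, so the scan, join, group by pipeline in the figure never materializes the join result. A minimal single-process sketch; overflow to disk, pre-fetch, and parallelism are not modeled, and `scan`, `hash_join`, and `group_by_sum` are hypothetical names.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def scan(table: Iterable[Row]) -> Iterator[Row]:
    """Produce rows one at a time, as a storage scan would."""
    for row in table:
        yield row


def hash_join(probe: Iterator[Row], build: Iterator[Row], key: str) -> Iterator[Row]:
    """Build a hash table on one input, then stream matches for the other."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in build:  # only the build side is held in memory
        table[row[key]].append(row)
    for row in probe:  # probe rows flow straight through
        for match in table.get(row[key], []):
            yield {**match, **row}


def group_by_sum(rows: Iterator[Row], by: str, col: str) -> Dict[Any, float]:
    """Consume the joined stream and aggregate it in a single pass."""
    totals: Dict[Any, float] = defaultdict(float)
    for row in rows:
        totals[row[by]] += row[col]
    return dict(totals)
```

Calling `hash_join(...)` does no work by itself; rows start moving only when `group_by_sum` iterates, and each joined row is discarded as soon as it has been added to its group.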
19. The challenges of HTAP:
Single query engine for all workloads
• Priority / SLA-based execution
• Allocation of resources by service level
• Decrease priority as usage increases
• Anti-starvation / switch between queries based on priority
(Figure: low-, medium-, and high-priority queries placed in per-region queues in front of HBase regions 1, 3, 5, …, each region with its memstore)
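Anti-starvation can be sketched as priority aging: a waiting query gains priority the longer it waits, so a steady stream of high-priority operational queries cannot hold back a low-priority report forever. A minimal sketch, assuming three levels and a hypothetical `aging_step` of one level per three dispatches; `AgingScheduler` is a hypothetical name, and resource allocation by service level is not modeled.

```python
from typing import List, Tuple

LEVELS = {"high": 0, "medium": 1, "low": 2}  # lower number is dispatched first


class AgingScheduler:
    """Dispatch the query with the best effective priority; a waiting query
    gains one priority level for every `aging_step` dispatches it has waited,
    so low-priority work cannot be starved indefinitely."""

    def __init__(self, aging_step: int = 3):
        self.aging_step = aging_step  # hypothetical tuning parameter
        self.tick = 0                 # number of dispatches so far
        self.waiting: List[Tuple[int, int, str]] = []  # (submitted at, level, query id)

    def submit(self, query_id: str, level: str) -> None:
        self.waiting.append((self.tick, LEVELS[level], query_id))

    def _effective(self, entry: Tuple[int, int, str]) -> Tuple[int, int]:
        submitted, level, _ = entry
        boost = (self.tick - submitted) // self.aging_step
        return (max(0, level - boost), submitted)  # ties go to the oldest query

    def dispatch(self) -> str:
        entry = min(self.waiting, key=self._effective)
        self.waiting.remove(entry)
        self.tick += 1
        return entry[2]
```

If a new high-priority query arrives before every dispatch, a low-priority report submitted first runs as the seventh dispatch; without aging it would never run.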
20. The challenges of HTAP:
Single query engine for all workloads
Operational workloads
• Referential integrity
• Stored procedures
• Triggers
• Various levels of transactional isolation and consistency
• …
BI and Analytics workloads
• Materialized views
• Fast / bulk extract, transform, load (ETL)
• OLAP, time series, statistical, data mining, and other functions
• …
Needed by both
• Scalar and table mapping UDFs
• Inner, outer, and full outer joins
• Un-nesting of subqueries
• Converting correlated subqueries to joins
• Predicate push down
• Sort avoidance strategies
• Constant folding
• Recursive union
• …
21. The challenges of HTAP:
Supporting multiple storage engines
Statistics
Key structure
Partitioning
Data type support
Projection and selection
Extensibility
Security enforcement
Transaction management
Metadata support
Performance, scale, and concurrency considerations
Error handling
Other operational aspects
22. The challenges of HTAP:
Supporting multiple storage engines
• Storage engine statistics used by the query engine
• Sampling
• Access to changed data for incremental updates
• Update counters to determine refresh schedule
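Update counters let the engine decide when statistics are stale without rescanning the table. A minimal sketch, assuming the storage engine counts inserted, updated, and deleted rows and that statistics are refreshed once changes exceed a fraction of the table; the 20% default threshold and `TableChangeCounter` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class TableChangeCounter:
    rows_at_last_refresh: int
    changed_rows: int = 0  # maintained by the storage engine on every write

    def record(self, inserted: int = 0, updated: int = 0, deleted: int = 0) -> None:
        self.changed_rows += inserted + updated + deleted

    def needs_refresh(self, threshold: float = 0.2) -> bool:
        """Hypothetical policy: refresh once the fraction of changed rows
        since the last collection exceeds `threshold`."""
        return self.changed_rows / max(self.rows_at_last_refresh, 1) > threshold

    def refreshed(self, row_count: int) -> None:
        """Reset the counter after statistics have been recollected."""
        self.rows_at_last_refresh = row_count
        self.changed_rows = 0
```

A scheduler can poll `needs_refresh()` across tables and recollect statistics, possibly from a sample, only where the counters show enough change.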
23. The challenges of HTAP:
Supporting multiple storage engines
(Figure: the query engine's multi-column key (A, B, C) maps to a single storage engine clustering key A+B+C, giving random single-row and range access for operational workloads; for example, A=2 is a range access)
B  A  C
3  1  5
5  1  7
2  2  4
2  2  9
3  2  4
4  2  1
2  3  1
2  3  2
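When the query engine's multi-column key (A, B, C) must become a single storage engine key, such as an HBase row key compared byte by byte, each column has to be encoded so that byte order equals value order; otherwise an A=2 range access turns into a full scan. A minimal sketch, assuming signed 32-bit integer columns and a fixed-width, sign-flipped big-endian encoding (a common approach, not necessarily the one Trafodion uses); `encode_int`, `encode_key`, and `prefix_range` are hypothetical names.

```python
import struct
from typing import Tuple


def encode_int(v: int) -> bytes:
    """Order-preserving encoding of a signed 32-bit integer: add 2**31 (which
    flips the sign bit) and write big-endian, so unsigned byte order matches
    numeric order. Raises struct.error if v does not fit in 32 bits."""
    return struct.pack(">I", v + 2**31)


def encode_key(a: int, b: int, c: int) -> bytes:
    """Concatenate fixed-width column encodings into one clustering key (A+B+C)."""
    return encode_int(a) + encode_int(b) + encode_int(c)


def prefix_range(a: int) -> Tuple[bytes, bytes]:
    """[start, stop) key range that contains exactly the rows with A == a."""
    return encode_int(a), encode_int(a + 1)
```

For the eight rows above, sorting the encoded keys gives the same order as sorting (A, B, C), and the range returned by `prefix_range(2)` covers exactly the four rows with A = 2.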
24. The challenges of HTAP:
Supporting multiple storage engines
• Data partitioning across disks and nodes
• Hash, range, or combination
• Salting support
• Query-engine-imposed salting
• Repartitioning as the cluster expands/contracts
• Read/write access while being rebalanced
• Localize data access to avoid shuffling
CREATE TABLE t(a integer not null primary key, b integer) SALT USING 4 PARTITIONS;
(Figure: INSERTs and SELECTs spread across four salted partitions, PART 1 to PART 4, each an HBase region on HDFS)
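Salting prefixes each row key with a bucket number computed from the key itself, so monotonically increasing keys, which would otherwise all land in the last HBase region, are spread across every partition. A minimal sketch of the idea behind SALT USING 4 PARTITIONS, assuming the salt is a hash of the primary key modulo the partition count; SHA-256 stands in for the engine's hash function, and `salt_of`, `row_key`, and `range_probes` are hypothetical names.

```python
import hashlib
from typing import List, Tuple

SALT_PARTITIONS = 4  # mirrors SALT USING 4 PARTITIONS in the statement above

KeyRange = Tuple[Tuple[int, int], Tuple[int, int]]


def salt_of(a: int, partitions: int = SALT_PARTITIONS) -> int:
    """Salt bucket for primary key value a. SHA-256 is a stand-in hash; the
    engine's own hash function will differ."""
    digest = hashlib.sha256(a.to_bytes(8, "big", signed=True)).digest()
    return int.from_bytes(digest[:8], "big") % partitions


def row_key(a: int) -> Tuple[int, int]:
    """Physical key = (salt, a): consecutive values of a land in different
    partitions, so a sequential insert load is spread across all regions."""
    return (salt_of(a), a)


def range_probes(lo: int, hi: int,
                 partitions: int = SALT_PARTITIONS) -> List[KeyRange]:
    """A range predicate lo <= a <= hi becomes one sub-range per salt value."""
    return [((s, lo), (s, hi)) for s in range(partitions)]
```

The price of salting shows up in `range_probes`: a range predicate on `a` must be evaluated once per salt bucket, which is one reason it helps when the query engine imposes the salt and generates these probes itself.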
25. The challenges of HTAP:
Supporting multiple storage engines
• Data types supported
• Query to storage engine data type mapping
• Value constraint enforcement
CHARACTER(n): Character string. Fixed length n
VARCHAR(n) or CHARACTER VARYING(n): Character string. Variable length, maximum length n
BINARY(n): Binary string. Fixed length n
BOOLEAN: Stores TRUE or FALSE values
VARBINARY(n) or BINARY VARYING(n): Binary string. Variable length, maximum length n
INTEGER(p): Integer numerical (no decimal). Precision p
SMALLINT: Integer numerical (no decimal). Precision 5
INTEGER: Integer numerical (no decimal). Precision 10
BIGINT: Integer numerical (no decimal). Precision 19
DECIMAL(p,s): Exact numerical, precision p, scale s. Example: DECIMAL(5,2) is a number that has 3 digits before the decimal point and 2 digits after it
NUMERIC(p,s): Exact numerical, precision p, scale s (same as DECIMAL)
FLOAT(p): Approximate numerical, mantissa precision p. A floating-point number in base-10 exponential notation; the single size argument specifies the minimum precision
REAL: Approximate numerical, mantissa precision 7
FLOAT: Approximate numerical, mantissa precision 16
DOUBLE PRECISION: Approximate numerical, mantissa precision 16
DATE: Stores year, month, and day values
TIME: Stores hour, minute, and second values
TIMESTAMP: Stores year, month, day, hour, minute, and second values
INTERVAL: Composed of a number of integer fields, representing a period of time, depending on the type of interval
ARRAY: A set-length and ordered collection of elements
26. The challenges of HTAP:
Supporting multiple storage engines
• Data types supported
• Query to storage engine data type mapping
• Value constraint enforcement
• Referential constraints
• Character sets
• Collations
• Compression
• Encryption
27. The challenges of HTAP:
Supporting multiple storage engines
• Projection at storage or query engine level
• Predicates evaluated by query and storage engines
• Predicates applied to compressed data
• Multi-column predicates
• IN lists; size of IN lists
• Multiple predicates with ORs and ANDs (pushdown)
• Evaluate predicates in sequence of filtering effectiveness
• Predicates comparing different columns of same table
• Complex expression evaluation
• Evaluation of functions
• Default or missing values on retrieval
(Figure: projecting a subset of columns from a table with columns C1 to C6)
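A minimal sketch of selection and projection pushed into the storage engine, assuming each predicate carries a selectivity estimate from statistics so predicates can be evaluated in order of filtering effectiveness. `Predicate` and `pushed_down_scan` are hypothetical names; a real storage engine would receive the predicates in its own form, for example as HBase filters or coprocessor code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, Iterator, List, Sequence

Row = Dict[str, Any]


@dataclass
class Predicate:
    column: str
    test: Callable[[Any], bool]
    selectivity: float  # estimated fraction of rows that pass, from statistics


def pushed_down_scan(rows: Iterable[Row], columns: Sequence[str],
                     predicates: List[Predicate]) -> Iterator[Row]:
    """Selection and projection evaluated inside the storage engine, so only
    qualifying rows and requested columns go back to the query engine."""
    # Most selective predicate first: most rows are rejected after one test,
    # and all() stops at the first predicate that fails.
    ordered = sorted(predicates, key=lambda p: p.selectivity)
    for row in rows:
        if all(p.test(row[p.column]) for p in ordered):
            yield {c: row[c] for c in columns}
```

Only the projected columns of qualifying rows cross the boundary between the engines, which is where most of the saving comes from on wide tables.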
28. The challenges of HTAP:
Supporting multiple storage engines
Server-side extensibility, e.g., HBase coprocessors or Cassandra triggers, to push down:
• Complex predicate evaluation with expressions and functions
• Pre-aggregation
• Collocated joins or index maintenance
• Transactional support
• Security enforcement
• Some ANSI trigger actions
29. The challenges of HTAP:
Supporting multiple storage engines
• Mapping of security frameworks for the query and storage engines to enforce ANSI SQL security
• Integration with underlying Hadoop Kerberos security
• Integration with security solutions, like Sentry or Ranger
• Integration with security logging and SIEM solutions
30. The challenges of HTAP:
Supporting multiple storage engines
• Replication for high availability, backup and restore, and multi-data center support from query & storage engines
• ACID or BASE transactional support
• Integration between the query and storage engines, such as write-ahead logs and use of coprocessors
• Completely scalable and distributed transaction management architecture
• Multi-data center support – active-active single or multiple master replication
• Overhead of transactions on throughput and system resources
• Online backup and point-in-time recovery
31. The challenges of HTAP:
Supporting multiple storage engines
(Figure: single-master vs. multiple-master replication)
32. The challenges of HTAP:
Supporting multiple storage engines
(Figure, timeline: full transactionally consistent snapshots, snapshots after non-transactional changes such as bulk loads, and transactional changes captured continuously, which together enable point-in-time recovery)
33. The challenges of HTAP:
Supporting multiple storage engines
Point-in-time recovery
(Figure, timeline: after a dropped table or an erroneous large transactional update, restore the previous full snapshot, then initiate recovery to a point in time before the error)
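Point-in-time recovery combines the last full snapshot with the continuously captured transactional changes. A minimal sketch, assuming a key-value table and a log of committed changes stamped with commit times; `recover_to` and the log format are hypothetical.

```python
from typing import Any, Dict, List, Tuple

LogEntry = Tuple[float, str, str, Any]  # (commit time, "put" or "delete", key, value)


def recover_to(snapshot: Dict[str, Any], snapshot_time: float,
               log: List[LogEntry], target_time: float) -> Dict[str, Any]:
    """Restore the previous full snapshot, then roll forward through the
    captured transactional changes up to and including target_time."""
    if target_time < snapshot_time:
        raise ValueError("target time precedes the snapshot; restore an older snapshot")
    state = dict(snapshot)  # restore a copy; the snapshot itself stays intact
    for commit_time, op, key, value in sorted(log, key=lambda e: e[0]):
        if commit_time <= snapshot_time:
            continue  # already contained in the snapshot
        if commit_time > target_time:
            break     # stop before the drop or erroneous update
        if op == "put":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state
```

For example, with a snapshot taken at time 10 and an erroneous update committed at time 20, recovering to time 19 keeps every change committed between 10 and 19 and discards the error.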
34. The challenges of HTAP:
Supporting multiple storage engines
• Mapping storage to query engine metadata
• Handling storage engine specific options
• Support provided for external tables
• Changes to external tables outside of the query engine
• Operational vs. analytics objects
35. The challenges of HTAP:
Supporting multiple storage engines
As nodes are added, the query engine immediately uses them for queries and transactions, and the storage engine rebalances data automatically.
• Transactional consistency across bulk loads
• Rowset inserts and selects
• Fast scanning options – snapshot scans, prefetching
• Integration for parallel operations
• Concurrency and mixed workload capability
• Elastic scale for Cloud deployments
36. The challenges of HTAP:
Supporting multiple storage engines
• Storage and query engine error logging
• Mapping of storage engine errors to meaningful error messages and resolution options by the query engine
37. The challenges of HTAP:
Supporting multiple storage engines
• Minimize operational and performance impact of storage engine operational aspects, e.g., compaction or splitting
38. The challenges of HTAP:
Same data model for all workloads …
Normal form
• 1NF
• 2NF
• 3NF
• BCNF
• 4NF
• 5NF
• 6NF
Star Schema
Snowflake Schema
Query engine integration with storage engine(s) to support all these data models
39. The challenges of HTAP:
Same data model for all workloads
Normal form
• 1NF
• 2NF
• 3NF
• BCNF
• 4NF
• 5NF
• 6NF
Star Schema
Snowflake Schema
Query engine integration with storage engine(s) to support all these data models
40. The challenges of HTAP:
Same data model for all workloads
NoSQL data models (see “NoSQL Data Modeling Techniques” by Ilya Katsov, Highly Scalable Blog)
… and these!
41. The challenges of HTAP:
Enterprise-caliber capabilities
High Availability
Security
Manageability
42. The challenges of HTAP:
Enterprise-caliber capabilities
• Percentage of uptime from 99.99% (52.56 minutes of downtime per year) to 99.999% (5.26 minutes per year)
• Online operations (data available for reads and writes)
o Upgrading the OS
o Upgrading the file system
o Upgrading the storage engine
o Upgrading the query engine
o Redistributing data to accommodate node and/or disk expansions and contractions
o Changing table definitions, e.g., data type changes, and adding, dropping, renaming columns
o Creating/dropping secondary indexes
o Full and incremental backups
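The downtime figures follow from the number of minutes in a year. A small worked check, ignoring leap years; `max_downtime_minutes_per_year` is a hypothetical name.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def max_downtime_minutes_per_year(availability_pct: float) -> float:
    """Largest yearly downtime compatible with an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)
```

Four nines leave 525,600 × 0.0001 ≈ 52.56 minutes a year and five nines about 5.26 minutes, a budget small enough that routine maintenance such as upgrades and data redistribution has to happen online.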
43. The challenges of HTAP:
Enterprise-caliber capabilities
44. The challenges of HTAP:
Enterprise-caliber capabilities
Schema Management Performance Management Monitoring Security Management BAR Management
Object Management Performance Monitoring Database Monitor User Management Backup Analysis
Graphical Object Editor Live Performance Monitoring Event Monitoring Role Management Recovery
Cross-Platform Schema Knowledge Data Repository Live Event Monitoring Account Migration Log Backup
Bottleneck Analysis Threshold Alerts Audit Report Backup Reports
SQL Management Job/Workload Analysis Health Index Alarm Archival
Query Builder Job/Workload Wizard Live Health Monitoring
Visual Difference Tool Job/Workload Management Response Times Maintenance Configuration Management
Data Management Live Job/Workload
Monitoring
Alert Center Repository Aging OS Provisioning
Data Migration OS Analysis Remote Monitoring Automated Maintenance Cluster Provisioning
SQL Profiler Capacity Capture Central Monitoring Instance Provisioning
Automated Import Capacity Trending Hardware Inventory Change Management Cloud Provisioning
Visual Explain Plans Capacity Forecast Hardware Monitoring Schema Capture Configuration Editor
Session Management Space Management Schema Compare and Synch
Lock Management Reorganization Management Troubleshooting Notifications
Process Management Query Cost Simulation Health Analysis Schema Rotation
Consistency Checks Historical Reports Problem Correlation Collaboration
Online Schema Evolution Bottleneck Tuning Automated Actions Virtual Changes
Built-In Automation Access Path Analysis
45. The challenges of HTAP:
Enterprise-caliber capabilities
• Operational performance by transactions per second
• Analytical performance by query
• Overhead of gathering metrics on operational and analytical workloads
• Configurable statistics collection
• Workload management by Service Level Objectives
o Based on priority and/or resource allocation
o High priority operational workloads vs analytical workloads
• End-to-end visibility of transaction and query metrics
• Metric breakdown down to the query operation
• Metrics for table access across workloads down to the partition level
• Skew or bottlenecks
• Integration with YARN
46. Conclusion
Detailed O’Reilly report:
http://www.oreilly.com/data/free/in-search-of-database-nirvana.csp
It ain’t easy!!
Very few products can even come close