In search of database nirvana
The challenges of delivering Hybrid Transaction/Analytical Processing
Rohit Jain, CTO – 2016
rohit.jain@esgyn.com
(C) Copyright 2015 Esgyn Corporation Esgyn Confidential
Agenda
The swinging database pendulum
Hybrid Transaction/Analytical Processing (HTAP) Workloads
Query versus storage engines
The challenges of HTAP
◦ Single query engine for all workloads
◦ Supporting multiple storage engines
◦ Same data model for all workloads
◦ Enterprise-caliber capabilities
Conclusion
RDBMS
The swinging database pendulum
RDBMS challenges with Big Data
• High TCO
• Lack of elastic scalability
• Did not meet performance
requirements
• No support for semi-structured &
unstructured data
• Inability to parallelize user code
• No schema flexibility
• Too complex for simple needs
NoSQL
Enter NoSQL – polyglot
programming & persistence
• Key value stores
• Wide column stores (Bigtable)
• Document stores
• Text search
• Graph database
• Column stores
But enterprises wanted SQL
• Skills prevalent
• Existing tools & applications
• Transaction support often useful
• More efficient when joins needed
• Easier than coding MapReduce
• Merit in rigor of pre-defining columns
• Uniform metadata across applications
NoSQL
But still …
• Too many languages, interfaces, & data structures
• Too much gluing together of technologies
• Compatibility between different versions
• No end-to-end view of workload performance
• Support contracts with multiple vendors
• Too many skills required to develop and manage
• Too much data movement
• No single solution for varied interfaces & use cases
SQL
Hybrid Transaction/Analytical
Processing (HTAP) Workloads
OLTP
• Mostly transactional
• Sub-second response
• Customer experience
• Large update volume
• Online updates
• No historical data
• High concurrency
• Scales linearly
• Normalized data model
• Custom applications or
third-party solutions
• Keyed updates/queries
• Mostly SMP; MPP for
web-scale
ODS
• Can be transactional
• Sub-second to seconds
• Customer experience or
Business internal
• Low update volume
• Batch to streaming feeds
from OLTP
• Some historical data
• Low concurrency if
internal, high otherwise
• Near linear scale
• Normalized data model
• Custom apps/3rd party
• Keyed queries
• Mostly MPP
BI
• Non-transactional
• Seconds to minutes
• Business internal
• No direct updates
• Batch to streaming feeds
from OLTP/ODS
• Historical data
• Low to high concurrency
• Less linear in scale
• Dimension data model
• BI, OLAP, ROLAP tools –
reporting and dashboards
• Ad hoc and scheduled
queries and large extracts
• Mostly MPP
Analytics
• Non-transactional
• Minutes to hours
• Business internal
• No direct updates
• Batch/aggregates from BI
• Historical and big data
• Low concurrency
• Complex queries,
nonlinear scale
• Columnar store
• Analytical tools
• Ad hoc queries; Analytics
in database
• Mostly MPP
Essential to operate the business
To improve performance of the company
Query versus storage engines
[Diagram: a Hadoop cluster behind network switches serving operational, business intelligence, and analytics workloads]
Query Engine
• Allow clients to connect & submit queries
• Distribute connections across cluster
• Compile query
• Execute query
• Return results of query to client
Storage Engine
• Storage structure
• Partitioning
• Automatic data repartitioning
• Select columns
• Select rows based on predicates
• Caching writes and reads
• Clustering by key
• Fast access paths or filtering
• Transactional support
• Replication
• Compression & encryption
• Mixed workload support
• Bulk data ingest/extract
• Indexing
• Colocation or node locality
• Data governance
• Security
• Disaster recovery
• Backup, archive, restore
• Multi-temperature data
support
In-memory
Single Query Engine
The challenges of HTAP:
Single query engine for all workloads
Data structure – key support, clustering, partitioning
Statistics
Predicates on non-leading or non-key columns
Indexes and materialized views
Degree of parallelism
Reducing the search space
Join type
Data flow and access
Mixed workload
Feature support
Data structure – key support, clustering, partitioning
[Diagram: Table A and Table B, partitioned and non-partitioned; salting / partitioning (hash, range, …) on a salt key; rows clustered by primary key; multi-column clustering key (A, B, C)]
Statistics
Equal-height histograms
• Unique Entry Count
• Lowest and highest values
• Multiple key / join column cardinalities
• Sampling for fast stats updates
• Incremental update stats
• Skew – equal height histograms
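A minimal Python sketch of how equal-height histogram intervals might be derived from a column, recording the row count, unique entry count, and lowest and highest values per interval. The function name `equal_height_histogram` and the sample column are illustrative assumptions, not any product's implementation; real optimizers typically build these from samples and avoid splitting one value across intervals.

```python
from typing import List, Tuple


def equal_height_histogram(
    values: List[int], buckets: int
) -> List[Tuple[int, int, int, int]]:
    """Return (low, high, row_count, unique_entry_count) per interval."""
    ordered = sorted(values)
    n = len(ordered)
    intervals = []
    for i in range(buckets):
        # Each interval receives (roughly) the same number of rows.
        chunk = ordered[i * n // buckets:(i + 1) * n // buckets]
        if chunk:
            intervals.append((chunk[0], chunk[-1], len(chunk), len(set(chunk))))
    return intervals


# Hypothetical skewed column: the value 7 dominates.
column = [1, 2, 3, 4] + [7] * 8 + [9, 10, 11, 12]
histogram = equal_height_histogram(column, 4)
```

With this sample, two of the four intervals contain only the value 7 (unique entry count 1), which is how skew becomes visible in the statistics.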
Skew Buster: from 80 minutes to 2 minutes
Predicates on non-leading or non-key columns
Week Item Store …
01/07/2016 1 1 …
01/07/2016 1 3 …
01/07/2016 1 5 …
01/07/2016 2 34 …
01/07/2016 3 13 …
01/07/2016 3 3 …
01/07/2016 4 2 …
01/07/2016 4 4 …
01/14/2016 1 2 …
01/14/2016 1 4 …
01/14/2016 1 5 …
01/14/2016 1 35 …
01/14/2016 3 1 …
01/14/2016 3 20 …
Where is item = 1, Stores 2 through 5?
• Use of various statistics to
generate an efficient plan
• Sequence of column
access for column stores
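One way to answer the question above when the table is clustered on (Week, Item, Store) is to probe the Item/Store range once per distinct Week value instead of reading every row. The sketch below is a simplified illustration of that skip-scan idea, using the sample rows from the slide sorted by key; `skip_scan` is a hypothetical helper, not an engine API.

```python
from bisect import bisect_left, bisect_right

# Sample rows from the slide as (week, item, store), sorted by the full key.
rows = sorted([
    ("01/07/2016", 1, 1), ("01/07/2016", 1, 3), ("01/07/2016", 1, 5),
    ("01/07/2016", 2, 34), ("01/07/2016", 3, 13), ("01/07/2016", 3, 3),
    ("01/07/2016", 4, 2), ("01/07/2016", 4, 4),
    ("01/14/2016", 1, 2), ("01/14/2016", 1, 4), ("01/14/2016", 1, 5),
    ("01/14/2016", 1, 35), ("01/14/2016", 3, 1), ("01/14/2016", 3, 20),
])


def skip_scan(sorted_rows, item, store_lo, store_hi):
    """Find rows with the given item and store range, one probe per week."""
    matches = []
    pos = 0
    while pos < len(sorted_rows):
        week = sorted_rows[pos][0]
        # Range probe within the current leading-key value.
        lo = bisect_left(sorted_rows, (week, item, store_lo), pos)
        hi = bisect_right(sorted_rows, (week, item, store_hi), lo)
        matches.extend(sorted_rows[lo:hi])
        # Skip every remaining row for this week in one step.
        pos = bisect_right(sorted_rows, (week, float("inf")), hi)
    return matches


result = skip_scan(rows, item=1, store_lo=2, store_hi=5)
```

The statistics matter here: the approach pays off only when the leading column has few distinct values, which is something the optimizer has to estimate.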
Indexes
• Kinds of indexes and how they are leveraged
• Unique index
• Transactional consistency with base table
• Impact on updates
• Updates during bulk loads
Materialized Views
• Synchronous and asynchronous maintenance
• Overhead of maintenance
• Automatic query rewrite
• User defined materialized views
Degree of parallelism
Serial vs parallel plans
[Diagram: client application; master, multi-fragment master, and ESP processes on Node 1 through Node n; HBase Regions 1 to 5 with filters and coprocessors on HDFS; Ethernet interconnect]
[Diagram: queries Qry1 through Qry7]
Reducing the search space
• Optimizer technology, e.g., Cascades used by
Apache Trafodion and Microsoft SQL Server
• Query plan caching for operational workloads
• Query plan cache management
• Extensibility of optimizer to evolve with varied
workloads
• Recognizing query patterns, such as star joins
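As an illustration of query plan caching and cache management for operational workloads, the sketch below keys plans on query text with literals replaced by placeholders and evicts the least recently used plan when full. The class `PlanCache`, the regex-based normalization, and the stand-in compiler are all assumptions made for this example; a real engine would normalize from the parse tree.

```python
import re
from collections import OrderedDict


class PlanCache:
    """LRU cache of compiled plans keyed by parameterized query text."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._plans = OrderedDict()
        self.hits = 0
        self.misses = 0

    @staticmethod
    def normalize(sql):
        # Replace string and integer literals so similar queries share a plan.
        sql = re.sub(r"'[^']*'", "?", sql)
        sql = re.sub(r"\b\d+\b", "?", sql)
        return " ".join(sql.lower().split())

    def get_plan(self, sql, compile_fn):
        key = self.normalize(sql)
        if key in self._plans:
            self._plans.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._plans[key]
        self.misses += 1
        plan = compile_fn(key)
        self._plans[key] = plan
        if len(self._plans) > self.capacity:
            self._plans.popitem(last=False)  # evict least recently used
        return plan


def fake_compile(text):
    # Stand-in for the (expensive) optimizer call.
    return ("PLAN", text)


cache = PlanCache(capacity=2)
p1 = cache.get_plan("SELECT * FROM orders WHERE id = 42", fake_compile)
p2 = cache.get_plan("select * from orders where id = 7", fake_compile)
```

The second lookup reuses the first plan, so a keyed operational query skips compilation entirely.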
Join type
Adaptive and parallel joins
• Nested join
• Probe cache for nested join
• Merge join
• Matching partition join
• Repartitioned hash join
• Replication by broadcast hash join
• Inner / outer child broadcast
• Dimensional schema star join
• Inner join
• Left Join
• Right Join
• Full Outer Join
• Self join
Cost Premiums for nested joins or
serial plans
[Diagram: compute cost from the execution environment, physical properties, and estimates; gauge confidence in cardinality, distribution, and correlation; assess sensitivity to estimates; evaluate risk; apply a risk adjustment; weigh benefit against risk]
Risk Premiums
• Nested join 20%
• Merge join 10%
• Serial plan 5%
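A rough sketch of how risk premiums like these could be folded into plan selection. The percentages come from the slide; applying them multiplicatively to the estimated cost, and the names `RISK_PREMIUM` and `risk_adjusted_cost`, are assumptions made for illustration only.

```python
# Premiums from the slide; how they are combined is an assumption.
RISK_PREMIUM = {"nested_join": 0.20, "merge_join": 0.10, "serial_plan": 0.05}


def risk_adjusted_cost(estimated_cost, traits):
    """Inflate a plan's estimated cost by the premium of each risky trait."""
    cost = estimated_cost
    for trait in traits:
        cost *= 1.0 + RISK_PREMIUM.get(trait, 0.0)
    return cost


# Hypothetical candidate plans: (name, estimated cost, risky traits).
candidates = [
    ("nested join plan", 100.0, ["nested_join"]),
    ("hash join plan", 115.0, []),
]
best = min(candidates, key=lambda p: risk_adjusted_cost(p[1], p[2]))
```

Here the nested join looks cheaper on raw estimates (100 vs. 115), but its 20% premium lifts it to 120, so the hash join plan wins. The premium guards against a cardinality estimate that turns out to be wrong.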


Data flow and access
Scan
Scan
Join
Group by
• Data flow architecture
• No materialization of intermediate results
• Graceful overflow to disk for large memory
operations
• Efficiencies such as pre-fetch
• Fast path for operational workloads
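The scan, join, and group-by flow above can be sketched with Python generators: rows stream from operator to operator, and the joined result is never stored as a whole. Only the hash join's build side and the running aggregates are held in memory. The operator names and sample tables are hypothetical.

```python
from collections import defaultdict


def scan(table):
    # Stream rows one at a time instead of returning a full copy.
    for row in table:
        yield row


def hash_join(outer, inner, key):
    # Build a hash table on the inner input, then stream the outer input.
    build = defaultdict(list)
    for row in inner:
        build[row[key]].append(row)
    for row in outer:
        for match in build.get(row[key], []):
            yield {**row, **match}


def group_by_sum(rows, key, value):
    # Consume the joined stream row by row, keeping only running totals.
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


orders = [{"cust": 1, "amount": 10}, {"cust": 2, "amount": 5},
          {"cust": 1, "amount": 7}]
customers = [{"cust": 1, "region": "east"}, {"cust": 2, "region": "west"}]

totals = group_by_sum(
    hash_join(scan(orders), scan(customers), "cust"), "region", "amount"
)
```

A production engine would additionally spill the build table or the aggregates to disk when they outgrow memory, which is the graceful overflow the slide refers to.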
Mixed workload
• Priority / SLA-based execution
• Allocation of resources by service level
• Decrease priority with usage increase
• Anti-starvation / switch between
queries based on priority
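A toy sketch of priority-based execution with anti-starvation: each query's effective priority drops as it consumes resources, so a long-running high-priority query eventually yields to lower-priority work. The linear decay rule and the names `Query`, `effective_priority`, and `run` are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    base_priority: int      # higher runs first
    cpu_used: float = 0.0   # time slices consumed so far


def effective_priority(query, decay=1.0):
    # Priority decreases as usage increases.
    return query.base_priority - decay * query.cpu_used


def run(queries, slices):
    """Give each time slice to the query with the highest effective priority."""
    order = []
    for _ in range(slices):
        chosen = max(queries, key=effective_priority)
        chosen.cpu_used += 1.0
        order.append(chosen.name)
    return order


schedule = run([Query("high", 3), Query("low", 1)], slices=6)
```

The high-priority query gets the first slices, but the low-priority one is not starved: once the high-priority query has used enough CPU, the scheduler switches to the other.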
[Diagram: low-, medium-, and high-priority queries placed on per-priority queues in front of HBase Regions 1, 3, and 5 and their memstores]
Feature support
Operational workloads
• Referential integrity
• Stored procedures
• Triggers
• Various levels of transactional
isolation and consistency
• …
BI and Analytics workloads
• Materialized views
• Fast / bulk extract, transform,
load (ETL)
• OLAP, time series, statistical,
data mining, and other functions
• …
Needed by both
• Scalar and table mapping UDFs
• Inner, outer, and full outer joins
• Un-nesting of subqueries
• Converting correlated subqueries to joins
• Predicate push down
• Sort avoidance strategies
• Constant folding
• Recursive union
• …
The challenges of HTAP:
Supporting multiple storage engines
Statistics
Key structure
Partitioning
Data type support
Projection and selection
Extensibility
Security enforcement
Transaction management
Metadata support
Performance, scale, and
concurrency considerations
Error handling
Other operational aspects
Statistics
• Storage engine statistics, used by query engine
• Sampling
• Access to changed data for incremental updates
• Update counters to determine refresh schedule
Refresh
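A small sketch of using update counters to decide when statistics should be refreshed. The 10% change threshold and the function name `needs_refresh` are assumptions chosen for illustration; the slide does not specify a policy.

```python
def needs_refresh(rows_at_last_stats, rows_changed_since, threshold=0.10):
    """Refresh once the changed fraction of the table reaches the threshold."""
    if rows_at_last_stats == 0:
        # Empty at the last collection: any change makes the stats stale.
        return rows_changed_since > 0
    return rows_changed_since / rows_at_last_stats >= threshold


# Hypothetical counters reported by the storage engine.
stale = needs_refresh(rows_at_last_stats=1_000_000, rows_changed_since=150_000)
fresh = needs_refresh(rows_at_last_stats=1_000_000, rows_changed_since=20_000)
```

The counters have to come from the storage engine, since inserts, updates, and deletes may reach it through paths the query engine does not see.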
Key structure
Query engine: multi-column key (A, B, C)
Storage engine: single clustering key A+B+C
Random single row and range access for operational workloads
A B C
1 3 5
1 5 7
2 2 4
2 2 9
2 3 4
2 4 1
3 2 1
3 2 2
A=2 range access: rows (2, 2, 4) through (2, 4, 1)
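A sketch of how a multi-column key can be mapped onto a single clustering key while keeping range access on the leading column. The columns are packed as big-endian unsigned integers so that byte order matches (A, B, C) order. The rows are read from the slide's diagram as (A, B, C); the encoding and the helper names are assumptions, and the scheme as written only handles non-negative 32-bit integers.

```python
import struct
from bisect import bisect_left


def encode_key(a, b, c):
    # Big-endian fixed-width fields: byte-wise order equals (a, b, c) order.
    return struct.pack(">III", a, b, c)


rows = [(1, 3, 5), (1, 5, 7), (2, 2, 4), (2, 2, 9),
        (2, 3, 4), (2, 4, 1), (3, 2, 1), (3, 2, 2)]
store = sorted(encode_key(*row) for row in rows)  # stand-in for a sorted key-value store


def range_scan_on_a(sorted_keys, a):
    """Return all rows with the given leading-column value via a prefix range."""
    start = bisect_left(sorted_keys, struct.pack(">I", a))
    stop = bisect_left(sorted_keys, struct.pack(">I", a + 1))
    return [struct.unpack(">III", key) for key in sorted_keys[start:stop]]


a_equals_2 = range_scan_on_a(store, 2)
```

The query engine has to own this encoding (including signed, descending, and variable-length columns) because the storage engine only sees an opaque byte string.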
Partitioning
• Data partitioning across disks and nodes
• Hash, range, or combination
• Salting support
• Query engine imposed salting
• Repartitioning as the cluster expands/contracts
• Read/write access while being rebalanced
• Localize data access to avoid shuffling
CREATE TABLE t(a integer not null primary key, b integer) SALT USING 4 PARTITIONS;
[Diagram: INSERTs and SELECTs spread across PART 1 to PART 4, each an HBase region on HDFS]
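A sketch of what query-engine-imposed salting amounts to: a hash of the key picks one of N partitions, and the salt is prefixed to the stored key so that sequential keys spread across regions. Using CRC32 as the hash is an assumption for this example; the hash function an actual engine uses may differ.

```python
import zlib
from collections import Counter


def salt_for(key, partitions=4):
    # Deterministic hash of the key, reduced to a partition number.
    return zlib.crc32(str(key).encode("utf-8")) % partitions


def salted_row_key(key, partitions=4):
    # The salt becomes the leading part of the stored key.
    return (salt_for(key, partitions), key)


# Sequential primary-key values, as a monotonically increasing insert would produce.
rows_per_partition = Counter(salt_for(k) for k in range(1000))
```

Without the salt, monotonically increasing keys would all land in the last region. With it, inserts and scans are spread out, at the price of turning a range scan on the original key into one probe per partition.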
Data type support
• Data types supported
• Query to storage engine data
type mapping
• Value constraint enforcement
CHARACTER(n): Character string. Fixed-length n
VARCHAR(n) or CHARACTER VARYING(n): Character string. Variable length. Maximum length n
BINARY(n): Binary string. Fixed-length n
BOOLEAN: Stores TRUE or FALSE values
VARBINARY(n) or BINARY VARYING(n): Binary string. Variable length. Maximum length n
INTEGER(p): Integer numerical (no decimal). Precision p
SMALLINT: Integer numerical (no decimal). Precision 5
INTEGER: Integer numerical (no decimal). Precision 10
BIGINT: Integer numerical (no decimal). Precision 19
DECIMAL(p,s): Exact numerical, precision p, scale s. Example: decimal(5,2) is a number that has 3 digits before the decimal and 2 digits after the decimal
NUMERIC(p,s): Exact numerical, precision p, scale s. (Same as DECIMAL)
FLOAT(p): Approximate numerical, mantissa precision p. A floating number in base 10 exponential notation. The size argument for this type consists of a single number specifying the minimum precision
REAL: Approximate numerical, mantissa precision 7
FLOAT: Approximate numerical, mantissa precision 16
DOUBLE PRECISION: Approximate numerical, mantissa precision 16
DATE: Stores year, month, and day values
TIME: Stores hour, minute, and second values
TIMESTAMP: Stores year, month, day, hour, minute, and second values
INTERVAL: Composed of a number of integer fields, representing a period of time, depending on the type of interval
ARRAY: A set-length and ordered collection of elements
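When the storage engine stores untyped bytes, value constraints such as DECIMAL(p,s) have to be enforced by the query engine. A minimal sketch of such a check, following the decimal(5,2) example above; `fits_decimal` is a hypothetical helper name.

```python
from decimal import Decimal


def fits_decimal(value: str, precision: int, scale: int) -> bool:
    """Check whether a literal fits DECIMAL(precision, scale) without rounding."""
    number = Decimal(value)
    if not number.is_finite():
        return False
    _, digits, exponent = number.as_tuple()
    fractional_digits = max(0, -exponent)
    integer_digits = max(0, len(digits) + exponent)
    # At most `scale` digits after the point and `precision - scale` before it.
    return fractional_digits <= scale and integer_digits <= precision - scale


ok = fits_decimal("123.45", 5, 2)          # 3 digits before, 2 after
too_wide = fits_decimal("1234.5", 5, 2)    # 4 digits before the point
too_precise = fits_decimal("12.345", 5, 2) # 3 digits after the point
```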
• Referential constraints
• Character sets
• Collations
• Compression
• Encryption
Projection and selection
• Projection at storage or query engine level
• Predicates evaluated by query and storage engines
• Predicates applied to compressed data
• Multi-column predicates
• IN lists; size of IN lists
• Multiple predicates with ORs and ANDs (pushdown)
• Evaluate predicates in sequence of filtering effectiveness
• Predicates comparing different columns of same table
• Complex expression evaluation
• Evaluation of functions
• Default or missing values on retrieval
[Diagram: projecting a subset of columns from rows stored in two column groups, C1 to C3 and C4 to C6]
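A sketch of evaluating predicates in sequence of filtering effectiveness: the predicate expected to reject the most rows runs first, so most rows are discarded after a single check. The selectivity figures here are made-up inputs; in practice they would come from statistics. All names are hypothetical.

```python
def order_by_selectivity(predicates):
    # Lower selectivity = smaller fraction of rows passing = better filter.
    return sorted(predicates, key=lambda p: p["selectivity"])


def row_passes(row, ordered_predicates):
    """Return (passes, number_of_predicates_evaluated), stopping at first failure."""
    checks = 0
    for predicate in ordered_predicates:
        checks += 1
        if not predicate["test"](row):
            return False, checks
    return True, checks


predicates = [
    {"name": "region = 'EU'", "selectivity": 0.50,
     "test": lambda r: r["region"] == "EU"},
    {"name": "amount > 900", "selectivity": 0.05,
     "test": lambda r: r["amount"] > 900},
]
ordered = order_by_selectivity(predicates)
rejected = row_passes({"region": "EU", "amount": 10}, ordered)
accepted = row_passes({"region": "EU", "amount": 950}, ordered)
```

For a column store the same ordering also decides which columns are read first, since a rejected row never needs its remaining columns fetched.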
Extensibility
Server-side extensibility, e.g., HBase coprocessors or Cassandra triggers, to push down:
• Complex predicate evaluation with expressions
and functions
• Pre-aggregation
• Collocated joins or index maintenance
• Transactional support
• Security enforcement
• Some ANSI Trigger actions
Security enforcement
• Mapping of security frameworks for the query and
storage engines to enforce ANSI SQL security
• Integration with underlying Hadoop Kerberos security
• Integration with security solutions, like Sentry or Ranger
• Integration with security logging and SIEM solutions
Transaction management
• Replication for high availability, backup and restore, and
multi-data center support from query & storage engines
• ACID or BASE transactional support
• Integration between the query and storage engines,
such as write ahead logs, and use of coprocessors
• Completely scalable and distributed transaction
management architecture
• Multi datacenter support – active-active single or
multiple master replication
• Overhead of transactions on throughput and system
resources
• Online backup and point in time recovery
Single-Master Multiple-Masters
Point-in-time recovery
• Full transactionally consistent snapshot
• Snapshots after non-transactional changes such as bulk loads
• Transactional changes captured continuously
Point-in-time recovery after a drop table or erroneous large transactional update
• Restore previous full snapshot
• Initiate recovery to point-in-time
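The recovery sequence can be sketched as: pick the latest full snapshot taken at or before the target time, then replay the continuously captured changes up to that time. The in-memory data layout and the name `recover_to` are assumptions for illustration, not how any particular engine stores snapshots or logs.

```python
def recover_to(snapshots, change_log, target_time):
    """Rebuild table state as of target_time.

    snapshots:  list of (time, {key: value}) full snapshots
    change_log: list of (time, key, value) in time order; value None = delete
    """
    # Restore the most recent snapshot that is not newer than the target.
    base_time, base_state = max(
        (s for s in snapshots if s[0] <= target_time), key=lambda s: s[0]
    )
    state = dict(base_state)
    # Replay captured changes after the snapshot, up to the target time.
    for time, key, value in change_log:
        if base_time < time <= target_time:
            if value is None:
                state.pop(key, None)
            else:
                state[key] = value
    return state


snapshots = [(0, {}), (10, {"a": 1})]
change_log = [(5, "a", 1), (12, "b", 2), (15, "a", None)]  # t=15: erroneous delete
recovered = recover_to(snapshots, change_log, target_time=14)
```

Recovering to time 14 restores the snapshot from time 10, replays the change at time 12, and stops before the erroneous delete at time 15.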
Metadata support
• Mapping storage to query engine metadata
• Handling storage engine specific options
• Support provided for external tables
• Changes to external tables outside of the query engine
• Operational vs. analytics objects
Performance, scale, and concurrency considerations
As nodes are added, the query engine immediately uses them for queries and transactions
The storage engine rebalances data automatically
• Transactional consistency across bulk loads
• Rowset inserts and selects
• Fast scanning options – snapshot scans, prefetching
• Integration for parallel operations
• Concurrency and mixed workload capability
• Elastic scale for Cloud deployments
Error handling
• Storage and query engine error logging
• Mapping of storage engine errors to meaningful error
messages and resolution options by the query engine
Other operational aspects
• Minimize operational and performance impact of storage
engine operational aspects, e.g., compaction or splitting
The challenges of HTAP:
Same data model for all workloads …
Normal form
• 1NF
• 2NF
• 3NF
• BCNF
• 4NF
• 5NF
• 6NF
Star Schema
Snowflake Schema
Query engine integration with storage
engine(s) to support all these data models
NoSQL Data Models
“NoSQL Data Modeling Techniques”
by Ilya Katsov
Highly Scalable Blog
… and these!
The challenges of HTAP:
Enterprise-caliber capabilities
High Availability
Security
Manageability
High Availability
• Percentage of uptime: 99.99% = 52.56 minutes of downtime per year; 99.999% = 5.26 minutes
• Online operations (data available for reads and writes)
o Upgrading the OS
o Upgrading the file system
o Upgrading the storage engine
o Upgrading the query engine
o Redistribute data to accommodate node and/or disk
expansions and contractions
o Changing table definition, e.g., data type changes,
and adding, dropping, renaming columns
o Create/drop secondary indexes
o Full and incremental backups
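The downtime figures in the first bullet follow from simple arithmetic over a 365-day year, as this short check shows. The function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent):
    """Allowed downtime per year for a given uptime percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


four_nines = round(downtime_minutes_per_year(99.99), 2)
five_nines = round(downtime_minutes_per_year(99.999), 2)
```

Every operation in the list that cannot be done online eats into this budget, which is why a single offline upgrade can break a five-nines target for the year.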
Manageability
Schema Management Performance Management Monitoring Security Management BAR Management
Object Management Performance Monitoring Database Monitor User Management Backup Analysis
Graphical Object Editor Live Performance Monitoring Event Monitoring Role Management Recovery
Cross-Platform Schema Knowledge Data Repository Live Event Monitoring Account Migration Log Backup
Bottleneck Analysis Threshold Alerts Audit Report Backup Reports
SQL Management Job/Workload Analysis Health Index Alarm Archival
Query Builder Job/Workload Wizard Live Health Monitoring
Visual Difference Tool Job/Workload Management Response Times Maintenance Configuration Management
Data Management Live Job/Workload
Monitoring
Alert Center Repository Aging OS Provisioning
Data Migration OS Analysis Remote Monitoring Automated Maintenance Cluster Provisioning
SQL Profiler Capacity Capture Central Monitoring Instance Provisioning
Automated Import Capacity Trending Hardware Inventory Change Management Cloud Provisioning
Visual Explain Plans Capacity Forecast Hardware Monitoring Schema Capture Configuration Editor
Session Management Space Management Schema Compare and Synch
Lock Management Reorganization Management Troubleshooting Notifications
Process Management Query Cost Simulation Health Analysis Schema Rotation
Consistency Checks Historical Reports Problem Correlation Collaboration
Online Schema Evolution Bottleneck Tuning Automated Actions Virtual Changes
Built-In Automation Access Path Analysis
• Operational performance by transactions per second
• Analytical performance by query
• Overhead of gathering metrics on operational and analytical workloads
• Configurable statistics collection
• Workload management by Service Level Objectives
o Based on priority and/or resource allocation
o High priority operational workloads vs analytical workloads
• End-to-end visibility of transaction and query metrics
• Metric breakdown down to the query operation
• Metrics for table access across workloads down to the partition level
• Skew or bottlenecks
• Integration with YARN
Conclusion
Detailed O’Reilly report:
http://www.oreilly.com/data/free/in-search-of-database-nirvana.csp
It ain’t easy!!
Very few products can even come close

Mais conteúdo relacionado

Mais procurados

Ml infra at an early stage
Ml infra at an early stageMl infra at an early stage
Ml infra at an early stageNick Handel
 
On-Demand RDF Graph Databases in the Cloud
On-Demand RDF Graph Databases in the CloudOn-Demand RDF Graph Databases in the Cloud
On-Demand RDF Graph Databases in the CloudMarin Dimitrov
 
Data Ingestion Engine
Data Ingestion EngineData Ingestion Engine
Data Ingestion EngineAdam Doyle
 
Empowering Real Time Patient Care Through Spark Streaming
Empowering Real Time Patient Care Through Spark StreamingEmpowering Real Time Patient Care Through Spark Streaming
Empowering Real Time Patient Care Through Spark StreamingDatabricks
 
In-Memory Oracle BI Applications (UKOUG Analytics Event, July 2013)
In-Memory Oracle BI Applications (UKOUG Analytics Event, July 2013)In-Memory Oracle BI Applications (UKOUG Analytics Event, July 2013)
In-Memory Oracle BI Applications (UKOUG Analytics Event, July 2013)Mark Rittman
 
Feature drift monitoring as a service for machine learning models at scale
Feature drift monitoring as a service for machine learning models at scaleFeature drift monitoring as a service for machine learning models at scale
Feature drift monitoring as a service for machine learning models at scaleNoriaki Tatsumi
 
DataGraft Platform: RDF Database-as-a-Service
DataGraft Platform: RDF Database-as-a-ServiceDataGraft Platform: RDF Database-as-a-Service
DataGraft Platform: RDF Database-as-a-ServiceMarin Dimitrov
 
Manage tracability with Apache Atlas, a flexible metadata repository








In search of database nirvana - The challenges of delivering Hybrid Transaction/Analytical Processing

  • 1. In search of database nirvana
    The challenges of delivering Hybrid Transaction/Analytical Processing
    Rohit Jain, CTO – 2016
    rohit.jain@esgyn.com
    (C) Copyright 2015 Esgyn Corporation Esgyn Confidential
  • 2. Agenda
    The swinging database pendulum
    Hybrid Transaction/Analytical Processing (HTAP) Workloads
    Query versus storage engines
    The challenges of HTAP
    ◦ Single query engine for all workloads
    ◦ Supporting multiple storage engines
    ◦ Same data model for all workloads
    ◦ Enterprise-caliber capabilities
    Conclusion
  • 3. The swinging database pendulum
    (pendulum diagram labels: RDBMS, NoSQL)
    RDBMS challenges with Big Data
    • High TCO
    • Lack of elastic scalability
    • Did not meet performance requirements
    • No support for semi-structured & unstructured data
    • Inability to parallelize user code
    • No schema flexibility
    • Too complex for simple needs
    Enter NoSQL – polyglot programming & persistence
    • Key value stores
    • Wide column stores (Big Table)
    • Document stores
    • Text search
    • Graph database
    • Column stores
  • 4. The swinging database pendulum
    (pendulum diagram labels: NoSQL, SQL)
    But enterprises wanted SQL
    • Skills prevalent
    • Existing tools & applications
    • Transaction support often useful
    • More efficient when joins needed
    • Easier than coding MapReduce
    • Merit in rigor of pre-defining columns
    • Uniform metadata across applications
    But still …
    • Too many languages, interfaces, & data structures
    • Too much gluing of technologies together
    • Compatibility between different versions
    • No end-to-end view of workload performance
    • Support contracts with multiple vendors
    • Too many skills required to develop and manage
    • Too much data movement
    • No single solution for varied interfaces & use cases
  • 5. Hybrid Transaction/Analytical Processing (HTAP) Workloads
    OLTP
    • Mostly transactional
    • Sub-second response
    • Customer experience
    • Large update volume
    • Online updates
    • No historical data
    • High concurrency
    • Scales linearly
    • Normalized data model
    • Custom applications or third-party solutions
    • Keyed updates/queries
    • Mostly SMP; MPP for web-scale
    ODS
    • Can be transactional
    • Sub-second to seconds
    • Customer experience or Business internal
    • Low update volume
    • Batch to streaming feeds from OLTP
    • Some historical data
    • Low concurrency if internal, high otherwise
    • Near linear scale
    • Normalized data model
    • Custom apps/3rd party
    • Keyed queries
    • Mostly MPP
    BI
    • Non-transactional
    • Seconds to minutes
    • Business internal
    • No direct updates
    • Batch to streaming feeds from OLTP/ODS
    • Historical data
    • Low to high concurrency
    • Less linear in scale
    • Dimension data model
    • BI, OLAP, ROLAP tools – reporting and dashboards
    • Ad hoc and scheduled queries and large extracts
    • Mostly MPP
    Analytics
    • Non-transactional
    • Minutes to hours
    • Business internal
    • No direct updates
    • Batch/aggregates from BI
    • Historical and big data
    • Low concurrency
    • Complex queries, nonlinear scale
    • Columnar store
    • Analytical tools
    • Ad hoc queries; Analytics in database
    • Mostly MPP
    Essential to operate the business
    To improve performance of the company
  • 6. Query versus storage engines
    (diagram labels: Hadoop Cluster, Switch, Switch, Operational, Business Intelligence, Analytics, In-memory, Single Query Engine)
    Query Engine
    • Allow clients to connect & submit queries
    • Distribute connections across cluster
    • Compile query
    • Execute query
    • Return results of query to client
    Storage Engine
    • Storage structure
    • Partitioning
    • Automatic data repartitioning
    • Select columns
    • Select rows based on predicates
    • Caching writes and reads
    • Clustering by key
    • Fast access paths or filtering
    • Transactional support
    • Replication
    • Compression & encryption
    • Mixed workload support
    • Bulk data ingest/extract
    • Indexing
    • Colocation or node locality
    • Data governance
    • Security
    • Disaster recovery
    • Backup, archive, restore
    • Multi-temperature data support
  • 7. The challenges of HTAP: Single query engine for all workloads
    • Data structure – key support, clustering, partitioning
    • Statistics
    • Predicates on non-leading or non-key columns
    • Indexes and materialized views
    • Degree of parallelism
    • Reducing the search space
    • Join type
    • Data flow and access
    • Mixed workload
    • Feature support
  • 8. The challenges of HTAP: Single query engine for all workloads
    (diagram: Table A and Table B, partitioned and non-partitioned)
    • Salting / Partitioning (hash, range, …)
    • Salt key
    • Clustered by Primary Key
    • Multi-column clustering key
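The salting idea on this slide can be illustrated with a minimal sketch. It assumes a hash-derived salt that picks the partition, while rows inside each partition stay clustered by their key. The function names, the partition count, and the `NN|key` layout are hypothetical and are not taken from the slides or from any particular engine.

```python
import hashlib

NUM_PARTITIONS = 4  # hypothetical partition count, chosen for illustration


def salt_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Derive a stable salt (partition number) from the row key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def salted_key(key: str, num_partitions: int = NUM_PARTITIONS) -> str:
    """Prefix the key with its salt so sequential keys spread across partitions."""
    return f"{salt_for(key, num_partitions):02d}|{key}"


def partition_rows(keys, num_partitions: int = NUM_PARTITIONS):
    """Group keys by salt; inside a partition, rows stay clustered by key."""
    partitions = {p: [] for p in range(num_partitions)}
    for key in keys:
        partitions[salt_for(key, num_partitions)].append(key)
    for rows in partitions.values():
        rows.sort()
    return partitions


if __name__ == "__main__":
    # Monotonically increasing keys would all land in one range partition;
    # with a salt they are distributed by hash instead.
    keys = [f"2016-01-07#{i:04d}" for i in range(20)]
    for partition, rows in partition_rows(keys).items():
        print(partition, len(rows))
```

The trade-off in this sketch is the usual one: writes no longer pile up on a single partition, but a range scan on the original key has to visit every salt value.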
  • 9. The challenges of HTAP: Single query engine for all workloads
    Equal-height histograms
    • Unique Entry Count
    • Lowest and highest values
    • Multiple key / join column cardinalities
    • Sampling for fast stats updates
    • Incremental update stats
    • Skew – equal height histograms
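As a rough illustration of the statistics listed here, the following sketch builds an equal-height histogram over a column sample and records the lowest value, highest value, row count, and unique entry count per bucket. The function name and the choice to keep all copies of a value in one bucket are assumptions made for this example, not a description of a specific optimizer.

```python
def equal_height_histogram(values, num_buckets):
    """Return a list of (low, high, row_count, unique_count) buckets.

    Boundaries are chosen so each bucket holds roughly the same number of
    rows. Assumption for this sketch: duplicates of one value are never split
    across buckets, so a heavily skewed value produces one tall bucket.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    ordered = sorted(values)
    n = len(ordered)
    buckets = []
    start = 0
    for b in range(num_buckets):
        end = (n * (b + 1)) // num_buckets
        if end <= start:
            continue  # an earlier bucket already absorbed these rows
        # Extend the boundary so equal values stay together.
        while end < n and ordered[end] == ordered[end - 1]:
            end += 1
        chunk = ordered[start:end]
        buckets.append((chunk[0], chunk[-1], len(chunk), len(set(chunk))))
        start = end
    return buckets


if __name__ == "__main__":
    uniform = list(range(100))
    skewed = [1] * 90 + list(range(2, 12))
    print(equal_height_histogram(uniform, 4))
    print(equal_height_histogram(skewed, 4))
```

In the skewed case the first bucket has a single unique value but most of the rows, which is the kind of signal an optimizer could use to detect skew before choosing a join or repartitioning strategy.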
  • 10. The challenges of HTAP: Single query engine for all workloads
    (chart labels: Skew Buster, 80 minutes, 2 minutes)
  • 11. The challenges of HTAP: Single query engine for all workloads
    Week        Item  Store  …
    01/07/2016  1     1      …
    01/07/2016  1     3      …
    01/07/2016  1     5      …
    01/07/2016  2     34     …
    01/07/2016  3     13     …
    01/07/2016  3     3      …
    01/07/2016  4     2      …
    01/07/2016  4     4      …
    01/14/2016  1     2      …
    01/14/2016  1     4      …
    01/14/2016  1     5      …
    01/14/2016  1     35     …
    01/14/2016  3     1      …
    01/14/2016  3     20     …
    Where is item = 1, Stores 2 through 5?
    • Use of various statistics to generate an efficient plan
    • Sequence of column access for column stores
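The question on this slide has no predicate on the leading key column (Week). One common way to answer it without a full scan is a skip scan: probe once per distinct leading value and binary-search the (Item, Store) range inside it. The sketch below is an illustration under that assumption, not the author's implementation; the rows are the slide's sample data with dates rewritten in ISO form so that string order matches date order.

```python
from bisect import bisect_left, bisect_right

# Sample rows from the slide as (week, item, store), sorted on the full key.
ROWS = sorted([
    ("2016-01-07", 1, 1), ("2016-01-07", 1, 3), ("2016-01-07", 1, 5),
    ("2016-01-07", 2, 34), ("2016-01-07", 3, 13), ("2016-01-07", 3, 3),
    ("2016-01-07", 4, 2), ("2016-01-07", 4, 4),
    ("2016-01-14", 1, 2), ("2016-01-14", 1, 4), ("2016-01-14", 1, 5),
    ("2016-01-14", 1, 35), ("2016-01-14", 3, 1), ("2016-01-14", 3, 20),
])


def skip_scan(rows, item, store_lo, store_hi):
    """Return rows with the given item and store in [store_lo, store_hi].

    rows must be sorted by (week, item, store). Rather than reading every
    row, the scan positions itself once per distinct week and jumps straight
    to the matching (item, store) range within that week.
    """
    result = []
    pos = 0
    while pos < len(rows):
        week = rows[pos][0]
        lo = bisect_left(rows, (week, item, store_lo), pos)
        hi = bisect_right(rows, (week, item, store_hi), lo)
        result.extend(rows[lo:hi])
        # Skip everything else in this week and land on the next week.
        pos = bisect_right(rows, (week, float("inf"), float("inf")), hi)
    return result


if __name__ == "__main__":
    for row in skip_scan(ROWS, item=1, store_lo=2, store_hi=5):
        print(row)
```

Whether this beats a full scan depends on how many distinct leading values exist, which is where the statistics mentioned in the first bullet come in: few distinct weeks make the per-week probes cheap, many distinct weeks do not.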
  • 12. The challenges of HTAP: Single query engine for all workloads
    Indexes
    • Kinds of indexes and how they are leveraged
    • Unique index
    • Transactional consistency with base table
    • Impact on updates
    • Updates during bulk loads
    Materialized Views
    • Synchronous and asynchronous maintenance
    • Overhead of maintenance
    • Automatic query rewrite
    • User defined materialized views
  • 13. The challenges of HTAP: Single query engine for all workloads
      • Serial vs. parallel plans
      • [Diagram labels: Client Application; Master; Multi-fragment Master; ESPs; Node 1, Node 2, … Node n; Ethernet; HBase Regions 1 through 5 with filters and coprocessors; HDFS]
  • 14. The challenges of HTAP: Single query engine for all workloads
      • [Diagram labels: Qry1, Qry2, Qry3, Qry4, Qry5, Qry6, Qry7]
  • 15. The challenges of HTAP: Single query engine for all workloads
      • Optimizer technology, e.g., Cascades, used by Apache Trafodion and Microsoft SQL Server
      • Query plan caching for operational workloads
      • Query plan cache management
      • Extensibility of optimizer to evolve with varied workloads
      • Recognizing query patterns, such as star joins
  • 16. The challenges of HTAP: Single query engine for all workloads
      • Adaptive and parallel joins
          ◦ Nested join
          ◦ Probe cache for nested join
          ◦ Merge join
          ◦ Matching partition join
          ◦ Repartitioned hash join
          ◦ Replication by broadcast hash join
          ◦ Inner / outer child broadcast
          ◦ Dimensional schema star join
      • Join types
          ◦ Inner join
          ◦ Left join
          ◦ Right join
          ◦ Full outer join
          ◦ Self join
      • Cost premiums for nested joins or serial plans
  • 17. The challenges of HTAP: Single query engine for all workloads
      • [Diagram labels: Compute Cost; Execution Environment; Physical Properties; Estimates; Confidence; Cardinality, Distribution, Correlation; Sensitivity To Estimates; Evaluate Risk; Risk Adjustment; Benefit; Risk]
      • Risk premiums
          ◦ Nested join 20%
          ◦ Merge join 10%
          ◦ Serial plan 5%
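The risk premiums on this slide can be read as a surcharge on the estimated cost of plans that are sensitive to estimation errors. The Python sketch below shows one possible interpretation, assuming premiums are applied as multiplicative factors. That compounding rule, the plan dictionaries, and the function names are assumptions for illustration, not a description of a specific optimizer.

```python
# Premium percentages taken from the slide; everything else is hypothetical.
RISK_PREMIUMS = {"nested_join": 0.20, "merge_join": 0.10, "serial_plan": 0.05}


def risk_adjusted_cost(estimated_cost, traits):
    """Inflate an estimated cost by the premium of every risky trait."""
    factor = 1.0
    for trait in traits:
        factor *= 1.0 + RISK_PREMIUMS.get(trait, 0.0)
    return estimated_cost * factor


def choose_plan(candidates):
    """Pick the candidate with the lowest risk-adjusted cost."""
    return min(candidates,
               key=lambda p: risk_adjusted_cost(p["cost"], p["traits"]))


plans = [
    {"name": "nested", "cost": 100.0, "traits": ["nested_join"]},
    {"name": "hash", "cost": 110.0, "traits": []},
]
print(choose_plan(plans)["name"])
```

In this example the nested join is cheaper on paper, but after the 20% premium the hash join wins, which is the behavior the premium is meant to produce when cardinality estimates are uncertain.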
  • 18. The challenges of HTAP: Single query engine for all workloads
      • [Diagram labels: Scan, Scan, Join, Group by]
      • Data flow architecture
      • No materialization of intermediate results
      • Graceful overflow to disk for large memory operations
      • Efficiencies such as pre-fetch
      • Fast path for operational workloads
  • 19. The challenges of HTAP: Single query engine for all workloads
      • Priority / SLA-based execution
      • Allocation of resources by service level
      • Decrease priority with usage increase
      • Anti-starvation / switch between queries based on priority
      • [Diagram labels: low-, medium-, and high-priority queries; queues; HBase Regions 1, 3, and 5, each with a memstore]
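"Decrease priority with usage increase" and anti-starvation can be sketched as a scheduler that lowers a query's effective priority as it consumes time slices. The Python example below is a toy model under assumed names (`Query`, `effective_priority`, `schedule`) and an assumed linear decay. Real workload managers use more elaborate policies.

```python
from dataclasses import dataclass


@dataclass
class Query:
    name: str
    base_priority: int
    used_slices: int = 0


def effective_priority(query, decay=1.0):
    """Priority drops as the query uses more time slices (assumed linear)."""
    return query.base_priority - decay * query.used_slices


def schedule(queries, slices):
    """Give each time slice to the query with the highest effective priority."""
    order = []
    for _ in range(slices):
        chosen = max(queries, key=effective_priority)
        chosen.used_slices += 1
        order.append(chosen.name)
    return order


workload = [Query("high", 3), Query("medium", 2), Query("low", 1)]
print(schedule(workload, 6))
```

Because usage erodes priority, the high-priority query gets the most slices but cannot monopolize the system, so the low-priority query still makes progress.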
  • 20. The challenges of HTAP: Single query engine for all workloads
      • Operational workloads
          ◦ Referential integrity
          ◦ Stored procedures
          ◦ Triggers
          ◦ Various levels of transactional isolation and consistency
          ◦ …
      • BI and analytics workloads
          ◦ Materialized views
          ◦ Fast / bulk extract, transform, load (ETL)
          ◦ OLAP, time series, statistical, data mining, and other functions
          ◦ …
      • Needed by both
          ◦ Scalar and table mapping UDFs
          ◦ Inner, outer, and full outer joins
          ◦ Un-nesting of subqueries
          ◦ Converting correlated subqueries to joins
          ◦ Predicate push down
          ◦ Sort avoidance strategies
          ◦ Constant folding
          ◦ Recursive union
          ◦ …
  • 21. The challenges of HTAP: Supporting multiple storage engines
      • Statistics
      • Key structure
      • Partitioning
      • Data type support
      • Projection and selection
      • Extensibility
      • Security enforcement
      • Transaction management
      • Metadata support
      • Performance, scale, and concurrency considerations
      • Error handling
      • Other operational aspects
  • 22. The challenges of HTAP: Supporting multiple storage engines
      • Storage engine statistics, used by query engine
      • Sampling
      • Access to changed data for incremental updates
      • Update counters to determine refresh schedule
  • 23. The challenges of HTAP: Supporting multiple storage engines
      • [Diagram: a multi-column key (A, B, C) in the query engine mapped to a single clustering key A+B+C in the storage engine, with an A=2 range access]
      • Random single-row and range access for operational workloads
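The mapping from a multi-column key (A, B, C) to a single clustering key can be illustrated by encoding the columns so that byte order matches column order. The Python sketch below assumes unsigned 32-bit integer columns and big-endian packing. The function names and sample rows are hypothetical, and the upper bound `a + 1` assumes `a` is below the 32-bit maximum.

```python
import struct
from bisect import bisect_left


def encode_key(a, b, c):
    """Pack (a, b, c) big-endian so byte comparison follows (a, b, c) order."""
    return struct.pack(">III", a, b, c)


def range_scan_on_a(sorted_keys, a):
    """Return all rows whose leading column equals a, using key-prefix bounds."""
    start = struct.pack(">I", a)      # smallest possible key with this prefix
    stop = struct.pack(">I", a + 1)   # first prefix past the range
    lo = bisect_left(sorted_keys, start)
    hi = bisect_left(sorted_keys, stop)
    return [struct.unpack(">III", key) for key in sorted_keys[lo:hi]]


rows = [(3, 1, 5), (2, 2, 4), (2, 3, 1), (1, 5, 7), (2, 2, 9)]
keys = sorted(encode_key(*row) for row in rows)
print(range_scan_on_a(keys, 2))
```

The same encoded key serves both access patterns named on the slide: a full key gives a single-row lookup, and a key prefix gives a range access.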
  • 24. The challenges of HTAP: Supporting multiple storage engines
      • Data partitioning across disks and nodes
      • Hash, range, or combination
      • Salting support
      • Query engine imposed salting
      • Repartitioning as the cluster expands/contracts
      • Read/write access while being rebalanced
      • Localize data access to avoid shuffling
      • Example: CREATE TABLE t(a integer not null primary key, b integer) SALT USING 4 PARTITIONS;
      • [Diagram: INSERT(s) and SELECT(s) spread across PART 1 through PART 4, each an HBase region on HDFS]
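Salting prefixes each row key with a value derived from a hash of the key, so that sequential keys land in different partitions instead of piling onto one region. The Python sketch below shows the idea with a simple multiplicative hash. The hash function, the constant, and the function names are assumptions for illustration; they are not the hash that the `SALT USING` clause applies.

```python
KNUTH_MULTIPLIER = 2654435761  # assumed constant for a 32-bit multiplicative hash


def salt_partition(key, num_partitions):
    """Map a non-negative integer key to a partition in [0, num_partitions)."""
    hashed = (key * KNUTH_MULTIPLIER) % (1 << 32)
    return (hashed * num_partitions) >> 32


def salted_row_key(key, num_partitions):
    """Prefix the key with its salt so the storage engine sorts by partition first."""
    return (salt_partition(key, num_partitions), key)


for key in range(1, 9):
    print(salted_row_key(key, 4))
```

The trade-off is that a range scan on the original key now has to visit every partition, which is one reason to localize data access where possible.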
  • 25. The challenges of HTAP: Supporting multiple storage engines
      • Data types supported
      • Query to storage engine data type mapping
      • Value constraint enforcement
      • Data types:
          ◦ CHARACTER(n): Character string. Fixed length n
          ◦ VARCHAR(n) or CHARACTER VARYING(n): Character string. Variable length. Maximum length n
          ◦ BINARY(n): Binary string. Fixed length n
          ◦ BOOLEAN: Stores TRUE or FALSE values
          ◦ VARBINARY(n) or BINARY VARYING(n): Binary string. Variable length. Maximum length n
          ◦ INTEGER(p): Integer numerical (no decimal). Precision p
          ◦ SMALLINT: Integer numerical (no decimal). Precision 5
          ◦ INTEGER: Integer numerical (no decimal). Precision 10
          ◦ BIGINT: Integer numerical (no decimal). Precision 19
          ◦ DECIMAL(p,s): Exact numerical, precision p, scale s. Example: decimal(5,2) is a number that has 3 digits before the decimal and 2 digits after the decimal
          ◦ NUMERIC(p,s): Exact numerical, precision p, scale s. (Same as DECIMAL)
          ◦ FLOAT(p): Approximate numerical, mantissa precision p. A floating number in base 10 exponential notation. The size argument for this type consists of a single number specifying the minimum precision
          ◦ REAL: Approximate numerical, mantissa precision 7
          ◦ FLOAT: Approximate numerical, mantissa precision 16
          ◦ DOUBLE PRECISION: Approximate numerical, mantissa precision 16
          ◦ DATE: Stores year, month, and day values
          ◦ TIME: Stores hour, minute, and second values
          ◦ TIMESTAMP: Stores year, month, day, hour, minute, and second values
          ◦ INTERVAL: Composed of a number of integer fields, representing a period of time, depending on the type of interval
          ◦ ARRAY: A set-length and ordered collection of elements
  • 26. The challenges of HTAP: Supporting multiple storage engines
      • Data types supported
      • Query to storage engine data type mapping
      • Value constraint enforcement
      • Referential constraints
      • Character sets
      • Collations
      • Compression
      • Encryption
  • 27. The challenges of HTAP: Supporting multiple storage engines
      • Projection at storage or query engine level
      • Predicates evaluated by query and storage engines
      • Predicates applied to compressed data
      • Multi-column predicates
      • IN lists; size of IN lists
      • Multiple predicates with ORs and ANDs (pushdown)
      • Evaluate predicates in sequence of filtering effectiveness
      • Predicates comparing different columns of same table
      • Complex expression evaluation
      • Evaluation of functions
      • Default or missing values on retrieval
      • [Diagram: projection of columns from a table with columns C1 through C6]
  • 28. The challenges of HTAP: Supporting multiple storage engines
      • Server-side extensibility, e.g., HBase coprocessors or Cassandra triggers, to push down:
          ◦ Complex predicate evaluation with expressions and functions
          ◦ Pre-aggregation
          ◦ Collocated joins or index maintenance
          ◦ Transactional support
          ◦ Security enforcement
          ◦ Some ANSI trigger actions
  • 29. The challenges of HTAP: Supporting multiple storage engines
      • Mapping of security frameworks for the query and storage engines to enforce ANSI SQL security
      • Integration with underlying Hadoop Kerberos security
      • Integration with security solutions, like Sentry or Ranger
      • Integration with security logging and SIEM solutions
  • 30. The challenges of HTAP: Supporting multiple storage engines
      • Replication for high availability, backup and restore, and multi-data center support from query & storage engines
      • ACID or BASE transactional support
      • Integration between the query and storage engines, such as write-ahead logs, and use of coprocessors
      • Completely scalable and distributed transaction management architecture
      • Multi-data center support – active-active single or multiple master replication
      • Overhead of transactions on throughput and system resources
      • Online backup and point-in-time recovery
  • 31. The challenges of HTAP: Supporting multiple storage engines
      • [Diagram: Single-Master vs. Multiple-Masters replication]
  • 32. The challenges of HTAP: Supporting multiple storage engines
      • Point-in-time recovery (timeline)
          ◦ Full transactionally consistent snapshot
          ◦ Snapshots after non-transactional changes such as bulk loads
          ◦ Transactional changes captured continuously
  • 33. The challenges of HTAP: Supporting multiple storage engines
      • Point-in-time recovery (timeline)
          ◦ Drop table or erroneous large transactional update
          ◦ Restore previous full snapshot
          ◦ Initiate recovery to point-in-time
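The recovery sequence on these two slides (restore the previous full snapshot, then replay the continuously captured transactional changes up to the chosen moment) can be sketched in a few lines of Python. The data layout here is an assumption for illustration: a snapshot is a plain dictionary, the change log is a list of `(timestamp, key, value)` tuples, and a value of `None` marks a delete.

```python
def recover_to_point_in_time(snapshot, snapshot_ts, change_log, target_ts):
    """Rebuild state as of target_ts from a snapshot plus logged changes.

    Only changes made after the snapshot and at or before target_ts are
    replayed, in timestamp order.
    """
    state = dict(snapshot)  # restore the previous full snapshot
    for ts, key, value in sorted(change_log, key=lambda entry: entry[0]):
        if snapshot_ts < ts <= target_ts:
            if value is None:
                state.pop(key, None)   # logged delete
            else:
                state[key] = value     # logged insert or update
    return state


snapshot = {"a": 1, "b": 2}
log = [(11, "a", 5), (12, "b", None), (13, "a", 99)]  # ts 13 is the bad update
print(recover_to_point_in_time(snapshot, 10, log, target_ts=12))
```

Choosing a target just before the erroneous change (timestamp 13 in this example) recovers the state without it.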
  • 34. The challenges of HTAP: Supporting multiple storage engines
      • Mapping storage to query engine metadata
      • Handling storage engine specific options
      • Support provided for external tables
      • Changes to external tables outside of the query engine
      • Operational vs. analytics objects
  • 35. The challenges of HTAP: Supporting multiple storage engines
      • As nodes are added, the query engine immediately uses them for queries and transactions
      • Storage engine rebalances data automatically
      • Transactional consistency across bulk loads
      • Rowset inserts and selects
      • Fast scanning options – snapshot scans, prefetching
      • Integration for parallel operations
      • Concurrency and mixed workload capability
      • Elastic scale for cloud deployments
  • 36. The challenges of HTAP: Supporting multiple storage engines
      • Storage and query engine error logging
      • Mapping of storage engine errors to meaningful error messages and resolution options by the query engine
  • 37. The challenges of HTAP: Supporting multiple storage engines
      • Minimize operational and performance impact of storage engine operational aspects, e.g., compaction or splitting
  • 38. The challenges of HTAP: Same data model for all workloads
      • Normal form
          ◦ 1NF
          ◦ 2NF
          ◦ 3NF
          ◦ BCNF
          ◦ 4NF
          ◦ 5NF
          ◦ 6NF
      • Star schema
      • Snowflake schema
      • Query engine integration with storage engine(s) to support all these data models …
  • 40. The challenges of HTAP: Same data model for all workloads
      • NoSQL data models, from “NoSQL Data Modeling Techniques” by Ilya Katsov, Highly Scalable Blog
      • … and these!
  • 41. The challenges of HTAP: Enterprise-caliber capabilities
      • High availability
      • Security
      • Manageability
  • 42. The challenges of HTAP: Enterprise-caliber capabilities
      • Percentage of uptime: from 99.99% (52.56 minutes of downtime per year) to 99.999% (5.26 minutes per year)
      • Online operations (data available for reads and writes)
          ◦ Upgrading the OS
          ◦ Upgrading the file system
          ◦ Upgrading the storage engine
          ◦ Upgrading the query engine
          ◦ Redistributing data to accommodate node and/or disk expansions and contractions
          ◦ Changing table definitions, e.g., data type changes, and adding, dropping, or renaming columns
          ◦ Creating/dropping secondary indexes
          ◦ Full and incremental backups
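The downtime figures follow directly from the uptime percentage applied to the number of minutes in a year. A short Python check, assuming a 365-day year (the function name is illustrative):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def annual_downtime_minutes(uptime_percent):
    """Minutes per year a system may be down at the given uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * MINUTES_PER_YEAR


for pct in (99.99, 99.999):
    minutes = annual_downtime_minutes(pct)
    print(f"{pct}% uptime allows {minutes:.2f} minutes of downtime per year")
```

Each additional "nine" cuts the allowance by a factor of ten, which is why the online operations listed above matter: at 99.999% there is little room for planned outages.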
  • 43. The challenges of HTAP: Enterprise-caliber capabilities
  • 44. The challenges of HTAP: Enterprise-caliber capabilities
      • [Manageability capability matrix. Top-level headings: Schema Management, Performance Management, Monitoring, Security Management, BAR Management. Cell labels in reading order: Object Management Performance Monitoring Database Monitor User Management Backup Analysis Graphical Object Editor Live Performance Monitoring Event Monitoring Role Management Recovery Cross-Platform Schema Knowledge Data Repository Live Event Monitoring Account Migration Log Backup Bottleneck Analysis Threshold Alerts Audit Report Backup Reports SQL Management Job/Workload Analysis Health Index Alarm Archival Query Builder Job/Workload Wizard Live Health Monitoring Visual Difference Tool Job/Workload Management Response Times Maintenance Configuration Management Data Management Live Job/Workload Monitoring Alert Center Repository Aging OS Provisioning Data Migration OS Analysis Remote Monitoring Automated Maintenance Cluster Provisioning SQL Profiler Capacity Capture Central Monitoring Instance Provisioning Automated Import Capacity Trending Hardware Inventory Change Management Cloud Provisioning Visual Explain Plans Capacity Forecast Hardware Monitoring Schema Capture Configuration Editor Session Management Space Management Schema Compare and Synch Lock Management Reorganization Management Troubleshooting Notifications Process Management Query Cost Simulation Health Analysis Schema Rotation Consistency Checks Historical Reports Problem Correlation Collaboration Online Schema Evolution Bottleneck Tuning Automated Actions Virtual Changes Built-In Automation Access Path Analysis]
  • 45. The challenges of HTAP: Enterprise-caliber capabilities
      • Operational performance by transactions per second
      • Analytical performance by query
      • Overhead of gathering metrics on operational and analytical workloads
      • Configurable statistics collection
      • Workload management by Service Level Objectives
          ◦ Based on priority and/or resource allocation
          ◦ High-priority operational workloads vs. analytical workloads
      • End-to-end visibility of transaction and query metrics
      • Metric breakdown down to the query operation
      • Metrics for table access across workloads down to the partition level
      • Skew or bottlenecks
      • Integration with YARN
  • 46. Conclusion
      • It ain’t easy!!
      • Very few products can even come close
      • Detailed O’Reilly report: http://www.oreilly.com/data/free/in-search-of-database-nirvana.csp