Database is the new black. Ever the backbone of information architectures, database technology continually evolves to meet growing and changing business needs. New types of data and applications make the database more important than ever, and understanding which technology best serves your use case is paramount to building durable systems. These days the choices are many, so users should be careful when deciding which direction to go. Register for this Exploratory Webcast to hear veteran database analyst Dr. Robin Bloor explain why the database market has exploded in recent years. He'll outline the current database landscape and provide insights about which kinds of technologies suit the growing variety of business needs today. He'll also focus on key auxiliary technologies that enable modern databases to perform efficiently.
3. Database Disruption
The forces of nature
often converge to
transform the very
foundations of our
infrastructure.
In the database
landscape, recent
developments have
resulted in a massive
transformation of
the DBMS market.
Understanding your
requirements is key
to success these days.
6. Database Fundamentals
q Built for a collection of
resources – which could
be engineered for the
application
q Shares data among
multiple concurrent users
q Optimizes performance
q Handles resilience
q Provides ACID properties
to some degree
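As a hedged illustration of the atomicity part of ACID, the sketch below uses Python's built-in sqlite3 module. The accounts table, the account names and the transfer function are assumptions made up for the example; they do not come from the slides.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move `amount` between two accounts as one all-or-nothing unit."""
    # `with conn` commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?",
            (amount, src),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?",
            (amount, dst),
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Raising here undoes both updates above.
            raise ValueError("insufficient funds")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)]
)
conn.commit()

transfer(conn, "alice", "bob", 60)      # succeeds
try:
    transfer(conn, "alice", "bob", 60)  # would overdraw, so it is rolled back
except ValueError:
    pass

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed second transfer, neither account has changed: the debit and the credit either both happen or neither does.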
8. Hardware Factors
q CPUs, GPUs & FPGAs
q Cross breeding
q 3D XPoint and PCM (and
Memristor?)
q SSDs & parallel access
q Parallel hardware
architectures
Performance is accelerating
and costs continue to fall.
9. The Cloud
q A cloud database is no
different from an on-prem
database, in theory
q Most databases now
available in the cloud
q Some databases are cloud
focused (Snowflake,
Redshift)
q Some are hybrid (NuoDB
is a good example)
10. Data Growth
Corporate
Databases
+ Unstructured Data
+ Partner & Customer Data
+ Web Data
+ Social Network Data
+ Streaming Data
+ IoT Data
+ Personal Data
+ Log File Data
Data growth is roughly 55% pa. Always has been.
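To make the 55% figure concrete, here is a small compounding sketch. The function names are assumptions for illustration only, and the rate is the slide's rough figure rather than a measured value.

```python
import math

ANNUAL_GROWTH = 0.55  # the slide's rough figure: 55% per annum


def growth_factor(years, rate=ANNUAL_GROWTH):
    """How many times larger the data is after `years` of compounding."""
    return (1 + rate) ** years


def doubling_time_years(rate=ANNUAL_GROWTH):
    """Years needed for the data volume to double at a constant rate."""
    return math.log(2) / math.log(1 + rate)


ten_year_factor = growth_factor(10)
doubling = doubling_time_years()
```

At this rate the volume doubles roughly every 1.6 years and grows about 80-fold over a decade.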
11. The Global Map and Data Options
u Move the data to
the processing
u Move the
processing to the
data
u Move the
processing and the
data
u Shard
There will not be a single physical database (or data lake) for a
multitude of reasons.
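The "Shard" option can be sketched as hash-based routing of keys to independent stores. This is a minimal sketch under stated assumptions: the ShardedStore class, the in-memory dictionaries standing in for physical databases, and the key names are all hypothetical.

```python
import hashlib
from typing import Dict, List, Optional


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard index from a stable hash of the key."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class ShardedStore:
    """Toy key-value store; each dict stands in for a separate database."""

    def __init__(self, shard_count: int) -> None:
        self.shards: List[Dict[str, str]] = [{} for _ in range(shard_count)]

    def put(self, key: str, value: str) -> None:
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key: str) -> Optional[str]:
        return self.shards[shard_for(key, len(self.shards))].get(key)


store = ShardedStore(4)
store.put("customer:42", "Ada")
```

Simple modulo routing like this remaps most keys when the shard count changes; consistent hashing is a common alternative when shards are added or removed often.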
13. Everything in flux
u Hardware (network,
storage, servers)
u Data Sources
u Data Staging
u Data Volumes
u Data Flow
u Data Governance
u Query Languages
u Data Usage
u Data Structures
u Schema definition
u Ingest speeds
u Data Workloads
u Applications
14. NoSQL Confusion
As the graph indicates,
there is some overlap
between SQL databases
and other databases.
What to choose is a use-
case driven decision.
There never was a
“universal database”
and probably there
never will be.
15. NoSQL World
q Some NDBMS do not attempt to
provide all ACID properties.
q Some NDBMS use a distributed
scale-out architecture with data
redundancy.
q XML DBMS using XQuery are
NDBMS.
q Some document stores are
NDBMS
q Object databases are NDBMS
(Gemstone, Objectivity,
ObjectStore, etc.)
q Key value stores
q Graph DBMS are NDBMS
q Large data pools (BigTable,
HBase, Mnesia, etc.) are NDBMS
17. SQL Merits and Demerits
q SQL: very good for set
manipulation.
q Works for OLTP and many
query environments.
q Not good for nested data
structures (documents, web
pages, etc.)
q Not good for ordered data
sets
q Not good for data graphs
(networks of values)
Not a Swiss Army Knife!
18. The Impedance Mismatch
q The RDBMS stores data organized
according to table structures
q The OO programmer manipulates
data organized according to
complex object structures,
which may have specific
methods associated with them.
q The data does not simply map to
the structure it has within the
database
q Consequently a mapping activity
is necessary to get and put data
q Basically: hierarchies, types,
result sets, crappy APIs,
language bindings, tools.
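The mapping activity described above can be sketched with a nested object that has to be split across two tables on the way in and reassembled on the way out. The Order and LineItem classes, the table layout and the helper functions are assumptions invented for this example, using Python's built-in sqlite3 module.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineItem:
    sku: str
    quantity: int


@dataclass
class Order:
    order_id: int
    customer: str
    items: List[LineItem] = field(default_factory=list)

    def total_quantity(self) -> int:
        # A method on the object; the tables know nothing about it.
        return sum(item.quantity for item in self.items)


def save_order(conn, order):
    """Flatten one object into rows in two tables (the 'put' mapping)."""
    with conn:
        conn.execute(
            "INSERT INTO orders VALUES (?, ?)", (order.order_id, order.customer)
        )
        conn.executemany(
            "INSERT INTO order_items VALUES (?, ?, ?)",
            [(order.order_id, i.sku, i.quantity) for i in order.items],
        )


def load_order(conn, order_id):
    """Rebuild the object from its rows (the 'get' mapping)."""
    row = conn.execute(
        "SELECT order_id, customer FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    if row is None:
        raise KeyError(order_id)
    items = [
        LineItem(sku, quantity)
        for sku, quantity in conn.execute(
            "SELECT sku, quantity FROM order_items "
            "WHERE order_id = ? ORDER BY rowid",
            (order_id,),
        )
    ]
    return Order(row[0], row[1], items)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE order_items ("
    "order_id INTEGER NOT NULL REFERENCES orders(order_id), "
    "sku TEXT NOT NULL, quantity INTEGER NOT NULL)"
)

original = Order(1, "Ada", [LineItem("widget", 2), LineItem("gadget", 1)])
save_order(conn, original)
restored = load_order(conn, 1)
```

Even for this tiny object, two functions and two tables are needed just to move the data in and out, which is the overhead the slide is pointing at.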
19. The SQL Barrier
q SQL has:
q DDL (for data definition)
q DML (for Select, Project and
Join)
q But it has little MML (Math)
or TML (Time)
q Usually result sets are brought to
the client for further analytical
manipulation, but this creates
problems
q Alternatively doing all analytical
manipulation in the database
creates problems
21. Database Mismatch
A key problem is that we talk
mostly about computation over data
when we talk about “big data” and
analytics, a potential mismatch for
both relational and NoSQL
22. Database Workload Parameters
q Read-intensive vs. write-
intensive
q Mutable vs. immutable data
q Immediate vs. eventual
consistency
q Short vs. long data latency
q Predictable vs.
unpredictable data access
patterns
q Simple vs. complex data
types
23. Horses for Courses
q Relational row store databases for
conventionally tooled low to mid-
scale OLTP
q Relational databases for ACID
requirements
q Parallel databases (row or column)
for unpredictable or variable query
workloads
q Specialized databases for complex
data query workloads
q NoSQL (KVS, DHT) for high scale
OLTP
q NoSQL (KVS, DHT) for low latency
read-mostly data access
q Parallel databases (row or column)
for analytic workloads over tabular
data
q NoSQL / Hadoop for batch analytic
workloads over large data volumes
24. Database Tools: A Call Out
q Have you noticed how databases
are not self-running?
q DBAs are in short supply and the
need for them is increasing
q Database diversity doesn’t help
in this area.
q DBA Tools:
q SQL analysis
q Performance analysis
q Security management
q Capacity planning
q Database deployment
q We meet the same problem with
data lakes – except that there
are very few tools
25. The Impact of Parallelism
We used to see 10x performance
improvement every 6 years, now we
see 1000x (and that’s just an
approximation) regularly
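For a sense of scale, the older pace of 10x every 6 years can be converted to an annual rate. This is only arithmetic on the slide's own numbers; the function names are assumptions for the example.

```python
import math


def annual_rate(factor: float, years: float) -> float:
    """Equivalent yearly improvement for `factor` gained over `years`."""
    return factor ** (1.0 / years) - 1.0


def years_to_reach(target_factor: float, factor: float, years: float) -> float:
    """Years needed to reach `target_factor` at a pace of `factor` per `years`."""
    return years * math.log(target_factor) / math.log(factor)


old_pace = annual_rate(10, 6)
years_for_1000x = years_to_reach(1000, 10, 6)
```

Ten-fold every 6 years works out to roughly 47% a year, and at that pace a 1000x gain would take about 18 years.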
27. The Perfect Storm – The Data Lake
q The triumph of Open
Source as a business model
q The dominance of Apache
q Hadoop, the platform
for data
q Spark, for speed
q Kafka & NiFi for data
flow
q The triumph of the cloud
and its dominance
q Cost collapse
28. The Primary Role of the Data Lake
System of Record
Data Governance
Application Platform
29. The Evolved Conception
[Diagram labels: Static Data Sources; Data Streams; Ingest; Data Lake; Data Governance; Data Lake Mgt; Analytics or BI Apps; ETL; To Databases, Data Marts, Other Apps]
u Static data and data
streams
u Real-time data ingest
u Data Governance
u Data Lake Mgt
u Analytics & BI
u Extracts
The data lake becomes
the system of record
31. The Full Picture
[Diagram labels: Ingest; Data Cleansing; Data Security; Metadata Mgt; Real-Time Apps; Transform & Aggregate; Search & Query; BI, Visual'n & Analytics; Other Apps; Data Lake Mgt; Data Governance; DATA LAKE; Archive; Life Cycle Mgt; Extracts; To Databases, Data Marts, Other Apps]
Servers, Desktops, Mobile, Network Devices, Embedded
Chips, RFID, IoT, The Cloud, OSes, VMs, Log Files, Sys
Mgt Apps, ESBs, Web Services, SaaS, Business Apps,
Office Apps, BI Apps, Workflow, Data Streams, Social...
32. Data Governance
If data governance was important
before Big Data (and it was), it is
far more important in the era of
Data Lakes.
33. Data Governance
System of record
Data provenance & lineage
Data cleansing
Data security
Data compliance
Data integrity
Data audit record
Data life-cycle mgt
Data meaning
Data Governance is a perpetual
process
35. A TRANSACTION is a
MOLECULE of ATOMIC EVENTS
The ATOM of data has
become the EVENT
Events: Atoms and Molecules
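One way to read "a transaction is a molecule of atomic events" is sketched below: a transfer is a bundle of two signed events, and current state is derived by replaying events. The Event and Transaction classes and the account names are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Event:
    """The atom: one immutable fact about one account."""
    account: str
    delta: int  # signed change to the balance


@dataclass(frozen=True)
class Transaction:
    """The molecule: events that only make sense together."""
    events: Tuple[Event, ...]


def transfer(src: str, dst: str, amount: int) -> Transaction:
    return Transaction((Event(src, -amount), Event(dst, amount)))


def replay(transactions: Iterable[Transaction]) -> Dict[str, int]:
    """Derive balances by applying every event in order."""
    balances: Dict[str, int] = {}
    for tx in transactions:
        for event in tx.events:
            balances[event.account] = balances.get(event.account, 0) + event.delta
    return balances


log = [transfer("a", "b", 30), transfer("b", "c", 10)]
state = replay(log)
```

Because the events are kept rather than overwritten, the same log can feed a stream, a database or a data lake.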
36. Events
Think of events as drops of water.
They can live in streams, and they
can also live in data pools and data
lakes and databases.
37. Event Types
q Instantiation Event
q A State Report
q A Trigger Event
q A Correction Event
We also need to consider:
Data Refinement
Aggregations
Homogeneous Collections
Derived Data
38. § The pulse and the
threshold alert
§ Some of this involves
distributed processing
§ There are known apps
and unknown apps, so
analytical exploration
needs to be enabled
§ Only aggregations will
migrate
[Diagram labels: Sensors, controllers, CPUs; Source Proc.; Depot (shown twice); Depot Proc.; Central Hub; Central Proc.; Data (shown three times)]
Event Based IoT Architecture
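The threshold alert and the point that only aggregations migrate can be sketched as a depot-side function that inspects raw readings locally and forwards just a summary. The DepotSummary type, the function name and the sample readings are assumptions for illustration, not part of the architecture shown.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class DepotSummary:
    """What leaves the depot: an aggregate, not the raw readings."""
    count: int
    mean_value: float
    max_value: float
    alerts: int


def process_at_depot(readings: List[float], threshold: float) -> DepotSummary:
    """Check readings against a threshold locally and summarize them."""
    if not readings:
        raise ValueError("no readings to process")
    alerts = sum(1 for value in readings if value > threshold)
    return DepotSummary(
        count=len(readings),
        mean_value=mean(readings),
        max_value=max(readings),
        alerts=alerts,
    )


summary = process_at_depot([10.0, 20.0, 30.0], threshold=25.0)
```

The raw readings stay at the depot; the central hub receives one small record per batch.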
39. u Time
u Geographic location
u Virtual/logical location
u Source device & SW
u Device ID
u Derivation (if derived)
u Creator
u Owner
u Permissions
u Status (for replication)
u Metadata
u Audit Trail
u Archive flag
Self-defining data
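A self-defining event might be sketched as a record that carries its own context alongside the payload. The field names below follow the list above, and the event types follow the Event Types slide; the class names, default values and JSON encoding are assumptions for this example.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class EventType(Enum):
    INSTANTIATION = "instantiation"
    STATE_REPORT = "state_report"
    TRIGGER = "trigger"
    CORRECTION = "correction"


@dataclass
class SelfDefiningEvent:
    event_type: EventType
    payload: Dict[str, Any]
    time: str                          # ISO 8601 timestamp
    device_id: str
    source: str                        # source device and software
    geo_location: Optional[str] = None
    logical_location: Optional[str] = None
    derivation: Optional[str] = None   # set only if the data is derived
    creator: Optional[str] = None
    owner: Optional[str] = None
    permissions: List[str] = field(default_factory=list)
    status: str = "original"           # used for replication
    metadata: Dict[str, Any] = field(default_factory=dict)
    audit_trail: List[str] = field(default_factory=list)
    archived: bool = False

    def to_json(self) -> str:
        """Serialize the event together with all of its context."""
        data = asdict(self)
        data["event_type"] = self.event_type.value
        return json.dumps(data, sort_keys=True)


event = SelfDefiningEvent(
    event_type=EventType.STATE_REPORT,
    payload={"temp_c": 21.5},
    time="2024-01-01T00:00:00+00:00",
    device_id="sensor-7",
    source="thermo-fw 1.0",
)
decoded = json.loads(event.to_json())
```

Because the context travels with the event, a consumer can interpret the record without consulting an external schema.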