SQL, NoSQL, Distributed SQL: Choose your DataStore carefully

Md Kamaruzzaman

About Me
Md Kamaruzzaman
Lead Software Architect
Blog: https://md-kamaruzzaman.medium.com/
Twitter: @KamaruzzMd

DataStore Types
• Relational Database (SQL)
• Key-Value Store
• Document Database
• Graph Database
• Columnar Database
• Wide Column Store
• Time Series Database
• In-Memory Database
• Search Engine
• Object-Oriented Database
• Spatial Database
• NewSQL
• Distributed SQL
• Data Warehouse
• Object Storage

Why Does the Right DataStore Matter?
• One size does not fit all
• Fulfil the functional requirements
• Fulfil the non-functional requirements
• Avoid rewriting code because of a DataStore change
• Reduce migration cost
• Enable faster development
• Reduce operational cost
• Reduce maintenance cost

CAP Theorem
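• Distributed DataStore: data is stored on more than one node (using sharding and replication)
• Consistency: all clients see the same view of the data, even right after an update or delete
• Availability: all clients can always read and write (every request to a non-failing node gets a response)
• Partition tolerance: the system continues to work as expected during a network partition (communication loss or delay between nodes)
• Because partitions cannot be ruled out in a distributed DataStore, during a partition it must choose between consistency (CP) and availability (AP)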
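To make the trade-off concrete, here is a toy Python model of a two-node cluster, assuming fully synchronous replication while the network is healthy. The class name ToyCluster and its "CP"/"AP" modes are hypothetical illustrations, not any real database's API: in CP mode the cluster rejects writes it cannot replicate during a partition, while in AP mode it accepts them and lets the replicas diverge.

```python
class ToyCluster:
    """Hypothetical two-node replicated store, used only to illustrate CAP."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.nodes = [{}, {}]      # one dict per replica
        self.partitioned = False   # True = the two nodes cannot reach each other

    def write(self, node_id, key, value):
        if not self.partitioned:
            for replica in self.nodes:   # replicate synchronously to every node
                replica[key] = value
            return True
        if self.mode == "CP":
            return False                 # reject the write so replicas stay consistent
        self.nodes[node_id][key] = value # AP: accept locally; replicas now diverge
        return True

    def read(self, node_id, key):
        return self.nodes[node_id].get(key)


# Usage: write while healthy, then partition and write again through node 0.
cp, ap = ToyCluster("CP"), ToyCluster("AP")
for cluster in (cp, ap):
    cluster.write(0, "x", 1)
    cluster.partitioned = True
    cluster.write(0, "x", 2)
```

After the partition, the CP cluster still returns 1 from both nodes (it gave up availability for the second write), while the AP cluster returns 2 from node 0 and 1 from node 1 (it stayed available but inconsistent). Real systems refine this with quorums and conflict resolution, but the choice made during the partition is the one the CAP theorem describes.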

Relational Database (SQL)
Key Features:
• Based on E. F. Codd’s 1970 paper on the relational model
• Table-based and relational
• Multi-table, multi-row ACID transactions
• Vertically scalable
• Referential integrity
• Data normalization
• Structured Query Language (SQL)
• Battle-tested
• Sharding, if needed, is managed by the application or middleware
• CA (as a single-node system, it has no network partition to tolerate)
• Example: PostgreSQL, Oracle, MS-SQL, MySQL
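A minimal sketch of a multi-table ACID transaction with referential integrity, using Python's built-in sqlite3 module as a stand-in for a relational database. The schema (account, transfer) and the helper functions setup, transfer and balances are hypothetical: a transfer updates two rows and inserts an audit row, and any CHECK or FOREIGN KEY violation rolls back all three statements together.

```python
import sqlite3


def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.execute(
        "CREATE TABLE account ("
        " id INTEGER PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.execute(
        "CREATE TABLE transfer ("
        " id INTEGER PRIMARY KEY,"
        " src INTEGER NOT NULL REFERENCES account(id),"
        " dst INTEGER NOT NULL REFERENCES account(id),"
        " amount INTEGER NOT NULL)"
    )
    with conn:
        conn.executemany("INSERT INTO account (id, balance) VALUES (?, ?)",
                         [(1, 100), (2, 50)])
    return conn


def transfer(conn, src, dst, amount):
    """Move money and log it: all three statements commit, or none do."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("INSERT INTO transfer (src, dst, amount) VALUES (?, ?, ?)",
                         (src, dst, amount))
        return True
    except sqlite3.IntegrityError:  # CHECK or FOREIGN KEY constraint violated
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))
```

For example, transferring 10 to a non-existent account 99 first debits account 1, then fails on the foreign key when the audit row is inserted, and the rollback restores the original balance.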

Relational Database
When to use:
• As an OLTP database that needs ACID transactions
• Structured data
• Data volume that can be handled by vertical scaling
• Data is relational
When not to use:
• As an OLAP database
• Semi-structured (e.g. JSON, XML) or unstructured data
• Horizontal scalability is required
• Geographic data distribution is required
• Data is extremely relational (deeply connected)

Key-Value Store
Key Features:
• Data is stored as key-value pairs (like a hash table)
• Values can hold a wide range of data structures (objects)
• Horizontally scalable using shared-nothing sharding
• Sharding is managed by the database
• Schemaless
• Data redundancy and duplication due to denormalization
• An in-memory key-value store can be used as a distributed cache
• CP or AP
• Example: Redis, Memcached, RocksDB
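The following Python sketch shows the idea behind shared-nothing sharding managed by the database: each key is hashed to exactly one shard, and each shard is an independent hash table that shares nothing with the others. The class ShardedKVStore is hypothetical, and the simple hash-modulo placement is an assumption chosen for brevity.

```python
import hashlib


class ShardedKVStore:
    """Hypothetical shared-nothing key-value store: each shard is an independent dict."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, key):
        # Stable hash: Python's built-in hash() of str is randomized per process.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key, default=None):
        return self.shards[self.shard_for(key)].get(key, default)

    def delete(self, key):
        self.shards[self.shard_for(key)].pop(key, None)


kv = ShardedKVStore(num_shards=4)
for i in range(100):
    kv.put(f"user:{i}", {"id": i})
```

Hash-modulo placement moves most keys whenever the shard count changes, which is why production systems typically use consistent hashing or a fixed number of hash slots (Redis Cluster, for example, uses 16384 hash slots).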

Key-Value Store
When to use:
• As an OLTP database when ACID transactions are not needed
• High-throughput, low-latency reads and writes
• Horizontal scalability with sharding managed by the database
• Large datasets
• In-memory key-value store:
  • To speed up database access (caching)
  • CMS, real-time systems
When not to use:
• Dataset is small

Document Database
Key Features:
• Stores semi-structured data (e.g. JSON, XML)
• Schemaless
• Multi-document ACID transactions (with limitations)
• Horizontally scalable
• CP or AP
• Example: MongoDB, CouchDB
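To illustrate schemaless documents and nested queries, here is a small Python sketch that stores orders as dictionaries (standing in for JSON documents) and filters them by dotted field paths, loosely modeled on the dot notation MongoDB uses for embedded fields. The find and get_path helpers and the sample orders are hypothetical, and the filter supports equality matches only.

```python
_MISSING = object()  # sentinel for "field not present in this document"


def get_path(doc, path):
    """Follow a dotted path such as "customer.city" through nested dicts."""
    current = doc
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def find(collection, query):
    """Return documents whose fields equal every value in query (equality only)."""
    return [doc for doc in collection
            if all(get_path(doc, path) == value for path, value in query.items())]


orders = [
    {"_id": 1, "status": "shipped",
     "customer": {"name": "Ada", "city": "London"},
     "items": [{"sku": "A1", "qty": 2}]},
    # A document with an extra field: no schema migration is needed.
    {"_id": 2, "status": "new", "coupon": "WELCOME10",
     "customer": {"name": "Linus", "city": "Helsinki"},
     "items": [{"sku": "B7", "qty": 1}, {"sku": "C3", "qty": 4}]},
]
```

Because each order embeds its customer and line items, reading one order needs no join, which is where documents can outperform normalized tables.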

Document Database
When to use:
• As an OLTP database where limited ACID transactions are acceptable
• Semi-structured data that needs advanced query features
• Rapid application development
• Data has no fixed schema
• Horizontal scaling with sharding managed by the database
• Embedding related data in one document performs better than joining normalized tables
When not to use:
• Data is structured
• As an OLAP database
• Multi-table (multi-collection) ACID transactions are needed

Wide Column Store
Key Features:
• Two-dimensional key-value store (row key → column → value)
• Column families are stored separately
• Schemaless
• Horizontally scalable with shared-nothing sharding
• AP (consistency is tunable in Cassandra and ScyllaDB)
• Low-latency writes
• Example: Cassandra, ScyllaDB, BigTable
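A toy Python model of the two-dimensional layout: the first key is the row key, the second is the column name, and each column family is kept in its own structure. The class WideColumnTable and the sensor data are hypothetical; real stores such as Cassandra add partitioning, clustering order and on-disk structures that this sketch omits, and returning columns sorted by name is only a simplification of how they keep data ordered.

```python
from collections import defaultdict


class WideColumnTable:
    """Hypothetical model: column family -> row key -> column name -> value."""

    def __init__(self, families):
        # Each column family gets its own storage, mirroring "stored separately".
        self.families = {name: defaultdict(dict) for name in families}

    def put(self, row_key, family, column, value):
        self.families[family][row_key][column] = value

    def get(self, row_key, family, column, default=None):
        return self.families[family].get(row_key, {}).get(column, default)

    def get_row(self, row_key, family):
        # Return the row's columns sorted by column name.
        row = self.families[family].get(row_key, {})
        return dict(sorted(row.items()))


table = WideColumnTable(["profile", "readings"])
table.put("sensor-1", "profile", "location", "Berlin")
table.put("sensor-1", "readings", "2024-01-01T00:10", 21.5)
table.put("sensor-1", "readings", "2024-01-01T00:00", 21.0)
table.put("sensor-2", "readings", "2024-01-01T00:05", 19.0)
```

Rows in the same family can carry completely different columns, which is what makes the model schemaless and a good fit for write-heavy, time-series style data where each column is a timestamp.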

Distributed Wide Column Store
When to use:
• As an OLTP database when ACID transactions are not needed
• Planet-scale database with a massive volume of reads and writes
• Near-linear horizontal scalability with sharding managed by the database
• Extremely large datasets
• As an OLAP database, combined with additional OLAP tools (e.g. Spark)
• Extremely low-latency reads and writes
When not to use:
• Data consists of documents (e.g. JSON)

Graph Database
Key Features:
• Uses a graph data structure (nodes, edges, properties)
• Relationships are first-class citizens
• Optimal for highly connected datasets
• Uses graph algorithms (e.g. graph traversal) for fast relationship queries
• ACID transactions
• CP
• Example: Neo4j (Gremlin is the graph traversal language of Apache TinkerPop)
Source: https://neo4j.com
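Relationship queries map directly onto graph traversal. This Python sketch uses breadth-first search over an adjacency list to find the shortest chain of KNOWS relationships between two people. The sample edges and the helper names build_adjacency and shortest_path are hypothetical; a graph database would run such traversals natively through its query language instead of in application code.

```python
from collections import deque

# (source node, target node, relationship label)
edges = [
    ("alice", "bob", "KNOWS"),
    ("bob", "carol", "KNOWS"),
    ("carol", "dave", "KNOWS"),
    ("alice", "erin", "KNOWS"),
]


def build_adjacency(edges):
    adjacency = {}
    for src, dst, _label in edges:
        # Treat KNOWS as undirected for this query.
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, []).append(src)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search: the first path that reaches goal has the fewest hops."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in adjacency.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # goal is not connected to start


adjacency = build_adjacency(edges)
```

Calling shortest_path(adjacency, "alice", "dave") walks alice → bob → carol → dave; in a relational database the same question would need one self-join per hop.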

Graph Database
When to use:
• Relationships in the data are very important
• Schema is evolving
When not to use:
• As an analytics database (OLAP)
• Data is disconnected or only loosely related
• Data consists of documents
• Data fits a simple key-value model

Distributed SQL
Key Features:
• Table-based and relational
• Multi-table, multi-row ACID transactions, within some constraints (e.g. within availability zones)
• Geographic data distribution
• Referential integrity
• Structured Query Language (SQL)
• CP with very high availability
• Example: Google Spanner, CockroachDB, YugabyteDB
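Some Distributed SQL databases (e.g. CockroachDB and Google Spanner) split tables into contiguous key ranges and replicate each range with a consensus protocol (Raft in CockroachDB, Paxos in Spanner). The Python sketch below, built around a hypothetical RangeRouter class with made-up node names, shows the two ideas together: routing a key to its range, and committing a write only when a majority of that range's replicas is reachable. That majority rule is how such systems stay consistent (CP) while remaining available as long as most replicas of a range survive.

```python
import bisect


class RangeRouter:
    """Hypothetical router: keys are split into ranges, each with its own replica group."""

    def __init__(self, split_keys, replica_groups):
        # n split keys define n + 1 ranges: (-inf, k0), [k0, k1), ..., [k_last, +inf)
        assert len(replica_groups) == len(split_keys) + 1
        self.split_keys = sorted(split_keys)
        self.replica_groups = replica_groups

    def range_for(self, key):
        return bisect.bisect_right(self.split_keys, key)

    def can_commit(self, key, reachable_nodes):
        """A write commits only if a majority of the range's replicas can acknowledge it."""
        replicas = self.replica_groups[self.range_for(key)]
        acks = sum(1 for node in replicas if node in reachable_nodes)
        return acks > len(replicas) // 2


router = RangeRouter(
    split_keys=["g", "p"],
    replica_groups=[["n1", "n2", "n3"], ["n2", "n3", "n4"], ["n3", "n4", "n5"]],
)
```

If only n1 and n2 are reachable, keys below "g" can still be written (two of three replicas), while keys from "p" onward cannot, so the cluster loses availability only for the affected ranges instead of everywhere.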

Distributed SQL
When to use:
• Near-linear horizontal scalability with sharding managed by the database
• Consistency, availability and partition tolerance are all needed within an SLA
• Geographic data distribution is required
• Data is structured and relational
When not to use:
• Geographic data distribution is not required
• A lower database cost is a priority
• Semi-structured (e.g. JSON, XML) or unstructured data
• Data fits on a single, vertically scaled node

Search Engine
Key Features:
• Provides full-text search using an inverted index
• Supports stop words, synonyms and autocorrection
• Data is structured or semi-structured
• Geo queries (location-based search)
• CP
• Example: Apache Solr, Elasticsearch
Source: https://community.hitachivantara.com
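An inverted index maps each term to the set of documents containing it, so a query only looks up its own terms instead of scanning every document. Below is a minimal Python sketch with a hypothetical stop-word list, sample documents and AND semantics for multi-term queries; engines such as Elasticsearch and Solr (both built on Apache Lucene) add relevance scoring, stemming, synonyms and fuzzy matching on top of this structure.

```python
import re
from collections import defaultdict

STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is"}  # hypothetical, tiny list


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def build_index(docs):
    index = defaultdict(set)  # term -> set of document IDs
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents containing every (non-stop-word) query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))  # copy so the index is not mutated
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "The quick brown fox",
    2: "A quick guide to Redis",
    3: "Brown bears and the fox",
}
index = build_index(docs)
```

A query such as "quick fox" intersects the postings for "quick" ({1, 2}) and "fox" ({1, 3}) to get {1}, without touching the text of any document.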

Search Engine
When to use:
• Moderate to advanced full-text search is needed
• Horizontal scalability with sharding managed by the database
• Geo search is needed
• Structured or semi-structured data (e.g. log data, JSON, XML)
When not to use:
• As an operational database (OLTP)
• As an analytics database (OLAP)

Object Storage
Key Features:
• Manages data as objects
• Stores data along with metadata (unique ID, security info)
• Single repository with a flat hierarchy
• REST API for CRUD operations
• Used mainly for unstructured and semi-structured data
• Provides a globally unique identifier for accessing each object
• AP
• Data replication and data distribution at object-level granularity
• Example: Amazon S3, Azure Blob Storage, Google Cloud Storage
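The object model can be sketched in a few lines of Python: every object gets a unique ID in a single flat namespace and carries its own metadata, and the operations correspond roughly to the REST verbs PUT, GET, HEAD and DELETE. The ObjectStore class and its method names are hypothetical and do not reproduce the Amazon S3, Azure Blob Storage or Google Cloud Storage APIs.

```python
import hashlib
import uuid


class ObjectStore:
    """Hypothetical object store: one flat namespace of object ID -> (bytes, metadata)."""

    def __init__(self):
        self._objects = {}

    def put(self, data, metadata=None):
        """Store bytes plus user metadata and return a new unique ID (like PUT)."""
        object_id = str(uuid.uuid4())
        meta = dict(metadata or {})
        meta["size"] = len(data)
        meta["sha256"] = hashlib.sha256(data).hexdigest()  # integrity checksum
        self._objects[object_id] = (bytes(data), meta)
        return object_id

    def get(self, object_id):
        """Return the object's bytes (like GET); raises KeyError if absent."""
        return self._objects[object_id][0]

    def head(self, object_id):
        """Return a copy of the metadata only (like HEAD)."""
        return dict(self._objects[object_id][1])

    def exists(self, object_id):
        return object_id in self._objects

    def delete(self, object_id):
        """Remove the object if present (like DELETE)."""
        self._objects.pop(object_id, None)


store = ObjectStore()
oid = store.put(b"hello", {"content-type": "text/plain"})
```

Here IDs are generated UUIDs for simplicity; services such as Amazon S3 instead address objects by bucket name plus a user-chosen key, and a "folder" prefix in that key is just part of the name, not a real directory.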

Object Storage
When to use:
• To store unstructured and semi-structured data with object-level granularity (e.g. streaming videos, images, CSV/XML files, backups)
• High availability and high durability
• Low-cost storage is needed
• As a data lake
• Automatic backup and redundancy are needed
When not to use:
• As an operational database (OLTP)
• As an analytical database (OLAP)
• Block storage or directories (a hierarchical file system) are needed
• Structured data

Data Warehouse
Key Features:
• Large-scale analytical (OLAP) database
• Data abstraction is structured and relational
• Central repository of all analytical data
• Columnar storage
• Supports SQL
• Horizontally scalable
• AP
• Massively Parallel Processing (MPP)
• Efficient distributed query execution engine
• Petabyte- to exabyte-scale datasets
• Example: Google BigQuery, Snowflake, Amazon Redshift
Source: https://datawarehouseinfo.com
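Columnar storage is what makes analytical scans cheap: an aggregate over two columns reads only those two columns, not every field of every row. This Python sketch pivots hypothetical order rows into a one-list-per-column layout and computes the equivalent of SELECT region, SUM(amount) ... GROUP BY region; real warehouses add compression, MPP execution across many nodes, and a SQL engine on top.

```python
rows = [
    {"order_id": 1, "region": "EU", "amount": 120, "comment": "gift wrap"},
    {"order_id": 2, "region": "US", "amount": 80, "comment": ""},
    {"order_id": 3, "region": "EU", "amount": 30, "comment": "express"},
    {"order_id": 4, "region": "APAC", "amount": 50, "comment": ""},
]


def to_columns(rows):
    """Pivot row-oriented records into one list per column (columnar layout)."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def sum_by(columns, group_col, value_col):
    """GROUP BY group_col, SUM(value_col), reading only those two columns."""
    totals = {}
    for group, value in zip(columns[group_col], columns[value_col]):
        totals[group] = totals.get(group, 0) + value
    return totals


columns = to_columns(rows)
```

sum_by(columns, "region", "amount") never touches the order_id or comment columns; on disk, that means skipping their bytes entirely, which matters at petabyte scale.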

Data Warehouse
When to use:
• As an analytical database for complex analytical queries
• Extremely large datasets (petabytes)
• Virtually limitless scaling is needed
• Fast queries over large-scale data
• BI and advanced analytics are critical for the company
• Value-added features such as machine learning are needed
When not to use:
• As an operational database (OLTP)
• A data warehouse is overkill for the company because of price or the nature of the business
• Analytics needs are already met by OLTP databases or data lakes

Future
One size fits many
• Hybrid Transactional/Analytical Processing (HTAP)
• Multi-model databases
• SQL and NoSQL will learn from each other
• Multi-cloud Database-as-a-Service (DBaaS)
• Serverless databases
• Specialized hardware for different databases
• Many new databases
