You can watch the replay for this Geek Sync webcast, Data in the Cloud: Understanding Amazon Database Services with Visual Models, in the IDERA Resource Center, http://ow.ly/QYVj50A4qkv.
As a data professional, you understand that a data model is primarily used for designing databases. But as more databases move to the cloud, data modeling can also serve as a visual way to capture the concepts and relationships behind database services such as Amazon RDS, Aurora, and Redshift. Data models can demystify the perceived complexity of managing and modeling cloud databases. Henry Nirsberger will show you how conceptual data models for Amazon database services can clear up confusion and speed up understanding of these complex offerings.
Speaker: Henry Nirsberger (CDMP, CBIP) is the author of “A Conceptual Data Model for Amazon EC2” and CEO of HMN Consulting LLC, providing IT consulting services specializing in Data Management, Enterprise Architecture, Cloud, Facilitation, and IT Leadership. As a trained facilitator, he has facilitated over 600 IT design and planning sessions for data modeling, process modeling, database design, project planning, process improvement, requirements consensus, strategy planning, issues management, and team building. He continues to be an unremitting student of data modeling, cloud computing, enterprise architecture, and all aspects of data management. His certifications include CDMP (DAMA), CBIP (TDWI), CDP-DM (ICCP), CFPIM (APICS 1984–2003) and TOGAF 9.
Geek Sync | Data in the Cloud: Understanding Amazon Database Services with Visual Models
1. Data in the Cloud: Understanding Amazon Database Services with Visual Models
Henry M. Nirsberger, CEO
HMN Consulting, LLC
Info@HMNconsulting.com
www.HMNconsulting.com
A Mind Map for Cloud Database Services!
2. Agenda
1. Intro/Objective … Why?
2. What? Amazon Database Services?
3. Data Modeling Tutorial … AWS EC2 Basics
4. Visual Model: A Conceptual Data Model …
Amazon EC2 Basics
Relational Database Service (RDS)
Aurora
Neptune
DocumentDB
Redshift
DynamoDB
ElastiCache
Quantum Ledger Database (QLDB)
5. Q&A
6. IDERA … ER/Studio Demo
3. Intro/Objective … Why?
• Many enterprises … migrating existing apps to Amazon Web Services (AWS) … the Amazon Cloud
• Early stages of cloud migration … web/app servers … holding back on database servers?
• Many IT data pros: little or no direct experience with AWS and Amazon Database Services …
Challenge: quickly learning AWS cloud computing concepts and database services
"The Cloud"
• Relational: Oracle, SQL Server, MySQL, etc.
• Non-relational … NoSQL: graph, document, and ledger databases for new classes of apps (e.g., recommender engines)
4. Intro/Objective … Why?
AWS Database and EC2 documentation
• Amazon Relational Database Service, API Reference (API Version 2014-10-31)
• Amazon Aurora, User Guide for Aurora (API Version 2014-10-31)
• Amazon Neptune, User Guide (API Version 2017-11-29)
• Amazon DocumentDB, Developer Guide (API Version 2014-10-31)
• Amazon Redshift, API Reference (API Version 2012-12-01)
• Amazon DynamoDB, Developer Guide (API Version 2012-08-10)
• Amazon ElastiCache, API Reference (API Version 2015-02-02)
• Amazon ElastiCache for Redis, ElastiCache for Redis User Guide (API Version 2015-02-02)
• Amazon ElastiCache, ElastiCache for Memcached User Guide (API Version 2015-02-02)
• Amazon Elastic Compute Cloud, User Guide for Linux Instances (2016)
• Amazon Elastic Compute Cloud, API Reference (API Version 2016-11-15)
• Amazon Quantum Ledger Database (Amazon QLDB), Developer Guide (API Version 2019-01-02; latest documentation update: September 10, 2019)
• Thousands of pages … Cliffs Notes? Rosetta Stone?
• Sometimes, the best way to understand a complex subject area? Study its data model!
5. Intro/Objective … Why? Cont’d
“A Conceptual Data Model for Amazon EC2” (Kindle eBook)
“Data In The Cloud: A Conceptual Data Model for Amazon Database Services” (Kindle eBook and Paperback)
7. Amazon Database Services?
• Amazon Web Services (AWS) foundation: Amazon EC2 (Elastic Compute Cloud)
• EC2 … virtual machines, network, storage in the cloud
• Infrastructure as a Service (IaaS).
• Foundation for Platform as a Service (PaaS).
• Amazon Database Services are platform services built on top of EC2
Reducing CAPEX & OPEX: a substantial paradigm shift vs. provisioning IT infrastructure in private data centers.
"The Amazon Cloud"
Relational Database Service (RDS), Aurora, Neptune, DocumentDB, Redshift, DynamoDB, ElastiCache, Quantum Ledger Database (QLDB)
“Managed Services”: fewer worries about provisioning servers, backups, scaling resources, HA, etc. Fewer DBAs??
8. Amazon Database Services?
Relational … SQL
• Relational Database Service (RDS): database instances running Oracle DB, MS SQL Server, MySQL, MariaDB, and PostgreSQL
• Aurora: clusters of database instances … compatible with open-source DB engines (MySQL and PostgreSQL)
• Redshift: columnar clusters … based on PostgreSQL … very large data sets (e.g., BI/DW)
Non-Relational … NoSQL
• Neptune: graph DB engines (Gremlin & SPARQL graph query languages)
• DocumentDB: clusters of MongoDB-compatible document DB servers
• DynamoDB: serverless … structured & semi-structured data (JSON documents) … cache clusters for global, internet-scale apps
• ElastiCache: in-memory cache clusters (Memcached & Redis)
• Quantum Ledger Database (QLDB): ledger databases … blends relational, document, and blockchain concepts.
Much More to Data “Life” than Relational Stuff!
9. Agenda
• IE: Information Engineering notation
10. Data Modeling Tutorial … Intro to AWS EC2 Basics
Regions: global geographic locations, e.g., the US East (N. Virginia) Region.
Availability Zones: isolated data centers within a Region, for high availability.
Account: for billing AWS resources & usage.
Amazon Machine Images (AMIs): templates for creating virtual machines (“instances”), e.g., an AMI for launching a Linux/Apache web server.
11. Amazon EC2 Basics, Cont’d
• An image (AMI) can be used to launch many instances (virtual machines) … a 1-to-many relationship.
• An instance can be used to create 1 or more AMIs.
Each instance has an instance type … indicating the size of the instance in terms of vCPUs, RAM & storage.
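To make the EC2 basics above concrete, here is a minimal sketch of the entities and their cardinalities as Python dataclasses. The class names, field names, and the `launch`/`create_image` helpers are hypothetical, chosen only to mirror the conceptual model (Region → Availability Zones, AMI → many instances, instance → instance type); they are not the EC2 API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # eq=False: back-references would make field-by-field == recurse
class Region:
    name: str                                   # e.g. "us-east-1", US East (N. Virginia)
    zones: List["AvailabilityZone"] = field(default_factory=list)


@dataclass(eq=False)
class AvailabilityZone:
    name: str                                   # an isolated location inside one region
    region: Region


@dataclass
class InstanceType:
    name: str                                   # size of an instance: vCPUs and RAM
    vcpus: int
    ram_gib: float


@dataclass(eq=False)
class Ami:
    ami_id: str
    description: str
    instances: List["Instance"] = field(default_factory=list)   # 1 AMI -> many instances
    created_from: Optional["Instance"] = None                   # AMI built from an instance


@dataclass(eq=False)
class Instance:
    instance_id: str
    ami: Ami
    instance_type: InstanceType
    zone: AvailabilityZone


def launch(ami: Ami, instance_id: str, itype: InstanceType, zone: AvailabilityZone) -> Instance:
    """Launch an instance from an AMI and record the 1-to-many link."""
    inst = Instance(instance_id, ami, itype, zone)
    ami.instances.append(inst)
    return inst


def create_image(instance: Instance, ami_id: str) -> Ami:
    """An instance can be used to create one or more AMIs."""
    return Ami(ami_id, f"image of {instance.instance_id}", created_from=instance)


us_east = Region("us-east-1")
az_a = AvailabilityZone("us-east-1a", us_east)
us_east.zones.append(az_a)
web_ami = Ami("ami-web", "Linux/Apache web server")   # hypothetical AMI ID
t3_micro = InstanceType("t3.micro", vcpus=2, ram_gib=1.0)
web1 = launch(web_ami, "i-web1", t3_micro, az_a)      # hypothetical instance IDs
web2 = launch(web_ami, "i-web2", t3_micro, az_a)
print(len(web_ami.instances))  # 2
```

Each relationship line on the slide becomes either a list field (the "many" side) or a reference field (the "one" side), which is the same reading of crow's-foot cardinality a data modeler would apply.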
12. Amazon EC2 Basics, Cont’d
• Classless Inter-Domain Routing (CIDR): an IP address range.
• An IP address is part of a CIDR block, e.g., 192.168.0.0/16.
• Each account can have 1 or more Virtual Private Clouds (VPCs): virtual networks for logically separating AWS resources.
• E.g., for different orgs, or for development vs. production apps.
• Each VPC is composed of 1 or more subnets, e.g., for web, app, or DB servers.
• Each subnet resides within a single availability zone.
• A VPC can span more than 1 availability zone for HA.
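Python's standard `ipaddress` module can illustrate the CIDR ideas above: an address belonging to a block, and a VPC-sized block carved into per-tier subnets. The /24 subnet size and the web/app/DB tier assignment are assumptions for illustration, not an AWS recommendation.

```python
import ipaddress

# The slide's example CIDR block, used here as a VPC's address range
vpc = ipaddress.ip_network("192.168.0.0/16")

# An IP address "is part of" a CIDR block when it falls inside that range
addr = ipaddress.ip_address("192.168.10.25")
print(addr in vpc)          # True
print(vpc.num_addresses)    # 65536 (a /16 leaves 16 bits for host addresses)

# Carve the VPC into /24 subnets and assign the first three to hypothetical tiers
subnets = vpc.subnets(new_prefix=24)
tiers = {tier: next(subnets) for tier in ("web", "app", "db")}
for tier, net in tiers.items():
    print(tier, net)        # first line printed: web 192.168.0.0/24
```

In a real VPC each of these subnets would also be pinned to one Availability Zone, so an HA design typically repeats the web/app/DB subnets in a second AZ.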
13. Agenda
Managed Service for Migrating or Creating Relational Databases
18. Agenda
• Database clusters: many DB instances
• Open source … MySQL & PostgreSQL compatible
19. Aurora
• Database clusters of many DB instances
• Compatible with open-source MySQL and PostgreSQL
• Each cluster has security groups, a subnet group, a parameter group, and an engine version, and can be a source for event notifications.
• Aurora Concepts:
o Primary DB instance: both reads & writes
o Read Replica DB instances: read-only; performance, HA; updates auto-synchronized
o Read Replica DB clusters: cross-region clusters … remote customers … MySQL
o Virtual cluster volumes … SSD … replicated across AZs
o Backtracking … change records … rewind/undo
o Serverless DB clusters … warm pools of DB instances
How do we model these concepts??
20. Aurora
2 Types of Clusters:
• DB Clusters
• Cache Clusters … in-memory data
4 Types of DB Clusters:
• Aurora
• Neptune
• DocumentDB
• Redshift
Cluster Taxonomy: Clusters … Overview
21. Aurora
• DB Clusters inherit Cluster relationships, e.g., Security Groups (~ firewalls)
• Aurora Clusters inherit DB Cluster relationships, e.g., Snapshots (backups)
• Aurora Clusters have many DB Instances
• Aurora … special relationships, e.g., for MySQL cross-region replicas and backtracks
Aurora Cluster … Overview
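The cluster taxonomy on slides 20 and 21 (clusters split into DB clusters and cache clusters; DB clusters split into Aurora, Neptune, DocumentDB, and Redshift) maps naturally onto class inheritance, where each subtype inherits its parent's relationships and adds its own. This is a hypothetical Python sketch of that idea; the class and attribute names are illustrative, not AWS API objects.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    """Supertype: every cluster has security groups (~ firewalls)."""
    name: str
    security_groups: List[str] = field(default_factory=list)


@dataclass
class CacheCluster(Cluster):
    """In-memory data (DAX or ElastiCache clusters)."""


@dataclass
class DbCluster(Cluster):
    """DB clusters inherit Cluster relationships and add snapshots (backups)."""
    snapshots: List[str] = field(default_factory=list)


@dataclass
class AuroraCluster(DbCluster):
    primary_instance: str = ""                                      # reads & writes
    read_replicas: List[str] = field(default_factory=list)          # read-only
    cross_region_replicas: List[str] = field(default_factory=list)  # Aurora MySQL
    backtracks: List[str] = field(default_factory=list)             # rewind/undo


@dataclass
class NeptuneCluster(DbCluster):
    primary_instance: str = ""
    read_replicas: List[str] = field(default_factory=list)
    # unlike Aurora: no cross-region replicas, no backtracks


@dataclass
class DocumentDbCluster(DbCluster):
    primary_instance: str = ""
    read_replicas: List[str] = field(default_factory=list)


@dataclass
class RedshiftCluster(DbCluster):
    leader_node: str = ""
    compute_nodes: List[str] = field(default_factory=list)  # "nodes", not DB instances


orders = AuroraCluster("orders", security_groups=["sg-app"], snapshots=["snap-1"],
                       primary_instance="db-1", read_replicas=["db-2"])
print(isinstance(orders, Cluster))  # True: inherited relationships come along
```

The design choice here mirrors the supertype/subtype boxes in the model: shared relationships live once on the supertype, and the "differences in RED" become fields that only some subtypes declare.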
22. Agenda
• Apps with complex many-to-many (M/M) relationships, e.g., recommender engines
• Graph databases
Why?
23. Neptune
• Database Clusters
• NoSQL: “Non SQL” or “Not Only SQL” … non-relational
• Instead of SQL … GQLs (Graph Query Languages): Gremlin, SPARQL
• Labeled Property Graphs (LPGs), queried with Gremlin (SPARQL queries RDF graphs)
• Graph data structures:
o Vertices/Nodes (~ rows)
o Edges (~ relationships)
o Properties (~ columns)
[Figure: three vertices with label PERSON (Person 1, Person 2, Person 3) linked by edges, e.g., “Friend” and “Connected”.]
• Graph databases: complex many-to-many (M/M) relationships, e.g., recommender engines
NoSQL Examples??
24. Neptune
• Gremlin Graph Query Language (GQL) steps:
addV (add vertex) … ~ SQL INSERT
addE (add edge) … a link from one vertex to another … analogous to a foreign key
property (add property) … a column value … schemaless
has … filtering … analogous to the WHERE clause of a SQL SELECT
drop … removes vertices, edges, or properties … analogous to SQL DELETE
Examples
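// Gremlin console (Groovy) syntax; T.id sets the vertex ID explicitly
g.addV('person').property(T.id, 'PER-0001').property('name', 'Random A. Person').property('dob', '03/03/1995')
// Hypothetical second vertex so the edge below has a target
g.addV('person').property(T.id, 'PER-0002').property('name', 'Another B. Person')
// Child traversals are spawned anonymously with __, not from g
g.addE('friend').from(__.V('PER-0001')).to(__.V('PER-0002'))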
SQL Select, Insert, Update, Delete vs. NoSQL
How do we model Neptune platform concepts??
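The vertex/edge/property vocabulary and the Gremlin steps above can be mimicked with a tiny in-memory property graph. This is a hypothetical Python teaching sketch, not gremlinpython or the Neptune API; the method names only echo the Gremlin steps (`addV`, `addE`, `has`, `drop`) and their rough SQL analogues, and the second person is invented for the example.

```python
from typing import Dict, List, Tuple


class PropertyGraph:
    def __init__(self) -> None:
        self.vertices: Dict[str, Dict[str, object]] = {}  # id -> label + properties (~ rows)
        self.edges: List[Tuple[str, str, str]] = []        # (label, from_id, to_id) (~ foreign keys)

    def add_v(self, label: str, vid: str, **props: object) -> str:
        """~ addV(label).property(...): roughly a SQL INSERT; no fixed schema."""
        self.vertices[vid] = {"label": label, **props}
        return vid

    def add_e(self, label: str, from_id: str, to_id: str) -> None:
        """~ addE(label).from(...).to(...): link one vertex to another."""
        if from_id not in self.vertices or to_id not in self.vertices:
            raise KeyError("both vertices must exist")
        self.edges.append((label, from_id, to_id))

    def has(self, label: str, key: str, value: object) -> List[str]:
        """~ V().has(label, key, value): a filter, like a SQL WHERE clause."""
        return [vid for vid, v in self.vertices.items()
                if v["label"] == label and v.get(key) == value]

    def out(self, vid: str, edge_label: str) -> List[str]:
        """Follow outgoing edges with a given label (what a SQL join would do)."""
        return [to for (lbl, frm, to) in self.edges if frm == vid and lbl == edge_label]

    def drop(self, vid: str) -> None:
        """~ V(id).drop(): remove the vertex and its edges, like SQL DELETE."""
        self.vertices.pop(vid, None)
        self.edges = [e for e in self.edges if vid not in (e[1], e[2])]


g = PropertyGraph()
g.add_v("person", "PER-0001", name="Random A. Person", dob="03/03/1995")
g.add_v("person", "PER-0002", name="Another B. Person")  # hypothetical second vertex
g.add_e("friend", "PER-0001", "PER-0002")
print(g.out("PER-0001", "friend"))  # ['PER-0002']
```

Note how `add_v` accepts arbitrary keyword properties: that is the "schemaless" point on the slide, since two vertices with the same label need not share the same properties.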
25. Neptune
• Neptune Clusters inherit DB Cluster relationships, e.g., Snapshots (backups)
• Neptune Clusters have many DB Instances … graph DB instances
• Primary & Read Replicas
• Unlike Aurora Clusters …
o No cross-region replicas
o No backtracks
Neptune Clusters are similar to Aurora Clusters … differences in RED
27. DocumentDB
• Database Clusters
• NoSQL … differences from RDS, Aurora, and Neptune
Key Concepts
• MongoDB-compatible
• Collections … like tables
• Documents … like rows
• Field … a key-value pair … like a column of a row
• Embedded documents … nested data
• Document databases
• MongoDB … JSON documents
Semi-structured data: JSON key-value pairs
NoSQL Examples??
1/Many relationships within a document
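The “1/many relationships within a document” point can be illustrated by comparing one document that embeds its child items with the two normalized tables a relational design would use. The order/items shape and the `normalize` helper below are hypothetical, sketched in Python with plain dicts standing in for JSON documents.

```python
import json

# One document: the 1-to-many relationship is embedded as a nested array
order_doc = {
    "_id": "ORD-1",
    "Customer": "Random A. Person",
    "Items": [                       # embedded documents (nested data)
        {"Sku": "W-A", "Qty": 1},
        {"Sku": "W-B", "Qty": 2},
    ],
}


def normalize(doc):
    """Flatten the embedded array into parent/child rows, as a relational schema would store them."""
    parent = {"OrderId": doc["_id"], "Customer": doc["Customer"]}
    # OrderId plays the role of a foreign key on every child row
    children = [{"OrderId": doc["_id"], **item} for item in doc["Items"]]
    return parent, children


parent_row, item_rows = normalize(order_doc)
print(json.dumps(order_doc))  # one document: parent and children travel together
print(parent_row)             # relational: one parent row ...
print(item_rows)              # ... plus child rows, re-joined via OrderId
```

The trade-off this shows: the document keeps data that is read together in one place, while the normalized form avoids duplication but needs a join to reassemble the order.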
28. DocumentDB
• insertOne (~ SQL INSERT): inserts a document into a collection.
• insertMany: inserts multiple documents into a collection.
• find (~ SQL SELECT): retrieves documents from a collection.
• updateOne (~ SQL UPDATE): updates the first document that matches the search criteria.
• updateMany: updates all documents in a specified collection that satisfy the search criteria.
• deleteOne (~ SQL DELETE): removes the first document that matches the search criteria.
• deleteMany: removes all documents that satisfy the specified search criteria from a collection.
How do we model DocumentDB platform concepts?
{
  "SSN": "123-45-6789",
  "EmployeeID": "PER-0001",
  "Name": "Random A. Person",
  "DOB": "1990-01-01",
  "Jobtitle": "sales person",
  "Street": "1000 Any Street",
  "City": "Any Town",
  "State-Province": "NY",
  "Country": "USA"
}
Document (~ Row) in the Employee Collection
SQL Select, Insert, Update, Delete vs. NoSQL
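The MongoDB-style methods listed above can be mimicked over an in-memory list of dicts to show their SQL analogues. This `MiniCollection` class is a hypothetical sketch with equality-only filters and `$set`-like updates; it is not the DocumentDB or PyMongo API (PyMongo, for example, spells these methods `insert_one`, `find`, `update_one`, and so on, and supports far richer query operators).

```python
import copy
from typing import Any, Dict, List

Doc = Dict[str, Any]


def _matches(doc: Doc, query: Doc) -> bool:
    # Equality-only filter, e.g. {"Jobtitle": "sales person"} (~ SQL WHERE)
    return all(doc.get(k) == v for k, v in query.items())


class MiniCollection:
    def __init__(self) -> None:
        self.docs: List[Doc] = []

    def insert_one(self, doc: Doc) -> None:                  # ~ SQL INSERT
        self.docs.append(copy.deepcopy(doc))

    def insert_many(self, docs: List[Doc]) -> None:
        for d in docs:
            self.insert_one(d)

    def find(self, query: Doc) -> List[Doc]:                 # ~ SQL SELECT
        return [d for d in self.docs if _matches(d, query)]

    def update_one(self, query: Doc, changes: Doc) -> int:   # ~ SQL UPDATE, first match only
        for d in self.docs:
            if _matches(d, query):
                d.update(changes)                            # like a $set of the given fields
                return 1
        return 0

    def update_many(self, query: Doc, changes: Doc) -> int:
        hits = self.find(query)                              # find returns the stored dicts
        for d in hits:
            d.update(changes)
        return len(hits)

    def delete_one(self, query: Doc) -> int:                 # ~ SQL DELETE, first match only
        for i, d in enumerate(self.docs):
            if _matches(d, query):
                del self.docs[i]
                return 1
        return 0

    def delete_many(self, query: Doc) -> int:
        before = len(self.docs)
        self.docs = [d for d in self.docs if not _matches(d, query)]
        return before - len(self.docs)


employees = MiniCollection()
employees.insert_one({"EmployeeID": "PER-0001", "Name": "Random A. Person", "Jobtitle": "sales person"})
employees.update_one({"EmployeeID": "PER-0001"}, {"Jobtitle": "sales manager"})  # hypothetical change
print(employees.find({"Jobtitle": "sales manager"}))
```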
29. DocumentDB
• DocumentDB Clusters inherit DB Cluster relationships, e.g., Snapshots (backups)
• DocumentDB Clusters have many DB Instances
• Primary & Read Replicas
• Unlike Aurora Clusters …
o No cross-region replicas
o No backtracks
DocumentDB Clusters are similar to Aurora & Neptune Clusters … differences in RED
30. Agenda
• Analytics … OLAP
• Large data sets … Fast Response
Columnar Database
Massively Parallel Processing
Why?
31. Redshift
• Database Clusters
• For OLAP & BI/DW: large data sets … few columns accessed per query
• Comfort zone … relational … based on PostgreSQL
• New vocabulary:
o Leader node coordinating many compute nodes
o Columnar data … each data block stores values of a single column for many rows
How do we model Redshift platform concepts??
Many “nodes” … not “DB instances”
Star schema: dimension tables & fact tables
Partitioned data sets … distributed across nodes
Massively Parallel Processing (MPP) … fast!
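The columnar idea (each data block holds values of a single column for many rows) can be sketched by laying out the same rows in row-oriented and column-oriented blocks and counting how many values a `SUM(amount)` query must read. The tiny table and the block size of 3 are hypothetical; real Redshift blocks are much larger and also compressed, which this sketch ignores.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]
columns = ["order_id", "region", "amount"]
rows: List[Row] = [          # hypothetical fact-table rows
    (1, "east", 10.0),
    (2, "west", 20.0),
    (3, "east", 5.0),
    (4, "north", 7.5),
]
BLOCK_SIZE = 3               # values or rows per block (illustrative only)


def row_blocks(data: List[Row]) -> List[List[Row]]:
    """Row store: each block holds whole rows, i.e. every column together."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


def column_blocks(data: List[Row]) -> Dict[str, List[list]]:
    """Column store: each block holds values of a single column for many rows."""
    store: Dict[str, List[list]] = {}
    for idx, name in enumerate(columns):
        values = [r[idx] for r in data]
        store[name] = [values[i:i + BLOCK_SIZE] for i in range(0, len(values), BLOCK_SIZE)]
    return store


col_store = column_blocks(rows)

# SELECT SUM(amount): the column store reads only the "amount" blocks ...
total = sum(v for block in col_store["amount"] for v in block)
values_read_col = sum(len(block) for block in col_store["amount"])
# ... while a row store must read every column of every row
values_read_row = sum(len(block) * len(columns) for block in row_blocks(rows))

print(total, values_read_col, values_read_row)  # 42.5 4 12
```

With "few columns accessed" per analytic query, the gap between those two counts grows with the number of columns in the table, which is the case the slide makes for columnar storage.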
32. Redshift
• Redshift Clusters inherit DB Cluster relationships, e.g., Snapshots (backups)
• Unlike Aurora Clusters …
o No cross-region replicas
o No backtracks
o Other differences in RED
• Redshift Clusters have many “nodes”
• Leader & compute nodes
o Partitioned data sets
o Massively Parallel Processing
• Table Restore Requests
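The leader/compute-node split and partitioned data sets can be sketched as a scatter-gather aggregation: rows are distributed across compute nodes by hashing a distribution key, each node aggregates only its own slice in parallel, and the leader merges the partial results. This is a hypothetical Python model of the idea, not Redshift internals; the node count, the distribution key, and the use of `zlib.crc32` are assumptions, and `concurrent.futures` threads stand in for separate machines.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple
import zlib

NUM_COMPUTE_NODES = 3  # assumption for illustration

Sale = Tuple[str, float]
sales: List[Sale] = [("east", 10.0), ("west", 20.0), ("east", 5.0), ("north", 7.5), ("west", 2.5)]


def node_for(key: str) -> int:
    """Pick a compute node by hashing the distribution key (crc32 keeps it deterministic)."""
    return zlib.crc32(key.encode()) % NUM_COMPUTE_NODES


def distribute(rows: List[Sale]) -> List[List[Sale]]:
    """Partition the data set: one slice per compute node."""
    slices: Dict[int, List[Sale]] = defaultdict(list)
    for region, amount in rows:
        slices[node_for(region)].append((region, amount))
    return [slices[n] for n in range(NUM_COMPUTE_NODES)]


def compute_node_partial(slice_rows: List[Sale]) -> Dict[str, float]:
    """Each compute node runs SUM(amount) GROUP BY region over its own slice only."""
    partial: Dict[str, float] = defaultdict(float)
    for region, amount in slice_rows:
        partial[region] += amount
    return dict(partial)


def leader_query(rows: List[Sale]) -> Dict[str, float]:
    """Leader node: scatter work to compute nodes in parallel, then gather and merge."""
    slices = distribute(rows)
    with ThreadPoolExecutor(max_workers=NUM_COMPUTE_NODES) as pool:
        partials = list(pool.map(compute_node_partial, slices))
    result: Dict[str, float] = defaultdict(float)
    for partial in partials:
        for region, subtotal in partial.items():
            result[region] += subtotal
    return dict(result)


print(leader_query(sales))  # per-region totals: east 15.0, west 22.5, north 7.5
```

The merge step in `leader_query` stays correct even if rows for one region were spread over several nodes, which is why the leader adds partial subtotals rather than assuming each key lives on exactly one node.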
33. Agenda
• Multimaster Database
• Cache Clusters
• Globally Distributed, Internet Scale Apps
• Thousands of concurrent users
Why?
34. DynamoDB
• NoSQL … structured & semi-structured data … key-value pairs … JSON documents
• New vocabulary …
o Tables contain Items (~ rows), which contain Attributes (key-value pairs, ~ columns)
o Global Tables … replicated across regions … updates synchronized
o Throughput settings … serverless … no provisioning of DB servers
Read Capacity Units (RCUs): provisioned table reads/sec (1 RCU = one strongly consistent read, or two eventually consistent reads, per second for an item up to 4 KB)
Write Capacity Units (WCUs): provisioned table writes/sec (1 WCU = one write per second for an item up to 1 KB)
Auto Scaling Policies
o Cache clusters … DynamoDB Accelerator (DAX) clusters: item cache and query cache for eventually consistent reads (strongly consistent reads pass through to DynamoDB)
Multimaster database (Global Tables): performance, worldwide access, disaster recovery, HA
• Serverless: capacity based on table reads/writes
• Servers automatically allocated from a “warm pool” of servers
Globally distributed, internet-scale applications
NoSQL Examples??
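Throughput settings are easier to reason about with the unit sizes in hand. Per the DynamoDB documentation for provisioned capacity, one RCU covers one strongly consistent read per second (or two eventually consistent reads) of an item up to 4 KB, one WCU covers one write per second of an item up to 1 KB, and item sizes round up to the next whole unit. The helpers below are a sketch of that arithmetic only; the function names are ours, and transactional reads/writes (which cost twice as much) are ignored.

```python
import math


def read_capacity_units(item_kb: float, reads_per_sec: int, strongly_consistent: bool = True) -> int:
    """RCUs in provisioned mode: 1 RCU = 1 strongly consistent read/sec of up to 4 KB."""
    units_per_read = math.ceil(item_kb / 4)   # round the item size up to whole 4 KB units
    rcus = units_per_read * reads_per_sec
    if not strongly_consistent:
        rcus = math.ceil(rcus / 2)            # eventually consistent reads cost half
    return rcus


def write_capacity_units(item_kb: float, writes_per_sec: int) -> int:
    """WCUs in provisioned mode: 1 WCU = 1 write/sec of up to 1 KB."""
    return math.ceil(item_kb / 1) * writes_per_sec  # round up to whole 1 KB units


# Example: 80 reads/sec and 10 writes/sec of 3 KB items
print(read_capacity_units(3, 80))                             # 80
print(read_capacity_units(3, 80, strongly_consistent=False))  # 40
print(write_capacity_units(3, 10))                            # 30
```

The rounding is what surprises people: a 3 KB item costs the same single RCU as a 4 KB item to read, but three WCUs to write.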
35. DynamoDB
DynamoDB vs. SQL
• PutItem: adds an item to a table (or replaces an existing item with the same primary key) … ~ SQL INSERT
• GetItem: retrieves a single item by its primary key … ~ SQL SELECT
• Query: retrieves multiple items that share a partition key, optionally narrowed by sort-key conditions and filters … ~ SQL SELECT
• UpdateItem: updates a single item … ~ SQL UPDATE
• DeleteItem: deletes one item … ~ SQL DELETE
How do we model DynamoDB platform concepts??
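Unlike SQL rows, the low-level DynamoDB API (for example, `PutItem` and `GetItem`) sends every attribute wrapped in a type descriptor such as `S` (string), `N` (number, transmitted as a string), `BOOL`, `NULL`, `L` (list), and `M` (map). The function below is a simplified sketch of that mapping for a few Python types; it is not the SDK's serializer (boto3 ships its own `TypeSerializer` in `boto3.dynamodb.types`), and the sample item's `Age`, `Active`, and `Skills` attributes are hypothetical.

```python
import json
from decimal import Decimal
from typing import Any, Dict


def to_attribute_value(value: Any) -> Dict[str, Any]:
    """Wrap a Python value in a DynamoDB-style type descriptor (simplified sketch)."""
    if value is None:
        return {"NULL": True}
    if isinstance(value, bool):              # check bool before int: bool subclasses int
        return {"BOOL": value}
    if isinstance(value, (int, Decimal)):    # floats deliberately unsupported; use Decimal
        return {"N": str(value)}             # numbers travel as strings
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_item(attrs: Dict[str, Any]) -> Dict[str, Any]:
    """Build the 'Item' map that a PutItem request body would carry."""
    return {name: to_attribute_value(v) for name, v in attrs.items()}


item = to_item({"EmployeeID": "PER-0001", "Age": 34, "Active": True, "Skills": ["sql", "nosql"]})
print(json.dumps(item))
```

Because only the key attributes are fixed by the table definition, two items in the same table can carry entirely different non-key attributes and types, which is the "schemaless" point of the next slide.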
36. DynamoDB
Global Table … replicated across Regions
• Auto Scaling Policies
• Serverless … capacity based on:
o # of reads on each table
o # of writes on each table
Schemaless … attributes not predefined (beyond the primary key)
Items (~ Rows)
Table Indexes
What about Cache Clusters … DAX Clusters??
37. DynamoDB
2 Types of Clusters:
• DB Clusters
• Cache Clusters … in-memory data
2 Types of Cache Clusters:
• DAX Clusters
• ElastiCache Clusters
• DAX (DynamoDB Accelerator) … response times in microseconds
• In-memory … Pareto principle (a small share of items serves most reads)
• Primary node … read replica nodes
• Item cache: items accessed using keys
• Query cache: result sets accessed by parameter values
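DAX keeps two separate caches: an item cache keyed by an item's primary key (serving `GetItem`-style reads) and a query cache keyed by the request's parameter values (serving `Query`/`Scan` result sets). The class below is a hypothetical read-through sketch of that split over a dict-based “table”; it omits DAX's TTLs and eviction, and all names are illustrative.

```python
from typing import Any, Dict, List, Tuple

Item = Dict[str, Any]


class DaxLikeCache:
    def __init__(self, table: Dict[str, Item]) -> None:
        self.table = table                                         # primary key -> item
        self.item_cache: Dict[str, Item] = {}
        self.query_cache: Dict[Tuple[str, Any], List[Item]] = {}
        self.table_reads = 0                                       # fall-throughs to the table

    def get_item(self, key: str) -> Item:
        """Item cache: looked up by primary key."""
        if key not in self.item_cache:
            self.table_reads += 1
            self.item_cache[key] = self.table[key]
        return self.item_cache[key]

    def query(self, attr: str, value: Any) -> List[Item]:
        """Query cache: whole result sets looked up by the query's parameter values."""
        params = (attr, value)
        if params not in self.query_cache:
            self.table_reads += 1
            self.query_cache[params] = [i for i in self.table.values() if i.get(attr) == value]
        return self.query_cache[params]

    def put_item(self, key: str, item: Item) -> None:
        """Write-through to the table and the item cache; cached query results are not refreshed."""
        self.table[key] = item
        self.item_cache[key] = item


table = {"PER-0001": {"id": "PER-0001", "dept": "sales"},
         "PER-0002": {"id": "PER-0002", "dept": "ops"}}
dax = DaxLikeCache(table)
dax.get_item("PER-0001"); dax.get_item("PER-0001")   # miss, then hit
dax.query("dept", "sales"); dax.query("dept", "sales")  # miss, then hit
print(dax.table_reads)  # 2
```

Keeping the two caches apart is the design point: a repeated key lookup and a repeated query are both served from memory, but a new write only refreshes the item cache, so a cached result set can go stale until it expires.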
38. Agenda
• In-memory storage of data
• Rapid Response
• No back-end database servers?
Why?
39. ElastiCache
• Cache server clusters: NoSQL, key-value, in-memory
• Possible to persist and recover data using …
o Backups
o Change logs
• New vocabulary:
o Memcached: open source; partitions data across multiple cache servers called “nodes”
o Redis: a data structure server, beyond key-value pairs; abstract data types, e.g., lists, sorted sets, hashes (~ rows of key-value pairs); high availability across multiple AZs
o Lazy loading caching: on a cache miss, the app accesses the DB directly, then refreshes the cache data
o Write-through caching: the app updates the DB & cache, so the cache is always current
o Replication groups (Redis), for even faster response times: each partition is a group of nodes, a primary node & read replica nodes
Possibly no back-end database servers??
40. ElastiCache
Redis … data structure server
• Strings ~ blob
• Hashes ~ row in an RDBMS … a row of key-value pairs
• Lists … ordered sequence of string values
• Sets … unordered collection of unique string values
• Publish/Subscribe … message subscriptions
Memcached … key-value store
• Strings hash table
• Key (a string) maps to value (another string)
NoSQL … API examples (vs. SQL Select, Insert, Update, Delete)
• Redis: LPUSH, RPUSH, LRANGE (lists); HMSET, HMGET, HEXISTS (hashes; newer Redis versions deprecate HMSET in favor of multi-field HSET)
• Memcached: set, add, replace, append, prepend, and get data; delete a key
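The semantics of a few of the commands above can be pinned down with a toy in-memory model: `LPUSH` prepends each value in turn (so the last value pushed ends up first), `LRANGE` uses inclusive indices with negatives counting from the end, and Memcached's `add`/`replace` succeed only when the key is absent/present. These classes are a hypothetical sketch for illustration; in practice you would use a real client (for example, redis-py) against an ElastiCache endpoint.

```python
from typing import Dict, List, Optional


class ToyRedis:
    """In-memory stand-in for a few Redis list and hash commands (illustration only)."""

    def __init__(self) -> None:
        self.lists: Dict[str, List[str]] = {}
        self.hashes: Dict[str, Dict[str, str]] = {}

    def lpush(self, key: str, *values: str) -> int:
        lst = self.lists.setdefault(key, [])
        for v in values:
            lst.insert(0, v)                 # each value is pushed onto the head in turn
        return len(lst)

    def rpush(self, key: str, *values: str) -> int:
        lst = self.lists.setdefault(key, [])
        lst.extend(values)                   # values are appended at the tail
        return len(lst)

    def lrange(self, key: str, start: int, stop: int) -> List[str]:
        lst = self.lists.get(key, [])
        n = len(lst)
        if start < 0:
            start = max(n + start, 0)        # negative indices count from the end
        if stop < 0:
            stop = n + stop
        if stop < 0 or start > stop:
            return []
        return lst[start:stop + 1]           # stop is inclusive, unlike Python slices

    def hset(self, key: str, mapping: Dict[str, str]) -> None:
        self.hashes.setdefault(key, {}).update(mapping)   # HMSET-style multi-field set

    def hmget(self, key: str, *fields: str) -> List[Optional[str]]:
        h = self.hashes.get(key, {})
        return [h.get(f) for f in fields]    # missing fields come back as None (nil)

    def hexists(self, key: str, field: str) -> bool:
        return field in self.hashes.get(key, {})


class ToyMemcached:
    """In-memory stand-in for Memcached's storage commands (illustration only)."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def set(self, key: str, value: str) -> bool:
        self.data[key] = value               # store unconditionally
        return True

    def add(self, key: str, value: str) -> bool:
        if key in self.data:                 # only if the key does NOT exist yet
            return False
        self.data[key] = value
        return True

    def replace(self, key: str, value: str) -> bool:
        if key not in self.data:             # only if the key already exists
            return False
        self.data[key] = value
        return True

    def append(self, key: str, value: str) -> bool:
        if key not in self.data:
            return False
        self.data[key] += value
        return True

    def prepend(self, key: str, value: str) -> bool:
        if key not in self.data:
            return False
        self.data[key] = value + self.data[key]
        return True

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def delete(self, key: str) -> bool:
        return self.data.pop(key, None) is not None


r = ToyRedis()
r.lpush("queue", "a", "b", "c")
print(r.lrange("queue", 0, -1))   # ['c', 'b', 'a']
```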
41. ElastiCache
2 Types of Clusters:
• DB Clusters
• Cache Clusters … in-memory data
2 Types of Cache Clusters:
• DAX Clusters
• ElastiCache Clusters
• In-memory … Pareto Principle
• ElastiCache Nodes
42. ElastiCache
Super-fast response times (Redis)
• Replication group = a type of ElastiCache cluster
• A replication group has many node groups
• One node group for each partition: a primary node & read replica nodes
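A replication group with several node groups is Redis's cluster-mode sharding. Per the Redis Cluster specification, each key maps to one of 16,384 hash slots, computed as CRC16 (the XMODEM variant) of the key modulo 16384, and when a key contains a non-empty `{...}` hash tag only that part is hashed. The sketch below models slot assignment and routing in Python; the even contiguous slot split, the node names, and sending reads to the first replica are assumptions for illustration, not ElastiCache's API.

```python
from dataclasses import dataclass, field
from typing import List


def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (poly 0x1021, init 0), the variant Redis Cluster uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of 16384 hash slots; a non-empty {hash tag} is hashed instead of the whole key."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384


@dataclass
class NodeGroup:                   # one partition (shard)
    primary: str                   # takes writes
    replicas: List[str] = field(default_factory=list)  # read replicas
    first_slot: int = 0
    last_slot: int = 16383


def build_replication_group(num_groups: int, replicas_per_group: int) -> List[NodeGroup]:
    """Split the 16384 slots into contiguous ranges, one per node group (hypothetical layout)."""
    size = 16384 // num_groups
    groups = []
    for g in range(num_groups):
        first = g * size
        last = 16383 if g == num_groups - 1 else first + size - 1
        replicas = [f"ng{g}-replica{r}" for r in range(replicas_per_group)]
        groups.append(NodeGroup(f"ng{g}-primary", replicas, first, last))
    return groups


def route(groups: List[NodeGroup], key: str, write: bool) -> str:
    """Writes go to the owning group's primary; reads go to a replica when one exists."""
    slot = key_slot(key)
    for ng in groups:
        if ng.first_slot <= slot <= ng.last_slot:
            return ng.primary if write or not ng.replicas else ng.replicas[0]
    raise ValueError("slot not covered")


rg = build_replication_group(num_groups=3, replicas_per_group=2)
print(route(rg, "user:1000", write=True))   # the owning group's primary
print(route(rg, "user:1000", write=False))  # a replica in the same node group
```

Hash tags are the modeling hook worth noting: keys such as `{user1000}.following` and `{user1000}.followers` land in the same slot, so related data stays in one node group.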
43. Agenda
• Cybersecurity threats? Data integrity?
• Ledger Databases … System of Record
• Immutable … Append Only
• Blockchain Concepts
Why?
44. Quantum Ledger Database (QLDB)
• Ledger database for System of Record (SOR) apps that need a complete transaction history (e.g., eCommerce order tracking & fulfillment).
• Append-only journal of entries … built-in change history
Smorgasbord of concepts … relational … document … blockchain
Cybersecurity threats to data integrity?
• Tables
• SQL-like queries
• Avoids … triggers, stored procedures, partitioned tables, audit logs, etc.
• No in-place updates to existing data: each change appends a new revision to the journal
• Documents: key-value pairs, like rows in a table
• Documents in “blocks” linked by cryptography … SHA-256 hash codes
• Merkle trees
• Merkle audit proofs
• Immutable and verifiable
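The blockchain concepts above (blocks linked by SHA-256 hashes, Merkle trees, and Merkle audit proofs) can be sketched with Python's `hashlib`. This is a simplified illustration of why an append-only, hash-chained journal is tamper-evident; it is not QLDB's actual journal or digest format, and the canonical-JSON entry encoding, the block layout, the genesis value, and the helper names are all assumptions.

```python
import hashlib
import json
from typing import List, Tuple


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def entry_hash(entry: dict) -> bytes:
    # Canonical JSON stands in for QLDB's own hashing of Ion documents (assumption)
    return sha256(json.dumps(entry, sort_keys=True).encode())


def _next_level(level: List[bytes]) -> List[bytes]:
    """Hash adjacent pairs; an odd node out is carried up unchanged."""
    nxt = [sha256(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
    if len(level) % 2:
        nxt.append(level[-1])
    return nxt


def merkle_root(leaves: List[bytes]) -> bytes:
    level = list(leaves)              # assumes at least one leaf
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Audit proof: sibling hashes from a leaf up to the root (bool = sibling sits on the left)."""
    proof, level = [], list(leaves)
    while len(level) > 1:
        sibling = index ^ 1
        if sibling < len(level):
            proof.append((level[sibling], sibling < index))
        level, index = _next_level(level), index // 2
    return proof


def verify_proof(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    h = leaf
    for sibling, is_left in proof:
        h = sha256(sibling + h) if is_left else sha256(h + sibling)
    return h == root


class Journal:
    """Append-only journal: each block commits to its entries and to the previous block."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.blocks: List[dict] = []

    @staticmethod
    def _block_hash(prev: str, entries: List[dict]) -> str:
        root = merkle_root([entry_hash(e) for e in entries]).hex()
        return sha256((prev + root).encode()).hex()

    def append(self, entries: List[dict]) -> None:
        prev = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        self.blocks.append({"prev": prev, "entries": entries,
                            "hash": self._block_hash(prev, entries)})

    def verify_chain(self) -> bool:
        prev = self.GENESIS
        for block in self.blocks:
            if block["prev"] != prev or self._block_hash(prev, block["entries"]) != block["hash"]:
                return False
            prev = block["hash"]
        return True


journal = Journal()
journal.append([{"POId": "PO123456789", "Qty": 1}, {"POId": "PO123456789", "Qty": 2}])
journal.append([{"POId": "PO123456789", "Status": "shipped"}])  # hypothetical entries
print(journal.verify_chain())                # True
journal.blocks[0]["entries"][0]["Qty"] = 99  # tamper with history
print(journal.verify_chain())                # False
```

An audit proof lets a verifier check one entry against a trusted root using only a logarithmic number of sibling hashes, without re-reading the whole journal.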
45. Quantum Ledger Database (QLDB)
• PartiQL: open source … ~ SQL INSERT, SELECT, UPDATE, DELETE
• Extensions to SQL for access to documents: dot notation and aliasing of nested data.
-- << >> encloses a bag of documents; `2019-12-25T` is an Ion timestamp literal
INSERT INTO PurchaseOrder <<
{
  'POId' : 'PO123456789',
  'CustomerId' : 'Any Random Customer',
  'OrderDate' : `2019-12-25T`,
  'POItems' :
  [
    { 'ItemId' : 'Random Widget A', 'Qty' : 1, 'UnitPrice' : 1.75 },
    { 'ItemId' : 'Random Widget B', 'Qty' : 2, 'UnitPrice' : 2.75 },
    { 'ItemId' : 'Random Widget C', 'Qty' : 3, 'UnitPrice' : 3.75 }
  ]
}
>>
SELECT po.POId, po.OrderDate, poi.ItemId, poi.Qty
FROM PurchaseOrder AS po, @po.POItems AS poi
WHERE po.CustomerId = 'Any Random Customer'
• Alias for nested data
• Simplifies access
• Avoids Table Join
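What the aliased `FROM PurchaseOrder AS po, @po.POItems AS poi` clause produces can be mimicked in Python: each purchase order is paired with each of its own nested items, yielding one result row per item without a separate item table. The list of dicts below is a hypothetical stand-in for the QLDB table, sketched only to show the shape of the result, not how QLDB evaluates the query.

```python
from decimal import Decimal

purchase_orders = [  # stand-in for the PurchaseOrder table (one document)
    {
        "POId": "PO123456789",
        "CustomerId": "Any Random Customer",
        "OrderDate": "2019-12-25",
        "POItems": [
            {"ItemId": "Random Widget A", "Qty": 1, "UnitPrice": Decimal("1.75")},
            {"ItemId": "Random Widget B", "Qty": 2, "UnitPrice": Decimal("2.75")},
            {"ItemId": "Random Widget C", "Qty": 3, "UnitPrice": Decimal("3.75")},
        ],
    }
]

# SELECT po.POId, po.OrderDate, poi.ItemId, poi.Qty
# FROM PurchaseOrder AS po, @po.POItems AS poi
# WHERE po.CustomerId = 'Any Random Customer'
rows = [
    {"POId": po["POId"], "OrderDate": po["OrderDate"], "ItemId": poi["ItemId"], "Qty": poi["Qty"]}
    for po in purchase_orders          # po  = alias for each document
    for poi in po["POItems"]           # poi = alias for each item nested in *that* document
    if po["CustomerId"] == "Any Random Customer"
]
for row in rows:
    print(row)                         # one row per nested item
```

The inner loop ranges only over the current order's own items, which is why no join condition between orders and items is needed.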