1. Amazon SimpleDB
Zahid Mian
February 27, 2011
2. Need for NoSQL
Avoid Overhead Associated with Traditional RDBMS
Scale Horizontally (significant) as well as
Vertically
High Availability
Simplify data storage and model (make it efficient
for storing and retrieving data)
Generally a Hash table
3. Tradeoffs SimpleDB vs. RDBMS
Simplicity
Lack of support for joins, views, constraints, transactions,
stored procedures, etc.
Schema-less, type-less (all values are stored as text)
Simplified querying language (Select * …)
No fine-tuning necessary
Uses Web Services to access data
BASE implementation instead of ACID
Key idea: “eventual” commits (eventual consistency)
4. Tradeoffs SimpleDB vs. RDBMS
Proprietary Query “language”
Designed to retrieve Items (not records)
Basic operations
Specific operations like CreateDomain, DeleteAttributes,
PutAttributes, etc.
Storage Structure
One large Hash table
Each value is hashed, so it is automatically indexed
Little or No Infrastructure planning
Hosted by Amazon
6. SimpleDB Object Model
User Account (One Store per account)
Domain – equivalent to a Table
Item – equivalent to a Record
Attribute – equivalent to a Column
Value – equivalent to a column value
Multiple values per attribute are allowed
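The object model above can be pictured as nested hash tables. The following is a minimal in-memory sketch, not the SimpleDB API: the class ToyDomain and its methods are hypothetical names that only mirror the PutAttributes, GetAttributes, and DeleteAttributes operations, and the sample values are made up.

```python
from collections import defaultdict


class ToyDomain:
    """In-memory stand-in for one domain (illustrative only, not SimpleDB)."""

    def __init__(self, name):
        self.name = name
        # item name -> attribute name -> set of text values
        self.items = defaultdict(lambda: defaultdict(set))

    def put_attributes(self, item_name, pairs):
        """Add (attribute, value) pairs; every value is stored as text."""
        for attr, value in pairs:
            self.items[item_name][attr].add(str(value))

    def get_attributes(self, item_name):
        """Return the item's attributes, each with a sorted list of values."""
        item = self.items.get(item_name, {})
        return {attr: sorted(values) for attr, values in item.items()}

    def delete_attributes(self, item_name, attr_names):
        """Remove whole attributes; an item with no attributes disappears."""
        item = self.items.get(item_name)
        if item is None:
            return
        for attr in attr_names:
            item.pop(attr, None)
        if not item:
            del self.items[item_name]


contacts = ToyDomain("Contacts")
contacts.put_attributes("1", [
    ("Name", "Adam Smith"),
    ("EmailAddress", "first@example.com"),   # made-up value
    ("EmailAddress", "second@example.com"),  # second value, same attribute
])
print(contacts.get_attributes("1"))
```

Because an attribute maps to a set of values, adding a second email address needs no schema change.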
8. Application Design Considerations
Normalized vs. Non-Normalized Storage
Data Caching at the Application level
Normalized Data
Contacts
ContactID  Name        DOB  Gender
1          Adam Smith  …    M
2          Sarjo T     …    M
3          Sarah K     …    F
ContactEmailAddresses
ContactID  EmailAddress
1          asmith1@...
1          asmith2@...
2          sarjot@...
3          sarah1@...
3          sarah2@...
9. Application Design Considerations
Non-Normalized Data in SimpleDB
Contacts (multi-valued attribute; add additional values as needed)
ContactID  Name        DOB  Gender  EmailAddress
1          Adam Smith  …    M       asmith1@...
                                    asmith2@...
                                    asmith3@...
2          Sarjo T     …    M       sarjot@...
3          Sarah K     …    F       sarah1@...
                                    sarah2@...
Contacts (one attribute per value; add additional attributes as needed; null attributes don’t exist)
ContactID  Name        DOB  Gender  EmailAddress1  EmailAddress2  EmailAddress3
1          Adam Smith  …    M       asmith1@...    asmith2@...    asmith3@...
2          Sarjo T     …    M       sarjot@...
3          Sarah K     …    F       sarah1@...     sarah2@...
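One way to get from the normalized layout to the multi-valued layout is to fold the child rows into their parent item. This is a rough sketch under assumptions: the function name denormalize, the dictionary shapes, and the sample values are hypothetical, and nothing here talks to SimpleDB.

```python
def denormalize(contacts, emails):
    """Fold child email rows into a multi-valued EmailAddress attribute.

    contacts: list of dicts, one per row of the Contacts table.
    emails:   list of dicts with ContactID and EmailAddress.
    Returns a dict: item name -> attribute name -> list of text values.
    """
    items = {}
    for contact in contacts:
        item = {}
        for attr, value in contact.items():
            # Null attributes are simply not stored.
            if attr != "ContactID" and value is not None:
                item[attr] = [str(value)]
        items[str(contact["ContactID"])] = item

    for row in emails:
        item = items[str(row["ContactID"])]
        item.setdefault("EmailAddress", []).append(row["EmailAddress"])
    return items


contacts = [
    {"ContactID": 1, "Name": "Adam Smith", "Gender": "M", "DOB": None},
    {"ContactID": 2, "Name": "Sarjo T", "Gender": "M", "DOB": None},
]
emails = [  # made-up addresses
    {"ContactID": 1, "EmailAddress": "first@example.com"},
    {"ContactID": 1, "EmailAddress": "second@example.com"},
]
print(denormalize(contacts, emails))
```

Contact 2 has no email rows, so its item has no EmailAddress attribute at all rather than an empty one.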
10. Application Design Considerations
Analytical Processing
No support for group by or aggregation
Application must implement appropriate functionality
Can be costly operation at the data level
Bulk Operations
Little support for bulk updates
At least two trips (one to get the items, the other to send batch
request)
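Since grouping and aggregation fall to the application, a typical pattern is to fetch the items and aggregate them in memory. The sketch below assumes items have already been retrieved as plain dictionaries of text values; the function name and the Region and Sales attributes are hypothetical.

```python
from collections import defaultdict


def group_count_and_sum(items, group_attr, value_attr):
    """Application-side GROUP BY: count and sum value_attr per group_attr.

    Items missing either attribute are skipped, since absent attributes
    simply do not exist on an item.
    """
    totals = defaultdict(lambda: {"count": 0, "sum": 0})
    for item in items:
        if group_attr not in item or value_attr not in item:
            continue
        bucket = totals[item[group_attr]]
        bucket["count"] += 1
        bucket["sum"] += int(item[value_attr])  # values arrive as text
    return dict(totals)


items = [
    {"Region": "East", "Sales": "10"},
    {"Region": "East", "Sales": "5"},
    {"Region": "West", "Sales": "7"},
    {"Region": "West"},  # no Sales attribute, so it is ignored
]
print(group_count_and_sum(items, "Region", "Sales"))
```

Every item has to cross the network before it can be counted, which is why this gets costly on large data.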
11. Application Design Considerations
No Transactional Support
Application must “mimic” a transaction by guaranteeing
commits
Support for Consistent Reads (discouraged)
Constraints
All constraints (type or data) must be handled by the
Application
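Because the store enforces no type or data constraints, checks of this kind have to run in the application before anything is written. A minimal sketch follows; the function validate_contact and its specific rules (required Name, ISO-formatted DOB, 1,024-byte values) are illustrative assumptions, not rules taken from any library.

```python
from datetime import date

MAX_VALUE_BYTES = 1024  # value size limit


def validate_contact(attrs):
    """Return a list of constraint violations for one contact item.

    attrs maps attribute names to text values. An empty list means the
    item passed every check and may be written.
    """
    errors = []

    # "NOT NULL" style constraint on Name.
    if not attrs.get("Name", "").strip():
        errors.append("Name is required")

    # Type constraint: DOB, when present, must parse as an ISO 8601 date.
    dob = attrs.get("DOB")
    if dob is not None:
        try:
            date.fromisoformat(dob)
        except ValueError:
            errors.append("DOB must be an ISO 8601 date (YYYY-MM-DD)")

    # Size constraint on every value.
    for attr, value in attrs.items():
        if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            errors.append(f"{attr} exceeds {MAX_VALUE_BYTES} bytes")

    return errors


print(validate_contact({"Name": "Adam Smith", "DOB": "1980-01-31"}))
print(validate_contact({"Name": " ", "DOB": "31/01/1980"}))
```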
12. Application Design Considerations
Working With Data/Values
Value Size Limit of 1024 bytes
Possibly break into chunks of data
Lexicographical search creates problems
Negative Numbers Offset
Need to use an “offset” number to add to numeric values to
handle negative values
Zero Padding
Pad all numbers with leading “0”
Dates
Convert all dates to ISO 8601 standard before saving
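The encoding rules above exist because values are compared as text. The following sketch shows one possible set of helpers; the names, the OFFSET and WIDTH constants, and the choice of UTC timestamps are assumptions made for illustration and should be sized to the actual data.

```python
from datetime import datetime, timezone

OFFSET = 1_000_000  # assumed: at least as large as the most negative value
WIDTH = 10          # assumed: enough digits for the largest shifted value


def encode_int(n):
    """Shift by OFFSET, then zero-pad, so text order equals numeric order."""
    shifted = n + OFFSET
    if shifted < 0 or len(str(shifted)) > WIDTH:
        raise ValueError(f"{n} is outside the encodable range")
    return str(shifted).zfill(WIDTH)


def decode_int(text):
    """Reverse encode_int."""
    return int(text) - OFFSET


def encode_datetime(dt):
    """Format a timezone-aware datetime as an ISO 8601 UTC timestamp."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def chunk_value(text, limit=1024):
    """Split text into pieces of at most `limit` UTF-8 bytes each."""
    chunks, current, size = [], [], 0
    for ch in text:
        n = len(ch.encode("utf-8"))
        if current and size + n > limit:
            chunks.append("".join(current))
            current, size = [], 0
        current.append(ch)
        size += n
    if current:
        chunks.append("".join(current))
    return chunks


values = [-5, 3, -120, 40]
encoded = sorted(encode_int(v) for v in values)
print([decode_int(e) for e in encoded])  # numeric order survives text sorting
print(encode_datetime(datetime(2011, 2, 27, 12, 0, tzinfo=timezone.utc)))
```

Chunks would still need to be stored under attribute names that preserve their order so the application can reassemble them.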
13. Hosting Environment
Challenges to Consider
Data Privacy
Legal Requirements
No Backup Support
“Lock-in” Factor (difficult to migrate away from SimpleDB)
“Open Cash Register” Problem (rogue script/processing can be
costly)
Difficult to Maintain DB for Application Development
Lifecycle (unit test, dev, test, perf, prod)
14. Pros of Using SimpleDB
Item               Explanation
Infrastructure     Amazon hosts the environment, so virtually no cost to get started;
                   no need for a local datacenter; “pay as you go” for processing
Simplicity         Extremely efficient storage and retrieval of data
Flexibility        Schema-less; type-less data; easy prototyping
Security           Data is stored with Amazon and accessible through authenticated
                   requests only
High Availability  BASE implementation provides high availability
Fault-Tolerance    Data replicated across multiple nodes; managed by Amazon
Indexing           Hash table storage means all data is “automatically” indexed
15. Cons of Using SimpleDB
Item                  Explanation
Not RDBMS             Not an RDBMS substitute. Lacks features like stored procedures,
                      referential integrity, views, datatypes, text search, schemas,
                      granular security
Lacks “rich” SQL      Rudimentary search operations; cannot group by, aggregate, etc.
SLA                   Loosely defined SLA
Joins                 Joins can be performed at the application layer, but require
                      multiple operations between client and server
Limits on Data        10 GB per domain; 100 domains; 256 attribute name-value pairs
                      per item; 1,024 bytes per attribute value
Limits on Operations  1 MB response size; 2,500 items returned per Select; 5 seconds
                      maximum for operation
Limits on Predicates  20 maximum predicates per Select; cannot reference other
                      attributes of the Item
Hosting               No local implementation makes it difficult to develop
                      applications (release management, performance testing, unit
                      testing, etc.); no backup support; privacy issues
Migration             Limited options for migrating data
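A join at the application layer usually means fetching both sets of items separately and matching them in memory. The sketch below is a plain hash join over already-fetched dictionaries; the function name hash_join and the sample data are hypothetical, and each input list would cost at least one round trip to the server.

```python
def hash_join(left_items, right_items, key):
    """Inner-join two lists of items on a shared attribute.

    Builds an index over right_items, then probes it once per left item.
    On attribute name clashes the left item's value wins.
    """
    index = {}
    for right in right_items:
        index.setdefault(right[key], []).append(right)

    joined = []
    for left in left_items:
        for right in index.get(left[key], []):
            merged = dict(right)
            merged.update(left)
            joined.append(merged)
    return joined


contacts = [
    {"ContactID": "1", "Name": "Adam Smith"},
    {"ContactID": "2", "Name": "Sarjo T"},
]
emails = [  # made-up addresses
    {"ContactID": "1", "EmailAddress": "first@example.com"},
    {"ContactID": "1", "EmailAddress": "second@example.com"},
]
for row in hash_join(contacts, emails, "ContactID"):
    print(row)
```

Storing the email addresses as multiple values on the contact item avoids the join entirely, which is the usual argument for non-normalized storage here.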
16. Appropriate Use Cases
Type of Application             Explanation
Managing Data for Online Games  User scores and achievement data; user settings or preferences;
                                user-generated content (comments, feedback, etc.); dynamic
                                game content
Managing Session State          Applications like online games, web sites, and batch processes
                                can manage the state of their process
“Static” Content                Nightly builds from RDBMS (e.g., pre-configured Sales Per
                                Region data)
Simple Collections              Any collections (e.g., URLs, contacts, etc.)
17. Inappropriate Use Cases
Type of Application                  Explanation
Analytical Processing                Applications where data computation is required on
                                     large data
Highly Structured Data Requirements  Applications that require constraints and structures
Data Privacy                         If data privacy is an issue
Allowing Third-party Extensions      Makes it difficult since there is no schema
Data is Core Competency              When data infrastructure is the core competency; when
                                     data storage is what gives you leverage over others