This document discusses NoSQL databases and their use for analytics. It begins by defining NoSQL as non-relational databases that trade traditional relational features for simplicity, scalability, and flexibility. It then covers the different types of NoSQL databases and the trade-offs they make relative to relational databases. It also discusses how NoSQL databases distribute data, including across geographically separate data centers, and the consistency models that result. Finally, it proposes using NoSQL databases for both operational data storage and analytics, leveraging their flexibility, scalability, and ability to handle large, sparse datasets through techniques such as MapReduce transformations.
2. NoSQL
• “Non-Relational” is perhaps a better name
• A departure from the conventional relational database
• Trades traditional features for simplicity, scalability, and flexibility
5. NoSQL Distributed
Name   Age  Phone
Bob    43   555-1212
Jenny  32   555-1213
Sally  28   555-1214
Joe    45   555-1215
Up to petabytes
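One way to picture how a table like this is spread across machines is hash partitioning. The slide does not name a partitioning scheme, so the following is only a minimal sketch under that assumption; `node_for`, `partition`, and the choice of three nodes are hypothetical names and values introduced for illustration.

```python
import hashlib

# Rows from the example table on this slide.
ROWS = [
    {"name": "Bob", "age": 43, "phone": "555-1212"},
    {"name": "Jenny", "age": 32, "phone": "555-1213"},
    {"name": "Sally", "age": 28, "phone": "555-1214"},
    {"name": "Joe", "age": 45, "phone": "555-1215"},
]


def node_for(key: str, num_nodes: int) -> int:
    """Map a row key to a node index with a stable hash (assumed scheme)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def partition(rows, num_nodes):
    """Group rows by the node that owns each row's key."""
    nodes = {i: [] for i in range(num_nodes)}
    for row in rows:
        nodes[node_for(row["name"], num_nodes)].append(row)
    return nodes


# Hypothetical cluster of three nodes.
shards = partition(ROWS, num_nodes=3)
for node, rows in shards.items():
    print(node, [r["name"] for r in rows])
```

Because each row's location depends only on its key, capacity can grow by adding nodes rather than by buying a larger single machine, which is the sense of "scale out" used later in the deck.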
6. Consistency
[Figure: three replicas of the Name / Age / Phone table (Bob 43 555-1212, Jenny 32, Sally 28 555-1214, Joe 45 555-1215). Two replicas list Jenny's phone as 555-1213; one lists it as 867-5309. The figure is marked with an "X" and labeled "Multiple Data Centers" and "Single Data Center".]
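The divergence between copies of the table can be reproduced with a small simulation. This is a sketch of eventual consistency under simplifying assumptions, not a description of any particular database: `Replica`, `Cluster`, `write`, and `replicate` are hypothetical names, and replication is modeled as a queue that is drained later.

```python
# Phone numbers from the slide's table, keyed by name.
INITIAL = {
    "Bob": "555-1212",
    "Jenny": "555-1213",
    "Sally": "555-1214",
    "Joe": "555-1215",
}


class Replica:
    """One copy of the table."""

    def __init__(self, name, phones):
        self.name = name
        self.phones = dict(phones)


class Cluster:
    """Applies a write locally and queues it for the other replicas."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.queue = []  # (target replica, key, value) awaiting delivery

    def write(self, replica, key, value):
        replica.phones[key] = value
        for other in self.replicas:
            if other is not replica:
                self.queue.append((other, key, value))

    def replicate(self):
        """Deliver every queued update, bringing replicas into agreement."""
        while self.queue:
            target, key, value = self.queue.pop(0)
            target.phones[key] = value


a, b, c = (Replica(n, INITIAL) for n in "abc")
cluster = Cluster([a, b, c])

cluster.write(b, "Jenny", "867-5309")
stale_reads = [r.phones["Jenny"] for r in cluster.replicas]
print(stale_reads)      # ['555-1213', '867-5309', '555-1213']

cluster.replicate()
converged_reads = [r.phones["Jenny"] for r in cluster.replicas]
print(converged_reads)  # ['867-5309', '867-5309', '867-5309']
```

Between the write and the call to `replicate`, a reader may see either phone number depending on which replica answers. A strongly consistent system would instead block or reject reads until all copies agree.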
14. NoSQL and Analytics
• Importing operational data can create a scale problem
• Combining operational data can create sparse data
• Operational schemas may change
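The sparsity point can be made concrete with a small sketch. The two sources below (`crm` and `billing`) and their fields are hypothetical, invented only to show what happens when records with different schemas land in one collection.

```python
# Hypothetical operational sources with different schemas.
crm = [
    {"name": "Bob", "phone": "555-1212"},
    {"name": "Jenny", "phone": "555-1213"},
]
billing = [
    {"name": "Sally", "plan": "basic", "balance": 12.5},
]


def combine(*sources):
    """Store every record as-is in one collection of documents."""
    docs = []
    for source in sources:
        docs.extend(dict(record) for record in source)
    return docs


def all_fields(docs):
    """The union of fields seen across all documents."""
    return sorted({field for doc in docs for field in doc})


def density(docs):
    """Fraction of (document, field) cells that actually hold a value."""
    fields = all_fields(docs)
    filled = sum(len(doc) for doc in docs)
    return filled / (len(docs) * len(fields))


docs = combine(crm, billing)
print(all_fields(docs))  # ['balance', 'name', 'phone', 'plan']
print(density(docs))     # 7 of 12 cells are filled
```

In a fixed-column table the five empty cells would be stored as NULLs, and a new field in either source would require a schema change. A document store simply keeps the fields each record has.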
19. NoSQL Data Loading Shift
NoSQL Analytics
• Composite, Sparse Schemas
• Scale out
• Aggressive Indexing
• Data Discovery
Conventional BI
• Data cleaning
• Regularization
• Denormalization
• Star Schema
• Known operational Schemas
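As an illustration of data discovery over sparse documents, the following sketch counts how often each field appears, written in a MapReduce style. It runs in a single process and is only an assumption about what such a transformation might look like; the documents and the names `map_fields`, `reduce_counts`, and `field_counts` are hypothetical.

```python
from collections import Counter
from functools import reduce

# Hypothetical sparse documents; not every document has every field.
DOCS = [
    {"name": "Bob", "age": 43, "phone": "555-1212"},
    {"name": "Jenny", "age": 32},
    {"name": "Sally", "phone": "555-1214", "email": "sally@example.com"},
]


def map_fields(doc):
    """Map step: emit a (field, 1) pair for each field present in a document."""
    return [(field, 1) for field in doc]


def reduce_counts(acc, pair):
    """Reduce step: fold one (field, count) pair into the running totals."""
    field, count = pair
    acc[field] += count
    return acc


def field_counts(docs):
    pairs = (pair for doc in docs for pair in map_fields(doc))
    return reduce(reduce_counts, pairs, Counter())


counts = field_counts(DOCS)
print(dict(counts))  # {'name': 3, 'age': 2, 'phone': 2, 'email': 1}
```

The result describes the schema that actually exists in the data, which is useful when the operational schemas are not known in advance. In a distributed store the map step would run on each node and only the per-field totals would be combined.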