SlideShare uma empresa Scribd logo
1 de 106
WHERETO PUT DATA
– or –
What are we going to do with all this stuff?
About The Speaker
Application Developer/Architect – 21 years
Web Developer – 15 years
Web Operations – 7 years
Relational
Shard
Replicate
Graph
Document
Key-value
Cache
Distributed
ACID
BASE
Cluster
Schema
Schemaless
Neo4J
Redis
Riak
Cassandra
Voldemort
NetStorage
S3
CloudFront
MongoDB
CouchDB
Memcached
Coherence
GridGain
BigMemory
HDFS
HBase
eXist
Xindice
BDB
BDB
Scalability Consistency
Relational Not Only SQL
Flexible Rigid
App Web Media Web API Gateway
CMSApp layer
Cache farm Search farm
Indexer
Search
Admin
Jobbers
Caching
Proxy/LB
Caching
Proxy/LB
CDN
Egress
ISP
Freshness
Response Time
Consistency
Resilience
Total Cost
Sustainability
Agility
BACK INTHE 90’S
BACK INTHE 90’S
BACK INTHE 90’S
OS 2200
Hierarchical ("Network") Database
OS 2200
Hierarchical ("Network") Database
Relational Mapper
OS 2200
Hierarchical ("Network") Database
Relational Mapper
POSIX.1 Virtual Machine
OS 2200
Hierarchical ("Network") Database
Relational Mapper
POSIX.1 Virtual Machine
COBOL Compiler
ANSI SQL Library
Given enough time, and
perversity, you can create any
query model on top of any
storage model.
SAY WHEN
The Importance of ResponseTime Distribution
FUNDAMENTAL PREMISE
Black Box
THERE ARETHINGSYOU
CANNOT KNOW
Will a response arrive?
When?
Was it stored or computed?
Is it still true?
ASYMMETRY OFTIME
Send
Request
Send
Request
100 ms 200 ms
ASYMMETRY OFTIME
Send
Request
100 ms 200 ms 300 ms 400 ms 500 ms 600 ms
ASYMMETRY OFTIME
Send
Request
100 ms 200 ms 300 ms 400 ms 500 ms 600 ms
Time
Out
ASYMMETRY OFTIME
9000 100 200 300 400 500 600 700 800
1
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
Time Elapsed (ms)
ConfidenceofResponsebeforeTimeout
Send
Request
100 ms 200 ms 300 ms 400 ms 500 ms 600 ms
Time
Out
Response
Arrives
ASYMMETRY OFTIME
To the observer, there is no
difference between “too slow” and
“not there”.
0%
6.25%
12.50%
18.75%
25.00%
0 100 200 300 400 500 600 700 800 900 1000
Response Time Histogram%ofResponses
Response Time (ms)
0%
6.25%
12.50%
18.75%
25.00%
0 100 200 300 400 500 600 700 800 900 1000
Response Time Histogram%ofResponses
Response Time (ms)
0%
6.25%
12.50%
18.75%
25.00%
0 100 200 300 400 500 600 700 800 900 1000
Response Time Histogram%ofResponses
Response Time (ms)
0%
25.00%
50.00%
75.00%
100.00%
0 100 200 300 400 500 600 700 800 900 ∞
Response Time Histogram%ofResponses
Response Time (ms)
This is a talk about data and
data storage.
scalability?
What about
Why am I talking so much about
observers and response time?
Why do we worry about
scalability?
0%
25.00%
50.00%
75.00%
100.00%
0 100 200 300 400 500 600 700 800 900 1000
Response Time Histogram%ofResponses
Response Time (ms)
Empty Box Time
Response time of a
single request on an
unloaded system.
Scalability is a means, not an end.
What we need is
fast response time,
under all loads.
App Web Media Web API Gateway
CMSApp layer
Cache farm Search farm
Indexer
Search
Admin
Jobbers
Caching
Proxy/LB
Caching
Proxy/LB
CDN
Egress
HOWTRUE?
App Web Media Web API Gateway
CMSApp layer
Cache farm Search farm
Indexer
Search
Admin
Jobbers
Caching
Proxy/LB
Caching
Proxy/LB
CDN
Egress
App Web Media Web API Gateway
CMS
App layer
Cache farm Search farm
Indexer
Search
Admin
Jobbers
Caching
Proxy/LB
Caching
Proxy/LB
CDN
Egress
ISP
You cannot be sure the data is unchanged since your
observation, except by making another observation.
P(unch) = F(dC/dt, dt)
UNCERTAINTY PRINCIPLE
THE ROLE OF SURPRISE
Unlikely answers are often more interesting.
SAYS WHO?
Thoughts on Consistency
OBSERVABILITY
OBSERVABILITY
Steve
Brian
OBSERVABILITY
Left Right
OBSERVABILITY
Steve
Brian
# Steve Brian
1 R R
2 L R
3 R L
STATE SPACE
X1 = {L, R}
X2 = {L, R}
SUPER-OBSERVER
Has a view which dominates the views of all other observers.
SUPER-OBSERVER
There are no one-to-many mappings from the super-
observer’s states to any other observer’s states.
SUPER-OBSERVER
A super-observer is maximally present if it can discriminate
among the Cartesian product of all other observations.
Observer Set of States
Steve {L, R}
Brian {L, R}
Super-Observer {L → B, R → F} × {L, R}
OBSERVABILITY
# Steve Brian Super-Observer
1 R R {F, R}
2 L R {B, R}
3 R L {F, L}
PORKY PIG’S
WINDOW SHADE
If Porky Pig is looking at the window shade,
he always observes it to be down.
If he is looking away from the window shade,
it rolls up.
FIRST DIMENSION
X1 = {looking, not looking}
SECOND DIMENSION
X1 = {looking, not looking}
X2 = {shade open, shade closed}
FORBIDDEN STATES
X1 = {looking, not looking}
X2 = {shade open, shade closed}
STATE SPACE
Cartesian product of all possible sets of states.
Example
1,000,000 bytes of RAM
8 bits per byte
2 states per bit
8,000,000 dimensions with 2 values each
or
1,000,000 dimensions with 256 values each
STATE SPACE
10,000,000 rows in a table
20 columns
Whole database is a single point in a 200,000,000
dimensional space.
Changes to data are transforms of that point.
State over time is the trajectory of that point.
CONSISTENCY
Not every point in state space is allowed.
BLACK BOX HYPOTHESIS
BLACK BOX HYPOTHESIS
External observers can only ever ask for projections of the
state space, at defined points in time.
BLACK BOX HYPOTHESIS
State space trajectories may cross into forbidden states, as
long as those are not revealed to observers.
PROJECTION
X1 = {looking, not looking}
Is Porky looking at the window shade?
P1
P2
BLACK BOX HYPOTHESIS
Even two clustered machines have their own state spaces.
It’s impossible for either to be a superobserver.
OBSERVED CONSISTENCY
Sufficient to ensure that forbidden states cannot be observed.
DOES A SUPEROBSERVER EXIST?
Only if there is exactly one single-threaded CPU,
in exactly one computer.
CONSEQUENCES
Consistency doesn’t exist in most systems today.
Sometimes we can fake it.
Many times, it doesn’t really matter.
WHAT ABOUT CAP?
Consistency:
“...there must exist a total order on all operations such that
each operation looks as if it were completed at a single instant.”
Seth Gilbert and Nancy Lynch. 2002.
Brewer's conjecture and the feasibility of consistent, available,
partition-tolerant web services.
SIGACT News 33, 2 (June 2002), 51-59. DOI=10.1145/564585.564601
http://doi.acm.org/10.1145/564585.564601
WHAT ABOUT CAP?
Linearizability
Seth Gilbert and Nancy Lynch. 2002.
Brewer's conjecture and the feasibility of consistent, available,
partition-tolerant web services.
SIGACT News 33, 2 (June 2002), 51-59. DOI=10.1145/564585.564601
http://doi.acm.org/10.1145/564585.564601
The data base consists of entities which are
related in certain ways. These relationships are
best thought of as assertions about the data.
Examples of such assertions are:
“Names is an index forTelephone_numbers.”
“The value of Count_of_X gives the number of
employees in department X.”
The data base is said to be consistent if it satisfies
all its assertions. In some cases, the data base must
become temporarily inconsistent in order to
transform it to a new consistent state.
From "Granularity of Locks and Degrees of Consistency in a
Shared Data Base",
J.N. Gray, R.A. Lorie, G.R. Putzolu, I.L.Traiger, 1976
From "Granularity of Locks and Degrees of Consistency in a
Shared Data Base",
J.N. Gray, R.A. Lorie, G.R. Putzolu, I.L.Traiger, 1976
Consistency is a predicate C on entities and their
values.The predicate is generally not known to the
system but is embodied in the structure of the
transactions.
From "Transactions and Consistency in Distributed Database Systems",
I.L.Traiger, J.N. Gray, C.A. Galtieri, and B.G. Lindsay, 1982
“C” VERSUS “A”?
Response Time
Consistency
Resilience
See also: http://goo.gl/1Yv3
→ http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html
WHAT’STHAT?
Data Models and Composability
DATA MODEL DEFINED
The system’s representation of the consistency predicate C.
LATENT MODEL
Implicit in the structure of application code.
EXPLICIT
Visible to storage engine or applications, expressed in
machine-readable form.
HOMOICONIC
Explicit, and available for expressions, computations,
and validation together with statements about the data
itself.
Non-uniform.
CHALLENGES
Non-uniform.
Layered.
CHALLENGES
Non-uniform.
Layered.
Confined.
CHALLENGES
Non-uniform.
Layered.
Confined.
Non-composable.
CHALLENGES
ENFORCING C
Must account for overlapping wavefronts of information.
There is no master clock.
Simultaneity is positional.
Make C explicit in the application, don’t rely on storage engine.
HOW LONG?
On Lifecycles and Lifespans
DOWN WITHTHE IRON FIST
Throw out the DBAs
Throw out the schemas
Unstructured
Semi-structured
DOWN WITHTHE IRON FIST
Put the application in charge.
DOWN WITHTHE IRON FIST
but...
DIFFICULTIES
Application versions
Validating correct behavior
Capturing knowledge about that behavior
UGLYTRUTH
Data routinely outlives applications.
Response Time
Total Cost
Sustainability
Agility
WHERE NOW?
WHERETO PUTYOUR DATA?
Data exists everywhere.
WHERETO PUTYOUR DATA?
Nothing lasts forever.
WHERETO PUTYOUR DATA?
Understand freshness.
WHERETO PUTYOUR DATA?
Engineer a good response time
distribution.
WHERETO PUTYOUR DATA?
Select the consistency model you need.
WHERETO PUTYOUR DATA?
Be agile and adaptable.
WHERETO PUTYOUR DATA?
Make it sustainable.
Michael T. Nygard
michael.nygard@n6consulting.com
@mtnygard
© 2010 N6 Consulting, LLC.All Rights Reserved.

Mais conteúdo relacionado

Destaque

Fault tolerance made easy
Fault tolerance made easyFault tolerance made easy
Fault tolerance made easy
Uwe Friedrichsen
 
[ML15]Class Cat佐々木さん「いち早く人工知能テクノロジーを取り入れた製品・サービスを市場に展開するには?」
[ML15]Class Cat佐々木さん「いち早く人工知能テクノロジーを取り入れた製品・サービスを市場に展開するには?」[ML15]Class Cat佐々木さん「いち早く人工知能テクノロジーを取り入れた製品・サービスを市場に展開するには?」
[ML15]Class Cat佐々木さん「いち早く人工知能テクノロジーを取り入れた製品・サービスを市場に展開するには?」
AINOW
 

Destaque (11)

Resiliency jenna-2013
Resiliency jenna-2013Resiliency jenna-2013
Resiliency jenna-2013
 
AppSphere 15 - Preparing for System Failure: How Pearson used AppDynamics to ...
AppSphere 15 - Preparing for System Failure: How Pearson used AppDynamics to ...AppSphere 15 - Preparing for System Failure: How Pearson used AppDynamics to ...
AppSphere 15 - Preparing for System Failure: How Pearson used AppDynamics to ...
 
Designing apps for resiliency
Designing apps for resiliencyDesigning apps for resiliency
Designing apps for resiliency
 
Resilience engineering
Resilience engineeringResilience engineering
Resilience engineering
 
FORUM PA 2015 - Microservices with IBM Bluemix
FORUM PA 2015 - Microservices with IBM BluemixFORUM PA 2015 - Microservices with IBM Bluemix
FORUM PA 2015 - Microservices with IBM Bluemix
 
Fault tolerance made easy
Fault tolerance made easyFault tolerance made easy
Fault tolerance made easy
 
Patterns of resilience
Patterns of resiliencePatterns of resilience
Patterns of resilience
 
Resiliency through failure @ QConNY 2013
Resiliency through failure @ QConNY 2013Resiliency through failure @ QConNY 2013
Resiliency through failure @ QConNY 2013
 
[ML15]Class Cat佐々木さん「いち早く人工知能テクノロジーを取り入れた製品・サービスを市場に展開するには?」
[ML15]Class Cat佐々木さん「いち早く人工知能テクノロジーを取り入れた製品・サービスを市場に展開するには?」[ML15]Class Cat佐々木さん「いち早く人工知能テクノロジーを取り入れた製品・サービスを市場に展開するには?」
[ML15]Class Cat佐々木さん「いち早く人工知能テクノロジーを取り入れた製品・サービスを市場に展開するには?」
 
Resilient Architecture
Resilient ArchitectureResilient Architecture
Resilient Architecture
 
How to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your NicheHow to Become a Thought Leader in Your Niche
How to Become a Thought Leader in Your Niche
 

Semelhante a Where to put_my_data

CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
J Singh
 
HbaseHivePigbyRohitDubey
HbaseHivePigbyRohitDubeyHbaseHivePigbyRohitDubey
HbaseHivePigbyRohitDubey
Rohit Dubey
 

Semelhante a Where to put_my_data (20)

Technical overview of Azure Cosmos DB
Technical overview of Azure Cosmos DBTechnical overview of Azure Cosmos DB
Technical overview of Azure Cosmos DB
 
Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you...
Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you...Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you...
Azure Cosmos DB - NoSQL Strikes Back (An introduction to the dark side of you...
 
Azure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep DiveAzure Cosmos DB - Technical Deep Dive
Azure Cosmos DB - Technical Deep Dive
 
NoSQL Database
NoSQL DatabaseNoSQL Database
NoSQL Database
 
Scalable Data Storage Getting You Down? To The Cloud!
Scalable Data Storage Getting You Down? To The Cloud!Scalable Data Storage Getting You Down? To The Cloud!
Scalable Data Storage Getting You Down? To The Cloud!
 
Scalable Data Storage Getting you Down? To the Cloud!
Scalable Data Storage Getting you Down? To the Cloud!Scalable Data Storage Getting you Down? To the Cloud!
Scalable Data Storage Getting you Down? To the Cloud!
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
Modeling data and best practices for the Azure Cosmos DB.
Modeling data and best practices for the Azure Cosmos DB.Modeling data and best practices for the Azure Cosmos DB.
Modeling data and best practices for the Azure Cosmos DB.
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
 
Everything You Need to Know About NewSQL in 2020
Everything You Need to Know About NewSQL in 2020Everything You Need to Know About NewSQL in 2020
Everything You Need to Know About NewSQL in 2020
 
CockroachDB
CockroachDBCockroachDB
CockroachDB
 
NOSQL
NOSQLNOSQL
NOSQL
 
HbaseHivePigbyRohitDubey
HbaseHivePigbyRohitDubeyHbaseHivePigbyRohitDubey
HbaseHivePigbyRohitDubey
 
ROC for Adaptive Systems
ROC for Adaptive SystemsROC for Adaptive Systems
ROC for Adaptive Systems
 
Hadoop bank
Hadoop bankHadoop bank
Hadoop bank
 
Hashgraph as Code
Hashgraph as CodeHashgraph as Code
Hashgraph as Code
 
MongoDB - General Purpose Database
MongoDB - General Purpose DatabaseMongoDB - General Purpose Database
MongoDB - General Purpose Database
 
JEE on DC/OS
JEE on DC/OSJEE on DC/OS
JEE on DC/OS
 
JEE on DC/OS - MesosCon Europe
JEE on DC/OS - MesosCon EuropeJEE on DC/OS - MesosCon Europe
JEE on DC/OS - MesosCon Europe
 
Cloudifying High Availability: The Case for Elastic Disaster Recovery
Cloudifying High Availability: The Case for Elastic Disaster RecoveryCloudifying High Availability: The Case for Elastic Disaster Recovery
Cloudifying High Availability: The Case for Elastic Disaster Recovery
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Último (20)

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 

Where to put_my_data