SlideShare uma empresa Scribd logo
1 de 23
Baixar para ler offline
Architectural Anti Patterns
Notes on Data Distribution and Handling Failures
Gleicon Moraes
http://zenmachine.wordpress.com
http://github.com/gleicon
@gleicon
Required Listening: King Crimson - Discipline
Anti Patterns
Evolution from SQL Anti Patterns (NoSQL:br May 2010)
More than just RDBMS
Large volumes of data
Distribution
Architecture
Research on other tools
Message Queues, DHT, Job Schedulers, NoSQL
Indexing, Map/Reduce
New revision since QConSP 2010: included Hierarchical
Sharding, Embedded lists and Distributed Global Locking
RDBMS Anti Patterns
Not all things fit on a relational database, single ou distributed
The eternal table-as-a-tree
Dynamic table creation
Table as cache
Table as queue
Table as log file
Stoned Procedures
Row Alignment
Extreme JOINs
Your scheme must be printed in an A3 sheet.
Your ORM issue full queries for Dataset iterations
Hierarchical Sharding
Embedded lists
Distributed global locking
Doing it wrong, Junior !
The eternal tree
Problem: Most threaded discussion example uses something
like a table which contains all threads and answers, relating to
each other by an id. Usually the developer will come up with
his own binary-tree version to manage this mess.
id - parent_id -author - text
1 - 0 - gleicon - hello world
2 - 1 - elvis - shout !
Alternative: Document storage:
{ thread_id:1, title: 'the meeting', author: 'gleicon', replies:[
{
'author': elvis, text:'shout', replies:[{...}]
}
]
}
Dynamic table creation
Problem: To avoid huge tables, one must come with a
"dynamic schema". For example, lets think about a document
management company, which is adding new facilities over the
country. For each storage facility, a new table is created:
item_id - row - column - stuff
1 - 10 - 20 - cat food
2 - 12 - 32 - trout
Now you have to come up with "dynamic queries", which will
probably query a "central storage" table and issue a huge join
to check if you have enough cat food over the country.
Alternatives:
- Document storage, modeling a facility as a document
- Key/Value, modeling each facility as a SET
Table as cache
Problem: Complex queries demand that a result be stored in a
separated table, so it can be queried quickly. Worst than views
Alternatives:
- Really ?
- Memcached
- Redis + AOF + EXPIRE
- De-normalization
Table as queue
Problem: A table which holds messages to be completed.
Worse, they must be ordered by
time of creation.
Corolary: Job Scheduler table
Alternatives:
- RestMQ, Resque
- Any other message broker
- Redis (LISTS - LPUSH + RPOP)
- Use the right tool
Table as log file
Problem: A table in which data gets written as a log file. From
time to time it needs to be purged. Truncating this table once
a day usually is the first task assigned to new DBAs.
Alternative:
- MongoDB capped collection
- Redis, and RRD pattern
- RIAK
Stoned procedures
Problem: Stored procedures hold most of your applications
logic. Also, some triggers are used to - well - trigger important
data events.
SP and triggers has the magic property of vanishing of our
memories and being impossible to keep versioned.
Alternative:
- Now be careful so you dont use map/reduce as modern
stoned procedures. Unfit for real time search/processing
- Use your preferred language for business stuff, and let event
handling to pub/sub or message queues.
Row Alignment
Problem: Extra rows are created but not used, just in case.
Usually they are named as a1, a2, a3, a4 and called padding.
There's good will behind that, specially when version 1 of the
software needed an extra column in a 150M lines database
and it took 2 days to run an ALTER TABLE. But that's no
excuse.
Alternative:
- Quit being cheap. Quit feeling 'hacker' about padding
- Document based databases as MongoDB and CouchDB, has
no schema. New atributes are local to the document and can
be added easily.
Extreme JOINs
Problem: Business stuff modeled as tables. Table inheritance
(Product -> SubProduct_A). To find the complete data for a
user plan, one must issue gigantic queries with lots of JOINs.
Alternative:
- Document storage, as MongoDB
might help having important
information together.
- De-normalization
- Serialized objects
Your scheme fits in an A3 sheet
Problem: Huge data schemes are difficult to manage. Extreme
specialization creates tables which converges to key/value
model. The normal form get priority over common sense.
Product_A Product_B
id - desc id - desc
Alternatives:
- De-normalization
- Another scheme ?
- Document store for flattening model
- Key/Value
- See 'Extreme JOINs'
Your ORM ...
Problem: Your ORM issue full queries for dataset iterations,
your ORM maps and creates tables which mimics your
classes, even the inheritance, and the performance is bad
because the queries are huge, etc, etc
Alternative:
- Apart from denormalization and good old common sense,
ORMs are trying to bridge two things with distinct impedance.
- There is nothing to relational models which maps cleanly to
classes and objects. Not even the basic unit which is the
domain(set) of each column. Black Magic ?
Hierarchical Sharding
Problem: Genius at work. Distinct databases inside a RDBMS
ranging from A to Z, each database has tables for users
starting with the proper letter. Each table has user data.
Fictional example: e-mail accounts management
> show databases;
a b c d e f g h i j k l m n o p q r s t u w x z
> use a
> show tables;
...alberto alice alma ... (and a lot more)
There is no way to query anything in common for all users with
out application side processing. In this particular case this
sharding was uncalled for as relational databases have all
tools to deal with this particular case of 'different clients and
data'
Embedded Lists
Problem: As data complexity grows, one thinks that it's proper
for the application to handle different data structures
embedded in a single cell or row. The popular 'Let's use
commas to separe it'. Another name could be Stored CSV
> select group_field from that_email_sharded_database.user
"a@email1.net, b@email1.net,c@email2.net"
This is a bit complementar of 'Your scheme fits in a A3 sheet'.
This kind of optimization, while harmless at first sight, spreads
without control. Querying such fields yields unwanted results.
You might only use languages where string splitting is not
expensive or remember to compile RegExps before querying a
large data volume. Either learn to model your data, or resort to
K/V stores as Redis.
Distributed Global Locking
Problem: Someone learns java and synchronize. A bit later
genius thinks that a distributed synchronize would be
awesome. The proper place to do that would be the database
of course. Start with a reference counter in a table and end up
with this:
> select COALESCE(GET_LOCK('my_lock',0 ),0 )
Plain and simple, you might find it embedded in a magic class
called DistributedSynchronize or ClusterSemaphore. Locks,
transactions and reference counters (which may act as soft
locks) doesn't belongs to the database. While its they use is
questionable even in code, the matter of fact is that you are
doing it wrong, if you are doing like that.
No silver bullet
- Think about data
handling and your
system architecture
- Think outside the norm
- De-normalize
- Simplify
- Know stuff (Message
queues, NoSQL, DHT)
Cycle of changes - Product A
1. There was the database model
2. Then, the cache was needed. Performance was no good.
3. Cache key: query, value: resultset
4. High or inexistent expiration time [w00t]
(Now there's a turning point. Data didn't need to change often.
Denormalization was a given with cache)
5. The cache needs to be warmed or the app wont work.
6. Key/Value storage was a natural choice. No data on MySQL
anymore.
Cycle of changes - Product B
1. Postgres DB storing crawler results.
2. There was a counter in each row, and updating this counter
caused contention errors.
3. Memcache for reads. Performance is better.
4. First MongoDB test, no more deadlocks from counter
update.
5. Data model was simplified, the entire crawled doc was
stored.
Stuff to think about
Think if the data you use aren't de-normalized somewhere
(cached)
Most of the anti-patterns signals that there are architectural
issues instead of only database issues.
The NoSQL route (or at least a partial NoSQL route) may
simplify it.
Are you dependent on cache ? Does your application fails
when there is no cache ? Does it just slows down ?
Think about the way to put and to get back your data from the
database (be it SQL or NoSQL).
Thanks
http://scienceblogs.com/evolgen/upload/2007/04/rube_back_scratch.gif
http://en.wikipedia.org/wiki/Rube_Goldberg_machine

Mais conteúdo relacionado

Mais procurados

Datastage free tutorial
Datastage free tutorialDatastage free tutorial
Datastage free tutorialtekslate1
 
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortalsChapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortalsnehabsairam
 
ODI11g, Hadoop and "Big Data" Sources
ODI11g, Hadoop and "Big Data" SourcesODI11g, Hadoop and "Big Data" Sources
ODI11g, Hadoop and "Big Data" SourcesMark Rittman
 
From Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETLFrom Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETLCloudera, Inc.
 
Datastage parallell jobs vs datastage server jobs
Datastage parallell jobs vs datastage server jobsDatastage parallell jobs vs datastage server jobs
Datastage parallell jobs vs datastage server jobsshanker_uma
 
The thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designThe thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designCalpont
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless DatabasesDan Gunter
 
Rise of Column Oriented Database
Rise of Column Oriented DatabaseRise of Column Oriented Database
Rise of Column Oriented DatabaseSuvradeep Rudra
 
Building High Performance MySQL Query Systems and Analytic Applications
Building High Performance MySQL Query Systems and Analytic ApplicationsBuilding High Performance MySQL Query Systems and Analytic Applications
Building High Performance MySQL Query Systems and Analytic ApplicationsCalpont
 
Chapter 5 design of keyvalue databses from nosql for mere mortals
Chapter 5 design of keyvalue databses from nosql for mere mortalsChapter 5 design of keyvalue databses from nosql for mere mortals
Chapter 5 design of keyvalue databses from nosql for mere mortalsnehabsairam
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databasesFabio Fumarola
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra nehabsairam
 
Chapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortalsChapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortalsnehabsairam
 
The immutable database datomic
The immutable database   datomicThe immutable database   datomic
The immutable database datomicLaurence Chen
 
Big table presentation-final
Big table presentation-finalBig table presentation-final
Big table presentation-finalYunming Zhang
 

Mais procurados (20)

Datastage free tutorial
Datastage free tutorialDatastage free tutorial
Datastage free tutorial
 
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortalsChapter 4 terminolgy of keyvalue databses from nosql for mere mortals
Chapter 4 terminolgy of keyvalue databses from nosql for mere mortals
 
ODI11g, Hadoop and "Big Data" Sources
ODI11g, Hadoop and "Big Data" SourcesODI11g, Hadoop and "Big Data" Sources
ODI11g, Hadoop and "Big Data" Sources
 
From Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETLFrom Raw Data to Analytics with No ETL
From Raw Data to Analytics with No ETL
 
Datastage parallell jobs vs datastage server jobs
Datastage parallell jobs vs datastage server jobsDatastage parallell jobs vs datastage server jobs
Datastage parallell jobs vs datastage server jobs
 
The thinking persons guide to data warehouse design
The thinking persons guide to data warehouse designThe thinking persons guide to data warehouse design
The thinking persons guide to data warehouse design
 
Hdfs Dhruba
Hdfs DhrubaHdfs Dhruba
Hdfs Dhruba
 
Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
Rise of Column Oriented Database
Rise of Column Oriented DatabaseRise of Column Oriented Database
Rise of Column Oriented Database
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
Dwh faqs
Dwh faqsDwh faqs
Dwh faqs
 
Migration from 8.1 to 11.3
Migration from 8.1 to 11.3Migration from 8.1 to 11.3
Migration from 8.1 to 11.3
 
Building High Performance MySQL Query Systems and Analytic Applications
Building High Performance MySQL Query Systems and Analytic ApplicationsBuilding High Performance MySQL Query Systems and Analytic Applications
Building High Performance MySQL Query Systems and Analytic Applications
 
Chapter 5 design of keyvalue databses from nosql for mere mortals
Chapter 5 design of keyvalue databses from nosql for mere mortalsChapter 5 design of keyvalue databses from nosql for mere mortals
Chapter 5 design of keyvalue databses from nosql for mere mortals
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databases
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
 
Chapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortalsChapter 7(documnet databse termininology) no sql for mere mortals
Chapter 7(documnet databse termininology) no sql for mere mortals
 
The immutable database datomic
The immutable database   datomicThe immutable database   datomic
The immutable database datomic
 
Sql Server Basics
Sql Server BasicsSql Server Basics
Sql Server Basics
 
Big table presentation-final
Big table presentation-finalBig table presentation-final
Big table presentation-final
 

Semelhante a Architectural Anti Patterns and Data Distribution Failures

Architecture by Accident
Architecture by AccidentArchitecture by Accident
Architecture by AccidentGleicon Moraes
 
Architectural Anti Patterns - Notes on Data Distribution and Handling Failures
Architectural Anti Patterns - Notes on Data Distribution and Handling FailuresArchitectural Anti Patterns - Notes on Data Distribution and Handling Failures
Architectural Anti Patterns - Notes on Data Distribution and Handling FailuresGleicon Moraes
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesJon Meredith
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGuang Xu
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational DatabasesUdi Bauman
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLHyderabad Scalability Meetup
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151xlight
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Applicationsupertom
 
SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!Andraz Tori
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupesh Bansal
 
Comparing sql and nosql dbs
Comparing sql and nosql dbsComparing sql and nosql dbs
Comparing sql and nosql dbsVasilios Kuznos
 
SPL_ALL_EN.pptx
SPL_ALL_EN.pptxSPL_ALL_EN.pptx
SPL_ALL_EN.pptx政宏 张
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLArseny Chernov
 
MPTStore: A Fast, Scalable, and Stable Resource Index
MPTStore: A Fast, Scalable, and Stable Resource IndexMPTStore: A Fast, Scalable, and Stable Resource Index
MPTStore: A Fast, Scalable, and Stable Resource IndexChris Wilper
 
NoSQL - A Closer Look to Couchbase
NoSQL - A Closer Look to CouchbaseNoSQL - A Closer Look to Couchbase
NoSQL - A Closer Look to CouchbaseMohammad Shaker
 

Semelhante a Architectural Anti Patterns and Data Distribution Failures (20)

Architecture by Accident
Architecture by AccidentArchitecture by Accident
Architecture by Accident
 
Architectural Anti Patterns - Notes on Data Distribution and Handling Failures
Architectural Anti Patterns - Notes on Data Distribution and Handling FailuresArchitectural Anti Patterns - Notes on Data Distribution and Handling Failures
Architectural Anti Patterns - Notes on Data Distribution and Handling Failures
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational Databases
 
Understanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQLUnderstanding and building big data Architectures - NoSQL
Understanding and building big data Architectures - NoSQL
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151
 
Using Cassandra with your Web Application
Using Cassandra with your Web ApplicationUsing Cassandra with your Web Application
Using Cassandra with your Web Application
 
SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!SQL or NoSQL, that is the question!
SQL or NoSQL, that is the question!
 
Bhupeshbansal bigdata
Bhupeshbansal bigdata Bhupeshbansal bigdata
Bhupeshbansal bigdata
 
Comparing sql and nosql dbs
Comparing sql and nosql dbsComparing sql and nosql dbs
Comparing sql and nosql dbs
 
SPL_ALL_EN.pptx
SPL_ALL_EN.pptxSPL_ALL_EN.pptx
SPL_ALL_EN.pptx
 
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQLCompressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
Compressed Introduction to Hadoop, SQL-on-Hadoop and NoSQL
 
No sql
No sqlNo sql
No sql
 
Cassandra data modelling best practices
Cassandra data modelling best practicesCassandra data modelling best practices
Cassandra data modelling best practices
 
disertation
disertationdisertation
disertation
 
Mongo db
Mongo dbMongo db
Mongo db
 
MPTStore: A Fast, Scalable, and Stable Resource Index
MPTStore: A Fast, Scalable, and Stable Resource IndexMPTStore: A Fast, Scalable, and Stable Resource Index
MPTStore: A Fast, Scalable, and Stable Resource Index
 
NoSQL - A Closer Look to Couchbase
NoSQL - A Closer Look to CouchbaseNoSQL - A Closer Look to Couchbase
NoSQL - A Closer Look to Couchbase
 

Mais de Gleicon Moraes

Como arquiteturas de dados quebram
Como arquiteturas de dados quebramComo arquiteturas de dados quebram
Como arquiteturas de dados quebramGleicon Moraes
 
Arquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devopsArquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devopsGleicon Moraes
 
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...Gleicon Moraes
 
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...Gleicon Moraes
 
Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014Gleicon Moraes
 
A closer look to locaweb IaaS
A closer look to locaweb IaaSA closer look to locaweb IaaS
A closer look to locaweb IaaSGleicon Moraes
 
Semi Automatic Sentiment Analysis
Semi Automatic Sentiment AnalysisSemi Automatic Sentiment Analysis
Semi Automatic Sentiment AnalysisGleicon Moraes
 
L'esprit de l'escalier
L'esprit de l'escalierL'esprit de l'escalier
L'esprit de l'escalierGleicon Moraes
 
OSCon - Performance vs Scalability
OSCon - Performance vs ScalabilityOSCon - Performance vs Scalability
OSCon - Performance vs ScalabilityGleicon Moraes
 
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)Gleicon Moraes
 
RestMQ - HTTP/Redis based Message Queue
RestMQ - HTTP/Redis based Message QueueRestMQ - HTTP/Redis based Message Queue
RestMQ - HTTP/Redis based Message QueueGleicon Moraes
 
NoSQL and SQL Anti Patterns
NoSQL and SQL Anti PatternsNoSQL and SQL Anti Patterns
NoSQL and SQL Anti PatternsGleicon Moraes
 

Mais de Gleicon Moraes (16)

Como arquiteturas de dados quebram
Como arquiteturas de dados quebramComo arquiteturas de dados quebram
Como arquiteturas de dados quebram
 
Arquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devopsArquitetura emergente - sobre cultura devops
Arquitetura emergente - sobre cultura devops
 
API Gateway report
API Gateway reportAPI Gateway report
API Gateway report
 
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...DNAD 2015  - Como a arquitetura emergente de sua aplicação pode jogar contra ...
DNAD 2015 - Como a arquitetura emergente de sua aplicação pode jogar contra ...
 
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...
QCon SP 2015 - Advogados do diabo: como a arquitetura emergente de sua aplica...
 
Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014Por trás da infraestrutura do Cloud - Campus Party 2014
Por trás da infraestrutura do Cloud - Campus Party 2014
 
Locaweb cloud and sdn
Locaweb cloud and sdnLocaweb cloud and sdn
Locaweb cloud and sdn
 
A closer look to locaweb IaaS
A closer look to locaweb IaaSA closer look to locaweb IaaS
A closer look to locaweb IaaS
 
Semi Automatic Sentiment Analysis
Semi Automatic Sentiment AnalysisSemi Automatic Sentiment Analysis
Semi Automatic Sentiment Analysis
 
L'esprit de l'escalier
L'esprit de l'escalierL'esprit de l'escalier
L'esprit de l'escalier
 
OSCon - Performance vs Scalability
OSCon - Performance vs ScalabilityOSCon - Performance vs Scalability
OSCon - Performance vs Scalability
 
Patterns of fail
Patterns of failPatterns of fail
Patterns of fail
 
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
Dlsecyx pgroammr (Dyslexic Programmer - cool stuff for scaling)
 
RestMQ - HTTP/Redis based Message Queue
RestMQ - HTTP/Redis based Message QueueRestMQ - HTTP/Redis based Message Queue
RestMQ - HTTP/Redis based Message Queue
 
NoSQL and SQL Anti Patterns
NoSQL and SQL Anti PatternsNoSQL and SQL Anti Patterns
NoSQL and SQL Anti Patterns
 
Redis
RedisRedis
Redis
 

Último

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Último (20)

The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Architectural Anti Patterns and Data Distribution Failures

  • 1. Architectural Anti Patterns Notes on Data Distribution and Handling Failures Gleicon Moraes http://zenmachine.wordpress.com http://github.com/gleicon @gleicon
  • 2. Required Listening: King Crimson - Discipline
  • 3. Anti Patterns Evolution from SQL Anti Patterns (NoSQL:br May 2010) More than just RDBMS Large volumes of data Distribution Architecture Research on other tools Message Queues, DHT, Job Schedulers, NoSQL Indexing, Map/Reduce New revision since QConSP 2010: included Hierarchical Sharding, Embedded lists and Distributed Global Locking
  • 4. RDBMS Anti Patterns Not all things fit on a relational database, single ou distributed The eternal table-as-a-tree Dynamic table creation Table as cache Table as queue Table as log file Stoned Procedures Row Alignment Extreme JOINs Your scheme must be printed in an A3 sheet. Your ORM issue full queries for Dataset iterations Hierarchical Sharding Embedded lists Distributed global locking
  • 5. Doing it wrong, Junior !
  • 6. The eternal tree Problem: Most threaded discussion example uses something like a table which contains all threads and answers, relating to each other by an id. Usually the developer will come up with his own binary-tree version to manage this mess. id - parent_id -author - text 1 - 0 - gleicon - hello world 2 - 1 - elvis - shout ! Alternative: Document storage: { thread_id:1, title: 'the meeting', author: 'gleicon', replies:[ { 'author': elvis, text:'shout', replies:[{...}] } ] }
  • 7. Dynamic table creation Problem: To avoid huge tables, one must come with a "dynamic schema". For example, lets think about a document management company, which is adding new facilities over the country. For each storage facility, a new table is created: item_id - row - column - stuff 1 - 10 - 20 - cat food 2 - 12 - 32 - trout Now you have to come up with "dynamic queries", which will probably query a "central storage" table and issue a huge join to check if you have enough cat food over the country. Alternatives: - Document storage, modeling a facility as a document - Key/Value, modeling each facility as a SET
  • 8. Table as cache Problem: Complex queries demand that a result be stored in a separated table, so it can be queried quickly. Worst than views Alternatives: - Really ? - Memcached - Redis + AOF + EXPIRE - De-normalization
  • 9. Table as queue Problem: A table which holds messages to be completed. Worse, they must be ordered by time of creation. Corolary: Job Scheduler table Alternatives: - RestMQ, Resque - Any other message broker - Redis (LISTS - LPUSH + RPOP) - Use the right tool
  • 10. Table as log file Problem: A table in which data gets written as a log file. From time to time it needs to be purged. Truncating this table once a day usually is the first task assigned to new DBAs. Alternative: - MongoDB capped collection - Redis, and RRD pattern - RIAK
  • 11. Stoned procedures Problem: Stored procedures hold most of your applications logic. Also, some triggers are used to - well - trigger important data events. SP and triggers has the magic property of vanishing of our memories and being impossible to keep versioned. Alternative: - Now be careful so you dont use map/reduce as modern stoned procedures. Unfit for real time search/processing - Use your preferred language for business stuff, and let event handling to pub/sub or message queues.
  • 12. Row Alignment Problem: Extra rows are created but not used, just in case. Usually they are named as a1, a2, a3, a4 and called padding. There's good will behind that, specially when version 1 of the software needed an extra column in a 150M lines database and it took 2 days to run an ALTER TABLE. But that's no excuse. Alternative: - Quit being cheap. Quit feeling 'hacker' about padding - Document based databases as MongoDB and CouchDB, has no schema. New atributes are local to the document and can be added easily.
  • 13. Extreme JOINs Problem: Business stuff modeled as tables. Table inheritance (Product -> SubProduct_A). To find the complete data for a user plan, one must issue gigantic queries with lots of JOINs. Alternative: - Document storage, as MongoDB might help having important information together. - De-normalization - Serialized objects
  • 14. Your scheme fits in an A3 sheet Problem: Huge data schemes are difficult to manage. Extreme specialization creates tables which converges to key/value model. The normal form get priority over common sense. Product_A Product_B id - desc id - desc Alternatives: - De-normalization - Another scheme ? - Document store for flattening model - Key/Value - See 'Extreme JOINs'
  • 15. Your ORM ... Problem: Your ORM issue full queries for dataset iterations, your ORM maps and creates tables which mimics your classes, even the inheritance, and the performance is bad because the queries are huge, etc, etc Alternative: - Apart from denormalization and good old common sense, ORMs are trying to bridge two things with distinct impedance. - There is nothing to relational models which maps cleanly to classes and objects. Not even the basic unit which is the domain(set) of each column. Black Magic ?
  • 16. Hierarchical Sharding Problem: Genius at work. Distinct databases inside a RDBMS ranging from A to Z, each database has tables for users starting with the proper letter. Each table has user data. Fictional example: e-mail accounts management > show databases; a b c d e f g h i j k l m n o p q r s t u w x z > use a > show tables; ...alberto alice alma ... (and a lot more) There is no way to query anything in common for all users with out application side processing. In this particular case this sharding was uncalled for as relational databases have all tools to deal with this particular case of 'different clients and data'
  • 17. Embedded Lists Problem: As data complexity grows, one thinks that it's proper for the application to handle different data structures embedded in a single cell or row. The popular 'Let's use commas to separe it'. Another name could be Stored CSV > select group_field from that_email_sharded_database.user "a@email1.net, b@email1.net,c@email2.net" This is a bit complementar of 'Your scheme fits in a A3 sheet'. This kind of optimization, while harmless at first sight, spreads without control. Querying such fields yields unwanted results. You might only use languages where string splitting is not expensive or remember to compile RegExps before querying a large data volume. Either learn to model your data, or resort to K/V stores as Redis.
  • 18. Distributed Global Locking Problem: Someone learns java and synchronize. A bit later genius thinks that a distributed synchronize would be awesome. The proper place to do that would be the database of course. Start with a reference counter in a table and end up with this: > select COALESCE(GET_LOCK('my_lock',0 ),0 ) Plain and simple, you might find it embedded in a magic class called DistributedSynchronize or ClusterSemaphore. Locks, transactions and reference counters (which may act as soft locks) doesn't belongs to the database. While its they use is questionable even in code, the matter of fact is that you are doing it wrong, if you are doing like that.
  • 19. No silver bullet - Think about data handling and your system architecture - Think outside the norm - De-normalize - Simplify - Know stuff (Message queues, NoSQL, DHT)
  • 20. Cycle of changes - Product A 1. There was the database model 2. Then, the cache was needed. Performance was no good. 3. Cache key: query, value: resultset 4. High or inexistent expiration time [w00t] (Now there's a turning point. Data didn't need to change often. Denormalization was a given with cache) 5. The cache needs to be warmed or the app wont work. 6. Key/Value storage was a natural choice. No data on MySQL anymore.
  • 21. Cycle of changes - Product B 1. Postgres DB storing crawler results. 2. There was a counter in each row, and updating this counter caused contention errors. 3. Memcache for reads. Performance is better. 4. First MongoDB test, no more deadlocks from counter update. 5. Data model was simplified, the entire crawled doc was stored.
  • 22. Stuff to think about Think if the data you use aren't de-normalized somewhere (cached) Most of the anti-patterns signals that there are architectural issues instead of only database issues. The NoSQL route (or at least a partial NoSQL route) may simplify it. Are you dependent on cache ? Does your application fails when there is no cache ? Does it just slows down ? Think about the way to put and to get back your data from the database (be it SQL or NoSQL).