NoSQL & Architectures
Eberhard Wolff
@ewolff

Eberhard Wolff - @ewolff
About me
Eberhard Wolff
► Freelance consultant
► Head of the technology advisory board at adesso
► Speaker
► Author
► Blog: http://ewolff.com
► Twitter: @ewolff

Back in the Days….

NoSQL Is All About the Persistence
Question

Key-Value Stores
► Maps keys to values
► Just a large globally available Map
► i.e. not very powerful data model

Key → Value
42 → Some data

► No complex queries or indices
► Just access by key
► Might add e.g. full text engine
► Redis: Cache + Persistence
► Riak: Massive scale + Solr queries
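The "large globally available Map" idea can be sketched in a few lines of Python. This is only an illustration of the data model: the class `KeyValueStore` and its methods are hypothetical names, not the API of Redis or Riak.

```python
from typing import Optional


class KeyValueStore:
    """Toy in-memory key-value store: the key is the only access path."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # The value is an opaque blob; the store never looks inside it.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("42", b"Some data")
found = store.get("42")         # lookup by key is the only query
missing = store.get("unknown")  # unknown keys yield None
```

Anything beyond lookup by key, such as searching inside the values, would need an extra component like the full text engine mentioned above.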

Wide Column
► Add any "column" you like to a row
► key-(column-value)
► Column families like tables
► E.g. in the "Users" column family
> "someuser" → ("username" → "someuser"), ("email" → "someuser@example.com")
► Columns named: indexing possible
► So fast queries possible
► Apache Cassandra
► Amazon SimpleDB
► Apache HBase
► All tuned for large data sets
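The key-(column-value) structure can be modeled with nested dictionaries. The sketch below is a simplification under that assumption; the helper names `put` and `find_rows` are made up for illustration and do not correspond to any Cassandra or HBase API.

```python
from collections import defaultdict

# column family -> row key -> {column name: value}
column_families: dict[str, dict[str, dict[str, str]]] = defaultdict(dict)


def put(family: str, row_key: str, column: str, value: str) -> None:
    """Add any column to a row; rows in one family need not share columns."""
    row = column_families[family].setdefault(row_key, {})
    row[column] = value


def find_rows(family: str, column: str, value: str) -> list[str]:
    """Naive scan; because columns are named, a real store could index them."""
    return sorted(
        key
        for key, row in column_families[family].items()
        if row.get(column) == value
    )


put("Users", "someuser", "username", "someuser")
put("Users", "someuser", "email", "someuser@example.com")
put("Users", "otheruser", "username", "otheruser")
put("Users", "otheruser", "nickname", "other")  # a column no other row has
```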

Document Stores
► Aggregates are typically stored as "documents" (key-value collection)
► JSON quite common
► No fixed schema
► Indexes possible
► Queries possible
> E.g. "find all baskets that contain the product 123"
► Still great horizontal scalability
► Relations might be modeled as links
► MongoDB, CouchDB
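The example query "find all baskets that contain the product 123" can be sketched over plain Python dictionaries standing in for JSON documents. The field names (`items`, `product`, `coupon`) are assumptions for illustration; a real document store would evaluate such a query on the server, ideally with an index.

```python
# Documents in one collection; note that they do not share a fixed schema.
baskets = [
    {"_id": 1, "customer": "alice",
     "items": [{"product": 123, "qty": 2}, {"product": 7, "qty": 1}]},
    {"_id": 2, "customer": "bob",
     "items": [{"product": 7, "qty": 4}], "coupon": "SPRING"},
    {"_id": 3, "customer": "carol",
     "items": [{"product": 123, "qty": 1}]},
]


def baskets_containing(docs: list[dict], product_id: int) -> list[dict]:
    """Return every basket with at least one item for the given product."""
    return [
        doc for doc in docs
        if any(item["product"] == product_id for item in doc.get("items", []))
    ]


matching_ids = [doc["_id"] for doc in baskets_containing(baskets, 123)]
```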

Graph 
► Nodes with Properties
► Typed relationships with properties
► Ideal e.g. to model relations in a social network
► Easy to find number of followers, degree of relation etc.
► Hard to scale out
► Neo4j
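Both questions from the slide, number of followers and degree of relation, come down to simple graph traversals. The following sketch uses an adjacency dictionary and breadth-first search; the user names and the single "FOLLOWS" relationship type are invented for illustration, and a graph database such as Neo4j would answer this through its own query language instead.

```python
from collections import deque
from typing import Optional

# Directed FOLLOWS relationships: user -> set of users they follow.
follows = {
    "alice": {"bob"},
    "bob": {"carol"},
    "carol": {"dave"},
    "dave": set(),
    "erin": {"bob"},
}


def follower_count(user: str) -> int:
    """Count the users that have a FOLLOWS relationship pointing at `user`."""
    return sum(1 for followed in follows.values() if user in followed)


def degree_of_relation(start: str, target: str) -> Optional[int]:
    """Breadth-first search: number of FOLLOWS hops, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, distance = queue.popleft()
        if node == target:
            return distance
        for neighbour in follows.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, distance + 1))
    return None
```

Every hop may touch a node stored on a different machine, which gives some intuition for why this model is hard to scale out.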

NoSQL Benefits
Costs
•  Scale out instead of Scale Up
•  Cheap Hardware
•  Usually Open Source

Dev

Ops
Flexibility
•  Schema in code not in
database
•  Easier to upgrade schema
•  Easier to handle
heterogeneous data

No Object/relational impedance mismatch
•  NoSQL databases are more OO-like

Drivers
Exponential Data
Growth

Key Value

Scale Out

Wide Column

Semi Structured
Data

Document

More Connected
Data

Graph

Cost

Flexibility

Document-oriented Databases are
the best NoSQL database
For at least one definition of “best”

Document-oriented databases
► Offer scale out
> Unless you need huge amounts of data
► Offer a rich and flexible data model
> …and queries

Cost
Flexibility

► Other databases have other sweet spots
> Huge data sets
> Graph structures
> Analyzing data
► Niches or mainstream?

Financial System
► Different financial products
► Mapping objects / database
► Inheritance

E/R Model
Zero
Bond

Stock

Option

Investment

> 20 database tables
Up to 25 attributes

Country

Currency

#SRLSY??

Investment
Type

ID

Price

Country

Country
Currency

Zero
Bond
Interest
Rate

Fixed
Rate
Bond
Interest
Rate

Stock

Option
…

Preferred

Underlying
asset
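The diagram above suggests storing each investment as one document that carries the common fields plus the fields of its concrete type. A minimal sketch of that idea follows; all values are invented, and the field spellings (`interest_rate`, `preferred`, `underlying_asset`) are assumptions derived from the labels in the diagram.

```python
# One collection holds all product types; no table per subclass is needed.
investments = [
    {"id": 1, "type": "ZeroBond", "price": 95.0,
     "country": "DE", "currency": "EUR", "interest_rate": 0.02},
    {"id": 2, "type": "FixedRateBond", "price": 101.5,
     "country": "DE", "currency": "EUR", "interest_rate": 0.035},
    {"id": 3, "type": "Stock", "price": 42.0,
     "country": "US", "currency": "USD", "preferred": True},
    {"id": 4, "type": "Option", "price": 3.5,
     "country": "US", "currency": "USD", "underlying_asset": 3},
]

# Fields every investment shares, as listed under "Investment" in the diagram.
COMMON_FIELDS = {"id", "type", "price", "country", "currency"}


def type_specific_fields(doc: dict) -> list[str]:
    """Fields that exist only because of the document's concrete type."""
    return sorted(set(doc) - COMMON_FIELDS)


fields_by_type = {doc["type"]: type_specific_fields(doc) for doc in investments}
```

Inheritance is expressed by the `type` field and the presence of extra fields, so the object model maps to the stored form without joins.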
Polyglot Persistence in Ecommerce Application
► Financial Data → RDBMS: Needs transactions & reports. Data fit well in tables.
► Product Catalog → Document Store: Complex document-like data structures and complex queries
► Shopping Cart → Key / Value: High Performance & Scalability. No complex queries
► Recommendation → Graph: Based on friends, their purchases and reviews

The NoSQL Game
► Financial Data → RDBMS (0 points): Needs transactions & reports. Data fit well in tables.
► Product Catalog → Document Store (1000 points): Complex document-like data structures and complex queries
► Shopping Cart → Key / Value (900 points): High Performance & Scalability. No complex queries
► Recommendation → Graph (800 points): Based on friends, their purchases and reviews
► 2700: High Score!

Just Like the Patterns Game!
Points for each Pattern used
Extra points if one class implements multiple Patterns

This is not how Software Architecture works.

Why not?
More is worse!
More hardware
More Developer Skills
Not necessarily bad

More Ops Trouble
•  Installation
•  Backup
•  Disaster Recovery
•  Monitoring
•  Optimizations

But: Polyglot Persistence Has a Point
► Object-oriented Databases did it wrong
► Strategy: Replace RDBMS
► Enterprises will stick to RDBMS
► Pure technology migration basically never happens
► …only vendors think differently

Archive

► Current Data → RDBMS: Classic approach for current data
► Archive → Document Store: NoSQL for the archive

Archives for Insurances
► Legacy migration
► Querying and visualizing data that was not migrated
► i.e. old contracts
► Legacy hard- and software can be switched off
► Flexibility: Host data formats
► Cost: Inexpensively handling large data volumes

Complex Document Processing System

MongoDB (document-oriented): Documents
Redis (key/value, in memory): Meta data for quick access
elasticsearch (search engine): Search index

Alternative: Only elasticsearch

•  Stores original documents as well
•  (like a key/value store)
•  Support for complex queries
•  Very powerful features also for data mining / analytics
•  Not well suited for update heavy operations
•  Backup / disaster recovery?
•  Written in Java

Scaling elasticsearch

Shard 1

Replica 1

Replica 2

Shard 2
Shard 3

Server

Server

Replica 3
Server
Alternative: Only MongoDB
•  Now with (limited beta) fulltext search
•  Excellent support for updates
•  Quite fast – memory mapped files
•  Also fast for updates
•  Disaster recovery possible
•  Map/Reduce support
•  Written in C++

Scaling MongoDB

Replica 1

Replica 1

Replica 2

Replica 2

Replica 3

Replica 3

Shard 1

Shard 2
Scaling MongoDB

Replica 1

Replica 1

Replica 1

Replica 2

Replica 2

Replica 2

Replica 3

Replica 3

Replica 3

Shard 1

Shard 2

Shard 3
What about Redis?
•  MongoDB uses memory mapped files
– Why Redis?
•  Like a Swiss Army knife
•  Cache
•  Messaging
•  Central coordination in a distributed environment
•  Written in C

Scaling Redis

Asynchronous replication built in
Replica
Server
Replica
Alternative: Riak
•  Key / value store
•  But includes Solr for fulltext search
•  What is the difference to a document store then?
•  Map/reduce possible
•  Written in Erlang
•  Smart scaling

Scaling Riak
Server A
Shard3
Shard1

Server B
Shard1
Shard2

Shard4

Shard4

Server D
Shard2
Shard4

Server C
Shard2
Shard3

Shard3

Shard1
Scaling Riak
Server A
Shard3
Shard1

Server B
Shard1
Shard2

Shard4

Shard4
New Server

Server D
Shard2
Shard4

Server C
Shard2
Shard3

Shard3

Shard1
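Adding a server without reshuffling all data is what consistent hashing is designed for, and Riak's ring builds on that idea. The sketch below is a generic consistent-hash ring, not Riak's actual partition assignment: the class `HashRing`, the MD5-based hash and the number of virtual nodes are all assumptions, and replication of each shard to several servers (as in the diagram) is left out.

```python
import hashlib
from bisect import bisect_right


def _hash(value: str) -> int:
    """Map a string to a position on the ring (MD5 chosen only for spread)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, servers: list[str], vnodes: int = 64) -> None:
        self._vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (position, server)
        for server in servers:
            self.add_server(server)

    def add_server(self, server: str) -> None:
        # Each server owns several points so load spreads more evenly.
        for i in range(self._vnodes):
            self._ring.append((_hash(f"{server}#{i}"), server))
        self._ring.sort()

    def server_for(self, key: str) -> str:
        # The first point clockwise from the key's position owns the key.
        positions = [position for position, _ in self._ring]
        index = bisect_right(positions, _hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["Server A", "Server B", "Server C", "Server D"])
before = {key: ring.server_for(key) for key in keys}

ring.add_server("New Server")
after = {key: ring.server_for(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Only keys that now belong to the new server change their location; everything else stays where it was, which is the property that makes growing the cluster cheap.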
Key/Value!
Document-oriented Databases are
the best NoSQL database
For at least one definition of “best”

MongoDB, Redis, Riak, elasticsearch

Your Choice – a trade off!
Typical architecture decision

Data Access: RDBMS
Optimizations
•  Indices
•  Tablespaces
•  …
No need to change code

DBA

Data Model
•  Schema
•  Stored Procedures

Data Access
•  Queries
•  Other code

RDBMS

Architect/Developer

RDBMS separates data from data access
Indices

Joins and normalization
allow flexible data access
patterns

Sacrifice Joins for Scalability
► Join: Combine tables to retrieve results
► Need transactions spanning multiple tables
► Example: Customer table + addresses
► Inserts need locks and consistency across both tables
► Limits scalability
► Global and distributed locks are nasty
► Consistency limits either availability or partition tolerance

CAP Theorem
► Consistency
> All nodes see the same data
> Not the ACID Consistency
► Availability
> Node failures do not prevent survivors from operating
► Partition Tolerance
> System continues to operate despite arbitrary message loss
► Can at max have two of C, A, P
► Or rather: If the network fails – choose A or C.

CAP Theorem
Consistency
Quorum

Partition
Tolerance

DNS
Replication

RDBMS
2 Phase
Commit

Availability
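The "Quorum" entry in the diagram can be illustrated with the common rule that reads see the latest write when read and write quorums overlap, i.e. R + W > N. The sketch below is a strongly simplified model: `QuorumRegister` is a hypothetical name, version numbers are assigned with global knowledge, and real systems handle failures and conflicts in far more detail.

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """With N replicas, every read quorum meets every write quorum if R + W > N."""
    return r + w > n


class QuorumRegister:
    """A single value replicated N times, stored as (version, value) pairs."""

    def __init__(self, n: int) -> None:
        self.replicas = [(0, None)] * n

    def write(self, value, w: int, reachable: list[int]) -> bool:
        # Refuse the write if too few replicas are reachable: this favours
        # consistency over availability during a partition.
        if len(reachable) < w:
            return False
        # Simplification: the next version is derived from all replicas.
        version = max(v for v, _ in self.replicas) + 1
        for index in reachable:
            self.replicas[index] = (version, value)
        return True

    def read(self, r: int, reachable: list[int]):
        if len(reachable) < r:
            raise RuntimeError("not enough replicas reachable for a read quorum")
        # Take the answer with the highest version among the replicas asked.
        return max((self.replicas[i] for i in reachable), key=lambda t: t[0])[1]


register = QuorumRegister(n=3)
first_write_ok = register.write("v1", w=2, reachable=[0, 1])
value_seen = register.read(r=2, reachable=[1, 2])              # overlaps on replica 1
partitioned_write_ok = register.write("v2", w=2, reachable=[2])  # minority side
```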
BASE
► Basically Available Soft state Eventually consistent
► I.e. trade consistency for availability
► Pun concerning ACID…
► Not the same C, however!

BASE
► Eventually consistent
► If no updates are sent for a while all previous updates will eventually propagate through the system
► Then all replicas are consistent
► Can deal with network partitioning: Message will be transferred later
► All replicas are always available
► Pun concerning ACID…
► Not the same C, however!
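Eventual consistency can be sketched with two replicas that accept writes independently and exchange their pending updates once the network is available again. The conflict rule used here, last writer wins by timestamp, is an assumption made for the example; the slides do not prescribe one, and the names `Replica` and `sync` are hypothetical.

```python
class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, tuple[int, object]] = {}  # key -> (timestamp, value)
        self.outbox: list[tuple[str, object, int]] = []  # updates not yet sent

    def write(self, key: str, value, timestamp: int) -> None:
        # Always accepted locally, even while partitioned from the peers.
        self.apply(key, value, timestamp)
        self.outbox.append((key, value, timestamp))

    def apply(self, key: str, value, timestamp: int) -> None:
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)  # last writer wins

    def read(self, key: str):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(replicas: list[Replica]) -> None:
    """Deliver every pending update to every other replica."""
    for source in replicas:
        for update in source.outbox:
            for target in replicas:
                if target is not source:
                    target.apply(*update)
        source.outbox.clear()


a, b = Replica("a"), Replica("b")
a.write("balance", 100, timestamp=1)
b.write("balance", 80, timestamp=2)
during_partition = (a.read("balance"), b.read("balance"))  # replicas disagree

sync([a, b])  # messages are transferred later
after_sync = (a.read("balance"), b.read("balance"))        # replicas agree
```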

Banking is BASE
► ATMs relax rules on providing cash if the network is partitioned
► Your account is only guaranteed to be consistent by the end of the year

No Joins - What now?
► Customer and addresses must be consistent!
► Solution: Store both as one entity
► Atomic changes easily possible
► Queries might be distributed across multiple nodes
► “NoSQL does not support transactions / ACID” is wrong
> “NoSQL does not support Joins” is better
> Atomic changes still possible
> Schema design different
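Storing customer and addresses as one entity means a change either replaces the whole aggregate or does nothing. A minimal in-memory sketch of that idea follows; `save_customer`, the validation rule and the sample data are invented for illustration and stand in for a document store's single-document write.

```python
import copy

customers: dict[str, dict] = {}  # customer id -> one document per aggregate


def save_customer(doc: dict) -> None:
    """Validate first, then replace the whole aggregate in a single step."""
    if not doc.get("addresses"):
        raise ValueError("a customer needs at least one address")
    customers[doc["_id"]] = copy.deepcopy(doc)


save_customer({
    "_id": "c1",
    "name": "Jane Doe",
    "addresses": [{"city": "Berlin", "zip": "10115"}],
})

# Name and addresses change together; no lock across two tables is needed.
updated = copy.deepcopy(customers["c1"])
updated["name"] = "Jane Smith"
updated["addresses"].append({"city": "Hamburg", "zip": "20095"})
save_customer(updated)

try:
    save_customer({"_id": "c1", "name": "Broken", "addresses": []})
    rejected = False
except ValueError:
    rejected = True  # the stored aggregate is left untouched
```

The price is the different schema design: whatever must change atomically has to live in the same document.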
Data Access MongoDB
Optimizations
•  Only basic indices
•  Other optimizations must be done in code

DBA

Data Model
•  Influences access patterns

Data Access
•  WriteConcerns: how much do you love your data?
•  Shard key
•  Consistency

MongoDB

Architect/Developer

Cluster: RDBMS
► Transparent to developers
► How many nodes?
► A special setup of hardware and RDBMS software

DBA
Cluster: MongoDB
► CAP theorem
> If the network is down choose
> Consistency xor
> Availability
► Deals with replication
► MongoDB has master / slave replication
► Write Concerns:
> Unacknowledged
> Acknowledged
> Journaled
> Some nodes in the replica set
► Queries might go to master only or also slaves
► Influences consistency

MongoDB

Architect/Developer
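The four write concerns differ in how much has to happen before the write call returns to the application. The sketch below models that as a plain lookup; it is a simplified reading of the levels named on the slide, and the enum `WriteConcern` and function `steps_before_return` are hypothetical names, not the driver's API.

```python
from enum import Enum


class WriteConcern(Enum):
    UNACKNOWLEDGED = "unacknowledged"
    ACKNOWLEDGED = "acknowledged"
    JOURNALED = "journaled"
    REPLICA_ACKNOWLEDGED = "some nodes in the replica set"


def steps_before_return(concern: WriteConcern) -> list[str]:
    """Steps the client waits for before the write call returns (simplified)."""
    steps = ["sent to master"]
    if concern is WriteConcern.UNACKNOWLEDGED:
        return steps  # fire and forget: fastest, but errors go unnoticed
    steps.append("applied by master")
    if concern is WriteConcern.JOURNALED:
        steps.append("written to the master's journal")
    if concern is WriteConcern.REPLICA_ACKNOWLEDGED:
        steps.append("replicated to the requested number of slaves")
    return steps


for concern in WriteConcern:
    print(f"{concern.value}: {', '.join(steps_before_return(concern))}")
```

The more steps are awaited, the safer the data and the slower the write, which is the trade-off the developer now has to make per operation.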
More Power and more Responsibility
Architect

DB Admin

Architects
► Architecture has always been a multidimensional problem
► Need to choose persistence technology
► Need to think about operations
► Need to do DBA work

NoSQL Is All About the Persistence
Question

Eberhard Wolff - @ewolff

Mais conteúdo relacionado

Mais procurados

Exceptions are the Norm: Dealing with Bad Actors in ETL
Exceptions are the Norm: Dealing with Bad Actors in ETLExceptions are the Norm: Dealing with Bad Actors in ETL
Exceptions are the Norm: Dealing with Bad Actors in ETLDatabricks
 
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetFast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetOwen O'Malley
 
Fine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionFine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionOwen O'Malley
 
ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013Owen O'Malley
 
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetFast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetOwen O'Malley
 
State-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache SolrState-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache Solrguest432cd6
 
ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3Dongjoon Hyun
 
How to Design a Good Database
How to Design a Good DatabaseHow to Design a Good Database
How to Design a Good DatabaseNur Hidayat
 
Feeding automated test by Joe Beale
Feeding automated test by Joe BealeFeeding automated test by Joe Beale
Feeding automated test by Joe BealeQA or the Highway
 
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabsSolr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabsLucidworks
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5israelekpo
 
Things I wish I'd known - AtoM tips, tricks, and gotchas
Things I wish I'd known - AtoM tips, tricks, and gotchasThings I wish I'd known - AtoM tips, tricks, and gotchas
Things I wish I'd known - AtoM tips, tricks, and gotchasArtefactual Systems - AtoM
 
ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3DataWorks Summit
 
SFDC Introduction to Apex
SFDC Introduction to ApexSFDC Introduction to Apex
SFDC Introduction to ApexSujit Kumar
 
Solr: 4 big features
Solr: 4 big featuresSolr: 4 big features
Solr: 4 big featuresDavid Smiley
 

Mais procurados (19)

CSV import in AtoM
CSV import in AtoMCSV import in AtoM
CSV import in AtoM
 
Exceptions are the Norm: Dealing with Bad Actors in ETL
Exceptions are the Norm: Dealing with Bad Actors in ETLExceptions are the Norm: Dealing with Bad Actors in ETL
Exceptions are the Norm: Dealing with Bad Actors in ETL
 
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and ParquetFast Access to Your Data - Avro, JSON, ORC, and Parquet
Fast Access to Your Data - Avro, JSON, ORC, and Parquet
 
Fine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column EncryptionFine Grain Access Control for Big Data: ORC Column Encryption
Fine Grain Access Control for Big Data: ORC Column Encryption
 
ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013ORC File and Vectorization - Hadoop Summit 2013
ORC File and Vectorization - Hadoop Summit 2013
 
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and ParquetFast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
Fast Spark Access To Your Complex Data - Avro, JSON, ORC, and Parquet
 
State-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache SolrState-of-the-Art Drupal Search with Apache Solr
State-of-the-Art Drupal Search with Apache Solr
 
Update on HDF5 1.8
Update on HDF5 1.8Update on HDF5 1.8
Update on HDF5 1.8
 
ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3
 
OpenRefine Tutorial
OpenRefine TutorialOpenRefine Tutorial
OpenRefine Tutorial
 
Solr Recipes
Solr RecipesSolr Recipes
Solr Recipes
 
How to Design a Good Database
How to Design a Good DatabaseHow to Design a Good Database
How to Design a Good Database
 
Feeding automated test by Joe Beale
Feeding automated test by Joe BealeFeeding automated test by Joe Beale
Feeding automated test by Joe Beale
 
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabsSolr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
Solr Distributed Indexing in WalmartLabs: Presented by Shengua Wan, WalmartLabs
 
Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5Building Intelligent Search Applications with Apache Solr and PHP5
Building Intelligent Search Applications with Apache Solr and PHP5
 
Things I wish I'd known - AtoM tips, tricks, and gotchas
Things I wish I'd known - AtoM tips, tricks, and gotchasThings I wish I'd known - AtoM tips, tricks, and gotchas
Things I wish I'd known - AtoM tips, tricks, and gotchas
 
ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3ORC improvement in Apache Spark 2.3
ORC improvement in Apache Spark 2.3
 
SFDC Introduction to Apex
SFDC Introduction to ApexSFDC Introduction to Apex
SFDC Introduction to Apex
 
Solr: 4 big features
Solr: 4 big featuresSolr: 4 big features
Solr: 4 big features
 

Semelhante a NoSQL & Architectures Explained for Scaling Applications

NoSQL Riak MongoDB Elasticsearch - All The Same?
NoSQL Riak MongoDB Elasticsearch - All The Same?NoSQL Riak MongoDB Elasticsearch - All The Same?
NoSQL Riak MongoDB Elasticsearch - All The Same?Eberhard Wolff
 
Micro Services - Small is Beautiful
Micro Services - Small is BeautifulMicro Services - Small is Beautiful
Micro Services - Small is BeautifulEberhard Wolff
 
Micro Service – The New Architecture Paradigm
Micro Service – The New Architecture ParadigmMicro Service – The New Architecture Paradigm
Micro Service – The New Architecture ParadigmEberhard Wolff
 
Software Architecture for DevOps and Continuous Delivery
Software Architecture for DevOps and Continuous DeliverySoftware Architecture for DevOps and Continuous Delivery
Software Architecture for DevOps and Continuous DeliveryEberhard Wolff
 
Microservice - All is Small, All is Well?
Microservice - All is Small, All is Well?Microservice - All is Small, All is Well?
Microservice - All is Small, All is Well?Eberhard Wolff
 
Java Architectures - a New Hope
Java Architectures - a New HopeJava Architectures - a New Hope
Java Architectures - a New HopeEberhard Wolff
 
High Availability and Scalability: Too Expensive! Architectures for Future E...
High Availability and Scalability: Too Expensive! Architectures for Future E...High Availability and Scalability: Too Expensive! Architectures for Future E...
High Availability and Scalability: Too Expensive! Architectures for Future E...Eberhard Wolff
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage CCG
 
Microservice With Spring Boot and Spring Cloud
Microservice With Spring Boot and Spring CloudMicroservice With Spring Boot and Spring Cloud
Microservice With Spring Boot and Spring CloudEberhard Wolff
 
Redis - The Universal NoSQL Tool
Redis - The Universal NoSQL ToolRedis - The Universal NoSQL Tool
Redis - The Universal NoSQL ToolEberhard Wolff
 
Introduction to Impala
Introduction to ImpalaIntroduction to Impala
Introduction to Impalamarkgrover
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Michael Rys
 
Java Application Servers Are Dead!
Java Application Servers Are Dead!Java Application Servers Are Dead!
Java Application Servers Are Dead!Eberhard Wolff
 
Spark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir VolkSpark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir VolkSpark Summit
 
NoSQL: An Analysis
NoSQL: An AnalysisNoSQL: An Analysis
NoSQL: An AnalysisAndrew Brust
 
An Intro to NoSQL Databases
An Intro to NoSQL DatabasesAn Intro to NoSQL Databases
An Intro to NoSQL DatabasesRajith Pemabandu
 
Apache Hive, data segmentation and bucketing
Apache Hive, data segmentation and bucketingApache Hive, data segmentation and bucketing
Apache Hive, data segmentation and bucketingearnwithme2522
 

Semelhante a NoSQL & Architectures Explained for Scaling Applications (20)

NoSQL Riak MongoDB Elasticsearch - All The Same?
NoSQL Riak MongoDB Elasticsearch - All The Same?NoSQL Riak MongoDB Elasticsearch - All The Same?
NoSQL Riak MongoDB Elasticsearch - All The Same?
 
Micro Services - Small is Beautiful
Micro Services - Small is BeautifulMicro Services - Small is Beautiful
Micro Services - Small is Beautiful
 
Micro Service – The New Architecture Paradigm
Micro Service – The New Architecture ParadigmMicro Service – The New Architecture Paradigm
Micro Service – The New Architecture Paradigm
 
Software Architecture for DevOps and Continuous Delivery
Software Architecture for DevOps and Continuous DeliverySoftware Architecture for DevOps and Continuous Delivery
Software Architecture for DevOps and Continuous Delivery
 
Microservice - All is Small, All is Well?
Microservice - All is Small, All is Well?Microservice - All is Small, All is Well?
Microservice - All is Small, All is Well?
 
Java Architectures - a New Hope
Java Architectures - a New HopeJava Architectures - a New Hope
Java Architectures - a New Hope
 
High Availability and Scalability: Too Expensive! Architectures for Future E...
High Availability and Scalability: Too Expensive! Architectures for Future E...High Availability and Scalability: Too Expensive! Architectures for Future E...
High Availability and Scalability: Too Expensive! Architectures for Future E...
 
Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage Data Analytics Meetup: Introduction to Azure Data Lake Storage
Data Analytics Meetup: Introduction to Azure Data Lake Storage
 
Microservice With Spring Boot and Spring Cloud
Microservice With Spring Boot and Spring CloudMicroservice With Spring Boot and Spring Cloud
Microservice With Spring Boot and Spring Cloud
 
Incredible Impala
Incredible Impala Incredible Impala
Incredible Impala
 
Redis - The Universal NoSQL Tool
Redis - The Universal NoSQL ToolRedis - The Universal NoSQL Tool
Redis - The Universal NoSQL Tool
 
Redshift overview
Redshift overviewRedshift overview
Redshift overview
 
Introduction to Impala
Introduction to ImpalaIntroduction to Impala
Introduction to Impala
 
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
Introduction to Azure Data Lake and U-SQL for SQL users (SQL Saturday 635)
 
Java Application Servers Are Dead!
Java Application Servers Are Dead!Java Application Servers Are Dead!
Java Application Servers Are Dead!
 
Spark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir VolkSpark Summit EU talk by Shay Nativ and Dvir Volk
Spark Summit EU talk by Shay Nativ and Dvir Volk
 
NoSQL: An Analysis
NoSQL: An AnalysisNoSQL: An Analysis
NoSQL: An Analysis
 
Apache hive
Apache hiveApache hive
Apache hive
 
An Intro to NoSQL Databases
An Intro to NoSQL DatabasesAn Intro to NoSQL Databases
An Intro to NoSQL Databases
 
Apache Hive, data segmentation and bucketing
Apache Hive, data segmentation and bucketingApache Hive, data segmentation and bucketing
Apache Hive, data segmentation and bucketing
 

Mais de Eberhard Wolff

Architectures and Alternatives
Architectures and AlternativesArchitectures and Alternatives
Architectures and AlternativesEberhard Wolff
 
The Frontiers of Continuous Delivery
The Frontiers of Continuous DeliveryThe Frontiers of Continuous Delivery
The Frontiers of Continuous DeliveryEberhard Wolff
 
Four Times Microservices - REST, Kubernetes, UI Integration, Async
Four Times Microservices - REST, Kubernetes, UI Integration, AsyncFour Times Microservices - REST, Kubernetes, UI Integration, Async
Four Times Microservices - REST, Kubernetes, UI Integration, AsyncEberhard Wolff
 
Microservices - not just with Java
Microservices - not just with JavaMicroservices - not just with Java
Microservices - not just with JavaEberhard Wolff
 
Deployment - Done Right!
Deployment - Done Right!Deployment - Done Right!
Deployment - Done Right!Eberhard Wolff
 
Data Architecture not Just for Microservices
Data Architecture not Just for MicroservicesData Architecture not Just for Microservices
Data Architecture not Just for MicroservicesEberhard Wolff
 
How to Split Your System into Microservices
How to Split Your System into MicroservicesHow to Split Your System into Microservices
How to Split Your System into MicroservicesEberhard Wolff
 
Microservices and Self-contained System to Scale Agile
Microservices and Self-contained System to Scale AgileMicroservices and Self-contained System to Scale Agile
Microservices and Self-contained System to Scale AgileEberhard Wolff
 
How Small Can Java Microservices Be?
How Small Can Java Microservices Be?How Small Can Java Microservices Be?
How Small Can Java Microservices Be?Eberhard Wolff
 
Data Architecturen Not Just for Microservices
Data Architecturen Not Just for MicroservicesData Architecturen Not Just for Microservices
Data Architecturen Not Just for MicroservicesEberhard Wolff
 
Microservices: Redundancy=Maintainability
Microservices: Redundancy=MaintainabilityMicroservices: Redundancy=Maintainability
Microservices: Redundancy=MaintainabilityEberhard Wolff
 
Self-contained Systems: A Different Approach to Microservices
Self-contained Systems: A Different Approach to MicroservicesSelf-contained Systems: A Different Approach to Microservices
Self-contained Systems: A Different Approach to MicroservicesEberhard Wolff
 
Microservices Technology Stack
Microservices Technology StackMicroservices Technology Stack
Microservices Technology StackEberhard Wolff
 
Software Architecture for Innovation
Software Architecture for InnovationSoftware Architecture for Innovation
Software Architecture for InnovationEberhard Wolff
 
Five (easy?) Steps Towards Continuous Delivery
Five (easy?) Steps Towards Continuous DeliveryFive (easy?) Steps Towards Continuous Delivery
Five (easy?) Steps Towards Continuous DeliveryEberhard Wolff
 
Nanoservices and Microservices with Java
Nanoservices and Microservices with JavaNanoservices and Microservices with Java
Nanoservices and Microservices with JavaEberhard Wolff
 
Microservices: Architecture to Support Agile
Microservices: Architecture to Support AgileMicroservices: Architecture to Support Agile
Microservices: Architecture to Support AgileEberhard Wolff
 
Microservices: Architecture to scale Agile
Microservices: Architecture to scale AgileMicroservices: Architecture to scale Agile
Microservices: Architecture to scale AgileEberhard Wolff
 
Microservices, DevOps, Continuous Delivery – More Than Three Buzzwords
Microservices, DevOps, Continuous Delivery – More Than Three BuzzwordsMicroservices, DevOps, Continuous Delivery – More Than Three Buzzwords
Microservices, DevOps, Continuous Delivery – More Than Three BuzzwordsEberhard Wolff
 

Mais de Eberhard Wolff (20)

Architectures and Alternatives
Architectures and AlternativesArchitectures and Alternatives
Architectures and Alternatives
 
Beyond Microservices
Beyond MicroservicesBeyond Microservices
Beyond Microservices
 
The Frontiers of Continuous Delivery
The Frontiers of Continuous DeliveryThe Frontiers of Continuous Delivery
The Frontiers of Continuous Delivery
 
Four Times Microservices - REST, Kubernetes, UI Integration, Async
Four Times Microservices - REST, Kubernetes, UI Integration, AsyncFour Times Microservices - REST, Kubernetes, UI Integration, Async
Four Times Microservices - REST, Kubernetes, UI Integration, Async
 
Microservices - not just with Java
Microservices - not just with JavaMicroservices - not just with Java
Microservices - not just with Java
 
Deployment - Done Right!
Deployment - Done Right!Deployment - Done Right!
Deployment - Done Right!
 
Data Architecture not Just for Microservices
Data Architecture not Just for MicroservicesData Architecture not Just for Microservices
Data Architecture not Just for Microservices
 
How to Split Your System into Microservices
How to Split Your System into MicroservicesHow to Split Your System into Microservices
How to Split Your System into Microservices
 
Microservices and Self-contained System to Scale Agile
Microservices and Self-contained System to Scale AgileMicroservices and Self-contained System to Scale Agile
Microservices and Self-contained System to Scale Agile
 
How Small Can Java Microservices Be?
How Small Can Java Microservices Be?How Small Can Java Microservices Be?
How Small Can Java Microservices Be?
 
Data Architecturen Not Just for Microservices
Data Architecturen Not Just for MicroservicesData Architecturen Not Just for Microservices
Data Architecturen Not Just for Microservices
 
Microservices: Redundancy=Maintainability
Microservices: Redundancy=MaintainabilityMicroservices: Redundancy=Maintainability
Microservices: Redundancy=Maintainability
 
Self-contained Systems: A Different Approach to Microservices
Self-contained Systems: A Different Approach to MicroservicesSelf-contained Systems: A Different Approach to Microservices
Self-contained Systems: A Different Approach to Microservices
 
Microservices Technology Stack
Microservices Technology StackMicroservices Technology Stack
Microservices Technology Stack
 
Software Architecture for Innovation
Software Architecture for InnovationSoftware Architecture for Innovation
Software Architecture for Innovation
 
Five (easy?) Steps Towards Continuous Delivery
Five (easy?) Steps Towards Continuous DeliveryFive (easy?) Steps Towards Continuous Delivery
Five (easy?) Steps Towards Continuous Delivery
 
Nanoservices and Microservices with Java
Nanoservices and Microservices with JavaNanoservices and Microservices with Java
Nanoservices and Microservices with Java
 
Microservices: Architecture to Support Agile
Microservices: Architecture to Support AgileMicroservices: Architecture to Support Agile
Microservices: Architecture to Support Agile
 
Microservices: Architecture to scale Agile
Microservices: Architecture to scale AgileMicroservices: Architecture to scale Agile
Microservices: Architecture to scale Agile
 
Microservices, DevOps, Continuous Delivery – More Than Three Buzzwords
Microservices, DevOps, Continuous Delivery – More Than Three BuzzwordsMicroservices, DevOps, Continuous Delivery – More Than Three Buzzwords
Microservices, DevOps, Continuous Delivery – More Than Three Buzzwords
 

NoSQL & Architectures Explained for Scaling Applications

  • 1. NoSQL & Architectures. Eberhard Wolff (@ewolff)
  • 2. About me: Eberhard Wolff. Freelance consultant; head of the technology advisory board at adesso; speaker; author. Blog: http://ewolff.com. Twitter: @ewolff
  • 3. Back in the Days…
  • 4. NoSQL Is All About the Persistence Question
  • 5. Key-Value Stores. Map keys to values (e.g. key 42 → "Some data"): just a large, globally available map, i.e. not a very powerful data model. No complex queries or indices, just access by key; a full-text engine might be added. Redis: cache + persistence. Riak: massive scale + Solr queries.
  • 6. Wide Column. Add any "column" you like to a row: key → (column → value). Column families are like tables. E.g. in the "Users" column family: "someuser" → ("username" → "someuser"), ("email" → "someuser@example.com"). Columns are named, so indexing and therefore fast queries are possible. Apache Cassandra, Amazon SimpleDB, Apache HBase: all tuned for large data sets.
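The row layout on slide 6 can be sketched with nested dictionaries. This is only an illustration in plain Python, not the API of Cassandra, SimpleDB or HBase; the names `users` and `put_column`, the "city" column and the "otheruser" row are hypothetical.

```python
# Hypothetical in-memory model of a column family:
# row key -> {column name -> value}.
users = {
    "someuser": {
        "username": "someuser",
        "email": "someuser@example.com",
    },
}


def put_column(column_family, row_key, column, value):
    """Add or overwrite a single column in a row, creating the row if needed."""
    column_family.setdefault(row_key, {})[column] = value


# Rows in the same column family may carry different columns.
put_column(users, "someuser", "city", "Berlin")
put_column(users, "otheruser", "username", "otheruser")

# Because columns are named, rows can be selected by column name.
rows_with_email = [key for key, columns in users.items() if "email" in columns]
```

The point of the sketch is that no schema has to change when a row gains a new column.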
  • 7. Document Stores. Aggregates are typically stored as "documents" (key-value collections). JSON is quite common. No fixed schema. Indexes and queries are possible, e.g. "find all baskets that contain the product 123". Still great horizontal scalability. Relations might be modeled as links. MongoDB, CouchDB.
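The example query from slide 7 ("find all baskets that contain the product 123") can be sketched over plain Python dictionaries standing in for JSON documents. The field names (`items`, `product`, `coupon`) and the function `baskets_containing` are assumptions made for illustration; a real document store would express this in its own query language and could use an index.

```python
# Hypothetical basket documents; note that they do not share a fixed schema.
baskets = [
    {"_id": 1, "customer": "alice",
     "items": [{"product": 123, "qty": 2}, {"product": 7, "qty": 1}]},
    {"_id": 2, "customer": "bob",
     "items": [{"product": 99, "qty": 1}]},
    {"_id": 3, "customer": "carol",
     "items": [{"product": 123, "qty": 1}], "coupon": "SPRING"},
]


def baskets_containing(documents, product_id):
    """Return all basket documents with at least one item for product_id."""
    return [
        doc for doc in documents
        if any(item.get("product") == product_id for item in doc.get("items", []))
    ]


matching_ids = [doc["_id"] for doc in baskets_containing(baskets, 123)]
```

The query reaches into the nested `items` list of each document, which is what distinguishes a document store from pure access by key.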
  • 8. Graph. Nodes with properties; typed relationships with properties. Ideal e.g. to model relations in a social network: easy to find the number of followers, the degree of a relation etc. Hard to scale out. Neo4j.
  • 9. NoSQL Benefits. Costs: scale out instead of scale up; cheap hardware; usually open source. Dev, Ops. Flexibility: schema in code, not in the database; easier to upgrade the schema; easier to handle heterogeneous data. No object/relational impedance mismatch: NoSQL databases are more OO-like.
  • 10. Drivers (diagram labels): Exponential Data Growth; Key Value; Scale Out; Wide Column; Semi-Structured Data; Document; More Connected Data; Graph; Cost; Flexibility.
  • 11. Document-oriented databases are the best NoSQL databases, for at least one definition of "best".
  • 12. Document-oriented databases offer scale out (unless you need huge amounts of data) and a rich and flexible data model (…and queries). Cost, flexibility. Other databases have other sweet spots: huge data sets, graph structures, analyzing data. Niches or mainstream?
  • 13. Financial System. Different financial products. Mapping objects / database. Inheritance.
  • 14. E/R Model (diagram labels): Zero Bond, Stock, Option, Investment, Country, Currency. > 20 database tables; up to 25 attributes.
  • 17. Polyglot Persistence in an Ecommerce Application. Financial data → RDBMS: needs transactions & reports; data fit well in tables. Product catalog → document store: complex document-like data structures and complex queries. Shopping cart → key/value: high performance & scalability, no complex queries. Recommendation → graph: based on friends, their purchases and reviews.
  • 18. The NoSQL Game. The same architecture, scored: RDBMS 0, document store 1000, key/value 900, graph 800. 2700: High Score!
  • 19. Just Like the Patterns Game! Points for each pattern used. Extra points if one class implements multiple patterns.
  • 20. This is not how software architecture works.
  • 21. Why not? More is worse! More hardware. More developer skills (not necessarily bad). More ops trouble: installation, backup, disaster recovery, monitoring, optimizations.
  • 22. But: Polyglot Persistence Has a Point. Object-oriented databases did it wrong: their strategy was to replace the RDBMS. Enterprises will stick to RDBMS. Pure technology migration basically never happens; only vendors think differently.
  • 23. Archive. Classic approach for current data, NoSQL for the archive: current data → RDBMS, archive → document store.
  • 24. Archives for Insurances. Legacy migration. Querying and visualizing data that was not migrated, i.e. old contracts. Legacy hardware and software can be switched off. Flexibility: host data formats. Cost: inexpensive handling of large data volumes.
  • 25. Complex Document Processing System. MongoDB (document-oriented): documents. Redis (key/value, in memory): metadata for quick access. elasticsearch (search engine): search index.
  • 26. Alternative: Only elasticsearch. Stores original documents as well (like a key/value store). Support for complex queries. Very powerful features, also for data mining / analytics. Not well suited for update-heavy operations. Backup / disaster recovery? Written in Java.
  • 27. Scaling elasticsearch (diagram): Shard 1, Shard 2, Shard 3 and Replica 1, Replica 2, Replica 3, distributed across three servers.
  • 28. Alternative: Only MongoDB. Now with (limited, beta) full-text search. Excellent support for updates. Quite fast: memory-mapped files. Also fast for updates. Disaster recovery possible. Map/Reduce support. Written in C++.
  • 29. Scaling MongoDB (diagram): Shard 1 and Shard 2, each with Replica 1, Replica 2 and Replica 3.
  • 30. Scaling MongoDB (diagram): Shard 1, Shard 2 and Shard 3, each with Replica 1, Replica 2 and Replica 3.
  • 31. What about Redis? MongoDB uses memory-mapped files, so why Redis? Like a Swiss Army knife: cache, messaging, central coordination in a distributed environment. Written in C.
  • 32. Scaling Redis. Asynchronous replication built in (diagram: one server with two replicas).
  • 33. Alternative: Riak. Key/value store, but includes Solr for full-text search. What is the difference to a document store then? Map/reduce possible. Written in Erlang. Smart scaling.
  • 34. Scaling Riak (diagram): Shard1 to Shard4, three copies each, spread across Server A, Server B, Server C and Server D.
  • 35. Scaling Riak (diagram): Shard1 to Shard4, three copies each, spread across Server A, Server B, Server C and Server D.
  • 36. Scaling Riak (diagram): the same layout, with a New Server joining Server A, Server B, Server C and Server D.
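Slides 34 to 36 show a new server joining a cluster whose shards are already spread over four servers. The following is a minimal sketch of one way such a rebalancing can work, assuming a fixed number of partitions and ignoring the three copies per shard shown in the diagram. It is not Riak's actual claim algorithm; `NUM_PARTITIONS`, `partition_for`, `initial_owners` and `add_server` are hypothetical names.

```python
import hashlib

NUM_PARTITIONS = 12  # assumed value, fixed when the cluster is created


def partition_for(key):
    """Map a key to a partition with a stable hash."""
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_PARTITIONS


def initial_owners(servers):
    """Assign partitions to servers round-robin."""
    return [servers[p % len(servers)] for p in range(NUM_PARTITIONS)]


def add_server(owners, new_server):
    """Return a new owner list in which new_server takes over a fair share.

    Partitions are taken one at a time from whichever server currently owns
    the most, so all other partitions stay where they are.
    """
    owners = list(owners)
    server_count = len(set(owners)) + 1
    target = NUM_PARTITIONS // server_count
    for _ in range(target):
        counts = {}
        for owner in owners:
            if owner != new_server:
                counts[owner] = counts.get(owner, 0) + 1
        busiest = max(sorted(counts), key=lambda s: counts[s])
        owners[owners.index(busiest)] = new_server
    return owners


before = initial_owners(["A", "B", "C", "D"])
after = add_server(before, "New")
moved = [p for p in range(NUM_PARTITIONS) if before[p] != after[p]]
```

Because keys map to partitions and not directly to servers, only the partitions listed in `moved` have to be copied to the new server; every other key stays where it was.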
  • 37. Key/Value! Document-oriented databases are the best NoSQL databases, for at least one definition of "best".
  • 38. MongoDB, Redis, Riak, elasticsearch: your choice, a trade-off! A typical architecture decision.
  • 39. Data Access: RDBMS. DBA: optimizations (indices, table spaces, …; no need to change code) and data model (schema, stored procedures). Architect/developer: data access (queries, other code).
  • 40. RDBMS separate data from data access. Indices, joins and normalization allow flexible data access patterns.
  • 41. Sacrifice Joins for Scalability. Join: combine tables to retrieve results. Needs transactions spanning multiple tables. Example: customer table + addresses; inserts need locks and consistency across both tables. This limits scalability: global and distributed locks are nasty. Consistency limits either availability or partition tolerance.
  • 42. CAP Theorem. Consistency: all nodes see the same data (not the ACID consistency). Availability: node failures do not prevent survivors from operating. Partition tolerance: the system continues to operate despite arbitrary message loss. Can have at most two of C, A and P. Or rather: if the network fails, choose A or C.
  • 44. BASE. Basically Available, Soft state, Eventually consistent, i.e. trade consistency for availability. A pun on ACID… not the same C, however!
  • 45. BASE: eventually consistent. If no updates are sent for a while, all previous updates will eventually propagate through the system; then all replicas are consistent. Can deal with network partitioning: messages will be transferred later. All replicas are always available. A pun on ACID… not the same C, however!
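The behaviour described on slide 45 can be sketched as a small simulation. It assumes a single global version counter and a "highest version wins" rule, which real systems replace with more careful conflict handling; the classes `Replica` and `Cluster` are hypothetical and do not model any particular database.

```python
class Replica:
    """A replica that stores key -> (version, value) and keeps the newest version."""

    def __init__(self):
        self.data = {}

    def apply(self, key, version, value):
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


class Cluster:
    """Writes reach one replica immediately and the others only on deliver()."""

    def __init__(self, size):
        self.replicas = [Replica() for _ in range(size)]
        self.pending = []  # undelivered (replica index, key, version, value)
        self.version = 0   # simplification: one global version counter

    def write(self, replica_index, key, value):
        self.version += 1
        self.replicas[replica_index].apply(key, self.version, value)
        for i in range(len(self.replicas)):
            if i != replica_index:
                self.pending.append((i, key, self.version, value))

    def deliver(self):
        """Simulate the network catching up: propagate all queued updates."""
        for i, key, version, value in self.pending:
            self.replicas[i].apply(key, version, value)
        self.pending = []


cluster = Cluster(3)
cluster.write(0, "balance", 100)
cluster.write(1, "balance", 80)
stale_reads = [r.read("balance") for r in cluster.replicas]
cluster.deliver()
converged_reads = [r.read("balance") for r in cluster.replicas]
```

Before `deliver()` every replica answers, but with different values; once the queued messages have been transferred, all replicas agree.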
  • 46. Banking is BASE. ATMs relax rules on providing cash if the network is partitioned. Your account is only guaranteed to be consistent by the end of the year.
  • 47. No Joins: What Now? Customer and addresses must be consistent! Solution: store both as one entity; atomic changes are then easily possible. Queries might be distributed across multiple nodes. "NoSQL does not support transactions / ACID" is wrong; "NoSQL does not support joins" is better. Atomic changes are still possible; schema design is different.
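The solution on slide 47, storing customer and addresses as one entity, can be sketched with a tiny in-memory store. `AggregateStore` and the field names are hypothetical; the sketch only shows that a single write replaces the whole aggregate, and it does not protect against two clients doing a read-modify-write at the same time.

```python
import copy
import threading


class AggregateStore:
    """Hypothetical store that reads and writes whole entities under one key."""

    def __init__(self):
        self._entities = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._entities[key])

    def put(self, key, entity):
        # The whole entity is swapped in one step, so readers never see
        # a customer with only part of an update applied.
        with self._lock:
            self._entities[key] = copy.deepcopy(entity)


store = AggregateStore()
store.put("customer:1", {
    "name": "Jane Doe",
    "addresses": [{"type": "billing", "city": "Berlin"}],
})

# Change the name and add an address in a single write of the aggregate.
customer = store.get("customer:1")
customer["name"] = "Jane Smith"
customer["addresses"].append({"type": "shipping", "city": "Hamburg"})
store.put("customer:1", customer)

stored = store.get("customer:1")
```

With two tables, the same change would touch the customer row and the address rows and would need a transaction spanning both; here there is only one entity to replace.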
  • 48. Data Access: MongoDB. DBA: optimizations (only basic indices; other optimizations must be done in code). Architect/developer: data model (influences access patterns) and data access (write concerns: how much do you love your data?; shard key; consistency).
  • 49. Cluster: RDBMS. Transparent to developers. How many nodes? A special setup of hardware and RDBMS software. Responsibility: DBA.
  • 50. Cluster: MongoDB. CAP theorem: if the network is down, choose consistency xor availability. Deals with replication: MongoDB has master/slave replication. Write concerns: unacknowledged, acknowledged, journaled, some nodes in the replica set. Queries might go to the master only or also to slaves; this influences consistency. Responsibility: architect/developer.
  • 51. More Power and More Responsibility. Architect, DB admin.
  • 52. Architects. Architecture has always been a multidimensional problem. Need to choose the persistence technology. Need to think about operations. Need to do DBA work.
  • 53. NoSQL Is All About the Persistence Question