Abdelmonaim Remani | Just.me Inc.


The Rise of NoSQL and Polyglot Persistence
About Me
• Software Architect at Just.me Inc.
• Interested in technology evangelism and enterprise software
  development and architecture
• Frequent speaker (JavaOne, JAX, OSCON, ORDEV, etc…)
• Open-source advocate
• President and founder of a number of user groups
   – NorCal Java User Group
   – The Silicon Valley Spring User Group
   – The Silicon Valley Dart Meetup
• Bio:         http://about.me/PolymathicCoder
• Twitter:     @PolymathicCoder
• Email:       abdelmonaim.remani@gmail.com
License




• Creative Commons Attribution Non-Commercial 3.0 Unported
   – http://creativecommons.org/licenses/by-nc/3.0


• Disclaimer: The graphics and the logo in the presentation
  belong to their rightful owners
The Golden Age of Relational Databases
Relational Data Stores
• Relational Data Stores have been the
  predominant choice for storing data
  – The existence of mature solutions
    • Oracle, MySQL, MS SQL Server, etc.
  – Wide adoption and familiarity
    • Developers and even advanced business users
  – An abundance of tools
  – Etc.
• They became the de facto standard
The Relational Model
• Data
  – Stored in
     • Two-dimensional tables (relations)
     • Rows (tuples) and columns (attributes)
• Has a well-defined, enforced schema
   – The relations themselves
   – Integrity constraints
• Normalization
  – Smaller tables with well-defined relationships
    between them
  – Why?
      • Minimal redundancy
      • No modification anomalies
          – Changes propagate via cascading
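To make normalization and cascading concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers`/`orders` schema is hypothetical; it stores each customer's name in exactly one place and declares the foreign key `ON UPDATE CASCADE ON DELETE CASCADE`, so changes propagate to dependent rows instead of being copied by hand.

```python
import sqlite3


def build_store() -> sqlite3.Connection:
    """Create a small, hypothetical normalized schema in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    # Customer details are stored once; orders refer to them by key.
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL REFERENCES customers(id)"
        " ON UPDATE CASCADE ON DELETE CASCADE,"
        " total REAL NOT NULL)"
    )
    conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)", [(10, 1, 9.5), (11, 1, 20.0)]
    )
    return conn


conn = build_store()
# Renaming the customer touches one row; no copies of the name exist elsewhere.
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
# Changing the key cascades to every dependent order.
conn.execute("UPDATE customers SET id = 2 WHERE id = 1")
print(conn.execute("SELECT customer_id FROM orders ORDER BY id").fetchall())  # [(2,), (2,)]
# Deleting the customer cascades to its orders.
conn.execute("DELETE FROM customers WHERE id = 2")
print(conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0])  # 0
```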
The Relational Model
• Supported by SQL (Structured Query
  Language)
  – A somewhat standardized query language
  – Very flexible
  – Many operations
    • Across multiple relations, such as JOIN
    • Aggregations, such as GROUP BY
    • Etc.
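A small sketch of the two kinds of operations listed above, again with Python's `sqlite3` and made-up `authors`/`books` tables: a JOIN across two relations and a GROUP BY aggregation, both stated declaratively without saying how the data should be accessed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, pages INTEGER);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO books VALUES (1, 1, 300), (2, 1, 200), (3, 2, 450);
""")

# JOIN combines the two relations; GROUP BY aggregates per author.
rows = conn.execute("""
    SELECT a.name, COUNT(b.id) AS books, SUM(b.pages) AS pages
    FROM authors AS a
    JOIN books AS b ON b.author_id = a.id
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()
print(rows)  # [('Ann', 2, 500), ('Ben', 1, 450)]
```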
The Relational Model
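• Transactional (ACID)
  – Atomicity
      » All or nothing
  – Consistency
      » Every transaction moves the database from one valid state to another
  – Isolation
      » Concurrent transactions leave the database in a valid state, as if run one at a time
  – Durability
      » Once committed, a change survives crashes and restarts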
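Atomicity can be seen in a toy bank transfer. The sketch below assumes a hypothetical `accounts` table whose `CHECK` constraint forbids negative balances; a Python `sqlite3` connection used as a context manager commits the transaction on success and rolls it back on an exception, so a failed debit also undoes the credit that preceded it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint makes an overdraft an integrity violation.
conn.execute(
    "CREATE TABLE accounts (id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates commit together or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit visibly has to undo real work.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK failed and the earlier credit was rolled back


print(transfer(conn, "alice", "bob", 30))   # True
print(transfer(conn, "alice", "bob", 500))  # False
print(dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id")))
# {'alice': 70, 'bob': 30}
```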
The Relational Model
• Designed with the assumptions that
 – The end user will interact directly with the database
   » So it makes sense for the RDBMS to manage concurrency
     and integrity
   » Access patterns are unknown
     » Hence a flexible query language that is close to English
     » And a data structure with no bias towards a particular
       pattern of querying
 – The database runs on a single machine
   » The only way to promise true ACID
Road Bumps
• We started building more complex applications on top
  of relational databases
 – Business logic moved out of the RDBMS
   » Fewer triggers and stored procedures, replaced by
     equivalent application-layer code
 – The applications themselves evolved beyond the procedural
   paradigm to a more OOP approach
   » The object-relational impedance mismatch
     » ORM frameworks to the rescue
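The impedance mismatch shows up as soon as an object holds a collection. In this sketch (hypothetical `Order` class and table names, plain `sqlite3`), one object becomes rows in two tables on the way in and must be reassembled on the way out; this hand-written mapping code is the kind of work ORM frameworks generate.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    id: int
    # In the object model an order simply has a list of line items...
    lines: List[str] = field(default_factory=list)


def create_schema(conn: sqlite3.Connection) -> None:
    # ...but a relation holds only flat rows, so the list needs its own table.
    conn.executescript(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY);"
        "CREATE TABLE order_lines (order_id INTEGER REFERENCES orders(id),"
        " pos INTEGER, item TEXT, PRIMARY KEY (order_id, pos));"
    )


def save(conn: sqlite3.Connection, order: Order) -> None:
    """Flatten the object into rows (the work an ORM would generate for you)."""
    conn.execute("INSERT INTO orders (id) VALUES (?)", (order.id,))
    conn.executemany(
        "INSERT INTO order_lines (order_id, pos, item) VALUES (?, ?, ?)",
        [(order.id, pos, item) for pos, item in enumerate(order.lines)],
    )


def load(conn: sqlite3.Connection, order_id: int) -> Order:
    """Reassemble the object from rows, keeping list order via the pos column."""
    rows = conn.execute(
        "SELECT item FROM order_lines WHERE order_id = ? ORDER BY pos", (order_id,)
    ).fetchall()
    return Order(order_id, [item for (item,) in rows])


conn = sqlite3.connect(":memory:")
create_schema(conn)
save(conn, Order(1, ["book", "pen"]))
print(load(conn, 1))  # Order(id=1, lines=['book', 'pen'])
```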
Scalability
We became data hoarders!
• As our datasets grew out of control
• Performance degraded sharply
  – We bought beefier machines
     • Larry Ellison’s most expensive Oracle RAC, making
       him even richer
• This only postponed the problem for a little while
Optimization
• We hire a guy
  – He indexes half of the database
     • Makes those queries a little faster
  – He creates materialized views for complex joins
     • Nightmare to maintain, they go stale, etc.
  – He denormalizes
     • Anything but a smooth transition!
     • Redundancy
  – He introduces caching
     • Data goes stale
     • More redundancy
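Why caching adds both staleness and redundancy can be shown with a minimal read-through cache with a time-to-live (TTL). `TTLCache` is a hypothetical helper, the dictionary `db` stands in for the database, and the clock is injected so the example is deterministic.

```python
import time


class TTLCache:
    """Hypothetical read-through cache that keeps each value for ttl_seconds."""

    def __init__(self, loader, ttl_seconds, clock=time.monotonic):
        self.loader = loader    # called on a miss to fetch from the database
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the example is deterministic
        self.entries = {}       # key -> (value, expires_at)

    def get(self, key):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now < hit[1]:
            return hit[0]  # served from cache, even if the database changed
        value = self.loader(key)
        self.entries[key] = (value, now + self.ttl)
        return value


db = {"price:42": 10}   # stands in for the database
fake_now = [0.0]
cache = TTLCache(db.__getitem__, ttl_seconds=60, clock=lambda: fake_now[0])

print(cache.get("price:42"))  # 10 (loaded from the database)
db["price:42"] = 12           # the source of truth changes
print(cache.get("price:42"))  # 10 (stale copy until the entry expires)
fake_now[0] = 61.0
print(cache.get("price:42"))  # 12 (expired, reloaded)
```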
Clustering
• We hire another guy
   – He tells us that we hit the limit of the one machine
   – We need to scale out (horizontally)
      • Master/Slave
          – Assumes you read more than you write
          – Write to the master and read from the slaves
          – The master needs to replicate data to the slaves
              » Risk of stale reads
          – How’s that consistent?!!
      • Sharding
          – Improves writes as well as reads
          – Can’t join across partitions
          – No referential integrity
          – Requires modification of client applications
          – Introduces a single point of failure
          – How’s that consistent?!!
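A minimal sketch of client-side, hash-based sharding, assuming a hypothetical `ShardRouter` with plain dictionaries standing in for database servers. Single-key operations go to exactly one shard, while any other query has to be scattered to every shard and gathered in application code, which is where cross-partition joins and referential integrity are lost.

```python
import hashlib


class ShardRouter:
    """Hypothetical client-side router that spreads keys over N shards by hash."""

    def __init__(self, shard_count: int):
        # Each dict stands in for a separate database server.
        self.shards = [dict() for _ in range(shard_count)]

    def shard_for(self, key: str) -> int:
        # A stable hash (Python's built-in hash() of str is randomized per process).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value: dict) -> None:
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str):
        # Single-key reads and writes touch exactly one shard.
        return self.shards[self.shard_for(key)].get(key)

    def scan(self, predicate):
        # Anything not keyed by the shard key must be scattered to every shard
        # and gathered in application code: no cross-shard JOIN or constraint.
        return [v for shard in self.shards for v in shard.values() if predicate(v)]


router = ShardRouter(shard_count=4)
for i in range(100):
    router.put(f"user:{i}", {"id": i, "country": "MA" if i % 2 else "US"})
print(router.get("user:42"))  # {'id': 42, 'country': 'US'}
print(len(router.scan(lambda u: u["country"] == "MA")))  # 50
```

Because the shard is `hash % shard_count`, changing the number of shards remaps most keys, one reason sharding leaks into client applications.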
What’s the Point?
• We horizontally scale our relational
  database
  – We’re no longer consistent
  – No ACIDity?
  – We lose query flexibility
• Are we doing something wrong?
The CAP Theorem
The CAP Theorem
• Eric Brewer on distributed systems
  – Pick two out of
    • Consistency
    • Availability
    • Partition Tolerance
• Like the classic “Fast, Cheap, Good” service triangle
  – Cheap, good service won’t be fast
  – Fast, good service won’t be cheap
  – Fast, cheap service won’t be good
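The trade-off can be illustrated with a toy two-replica register. `TwoNodeCluster` is hypothetical and does not model any real database: while the replicas are partitioned, a `"CP"` cluster refuses requests rather than risk divergence, and an `"AP"` cluster keeps answering but lets the replicas disagree. A CA system corresponds to assuming the partition never happens, as on a single machine.

```python
class TwoNodeCluster:
    """Toy two-replica register showing the choice CAP forces during a partition."""

    def __init__(self, mode: str):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"a": 0, "b": 0}
        self.partitioned = False

    def write(self, node: str, value: int) -> None:
        other = "b" if node == "a" else "a"
        if not self.partitioned:
            # Healthy network: replicate synchronously, so both replicas agree.
            self.values[node] = self.values[other] = value
        elif self.mode == "CP":
            # Consistency first: refuse rather than let the replicas diverge.
            raise RuntimeError("unavailable: cannot replicate during partition")
        else:
            # Availability first: accept locally; the other replica is now stale.
            self.values[node] = value

    def read(self, node: str) -> int:
        if self.partitioned and self.mode == "CP":
            raise RuntimeError("unavailable: cannot confirm the latest value")
        return self.values[node]


ap = TwoNodeCluster("AP")
ap.partitioned = True
ap.write("a", 5)
print(ap.read("a"), ap.read("b"))  # 5 0  (available, but inconsistent)

cp = TwoNodeCluster("CP")
cp.partitioned = True
try:
    cp.write("a", 5)
except RuntimeError as err:
    print(err)  # unavailable: cannot replicate during partition
```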
Relational Model & CAP
• Relational Data Stores happen to favor
  – Consistency and Availability
  – For historical reasons
     • They are key to certain types of applications
     • The bank example
        – I deposit $100 in my friend’s bank account
        – Blah blah blah…
• According to CAP, a CA store gives up Partition
  Tolerance, meaning it cannot scale horizontally
  without sacrificing consistency or availability
Damn!
• We’re in a pickle
  – Too much data for a CA model
  – Vertical Scaling
     • Too expensive
     • Not sustainable
• Forced to explore other alternatives in
  light of CAP
What AP Looks Like
• Partition Tolerance
  – Since we reached the limit of the one machine,
    we have no choice but to scale horizontally
  – Which means being partition tolerant
• Availability
  – Nobody is willing to give it up, most of the time
  – It gets even better with distribution
  – In a cluster of servers
     • An individual node might be unreliable by itself
     • But the cluster as a whole is inherently reliable
What AP Looks Like
• According to CAP, we then simply cannot have C
• Consistency
  – I make an update and all subsequent reads return the
    most recent value
  – Unfortunately, this cannot be guaranteed, as it takes time
    for the change to be replicated to every node of
    the cluster
• What a bummer!
• Let’s look at an AP system
  – DNS (Domain Name System)
     • Not all nodes have the most recent records (you
       register a domain name and wait a few days until
       every DNS server knows about it)
Eventual Consistency
• This is not so bad
   – It means that we just settled for a weaker degree of
     consistency
• So what if
   – Mohammad in Morocco updated his relationship status
     to single on some edge node
   – His cousin who lives in Spain saw it immediately because
     they happen to be on the same edge node
   – His secret admirer Sara, who lives in the United States,
     could not see it until an hour later
   – His brother in Japan got the update the next day
   – They all got it eventually!
• Eventual consistency as opposed to immediate
  consistency
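A toy model of the scenario above: a write is acknowledged by one replica immediately and queued for the others, and `propagate()` stands in for replication lag. The class and replica names are hypothetical; real systems also need a rule for resolving concurrent writes to the same key (the Dynamo paper, for instance, uses vector clocks), which this sketch omits.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy model: a write lands on one replica and is queued for the others."""

    def __init__(self, replica_names):
        self.replicas = {name: {} for name in replica_names}
        self.pending = deque()  # (target replica, key, value), in arrival order

    def write(self, at, key, value):
        self.replicas[at][key] = value  # acknowledged locally, right away
        for name in self.replicas:
            if name != at:
                self.pending.append((name, key, value))

    def read(self, at, key):
        return self.replicas[at].get(key)  # may be stale

    def propagate(self, steps=1):
        """Deliver up to `steps` queued updates, simulating replication lag."""
        for _ in range(min(steps, len(self.pending))):
            name, key, value = self.pending.popleft()
            self.replicas[name][key] = value


store = EventuallyConsistentStore(["morocco", "us", "japan"])
store.write("morocco", "mohammad:status", "single")
print(store.read("morocco", "mohammad:status"))  # single
print(store.read("us", "mohammad:status"))       # None (not replicated yet)
store.propagate(steps=1)                         # the US replica catches up first
print(store.read("us", "mohammad:status"))       # single
print(store.read("japan", "mohammad:status"))    # None
store.propagate(steps=10)
print(store.read("japan", "mohammad:status"))    # single
```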
The Compromise
• We settle for a weaker consistency model
  – BASE
    • Basically Available
    • Soft state
    • Eventual consistency
• ACID on the individual node, BASE across
  the cluster
The Slippery Slope of the Faithless
You might as well Question…
• Schema
 – Logical
   • Well-defined and rigid in relational databases
   • Why not a flexible one, or even no schema at all?
 – Physical
   • B-trees in most relational databases
   • Why not use some other underlying data
     structure?
You might as well Question…
• Integrity Constraints
  – Who cares?
• A Query Language
  – Anything would do…
• Security
  – None
• Name it…
NoSQL: Going Rogue…
NoSQL
• A wide range of specialized data stores
  with the goal of addressing the challenges
  of the relational model
• Eric Evans
  – The whole point of seeking alternatives is that
    you need to solve a problem that relational
    databases are a bad fit for
• Let me make it easier
  – It is not anti-SQL or anti-relational
  – It is any data store that is non-relational
• “Not Only SQL” instead of “No SQL”
SQL                  vs.   NoSQL
A single machine           A cluster
CA                         AP/CA/CP
Scale vertically           Scale horizontally
SQL                        Custom APIs
ACID                       BASE
Full indexes               Mostly on keys

There are outliers, of course.
SQL                  vs.   NoSQL
Rigid schema               Schema-less
Flexible queries           Pre-defined queries

• SQL (Relational)
  – Concerned with what the data consists of
• NoSQL (Non-Relational)
  – Concerned with how the data is queried

There are outliers, of course.
The Zoo
Key-Value Data Stores
• Basically a big hash map (associative array)
   – Very simple
   – Very fast reads and writes
   – No secondary indexes
• Use when
   – Your data is not highly related
   – All you need is basic CRUD
• Challenges
   – Complex queries
• Check out the Amazon Dynamo paper
       • http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf
• Featured Projects
   – DynamoDB http://aws.amazon.com/dynamodb/
   – Riak http://wiki.basho.com/
   – Redis http://redis.io/
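One technique described in the Dynamo paper is partitioning keys across nodes with consistent hashing. The sketch below is a toy in-memory ring with virtual nodes, not the API of DynamoDB, Riak, or Redis; the class and node names are hypothetical. It shows that adding a node moves only the keys the new node takes over.

```python
import bisect
import hashlib


def ring_position(value: str) -> int:
    """Map a string to a stable 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class ConsistentHashRing:
    """Toy consistent-hashing ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._ring = []  # sorted list of (position, node name)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each physical node owns several points on the ring to even out load.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (ring_position(f"{node}#{i}"), node))

    def remove(self, node):
        self._ring = [(pos, n) for (pos, n) in self._ring if n != node]

    def node_for(self, key):
        # The owner is the first virtual node clockwise from the key's position.
        idx = bisect.bisect_left(self._ring, (ring_position(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
keys = [f"cart:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}
ring.add("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
# Every key that moved now belongs to the new node; nothing else was reshuffled.
print(all(ring.node_for(k) == "node-d" for k in moved))  # True
```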
Columnar Stores
•   In a table, data of the same column is stored together
     – Storage is not wasted on null values as in row-based stores (RDBMS)
     – Great for sparse tables
     – Very fast column operations, including aggregation
•   Use when
     – Big Data (excellent leverage of MapReduce)
     – You need compression or versioning
•   Challenges
     – You had better know your access patterns beforehand
     – Key design is not trivial
•   Check out Google’s BigTable paper
     – http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf
•   Featured Projects
     – HBase http://hbase.apache.org/
     – Cassandra http://cassandra.apache.org/
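A toy illustration of the column-oriented layout, assuming hypothetical sparse rows: pivoting into per-column maps means absent cells take no space and an aggregate scans a single column. This shows only the layout idea, not how HBase or Cassandra actually store data on disk.

```python
# Hypothetical sparse rows: not every row has every column.
rows = [
    {"id": 1, "country": "MA", "age": 31},
    {"id": 2, "country": "ES"},
    {"id": 3, "age": 45},
]


def to_columns(rows):
    """Pivot rows into one map per column (row id -> value).

    Missing cells are simply absent, so no space is spent on NULLs.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            if name != "id":
                columns.setdefault(name, {})[row["id"]] = value
    return columns


columns = to_columns(rows)
print(columns)  # {'country': {1: 'MA', 2: 'ES'}, 'age': {1: 31, 3: 45}}

# An aggregate over one column reads only that column's values.
ages = list(columns["age"].values())
print(sum(ages) / len(ages))  # 38.0
```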
Document Data Stores
•   Nested structures of hashes (maps of keys to values)
     – A document can be
          •   Simply a key and its value
          •   A key with another document as its value
          •   Nested to any depth
     –   Very flexible schema
     –   Well-indexed data
     –   Works well with OOP (no impedance mismatch)
     –   Denormalization as a best practice
•   Use when
     – You don’t know much about the schema
     – The schema is very likely to change
•   Challenges
     – Complex join-like queries
     – Self-referencing documents and circular dependencies
•   Featured Projects
     – MongoDB http://www.mongodb.org/
     – CouchDB http://couchdb.apache.org/
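A sketch of a denormalized document using plain Python dictionaries and lists. The `post` document and the `get_path` helper are hypothetical; the dotted paths are in the spirit of the dot notation that document stores such as MongoDB use for nested fields, but this is not a MongoDB API.

```python
# A hypothetical, denormalized blog-post document: author info and comments
# are embedded in the post instead of living in separate joined tables.
post = {
    "_id": "post-1",
    "title": "The Rise of NoSQL",
    "author": {"name": "Ada", "handle": "@ada"},
    "tags": ["nosql", "polyglot"],
    "comments": [
        {"by": "Bob", "text": "Nice!", "replies": [{"by": "Ada", "text": "Thanks"}]},
    ],
}


def get_path(doc, path):
    """Follow a dotted path such as 'author.name' or 'comments.0.text'.

    Numeric segments index into lists; other segments look up dict keys.
    """
    current = doc
    for part in path.split("."):
        current = current[int(part)] if isinstance(current, list) else current[part]
    return current


print(get_path(post, "author.name"))                # Ada
print(get_path(post, "comments.0.replies.0.text"))  # Thanks

# Flexible schema: adding a field needs no migration, and other documents
# in the same collection are not required to have it.
post["edited"] = True
```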
Graph Data Stores
• A graph
   –   Perfect for highly interconnected data
   –   Allows for explicit relationships
   –   Fine-grained graph traversal
   –   Very flexible
   –   Works well with OOP (no impedance mismatch)
• Use when
   – Your data looks like a graph and requires graph queries
   – You are smart enough not to try this on another data store
• Challenges
   – Doesn’t scale well horizontally
• Featured Projects
   – Neo4j http://neo4j.org/
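Traversal is the operation graph stores are built around. This sketch uses a hypothetical adjacency-list graph and a breadth-first search to answer "who is within N hops of this person?"; in a relational schema each extra hop typically means another self-join. It is not Neo4j's API or its Cypher query language.

```python
from collections import deque

# Hypothetical social graph as an adjacency list: each edge is an explicit relationship.
follows = {
    "mohammad": {"sara", "karim"},
    "sara": {"lina"},
    "karim": {"lina", "yuki"},
    "lina": set(),
    "yuki": {"mohammad"},
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal returning every node reachable in 1..max_hops edges."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = set()
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                found.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return found


print(sorted(within_hops(follows, "mohammad", 1)))  # ['karim', 'sara']
print(sorted(within_hops(follows, "mohammad", 2)))  # ['karim', 'lina', 'sara', 'yuki']
```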
Relational Data Stores
• Use when
   – Your data is highly relational
   – There is a need to break data into small pieces and
     assemble it in different ways
   – Consistency is king
   – Access patterns are unknown
   – Reporting
• Challenges
   – Doesn’t scale well horizontally
• Featured Projects
   –   Oracle http://www.oracle.com/index.html
   –   Postgres http://www.postgresql.org/
   –   MS SQL Server http://www.microsoft.com/sqlserver
   –   MySQL http://www.mysql.com/
How do you choose?
If It Doesn’t Fit, You Must Acquit!
• Data
  –   Does it have a natural structure?
  –   How is it connected?
  –   How is it distributed?
  –   How much of it is there?
• Access Patterns
  – Read/write ratio?
  – Uniform or random?
• CAP
Other Considerations
•   Maturity
•   Stability
•   Maintainability
•   Durability
•   Cost
•   Tools
•   Familiarity
For Fairness’ Sake!
For Fairness’ Sake!
• Relational data stores did not fail us
  – They actually perform very well
• We failed ourselves
  – By using them as solutions for problems
    they weren’t designed to solve to begin
    with
• Misuse any data store and you’ll get just as
  much trouble
For Fairness’ Sake!
• You can’t expect
  – A flathead screwdriver to work on a Phillips screw
    as well as one with the matching Phillips
    tip
  – A crosshead screwdriver to work on a
    flathead screw
Polyglot Persistence
Polyglot Persistence
• Enterprise applications are complex and
  combine many different problems
  – The assumption that we should use one data store is
    absurd
  – You can’t fit everything into one model and expect no
    problems
• Polyglot Persistence
  – Using multiple data stores, chosen based on the
    way data is used by the application
     • Comes with a learning curve
     • Long-term investment (more productive in the long run)
  – Leverages the strengths of multiple data stores
Polyglot Persistence
• Example
  –   MongoDB for the product catalog
  –   Redis for the shopping cart
  –   DynamoDB for social profile info
  –   Neo4j for the social graph
  –   HBase for inbox and public feed messages
  –   MySQL for payment and account info
  –   Cassandra for audit and activity log
• Disclaimer: I’m not making any
  recommendation here.
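A minimal sketch of polyglot persistence at the application layer, loosely following the example above. All classes are hypothetical, and in-memory stand-ins replace real Redis, MongoDB, or Cassandra clients; the point is only that the service routes each kind of data to the store whose model fits how that data is used.

```python
class InMemoryKV:
    """Stand-in for a key-value or document store (no real client is used)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class InMemoryLog:
    """Stand-in for an append-only, write-heavy store such as an audit log."""

    def __init__(self):
        self.entries = []

    def append(self, entry):
        self.entries.append(entry)


class ShopService:
    """Hypothetical service that sends each kind of data to its own store."""

    def __init__(self, carts, catalog, audit):
        self.carts = carts      # short-lived, key-addressed data (the Redis role)
        self.catalog = catalog  # flexible product documents (the MongoDB role)
        self.audit = audit      # append-only activity history (the Cassandra role)

    def add_to_cart(self, session_id, sku):
        if self.catalog.get(sku) is None:
            raise KeyError(f"unknown product {sku}")
        cart = self.carts.get(session_id) or []
        self.carts.put(session_id, cart + [sku])
        self.audit.append(("add_to_cart", session_id, sku))


catalog = InMemoryKV()
catalog.put("sku-1", {"name": "Mug", "attributes": {"color": "blue"}})
service = ShopService(carts=InMemoryKV(), catalog=catalog, audit=InMemoryLog())
service.add_to_cart("session-9", "sku-1")
service.add_to_cart("session-9", "sku-1")
print(service.carts.get("session-9"))  # ['sku-1', 'sku-1']
print(len(service.audit.entries))      # 2
```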
NoSQL in the Cloud
NoSQL in the Cloud
• NoSQL as a commodity
  – Fully managed data stores (No
    maintenance)
  – Elastic scaling
  – Cheap storage
• Featured:
  – Amazon AWS
  – Heroku Add-ons
  – Cloud Foundry
As Promised!
The A’s to the Q’s in the Abstract
• What does the rise of all these NoSQL data stores mean
  to my enterprise?
   – I’m guessing a lot
• What is NoSQL to begin with?
   – Any non-relational data store
• Does it mean “NO SQL”?
   – No
• Could this be just another fad?
   – I don’t think so
The A’s to the Q’s in the Abstract
• Is it a good idea to bet the future of my
  enterprise on these new exotic
  technologies and simply abandon
  proven, mature RDBMSs?
  – It’s up to you. I will say “No guts, no glory!”
• How scalable is scalable?
  – However much you need it to be
The A’s to the Q’s in the Abstract
• Assuming that I am sold, how do I
  choose the one that fits my needs the
  best?
  – I’ll tell you if you hire me
• Is there a middle ground somewhere?
  – Polyglot Persistence
• What is this Polyglot Persistence I hear
  about?
  – It’s the middle ground
Any Other Questions?
Thank You All!

@PolymathicCoder

What is DBT - The Ultimate Data Build Tool.pdf
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 

The Rise of NoSQL and Polyglot Persistence

  • 1. Abdelmonaim Remani | Just.me Inc. The Rise of NoSQL and Polyglot Persistence
  • 2. About Me • Software Architect at Just.me Inc. • Interested in technology evangelism and enterprise software development and architecture • Frequent speaker (JavaOne, JAX, OSCON, ORDEV, etc…) • Open-source advocate • President and founder of a number of user groups – NorCal Java User Group – The Silicon Valley Spring User Group – The Silicon Valley Dart Meetup • Bio: http://about.me/PolymathicCoder • Twitter: @PolymathicCoder • Email: abdelmonaim.remani@gmail.com
  • 3. License • Creative Commons Attribution Non-Commercial 3.0 Unported – http://creativecommons.org/licenses/by-nc/3.0 • Disclaimer: The graphics and the logo in the presentation belong to their rightful owners
  • 4. The Golden Age of Relational Databases
  • 5. Relational Data Stores • Relational data stores have been the predominant choice for storing data – The existence of mature solutions • Oracle, MySQL, MS SQL Server, etc… – Wide adoption and familiarity • Developers and even advanced business users – An abundance of tools – Etc… • They became the de facto standard
  • 6. The Relational Model • Data – Stored in • Two-dimensional tables (relations) • Rows (tuples) and columns (attributes) • Has a well-defined, enforced schema – The relations themselves – Integrity constraints • Normalization – Smaller tables with well-defined relationships between them – Why? • Minimized redundancy • No modification anomalies – Modification propagation or cascading
  • 7. The Relational Model • Supported by SQL (Structured Query Language) – A somewhat standardized query language – Very flexible – Many Operations • Across multiple relations such as JOIN • Aggregations such as GROUP BY • Etc…
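To make normalization, JOIN, and GROUP BY concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. The `customers`/`orders` schema and its rows are hypothetical, chosen only for illustration: each customer's name is stored once, and the query reassembles the pieces.

```python
import sqlite3

# Hypothetical normalized schema: customers and orders live in separate
# tables linked by a foreign key, so each customer's name is stored once.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    );
    INSERT INTO customers (id, name) VALUES (1, 'Ada'), (2, 'Linus');
    INSERT INTO orders (customer_id, amount) VALUES (1, 10.0), (1, 15.0), (2, 7.5);
""")

# JOIN reassembles the normalized pieces; GROUP BY aggregates per customer.
rows = conn.execute("""
    SELECT c.name, COUNT(o.id) AS order_count, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()

for name, order_count, total in rows:
    print(name, order_count, total)
```

Because the query is declarative, the same two tables can answer questions nobody anticipated when the schema was designed, which is the "access patterns are unknown" assumption at work.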
  • 8. The Relational Model • Transactional • ACID – Atomicity » All or nothing – Consistency » From one valid state to another – Isolation » Concurrent transactions result in a valid state – Durability » Once committed, it’s forever
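Atomicity ("all or nothing") is easiest to see when a transaction fails partway through. The sketch below, again using `sqlite3` in memory, moves money between two hypothetical accounts; a `CHECK` constraint forbids negative balances, so the second transfer fails after its credit has already been applied, and the rollback undoes that credit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit leaves a half-done transfer to undo.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft


ok = transfer(conn, "alice", "bob", 100)     # succeeds: alice 0, bob 100
failed = transfer(conn, "alice", "bob", 50)  # alice would go negative
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

Bob's intermediate balance of 150 is never visible after the failure: the database, not the application, guarantees the all-or-nothing outcome on a single machine.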
  • 9. The Relational Model • Designed with the assumptions that – The end user will interact directly with the database » It makes sense that the RDBMS should manage concurrency and integrity » Access patterns are unknown » A flexible query language that is close to English » A data structure with no bias towards a particular pattern of querying – The database runs on a single machine » The only way to promise true ACID
  • 10. Road Bumps • We started building more complex applications on top of relational databases – Business logic moved out of the RDBMS » Fewer triggers and stored procedures, replaced by equivalent application-layer code – The applications themselves evolved beyond the procedural paradigm to a more OOP approach » The object-relational impedance mismatch » ORM frameworks to the rescue
  • 12. We became data hoarders! • As our datasets grow out of control • Performance degrades dramatically – We buy beefier machines • Larry Ellison’s most expensive RAC, making him even richer • This puts off the problem for a little while
  • 13. Optimization • We hire a guy – He indexes half of the database • Made those queries a little faster – He creates materialized views for complex joins • Nightmare to maintain, they get stale, etc… – He de-normalizes • Anything but a smooth transition! • Redundancy – He introduces caching • Data too stale • More redundancy
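The "data too stale" problem with caching can be shown in a few lines. This is a minimal sketch of a read-through cache with a time-to-live (TTL), assuming a plain dict stands in for the database; the `TTLCache` class and the injectable clock are hypothetical names used only so the example is deterministic.

```python
import time


class TTLCache:
    """Read-through cache: an entry is served until it is `ttl` seconds old."""

    def __init__(self, loader, ttl, clock=time.monotonic):
        self._loader = loader   # function that reads from the "database"
        self._ttl = ttl
        self._clock = clock     # injectable so the demo can fake time
        self._entries = {}      # key -> (value, loaded_at)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]       # possibly stale: the database may have changed
        value = self._loader(key)
        self._entries[key] = (value, now)
        return value


# Hypothetical "database" and a fake clock for a deterministic demo.
db = {"price:42": 10}
fake_now = [0.0]
cache = TTLCache(db.__getitem__, ttl=60, clock=lambda: fake_now[0])

first = cache.get("price:42")   # miss: loads 10 from the database
db["price:42"] = 12             # the database is updated behind the cache
stale = cache.get("price:42")   # hit: still returns the old value
fake_now[0] = 61.0              # the entry is now older than the TTL
fresh = cache.get("price:42")   # expired: reloads the new value
```

The TTL is the knob: a shorter one reduces staleness but sends more reads back to the overloaded database, and the cached copy is one more redundant copy of the data to keep in sync.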
  • 14. Clustering • We hire another guy – He tells us that we hit the limit of the one machine – We need to scale out (horizontally) • Master/Slave – Assuming you read more than you write – Write to the master and read from the slaves – The master needs to replicate data across the slaves » Risk of stale (incorrect) reads – How’s that consistent?!! • Sharding – Improves writes as much as reads – Can’t join across partitions – No referential integrity – Requires modification of client applications – Introduces a single point of failure – How’s that consistent?!!
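Sharding spreads rows across servers by key. A minimal sketch, assuming simple hash-mod-N routing and in-memory dicts standing in for separate database servers (`ShardedStore` is a hypothetical name), shows why single-key operations stay cheap while anything else has to visit every shard with no database-side JOIN to lean on.

```python
import zlib


class ShardedStore:
    """Hash-partitions keys across N in-memory shards (stand-ins for servers)."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def _shard_for(self, key):
        # crc32 is stable across processes, unlike Python's salted hash() for str.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        # A single-key read touches exactly one shard.
        return self.shards[self._shard_for(key)].get(key)

    def scan(self, predicate):
        # Any other query must fan out to every shard and merge the results
        # in application code; there is no JOIN or foreign key across shards.
        return [v for shard in self.shards for v in shard.values() if predicate(v)]


store = ShardedStore(num_shards=4)
for user_id in range(10):
    store.put(f"user:{user_id}", {"id": user_id, "country": "MA" if user_id % 2 else "US"})

moroccan_users = store.scan(lambda user: user["country"] == "MA")
```

One design consequence of hash-mod-N routing: changing the number of shards remaps most keys, which is part of why the slide notes that sharding forces changes on client applications.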
  • 15. What’s the Point? • We horizontally scale our relational database – We’re no longer consistent – No ACIDity? – We lose query flexibility • Are we doing something wrong?
  • 17. The CAP Theorem • Eric Brewer on distributed systems – Pick two out of • Consistency • Availability • Partition Tolerance • Like a service that is Fast, Cheap, and Good: pick two – A Cheap, Good service won’t be Fast – A Fast, Good service won’t be Cheap – A Fast, Cheap service won’t be Good
  • 18. Relational Model & CAP • Relational data stores happen to favor – Consistency and Availability – For historical reasons • They are key to certain types of applications • The bank example – I deposit $100 in my friend’s bank account – Blah blah blah… • Having picked C and A, CAP leaves no room for Partition Tolerance, meaning that horizontal scaling is off the table
  • 19. Scheiße! • We’re in a pickle – Too much data for the CA model – Vertical scaling • Too expensive • Not sustainable • Forced to explore other alternatives in light of CAP
  • 20. What AP Looks Like • Partition Tolerance – Since we reached the limit of the one machine, we have no choice but to scale horizontally – Which means being partition tolerant • Availability – Nobody is willing to give it up, most of the time – This becomes even better with distribution – In a cluster of servers • Each individual node might be unreliable by itself • But the cluster as a whole is inherently reliable
  • 21. What AP Looks Like • According to CAP, we simply cannot also have C • Consistency – I make an update and all subsequent reads return the most recent value – Unfortunately this is impossible, as it takes time for the change to be replicated to every node of the cluster • What a bummer?! • Let’s look at an AP system – DNS (Domain Name System) • Not all the nodes have the most up-to-date records (you register a domain name and wait a few days to be sure that every DNS server knows about it)
  • 22. Eventual Consistency • This is not so bad – It means that we just settled for a lesser degree of consistency • So what if – Mohammad in Morocco updated his relationship status to single on some edge node – His cousin who lives in Spain saw it immediately because they happen to be on the same edge node – His secret admirer Sara, who lives in the United States, could not see it until an hour later – His brother in Japan got the update the next day – They all got it eventually! • Eventual consistency as opposed to immediate consistency
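The Mohammad story can be replayed as a toy simulation. This is a minimal sketch, not any particular database's replication protocol: a write lands on one replica, copies to the others sit in a queue, and `propagate()` delivers them one at a time to mimic replication lag. The class and replica names are hypothetical.

```python
from collections import deque


class EventuallyConsistentCluster:
    """Toy model: writes hit one replica and reach the others only via propagate()."""

    def __init__(self, replica_names):
        self.replicas = {name: {} for name in replica_names}
        self.pending = deque()  # (target_replica, key, value) still in flight

    def write(self, replica, key, value):
        self.replicas[replica][key] = value
        for other in self.replicas:
            if other != replica:
                self.pending.append((other, key, value))

    def read(self, replica, key):
        return self.replicas[replica].get(key)

    def propagate(self, steps=1):
        # Deliver up to `steps` queued updates, simulating replication lag.
        for _ in range(min(steps, len(self.pending))):
            target, key, value = self.pending.popleft()
            self.replicas[target][key] = value


cluster = EventuallyConsistentCluster(["europe", "usa", "japan"])
for replica in cluster.replicas.values():
    replica["mohammad.status"] = "in a relationship"  # everyone starts in sync

cluster.write("europe", "mohammad.status", "single")      # Mohammad's edge node
cousin_sees = cluster.read("europe", "mohammad.status")   # same node: sees it at once
sara_before = cluster.read("usa", "mohammad.status")      # old value for now
cluster.propagate(steps=1)                                # update reaches the USA
sara_after = cluster.read("usa", "mohammad.status")
brother_before = cluster.read("japan", "mohammad.status")
cluster.propagate(steps=1)                                # ...and Japan later
brother_after = cluster.read("japan", "mohammad.status")
```

Every replica stays available for reads and writes the whole time, and all of them converge once the queue drains. The sketch deliberately ignores concurrent conflicting writes, which real AP stores must resolve somehow.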
  • 23. The Compromise • We settle for a weaker consistency model – BASE • Basically Available • Soft state • Eventual consistency • ACID on the individual node, BASE on the cluster
  • 24. The Slippery Slope of the Faithless
  • 25. You might as well Question… • Schema – Logical • Well-defined and rigid in relational databases • Why not a flexible one, or even no schema? – Physical • B-trees in most relational databases • Why not use some other underlying data structure?
  • 26. You might as well Question… • Integrity Constraints – Who cares? • A Query Language – Anything would do… • Security – None • Name it…
  • 28. NoSQL • A wide range of specialized data stores with the goal of addressing the challenges of the relational model • Eric Evans – The whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for • Let me make it easier – It is not anti-SQL or anti-relational – Any data store that is non-relational • “Not Only SQL” instead of “NO SQL”
  • 29. SQL vs. NoSQL (SQL | NoSQL) • A single machine | A cluster • CA | AP/CA/CP • Scale vertically | Scale horizontally • SQL | Custom APIs • ACID | BASE • Full indexes | Mostly on keys • There are outliers, of course
  • 30. SQL vs. NoSQL (SQL | NoSQL) • Rigid schema | Schema-less • Flexible queries | Pre-defined queries • SQL (Relational) – Concerned with what the data consists of • NoSQL (Non-Relational) – Concerned with how the data is queried • There are outliers, of course
  • 33. Key-Value Data Stores • Basically a big hash map (associative array) – Very simple – Very fast reads and writes – No secondary indexes • Use when – Your data is not highly related – All you need is basic CRUD • Challenges – Complex queries • Check out the Amazon Dynamo paper • http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf • Featured Projects – DynamoDB http://aws.amazon.com/dynamodb/ – Riak http://wiki.basho.com/ – Redis http://redis.io/
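The Dynamo paper describes spreading keys across nodes with consistent hashing and virtual nodes. Below is a minimal sketch of that idea, not Dynamo's actual implementation: `ConsistentHashRing`, the node names, and the choice of SHA-256 truncated to 64 bits are all assumptions made for illustration.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps keys to nodes on a hash ring; virtual nodes smooth the distribution."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            for i in range(vnodes):
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _hash(s):
        # First 8 bytes of SHA-256 as an integer position on the ring.
        return int.from_bytes(hashlib.sha256(s.encode("utf-8")).digest()[:8], "big")

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key,
        # wrapping around to the start of the ring if necessary.
        idx = bisect.bisect_left(self._positions, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


keys = [f"cart:{i}" for i in range(1000)]
before = ConsistentHashRing(["a", "b", "c"])
after = ConsistentHashRing(["a", "b", "c", "d"])
moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
```

The payoff compared with hash-mod-N routing: adding node `d` only moves the keys that now land on `d` (roughly a quarter of them here), while every other key stays where it was.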
  • 34. Columnar Stores • In a table, data of the same column is stored together – Storage is not wasted on null values as in row-based stores (RDBMS) – Great for sparse tables – Very fast column operations, including aggregation • Use when – Big Data (excellent leverage of MapReduce) – You need compression or versioning • Challenges – You had better know your access patterns beforehand – Key design is not trivial • Check out Google’s BigTable paper – http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/bigtable-osdi06.pdf • Featured Projects – HBase http://hbase.apache.org/ – Cassandra http://cassandra.apache.org/
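The storage argument is about layout. This minimal sketch pivots a hypothetical sparse table from rows into per-column maps that simply omit nulls; it illustrates the idea only and is not how HBase or Cassandra actually lay data out on disk.

```python
# The same sparse table stored row by row (hypothetical sensor readings).
rows = [
    {"id": 1, "temp": 21.5, "humidity": None},
    {"id": 2, "temp": 22.0, "humidity": 40},
    {"id": 3, "temp": None, "humidity": 45},
    {"id": 4, "temp": 23.5, "humidity": None},
]


def to_columns(rows):
    """Pivot rows into per-column {row_id: value} maps, skipping nulls entirely."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            if name != "id" and value is not None:
                columns.setdefault(name, {})[row["id"]] = value
    return columns


columns = to_columns(rows)

# Row layout reserves a cell for every (row, column) pair, nulls included;
# the column layout stores only the values that exist.
row_cells = sum(len(row) - 1 for row in rows)
column_cells = sum(len(col) for col in columns.values())

# An aggregate over one column reads only that column's values.
avg_temp = sum(columns["temp"].values()) / len(columns["temp"])
```

The flip side is the "know your access patterns" challenge: rebuilding a whole row now means touching every column.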
  • 35. Document Data Stores • Nested structures of hashes and their values – A document can be • Simply a hash and its value • A hash with another document as its value • No limit in depth – Very flexible schema – Well-indexed data – Works well with OOP (no impedance mismatch) – De-normalize as a best practice • Use when – You don’t know much about the schema – The schema is very likely to change • Challenges – Complex join-like queries – Self-referencing documents and circular dependencies • Featured Projects – MongoDB http://www.mongodb.org/ – CouchDB http://couchdb.apache.org/
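A denormalized document keeps related data nested in one place, close to the object it represents. The sketch below uses plain Python dicts, with hypothetical `get_path` and `find` helpers that follow dotted paths such as `price.amount`; it mimics the flavor of document queries but is not MongoDB's or CouchDB's API.

```python
# A hypothetical denormalized "product" document: reviews and price are nested
# inside it instead of being spread across joined tables.
product = {
    "_id": "sku-123",
    "name": "Mechanical keyboard",
    "price": {"amount": 89.0, "currency": "USD"},
    "tags": ["electronics", "keyboards"],
    "reviews": [
        {"user": {"name": "Sara"}, "stars": 5},
        {"user": {"name": "Mohammad"}, "stars": 4},
    ],
}


def get_path(doc, path):
    """Follow a dotted path such as 'price.amount' through nested dicts."""
    for part in path.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None  # missing fields are normal in a schema-less store
        doc = doc[part]
    return doc


def find(collection, path, value):
    """Return the documents whose value at `path` equals `value`."""
    return [doc for doc in collection if get_path(doc, path) == value]


usd_products = find([product], "price.currency", "USD")
```

Reading the whole product, reviews included, is a single lookup. Asking "all reviews written by Sara across every product" is where the join-like-query challenge shows up.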
  • 36. Graph Data Stores • A graph – Perfect for highly interconnected data – Allows for explicit relationships – Fine-grained graph traversal – Very flexible – Works well with OOP (no impedance mismatch) • Use when – Your data looks like a graph and requires graph queries – You are smart enough not to try this on another data store • Challenges – Doesn’t scale well horizontally • Featured Projects – Neo4j http://neo4j.org/
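A typical graph question is "who is within two hops of me?". In a relational store that becomes repeated self-joins; a graph store follows edges directly. This minimal sketch runs a breadth-first traversal over a hypothetical adjacency list; it is not Neo4j's query language or API.

```python
from collections import deque

# Hypothetical social graph as an adjacency list: person -> set of friends.
friends = {
    "mohammad": {"sara", "youssef"},
    "sara": {"mohammad", "lina"},
    "youssef": {"mohammad", "kenji"},
    "lina": {"sara"},
    "kenji": {"youssef"},
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: every node reachable from `start` in 1..max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    return {node: hops for node, hops in seen.items() if node != start}


friends_of_friends = within_hops(friends, "mohammad", 2)
```

Each extra hop is just one more step of the loop here, whereas each extra hop in SQL is one more join.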
  • 37. Relational Data Stores • Use when – Your data is highly relational – There is a need to break data into small pieces and assemble it in different ways – Consistency is king – Access patterns are unknown – Reporting • Challenges – Doesn’t scale well horizontally • Featured Projects – Oracle http://www.oracle.com/index.html – Postgres http://www.postgresql.org/ – MS SQL Server http://www.microsoft.com/sqlserver – MySQL http://www.mysql.com/
  • 38. How do you choose?
  • 39. If It Doesn’t Fit, You Must Acquit! • Data – Does it have a natural structure? – How is it connected? – How is it distributed? – How much of it is there? • Access Patterns – Read/write ratio? – Uniform or random? • CAP
  • 40. Other Considerations • Maturity • Stability • Maintainability • Durability • Cost • Tools • Familiarity
  • 42. For Fairness’ Sake! • Relational data stores did not fail us – They actually perform very well • We failed ourselves – By using them as solutions for problems they weren’t designed to solve to begin with • Misuse any data store that way and you’ll get just as much trouble
  • 43. For Fairness’ Sake! • You can’t expect – A flathead screwdriver to work on a Phillips screw as well as one with the matching Phillips blade – A crosshead screwdriver to work on a flathead screw
  • 45. Polyglot Persistence • Enterprise applications are complex and combine many different problems – The assumption that we should use one data store is absurd – You can’t fit everything into one model and expect no problems • Polyglot Persistence – Leveraging multiple data stores, based on the way data is used by the application • Comes with a learning curve • A long-term investment (more productive in the long run) – Leverage the strengths of multiple data stores
  • 46. Polyglot Persistence • Example – MongoDB for the product catalog – Redis for shopping cart – DynamoDB for social profile info – Neo4j for the social graph – HBase for inbox and public feed messages – MySQL for payment and account info – Cassandra for audit and activity log • Disclaimer: I’m not making any recommendation here.
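In a polyglot design the application code, not a single database, decides where each kind of data lives. This minimal sketch uses two hypothetical in-memory stand-ins, one key-value store for carts and one append-only log for audit events, to show the shape of that split; swapping in real clients is left out, and the class names are illustrative only.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory stand-ins for two different products.
@dataclass
class KeyValueStore:
    data: dict = field(default_factory=dict)

    def get(self, key):
        return self.data.get(key)

    def put(self, key, value):
        self.data[key] = value


@dataclass
class AppendOnlyLog:
    entries: list = field(default_factory=list)

    def append(self, event):
        self.entries.append(event)


@dataclass
class ShopService:
    """Routes each concern to the store that matches how that data is used."""
    carts: KeyValueStore
    audit_log: AppendOnlyLog

    def add_to_cart(self, user, sku):
        key = f"cart:{user}"
        cart = self.carts.get(key) or []
        self.carts.put(key, cart + [sku])                  # fast single-key write
        self.audit_log.append(("add_to_cart", user, sku))  # write-once history

    def cart(self, user):
        return self.carts.get(f"cart:{user}") or []


shop = ShopService(KeyValueStore(), AppendOnlyLog())
shop.add_to_cart("sara", "sku-123")
shop.add_to_cart("sara", "sku-456")
```

Keeping each store behind a small interface like this is one way to contain the learning curve: the rest of the application never needs to know which product sits underneath.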
  • 47. NoSQL in the Cloud
  • 48. NoSQL in the Cloud • NoSQL as a commodity – Fully managed data stores (No maintenance) – Elastic scaling – Cheap storage • Featured: – Amazon AWS – Heroku Add-ons – CloudFoundry
  • 50. The A’s to the Q’s in the Abstract • What does the rise of all these NoSQL data stores mean to my enterprise? – I’m guessing a lot • What is NoSQL to begin with? – Any non-relational data store • Does it mean “NO SQL”? – No • Could this be just another fad? – I don’t think so
  • 51. The A’s to the Q’s in the Abstract • Is it a good idea to bet the future of my enterprise on these new, exotic technologies and simply abandon proven, mature RDBMSs? – It’s up to you. I will say “No guts, no glory!” • How scalable is scalable? – However much you need it to be
  • 52. The A’s to the Q’s in the Abstract • Assuming that I am sold, how do I choose the one that fits my needs best? – I’ll tell you if you hire me • Is there a middle ground somewhere? – Polyglot Persistence • What is this Polyglot Persistence I hear about? – It’s the middle ground