SlideShare uma empresa Scribd logo
1 de 26
Harri Kauhanen
2010-03-09




   NoSQL databases
Database paradigms

• Relational (RDBMS)
• NoSQL
 •   Key-value stores

 •   Document databases

 •   Wide column stores (BigTable and clones)

 •   Graph databases


• Others
Relational databases

• ACID (Atomicity, Consistency, Isolation and
  Durability)
• SQL
• MySQL, PostreSQL, Oracle, ...
dog                                bark
       id: integer                        id: integer
     name: varchar                      dog_id: integer
     mood: varchar                         bark: text
     birthdate: date
      color: enum
                                           comment
                                          id: integer
                                        bark_id: integer
                                        dog_id: integer
                                        comment: text




id                   name        mood        birth_date            color
         12 Stella          Happy         2007-04-01       NULL
         13 Wimma           Hungry        NULL             black
          9 Ninja           NULL          NULL             NULL
Key-value stores
• “One key, one value, no duplicates, and
  crazy fast.”
• It is a Hash!
• The value is binary object aka. “blob” – the
  DB does not understand it and does not
  want to understand it.
• Amazon Dynamo, MemcacheDB, ...
“value”
 “key”
                 ...
         name_€#_Stella^^^
         mood_€#_Happy^^^
dog_12     birthdate%///
             135465645)
                 ...
Document databases

• Key-value store, but the value is (usually)
  structured and “understood” by the DB.
• Querying data is possible (by other means
  than just a key).
• Amazon SimpleDB, CouchDB, MongoDB,
  Riak, ...
“document”
      “key”

                          {
                              type: “Dog”,
                              name: “Stella”,
     dog_12                   mood: “Happy”,
                              birthdate: 2007-04-01
                          }




                        vs.
id            name        mood       birth_date           color
      12 Stella      Happy        2007-04-01      NULL
      13 Wimma       Hungry       NULL            black
       9 Ninja       NULL         NULL            NULL
{
             type: “Dog”,
             name: “Stella”,
             mood: “Happy”,
             birthdate: 2007-04-01,
             barks: [
               {
                 bark: “I had to wear stupid..”
                 comments: [
                   {
dog_12                dog_id: “dog_4”,
                      comment: “You look so cute!”
                   }, {
                      dog_id: “dog_14”,
                      comment: “I hate it, too!”
                   }
                 ]
               }
             ]
         }
Wide column stores

• Often referred as “BigTable clones”
• "a sparse, distributed multi-dimensional
  sorted map"
• Google BigTable, Cassandra (Facebook),
  HBase, ...
“column”

“row-id” “column family”      “title”        “time”       “value”



dog_12              dog    birthdate 15          2007-04-01
                    dog     mood        11       Angry
                    dog     mood        45       Happy
                    dog     name        25       Stella
                    dog     color       34       Black

                   bark      text       11       I had to wear...
Graph databases


• “Relation database is a collection loosely
  connected tables” whereas “Graph
  database is a multi-relational graph”.
• Neo4j, InfoGrid, ...
Dog



 Stella
                           type

              name


              mood        dog_12        barks    bark_59         I had to wear stupid...
 Happy

             birth_date
                                           comment_to


2007-04-01
                                    comment_83             You look so Cute



                             comments



                  dog_4
• Relationships in RDBMS are “weak”.
  •   You may “define” one by using constraints,
      documenting a relationship, writing code, using
      naming conventions etc.

• Relationships in graph databases are first
  class citizens.
• There are no relationships in key-value
  stores, document databases and wide
  column stores.
  •   You may “define” one by using validations,
      documenting a relationship, writing code, using
      naming conventions etc.
• Relational databases have almost limitless
  indexing, and a very strong language for
  dynamic, cross-table, queries (SQL)
  •   That’s why they handle all kinds of relationships
      well and dynamically.

• NoSQL databases...
  •   ...might have limited support for dynamic queries
      and indexing

  •   ...don’t support JOIN like operations of SQL

  •   ...but you can store some relationships into
      document itself
How to query NoSQL?
• Key-Value
• Row-id/column-family:title[/time]
  •   “stella_12”/”dogs”:”name” → Stella

• Graph traversal
• API
• Query-language
• Integration to indexing and search engines
• Map-Reduce
Map-Reduce
• “MapReduce is a programming model and
  an associated implementation for
  processing and generating large data sets.”
• Often JavaScript (NoSQL implementations)
Map-function
  function map(doc) {
    if (doc['type'] == 'Dog') {
      emit(doc['mood'], doc['birthdate']);
    }
  }


• Generates “indexed view” of data/
  documents
• This view is just another hash, but both key
  and value can be “anything”
Reduce-function
• Aggregate results for a “view” (after the
  Map-function)

function reduce(mood, listOfBirthdates) {
  return averageBirthDate(listOfBirthdates);
}



• Map-phase is easy to distribute, but you is also
  easy to write poor Reduce-functions
Theorems
• CAP
 • Consistency,
    Availability,
    Partition tolerance
 • “Pick two”
• N/R/W (adjusting CAP)
No consistency?
Eventual consistency
Why NoSQL?
• Schema-free
• Massive data stores
• Scalability
• Some services simpler to implement than
  using RDBMS
• Great fit for many “Web 2.0” services
Why NOT NoSQL

• DRBMS databases and tools are mature
• NoSQL implementations often “alpha”
• Data consistency, transactions
• “Don’t scale until you need it”
RDBMS vs. NoSQL

• Strong consistency vs. Eventual consistency
• Big dataset vs. HUGE datasets
• Scaling is possible vs. Scaling is easy
• SQL vs. Map-Reduce
• Good availability vs.Very high availability
Questions?

Mais conteúdo relacionado

Mais procurados

NoSQL Databases
NoSQL DatabasesNoSQL Databases
NoSQL DatabasesBADR
 
Introduction to NOSQL databases
Introduction to NOSQL databasesIntroduction to NOSQL databases
Introduction to NOSQL databasesAshwani Kumar
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and UsesSuvradeep Rudra
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.Navdeep Charan
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...Simplilearn
 
6 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/26 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/2Fabio Fumarola
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremRahul Jain
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to CassandraGokhan Atil
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQLDon Demcsak
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databasesJames Serra
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBaseCloudera, Inc.
 
The Basics of MongoDB
The Basics of MongoDBThe Basics of MongoDB
The Basics of MongoDBvaluebound
 

Mais procurados (20)

Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
 
NoSQL Databases
NoSQL DatabasesNoSQL Databases
NoSQL Databases
 
Introduction to NOSQL databases
Introduction to NOSQL databasesIntroduction to NOSQL databases
Introduction to NOSQL databases
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
NOSQL vs SQL
NOSQL vs SQLNOSQL vs SQL
NOSQL vs SQL
 
A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.A Seminar on NoSQL Databases.
A Seminar on NoSQL Databases.
 
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
HBase Tutorial For Beginners | HBase Architecture | HBase Tutorial | Hadoop T...
 
6 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/26 Data Modeling for NoSQL 2/2
6 Data Modeling for NoSQL 2/2
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP Theorem
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Mongo db intro.pptx
Mongo db intro.pptxMongo db intro.pptx
Mongo db intro.pptx
 
Spark SQL
Spark SQLSpark SQL
Spark SQL
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Intro to Big Data and NoSQL
Intro to Big Data and NoSQLIntro to Big Data and NoSQL
Intro to Big Data and NoSQL
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
Apache Hadoop and HBase
Apache Hadoop and HBaseApache Hadoop and HBase
Apache Hadoop and HBase
 
The Basics of MongoDB
The Basics of MongoDBThe Basics of MongoDB
The Basics of MongoDB
 

Último

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 

NoSQL databases

  • 1. Harri Kauhanen 2010-03-09 NoSQL databases
  • 2. Database paradigms • Relational (RDBMS) • NoSQL • Key-value stores • Document databases • Wide column stores (BigTable and clones) • Graph databases • Others
  • 3. Relational databases • ACID (Atomicity, Consistency, Isolation and Durability) • SQL • MySQL, PostreSQL, Oracle, ...
  • 4. dog bark id: integer id: integer name: varchar dog_id: integer mood: varchar bark: text birthdate: date color: enum comment id: integer bark_id: integer dog_id: integer comment: text id name mood birth_date color 12 Stella Happy 2007-04-01 NULL 13 Wimma Hungry NULL black 9 Ninja NULL NULL NULL
  • 5. Key-value stores • “One key, one value, no duplicates, and crazy fast.” • It is a Hash! • The value is binary object aka. “blob” – the DB does not understand it and does not want to understand it. • Amazon Dynamo, MemcacheDB, ...
  • 6. “value” “key” ... name_€#_Stella^^^ mood_€#_Happy^^^ dog_12 birthdate%/// 135465645) ...
  • 7. Document databases • Key-value store, but the value is (usually) structured and “understood” by the DB. • Querying data is possible (by other means than just a key). • Amazon SimpleDB, CouchDB, MongoDB, Riak, ...
  • 8. “document” “key” { type: “Dog”, name: “Stella”, dog_12 mood: “Happy”, birthdate: 2007-04-01 } vs. id name mood birth_date color 12 Stella Happy 2007-04-01 NULL 13 Wimma Hungry NULL black 9 Ninja NULL NULL NULL
  • 9. { type: “Dog”, name: “Stella”, mood: “Happy”, birthdate: 2007-04-01, barks: [ { bark: “I had to wear stupid..” comments: [ { dog_12 dog_id: “dog_4”, comment: “You look so cute!” }, { dog_id: “dog_14”, comment: “I hate it, too!” } ] } ] }
  • 10. Wide column stores • Often referred as “BigTable clones” • "a sparse, distributed multi-dimensional sorted map" • Google BigTable, Cassandra (Facebook), HBase, ...
  • 11. “column” “row-id” “column family” “title” “time” “value” dog_12 dog birthdate 15 2007-04-01 dog mood 11 Angry dog mood 45 Happy dog name 25 Stella dog color 34 Black bark text 11 I had to wear...
  • 12. Graph databases • “Relation database is a collection loosely connected tables” whereas “Graph database is a multi-relational graph”. • Neo4j, InfoGrid, ...
  • 13. Dog Stella type name mood dog_12 barks bark_59 I had to wear stupid... Happy birth_date comment_to 2007-04-01 comment_83 You look so Cute comments dog_4
  • 14. • Relationships in RDBMS are “weak”. • You may “define” one by using constraints, documenting a relationship, writing code, using naming conventions etc. • Relationships in graph databases are first class citizens. • There are no relationships in key-value stores, document databases and wide column stores. • You may “define” one by using validations, documenting a relationship, writing code, using naming conventions etc.
  • 15. • Relational databases have almost limitless indexing, and a very strong language for dynamic, cross-table, queries (SQL) • That’s why they handle all kinds of relationships well and dynamically. • NoSQL databases... • ...might have limited support for dynamic queries and indexing • ...don’t support JOIN like operations of SQL • ...but you can store some relationships into document itself
  • 16. How to query NoSQL? • Key-Value • Row-id/column-family:title[/time] • “stella_12”/”dogs”:”name” → Stella • Graph traversal • API • Query-language • Integration to indexing and search engines • Map-Reduce
  • 17. Map-Reduce • “MapReduce is a programming model and an associated implementation for processing and generating large data sets.” • Often JavaScript (NoSQL implementations)
  • 18. Map-function function map(doc) { if (doc['type'] == 'Dog') { emit(doc['mood'], doc['birthdate']); } } • Generates “indexed view” of data/ documents • This view is just another hash, but both key and value can be “anything”
  • 19. Reduce-function • Aggregate results for a “view” (after the Map-function) function reduce(mood, listOfBirthdates) { return averageBirthDate(listOfBirthdates); } • Map-phase is easy to distribute, but you is also easy to write poor Reduce-functions
  • 20. Theorems • CAP • Consistency, Availability, Partition tolerance • “Pick two” • N/R/W (adjusting CAP)
  • 23. Why NoSQL? • Schema-free • Massive data stores • Scalability • Some services simpler to implement than using RDBMS • Great fit for many “Web 2.0” services
  • 24. Why NOT NoSQL • DRBMS databases and tools are mature • NoSQL implementations often “alpha” • Data consistency, transactions • “Don’t scale until you need it”
  • 25. RDBMS vs. NoSQL • Strong consistency vs. Eventual consistency • Big dataset vs. HUGE datasets • Scaling is possible vs. Scaling is easy • SQL vs. Map-Reduce • Good availability vs.Very high availability

Notas do Editor

  1. Hello... Today, I’m going to talk about one of the latest buzz words, so called “NoSQL” databases. A little disclaimer: I personally have real-life experience with only one NoSQL database called CouchDB. I use CouchDB at my Haukut.fi service. So, why did I want to talk about the subject: I wanted to learn this stuff myself. What options there are to CouchDB and should I perhaps pick another option for my next generation Haukut.fi. Also, I strongly feel that relational databases are going to be, if not replaced, they will at least get strong competition from these NoSQL solutions. This will not happen within all domains and services, but for simple consumer web services this will eventually happen.
  2. Ok, lets quickly review the database paradigms we have to choose from. We have relational databases -- the ones you use here at Futurice every day in most of our projects. Then there is NoSQL. NoSQL a relatively new term now getting hot and popular on blogsphere and Twitter. Like many buzzwords out there, NoSQL does not have a definite definition. It could refer to all those databases where you don’t have to use SQL. Some more friendly and wise people would put it more softly “not only SQL”. Well, I am not even going to try to give a definition, but instead I will list what kind of data stores are usually categorized under this umbrella term. They are: ... These I am going to talk about today, but then there are also others such as Object Databases, and they are out of the scope of this presentation.
  3. Let’s start with something familiar. The great promise of relational databases is they are ACID. They all support a very strong query language called SQL. And these are some familiar examples of relation database implementations.
  4. This could be a visual representation of a relational database and the relationships between the tables. We have dogs. A dog must have a name and it may have mood, birthdate and a color. We all know that dogs can really speak, and they speak by barking. Dogs may also comment barks made by the others. Quite simple. The content of table ‘Dog’ could be something like this. We can see that Stella is a happy dog and has birth on April fool’s day. Nothing special here, either. You know the stuff.
  5. Ok, that was quick. So, let’s get into the business :) The simplest form of NoSQL are the Key-value stores. They are really simple, you could say “One key...” All in all, a key value-store is just a persistent Hash. The value-part can be anything. The key-value store does not care a bit about the content of it. Example implementations. Amazon Dynamo is very important one, because those Amazon guys published some theoretical material, and other NoSQL databases are partly based on these writings.
  6. If you want to visualize it, you can see that we have a string based key, and that the value could be a serialized Java-object, or anything at all.
  7. Document databases. The basic idea is still quite simple. They are key-value stores, but the difference is that the value... Because of this, querying of data is somehow possible. There might by a query language like SQL, but that is not actually very common. The examples of document databases are...
  8. What is the difference between this picture and the one two slides earlier? The “value” is now called “document”. That’s it!? Well, the document here has a structure. The structure could be JSON, XML or anything. Having a structure means that we might do something with the data. Not just return a binary blob. But having a structure does not mean it should have a predefined structure. With relational databases you have to define a schema before you can store data. If you compare this document here with our relational example, you can see that in the document we do not have a “color”. It could be there, but it does not need to be there. On the other hand, if we want to add a new attribute, we just add it. With relational database we need to adjust the structure of the table, and that’s not always so pleasant. What about relationships? Well, if the relationship is strong enough, you could add it as a part of the document itself. You could do...
  9. ...this. Here, barks and comments are just a part of the dog-document. I would not say that this would be a smart move, because the document might become huge. In Haukut.fi -service “dog” is one document, and “bark” is another. But comments made to a bark ARE part of a bark document.
  10. The next category does not have widely accepted name. Some would call them “wide column stores” and others might say they are “BigTable clones”. Whatever you call them, they ideologically reside somewhere between key-value stores and relational databases. There is no schema, but the data is still semi-structured. Like in relational databases, you could think that there are rows, but they can have any number of columns, and there is no need to store NULL values. Again, if you think of it in terms of relational databases, and talk about “rows” and “tables”, you might just get more confused. I still am a bit confused myself :-) This is one definition... It might not get you any wiser. Hopefully an example will help, but before that let’s see the example implementations. Cassandra is probably the most interesting one, because it is getting a lot of attention. And the reference as the store under the hoods of Facebook is quite good reference, or what do you think :-)
  11. Ok, here’s the example I promised. I could not figure out the best way to depict this, but let’s hope this works. Like in the previous examples, we have a dog called Stella. Stella’s ID is dog_12 and Stella has a number of attributes such as birthdate, mood and name. Internally, we could store this information into a table structure, much like in relational database. Now, if we want to add an attribute “color” to Stella, we can do it easily. Ok, let’s look at terms on the top of the picture. “Row-id” is the “key” to the stored item. ”Column family” and “title” together form a “column”. The difference between these is that the “title” is dynamic and you can define new titles on the fly. But the column family is more or less fixed. It is a bit like “table” in relational database and it is costly to update it’s name, for instance. The “time” is simply the version of an attribute. If we look at the attribute “mood” here, you will see that Stella has been angry from time 11 on, and she became happy at time 45. If you query an attribute without time, you would simply get the latest value. A record is not tied to a single column family. Like in the document database example, I could say “barks are attributes of a dog”. Like this. Perhaps this explains why they are called “wide column stores” as a record can easily consist of a large number of attributes.
  12. Then there are Graph databases. Someone would perhaps leave them out from the “NoSQL” family, but the authors themselves shout that they should belong to the hype. I don’t really know too much about them. The Neo4j seems to be most popular one.
  13. Here’s an example how a graph database could look like. Again, there is our dog_12 having a name, mood, birthdate and so on. The relationships within graph databases are strong. What I mean to say is...
  14. Relationships in... There is no REAL relationship between the tables, you may... In graph databases, however, the relationships are... It means that you can do very efficient calculations you might need in some social applications. I’m talking about friends and friends-of-friends here. What about the other NoSQL databases. You could say that... BUT just like in relational databases, you may... The only difference between these sentences is that here I talk about “constraints” and here “validations”... and again, they are, in a sense, the same thing. Why, then, relational databases are so good handling relationships? It is because they have...
  15. ...SQL. And great support for indexing! NoSQL databases, on the other hand, might... And they usually don’t... But, on positive side, you... However, compared to the power SQL, isn’t this quite disappointing?
  16. Well, how do you query NoSQL then? Of course, you can access a document with a key you know. ...and with a BigTable also query per attribute like this. With graph databases you do “graph traversals”. Usually there is an API which might support queries such as “give me all the dogs”. Some NoSQL solutions also provide an SQL-like query language. Then you can integrate to various search engines and some NoSQL solutions may provide integration out-of-the box. ...but the most common way seems to be to use Map-Reduce functions.
  17. When Google popularized the term Map-Reduce, the first sentence of the publication was: “...” The big idea is that without understanding about parallel and distributed programming, users may write quite simple Map and Reduce functions to solve any kinds of problems. These functions can then be run in a big cluster of processes and computers. These Map and Reduce functions could be written with any language, but quite often NoSQL databases provide a JavaScript based solution. The main point is not the language. Rather you just need a means to process a record, and then a way to return processed data back to the pipeline.
  18. A very simple example of a Map function could be this. The parameter passed to the Map-function is always a single document. The example here checks that given document is a dog, and if it is, it will return dog’s mood as a key and birthdate as value. Now, think that we have 10 million dogs in our database. It will be very easy to distribute this function because given a single document, you should always get the same key/value pair back. It is a little bit like rendering an animated 3D movie, where you distribute the rendering of each individual frame to number of workers, and the result will be frame number as a key and the bitmap as value. From NoSQL databases point of view, you may use Map-functions to generate views for your data. Internally the database caches the results so that the data will be very fast to access. With this map function defined as a view, I could easily find the birth dates of happy dogs.
  19. Reduce function exists to aggregate results from a dataset. A common scenario would be to calculate sum of values. Here’s an example of quite stupid reduce function. You always pass two parameters to reduce phase: a key and values for the given key. This reduce function would simply calculate the average ages of happy dogs, angry dogs, bored dogs and so on. And as a footnote, you know it is easy to write poorly performing SQL, but you can easily write poor reduce functions as well.
  20. If you want to get deeper into the subject, you should probably google this keyword: CAP Shortly, CAP theorem says that: if you have a distributed data system, you can only get two out of these three features. Consistency: meaning that once you write something into the database, all the readers of the system will get the same result. Availability: meaning the database will function even if a single component in the system fails. Partition tolerance is easiest to explain using an example. Imagine we have a database distributed that have nodes all over Europe and United states. Now, if the network connections suddenly disappear between the continents, we will have two partitions. If the system is still able to operate normally allowing both reads and writes, and is able to recover once the connections between Europe and USA works again, the system is called “partition tolerant”. If you think of a relational database there is no such concept as partition tolerance at all. However, relational databases are very consistent and can be quite highly available. Most NoSQL databases focus on providing very high availability. Most of the systems are partition tolerant, but there are also system providing consistency over partition tolerance. Then, there are some databases that let you decide the level of CAP you want. If you are interested, you could try googling N/R/W.
  21. Ok, doesn’t it sound quite bad if a database is not consistent? If you write your comment into Facebook, some of your friends will see it immediately and others only after couple of seconds? The database designers (and I) think that consistency is not always top priority. What matters is that the data will be..
  22. ..eventually consistent. Eventual consistency means that the consistent state may happen instantly, after a few seconds or after a network connections have been restored. But it will eventually happen. If you get better performance, better availability and better scalability, the ACID level consistency might not be that important after all. And of course, this depends on the domain and the application you are doing. Eventual consistency might not be a good idea when you are transacting money or doing other such critical task.
  23. Why you should thing a NoSQL solution for your next project? Being schema-free is a HUGE help for many problems. And it almost always makes the life of developer much more pleasant. If you need to store massive amount of data, or know you need to scale easily, NoSQL might be for you. I could also claim that some services are perhaps easier to implement with a NoSQL solution that with a relational database. And this very true for many simple Web 2.0 services.
  24. Then, why NoSQL is not always the best choice. First, relational databases have been around for many decades, and they are mature. The developer tools are mature. NoSQL solutions, on the other hand, are often new projects with great ideas and even greater promises, but only “alpha” quality. If you need strong data consistency, or if you need uncompromised support for transactions, use a relational database. And the last point. We often tend to think of scalability issues too early. It might not be a bad choice to write an application with the tools you know best, and when the time hits, and you need to scale, then think if a NoSQL solution could solve some of your scalability issues.
  25. This is my last slide. If you want to compare relational databases with NoSQL databases, these could be the main points. We talked about consistency. In relational model, with ACID operation, it is very strong. Relational databases support rather big datasets, but some NoSQL give you almost unlimited scalability in terms of data size. Scaling can mean many things, but you could safely say that NoSQL solutions usually scale up much better than relational databases do. SQL is wonderful query language and NoSQL solutions often do not support any kind of query language. If they do, the languages are likely not as expressive as SQL is. On the other hand, Map-Reduce can solve some problems very efficiently. And in theory at least, high availability is also easier achieved with many NoSQL solutions. And remember, NoSQL solutions are very different from each other. And this slide perhaps simplifies things too much. But it is still a very good slide to finish this presentation. Thank you!