SlideShare a Scribd company logo
1 of 22
NoSQL databases and MapReduce J Singh Early Stage IT
What’s so fun about databases? Traditional database discussions talked about Employee records Bank records Now we talk about Web search Data mining The collective intelligence of tweets Scientific and medical databases
How much data can a database hold? The biggest OLTP databases 2001: 1.1 – 10.3 TB. 2003: 9.1 – 29.2 TB. 2005: 17.7 – 100.4 TB. 2010: ~2.5 PB. The trend will continue Very large databases bring new unique challenges
Historical Context Late 1990’s. The web scales out. Suddenly, databases not adequate for holding the data being accumulated Scale out vs. Scale up
Brewer’s Conjecture (p1) Source: Eric Brewer’s July 2000 PODC Keynote Main points: Classic “Distributed Systems” don’t work They focus on computation, not data Distributing computation is easy, distributing data is hard DBMS research is about ACID (mostly) Atomicity, Consistency, Isolation and Durability But we forfeit “C” and “I” for availability, graceful degradation and performance – this tradeoff is fundamental BASE Basically Available Soft-state Eventual Consistency
Brewer’s Conjecture (p2) BASE Weak consistency stale data OK Availability first Best effort Approximate answers OK Aggressive (optimistic) Simpler! Faster Easier evolution ACID Strong consistency Isolation Focus on “commit” Nested transactions Availability? Conservative (pessimistic) Difficult evolution (e.g. schema) But I think it’s a spectrum Eric Brewer
CAP Theorem Since then, Brewer’s conjecture formally proved: Gilbert & Lynch, 2002 Thus Brewer’s conjecture became the CAP theorem… …and contributed to the birth of the NoSQL movement But the theory is not settled While http://nosql-database.org/ lists 122 NoSQL databases
What is NoSQL? Stands for Not Only SQL Class of non-relational data storage systems Usually do not require a fixed table schema nor do they use the concept of joins All NoSQL offerings relax one or more of the ACID properties
Forces at Work Three major papers were the seeds of the NoSQL movement CAP Theorem (discussed above) BigTable(Google) Dynamo (Amazon) Some types of data could not be modeled well in RDBMS Document Storage and Indexing Recursive Data and Graphs Time Series Data Genomics Data
NoSQL Databases Key-Value Stores A storage system that stores values, indexed by a key. Example: Voldemort, Dynomite, Tokyo Cabinet BigTable Clones (aka "ColumnFamily") A tabular model where each row (at least in theory) can have an individual configuration of columns. Example: HBase, Hypertable, Cassandra, Amazon SimpleDB
NoSQL Databases Document Databases Collections of documents, which contain key-value collections (called "documents") Example: CouchDB, MongoDB, Riak Graph Databases Nodes & relationships, both of which can hold key-value pairs Example: AllegroGraph, InfoGrid, Neo4j
Amazon SimpleDB Key-value store Written in Erlang, (as is CouchDB) Data is modeled in terms of Domain, a container of entities, Item, an entity and  Attribute and Value, a property of an Item Eventually Consistent, except when ReadConsistent flag specified Impressive performance numbers,  e.g., .7 sec to store 1 million records SQL-like SELECT select output_list from domain_name [where expression] [sort_instructions] [limit limit]
Google Datastore Part of App Engine; also used for internal applications Used for all storage Incorporates a transaction model to ensure high consistency Optimistic locking Transactions can fail CAP implications Datastore isn’t just “eventually consistent” They offer two commercial options (with different prices) Master/Slave  Low latency but also lower availability Asynchronous replication High Replication Strong availability at the cost of higher latency
[object Object]
For more info, see video of Ryan Barrett’s talk at Google I/ODatastore Application at Google
Databases and Key-Value Stores http://browsertoolkit.com/fault-tolerance.png
MapReduce Conceptual Underpinnings Programming model from Lisp and other functional languages (map square '(1 2 3 4))  (1 4 9 16) (reduce + '(1 4 9 16)) 30  Easy to distribute Nice failure/retry semantics
MapReduce Flow
HadoopMapReduce An Open Source project of the Apache Foundation Other Hadoop-related projects at Apache include: Cassandra™: A scalable multi-master database with no single points of failure. HBase™: A scalable, distributed database that supports structured data storage for large tables. Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying. Pig™: A high-level data-flow language and execution framework for parallel computation. See the Apache Hadoop website for more.
Hadoop Availability Run on your laptop Run on your server Run on Amazon Cloud Introduction at IBM DeveloperWorks Run on Google App Engine It’s not Hadoop, it’s Google’s implementation of MapReduce
MapReduce Statistics @ GOOG Take-away message: MapReduce is not a “new-fangled technology of the future” It is here, it is proven, use it!
End of an Era? The Relational Model is not necessarily the answer It was excellent for data processing Not a natural fit for Data Warehouses Web-oriented search Real-time analytics, and Semi-structured data i.e., Semantic Web SQL is not the answer Coupling between modern programming languages and SQL are “ugly beyond belief” Programming languages have evolved while SQL has remained static Pascal C/C++ Java The little languages: Python, Perl, PHP, Ruby ,[object Object],A critique of the “one size fits all” assumption in DBMS

More Related Content

What's hot (20)

Data models in NoSQL
Data models in NoSQLData models in NoSQL
Data models in NoSQL
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
 
Mongodb basics and architecture
Mongodb basics and architectureMongodb basics and architecture
Mongodb basics and architecture
 
NOSQL Databases types and Uses
NOSQL Databases types and UsesNOSQL Databases types and Uses
NOSQL Databases types and Uses
 
Introduction to Apache Cassandra
Introduction to Apache Cassandra Introduction to Apache Cassandra
Introduction to Apache Cassandra
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hadoop Map Reduce
Hadoop Map ReduceHadoop Map Reduce
Hadoop Map Reduce
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
Spark Workshop
Spark WorkshopSpark Workshop
Spark Workshop
 
MongoDB presentation
MongoDB presentationMongoDB presentation
MongoDB presentation
 
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)The evolution of Netflix's S3 data warehouse (Strata NY 2018)
The evolution of Netflix's S3 data warehouse (Strata NY 2018)
 
NOSQL vs SQL
NOSQL vs SQLNOSQL vs SQL
NOSQL vs SQL
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 
CS8391 Data Structures Part B Questions Anna University
CS8391 Data Structures Part B Questions Anna UniversityCS8391 Data Structures Part B Questions Anna University
CS8391 Data Structures Part B Questions Anna University
 
NoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenNoSQL Databases: Why, what and when
NoSQL Databases: Why, what and when
 
Mongo db intro.pptx
Mongo db intro.pptxMongo db intro.pptx
Mongo db intro.pptx
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
 

Viewers also liked

Query mechanisms for NoSQL databases
Query mechanisms for NoSQL databasesQuery mechanisms for NoSQL databases
Query mechanisms for NoSQL databasesArangoDB Database
 
Query Languages for Document Stores
Query Languages for Document StoresQuery Languages for Document Stores
Query Languages for Document StoresInteractiveCologne
 
Amadeus big data
Amadeus big dataAmadeus big data
Amadeus big dataMayank Shah
 
Chaordic - BigData e MapReduce - Robson Motta
Chaordic - BigData e MapReduce - Robson Motta Chaordic - BigData e MapReduce - Robson Motta
Chaordic - BigData e MapReduce - Robson Motta Chaordic
 
(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014
(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014
(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014Amazon Web Services
 
NoSQL Slideshare Presentation
NoSQL Slideshare Presentation NoSQL Slideshare Presentation
NoSQL Slideshare Presentation Ericsson Labs
 
Leveraging SAP HANA with Apache Hadoop and SAP Analytics
Leveraging SAP HANA with Apache Hadoop and SAP AnalyticsLeveraging SAP HANA with Apache Hadoop and SAP Analytics
Leveraging SAP HANA with Apache Hadoop and SAP AnalyticsMethod360
 
Leveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessLeveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessDataWorks Summit
 
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012Alexandre Morgaut
 
Leveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessLeveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessDataWorks Summit
 
The NoSQL Way in Postgres
The NoSQL Way in PostgresThe NoSQL Way in Postgres
The NoSQL Way in PostgresEDB
 
A testbeszéd
A testbeszédA testbeszéd
A testbeszédnovigi
 

Viewers also liked (20)

Query mechanisms for NoSQL databases
Query mechanisms for NoSQL databasesQuery mechanisms for NoSQL databases
Query mechanisms for NoSQL databases
 
CouchDB Map/Reduce
CouchDB Map/ReduceCouchDB Map/Reduce
CouchDB Map/Reduce
 
Query Languages for Document Stores
Query Languages for Document StoresQuery Languages for Document Stores
Query Languages for Document Stores
 
Amadeus big data
Amadeus big dataAmadeus big data
Amadeus big data
 
Chaordic - BigData e MapReduce - Robson Motta
Chaordic - BigData e MapReduce - Robson Motta Chaordic - BigData e MapReduce - Robson Motta
Chaordic - BigData e MapReduce - Robson Motta
 
Drupal 6 Database layer
Drupal 6 Database layerDrupal 6 Database layer
Drupal 6 Database layer
 
Ejemplos acid
Ejemplos acidEjemplos acid
Ejemplos acid
 
Spark and MongoDB
Spark and MongoDBSpark and MongoDB
Spark and MongoDB
 
(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014
(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014
(SEC402) Intrusion Detection in the Cloud | AWS re:Invent 2014
 
NoSQL Slideshare Presentation
NoSQL Slideshare Presentation NoSQL Slideshare Presentation
NoSQL Slideshare Presentation
 
Leveraging SAP HANA with Apache Hadoop and SAP Analytics
Leveraging SAP HANA with Apache Hadoop and SAP AnalyticsLeveraging SAP HANA with Apache Hadoop and SAP Analytics
Leveraging SAP HANA with Apache Hadoop and SAP Analytics
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
Leveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessLeveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine Business
 
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
Wakanda: NoSQL for Model-Driven Web applications - NoSQL matters 2012
 
Leveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine BusinessLeveraging SAP, Hadoop, and Big Data to Redefine Business
Leveraging SAP, Hadoop, and Big Data to Redefine Business
 
The NoSQL Way in Postgres
The NoSQL Way in PostgresThe NoSQL Way in Postgres
The NoSQL Way in Postgres
 
A testbeszéd
A testbeszédA testbeszéd
A testbeszéd
 
Arckifejezések
ArckifejezésekArckifejezések
Arckifejezések
 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
 
MapReduce入門
MapReduce入門MapReduce入門
MapReduce入門
 

Similar to NoSQL databases and MapReduce: From Big Data to distributed computing models

Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless DatabasesDan Gunter
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...Felix Gessert
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, HowIgor Moochnick
 
Database Revolution - Exploratory Webcast
Database Revolution - Exploratory WebcastDatabase Revolution - Exploratory Webcast
Database Revolution - Exploratory WebcastInside Analysis
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12mark madsen
 
Implementation of nosql for robotics
Implementation of nosql for roboticsImplementation of nosql for robotics
Implementation of nosql for roboticsJoão Gabriel Lima
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookAmr Awadallah
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 
Big Data Fundamentals in the Emerging New Data World
Big Data Fundamentals in the Emerging New Data WorldBig Data Fundamentals in the Emerging New Data World
Big Data Fundamentals in the Emerging New Data WorldJongwook Woo
 
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...South London Geek Nights
 
No sql distilled-distilled
No sql distilled-distilledNo sql distilled-distilled
No sql distilled-distilledrICh morrow
 
NoSQL: Why, When, and How
NoSQL: Why, When, and HowNoSQL: Why, When, and How
NoSQL: Why, When, and HowBigBlueHat
 
The World of Structured Storage System
The World of Structured Storage SystemThe World of Structured Storage System
The World of Structured Storage SystemSchubert Zhang
 
Next Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasNext Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasThoughtworks
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irdatastack
 
05 No SQL Sudarshan.ppt
05 No SQL Sudarshan.ppt05 No SQL Sudarshan.ppt
05 No SQL Sudarshan.pptAnandKonj1
 

Similar to NoSQL databases and MapReduce: From Big Data to distributed computing models (20)

Schemaless Databases
Schemaless DatabasesSchemaless Databases
Schemaless Databases
 
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
NoSQL Data Stores in Research and Practice - ICDE 2016 Tutorial - Extended Ve...
 
NO SQL: What, Why, How
NO SQL: What, Why, HowNO SQL: What, Why, How
NO SQL: What, Why, How
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
Database Revolution - Exploratory Webcast
Database Revolution - Exploratory WebcastDatabase Revolution - Exploratory Webcast
Database Revolution - Exploratory Webcast
 
Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12Database revolution opening webcast 01 18-12
Database revolution opening webcast 01 18-12
 
Implementation of nosql for robotics
Implementation of nosql for roboticsImplementation of nosql for robotics
Implementation of nosql for robotics
 
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and FacebookHow Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
How Hadoop Revolutionized Data Warehousing at Yahoo and Facebook
 
The future of Big Data tooling
The future of Big Data toolingThe future of Big Data tooling
The future of Big Data tooling
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
Big Data Fundamentals in the Emerging New Data World
Big Data Fundamentals in the Emerging New Data WorldBig Data Fundamentals in the Emerging New Data World
Big Data Fundamentals in the Emerging New Data World
 
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
 
Nosql
NosqlNosql
Nosql
 
No sql distilled-distilled
No sql distilled-distilledNo sql distilled-distilled
No sql distilled-distilled
 
NoSQL: Why, When, and How
NoSQL: Why, When, and HowNoSQL: Why, When, and How
NoSQL: Why, When, and How
 
The World of Structured Storage System
The World of Structured Storage SystemThe World of Structured Storage System
The World of Structured Storage System
 
Next Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon ThomasNext Generation Data Platforms - Deon Thomas
Next Generation Data Platforms - Deon Thomas
 
NoSQL Basics - A Quick Tour
NoSQL Basics - A Quick TourNoSQL Basics - A Quick Tour
NoSQL Basics - A Quick Tour
 
Big data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.irBig data vahidamiri-tabriz-13960226-datastack.ir
Big data vahidamiri-tabriz-13960226-datastack.ir
 
05 No SQL Sudarshan.ppt
05 No SQL Sudarshan.ppt05 No SQL Sudarshan.ppt
05 No SQL Sudarshan.ppt
 

More from J Singh

OpenLSH - a framework for locality sensitive hashing
OpenLSH  - a framework for locality sensitive hashingOpenLSH  - a framework for locality sensitive hashing
OpenLSH - a framework for locality sensitive hashingJ Singh
 
Designing analytics for big data
Designing analytics for big dataDesigning analytics for big data
Designing analytics for big dataJ Singh
 
Open LSH - september 2014 update
Open LSH  - september 2014 updateOpen LSH  - september 2014 update
Open LSH - september 2014 updateJ Singh
 
PaaS - google app engine
PaaS  - google app enginePaaS  - google app engine
PaaS - google app engineJ Singh
 
Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)J Singh
 
Data Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsData Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsJ Singh
 
Facebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceFacebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceJ Singh
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data LaboratoryJ Singh
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop EcosystemJ Singh
 
Social Media Mining using GAE Map Reduce
Social Media Mining using GAE Map ReduceSocial Media Mining using GAE Map Reduce
Social Media Mining using GAE Map ReduceJ Singh
 
High Throughput Data Analysis
High Throughput Data AnalysisHigh Throughput Data Analysis
High Throughput Data AnalysisJ Singh
 
CS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitCS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitJ Singh
 
CS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlCS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlJ Singh
 
CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query OptimizationJ Singh
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query ExecutionJ Singh
 
CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementJ Singh
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceJ Singh
 
CS 542 Database Index Structures
CS 542 Database Index StructuresCS 542 Database Index Structures
CS 542 Database Index StructuresJ Singh
 
CS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and PerformanceCS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and PerformanceJ Singh
 
CS 542 Overview of query processing
CS 542 Overview of query processingCS 542 Overview of query processing
CS 542 Overview of query processingJ Singh
 

More from J Singh (20)

OpenLSH - a framework for locality sensitive hashing
OpenLSH  - a framework for locality sensitive hashingOpenLSH  - a framework for locality sensitive hashing
OpenLSH - a framework for locality sensitive hashing
 
Designing analytics for big data
Designing analytics for big dataDesigning analytics for big data
Designing analytics for big data
 
Open LSH - september 2014 update
Open LSH  - september 2014 updateOpen LSH  - september 2014 update
Open LSH - september 2014 update
 
PaaS - google app engine
PaaS  - google app enginePaaS  - google app engine
PaaS - google app engine
 
Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)Mining of massive datasets using locality sensitive hashing (LSH)
Mining of massive datasets using locality sensitive hashing (LSH)
 
Data Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and TradeoffsData Analytic Technology Platforms: Options and Tradeoffs
Data Analytic Technology Platforms: Options and Tradeoffs
 
Facebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/ReduceFacebook Analytics with Elastic Map/Reduce
Facebook Analytics with Elastic Map/Reduce
 
Big Data Laboratory
Big Data LaboratoryBig Data Laboratory
Big Data Laboratory
 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
 
Social Media Mining using GAE Map Reduce
Social Media Mining using GAE Map ReduceSocial Media Mining using GAE Map Reduce
Social Media Mining using GAE Map Reduce
 
High Throughput Data Analysis
High Throughput Data AnalysisHigh Throughput Data Analysis
High Throughput Data Analysis
 
CS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed CommitCS 542 -- Concurrency Control, Distributed Commit
CS 542 -- Concurrency Control, Distributed Commit
 
CS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency ControlCS 542 -- Failure Recovery, Concurrency Control
CS 542 -- Failure Recovery, Concurrency Control
 
CS 542 -- Query Optimization
CS 542 -- Query OptimizationCS 542 -- Query Optimization
CS 542 -- Query Optimization
 
CS 542 -- Query Execution
CS 542 -- Query ExecutionCS 542 -- Query Execution
CS 542 -- Query Execution
 
CS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage ManagementCS 542 Putting it all together -- Storage Management
CS 542 Putting it all together -- Storage Management
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
 
CS 542 Database Index Structures
CS 542 Database Index StructuresCS 542 Database Index Structures
CS 542 Database Index Structures
 
CS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and PerformanceCS 542 Controlling Database Integrity and Performance
CS 542 Controlling Database Integrity and Performance
 
CS 542 Overview of query processing
CS 542 Overview of query processingCS 542 Overview of query processing
CS 542 Overview of query processing
 

Recently uploaded

Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 

Recently uploaded (20)

Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 

NoSQL databases and MapReduce: From Big Data to distributed computing models

  • 1. NoSQL databases and MapReduce J Singh Early Stage IT
  • 2. What’s so fun about databases? Traditional database discussions talked about Employee records Bank records Now we talk about Web search Data mining The collective intelligence of tweets Scientific and medical databases
  • 3. How much data can a database hold? The biggest OLTP databases 2001: 1.1 – 10.3 TB. 2003: 9.1 – 29.2 TB. 2005: 17.7 – 100.4 TB. 2010: ~2.5 PB. The trend will continue Very large databases bring new unique challenges
  • 4. Historical Context Late 1990’s. The web scales out. Suddenly, databases not adequate for holding the data being accumulated Scale out vs. Scale up
  • 5. Brewer’s Conjecture (p1) Source: Eric Brewer’s July 2000 PODC Keynote Main points: Classic “Distributed Systems” don’t work They focus on computation, not data Distributing computation is easy, distributing data is hard DBMS research is about ACID (mostly) Atomicity, Consistency, Isolation and Durability But we forfeit “C” and “I” for availability, graceful degradation and performance – this tradeoff is fundamental BASE Basically Available Soft-state Eventual Consistency
  • 6. Brewer’s Conjecture (p2) BASE Weak consistency stale data OK Availability first Best effort Approximate answers OK Aggressive (optimistic) Simpler! Faster Easier evolution ACID Strong consistency Isolation Focus on “commit” Nested transactions Availability? Conservative (pessimistic) Difficult evolution (e.g. schema) But I think it’s a spectrum Eric Brewer
  • 7. CAP Theorem Since then, Brewer’s conjecture formally proved: Gilbert & Lynch, 2002 Thus Brewer’s conjecture became the CAP theorem… …and contributed to the birth of the NoSQL movement But the theory is not settled While http://nosql-database.org/ lists 122 NoSQL databases
  • 8. What is NoSQL? Stands for Not Only SQL Class of non-relational data storage systems Usually do not require a fixed table schema nor do they use the concept of joins All NoSQL offerings relax one or more of the ACID properties
  • 9. Forces at Work Three major papers were the seeds of the NoSQL movement CAP Theorem (discussed above) BigTable(Google) Dynamo (Amazon) Some types of data could not be modeled well in RDBMS Document Storage and Indexing Recursive Data and Graphs Time Series Data Genomics Data
  • 10. NoSQL Databases Key-Value Stores A storage system that stores values, indexed by a key. Example: Voldemort, Dynomite, Tokyo Cabinet BigTable Clones (aka "ColumnFamily") A tabular model where each row (at least in theory) can have an individual configuration of columns. Example: HBase, Hypertable, Cassandra, Amazon SimpleDB
  • 11. NoSQL Databases Document Databases Collections of documents, which contain key-value collections (called "documents") Example: CouchDB, MongoDB, Riak Graph Databases Nodes & relationships, both of which can hold key-value pairs Example: AllegroGraph, InfoGrid, Neo4j
  • 12. Amazon SimpleDB Key-value store Written in Erlang, (as is CouchDB) Data is modeled in terms of Domain, a container of entities, Item, an entity and Attribute and Value, a property of an Item Eventually Consistent, except when ReadConsistent flag specified Impressive performance numbers, e.g., .7 sec to store 1 million records SQL-like SELECT select output_list from domain_name [where expression] [sort_instructions] [limit limit]
  • 13. Google Datastore Part of App Engine; also used for internal applications Used for all storage Incorporates a transaction model to ensure high consistency Optimistic locking Transactions can fail CAP implications Datastore isn’t just “eventually consistent” They offer two commercial options (with different prices) Master/Slave Low latency but also lower availability Asynchronous replication High Replication Strong availability at the cost of higher latency
  • 14.
  • 15. For more info, see video of Ryan Barrett’s talk at Google I/ODatastore Application at Google
  • 16. Databases and Key-Value Stores http://browsertoolkit.com/fault-tolerance.png
  • 17. MapReduce Conceptual Underpinnings Programming model from Lisp and other functional languages (map square '(1 2 3 4))  (1 4 9 16) (reduce + '(1 4 9 16)) 30 Easy to distribute Nice failure/retry semantics
  • 19. HadoopMapReduce An Open Source project of the Apache Foundation Other Hadoop-related projects at Apache include: Cassandra™: A scalable multi-master database with no single points of failure. HBase™: A scalable, distributed database that supports structured data storage for large tables. Hive™: A data warehouse infrastructure that provides data summarization and ad hoc querying. Pig™: A high-level data-flow language and execution framework for parallel computation. See the Apache Hadoop website for more.
  • 20. Hadoop Availability Run on your laptop Run on your server Run on Amazon Cloud Introduction at IBM DeveloperWorks Run on Google App Engine It’s not Hadoop, it’s Google’s implementation of MapReduce
  • 21. MapReduce Statistics @ GOOG Take-away message: MapReduce is not a “new-fangled technology of the future” It is here, it is proven, use it!
  • 22.
  • 23. Take Aways NoSQL databases are a solution to web-scale problems A lot of data lives outside relational databases With SQLnix.org, we are starting a local resource for NoSQL database knowledge Taking on projects to apply the technology, not just read about it. If you want to work on it, please contact us. Thanks