The complexity for minimum component costs has increased at a rate of roughly a
factor of two per year...Certainly over the short term this rate can be expected to
continue, if not to increase. Over the longer term, the rate of increase is a bit more
uncertain, although there is no reason to believe it will not remain nearly constant
for at least 10 years.
-- Gordon Moore, 1965
…Then you better start swimmin’…Or you’ll sink like a
stone…For the times they are a-changin’.
-- Bob Dylan
•NoSQL is a set of concepts that allows the rapid
and efficient processing of data sets with a focus
on performance, reliability, and agility.
Definition of NoSQL
Sounds great… What???
Operational Data
• Read and written by applications to carry out their ordinary functions.
• Examples:
• Shopping cart data in Amazon.com
• Information about employees in a human resources system
• Buy/Sell prices in Fidelity
• Posts made by Facebook users
• Travel Itineraries for bookings done on Expedia
Two Categories of Data
Analytical Data
• Used to provide business intelligence (BI).
• Data is often created by storing the operational data used by applications
over time, and it’s commonly read-only.
• Because these analytical datasets provide a historical record, they’re
commonly much bigger than an application’s current operational data.
• Examples:
• An e-commerce company might record all of the purchase data from its web
application, then analyze this data to learn about customer buying habits or market
trends.
• Facebook might sell all the posts made by its users to other companies that can
analyze the posts to determine each user’s significant events so that they can tailor
offers based on user needs, likes and dislikes.
The Problem called Big Data
Cracks in the Single CPU RDBMS System
due to pressure from the four business drivers of the current age.
Volume
• The need to query big data has always raised performance concerns
in RDBMSs.
• These performance concerns were once solved by purchasing faster
processors.
• But once the power wall was reached, increasing processor speed
was no longer an option.
• System designers shifted their focus from increasing speed on a
single chip (vertical scaling or scale up) to using more processors
working together (horizontal scaling or scale out).
Velocity
• Many single-processor RDBMSs are unable to keep up with the
demands of real-time inserts and online queries to the database
made by public-facing websites.
• RDBMSs frequently index many columns of every new row, a
process which decreases system performance.
• When single-processor RDBMSs are used as a back end to a web
storefront, random bursts in web traffic slow down response
for everyone, and tuning these systems can be costly when both
high read and high write throughput are desired.
• This was another reason for engineers to look for a scale-out
solution.
Variability
• Companies that want to capture and report on exception data
struggle when attempting to use rigid database schema
structures imposed by RDBMS. For example, if a business unit
wants to capture a few custom fields for a particular customer,
all customer rows within the database need to store this
information even though it doesn’t apply.
• Adding new columns to an RDBMS table requires ALTER TABLE
commands to be run, and on some systems the table must be locked
or the system shut down while they run. When a database is large,
this process can impact system availability, costing time and money.
• This was another reason engineers looked for a more viable
solution.
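The custom-field problem above can be sketched with Python's built-in sqlite3 module standing in for an RDBMS. The table name `customers`, the field `loyalty_tier`, and the sample rows are hypothetical and chosen only for illustration; the list of dicts at the end is a stand-in for a schema-free store, not any particular product.

```python
import sqlite3

# In-memory SQLite database standing in for an RDBMS (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers (name) VALUES (?)", [("Ann",), ("Bob",), ("Cy",)])

# One business unit wants a custom field for a single customer.
conn.execute("ALTER TABLE customers ADD COLUMN loyalty_tier TEXT")
conn.execute("UPDATE customers SET loyalty_tier = 'gold' WHERE name = 'Ann'")

# Every row now carries the column, even where it does not apply (NULL / None).
rows = conn.execute("SELECT name, loyalty_tier FROM customers ORDER BY id").fetchall()

# A schema-free alternative: each record holds only the fields it actually has.
documents = [
    {"name": "Ann", "loyalty_tier": "gold"},
    {"name": "Bob"},
    {"name": "Cy"},
]
```

In the relational version the new column exists for all three customers; in the dict version only Ann's record mentions it.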
Agility
• The most complex part of building applications using RDBMSs is
the process of putting data into and getting data out of the
database.
• If your data has nested and repeated subgroups of data
structures, you need to include an object-relational mapping
layer. The responsibility of this layer is to generate the correct
combination of INSERT, UPDATE, DELETE, and SELECT SQL
statements to move object data to and from the RDBMS
persistence layer.
• This process isn’t simple and is associated with the largest
barrier to rapid change when developing new or modifying
existing applications.
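To make the mapping burden concrete, here is a minimal sketch of what an object-relational layer has to do for one object with a nested, repeated subgroup, next to a document-style alternative. The `order` object, the table names, and the helper functions are hypothetical; sqlite3 and a plain dict stand in for the RDBMS and the document store.

```python
import json
import sqlite3

# Hypothetical order object with a nested, repeated subgroup (line items).
order = {
    "order_id": 1,
    "customer": "Ann",
    "items": [
        {"sku": "A-100", "qty": 2},
        {"sku": "B-200", "qty": 1},
    ],
}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")

def save_relational(conn, order):
    """Hand-written mapping: one INSERT per table row the object touches."""
    conn.execute("INSERT INTO orders VALUES (?, ?)", (order["order_id"], order["customer"]))
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?)",
        [(order["order_id"], item["sku"], item["qty"]) for item in order["items"]],
    )

def load_relational(conn, order_id):
    """Reassemble the object from two SELECT statements."""
    oid, customer = conn.execute(
        "SELECT order_id, customer FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()
    items = conn.execute(
        "SELECT sku, qty FROM order_items WHERE order_id = ? ORDER BY rowid", (order_id,)
    ).fetchall()
    return {
        "order_id": oid,
        "customer": customer,
        "items": [{"sku": sku, "qty": qty} for sku, qty in items],
    }

save_relational(conn, order)

# Document-style storage: the whole object is serialized and stored under one key.
document_store = {}
document_store[order["order_id"]] = json.dumps(order)
```

Every new nested field means another table, another INSERT, and another SELECT in the relational path, while the document path stores the changed object as is.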
• It’s more than rows in tables
• NoSQL systems store and retrieve data in many formats: key-value stores, graph
databases, column-family stores, document stores, and even rows in tables.
• It’s free of joins
• NoSQL systems allow you to extract your data using simple interfaces without joins.
• It’s schema-free
• NoSQL systems allow you to drag-and-drop your data into a folder and then query it
without creating an entity-relationship model.
The Solution called NoSQL
• It works on many processors
• NoSQL systems allow you to store your database on multiple processors and maintain
high-speed performance.
• It uses shared-nothing commodity computers
• Most NoSQL systems leverage low-cost commodity processors that have separate
RAM and disk.
• It supports linear scalability
• When you add more processors, you get a consistent increase in performance.
• It’s innovative
• NoSQL offers alternatives to a single way of storing, retrieving, and manipulating data.
NoSQL supporters (also known as NoSQLers) have an inclusive attitude about NoSQL
and recognize SQL solutions as viable options. To the NoSQL community, NoSQL
means “Not only SQL.”
What else?
• It’s not about not using the SQL language
• It’s not only open source
• It’s not only about volume
• It’s not about cloud computing
• It’s not just a clever use of RAM and SSD
• It’s not an elite group of products
• It’s not just Hadoop
What is NoSQL not…
Single Complex Component Vs Multiple Simple Components
• Removes Complexity
• Promotes Reuse
• Easier Maintenance
• Functions are distributed across many NoSQL (and SQL) databases, each
a simple tool with a simple interface and a well-defined role.
• NoSQL products take a “master of one thing” rather than a “jack of all
trades” approach.
• Example: Memcached to share objects in RAM, MapReduce to
run batch jobs, DynamoDB to store key-value items.
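One common way to combine two such single-purpose components is the cache-aside pattern, sketched below. The pattern is not named in the slides and is shown only as an illustration; plain dicts stand in for the RAM cache and the key-value store, and `get_item` and the key `user:1` are hypothetical.

```python
# Plain dicts stand in for the RAM cache tier and the key-value store (assumption).
cache = {}
key_value_store = {"user:1": {"name": "Ann"}}
stats = {"store_reads": 0}

def get_item(key):
    """Cache-aside lookup: try the RAM cache first, fall back to the store."""
    if key in cache:
        return cache[key]
    stats["store_reads"] += 1
    value = key_value_store.get(key)
    if value is not None:
        cache[key] = value  # populate the cache for later readers
    return value

first = get_item("user:1")   # miss: read from the store, then cached
second = get_item("user:1")  # hit: served from RAM
```

Each component keeps one job: the cache only holds hot objects, and the store only persists items.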
NoSQL Concepts
Use application tiers to simplify design
Strategic Use of RAM, SSD and HDD using Consistent Hashing
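The slide gives no detail beyond its title, so the following is only a sketch of consistent hashing in its common distributed-systems sense: keys and nodes are hashed onto the same ring, and each key belongs to the next node clockwise. The class `HashRing`, the node names, and the choice of SHA-256 with 100 virtual points per node are all assumptions.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring with virtual points per node."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (hash value, node name)
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key):
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def add(self, node):
        # Place several points for the node so load spreads evenly.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def lookup(self, key):
        # First ring point at or after the key's hash, wrapping around.
        idx = bisect.bisect_left(self._ring, (self._hash(key), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]

keys = [f"item-{n}" for n in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.lookup(k) for k in keys}
ring.add("node-d")
after = {k: ring.lookup(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Adding a node relocates only the keys that now fall to the new node; every other key stays where it was, which is what keeps cached or stored data valid when the cluster grows.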
Transaction Control Using ACID
•Atomicity
•Consistency
•Isolation
•Durability
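A minimal sketch of atomicity and consistency, using Python's built-in sqlite3 module as the ACID store. The `accounts` table, the CHECK constraint, and the `transfer` helper are hypothetical; the point is only that a failed transaction leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move funds atomically: both updates commit, or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer credits bob before the debit fails, yet the rollback removes that credit, so the balances reflect only the first transfer.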
Transaction Control Using BASE
•BAsic Availability
•Soft State
•Eventual Consistency
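The three BASE properties can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not how any particular product replicates data: the classes `Replica` and `EventuallyConsistentStore` are hypothetical, and replication runs only when `sync()` is called so that the stale read is visible.

```python
from collections import deque

class Replica:
    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    """Writes are accepted by the primary at once and replicated later."""

    def __init__(self, replica_count=2):
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = deque()  # replication log not yet applied

    def write(self, key, value):
        # Basic availability: never block a write, just queue it for the replicas.
        self.primary.data[key] = value
        self.pending.append((key, value))

    def read_from_replica(self, index, key):
        return self.replicas[index].data.get(key)

    def sync(self):
        # Stand-in for background replication, run explicitly here.
        while self.pending:
            key, value = self.pending.popleft()
            for replica in self.replicas:
                replica.data[key] = value

store = EventuallyConsistentStore()
store.write("cart:42", ["book"])
stale = store.read_from_replica(0, "cart:42")  # soft state: replica has not caught up
store.sync()
fresh = store.read_from_replica(0, "cart:42")  # eventual consistency: now it has
```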
ACID | BASE
Get transaction details right | Never block a write
Block any reports while you are working | Focus on throughput, not consistency
Be pessimistic, anything might go wrong! | Be optimistic, if one service fails it will eventually get caught up
Detailed testing and failure mode analysis | Some reports may be inconsistent for a while, but don’t worry
Lots of locks and unlocks | Keep things simple and avoid locks
Automatic Sharding
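The slide carries only its title, so the sketch below shows one possible reading of automatic sharding: keys are kept in ordered ranges, and a shard that grows past a threshold is split in two without any action by the application. The classes, the `max_items` threshold, and the key format are all hypothetical.

```python
class Shard:
    def __init__(self, low):
        self.low = low    # smallest key this shard is responsible for
        self.items = {}

class AutoShardedStore:
    """Keeps shards ordered by key range and splits any shard that grows too large."""

    def __init__(self, max_items=4):
        self.max_items = max_items
        self.shards = [Shard(low="")]

    def _find(self, key):
        # The owning shard is the last one whose lower bound is <= key.
        for shard in reversed(self.shards):
            if key >= shard.low:
                return shard
        return self.shards[0]

    def put(self, key, value):
        shard = self._find(key)
        shard.items[key] = value
        if len(shard.items) > self.max_items:
            self._split(shard)

    def get(self, key):
        return self._find(key).items.get(key)

    def _split(self, shard):
        # Move the upper half of the keys into a new shard placed right after this one.
        keys = sorted(shard.items)
        half = len(keys) // 2
        new_shard = Shard(low=keys[half])
        for key in keys[half:]:
            new_shard.items[key] = shard.items.pop(key)
        self.shards.insert(self.shards.index(shard) + 1, new_shard)

store = AutoShardedStore(max_items=4)
for n in range(10):
    store.put(f"user:{n:03d}", n)
```

In a real cluster each shard would live on its own server; here they are objects in one process so the splitting logic stays visible.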
Eric Brewer’s CAP Theorem for Replication
Consistency—Having a single, up-to-date, readable version of your data
available to all clients. Consistency here is concerned with multiple clients
reading the same items from replicated partitions and getting consistent
results.
High availability—Knowing that the distributed database will always allow
database clients to update items without delay. Internal communication
failures between replicated data shouldn’t prevent updates.
Partition tolerance—The ability of the system to keep responding to client
requests even if there’s a communication failure between database partitions.
This is analogous to a person still having an intelligent conversation even after
a link between parts of their brain isn’t working.
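The trade-off between these three properties shows up when the link between replicas fails. The toy model below is a sketch under stated assumptions: `ReplicatedPair`, its `mode` flag, and the two-node setup are hypothetical and much simpler than a real replicated database.

```python
class Node:
    def __init__(self):
        self.data = {}

class ReplicatedPair:
    """Two replicas joined by a link that can fail; `mode` picks what to give up."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a, self.b = Node(), Node()
        self.link_up = True

    def write(self, node, key, value):
        other = self.b if node is self.a else self.a
        if self.link_up:
            # No partition: both replicas are updated together.
            node.data[key] = value
            other.data[key] = value
            return True
        if self.mode == "CP":
            return False        # refuse the update: stays consistent, not available
        node.data[key] = value  # accept locally: stays available, replicas diverge
        return True

cp = ReplicatedPair("CP")
cp.link_up = False
cp_accepted = cp.write(cp.a, "x", 1)

ap = ReplicatedPair("AP")
ap.link_up = False
ap_accepted = ap.write(ap.a, "x", 1)
```

During the partition the CP pair rejects the write and both replicas still agree, while the AP pair accepts it and the two replicas temporarily hold different data.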
NoSQL Concepts Eric Brewer’s CAP Theorem for Replication
NoSQL Concepts
in Action
Four Quadrants of Data Technologies
SQL
• Operational Relational: Relational Databases (Oracle, SQL Server, MySQL)
• Relational Analytics: Oracle, SQL Server, MySQL
NoSQL
• Key-Value Stores: DynamoDB, Azure Tables, Riak, etc.
• Column Family Stores: Apache HBase, Apache Cassandra, Google Bigtable, etc.
• Document Stores: MongoDB, DocumentDB, etc.
• Graph Stores: Neo4j, AllegroGraph, etc.
• Big Data Analytics: Hadoop, HDInsight
Operational NoSQL
RDBMS
Key/Value Stores
Column Family Stores
Document Stores
Graph Stores
Graph Store Example: Social Network
Graph Store Example: User’s Order History
Graph Store Example: Airport Terminal
Analytical NoSQL
Big Data Analytics using Hadoop
Hadoop Core Technologies
• Hadoop Distributed File System (HDFS)
• Provides a way to store and access very large binary files across a cluster of
commodity servers and disk drives.
• Hadoop MapReduce
• Supports the creation of applications that process large amounts of analytical data in
parallel. That data is commonly stored in HDFS.
• Hive
• A Hadoop-based framework for querying and analyzing data. Among other things, it
provides HiveQL, a SQL-like language that can generate MapReduce jobs.
• Pig
• Another Hadoop-based framework for working with data. It provides a language called
Pig Latin for creating MapReduce jobs.
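The MapReduce programming model itself can be sketched in a few lines of plain Python, without Hadoop. The word-count task, the function names, and the two sample lines are assumptions chosen for illustration; a real job would read its input from HDFS and run the map and reduce steps in parallel across the cluster.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    """Emit a (word, 1) pair for each word in one input record."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

lines = ["big data big clusters", "data in HDFS"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Because each call to `map_phase` and `reduce_phase` depends only on its own input, the framework is free to run them on different machines.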
• NoSQL really means Not Only SQL
• Volume, Velocity, Variability & Agility are the main business
drivers for NoSQL.
• Key NoSQL Concepts: Multiple Simple Components, Application
Tiers With External Services, Strategic Use of RAM, SSD, HDD,
BASE Transaction Control, Automatic Sharding, Replication Using
CAP.
• Popular NoSQL Datastores: Key-Value, Column Family,
Document, Graph.
• Big Data Analytics using Hadoop
Quick Recap
Q & A

NoSQLDatabases

  • 1. The complexity for minimum component costs has increased at a rate of roughly a factor of two per year...Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. -- Gordon Moore, 1965 …Then you better start swimmin’…Or you’ll sink like a stone…For the times they are a-changin’. -- Bob Dylan
  • 2. •NoSQL is a set of concepts that allows the rapid and efficient processing of data sets with a focus on performance, reliability, and agility. Definition of NoSQL Sounds great… What???
  • 3. Operational Data • Read and written by applications to carry out their ordinary functions. • Examples: • Shopping cart data in Amazon.com • Information about employees in a human resources system • Buy/Sell prices in Fidelity • Posts made by Facebook users • Travel Itineraries for bookings done on Expedia Two Categories of Data
  • 4. Analytical Data • Used to provide business intelligence (BI). • Data is often created by storing the operational data used by applications over time, and it’s commonly read-only. • Because these analytical datasets provide a historical record, they’re commonly much bigger than an application’s current operational data. • Example: • A e-commerce company might record all of the purchase data from its web application, then analyze this data to learn about customer buying habits or market trends. • Facebook might sell all the posts made by its users to other companies who can analyze the posts to determine each user’s significant events so that they can tailor offers based on user needs, likes and dislikes. Two Categories of Data
  • 5. The Problem called Big Data Cracks in the Single CPU RDBMS System due to pressure from the four business drivers of the current age.
  • 6. Volume • Need to query big data always resulted in performance concerns in RDBMS. • These performance concerns were solved by purchasing faster processors. • But, the power wall was reached which meant increasing processor speed was no longer an option. • System designers shifted their focus from increasing speed on a single chip (vertical scaling or scale up) to using more processors working together (horizontal scaling or scale out). The Problem called Big Data
  • 7. Velocity • Many single-processor RDBMSs are unable to keep up with the demands of real-time inserts and online queries to the database made by public-facing websites. • RDBMSs frequently index many columns of every new row, a process which decreases system performance. • When single-processor RDBMSs are used as a back end to a web store front, the random bursts in web traffic slow down response for everyone, and tuning these systems can be costly when both high read and write throughput is desired. • This was another reason for engineers to look for a scaled out solution. The Problem called Big Data
  • 8. Variability • Companies that want to capture and report on exception data struggle when attempting to use rigid database schema structures imposed by RDBMS. For example, if a business unit wants to capture a few custom fields for a particular customer, all customer rows within the database need to store this information even though it doesn’t apply. • Adding new columns to an RDBMS requires the system be shut down and ALTER TABLE commands to be run. When a database is large, this process can impact system availability, costing time and money. • This was another reason engineers looked for a more viable solution. The Problem called Big Data
  • 9. Agility • The most complex part of building applications using RDBMSs is the process of putting data into and getting data out of the database. • If your data has nested and repeated subgroups of data structures, you need to include an object-relational mapping layer. The responsibility of this layer is to generate the correct combination of INSERT, UPDATE, DELETE, and SELECT SQL statements to move object data to and from the RDBMS persistence layer. • This process isn’t simple and is associated with the largest barrier to rapid change when developing new or modifying existing applications. The Problem called Big Data
  • 10. • It’s more than rows in tables • NoSQL systems store and retrieve data from many formats: key-value stores, graph databases, column-family stores, document stores, and even rows in tables. • It’s free of joins • NoSQL systems allow you to extract your data using simple interfaces without joins. • It’s schema-free • NoSQL systems allow you to drag-and-drop your data into a folder and then query it without creating an entity-relational model. The Solution called NoSQL
  • 11. • It works on many processors • NoSQL systems allow you to store your database on multiple processors and maintain high-speed performance. • It uses shared-nothing commodity computers • Most NoSQL systems leverage low-cost commodity processors that have separate RAM and disk. • It supports linear scalability • When you add more processors, you get a consistent increase in performance. • It’s innovative • NoSQL offers alternatives to a single way of storing, retrieving, and manipulating data. NoSQL supporters (also known as NoSQLers) have an inclusive attitude about NoSQL and recognize SQL solutions as viable options. To the NoSQL community, NoSQL means “Not only SQL.” What else?
  • 12. • It’s not about not using the SQL language • It’s not only open source • It’s not only about volume • It’s not about cloud computing • It’s not just a clever use of RAM and SSD • It’s not an elite group of products • It’s not just Hadoop What is NoSQL not…
  • 13. Single Complex Component vs. Multiple Simple Components • Removes complexity • Promotes reuse • Easier maintenance • Functions are distributed to many NoSQL (and SQL) databases, which consist of simple tools with simpler interfaces and well-defined roles. • NoSQL products take a “master of one” rather than a “jack of all trades” approach. • Example: Memcached to share objects in RAM, MapReduce to run batch jobs, DynamoDB to store key-value items. NoSQL Concepts
  • 14. Use application tiers to simplify design NoSQL Concepts
  • 15. Strategic Use of RAM, SSD and HDD using Consistent Hashing NoSQL Concepts
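Consistent hashing assigns each key to a node by placing both on a hash ring, so that adding or removing a node moves only a fraction of the keys. The sketch below is one minimal way to write it, not the scheme of any particular product. The `HashRing` class, the `vnodes` parameter, and the `cache-a` to `cache-d` node names are assumptions made for illustration.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    """Map a string to a position on the ring (MD5 used only for spreading)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes   # virtual points per node, to even out the load
        self._points = []      # sorted hash positions on the ring
        self._owner = {}       # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key):
        if not self._points:
            raise LookupError("ring has no nodes")
        # First point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]

ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}

ring.add_node("cache-d")
after = {k: ring.get_node(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Every key in `moved` now lives on `cache-d`; keys that stayed put did not change owner, which is what lets a cluster grow without reshuffling all of its data.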
  • 16. Transaction Control Using ACID •Atomicity •Consistency •Isolation •Durability NoSQL Concepts
  • 17. Transaction Control Using BASE •Basically Available •Soft State •Eventual Consistency NoSQL Concepts
  • 18. NoSQL Concepts • ACID: Get transaction details right. Block any reports while you are working. Be pessimistic: anything might go wrong! Detailed testing and failure mode analysis. Lots of locks and unlocks. • BASE: Never block a write. Focus on throughput, not consistency. Be optimistic: if one service fails, it will eventually get caught up. Some reports may be inconsistent for a while, but don’t worry. Keep things simple and avoid locks.
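The BASE behavior in the comparison above ("never block a write", "some reports may be inconsistent for a while") can be shown with a toy in-memory model. This is a sketch under simplifying assumptions: replication is a queue drained on demand rather than a real network, and `EventuallyConsistentStore` and `replicate_pending` are hypothetical names.

```python
from collections import deque

class EventuallyConsistentStore:
    def __init__(self, replica_count=2):
        self.replicas = [{} for _ in range(replica_count)]
        self._pending = deque()   # queued (replica_index, key, value) updates

    def write(self, key, value):
        # Accept the write on replica 0 right away; never wait for the others.
        self.replicas[0][key] = value
        for i in range(1, len(self.replicas)):
            self._pending.append((i, key, value))

    def read(self, key, replica=0):
        return self.replicas[replica].get(key)

    def replicate_pending(self):
        # Background catch-up: apply queued updates to the lagging replicas.
        while self._pending:
            i, key, value = self._pending.popleft()
            self.replicas[i][key] = value

store = EventuallyConsistentStore()
store.write("cart:42", ["book"])

stale = store.read("cart:42", replica=1)   # replica 1 has not caught up yet
store.replicate_pending()
fresh = store.read("cart:42", replica=1)   # now matches replica 0
```

The read taken before `replicate_pending()` runs is the "inconsistent for a while" window; an ACID system would instead hold the write until every copy agreed.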
  • 20. Eric Brewer’s CAP Theorem for Replication • Consistency: Having a single, up-to-date, readable version of your data available to all clients. Consistency here is concerned with multiple clients reading the same items from replicated partitions and getting consistent results. • High availability: Knowing that the distributed database will always allow database clients to update items without delay. Internal communication failures between replicated data shouldn’t prevent updates. • Partition tolerance: The ability of the system to keep responding to client requests even if there’s a communication failure between database partitions. This is analogous to a person still being able to hold an intelligent conversation even after a link between parts of their brain stops working. NoSQL Concepts
  • 21. NoSQL Concepts Eric Brewer’s CAP Theorem for Replication
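The trade-off behind the CAP theorem is that, while replicas cannot communicate, a system has to give up either consistency or availability. The toy model below illustrates that choice for two replicas; it is a deliberately simplified sketch, and the `ReplicatedPair` class, its `mode` values, and the `Unavailable` exception are hypothetical names.

```python
class Unavailable(Exception):
    """Raised when a consistency-first system refuses a request."""

class ReplicatedPair:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [{}, {}]
        self.partitioned = False   # True while the replicas cannot talk

    def write(self, node, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse rather than let replicas diverge.
                raise Unavailable("cannot reach the other replica")
            # Availability first: accept locally and let replicas diverge.
            self.nodes[node][key] = value
            return
        for replica in self.nodes:
            replica[key] = value

    def read(self, node, key):
        return self.nodes[node].get(key)

cp = ReplicatedPair("CP")
cp.write(0, "x", 1)
cp.partitioned = True
cp_rejected = False
try:
    cp.write(0, "x", 2)
except Unavailable:
    cp_rejected = True

ap = ReplicatedPair("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)   # accepted, but only node 0 sees it
ap_views = (ap.read(0, "x"), ap.read(1, "x"))
```

The "CP" pair stays consistent by rejecting the write during the partition, while the "AP" pair stays writable at the cost of its two replicas returning different values.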
  • 23. Four Quadrants of Data Technologies • Operational, relational: SQL relational databases (Oracle, SQL Server, MySQL) • Relational analytics: Oracle, SQL Server, MySQL • Operational, NoSQL: key-value stores (DynamoDB, Azure Tables, Riak, etc.); column-family stores (Apache HBase, Apache Cassandra, Google Bigtable, etc.); document stores (MongoDB, DocumentDB, etc.); graph stores (Neo4j, AllegroGraph, etc.) • Big data analytics: Hadoop, HDInsight
  • 25. RDBMS
  • 31. Graph Store Example: Social Network
  • 32. Graph Store Example: User’s Order History
  • 33. Graph Store Example: Airport Terminal
  • 35. Big Data Analytics using Hadoop
  • 36. Big Data Analytics using Hadoop
  • 37. Hadoop Core Technologies • Hadoop Distributed File System (HDFS) • Provides a way to store and access very large binary files across a cluster of commodity servers and disk drives. • Hadoop MapReduce • Supports the creation of applications that process large amounts of analytical data in parallel. That data is commonly stored in HDFS. • Hive • A Hadoop-based framework for querying and analyzing data. Among other things, it provides HiveQL, a SQL-like language that can generate MapReduce jobs. • Pig • Another Hadoop-based framework for working with data. It provides a language called Pig Latin for creating MapReduce jobs. Big Data Analytics using Hadoop
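The MapReduce model mentioned above can be sketched without Hadoop at all. The example below runs the three phases (map, shuffle, reduce) of a word count in a single Python process. It illustrates the programming model only and is not Hadoop's API; the function names and sample documents are assumptions.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

documents = ["NoSQL is not only SQL", "SQL is still useful"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

In a real cluster the map calls run in parallel on the nodes that hold each block of input, and the shuffle moves data between machines; the logic per record stays as small as it is here.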
  • 38. • NoSQL really means Not Only SQL • Volume, Velocity, Variability & Agility are the main business drivers for NoSQL. • Key NoSQL Concepts: Multiple Simple Components, Application Tiers With External Services, Strategic Use of RAM, SSD, HDD, BASE Transaction Control, Automatic Sharding, Replication Using CAP. • Popular NoSQL Datastores: Key-Value, Column Family, Document, Graph. • Big Data Analytics using Hadoop Quick Recap
  • 39. Q & A