SlideShare a Scribd company logo
1 of 81
NOSQL DATABASES
AND BIG DATA STORAGE SYSTEMS
Ateeq Ateeq
CONTENT
 1- Introduction to NOSQL Systems
 2- The CAP Theorem
 3- Document-Based NOSQL Systems and MongoDB
 4- NOSQL Key-Value Stores
 5- Column-Based or Wide Column NOSQL Systems
 6- NOSQL Graph Databases and Neo4j
INTRODUCTION TO NOSQL SYSTEMS
 1.1 Emergence of NOSQL Systems
 1.2 Characteristics of NOSQL Systems
 1.3 Categories of NOSQL Systems
1.1 EMERGENCE OF NOSQL SYSTEMS
 SQL system may not be appropriate for some applications
such as Emails
 SQL systems offer too many services (powerful query
language, concurrency control, etc.), which this application
may not need;
 structured data model such the traditional relational model
may be too restrictive.
 SQL require schemas, which are not required by many of
the NOSQL systems.
1.1 EMERGENCE OF NOSQL SYSTEMS
 Examples of NOSQL systems:
 Google – BigTable
 Amazon – DynamoDB
 Facebook – Cassandra
 MongoDB
 CouchDB
 Graph databases like Neo4J and GraphBase
1.2 CHARACTERISTICS OF NOSQL SYSTEMS
 NOSQL characteristics related to distributed
databases and distributed systems.
 NOSQL characteristics related to data models and
query languages.
CHARACTERISTICS RELATED TO DISTRIBUTED
DATABASES AND DISTRIBUTED SYSTEMS
1- Scalability:
 horizontal scalability: adding more nodes for data
storage and processing as the volume of data grows.
 Vertical scalability: expanding the storage and
computing power of existing nodes.
 In NOSQL systems, horizontal scalability is employed
while the system is operational, so techniques for
distributing the existing data among new nodes without
interrupting system operation are necessary.
CHARACTERISTICS RELATED TO DISTRIBUTED
DATABASES AND DISTRIBUTED SYSTEMS
2- Availability, Replication and Eventual Consistency:
 Data is replicated over two or more nodes in a
transparent manner.
 Update must be applied to every copy of the replicated
data items.
 Eventual consistency: is a consistency model used in
distributed computing to achieve high availability that
informally guarantees that, if no new updates are made
to a given data item, eventually all accesses to that item
will return the last updated value.
CHARACTERISTICS RELATED TO DISTRIBUTED
DATABASES AND DISTRIBUTED SYSTEMS
3- Replication Models:
 Master-slave replication: requires one copy to be the
master copy;
 Write operations must be applied to the master copy, usually
using eventual consistency
 For read, all reads are from the master copy, or reads at the
slave copies but would not guarantee that the values are the
latest writes.
 Master-master replication: allows reads and writes at
any of the replicas.
 The values of an item will be temporarily inconsistent.
 Reconciliation method to resolve conflicting write operations of
the same data item at different nodes must be implemented as
part of the master-master replication scheme.
CHARACTERISTICS RELATED TO DISTRIBUTED
DATABASES AND DISTRIBUTED SYSTEMS
CHARACTERISTICS RELATED TO DISTRIBUTED
DATABASES AND DISTRIBUTED SYSTEMS
CHARACTERISTICS RELATED TO DISTRIBUTED
DATABASES AND DISTRIBUTED SYSTEMS
 4- Sharding of Files:
 Files can have many millions of records accessed concurrently by
thousands of users.
 Sharding (also known as horizontal) serves to distribute the load
of accessing the file records to multiple nodes.
 Shards works in tandem to improve load balancing on the
replication as well as data availability.
CHARACTERISTICS RELATED TO DISTRIBUTED
DATABASES AND DISTRIBUTED SYSTEMS
CHARACTERISTICS RELATED TO DISTRIBUTED
DATABASES AND DISTRIBUTED SYSTEMS
 5- High-Performance Data Access:
 Hashing: The location of the value is given by the result of h(k).
 Range partitioning: the location is determined via a range of key values.
Example: location i would hold the objects whose key values K are in the
range Kimin ≤ K ≤ Kimax.
In applications that require range queries, where multiple objects within a range of
key values are retrieved, range partitioned is preferred.
CHARACTERISTICS RELATED TO DISTRIBUTED
DATABASES AND DISTRIBUTED SYSTEMS
CHARACTERISTICS RELATED TO DISTRIBUTED
DATABASES AND DISTRIBUTED SYSTEMS
CHARACTERISTICS RELATED TO DATA MODELS
AND QUERY LANGUAGES.
 1- Not Requiring a Schema:
 Allowing semi-structured and self describing data.
 The users can specify a partial schema in some systems to improve storage
efficiency, but it is not required to have a schema in most of the NOSQL
systems.
 Constraints on the data would have to be programmed in the application
programs that access the data items.
 Languages for describing semi-structured data: JSON (JavaScript Object
Notation) and XML (Extensible Markup Language)
CHARACTERISTICS RELATED TO DATA MODELS
AND QUERY LANGUAGES.
 2- Less Powerful Query Languages:
 In many applications that use NOSQL systems may not require a powerful
query language such as SQL, because search (read) queries in these systems
often locate single objects in a single file based on their object keys.
 Reading and writing the data objects is accomplished by calling the
appropriate operations by the programmer (API).
 SCRUD: Search, Create, Read, Update and Delete
 Provide a high-level query language, but it may not have the full power of
SQL, for example the joins need to be implemented in the application
programs.
CHARACTERISTICS RELATED TO DATA MODELS
AND QUERY LANGUAGES.
 3- Versioning:
 Provide storage of multiple versions of the data items, with the timestamps of
when the data version was created.
1.3 CATEGORIES OF NOSQL SYSTEMS
The most common categories:
1. Document-based NOSQL systems:
 Store data in the form of documents using well-known formats such as JSON.
 Documents are accessible via their document id, but can also be accessed rapidly
using other indexes.
2. NOSQL key-value stores:
 Fast access by the key to the value associated with the key
 Value can be a record or an object or a document or even have a more complex
data structure.
3. Column-based or wide column NOSQL systems:
 Partition a table by column into column families
 Form of vertical partitioning.
4. Graph-based NOSQL systems:
 Data is represented as graphs
 Related nodes can be found by traversing the edges using path expressions.
1.3 CATEGORIES OF NOSQL SYSTEMS
Additional categories :
5. Hybrid NOSQL systems:
 These systems have characteristics from two or more of the common categories..
6. Object databases.
7. XML databases.
THE CAP THEOREM
 The CAP: it’s impossible to guarantee consistency, availability and
partition tolerance at the same time in a distributed system with data
replication.
 Two properties out of the three to guarantee.
 Weaker consistency levels are often used in NOSQL system instead
of guaranteeing serializability.
 Eventual consistency is used.
THE CAP THEOREM
 The CAP theorem is used to explain some of the
competing requirements in a distributed system with
replication.
 The three letters in CAP refers to
 Consistency (among replicated copies):
 The nodes will have the same copies of a replicated data item
visible for various transactions.
 Availability (of the system for read and write operations) :
 Each read or write will either be processed successfully or will
receive a message that the operation cannot be completed.
 Partition tolerance (in the face of the nodes in the system
being partitioned by a network fault).:
 The system can continue operating if the network connecting the
nodes has a fault that results in two or more partitions,
 Nodes in each partition can only communicate among each other.
THE CAP THEOREM
DOCUMENT-BASED NOSQL SYSTEMS AND MONGODB
1. Introduction
2. MongoDB Data Model
3. MongoDB CRUD Operations
4. MongoDB Distributed Systems Characteristics
3.1INTRODUCTION
 Document-based NOSQL systems store data as
collections of similar documents.
 Documents resemble complex objects or XML
documents
 Documents in a collection should be similar, but
they can have different attributes.
 Document-based NOSQL systems: MongoDB and
CouchDB.
3.2 MONGODB DATA MODEL
 MongoDB is a free and open-source cross-platform
document-oriented database.
 Classified as a NoSQL database,
3.2 MONGODB DATA MODEL
 MongoDB documents are stored in BSON (Binary
JSON) format.
 BSON is a variation of JSON with some additional data
types and is more efficient for storage than JSON.
 Individual documents are stored in a collection.
 The operation createCollection is used to create each
collection.
3.2 MONGODB DATA MODEL
 Example: create a collection called project to hold PROJECT
objects from the COMPANY database :
db.createCollection(“project”, { capped : true, size : 1310720,
max : 500 } )
 “project” is the name of the collection (Mandatory)
 Capped: capped means it has upper limits on its storage
space (size) and number of documents (max).
 Capping helps the system to choose the storage options
for each collection.
3.2 MONGODB DATA MODEL
 Example: create a document collection called worker :
db.createCollection(“worker”, { capped : true, size : 5242880, max : 2000 } )
 Each document has a unique ObjectId field “_id”
 The _id is by default:
 Automatically indexed in the collection.
 The value is system-generated.
 System-generated have a specific format – “combines the timestamp when the object is
created, the node id, the process id and a counter “.
 User-generated can have any value specified by the user as long as its.
3.2 MONGODB DATA MODEL
 A collection does not have a schema.
 The structure of the data fields in documents is chosen based on
how documents will be accessed and used, and the user can choose
a normalized design (similar to normalized relational tuples) or a
denormalized design (similar to XML documents or complex objects).
 Interdocument references can be specified by storing in one
document the ObjectId or ObjectIds of other related documents.
3.2 MONGODB DATA MODEL
Company database example
3.2 MONGODB DATA MODEL
Project info
Embedded workers info
3.2 MONGODB DATA MODEL
Project info
Embedded workers array
Workers
3.2 MONGODB DATA MODEL
Project ID as an attribute
3.2 MONGODB DATA MODEL
3.3 MONGODB CRUD OPERATIONS
 Insert:
 db.<collection_name>.insert(<document(s)>)
 Example:
 Db.project.insert({_id:”P1”, Pname:”ProjectX”,Plocation:”Jenin”})
 Delete: remove
 db.<collection_name>.remove(<condition>)
 Example:
 db.project.remove( {"_id": ObjectId(“P1")});
3.3 MONGODB CRUD OPERATIONS
 Read: fined
 db.<collection_name>.find(<condition>)
 Example:
 Db.project.find({"_id": ObjectId(“P1")})
 Update:
 db.<collection_name>. update(SELECTIOIN_CRITERIA,
UPDATED_DATA)
 Example:
 Db.project.update({"_id" : ObjectId(P1)},{$set:{‘PLocation':‘AAUJ'}})
3.4 MONGODB DISTRIBUTED SYSTEMS
CHARACTERISTICS
 Replication in MongoDB
 Sharding in MongoDB
REPLICATION IN MONGODB
 Master-slave approach for replication.
 All read and write are done on the primary copy.
 Secondary copies are to recover from primary fails.
SHARDING IN MONGODB
 Sharding of the documents in the collection—also
known as horizontal partitioning— divides the
documents into disjoint partitions known as shards.
 Two ways:
 Range partitioning
 Hash partitioning
SHARDING IN MONGODB
 Range and Hash portioning require that the user
specify a particular document field to be used as
the basis for partitioning the documents into shards.
 The partitioning field—known as the “shard key”,
must exist in every document in the collection, and
it must have an index.
 The values of the shard key are divided into
chunks, and the documents are partitioned based
on the chunks of shard key values
SHARDING IN MONGODB
 Chunks created by specifying a range of key values
and each chunk contains the key values in one
range.
 If range queries are commonly applied to a
collection (for example, retrieving all documents
whose shard key value is between 200 and 400),
then range partitioning is preferred
 Because each range query will typically be submitted to
a single node that contains all the required documents
in one shard.
 If most searches retrieve one document at a time,
hash partitioning may be preferable because it
randomizes the distribution of shard key values into
chunks.
SHARDING IN MONGODB
 MongoDB queries are submitted to a module called
the query router, which keeps track of which nodes
contain which shards based on the particular
partitioning method used on the shard keys.
 The query will be routed to the nodes that contain the
shards that hold the documents that the query is
requesting.
 If the system cannot determine which shards hold the
required documents, the query will be submitted to all
the nodes that hold shards of the collection.
SHARDING IN MONGODB
 Sharding and replication are used together:
 Sharding focuses on improving performance via load
balancing and horizontal scalability.
 Replication focuses on ensuring system availability
when certain nodes fail in the distributed system.
WHY NOSQL?
 Document or table ?
WHY NOSQL?
 Alter the table and add Description, Rate and Reviews
 NOSQL is Flexible
No Schema restrictions
WHY NOSQL?
 SQL is Restricted !
Fill the data
WHY NOSQL? - USE CASES WHERE NOSQL
WILL OUTPERFORM SQL
 Agile - Flexibility for Faster Development
WHY NOSQL? - USE CASES WHERE NOSQL
WILL OUTPERFORM SQL
 Agile - Flexibility for Faster Development
WHY NOSQL? - USE CASES WHERE NOSQL
WILL OUTPERFORM SQL
 Agile - Simplicity for Easier Development
WHY NOSQL? - USE CASES WHERE NOSQL
WILL OUTPERFORM SQL
 Agile - Simplicity for Easier Development
 Reading this profile would require the application to
read six rows from three table
WHY NOSQL? - USE CASES WHERE NOSQL
WILL OUTPERFORM SQL
 Agile - Simplicity for Easier Development
WHY NOSQL? - USE CASES WHERE NOSQL
WILL OUTPERFORM SQL
 Availability for Always-on
WHY NOSQL? - USE CASES WHERE NOSQL
WILL OUTPERFORM SQL
 Availability for Always-on
NOSQL CATEGORIES EXAMPLES -
DOCUMENT-BASED NOSQL SYSTEMS
XML is stored into a native XML Type
NOSQL CATEGORIES EXAMPLES -
DOCUMENT-BASED NOSQL SYSTEMS
 The query retrieves the <Features> child element of
the <ProductDescription> element
 Result:
NOSQL CATEGORIES EXAMPLES - NOSQL
KEY-VALUE STORES
 RIAK as example
NOSQL CATEGORIES EXAMPLES - NOSQL
KEY-VALUE STORES
 The response to a query will be an object contains
a list of documents which match the given query.
 The documents returned are Search documents (a
set of Solr field/values)
NOSQL CATEGORIES EXAMPLES - COLUMN
NOSQL SYSTEMS
 Cassandra as an example
 returns a result-set of rows, where each row
consists of a key and a collection of columns
corresponding to the query
NOSQL CATEGORIES EXAMPLES - COLUMN
NOSQL SYSTEMS
 LOCAL_QUORUM: it’s a consistency level type
 Used in multiple data center clusters.
 Use to maintain consistency locally (within the single data center).
NOSQL CATEGORIES EXAMPLES - GRAPH-
BASED NOSQL SYSTEMS
 Neo4j as an example
NOSQL CATEGORIES EXAMPLES - GRAPH-
BASED NOSQL SYSTEMS
NOSQL CATEGORIES EXAMPLES - OBJECT
DATABASES
 LINQ as an example
NOSQL KEY-VALUE STORES
1. Introduction
2. DynamoDB Overview
3. Voldemort Key-Value Distributed Data Store
4. Examples of Other Key-Value Stores
4.1 INTRODUCTION
 No query language
 A set of operations that can be used by the
application programmers.
 Characteristics:
 Every value is associated with a unique key.
 Retrieving the value by supplying the key is very fast.
4.1 INTRODUCTION
4.2 DYNAMODB OVERVIEW
 Amazon product – part AWS
 Data model is using the concepts of tables, items,
and attributes.
 The table does not have a schema.
 Holds a collection of self-describing items.
 The item consist of a number of (attribute, value) pairs
 Attribute values can be single-valued or multivalued.
4.2 DYNAMODB OVERVIEW
 Uploads an item to the ProductCatalog table
4.3 VOLDEMORT KEY-VALUE DISTRIBUTED
DATA STORE
 Based on Amazon’s DynamoDB.
 Used by LinkedIn.
 Simple and basic set of operations, like (put, delete
and get).
 Pluggable with other storage engines like MySQL
 Nodes are independent
 Automatic replications and partitioning
4.3 VOLDEMORT KEY-VALUE DISTRIBUTED
DATA STORE
4.4 EXAMPLES OF OTHER KEY-VALUE
STORES
1. Oracle key-value store.
2. Redis key-value cache and store.
3. Apache Cassandra
COLUMN-BASED OR WIDE COLUMN
NOSQL SYSTEMS
 Stores data tables as columns rather than as rows.
HBASE DATA MODEL AND VERSIONING
 Apache HBase is an open-source, distributed, versioned, non-
relational database.
 Column is identified by a combination of (column family:column
qualifier).
 Stores multiple versions of a data item, with a timestamp associated
with each version.
HBASE DATA MODEL AND VERSIONING
HBASE DATA MODEL AND VERSIONING
 Table is divided into a number of regions.
 Range partitioning.
 Apache Zookeeper and Apache HDFS (Hadoop Distributed
File System) are used for management.
NOSQL GRAPH DATABASES AND NEO4J
 The data is represented as a graph, which is a collection of vertices
(nodes) and edges.
 Nodes and edges can be labeled to indicate the types of entities and
relationships they represent
 It is generally possible to store data associated with both individual
nodes and individual edges.
 Neo4j is a NOSQL Graph DB and it’s an open source system, also it
is implemented in Java.
NEO4J
 The data model in Neo4j organizes data using the concepts of nodes
and relationships.
 Nodes and relationships have properties which store the data items.
 Nodes can have labels.
 Nodes that have the same label are grouped into a collection that
identifies a subset of the nodes in the database graph for querying
purposes.
 A node can have zero, one, or several labels.
NEO4J
NEO4J
Nosql databases

More Related Content

What's hot

An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBLee Theobald
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Databasenehabsairam
 
Introduction to NOSQL databases
Introduction to NOSQL databasesIntroduction to NOSQL databases
Introduction to NOSQL databasesAshwani Kumar
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL DatabasesDerek Stainer
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLRamakant Soni
 
Basics of MongoDB
Basics of MongoDB Basics of MongoDB
Basics of MongoDB Habilelabs
 
MongoDB presentation
MongoDB presentationMongoDB presentation
MongoDB presentationHyphen Call
 
Mongodb basics and architecture
Mongodb basics and architectureMongodb basics and architecture
Mongodb basics and architectureBishal Khanal
 
Introduction to Oracle Database
Introduction to Oracle DatabaseIntroduction to Oracle Database
Introduction to Oracle Databasepuja_dhar
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBRavi Teja
 
SQL vs. NoSQL Databases
SQL vs. NoSQL DatabasesSQL vs. NoSQL Databases
SQL vs. NoSQL DatabasesOsama Jomaa
 

What's hot (20)

Mongodb vs mysql
Mongodb vs mysqlMongodb vs mysql
Mongodb vs mysql
 
Nosql data models
Nosql data modelsNosql data models
Nosql data models
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
An Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDBAn Introduction To NoSQL & MongoDB
An Introduction To NoSQL & MongoDB
 
Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
 
introduction to NOSQL Database
introduction to NOSQL Databaseintroduction to NOSQL Database
introduction to NOSQL Database
 
Nosql seminar
Nosql seminarNosql seminar
Nosql seminar
 
Introduction to NOSQL databases
Introduction to NOSQL databasesIntroduction to NOSQL databases
Introduction to NOSQL databases
 
Introduction to NoSQL Databases
Introduction to NoSQL DatabasesIntroduction to NoSQL Databases
Introduction to NoSQL Databases
 
MongoDB
MongoDBMongoDB
MongoDB
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Basics of MongoDB
Basics of MongoDB Basics of MongoDB
Basics of MongoDB
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
Get to know PostgreSQL!
Get to know PostgreSQL!Get to know PostgreSQL!
Get to know PostgreSQL!
 
MongoDB presentation
MongoDB presentationMongoDB presentation
MongoDB presentation
 
Mongo db intro.pptx
Mongo db intro.pptxMongo db intro.pptx
Mongo db intro.pptx
 
Mongodb basics and architecture
Mongodb basics and architectureMongodb basics and architecture
Mongodb basics and architecture
 
Introduction to Oracle Database
Introduction to Oracle DatabaseIntroduction to Oracle Database
Introduction to Oracle Database
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
SQL vs. NoSQL Databases
SQL vs. NoSQL DatabasesSQL vs. NoSQL Databases
SQL vs. NoSQL Databases
 

Viewers also liked

NoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenNoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenLorenzo Alberton
 
Deterministic simulation testing
Deterministic simulation testingDeterministic simulation testing
Deterministic simulation testingFoundationDB
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developerJesus Rodriguez
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraFolio3 Software
 
A practical introduction to Oracle NoSQL Database - OOW2014
A practical introduction to Oracle NoSQL Database - OOW2014A practical introduction to Oracle NoSQL Database - OOW2014
A practical introduction to Oracle NoSQL Database - OOW2014Anuj Sahni
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsAndrew Brust
 
An Intro to NoSQL Databases
An Intro to NoSQL DatabasesAn Intro to NoSQL Databases
An Intro to NoSQL DatabasesRajith Pemabandu
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Chris Richardson
 
NoSQL-Database-Concepts
NoSQL-Database-ConceptsNoSQL-Database-Concepts
NoSQL-Database-ConceptsBhaskar Gunda
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big dataSteven Francia
 
NoSQL Slideshare Presentation
NoSQL Slideshare Presentation NoSQL Slideshare Presentation
NoSQL Slideshare Presentation Ericsson Labs
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQLTony Tam
 
NoSQL Databases, Not just a Buzzword
NoSQL Databases, Not just a Buzzword NoSQL Databases, Not just a Buzzword
NoSQL Databases, Not just a Buzzword Haitham El-Ghareeb
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Eric Evans
 

Viewers also liked (20)

NoSQL Databases: Why, what and when
NoSQL Databases: Why, what and whenNoSQL Databases: Why, what and when
NoSQL Databases: Why, what and when
 
FoundationDB - NoSQL and ACID
FoundationDB - NoSQL and ACIDFoundationDB - NoSQL and ACID
FoundationDB - NoSQL and ACID
 
Deterministic simulation testing
Deterministic simulation testingDeterministic simulation testing
Deterministic simulation testing
 
NoSql Databases
NoSql DatabasesNoSql Databases
NoSql Databases
 
Nosql databases for the .net developer
Nosql databases for the .net developerNosql databases for the .net developer
Nosql databases for the .net developer
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache Cassandra
 
A practical introduction to Oracle NoSQL Database - OOW2014
A practical introduction to Oracle NoSQL Database - OOW2014A practical introduction to Oracle NoSQL Database - OOW2014
A practical introduction to Oracle NoSQL Database - OOW2014
 
Big Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI ProsBig Data and NoSQL for Database and BI Pros
Big Data and NoSQL for Database and BI Pros
 
NoSQL and ACID
NoSQL and ACIDNoSQL and ACID
NoSQL and ACID
 
An Intro to NoSQL Databases
An Intro to NoSQL DatabasesAn Intro to NoSQL Databases
An Intro to NoSQL Databases
 
Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)Using Spring with NoSQL databases (SpringOne China 2012)
Using Spring with NoSQL databases (SpringOne China 2012)
 
NoSQL-Database-Concepts
NoSQL-Database-ConceptsNoSQL-Database-Concepts
NoSQL-Database-Concepts
 
NOSQL Overview
NOSQL OverviewNOSQL Overview
NOSQL Overview
 
NoSQL databases and managing big data
NoSQL databases and managing big dataNoSQL databases and managing big data
NoSQL databases and managing big data
 
Distributed computing
Distributed computingDistributed computing
Distributed computing
 
NoSQL databases
NoSQL databasesNoSQL databases
NoSQL databases
 
NoSQL Slideshare Presentation
NoSQL Slideshare Presentation NoSQL Slideshare Presentation
NoSQL Slideshare Presentation
 
Data Modeling for NoSQL
Data Modeling for NoSQLData Modeling for NoSQL
Data Modeling for NoSQL
 
NoSQL Databases, Not just a Buzzword
NoSQL Databases, Not just a Buzzword NoSQL Databases, Not just a Buzzword
NoSQL Databases, Not just a Buzzword
 
Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3Cassandra By Example: Data Modelling with CQL3
Cassandra By Example: Data Modelling with CQL3
 

Similar to Nosql databases

Softwae and database in data communication network
Softwae and database in data communication networkSoftwae and database in data communication network
Softwae and database in data communication networkAyoubSohiabMohammad
 
Presentation on NoSQL Database related RDBMS
Presentation on NoSQL Database related RDBMSPresentation on NoSQL Database related RDBMS
Presentation on NoSQL Database related RDBMSabdurrobsoyon
 
Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...ijdms
 
no sql presentation
no sql presentationno sql presentation
no sql presentationchandanm2
 
Oracle DBA Tutorial for Beginners -Oracle training institute in bangalore
Oracle DBA Tutorial for Beginners -Oracle training institute in bangaloreOracle DBA Tutorial for Beginners -Oracle training institute in bangalore
Oracle DBA Tutorial for Beginners -Oracle training institute in bangaloreTIB Academy
 
NOSQL and MongoDB Database
NOSQL and MongoDB DatabaseNOSQL and MongoDB Database
NOSQL and MongoDB DatabaseTariqul islam
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesEditor Jacotech
 
Overview of oracle database
Overview of oracle databaseOverview of oracle database
Overview of oracle databaseSamar Prasad
 
Overview of oracle database
Overview of oracle databaseOverview of oracle database
Overview of oracle databaseSamar Prasad
 
Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Modern databases and its challenges (SQL ,NoSQL, NewSQL)Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Modern databases and its challenges (SQL ,NoSQL, NewSQL)Mohamed Galal
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...IJCERT JOURNAL
 

Similar to Nosql databases (20)

No sq lv2
No sq lv2No sq lv2
No sq lv2
 
Softwae and database in data communication network
Softwae and database in data communication networkSoftwae and database in data communication network
Softwae and database in data communication network
 
Datastores
DatastoresDatastores
Datastores
 
Presentation on NoSQL Database related RDBMS
Presentation on NoSQL Database related RDBMSPresentation on NoSQL Database related RDBMS
Presentation on NoSQL Database related RDBMS
 
Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...Comparative study of no sql document, column store databases and evaluation o...
Comparative study of no sql document, column store databases and evaluation o...
 
Nosql databases
Nosql databasesNosql databases
Nosql databases
 
no sql presentation
no sql presentationno sql presentation
no sql presentation
 
Oracle DBA Tutorial for Beginners -Oracle training institute in bangalore
Oracle DBA Tutorial for Beginners -Oracle training institute in bangaloreOracle DBA Tutorial for Beginners -Oracle training institute in bangalore
Oracle DBA Tutorial for Beginners -Oracle training institute in bangalore
 
Oracle archi ppt
Oracle archi pptOracle archi ppt
Oracle archi ppt
 
NOSQL and MongoDB Database
NOSQL and MongoDB DatabaseNOSQL and MongoDB Database
NOSQL and MongoDB Database
 
Data management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunitiesData management in cloud study of existing systems and future opportunities
Data management in cloud study of existing systems and future opportunities
 
ORDBMS.pptx
ORDBMS.pptxORDBMS.pptx
ORDBMS.pptx
 
Datastores
DatastoresDatastores
Datastores
 
NoSQL Basics and MongDB
NoSQL Basics and  MongDBNoSQL Basics and  MongDB
NoSQL Basics and MongDB
 
Master.pptx
Master.pptxMaster.pptx
Master.pptx
 
Overview of oracle database
Overview of oracle databaseOverview of oracle database
Overview of oracle database
 
Overview of oracle database
Overview of oracle databaseOverview of oracle database
Overview of oracle database
 
Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Modern databases and its challenges (SQL ,NoSQL, NewSQL)Modern databases and its challenges (SQL ,NoSQL, NewSQL)
Modern databases and its challenges (SQL ,NoSQL, NewSQL)
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
 
No sql databases
No sql databasesNo sql databases
No sql databases
 

Recently uploaded

Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Matt Ray
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanyChristoph Pohl
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commercemanigoyal112
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineeringssuserb3a23b
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWave PLM
 

Recently uploaded (20)

Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
Open Source Summit NA 2024: Open Source Cloud Costs - OpenCost's Impact on En...
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte GermanySuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
SuccessFactors 1H 2024 Release - Sneak-Peek by Deloitte Germany
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Cyber security and its impact on E commerce
Cyber security and its impact on E commerceCyber security and its impact on E commerce
Cyber security and its impact on E commerce
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Software Coding for software engineering
Software Coding for software engineeringSoftware Coding for software engineering
Software Coding for software engineering
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 

Nosql databases

  • 1. NOSQL DATABASES AND BIG DATA STORAGE SYSTEMS Ateeq Ateeq
  • 2. CONTENT  1- Introduction to NOSQL Systems  2- The CAP Theorem  3- Document-Based NOSQL Systems and MongoDB  4- NOSQL Key-Value Stores  5- Column-Based or Wide Column NOSQL Systems  6- NOSQL Graph Databases and Neo4j
  • 3. INTRODUCTION TO NOSQL SYSTEMS  1.1 Emergence of NOSQL Systems  1.2 Characteristics of NOSQL Systems  1.3 Categories of NOSQL Systems
  • 4. 1.1 EMERGENCE OF NOSQL SYSTEMS  SQL system may not be appropriate for some applications such as Emails  SQL systems offer too many services (powerful query language, concurrency control, etc.), which this application may not need;  structured data model such the traditional relational model may be too restrictive.  SQL require schemas, which are not required by many of the NOSQL systems.
  • 5. 1.1 EMERGENCE OF NOSQL SYSTEMS  Examples of NOSQL systems:  Google – BigTable  Amazon – DynamoDB  Facebook – Cassandra  MongoDB  CouchDB  Graph databases like Neo4J and GraphBase
  • 6. 1.2 CHARACTERISTICS OF NOSQL SYSTEMS  NOSQL characteristics related to distributed databases and distributed systems.  NOSQL characteristics related to data models and query languages.
  • 7. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS 1- Scalability:  horizontal scalability: adding more nodes for data storage and processing as the volume of data grows.  Vertical scalability: expanding the storage and computing power of existing nodes.  In NOSQL systems, horizontal scalability is employed while the system is operational, so techniques for distributing the existing data among new nodes without interrupting system operation are necessary.
  • 8. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS 2- Availability, Replication and Eventual Consistency:  Data is replicated over two or more nodes in a transparent manner.  Update must be applied to every copy of the replicated data items.  Eventual consistency: is a consistency model used in distributed computing to achieve high availability that informally guarantees that, if no new updates are made to a given data item, eventually all accesses to that item will return the last updated value.
  • 9. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS 3- Replication Models:  Master-slave replication: requires one copy to be the master copy;  Write operations must be applied to the master copy, usually using eventual consistency  For read, all reads are from the master copy, or reads at the slave copies but would not guarantee that the values are the latest writes.  Master-master replication: allows reads and writes at any of the replicas.  The values of an item will be temporarily inconsistent.  Reconciliation method to resolve conflicting write operations of the same data item at different nodes must be implemented as part of the master-master replication scheme.
  • 10. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS
  • 11. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS
  • 12. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS  4- Sharding of Files:  Files can have many millions of records accessed concurrently by thousands of users.  Sharding (also known as horizontal) serves to distribute the load of accessing the file records to multiple nodes.  Shards works in tandem to improve load balancing on the replication as well as data availability.
  • 13. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS
  • 14. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS  5- High-Performance Data Access:  Hashing: The location of the value is given by the result of h(k).  Range partitioning: the location is determined via a range of key values. Example: location i would hold the objects whose key values K are in the range Kimin ≤ K ≤ Kimax. In applications that require range queries, where multiple objects within a range of key values are retrieved, range partitioned is preferred.
  • 15. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS
  • 16. CHARACTERISTICS RELATED TO DISTRIBUTED DATABASES AND DISTRIBUTED SYSTEMS
  • 17. CHARACTERISTICS RELATED TO DATA MODELS AND QUERY LANGUAGES.  1- Not Requiring a Schema:  Allowing semi-structured and self describing data.  The users can specify a partial schema in some systems to improve storage efficiency, but it is not required to have a schema in most of the NOSQL systems.  Constraints on the data would have to be programmed in the application programs that access the data items.  Languages for describing semi-structured data: JSON (JavaScript Object Notation) and XML (Extensible Markup Language)
  • 18. CHARACTERISTICS RELATED TO DATA MODELS AND QUERY LANGUAGES.  2- Less Powerful Query Languages:  In many applications that use NOSQL systems may not require a powerful query language such as SQL, because search (read) queries in these systems often locate single objects in a single file based on their object keys.  Reading and writing the data objects is accomplished by calling the appropriate operations by the programmer (API).  SCRUD: Search, Create, Read, Update and Delete  Provide a high-level query language, but it may not have the full power of SQL, for example the joins need to be implemented in the application programs.
  • 19. CHARACTERISTICS RELATED TO DATA MODELS AND QUERY LANGUAGES.  3- Versioning:  Provide storage of multiple versions of the data items, with the timestamps of when the data version was created.
  • 20. 1.3 CATEGORIES OF NOSQL SYSTEMS The most common categories: 1. Document-based NOSQL systems:  Store data in the form of documents using well-known formats such as JSON.  Documents are accessible via their document id, but can also be accessed rapidly using other indexes. 2. NOSQL key-value stores:  Fast access by the key to the value associated with the key  Value can be a record or an object or a document or even have a more complex data structure. 3. Column-based or wide column NOSQL systems:  Partition a table by column into column families  Form of vertical partitioning. 4. Graph-based NOSQL systems:  Data is represented as graphs  Related nodes can be found by traversing the edges using path expressions.
  • 21. 1.3 CATEGORIES OF NOSQL SYSTEMS Additional categories : 5. Hybrid NOSQL systems:  These systems have characteristics from two or more of the common categories.. 6. Object databases. 7. XML databases.
  • 22. THE CAP THEOREM  The CAP: it’s impossible to guarantee consistency, availability and partition tolerance at the same time in a distributed system with data replication.  Two properties out of the three to guarantee.  Weaker consistency levels are often used in NOSQL system instead of guaranteeing serializability.  Eventual consistency is used.
  • 23. THE CAP THEOREM  The CAP theorem is used to explain some of the competing requirements in a distributed system with replication.  The three letters in CAP refers to  Consistency (among replicated copies):  The nodes will have the same copies of a replicated data item visible for various transactions.  Availability (of the system for read and write operations) :  Each read or write will either be processed successfully or will receive a message that the operation cannot be completed.  Partition tolerance (in the face of the nodes in the system being partitioned by a network fault).:  The system can continue operating if the network connecting the nodes has a fault that results in two or more partitions,  Nodes in each partition can only communicate among each other.
  • 25. DOCUMENT-BASED NOSQL SYSTEMS AND MONGODB 1. Introduction 2. MongoDB Data Model 3. MongoDB CRUD Operations 4. MongoDB Distributed Systems Characteristics
  • 26. 3.1INTRODUCTION  Document-based NOSQL systems store data as collections of similar documents.  Documents resemble complex objects or XML documents  Documents in a collection should be similar, but they can have different attributes.  Document-based NOSQL systems: MongoDB and CouchDB.
  • 27. 3.2 MONGODB DATA MODEL  MongoDB is a free and open-source cross-platform document-oriented database.  Classified as a NoSQL database,
  • 28. 3.2 MONGODB DATA MODEL  MongoDB documents are stored in BSON (Binary JSON) format.  BSON is a variation of JSON with some additional data types and is more efficient for storage than JSON.  Individual documents are stored in a collection.  The operation createCollection is used to create each collection.
  • 29. 3.2 MONGODB DATA MODEL  Example: create a collection called project to hold PROJECT objects from the COMPANY database : db.createCollection(“project”, { capped : true, size : 1310720, max : 500 } )  “project” is the name of the collection (Mandatory)  Capped: capped means it has upper limits on its storage space (size) and number of documents (max).  Capping helps the system to choose the storage options for each collection.
  • 30. 3.2 MONGODB DATA MODEL  Example: create a document collection called worker : db.createCollection(“worker”, { capped : true, size : 5242880, max : 2000 } )  Each document has a unique ObjectId field “_id”  The _id is by default:  Automatically indexed in the collection.  The value is system-generated.  System-generated have a specific format – “combines the timestamp when the object is created, the node id, the process id and a counter “.  User-generated can have any value specified by the user as long as its.
  • 31. 3.2 MONGODB DATA MODEL  A collection does not have a schema.  The structure of the data fields in documents is chosen based on how documents will be accessed and used, and the user can choose a normalized design (similar to normalized relational tuples) or a denormalized design (similar to XML documents or complex objects).  Interdocument references can be specified by storing in one document the ObjectId or ObjectIds of other related documents.
  • 32. 3.2 MONGODB DATA MODEL Company database example
  • 33. 3.2 MONGODB DATA MODEL Project info Embedded workers info
  • 34. 3.2 MONGODB DATA MODEL Project info Embedded workers array Workers
  • 35. 3.2 MONGODB DATA MODEL Project ID as an attribute
  • 37. 3.3 MONGODB CRUD OPERATIONS  Insert:  db.<collection_name>.insert(<document(s)>)  Example:  Db.project.insert({_id:”P1”, Pname:”ProjectX”,Plocation:”Jenin”})  Delete: remove  db.<collection_name>.remove(<condition>)  Example:  db.project.remove( {"_id": ObjectId(“P1")});
  • 38. 3.3 MONGODB CRUD OPERATIONS  Read: fined  db.<collection_name>.find(<condition>)  Example:  Db.project.find({"_id": ObjectId(“P1")})  Update:  db.<collection_name>. update(SELECTIOIN_CRITERIA, UPDATED_DATA)  Example:  Db.project.update({"_id" : ObjectId(P1)},{$set:{‘PLocation':‘AAUJ'}})
  • 39. 3.4 MONGODB DISTRIBUTED SYSTEMS CHARACTERISTICS  Replication in MongoDB  Sharding in MongoDB
  • 40. REPLICATION IN MONGODB  Master-slave approach for replication.  All read and write are done on the primary copy.  Secondary copies are to recover from primary fails.
  • 41. SHARDING IN MONGODB  Sharding of the documents in the collection—also known as horizontal partitioning— divides the documents into disjoint partitions known as shards.  Two ways:  Range partitioning  Hash partitioning
  • 42. SHARDING IN MONGODB  Range and Hash portioning require that the user specify a particular document field to be used as the basis for partitioning the documents into shards.  The partitioning field—known as the “shard key”, must exist in every document in the collection, and it must have an index.  The values of the shard key are divided into chunks, and the documents are partitioned based on the chunks of shard key values
  • 43. SHARDING IN MONGODB  Chunks created by specifying a range of key values and each chunk contains the key values in one range.  If range queries are commonly applied to a collection (for example, retrieving all documents whose shard key value is between 200 and 400), then range partitioning is preferred  Because each range query will typically be submitted to a single node that contains all the required documents in one shard.  If most searches retrieve one document at a time, hash partitioning may be preferable because it randomizes the distribution of shard key values into chunks.
  • 44. SHARDING IN MONGODB  MongoDB queries are submitted to a module called the query router, which keeps track of which nodes contain which shards based on the particular partitioning method used on the shard keys.  The query will be routed to the nodes that contain the shards that hold the documents that the query is requesting.  If the system cannot determine which shards hold the required documents, the query will be submitted to all the nodes that hold shards of the collection.
  • 45. SHARDING IN MONGODB  Sharding and replication are used together:  Sharding focuses on improving performance via load balancing and horizontal scalability.  Replication focuses on ensuring system availability when certain nodes fail in the distributed system.
  • 47. WHY NOSQL?  Alter the table and add Description, Rate and Reviews  NOSQL is Flexible No Schema restrictions
  • 48. WHY NOSQL?  SQL is Restricted ! Fill the data
  • 49. WHY NOSQL? - USE CASES WHERE NOSQL WILL OUTPERFORM SQL  Agile - Flexibility for Faster Development
  • 50. WHY NOSQL? - USE CASES WHERE NOSQL WILL OUTPERFORM SQL  Agile - Flexibility for Faster Development
  • 51. WHY NOSQL? - USE CASES WHERE NOSQL WILL OUTPERFORM SQL  Agile - Simplicity for Easier Development
  • 52. WHY NOSQL? - USE CASES WHERE NOSQL WILL OUTPERFORM SQL  Agile - Simplicity for Easier Development  Reading this profile would require the application to read six rows from three table
  • 53. WHY NOSQL? - USE CASES WHERE NOSQL WILL OUTPERFORM SQL  Agile - Simplicity for Easier Development
  • 54. WHY NOSQL? - USE CASES WHERE NOSQL WILL OUTPERFORM SQL  Availability for Always-on
  • 55. WHY NOSQL? - USE CASES WHERE NOSQL WILL OUTPERFORM SQL  Availability for Always-on
  • 56. NOSQL CATEGORIES EXAMPLES - DOCUMENT-BASED NOSQL SYSTEMS XML is stored into a native XML Type
  • 57. NOSQL CATEGORIES EXAMPLES - DOCUMENT-BASED NOSQL SYSTEMS  The query retrieves the <Features> child element of the <ProductDescription> element  Result:
  • 58. NOSQL CATEGORIES EXAMPLES - NOSQL KEY-VALUE STORES  RIAK as example
  • 59. NOSQL CATEGORIES EXAMPLES - NOSQL KEY-VALUE STORES  The response to a query will be an object contains a list of documents which match the given query.  The documents returned are Search documents (a set of Solr field/values)
  • 60. NOSQL CATEGORIES EXAMPLES - COLUMN NOSQL SYSTEMS  Cassandra as an example  returns a result-set of rows, where each row consists of a key and a collection of columns corresponding to the query
  • 61. NOSQL CATEGORIES EXAMPLES - COLUMN NOSQL SYSTEMS  LOCAL_QUORUM: it’s a consistency level type  Used in multiple data center clusters.  Use to maintain consistency locally (within the single data center).
  • 62. NOSQL CATEGORIES EXAMPLES - GRAPH- BASED NOSQL SYSTEMS  Neo4j as an example
  • 63. NOSQL CATEGORIES EXAMPLES - GRAPH- BASED NOSQL SYSTEMS
  • 64. NOSQL CATEGORIES EXAMPLES - OBJECT DATABASES  LINQ as an example
  • 65. NOSQL KEY-VALUE STORES 1. Introduction 2. DynamoDB Overview 3. Voldemort Key-Value Distributed Data Store 4. Examples of Other Key-Value Stores
  • 66. 4.1 INTRODUCTION  No query language  A set of operations that can be used by the application programmers.  Characteristics:  Every value is associated with a unique key.  Retrieving the value by supplying the key is very fast.
  • 68. 4.2 DYNAMODB OVERVIEW  Amazon product – part AWS  Data model is using the concepts of tables, items, and attributes.  The table does not have a schema.  Holds a collection of self-describing items.  The item consist of a number of (attribute, value) pairs  Attribute values can be single-valued or multivalued.
  • 69. 4.2 DYNAMODB OVERVIEW  Uploads an item to the ProductCatalog table
  • 70. 4.3 VOLDEMORT KEY-VALUE DISTRIBUTED DATA STORE  Based on Amazon’s DynamoDB.  Used by LinkedIn.  Simple and basic set of operations, like (put, delete and get).  Pluggable with other storage engines like MySQL  Nodes are independent  Automatic replications and partitioning
  • 71. 4.3 VOLDEMORT KEY-VALUE DISTRIBUTED DATA STORE
  • 72. 4.4 EXAMPLES OF OTHER KEY-VALUE STORES 1. Oracle key-value store. 2. Redis key-value cache and store. 3. Apache Cassandra
  • 73. COLUMN-BASED OR WIDE COLUMN NOSQL SYSTEMS  Stores data tables as columns rather than as rows.
  • 74. HBASE DATA MODEL AND VERSIONING  Apache HBase is an open-source, distributed, versioned, non- relational database.  Column is identified by a combination of (column family:column qualifier).  Stores multiple versions of a data item, with a timestamp associated with each version.
  • 75. HBASE DATA MODEL AND VERSIONING
  • 76. HBASE DATA MODEL AND VERSIONING  Table is divided into a number of regions.  Range partitioning.  Apache Zookeeper and Apache HDFS (Hadoop Distributed File System) are used for management.
  • 77. NOSQL GRAPH DATABASES AND NEO4J  The data is represented as a graph, which is a collection of vertices (nodes) and edges.  Nodes and edges can be labeled to indicate the types of entities and relationships they represent  It is generally possible to store data associated with both individual nodes and individual edges.  Neo4j is a NOSQL Graph DB and it’s an open source system, also it is implemented in Java.
  • 78. NEO4J  The data model in Neo4j organizes data using the concepts of nodes and relationships.  Nodes and relationships have properties which store the data items.  Nodes can have labels.  Nodes that have the same label are grouped into a collection that identifies a subset of the nodes in the database graph for querying purposes.  A node can have zero, one, or several labels.
  • 79. NEO4J
  • 80. NEO4J