NoSQL
Thenraja Vettivelraj
Swansea University
Contents List
ABSTRACT
1. INTRODUCTION
2. MAIN FEATURES
2.1 COMPARISON WITH SQL
3. EXAMPLE - CASSANDRA
3.1 MAIN FEATURES OF APACHE CASSANDRA
3.2 WHY APACHE CASSANDRA?
3.3 APPLICATIONS
4. DRAWBACKS OF NOSQL
5. SUMMARY
6. REFERENCES
ABSTRACT
NoSQL is an emerging field in data management. It offers a powerful and efficient way to store and
manipulate data. NoSQL systems have no fixed schema, do not support joins and relax the “ACID”
properties [Han, J. et al., 2011]. One of the main advantages of NoSQL is that it is often much
faster than SQL, and its operational cost tends to be lower than that of a relational database.
Current trends are driving a need for more storage, greater connectedness, new architectures and
support for semi-structured data [Accessed: 25 Feb 2012].
1. INTRODUCTION
The term “NoSQL” has several interpretations. At first many took it to mean a non-relational
database, while others say that “NoSQL” stands for Not Only SQL. Nowadays the term is used as an
umbrella term for all databases and data stores that do not follow the relational model. It is not
a single technology or product but a class of diverse products and a collection of ideas about how
to store and manipulate data [Accessed: 24 Feb 2012].
The term first appeared in 1998 [Accessed: 24 Feb 2012], and over the past 3-4 years NoSQL has
earned its own place in the market through tremendous growth. Its strengths are massive
scalability, lower cost, schema flexibility, massive data stores and high availability [Accessed:
24 Feb 2012]. Some of the main applications of NoSQL are search engines, data processing and social
websites. NoSQL does not support joins, and it generally does not provide ACID properties.
There are four main data models in NoSQL namely
Key-Value Stores
Big Table Clones
Document Databases
Graph Databases
From these we have to choose the right one for the job [Accessed: 25 Feb 2012]. A well-known
example of a NoSQL database is Cassandra, which is used by Facebook (a social networking site) and
comes under the key-value store category. It can handle terabytes (TB) of data in a single day,
generated by Facebook's large user base. Bigtable is the model for the Big Table clones; Google
developed its own database in order to gain more control over performance and scalability, and uses
it for its search engine, Gmail, Orkut and other Google applications. Neo4j is a very good example
of a graph database and is written in Java. Apache CouchDB is an example of a document database and
is written in Erlang. Figure 1 compares the four NoSQL data models on a graph of size versus
complexity.
2. MAIN FEATURES
CAP theorem: Consistency, Availability and Partition tolerance. According to [Accessed: 11 Mar
2012], “Available, Partition-Tolerant (AP) Systems achieve "eventual consistency" through
replication and verification. Examples of AP systems are Cassandra and CouchDB.
Consistency means that each client always has the same view of the data.
Availability means that all clients can always read and write.
Partition tolerance means that the system works well across physical network
partitions.”
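As a rough illustration of how an AP system can reach "eventual consistency" through replication,
the following Python sketch keeps two in-memory replicas that accept writes independently and later
reconcile by keeping the newest timestamp for each key. The `Replica` class, the `sync` function
and the last-write-wins rule are assumptions made for this illustration; they are not the actual
implementation of Cassandra or CouchDB.

```python
class Replica:
    """A toy replica: maps key -> (timestamp, value)."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def write(self, key, value, timestamp):
        # Accept the write locally, without contacting other replicas
        # (this keeps the replica available during a network partition).
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions; the newest timestamp wins."""
    for source, target in ((a, b), (b, a)):
        for key, (timestamp, value) in list(source.data.items()):
            target.write(key, value, timestamp)


# Two replicas receive different writes while they cannot reach each other.
r1, r2 = Replica("r1"), Replica("r2")
r1.write("user:1", "Alice", timestamp=1)
r2.write("user:1", "Alicia", timestamp=2)
before = (r1.read("user:1"), r2.read("user:1"))  # the two views differ

# Once the partition heals, one sync round makes both views agree.
sync(r1, r2)
after = (r1.read("user:1"), r2.read("user:1"))
print(before, after)
```

Between the two writes and the sync, clients of `r1` and `r2` see different values, which is the
consistency that an AP system gives up temporarily in exchange for availability.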
Figure 1: Comparison of NoSQL data models (size versus complexity)
2.1 COMPARISON WITH SQL
Compared with SQL databases, NoSQL has a slight upper hand in scalability and performance. Instead
of the SQL language, NoSQL systems use mechanisms such as MapReduce or query languages such as CQL.
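To give a feel for the MapReduce style of processing, here is a minimal single-process sketch in
Python that counts words. The function names `map_phase`, `shuffle` and `reduce_phase` and the
sample documents are assumptions made for illustration; real NoSQL systems run these phases in
parallel across many nodes.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


documents = ["NoSQL scales out", "SQL scales up", "NoSQL is not only SQL"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = reduce_phase(shuffle(pairs))
print(word_counts["nosql"], word_counts["sql"], word_counts["scales"])
```

Because each document is mapped independently and each key is reduced independently, both phases
can be spread over many machines, which is where the scalability advantage comes from.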
3. EXAMPLE - CASSANDRA
Cassandra is one of the best-known NoSQL databases. It is widely used because it can handle large
amounts of structured data without failure and is easy to use.
It is written in Java and requires a JVM (Java Virtual Machine) to be installed on the system
before the server is started. It is of the key-value store type. Cassandra supports CQL
(Cassandra Query Language). DataStax provides one of the third-party distributions of Cassandra,
and it includes the Cassandra CQL shell, in which we create the keyspace and column family.
Figure 2: Cassandra CQL shell in which the keyspace and column family are created
A keyspace is the outermost grouping of our data. It is a collection of column families, and
typically each application has one keyspace. The keyspace carries the management and configuration
settings for its column families. The most important setting of a keyspace is the replication
factor. In the example above we set the strategy class to SimpleStrategy; the alternative is
NetworkTopologyStrategy. We can also create multiple nodes. We then created a column family named
example. There are two types of column family, namely
Standard column family and
Super column family
Cassandra provides three simple methods: insert, get and delete.
Figure 3: Cassandra data modelling (standard column family and super column family)
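The data model described above can be pictured as nested maps: a keyspace holds column families, a
column family holds rows, and a row holds named columns. The Python sketch below mimics the three
operations insert, get and delete on such a structure. The `ColumnFamily` class and its method
signatures are hypothetical and only illustrate the idea; they are not the Cassandra client API.

```python
class ColumnFamily:
    """A toy standard column family: row key -> {column name: value}."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def insert(self, row_key, column, value):
        # Rows are created on first use; there is no fixed schema.
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column):
        return self.rows.get(row_key, {}).get(column)

    def delete(self, row_key, column):
        row = self.rows.get(row_key, {})
        row.pop(column, None)
        if not row:
            # Drop the row once its last column is gone.
            self.rows.pop(row_key, None)


# A keyspace is the outermost grouping: column family name -> column family.
keyspace = {"example": ColumnFamily("example")}
cf = keyspace["example"]

cf.insert("user1", "name", "Alice")
cf.insert("user1", "city", "Swansea")
cf.insert("user2", "name", "Bob")  # rows need not share the same columns

name = cf.get("user1", "name")
cf.delete("user1", "city")
city = cf.get("user1", "city")
print(name, city)
```

A super column family would add one more level of nesting (row key, then super column, then
column), which is why it is drawn as a separate type in the data model.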
3.1 MAIN FEATURES OF APACHE CASSANDRA
Partitioning
Partitioning is one of the main features of Cassandra: the data we store is partitioned dynamically
and distributed over the set of available nodes in the cluster using a hash mechanism. Consistent
hashing gives a fixed circular space, or “ring”. Each node is assigned a random value that denotes
its position on the ring. Each data item is assigned a position on the ring by hashing its key.
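The ring idea can be sketched in a few lines of Python. This is a simplified illustration under
stated assumptions: node positions are derived by hashing the node name (rather than drawn at
random) so that the example is reproducible, MD5 is used only as a convenient hash, and the `Ring`
class and node names are hypothetical.

```python
import bisect
import hashlib


def token(text):
    """Map a string to a position on the ring (an integer derived from MD5)."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes):
        # Sort the nodes by their position on the ring.
        self.entries = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.entries]

    def node_for(self, key):
        # Walk clockwise to the first node whose token is >= the key's
        # token, wrapping around to the start of the ring if needed.
        index = bisect.bisect_left(self.tokens, token(key))
        return self.entries[index % len(self.entries)][1]


ring = Ring(["node-a", "node-b", "node-c"])
owners = {key: ring.node_for(key) for key in ("user1", "user2", "user3")}
print(owners)
```

The same key always hashes to the same ring position, so any node can work out which node is
responsible for a row without consulting a central directory.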
Figure 4: Ring view of the Cassandra test cluster
The figure above shows the ring view of the Cassandra test cluster. Each node has a token value
along with other information such as IP, size and load, which is available by default in the
DataStax web interface (http://localhost:8888/opscenter/index.html).
Scaling the cluster
Cassandra also supports multiple nodes. When a new node is added to an existing system that already
has one node, it takes over part of the workload of the other node and becomes responsible for the
same kind of work. This is done by the bootstrap algorithm, which can be started from a node with
the command-line utility or from the Cassandra web dashboard.
Figure 5: Cassandra dashboard
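The effect of adding a node to the ring can be illustrated with a small Python sketch. It is a
simplification under assumptions: ring positions come from hashing hypothetical node names with
MD5, and the `owner` helper stands in for the real bootstrap process, which also streams the data
to the new node.

```python
import bisect
import hashlib


def token(text):
    """Map a string to a position on the ring."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def owner(nodes, key):
    """Return the first node clockwise from the key's ring position."""
    entries = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in entries]
    index = bisect.bisect_left(tokens, token(key))
    return entries[index % len(entries)][1]


keys = [f"row{i}" for i in range(1000)]

# Ownership with a single node, then after a second node joins the ring.
before = {k: owner(["node-a"], k) for k in keys}
after = {k: owner(["node-a", "node-b"], k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Every key that changed owner now belongs to the new node.
print(len(moved), "of", len(keys), "keys moved to node-b")
```

Only the keys that fall in the new node's part of the ring change owner; the remaining keys stay
where they were, which is what makes scaling out inexpensive.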
3.2 WHY APACHE CASSANDRA?
There are many reasons to choose Cassandra: it can handle terabytes or petabytes of data in a
peer-to-peer architecture; it uses CQL (Cassandra Query Language), which is similar to SQL; data is
replicated to multiple nodes, so there is no single point of failure; it is cloud enabled; data can
be replicated to more than one location for disaster recovery scenarios, which gives durability and
high availability; fault detection and recovery are transparent and rely on a gossip protocol; it
is easy to use; and no special hardware is required to run it.
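How replication removes the single point of failure can be sketched as follows. The Python example
is loosely modelled on a simple placement rule that stores a row on its owner and on the next nodes
clockwise on the ring; the `replicas_for` function, the node names and the use of MD5 for ring
positions are assumptions for illustration, not Cassandra's actual code.

```python
import hashlib


def token(text):
    """Map a string to a position on the ring."""
    return int(hashlib.md5(text.encode("utf-8")).hexdigest(), 16)


def replicas_for(nodes, key, replication_factor):
    """Pick the owner of the key plus the next nodes clockwise on the ring."""
    ring = sorted(nodes, key=token)
    key_token = token(key)
    # Index of the first node at or after the key's position (wraps to 0).
    start = next((i for i, n in enumerate(ring) if token(n) >= key_token), 0)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
replicas = replicas_for(nodes, "user1", replication_factor=3)

# If one replica fails, the others can still serve the row.
failed = replicas[0]
survivors = [n for n in replicas if n != failed]
print(replicas, survivors)
```

With a replication factor of 3, the row stays readable as long as at least one of its three
replicas is reachable.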
3.3 APPLICATIONS
Companies such as Accenture, Twitter, Facebook and many others use NoSQL databases in one way or
another because of these features. Beyond industry, the education and government sectors have also
slowly started using NoSQL databases. For example,
“Burt uses Cassandra in their software to help advertisers and agencies improve the efficiency and
effect of online campaigns” [Accessed: 11 Mar 2012].
4. DRAWBACKS OF NOSQL
Unlike SQL databases, NoSQL does not provide ACID properties, so we cannot expect the degree of
reliability that we get from a SQL database. Many people are unfamiliar with the technology. Unlike
commercial SQL databases, NoSQL products often come without sufficient support, since many of them
offer only limited support.
5. SUMMARY
With its graph databases, key-value databases, Big Table clones and document databases, NoSQL has
made a very big impact on the database field, and most of these systems are open source. In my
view, many users will soon migrate from SQL to NoSQL. In the next two to four years we can expect a
major change in the database field because of NoSQL's scalability and other features, but it is
unlikely to replace SQL databases. Each database has its pros and cons, and it is our duty to
choose the right one.
6. REFERENCES
[Accessed: 24 Feb 2012] Slideshare.net (2010) NoSQL databases. [Online] Available at:
http://www.slideshare.net/marin_dimitrov/nosql-databases-3584443
[Accessed: 24 Feb 2012] Perdue, T. (1998) NoSQL - An Overview of NoSQL. [online] Available at:
http://newtech.about.com/od/databasemanagement/a/Nosql.htm
[Accessed: 24 Feb 2012] Tiwari, S. (2011) Professional NoSQL. [e-book] Wrox Programmer to
Programmer. Available through: Google Books
http://books.google.co.uk/books?id=tv5iO9MnObUC&printsec=frontcover&dq=nosql&hl=en&sa=X&ei=5vw_T9CABMG_0QWtzqyPDw&ved=0CEQQ6AEwAg#v=onepage&q=nosql&f=false
[Han, J. et al., 2011] Han, J. et al. (2011) "Survey on NoSQL database," Pervasive Computing and
Applications (ICPCA), 2011 6th International Conference on, pp. 363-366, 26-28 Oct. 2011.
doi: 10.1109/ICPCA.2011.6106531
[Accessed: 4 Mar 2012] Slideshare.net (2010) NoSQL or not NoSQL? [Online] Available at:
http://www.slideshare.net/ruflin/nosql-or-not-nosql
[Accessed: 25 Feb 2012] Blogs.neotechnology.com (2009) NOSQL: scaling to size and scaling to
complexity - Emil's Neo Thoughts. [Online] Available at:
http://blogs.neotechnology.com/emil/2009/11/nosql-scaling-to-size-and-scaling-to-complexity.html
[Accessed: 25 Feb 2012] Slideshare.net (2011) A NOSQL Overview And The Benefits Of Graph
Databases (nosql east 2009). [Online] Available at:
http://www.slideshare.net/emileifrem/nosql-east-a-nosql-overview-and-the-benefits-of-graph-databases
[Accessed: 25 Feb 2012] Slideshare.net (2011) NOSQL for Dummies. [Online] Available at:
http://www.slideshare.net/thobe/nosql-for-dummies
Leavitt, N. (2010) "Will NoSQL Databases Live Up to Their Promise?," Computer, vol. 43, no. 2,
pp. 12-14, Feb. 2010. doi: 10.1109/MC.2010.58
URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5410700&isnumber=5410692
[Accessed: 11 Mar 2012] Blog.nahurst.com (2010) Visual Guide to NoSQL Systems - Nathan Hurst's
Blog. [Online] Available at: http://blog.nahurst.com/visual-guide-to-nosql-systems
[Accessed: 11 Mar 2012] Datastax.com (2011) Cassandra Users | DataStax. [online] Available at:
http://www.datastax.com/cassandrausers