An Overview of NoSQL Databases
Rich Perry
CIS 264
October 25, 2014
Executive Summary
NoSQL is a relatively new class of next-generation database systems that are not
relational database management systems (DBMS). It's also commonly known as "Not Only SQL".
However, according to Carlo Strozzi, it would be more accurate to call it NoREL, for No
Relations or Not Relational. Strozzi first used the term in 1998 as the name of his open
source relational database that did not offer a SQL interface, and Eric Evans
re-introduced it in 2009. NoSQL offers radically different choices and
options for data storage compared to conventional relational databases.
This generation of DBMS offers much more flexibility, higher performance, higher levels
of scalability, less complexity, and different choices of functionality.
It’s about more than just rows in tables. NoSQL database systems allow data storage
and retrieval using many different formats such as key-value, column family, document,
and graph databases.
Unlike relational databases, NoSQL databases have no joins. Instead of joins, these systems
allow users to extract data using simple interfaces. They also have little or no database
schema and, unlike relational databases, do not strictly enforce ACID transaction standards.
NoSQL supports linear scalability: if you add more processors, you get a proportional increase
in performance. Horizontal scalability (dividing the system among multiple servers) is
also a benefit of most NoSQL database systems.
There are more choices for storing, retrieving, and manipulating data. It's called "Not
Only SQL" because one can use SQL as well as many other query languages.
NoSQL is not only available as open source. There are many open source NoSQL
products and many commercial products that use NoSQL concepts.
It's not just for use with Big Data problems. Many people have a common misconception
that NoSQL is just for use in solving Big Data problems. NoSQL database systems are
commonly used for Big Data situations. However, NoSQL also provides alternative
solutions when flexibility, performance, and scalability are important aside from Big Data
problems.
Categories of NoSQL Databases
There are four main types of NoSQL databases. Each has its own unique advantages
and disadvantages. The four main types are:
1. Column family (Bigtable) stores
2. Key-Value stores
3. Graph stores
4. Document stores
Column family (Bigtable) stores
Column family databases can scale to manage large amounts of data. They use both row
and column identifiers as keys to find data, and are often referred to as "data stores"
rather than databases because they lack features normally found in most DBMSs. For
example, column family databases lack typed columns, secondary indexes, triggers, and
query languages.
Spreadsheets serve well as a comparison model for this type of database. Data values
are addressed by the combination of the row and column much like in an Excel
spreadsheet. Cells can contain any type of data. A cell can be populated with data at
any time or left empty.
Most, but not all, column family databases use a timestamp and a "column family" in
addition to the column name and row identifier as a multi-part key. Column family stores
are often called Bigtable stores after Google's Bigtable system, and their tables can be
enormous, with billions of rows or more.
Most rows utilize few columns out of the many possible columns. This results in most
cells in the table being empty, and is known as a sparse matrix. This type of data
structure works very inefficiently in relational databases, but column family stores are
made for this type of data storage.
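The multi-part key and sparse storage described above can be sketched in a few lines of Python. This is only an illustration: the class name `ColumnFamilyStore`, its methods, and the sample rows are hypothetical and do not correspond to any particular product's API. Only cells that actually hold data are stored, so empty cells cost nothing.

```python
from typing import Any, Dict, Optional, Tuple

# Multi-part key: (row identifier, column family, column name, timestamp).
CellKey = Tuple[str, str, str, int]


class ColumnFamilyStore:
    """Toy sparse table: only populated cells take up space."""

    def __init__(self) -> None:
        self._cells: Dict[CellKey, Any] = {}

    def put(self, row: str, family: str, column: str, timestamp: int, value: Any) -> None:
        self._cells[(row, family, column, timestamp)] = value

    def get(self, row: str, family: str, column: str) -> Optional[Any]:
        """Return the value with the latest timestamp, or None if the cell is empty."""
        # A linear scan keeps the sketch short; it is not how a real system would look up cells.
        versions = [
            (ts, value)
            for (r, f, c, ts), value in self._cells.items()
            if (r, f, c) == (row, family, column)
        ]
        if not versions:
            return None
        return max(versions, key=lambda pair: pair[0])[1]

    def cell_count(self) -> int:
        return len(self._cells)


store = ColumnFamilyStore()
store.put("user:1", "profile", "name", 1, "Ada")
store.put("user:1", "profile", "name", 2, "Ada L.")
store.put("user:2", "contact", "email", 1, "bob@example.com")

latest_name = store.get("user:1", "profile", "name")  # newest version of the cell
missing = store.get("user:2", "profile", "name")      # this cell was never populated
```

Here two rows share a large space of possible columns, yet the store holds only the three cells that were written.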
Advantages of Column family stores
1. Can manage very large amounts of data efficiently.
2. High scalability -- Column family databases do not use joins, so they scale very
easily in distributed systems.
3. High availability. Column family systems are usually configured to store data on
multiple nodes in different geographic areas with automatic failover.
Disadvantages of Column family stores
1. Not as flexible as other NoSQL database types.
2. Minimal functionality available with most column family databases.
Key-Value Stores
Key-Value stores are some of the simplest of NoSQL databases. This type of NoSQL
database system is sometimes called the Swiss Army knife of databases, because it can
be used in many different situations.
Key-Value stores have no query language and work like dictionaries. Keys are paired
with values. Application programming interfaces (APIs) are used to add new key-value
pairs (a.k.a., put), delete key-value pairs, and retrieve a value when given a key (a.k.a.,
get). In a dictionary, words are paired with definitions. The words are keys and the
definitions are the values. If the user gives the API a word (key), the API returns a
definition (value).
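The put, get, and delete operations described above can be sketched as a thin wrapper around a Python dictionary. The class name `KeyValueStore` and the sample data are hypothetical, chosen only to mirror the dictionary analogy; a real key-value store would add persistence and distribution on top of the same three calls.

```python
from typing import Any, Dict


class KeyValueStore:
    """Toy in-memory key-value store exposing only put, get, and delete."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        # Keys are unique: putting an existing key overwrites its value.
        self._data[key] = value

    def get(self, key: str) -> Any:
        # Raises KeyError if the key is unknown.
        return self._data[key]

    def delete(self, key: str) -> None:
        # Deleting a missing key is silently ignored in this sketch.
        self._data.pop(key, None)


kv = KeyValueStore()
kv.put("apple", "a round fruit that grows on trees")  # word -> definition
kv.put("/img/logo.png", b"\x89PNG")                    # file path -> binary value
definition = kv.get("apple")
kv.delete("apple")
```

Note that the interface offers no way to ask "which key has this definition?"; the caller must already know the key.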
Keys are flexible and can be many different types of data. Some examples of types of
keys are:
Name of an image.
File path.
Hash code.
URL.
SQL query.
REST web service call.
Values are also flexible. They can be almost anything. Common values could be
documents, images, web pages, text, etc.
Common Attributes of Key-Value Store databases
1. All keys must be unique.
2. Keys are indexed but values are not.
3. Values can be any data type, and different values can have different data types. In a
relational database, by contrast, values in a single column must be homogeneous.
4. Queries return one and only one value.
5. Queries must search for a key, but cannot search for a value.
Advantages of Key-Value Store databases
1. Precision Service Levels
2. Precision service monitoring and notification
3. Scalability and Reliability
4. Portability and lower operational cost
5. Speed -- queries tend to run very quickly.
6. Simplicity -- it does not get much simpler than key-value pairs
7. Can be used for many different applications and data storage problems.
Disadvantages of Key-Value Store databases
1. Cannot query values. Only the key can be queried. This means that the user
must know the key.
2. Does not establish relationships between data. If relationships are important,
other NoSQL types may be more appropriate.
3. Cannot return lists of values. Queries return one and only one value.
4. Values may contain any data type. This is only a problem if the user expects a
certain data type but receives another, or if the user expects the data type to
always be the same when no such guarantee is provided.
Document Stores
This is one of the most flexible, powerful and popular types of databases in the NoSQL
movement. Key-Value and column family stores work by searching for a key and they
return a value associated with that key. These two types of databases do not index
values, or allow searching on values. Document stores work in a different manner.
They allow searching on any content within documents.
Document stores automatically index all content inside a document when it is added to
the database. This makes indexes large, but everything is searchable. A document
store API can provide a list of documents, find a single document, or find any subsection
of any document. A key-value store can store an entire document in the value area and
return that document if you search for its key, but a document store can return just a
sentence or paragraph from a large document (e.g., a book) without loading the entire
document into memory.
Tree structures are used in document store databases. The tree structures begin with a
root node that has branches. The branches can have sub-branches and those can be
divided into sub-branches indefinitely until they terminate at a leaf. The values are
stored at the leaf level.
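As a rough illustration of the tree structure described above, the sketch below models a document as nested Python dictionaries, with values at the leaves. The function names and the sample book are hypothetical. For brevity, `search` scans every leaf, whereas a real document store would answer the same question from its index.

```python
from typing import Any, Dict, Iterator, List, Tuple

Path = Tuple[str, ...]


def iter_leaves(node: Any, path: Path = ()) -> Iterator[Tuple[Path, Any]]:
    """Walk the tree from the root and yield (path, value) for every leaf."""
    if isinstance(node, dict):
        for name, child in node.items():
            yield from iter_leaves(child, path + (name,))
    else:
        yield path, node


def get_subtree(document: Dict[str, Any], path: Path) -> Any:
    """Follow branch names from the root and return the sub-branch or leaf found there."""
    node: Any = document
    for name in path:
        node = node[name]
    return node


def search(documents: Dict[str, Dict[str, Any]], term: str) -> List[Tuple[str, Path]]:
    """Return (document id, path) for every leaf whose text contains the term."""
    hits = []
    for doc_id, document in documents.items():
        for path, value in iter_leaves(document):
            if term in str(value):
                hits.append((doc_id, path))
    return hits


documents = {
    "book-1": {
        "title": "Databases",
        "chapter1": {"heading": "Keys", "paragraph1": "Keys are paired with values."},
        "chapter2": {"heading": "Trees", "paragraph1": "Values live at the leaves."},
    }
}

hits = search(documents, "leaves")
paragraph = get_subtree(documents["book-1"], ("chapter2", "paragraph1"))
```

The search result is a path into the tree, so the caller can fetch a single paragraph rather than the whole book.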
Most document store databases also use collections to manage large numbers of
documents. Collections can be used for different purposes such as navigation, grouping
similar types of documents, and applying business rules to set different permissions,
indexes, and triggers. Collections can contain other collections and trees can have sub-
trees.
Advantages of Document Store databases
1. Very flexible.
2. High performance.
3. Variable but usually high scalability.
4. Relatively simple, but powerful APIs
Disadvantages of Document Store databases
1. Overkill if searching inside documents is not necessary. If a user only needs
the whole document, a key-value store might be better.
2. Only well suited for storing documents. If the data is not part of a document, it
should probably be put in a different type of database.
Graph Stores
Graph stores are optimized to efficiently store nodes and links, and allow users to query
those graphs. Graph Store databases are useful for any business that has complex
relationships between objects such as social networking, rules-based engines, mashups,
and systems that must analyze complex network structures.
A graph store is a system that contains a sequence of nodes and relationships that
form a graph when combined. Key-Value stores have two data fields,
the key and its value. Graph stores have three data fields: node, relationship, and
property.
Graph nodes are nouns and often represent real-world objects such as people,
organizations, websites, computers on a network, or cities on routes (e.g., highways,
railways, or air routes). The relationships are the connections between the nodes.
Graph store database queries essentially traverse the nodes on the graph. They can
return information such as:
1. Shortest path between two nodes on a graph.
2. Neighboring nodes that have specific properties.
3. Similarities of neighboring nodes between two nodes.
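Two of the queries listed above, shortest path and neighbors with a given property, can be sketched with plain Python data structures. The people, cities, and function names below are hypothetical, relationships are treated as undirected for simplicity, and breadth-first search is just one straightforward way to find a path with the fewest relationships; the text does not say which algorithm any given graph store uses.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Each node carries a dictionary of properties.
nodes: Dict[str, Dict[str, str]] = {
    "Ann": {"city": "Boston"},
    "Bob": {"city": "Denver"},
    "Cara": {"city": "Boston"},
    "Dev": {"city": "Austin"},
}

# (node, relationship, node) triples, treated as undirected in this sketch.
relationships: List[Tuple[str, str, str]] = [
    ("Ann", "knows", "Bob"),
    ("Bob", "knows", "Cara"),
    ("Cara", "knows", "Dev"),
    ("Ann", "knows", "Cara"),
]


def neighbors(node: str) -> List[str]:
    """Return every node connected to the given node by one relationship."""
    found = []
    for source, _relationship, target in relationships:
        if source == node:
            found.append(target)
        elif target == node:
            found.append(source)
    return found


def shortest_path(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first traversal; returns a path with the fewest relationships, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(path[-1]):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


def neighbors_with(node: str, prop: str, value: str) -> List[str]:
    """Return neighboring nodes whose property matches the given value."""
    return [n for n in neighbors(node) if nodes[n].get(prop) == value]


path = shortest_path("Ann", "Dev")
boston_neighbors = neighbors_with("Bob", "city", "Boston")
```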
Relationships are handled differently than in relational databases. Graph store
databases store related nodes together and assign internal identifiers to nodes, so that they
can join networks.
Advantages of Graph Store databases
1. Better performance compared to relational databases.
2. Designed to handle complex relationships between data.
Disadvantages of Graph Store databases
1. Difficult to scale horizontally to multiple servers. Data can be replicated on
multiple servers to enhance read performance, but writing to multiple servers that
span multiple nodes is complicated to implement.
Overall Advantages of NoSQL Databases
Scalability
More specifically, horizontal scaling is a major advantage of NoSQL database systems.
This allows organizations to distribute the database across multiple servers and nodes
rather than just buying bigger and better servers.
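One simple way to picture distributing a database across servers is hash-based partitioning of keys. The sketch below is an assumption made for illustration, not a description of how any specific NoSQL product places data; the server names and helper functions are hypothetical, and each "server" is just an in-memory dictionary.

```python
import hashlib
from typing import Dict, List

servers: List[str] = ["server-a", "server-b", "server-c"]  # hypothetical node names
shards: Dict[str, Dict[str, str]] = {name: {} for name in servers}


def server_for(key: str) -> str:
    """Pick a server from a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


def put(key: str, value: str) -> None:
    shards[server_for(key)][key] = value


def get(key: str) -> str:
    return shards[server_for(key)][key]


for i in range(1000):
    put(f"user:{i}", f"profile {i}")

# How many keys landed on each server.
counts = {name: len(data) for name, data in shards.items()}
```

Because the same key always hashes to the same server, reads go straight to the right machine. A drawback of this simple modulo scheme is that changing the number of servers moves most keys, which is why production systems often use consistent hashing instead.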
Low maintenance in the future
Relational databases require highly trained and experienced DBAs and developers.
Although DBAs are probably not losing their jobs any time in the near future, NoSQL
database systems will require less maintenance, less management, and less support, so a
similar job can be done either by fewer DBAs or by IT professionals with less extensive training.
Cost
Admins can distribute a NoSQL database system across multiple low-cost machines
rather than just buying more expensive servers. Many NoSQL systems are also open
source which usually translates into no cost software. Relational database systems tend
to rely heavily on costly, proprietary software and hardware.
Flexibility
There are different NoSQL data models available to developers. This means that
organizations have some good choices.
Performance
Most NoSQL databases provide much higher performance compared to relational
databases. The main reason for this is that more choices are available to IT
professionals and management: the IT department can pick the type of database that is
best suited for the purpose. Key-value store databases, for example, are simple and return
one value per query. This means that the performance of a NoSQL database system is
primarily related to its focus. With the exception of graph-type data stores, performance
is also helped by the low level of complexity.
Overall Disadvantages of NoSQL Databases
Maturity
NoSQL database systems are not mature and therefore not appropriate for all purposes
and situations. According to Herman Mehling of Database Journal, NoSQL, in general,
lacks credibility in the IT world; whereas some relational databases are known for their
rich functionality, stability, reliability, vendor support, and wealth of expertise available in
the employment pool. Relational databases have a lot of credibility with many IT
professionals, such as DBAs, developers, data architects, and IT managers and
executives.
Support
Relational databases may be expensive, but their vendors provide a very high level of
support. Most NoSQL systems are open source, which means the basic software is free,
which in turn means that there is little or no support from outside your organization.
Analytics and Business Intelligence (BI)
Most business intelligence tools simply do not have interfaces for or connectivity to
NoSQL databases. Quest Software has developed Toad for Cloud Databases, which
provides limited ad hoc query support for some NoSQL systems. However, there is a
significant overall shortage of BI tools available for NoSQL.
Expertise
There is a wide and deep pool of IT professionals skilled, trained, and/or experienced
with relational databases, but not a lot who know how to use, develop, or maintain
NoSQL systems. Over time, the laws of supply and demand with education and
employment will address this problem, but it won’t happen immediately.
Compatibility
Relational databases have many standards. However, NoSQL databases have very
few, if any, standards. The lack of standards could make it very difficult to switch from
one vendor to another (assuming the organization buys vendor software rather than just
downloading open source) if it becomes displeased with the service.
NoSQL vs. Relational database
Structure
RDBMSs are designed to use only highly structured data. NoSQL databases are
designed to use unstructured or semi-structured data.
ACID
ACID stands for Atomic, Consistent, Isolated, and Durable. It is a set of transaction
properties that ensure data integrity and keep transactions reliable. Relational databases
strictly enforce ACID, but NoSQL databases usually do not. This is generally considered
an advantage of relational databases. If your application relies on a high volume of
transactions, a relational database is probably a better choice.
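To make the "Atomic" part of ACID concrete, here is a small sketch using SQLite through Python's standard `sqlite3` module as a stand-in for a relational database. The `accounts` table, the names in it, and the `transfer` function are hypothetical. The point is that a transfer consists of two updates, and either both take effect or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(amount: int) -> bool:
    """Move money from alice to bob; both updates succeed or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'bob'", (amount,)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'alice'", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(30)       # both rows change
failed = transfer(500)  # the second update violates the CHECK, so the first is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, bob's balance does not include the 500 that was briefly credited, because the whole transaction was rolled back.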
Flexibility
Relational databases are not known for being very flexible. As previously mentioned,
they structure data to a high degree and also have rigid schemas. Most NoSQL
databases are very flexible because they have little or no schema, or have a flexible
schema and use data that is unstructured or structured to a much lower degree than
relational databases.
Normalization
Relational databases need data to be normalized to at least third normal form (3NF)
in order to fully function in a relational manner and make efficient use of
joins and indexes. NoSQL databases are often designed to support queries and do so
by denormalizing data based on anticipated queries for the given database.
Denormalizing improves query performance, so this is one of several reasons that
NoSQL databases usually provide higher performance, at least for reading.
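The difference between the two approaches can be sketched with in-memory Python structures. The customers, orders, and function names are hypothetical. The normalized form needs a lookup per order (the equivalent of a join), while the denormalized form is built in advance around one anticipated query, "show a customer with all of their orders".

```python
from typing import Dict, List

# Normalized: customers and orders live in separate tables linked by customer_id.
customers = {1: {"name": "Ann"}, 2: {"name": "Bob"}}
orders = [
    {"order_id": 10, "customer_id": 1, "item": "keyboard"},
    {"order_id": 11, "customer_id": 1, "item": "mouse"},
    {"order_id": 12, "customer_id": 2, "item": "monitor"},
]


def orders_with_names_joined() -> List[Dict[str, object]]:
    """Read path for normalized data: look up the customer for every order (a join)."""
    return [
        {
            "order_id": o["order_id"],
            "customer": customers[o["customer_id"]]["name"],
            "item": o["item"],
        }
        for o in orders
    ]


def denormalize() -> Dict[int, Dict[str, object]]:
    """Build one self-contained document per customer for the anticipated query."""
    docs: Dict[int, Dict[str, object]] = {}
    for cid, customer in customers.items():
        docs[cid] = {
            "name": customer["name"],
            "orders": [o["item"] for o in orders if o["customer_id"] == cid],
        }
    return docs


customer_docs = denormalize()
ann = customer_docs[1]  # one lookup returns the customer and all of their orders
```

The trade-off is on the write side: data copied into several documents has to be updated in each of them, which fits the observation that the performance gain applies mainly to reads.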
Conclusions
IT professionals and managers must carefully consider the purpose of a database before they
can intelligently choose the correct NoSQL database, or even decide whether transferring
data from a relational database is appropriate.
NoSQL databases can offer significant benefits, but they are not necessarily an
appropriate solution for every data storage problem. Any organization
considering converting from a relational DBMS to a NoSQL database system should
carefully note the limitations and other issues associated with these types of databases,
including the risk of having no vendor support when using open source software and the
lack of experienced IT professionals.
Both relational databases and NoSQL databases have their place in the IT world.
NoSQL may be a better alternative for some situations. Relational databases may also
be the correct solution for some types of data storage problems. Some situations may
call for a combined solution, where both relational databases and NoSQL databases are
used.
References
McCreary, Dan; Kelly, Ann (2014). Making Sense of NoSQL. Manning.
Brooks, Charlie (2014). Enterprise NoSQL for Dummies. MarkLogic.
Scofield, Ben (2010). NoSQL Death to Relational Databases(?). Slide Share.
http://www.slideshare.net/bscofield/nosql-codemash-2010 (accessed 10/17/2014).
Lith, Adam; Mattson, Jakob (2010). "Investigating storage solutions for large data: A
comparison of well performing and scalable data storage solutions for real time
extraction and batch insertion of data" (PDF). Göteborg: Department of Computer
Science and Engineering, Chalmers University of Technology. p. 70. Retrieved 05 Oct
2014. "Carlo Strozzi first used the term NoSQL in 1998 as a name for his open source
relational database that did not offer a SQL interface[...]"
"NoSQL 2009". Blog.sym-link.com. 12 May 2009. Retrieved 05 October 2014.
Harrison, Guy (2010). 10 things you should know about NoSQL databases. TechRepublic.
http://www.techrepublic.com/blog/10-things/10-things-you-should-know-about-nosql-databases/
(accessed 10/17/2014).
Mehling, Herman (2010). 10 things you Need to Know About NoSQL Databases. Database Journal.
http://www.databasejournal.com/features/article.php/3905531/10-things-you-Need-to-Know-About-NoSQL-Databases.htm
(accessed 10/17/2014).