Is multi-model the future of
NoSQL?
Max Neunhöffer
SouthBay.NET Meetup, 5 March 2015
www.arangodb.com
Max Neunhöffer
I am a mathematician
“Earlier life”: Research in Computer Algebra
(Computational Group Theory)
Always juggled with big data
Now: working in database development, NoSQL, ArangoDB
I like:
research,
hacking,
teaching,
tickling the highest performance out of computer systems.
1
ArangoDB GmbH
triAGENS GmbH has offered consulting services since 2004:
software architecture
project management
software development
business analysis
a lot of experience with specialised database systems
have done NoSQL before the term was even coined
2011/2012, an idea emerged:
to build the database one had wished to have all those years!
development of ArangoDB as open source software since 2012
ArangoDB GmbH: spin-off to take care of ArangoDB (2014)
2
Document and Key/Value Stores
Document store
A document store stores sets of documents, which usually
means JSON data; these sets are called collections. The
database has access to the contents of the documents.
each document in the collection has a unique key
secondary indexes possible, leading to more powerful queries
different documents in the same collection: structure can vary
no schema is required for a collection
database normalisation can be relaxed
Key/value store
Opaque values, only key lookup without secondary indexes:
⇒ high performance and perfect scalability
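A minimal sketch of the difference between the two models, using plain Python dictionaries. The `Collection` class and its method names are hypothetical and do not mirror the API of any real database. It shows key lookup (the key/value view), documents of varying structure in one collection, and a secondary index that only works because the store can read the document contents.

```python
from collections import defaultdict


class Collection:
    """Hypothetical in-memory collection of JSON-like documents."""

    def __init__(self):
        self.docs = {}      # primary index: unique key -> document
        self.indexes = {}   # field name -> {field value -> set of keys}

    def insert(self, key, doc):
        if key in self.docs:
            raise KeyError(f"duplicate key: {key}")
        self.docs[key] = doc
        # Keep every existing secondary index up to date.
        for field, index in self.indexes.items():
            if field in doc:
                index[doc[field]].add(key)

    def ensure_index(self, field):
        """Build a secondary index over one field of the documents."""
        index = defaultdict(set)
        for key, doc in self.docs.items():
            if field in doc:
                index[doc[field]].add(key)
        self.indexes[field] = index

    def get(self, key):
        """Key/value-style lookup: needs only the primary key."""
        return self.docs[key]

    def find(self, field, value):
        """Lookup by field value through the secondary index."""
        keys = self.indexes[field].get(value, ())
        return [self.docs[k] for k in sorted(keys)]


products = Collection()
products.insert("p1", {"type": "book", "title": "Graphs", "pages": 320})
products.insert("p2", {"type": "shirt", "size": "M"})   # different structure, no schema
products.ensure_index("type")
products.insert("p3", {"type": "book", "title": "Joins"})
```

If `find` and `ensure_index` were dropped and the values treated as opaque bytes, what remains is a key/value store.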
3
Graph databases
Graph database
A graph database stores a labelled graph. Vertices and
edges can be documents. Graphs are good for modelling
relations.
graphs often describe data very naturally (e.g. the Facebook
friendship graph)
graphs can be stored using tables; however, graph queries
notoriously lead to expensive joins
there are interesting and useful graph algorithms like “shortest
path” or “neighbourhood”
need a good query language to reap the benefits
horizontal scalability is troublesome
graph databases vary widely in scope and usage, no standard
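The two algorithms named above can be sketched in a few lines of Python. The friendship graph and the function names are invented for illustration; a real graph database would run such traversals inside the engine rather than in client code.

```python
from collections import deque

# Hypothetical undirected friendship graph as an adjacency list.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
}


def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # Walk the predecessor links back to the start.
            path = []
            while vertex is not None:
                path.append(vertex)
                vertex = previous[vertex]
            return path[::-1]
        for neighbour in graph[vertex]:
            if neighbour not in previous:
                previous[neighbour] = vertex
                queue.append(neighbour)
    return None


def neighbourhood(graph, start, depth):
    """All vertices reachable from start in at most `depth` steps."""
    seen = {start}
    frontier = [start]
    for _ in range(depth):
        next_frontier = []
        for vertex in frontier:
            for neighbour in graph[vertex]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return seen - {start}
```

In a relational layout each step of these loops would correspond to another self-join on the edge table, which is why the path length matters so much there.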
4
Column-oriented data stores
Column-oriented data stores
A column-oriented database stores tables but “keeps
columns together” rather than rows.
access to a whole column is fast
sparse rows are handled efficiently
particularly good for certain types of data analysis
often implemented in a key/value-like fashion
row access can be slow
columns have homogeneous data, so compression works well
prominent examples: C-Store and Cassandra
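A small sketch of what "keeping columns together" buys, assuming for simplicity that every row has the same fields. The sample rows and helper names are made up. It pivots rows into columns, aggregates over a single column, and run-length encodes a column of repeated values.

```python
rows = [
    {"id": 1, "country": "DE", "amount": 10},
    {"id": 2, "country": "DE", "amount": 25},
    {"id": 3, "country": "DE", "amount": 5},
    {"id": 4, "country": "US", "amount": 40},
]


def to_columns(rows):
    """Pivot a list of rows into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Compress runs of equal values into [value, count] pairs."""
    encoded = []
    for value in values:
        if encoded and encoded[-1][0] == value:
            encoded[-1][1] += 1
        else:
            encoded.append([value, 1])
    return encoded


columns = to_columns(rows)
total = sum(columns["amount"])                    # reads one column only
packed = run_length_encode(columns["country"])    # homogeneous data compresses well


def read_row(columns, position):
    """Row access has to touch every column, which is the slow path."""
    return {name: values[position] for name, values in columns.items()}
```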
5
Massively parallel: map-reduce and friends
The area of massively parallel databases
A massively parallel database can use thousands of servers
distributed all over the world and still appear as a single
service.
Humongous data capacity and very high read/write
performance
examples are Apache Cassandra, Apache Hadoop, Google’s
Spanner, Riak and others
these systems have important use cases, in particular in the
analytic domain
query capabilities are somewhat limited, for example to
“map/reduce” only
⇒ good horizontal scalability at the cost of reduced query flexibility
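To make the "map/reduce" restriction concrete, here is a single-process sketch of the pattern. The shards and function names are hypothetical; in a real system each shard would live on a different server and the shuffle step would move data over the network.

```python
from collections import defaultdict


def map_phase(shard):
    """Runs independently on every shard: emit (key, value) pairs."""
    for line in shard:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


# Two hypothetical shards of text data.
shards = [
    ["graph document", "key value"],
    ["document store", "graph graph"],
]
pairs = [pair for shard in shards for pair in map_phase(shard)]
counts = reduce_phase(shuffle(pairs))
```

Every query has to be phrased as such a pair of functions, which scales out well but is far less flexible than a general query language.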
6
Polyglot Persistence
Idea
Use the right data model for each part of a system.
For an application, persist
an object or structured data as a JSON document,
a hash table in a key/value store,
relations between objects in a graph database,
a homogeneous array in a relational DBMS.
If the table has many empty cells or inhomogeneous rows, use
a column-oriented database.
Take scalability needs into account!
7
A typical Use Case — an Online Shop
We need to hold
customer data: usually homogeneous, but still variations
⇒ use a relational DB: MySQL
product data: even for a specialised business quite
inhomogeneous
⇒ use a document store:
shopping carts: need very fast lookup by session key
⇒ use a key/value store:
order and sales data: relate customers and products
⇒ use a document store:
recommendation engine data: links between different entities
⇒ use a graph database:
8
Polyglot Persistence is nice, but . . .
Consequence: One needs multiple database systems in the persistence
layer of a single project!
Polyglot persistence introduces some friction through
data synchronisation,
data conversion,
increased installation and administration effort,
more training needs.
Wouldn’t it be nice, . . .
. . . to enjoy the benefits without the disadvantages?
9
The Multi-Model Approach
Multi-model database
A multi-model database combines a document store with a
graph database and is at the same time a key/value store.
Vertices are documents in a vertex collection,
edges are documents in an edge collection.
a single, common query language for all three data models
is able to compete with specialised products on their turf
allows for polyglot persistence using a single database
queries can mix the different data models
can replace an RDBMS in many cases
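A rough illustration of the layout described above, in plain Python rather than in a real query language. The attribute names `_from` and `_to` follow the convention ArangoDB uses for edge documents; the collections, their contents and the helper functions are invented for this sketch.

```python
# Vertex collection: ordinary documents, addressable by key (the key/value view).
people = {
    "alice": {"name": "Alice", "city": "Cologne"},
    "bob": {"name": "Bob", "city": "San Jose"},
    "carol": {"name": "Carol", "city": "Cologne"},
}

# Edge collection: also documents, each pointing from one vertex to another.
knows = [
    {"_from": "alice", "_to": "bob", "since": 2010},
    {"_from": "alice", "_to": "carol", "since": 2013},
    {"_from": "bob", "_to": "carol", "since": 2014},
]


def neighbours(edges, key):
    """Graph step: follow the outgoing edges of one vertex."""
    return [edge["_to"] for edge in edges if edge["_from"] == key]


def friends_in_city(key, city):
    """Mixed query: a graph step followed by a document filter."""
    return sorted(
        people[k]["name"]
        for k in neighbours(knows, key)
        if people[k]["city"] == city
    )
```

The same data answers a key lookup (`people["bob"]`), a document filter and a graph step, so no synchronisation between separate systems is needed.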
10
Why is this possible at all?
Document stores and key/value stores
Document stores have a primary key, so they are also key/value stores.
Without using secondary indexes, performance is nearly as
good as with opaque data instead of JSON.
Good horizontal scalability can be achieved for key lookups.
11
Why is this possible at all?
Document stores and graph databases
graph database: would like to associate arbitrary data with
vertices and edges, so JSON documents are a good choice.
A good edge index, giving fast access to neighbours.
This can be a secondary index.
Graph support in the query language.
Implementations of graph algorithms in the DB engine.
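One way to picture an edge index is as a pair of hash tables over the edge documents, one keyed by source vertex and one by target vertex. The class below is a hypothetical sketch of that idea and says nothing about how any particular engine implements it.

```python
from collections import defaultdict


class EdgeIndex:
    """Hypothetical secondary index over the _from and _to fields of edges."""

    def __init__(self, edges):
        self.outbound = defaultdict(list)   # source vertex -> edges leaving it
        self.inbound = defaultdict(list)    # target vertex -> edges entering it
        for edge in edges:
            self.outbound[edge["_from"]].append(edge)
            self.inbound[edge["_to"]].append(edge)

    def neighbours(self, vertex, direction="outbound"):
        """Neighbour lookup costs one hash access plus the size of the result."""
        if direction == "outbound":
            return [e["_to"] for e in self.outbound.get(vertex, [])]
        if direction == "inbound":
            return [e["_from"] for e in self.inbound.get(vertex, [])]
        raise ValueError(f"unknown direction: {direction}")


edges = [
    {"_from": "a", "_to": "b", "label": "knows"},
    {"_from": "a", "_to": "c", "label": "knows"},
    {"_from": "c", "_to": "b", "label": "follows"},
]
index = EdgeIndex(edges)
```

Without such an index, finding the neighbours of a vertex means scanning the whole edge collection, which is the document-store equivalent of the expensive join.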
12
A Map of the NoSQL Landscape
[Figure: diagram with the labels Transaction processing DBs, Analytic processing DBs, Map/reduce, Column stores, Extensibility, Documents, Massively distributed, Graphs, Structured data, Key/Value, Complex queries]
13
Use case: Aircraft fleet management
One of our customers uses ArangoDB to
store each part, component, unit or aircraft as a document
model containment as a graph
thus can easily find all parts of some component
keep track of maintenance intervals
perform queries orthogonal to the graph structure
thereby getting good efficiency for all needed queries
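The two kinds of query in this use case can be sketched as follows. All part names, fields and dates are invented; the customer's actual data model is not described in the slides.

```python
# Every part, component, unit or aircraft is a document (hypothetical data).
parts = {
    "plane-1": {"kind": "aircraft", "next_check": "2015-09-01"},
    "engine-1": {"kind": "unit", "next_check": "2015-04-15"},
    "turbine-1": {"kind": "component", "next_check": "2015-03-20"},
    "blade-7": {"kind": "part", "next_check": "2015-06-30"},
    "wheel-2": {"kind": "part", "next_check": "2015-03-10"},
}

# Containment is a graph: parent key -> directly contained keys.
contains = {
    "plane-1": ["engine-1", "wheel-2"],
    "engine-1": ["turbine-1"],
    "turbine-1": ["blade-7"],
}


def all_parts_of(key):
    """Graph query: everything contained in `key`, directly or indirectly."""
    found = []
    stack = list(contains.get(key, []))
    while stack:
        child = stack.pop()
        found.append(child)
        stack.extend(contains.get(child, []))
    return sorted(found)


def due_before(date):
    """Document query, orthogonal to the graph: filter on one field."""
    return sorted(k for k, doc in parts.items() if doc["next_check"] < date)
```

The first function follows the containment edges; the second ignores them entirely and relies on ISO dates comparing correctly as strings.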
14
Use case: Family tree management
For genealogy, the natural object is a family tree.
data naturally comes as a (directed) graph
many queries are traversals or shortest path
but not all, for example:
“all people with name James” in a family tree, sorted by birthday
“all family members who studied at Berkeley”, sorted by
number of children
quite often, queries mixing the different models are useful
15
Use case: knowledge bases
encode nearly arbitrary knowledge
often produced by machine learning
queried in very complex ways by expert systems
often in connection to an inference engine
need linked data with lots of associations
typical queries have unpredictable path length, thus graph
queries shine
nevertheless, often queries orthogonal to the links are needed
16
Recently: Key/Value stores adding other models
Riak (by Basho), originally a key/value store, added support for
documents with its 2.0 version (late 2014)
Redis (sponsored by Pivotal), originally an in-memory
key/value store, has over time added more data types and
more complex operations
FoundationDB (by FoundationDB) is a key/value store, but is
now marketed as a multi-model database by adding additional
layers on top
OrientDB (by Orient Technologies) started as an object
database and nowadays calls itself a multi-model database
17
Recently: DataStax acquired Aurelius
In February 2015, DataStax (which sells a commercialised version of the
column-oriented store Cassandra) announced the acquisition of Aurelius, the
company behind TitanDB (a distributed graph database on top of
Cassandra).
In their own words:
“Bringing Graph Database Technology To Cassandra.”
“Will deliver massively scalable, always-on graph database
technology.”
“Will simplify the adoption of leading NoSQL technologies to
support multi-model use case environments.”
18
Recently: MongoDB 3.0 adds pluggable DB engine
MongoDB is one of the most popular document stores.
In February 2015, they announced their 3.0 version, to be released
in March, featuring
a pluggable storage engine layer
transparent on-disk compression
etc.
This indicates their interest in supporting more data models than “just
documents”.
It will be very interesting indeed to see if and how they extend their
query language . . .
19
ArangoDB
is a multi-model database (document store & graph database),
is open source and free (Apache 2 license),
offers convenient queries (via HTTP/REST and AQL),
is memory efficient through shape detection,
uses JavaScript throughout (Google’s V8 built into server),
has an API that is extensible by JavaScript code in the Foxx framework,
offers many drivers for a wide range of languages,
is easy to use with web front end and good documentation,
enjoys good professional as well as community support
and has sharding since Version 2.0.
20
Configurable consistency
ArangoDB offers
atomic and isolated CRUD operations for single documents,
transactions spanning multiple documents and multiple
collections,
snapshot semantics for complex queries,
very secure, durable storage using append-only writes and
multiple stored revisions,
all this for documents as well as for graphs.
In the near future, ArangoDB will
implement complete MVCC semantics to allow for lock-free
concurrent transactions
and offer the same ACID semantics even with sharding.
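Why append-only storage with multiple revisions helps with snapshot semantics can be shown with a toy model. The `RevisionStore` class is purely hypothetical and is not a description of ArangoDB's storage engine: every write appends a new revision, so a reader that remembers a revision number keeps seeing a consistent state while writers continue.

```python
class RevisionStore:
    """Hypothetical append-only store; nothing is overwritten in place."""

    def __init__(self):
        self.log = []        # entries of (revision, key, document)
        self.revision = 0

    def put(self, key, doc):
        """Append a new revision of the document and return its number."""
        self.revision += 1
        self.log.append((self.revision, key, doc))
        return self.revision

    def snapshot(self):
        """Remember the current revision for later consistent reads."""
        return self.revision

    def get(self, key, at=None):
        """Newest revision of `key` that is not newer than the snapshot."""
        limit = self.revision if at is None else at
        for revision, k, doc in reversed(self.log):
            if k == key and revision <= limit:
                return doc
        return None


store = RevisionStore()
store.put("a", {"v": 1})
snap = store.snapshot()      # a long-running query starts here
store.put("a", {"v": 2})     # a concurrent write appends a new revision
store.put("b", {"v": 9})
```

The reader holding `snap` needs no lock, because the revisions it depends on are never modified.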
21
Replication and Sharding — horizontal scalability
Right now, ArangoDB provides
easy setup of (asynchronous) replication,
which allows read access parallelisation (master/slaves setup),
sharding with automatic data distribution to multiple servers.
Very soon, ArangoDB will feature
fault tolerance by automatic failover and synchronous
replication in cluster mode,
zero administration by a self-repairing and self-balancing
cluster architecture,
full integration with Apache Mesos and Mesosphere.
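Automatic data distribution is commonly done by hashing the document key. The sketch below assumes such a hash-based scheme; the server names and function names are invented, and it does not claim to reproduce ArangoDB's actual sharding function.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a document key to a shard number in a deterministic way."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def distribute(keys, servers):
    """Assign every key to exactly one server."""
    placement = {server: [] for server in servers}
    for key in keys:
        placement[servers[shard_for(key, len(servers))]].append(key)
    return placement


servers = ["db1", "db2", "db3"]            # hypothetical DB servers
keys = [f"user{i}" for i in range(100)]    # hypothetical document keys
placement = distribute(keys, servers)
```

A lookup by key only has to contact the one server that `shard_for` names, which is why key lookups scale out so well.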
22
Powerful query language: AQL
The built-in Arango Query Language (AQL) allows
complex, powerful and convenient queries,
with transaction semantics,
including joins,
with user definable functions (in JavaScript).
AQL is independent of the driver used and
offers protection against injections by design.
For Version 2.3, we have reengineered the AQL query engine:
use a C++ implementation for high performance,
optimise distributed queries in the cluster.
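Protection against injection rests on keeping the query text and the user-supplied values apart. The sketch below only builds a request body and sends nothing. It assumes the HTTP cursor endpoint (`POST /_api/cursor`) with its `query` and `bindVars` fields, and the collection `people` with attributes `name` and `birthday` is hypothetical.

```python
import json


def cursor_request(name):
    """Build the JSON body for a query with a bind parameter.

    The query text is a constant; the user-supplied value travels
    separately in bindVars and is never spliced into the query string.
    """
    return json.dumps({
        "query": "FOR p IN people FILTER p.name == @name SORT p.birthday RETURN p",
        "bindVars": {"name": name},
    })


# A hostile-looking input stays an ordinary string value.
body = cursor_request('James" || true || "')
parsed = json.loads(body)
```

Because the query string never changes, the same code works with any driver, and the server can treat `@name` as data rather than as query syntax.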
23
Extensible through JavaScript and Foxx
The HTTP API of ArangoDB
can be extended by user-defined JavaScript code,
that is executed in the DB server for high performance.
This is formalised by the Foxx microservice framework,
which makes it possible to implement complex, user-defined APIs with
direct access to the DB engine.
Very flexible and secure authentication schemes can be
implemented conveniently by the user in JavaScript.
Because JavaScript runs everywhere (in the DB server as well
as in the browser), one can use the same libraries in the
back-end and in the front-end.
⇒ implement your own microservices
24
The Future of NoSQL: My Observations
I observe
2 decades ago the most versatile solutions eventually
dominated the relational DB market
(Oracle, MySQL, PostgreSQL),
the rise of the polyglot persistence idea
a trend towards multi-model databases
specialised products broadening their scope
even relational systems add support for JSON documents
devOps gaining influence (Docker phenomenon)
25
The Future of NoSQL: My Predictions
In 5 years' time . . .
the default approach is to use a multi-model database,
the big vendors will all add other data models,
the NoSQL solutions will conquer a sizable portion
of what is now dominated by the relational model,
specialised products will only survive if they find a niche.
26
Links
https://www.arangodb.com
http://guesser.9hoeffer.de:8000
https://github.com/ArangoDB/guesser
https://github.com/triAGENS/ArangoDB-NET
27

Is multi-model the future of NoSQL?

  • 1. Is multi-model the future of NoSQL? Max Neunhöffer SouthBay.NET Meetup, 5 March 2015 www.arangodb.com
  • 2. Max Neunhöffer I am a mathematician “Earlier life”: Research in Computer Algebra (Computational Group Theory) Always juggled with big data Now: working in database development, NoSQL, ArangoDB I like: research, hacking, teaching, tickling the highest performance out of computer systems. 1
  • 3. ArangoDB GmbH triAGENS GmbH offers consulting services since 2004: software architecture project management software development business analysis a lot of experience with specialised database systems have done NoSQL, before the term was coined at all 2011/2012, an idea emerged: to build the database one had wished to have all those years! development of ArangoDB as open source software since 2012 ArangoDB GmbH: spin-off to take care of ArangoDB (2014) 2
  • 4. Document and Key/Value Stores Document store A document store stores a set of documents, which usually means JSON data, these sets are called collections. The database has access to the contents of the documents. each document in the collection has a unique key secondary indexes possible, leading to more powerful queries different documents in the same collection: structure can vary no schema is required for a collection database normalisation can be relaxed Key/value store Opaque values, only key lookup without secondary indexes: =⇒ high performance and perfect scalability 3
  • 5. Graph databases Graph database A graph database stores a labelled graph. Vertices and edges can be documents. Graphs are good to model relations. graphs often describe data very naturally (e.g. the facebook friendship graph) graphs can be stored using tables, however, graph queries notoriously lead to expensive joins there are interesting and useful graph algorithms like “shortest path” or “neighbourhood” need a good query language to reap the benefits horizontal scalability is troublesome graph databases vary widely in scope and usage, no standard 4
  • 6. Column-oriented data stores Column-oriented data astores A column-oriented database stores tables but “keeps columns together” rather than rows. access to a whole column is fast sparse rows are handled efficiently particularly good for certain types of data analysis often implemented in a key/value-like fashion row access can be slow columns have homogeneous data, so compression works well prominent examples: C-Store and Cassandra 5
  • 7. Massively parallel: map-reduce and friends The area of massively parallel A massively parallel database can use thousands of servers distributed all over the world and still appears as a single service. Humongous data capacity and very high read/write performance examples are Apache Cassandra, Apache Hadoop, Google’s Spanner, Riak and others these systems have important use cases, in particular in the analytic domain query capabilities are somewhat limited like for example only “map/reduce” ⇒ good horizontal scalability at the cost of reduced query flexibility 6
  • 8. Polyglot Persistence Idea Use the right data model for each part of a system. For an application, persist an object or structured data as a JSON document, a hash table in a key/value store, relations between objects in a graph database, a homogeneous array in a relational DBMS. If the table has many empty cells or inhomogeneous rows, use a column-oriented database. Take scalability needs into account! 7
  • 9. A typical Use Case — an Online Shop We need to hold customer data: usually homogeneous, but still variations =⇒ use a relational DB: MySQL product data: even for a specialised business quite inhomogeneous =⇒ use a document store: shopping carts: need very fast lookup by session key =⇒ use a key/value store: order and sales data: relate customers and products =⇒ use a document store: recommendation engine data: links between different entities =⇒ use a graph database: 8
  • 10. Polyglot Persistence is nice, but . . . Consequence: One needs multiple database systems in the persis- tence layer of a single project! Polyglot persistence introduces some friction through data synchronisation, data conversion, increased installation and administration effort, more training needs. Wouldn’t it be nice, . . . . . . to enjoy the benefits without the disadvantages? 9
  • 11. The Multi-Model Approach Multi-model database A multi-model database combines a document store with a graph database and is at the same time a key/value store. Vertices are documents in a vertex collection, edges are documents in an edge collection. a single, common query language for all three data models is able to compete with specialised products on their turf allows for polyglot persistence using a single database queries can mix the different data models can replace a RDMBS in many cases 10
  • 12. Why is this possible at all? Document stores and key/value stores Document stores: have primary key, are key/value stores. Without using secondary indexes, performance is nearly as good as with opaque data instead of JSON. Good horizontal scalability can be achieved for key lookups. 11
  • 13. Why is this possible at all? Document stores and graph databases graph database: would like to associate arbitrary data with vertices and edges, so JSON documents are a good choice. A good edge index, giving fast access to neighbours. This can be a secondary index. Graph support in the query language. Implementations of graph algorithms in the DB engine. 12
  • 14. A Map of the NoSQL Landscape Transaction Processing DBs Analytic processing DBs Map/reduce Column Stores Extensibility Documents Massively distributed Graphs Structured Data Key/Value Complex queries 13
  • 15. Use case: Aircraft fleet management One of our customers uses ArangoDB to store each part, component, unit or aircraft as a document model containment as a graph thus can easily find all parts of some component keep track of maintenance intervals perform queries orthogonal to the graph structure thereby getting good efficiency for all needed queries 14
  • 16. Use case: Family tree management For genealogy, the natural object is a family tree. data naturally comes as a (directed) graph many queries are traversals or shortest path but not all, for example: “all people with name James” in a family tree, sorted by birthday “all family members who studied at Berkeley”, sorted by number of children quite often, queries mixing the different models are useful 15
  • 17. Use case: knowledge bases encode nearly arbitrary knowledge often produced by machine learning queried in very complex ways by expert systems often in connection to an inference engine need linked data with lots of associations typical queries have unpredictable path length, thus graph queries shine nevertheless, often queries orthogonal to the links are needed 16
  • 18. Recently: Key/Value stores adding other models. Riak (by Basho), originally a key/value store, adds support for documents with their 2.0 version (late 2014). Redis (sponsored by Pivotal), originally an in-memory key/value store, has over time added more data types and more complex operations. FoundationDB (by FoundationDB) is a key/value store, but is now marketed as a multi-model database by adding additional layers on top. OrientDB (by Orient Technologies) started as an object database and nowadays calls itself a multi-model database.
  • 19. Recently: DataStax acquired Aurelius. In February 2015, DataStax (which offers a commercialised version of the column-oriented Cassandra) announced the acquisition of Aurelius, the company behind TitanDB (a distributed graph database on top of Cassandra). In their own words: “Bringing Graph Database Technology To Cassandra.” “Will deliver massively scalable, always-on graph database technology.” “Will simplify the adoption of leading NoSQL technologies to support multi-model use case environments.”
  • 20. Recently: MongoDB 3.0 adds pluggable DB engine. MongoDB is one of the most popular document stores. In February 2015, they announced their 3.0 version, to be released in March, featuring: a pluggable storage engine layer; transparent on-disk compression; etc. This indicates their interest in supporting more data models than “just documents”. It will be very interesting indeed to see if and how they extend their query language . . .
  • 21. ArangoDB: is a multi-model database (document store & graph database); is open source and free (Apache 2 license); offers convenient queries (via HTTP/REST and AQL); is memory efficient by shape detection; uses JavaScript throughout (Google’s V8 built into server); has an API extensible by JavaScript code in the Foxx framework; offers many drivers for a wide range of languages; is easy to use with web front end and good documentation; enjoys good professional as well as community support; and has sharding since Version 2.0.
  • 22. Configurable consistency. ArangoDB offers: atomic and isolated CRUD operations for single documents; transactions spanning multiple documents and multiple collections; snapshot semantics for complex queries; very secure durable storage using append-only writes and storing multiple revisions; all this for documents as well as for graphs. In the near future, ArangoDB will: implement complete MVCC semantics to allow for lock-free concurrent transactions; offer the same ACID semantics even with sharding.
  • 23. Replication and Sharding: horizontal scalability. Right now, ArangoDB provides: easy setup of (asynchronous) replication, which allows read access parallelisation (master/slaves setup); sharding with automatic data distribution to multiple servers. Very soon, ArangoDB will feature: fault tolerance by automatic failover and synchronous replication in cluster mode; zero administration by a self-repairing and self-balancing cluster architecture; full integration with Apache Mesos and Mesosphere.
  • 24. Powerful query language: AQL. The built-in Arango Query Language AQL allows: complex, powerful and convenient queries; with transaction semantics; including joins; with user-definable functions (in JavaScript). AQL is independent of the driver used and offers protection against injections by design. For Version 2.3, we have reengineered the AQL query engine to: use a C++ implementation for high performance; optimise distributed queries in the cluster.
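As an illustration of how an AQL query can be kept separate from user input, here is a minimal Python sketch that builds the JSON body for ArangoDB's HTTP cursor endpoint (`POST /_api/cursor`) using bind parameters. The slide does not say which mechanism provides the injection protection, so treating bind parameters as that mechanism is an assumption; the collection `persons`, its attributes `name` and `birthday`, and the helper `build_cursor_request` are hypothetical. No request is sent, so the sketch runs without a server.

```python
import json


def build_cursor_request(name):
    """Build the JSON body for POST /_api/cursor (hypothetical helper).

    The query text is a constant; the user-supplied value travels
    separately in bindVars and is never spliced into the query string.
    """
    query = "FOR p IN persons FILTER p.name == @name SORT p.birthday RETURN p"
    return {"query": query, "bindVars": {"name": name}}


# A deliberately hostile input: it stays a plain string value.
hostile = 'James" || true || "'
body = build_cursor_request(hostile)
payload = json.dumps(body)

print(body["query"])
print(body["bindVars"])
```

The query mirrors the “all people with name James, sorted by birthday” example from the family tree use case; because the query string never changes, the hostile input cannot alter its structure.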
  • 25. Extensible through JavaScript and Foxx. The HTTP API of ArangoDB can be extended by user-defined JavaScript code, which is executed in the DB server for high performance. This is formalised by the Foxx microservice framework, which allows implementing complex, user-defined APIs with direct access to the DB engine. Very flexible and secure authentication schemes can be implemented conveniently by the user in JavaScript. Because JavaScript runs everywhere (in the DB server as well as in the browser), one can use the same libraries in the back-end and in the front-end. ⇒ Implement your own microservices.
  • 26. The Future of NoSQL: My Observations. I observe: two decades ago, the most versatile solutions eventually dominated the relational DB market (Oracle, MySQL, PostgreSQL); the rise of the polyglot persistence idea; a trend towards multi-model databases; specialised products broadening their scope; even relational systems adding support for JSON documents; DevOps gaining influence (Docker phenomenon).
  • 27. The Future of NoSQL: My Predictions. In 5 years’ time . . . the default approach will be to use a multi-model database; the big vendors will all add other data models; the NoSQL solutions will conquer a sizable portion of what is now dominated by the relational model; specialised products will only survive if they find a niche.