3. The current world of NoSQL RDBMS: 105+ databases. NoSQL: 122+ databases. Forecast: NoSQL market expected to reach $3.4 billion by 2018; NoSQL market revenue of $14 billion over 2013-2018. RDBMS are great and ... will be fine
4.
5.
6.
7.
8. NoSQL models: Key Value 31%, Document 10%, Graph 13%, Column Family 9%, XML 6%, Object 10%, Other 21%
9.
10.
11. Comparison: General Information
Language: Redis: C; CouchBase: C, C++, Erlang; Neo4j: Java; Cassandra: Java
Commercial Support: Redis: third-party companies; CouchBase: Consulting & Support with Enterprise; Neo4j: Neo4j Advanced, Neo4j Enterprise; Cassandra: DataStax, Impetus, Acunu, Riptano, Cubet Technologies
Customers: Redis: GitHub, Guardian Media Group; CouchBase: Zynga, AOL, BBC; Neo4j: Adobe, Cisco, StudiVZ, Deutsche Telekom, Fanbox; Cassandra: Twitter, Digg, Reddit, Rackspace, Facebook
Licenses: Redis: New BSD; CouchBase: Community & Enterprise Licenses; Neo4j: GPL or AGPLv3; Cassandra: Apache License 2
13. Best Use
Redis: Real-time systems where low latency is critical (games); high performance caching tier for web sites and other applications; server for backed sessions or transient data; service offering some real-time statistics
CouchBase: Syncing online and offline data (allows synchronization and sharing of data and applications across multiple platforms and mobile devices); social and online gaming; data management layer for recommendation engine
Neo4j: Cloud/network management; social, geospatial data; bioinformatics
Cassandra: Managing large streams of non-transactional data (Apache logs, application logs, etc.); consistent, fast response times under writes (high volume writes); real-time analytics & statistics; highly available solution
14. Which model should you use? Column Oriented Store, Document Store, Key Value Store, Graph Database. More specific: which NoSQL database?
15.
16.
17.
18. Lessons learned from actual use. Hybrid approach (NoSQL, RDBMS, business facade). Two databases: NoSQL + RDBMS. Key-value storage for session data + RDBMS for user data. Column storage for reporting data + RDBMS for user data.
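The hybrid approach on this slide could be sketched roughly as follows. This is only an illustration, not the setup used in the talk: the `UserFacade` class and its method names are hypothetical, a plain dict stands in for the key-value store, and in-memory SQLite stands in for the RDBMS.

```python
import sqlite3


class UserFacade:
    """Hypothetical business facade: one entry point, two stores behind it."""

    def __init__(self):
        # Assumption: a dict stands in for a key-value store holding session data.
        self.sessions = {}
        # Assumption: in-memory SQLite stands in for the RDBMS holding user data.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
        )

    def register(self, name):
        """Store durable user data in the relational database."""
        cur = self.db.execute("INSERT INTO users (name) VALUES (?)", (name,))
        self.db.commit()
        return cur.lastrowid

    def login(self, user_id, token):
        """Store transient session data in the key-value store."""
        self.sessions[token] = user_id

    def current_user(self, token):
        """Resolve a session token to a user name using both stores."""
        user_id = self.sessions.get(token)
        if user_id is None:
            return None
        row = self.db.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        return row[0] if row else None


facade = UserFacade()
uid = facade.register("alice")
facade.login(uid, "token-123")
print(facade.current_user("token-123"))
```

Callers only see the facade, so either store could be swapped for a real product without touching business code.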
19.
Editor's Notes
It's a well-known truth that we should choose the right tool for the job. Everyone says that. Who can disagree? The problem is that this is not a helpful assertion unless we can answer more specific questions: what jobs are the tools good at? Which NoSQL database should I choose out of the many available options? Here is the table of contents of today's workshop, which aims to help you answer these questions. 1) First, I will provide some statistics and interesting numbers from the current world of NoSQL. 2) Next, you will learn about the NoSQL initiative and specifically about the classes of NoSQL databases. 3) Then I will explain the differences between the existing models and classes of NoSQL. We will look more closely at one specific database from each class. 4) For the next five minutes I will briefly cover the differences between these databases and outline the best use cases for each. 5) Speaking about how to ultimately choose the right tool, I will recap some recommendations and good approaches. 6) NoSQL clients: a few real-world examples and stories (from Renat). 7) Finally, lessons learned from the actual usage of NoSQL.
The worldwide NoSQL market is expected to reach $3.4 billion by 2018, and the NoSQL market will generate $14 billion in revenues over the period 2013-2018. RDBMS are great, and the forecast is that they will be fine. Why?
Oracle officially released a memcached daemon plugin that talks to InnoDB, so NoSQL + MySQL has become an official solution. More changes are bridging the NoSQL/SQL divide:
- Neo4j recently announced a JDBC interface that forwards database queries to Neo4j and allows common applications to access the NoSQL database without modification.
- Cassandra + CQL (Cassandra Query Language, an SQL-like query language).
- Couchbase Server 2.0 comes with a NoSQL query language called UnQL.
Interest in key-value pair (KVP) technology has re-emerged to the point where the traditional RDBMS vendors are evaluating the strategy of developing in-house NoSQL solutions and integrating them into their current product offerings. It will not take long before we see acquisitions driven by emerging NoSQL technology. By the way, the database market was in the same state in the 1970s, before SQL was invented (a lot of APIs and no single standard).
The NoSQL initiative promotes a loosely defined class of non-relational data stores that break with the ACID paradigm and with relational databases. NoSQL data management systems are inherently:
- Schema-free: no unneeded complexity and flexible data models; the variety of features and the strict data consistency of an RDBMS might be unnecessary.
- Built for huge data amounts and high throughput: relational databases can be slow and expensive in terms of performance, so they are set aside in favor of more efficient and cheaper ways of managing data when dealing with big data and web scale.
- Eventually consistent / BASE (not ACID): basically available, soft state, eventual consistency.
- Accessed through a simple API.
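To make "eventual consistency" concrete, here is a toy sketch, not the replication protocol of any particular database. It assumes a last-write-wins rule based on caller-supplied timestamps; the `Replica` class and `sync` function are hypothetical names used only for illustration.

```python
class Replica:
    """Toy replica that keeps the newest (timestamp, value) pair per key."""

    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        # Last-write-wins (an assumption of this sketch): keep the newer version.
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Exchange state in both directions so the replicas converge."""
    for src, dst in ((a, b), (b, a)):
        for key, (timestamp, value) in list(src.data.items()):
            dst.write(key, value, timestamp)


r1, r2 = Replica(), Replica()
r1.write("cart", ["book"], timestamp=1)

stale = r2.read("cart")   # r2 has not seen the write yet (soft state)
sync(r1, r2)              # background replication catches up
fresh = r2.read("cart")   # now both replicas agree

print(stale, fresh)
```

Both replicas stay available for reads and writes the whole time; the price is that a read may briefly return stale data.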
Core NoSQL systems can be divided into these main classes:
- Key-Value Stores (Riak, Redis, MemcacheDB)
- Wide Column Stores / Column Families (Cassandra, HBase, Amazon SimpleDB)
- Document Stores (MongoDB, CouchDB, Jackrabbit)
- Graph Databases (Neo4j, InfiniteGraph)
- XML Databases (Berkeley DB XML, eXist): communication is typically performed by means of HTTP/REST, WebDAV, SOAP, XML-RPC and XML-oriented query methods: XQuery, XPointer, XPath
- Object Databases (Objectivity, db4o): one of the main goals is to provide an easy and native persistence interface for object-oriented programming languages
- Other (unresolved and uncategorized)
Redis is an open-source, networked, in-memory key-value data store with optional durability. It is written in ANSI C. The development of Redis is sponsored by VMware.
CouchBase is an open-source, schema-free document database that provides JavaScript-based map/reduce indexing to query and analyze data, peer-based replication, GeoCouch for creating location-aware applications, and binary packages for Red Hat and Ubuntu Linux, Windows, and Mac OS X. It combines CouchDB, Membase, and Memcached.
Neo4j is an open-source, disk-based, fully transactional Java persistence engine that runs either embedded or as a standalone server with a REST API. It stores data with multiple relationships and connections in graphs rather than in tables.
Cassandra is an open-source distributed database management system designed to handle large amounts of data spread across many servers. It provides a highly available service with no single point of failure. It is a NoSQL solution that was initially developed by Facebook and powered their Inbox Search feature until late 2010.
Commercial Support:
- Redis -
- CouchBase: depends on whether it is the community edition license or the enterprise license
- Neo4j -
- Cassandra: third-party companies provide commercial support and commercial distributions of Cassandra.
Customers & some notable users:
- Redis: the online hosting service GitHub, the British Guardian Media Group
- CouchBase: organizations including Zynga, AOL, the BBC and thousands of others power their interactive web applications with Couchbase
- Neo4j: Adobe, Cisco, StudiVZ (the largest social network in Europe), Fanbox (a social networking website)
Client libraries (accessing your data should be easy):
- CouchBase: non-vBucket ("classic" Memcached) clients or vBucket-aware Type 2 Membase clients. A vBucket is defined as the "owner" of a subset of the key space of a Membase cluster. Every key "belongs" to a vBucket, and a mapping function is used to calculate the vBucket to which a given key belongs.
- Cassandra: the client API is built entirely on top of Thrift for different programming languages, including Python, Java, .NET, Ruby, PHP, Perl, C++.
Map/Reduce (generally available parallel computing might be important):
- Cassandra: enables certain Hadoop functionality against Cassandra's data.
ACID transactions:
- Redis is not a "durable" datastore, in the sense of the "D" in ACID.
- CouchBase: supports ACID transaction semantics.
- Neo4j: supports ACID transactions. The default isolation level is read committed, locks are acquired at the Node and Relationship level, and deadlock detection is built into the core transaction management.
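The vBucket mapping described above can be sketched as a hash of the key reduced to a fixed number of buckets, plus a table from bucket to server. This is a simplified illustration, not the exact function used by Membase or Couchbase clients: the CRC32-modulo hash, the count of 1024 vBuckets, and the names `vbucket_for`, `server_for` and `node-a`/`node-b`/`node-c` are all assumptions made for this sketch.

```python
import zlib

NUM_VBUCKETS = 1024  # assumption for this sketch


def vbucket_for(key, num_vbuckets=NUM_VBUCKETS):
    """Map a key to a vBucket id with a stable hash (CRC32 modulo bucket count)."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def server_for(key, vbucket_map):
    """Look up which server currently owns the key's vBucket."""
    return vbucket_map[vbucket_for(key, len(vbucket_map))]


# Hypothetical cluster: vBuckets are spread round-robin over three nodes.
servers = ["node-a", "node-b", "node-c"]
vbucket_map = [servers[i % len(servers)] for i in range(NUM_VBUCKETS)]

key = "user:42"
print(key, "->", vbucket_for(key), "->", server_for(key, vbucket_map))
```

Because keys map to vBuckets rather than directly to servers, rebalancing a cluster only means reassigning entries in `vbucket_map`; the key-to-vBucket function never changes.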
Redis:
- Service offering some real-time statistics. A good example of this is an application built on Redis: Hurl, a tool for debugging HTTP requests built in 48 hours by Leah Culver and Chris Wanstrath.
- Transient data. Any transient data used by your application is also a good fit for Redis.
CouchBase:
- Sync mobile data to cloud data. Not all iPhones, iPads or iPod Touch devices are online all the time, or even within range of Internet connectivity, but the devices and their software must be useful whether online or offline.
- Social and online gaming. CouchBase can be a good option for the data management layer in social and online gaming, where predictable latency, responsiveness and automated data caching are required.
- Data management layer for a recommendation engine that targets ads and offers. Targeting algorithms and approaches can change and often require changes in the input data. With schema-free data there is no need to define a database schema before inserting data.
Neo4j:
- Social, geospatial data. Neo4j allows queries to find target nodes or shortest paths. It allows indexing on node/relationship properties.
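To show what a "shortest path" query over social data means, here is a plain breadth-first search over an adjacency list. It is not Neo4j's API or query language, only a sketch of the kind of traversal a graph database performs natively; the `shortest_path` function and the `friends` data are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search: returns the fewest-hops path, or None if unreachable."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue  # already visited
            previous[neighbor] = node
            if neighbor == goal:
                # Walk back from the goal to the start to rebuild the path.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical "friend of" relationships.
friends = {
    "ann": ["bob", "cid"],
    "bob": ["ann", "dee"],
    "cid": ["ann"],
    "dee": ["bob", "eve"],
    "eve": ["dee"],
}

path = shortest_path(friends, "ann", "eve")
print(path)
```

In a relational schema the same question needs one self-join per hop, which is why relationship-heavy data tends to fit a graph model better.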
- Maturity
  Some databases are not as proven
  Incomplete NoSQL solutions:
    You write a larger data management tier
    You maintain your business code and infrastructure code
    You have to customize management and deployment technology and procedures
- Connectivity/querying
  APIs for .NET, Java, Perl, Python, etc.
  Some solutions have no querying
  When available, query languages differ
  Lack of general ad-hoc querying: “no” SQL
According to the CAP theorem, a distributed system can support only two of the following three characteristics at the same time: Consistency (all nodes see the same data at the same time), Availability (every operation must terminate in an intended response), and Partition tolerance (operations will complete even if individual components are unavailable).
Start small but significant: focus on the problem you are trying to solve with NoSQL.