CouchBase The Complete NoSql
Solution for Big Data
- Debajani Mohanty
CAP Theorem
Before we get into big data and the role of NoSQL, we must first
understand the CAP theorem. In theoretical computer science, the
CAP theorem, also known as Brewer's theorem, states that it is
impossible for a distributed computer system to simultaneously
provide all three of the following guarantees:
1. Consistency (all nodes see the same data at the same time)
2. Availability (a guarantee that every request receives a response about
whether it succeeded or failed)
3. Partition tolerance (the system continues to operate despite arbitrary
message loss or failure of part of the system)
Although all three cannot be guaranteed at once, a system can provide
any two. For example, to get high availability and partition tolerance,
you need to sacrifice strict consistency.
The 5 Vs of Big Data
• Big data is a broad term for data sets so large or complex that
traditional data processing applications are inadequate.
• Challenges include analysis, capture, data curation,
search, sharing, storage, transfer, visualization,
and information privacy.
• We currently only see the beginnings of a transformation into a
big data economy. Any business that doesn’t seriously
consider the implications of Big Data runs the risk of being left
behind.
• To get a better understanding of what Big Data is, it is often
described using 5 Vs: Volume, Velocity, Variety, Veracity and Value.
Volume
Volume refers to the vast amounts of data generated
every second. We are not talking terabytes but
zettabytes or brontobytes. If we take all the data
generated in the world between the beginning of time
and 2008, the same amount of data will soon be
generated every minute. This makes most data sets too
large to store and analyse using traditional database
technology. New big data tools use distributed systems
so that we can store and analyse data across databases
that are dotted around anywhere in the world.
Variety
Variety refers to the different types of data we
can now use. In the past we only focused on
structured data that neatly fitted into tables or
relational databases, such as financial data. By
common estimates, about 80% of the world’s data is
unstructured (text, images, video, voice, etc.). With
big data technology we can now analyse and bring
together data of different types such as
messages, social media conversations, photos,
sensor data, video or voice recordings.
Velocity
Velocity refers to the speed at which new data
is generated and the speed at which data moves
around. Just think of social media messages
going viral in seconds. Technology now allows us
to analyse the data while it is being
generated (sometimes referred to as in-memory
analytics), without ever putting it into databases.
Veracity & Value
Veracity refers to the truthfulness and
correctness of the data.
Value! Having access to big data is no
good unless we can turn it into value.
Companies are starting to generate
amazing value from their big data.
Big Data and the Human Brain
To understand how a big data solution could be architected, let’s try to
understand how the human brain is architected.
So the key is parallel processing. Hooray!
Hadoop & MapReduce
• In 2004, Google published a paper on a process called MapReduce that
used such an architecture.
• The MapReduce framework provides a parallel processing model and
associated implementation to process huge amounts of data. With
MapReduce, queries are split and distributed across parallel nodes and
processed in parallel (the Map step). The results are then gathered and
delivered (the Reduce step). The framework was very successful, so others
wanted to replicate the algorithm. Therefore, an implementation of the
MapReduce framework was adopted by an Apache open source project
named Hadoop.
• But MapReduce only processes the data, and Hadoop pairs it with HDFS for
batch storage. How can we store this huge data for real-time access?
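The Map and Reduce steps described above can be sketched in a few lines of plain Java. This is only an illustration of the idea on a single machine, not Hadoop's API: the class name WordCountSketch and its methods are hypothetical, and a parallel stream stands in for the parallel nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical single-machine sketch of the MapReduce idea (word count).
final class WordCountSketch {
    // Map step: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce step: gather the pairs and sum the counts per word.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Lines are mapped in parallel; the results are then reduced once.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = lines.parallelStream()
                .flatMap(line -> map(line).stream())
                .collect(Collectors.toList());
        return reduce(pairs);
    }

    public static void main(String[] args) {
        // prints {big=2, data=2, fast=1, is=1, moves=1}
        System.out.println(run(List.of("big data is big", "data moves fast")));
    }
}
```

In a real cluster the map calls would run on different nodes and the pairs would be shuffled over the network before reduction; the sketch keeps only the split, process, gather shape.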
NoSQL Database
• A NoSQL (often interpreted as "Not only SQL") database, often used in big data-centric
real-time web applications, provides a mechanism for storage and retrieval of data
that is modeled by means other than the tabular relations used in relational
databases.
• Motivations for this approach include simplicity of design, horizontal scaling, and finer
control over availability. The data structures used by NoSQL databases (e.g. key-value,
graph, or document) differ from those used in relational databases, making
some operations faster in NoSQL and others faster in relational databases.
• The particular suitability of a given NoSQL database depends on the problem it must
solve.
Types of NoSQL databases
• There have been various approaches to classify NoSQL databases, each
with different categories and subcategories. Because of the variety of
approaches and overlaps it is difficult to get and maintain an overview of
non-relational databases. Nevertheless, a basic classification is based on
data model. A few examples in each category are:
• Column: Accumulo, Cassandra, Druid, HBase, Vertica
• Document: Lotus Notes, Clusterpoint, Apache CouchDB, Couchbase,
MarkLogic, MongoDB, OrientDB, Qizx
• Key-value: CouchDB, Dynamo, FoundationDB, MemcacheDB, Redis, Riak,
FairCom c-treeACE, Aerospike, OrientDB, MUMPS
• Graph: AllegroGraph, Neo4j, InfiniteGraph, OrientDB, Virtuoso, Stardog
• Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database,
CortexDB
Graph Database
• This kind of database is designed for data
whose relations are well represented as a
graph (elements interconnected with an
undetermined number of relations
between them). The kind of data could be
social relations, public transport links, road
maps or network topologies.
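As a rough illustration of graph-shaped data, the sketch below stores public transport links as an adjacency list and counts the fewest links between two stops with a breadth-first search. It is plain in-memory Java, not the API of any graph database; the class name and the stop names are made up for the example.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

// Hypothetical in-memory sketch of graph data: stops connected by links.
final class TransitGraphSketch {
    // Adjacency list: each stop maps to the stops directly linked to it.
    private final Map<String, Set<String>> links = new HashMap<>();

    // Add an undirected link between two stops.
    void link(String a, String b) {
        links.computeIfAbsent(a, k -> new LinkedHashSet<>()).add(b);
        links.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
    }

    // Breadth-first search: fewest links between two stops, or -1 if unreachable.
    int hops(String from, String to) {
        if (!links.containsKey(from) || !links.containsKey(to)) {
            return -1;
        }
        Map<String, Integer> distance = new HashMap<>();
        Queue<String> queue = new ArrayDeque<>();
        distance.put(from, 0);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(to)) {
                return distance.get(current);
            }
            for (String next : links.get(current)) {
                if (!distance.containsKey(next)) {
                    distance.put(next, distance.get(current) + 1);
                    queue.add(next);
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        TransitGraphSketch graph = new TransitGraphSketch();
        graph.link("Airport", "Central");
        graph.link("Central", "Harbor");
        graph.link("Harbor", "Stadium");
        System.out.println(graph.hops("Airport", "Stadium")); // prints 3
    }
}
```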
Key-value stores
• In this model, data is represented as a
collection of key-value pairs, such that
each possible key appears at most once
in the collection. The key-value model is
one of the simplest non-trivial data
models, and richer data models are often
implemented on top of it.
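A minimal sketch of the key-value model, assuming nothing beyond the Java standard library: a map in which each key appears at most once, so writing to an existing key replaces its value. The class and method names are hypothetical and do not correspond to any particular product's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory key-value store: each key appears at most once.
final class KeyValueSketch {
    private final Map<String, String> data = new HashMap<>();

    // Store a value; an existing value under the same key is replaced.
    void set(String key, String value) {
        data.put(key, value);
    }

    // Return the value for a key, or null if the key is absent.
    String get(String key) {
        return data.get(key);
    }

    // Remove a key; returns true if it was present.
    boolean delete(String key) {
        return data.remove(key) != null;
    }

    int size() {
        return data.size();
    }

    public static void main(String[] args) {
        KeyValueSketch store = new KeyValueSketch();
        store.set("user:1", "Ann");
        store.set("user:1", "Bob"); // same key: value is overwritten
        System.out.println(store.size());        // prints 1
        System.out.println(store.get("user:1")); // prints Bob
    }
}
```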
Document-oriented databases
• The central concept of a document store is the notion of a
"document". While each document-oriented database
implementation differs on the details of this definition, in general,
they all assume that documents encapsulate and encode data (or
information) in some standard formats or encodings. Encodings in
use include XML, JSON as well as binary forms like BSON.
• Two of the most widely used NoSQL solutions are MongoDB and
Couchbase, and both are document-oriented databases.
• Here is a sample document:
{
"_id": "5897g42s0245afo4o473ai1e7",
"firstname": "John",
"lastname": "Doe",
"age": 26,
"sex": "M",
"interests": [ "Reading", "Running", "Hacking" ]
}
MongoDB vs CouchBase
[Figures not included in this transcript: "Results Analysis", "Another Analysis"]
Scalability
• In Couchbase, you can easily add servers to form a cluster and
obtain a distributed system, and Couchbase is flexible enough to do so
without downtime. It relies on the Erlang language, a functional and
fault-tolerant language that supports hot code changes.
• For MongoDB, the configuration is a bit more complicated. For
example, once you have defined the shard key (the key to distribute
documents within a sharded cluster), it becomes difficult to change it
afterwards. The system is not as flexible, so you have to think
carefully about your data modeling before you move your
application into production.
• Scalability is why Couchbase is widely used in social gaming, where
millions of players can play and their numbers can increase
exponentially overnight.
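To make the idea of distributing documents by key more concrete, here is a generic hash-partitioning sketch. It is an assumption for illustration only and is not the actual sharding algorithm of Couchbase or MongoDB; the class name, the key format and the shard count are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch: map a document key to one of N shards by hashing it.
final class ShardingSketch {
    // The same key always lands on the same shard for a fixed shard count.
    static int shardFor(String key, int shardCount) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative 32-bit checksum held in a long.
        return (int) (crc.getValue() % shardCount);
    }

    public static void main(String[] args) {
        int shardCount = 4; // illustrative cluster size
        for (String key : new String[] {"user:1", "user:2", "user:3"}) {
            System.out.println(key + " -> shard " + shardFor(key, shardCount));
        }
    }
}
```

With plain modulo hashing, changing the shard count moves many keys to a different shard, which hints at why the choice of partitioning scheme matters so much once an application is in production.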
Monitoring tool
Couchbase comes with a turnkey monitoring package, while MongoDB requires an
additional subscription to a monitoring service. You can monitor MongoDB
from the command line, but a monitoring tool without a graphical interface is
relatively restrictive.
Introducing CouchBase
• Couchbase positions itself as the most complete, most scalable and
best performing NoSQL database.
• It is based on a shared-nothing architecture, a single node type, a built-in
caching layer and true auto-sharding. It also includes Couchbase Mobile,
a NoSQL mobile solution comprising Couchbase Server, Couchbase Sync
Gateway and Couchbase Lite.
• Clients: AT&T, Amadeus, Bally’s, Beats Music, Cisco, Comcast,
Concur, Disney, eBay / PayPal, Neiman Marcus, Orbitz, Rakuten /
Viber, Sky, Tencent, Tesco, Verizon and Willis Group, as well as
hundreds of other household names worldwide
Real life Use Cases
Couchbase Server’s strength is the combination of 1) linear, horizontal
scalability, 2) sustained low latency and high throughput performance, and
3) the extensibility of the system.
A few use cases:
• Session store: User sessions are easily stored and managed in
Couchbase, for instance, by using the document ID naming scheme,
“user:USERID”. With Couchbase Server, you can flag items for deletion
after a certain amount of time, and therefore you have the option of having
Couchbase automatically delete old sessions.
• Social gaming: You can model and store game state, property state, time
lines, conversations and chats with Couchbase Server. The asynchronous
persistence algorithms of Couchbase were designed, built and deployed to
support some of the highest scale social games.
• Ad, offer, and content targeting: The same attributes which serve
Couchbase in the gaming context also apply well to real-time ad and
content targeting. For example, Couchbase provides a fast storage
capability for counters. Counters are useful for tracking visits, associating
users with various targeting profiles, tracking ad offers, and tracking ad
inventory.
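The session store use case above combines two ideas: a "user:USERID" key naming scheme and items that expire after a set time. The following is a minimal in-memory sketch of both, not the Couchbase SDK: the class and method names are hypothetical, and the current time is passed in explicitly so the behaviour is easy to follow.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory session store with per-item expiry (TTL).
final class SessionStoreSketch {
    private static final class Item {
        final String value;
        final long expiresAtMillis; // 0 means "never expires"

        Item(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Item> items = new HashMap<>();

    // Key naming scheme from the use case: "user:USERID".
    static String sessionKey(String userId) {
        return "user:" + userId;
    }

    // Store a value; a TTL of 0 seconds means the item never expires.
    void set(String key, String value, long ttlSeconds, long nowMillis) {
        long expiresAt = ttlSeconds > 0 ? nowMillis + ttlSeconds * 1000L : 0L;
        items.put(key, new Item(value, expiresAt));
    }

    // Return the value, or null if the key is missing or has expired.
    String get(String key, long nowMillis) {
        Item item = items.get(key);
        if (item == null) {
            return null;
        }
        if (item.expiresAtMillis != 0 && nowMillis >= item.expiresAtMillis) {
            items.remove(key); // expired items are dropped when next read
            return null;
        }
        return item.value;
    }

    public static void main(String[] args) {
        SessionStoreSketch store = new SessionStoreSketch();
        String key = sessionKey("42");
        store.set(key, "cart=3 items", 30, 0L);     // expires 30 s after t=0
        System.out.println(store.get(key, 1_000L));  // prints cart=3 items
        System.out.println(store.get(key, 31_000L)); // prints null
    }
}
```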
Buckets
• Couchbase Server stores all of your application data either in RAM or on disk. The
data containers used in Couchbase Server are called buckets. Buckets also serve as
namespaces for documents and are used to look up a document by key. There are two
bucket types in Couchbase, which reflect the two types of data storage used in
Couchbase Server:
• Couchbase Buckets
• Memcached Buckets
• You can customize the properties of each bucket, within limits, using the Couchbase
Admin Console, the Couchbase Command Line Interface (CLI), or the Couchbase REST
Admin API. Quotas for RAM and disk space can be configured per bucket so you can
manage usage across a cluster.
• Couchbase Server is best suited for fast-changing data items of relatively small size.
For in-memory storage, using Couchbase Memcached buckets, the memcached
standard 1 megabyte limit applies to each value. Items suitable for storage include
shopping carts, user profile, user sessions, time lines, game states, pages,
conversations and product catalog. Items that are less suitable include large audio or
video media files.
• On that note, some Couchbase SDKs offer the additional feature of optionally
compressing and decompressing objects stored in Couchbase. Consider the trade-off
between CPU time and space before enabling it.
Couchbase Buckets
• Couchbase Buckets: provide data persistence and data replication. Data
stored in Couchbase Buckets is highly-available and reconfigurable without
server downtime. They can survive node failures and restore data plus allow
cluster reconfiguration while still fulfilling service requests. The main
features are:
– Supports items up to 20MB in size.
– Persistence, including data sets that are larger than the allocated memory size
for a bucket. You can configure persistence per bucket and Couchbase Server
will persist data asynchronously from RAM to disk
– Fully supports replication and server rebalancing. You can configure one or more
replica servers for a Couchbase bucket. If a node fails, a replica node can be
promoted to be the host node.
– Full range of statistics supported.
Memcached Buckets
• Memcached Buckets: provide in-memory document storage. Memcached
buckets cache frequently-used data in memory, thereby reducing the
number of queries a database server must perform in response to web
application requests. Memcached buckets can work alongside relational
database technology, not only NoSQL databases.
– Item size limited to 1 MByte.
– No persistence.
– No replication; no rebalancing.
– Statistics about Memcached Buckets are on RAM usage and client-side
operations.
Keys & Metadata
• All information that you store in Couchbase Server is stored as documents with keys. A key is
a unique identifier for a document. A value is either a JSON document or, if you choose, a byte
stream, a data type, or another form of serialized object.
• Keys are also known as document IDs and serve the same function as a SQL primary key. A key
in Couchbase Server can be any string and is unique.
• By default, all documents contain metadata that is provided by the Couchbase Server. The
metadata is stored with the document and is used to change how the document is handled.
• CAS Value: Also called CAS token or CAS ID, this value is a unique identifier associated with a
document. Couchbase Server verifies it before a document is deleted or changed, which
provides a basic form of optimistic concurrency. By checking the CAS value before changing
data, Couchbase Server prevents lost updates without having to lock records: an operation
cannot alter a document if another process has altered the document, and therefore its CAS
value, in the meantime.
• Time to Live (TTL): This is an expiration for a document, typically specified in seconds. By
default, any document created in Couchbase Server without a TTL has an indefinite life span
and remains in Couchbase Server until an explicit delete call from a client removes it.
Couchbase Server deletes values during regular maintenance once the TTL for an item has
expired.
Note: The expiration value deletes information from the entire database. It has no effect on when
the information is removed from the RAM caching layer.
• Flags: These are SDK-specific flags which are used to provide a variety of options during
storage, retrieval, update, and removal of documents. Typically flags are optional metadata used
by a Couchbase client library to perform additional processing of a document. An example of
flags is the ability to specify that a document be formatted a specific way before it is stored.
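The CAS check described above can be sketched as a small in-memory store in which every write assigns a new CAS value and a conditional update succeeds only if the caller's CAS value is still current. This illustrates optimistic concurrency in general and is not the Couchbase SDK; all names here are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory sketch of CAS-based optimistic concurrency.
final class CasStoreSketch {
    static final class Versioned {
        final String value;
        final long cas;

        Versioned(String value, long cas) {
            this.value = value;
            this.cas = cas;
        }
    }

    private final Map<String, Versioned> docs = new HashMap<>();
    private long nextCas = 1;

    // Unconditional write: stores the value and returns its new CAS value.
    synchronized long set(String key, String value) {
        long cas = nextCas++;
        docs.put(key, new Versioned(value, cas));
        return cas;
    }

    // Read the value together with the CAS value it currently has.
    synchronized Versioned get(String key) {
        return docs.get(key);
    }

    // Conditional write: succeeds only if nobody changed the document
    // since the caller read it (the CAS values still match).
    synchronized boolean replaceIfUnchanged(String key, String value, long expectedCas) {
        Versioned current = docs.get(key);
        if (current == null || current.cas != expectedCas) {
            return false;
        }
        docs.put(key, new Versioned(value, nextCas++));
        return true;
    }

    public static void main(String[] args) {
        CasStoreSketch store = new CasStoreSketch();
        store.set("user:42", "v1");
        Versioned readByA = store.get("user:42");
        Versioned readByB = store.get("user:42");
        // A writes first and wins; B's CAS value is now stale.
        System.out.println(store.replaceIfUnchanged("user:42", "from-A", readByA.cas)); // prints true
        System.out.println(store.replaceIfUnchanged("user:42", "from-B", readByB.cas)); // prints false
        System.out.println(store.get("user:42").value); // prints from-A
    }
}
```

A client whose update is rejected would normally re-read the document, reapply its change and try again, which is what keeps the approach lock-free.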
Creating First Application
Components for your development environment:
• Couchbase Server: installed on a virtual or physical machine separate from the machine
containing your web application server. Download the appropriate version for your environment
here: http://www.couchbase.com/download
• Couchbase SDK: installed for runtime on the machine containing your web application server.
You will also need to make the SDKs available in your development environment in order to
compile/interpret your client-side code. The SDKs are programming-language and platform-
specific. You will use your SDK to communicate with the Couchbase Server from your web
application. Downloads for your chosen SDK are here: http://www.couchbase.com/develop
• Couchbase Admin Console: you administer your Couchbase Server via the Couchbase
Admin Console, a web application viewable in most modern browsers. Your development
environment should therefore have Mozilla Firefox 3.6, Apple Safari 5,
Google Chrome 11, or Internet Explorer 8, or a later version of one of them. JavaScript
must be enabled in your browser.
The development languages supported by the Couchbase Client SDK Libraries are Java, .NET,
PHP, Ruby, C
Connecting A Bucket
• After you have your Couchbase Server up and running, and your
chosen Couchbase Client libraries installed on a web server, you
create the code that connects to the server from the client.
1. Make a new bucket request to the REST endpoint for buckets and
provide the new bucket settings as request parameters:
shell> curl -u Administrator:password \
  -d name=newBucket -d ramQuotaMB=100 -d authType=none \
  -d replicaNumber=1 -d proxyPort=11215 \
  http://localhost:8091/pools/default/buckets
Connecting to Couchbase Server
The following are the basic steps for creating a connection:
• Include, import, link, or require the Couchbase SDK libraries in your program
files. In Java, for example, you import the Couchbase client classes.
• Provide connection information for the Couchbase cluster. Typically this
includes a URI, a bucket ID, a password and optional parameters, and can be
provided as a list or string. To avoid failure to initially connect, you should
provide at least two URLs for two different nodes. In the following
example, we provide connection information as "http://<host>:<port>/pools".
In this case there is no password required.
• Create an instance of a Couchbase client object. In the example that
follows, we create a new client instance with the new CouchbaseClient(...)
statement.
• Perform any database operations for your applications, such as read, write,
delete, or query.
• If needed, destroy the client, and therefore disconnect.
Connecting to Couchbase Server (continued)
• The Java example below demonstrates why it is safest to provide at
least two node URIs when creating the initial connection to the
server. This way, if one node is down when your application attempts
to connect, the client automatically retries with the second node
URI:
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import com.couchbase.client.CouchbaseClient;

// Set up at least two URIs in case one server fails.
// Replace each <host> placeholder with the address of a different node.
List<URI> servers = new ArrayList<URI>();
servers.add(URI.create("http://<host>:8091/pools"));
servers.add(URI.create("http://<host>:8091/pools"));
// Create a client talking to the default bucket (empty password)
CouchbaseClient cbc = new CouchbaseClient(servers, "default", "");
// Read the value stored under the key "thisname"
System.err.println(cbc.get("thisname") + " is off developing with Couchbase!");
// Disconnect when done
cbc.shutdown();

Mais conteúdo relacionado

Mais procurados

ZK MVVM, Spring & JPA On Two PaaS Clouds
ZK MVVM, Spring & JPA On Two PaaS CloudsZK MVVM, Spring & JPA On Two PaaS Clouds
ZK MVVM, Spring & JPA On Two PaaS Clouds
Simon Massey
 
Lessons from Large-Scale Cloud Software at Databricks
Lessons from Large-Scale Cloud Software at DatabricksLessons from Large-Scale Cloud Software at Databricks
Lessons from Large-Scale Cloud Software at Databricks
Matei Zaharia
 
Developing Distributed Web Applications, Where does REST fit in?
Developing Distributed Web Applications, Where does REST fit in?Developing Distributed Web Applications, Where does REST fit in?
Developing Distributed Web Applications, Where does REST fit in?
Srinath Perera
 
Databases & Microsoft SQL Server
Databases & Microsoft SQL ServerDatabases & Microsoft SQL Server
Databases & Microsoft SQL Server
Mahmoud Abdallah
 

Mais procurados (20)

Dev Ops without the Ops
Dev Ops without the OpsDev Ops without the Ops
Dev Ops without the Ops
 
Relational databases vs Non-relational databases
Relational databases vs Non-relational databasesRelational databases vs Non-relational databases
Relational databases vs Non-relational databases
 
ZK MVVM, Spring & JPA On Two PaaS Clouds
ZK MVVM, Spring & JPA On Two PaaS CloudsZK MVVM, Spring & JPA On Two PaaS Clouds
ZK MVVM, Spring & JPA On Two PaaS Clouds
 
No sql3 rmoug
No sql3 rmougNo sql3 rmoug
No sql3 rmoug
 
Cloudfoundry architecture
Cloudfoundry architectureCloudfoundry architecture
Cloudfoundry architecture
 
Lessons from Large-Scale Cloud Software at Databricks
Lessons from Large-Scale Cloud Software at DatabricksLessons from Large-Scale Cloud Software at Databricks
Lessons from Large-Scale Cloud Software at Databricks
 
Building Highly Scalable Java Applications on Windows Azure - JavaOne S313978
Building Highly Scalable Java Applications on Windows Azure - JavaOne S313978Building Highly Scalable Java Applications on Windows Azure - JavaOne S313978
Building Highly Scalable Java Applications on Windows Azure - JavaOne S313978
 
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQLFrom Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
From Obvious to Ingenius: Incrementally Scaling Web Apps on PostgreSQL
 
Nonrelational Databases
Nonrelational DatabasesNonrelational Databases
Nonrelational Databases
 
Developing Distributed Web Applications, Where does REST fit in?
Developing Distributed Web Applications, Where does REST fit in?Developing Distributed Web Applications, Where does REST fit in?
Developing Distributed Web Applications, Where does REST fit in?
 
Azure and cloud design patterns
Azure and cloud design patternsAzure and cloud design patterns
Azure and cloud design patterns
 
Writing simple web services in java using eclipse editor
Writing simple web services in java using eclipse editorWriting simple web services in java using eclipse editor
Writing simple web services in java using eclipse editor
 
7 Stages of Scaling Web Applications
7 Stages of Scaling Web Applications7 Stages of Scaling Web Applications
7 Stages of Scaling Web Applications
 
CloudConnect 2011 - Building Highly Scalable Java Applications on Windows Azure
CloudConnect 2011 - Building Highly Scalable Java Applications on Windows AzureCloudConnect 2011 - Building Highly Scalable Java Applications on Windows Azure
CloudConnect 2011 - Building Highly Scalable Java Applications on Windows Azure
 
NoSQL Now! NoSQL Architecture Patterns
NoSQL Now! NoSQL Architecture PatternsNoSQL Now! NoSQL Architecture Patterns
NoSQL Now! NoSQL Architecture Patterns
 
SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?SQL/NoSQL How to choose ?
SQL/NoSQL How to choose ?
 
Databases & Microsoft SQL Server
Databases & Microsoft SQL ServerDatabases & Microsoft SQL Server
Databases & Microsoft SQL Server
 
Sql vs NO-SQL database differences explained
Sql vs NO-SQL database differences explainedSql vs NO-SQL database differences explained
Sql vs NO-SQL database differences explained
 
Why NBC Universal Migrated to MongoDB Atlas
Why NBC Universal Migrated to MongoDB AtlasWhy NBC Universal Migrated to MongoDB Atlas
Why NBC Universal Migrated to MongoDB Atlas
 
Azure in Developer Perspective
Azure in Developer PerspectiveAzure in Developer Perspective
Azure in Developer Perspective
 

Semelhante a CouchBase The Complete NoSql Solution for Big Data

Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
Mohit Tare
 
TCS_DATA_ANALYSIS_REPORT_ADITYA
TCS_DATA_ANALYSIS_REPORT_ADITYATCS_DATA_ANALYSIS_REPORT_ADITYA
TCS_DATA_ANALYSIS_REPORT_ADITYA
Aditya Srinivasan
 
Nosql-Module 1 PPT.pptx
Nosql-Module 1 PPT.pptxNosql-Module 1 PPT.pptx
Nosql-Module 1 PPT.pptx
Radhika R
 

Semelhante a CouchBase The Complete NoSql Solution for Big Data (20)

Fundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and HadoopFundamentals of big data analytics and Hadoop
Fundamentals of big data analytics and Hadoop
 
NOSQL
NOSQLNOSQL
NOSQL
 
Module-2_HADOOP.pptx
Module-2_HADOOP.pptxModule-2_HADOOP.pptx
Module-2_HADOOP.pptx
 
BIg Data Analytics-Module-2 vtu engineering.pptx
BIg Data Analytics-Module-2 vtu engineering.pptxBIg Data Analytics-Module-2 vtu engineering.pptx
BIg Data Analytics-Module-2 vtu engineering.pptx
 
Big data and hadoop
Big data and hadoopBig data and hadoop
Big data and hadoop
 
TCS_DATA_ANALYSIS_REPORT_ADITYA
TCS_DATA_ANALYSIS_REPORT_ADITYATCS_DATA_ANALYSIS_REPORT_ADITYA
TCS_DATA_ANALYSIS_REPORT_ADITYA
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
Big Data
Big DataBig Data
Big Data
 
No Sql On Social And Sematic Web
No Sql On Social And Sematic WebNo Sql On Social And Sematic Web
No Sql On Social And Sematic Web
 
NoSQL On Social And Sematic Web
NoSQL On Social And Sematic WebNoSQL On Social And Sematic Web
NoSQL On Social And Sematic Web
 
Nosql-Module 1 PPT.pptx
Nosql-Module 1 PPT.pptxNosql-Module 1 PPT.pptx
Nosql-Module 1 PPT.pptx
 
Big Data technology Landscape
Big Data technology LandscapeBig Data technology Landscape
Big Data technology Landscape
 
Big data and hadoop overvew
Big data and hadoop overvewBig data and hadoop overvew
Big data and hadoop overvew
 
Big Data_Architecture.pptx
Big Data_Architecture.pptxBig Data_Architecture.pptx
Big Data_Architecture.pptx
 
Big data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edgeBig data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edge
 
Hadoop - Architectural road map for Hadoop Ecosystem
Hadoop -  Architectural road map for Hadoop EcosystemHadoop -  Architectural road map for Hadoop Ecosystem
Hadoop - Architectural road map for Hadoop Ecosystem
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
 
DSM - Comparison of Hbase and Cassandra
DSM - Comparison of Hbase and CassandraDSM - Comparison of Hbase and Cassandra
DSM - Comparison of Hbase and Cassandra
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Último (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

CouchBase The Complete NoSql Solution for Big Data

  • 1. CouchBase The Complete NoSql Solution for Big Data - Debajani Mohanty
  • 2. CAP Theorum Before we get into big data and the role of NOSQL, we must first understand the CAP theorem. In theoretical computer science, the CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees 1. Consistency (all nodes see the same data at the same time) 2. Availability (a guarantee that every request receives a response about whether it succeeded or failed) 3. Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system) Although all three are impossible to achieve, any two can be achieved by the systems. That means in order to get high availability and partition tolerance, you need to sacrifice consistency
  • 3.
  • 4. The 5 Vs of Big Data • Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. • Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, and information privacy. • We currently only see the beginnings of a transformation into a big data economy. Any business that doesn’t seriously consider the implications of Big Data runs the risk of being left behind. • To get a better understanding of what Big Data is, it is often described using 5 Vs: Volume Velocity Variety Veracity Value
  • 5. Volume Volume refers to the vast amounts of data generated every second. We are not talking terabytes but zettabytes or brontobytes. If we take all the data generated in the world between the beginning of time and 2008, the same amount of data will soon be generated every minute. This makes most data sets too large to store and analyse using traditional database technology. New big data tools use distributed systems so that we can store and analyse data across databases that are dotted around anywhere in the world.
  • 6. Variety Variety refers to the different types of data we can now use. In the past we only focused on structured data that neatly fitted into tables or relational databases, such as financial data. In fact, 80% of the world’s data is unstructured (text, images, video, voice, etc.). With big data technology we can now analyse and bring together data of different types such as messages, social media conversations, photos, sensor data, video or voice recordings.
  • 7. Velocity Velocity refers to the speed at which new data is generated and the speed at which data moves around. Just think of social media messages going viral in seconds. Technology now allows us to analyze the data while it is being generated (sometimes referred to as in-memory analytics), without ever putting it into databases.
  • 8. Veracity & Value Veracity refers to the truthfulness and correctness of the data. Value: having access to big data is no good unless we can turn it into value. Companies are starting to generate amazing value from their big data.
  • 9. Big Data and Human Brain To understand how a big data solution could be architected, let’s try to understand how the human brain is architected. So the key is parallel processing. Hurray!
  • 10. Hadoop & MapReduce • In 2004, Google published a paper on a process called MapReduce that used such an architecture. • The MapReduce framework provides a parallel processing model and associated implementation to process huge amounts of data. With MapReduce, queries are split and distributed across parallel nodes and processed in parallel (the Map step). The results are then gathered and delivered (the Reduce step). The framework was very successful, so others wanted to replicate the algorithm. Therefore, an implementation of the MapReduce framework was adopted by an Apache open source project named Hadoop. • But Hadoop is only for processing the data. How can we store such a huge volume of data?
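The Map and Reduce steps described above can be illustrated with a small word-count sketch. This is plain Java, not the Hadoop API: the class name `WordCountSketch` and its methods are hypothetical, and `parallelStream()` merely stands in for distributing input splits across nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class WordCountSketch {

    // Map step: one input split (here, one line) is processed on its own
    // and emits a (word, 1) pair for every word it contains.
    public static List<Map.Entry<String, Integer>> mapSplit(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Reduce step: gather the pairs from all splits and sum the counts per word.
    public static Map<String, Integer> reduce(List<List<Map.Entry<String, Integer>>> mapped) {
        Map<String, Integer> totals = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> pairs : mapped) {
            for (Map.Entry<String, Integer> pair : pairs) {
                totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    public static Map<String, Integer> wordCount(List<String> lines) {
        // Assumption: a parallel stream plays the role of the parallel nodes.
        List<List<Map.Entry<String, Integer>>> mapped = lines.parallelStream()
                .map(WordCountSketch::mapSplit)
                .collect(Collectors.toList());
        return reduce(mapped);
    }

    public static void main(String[] args) {
        // Prints {big=2, data=2, store=1}
        System.out.println(wordCount(List.of("big data big", "data store")));
    }
}
```

Because each split is mapped independently, the Map step can run on as many workers as there are splits; only the Reduce step needs to see all intermediate pairs.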
  • 11. NoSQL Database • A NoSQL (often interpreted as Not only SQL) database, often used in big data-centric real-time web applications, provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. • Motivations for this approach include simplicity of design, horizontal scaling, and finer control over availability. The data structures used by NoSQL databases (e.g. key-value, graph, or document) differ from those used in relational databases, making some operations faster in NoSQL and others faster in relational databases. • The particular suitability of a given NoSQL database depends on the problem it must solve.
  • 12. Types of NoSQL databases • There have been various approaches to classify NoSQL databases, each with different categories and subcategories. Because of the variety of approaches and overlaps it is difficult to get and maintain an overview of non-relational databases. Nevertheless, a basic classification is based on data model. A few examples in each category are: • Column: Accumulo, Cassandra, Druid, HBase, Vertica • Document: Lotus Notes, Clusterpoint, Apache CouchDB, Couchbase, MarkLogic, MongoDB, OrientDB, Qizx • Key-value: CouchDB, Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE, Aerospike, OrientDB, MUMPS • Graph: Allegro, Neo4J, InfiniteGraph, OrientDB, Virtuoso, Stardog • Multi-model: OrientDB, FoundationDB, ArangoDB, Alchemy Database, CortexDB
  • 13. Graph Database • This kind of database is designed for data whose relations are well represented as a graph (elements interconnected with an undetermined number of relations between them). The kind of data could be social relations, public transport links, road maps or network topologies.
  • 14. Key-value stores • In this model, data is represented as a collection of key-value pairs, such that each possible key appears at most once in the collection. The key-value model is one of the simplest non-trivial data models, and richer data models are often implemented on top of it.
  • 15. Document-oriented databases • The central concept of a document store is the notion of a "document". While each document-oriented database implementation differs on the details of this definition, in general, they all assume that documents encapsulate and encode data (or information) in some standard formats or encodings. Encodings in use include XML, JSON as well as binary forms like BSON. • The most widely used NoSQL solutions are MongoDB and Couchbase, and both of them are document-oriented databases. • Here is a sample document: { "_id": "5897g42s0245afo4o473ai1e7", "firstname": "John", "lastname": "Doe", "age": 26, "sex": "M", "interests": [ "Reading", "Running", "Hacking" ] }
  • 19. Scalability • In Couchbase, you can easily add servers to form a cluster and obtain a distributed system, and Couchbase is flexible enough to do this without downtime. Indeed, it relies on the power of the Erlang language, a functional and fault-tolerant language that supports hot code swapping. • For MongoDB, the configuration is a bit more complicated. For example, once you have defined the shard key (the key used to distribute documents within a sharded cluster), it becomes difficult to change it afterwards. The system is not as flexible, so you have to think carefully about your data modeling before you move your application into production. • Scalability is why Couchbase is widely used in social gaming, where millions of players can play and their numbers can increase exponentially overnight.
  • 20. Monitoring tool Couchbase comes with a turnkey package, while MongoDB requires an additional subscription to a monitoring service. You can monitor MongoDB using the command line, but a monitoring tool without a graphical interface is relatively restrictive.
  • 21. Introducing Couchbase • Couchbase provides the world’s most complete, most scalable and best performing NoSQL database. • It is based on a shared-nothing architecture with a single node type, a built-in caching layer and true auto-sharding, and it includes the world’s first NoSQL mobile offering: Couchbase Mobile, a complete NoSQL mobile solution comprising Couchbase Server, Couchbase Sync Gateway and Couchbase Lite. • Clients: AT&T, Amadeus, Bally’s, Beats Music, Cisco, Comcast, Concur, Disney, eBay / PayPal, Neiman Marcus, Orbitz, Rakuten / Viber, Sky, Tencent, Tesco, Verizon and Willis Group, as well as hundreds of other household names worldwide.
  • 22. Real life Use Cases Couchbase Server’s distinguishing combination is 1) linear, horizontal scalability, 2) sustained low latency and high throughput, and 3) the extensibility of the system. A few use cases: • Session store: User sessions are easily stored and managed in Couchbase, for instance, by using the document ID naming scheme “user:USERID”. With Couchbase Server, you can flag items for deletion after a certain amount of time, so you have the option of having Couchbase automatically delete old sessions. • Social gaming: You can model and store game state, property state, time lines, conversations and chats with Couchbase Server. The asynchronous persistence algorithms of Couchbase were designed, built and deployed to support some of the highest scale social games. • Ad, offer, and content targeting: The same attributes which serve Couchbase in the gaming context also apply well to real-time ad and content targeting. For example, Couchbase provides a fast storage capability for counters. Counters are useful for tracking visits, associating users with various targeting profiles, tracking ad offers, and tracking ad inventory.
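The session-store use case can be sketched as a tiny in-memory simulation: keys follow the "user:USERID" naming scheme and each item carries an expiry time. This is not the Couchbase SDK; the class `SessionStoreSketch`, its methods, and the explicit `nowMillis` clock parameter are assumptions made so the behaviour is easy to follow and test.

```java
import java.util.HashMap;
import java.util.Map;

public class SessionStoreSketch {

    // One stored item: the value plus the absolute time at which it expires.
    private static final class Item {
        final String value;
        final long expiresAtMillis;

        Item(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Item> items = new HashMap<>();

    // Document ID naming scheme from the text: "user:USERID".
    public static String sessionKey(String userId) {
        return "user:" + userId;
    }

    // Store a value that should disappear ttlSeconds after nowMillis.
    public void put(String key, String value, int ttlSeconds, long nowMillis) {
        items.put(key, new Item(value, nowMillis + ttlSeconds * 1000L));
    }

    // Returns the value, or null if the key is missing or its TTL has passed.
    public String get(String key, long nowMillis) {
        Item item = items.get(key);
        if (item == null) {
            return null;
        }
        if (nowMillis >= item.expiresAtMillis) {
            items.remove(key); // lazily drop the expired session
            return null;
        }
        return item.value;
    }

    public static void main(String[] args) {
        SessionStoreSketch store = new SessionStoreSketch();
        String key = sessionKey("42");
        store.put(key, "cart=3", 30, 0L);
        System.out.println(store.get(key, 10_000L)); // still within the 30 s TTL
        System.out.println(store.get(key, 31_000L)); // TTL has passed
    }
}
```

A real server expires items on its own schedule; the sketch only shows why a TTL lets old sessions vanish without an explicit delete call.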
  • 23. Buckets • Couchbase Server stores all of your application data either in RAM or on disk. The data containers used in Couchbase Server are called buckets. Buckets also serve as namespaces for documents and are used to look up a document by key. There are two bucket types in Couchbase, which reflect the two types of data storage used in Couchbase Server: • Couchbase Buckets • Memcached Buckets • You can customize the properties of each bucket, within limits, using the Couchbase Admin Console, the Couchbase Command Line Interface (CLI), or the Couchbase REST Admin API. Quotas for RAM and disk space can be configured per bucket so you can manage usage across a cluster. • Couchbase Server is best suited for fast-changing data items of relatively small size. For in-memory storage using Memcached buckets, the memcached standard 1 megabyte limit applies to each value. Items suitable for storage include shopping carts, user profiles, user sessions, time lines, game states, pages, conversations and product catalogs. Items that are less suitable include large audio or video media files. • On that note, some Couchbase SDKs offer the additional feature of optionally compressing/decompressing objects stored in Couchbase. The CPU-time versus space trade-off should be considered here.
  • 24. Couchbase Buckets • Couchbase Buckets provide data persistence and data replication. Data stored in Couchbase Buckets is highly available and reconfigurable without server downtime. They can survive node failures and restore data, and they allow cluster reconfiguration while still fulfilling service requests. The main features are: – Supports items up to 20 MB in size. – Persistence, including data sets that are larger than the allocated memory size for a bucket. You can configure persistence per bucket, and Couchbase Server will persist data asynchronously from RAM to disk. – Fully supports replication and server rebalancing. You can configure one or more replica servers for a Couchbase bucket. If a node fails, a replica node can be promoted to be the host node. – Full range of statistics supported.
  • 25. Memcached Buckets • Memcached Buckets provide in-memory document storage. Memcached buckets cache frequently used data in memory, thereby reducing the number of queries a database server must perform in response to web application requests. Memcached buckets can work alongside relational database technology, not only NoSQL databases. – Item size limited to 1 MB. – No persistence. – No replication; no rebalancing. – Statistics about Memcached Buckets cover RAM usage and client-side operations.
  • 26. Keys & Metadata • Everything you store in Couchbase Server is a document with a key and a value. The key is the unique identifier for the document; the value is either a JSON document or, if you choose, a byte stream, a data type, or another form of serialized object. • Keys are also known as document IDs and serve the same function as a SQL primary key. A key in Couchbase Server can be any string and is unique. • By default, all documents contain metadata that is provided by Couchbase Server. The metadata is stored with the document and is used to change how the document is handled. • CAS Value: Also called CAS token or CAS ID, this value is a unique identifier associated with a document that Couchbase Server verifies before the document is deleted or changed, providing a basic form of optimistic concurrency. By checking the CAS value before changing data, Couchbase Server effectively prevents data loss without having to lock records: an operation cannot alter a document if another process has altered the document, and therefore its CAS value, in the meantime. • Time to Live (TTL): This is an expiration for a document, typically specified in seconds. By default, any document created in Couchbase Server without a TTL has an indefinite life span and remains in Couchbase Server until a client removes it with an explicit delete call. Couchbase Server deletes values during regular maintenance if the TTL for an item has expired. Note: The expiration value deletes information from the entire database. It has no effect on when the information is removed from the RAM caching layer. • Flags: These are SDK-specific flags which are used to provide a variety of options during storage, retrieval, update, and removal of documents. Typically, flags are optional metadata used by a Couchbase client library to perform additional processing of a document. One example of a flag is the ability to specify that a document be formatted a specific way before it is stored.
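The CAS check described above can be sketched as a small in-memory simulation of optimistic concurrency. It is not the Couchbase SDK: the class `CasSketch`, the nested `Versioned` type and the method `replaceIfUnchanged` are hypothetical names, and a simple counter stands in for the server-generated CAS value.

```java
import java.util.HashMap;
import java.util.Map;

public class CasSketch {

    // A stored value together with the CAS value it was written under.
    public static final class Versioned {
        public final String value;
        public final long cas;

        Versioned(String value, long cas) {
            this.value = value;
            this.cas = cas;
        }
    }

    private final Map<String, Versioned> docs = new HashMap<>();
    private long nextCas = 1; // assumption: a counter stands in for the server's CAS generator

    // Unconditional write; every write gives the document a new CAS value.
    public synchronized void set(String key, String value) {
        docs.put(key, new Versioned(value, nextCas++));
    }

    public synchronized Versioned get(String key) {
        return docs.get(key);
    }

    // Conditional write: succeeds only if nobody changed the document
    // since the caller read it, i.e. the CAS value still matches.
    public synchronized boolean replaceIfUnchanged(String key, String value, long expectedCas) {
        Versioned current = docs.get(key);
        if (current == null || current.cas != expectedCas) {
            return false;
        }
        docs.put(key, new Versioned(value, nextCas++));
        return true;
    }

    public static void main(String[] args) {
        CasSketch store = new CasSketch();
        store.set("user:1", "score=10");

        // Two clients read the same version of the document.
        long casSeenByA = store.get("user:1").cas;
        long casSeenByB = store.get("user:1").cas;

        System.out.println(store.replaceIfUnchanged("user:1", "score=11", casSeenByA)); // true
        System.out.println(store.replaceIfUnchanged("user:1", "score=12", casSeenByB)); // false: stale CAS

        // Client B re-reads to get the current CAS value and retries.
        long fresh = store.get("user:1").cas;
        System.out.println(store.replaceIfUnchanged("user:1", "score=12", fresh)); // true
    }
}
```

The second write is rejected rather than silently overwriting client A's change, which is the data-loss protection the slide refers to; no record is ever locked.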
  • 27. Creating First Application Components for your development environment: • Couchbase Server: installed on a virtual or physical machine separate from the machine containing your web application server. Download the appropriate version for your environment here: http://www.couchbase.com/download • Couchbase SDK: installed for runtime on the machine containing your web application server. You will also need to make the SDKs available in your development environment in order to compile/interpret your client-side code. The SDKs are programming-language and platform-specific. You will use your SDK to communicate with the Couchbase Server from your web application. Downloads for your chosen SDK are here: http://www.couchbase.com/develop • Couchbase Admin Console: administering your Couchbase Server is done via the Couchbase Admin Console, a web application viewable in most modern browsers. Your development environment should therefore have Mozilla Firefox 3.6, Apple Safari 5, Google Chrome 11, or Internet Explorer 8, or a higher version of one of these. JavaScript must be enabled in your browser preferences. The development languages supported by the Couchbase Client SDK Libraries are Java, .NET, PHP, Ruby and C.
  • 28. Connecting A Bucket • After you have your Couchbase Server up and running, and your chosen Couchbase Client libraries installed on a web server, you create the code that connects to the server from the client. 1. Make a new bucket request to the REST endpoint for buckets and provide the new bucket settings as request parameters:
      shell> curl -u Administrator:password -d name=newBucket -d ramQuotaMB=100 -d authType=none -d replicaNumber=1 -d proxyPort=11215 http://localhost:8091/pools/default/buckets
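For readers who prefer to issue the same request from code, here is a minimal sketch using the JDK's `java.net.http` API (Java 11 or later). The endpoint, credentials and parameter names are copied from the curl command above and may differ between server versions, so check the REST API reference for your release; the class name `CreateBucketRequestSketch` and its helper methods are hypothetical. The sketch builds the request without sending it.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class CreateBucketRequestSketch {

    // Bucket settings copied from the curl example; insertion order is kept.
    public static Map<String, String> bucketParams() {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("name", "newBucket");
        params.put("ramQuotaMB", "100");
        params.put("authType", "none");
        params.put("replicaNumber", "1");
        params.put("proxyPort", "11215");
        return params;
    }

    // Encode the settings as application/x-www-form-urlencoded, like curl -d.
    public static String formBody(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Build (but do not send) the POST request with HTTP Basic credentials, like curl -u.
    public static HttpRequest buildRequest(String user, String password) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create("http://localhost:8091/pools/default/buckets"))
                .header("Authorization", "Basic " + credentials)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(bucketParams())))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRequest("Administrator", "password");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(formBody(bucketParams()));
        // To send it against a running server:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```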
  • 29. Connecting to Couchbase Server The following shows the basic steps for creating a connection: • Include, import, link, or require Couchbase SDK libraries into your program files. In the example that follows, we require 'couchbase'. • Provide connection information for the Couchbase cluster. Typically this includes a URI, a bucket ID, a password and optional parameters, and can be provided as a list or a string. To avoid failing on the initial connection, you should provide and try at least two URLs for two different nodes. In the following example, we provide connection information as "http://<host>:<port>/pools". In this case there is no password required. • Create an instance of a Couchbase client object. In the example that follows, we create a new client instance in the client = Couchbase.connect statement. • Perform any database operations for your application, such as read, write, delete, or query. • If needed, destroy the client, and therefore disconnect.
  • 30. Connecting to Couchbase Server.. • The Java example below demonstrates why it is safest to provide at least two node URIs when creating the initial connection to the server. This way, if your application attempts to connect but one node is down, the client automatically retries with the second node URI:
      import java.net.URI;
      import java.util.ArrayList;
      import java.util.List;
      import com.couchbase.client.CouchbaseClient;

      // Set up at least two URIs in case one server fails
      // (replace each <host> placeholder with the address of a different node)
      List<URI> servers = new ArrayList<URI>();
      servers.add(URI.create("http://<host>:8091/pools"));
      servers.add(URI.create("http://<host>:8091/pools"));
      // Create a client talking to the default bucket (empty password)
      CouchbaseClient cbc = new CouchbaseClient(servers, "default", "");
      System.err.println(cbc.get("thisname") + " is off developing with Couchbase!");