Couchbase and Apache Spark
Efficient data crunching in a fast-moving world
©2015 Couchbase Inc. 2
Matt Ingenthron
Worked on large-site scalability problems at a previous company…
memcached contributor
Joined Couchbase very early and helped define key parts of the system
A Quick Architectural Introduction to Couchbase
Couchbase is a Document-Oriented Database
High-availability cache, key-value store, document database, embedded database, sync management
Couchbase can be used in a number of ways. Developers often need a simple distributed hash table at first, then grow to need secondary indexing; they are also either mobile-first or need to address mobile deployment.
What makes Couchbase unique?
Enterprises choose Couchbase for several key advantages:
• Performance & scalability leader: sub-millisecond latency with high throughput; memory-centric architecture
• Multi-purpose: cache, key-value store, document database, and local/mobile database in a single platform
• Simplified administration: easy to deploy and manage; integrated Admin Console, single-click cluster expansion and rebalance
• Always-on availability (24x365): data replication across nodes, clusters, and data centers
Multi-purpose database supports many uses
• Tunable built-in cache
– Consolidated cache and database
– Tune the memory required based on application requirements
• Flexible schemas with JSON
– Represent data with varying schemas using JSON on the server or on the device
– Index and query data with JavaScript views
• Couchbase Lite
– Lightweight embedded DB for always-available apps
– Sync Gateway syncs data seamlessly with Couchbase Server
Couchbase leads in performance and scalability
• Auto-sharding
– No manual sharding
– The database, not the user, manages data movement to scale out
• Memory-to-memory XDCR
– Market’s only memory-to-memory database replication across clusters and geos
– Provides disaster recovery and data locality
• Single node type
– Hugely simplifies management of clusters
– Easy to scale clusters by adding any number of nodes
Couchbase delivers always-on availability (24x365)
• High availability
– In-memory replication with manual or automatic failover
– Rack-zone awareness to minimize data unavailability
• Disaster recovery
– Memory-to-memory cross-cluster replication across data centers or geos
– Active-active topology with bidirectional setup
• Backup & restore
– Full or incremental backup with online restore
– Delta node catch-ups for faster recovery after failures
Simplified administration for exceptional ease of use
• Online upgrades and operations
– Online software, hardware and DB upgrades
– Indexing, compaction, rebalance, backup & restore
• Built-in enterprise-class admin console
– Perform all administrative tasks with the click of a button
– Monitor system status visually at the cluster, database and server level
• RESTful APIs
– All admin operations available via the UI, REST APIs or CLI commands
– Integrate third-party monitoring tools easily using REST
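The REST bullet above can be illustrated with a small monitoring sketch. It assumes the cluster manager exposes `GET /pools/default` on port 8091 behind HTTP basic auth, and that each entry in the returned `nodes` array carries a `status` field; the helper names are hypothetical, and field names should be checked against the REST API reference for your server version.

```python
import base64
import json
import urllib.request
from collections import Counter


def cluster_info_request(host: str, username: str, password: str, port: int = 8091) -> urllib.request.Request:
    """Build a GET request for the cluster overview endpoint (assumed: /pools/default)."""
    url = f"http://{host}:{port}/pools/default"
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch_cluster_info(host: str, username: str, password: str) -> dict:
    """Call the endpoint and decode the JSON body (needs a reachable cluster)."""
    req = cluster_info_request(host, username, password)
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)


def node_status_summary(pools_default: dict) -> Counter:
    """Count nodes by their reported status, e.g. Counter({'healthy': 3})."""
    # Assumption: each entry in "nodes" has a "status" field.
    return Counter(node.get("status", "unknown") for node in pools_default.get("nodes", []))
```

A third-party monitoring tool could poll `fetch_cluster_info` on a timer and raise an alert whenever `node_status_summary` reports anything other than healthy nodes.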
Couchbase Server Architecture
Single-node type means easier administration and scaling
• Single installation
• Two major components/processes: data manager and cluster manager
• Data manager:
– C/C++
– Layer consolidation of caching and persistence
• Cluster manager:
– Erlang/OTP
– Administration UIs
– Out of band for data requests (not in the data path)
Write Operation
[Diagram: the application server writes DOC 1 to the managed cache; from there it is placed on the disk queue (to disk) and the replication queue (to other nodes)]
• Writes are async by default
• The application gets an acknowledgement once the write is in RAM, and can choose per write to also wait for replication or persistence
• Replication to 1, 2 or 3 other nodes
• Replication is RAM-based, so extremely fast
• Off-node replication is the primary level of HA
• Disk is written to as fast as possible, with no waiting
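A toy model of the write path described above: the write is acknowledged as soon as it is in RAM, while the disk and replication queues are drained separately. Everything here (`Node`, `wait_for_persist`, draining the queues synchronously on demand) is a hypothetical simplification for illustration, not how Couchbase Server or its SDKs are implemented.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical data node: RAM cache, simulated disk, and two outbound queues."""
    name: str
    cache: dict = field(default_factory=dict)
    disk: dict = field(default_factory=dict)
    disk_queue: deque = field(default_factory=deque)
    replication_queue: deque = field(default_factory=deque)

    def write(self, key, value, wait_for_persist=False):
        # 1. The mutation lands in RAM; by default the client is acknowledged now.
        self.cache[key] = value
        # 2. Persistence and replication are only queued, not performed inline.
        self.disk_queue.append(key)
        self.replication_queue.append(key)
        # 3. Optional per-write trade-off: wait until the disk queue has drained.
        if wait_for_persist:
            self.flush_disk_queue()
        return "ack"

    def flush_disk_queue(self):
        # In a real server this runs continuously in the background.
        while self.disk_queue:
            key = self.disk_queue.popleft()
            self.disk[key] = self.cache[key]

    def drain_replication(self, replicas):
        # RAM-to-RAM copy of every queued key to each replica node (1 to 3 of them).
        while self.replication_queue:
            key = self.replication_queue.popleft()
            for replica in replicas:
                replica.cache[key] = self.cache[key]
```

After `write("k", 1)` the key is in `cache` but not yet in `disk`; passing `wait_for_persist=True` trades latency for the guarantee that the queued keys have been persisted before the call returns.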
Basic Operation
[Diagram: Couchbase Servers 1, 2 and 3, each holding three active and three replica shards (shards 1 to 9)]
The application has a single logical connection to the cluster (the client object).
• Data is automatically sharded, resulting in even distribution of document data across the cluster
• Each vBucket is replicated 1, 2 or 3 times (“peer-to-peer” replication)
• Docs are automatically hashed by the client to a shard
• The cluster map tells the client which server a shard is on
• Every read/write/update/delete for a given key goes to the same node
• Strongly consistent data access (“read your own writes”)
• A single Couchbase node can achieve hundreds of thousands of ops/sec, so there is no need to scale reads
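The client-side routing above (hash the key to a vBucket, then look that vBucket up in the cluster map) can be sketched as follows. The CRC32-based hash is modeled on the scheme Couchbase clients use; treat the exact bit manipulation, the 1024-vBucket count and the round-robin map layout as assumptions of this sketch.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed vBucket count; must be a power of two for the mask below


def vbucket_for_key(key: str, num_vbuckets: int = NUM_VBUCKETS) -> int:
    """Map a document key to a vBucket id (CRC32-based, modeled on Couchbase clients)."""
    crc = zlib.crc32(key.encode("utf-8"))
    return ((crc >> 16) & 0x7FFF) & (num_vbuckets - 1)


def build_cluster_map(servers: list, num_vbuckets: int = NUM_VBUCKETS, replicas: int = 1) -> list:
    """Hypothetical round-robin cluster map: entry i is [active, replica1, ...] as server indexes."""
    n = len(servers)
    if not 0 <= replicas < n:
        raise ValueError("need more servers than replicas so copies land on different nodes")
    return [[(vb + r) % n for r in range(replicas + 1)] for vb in range(num_vbuckets)]


def locate(key: str, servers: list, cluster_map: list) -> tuple:
    """Return (vbucket, server) holding the active copy of `key`."""
    vb = vbucket_for_key(key, len(cluster_map))
    return vb, servers[cluster_map[vb][0]]
```

Because the hash depends only on the key, every client computes the same vBucket for it; only the cluster map changes when servers are added or fail, which is why the client library just needs a fresh map to keep routing correctly.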
Cache Ejection
[Diagram: application server, managed cache, disk queue, replication queue and disk, holding documents DOC 1 to DOC 5]
• Layer consolidation means a read-through and write-through cache
• Couchbase automatically removes already-persisted data from RAM
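A sketch of the ejection rule above: only items that have already been persisted (clean) are candidates for removal from RAM, least recently used first. The `ManagedCache` class and its LRU policy are illustrative assumptions, not Couchbase's actual ejection algorithm.

```python
from collections import OrderedDict


class ManagedCache:
    """Hypothetical managed cache that only ejects items already persisted to disk."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = OrderedDict()  # key -> value, least recently used first
        self.dirty = set()          # keys in RAM that are not yet on disk
        self.disk = {}              # stands in for the persistence layer

    def set(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)  # mark as most recently used
        self.dirty.add(key)
        self._eject_if_needed()

    def persist_all(self):
        # Drain the disk queue; the persisted items become eligible for ejection.
        for key in self.dirty:
            self.disk[key] = self.items[key]
        self.dirty.clear()
        self._eject_if_needed()

    def _eject_if_needed(self):
        # Walk from least to most recently used and drop clean items only;
        # dirty items exist only in RAM, so ejecting them would lose data.
        for key in list(self.items):
            if len(self.items) <= self.capacity:
                break
            if key not in self.dirty:
                del self.items[key]
```

In this model the cache can temporarily exceed its capacity while everything in it is dirty; it shrinks back as soon as persistence catches up.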
Cache Miss
[Diagram: the application server issues GET DOC 1; the managed cache holds DOC 2 to DOC 5, so DOC 1 is read from disk back into the cache]
• Layer consolidation means a single interface for the app to talk to, getting its data back as fast as possible
• Separation of cache and disk allows the fastest access out of RAM while data is pulled from disk in parallel
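The read-through behavior on a cache miss can be sketched as: serve from RAM when possible, otherwise read the document from disk and put it back into the cache. `ReadThroughCache` and the dict standing in for disk are hypothetical, and the sketch is synchronous, so it does not model the parallel disk fetch mentioned on the slide.

```python
class ReadThroughCache:
    """Hypothetical read-through cache in front of a slower store (a dict stands in for disk)."""

    def __init__(self, disk: dict):
        self.ram = {}
        self.disk = disk
        self.misses = 0

    def get(self, key):
        if key in self.ram:
            return self.ram[key]   # hit: served straight from RAM
        self.misses += 1           # miss: fall through to disk
        value = self.disk[key]     # raises KeyError if the document does not exist
        self.ram[key] = value      # repopulate RAM so the next read is a hit
        return value
```

The application only ever calls `get`; whether the value came from RAM or disk is invisible to it, which is the "single interface" point above.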
Add Nodes to Cluster
[Diagram: Couchbase Servers 4 and 5 join Servers 1 to 3; active and replica shards are redistributed across all five servers while the application keeps issuing reads, writes and updates]
The application has a single logical connection to the cluster (the client object).
• Multiple nodes can be added or removed at once
• One-click operation
• Incremental movement of active and replica vBuckets and data
• Client library updated via the cluster map
• Fully online operation, with no downtime or loss of performance
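One way to picture the incremental vBucket movement of a rebalance is a planner that moves as few active vBuckets as possible so that every server ends up within one vBucket of the others. `plan_rebalance` is a hypothetical greedy sketch that handles active copies only; it is not Couchbase's rebalance algorithm.

```python
from collections import Counter


def plan_rebalance(active_map: list, servers: list):
    """Hypothetical greedy plan for moving active vBuckets onto a new server list.

    active_map[vb] is the server currently holding vBucket vb's active copy.
    Returns (new_map, moves) where each move is (vb, source, destination).
    """
    counts = Counter(active_map)
    base, extra = divmod(len(active_map), len(servers))
    # Give the "+1" slots to the servers that already hold the most vBuckets,
    # so no data is moved just to shuffle the remainder around.
    by_load = sorted(servers, key=lambda s: -counts[s])
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(by_load)}

    new_map = list(active_map)
    moves = []
    for vb, owner in enumerate(active_map):
        # Servers being removed have a target of 0, so all their vBuckets move.
        if counts[owner] > target.get(owner, 0):
            dest = next(s for s in servers if counts[s] < target[s])
            counts[owner] -= 1
            counts[dest] += 1
            new_map[vb] = dest
            moves.append((vb, owner, dest))
    return new_map, moves
```

Going from three to five servers with 1024 vBuckets, this plan moves 409 vBuckets and leaves the rest in place. The slide's version also moves replica vBuckets and pushes each change to clients through the cluster map; the sketch leaves both out.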
Node Unresponsive / Lost
Fail Over Node
[Diagram: Couchbase Servers 1 to 5 with their active and replica shards, shown as a failed node is removed from the cluster]
The application has a single logical connection to the cluster (the client object).
• When a node goes down, some requests will fail
• Failover is either automatic or manual
• The client library is automatically updated via the cluster map
• Replicas are not recreated, to preserve stability
• Best practice is to replace the node and rebalance
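The failover behavior above, where replicas on surviving nodes take over and lost replicas are not recreated until a rebalance, can be sketched on a simplified cluster map. Each entry is assumed to be an ordered list `[active, replica1, ...]`; `fail_over` is a hypothetical helper, not a Couchbase API.

```python
def fail_over(cluster_map: list, failed: str):
    """Hypothetical failover on a simplified cluster map.

    cluster_map[vb] is an ordered list [active, replica1, replica2, ...].
    Returns (new_map, lost) where lost lists vBuckets with no surviving copy.
    """
    new_map, lost = [], []
    for vb, chain in enumerate(cluster_map):
        # Keep order: if the active copy was on the failed node,
        # the first surviving replica moves to index 0, i.e. is promoted to active.
        survivors = [server for server in chain if server != failed]
        if not survivors:
            lost.append(vb)
        new_map.append(survivors)
    # No new replicas are created here; that is left to a later rebalance.
    return new_map, lost
```

A vBucket configured with zero replicas has no surviving copy after its node fails, which is why the replica count (1, 2 or 3) matters for availability.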
Demo
What about Hadoop?
Big Data = Operational + Analytic (NoSQL + Hadoop)
• Online
– Web/mobile/IoT apps
– Millions of customers/consumers
• Offline
– Analytics apps
– Hundreds of business analysts
[Diagram labels: tracking and collection (REST, filter, metrics); complex event processing; real-time repository; perpetual store; analytical DB; business intelligence; monitoring; chat/voice system; dashboard; real-time track; batch track; analysis and visualization]
Apache Spark: The Big Picture
Apache Spark
… is a fast, general-purpose engine for small- and large-scale data processing …
Components: Spark Core
Resilient Distributed Datasets
Clustering
Execution
Components: Spark SQL
Structured data through DataFrames
Distributed querying with SQL
Components: Spark Streaming
Fault-tolerant streaming applications
Components: Spark MLlib
Built-in machine learning algorithms
Components: Spark GraphX
Graph processing and graph-parallel computations
How does it work?
Source: http://spark.apache.org/docs/latest/cluster-overview.html
Spark Benefits
• Linearly scalable to 1000+ worker nodes
• Simpler to use than Hadoop MapReduce
• Only the lost partitions are recomputed on failure
• For developers and data scientists
– Machine learning
– R integration
• Tight but not mandatory Hadoop integration
– Sources, sinks
– Scheduler
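Spark's partial recompute relies on lineage: each partition records how it was derived from its input, so after a failure only the lost partitions are rebuilt. `LineagePartitions` is a toy model of that idea in plain Python, not Spark's RDD implementation.

```python
class LineagePartitions:
    """Toy RDD: source partitions plus a chain of per-record transformations (its lineage)."""

    def __init__(self, source_partitions, transforms):
        self.source = source_partitions  # stands in for stable input, e.g. files or a database
        self.transforms = transforms     # e.g. the functions passed to successive map() calls
        self.cache = {}                  # materialized partitions held in worker memory
        self.compute_count = 0

    def _compute(self, i):
        # Rebuild partition i from its source by replaying the lineage.
        self.compute_count += 1
        data = self.source[i]
        for f in self.transforms:
            data = [f(x) for x in data]
        return data

    def materialize(self):
        for i in range(len(self.source)):
            self.cache[i] = self._compute(i)

    def lose(self, i):
        # Simulate a worker failure that drops one cached partition.
        self.cache.pop(i, None)

    def get(self, i):
        if i not in self.cache:
            self.cache[i] = self._compute(i)  # only the missing partition is recomputed
        return self.cache[i]
```

With three partitions materialized and one lost, reading all three triggers exactly one extra computation rather than rerunning the whole job.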
Spark vs Hadoop
• Spark works largely in RAM, while Hadoop MapReduce is mainly bound to HDFS (disk)
• Fully compatible with Hadoop input/output formats
• Easier to develop against thanks to functional composition
• Hadoop is certainly more mature, but the Spark ecosystem is growing fast
Couchbase in the Spark Landscape
• Transparent generation and persistence of
– RDDs
– DataFrames
– DStreams
• Spark SQL and N1QL are a natural fit
• Linearly scale your data and application layers
• Share data between Spark applications
The perfect storage companion for your Spark applications.
Source: http://spark.apache.org/docs/latest/cluster-overview.html
Cluster Communication
[Diagram: Spark workers alongside Couchbase Servers 1 to 6; each server runs a cluster manager, a managed cache and storage, hosts shards, and provides the data, index and query services]
Ecosystem Flexibility
[Diagram labels: RDBMS, streams, web APIs, DCP, KV, N1QL, views, batching, data archive, OLTP data]
Infrastructure Consolidation
The Connector
Couchbase Connector
• Spark Core
– Automatic cluster and resource management
– Creating and persisting RDDs
– Java APIs in addition to Scala
• Spark SQL
– Easy JSON handling and querying
– Tight N1QL integration
• Spark Streaming
– Persisting DStreams
– DCP source (experimental)
Facts
• Current version: 1.0.0-beta
• Code: https://github.com/couchbaselabs/couchbase-spark-connector
• Docs until GA: http://developer.couchbase.com/documentation/server/4.0/connectors/spark-1.0/spark-intro.html
Connection Management
Creating RDDs
Persisting RDDs
Spark SQL Integration
Spark Streaming with DCP
What's next?
Couchbase Connector
• Learn more:
– Couchbase and Spark at Couchbase Connect 2015: http://connect15.couchbase.com/agenda/spark-couchbase-electrify-data-processing/
• 1.1.0 plans
– Upgrade to Spark 1.5
– Stabilize DCP support
– Extend, optimize, fix bugs…
• We need your feedback!

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 

Couchbase and Apache Spark

  • 1. Couchbase and Apache Spark: efficient data crunching in a fast-moving world
  • 2. ©2015 Couchbase Inc. Matt Ingenthron: worked on large-site scalability problems at a previous company…; memcached contributor; joined Couchbase very early and helped define key parts of the system.
  • 4. Couchbase is a Document Oriented Database. Roles: high availability cache, key-value store, document database, embedded database, sync management. Couchbase can be used in a number of ways. Developers often start out needing a simple distributed hashtable, then grow to need secondary indexing, and are either mobile-first or need to address mobile deployment.
  • 5. What makes Couchbase unique? Enterprises choose Couchbase for several key advantages. Performance & scalability leader: sub-millisecond latency with high throughput; memory-centric architecture. Multi-purpose: cache, key-value store, document database, and local/mobile database in a single platform. Simplified administration: easy to deploy & manage; integrated Admin Console, single-click cluster expansion & rebalance. Always-on availability (24x365): data replication across nodes, clusters, and data centers.
  • 6. Multi-purpose database supports many uses. Tunable built-in cache: consolidated cache and database; tune the memory required based on application requirements. Flexible schemas with JSON: represent data with varying schemas using JSON on the server or on the device; index and query data with JavaScript views. Couchbase Lite: lightweight embedded DB for always-available apps; Sync Gateway syncs data seamlessly with Couchbase Server.
  • 7. Couchbase leads in performance and scalability. Auto-sharding: no manual sharding; the database, not the user, manages data movement to scale out. Memory-to-memory XDCR: the market's only memory-to-memory database replication across clusters and geos; provides disaster recovery and data locality. Single node type: hugely simplifies cluster management; easy to scale clusters by adding any number of nodes.
  • 8. Couchbase delivers always-on availability (24x365). High availability: in-memory replication with manual or automatic failover; rack-zone awareness to minimize data unavailability. Disaster recovery: memory-to-memory cross-cluster replication across data centers or geos; active-active topology with bidirectional setup. Backup & restore: full or incremental backup with online restore; delta node catch-ups for faster recovery after failures.
  • 9. Simplified administration for exceptional ease of use. Online upgrades and operations: online software, hardware, and DB upgrades; indexing, compaction, rebalance, backup & restore. Built-in enterprise-class admin console: perform all administrative tasks with the click of a button; visually monitor system status at the cluster, database, and server level. RESTful APIs: all admin operations are available via the UI, REST APIs, or CLI commands; integrate third-party monitoring tools easily using REST.
  • 10. Couchbase Server Architecture. Single-node type means easier administration and scaling: a single installation with two major components/processes, the data manager and the cluster manager. Data manager: C/C++; layer consolidation of caching and persistence. Cluster manager: Erlang/OTP; administration UIs; out-of-band for data requests.
  • 11. Write Operation. [Diagram: application server, managed cache, replication queue, disk queue, disk] Writes are async by default: the application gets an acknowledgement once the write is successfully in RAM, and can trade off waiting for replication or persistence on a per-write basis. Replication to 1, 2 or 3 other nodes; replication is RAM-based, so extremely fast; off-node replication is the primary level of HA. Disk is written to as fast as possible, with no waiting.
  • 12. Basic Operation. [Diagram: Couchbase Servers 1, 2 and 3, each holding active and replica shards] Application has a single logical connection to the cluster (client object). Data is automatically sharded, resulting in even document distribution across the cluster. Each vBucket is replicated 1, 2 or 3 times ("peer-to-peer" replication). Docs are automatically hashed by the client to a shard. The cluster map provides the location of the server each shard is on. Every read/write/update/delete goes to the same node for a given key. Strongly consistent data access ("read your own writes"). A single Couchbase node can achieve hundreds of thousands of ops/sec, so there is no need to scale reads separately.
  • 13. Cache Ejection. [Diagram: application server, managed cache holding DOC 1 to DOC 5, replication queue, disk queue, disk] Layer consolidation means a read-through and write-through cache. Couchbase automatically ejects from RAM data that has already been persisted.
  • 14. Cache Miss. [Diagram: GET DOC 1, where DOC 1 is no longer in the managed cache and is read back from disk] Layer consolidation means a single interface for the app to talk to and get its data back as fast as possible. Separating cache and disk allows the fastest access out of RAM while pulling data from disk in parallel.
  • 15. Add Nodes to Cluster. [Diagram: Couchbase Servers 4 and 5 join Servers 1 to 3; active and replica shards are spread across all five while the application keeps reading, writing and updating] Application has a single logical connection to the cluster (client object). Multiple nodes can be added or removed at once. One-click operation. Incremental movement of active and replica vBuckets and data. Client library is updated via the cluster map. Fully online operation, with no downtime or loss of performance.
  • 16. Node Unresponsive / Lost
  • 17. Fail Over Node. [Diagram: Couchbase Servers 1 to 5 with active and replica shards, after one node is failed over] Application has a single logical connection to the cluster (client object). When a node goes down, some requests will fail. Failover is either automatic or manual. Client library is automatically updated via the cluster map. Replicas are not recreated, to preserve stability. Best practice is to replace the node and rebalance.
  • 18. Demo
  • 20. Big Data = Operational + Analytic (NoSQL + Hadoop). Online: web/mobile/IoT apps; millions of customers/consumers. Offline: analytics apps; hundreds of business analysts.
  • 23. Apache Spark: The Big Picture
  • 24. Apache Spark … is a fast and general-purpose engine for small and large scale data processing …
  • 25. Components: Spark Core. Resilient Distributed Datasets, clustering, execution.
  • 26. Components: Spark SQL. Structured data through DataFrames; distributed querying with SQL.
  • 27. Components: Spark Streaming. Fault-tolerant streaming applications.
  • 28. Components: Spark MLlib. Built-in machine learning algorithms.
  • 29. Components: Spark GraphX. Graph processing and graph-parallel computations.
  • 30. How does it work? Source: http://spark.apache.org/docs/latest/cluster-overview.html
  • 31. Spark Benefits. Linearly scalable to 1000+ worker nodes; simpler to use than Hadoop MapReduce; only partial recompute on failure; for developers and data scientists (machine learning, R integration); tight but not mandatory Hadoop integration (sources, sinks, scheduler).
  • 32. Spark vs Hadoop. Spark is RAM-centric, while Hadoop is mainly HDFS (disk) bound; fully compatible with Hadoop input/output; easier to develop against thanks to functional composition; Hadoop is certainly more mature, but the Spark ecosystem is growing fast.
  • 33. Couchbase in the Spark Landscape. The perfect storage companion for your Spark applications: transparent generation and persistence of RDDs, DataFrames, and DStreams; Spark SQL and N1QL are a natural fit; linearly scale your data and application layer; share data between Spark applications. Source: http://spark.apache.org/docs/latest/cluster-overview.html
  • 34. Cluster Communication. [Diagram: Spark workers communicating with Couchbase Servers 1 to 6; each node runs the cluster manager, managed cache, and storage, plus the data, index, and query services]
  • 35. Ecosystem Flexibility. [Diagram: RDBMS, streams, and web APIs; Couchbase access via DCP, KV, N1QL, and views; batching, data archive, and OLTP data]
  • 36. Infrastructure Consolidation
  • 37. The Connector
  • 38. Couchbase Connector. Spark Core: automatic cluster and resource management; creating and persisting RDDs; Java APIs in addition to Scala. Spark SQL: easy JSON handling and querying; tight N1QL integration. Spark Streaming: persisting DStreams; DCP source (experimental).
  • 39. Facts. Current version: 1.0.0-beta. Code: https://github.com/couchbaselabs/couchbase-spark-connector. Docs until GA: http://developer.couchbase.com/documentation/server/4.0/connectors/spark-1.0/spark-intro.html
  • 40. Connection Management
  • 41. Connection Management
  • 42. Creating RDDs
  • 43. Persisting RDDs
  • 44. Spark SQL Integration
  • 45. Spark Streaming with DCP
  • 46. What's next?
  • 47. Couchbase Connector. Learn more: Couchbase and Spark at Couchbase Connect 2015 (http://connect15.couchbase.com/agenda/spark-couchbase-electrify-data-processing/). 1.1.0 plans: upgrade to Spark 1.5; stabilize DCP support; extend, optimize, fix bugs… We need your feedback!
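Slide 11's write path can be modeled in a few lines. The following is a minimal Python sketch, not Couchbase's implementation: the `Node` class, its queue-draining methods, and the `durable_write` helper are hypothetical names. It shows the order of events the slide describes: the write lands in RAM and is acknowledged at once, while the disk queue and the replication queue are drained separately. Here draining happens only when the caller asks for it, which stands in for the per-write choice to wait for persistence or replication.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy model of one node's write path (hypothetical, not Couchbase code)."""
    name: str
    cache: dict = field(default_factory=dict)   # managed cache (RAM)
    disk: dict = field(default_factory=dict)    # persisted documents
    disk_queue: deque = field(default_factory=deque)
    replication_queue: deque = field(default_factory=deque)

    def write(self, key, value):
        """Store in RAM, enqueue for disk and replication, acknowledge at once."""
        self.cache[key] = value
        self.disk_queue.append((key, value))
        self.replication_queue.append((key, value))
        return "ack"  # returned before any disk or replica work happens

    def drain_disk_queue(self):
        """Write every queued mutation to disk."""
        while self.disk_queue:
            key, value = self.disk_queue.popleft()
            self.disk[key] = value

    def drain_replication_queue(self, replica):
        """Copy every queued mutation into the replica's RAM (memory to memory)."""
        while self.replication_queue:
            key, value = self.replication_queue.popleft()
            replica.cache[key] = value


def durable_write(active, replica, key, value, persist=False, replicate=False):
    """Write, then optionally wait for persistence and/or replication."""
    result = active.write(key, value)
    if persist:
        active.drain_disk_queue()
    if replicate:
        active.drain_replication_queue(replica)
    return result
```

In a real server both queues are drained continuously in the background; the toy makes that step explicit so the trade-off between latency (return right after the RAM write) and durability (wait for disk or a replica) is visible.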
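Slide 12's client-side routing (hash the key to a shard, then look the shard up in the cluster map) can be sketched as follows. Assumptions: 1024 vBuckets; a CRC32-based hash modeled on the scheme Couchbase clients are commonly described as using (treat the exact bit manipulation as an assumption); and a hypothetical `build_cluster_map` that simply places each vBucket's active and replica copies on consecutive nodes. In Couchbase the map is supplied by the cluster rather than computed by the client.

```python
import zlib

NUM_VBUCKETS = 1024  # assumed partition count (must be a power of two here)


def vbucket_for_key(key, num_vbuckets=NUM_VBUCKETS):
    """Map a document key to a vBucket id via CRC32 (illustrative scheme)."""
    crc = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
    # Keep 15 bits of the upper half, then mask down to the partition count.
    return ((crc >> 16) & 0x7FFF) & (num_vbuckets - 1)


def build_cluster_map(nodes, num_vbuckets=NUM_VBUCKETS):
    """Hypothetical cluster map: vBucket id -> [active node, replica node]."""
    n = len(nodes)
    return {vb: [nodes[vb % n], nodes[(vb + 1) % n]] for vb in range(num_vbuckets)}


def route(key, cluster_map):
    """Return the node holding the active copy of `key`."""
    return cluster_map[vbucket_for_key(key, len(cluster_map))][0]
```

Because every client computes the same vBucket for a key and consults the same map, every operation on that key reaches the same node, which is what gives the "read your own writes" behavior the slide mentions.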
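Slide 15's rebalance moves vBuckets incrementally rather than reshuffling all data. A minimal sketch under simplifying assumptions: only active copies are tracked, every node should end up with an even share, and the hypothetical `rebalance` function moves only the surplus vBuckets from over-full (or departing) nodes to under-full (or new) ones. Couchbase's actual rebalance also has to move replicas and the data itself; none of that is modeled here.

```python
from collections import Counter


def rebalance(active_map, nodes):
    """Spread active vBuckets evenly over `nodes`, moving only the surplus.

    active_map: vBucket id -> node currently holding the active copy.
    Returns (new_map, moves), where each move is (vbucket, from_node, to_node).
    """
    base, extra = divmod(len(active_map), len(nodes))
    # Target share per node; the first `extra` nodes take one extra vBucket.
    target = {n: base + (1 if i < extra else 0) for i, n in enumerate(nodes)}
    new_map = dict(active_map)
    load = Counter(new_map.values())

    # Collect vBuckets that must move: all of a departing node's,
    # plus any above a remaining node's target.
    movable = []
    for vb in sorted(new_map):
        owner = new_map[vb]
        if owner not in target or load[owner] > target[owner]:
            movable.append(vb)
            load[owner] -= 1

    # Hand the movable vBuckets to nodes that are below target.
    moves = []
    for node in nodes:
        while load[node] < target[node]:
            vb = movable.pop()
            moves.append((vb, new_map[vb], node))
            new_map[vb] = node
            load[node] += 1
    return new_map, moves
```

Growing a three-node cluster (1,024 vBuckets) to five nodes this way moves 409 vBuckets, all of them onto the two new nodes, while the other 615 stay where they are; that is what keeps the operation incremental.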
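Slide 17's failover behavior can be sketched as a transformation of the cluster map. In this hypothetical layout, each vBucket maps to a chain of nodes: the first is the active copy and the rest are replicas. Failing over a node drops it from every chain, so wherever it held the active copy, the first replica is promoted. As the slide says, lost replicas are not recreated; that waits for a replacement node and a rebalance. `fail_over` is an illustrative name, not a Couchbase API.

```python
def fail_over(cluster_map, failed):
    """Remove `failed` from every vBucket's chain, promoting replicas as needed.

    cluster_map: vBucket id -> [active, replica1, replica2, ...] (hypothetical layout).
    Copies that lived on the failed node are not recreated.
    """
    new_map = {}
    for vb, chain in cluster_map.items():
        survivors = [node for node in chain if node != failed]
        if not survivors:
            # No replica existed elsewhere: this vBucket's data is unavailable.
            raise RuntimeError(f"vBucket {vb} has no surviving copy")
        new_map[vb] = survivors  # survivors[0] is now the active copy
    return new_map
```

With the map `{0: ["s1", "s2"], 1: ["s2", "s3"], 2: ["s3", "s1"]}`, failing over `s2` promotes the replica of vBucket 1 on `s3`, while vBucket 0 keeps serving from `s1` but has no replica until the next rebalance. Clients then pick up the new map, as the slide notes.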
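"Only partial recompute on failure" (slide 31) comes from Spark recording how each partition was derived (its lineage) instead of replicating intermediate results. The toy below illustrates the idea and is not Spark's API: `ToyRDD`, `lose`, and the `recomputed` log are hypothetical. Losing one partition of a mapped dataset reruns only that partition's recipe.

```python
class ToyRDD:
    """Hypothetical toy: each dataset keeps a recipe (`compute`) per partition."""

    def __init__(self, compute, num_partitions):
        self.compute = compute                # partition index -> list of records
        self.num_partitions = num_partitions
        self.materialized = {}                # partitions currently held in memory
        self.recomputed = []                  # log of partitions (re)computed here

    def partition(self, i):
        if i not in self.materialized:
            self.recomputed.append(i)
            self.materialized[i] = self.compute(i)
        return self.materialized[i]

    def map(self, f):
        parent = self
        # The child's recipe: fetch the parent's partition i, then apply f.
        return ToyRDD(lambda i: [f(x) for x in parent.partition(i)], self.num_partitions)

    def collect(self):
        return [x for i in range(self.num_partitions) for x in self.partition(i)]

    def lose(self, i):
        """Simulate losing the worker that held partition i."""
        self.materialized.pop(i, None)
```

Here the parent's partitions stay in memory, so recovery reruns a single step; in Spark, recomputation walks back through the lineage as far as needed to rebuild the lost data.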

Editor's Notes

  1. Slide 2 – About Me
  2. KEY POINT: COUCHBASE PROVIDES A SET OF MULTI-PURPOSE, CORE CAPABILITIES THAT SUPPORT A BROAD RANGE OF APPLICATIONS AND USE CASES, ALL IN A SINGLE DATA MANAGEMENT PLATFORM. Couchbase provides a set of technology capabilities to support a broad range of applications and use cases: High Availability Cache: Couchbase provides an integrated managed object cache, so you can start out using Couchbase as a high availability cache on top of your existing relational database. For example, you can use Couchbase as a session store in front of your relational database, if your relational DB is struggling to keep up with the load required for online interactive applications. Key-Value Store: Many customers start with Couchbase as a cache and then broaden their usage to other capabilities, like using Couchbase as a Key-Value Store for things like Profile Management. Document Database: From there, you can grow into using Couchbase as a Document Database, where you can do more with capabilities like indexing and Cross Data Center Replication. Embedded Database: Couchbase also provides an embedded database called Couchbase Lite. It’s a purpose-built database for the device, so you can build applications that are always available and always work, whether offline or online. Sync Management: Finally, as part of our solution for mobile applications, we provide Couchbase Sync Gateway, which automatically synchronizes data on the device with Couchbase Server in the cloud so your developer doesn’t have to write code to manage the complex sync process. Starting with cache and then expanding to other capabilities is often a good way to learn the technology and get comfortable with Couchbase for a wider set of use cases.
  3. Couchbase has emerged as a leading NoSQL provider for a number of reasons. Best in performance and scalability: we've engineered Couchbase from the ground up for high performance and scalability. Couchbase is designed to deliver sub-millisecond responsiveness with very high throughput for both reads and writes. We consistently outperform competitors like MongoDB and DataStax in multiple independent benchmarks. Our performance advantage is driven in large part by our memory-centric architecture, which includes an integrated managed object cache and stream-based replication. Broad use case support: we're the only NoSQL provider that has consolidated a distributed cache, a key-value store, and a JSON-based document database in a single platform, which means customers can use Couchbase for a much broader range of applications. Integrated mobile solution: we're the only vendor that provides an end-to-end NoSQL mobile solution, which allows customers to easily build mobile apps that run great online or offline. It includes a JSON database embedded on the device, along with a prebuilt syncing tier, so apps run great on the device even without a network connection, and data on the device auto-syncs with the backend server when a connection is available. Simplified administration: we've designed Couchbase to be exceptionally easy to deploy and manage. Features such as an integrated Admin Console and single-click cluster expansion & rebalance dramatically increase admin efficiency.
  4. Each Couchbase node is exactly the same. All nodes are broken down into two components: a data manager (on the left) and a cluster manager (on the right). It's important to realize that these are separate processes within the system, specifically designed so that a node can continue serving its data even in the face of cluster problems like network disruption. The data manager is written in C and C++ and is responsible for the object caching layer, the persistence layer, and the querying engine. It is based on memcached and so provides a number of benefits: the very low lock contention of memcached allows for extremely high throughput and low latencies, both for a small set of documents (or just one) and across millions of documents; being compatible with the memcached protocol means we are not only a drop-in replacement but also inherit support for automatic item expiration (TTL) and atomic increment operations; we've increased the maximum object size to 20 MB, but still recommend keeping objects much smaller; both binary objects and JSON documents are supported natively; and all of the metadata for the documents and their keys is kept in RAM at all times. While this adds a bit of overhead per item, it also allows for extremely fast "miss" speeds, which are critical to the operation of some applications: we don't have to scan a disk to know that we don't have some data. The cluster manager is based on Erlang/OTP, which was developed by Ericsson to manage hundreds or even thousands of distributed telco switches. This component is responsible for configuration, administration, process monitoring, statistics gathering, and the UI and REST interface. Note that no data manipulation is done through this interface.
  5. Now, as you fill up memory (click), some data that has already been written to disk will be ejected from RAM to make room for new data. (click) Couchbase supports holding much more data than you have RAM available. It’s important to size the RAM capacity appropriately for your working set: the portion of data your application is working with at any given point in time and needs very low latency, high throughput access to. In some applications this is the entire data set, in others it is much smaller. As RAM fills up, we use a “not recently used” algorithm to determine the best data to be ejected from cache.
  6. Should a read now come in for one of those documents that has been ejected (click), it is copied back from disk into RAM and sent back to the application. The document then remains in RAM as long as there is space and it is being accessed.
  7. KEY POINTS: BIG DATA IS NOT ONE THING; IT'S A COMBINATION OF OPERATIONAL (NOSQL) AND ANALYTICAL DATABASES. YOU NEED BOTH. COUCHBASE PROVIDES THE OPERATIONAL SOLUTION. Big data has two major pieces, operational and analytical. Operational is about real-time, online, interactive, customer/consumer-facing processing of data at high velocity. Analytical is about offline analytics: often batch-oriented, taking time to process, and directly touching relatively few users (business analysts). These two pieces together form "Big Data". There's some overlap: NoSQL can deliver some analytics, and Hadoop can deliver some operational capability, but in general each technology is designed for a separate purpose. Couchbase fits on the operational side, Hadoop on the analytics side.
  8. The data generated by users is published to Apache Kafka. Next, it’s pulled into Apache Storm for real time analysis and processing as well as into Hadoop. Finally, Storm writes the data to Couchbase Server for real-time access by LivePerson agents while the data in Hadoop is eventually accessed via HP Vertica and MicroStrategy for offline business intelligence and analysis.
  9. The data is first collected by a tracking and collection service. Next, Storm pulls the data in for filtering, enrichment, and statistical analysis. The raw data is written to one Couchbase Server cluster, while the processed data is written to a separate Couchbase Server cluster. The processed data is accessed by a front end for visualization and analysis. In addition, the raw data is copied from Couchbase Server to Hadoop, where it is combined with additional data, and the whole is moved into HBase for ad hoc analysis. PayPal was able to handle both the volume and the velocity of data and to meet both operational and analytical requirements. They relied on data capture, stream processing, NoSQL, and Hadoop to do so.
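Notes 4 to 6 (with slides 13 and 14) describe three properties of the managed cache: metadata for every key stays in RAM, so a lookup for a key that doesn't exist never touches disk; only documents already persisted may be ejected, chosen by a "not recently used" policy; and a read of an ejected document is served from disk and cached again. The sketch below combines them. It is a hypothetical model, not Couchbase's implementation: `ManagedCache`, the per-key reference bit, the second-chance sweep used to approximate NRU, and a capacity counted in documents rather than bytes are all assumptions.

```python
class ManagedCache:
    """Toy managed cache; capacity counts resident documents, not bytes."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.values = {}      # documents currently resident in RAM
        self.meta = {}        # key -> {"persisted": bool, "ref": bool}; always in RAM
        self.disk = {}        # persisted copies
        self.disk_reads = 0   # how often a read had to go to disk

    def set(self, key, value):
        """Write to RAM; the document stays dirty until persisted."""
        self.values[key] = value
        self.meta[key] = {"persisted": False, "ref": True}
        self._eject_if_needed()

    def persist_all(self):
        """Stand-in for the disk-write queue draining."""
        for key, value in self.values.items():
            self.disk[key] = value
            self.meta[key]["persisted"] = True
        self._eject_if_needed()

    def get(self, key):
        meta = self.meta.get(key)
        if meta is None:
            return None                 # fast miss: answered without disk access
        meta["ref"] = True              # mark as recently used
        if key in self.values:
            return self.values[key]     # cache hit
        self.disk_reads += 1            # ejected earlier: read through from disk
        value = self.disk[key]
        self.values[key] = value
        self._eject_if_needed()
        return value

    def _eject_if_needed(self):
        while len(self.values) > self.capacity:
            clean = [k for k in self.values if self.meta[k]["persisted"]]
            if not clean:
                return                  # only dirty data left; the toy just overshoots
            victim = next((k for k in clean if not self.meta[k]["ref"]), None)
            if victim is None:
                for k in clean:         # all recently used: clear bits, sweep again
                    self.meta[k]["ref"] = False
                continue
            del self.values[victim]     # metadata stays, so the key is still known
```

In a run with capacity 3, reading `"b"` between two overflows protects it: when a new dirty document arrives, `"c"` (persisted and not recently used) is ejected instead. An ejected key such as `"a"` is still known to the metadata, so reading it costs one disk read, while a lookup for a key that was never written is answered from RAM alone.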