SlideShare uma empresa Scribd logo
1 de 34
SPARK AND COUCHBASE:
AUGMENTING THE OPERATIONAL
DATABASE WITH SPARK
Matt Ingenthron
Couchbase
Agenda
Why integrate Spark and NoSQL?
Architectural alignment
Integration “Points of Interest”
 Automatic sharding and data locality
 Streams: Data Replication and Spark Streaming
 Predicate pushdown and global indexing
 Flexible schemas and schema inference
See it in action
Why Spark and NoSQL?
NoSQL + Spark use cases
Operations Analysis
NoSQL
 Recommendations
 Next gen data warehousing
 Predictive analytics
 Fraud detection
 Catalog
 Customer 360 + IOT
 Personalization
 Mobile applications
Big Data at a Glance
Couchbase Spark Hadoop
Use cases
• Operational
• Web / Mobile
• Analytics
• Machine
Learning
• Analytics
• Machine
Learning
Processing
mode
• Online
• Ad Hoc
• Ad Hoc
• Batch
• Streaming (+/-)
• Batch
• Ad Hoc (+/-)
Low latency = < 1 ms ops Seconds Minutes
Performance Highly predictable Variable Variable
Users are
typically…
Millions of
customers
100’s of analysts or
data scientists
100’s of analysts or
data scientists
Memory-centric Memory-centric Disk-centric
Big data = 10s of Terabytes Petabytes Petabytes
ANALYTICALOPERATIONAL
Use Case: Operationalize Analytics / ML
Hadoop
Examples: recommend content and products, spot fraud or spam
Data scientists train machine learning models
Load results into Couchbase so end users can interact with them online
Machine
Learning
Models
Data
Warehouse
Historical Data
NoSQL
Use Case: Operationalize ML
Model
NoSQL
node(e.g.) Training Data
(Observations)
Serving
Predictions
Spark connects to everything…
DCP
KV
N1QL
Views
Adapted from: Databricks – Not Your Father’s Database https://www.brighttalk.com/webcast/12891/196891
Use Case #2: Data Integration
RDBMSs3hdfs
Elasticsearc
h
Data engineers query data in many systems w/ one language &
runtime
Store results where needed for further use
Late binding of schemas
NoSQL
Architectural Alignment
Full Text Search
Search for and fetch
the most relevant
records given a
freeform text string
Key-Value
Directly fetch /
store a particular
record
Query
Specify a set of criteria
to retrieve relevant data
records.
Essential in reporting.
Map-Reduce
Views
Maintain materialized
indexes of data
records, with reduce
functions for
aggregation
Data Streaming
Efficiently, quickly
stream data records to
external systems for
further processing or
integration
Hash Partitioned Data
Auto Sharding – Bucket And vBuckets
A bucket is a logical, unique key space
Each bucket has active & replica data sets
 Each data set has 1024 virtual buckets
(vBuckets)
 Each vBucket contains 1/1024th of the data
set
 vBuckets have no fixed physical server
location
Mapping of vBuckets to physical servers is
called the cluster map
Document IDs (keys) always get hashed to
the same vBucket
Couchbase SDK’s lookup the vBucket 
server mapping
N1QL Query
N1QL, pronounced “nickel”, is a SQL service with extensions specifically for
JSON
 Is stateless execution, however…
 Uses Couchbase’s Global Secondary Indexes.
 These are sorted structures, range partitioned.
 Both can run on any nodes within the cluster. Nodes with differing
services can be added and removed as needed.
MapReduce Couchbase Views
A JavaScript based, incremental Map-Reduce
service for incrementally building sorted B+Trees.
 Runs on every node, local to the data on that node, stored locally.
 Automatically merge-sorted at query time.
Data Streaming with DCP
A general data streaming service, Database Change Protocol.
 Allows for streaming all data out and continuing, or…
 Stream just what is coming in at the time of connection, or…
 Stream everything out for transfer/takeover…
Couchbase from Spark
Key-Value
Direct fetching/storing
of a particular record.
Query
Specifying a set of
criteria to retrieve
relevant data records.
Essential in reporting.
Map-Reduce
Views
Maintain materialized
indexes of data
records, with reduce
functions for
aggregation.
Data Streaming
Efficiently, quickly
stream data records to
external systems for
further processing or
integration.
Full Text Search
Search for, and allow
tuning of the system to
fetch the most relevant
records given a
freeform search string.
Automatic sharding and data locality
Integration Points of Interest
What happens in Spark Couchbase KV
When 1 Spark node per CB node, the connector will use the cluster map and
push down location hints
 Helpful for situations where processing is intense, like transformation
 Uses pipeline IO optimization
However, not available for N1QL or Views
 Round robin - can’t give location hints
 Back end is scatter gather with 1 node responding
Predicate pushdown and global indexing
Integration Points of Interest
SparkSQL on
N1QL with Global Secondary Indexes
TableScan
Scan all of the data and return it
PrunedScan
Scan an index that matches only relevant
data to the query at hand.
PrunedFilteredScan
Scan an index that matches only relevant
data to the query at hand.
Couchbase Cluster
B-Tree
(Global Secondary Index)
Data
Data
Data
Data
Query
Application
Client
Couchbase Cluster
B-Tree
(Global Secondary Index)
Data
Data
Data
Data
Query
Application
Client
Predicate pushdown
Predicate pushdown
Notes from implementing:
Spark assumes it’s getting
all the data, applies the predicates
Future potential optimizations
Push down all the things!
 Aggregations
 JOINs
Looking at Catalyst engine extensions from SAP
 But, it’s not backward compatible and…
 …many data sources can only push down filters
image courtesy http://allthefreethings.com/about/
Streams: Data Replication and Spark
Streaming
Integration Points of Interest
DCP and Spark Streaming
Many system architectures rely upon streaming from the ‘operational’ data
store to other systems
 Lambda architecture => store everything and
process/reprocess everything based on access
 Command Query Responsibility Segregation - (CQRS)
 Other reactive pattern derived systems and frameworks
Documents flow into the
system from outside
Documents are then
streamed down to consumers
In most common cases, flows
memory to memory
DCP and Spark Streaming
Couchbase Node
DCP
Consumer
Other Cluster
Nodes
Spark Cluster
See it in Action
Couchbase Spark Connector 1.2
• Spark 1.6 support, including Datasets
• Full DCP flow control support
• Enhanced Java APIs
• Bug fixes
34
Questions?
THANK YOU.
@ingenthr
Try Couchbase Spark Connector 1.2
http://www.couchbase.com/bigdata
Additional Information
Deployment Topology
Couchbase
Spark
Worker
Couchbase
Couchbase
 Many small gets
 Streaming with low
mutation rate
 Ad hoc
Couchbase
Data Node
Data Node
Spark
Worker
Couchbase
Couchbase
 Medium processing
 Predictable
workloads
 Plenty of overhead
on machines
Couchbase
Couchbase
Couchbase
Couchbase
Couchbase
Couchbase
Data Node
Data Node
Spark
Worker
XDCR
 Heaviest processing
 Workload isolation
writes

Mais conteúdo relacionado

Mais procurados

Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory Sergio Zenatti Filho
 
Azure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsAzure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsWaqas Idrees
 
Disrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the CloudDisrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the CloudJen Aman
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsPat Patterson
 
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...Spark Summit
 
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive MetastoreOracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive MetastoreDataWorks Summit
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseDataStax
 
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...DataStax Academy
 
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannAzure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannDatabricks
 
Atlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesAtlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesQubole
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAlex Tumanoff
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the CloudDataWorks Summit
 
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | QuboleEbooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | QuboleVasu S
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for DummiesRodney Joyce
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Michael Rys
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure DatabricksJames Serra
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightDataWorks Summit
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeBizTalk360
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Con LA
 

Mais procurados (20)

Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory Unleash the power of Azure Data Factory
Unleash the power of Azure Data Factory
 
Azure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake AnalyticsAzure Data Lake and Azure Data Lake Analytics
Azure Data Lake and Azure Data Lake Analytics
 
Disrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the CloudDisrupting Big Data with Apache Spark in the Cloud
Disrupting Big Data with Apache Spark in the Cloud
 
Building Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSetsBuilding Data Pipelines with Spark and StreamSets
Building Data Pipelines with Spark and StreamSets
 
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
A New “Sparkitecture” for Modernizing your Data Warehouse: Spark Summit East ...
 
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive MetastoreOracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
 
What's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and BeyondWhat's new in SQL on Hadoop and Beyond
What's new in SQL on Hadoop and Beyond
 
Data Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax EnterpriseData Pipelines with Spark & DataStax Enterprise
Data Pipelines with Spark & DataStax Enterprise
 
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
Microsoft: Building a Massively Scalable System with DataStax and Microsoft's...
 
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannAzure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
 
Atlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slidesAtlanta Data Science Meetup | Qubole slides
Atlanta Data Science Meetup | Qubole slides
 
Azure data bricks by Eugene Polonichko
Azure data bricks by Eugene PolonichkoAzure data bricks by Eugene Polonichko
Azure data bricks by Eugene Polonichko
 
Built-In Security for the Cloud
Built-In Security for the CloudBuilt-In Security for the Cloud
Built-In Security for the Cloud
 
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | QuboleEbooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
 
Databricks for Dummies
Databricks for DummiesDatabricks for Dummies
Databricks for Dummies
 
Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...Running cost effective big data workloads with Azure Synapse and Azure Data L...
Running cost effective big data workloads with Azure Synapse and Azure Data L...
 
Introduction to Azure Databricks
Introduction to Azure DatabricksIntroduction to Azure Databricks
Introduction to Azure Databricks
 
Build Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsightBuild Big Data Enterprise solutions faster on Azure HDInsight
Build Big Data Enterprise solutions faster on Azure HDInsight
 
Analyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data LakeAnalyzing StackExchange data with Azure Data Lake
Analyzing StackExchange data with Azure Data Lake
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
 

Destaque

Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...
Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...
Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...Data Con LA
 
Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...
Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...
Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...Data Con LA
 
Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...
Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...
Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...Data Con LA
 
Domopalooza Major Domo Session
Domopalooza Major Domo SessionDomopalooza Major Domo Session
Domopalooza Major Domo SessionDavid Ramos
 
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...Data Con LA
 
Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...
Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...
Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...Data Con LA
 
Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...
Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...
Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...Data Con LA
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Data Con LA
 
Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...
Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...
Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...Data Con LA
 
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Data Con LA
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...Data Con LA
 
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...Data Con LA
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...Data Con LA
 
Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...
Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...
Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...Data Con LA
 
Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...
Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...
Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...Data Con LA
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Data Con LA
 
Big Data Day LA 2016/ Use Case Driven track - Data and Hollywood: "Je t'Aime ...
Big Data Day LA 2016/ Use Case Driven track - Data and Hollywood: "Je t'Aime ...Big Data Day LA 2016/ Use Case Driven track - Data and Hollywood: "Je t'Aime ...
Big Data Day LA 2016/ Use Case Driven track - Data and Hollywood: "Je t'Aime ...Data Con LA
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...Data Con LA
 
Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...
Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...
Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...Data Con LA
 
Big Data Day LA 2016/ Data Science Track - The Right Tool for the Job: Guidel...
Big Data Day LA 2016/ Data Science Track - The Right Tool for the Job: Guidel...Big Data Day LA 2016/ Data Science Track - The Right Tool for the Job: Guidel...
Big Data Day LA 2016/ Data Science Track - The Right Tool for the Job: Guidel...Data Con LA
 

Destaque (20)

Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...
Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...
Big Data Day LA 2016/ Use Case Driven track - BI is broken, Dave Fryer, Produ...
 
Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...
Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...
Big Data Day LA 2016/ Data Science Track - Intuit's Payments Risk Platform, D...
 
Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...
Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...
Big Data Day LA 2016/ Use Case Driven track - How to Use Design Thinking to J...
 
Domopalooza Major Domo Session
Domopalooza Major Domo SessionDomopalooza Major Domo Session
Domopalooza Major Domo Session
 
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
Big Data Day LA 2016/ Use Case Driven track - Shaping the Role of Data Scienc...
 
Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...
Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...
Big Data Day LA 2016/ NoSQL track - Introduction to Graph Databases, Oren Gol...
 
Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...
Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...
Big Data Day LA 2016/ Use Case Driven track - The Encyclopedia of World Probl...
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Building an Event-oriented...
 
Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...
Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...
Big Data Day LA 2016/ NoSQL track - Big Data and Real Estate, Jon Zifcak, CEO...
 
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
Big Data Day LA 2016/ NoSQL track - Architecting Real Life IoT Architecture, ...
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Why is my Hadoop cluster s...
 
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...
Big Data Day LA 2016/ NoSQL track - MongoDB 3.2 Goodness!!!, Mark Helmstetter...
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Deep Learning at Scale - A...
 
Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...
Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...
Big Data Day LA 2016/ Use Case Driven track - Reliable Media Reporting in an ...
 
Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...
Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...
Big Data Day LA 2016/ NoSQL track - Privacy vs. Security in a Big Data World,...
 
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
Big Data Day LA 2016/ Big Data Track - Portable Stream and Batch Processing w...
 
Big Data Day LA 2016/ Use Case Driven track - Data and Hollywood: "Je t'Aime ...
Big Data Day LA 2016/ Use Case Driven track - Data and Hollywood: "Je t'Aime ...Big Data Day LA 2016/ Use Case Driven track - Data and Hollywood: "Je t'Aime ...
Big Data Day LA 2016/ Use Case Driven track - Data and Hollywood: "Je t'Aime ...
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Panel - Interactive Applic...
 
Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...
Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...
Big Data Day LA 2016/ NoSQL track - Analytics at the Speed of Light with Redi...
 
Big Data Day LA 2016/ Data Science Track - The Right Tool for the Job: Guidel...
Big Data Day LA 2016/ Data Science Track - The Right Tool for the Job: Guidel...Big Data Day LA 2016/ Data Science Track - The Right Tool for the Job: Guidel...
Big Data Day LA 2016/ Data Science Track - The Right Tool for the Job: Guidel...
 

Semelhante a Big Data Day LA 2016/ NoSQL track - Spark And Couchbase: Augmenting The Operational Database With Spark, Matt Ingenthron, Senior Director, Engineering, Couchbase

Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark Summit
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)James Serra
 
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdfDustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdfDustin Vannoy
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Helena Edelson
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureLuan Moreno Medeiros Maciel
 
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, QlikKeeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, QlikHostedbyConfluent
 
Spark Summit EU talk by Michael Nitschinger
Spark Summit EU talk by Michael NitschingerSpark Summit EU talk by Michael Nitschinger
Spark Summit EU talk by Michael NitschingerSpark Summit
 
Introduction to ClustrixDB
Introduction to ClustrixDBIntroduction to ClustrixDB
Introduction to ClustrixDBI Goo Lee
 
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...DB Tsai
 
Kylin and Druid Presentation
Kylin and Druid PresentationKylin and Druid Presentation
Kylin and Druid Presentationargonauts007
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkCloudera, Inc.
 
Data Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKCData Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKCMark Smith
 
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetStreaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetHostedbyConfluent
 
High performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyDataHigh performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyDataCarlos Andrés García
 
High performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyDataHigh performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyDataVMware Tanzu
 
NoSQL Databases Introduction - UTN 2013
NoSQL Databases Introduction - UTN 2013NoSQL Databases Introduction - UTN 2013
NoSQL Databases Introduction - UTN 2013Facundo Farias
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Databricks
 

Semelhante a Big Data Day LA 2016/ NoSQL track - Spark And Couchbase: Augmenting The Operational Database With Spark, Matt Ingenthron, Senior Director, Engineering, Couchbase (20)

Spark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with SparkSpark and Couchbase: Augmenting the Operational Database with Spark
Spark and Couchbase: Augmenting the Operational Database with Spark
 
Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)Azure Synapse Analytics Overview (r2)
Azure Synapse Analytics Overview (r2)
 
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdfDustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
DustinVannoy_DataPipelines_AzureDataConf_Dec22.pdf
 
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
Fast and Simplified Streaming, Ad-Hoc and Batch Analytics with FiloDB and Spa...
 
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft AzureOtimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
Otimizações de Projetos de Big Data, Dw e AI no Microsoft Azure
 
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, QlikKeeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
 
Spark Summit EU talk by Michael Nitschinger
Spark Summit EU talk by Michael NitschingerSpark Summit EU talk by Michael Nitschinger
Spark Summit EU talk by Michael Nitschinger
 
Introduction to ClustrixDB
Introduction to ClustrixDBIntroduction to ClustrixDB
Introduction to ClustrixDB
 
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
2014-10-20 Large-Scale Machine Learning with Apache Spark at Internet of Thin...
 
Kylin and Druid Presentation
Kylin and Druid PresentationKylin and Druid Presentation
Kylin and Druid Presentation
 
The Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache SparkThe Future of Hadoop: A deeper look at Apache Spark
The Future of Hadoop: A deeper look at Apache Spark
 
Glint with Apache Spark
Glint with Apache SparkGlint with Apache Spark
Glint with Apache Spark
 
Data Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKCData Pipeline for The Big Data/Data Science OKC
Data Pipeline for The Big Data/Data Science OKC
 
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetStreaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
 
High performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyDataHigh performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyData
 
High performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyDataHigh performance Spark distribution on PKS by SnappyData
High performance Spark distribution on PKS by SnappyData
 
Couch db
Couch dbCouch db
Couch db
 
NoSQL Databases Introduction - UTN 2013
NoSQL Databases Introduction - UTN 2013NoSQL Databases Introduction - UTN 2013
NoSQL Databases Introduction - UTN 2013
 
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
Vectorized Deep Learning Acceleration from Preprocessing to Inference and Tra...
 
Dev Ops Training
Dev Ops TrainingDev Ops Training
Dev Ops Training
 

Mais de Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 

Mais de Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Último

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Último (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Big Data Day LA 2016/ NoSQL track - Spark And Couchbase: Augmenting The Operational Database With Spark, Matt Ingenthron, Senior Director, Engineering, Couchbase

  • 1. SPARK AND COUCHBASE: AUGMENTING THE OPERATIONAL DATABASE WITH SPARK Matt Ingenthron Couchbase
  • 2. Agenda Why integrate Spark and NoSQL? Architectural alignment Integration “Points of Interest”  Automatic sharding and data locality  Streams: Data Replication and Spark Streaming  Predicate pushdown and global indexing  Flexible schemas and schema inference See it in action
  • 3. Why Spark and NoSQL?
  • 4. NoSQL + Spark use cases Operations Analysis NoSQL  Recommendations  Next gen data warehousing  Predictive analytics  Fraud detection  Catalog  Customer 360 + IOT  Personalization  Mobile applications
  • 5. Big Data at a Glance Couchbase Spark Hadoop Use cases • Operational • Web / Mobile • Analytics • Machine Learning • Analytics • Machine Learning Processing mode • Online • Ad Hoc • Ad Hoc • Batch • Streaming (+/-) • Batch • Ad Hoc (+/-) Low latency = < 1 ms ops Seconds Minutes Performance Highly predictable Variable Variable Users are typically… Millions of customers 100’s of analysts or data scientists 100’s of analysts or data scientists Memory-centric Memory-centric Disk-centric Big data = 10s of Terabytes Petabytes Petabytes ANALYTICALOPERATIONAL
  • 6. Use Case: Operationalize Analytics / ML Hadoop Examples: recommend content and products, spot fraud or spam Data scientists train machine learning models Load results into Couchbase so end users can interact with them online Machine Learning Models Data Warehouse Historical Data NoSQL
  • 7. Use Case: Operationalize ML Model NoSQL node(e.g.) Training Data (Observations) Serving Predictions
  • 8. Spark connects to everything… DCP KV N1QL Views Adapted from: Databricks – Not Your Father’s Database https://www.brighttalk.com/webcast/12891/196891
  • 9. Use Case #2: Data Integration RDBMSs3hdfs Elasticsearc h Data engineers query data in many systems w/ one language & runtime Store results where needed for further use Late binding of schemas NoSQL
  • 11. Full Text Search Search for and fetch the most relevant records given a freeform text string Key-Value Directly fetch / store a particular record Query Specify a set of criteria to retrieve relevant data records. Essential in reporting. Map-Reduce Views Maintain materialized indexes of data records, with reduce functions for aggregation Data Streaming Efficiently, quickly stream data records to external systems for further processing or integration
  • 12. Hash Partitioned Data Auto Sharding – Bucket And vBuckets A bucket is a logical, unique key space Each bucket has active & replica data sets  Each data set has 1024 virtual buckets (vBuckets)  Each vBucket contains 1/1024th of the data set  vBuckets have no fixed physical server location Mapping of vBuckets to physical servers is called the cluster map Document IDs (keys) always get hashed to the same vBucket Couchbase SDK’s lookup the vBucket  server mapping
  • 13. N1QL Query N1QL, pronounced “nickel”, is a SQL service with extensions specifically for JSON  Is stateless execution, however…  Uses Couchbase’s Global Secondary Indexes.  These are sorted structures, range partitioned.  Both can run on any nodes within the cluster. Nodes with differing services can be added and removed as needed.
  • 14. MapReduce Couchbase Views A JavaScript based, incremental Map-Reduce service for incrementally building sorted B+Trees.  Runs on every node, local to the data on that node, stored locally.  Automatically merge-sorted at query time.
  • 15. Data Streaming with DCP A general data streaming service, Database Change Protocol.  Allows for streaming all data out and continuing, or…  Stream just what is coming in at the time of connection, or…  Stream everything out for transfer/takeover…
  • 17. Key-Value Direct fetching/storing of a particular record. Query Specifying a set of criteria to retrieve relevant data records. Essential in reporting. Map-Reduce Views Maintain materialized indexes of data records, with reduce functions for aggregation. Data Streaming Efficiently, quickly stream data records to external systems for further processing or integration. Full Text Search Search for, and allow tuning of the system to fetch the most relevant records given a freeform search string.
  • 18. Automatic sharding and data locality Integration Points of Interest
  • 19. What happens in Spark Couchbase KV When 1 Spark node per CB node, the connector will use the cluster map and push down location hints  Helpful for situations where processing is intense, like transformation  Uses pipeline IO optimization However, not available for N1QL or Views  Round robin - can’t give location hints  Back end is scatter gather with 1 node responding
  • 20. Predicate pushdown and global indexing Integration Points of Interest
  • 21. SparkSQL on N1QL with Global Secondary Indexes TableScan Scan all of the data and return it PrunedScan Scan an index that matches only relevant data to the query at hand. PrunedFilteredScan Scan an index that matches only relevant data to the query at hand.
  • 22. Couchbase Cluster B-Tree (Global Secondary Index) Data Data Data Data Query Application Client
  • 23. Couchbase Cluster B-Tree (Global Secondary Index) Data Data Data Data Query Application Client
  • 25. Predicate pushdown Notes from implementing: Spark assumes it’s getting all the data, applies the predicates Future potential optimizations Push down all the things!  Aggregations  JOINs Looking at Catalyst engine extensions from SAP  But, it’s not backward compatible and…  …many data sources can only push down filters image courtesy http://allthefreethings.com/about/
  • 26. Streams: Data Replication and Spark Streaming Integration Points of Interest
  • 27. DCP and Spark Streaming Many system architectures rely upon streaming from the ‘operational’ data store to other systems  Lambda architecture => store everything and process/reprocess everything based on access  Command Query Responsibility Segregation - (CQRS)  Other reactive pattern derived systems and frameworks
  • 28. Documents flow into the system from outside Documents are then streamed down to consumers In most common cases, flows memory to memory DCP and Spark Streaming Couchbase Node DCP Consumer Other Cluster Nodes Spark Cluster
  • 29. See it in Action
  • 30. Couchbase Spark Connector 1.2 • Spark 1.6 support, including Datasets • Full DCP flow control support • Enhanced Java APIs • Bug fixes 34
  • 32. THANK YOU. @ingenthr Try Couchbase Spark Connector 1.2 http://www.couchbase.com/bigdata
  • 34. Deployment Topology Couchbase Spark Worker Couchbase Couchbase  Many small gets  Streaming with low mutation rate  Ad hoc Couchbase Data Node Data Node Spark Worker Couchbase Couchbase  Medium processing  Predictable workloads  Plenty of overhead on machines Couchbase Couchbase Couchbase Couchbase Couchbase Couchbase Data Node Data Node Spark Worker XDCR  Heaviest processing  Workload isolation writes

Notas do Editor

  1. CHECK THE ORDER
  2. Analysis side – includes various types of machine learning and analytics Often the data warehousing includes many different sources of data
  3. This is a popular use case for Spark because it provides the ability to go big and serve your predictions to a lot of users
  4. Generally speaking, there’s a thin layer like node.js that gets JSON from the NoSQL system and feeds it to the user. Imagine that user is doing something like shopping or listening to music. As they use the App, you need ridiculously low latency because they system has to do something based on that data. But you also want to send that user’s actions back to the Spark system or wherever .
  5. Couchbase advantages include: Fast: memory-centric, integrated cache, implicit batching from the SDK with async & flat map Dev Convenience: Native SDKs, automatic cluster management, code your app without reference to infrastructure Sophisticated: Query using SQL for JSON (N1QL), supports JOINs
  6. This is like federation
  7. TODO: Come up with a better title here. Goal is to lay out the services that are part of Couchbase and then talk about how they fit in with a spark deployment.
  8. Add lots of notes
  9. TODO: Merge ideas from the next slide (hidden)
  10. TODO: fix the colors
  11. On a given Spark worker node, we can optimize Internal to the Couchbase JVM Core, we pipeline operations Amortize the responses over many operations, meaning no cost for successful operations Efficient scheduling
  12. Future: use the union or differences of indexes for filtering down candidates. More powerful than traditional relational databases owing to indexing architecture.
  13. Surprise! We can push down predicates with an O(log n) lookup 2i – super awesome, that’s how awesome we are Relational DBs do this, and they’re not expecting this to be good. Instead, go to every node and go full retard
  14. Surprise! We can push down predicates with an O(log n) lookup 2i – super awesome, that’s how awesome we are Relational DBs do this, and they’re not expecting this to be good. Instead, go to every node and go full retard
  15. Spark always applies the predicates Spark expects it will always get the data and apply the filters If you’re building a similar system, turn off the flag so Spark doesn’t do needlessly re-apply the filters
  16. Optional but cool PERFORMANCE IMPLICATIONS!!! OMG KAFKA is most common Matt could show his mad science demo Limitation - Only backfill works, can’t do from point in time. This is a bug.
  17. Can’t currently shard across spark workers since there’s no way to see this topology
  18. Fully transparent cluster and bucket management, including direct access if needed