Managing Data Analytics in a Hybrid Cloud
Karan Singh
Sr. Solution Architect
Storage & Hyper-Converged Business Unit
Daniel Gilfix
Technical Marketing Manager
Storage & Hyper-Converged Business Unit
AGENDA
● CUSTOMER PAIN
● COMMON APPROACHES
● SHARED DATA LAKES
● HOW IT WORKS AND WHERE
● SUMMARY AND NEXT STEPS
CUSTOMER PAIN
CUSTOMER PAIN POINTS
EXPLOSIVE GROWTH in data analytics teams and analytic tools.
MULTIPLE TEAMS COMPETING for use of the same big data resources.
CONGESTION in busy analytic clusters, causing frustration and missed SLAs.
HADOOP
SPARKSQL
SPARK
HIVE
MAPREDUCE
PRESTO
IMPALA
KAFKA
NIFI
ETC.
RESULTING IN CUSTOMER CHOICES
#1 Get a bigger cluster for many teams to share.
#2 Give each team its own dedicated cluster, each with a copy of PBs of data.
#3 Give teams the ability to spin up and spin down clusters that share a common data store.
#3 ON-DEMAND ANALYTIC CLUSTERS WITH A SHARED DATA LAKE
HIT SERVICE-LEVEL AGREEMENTS: Give teams their own compute clusters.
ELIMINATE IDLE RESOURCES: By right-sizing de-coupled compute and storage.
BUY 10s OF PBs INSTEAD OF 100s: Share data sets across clusters instead of duplicating them.
INCREASE AGILITY: With spin-up/spin-down clusters.
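The capacity argument behind "buy 10s of PBs instead of 100s" can be illustrated with rough arithmetic. The sketch below is not taken from the deck: the dataset size, the number of teams, and the 4+2 erasure-coding profile are assumptions chosen only for illustration, and the 3x figure is the HDFS default replication factor.

```python
def raw_capacity_pb(dataset_pb: float, copies: int, overhead: float) -> float:
    """Raw capacity (PB) needed to hold `copies` full copies of a dataset,
    where `overhead` is the raw-to-usable ratio of the storage layer."""
    return dataset_pb * copies * overhead


# All values below are illustrative assumptions, not measurements.
DATASET_PB = 5.0         # usable size of the shared data set
TEAMS = 8                # one dedicated cluster per team in option #2
HDFS_REPLICATION = 3.0   # HDFS default replication factor
EC_OVERHEAD = 1.5        # 4+2 erasure coding: (4 + 2) / 4

# Option #2: every team keeps its own replicated copy.
dedicated = raw_capacity_pb(DATASET_PB, TEAMS, HDFS_REPLICATION)

# Option #3: one erasure-coded copy in a shared object store.
shared = raw_capacity_pb(DATASET_PB, 1, EC_OVERHEAD)

print(f"dedicated clusters: {dedicated:.1f} PB raw")
print(f"shared data lake:   {shared:.1f} PB raw")
print(f"ratio:              {dedicated / shared:.0f}x")
```

Under these assumptions the duplicated approach needs on the order of 100 PB of raw capacity while the shared approach stays under 10 PB; real ratios depend on how many copies exist and which data protection scheme is used.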
Red Hat data analytics infrastructure solution
Multi-tenant workload isolation with shared data context
[Diagram labels: BATCH JOBS (SLOW); STREAMING ANALYTICS; INTERACTIVE ANALYTICS; OTHER ANALYTICS; BATCH JOBS (FAST)]
DYNAMIC compute resources and clusters able to meet different SLAs
UNIFIED single object storage solution feeding analytics jobs
ELASTIC provisioning and release of compute resources required by various analytics jobs
BENEFITS - AGILITY AND $$$
● Faster answers through elastic provisioning via Red Hat OpenStack Platform (OSP) on shared data sets
● Fewer roadblocks for empowered users in self-service data labs / clusters
● Private/public cloud versatility with the S3A interface
● Reduced cost and risk from not duplicating and maintaining data sets
● CapEx relief by scaling storage independently of compute
HOW IT WORKS
GENERATION I: ANALYTICS
MONOLITHIC HADOOP STACKS
Analytics vendors provide single-purpose infrastructure
Analytics vendors provide analytics software
[Diagram: three separate ANALYTICS + INFRASTRUCTURE stacks]
GENERATION II: ANALYTICS
ELASTIC COMPUTE AND SHARED STORAGE CLOUDS
Analytics vendors provide analytics software
Red Hat provides cloud infrastructure software
Provisioned Compute Pool via OpenStack and OpenShift platforms
Shared Datasets on Red Hat Ceph Storage
MULTIPLE ANALYTIC CLUSTERS SHARING DATA
[Diagram labels: INGEST; ETL; INTERACTIVE QUERY; BATCH QUERY & JOINS]
ELASTIC COMPUTE RESOURCE POOL: Kafka, Hive/MapReduce, Presto, and Spark compute instances
SHARED DATA LAKE
SLA tiers: Platinum, Gold, Silver, Bronze
ANALYTIC WORKLOADS JOINING THE INFRA
[Diagram labels: storage silo; bare metal silo; virtualization infra; shared storage SAN; Red Hat private cloud infra; Red Hat private cloud object store; the rest of an enterprise’s apps; VMs today -> containers tomorrow]
MULTI TENANT WORKLOAD ISOLATION
With Shared Data Context
[Diagram labels: HADOOP CLUSTER 1, 2, 3 running HADOOP, SPARK, and SPARK/PRESTO, each with COMPUTE, STORAGE, WORKER, and HDFS TMP; platforms: BAREMETAL RHEL, OPENSTACK VM, OPENSHIFT CONTAINER; S3A/S3 access to RED HAT CEPH STORAGE]
COMMON ARCHITECTURAL MODEL - PUBLIC OR PRIVATE CLOUD
PUBLIC CLOUD (AWS)
Analytics: Hadoop, Presto, Spark
Provisioning: AWS EC2
Shared datasets: AWS S3
PRIVATE CLOUD (RHT)
Analytics: Hadoop, Presto, Spark
Provisioning: RED HAT® OPENSTACK PLATFORM
Shared datasets: RED HAT® CEPH S3
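Because both clouds expose shared datasets through an S3-compatible API, the same analytics job can in principle target either one by changing only the S3A client settings. The following is a minimal sketch of that idea, not the presenters' configuration: the `fs.s3a.*` property names are standard Hadoop S3A options, while the helper function, endpoint hostnames, bucket name, and credentials are hypothetical placeholders.

```python
from typing import Dict


def s3a_spark_conf(endpoint: str, access_key: str, secret_key: str,
                   path_style: bool, use_tls: bool = True) -> Dict[str, str]:
    """Build Spark properties that point the Hadoop S3A client at an
    S3-compatible endpoint (hypothetical helper, for illustration only)."""
    return {
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        "spark.hadoop.fs.s3a.path.style.access": str(path_style).lower(),
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_tls).lower(),
    }


# Public cloud: AWS S3 (placeholder credentials).
public = s3a_spark_conf("s3.amazonaws.com", "AWS_KEY", "AWS_SECRET",
                        path_style=False)

# Private cloud: a Ceph Object Gateway endpoint (hypothetical hostname).
# Path-style access is assumed here because no per-bucket DNS is set up.
private = s3a_spark_conf("rgw.example.internal:8080", "CEPH_KEY", "CEPH_SECRET",
                         path_style=True, use_tls=False)

# The dataset URI used by the job stays the same in both environments.
DATASET_URI = "s3a://shared-datasets/clickstream/"  # hypothetical bucket/path

for name, conf in (("public", public), ("private", private)):
    print(f"[{name}] reads {DATASET_URI}")
    for key in sorted(conf):
        # Avoid echoing secrets when printing the configuration.
        value = "***" if "secret" in key else conf[key]
        print(f"  {key}={value}")
```

In a real deployment the credentials would come from a credential provider rather than plain properties, and the resulting dictionary would be passed to the Spark session builder or to `spark-submit --conf`.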
FEATURES AND BENEFITS
MULTIPLE ANALYTIC CLUSTERS
• Enable teams to meet their individual SLAs without competing for resources.
SHARED DATA SETS
• Eliminate duplicate storage costs for multiple HDFS cluster silos.
• Eliminate OpEx costs and complexity for maintaining multiple copies of datasets for multiple HDFS cluster silos.
FAST PROVISIONING OF ANALYTIC CLUSTERS
• Unlocks Agility
• Enables Speed to Capability
ADVANCED ANALYTICS ON CEPH
MODERN BIG DATA ANALYTICS PIPELINE
Simplified Example
[Diagram: pipeline stages: DATA GENERATION; INGEST; DATA SCIENCE; MACHINE LEARNING; STREAM PROCESSING; TRANSFORM, MERGE, JOIN; DATA ANALYTICS]
MODERN BIG DATA ANALYTICS PIPELINE
KEY TERMINOLOGY
[Diagram: pipeline stages: DATA GENERATION; INGEST; DATA SCIENCE; MACHINE LEARNING; STREAM PROCESSING; TRANSFORM, MERGE, JOIN; DATA ANALYTICS]
• Sensors
• Click-stream
• Transactions
• Call-detail records
• NiFi
• Kafka • Presto
• Impala
• SparkSQL
• TensorFlow
• Kafka • Hadoop
• Spark
• Spark
• Hadoop
TESTED WITH CEPH OBJECT STORE
[Diagram: pipeline stages: DATA GENERATION; INGEST; DATA SCIENCE; MACHINE LEARNING; STREAM PROCESSING; TRANSFORM, MERGE, JOIN; DATA ANALYTICS]
• TPC-DS data sets
(structured)
• logsynth
(semi-structured)
• bulk load
• MapReduce • Impala
• Presto
• (not tested)
• SparkSQL
• Hive/MapReduce
• SparkSQL
• Hive/MapReduce
• (not tested)
TYPICAL SHARED DATA LAKE
PROJECT STAGES
IDENTIFY
• Potential fit?
QUALIFY
• 1-2 day workshop
• ID questions needing evidence
• Prioritize questions by value
• Design POC architecture
POC OR PILOT
• Answer questions
• Empirical results
• RHT Solution Engineering
• RHT Consulting
DEPLOYMENT
• Phased roll-out
• Red Hat Consulting
SUMMARY AND NEXT STEPS
KEY TAKEAWAYS
PROBLEMS
MISSED SLAs: Large Spark/Hadoop shops suffering from missed SLAs due to cluster congestion.
EXCESSIVE CAPEX AND OPEX due to multi-cluster solutions without shared data.
HOW YOU KNOW IT’S YOU
Do you do big data analytics on-premises?
Do you have multi-PB data sets?
Do you have multiple Spark/Hadoop clusters?
Do these Spark/Hadoop clusters need to share data sets?
Do you also have non-Spark/Hadoop tools that need access to these data sets?
RED HAT CONFIDENTIAL
ONE CUSTOMER’S UNSOLICITED TESTIMONY
“We managed to deliver tremendous value to our organization”:
● Releasing lock on data: moving the HDFS to an open access object store and opening
the data process to more processes and analysis.
● Releasing lock on compute: now we’re able to spin up and decommission compute
power according to customer needs and utilize cloud benefits (including GPU incorporation
in zero time and effort), without worrying about the data.
● Releasing lock on innovation: we can now allow anyone to try and build something new
without the fear of messing things up (data or cluster wise). We’ve built an environment that
can tolerant mistakes at all levels (process and data), and by doing so, our developers can be
much more daring.”
CUSTOMER SATISFACTION
“I’m delighted to announce that its been a few weeks since we’ve launched our Cloudoop*
offering to our customers, and it’s a huge success. The responses from our customers are
very, very positive, and I’m quoting “Big big like!!!”
This shift from the traditional approach is revolutionizing the way we consume and process
our data.”
---- Head of Cloud Infrastructure, government agency
(*Cloudoop is their Spark-as-a-service offering with an S3 backend, Spark by Cloudera and an S3 by Ceph)
RESOURCES
Summary-level blogs:
● Breaking down data silos with Red Hat infrastructure
● Why would companies do this?
● Will mainstream analytics jobs run directly against a Ceph object store?
● How much slower will they run than natively on HDFS?
Architect-level blogs:
● What about locality?
● Anatomy of the S3A filesystem client
● To the cloud!
● Storing tables in Ceph object storage
● Comparing with HDFS—TestDFSIO
● Comparing with remote HDFS—Hive
Testbench (SparkSQL)
● Comparing with local HDFS—Hive
Testbench (SparkSQL)
● Comparing with remote HDFS—Hive
Testbench (Impala)
● AI and machine learning workloads
THANK YOU

Mais conteúdo relacionado

Mais procurados

Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed_Hat_Storage
 
Ceph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightCeph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightColleen Corrice
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Sage Weil
 
QCT Fact Sheet-English
QCT Fact Sheet-EnglishQCT Fact Sheet-English
QCT Fact Sheet-EnglishPeggy Ho
 
Introduction into Ceph storage for OpenStack
Introduction into Ceph storage for OpenStackIntroduction into Ceph storage for OpenStack
Introduction into Ceph storage for OpenStackOpenStack_Online
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesKamesh Pemmaraju
 
Red Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph StorageRed Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph StorageRed_Hat_Storage
 
Architecting Ceph Solutions
Architecting Ceph SolutionsArchitecting Ceph Solutions
Architecting Ceph SolutionsRed_Hat_Storage
 
New use cases for Ceph, beyond OpenStack, Luis Rico
New use cases for Ceph, beyond OpenStack, Luis RicoNew use cases for Ceph, beyond OpenStack, Luis Rico
New use cases for Ceph, beyond OpenStack, Luis RicoCeph Community
 
Ceph and Openstack in a Nutshell
Ceph and Openstack in a NutshellCeph and Openstack in a Nutshell
Ceph and Openstack in a NutshellKaran Singh
 
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...Ian Colle
 
HKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM serversHKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM serversLinaro
 
Red Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference ArchitecturesRed Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference ArchitecturesRed_Hat_Storage
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsColleen Corrice
 
Drbd9 and drbdmanage_june_2016
Drbd9 and drbdmanage_june_2016Drbd9 and drbdmanage_june_2016
Drbd9 and drbdmanage_june_2016Philipp Reisner
 
SF Ceph Users Jan. 2014
SF Ceph Users Jan. 2014SF Ceph Users Jan. 2014
SF Ceph Users Jan. 2014Kyle Bader
 
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookLinux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookDanny Al-Gaaf
 
What you need to know about ceph
What you need to know about cephWhat you need to know about ceph
What you need to know about cephEmma Haruka Iwao
 

Mais procurados (19)

Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based HardwareRed hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
Red hat Storage Day LA - Designing Ceph Clusters Using Intel-Based Hardware
 
Ceph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer SpotlightCeph Deployment at Target: Customer Spotlight
Ceph Deployment at Target: Customer Spotlight
 
Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)Storage tiering and erasure coding in Ceph (SCaLE13x)
Storage tiering and erasure coding in Ceph (SCaLE13x)
 
QCT Fact Sheet-English
QCT Fact Sheet-EnglishQCT Fact Sheet-English
QCT Fact Sheet-English
 
Introduction into Ceph storage for OpenStack
Introduction into Ceph storage for OpenStackIntroduction into Ceph storage for OpenStack
Introduction into Ceph storage for OpenStack
 
New Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference ArchitecturesNew Ceph capabilities and Reference Architectures
New Ceph capabilities and Reference Architectures
 
Red Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph StorageRed Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph Storage
 
Architecting Ceph Solutions
Architecting Ceph SolutionsArchitecting Ceph Solutions
Architecting Ceph Solutions
 
New use cases for Ceph, beyond OpenStack, Luis Rico
New use cases for Ceph, beyond OpenStack, Luis RicoNew use cases for Ceph, beyond OpenStack, Luis Rico
New use cases for Ceph, beyond OpenStack, Luis Rico
 
Ceph and Openstack in a Nutshell
Ceph and Openstack in a NutshellCeph and Openstack in a Nutshell
Ceph and Openstack in a Nutshell
 
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...
What is a Ceph (and why do I care). OpenStack storage - Colorado OpenStack Me...
 
HKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM serversHKG15-401: Ceph and Software Defined Storage on ARM servers
HKG15-401: Ceph and Software Defined Storage on ARM servers
 
Red Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference ArchitecturesRed Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference Architectures
 
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and ContributionsCeph on Intel: Intel Storage Components, Benchmarks, and Contributions
Ceph on Intel: Intel Storage Components, Benchmarks, and Contributions
 
Drbd9 and drbdmanage_june_2016
Drbd9 and drbdmanage_june_2016Drbd9 and drbdmanage_june_2016
Drbd9 and drbdmanage_june_2016
 
SF Ceph Users Jan. 2014
SF Ceph Users Jan. 2014SF Ceph Users Jan. 2014
SF Ceph Users Jan. 2014
 
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and OutlookLinux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
Linux Stammtisch Munich: Ceph - Overview, Experiences and Outlook
 
What you need to know about ceph
What you need to know about cephWhat you need to know about ceph
What you need to know about ceph
 
librados
libradoslibrados
librados
 

Semelhante a Manage Data Analytics in Hybrid Cloud with Shared Data Lakes

Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsScyllaDB
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceeRic Choo
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Anant Corporation
 
Accelerate Big Data Application Development with Cascading
Accelerate Big Data Application Development with CascadingAccelerate Big Data Application Development with Cascading
Accelerate Big Data Application Development with CascadingCascading
 
Lambda architecture with Spark
Lambda architecture with SparkLambda architecture with Spark
Lambda architecture with SparkVincent GALOPIN
 
Elasticsearch + Cascading for Scalable Log Processing
Elasticsearch + Cascading for Scalable Log ProcessingElasticsearch + Cascading for Scalable Log Processing
Elasticsearch + Cascading for Scalable Log ProcessingCascading
 
Democratization of Data @Indix
Democratization of Data @IndixDemocratization of Data @Indix
Democratization of Data @IndixManoj Mahalingam
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impalamarkgrover
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsAli Hodroj
 
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...Databricks
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 editionDavid Talby
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data AnalyticsAmazon Web Services
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data ArchitecturesLynn Langit
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Unlock the value of big data with the DX2000 from NEC
Unlock the value of big data with the DX2000 from NECUnlock the value of big data with the DX2000 from NEC
Unlock the value of big data with the DX2000 from NECPrincipled Technologies
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?OVHcloud
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupVictor Coustenoble
 
SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017SnappyData
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiFelicia Haggarty
 

Semelhante a Manage Data Analytics in Hybrid Cloud with Shared Data Lakes (20)

Developing Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data PlatformsDeveloping Enterprise Consciousness: Building Modern Open Data Platforms
Developing Enterprise Consciousness: Building Modern Open Data Platforms
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
Scaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data ScienceScaling up with Cisco Big Data: Data + Science = Data Science
Scaling up with Cisco Big Data: Data + Science = Data Science
 
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
Data Engineer's Lunch #82: Automating Apache Cassandra Operations with Apache...
 
Accelerate Big Data Application Development with Cascading
Accelerate Big Data Application Development with CascadingAccelerate Big Data Application Development with Cascading
Accelerate Big Data Application Development with Cascading
 
Lambda architecture with Spark
Lambda architecture with SparkLambda architecture with Spark
Lambda architecture with Spark
 
Elasticsearch + Cascading for Scalable Log Processing
Elasticsearch + Cascading for Scalable Log ProcessingElasticsearch + Cascading for Scalable Log Processing
Elasticsearch + Cascading for Scalable Log Processing
 
Democratization of Data @Indix
Democratization of Data @IndixDemocratization of Data @Indix
Democratization of Data @Indix
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
Productionizing Machine Learning with Apache Spark, MLflow and ONNX from the ...
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 
Data & Analytics - Session 1 - Big Data Analytics
Data & Analytics - Session 1 -  Big Data AnalyticsData & Analytics - Session 1 -  Big Data Analytics
Data & Analytics - Session 1 - Big Data Analytics
 
Cloud Big Data Architectures
Cloud Big Data ArchitecturesCloud Big Data Architectures
Cloud Big Data Architectures
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Unlock the value of big data with the DX2000 from NEC
Unlock the value of big data with the DX2000 from NECUnlock the value of big data with the DX2000 from NEC
Unlock the value of big data with the DX2000 from NEC
 
How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?How to scale your PaaS with OVH infrastructure?
How to scale your PaaS with OVH infrastructure?
 
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetupDataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
DataStax - Analytics on Apache Cassandra - Paris Tech Talks meetup
 
SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017SnappyData Toronto Meetup Nov 2017
SnappyData Toronto Meetup Nov 2017
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
 

Último

Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023ymrp368
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlCall Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girl
Call Girls 🫤 Dwarka ➡️ 9711199171 ➡️ Delhi 🫦 Two shot with one girlkumarajju5765
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 

Último (20)

Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx

Manage Data Analytics in Hybrid Cloud with Shared Data Lakes

  • 1. Managing Data Analytics in a Hybrid Cloud. Karan Singh, Sr. Solution Architect, Storage & Hyper-Converged Business Unit. Daniel Gilfix, Technical Marketing Manager, Storage & Hyper-Converged Business Unit.
  • 2. AGENDA ● CUSTOMER PAIN ● COMMON APPROACHES ● SHARED DATA LAKES ● HOW IT WORKS AND WHERE ● SUMMARY AND NEXT STEPS
  • 4. CUSTOMER PAIN POINTS: EXPLOSIVE GROWTH in data analytics teams and analytic tools. MULTIPLE TEAMS COMPETING for use of the same big data resources. CONGESTION in busy analytic clusters, causing frustration and missed SLAs. Tools shown: HADOOP, SPARKSQL, SPARK, HIVE, MAPREDUCE, PRESTO, IMPALA, KAFKA, NIFI, ETC.
  • 5. RESULTING IN CUSTOMER CHOICES: #1 Get a bigger cluster for many teams to share. #2 Give each team its own dedicated cluster, each with a copy of PBs of data. #3 Give teams the ability to spin up and spin down clusters that share a common data store.
  • 6. #3 ON-DEMAND ANALYTIC CLUSTERS WITH A SHARED DATA LAKE. HIT SERVICE-LEVEL AGREEMENTS: Give teams their own compute clusters. ELIMINATE IDLE RESOURCES: By right-sizing de-coupled compute and storage. BUY 10s OF PBs INSTEAD OF 100s: Share data sets across clusters instead of duplicating them. INCREASE AGILITY: With spin-up/spin-down clusters.
  • 7. Red Hat data analytics infrastructure solution: Multi-tenant workload isolation with shared data context. Workloads shown: BATCH JOBS (SLOW), STREAMING ANALYTICS, INTERACTIVE ANALYTICS, OTHER ANALYTICS, BATCH JOBS (FAST). DYNAMIC compute resources and clusters able to meet different SLAs. UNIFIED single object storage solution feeding analytics jobs. ELASTIC provisioning and release of compute resources required by various analytics jobs.
  • 8. BENEFITS - AGILITY AND $$$ ● Faster answers through elastic provisioning via OSP on shared data sets ● Fewer roadblocks for empowered users in self-service data labs / clusters ● Private/public cloud versatility with S3A interface ● Reduced cost and risk from not duplicating and maintaining data sets ● CapEx relief by scaling storage independent from compute
  • 10. GENERATION I ANALYTICS: MONOLITHIC HADOOP STACKS. Analytics vendors provide single-purpose infrastructure. Analytics vendors provide analytics software. [Diagram: three separate ANALYTICS + INFRASTRUCTURE stacks]
  • 11. GENERATION II ANALYTICS: ELASTIC COMPUTE AND SHARED STORAGE CLOUDS. Analytics vendors provide analytics software. Red Hat provides cloud infrastructure software. Provisioned Compute Pool via OpenStack and OpenShift platforms. Shared Datasets on Red Hat Ceph Storage.
  • 12. MULTIPLE ANALYTIC CLUSTERS SHARING DATA. Workloads: INGEST, ETL, INTERACTIVE QUERY, BATCH QUERY & JOINS. ELASTIC COMPUTE RESOURCE POOL: Kafka compute instances, Hive/MapReduce compute instances, Presto compute instances, Spark compute instances. SHARED DATA LAKE. SLA tiers: Platinum SLA, Gold SLA, Silver SLA, Bronze SLA.
  • 13. ANALYTIC WORKLOADS JOINING THE INFRA. [Diagram labels: storage silo; bare metal silo; virtualization infra; shared storage SAN; Red Hat private cloud infra; Red Hat private cloud object store; the rest of an enterprise’s apps; VMs; VMs today -> containers tomorrow]
  • 14. MULTI TENANT WORKLOAD ISOLATION With Shared Data Context. [Diagram labels: numbered clusters 1, 2, 3; HADOOP CLUSTER; WORKER; BAREMETAL RHEL; OPENSTACK VM; OPENSHIFT CONTAINER; HADOOP, SPARK, SPARK/PRESTO, each with HDFS TMP; COMPUTE and STORAGE layers; S3A and S3A/S3 access to RED HAT CEPH STORAGE]
  • 15. COMMON ARCHITECTURAL MODEL - PUBLIC OR PRIVATE CLOUD. PUBLIC CLOUD (AWS): AWS EC2 PROVISIONING; AWS S3 SHARED DATASETS; Hadoop, Presto, Spark. PRIVATE CLOUD (RHT): RED HAT® OPENSTACK PLATFORM PROVISIONING; RED HAT® CEPH S3 SHARED DATASETS; Hadoop, Presto, Spark.
  • 16. FEATURES AND BENEFITS. MULTIPLE ANALYTIC CLUSTERS • Enable teams to meet their individual SLAs without competing for resources. SHARED DATA SETS • Eliminate duplicate storage costs for multiple HDFS cluster silos. • Eliminate OpEx costs and complexity for maintaining multiple copies of datasets for multiple HDFS cluster silos. FAST PROVISIONING OF ANALYTIC CLUSTERS • Unlocks Agility • Enables Speed to Capability
  • 18. MODERN BIG DATA ANALYTICS PIPELINE (Simplified Example). Stages: DATA GENERATION; INGEST; DATA SCIENCE; MACHINE LEARNING; STREAM PROCESSING; TRANSFORM, MERGE, JOIN; DATA ANALYTICS.
  • 19. MODERN BIG DATA ANALYTICS PIPELINE: KEY TERMINOLOGY. Stages: DATA GENERATION; INGEST; DATA SCIENCE; MACHINE LEARNING; STREAM PROCESSING; TRANSFORM, MERGE, JOIN; DATA ANALYTICS. Examples as listed on the slide: • Sensors • Click-stream • Transactions • Call-detail records • NiFi • Kafka • Presto • Impala • SparkSQL • TensorFlow • Kafka • Hadoop • Spark • Spark • Hadoop
  • 20. TESTED WITH CEPH OBJECT STORE. Stages: DATA GENERATION; INGEST; DATA SCIENCE; MACHINE LEARNING; STREAM PROCESSING; TRANSFORM, MERGE, JOIN; DATA ANALYTICS. Entries as listed on the slide: • TPC-DS data sets (structured) • logsynth (semi-structured) • bulk load • MapReduce • Impala • Presto • (not tested) • SparkSQL • Hive/MapReduce • SparkSQL • Hive/MapReduce • (not tested)
  • 21. TYPICAL SHARED DATA LAKE PROJECT STAGES. IDENTIFY: • Potential fit? QUALIFY: • 1-2 day workshop • ID questions needing evidence • Prioritize questions by value • Design POC architecture. POC OR PILOT: • Answer questions • Empirical results • RHT Solution Engineering • RHT Consulting. DEPLOYMENT: • Phased roll-out • Red Hat Consulting
  • 23. KEY TAKEAWAYS. PROBLEMS: MISSED SLAs: Large Spark/Hadoop shops suffering from missed SLAs due to cluster congestion. EXCESSIVE CAPEX AND OPEX due to multi-cluster solutions without shared data. HOW YOU KNOW IT’S YOU: Do you do big data analytics on-premises? Do you have multi-PB data sets? Do you have multiple Spark/Hadoop clusters? Do these Spark/Hadoop clusters need to share data sets? Do you also have non-Spark/Hadoop tools that need access to these data sets?
  • 24. RED HAT CONFIDENTIAL. ONE CUSTOMER’S UNSOLICITED TESTIMONY: “We managed to deliver tremendous value to our organization: ● Releasing lock on data: moving the HDFS to an open access object store and opening the data process to more processes and analysis. ● Releasing lock on compute: now we’re able to spin up and decommission compute power according to customer needs and utilize cloud benefits (including GPU incorporation in zero time and effort), without worrying about the data. ● Releasing lock on innovation: we can now allow anyone to try and build something new without the fear of messing things up (data or cluster wise). We’ve built an environment that can tolerate mistakes at all levels (process and data), and by doing so, our developers can be much more daring.”
  • 25. CUSTOMER SATISFACTION: “I’m delighted to announce that it’s been a few weeks since we’ve launched our Cloudoop* offering to our customers, and it’s a huge success. The responses from our customers are very, very positive, and I’m quoting: ‘Big big like!!!’ This shift from the traditional approach is revolutionizing the way we consume and process our data.” Head of Cloud Infrastructure, government agency. (*Cloudoop is their Spark-as-a-service offering with an S3 backend: Spark by Cloudera and S3 by Ceph.)
  • 26. RESOURCES. Summary-level blogs: ● Breaking down data silos with Red Hat infrastructure ● Why would companies do this? ● Will mainstream analytics jobs run directly against a Ceph object store? ● How much slower will they run than natively on HDFS? Architect-level blogs: ● What about locality? ● Anatomy of the S3A filesystem client ● To the cloud! ● Storing tables in Ceph object storage ● Comparing with HDFS—TestDFSIO ● Comparing with remote HDFS—Hive Testbench (SparkSQL) ● Comparing with local HDFS—Hive Testbench (SparkSQL) ● Comparing with remote HDFS—Hive Testbench (Impala) ● AI and machine learning workloads
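
The common architectural model in slide 15 (the same analytics engines reading shared data sets over S3A, whether the object store is AWS S3 or Red Hat Ceph) can be illustrated with a small Python sketch. It is only a sketch under stated assumptions, not the presenters' configuration: the endpoint hosts, bucket name, and helper names (`ObjectStore`, `s3a_spark_conf`, `dataset_uri`) are hypothetical. The `fs.s3a.*` keys are standard Hadoop S3A client properties, and the `spark.hadoop.` prefix is the usual way Spark forwards Hadoop options.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectStore:
    """An S3-compatible endpoint that holds the shared data sets."""
    endpoint: str      # hypothetical host[:port] of the S3 API
    path_style: bool   # True = bucket name in the URL path, not the hostname
    use_ssl: bool      # whether the client connects over TLS


# Hypothetical endpoints, for illustration only.
PUBLIC_AWS = ObjectStore("s3.amazonaws.com", path_style=False, use_ssl=True)
PRIVATE_CEPH = ObjectStore("rgw.example.internal:8080", path_style=True, use_ssl=False)


def s3a_spark_conf(store: ObjectStore, access_key: str, secret_key: str) -> dict:
    """Build Spark settings that point the S3A client at the given store."""
    hadoop = {
        "fs.s3a.endpoint": store.endpoint,
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
        "fs.s3a.path.style.access": str(store.path_style).lower(),
        "fs.s3a.connection.ssl.enabled": str(store.use_ssl).lower(),
    }
    # Spark passes Hadoop properties through when prefixed with "spark.hadoop.".
    return {f"spark.hadoop.{key}": value for key, value in hadoop.items()}


def dataset_uri(bucket: str, key: str) -> str:
    """URI that every cluster uses for the same shared data set."""
    return f"s3a://{bucket}/{key}"
```

Under these assumptions, `s3a_spark_conf(PRIVATE_CEPH, ...)` and `s3a_spark_conf(PUBLIC_AWS, ...)` produce the same set of keys with different values, and a job on either side would read `dataset_uri("datalake", "tpcds/store_sales")` without copying the data into a cluster-local HDFS.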