1© Cloudera, Inc. All rights reserved.
Data Engineering: Elastic, Low-Cost Data Processing in the Cloud
David Tishgart | Product Marketing | Cloudera
Kaushik Deka | CTO | Novantas
Three Core Enterprise Workload Patterns
● Data Engineering & Data Science: process data, develop & serve predictive models
● Analytic Database: ELT, reporting, exploratory business intelligence
● Operational Database: build data-driven applications to deliver real-time insights
Multi-Storage, Multi-Environment
Data Engineering in the Cloud
Across industries, data engineering and
data science are a natural fit for the cloud:
● Data growth: More data is being created in the cloud
● Transient workloads: Development/test and exploration; batch ETL, model training and scoring
● Flexibility: Optimize infrastructure for the job; self-service for data engineers and data scientists
● Lower TCO: Do more with less
Cloudera’s Data Engineering Solution
• Partner integrations: familiar tools for data science & data integration
• Search: interactive search and immediate exploration
• Navigator: audit, lineage, encryption, key management, & policy lifecycles
• Cloud Deployment: easy deployment and flexible scaling
• Spark: modern real-time analytics engine
• Hive-on-Spark: large-scale ETL & batch processing engine
Traditional Data Engineering for ETL
Ingest data sources (any source and format)
• Unstructured data
• Structured data
• Social data
• Machine data
• IoT
Transform and combine data with SLAs (large-scale data processing)
• Stream or batch
• Choice of engine: Spark, Hive on Spark, MapReduce
• Meet SLAs: resource management, fast processing
Processed data is consumed by (batch or stream pipelines)...
• Analytic engines
• Real-time applications
• External storage systems
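The ingest, transform, and load stages on this slide can be illustrated with a small, self-contained sketch. It uses only the Python standard library as a stand-in for Spark, Hive on Spark, or MapReduce; the two inline sources, the field names `account_id` and `amount`, and the per-account aggregation are hypothetical choices made for illustration.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical raw inputs standing in for real sources:
# one structured source (CSV) and one semi-structured source (JSON lines).
CSV_SOURCE = """account_id,amount
A1,100.50
A2,-20.00
A1,35.25
"""

JSON_SOURCE = """{"account_id": "A2", "amount": "12.75"}
{"account_id": "A3", "amount": "not-a-number"}
{"account_id": "A3", "amount": "5"}
"""


def ingest_csv(text):
    """Ingest: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(text))


def ingest_json_lines(text):
    """Ingest: yield one dict per non-blank JSON line."""
    for line in text.splitlines():
        if line.strip():
            yield json.loads(line)


def transform(records):
    """Transform: cast amounts to float and drop records that fail validation."""
    for rec in records:
        try:
            yield rec["account_id"], float(rec["amount"])
        except (KeyError, ValueError):
            continue  # a production pipeline would quarantine these instead


def load(rows):
    """Load: aggregate net flow per account (a stand-in for writing a table)."""
    totals = defaultdict(float)
    for account_id, amount in rows:
        totals[account_id] += amount
    return dict(totals)


def run_pipeline():
    combined = list(ingest_csv(CSV_SOURCE)) + list(ingest_json_lines(JSON_SOURCE))
    return load(transform(combined))


if __name__ == "__main__":
    print(run_pipeline())  # {'A1': 135.75, 'A2': -7.25, 'A3': 5.0}
```

On a real cluster each stage would typically be expressed as engine operations (for example DataFrame reads, filters, and aggregations) so the same logic scales out; the generator chain here only shows the shape of the pipeline.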
Data Engineering for Machine Learning Workloads
Raw data (many sources, many formats, varying validity)
→ Data Engineering: cleaning, merging, filtering
→ Well-formatted data: training, validation, and test data
→ Data Science: model building, model training, hyper-param tuning
→ Validated ML models
→ Data Engineering: pipeline execution, production operation
→ End user: consumption for analysis
Ongoing data ingestion continuously feeds new raw data into the pipeline.
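Producing training, validation, and test data is one of the data engineering steps in this flow. A common technique (assumed here, not taken from the slide) is a deterministic hash-based split, so reruns and newly ingested data never move an existing record between splits. The 80/10/10 fractions and the `customer-N` ids are hypothetical.

```python
import hashlib


def assign_split(record_id, train=0.8, validation=0.1):
    """Assign a record to 'train', 'validation' or 'test' from a hash of its id.

    Hashing (instead of random sampling) keeps each record in the same split
    when the pipeline is rerun or new records are ingested.
    """
    digest = hashlib.sha256(record_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # first 32 bits mapped to [0, 1]
    if bucket < train:
        return "train"
    if bucket < train + validation:
        return "validation"
    return "test"


def split_records(record_ids, train=0.8, validation=0.1):
    """Group record ids into the three splits."""
    splits = {"train": [], "validation": [], "test": []}
    for record_id in record_ids:
        splits[assign_split(record_id, train, validation)].append(record_id)
    return splits


if __name__ == "__main__":
    ids = [f"customer-{i}" for i in range(1000)]  # hypothetical record ids
    parts = split_records(ids)
    print({name: len(members) for name, members in parts.items()})
```

Splitting on a customer id rather than on individual rows keeps all of one customer's history in a single split, which avoids leaking information between training and test data.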
Requirements for Data Engineering
Portability, flexibility, and an end-to-end enterprise platform
• Transience for flexibility, lower TCO and risk
• Unified platform, from ingest to insight and action
• Hybrid support for multiple environments
Diagram: a COMPUTE layer over a STORE layer backed by an object store
Benefits of Data Engineering with Cloudera
Lower TCO and increased flexibility on a trusted enterprise data platform
Increased Convenience | On a Common Platform | Lower Cost
Multi-cloud
• Shop across providers: Amazon, Google, Microsoft
Deliver On-Demand
• Immediate access to large compute with fast cluster provisioning
• Self-service for developers
Optimize and Isolate
• Tailor infrastructure for the job
• Run different software versions
• Enable more experimentation with less opportunity cost
Build Complete Data Apps
• Ingest, stream, process, explore, analyze, model, and serve on the same platform
• Shared data with object store integration
• Cluster metadata persistence
• Common compliance-ready security and governance frameworks
Manage Costs
• Transience for dev/test, ETL, and data science
• Usage-based pricing
• Spot instance support
Data Engineering in the Cloud
Three Architectural Patterns to Optimize Price, Performance, Convenience
Transient Batch (most flexible)
Spin up batch clusters as needed, reading from and writing to object storage.
● On-demand/spot instances
● Usage-based pricing
● Sized for workload
● Cluster per tenant/user
Persistent Batch (most control)
Persistent cluster(s) for frequent ETL, running batch jobs against object storage.
● Reserved instances
● Node-based pricing
● Grow/shrink
● Cluster per tenant group
Persistent Batch on HDFS (fastest)
Top performance for frequent ETL, with data kept in HDFS on a persistent cluster that runs the batch jobs.
● Reserved instances
● Node-based pricing
● Grow/shrink
● Shared across tenant groups
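The price trade-off between usage-based pricing (transient batch) and node-based pricing (persistent batch) can be made concrete with simple arithmetic. This sketch assumes both patterns use the same node count and hourly rate, a 730-hour month, and hypothetical discount parameters; it ignores real cloud prices, spot interruptions, cluster start-up time, and storage costs.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def transient_monthly_cost(runs_per_month, hours_per_run, nodes, hourly_rate, spot_discount=0.0):
    """Usage-based pricing: nodes are billed only while a transient cluster runs a job."""
    return runs_per_month * hours_per_run * nodes * hourly_rate * (1 - spot_discount)


def persistent_monthly_cost(nodes, hourly_rate, reserved_discount=0.0, hours=HOURS_PER_MONTH):
    """Node-based pricing: every node is billed for the whole month, busy or idle."""
    return nodes * hourly_rate * hours * (1 - reserved_discount)


def break_even_runs(hours_per_run, reserved_discount=0.0, spot_discount=0.0, hours=HOURS_PER_MONTH):
    """Runs per month above which the persistent cluster becomes cheaper.

    Node count and hourly rate cancel out because both patterns are assumed
    to use the same values.
    """
    return hours * (1 - reserved_discount) / (hours_per_run * (1 - spot_discount))


if __name__ == "__main__":
    # Hypothetical workload: 30 two-hour ETL runs per month on 10 nodes at $0.50/node-hour.
    print(transient_monthly_cost(30, 2, 10, 0.50))  # 300.0
    print(persistent_monthly_cost(10, 0.50))        # 3650.0
    print(break_even_runs(2))                       # 365.0
```

Under these assumptions, infrequent or bursty work favors transient clusters, while work that keeps nodes busy most of the month favors reserved, persistent capacity.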
Data Engineering for Customer Journey Analytics and Scoring in Financial Services
Kaushik Deka, CTO, Novantas
Novantas is the leader in customer science and revenue strategies for the financial industry through analytics that leverage data, advice and technology
• 2016 FinTech 100 company based in Manhattan (NYC), providing Pricing, Distribution, Treasury/Risk and Marketing Solutions in retail banking
• Expert practice leaders work daily with CEOs and functional heads around the globe
• Decision support and analytic platforms help top-20 US banks manage over US$1.5 trillion of deposits
• Center of excellence in Big Data Analytics in Consumer Banking
Novantas has developed leading-edge analytics and modeling capabilities to help banks improve their performance at all stages of the customer journey
Novantas Supports Banks in Understanding All Stages of the Customer Journey
Stages: Acquisition; Activation and Engagement; Maturity; Senescence
Capabilities across these stages:
• Customer Segmentation
• Customer Targeting
• Channel Optimization
• Offer Optimization
• Activation Propensity
• Customer Potential Value
• Deposit Modelling
• Promotional Optimization
• Usage Optimization
• Attrition Propensity
• Retention Campaign Targeting
• Customer Lifetime Value
• Cross-sell and Upsell Modelling
• Revenue Optimization
• Primacy/Exclusivity Optimization
One of our unique contributions to customer journey analytics is in customer scoring, particularly metrics that determine the potential value of a bank's customers
Core Elements of Customer Potential Value Calculation
Current Profitability: calculation of the current value contribution of a customer
Sample complexities:
• Differentiation and appropriate valuation of deposits (core, promotional)
• Scope of calculation (e.g. current account only, bank only, full relationship) and, if less than the full relationship, accounting for additional profits/costs generated elsewhere
Over-Time Account Potential (CLV focus): assessment of the customer's value contribution longitudinally (over time)
Sample complexities:
• Estimation of future account usage patterns (balances, transactional behavior, savings/borrowing requirements, etc.)
• Duration of calculation (lifetime, 10-year, 5-year, etc.)
Across-Wallet Account Potential (CPV focus): assessment of the customer's value contribution latitudinally (across wallet)
Sample complexities:
• Estimation of the current off-us wallet (balances and value to the bank)
• Scope (e.g. checking only, bank only, full relationship)
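To show why the duration of the calculation matters for over-time account potential, here is the generic textbook discounted-margin form of customer lifetime value. It is not Novantas's method: the constant annual margin, constant retention probability, and discount rate are simplifying assumptions, and the example customer is hypothetical.

```python
def customer_lifetime_value(annual_margin, retention_rate, discount_rate, years):
    """Textbook discounted customer lifetime value over a fixed horizon.

    CLV = sum over t = 1..years of  m * r**(t - 1) / (1 + d)**t
    where m is the (assumed constant) annual margin, r the annual retention
    probability and d the discount rate. The customer is assumed active in
    year 1 and to survive each later year with probability r.
    """
    if not 0.0 <= retention_rate <= 1.0:
        raise ValueError("retention_rate must be between 0 and 1")
    return sum(
        annual_margin * retention_rate ** (t - 1) / (1 + discount_rate) ** t
        for t in range(1, years + 1)
    )


if __name__ == "__main__":
    # Hypothetical customer: $200 margin per year, 85% retention, 8% discount rate.
    # A long horizon (30 years) is used here as a rough proxy for "lifetime".
    for horizon in (5, 10, 30):
        print(horizon, round(customer_lifetime_value(200.0, 0.85, 0.08, horizon), 2))
```

Because later years are both discounted and weighted by survival probability, the 5-year, 10-year, and lifetime figures converge, and the choice of horizon mostly affects customers with high retention.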
The data engineering challenges that underpin the development of these scores are immense
ILLUSTRATIVE SCORING CALCULATION COMPLEXITIES
There are literally thousands of stratified variables that could potentially go into the calculations…
• Basic variables (Average Daily Balance, Number of Deposits/Cycle, Average Non-Bill Transfer Out Value, etc.)
• First-order derivatives (Rate of Change in Average Daily Balances, Rate of Change in Monthly Branch Transactions, etc.)
• Second-order derivatives (Rate of Change in First-Order Derivative Variables, e.g. Rate of Change in Rate of Change in Average Daily Balances)
• etc.
…Being able to rapidly perform feature engineering at scale on large data sets is essential given the large number of variables that must be evaluated
Choosing these variables can also be affected by the curated data available…
• Understanding sources of useful data
• Knowing how to parse difficult data into usable information
• Knowing which data can be substituted with more readily available sources
• Experience in mapping multiple data sources to a semantically integrated financial data model
…Being able to curate and map a wide range of data sources and types to a standard data model ensures data integrity and allows data scientists to spend more time on modeling rather than data wrangling
Even when data is available, interpreting it is complicated and metric/model definitions can change…
• Transactional clues: frequency of payment, consistency of amount, relation of payment amount to account balances, etc.
• Account clues: existence of a mortgage at the bank, existence of a credit card at the bank, presence of utility payments from the account, presence of income payments, etc.
…Being able to govern business metadata and track model performance is essential to the ongoing application of the metrics/scores
Example from a customer statement: a $1,467.32 payment to Bank X. Is it a mortgage? Savings? A transfer to a secondary (or primary?) account? A credit card payment?
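The three tiers of variables on this slide (basic variables, first-order derivatives, second-order derivatives) can be illustrated for one variable, average daily balance. This sketch assumes a hypothetical input layout (one list of end-of-day balances per month) and uses absolute month-over-month differences as the "rate of change"; a relative change would be an equally valid choice. At scale, the same derivations would typically run as distributed jobs, for example with Spark window functions such as `lag`.

```python
from statistics import mean


def first_difference(series):
    """Change from one period to the next (len(series) - 1 values)."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def derive_balance_features(daily_balances_by_month):
    """Build a basic variable and its first- and second-order derivatives.

    daily_balances_by_month: one list of end-of-day balances per month
    (a hypothetical input layout).
    """
    adb = [mean(month) for month in daily_balances_by_month]  # basic: average daily balance
    adb_change = first_difference(adb)                        # first-order derivative
    adb_change_in_change = first_difference(adb_change)       # second-order derivative
    return {
        "adb": adb,
        "adb_change": adb_change,
        "adb_change_in_change": adb_change_in_change,
    }


if __name__ == "__main__":
    months = [[100, 100, 100], [150, 150, 150], [250, 250, 250], [300, 300, 300]]
    print(derive_balance_features(months))
```

Applying the same pattern to every basic variable and every stratification (product, channel, time window) is what multiplies a handful of raw fields into thousands of candidate features.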
Our Customer Analytics Platform, built on CDH 5.8, leverages Spark on YARN and is engineered for high-performance analytics on both AWS and private cloud
Ecosystem of Applications (Domain Specific): fit-for-purpose end-user apps for BI/Reporting, Scoring/Campaigns, Scenario/Optimization, Forecasting, and Rate/Offer Delivery
MetricScape Scoring Workbench: Metadata Governance, Metrics Library Management, Analytic Dataset Generation; publishes metadata to Navigator and provides APIs for operationalizing predictive models
Analytics Database
Analytics Operating System (Spark on YARN)
Customer Data Hub (Hybrid Cloud on HDFS): Novantas Banking Data Model on Spark/Hadoop (CDH)
Internal and External Data Sources: internal bank, 3rd-party (e.g., competitor pricing), public domain, and Novantas proprietary data
A banking ontology stored in a containerized format on HDFS enables efficient data processing
Our MetricScape scoring workbench has built-in metadata governance and code-gen capability and leverages Spark, Navigator, Search, Hue, and Impala
Manage Metrics/Scores Library
• Create/modify metric definitions using the Novantas Spark API for Banking
• Capture metadata and publish to Navigator
• Faceted search and tagging
• Version control and business traceability
Manage Data Sources
• Register a use-case-specific data model
• The data model is stored in a container-based storage format in Hadoop for optimal processing at the customer level
Generate Analytic Datasets
• Create datasets to support specific scoring use cases (segmentation, multi-point, event-aligned, etc.)
Connections
• BI tool (Tableau) via the Impala connector: data visualization
• Jupyter/RStudio: train/test models
• Hue/Impala: interactive queries
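The library-management features listed for the workbench (metric definitions, captured metadata, tagging for faceted search, version control and traceability) can be pictured with a small in-memory registry. `MetricLibrary`, `MetricVersion`, and the `promo_balance_share` metric are hypothetical; this is not the Novantas Spark API for Banking or Navigator's API, only an illustration of versioned, lineage-aware metric definitions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MetricVersion:
    """One immutable version of a metric definition plus its metadata."""
    name: str
    version: int
    description: str
    inputs: Tuple[str, ...]          # upstream fields, kept for lineage
    compute: Callable[[dict], float]
    tags: frozenset = frozenset()    # used for simple faceted search


class MetricLibrary:
    """Hypothetical in-memory metric registry: every change creates a new version."""

    def __init__(self) -> None:
        self._history: Dict[str, List[MetricVersion]] = {}

    def register(self, name, description, inputs, compute, tags=()) -> MetricVersion:
        versions = self._history.setdefault(name, [])
        metric = MetricVersion(name, len(versions) + 1, description,
                               tuple(inputs), compute, frozenset(tags))
        versions.append(metric)  # older versions stay available for traceability
        return metric

    def get(self, name: str, version: Optional[int] = None) -> MetricVersion:
        versions = self._history[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]

    def search(self, tag: str) -> List[str]:
        """Names of metrics whose latest version carries the tag."""
        return sorted(n for n, v in self._history.items() if tag in v[-1].tags)

    def lineage(self, name: str) -> Tuple[str, ...]:
        return self.get(name).inputs

    def score(self, name: str, record: dict, version: Optional[int] = None) -> float:
        return self.get(name, version).compute(record)


if __name__ == "__main__":
    library = MetricLibrary()
    library.register("promo_balance_share", "Share of balances in promotional accounts",
                     ["promo_balance", "total_balance"],
                     lambda r: r["promo_balance"] / r["total_balance"], tags=["deposits"])
    library.register("promo_balance_share", "Same metric, returning 0 when there are no balances",
                     ["promo_balance", "total_balance"],
                     lambda r: r["promo_balance"] / r["total_balance"] if r["total_balance"] else 0.0,
                     tags=["deposits", "promotions"])
    print(library.get("promo_balance_share").version, library.lineage("promo_balance_share"))
```

Keeping every version immutable means a score produced last quarter can still be traced to, and recomputed with, the exact definition that was in force at the time.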
Data Life Cycle (Hybrid Cloud)
Data Sources (HDFS or S3): each source (data source 1, 2, 3, ...) comprises
• Raw files
• Raw maps
• Process/derivations
• Settings/prefs
Data harmonization and ontology validation bring the sources into a Logical Data Warehouse (HDFS/Parquet) that follows the Standard Banking Data Model, organized by domain (Domain 1, Domain 2, Domain 3, ...):
• Stable entity keys
• Common entities
• Common dimensions
• Derived datasets from downstream processes
The Metrics and Model Factory (Spark) runs a data-driven extraction pipeline through the MetricScape API to produce:
• Analytic datasets (use-case-driven data model)
• Model API endpoints
• Metadata catalog
Access through: BI or Impala, Search, Navigator, MetricScape
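Data harmonization driven by per-source "raw maps" can be sketched as a lookup from each source's columns to standard fields, with type and unit conversion and a validation check. The source names, column names, cents-to-units conversion, and the three-field standard entity are all hypothetical; this is not the Novantas Standard Banking Data Model, and the missing-field check only stands in for ontology validation.

```python
# Hypothetical "raw maps": for each source, source column -> (standard field, converter).
RAW_MAPS = {
    "source_1": {
        "ACCT_NO": ("account_id", str),
        "CUST_ID": ("customer_id", str),
        "BAL_AMT": ("balance", float),
    },
    "source_2": {
        "accountNumber": ("account_id", str),
        "customerKey": ("customer_id", str),
        "ledgerBalanceCents": ("balance", lambda cents: int(cents) / 100),
    },
}

# A deliberately tiny stand-in for a standard data model entity.
STANDARD_FIELDS = ("customer_id", "account_id", "balance")


def harmonize(source, raw_record):
    """Map one raw record onto the standard fields, converting units and types."""
    mapping = RAW_MAPS[source]
    harmonized = {}
    for column, value in raw_record.items():
        if column in mapping:  # unmapped columns are ignored
            field, convert = mapping[column]
            harmonized[field] = convert(value)
    # Simple validation step: every standard field must be populated.
    missing = [f for f in STANDARD_FIELDS if f not in harmonized]
    if missing:
        raise ValueError(f"{source}: record is missing standard fields {missing}")
    return harmonized


if __name__ == "__main__":
    print(harmonize("source_1", {"ACCT_NO": "001", "CUST_ID": "C9", "BAL_AMT": "12.5"}))
    print(harmonize("source_2", {"accountNumber": "002", "customerKey": "C9",
                                 "ledgerBalanceCents": "1250"}))
```

Keeping the mappings as data rather than code is what lets a new source be onboarded by registering a map, while downstream metrics only ever see the standard fields.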
Technology catalysts of a data engineering solution for customer journey analytics and scoring use cases
• An efficient and cost-effective storage model on Hadoop that works on hybrid cloud and that co-partitions and co-locates related data conforming to a banking ontology
• A high-performance, domain-specific Spark API onto the semantic data model, leveraging the Spark ecosystem to parallelize metrics and models
• A data science workbench with built-in metadata governance and code-gen capability
• A curated library of parameterized metrics with data lineage that can be leveraged to score millions of customers
• A metadata governance and version control framework built into feature engineering and all analyses on the workbench, cataloged in Cloudera Navigator
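One general way to co-partition and co-locate related data is to hash-partition every entity by the same customer key with the same partition count, so that a customer's accounts and transactions land together and per-customer metrics need no cross-partition shuffle. This sketch illustrates that general idea only; it is not Novantas's storage format, and the partition count, entity names, and record fields are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; every entity must use the same count


def partition_for(customer_id, num_partitions=NUM_PARTITIONS):
    """Stable partition id for a customer key.

    zlib.crc32 is used because Python's built-in hash() of a str is salted
    per process and would not give the same placement across runs.
    """
    return zlib.crc32(customer_id.encode("utf-8")) % num_partitions


def co_partition(entities, num_partitions=NUM_PARTITIONS):
    """Place every record of every entity by the same customer key.

    entities: {entity_name: [record, ...]}, each record has a 'customer_id'.
    Returns {partition_id: {entity_name: [record, ...]}}.
    """
    layout = defaultdict(lambda: defaultdict(list))
    for entity, records in entities.items():
        for record in records:
            layout[partition_for(record["customer_id"], num_partitions)][entity].append(record)
    return layout


def customer_view(layout, customer_id, num_partitions=NUM_PARTITIONS):
    """Collect all of one customer's records by reading a single partition."""
    partition = layout.get(partition_for(customer_id, num_partitions), {})
    return {
        entity: [r for r in records if r["customer_id"] == customer_id]
        for entity, records in partition.items()
    }


if __name__ == "__main__":
    data = {
        "accounts": [{"customer_id": "C1", "account_id": "A1"},
                     {"customer_id": "C2", "account_id": "A2"}],
        "transactions": [{"customer_id": "C1", "amount": 10.0},
                         {"customer_id": "C1", "amount": -4.0}],
    }
    print(customer_view(co_partition(data), "C1"))
```

The guarantee depends on every entity using the same key and the same partition count; changing either for one entity breaks co-location for all of them.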
Business case for a large US bank: optimize the role and value of promotional pricing to drive rate-insensitive deposit growth using customer propensity modeling and scoring
Challenge
• What are the material segments of depositors that react to promotional pricing, and when and why?
• At what point in the customer journey can the bank most economically influence deposit consolidation?
Massive Dataset with Transactional Information
• 9 years of customer and account holdings
• 4 years of money in / money out detail
• 2 years of offer disposition history
• 3rd-party data and Novantas wallet models
Deep Analytics: Repeatable, Reliable Scoring Models
• Descriptive metrics for customer journey exploration
• Scoring models identifying:
  - Price sensitivity
  - Shopping behavior
  - Deposit cost given churn
  - Persistence
  - CPV
• Over 1,000 metrics/scores per customer generated in 14 ms
IT Benefits (Cloud Solution)
• Low TCO (hybrid cloud)
• Scalable infrastructure (e.g., scale storage and compute separately)
• Manage costs (e.g., transient nodes for variable workloads)
• Speed of cluster provisioning (e.g., Cloudera Director)
• Ease of administration and maintenance (e.g., Cloudera Manager)
Business Impact
• Reduce promotional spend by 50% through precision targeting in marketing treatments
• Increase initiatives around achieving primacy
• Limit retention offers: reduce promotion expense by 10% while reducing balance retention by only 3%, with a nominal change in customer retention
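Precision targeting with a propensity model generally means scoring each customer and treating only the highest-scoring fraction. The sketch below uses a logistic score with hypothetical, hand-picked coefficients and hypothetical feature names; the bank's actual models and features are not described on the slide, and a real model would be fitted on historical offer-disposition data.

```python
import math

# Hypothetical coefficients for a price-sensitivity propensity model.
# They are illustrative only; a real model would be fitted on historical data.
COEFFICIENTS = {
    "intercept": -2.0,
    "promo_balance_share": 3.0,    # hypothetical feature: share of balances in promo accounts
    "rate_shopping_events": 0.8,   # hypothetical feature: recent moves of money to higher rates
}


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def propensity(features, coefs=COEFFICIENTS):
    """Logistic score in (0, 1) from a linear combination of features."""
    z = coefs["intercept"] + sum(
        weight * features[name] for name, weight in coefs.items() if name != "intercept"
    )
    return sigmoid(z)


def select_targets(customers, fraction):
    """Return ids of the highest-scoring `fraction` of customers (at least one)."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(customers, key=lambda c: propensity(c["features"]), reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return [c["customer_id"] for c in ranked[:k]]


if __name__ == "__main__":
    customers = [  # hypothetical customers
        {"customer_id": "C1", "features": {"promo_balance_share": 0.9, "rate_shopping_events": 3}},
        {"customer_id": "C2", "features": {"promo_balance_share": 0.1, "rate_shopping_events": 0}},
        {"customer_id": "C3", "features": {"promo_balance_share": 0.5, "rate_shopping_events": 1}},
        {"customer_id": "C4", "features": {"promo_balance_share": 0.0, "rate_shopping_events": 0}},
    ]
    print(select_targets(customers, 0.5))
```

Restricting a treatment to the top-ranked fraction is what allows promotional spend to fall while most of the responsive balances are still reached, assuming the model ranks customers well.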
Director Provisioning: Cluster Lifecycle Management
Spin up, grow & shrink, and terminate CDH clusters that read/write to object store
Easy Administration
• Dynamic cluster lifecycle management
• Single pane of glass: multi-cluster view
Flexible Deployments
• Multi-cloud: AWS, Azure, GCP
• Fast cluster deployments
• Scaling of CDH clusters
• Spot instance support
Enterprise-grade
• Integration across Cloudera Enterprise
• Management of CDH deployments at scale
Cloudera Director
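The spin up, grow & shrink, and terminate lifecycle can be pictured as a small state machine that rejects invalid transitions (for example, resizing a cluster that has already been terminated). The states, the `Cluster` class, and its methods are hypothetical; they do not reflect Cloudera Director's API or its actual internal states.

```python
from enum import Enum


class ClusterState(Enum):
    PROVISIONING = "provisioning"
    READY = "ready"
    RESIZING = "resizing"
    TERMINATING = "terminating"
    TERMINATED = "terminated"


# Allowed transitions for this hypothetical lifecycle.
ALLOWED = {
    ClusterState.PROVISIONING: {ClusterState.READY, ClusterState.TERMINATING},
    ClusterState.READY: {ClusterState.RESIZING, ClusterState.TERMINATING},
    ClusterState.RESIZING: {ClusterState.READY, ClusterState.TERMINATING},
    ClusterState.TERMINATING: {ClusterState.TERMINATED},
    ClusterState.TERMINATED: set(),
}


class Cluster:
    """Hypothetical transient cluster that records every state it passes through."""

    def __init__(self, name, workers):
        self.name = name
        self.workers = workers
        self.state = ClusterState.PROVISIONING
        self.history = [self.state]

    def _transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise RuntimeError(f"{self.name}: cannot go from {self.state.value} to {new_state.value}")
        self.state = new_state
        self.history.append(new_state)

    def mark_ready(self):
        self._transition(ClusterState.READY)

    def resize(self, workers):
        """Grow or shrink the worker count, passing through RESIZING."""
        if workers < 1:
            raise ValueError("a cluster needs at least one worker")
        self._transition(ClusterState.RESIZING)
        self.workers = workers
        self._transition(ClusterState.READY)

    def terminate(self):
        self._transition(ClusterState.TERMINATING)
        self._transition(ClusterState.TERMINATED)


if __name__ == "__main__":
    cluster = Cluster("nightly-etl", workers=5)  # hypothetical transient ETL cluster
    cluster.mark_ready()
    cluster.resize(10)   # grow for the heavy stage
    cluster.resize(3)    # shrink afterwards
    cluster.terminate()  # data stays in the object store
    print([s.value for s in cluster.history])
```

Recording the history of transitions also gives a simple audit trail of how long each transient cluster existed, which is the quantity usage-based pricing bills for.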
Instance Recommendations
Default Guidelines (based on Apache Spark best practices)
Data Nodes:
Workload | AWS | Azure | Google
Default (mixed workloads) | m4.2xlarge (or greater) | D3-5 v2 | n1-standard-4/8/16
Compute-intensive (e.g. machine learning simulations) | c4.2xlarge (or greater) | F4, F8, F16 | n1-highcpu-8/16/32
Memory-intensive (e.g. large, cached Spark objects) | r4.2xlarge (or greater) | D11-15 v2 | n1-highmem-4/8/16/32
I/O-intensive (e.g. multiple or shared R/W steps) | EBS-backed (see “exceptions”) | Use Premium Storage | (none listed)
Master Nodes:
Type | AWS | Azure | Google
CM | m4.4xlarge | DS13 v2 | n1-standard-16
CDH | c4.4xlarge | DS14 v2 | n1-highmem-16
Master Node Notes:
● Size memory in line with the cluster size.
● Do not use Spot. Spot Block is acceptable if the reservation duration exceeds the workload time.
● Start with 50 GB of block storage (gp2 for AWS) for CM.
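Provisioning scripts often encode guidelines like these as a lookup so that cluster templates stay consistent. The values below are copied from the instance recommendations table; the dictionary keys, function names, and purchase-option strings are hypothetical, and `master_purchase_option_ok` simply encodes the master-node note about Spot and Spot Block.

```python
from typing import Optional

# Data-node guidelines copied from the instance table; None means nothing is listed.
DATA_NODE_GUIDELINES = {
    "default": {"aws": "m4.2xlarge (or greater)", "azure": "D3-5 v2", "google": "n1-standard-4/8/16"},
    "compute": {"aws": "c4.2xlarge (or greater)", "azure": "F4, F8, F16", "google": "n1-highcpu-8/16/32"},
    "memory": {"aws": "r4.2xlarge (or greater)", "azure": "D11-15 v2", "google": "n1-highmem-4/8/16/32"},
    "io": {"aws": 'EBS-backed (see "exceptions")', "azure": "Use Premium Storage", "google": None},
}

# Master-node guidelines: CM = Cloudera Manager host, CDH = CDH master services.
MASTER_NODE_GUIDELINES = {
    "cm": {"aws": "m4.4xlarge", "azure": "DS13 v2", "google": "n1-standard-16"},
    "cdh": {"aws": "c4.4xlarge", "azure": "DS14 v2", "google": "n1-highmem-16"},
}


def data_node_type(workload: str, cloud: str) -> Optional[str]:
    """Recommended data-node type for a workload class ('default', 'compute', 'memory', 'io')."""
    return DATA_NODE_GUIDELINES[workload][cloud]


def master_node_type(role: str, cloud: str) -> str:
    """Recommended master-node type for 'cm' or 'cdh'."""
    return MASTER_NODE_GUIDELINES[role][cloud]


def master_purchase_option_ok(option: str, block_hours: float = 0.0, workload_hours: float = 0.0) -> bool:
    """Master-node rule from the notes: never plain Spot; Spot Block only if it outlasts the workload."""
    if option == "spot":
        return False
    if option == "spot_block":
        return block_hours > workload_hours
    return True  # e.g. "on_demand" or "reserved"


if __name__ == "__main__":
    print(data_node_type("memory", "aws"), master_node_type("cm", "google"))
    print(master_purchase_option_ok("spot_block", block_hours=6, workload_hours=4))
```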
Q&A
Next steps
• Check out our other “Best practices” in the cloud webinars
www.cloudera.com/resources
• Learn more about Novantas and companies like them
www.cloudera.com/more/customers.html
• Visit our downloads page and take Cloudera Director for a spin
Thank you


Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8Building a Modern Analytic Database with Cloudera 5.8
Building a Modern Analytic Database with Cloudera 5.8
 
Oracle canvas 140604 2
Oracle canvas 140604 2Oracle canvas 140604 2
Oracle canvas 140604 2
 
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and ClouderaIs your big data journey stalling? Take the Leap with Capgemini and Cloudera
Is your big data journey stalling? Take the Leap with Capgemini and Cloudera
 
New Analytic Uses of Master Data Management in the Enterprise
New Analytic Uses of Master Data Management in the EnterpriseNew Analytic Uses of Master Data Management in the Enterprise
New Analytic Uses of Master Data Management in the Enterprise
 
Turning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data PlatformTurning Data into Business Value with a Modern Data Platform
Turning Data into Business Value with a Modern Data Platform
 
151116 Sedania Cloudera BDA Profile
151116 Sedania Cloudera BDA Profile151116 Sedania Cloudera BDA Profile
151116 Sedania Cloudera BDA Profile
 
Big Data and Analytics
Big Data and AnalyticsBig Data and Analytics
Big Data and Analytics
 
Big Data and Analytics
Big Data and AnalyticsBig Data and Analytics
Big Data and Analytics
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
Implementing Advanced Analytics Platform
Implementing Advanced Analytics PlatformImplementing Advanced Analytics Platform
Implementing Advanced Analytics Platform
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 
CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014CSC - Presentation at Hortonworks Booth - Strata 2014
CSC - Presentation at Hortonworks Booth - Strata 2014
 
Capgemini Leap Data Transformation Framework with Cloudera
Capgemini Leap Data Transformation Framework with ClouderaCapgemini Leap Data Transformation Framework with Cloudera
Capgemini Leap Data Transformation Framework with Cloudera
 
AWS Webcast - Sales Productivity Solutions with MicroStrategy and Redshift
AWS Webcast - Sales Productivity Solutions with MicroStrategy and RedshiftAWS Webcast - Sales Productivity Solutions with MicroStrategy and Redshift
AWS Webcast - Sales Productivity Solutions with MicroStrategy and Redshift
 
DataArt Financial Services and Capital Markets
DataArt Financial Services and Capital MarketsDataArt Financial Services and Capital Markets
DataArt Financial Services and Capital Markets
 
Sami patel full_resume
Sami patel full_resumeSami patel full_resume
Sami patel full_resume
 
ADV Slides: How to Improve Your Analytic Data Architecture Maturity
ADV Slides: How to Improve Your Analytic Data Architecture MaturityADV Slides: How to Improve Your Analytic Data Architecture Maturity
ADV Slides: How to Improve Your Analytic Data Architecture Maturity
 
The Shifting Landscape of Data Integration
The Shifting Landscape of Data IntegrationThe Shifting Landscape of Data Integration
The Shifting Landscape of Data Integration
 

More from Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

More from Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Recently uploaded

Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfIdiosysTechnologies1
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfStefano Stabellini
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odishasmiwainfosol
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfFerryKemperman
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作qr0udbr0
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfLivetecs LLC
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfMarharyta Nedzelska
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 

Recently uploaded (20)

Best Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdfBest Web Development Agency- Idiosys USA.pdf
Best Web Development Agency- Idiosys USA.pdf
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Introduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdfIntroduction Computer Science - Software Design.pdf
Introduction Computer Science - Software Design.pdf
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作英国UN学位证,北安普顿大学毕业证书1:1制作
英国UN学位证,北安普顿大学毕业证书1:1制作
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
How to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdfHow to Track Employee Performance A Comprehensive Guide.pdf
How to Track Employee Performance A Comprehensive Guide.pdf
 
A healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdfA healthy diet for your Java application Devoxx France.pdf
A healthy diet for your Java application Devoxx France.pdf
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Advantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your BusinessAdvantages of Odoo ERP 17 for Your Business
Advantages of Odoo ERP 17 for Your Business
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 

Data Engineering: Elastic, Low-Cost Data Processing in the Cloud

  • 1. Data Engineering: Elastic, Low-Cost Data Processing in the Cloud
    David Tishgart | Product Marketing | Cloudera
    Kaushik Deka | CTO | Novantas
    © Cloudera, Inc. All rights reserved.
  • 2. Three Core Enterprise Workload Patterns
    ● Data Engineering & Data Science: Process data, develop & serve predictive models
    ● Analytic Database: ELT, reporting, exploratory business intelligence
    ● Operational Database: Build data-driven applications to deliver real-time insights
    Underlying layer: Multi-Storage, Multi-Environment
  • 3. Data Engineering in the Cloud
    Across industries, data engineering and data science are a natural fit for the cloud:
    ● Data growth: More data is being created in the cloud
    ● Transient workloads: Development/test, exploration; batch ETL, model training and scoring
    ● Flexibility: Optimize infrastructure for the job; self-service for data engineers and data scientists
    ● Lower TCO: Do more with less
  • 4. Cloudera's Data Engineering Solution
    ● Spark: Modern real-time analytics engine
    ● Hive-on-Spark: Large-scale ETL & batch processing engine
    ● Search: Interactive search and immediate exploration
    ● Navigator: Audit, lineage, encryption, key management, & policy lifecycles
    ● Cloud Deployment: Easy deployment and flexible scaling
    ● Partner integrations: Familiar tools for data science & data integration
  • 5. Traditional Data Engineering for ETL
    Ingest data sources (any source and format):
      • Unstructured data
      • Structured data
      • Social data
      • Machine data
      • IoT
    Transform and combine data with SLAs (large-scale data processing):
      • Stream or batch
      • Choice of engine: Spark, Hive on Spark, MapReduce
      • Meet SLAs: resource management, fast processing
    Processed data consumed by (batch or stream pipelines):
      • Analytic engines
      • Real-time applications
      • External storage systems
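The ingest, transform, and consume stages on this slide can be illustrated with a minimal single-process sketch. It assumes two hypothetical inline sources (a CSV feed and a JSON-lines feed) and a simple validation rule; in the deck's setting the transform step would run on an engine such as Spark or Hive on Spark, which this sketch does not use.

```python
import csv
import io
import json

# Hypothetical raw inputs: a structured CSV source and a semi-structured JSON-lines source.
CSV_SOURCE = "device_id,temp_c\nd1,21.5\nd2,bad\nd3,19.0\n"
JSONL_SOURCE = '{"device_id": "d4", "temp_c": 22.1}\n{"device_id": "d5"}\n'


def ingest():
    """Ingest: read both sources into a single list of dicts."""
    rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    rows += [json.loads(line) for line in JSONL_SOURCE.splitlines() if line.strip()]
    return rows


def transform(rows):
    """Transform: coerce types and drop records that fail validation."""
    clean = []
    for row in rows:
        try:
            temp = float(row["temp_c"])
        except (KeyError, TypeError, ValueError):
            continue  # missing or unparseable reading: skip the record
        clean.append({"device_id": row["device_id"], "temp_c": temp})
    return clean


def load(rows):
    """Load: serialize the processed batch for a downstream consumer."""
    return json.dumps(rows, sort_keys=True)


if __name__ == "__main__":
    print(load(transform(ingest())))
```

Records d2 (unparseable value) and d5 (missing field) are dropped, so only d1, d3, and d4 reach the consumer.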
  • 6. Data Engineering for Machine Learning Workloads
    Pipeline stages, in order:
    ● Raw data, fed by ongoing data ingestion: many sources, many formats, varying validity
    ● Data Engineering: cleaning, merging, filtering, producing well-formatted data
    ● Data Science: model building, model training, and hyper-parameter tuning on training, validation, and test data
    ● Validated ML models
    ● Data Engineering: pipeline execution, production operation
    ● End user: consumption for analysis
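The hand-off from data engineering (cleaning, merging, filtering) to data science (training, validation, and test data) can be sketched as follows. The two input sources, the `customer_id`/`balance` fields, the negative-balance rule, and the 70/15/15 split are all hypothetical choices for illustration.

```python
import random


def merge_and_filter(customers, balances):
    """Merge two sources on customer_id, then filter out unusable rows."""
    balance_by_id = {b["customer_id"]: b["balance"] for b in balances}
    merged = []
    for c in customers:
        bal = balance_by_id.get(c["customer_id"])
        if bal is None or bal < 0:  # drop unmatched customers and invalid balances
            continue
        merged.append({**c, "balance": bal})
    return merged


def split(rows, train_pct=70, valid_pct=15, seed=42):
    """Deterministically shuffle, then cut into training, validation, and test sets."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = len(rows) * train_pct // 100  # integer arithmetic avoids float rounding
    n_valid = len(rows) * valid_pct // 100
    return rows[:n_train], rows[n_train:n_train + n_valid], rows[n_train + n_valid:]
```

A fixed seed keeps the split reproducible across pipeline runs, so a retrained model is evaluated against the same held-out rows.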
  • 7. Requirements for Data Engineering
    Portability, flexibility, and an end-to-end enterprise platform
    ● Transience for flexibility, lower TCO and risk
    ● Unified platform, from ingest to insight and action
    ● Hybrid support for multiple environments
    [Diagram: separate STORE and COMPUTE layers over an object store]
  • 8. Benefits of Data Engineering with Cloudera
    Lower TCO and increased flexibility on a trusted enterprise data platform
    Lower cost
    ● Multi-cloud: shop across providers (Amazon, Google, Microsoft)
    ● Manage costs: transience for dev/test, ETL, and data science; usage-based pricing; spot instance support
    Increased convenience
    ● Deliver on-demand: immediate access to large compute with fast cluster provisioning; self-service for developers
    ● Optimize and isolate: tailor infrastructure for the job; run different software versions; enable more experimentation with less opportunity cost
    On a common platform
    ● Build complete data apps: ingest, stream, process, explore, analyze, model, and serve on the same platform; shared data with object store integration; cluster metadata persistence; common compliance-ready security and governance frameworks
  • 9. Data Engineering in the Cloud: Three Architectural Patterns to Optimize Price, Performance, Convenience
    Transient Batch (most flexible): spin up clusters as needed.
      ● On-demand/spot instances
      ● Usage-based pricing
      ● Sized for workload
      ● Cluster per tenant/user
    Persistent Batch (most control): persistent cluster(s) for frequent ETL.
      ● Reserved instances
      ● Node-based pricing
      ● Grow/shrink
      ● Cluster per tenant group
    Persistent Batch on HDFS (fastest): top performance for frequent ETL.
      ● Reserved instances
      ● Node-based pricing
      ● Grow/shrink
      ● Shared across tenant groups
    [Diagram: batch jobs on transient or persistent clusters, backed by object storage or HDFS]
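The trade-off between usage-based (transient) and node-based (persistent) pricing comes down to how many hours per month the cluster is actually busy. The sketch below assumes hypothetical flat per-node-hour rates and ignores storage, data transfer, spin-up time, spot-price variability, and the HDFS performance advantage, so it shows only the shape of the comparison.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8,760 / 12)


def transient_cost(busy_hours, nodes, hourly_rate):
    """Usage-based pricing: pay only for node-hours while the cluster is up."""
    return busy_hours * nodes * hourly_rate


def persistent_cost(nodes, hourly_rate, hours=HOURS_PER_MONTH):
    """Node-based pricing: nodes are paid for all month, busy or idle."""
    return hours * nodes * hourly_rate


def break_even_hours(persistent_rate, transient_rate, hours=HOURS_PER_MONTH):
    """Busy hours per month above which a same-sized persistent cluster is cheaper.

    Solves busy * n * transient_rate == hours * n * persistent_rate; n cancels.
    """
    return hours * persistent_rate / transient_rate
```

With hypothetical rates of $0.50 per node-hour on demand and $0.25 reserved, `break_even_hours(0.25, 0.5)` is 365: workloads busy for fewer hours than that per month favor the transient pattern.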
  • 10. Data Engineering for Customer Journey Analytics and Scoring in Financial Services
    Kaushik Deka, CTO, Novantas
  • 11. Novantas is the leader in customer science and revenue strategies for the financial industry through analytics that leverage data, advice, and technology
    • 2016 FinTech 100 company based in Manhattan (NYC), providing pricing, distribution, treasury/risk, and marketing solutions in retail banking
    • Expert practice leaders work daily with CEOs and functional heads around the globe
    • Decision support and analytic platforms help top 20 US banks manage over US$1.5 trillion of deposits
    • Center of excellence in big data analytics in consumer banking
  • 12. Novantas has developed leading-edge analytics and modeling capabilities to help banks improve their performance at all stages of the customer journey
    Novantas Supports Banks in Understanding All Stages of the Customer Journey
    Stages: Acquisition; Activation and Engagement; Maturity; Senescence
    Capabilities across these stages:
      • Customer Segmentation
      • Customer Targeting
      • Channel Optimization
      • Offer Optimization
      • Activation Propensity
      • Customer Potential Value
      • Deposit Modelling
      • Promotional Optimization
      • Usage Optimization
      • Attrition Propensity
      • Retention Campaign Targeting
      • Customer Lifetime Value
      • Cross-sell and Upsell Modelling
      • Revenue Optimization
      • Primacy/Exclusivity Optimization
  • 13. One of our unique contributions to customer journey analytics is in the area of customer scoring, particularly metrics that let banks determine the potential value of their customers
    Core Elements of Customer Potential Value Calculation (diagram labels: CLV focus, CPV focus)
    ● Current Profitability: calculation of the current value contribution of a customer
      Sample complexities:
      • Differentiation and appropriate valuation of deposits (core, promotional)
      • Scope of calculation (e.g., current account only, bank only, full relationship) and, if less than the full relationship, accounting for additional profits/costs generated elsewhere
    ● Over Time Account Potential: assessment of the value contribution of the customer longitudinally (over time)
      Sample complexities:
      • Estimation of future account usage patterns (balances, transactional behavior, savings/borrowing requirements, etc.)
      • Duration of calculation (lifetime, 10-year, 5-year, etc.)
    ● Across Wallet Account Potential: assessment of the value contribution of the customer latitudinally (across wallet)
      Sample complexities:
      • Estimation of current off-us wallet (balances and value to the bank)
      • Scope (e.g., checking only, bank only, full relationship)
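The "over time" element and its duration choice (lifetime, 10-year, 5-year) can be made concrete with a generic, textbook-style lifetime value formula: a retention-weighted, discounted sum of projected annual margins. This is not Novantas's scoring method; the constant retention rate, end-of-year discounting, and the margin projections are simplifying assumptions.

```python
def customer_lifetime_value(margins, retention, discount_rate, years):
    """Discounted, retention-weighted sum of projected annual margins.

    margins: projected contribution for each year (at least `years` entries)
    retention: probability the customer stays from one year to the next
    discount_rate: annual rate used to discount future value to today
    years: horizon of the calculation, e.g. 5 or 10
    """
    if len(margins) < years:
        raise ValueError("need a projected margin for every year in the horizon")
    value = 0.0
    for t in range(years):
        survival = retention ** t  # probability the customer is still active in year t + 1
        discount = (1 + discount_rate) ** (t + 1)  # end-of-year discounting
        value += margins[t] * survival / discount
    return value
```

Changing `years` from 5 to 10 with the same inputs shows how much of a customer's estimated value depends on the chosen horizon.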
  • 14. The data engineering challenges that underpin the development of these scores are immense.
    Illustrative scoring calculation complexities:
    • There are literally thousands of stratified variables that could potentially go into the calculations…
      - Basic variables (average daily balance, number of deposits per cycle, average non-bill transfer-out value, etc.)
      - First-order derivatives (rate of change in average daily balances, rate of change in monthly branch transactions, etc.)
      - Second-order derivatives (rate of change in first-order derivative variables, e.g., the rate of change in the rate of change in average daily balances)
      - etc.
      …Being able to rapidly perform feature engineering at scale on large data sets is essential, given the large number of variables that must be evaluated.
    • Choosing these variables can also be affected by the curated data available…
      - Understanding sources of useful data
      - Knowing how to parse difficult data into usable information
      - Knowing which data can easily be substituted with more readily available sources
      - Experience in mapping multiple data sources to a semantically integrated financial data model
      …Being able to curate and map a wide range of data sources and types to a standard data model ensures data integrity and lets data scientists spend more time on modeling rather than data wrangling.
    • Even when data is available, interpreting it is complicated, and metric/model definitions can change…
      - Transactional clues: frequency of payment, consistency of amount, relation of payment amount to account balances, etc.
      - Account clues: existence of a mortgage at the bank, existence of a credit card at the bank, presence of utility payments from the account, presence of income payments, etc.
      - Example from "Your Statement": is "$1,467.32 to Bank X" a mortgage payment? Savings? A transfer to a secondary (primary?) account? A credit card payment?
      …Being able to govern business metadata and track model performance is essential to the ongoing application of the metrics/scores.
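The three variable families in the first column can be illustrated with a small sketch. It assumes monthly average daily balance (ADB) as the basic variable and treats "rate of change" as the simple period-over-period difference. The slide does not define the exact formula, and the function and feature names are hypothetical. In the platform described here this work would run in parallel on Spark across millions of customers; plain Python is used only to keep the example self-contained.

```python
from typing import Dict, List, Sequence


def average_daily_balance(daily_balances: Sequence[float]) -> float:
    """Basic variable: mean end-of-day balance over one statement cycle."""
    if not daily_balances:
        raise ValueError("cycle has no daily balances")
    return sum(daily_balances) / len(daily_balances)


def first_difference(series: Sequence[float]) -> List[float]:
    """Discrete rate of change: value in period t minus value in period t-1.
    The output is one element shorter than the input."""
    return [current - previous for previous, current in zip(series, series[1:])]


def derivative_features(monthly_adb: Sequence[float]) -> Dict[str, List[float]]:
    """Basic, first-order and second-order ADB features for one customer."""
    first_order = first_difference(monthly_adb)
    # Second-order derivative: rate of change of the rate of change.
    second_order = first_difference(first_order)
    return {"adb": list(monthly_adb), "adb_roc": first_order, "adb_roc_roc": second_order}


# Four statement cycles (two daily balances each, to keep the example short).
cycles = [[900.0, 1100.0], [1200.0, 1200.0], [1400.0, 1600.0], [1600.0, 1600.0]]
monthly_adb = [average_daily_balance(c) for c in cycles]
print(derivative_features(monthly_adb))
# {'adb': [1000.0, 1200.0, 1500.0, 1600.0], 'adb_roc': [200.0, 300.0, 100.0], 'adb_roc_roc': [100.0, -200.0]}
```

Each extra order of derivative shortens the series by one period, which is one reason the number of candidate variables grows so quickly with history length and derivative depth.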
  • 15. Our Customer Analytics Platform, built on CDH 5.8, leverages Spark on YARN and is engineered for high-performance analytics on both AWS and private cloud.
    Diagram layers:
    • Ecosystem of fit-for-purpose end-user apps (domain specific)
    • Analytics Operating System (Spark on YARN)
    • Customer Data Hub (hybrid cloud on HDFS)
    • Internal and external data sources: internal bank data, third-party data (e.g., competitor pricing), public-domain data, and Novantas proprietary data
    Components shown: MetricScape scoring workbench; Spark/Hadoop (CDH); metadata governance (publish metadata to Navigator); metrics library management; Novantas banking data model; analytics database; BI/reporting; scoring/campaigns; scenario/optimization; forecasting; rate/offer delivery; analytic dataset generation; APIs for operationalizing predictive models.
  • 16. A banking ontology stored in a containerized format on HDFS enables efficient data processing.
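The slide does not describe the container format itself. As a hedged illustration of the idea, the sketch below nests every entity row belonging to one customer (accounts, transactions, and so on) inside a single per-customer record, so a customer-level metric can be computed from one record without a join. The `CustomerContainer` type and the entity names are hypothetical. In practice such records would more likely be written to a nested columnar format such as Parquet on HDFS than kept as Python objects.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, Dict, Iterable, List, Tuple


@dataclass
class CustomerContainer:
    """Hypothetical per-customer container holding every entity row for one customer."""

    customer_id: str
    # entity name (e.g. "accounts", "transactions") -> rows for this customer
    entities: DefaultDict[str, List[dict]] = field(default_factory=lambda: defaultdict(list))


def build_containers(rows: Iterable[Tuple[str, dict]]) -> Dict[str, CustomerContainer]:
    """Group (entity_name, row) pairs by the row's 'customer_id' key."""
    containers: Dict[str, CustomerContainer] = {}
    for entity, row in rows:
        customer_id = row["customer_id"]
        if customer_id not in containers:
            containers[customer_id] = CustomerContainer(customer_id)
        containers[customer_id].entities[entity].append(row)
    return containers


def total_balance(container: CustomerContainer) -> float:
    """A customer-level metric computed from a single container, with no join."""
    return sum(row["balance"] for row in container.entities["accounts"])
```

The design trade-off is that reads at the customer level become cheap, at the cost of regrouping data whenever it is loaded.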
  • 17. Our MetricScape scoring workbench has built-in metadata governance and code-generation capability, and it leverages Spark, Navigator, Search, Hue, and Impala.
    • Manage the metrics/scores library
      - Create/modify metric definitions using the Novantas Spark API for Banking
      - Capture metadata and publish it to Navigator
      - Faceted search and tagging
      - Version control and business traceability
    • Manage data sources
      - Register a use-case-specific data model
      - The data model is stored in a container-based storage format in Hadoop for optimal processing at the customer level
    • Generate analytic datasets
      - Create datasets to support specific scoring use cases (segmentation, multi-point, event-aligned, etc.)
    • Data visualization: connect to a BI tool (Tableau) via the Impala connector
    • Train/test models: connect to Jupyter/RStudio
    • Interactive queries: connect to Hue/Impala
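The slide names the library capabilities (parameterized metric definitions, captured metadata, tagging, version control) but does not show the Novantas Spark API for Banking. The following is a minimal, hypothetical sketch of a versioned metric registry with those properties. `MetricDefinition`, `MetricLibrary`, and their fields are invented for illustration. "Publishing to Navigator" is represented only by the metadata record the registry returns, not by any real Navigator call.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class MetricDefinition:
    """One immutable version of a parameterized metric (hypothetical schema)."""

    name: str
    version: int
    compute: Callable[..., float]
    params: Tuple[Tuple[str, Any], ...]  # default parameter values
    tags: Tuple[str, ...]  # supports faceted search and tagging
    source_entities: Tuple[str, ...]  # lineage: data-model entities the metric reads


class MetricLibrary:
    """Hypothetical versioned registry; registering never overwrites an old version."""

    def __init__(self) -> None:
        self._history: Dict[str, List[MetricDefinition]] = {}

    def register(self, name: str, compute: Callable[..., float],
                 params: Optional[Dict[str, Any]] = None, tags: Iterable[str] = (),
                 source_entities: Iterable[str] = ()) -> MetricDefinition:
        history = self._history.setdefault(name, [])
        definition = MetricDefinition(
            name, len(history) + 1, compute,
            tuple(sorted((params or {}).items())), tuple(tags), tuple(source_entities),
        )
        history.append(definition)
        return definition

    def get(self, name: str, version: Optional[int] = None) -> MetricDefinition:
        history = self._history[name]
        if version is None:
            return history[-1]
        if not 1 <= version <= len(history):
            raise ValueError(f"{name} has no version {version}")
        return history[version - 1]

    def search(self, tag: str) -> List[str]:
        """Names of metrics whose latest version carries the tag."""
        return sorted(n for n, h in self._history.items() if tag in h[-1].tags)

    def evaluate(self, name: str, data: Any, version: Optional[int] = None,
                 **overrides: Any) -> float:
        definition = self.get(name, version)
        kwargs = dict(definition.params)
        kwargs.update(overrides)  # per-run parameter overrides
        return definition.compute(data, **kwargs)

    def metadata(self, name: str, version: Optional[int] = None) -> Dict[str, Any]:
        """Record that could be sent to a metadata catalog (no real catalog call)."""
        d = self.get(name, version)
        return {"name": d.name, "version": d.version, "params": dict(d.params),
                "tags": list(d.tags), "lineage": list(d.source_entities)}
```

Keeping every version immutable is what makes business traceability possible: a score produced last quarter can still be re-evaluated with the exact definition and parameters that produced it.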
  • 18. Data Life Cycle (Hybrid Cloud)
    • Data sources (HDFS or S3): data sources 1, 2, and 3, each with raw files, raw maps, process/derivations, and settings/prefs
    • Data harmonization: a data-driven extraction pipeline with ontology validation
    • Standard Banking Data Model: stable entity keys; common entities; common dimensions; derived datasets from downstream processes
    • Logical Data Warehouse (HDFS/Parquet)
    • Metrics and Model Factory (Spark), accessed through the MetricScape API, with model API endpoints
    • Use-case-driven data model: analytic datasets organized by domain (Domain 1, Domain 2, Domain 3)
    • Metadata catalog and consumption: Navigator, Search, BI or Impala, MetricScape
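The life cycle pairs each source's raw files with a "raw map" that feeds a data-driven extraction pipeline into the standard banking data model. The sketch below is a hedged illustration of that idea. Each hypothetical raw map declares, for every standard-model field, which source column to read and how to convert it. Onboarding a new source then means writing a new map rather than new pipeline code. The map format, column names, and the completeness check are assumptions, not the Novantas implementation.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical raw map: standard-model field -> (source column, converter).
RawMap = Dict[str, Tuple[str, Callable[[str], object]]]

STANDARD_ACCOUNT_FIELDS = ("customer_id", "account_id", "product", "balance")

SOURCE_1_MAP: RawMap = {
    "customer_id": ("CUST_NO", str.strip),
    "account_id": ("ACCT", str.strip),
    "product": ("PROD_CD", str.upper),
    "balance": ("BAL", float),
}
SOURCE_2_MAP: RawMap = {
    "customer_id": ("cid", str),
    "account_id": ("acct_id", str),
    "product": ("type", str.upper),
    "balance": ("balance_cents", lambda v: int(v) / 100),  # cents -> currency units
}


def harmonize(records: List[Dict[str, str]], raw_map: RawMap) -> List[Dict[str, object]]:
    """Project raw source records onto the standard account entity."""
    missing = [f for f in STANDARD_ACCOUNT_FIELDS if f not in raw_map]
    if missing:
        raise ValueError(f"raw map does not cover standard fields: {missing}")
    harmonized = []
    for record in records:
        row = {}
        for target_field in STANDARD_ACCOUNT_FIELDS:
            source_column, convert = raw_map[target_field]
            row[target_field] = convert(record[source_column])
        harmonized.append(row)
    return harmonized
```

Two differently shaped sources that describe the same account produce identical standard-model rows, which is the property downstream metrics rely on.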
  • 19. Technology catalysts of a data engineering solution for customer journey analytics and scoring use cases:
    • An efficient, cost-effective storage model on Hadoop that works on hybrid cloud and that co-partitions and co-locates related data conforming to a banking ontology
    • A high-performance, domain-specific Spark API onto the semantic data model that leverages the Spark ecosystem to parallelize metrics and models
    • A data science workbench with built-in metadata governance and code-generation capability
    • A curated library of parameterized metrics with data lineage that can be leveraged to score millions of customers
    • A metadata governance and version control framework built into feature engineering and all analyses on the workbench, cataloged in Cloudera Navigator
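The first catalyst, co-partitioning and co-locating related data, can be shown in miniature. Suppose every customer-keyed dataset is partitioned with the same deterministic function of the customer key and the same partition count. Then all rows for a given customer land in the same-numbered partition of each dataset, and a per-customer join can proceed one partition pair at a time without reshuffling data. The sketch below assumes CRC-32 as the hash and 4 partitions; both are illustrative choices, not the platform's settings. Spark provides its own partitioners, and plain Python is used here only to keep the example self-contained.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List

NUM_PARTITIONS = 4  # assumption for illustration


def partition_for(customer_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # CRC-32 is deterministic across processes, unlike Python's built-in
    # hash() for strings, which is salted per process.
    return zlib.crc32(customer_id.encode("utf-8")) % num_partitions


def partition(rows: Iterable[dict], num_partitions: int = NUM_PARTITIONS) -> Dict[int, List[dict]]:
    """Assign each row to a partition by its customer_id."""
    parts: Dict[int, List[dict]] = defaultdict(list)
    for row in rows:
        parts[partition_for(row["customer_id"], num_partitions)].append(row)
    return dict(parts)


def copartitioned_join(left_parts: Dict[int, List[dict]],
                       right_parts: Dict[int, List[dict]]) -> List[dict]:
    """Join two co-partitioned datasets on customer_id, partition pair by partition pair.
    Only same-numbered partitions are compared, since matching keys cannot be elsewhere."""
    joined: List[dict] = []
    for p in sorted(set(left_parts) & set(right_parts)):
        index: Dict[str, List[dict]] = defaultdict(list)
        for right_row in right_parts[p]:
            index[right_row["customer_id"]].append(right_row)
        for left_row in left_parts[p]:
            for right_row in index.get(left_row["customer_id"], []):
                joined.append({**left_row, **right_row})
    return joined
```

The guarantee only holds if both datasets use the same function and the same partition count; changing either one for a single dataset breaks co-location.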
  • 20. Business case for a large US bank: optimize the role and value of promotional pricing to drive rate-insensitive deposit growth using customer propensity modeling and scoring.
    Challenge
    • What are the material segments of depositors that react to promotional pricing, and when and why do they react?
    • At what point in the customer journey can the bank most economically influence deposit consolidation?
    Massive dataset with transactional information
    • 9 years of customer and account holdings
    • 4 years of money-in/money-out detail
    • 2 years of offer disposition history
    • Third-party data and Novantas wallet models
    Deep analytics: repeatable, reliable scoring models
    • Descriptive metrics for customer journey exploration
    • Scoring models identifying price sensitivity, shopping behavior, deposit cost given churn, persistence, and CPV
    • Over 1,000 metrics/scores per customer generated in 14 ms
    IT benefits (cloud solution)
    • Low TCO (hybrid cloud)
    • Scalable infrastructure (e.g., scaling storage and compute separately)
    • Managed costs (e.g., transient nodes for variable workloads)
    • Speed of cluster provisioning (e.g., Cloudera Director)
    • Ease of administration and maintenance (e.g., Cloudera Manager)
    Business impact
    • Reduce promotional spend by 50% through precision targeting in marketing treatments
    • Increase initiatives around achieving primacy
    • Limit retention offers: reduce promotion expense by 10% while reducing balance retention by only 3%, with nominal change in customer retention
  • 21. Director Provisioning: Cluster Lifecycle Management (Cloudera Director)
    Spin up, grow and shrink, and terminate CDH clusters that read/write to an object store.
    • Easy administration: dynamic cluster lifecycle management; single pane of glass (multi-cluster view)
    • Flexible deployments: multi-cloud (AWS, Azure, GCP); fast cluster deployments; scaling of CDH clusters; Spot instance support
    • Enterprise-grade: integration across Cloudera Enterprise; management of CDH deployments at scale
  • 22. Instance Recommendations: Default Guidelines (based on Apache Spark best practices)
    Data nodes, by workload:
    • Default (mixed workloads): AWS m4.2xlarge (or greater); Azure D3-5 v2; Google n1-standard-4/8/16
    • Compute-intensive (e.g., machine learning simulations): AWS c4.2xlarge (or greater); Azure F4, F8, F16; Google n1-highcpu-8/16/32
    • Memory-intensive (e.g., large, cached Spark objects): AWS r4.2xlarge (or greater); Azure D11-15 v2; Google n1-highmem-4/8/16/32
    • I/O-intensive (e.g., multiple or shared R/W steps): AWS EBS-backed (see "exceptions"); Azure: use Premium Storage
    Master nodes, by type:
    • CM: AWS m4.4xlarge; Azure DS13 v2; Google n1-standard-16
    • CDH: AWS c4.4xlarge; Azure DS14 v2; Google n1-highmem-16
    Master node notes:
    ● Size memory in line with the cluster size.
    ● Do not use Spot. Spot Block is acceptable if the reservation duration exceeds the workload time.
    ● Start with 50 GB of block storage (gp2 for AWS) for CM.
  • 23. Q&A
  • 24. Next steps
    • Check out our other “best practices in the cloud” webinars: www.cloudera.com/resources
    • Learn more about Novantas and companies like them: www.cloudera.com/more/customers.html
    • Visit our downloads page and take Cloudera Director for a spin
  • 25. Thank you