This document discusses building a modern analytic database with Cloudera. It outlines Marketing Associates' evaluation of solutions to address challenges around managing massive and diverse data volumes. They selected Cloudera Enterprise to enable self-service BI and real-time analytics at lower costs than traditional databases. The solution has provided scalability, cost savings of over 90%, and improved security and compliance. Future roadmaps for Cloudera's analytic database include faster SQL, improved multitenancy, and deeper BI tool integration.
1. © Cloudera, Inc. All rights reserved.
Building a Modern Analytic Database with Cloudera 5.8
Justin Erickson | Sr Director of Product | Cloudera
Andy Frey | CIO | Marketing Associates
2.
Agenda
• Building a Modern Analytic Database with Hadoop
• Key Use Cases Enabled
• What's New with Cloudera 5.8
• Marketing Associates Customer Case Study
• What's Next?
3.
Common Application Patterns
Operational Efficiency | New Business Value
[Diagram layers: OPERATIONS | DATA MANAGEMENT | UNIFIED SERVICES | PROCESS, ANALYZE, SERVE | STORE | INTEGRATE]
• Data Engineering & Science: Process data, develop & serve predictive models
• Analytic Database: ELT, reporting, exploratory business intelligence
• Operational Database: Build data-driven applications to deliver real-time insights
4.
Analytic Database
• Self-Service BI & Data: More data of all types is being tapped for analytics, across environments.
• Real-Time Analysis: Open up new possibilities for real-time insights as data changes.
• Converged Workloads: BI & analytics are critical but only tell part of the story. Get more value by sharing data across workloads.
5.
Key Use Cases
• EDW Optimization: Use your EDW more efficiently by offloading workloads to Hadoop
• Data Preparation: Fast, flexible ETL over large data volumes, so data is always ready for your business
• Self-Service BI & Exploration: Fastest time-to-insights with a modern analytic database designed with Hadoop's flexibility and agility
6.
Cloudera's Analytic Database Solution
[Diagram layers: OPERATIONS | DATA MANAGEMENT | UNIFIED SERVICES | PROCESS, ANALYZE, SERVE | STORE | INTEGRATE]
• Navigator Optimizer: Identify, offload, & optimize workloads to Hadoop
• Hue: Intelligent SQL editor
• Navigator: Audit, lineage, encryption, key management, & policy lifecycles
• BI Partners: Integration with the leading BI tools
• Impala: Interactive query engine for BI & SQL analytics
• Hive-on-Spark: Large-scale ETL & batch processing engine
7.
ETL & Data Preparation
• Flexible & Scalable
  • Process larger data volumes of any type
• Fastest Data Processing
  • Distributed processing and best-of-breed technologies for the fastest performance
• Minimize Data Movement
  • Prepared data is immediately available for analytics through shared storage and metadata
8.
Self-Service BI & Exploratory Analytics
• Self-Service Data Agility
  • No rigid data modeling encumbrances for agile acquisition
  • Iteratively analyze and flexibly model
• Self-Service Exploratory Analytics
  • Interactive responses for iterative exploration
  • Confidently handle all BI and SQL users
• Cost-Effective Scalability with Users/Data
  • Easily add nodes to handle more data and users
  • Leverage the full potential of available data
• Productively Use Existing Tools and Skills
  • Integration with all leading BI tools & compatible analytic SQL language
  • Metadata and lineage for easy data discovery
  • Intelligent SQL editor for greater developer productivity
9.
Optimize the Enterprise Data Warehouse
• Decrease Storage Costs
  • Focus on high-value reporting data in the EDW
• Keep More/All Data Online
  • Unlimited scale keeps data accessible and out of archive
• Improve Performance
  • Eliminate contention and meet SLAs for routine reporting
• Get New Insights
  • Enable ad hoc and exploratory analytics
[Chart: Siemens' TCO Assessment (cost/TB)]
11.
Advancements with Cloudera 5.8
Impala
• Cloud-Native: Read/write directly from Amazon S3
• Performance: >10x faster performance on secure clusters
Hue
• Data Discovery: Preview, tag, search, and pin tables in the browser
• Query Design Assistance: Autocomplete of tables, columns, and syntax; efficient troubleshooting
• Collaboration & Sharing: Save & share queries with peers; set permissions directly on results
Navigator Optimizer
• Now GA!
• Ease offloading path to Hadoop
• Active Data Optimization to enable peak performance for Hive and Impala
12.
Self-Service Data Discovery & BI at Marketing Associates
Andy Frey
13.
About Me: Andy Frey
From Assembler to Ajax, Modem to Mobile, and Mainframe to Cloud, Andy Frey developed his deep knowledge as a technologist and CIO at leading national corporations such as GAB Robins, Compuware, J. Walter Thompson, Coolfire, and now Marketing Associates, providing Fortune 100 corporations with technologically advanced enterprise solutions.
14.
Introducing Magnify and Marketing Associates
• Magnify Analytic Solutions (a wholly-owned division of Detroit, Michigan-based Marketing Associates, serving primarily Fortune 100 clients) uses technology-driven data analysis to offer clients a range of informed business services that increase profitability through its four lines of service: business intelligence, digital intelligence, credit risk management, and marketing analytics.
• Established in 1967, Marketing Associates is a full-service, technology-enabled marketing services company headquartered in Detroit, Michigan, with offices in Wilmington, Delaware and Charlotte, North Carolina. MA's IT-based services include private and public cloud hosting, custom web development, and data transformation.
Offering Cloudera Hadoop IaaS and experienced data scientists
15.
Different Challenges for Different Clients
The B2C Challenge
• Previously used expensive RDBMS systems to deliver B2C marketing contests and product giveaways, up to 150 in a year.
• Huge spikes in web event data posed challenges: 200,000 hits in the first minute for popular brands' campaigns.
• Licensing for the biggest spike made projects unprofitable.
• Also needed to monitor and manipulate massive amounts of data in real time. The RDBMS could not respond adequately during massive data intake while a campaign was running: "When has a campaign reached its limit? Has the total supply of product been allocated?"
16.
Different Challenges for Different Clients
The CRM Challenge
• Another project for a large client involved managing a repository of customer data from multiple sources. The volume was vast, the data was multi-structured, and new sources were being added regularly.
• The project initially ran on 4 relational databases; query times slowed and costs soared.
• Merging unstructured data from multiple sources was difficult with a traditional RDBMS.
• Deploying a prominent SQL RDBMS was estimated at $5 million (approx. 150 terabytes).
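For a rough sense of scale, the RDBMS estimate on this slide works out to about $33,000 per terabyte. This is only a division of the two approximate figures quoted above, not a per-terabyte number stated in the presentation:

```latex
\[
\frac{\$5{,}000{,}000}{150\ \text{TB}} \approx \$33{,}333\ \text{per TB}
\]
```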
17.
Evaluation & Decision
Key criteria for a modern analytic database:
• Handle huge spikes in web event data.
• Manage and manipulate massive data volumes in real time.
• Scalability and performance.
• Ability to transfer skills from the current SQL-based programming team.
• Reduce costs.
Considered various offerings:
• Considered SQL Server (discarded due to cost).
• Knew Hadoop could be the solution and started looking at commercial implementations.
• Considered non-commercial Hadoop and short-listed two Hadoop vendors: Cloudera & Hortonworks.
• Determined non-commercial was too risky and too burdensome; left it to the experts.
Decision
• Launched June 2014
• Why Hadoop: Cost, Tech Requirements, Data Size
• Why Cloudera: Most mature solution with better overall enterprise toolset; Cloudera Team
21.
Solution: Self-Service Data Discovery & BI
• Self-service data discovery capabilities eliminate the need to distribute multiple Excel reports, allowing our clients to interact directly with Hadoop instead.
• Security improved because Excel reports no longer had to be distributed via email.
• Using Tableau to run Impala queries produces real-time reporting, a significant value-add and convenience for our clients.
• Offers scalability and flexibility to accommodate diverse and growing client demands.
  • Allows us to scale our web event product giveaways.
  • Accommodates the addition of new data sources.
  • Easily add nodes to avoid potential performance bottlenecks.
22.
Why We Chose Cloudera
• Cloudera Manager became a major differentiator.
  • Made cluster management easy
  • User-friendly, which reduced the learning curve
• Chose Impala for its real-time query performance.
• Proven Cloudera innovation and zeal to maintain an enterprise-class solution by offering new tools and functions while maintaining/supporting the Apache project.
• Cloudera appeared to be the prominent choice for large Hadoop installs in the Fortune 500. Best IT analyst rating.
• Impressed with the Cloudera team before purchase.
23.
Benefits & Impact
• All-inclusive Cloudera Enterprise costs less than the required relational database licenses alone: over 90% cost reduction.
• Other benefits: cheaper hardware and easier management.
• Cloudera Navigator provides a single interface to locate and classify data, audit who is accessing what data, and protect the data with centralized key management.
  • Critical tool when handling PII and other sensitive data. A comprehensive audit trail allows for easy monitoring of PII data access.
  • Allows us to satisfy strict security compliance regulations with ease.
• Cloudera Professional Services are knowledgeable and responsive, and they help establish best practices for our internal development team. They helped us get it right the second time.
Why we are glad we chose Cloudera: "Any time we had a crisis, they were there to help."
24.
Lessons Learned
• First used non-Cloudera consulting: big mistake, as the design was incorrect for the data collected. Work with Cloudera Professional Services to design it right the first time.
• Start small, if you can, and grow the solution.
  • You don't need a big capital investment upfront.
  • Get value out of a small cluster (e.g., 3 nodes) and expand as needed.
  • Install services to meet your current needs. Install additional services as your data needs change.
• Look at all Cloudera solutions, learn them, and use them.
• Training: Be generous, and conduct it in phases to keep new skills relevant as you build and deploy.
• What's next?
  • Prebuilt analytical models as a platform.
  • Evaluate Navigator Optimizer to improve query performance and identify the best candidates for legacy application migration.
25.
What's Next for Cloudera's Analytic Database?
26.
Analytic Database Roadmap
Faster, richer, more expressive SQL
• Hive-on-Spark GA
• Insert, update, delete via Kudu
• Performance improvements
• Nested JSON
Improved multitenancy
• Fewer OOM errors
• Graceful node decommission
• Admission control enhancements
• Improved YARN integration
Better SQL workbench
• Higher Hue concurrency
• SQL editor usability improvements
• Intelligent recommendations of tables, joins & more for Hue users
• Exposing tags & lineage through the Hue query experience
Deeper integration with BI tools
• Joint workload optimizations
• Support for nested types and s
• Data discovery functionality injected into the BI experience
Workload optimization
• Multi-platform workload profiling
• Recommendation of in-line materialized views
Confidential: Do Not Redistribute
27.
Next Steps
• Download Cloudera 5.8
  • cloudera.com/downloads
• Release Notes
  • cloudera.com/documentation/enterprise/release-notes/topics/rg_release_notes.html
• Learn more about Navigator Optimizer and BI in the Cloud
• Register for Parts 2 & 3 of the Webinar Series!
  • cloudera.com/about-cloudera/events/webinars/5-8-webinar-series.html