linkedin.com/company/cloudera/
twitter.com/cloudera
facebook.com/cloudera
instagram.com/cloudera
#MeetUpClouderaBrasil
MeetUp
ROADMAP CLOUDERA 2020
What is coming in the new Cloudera data platform
© 2020 Cloudera, Inc. All rights reserved. 2
Thiago Santiago
Solution Engineering
linkedin.com/in/thiagosantiago/
thiago@cloudera.com
Agenda
● Why are we here today?
● CDP Cloud
● CDP Data Center
● CDF for CDP
● CML for CDP
● Data Driven Journey and Use Cases
Why are you here now?
You could be...
...watching Netflix? ...a soap opera? ...soccer?
Why are you here now?
...Make 2020 your BigData Year!
Why do you want BigData?
BigData Salaries?
• BigData Architect: $138,918*
• Data Scientist: $122,306*
• Data Warehouse Architect: $137,054*
• Senior Software Engineer: $113,222*
*per year
https://www.indeed.com/salaries/Big-Data-Salaries
Technology Trends?
Artificial Intelligence | Internet of Things | Cloud Computing | Streaming Data
Industrial Internet
Connected Business
Consumer Devices
Smart Devices
Autonomy
Prescriptive Analytics
SaaS/PaaS Applications
Ephemeral Use Cases
Operational Efficiency
Collaboration
Real-time Applications
Targeted Retail
Recommendations
Industrial Applications
Shifting the Data Paradigm
Security
The Way BigData is Changing the World
Big data is being heavily used by law enforcement, particularly by national organisations such as the NSA. These organisations have access to vast amounts of data, which they use to catch criminals and foil terrorist plots.
• Surveillance video from currently deployed Border Patrol assets, such as fixed and mobile towers, imaging unattended ground sensors (UGS), and unmanned air systems (drones)
• Analytical techniques applied to detect anomalies and/or trends, leading to actionable intelligence
• Machine learning used to help with predictive and proactive deployment of resources
Decline in Corruption
The Way BigData is Changing the World
Better monitoring of assets through reliable big data analytics will help governments track economies and allocate resources more fairly across society. It also helps tackle unwieldy bureaucracy, misinformation and other obstacles to transparent economics.
Environmental Health
The Way BigData is Changing the World
Increased carbon emissions, greenhouse gases, global warming and other climate changes can be better monitored and fought with the help of Big Data. The simplest example is internet-connected wearable devices, which raise awareness and give people the means to respond to local environmental challenges.
Fighting Poverty
The Way BigData is Changing the World
By analysing data from developing nations, non-profit organizations can find the areas where people would benefit most from better access to education, financial services, infrastructure, and health services. Having this information on hand can help direct aid to areas struck by natural disasters or health catastrophes. Big data may also help developing nations fight government corruption, which can cause extreme levels of poverty and impede relief efforts.
Similarly, access to large amounts of information from various sources can help organizations identify and react better to health epidemics, natural disasters (earthquakes, cyclones, etc.) and agriculture-related trends (drought, famine, etc.).
Healthcare
The Way BigData is Changing the World
Big data analytics is accelerating the speed at which researchers can work; for example, DNA strings can now be decoded in minutes, which can lead to faster creation of cures and the ability to predict disease patterns.
Big data is being used to monitor premature and sick babies in some specialist units, allowing doctors to analyse every heartbeat and breathing pattern. This has led to algorithms that can predict infections 24 hours before any physical symptoms occur.
Science
The Way BigData is Changing the World
CERN's Large Hadron Collider produces astronomical amounts of data in its effort to unlock the secrets of the universe, and enormous processing power is needed to analyse the 30 petabytes of data the collider produces annually.
Big data is also aiding space exploration: the Square Kilometre Array generates 700 terabytes of data a second.
The NASA Exchange Platform will help to manage the data. The technology that can detect radar on a planet 50 light-years away could eventually help discover life on another planet.
We believe data can make what was impossible yesterday, possible today.
THE ENTERPRISE DATA CLOUD COMPANY
• We believe that data can make what was impossible yesterday, possible today
• We empower people to transform complex data into clear and actionable insights
• We deliver an enterprise data cloud for any data, anywhere, from the Edge to AI
SNAPSHOT OF THE “NEW” CLOUDERA
85Countries Customers
3,000+Employees
2,000+
LEADING IN TOP INDUSTRIES
• BANKING: 8/10 TOP GLOBAL
• TELCO: 10/10 TOP GLOBAL
• PHARMA: 9/10 TOP GLOBAL
• PUBLIC: 40+ GOVERNMENT CUSTOMERS
• TECHNOLOGY: 8/10 TOP GLOBAL
• AUTOMOTIVE: 10/10 TOP GLOBAL
WORLD-CLASS TRAINING, SERVICES & SUPPORT
• PROFESSIONAL SERVICES: fastest route from zero to production
• CLOUDERA SUPPORT: SCP-certified support anywhere in the world
• CLOUDERA UNIVERSITY: 3 top big data certifications
ENTERPRISE DATA CLOUD ARCHITECTURE
• Multi-function analytics
• Hybrid and multi-cloud
• Secure and governed
• Open platform
IOT, INGEST & STREAMING | DATA WAREHOUSING | ML / AI | DATA SCIENCE
SECURITY & GOVERNANCE
PUBLIC CLOUDS (compute & storage) | DATACENTER (compute & storage)
Any Cloud | Multi-Function | Secure & Governed | Open
THE ENTERPRISE DATA CLOUD COMPANY
CLOUDERA DATA PLATFORM
• Public, private & hybrid clouds
• Shared data experience
• Powered by open source
• Analytics from the Edge to AI
• Unified data control plane
Analytic experiences: Data Flow & Streaming | Data Engineering | Data Warehouse | Operational Database | Machine Learning
Control plane (Management Console): Identity | Orchestration | Management | Operations
Data Hub & Cloudera Runtime
Data anywhere: Catalog | Schema | Migration | Security | Governance
Any infrastructure: Edge | Public Multi-Cloud | Hybrid Cloud | Private Cloud
CLOUDERA DATA PLATFORM - FORM FACTORS
Data Center, Public cloud, Private cloud, hybrid
• CDP – Public Cloud: control plane; DW, ML, OD, DE and DF experiences; Data Hub & Cloudera Runtime; SDX – security, governance & metadata; Edge to AI; public multi-cloud storage and compute
• CDP – Private Cloud: control plane; DW, ML, OD, DE and DF experiences; Data Hub & Cloudera Runtime; SDX – security, governance & metadata; Edge to AI; private container cloud with datacenter storage
• CDP – Data Center: DW, DS/ML, DF, OpDB and DE on shared storage & compute; control plane; SDX – security, governance & metadata
CDP Data Center
EDH (Cloudera Enterprise Data Hub) + The Most Comprehensive Data Analytics Platform + New Features = CDP Data Center
CDP Public Cloud
KEY CONCEPTS & COMPONENTS
Environment
• 1 Template
• 1 Region
• 1 VPC
• Multiple Roles/Buckets
Environment to Data Lake: 1:1
Data Lake
• SDX: Atlas, Ranger, Knox, IdBroker, CM
• Associated with groups/users
Environment to Data Hub Clusters / Experiences: 1:N
Data Hub Clusters / Experiences
• DH templates
• ML Env
• DW Database Catalogs/Virtual Compute
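The environment relationships on this slide (one Data Lake per environment, many Data Hub clusters or experiences per environment) can be sketched as a small data model. This is a minimal illustration under assumptions: the class names, fields and example values (region, VPC id, workload names) are hypothetical and do not reflect the CDP API or CLI.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataLake:
    """Shared SDX services for one environment (illustrative names only)."""
    services: List[str] = field(
        default_factory=lambda: ["Atlas", "Ranger", "Knox", "IdBroker", "CM"]
    )


@dataclass
class Workload:
    """A Data Hub cluster or containerized experience (hypothetical model)."""
    name: str
    kind: str  # e.g. "datahub", "ml", "dw"


@dataclass
class Environment:
    """One template, region and VPC; exactly one Data Lake (1:1)."""
    name: str
    region: str
    vpc: str
    data_lake: DataLake = field(default_factory=DataLake)
    workloads: List[Workload] = field(default_factory=list)  # 1:N

    def add_workload(self, name: str, kind: str) -> Workload:
        # Every workload reuses this environment's single Data Lake, so
        # security and governance are defined once and shared.
        if any(w.name == name for w in self.workloads):
            raise ValueError(f"workload {name!r} already exists")
        workload = Workload(name, kind)
        self.workloads.append(workload)
        return workload


env = Environment(name="prod", region="us-east-1", vpc="vpc-example")
env.add_workload("bi-team", "datahub")
env.add_workload("etl-team", "datahub")
env.add_workload("ml-workspace", "ml")
print(len(env.workloads), env.data_lake.services)
```

Holding the Data Lake as a single field, rather than a list, is what encodes the 1:1 relationship; workloads are a list because one environment can host many clusters and experiences.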
KEY CONCEPTS & COMPONENTS
Typical user flow across Enterprise IT, the CDP Control Plane (Management Console) and Enterprise Cloud Resources (IAM, Network, VMs, Buckets, etc.)
Step 1: User connects to CDP with their enterprise identity.
Step 2: They create an environment and data lake for their enterprise. The data lake runs Atlas, Ranger, Knox, IdBroker, FreeIPA, CM and HMS.
Step 3: They create data hub clusters for traditional workloads (e.g., BI Team Cluster, ETL Team Cluster).
Step 4: They create access points for containerized analytic experiences (e.g., Data Warehouse Experience, Machine Learning Experience).
CONSISTENT SECURITY AND GOVERNANCE
Built for multi-functional analytics anywhere
• Data Catalog: a comprehensive catalog of all data sets, spanning on-premises and cloud object stores, and structured, unstructured, and semi-structured data
• Schema: automatic capture and storage of all schema and metadata definitions as platform workloads create and use them
• Replication: deliver data, as well as data policies, wherever the enterprise needs to work, with complete consistency and security
• Security: role-based access control applied consistently across the platform, including full-stack encryption and key management
• Governance: enterprise-grade auditing, lineage, and governance capabilities applied across the platform, with rich extensibility for partner integrations
CDP HOME
A single login to access the full
platform, documentation, and
support - all controlled through
corporate SSO
MANAGEMENT CONSOLE
A single pane of glass to manage hundreds of clusters, each with its own lifecycle, across multiple environments
DATA LAKE
What is a Data Lake?
A common set of Services (SDX) within an Environment, shared across multiple Clusters/Experiences.
These include Services for:
• Security
• Auditing
• Governance
• Data Discovery
DATA HUB CLUSTERS AND EXPERIENCES
What are the consumption options?
A Data Hub Cluster is a
customizable environment that runs
like a traditional Hadoop cluster, but
is designed to leverage Cloud
Storage.
An Experience is a container-based
compute environment for specific
purposes:
ML (Machine Learning), DW (Data Warehouse), DE (Data Engineering), OD (Operational Database), DF (Data Flow)
DATA
HUB
A familiar and highly customizable
cluster service optimized for the
separation of storage and compute
DATA
WAREHOUSE
A data warehousing service
optimized for concurrency,
caching, and isolation
DATA
CATALOG
A centralized data stewardship tool
for searching, organizing, securing,
and governing data across
environments
WORKLOAD
MANAGER
A centralized management tool for
analyzing and optimizing
workloads within and across
environments
REPLICATION
MANAGER
A centralized management tool for
replicating and migrating data,
metadata, and policies between
environments
MACHINE LEARNING
A machine learning workspace service to connect teams of data scientists to enterprise data
Tour
CDP Public Cloud
https://console.cdp.cloudera.com/#/
CDP Data Center
New Features for everyone...
New features for CDH 6 customers
• Ranger 2.0: dynamic row filtering & column masking; attribute-based access control; SparkSQL fine-grained access control
• Atlas 2.0: advanced data discovery; improved performance and scalability
• Hive 3: Hive-on-Tez for better ETL performance; ACID transactions
• Ozone (Preview): 10x the scalability of HDFS
• Knox: gateway-based SSO
• Druid: low-latency data mart for real-time and aggregate data
• Spark on Docker: simplified dependency management
New features for HDP 3 customers
• Cloudera Manager: virtual private clusters; automated wire encryption setup; fine-grained RBAC for administrators; streamlined maintenance workflows
• Atlas 2.0: advanced data lineage; faceted search
• Solr 7: relevance-based text search over unstructured data (text, pdf, .jpg, ...)
• Impala: better fit for data mart migration use cases (interactive, BI-style queries)
• Hue: built-in SQL editor
• Kudu: better performance for fast-changing / updateable data
• Better at-rest encryption: Key Trustee Server, NavEncrypt
What’s in the box?
CDP Data Center 7.0 (2H 2019) Coming soon...
• Cloudera Manager 7.0
• Hadoop 3.1
• Spark 2.4
• Hive 3.1
• Impala 3.2
• Oozie 5.1
• Hue 4.3
• Ranger 2.0
• Atlas 2.0
• Solr 7.4
• Tez 0.9
• HBase 2.2
• Phoenix 5.0
• Kudu 1.11
• Sqoop 1.4.7
• Parquet 1.10
• Avro 1.8
• ORC 1.5
• ZooKeeper 3.5
• Kafka 2.3
• Key Trustee Server
• Ozone (Tech Preview)
• LLAP
• Livy
• Druid
• Ranger KMS
• Key HSM
• Navigator Encrypt
• Zeppelin
• Knox
• Accumulo
Foundation for Containerized Applications
• CDH 5 / HDP 2 cluster: existing apps, existing data, existing hardware
• Upgrade to a CDH 6 / HDP 3 cluster: existing apps, existing data, existing hardware
• Upgrade to a CDP Data Center cluster (a direct upgrade path is also available): existing apps, latest upstream features, best of CDH and HDP features
• CDP Private Cloud: Management Console, Container Cloud, Data Hub, DW, ML and more, built on the SDX and storage of CDP Data Center
CDP Data Center provides the stateful elements for the new wave of containerized applications:
• Storage
• Table Schema
• Authentication & Authorization
• Governance
Plan your path to CDP-DC now, and expand to new experiences this year
New for CDH Customers
Ranger Authorization
• Standard CDP authorization model across services
○ Replaces Sentry
• Better fine-grained access controls
○ Dynamic Row Filtering
○ Dynamic Column Masking
○ Attribute-based Access Control
○ SparkSQL fine-grained access control
• Rich policy features
○ Allow/Deny constructs, Custom policy conditions/context enrichers, time bound policies,
Atlas integration (for tag based policies)
• Extensive Access Auditing with rich event metadata
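Dynamic row filtering and column masking are applied at query time: a user sees only the rows their policy allows, and sensitive columns come back transformed. The sketch below illustrates the idea in plain Python. It is not Ranger's API or policy format (Ranger enforces these policies inside the engine plugins, such as Hive); the group name, policies and mask functions are hypothetical, loosely modelled on Ranger's "show last 4" and "hash" masking options.

```python
import hashlib
from typing import Callable, Dict, List

Row = Dict[str, str]


def show_last_4(value: str) -> str:
    """Partial mask: replace all but the last four characters with 'x'."""
    return "x" * max(len(value) - 4, 0) + value[-4:]


def hash_value(value: str) -> str:
    """Hash mask: replace the value with its SHA-256 hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Hypothetical policies for one user group (not Ranger's policy format).
ROW_FILTERS: Dict[str, Callable[[Row], bool]] = {
    "analysts_br": lambda row: row["country"] == "BR",
}
COLUMN_MASKS: Dict[str, Dict[str, Callable[[str], str]]] = {
    "analysts_br": {"card_number": show_last_4, "email": hash_value},
}


def apply_policies(group: str, rows: List[Row]) -> List[Row]:
    """Return the rows `group` may see, with masked columns rewritten."""
    row_filter = ROW_FILTERS.get(group)
    if row_filter is None:
        return []  # deny by default: no policy means no access in this sketch
    masks = COLUMN_MASKS.get(group, {})
    visible = []
    for row in rows:
        if not row_filter(row):  # dynamic row filtering
            continue
        visible.append(  # dynamic column masking
            {col: masks[col](val) if col in masks else val for col, val in row.items()}
        )
    return visible


rows = [
    {"country": "BR", "card_number": "1234567812345678", "email": "ana@example.com"},
    {"country": "US", "card_number": "8765432187654321", "email": "bob@example.com"},
]
print(apply_policies("analysts_br", rows))
```

Filtering before masking means masked values are only ever computed for rows the user is allowed to see, which mirrors the order in which the two kinds of policy are described on the slide.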
New in Ranger
• Ranger AuthZ for Impala, HMS, Solr (doc level), Ozone (TP)
• Security Zones
• RBAC in Ranger
New for both Cloudera and Hortonworks customers
Apache Ranger - Impala Support
● Single policy store for Hive
and Impala to enable
consistent policy authoring
● Independent AuthZ plugin to
enforce policies locally
● Resource and tag based
policies supported
● Masking/Row filtering on
roadmap for Impala
Apache Ranger - Security Zones
● Resource isolation (especially for multi-tenancy)
● Policy administration isolation
● Cross-service logical grouping
Apache Ranger - Roles
Apache Ranger Roadmap
• Authz integration with more services
○ Kudu, NiFi Registry, Schema Registry, etc.
• Incremental policy/tag downloads
• Ranger audit extensions
• REST-based Authz server
• RangerKMS-KeyTrustee integration
• Row filtering capability extension (to HBase, etc.)
• Ranger authz for Ranger
• Supporting multiple versions of plugins
Apache Atlas
• Metadata catalog & search
• Lineage & chain of custody
• Business glossary
• Metadata audits & security
Apache Atlas: Overview
• A catalog for metadata of enterprise assets
• Large number of integrations to gather metadata and lineage
Apache Atlas: Overview (cont.)
• Rich, dynamic type-system makes it easy to onboard new components
• APIs to define types: entity, classification, struct, relationship, enum
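Onboarding a new component means registering its types. The sketch below builds, but does not send, a typedef payload with an enum, a classification and an entity type. The field names follow the Atlas v2 typedef JSON as we understand it (posted to an endpoint such as `/api/atlas/v2/types/typedefs`); verify them against the REST documentation for your Atlas version. The `sensor_feed` and `IOT_DATA` types are hypothetical.

```python
import json


def attribute(name: str, type_name: str, optional: bool = True) -> dict:
    """One attribute definition inside an Atlas type."""
    return {
        "name": name,
        "typeName": type_name,
        "isOptional": optional,
        "cardinality": "SINGLE",
        "isUnique": False,
        "isIndexable": False,
    }


def build_typedefs() -> dict:
    """Typedefs for a hypothetical 'sensor_feed' component."""
    return {
        "enumDefs": [{
            "name": "sensor_feed_tier",
            "elementDefs": [{"value": "RAW", "ordinal": 0},
                            {"value": "CURATED", "ordinal": 1}],
        }],
        "structDefs": [],
        "classificationDefs": [
            {"name": "IOT_DATA", "superTypes": [], "attributeDefs": []}
        ],
        "entityDefs": [{
            "name": "sensor_feed",
            # DataSet supplies common attributes such as name and qualifiedName.
            "superTypes": ["DataSet"],
            "attributeDefs": [attribute("tier", "sensor_feed_tier"),
                              attribute("device_count", "int")],
        }],
        "relationshipDefs": [],
    }


payload = build_typedefs()
# This JSON is what would be POSTed to the typedefs endpoint; no request is sent.
print(json.dumps(payload, indent=2))
```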
Apache Atlas: Metadata - Hive Column
Apache Atlas: lineage - Hive Table
● Propagation of Tags
● Filter and search
● Export Lineage
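Tag propagation means a classification such as PII applied to a source table flows to every table derived from it in the lineage graph. A minimal sketch of that traversal, assuming a hypothetical in-memory lineage map; it does not use Atlas's graph store or API.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage: each entity maps to the entities derived from it.
LINEAGE: Dict[str, List[str]] = {
    "raw.customers": ["staging.customers"],
    "staging.customers": ["mart.customer_360", "mart.churn_features"],
    "mart.customer_360": [],
    "mart.churn_features": [],
    "raw.products": ["mart.customer_360"],
}


def propagate(tags: Dict[str, Set[str]], source: str, tag: str) -> None:
    """Attach `tag` to `source` and everything downstream of it (BFS)."""
    queue = deque([source])
    seen = {source}
    while queue:
        entity = queue.popleft()
        tags.setdefault(entity, set()).add(tag)
        for child in LINEAGE.get(entity, []):
            if child not in seen:  # lineage paths can re-converge
                seen.add(child)
                queue.append(child)


tags: Dict[str, Set[str]] = {}
propagate(tags, "raw.customers", "PII")
print(sorted(tags))
```

Only downstream entities are tagged: `raw.products` feeds the same mart table but stays untagged, because propagation follows lineage edges away from the classified source.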
Apache Atlas: Search
Apache Atlas
What’s New in CDP-DC?
• Impala and HMS new hooks
• Spark-Atlas connector
• Lineage Improvements
• Runtime Stats
• Optimized Search
• Improvements to address Navigator metadata import
HIVE 3 FOR DATA WAREHOUSING IN CDP-DC - OVERVIEW
• Comprehensive ANSI SQL 2016 coverage
○ Use Cases: pre-built reports, more efficient SQL constructs, BI tool compatibility
○ Capabilities: implements 120/163 SQL 2016 mandatory features and > 70 optional features; runs all 99 TPC-DS queries without modifications; additional SQL-friendly capabilities, e.g. surrogate keys, information_schema, …
• ACID Support: transactions and INSERT/UPDATE/DELETE/MERGE
○ Use Cases: delete individual rows (GDPR), data cleansing/correction, merge for CDC data, ...
○ Capabilities: SQL 2011 compliant, transactional (snapshot isolation), set-based insert/update/delete; managed tables (ACID by default) on ORC; external tables (non-ACID) on ORC/Parquet
New for HDP customers
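The "merge for CDC data" use case applies a batch of change records (inserts, updates, deletes) to an ACID table in one statement. The Python sketch below mirrors the clause-by-clause semantics of such a MERGE on an in-memory table; the table, column and op-code names are illustrative assumptions, and the HiveQL in the comment shows the shape of the statement rather than a tested production query.

```python
from typing import Dict, List, Tuple

# The Hive 3 statement this sketch mirrors (names are illustrative):
#   MERGE INTO customers AS t USING changes AS s ON t.id = s.id
#   WHEN MATCHED AND s.op = 'D' THEN DELETE
#   WHEN MATCHED THEN UPDATE SET name = s.name
#   WHEN NOT MATCHED AND s.op <> 'D' THEN INSERT VALUES (s.id, s.name);

Change = Tuple[str, int, str]  # (op, id, name) where op is 'I', 'U' or 'D'


def merge(target: Dict[int, str], changes: List[Change]) -> Dict[int, str]:
    """Apply one batch of CDC records to a keyed table.

    Assumes the batch holds at most one change per key, because Hive's MERGE
    rejects a target row matched by more than one source row by default.
    """
    result = dict(target)  # build a new version; the input stays untouched
    for op, key, name in changes:
        if key in result:
            if op == "D":
                del result[key]     # WHEN MATCHED AND s.op = 'D' THEN DELETE
            else:
                result[key] = name  # WHEN MATCHED THEN UPDATE
        elif op != "D":
            result[key] = name      # WHEN NOT MATCHED ... THEN INSERT
    return result


table = {1: "Ana", 2: "Bruno"}
changes = [("U", 1, "Ana Souza"), ("D", 2, ""), ("I", 3, "Carla")]
print(merge(table, changes))  # {1: 'Ana Souza', 3: 'Carla'}
```

Returning a new dictionary instead of mutating the input loosely echoes snapshot isolation: readers of the old version never observe a half-applied batch.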
IMPALA AND KUDU FOR DATA WAREHOUSING IN CDP-DC
• Apache Impala: Leading MPP SQL Engine for DW - optimized for Parquet/Kudu
○ Ideal for: data mart implementations that require interactive/ad-hoc BI
○ 1000+ enterprise customers - many running on 10s of PBs and 100s of nodes
○ Certified with leading BI tools, with broad SQL coverage
○ Latest release adds improvements for resiliency, concurrency, and metadata
• Apache Kudu: Leading columnar storage engine for fast analytics on fast data
○ Ideal for: low-latency time series data ingest and analytics (with the Impala SQL engine)
○ Combines HBase-like fast ingest of single rows with HDFS-like large scans
○ ACID (insert/update/delete) semantics for single rows
HUE FOR DATA WAREHOUSING IN CDP-DC
• Apache Hue: Leading SQL Workbench for Ad-hoc BI
• Ideal for: ad-hoc queries/exploration on data marts/HDFS files using Impala and/or Hive
• Very high adoption rate across Hadoop landscapes, with thousands of active users
• Key features:
• SQL editor - autocomplete, query history, query plans
• File browser - Object Stores (S3, ADLS), HDFS
• Document Handling - Sharing, Downloading, Importing, Exporting
• Load balancing for large scale deployments with hundreds of concurrent users
Cloudera Manager 7 - What’s new for HDP Users
• Single pane of glass
○ Multiple clusters! (up to 3,000 nodes total)
○ ‘Compute’ clusters & ‘Base’ clusters (‘VPCs’)
• Security
○ Automated wire encryption (TLS 1.2)
○ HDFS encryption-at-rest wizard (KTS/KMS)
○ Fine-grained access control for admins
• Ease of administration
○ Global configuration search / config ‘diff’ before restart
○ Edge/‘gateway’ node configuration
○ Proper rolling restart (HA-sensitive)
○ View of YARN/Impala workloads
• Performance
○ BitTorrent-based distribution of binaries
Cloudera Manager 7 - What’s new for Everyone!
• Management of new services
○ Ranger, Atlas, Hive-on-Tez, DAS
• CDP Look-and-Feel
• Cluster-level configuration history
• Improved global search
• Resume after errors when enabling Kerberos
• Minor scalability improvements (hosts page)
• Improved alerts configuration
• jQuery 3.4 (improved security)
YARN
Capacity Scheduler & Queue Manager UI
• Capacity Scheduler is now the default scheduler in YARN
○ GPU support
○ Node Labels
○ Global scheduling support
○ Better placement support
• A new Queue management UI experience for better usability
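With the Capacity Scheduler, each queue's configured capacity (the `yarn.scheduler.capacity.<queue-path>.capacity` property) is a percentage of its parent queue, and sibling capacities must add up to 100%. The sketch below validates a hypothetical queue tree and computes a queue's share of the whole cluster; it deliberately ignores maximum capacity, elastic borrowing between queues, and node labels.

```python
from typing import Dict

# Hypothetical queue tree, keyed like capacity-scheduler queue paths:
# each value is the queue's capacity as a percentage of its parent.
CAPACITY: Dict[str, float] = {
    "root.etl": 50.0,
    "root.bi": 30.0,
    "root.ml": 20.0,
    "root.etl.daily": 70.0,
    "root.etl.adhoc": 30.0,
}


def validate(capacity: Dict[str, float]) -> None:
    """Raise if the children of any parent queue do not add up to 100%."""
    totals: Dict[str, float] = {}
    for path, pct in capacity.items():
        parent = path.rsplit(".", 1)[0]
        totals[parent] = totals.get(parent, 0.0) + pct
    for parent, total in totals.items():
        if abs(total - 100.0) > 1e-9:
            raise ValueError(f"children of {parent} sum to {total}%, not 100%")


def absolute_capacity(path: str, capacity: Dict[str, float]) -> float:
    """Share of the whole cluster: multiply percentages from root downwards."""
    share = 1.0
    parts = path.split(".")
    for i in range(2, len(parts) + 1):
        share *= capacity[".".join(parts[:i])] / 100.0
    return share * 100.0


validate(CAPACITY)
print(absolute_capacity("root.etl.daily", CAPACITY))  # about 35: 70% of 50%
```

Multiplying down the path is why a deep queue can end up with a small guaranteed share even when its own percentage looks large.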
Capacity Scheduler & Queue Manager UI
• New Queue Manager UI in CM to configure resources and queues
List of all queues in
cluster
Spark Dependency Management
• Simplify dependency management with Spark-on-Docker support
• No need to install dependencies on individual cluster hosts
Enable Docker on
YARN with a click
from CM for Spark
workloads
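Outside the Cloudera Manager UI, running Spark inside Docker on YARN is typically requested through container environment variables: Hadoop's Docker runtime reads `YARN_CONTAINER_RUNTIME_TYPE` and `YARN_CONTAINER_RUNTIME_DOCKER_IMAGE`, and Spark passes environment variables to containers via `spark.yarn.appMasterEnv.*` and `spark.executorEnv.*`. The sketch below only builds the `spark-submit` command line. The image name and `job.py` are hypothetical, we have not verified which settings Cloudera Manager itself writes, and the cluster must already trust the image's registry.

```python
import shlex
from typing import List


def docker_spark_submit(image: str, app: str) -> List[str]:
    """Build a spark-submit command whose YARN containers run in Docker."""
    env = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }
    cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    for key, value in env.items():
        # appMasterEnv reaches the application master (which hosts the driver
        # in cluster mode); executorEnv reaches every executor container.
        cmd += ["--conf", f"spark.yarn.appMasterEnv.{key}={value}"]
        cmd += ["--conf", f"spark.executorEnv.{key}={value}"]
    cmd.append(app)
    return cmd


command = docker_spark_submit("registry.example.com/team/pyspark-deps:1.0", "job.py")
print(shlex.join(command))
```

Because the dependencies live in the image, nothing has to be installed on individual cluster hosts, which is the point made on the slide.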
CDF for CDP-DC
Cloudera DataFlow (CDF) Platform - When will it be Supported on CDP-DC?
Deployment Spectrum
CDP DataHub
CDP DataCenter
CDP DataFlow Service
CDH On-Premise
CFM 2.0 Highlights & Platform Integration
Based on Apache NiFi 1.10
• Allows parameterization of all processor properties
• Support for “public” ports (accessible to remote site-to-site clients) in any process group
• Queue length and time to back pressure are now predicted
First release to include the K8s Operator (Tech Preview)
• Allows customers to try out NiFi clusters on Kubernetes
• The operator takes care of NiFi cluster installation, configuration and scaling
• OpenShift certified (pending)
• Goal is to gather feedback from customers about requirements
First release to include the Stateless NiFi Runtime
• New NiFi runtime
• FlowFiles stored in memory, not persisted on disk
• Data durability provided by source/target systems
• Allows for abstraction of “jobs”
• Allows for flows to be “triggered”
CFM 2.0 will be available as an add-on to CDP-DC post GA (target: Q4)
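Predicting time to back pressure amounts to extrapolating a connection's queue growth until it reaches its threshold. A minimal sketch of the idea, assuming a simple ordinary-least-squares trend over recent queue counts; NiFi ships its own analytics model, and this is not its code. The samples are hypothetical; 10,000 objects is NiFi's default back pressure object threshold.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (seconds, queued FlowFiles)


def fit_line(points: List[Sample]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = slope * t + intercept."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in points)
    if var_t == 0:
        raise ValueError("need samples at two or more distinct times")
    cov = sum((t - mean_t) * (y - mean_y) for t, y in points)
    slope = cov / var_t
    return slope, mean_y - slope * mean_t


def seconds_to_backpressure(
    samples: List[Sample], threshold: float, now: float
) -> Optional[float]:
    """Predict how long until the queue count reaches `threshold`."""
    slope, intercept = fit_line(samples)
    if slope <= 0:
        return None  # stable or draining queue: no back pressure predicted
    t_hit = (threshold - intercept) / slope
    return max(t_hit - now, 0.0)


# Hypothetical samples: the queue grows by about 100 FlowFiles per second.
samples = [(0, 1000), (10, 2000), (20, 3000), (30, 4000)]
print(seconds_to_backpressure(samples, threshold=10_000, now=30))  # 60.0
```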
Flume to CEM / CFM migration
Yes, Flume is really gone.
Opportunity: Flume Offload Sales Play
• We now have a powerful data distribution / ingest tool in our stack
• Door opener for new analytics use cases
- Flexible data movement architecture
- Foundation for real-time stream processing
Flume Use Case Migration: identify your customer’s use cases
• Most common use cases:
- Hadoop ingest (HDFS, HBase)
- “Flafka” (read/write Kafka)
- HTTP, file sources/sinks
• Flume used as agent -> CEM
• Flume used for central ingest -> CFM
Questions? Need help with a specific customer use case?
• We will host a deep-dive Flume Offload enablement
• Check out the Flume Offload collateral (decks, example flows, migration strategy)
• Reach out to dim-field@cloudera.com, mkohs@cloudera.com, fce_streaming / fce_nifi
Kafka 2.3 Available in CDP-DC 7.0
Secure and Governed Kafka Clusters with New Ranger & Atlas Integration
What's New?
• Kafka 2.3 available in the Cloudera Runtime (CR) 7.0 parcel
• Kafka / Ranger integration
• Kafka / Atlas integration
• Support for the Hive 3.x Kafka storage handler
• Support for LDAP-based authentication
• Support for multiple Kafka compute clusters using a shared security Data Lake with Ranger & Atlas
Diagram: each Kafka compute cluster uses the shared security context (Ranger and Atlas) from the Data Lake
Kafka Management Services Support on CDP-DC
SR, SMM & SRM will be available as add-ons to CDP-DC post GA (target: Q4)
• Schema Registry: new Kafka schema governance
• Streams Replication Manager (SRM): new Kafka replication engine powered by MirrorMaker 2
• Streams Messaging Manager (SMM): new Kafka monitoring service
New Flink Support on CDP-DC
Flink on YARN support will be available as an add-on to CDP-DC post GA (target: Q4)
Why Flink
• Next-generation streaming engine that offers a stronger solution than Storm
• Flink runs as a YARN application
• Key features
○ Ultra-low latency (< 100 ms)
○ Advanced features (late-arriving data, checkpointing, event-time processing)
○ Exactly-once processing
○ Complex stateful stream processing
○ Growing, vibrant community
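Event-time processing with late-arriving data hinges on watermarks: the engine assumes events arrive at most a bounded delay out of order, and an event whose window has already closed according to the watermark is treated as late. The sketch below mimics that behaviour for tumbling windows in plain Python. It is a simplification of Flink's model (no allowed lateness, window firing or state backends) and uses no Flink API; the event data is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, str]  # (event_time_ms, key)


def tumbling_counts(
    events: List[Event], size_ms: int, max_delay_ms: int
) -> Tuple[Dict[Tuple[int, str], int], List[Event]]:
    """Count events per key in event-time tumbling windows.

    The watermark trails the largest timestamp seen so far by `max_delay_ms`
    (bounded out-of-orderness). An event whose window ends at or before the
    current watermark is late and goes to a separate list.
    """
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    late: List[Event] = []
    max_ts = None
    for ts, key in events:  # arrival order, not event-time order
        window_start = ts - ts % size_ms
        watermark = None if max_ts is None else max_ts - max_delay_ms
        if watermark is not None and window_start + size_ms <= watermark:
            late.append((ts, key))
            continue
        counts[(window_start, key)] += 1
        max_ts = ts if max_ts is None else max(max_ts, ts)
    return dict(counts), late


events = [(1000, "a"), (4000, "a"), (11000, "a"), (9500, "a"), (15000, "b"), (3000, "a")]
counts, late = tumbling_counts(events, size_ms=10_000, max_delay_ms=2_000)
print(counts, late)
```

Here the event at 9,500 ms arrives after one at 11,000 ms but is still counted, because it is within the 2-second bound; the event at 3,000 ms arrives after the watermark has passed 10,000 ms and is reported as late.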
CLOUDERA MACHINE LEARNING
GETTING TO PRODUCTION
Tour
CML for CDP
https://console.cdp.cloudera.com/#/
DATA-DRIVEN JOURNEY
DATA-DRIVEN JOURNEY
Stages: VISIBILITY | PRODUCTIVITY | TRANSFORMATION
USE CASES
• Preventive & Proactive Maintenance
• IoT Hub for Industry 4.0
• Advanced Threat Detection
• Risk Modelling & Analysis
• Marketing Systems Integration
• Customer 360 Insights
• Exploratory Data Science
• Data Warehouse
• Applied Machine Learning
GROW (Sales & Marketing) | CONNECT (Operations & Product) | PROTECT (Security & Compliance) | MODERNIZE (IT, Tech, Data Science & Analytics)
HIERARCHY OF NEEDS FOR THE DATA-DRIVEN ENTERPRISE
The “AI Ladder”
AI
MACHINE
LEARNING
DATA SCIENCE
ANALYTICS
"BIG DATA"
Actionable Intelligence Powers Today’s Financial Services
Use cases: DIGITAL CUSTOMER 360 | RISK DATA AGGREGATION | ANTI-MONEY LAUNDERING | FRAUD DETECTION | TRADE SURVEILLANCE
Data sources: OFAC Lists, Credit Records, ATM Streams, Transactions & Wires, Stock Tickers, Trade Settlements, Mobile App Data, Trade Data, Web Logs, Banker Notes, Demographic Data, Customer Transaction Data
Connected Data Drives Success in Telecommunications
Use cases: SINGLE VIEW OF THE CUSTOMER | CHURN REDUCTION | CDR ANALYSIS | NETWORK OPTIMIZATION | DYNAMIC BANDWIDTH ALLOCATION
Data sources: Call Detail Records, Product Catalogs, Cyber Threat Metadata, Sensor Data, Server Logs, Voice-to-Text, Clickstream, ERP System Data, Social Media, Billing Data, Subscriber Profiles, CRM Records
Actionable Intelligence Drives Retail Sales Growth
Use cases: SINGLE VIEW OF THE CUSTOMER | PRODUCT RECOMMENDATIONS | INVENTORY & SUPPLY CHAIN | PRICING OPTIMIZATION | TARGETED PROMOTIONS
Data sources: Product Catalogs, Sales Forecasts, Beacons & RFID, Server Logs, In-Store WiFi Logs, Store Communications, Clickstream, ERP Data, Social Media, Staffing Plans, Store Reporting, CRM Records
Actionable Intelligence Makes Healthcare Precise and Personal
Use cases: SINGLE VIEW OF PATIENT | REAL-TIME VITAL SIGN MONITORING | BILLING & REIMBURSEMENTS | EMR OPTIMIZATION | SUPPLY CHAIN OPTIMIZATION
Data sources: Patient Records, Lab Data, Pharmacy Data, Patient Locations, Wearables, Intra-Network Data, Sensor Data, Claims Data, Social Media, Physician Notes, Patient Satisfaction Data, Clinical (EMR) Data
Actionable Intelligence Makes Pharmaceuticals Safe & Effective
Use cases: DRUG TRIAL COHORT SELECTION | YIELD OPTIMIZATION | RAW MATERIAL WASTE REDUCTION | SEARCHABLE RESEARCH REPOS | NEXT-GEN SEQUENCING (NGS)
Data sources: Research Cohort Data, Molecular Data, RFID Data, Social Media, Biometrics, Sensor Data, Supply Chain Geo-location Data, Scientific Studies, Manufacturing Machine Data, Clinical Records, Sales Reports, Genomic Data
Actionable Intelligence Powers Modern Manufacturing
Use cases: PREVENTATIVE MAINTENANCE | SUPPLY CHAIN OPTIMIZATION | YIELD MAXIMIZATION | QUALITY CONTROL | RECALL AVOIDANCE
Data sources: Defect Testing Data, Product Designs, MES Systems, RFID Streams, SCADA Systems, Shop Floor Sensors, ERP Systems, Supplier Receipts, Machine Data, Assembly Line Sensors, Data Historians, Work Orders
Actionable Intelligence Enhances Public Sector Efficiency
Use cases: PUBLIC TRANSPORTATION | INFRASTRUCTURE MAINTENANCE | PUBLIC HEALTH | NATIONAL DEFENSE | HOMELAND SECURITY
Data sources: Historical Archives, Cyber Threat Metadata, Vehicle Telemetry Data, Disease Outbreaks, Natural Disasters, Social Media, Work Orders, Meeting Notes, Voter Rolls, Public Benefits Claims, Financial Audits, Extreme Weather Alerts
Why are you here now?
THANK YOU!
Because 2020 is Your BigData Year!

Running Apache NiFi with Apache Spark : Integration Options
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
Analyzing Data Streams in Real Time with Amazon Kinesis: PNNL's Serverless Da...
 
SMC304 Serverless Orchestration with AWS Step Functions
SMC304 Serverless Orchestration with AWS Step FunctionsSMC304 Serverless Orchestration with AWS Step Functions
SMC304 Serverless Orchestration with AWS Step Functions
 
Building a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - WebinarBuilding a Modern Data Architecture on AWS - Webinar
Building a Modern Data Architecture on AWS - Webinar
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 

Similar to Meet up roadmap cloudera 2020 - janeiro

Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Cloud Computing and CDO (April 29).pdf
 Cloud Computing and CDO (April 29).pdf Cloud Computing and CDO (April 29).pdf
Cloud Computing and CDO (April 29).pdfPablo Junco
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Cloud beyond the obvious, an approach for innovation
Cloud beyond the obvious, an approach for innovationCloud beyond the obvious, an approach for innovation
Cloud beyond the obvious, an approach for innovationChristian Verstraete
 
State of cloud computing v2
State of cloud computing v2State of cloud computing v2
State of cloud computing v2Md Aminul Hassan
 
Cloud & Big Data - Digital Transformation in Banking
Cloud & Big Data - Digital Transformation in Banking Cloud & Big Data - Digital Transformation in Banking
Cloud & Big Data - Digital Transformation in Banking Sutedjo Tjahjadi
 
Keynote: Art of the Possible - Moore
Keynote: Art of the Possible - MooreKeynote: Art of the Possible - Moore
Keynote: Art of the Possible - MooreNeo4j
 
The Art of Data Science - event slides
The Art of Data Science - event slidesThe Art of Data Science - event slides
The Art of Data Science - event slidesRedPixie
 
Big Data
Big DataBig Data
Big DataBBDO
 
141900791 big-data
141900791 big-data141900791 big-data
141900791 big-dataglittaz
 
BBDO Proximity: Big-data May 2013
BBDO Proximity: Big-data May 2013BBDO Proximity: Big-data May 2013
BBDO Proximity: Big-data May 2013Brian Crotty
 
IoT meets AI in the Clouds
IoT meets AI in the CloudsIoT meets AI in the Clouds
IoT meets AI in the CloudsDr. Mirko Kämpf
 
top 10 Digital transformation Technologies in 2022.docx
top 10 Digital transformation Technologies in 2022.docxtop 10 Digital transformation Technologies in 2022.docx
top 10 Digital transformation Technologies in 2022.docxAdvance Tech
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyRohit Dubey
 
SessionA-Keynote-NSIT-AMS-Aug15b.pptx
SessionA-Keynote-NSIT-AMS-Aug15b.pptxSessionA-Keynote-NSIT-AMS-Aug15b.pptx
SessionA-Keynote-NSIT-AMS-Aug15b.pptxssuser993127
 
巨量資料入門 The evolution of data architecture
巨量資料入門 The evolution of data architecture巨量資料入門 The evolution of data architecture
巨量資料入門 The evolution of data architectureWei-Chiu Chuang
 
KUBRICK Graphs: A journey from in vogue to success-ion
KUBRICK Graphs: A journey from in vogue to success-ionKUBRICK Graphs: A journey from in vogue to success-ion
KUBRICK Graphs: A journey from in vogue to success-ionNeo4j
 
Digital Transformation in the Lab
Digital Transformation in the LabDigital Transformation in the Lab
Digital Transformation in the Labaccenture
 
Tech + Built Environment Trends 22
Tech + Built Environment Trends 22Tech + Built Environment Trends 22
Tech + Built Environment Trends 22Matthew Marson
 

Similar to Meet up roadmap cloudera 2020 - janeiro (20)

Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
VSD Paris 2018 - Présentation Finale
VSD Paris 2018 - Présentation FinaleVSD Paris 2018 - Présentation Finale
VSD Paris 2018 - Présentation Finale
 
Cloud Computing and CDO (April 29).pdf
 Cloud Computing and CDO (April 29).pdf Cloud Computing and CDO (April 29).pdf
Cloud Computing and CDO (April 29).pdf
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Cloud beyond the obvious, an approach for innovation
Cloud beyond the obvious, an approach for innovationCloud beyond the obvious, an approach for innovation
Cloud beyond the obvious, an approach for innovation
 
State of cloud computing v2
State of cloud computing v2State of cloud computing v2
State of cloud computing v2
 
Cloud & Big Data - Digital Transformation in Banking
Cloud & Big Data - Digital Transformation in Banking Cloud & Big Data - Digital Transformation in Banking
Cloud & Big Data - Digital Transformation in Banking
 
Keynote: Art of the Possible - Moore
Keynote: Art of the Possible - MooreKeynote: Art of the Possible - Moore
Keynote: Art of the Possible - Moore
 
The Art of Data Science - event slides
The Art of Data Science - event slidesThe Art of Data Science - event slides
The Art of Data Science - event slides
 
Big Data
Big DataBig Data
Big Data
 
141900791 big-data
141900791 big-data141900791 big-data
141900791 big-data
 
BBDO Proximity: Big-data May 2013
BBDO Proximity: Big-data May 2013BBDO Proximity: Big-data May 2013
BBDO Proximity: Big-data May 2013
 
IoT meets AI in the Clouds
IoT meets AI in the CloudsIoT meets AI in the Clouds
IoT meets AI in the Clouds
 
top 10 Digital transformation Technologies in 2022.docx
top 10 Digital transformation Technologies in 2022.docxtop 10 Digital transformation Technologies in 2022.docx
top 10 Digital transformation Technologies in 2022.docx
 
Big Data PPT by Rohit Dubey
Big Data PPT by Rohit DubeyBig Data PPT by Rohit Dubey
Big Data PPT by Rohit Dubey
 
SessionA-Keynote-NSIT-AMS-Aug15b.pptx
SessionA-Keynote-NSIT-AMS-Aug15b.pptxSessionA-Keynote-NSIT-AMS-Aug15b.pptx
SessionA-Keynote-NSIT-AMS-Aug15b.pptx
 
巨量資料入門 The evolution of data architecture
巨量資料入門 The evolution of data architecture巨量資料入門 The evolution of data architecture
巨量資料入門 The evolution of data architecture
 
KUBRICK Graphs: A journey from in vogue to success-ion
KUBRICK Graphs: A journey from in vogue to success-ionKUBRICK Graphs: A journey from in vogue to success-ion
KUBRICK Graphs: A journey from in vogue to success-ion
 
Digital Transformation in the Lab
Digital Transformation in the LabDigital Transformation in the Lab
Digital Transformation in the Lab
 
Tech + Built Environment Trends 22
Tech + Built Environment Trends 22Tech + Built Environment Trends 22
Tech + Built Environment Trends 22
 

More from Thiago Santiago

LGPD - Webinar Cloudera e FIAP
LGPD - Webinar Cloudera e FIAPLGPD - Webinar Cloudera e FIAP
LGPD - Webinar Cloudera e FIAPThiago Santiago
 
Harvard Business Review - LGPD
Harvard Business Review - LGPDHarvard Business Review - LGPD
Harvard Business Review - LGPDThiago Santiago
 
Hortonworks - IBM - Cloud Event
Hortonworks - IBM - Cloud EventHortonworks - IBM - Cloud Event
Hortonworks - IBM - Cloud EventThiago Santiago
 
Hortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data ScienceHortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data ScienceThiago Santiago
 
Social Media Monitoring with NiFi, Druid and Superset
Social Media Monitoring with NiFi, Druid and SupersetSocial Media Monitoring with NiFi, Druid and Superset
Social Media Monitoring with NiFi, Druid and SupersetThiago Santiago
 
Big Data Week São Paulo 2017
Big Data Week São Paulo 2017 Big Data Week São Paulo 2017
Big Data Week São Paulo 2017 Thiago Santiago
 
Hortonworks & IBM solutions
Hortonworks & IBM solutionsHortonworks & IBM solutions
Hortonworks & IBM solutionsThiago Santiago
 
Instituto Infnet - BigData e Hadoop
Instituto Infnet  - BigData e HadoopInstituto Infnet  - BigData e Hadoop
Instituto Infnet - BigData e HadoopThiago Santiago
 
Hadoop Day - MeetUp - O poder da Informação
Hadoop Day - MeetUp - O poder da InformaçãoHadoop Day - MeetUp - O poder da Informação
Hadoop Day - MeetUp - O poder da InformaçãoThiago Santiago
 
BigData & Hadoop - Technology Latinoware 2016
BigData & Hadoop - Technology Latinoware 2016BigData & Hadoop - Technology Latinoware 2016
BigData & Hadoop - Technology Latinoware 2016Thiago Santiago
 
TDC 2014 - Hadoop Hands ON
TDC 2014 - Hadoop Hands ONTDC 2014 - Hadoop Hands ON
TDC 2014 - Hadoop Hands ONThiago Santiago
 
Hadoop - Mãos à massa! Qcon2014
Hadoop - Mãos à massa! Qcon2014Hadoop - Mãos à massa! Qcon2014
Hadoop - Mãos à massa! Qcon2014Thiago Santiago
 

More from Thiago Santiago (13)

LGPD - Webinar Cloudera e FIAP
LGPD - Webinar Cloudera e FIAPLGPD - Webinar Cloudera e FIAP
LGPD - Webinar Cloudera e FIAP
 
Harvard Business Review - LGPD
Harvard Business Review - LGPDHarvard Business Review - LGPD
Harvard Business Review - LGPD
 
Hortonworks - IBM - Cloud Event
Hortonworks - IBM - Cloud EventHortonworks - IBM - Cloud Event
Hortonworks - IBM - Cloud Event
 
Hortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data ScienceHortonworks - IBM Cognitive - The Future of Data Science
Hortonworks - IBM Cognitive - The Future of Data Science
 
Social Media Monitoring with NiFi, Druid and Superset
Social Media Monitoring with NiFi, Druid and SupersetSocial Media Monitoring with NiFi, Druid and Superset
Social Media Monitoring with NiFi, Druid and Superset
 
PGDay Brasilia 2017
PGDay Brasilia 2017PGDay Brasilia 2017
PGDay Brasilia 2017
 
Big Data Week São Paulo 2017
Big Data Week São Paulo 2017 Big Data Week São Paulo 2017
Big Data Week São Paulo 2017
 
Hortonworks & IBM solutions
Hortonworks & IBM solutionsHortonworks & IBM solutions
Hortonworks & IBM solutions
 
Instituto Infnet - BigData e Hadoop
Instituto Infnet  - BigData e HadoopInstituto Infnet  - BigData e Hadoop
Instituto Infnet - BigData e Hadoop
 
Hadoop Day - MeetUp - O poder da Informação
Hadoop Day - MeetUp - O poder da InformaçãoHadoop Day - MeetUp - O poder da Informação
Hadoop Day - MeetUp - O poder da Informação
 
BigData & Hadoop - Technology Latinoware 2016
BigData & Hadoop - Technology Latinoware 2016BigData & Hadoop - Technology Latinoware 2016
BigData & Hadoop - Technology Latinoware 2016
 
TDC 2014 - Hadoop Hands ON
TDC 2014 - Hadoop Hands ONTDC 2014 - Hadoop Hands ON
TDC 2014 - Hadoop Hands ON
 
Hadoop - Mãos à massa! Qcon2014
Hadoop - Mãos à massa! Qcon2014Hadoop - Mãos à massa! Qcon2014
Hadoop - Mãos à massa! Qcon2014
 

Recently uploaded

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 

Recently uploaded (20)

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 

Meetup: Cloudera Roadmap 2020 (January)

  • 2. © 2020 Cloudera, Inc. All rights reserved.
    Thiago Santiago, Solution Engineering
    linkedin.com/in/thiagosantiago/ | thiago@cloudera.com
  • 3. Agenda
    ● Why are we here today?
    ● CDP Cloud
    ● CDP Data Center
    ● CDF for CDP
    ● CML for CDP
    ● Data-Driven Journey and Use Cases
  • 4. Why are you here now? You could be watching... Netflix? A soap opera? Soccer?
  • 5. Why are you here now? Make 2020 your Big Data year!
  • 6. Why do you want Big Data?
  • 7. Big Data salaries? (per year; source: https://www.indeed.com/salaries/Big-Data-Salaries)
    Big Data Architect: $138,918
    Data Scientist: $122,306
    Data Warehouse Architect: $137,054
    Senior Software Engineer: $113,222
  • 8. Technology Trends? Shifting the Data Paradigm
    Artificial Intelligence | Internet of Things | Cloud Computing | Streaming Data
    Industrial Internet, Connected Business, Consumer Devices, Smart Devices, Autonomy, Prescriptive Analytics, SaaS/PaaS Applications, Ephemeral Use Cases, Operational Efficiency, Collaboration, Real-time Applications, Targeted Retail Recommendations, Industrial Applications
  • 9. The Way Big Data Is Changing the World: Security
    Big data is heavily used by law enforcement, particularly by national organisations such as the NSA. These organisations have access to vast amounts of data, which they use to catch criminals and foil terrorist plots.
    Sources include surveillance video from deployed Border Patrol assets such as fixed and mobile towers, imaging unattended ground sensors (UGS), and unmanned air systems (drones).
    Analytical techniques are applied to detect anomalies and trends, producing actionable intelligence.
    Machine learning helps with the predictive and proactive deployment of resources.
  • 10. The Way Big Data Is Changing the World: Decline in Corruption
    Better monitoring of assets through big data analytics can help governments track economies and allocate resources more fairly across society. It also helps tackle unwieldy bureaucracy, misinformation, and other obstacles to transparent economics.
  • 11. The Way Big Data Is Changing the World: Environmental Health
    Increased carbon emissions, greenhouse gases, global warming, and other climate changes can be better monitored and fought with the help of big data. A simple example is internet-connected wearable devices that raise awareness of local environmental challenges and help people respond to them.
  • 12. The Way Big Data Is Changing the World: Fighting Poverty
    By collecting data from developing nations, non-profit organizations can find the areas where people would benefit most from better education, financial services, infrastructure, and health services. This information can also help direct aid to areas struck by natural disasters or health catastrophes.
    Big data may also help developing nations fight government corruption, which can cause extreme poverty and impede relief efforts. Similarly, access to large amounts of information from various sources can help organizations identify and respond to health epidemics, natural disasters (earthquakes, cyclones, etc.), and agricultural trends (drought, famine, etc.).
  • 13. The Way Big Data Is Changing the World: Healthcare
    Big data analytics is accelerating the speed at which researchers can work. For example, DNA sequences can now be decoded in minutes, which can lead to the faster creation of cures and the ability to predict disease patterns.
    Big data is being used to monitor premature and sick babies in some specialist units, allowing doctors to analyse every heartbeat and breathing pattern. This has led to algorithms that can predict infections 24 hours before any physical symptoms occur.
  • 14. The Way Big Data Is Changing the World: Science
    CERN's Large Hadron Collider, built to unlock the secrets of the universe, produces astronomical amounts of data: analysing the 30 petabytes it produces annually requires massive processing power.
    Big data is also aiding space exploration: the Square Kilometre Array is expected to generate 700 terabytes of data per second, and the NASA Exchange Platform will help manage that data. The same technology, which could detect radar on a planet 50 light-years away, may eventually help discover life on another planet.
  • 15.
  • 16-18. We believe data can make what was impossible yesterday, possible today.
  • 19. THE ENTERPRISE DATA CLOUD COMPANY
    We believe that data can make what was impossible yesterday, possible today.
    We empower people to transform complex data into clear and actionable insights.
    We deliver an enterprise data cloud for any data, anywhere, from the Edge to AI.
  • 20. SNAPSHOT OF THE “NEW” CLOUDERA: 85 countries, 3,000+ employees, 2,000+ customers
  • 21. LEADING IN TOP INDUSTRIES
    Banking: 8/10 top global
    Telco: 10/10 top global
    Pharma: 9/10 top global
    Public: 40+ government customers
    Technology: 8/10 top global
    Automotive: 10/10 top global
  • 22. WORLD-CLASS TRAINING, SERVICES & SUPPORT
    Professional Services: fastest route from zero to production
    Cloudera Support: SCP-certified support anywhere in the world
    Cloudera University: 3 top big data certifications
  • 23. ENTERPRISE DATA CLOUD ARCHITECTURE
    Multi-function analytics; hybrid and multi-cloud; secure and governed; open platform
    IoT, ingest & streaming | Data warehousing | ML/AI data science
    Security & governance
    Public clouds (compute & storage) | Datacenter (compute & storage)
  • 24. THE ENTERPRISE DATA CLOUD COMPANY: Any Cloud, Multi-Function, Secure & Governed, Open
  • 25. CLOUDERA DATA PLATFORM
    • Public, private & hybrid clouds
    • Shared data experience
    • Powered by open source
    • Analytics from the Edge to AI
    • Unified data control plane
    Analytic experiences: Data Flow & Streaming, Data Engineering, Data Warehouse, Operational Database, Machine Learning
    Control plane (Management Console): Identity | Orchestration | Management | Operations
    Data Hub & Cloudera Runtime
    Data anywhere: Catalog | Schema | Migration | Security | Governance
    Any infrastructure: Edge, Public Multi-Cloud, Hybrid Cloud, Private Cloud
  • 26. CLOUDERA DATA PLATFORM FORM FACTORS: Data Center, public cloud, private cloud, hybrid
    CDP Public Cloud: control plane; DW, ML, OD, DE, DF; Data Hub & Cloudera Runtime; SDX (security, governance & metadata); Edge to AI; storage and compute on public multi-cloud
    CDP Private Cloud: control plane; DW, ML, OD, DE, DF; Data Hub & Cloudera Runtime; SDX (security, governance & metadata); Edge to AI; datacenter storage with a private container cloud
    CDP Data Center: control plane; DW, DS/ML, DF, OpDB, DE; SDX (security, governance & metadata); storage & compute
  • 27. CDP Data Center
    Cloudera Enterprise Data Hub (EDH): The Most Comprehensive Data Analytics Platform + + New Features = CDP Data Center
  • 29. KEY CONCEPTS & COMPONENTS
    Environment: 1 template, 1 region, 1 VPC, multiple roles/buckets
    Environment to Data Lake (1:1): SDX services Atlas, Ranger, Knox, IdBroker, CM; associated with groups/users
    Environment to Data Hub Clusters / Experiences (1:N): DH templates, ML Env, DW Database Catalogs/Virtual Compute
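The relationships on this slide can be sketched as a small data model: one environment owns exactly one data lake (1:1) and any number of Data Hub clusters or experiences (1:N). This is a minimal illustration with hypothetical class names (`Environment`, `DataLake`, `Workload`) and example values; it is not the CDP API. The rule that the data lake must exist before workloads are added is an assumption drawn from the typical user flow.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataLake:
    """Shared SDX services for one environment (service names from the slide)."""
    services: List[str] = field(
        default_factory=lambda: ["Atlas", "Ranger", "Knox", "IdBroker", "CM"]
    )
    groups: List[str] = field(default_factory=list)  # associated groups/users


@dataclass
class Workload:
    """A Data Hub cluster or an experience (hypothetical representation)."""
    name: str
    kind: str  # e.g. "datahub", "ml", "dw": illustrative labels only


@dataclass
class Environment:
    template: str
    region: str
    vpc: str
    buckets: List[str] = field(default_factory=list)
    data_lake: Optional[DataLake] = None                     # 1:1
    workloads: List[Workload] = field(default_factory=list)  # 1:N

    def attach_data_lake(self, lake: DataLake) -> None:
        # Enforce the 1:1 relationship between environment and data lake.
        if self.data_lake is not None:
            raise ValueError("an environment has exactly one data lake")
        self.data_lake = lake

    def add_workload(self, workload: Workload) -> None:
        # Assumption: workloads rely on the shared data lake, so it must exist first.
        if self.data_lake is None:
            raise ValueError("create the data lake before adding workloads")
        self.workloads.append(workload)


# Hypothetical example values.
env = Environment(template="default", region="us-east-1", vpc="vpc-example",
                  buckets=["raw", "curated"])
env.attach_data_lake(DataLake(groups=["analysts"]))
env.add_workload(Workload("bi-team", "datahub"))
env.add_workload(Workload("ml-env", "ml"))
print(len(env.workloads))  # 2
```

Keeping the 1:1 and 1:N rules inside the methods means an invalid topology fails at construction time instead of surfacing later.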
  • 30. KEY CONCEPTS & COMPONENTS: Typical user flow
    Step 1: The user connects to the CDP Control Plane (Management Console) with their enterprise identity.
    Step 2: They create an environment and a data lake (Atlas, Ranger, Knox, IdBroker, FreeIPA, CM, HMS) for their enterprise, backed by enterprise cloud resources (IAM, network, VMs, buckets, etc.).
    Step 3: They create Data Hub clusters for traditional workloads (e.g. a BI Team cluster and an ETL Team cluster).
    Step 4: They create access points for containerized analytic experiences (Data Warehouse Experience, Machine Learning Experience).
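The ordering of this flow can be expressed as a prerequisite check: each action is allowed only once the actions it depends on have completed. This is a sketch with hypothetical action names, not CDP CLI commands; treating experiences as depending only on the data lake (not on a Data Hub cluster) is an assumption.

```python
from typing import Dict, List, Set

# Hypothetical action names for the four-step flow.
# Each action maps to the actions that must already be done.
PREREQS: Dict[str, Set[str]] = {
    "login": set(),                               # Step 1
    "create_environment": {"login"},              # Step 2
    "create_data_lake": {"create_environment"},   # Step 2
    "create_data_hub": {"create_data_lake"},      # Step 3
    "create_experience": {"create_data_lake"},    # Step 4 (assumed dependency)
}


def check_flow(actions: List[str]) -> Set[str]:
    """Walk the actions in order; raise if one runs before its prerequisites."""
    done: Set[str] = set()
    for action in actions:
        missing = PREREQS[action] - done
        if missing:
            raise RuntimeError(f"{action} requires {sorted(missing)} first")
        done.add(action)
    return done


completed = check_flow(["login", "create_environment", "create_data_lake",
                        "create_data_hub", "create_experience"])
print(sorted(completed))
```

Calling `check_flow(["login", "create_data_hub"])` raises, because the environment and data lake steps were skipped.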
• 31. CONSISTENT SECURITY AND GOVERNANCE: Built for multi-functional analytics anywhere • Data Catalog: a comprehensive catalog of all data sets, spanning on-premises and cloud object stores, and structured, unstructured, and semi-structured data • Schema: automatic capture and storage of all schema and metadata definitions as they are used and created by platform workloads • Replication: deliver data, as well as data policies, wherever the enterprise needs to work, with complete consistency and security • Security: role-based access control applied consistently across the platform, including full-stack encryption and key management • Governance: enterprise-grade auditing, lineage, and governance capabilities applied across the platform, with rich extensibility for partner integrations
• 32. CDP HOME: A single login to access the full platform, documentation, and support, all controlled through corporate SSO
• 33. MANAGEMENT CONSOLE: A single pane of glass to manage hundreds of clusters, each with its own lifecycle, across multiple environments
• 34. DATA LAKE: What is a Data Lake? A common set of services (SDX) within an Environment that is shared across multiple Clusters/Experiences. These include services for: • Security • Auditing • Governance • Data Discovery
• 35. DATA HUB CLUSTERS AND EXPERIENCES: What are the consumption options? A Data Hub cluster is a customizable environment that runs like a traditional Hadoop cluster but is designed to use cloud storage. An Experience is a container-based compute environment for a specific purpose: ML, DW, DE, OD, or DF.
• 36. DATA HUB: A familiar and highly customizable cluster service optimized for the separation of storage and compute
• 37. DATA WAREHOUSE: A data warehousing service optimized for concurrency, caching, and isolation
• 38. DATA CATALOG: A centralized data stewardship tool for searching, organizing, securing, and governing data across environments
• 39. WORKLOAD MANAGER: A centralized management tool for analyzing and optimizing workloads within and across environments
• 40. REPLICATION MANAGER: A centralized management tool for replicating and migrating data, metadata, and policies between environments
• 41. MACHINE LEARNING: A machine learning workspace service that connects teams of data scientists to enterprise data
• 42. Tour CDP Public Cloud https://console.cdp.cloudera.com/#/
• 44. New features for everyone
New features for CDH 6 customers: Ranger 2.0 • Dynamic row filtering & column masking • Attribute-based access control • SparkSQL fine-grained access control; Atlas 2.0 • Advanced data discovery • Improved performance and scalability; Hive 3 • Hive-on-Tez for better ETL performance • ACID transactions; Ozone (Preview) • 10x the scalability of HDFS; Knox • Gateway-based SSO; Druid • Low-latency data mart for real-time and aggregate data; Spark on Docker • Simplified dependency management
New features for HDP 3 customers: Cloudera Manager • Virtual private clusters • Automated wire encryption setup • Fine-grained RBAC for administrators • Streamlined maintenance workflows; Atlas 2.0 • Advanced data lineage • Faceted search; Solr 7 • Relevance-based text search over unstructured data (text, PDF, .jpg, ...); Impala • Better fit for data mart migration use cases (interactive, BI-style queries); Hue • Built-in SQL editor; Kudu • Better performance for fast-changing / updatable data; Better at-rest encryption • Key Trustee Server, Navigator Encrypt
• 45. What's in the box? CDP Data Center 7.0 (2H 2019), coming soon: • Cloudera Manager 7.0 • Hadoop 3.1 • Spark 2.4 • Hive 3.1 • Impala 3.2 • Oozie 5.1 • Hue 4.3 • Ranger 2.0 • Atlas 2.0 • Solr 7.4 • Tez 0.9 • HBase 2.2 • Phoenix 5.0 • Kudu 1.11 • Sqoop 1.4.7 • Parquet 1.10 • Avro 1.8 • ORC 1.5 • ZooKeeper 3.5 • Kafka 2.3 • Key Trustee Server • Ozone (Tech Preview) • LLAP • Livy • Druid • Ranger KMS • Key HSM • Navigator Encrypt • Zeppelin • Knox • Accumulo
• 46. Foundation for Containerized Applications: latest upstream features plus the best of CDH and HDP features. Upgrade path: CDH 5 / HDP 2 cluster (existing apps, data, and hardware), CDH 6 / HDP 3 cluster (existing apps, data, and hardware), and CDP Data Center cluster (existing apps, SDX, storage), reached by upgrade or direct upgrade; CDP Private Cloud adds the Management Console, Container Cloud, Data Hub, DW, ML, and more. CDP Data Center provides the stateful elements for the new wave of containerized applications: • Storage • Table schema • Authentication & authorization • Governance. Plan your path to CDP-DC now, and expand to new experiences this year.
  • 47. New for CDH Customers
• 48. Ranger Authorization • Standard CDP authorization model across services ○ Replaces Sentry • Better fine-grained access controls ○ Dynamic row filtering ○ Dynamic column masking ○ Attribute-based access control ○ SparkSQL fine-grained access control • Rich policy features ○ Allow/deny constructs, custom policy conditions/context enrichers, time-bound policies, Atlas integration (for tag-based policies) • Extensive access auditing with rich event metadata
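Dynamic row filtering and column masking are applied at query time, so the stored data never changes; only what a given user sees does. The Python sketch below models that idea under simplifying assumptions: policies match on group membership only, and `RowFilterPolicy`, `MaskPolicy`, and `apply_policies` are hypothetical names, not Ranger's API. In Ranger itself a row filter is a SQL predicate and masks are enforced by the engine plugin; the "last four characters" mask here is only similar in spirit to Ranger's built-in partial masks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Row = Dict[str, object]

@dataclass
class RowFilterPolicy:
    groups: Set[str]                  # groups the filter applies to
    predicate: Callable[[Row], bool]  # rows for which this returns True stay visible

@dataclass
class MaskPolicy:
    groups: Set[str]
    column: str
    mask: Callable[[object], object]

def mask_last4(value: object) -> str:
    """Hide everything except the last four characters."""
    s = str(value)
    return "x" * max(len(s) - 4, 0) + s[-4:]

def apply_policies(rows: List[Row], user_groups: Set[str],
                   row_filters: List[RowFilterPolicy],
                   masks: List[MaskPolicy]) -> List[Row]:
    # Row filtering: a row is visible only if every filter that applies to the user accepts it
    filters = [p for p in row_filters if p.groups & user_groups]
    visible = [r for r in rows if all(p.predicate(r) for p in filters)]
    # Column masking: rewrite values in the result set only; the input rows are not modified
    active = [m for m in masks if m.groups & user_groups]
    result = []
    for row in visible:
        out = dict(row)
        for m in active:
            if m.column in out:
                out[m.column] = m.mask(out[m.column])
        result.append(out)
    return result

rows = [{"region": "BR", "card": "4111111111111111"},
        {"region": "US", "card": "5500000000000004"}]
filters = [RowFilterPolicy({"analysts_br"}, lambda r: r["region"] == "BR")]
masks = [MaskPolicy({"analysts_br"}, "card", mask_last4)]
print(apply_policies(rows, {"analysts_br"}, filters, masks))
# [{'region': 'BR', 'card': 'xxxxxxxxxxxx1111'}]
```

A user outside `analysts_br` gets the rows unfiltered and unmasked, which is the point of making the policies "dynamic": one table, different views per user.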
• 49. New in Ranger (new for both Cloudera and Hortonworks customers) • Ranger AuthZ for Impala, HMS, Solr (document level), Ozone (Tech Preview) • Security Zones • RBAC in Ranger
• 50. Apache Ranger - Impala Support ● Single policy store for Hive and Impala to enable consistent policy authoring ● Independent AuthZ plugin to enforce policies locally ● Resource- and tag-based policies supported ● Masking/row filtering on the roadmap for Impala
• 51. Apache Ranger - Security Zones ● Resource isolation (especially for multi-tenancy) ● Policy administration isolation ● Cross-service logical grouping
• 52. Apache Ranger - Roles
• 53. Apache Ranger Roadmap • AuthZ integration with more services ○ Kudu, NiFi Registry, Schema Registry, etc. • Incremental policy/tag downloads • Ranger audit extensions • REST-based AuthZ server • Ranger KMS / Key Trustee integration • Row filtering extended to more services (HBase, etc.) • Ranger AuthZ for Ranger itself • Support for multiple plugin versions
• 54. Apache Atlas • Metadata catalog & search • Lineage & chain of custody • Business glossary • Metadata audits & security
• 55. Apache Atlas: Overview • A catalog for the metadata of enterprise assets • A large number of integrations to gather metadata and lineage
• 56. Apache Atlas: Overview (cont.) • A rich, dynamic type system makes it easy to onboard new components • APIs to define types: entity, classification, struct, relationship, enum
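The type-definition APIs above accept a single JSON document that can declare several kinds of type at once. The Python sketch below only builds such a payload for Atlas's v2 REST endpoint (`POST /api/atlas/v2/types/typedefs`) and prints it. Every type name in it (`sensitivity_level`, `geo_point`, `PII`, `sensor_feed`, `sensor_feed_landing_tables`) is hypothetical, and the field names follow the Atlas v2 typedef model as we understand it, so verify them against your Atlas version before posting anything.

```python
import json

def build_typedefs() -> dict:
    """Assemble a typedefs payload with one definition of each kind."""
    return {
        "enumDefs": [{
            "name": "sensitivity_level",
            "elementDefs": [{"value": "LOW", "ordinal": 0},
                            {"value": "HIGH", "ordinal": 1}],
        }],
        "structDefs": [{
            "name": "geo_point",
            "attributeDefs": [
                {"name": "lat", "typeName": "double", "isOptional": False, "cardinality": "SINGLE"},
                {"name": "lon", "typeName": "double", "isOptional": False, "cardinality": "SINGLE"},
            ],
        }],
        "classificationDefs": [{
            "name": "PII",
            "superTypes": [],
            "attributeDefs": [
                # An attribute whose type is the enum declared above
                {"name": "level", "typeName": "sensitivity_level", "isOptional": True, "cardinality": "SINGLE"},
            ],
        }],
        "entityDefs": [{
            "name": "sensor_feed",
            "superTypes": ["DataSet"],  # inherit the built-in DataSet type so it can take part in lineage
            "attributeDefs": [
                {"name": "location", "typeName": "geo_point", "isOptional": True, "cardinality": "SINGLE"},
            ],
        }],
        "relationshipDefs": [{
            "name": "sensor_feed_landing_tables",
            "relationshipCategory": "ASSOCIATION",
            "propagateTags": "ONE_TO_TWO",  # classifications flow from the feed to its tables
            "endDef1": {"type": "sensor_feed", "name": "landing_tables", "isContainer": False, "cardinality": "SET"},
            "endDef2": {"type": "hive_table", "name": "source_feeds", "isContainer": False, "cardinality": "SET"},
        }],
    }

print(json.dumps(build_typedefs(), indent=2))
```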
• 57. Apache Atlas: Metadata - Hive Column
• 58. Apache Atlas: Lineage - Hive Table ● Propagation of tags ● Filter and search ● Export lineage
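Tag propagation means that a classification such as PII, attached to a source table, follows lineage to everything derived from it. A simplified sketch, assuming lineage is a plain adjacency list from each entity to its direct downstream entities (processes and datasets alike). In Atlas itself, propagation is governed per relationship type and can be switched off per classification; this model ignores both, and the entity names are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set

def propagate_tag(lineage: Dict[str, List[str]], source: str, tag: str,
                  tags: Dict[str, Set[str]]) -> Set[str]:
    """Attach `tag` to `source` and to every entity reachable downstream of it (breadth-first)."""
    reached: Set[str] = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node in reached:  # guards against cycles and diamond-shaped lineage
            continue
        reached.add(node)
        tags.setdefault(node, set()).add(tag)
        queue.extend(lineage.get(node, []))
    return reached

lineage = {
    "raw.customers": ["etl_job_1"],
    "etl_job_1": ["curated.customers"],
    "curated.customers": ["report_q1", "ml_features"],
}
tags: Dict[str, Set[str]] = {}
print(sorted(propagate_tag(lineage, "raw.customers", "PII", tags)))
# ['curated.customers', 'etl_job_1', 'ml_features', 'raw.customers', 'report_q1']
```

Propagation only flows downstream: tagging `curated.customers` would reach the report and the feature set, but never `raw.customers`.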
• 59. Apache Atlas: Search
• 60. Apache Atlas: What's New in CDP-DC? • New Impala and HMS hooks • Spark-Atlas connector • Lineage improvements • Runtime stats • Optimized search • Improvements to Navigator metadata import
• 61. HIVE 3 FOR DATA WAREHOUSING IN CDP-DC - OVERVIEW • Comprehensive ANSI SQL:2016 coverage ○ Use cases: pre-built reports, more efficient SQL constructs, BI tool compatibility ○ Capabilities: implements 120 of 163 SQL:2016 mandatory features and more than 70 optional features; runs all 99 TPC-DS queries without modification; additional SQL-friendly capabilities, e.g. surrogate keys, information_schema, … • ACID support: transactions and INSERT/UPDATE/DELETE/MERGE ○ Use cases: deleting individual rows (GDPR), data cleansing/correction, merging CDC data, ... ○ Capabilities: SQL:2011 compliant, transactional (snapshot isolation), set-based insert/update/delete; managed tables (ACID by default) on ORC; external tables (non-ACID) on ORC/Parquet
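In Hive, merging CDC data is written as a single `MERGE INTO ... USING ... WHEN MATCHED ... WHEN NOT MATCHED ...` statement against an ACID table. The Python below is only an in-memory model of those semantics, with a hypothetical `("I" | "U" | "D", key, values)` change format: deletes remove matched rows, other matched changes update, unmatched changes insert, and the result is a new snapshot, so readers of the old one are unaffected. One simplification: changes are applied in order here, whereas Hive's MERGE by default rejects a source with more than one row per target key, so CDC feeds are usually reduced to the latest change per key first.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]
Change = Tuple[str, int, Row]  # (op, key, new column values); op is "I", "U" or "D"

def merge_cdc(target: Dict[int, Row], changes: List[Change]) -> Dict[int, Row]:
    """Apply CDC changes with MERGE-like semantics and return a new snapshot of the table."""
    result = {key: dict(row) for key, row in target.items()}  # copy: the old snapshot stays intact
    for op, key, row in changes:
        if op == "D":
            result.pop(key, None)    # WHEN MATCHED AND op = 'D' THEN DELETE (no-op if unmatched)
        elif key in result:
            result[key].update(row)  # WHEN MATCHED THEN UPDATE SET ...
        else:
            result[key] = dict(row)  # WHEN NOT MATCHED THEN INSERT ...
    return result

customers = {1: {"name": "Ana", "city": "SP"}, 2: {"name": "Bruno", "city": "RJ"}}
changes = [("U", 1, {"city": "BH"}), ("D", 2, {}), ("I", 3, {"name": "Carla", "city": "POA"})]
print(merge_cdc(customers, changes))
# {1: {'name': 'Ana', 'city': 'BH'}, 3: {'name': 'Carla', 'city': 'POA'}}
```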
  • 62. New for HDP customers
• 63. IMPALA AND KUDU FOR DATA WAREHOUSING IN CDP-DC • Apache Impala: leading MPP SQL engine for DW, optimized for Parquet/Kudu ○ Ideal for: data mart implementations that require interactive/ad-hoc BI ○ 1,000+ enterprise customers, many running on tens of PBs and hundreds of nodes ○ Certified with leading BI tools, with broad SQL coverage ○ Latest release adds improvements for resiliency, concurrency, and metadata • Apache Kudu: leading columnar storage engine for fast analytics on fast data ○ Ideal for: low-latency time-series data ingest and analytics (with the Impala SQL engine) ○ Combines fast single-row ingest, like HBase, with large scans, like HDFS ○ ACID (insert/update/delete) semantics on single rows
• 64. HUE FOR DATA WAREHOUSING IN CDP-DC • Apache Hue: leading SQL workbench for ad-hoc BI ○ Ideal for: ad-hoc queries/exploration on data marts and HDFS files using Impala and/or Hive ○ Very high adoption across Hadoop deployments, with thousands of active users • Key features: ○ SQL editor: autocomplete, query history, query plans ○ File browser: object stores (S3, ADLS), HDFS ○ Document handling: sharing, downloading, importing, exporting ○ Load balancing for large-scale deployments with hundreds of concurrent users
• 65. Cloudera Manager 7 - What's new for HDP users • Single pane of glass ○ Multiple clusters (up to 3,000 nodes total) ○ 'Compute' clusters & 'Base' clusters (virtual private clusters) • Security ○ Automated wire encryption (TLS 1.2) ○ HDFS encryption-at-rest wizard (KTS/KMS) ○ Fine-grained access control for admins • Ease of administration ○ Global configuration search / configuration 'diff' before restart ○ Edge ('gateway') node configuration ○ Proper rolling restart (HA-aware) ○ View of YARN/Impala workloads • Performance ○ BitTorrent-based distribution of binaries
• 66. Cloudera Manager 7 - What's new for everyone • Management of new services ○ Ranger, Atlas, Hive-on-Tez, DAS • CDP look-and-feel • Cluster-level configuration history • Improved global search • Ability to resume after errors when enabling Kerberos • Minor scalability improvements (hosts page) • Improved alerts configuration • jQuery 3.4 (improved security)
  • 67. YARN
• 68. Capacity Scheduler & Queue Manager UI • Capacity Scheduler is now the default scheduler in YARN ○ GPU support ○ Node labels ○ Global scheduling support ○ Better placement support • A new Queue Management UI for better usability
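The Capacity Scheduler configures each queue's `yarn.scheduler.capacity.<queue-path>.capacity` as a percentage of its parent queue, and sibling queues must add up to 100%. A queue's guaranteed share of the whole cluster is therefore the product of the percentages along its path from root. The Python sketch below works that out for a hypothetical queue tree (the queue names are invented); it ignores maximum capacity, node labels, and the absolute-resource configuration mode.

```python
from typing import Dict

def absolute_capacities(parent_of: Dict[str, str], capacity_pct: Dict[str, float]) -> Dict[str, float]:
    """Guaranteed share of the whole cluster (%) for every queue below root."""
    result = {}
    for queue in capacity_pct:
        share, q = 1.0, queue
        while q != "root":  # walk up to root, multiplying the relative shares
            share *= capacity_pct[q] / 100.0
            q = parent_of[q]
        result[queue] = round(share * 100.0, 6)
    return result

def unbalanced_parents(parent_of: Dict[str, str], capacity_pct: Dict[str, float]) -> Dict[str, float]:
    """Parents whose children's capacities do not add up to 100%."""
    totals: Dict[str, float] = {}
    for queue, parent in parent_of.items():
        totals[parent] = totals.get(parent, 0.0) + capacity_pct[queue]
    return {p: t for p, t in totals.items() if abs(t - 100.0) > 1e-9}

parent_of = {
    "root.engineering": "root",
    "root.analytics": "root",
    "root.engineering.etl": "root.engineering",
    "root.engineering.adhoc": "root.engineering",
}
capacity_pct = {"root.engineering": 60, "root.analytics": 40,
                "root.engineering.etl": 70, "root.engineering.adhoc": 30}
print(unbalanced_parents(parent_of, capacity_pct))  # {}
print(absolute_capacities(parent_of, capacity_pct))
```

Here `root.engineering.etl` is guaranteed 70% of engineering's 60%, i.e. 42% of the cluster.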
• 69. Capacity Scheduler & Queue Manager UI • New Queue Manager UI in CM to configure resources and queues, with a list of all queues in the cluster
• 70. Spark Dependency Management • Simplify dependency management with Spark-on-Docker support • No need to install dependencies on individual cluster hosts • Enable Docker on YARN for Spark workloads with a click from CM
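Spark-on-Docker relies on YARN's Docker container runtime: the job asks YARN to launch its containers from an image that already bundles its dependencies. A minimal sketch in Python that only assembles the `spark-submit` arguments. It assumes the NodeManagers already allow the `docker` runtime (`yarn.nodemanager.runtime.linux.allowed-runtimes`), which is the part the CM toggle is meant to handle; the image name and `etl_job.py` are hypothetical. The `YARN_CONTAINER_RUNTIME_*` variables are those documented for Hadoop 3's Docker support; check your CDP version's documentation for any additional settings.

```python
import shlex
from typing import List

def spark_submit_on_docker(app: str, image: str) -> List[str]:
    """Build a spark-submit command that asks YARN to run the AM and executors in Docker."""
    env = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }
    args = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    for name, value in env.items():
        # The same variables must reach both the ApplicationMaster (driver) and the executors
        args += ["--conf", f"spark.yarn.appMasterEnv.{name}={value}"]
        args += ["--conf", f"spark.executorEnv.{name}={value}"]
    args.append(app)
    return args

cmd = spark_submit_on_docker("etl_job.py", "registry.example.com/spark-py-deps:1.0")
print(shlex.join(cmd))
```

Because the dependencies live in the image, upgrading a library means publishing a new image tag rather than touching every cluster host.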
• 72. Cloudera DataFlow (CDF) Platform: When Will It Be Supported on CDP-DC? Deployment spectrum: CDP Data Hub • CDP Data Center • CDP DataFlow Service • CDH on-premises
• 73. CFM 2.0 Highlights & Platform Integration • Based on Apache NiFi 1.10 ○ Allows parameterization of all processor properties ○ Support for "public" ports (accessible to remote site-to-site clients) in any process group ○ Queue length and time to backpressure are now predicted • First release to include the K8s Operator (Tech Preview) ○ Allows customers to try out NiFi clusters on Kubernetes ○ The operator takes care of NiFi cluster installation, configuration, and scaling ○ OpenShift certified (pending) ○ The goal is to gather feedback from customers about requirements • First release to include the Stateless NiFi runtime ○ New NiFi runtime: FlowFiles are stored in memory, not persisted on disk ○ Data durability is provided by the source/target systems ○ Allows for abstraction of "jobs" ○ Allows flows to be "triggered" • CFM 2.0 will be available as an add-on to CDP-DC, post GA (target Q4)
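Predicting time to backpressure means projecting a connection's recent queue history forward to estimate when it will hit its backpressure threshold. The Python sketch below shows that idea with a plain least-squares line; it is an illustration under our own assumptions (one-minute samples, object-count threshold only), not NiFi's implementation, although 10,000 objects does match NiFi's default per-connection object threshold.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (seconds since start, objects queued)

def fit_line(samples: List[Sample]) -> Tuple[float, float]:
    """Ordinary least-squares fit of queued = slope * t + intercept."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_q = sum(q for _, q in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("need samples taken at two or more distinct times")
    cov = sum((t - mean_t) * (q - mean_q) for t, q in samples)
    slope = cov / var_t
    return slope, mean_q - slope * mean_t

def seconds_to_backpressure(samples: List[Sample], threshold: float = 10_000) -> Optional[float]:
    """Seconds from the latest sample until the queue reaches `threshold`, or None if it is not growing."""
    slope, intercept = fit_line(samples)
    if slope <= 0:
        return None
    t_hit = (threshold - intercept) / slope
    return max(t_hit - samples[-1][0], 0.0)

history = [(0, 1_000), (60, 2_000), (120, 3_000), (180, 4_000)]
print(round(seconds_to_backpressure(history)))  # 360
```

The queue grows by 1,000 objects per minute, so from 4,000 objects it needs six more minutes (360 s) to reach 10,000.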
• 74. Flume to CEM / CFM Migration. Yes, Flume is really gone. • Opportunity: the Flume Offload sales play ○ We now have a powerful data distribution / ingest tool in our stack ○ A door opener for new analytics use cases: flexible data movement architecture, foundation for real-time stream processing • Flume use case migration ○ Identify your customer's use cases. Most common: Hadoop ingest (HDFS, HBase), "Flafka" (read/write Kafka), HTTP and file sources/sinks ○ Flume used as an agent -> CEM ○ Flume used for central ingest -> CFM • Questions? Need help with a specific customer use case? ○ We will host a deep-dive Flume Offload enablement ○ Check out the Flume Offload collateral (decks, example flows, migration strategy) ○ Reach out to dim-field@cloudera.com, mkohs@cloudera.com, fce_streaming / fce_nifi
• 75. Kafka 2.3 Available in CDP-DC 7.0: Secure and Governed Kafka Clusters with New Ranger & Atlas Integration. What's new? • Kafka 2.3 available in the Cloudera Runtime (CR) 7.0 parcel • Kafka / Ranger integration • Kafka / Atlas integration • Support for the Hive 3.x Kafka storage handler • Support for LDAP-based authentication • Support for multiple Kafka compute clusters using a shared security data lake with Ranger & Atlas, i.e. Kafka compute clusters that use the shared security context (Ranger and Atlas) of the data lake
• 76. Kafka Management Services Support on CDP-DC: SR, SMM & SRM, available as add-ons to CDP-DC post GA (target Q4) • Schema Registry (SR): new Kafka schema governance • Streams Replication Manager (SRM): new Kafka replication engine powered by MirrorMaker 2 • Streams Messaging Manager (SMM): new Kafka monitoring service
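MirrorMaker 2, which powers SRM, renames replicated topics by prefixing the source cluster alias, and uses that prefix to avoid replicating a topic back to the cluster it came from. The Python sketch below mirrors that naming scheme as we understand MirrorMaker 2's `DefaultReplicationPolicy`; the cluster aliases `sp` and `rj` are hypothetical, and an SRM deployment may be configured with a different separator or policy. Note that in this simple scheme a local topic whose own name contains a dot would be misread as a remote topic.

```python
from typing import Optional

SEPARATOR = "."  # default separator of MirrorMaker 2's DefaultReplicationPolicy

def remote_topic(source_alias: str, topic: str) -> str:
    """Name a replicated topic gets on the target cluster, e.g. 'sp' + 'orders' -> 'sp.orders'."""
    return f"{source_alias}{SEPARATOR}{topic}"

def topic_source(topic: str) -> Optional[str]:
    """Alias of the cluster a remote topic came from, or None for a local topic."""
    head, sep, _ = topic.partition(SEPARATOR)
    return head if sep else None

def upstream_topic(topic: str) -> str:
    """Topic name one hop upstream, e.g. 'rj.sp.orders' -> 'sp.orders'."""
    _, sep, tail = topic.partition(SEPARATOR)
    return tail if sep else topic

def is_cycle(topic: str, target_alias: str) -> bool:
    """True if replicating `topic` to `target_alias` would send data back where it came from."""
    source = topic_source(topic)
    if source is None:
        return False
    if source == target_alias:
        return True
    upstream = upstream_topic(topic)
    return upstream != topic and is_cycle(upstream, target_alias)

print(remote_topic("sp", "orders"))  # sp.orders
print(is_cycle("sp.orders", "sp"))   # True: do not replicate it back to 'sp'
print(is_cycle("sp.orders", "rj"))   # False
```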
• 77. New Flink Support on CDP-DC: Flink on YARN, available as an add-on to CDP-DC post GA (target Q4). Why Flink? • Next-generation streaming engine that offers a superior solution to Storm • Flink runs as a YARN application • Key features ○ Ultra-low latency (< 100 ms) ○ Advanced features (late-arriving data, checkpointing, event-time processing) ○ Exactly-once processing ○ Complex stateful stream processing ○ Growing, vibrant community
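Event-time processing with late-arriving data rests on watermarks: a watermark tracks how far event time has progressed, a window fires once the watermark passes its end, and events for an already-fired window count as late. The Python sketch below models a tumbling-window sum with a bounded-out-of-orderness watermark. It is a simplified, single-threaded illustration (no allowed lateness, no checkpointing, watermark advanced on every event) and not Flink's API; the events are made up.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[int, str, int]  # (event_time, key, value)

def tumbling_window_sums(events: Iterable[Event], size: int, max_out_of_orderness: int):
    """Sum values per key in event-time tumbling windows of length `size`.

    Returns (fired, late): fired windows as (window_start, key, total), and events
    that arrived after their window had already fired.
    """
    open_windows: Dict[Tuple[int, str], int] = defaultdict(int)
    fired: List[Tuple[int, str, int]] = []
    late: List[Event] = []
    watermark = float("-inf")  # no event older than this is still expected
    for ts, key, value in events:
        start = ts - ts % size
        if start + size <= watermark:
            late.append((ts, key, value))  # its window has already fired
            continue
        open_windows[(start, key)] += value
        watermark = max(watermark, ts - max_out_of_orderness)
        # Fire every open window whose end the watermark has reached
        for w in sorted(k for k in open_windows if k[0] + size <= watermark):
            fired.append((w[0], w[1], open_windows.pop(w)))
    for w in sorted(open_windows):  # end of input: flush whatever is still open
        fired.append((w[0], w[1], open_windows[w]))
    return fired, late

events = [(1, "a", 1), (4, "a", 2), (12, "a", 5), (8, "a", 3), (16, "a", 1), (3, "a", 7), (30, "a", 1)]
fired, late = tumbling_window_sums(events, size=10, max_out_of_orderness=5)
print(fired)  # [(0, 'a', 6), (10, 'a', 6), (30, 'a', 1)]
print(late)   # [(3, 'a', 7)]
```

The out-of-order event at time 8 still lands in window [0, 10) because the watermark has only reached 7; the event at time 3 arrives after the watermark passed 10, so it is reported as late instead of silently changing a result that was already emitted.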
• 86. Tour CML for CDP https://console.cdp.cloudera.com/#/
• 87. DATA-DRIVEN JOURNEY
• 88. DATA-DRIVEN JOURNEY: USE CASES. Stages: VISIBILITY, PRODUCTIVITY, TRANSFORMATION. Use cases: Preventive & Proactive Maintenance • IoT Hub for Industry 4.0 • Advanced Threat Detection • Risk Modelling & Analysis • Marketing Systems Integration • Customer 360 Insights • Exploratory Data Science • Data Warehouse • Applied Machine Learning. Business areas: GROW (Sales & Marketing) • CONNECT (Operations & Product) • PROTECT (Security & Compliance) • MODERNIZE (IT, Tech, Data Science & Analytics)
• 89. HIERARCHY OF NEEDS FOR THE DATA-DRIVEN ENTERPRISE: The "AI Ladder": AI • MACHINE LEARNING • DATA SCIENCE • ANALYTICS • "BIG DATA"
• 90. Actionable Intelligence Powers Today's Financial Services. Data sources: OFAC lists, credit records, ATM streams, transactions & wires, stock tickers, trade settlements, mobile app data, trade data, web logs, banker notes, demographic data, customer transaction data. Use cases: DIGITAL CUSTOMER 360 • RISK DATA AGGREGATION • ANTI-MONEY LAUNDERING • FRAUD DETECTION • TRADE SURVEILLANCE
• 91. Connected Data Drives Success in Telecommunications. Data sources: call detail records, product catalogs, cyber threat metadata, sensor data, server logs, voice-to-text, clickstream, ERP system data, social media, billing data, subscriber profiles, CRM records. Use cases: SINGLE VIEW OF THE CUSTOMER • CHURN REDUCTION • CDR ANALYSIS • NETWORK OPTIMIZATION • DYNAMIC BANDWIDTH ALLOCATION
• 92. Actionable Intelligence Drives Retail Sales Growth. Data sources: product catalogs, sales forecasts, beacons & RFID, server logs, in-store WiFi logs, store communications, clickstream, ERP data, social media, staffing plans, store reporting, CRM records. Use cases: SINGLE VIEW OF THE CUSTOMER • PRODUCT RECOMMENDATIONS • INVENTORY & SUPPLY CHAIN • PRICING OPTIMIZATION • TARGETED PROMOTIONS
• 93. Actionable Intelligence Makes Healthcare Precise and Personal. Data sources: patient records, lab data, pharmacy data, patient locations, wearables, intra-network data, sensor data, claims data, social media, physician notes, patient satisfaction data, clinical (EMR) data. Use cases: SINGLE VIEW OF PATIENT • REAL-TIME VITAL SIGN MONITORING • BILLING & REIMBURSEMENTS • EMR OPTIMIZATION • SUPPLY CHAIN OPTIMIZATION
• 94. Actionable Intelligence Makes Pharmaceuticals Safe & Effective. Data sources: research cohort data, molecular data, RFID data, social media, biometrics, sensor data, supply chain, geo-location data, scientific studies, manufacturing machine data, clinical records, sales reports, genomic data. Use cases: DRUG TRIAL COHORT SELECTION • YIELD OPTIMIZATION • RAW MATERIAL WASTE REDUCTION • SEARCHABLE RESEARCH REPOS • NEXT-GEN SEQUENCING (NGS)
• 95. Actionable Intelligence Powers Modern Manufacturing. Data sources: defect testing data, product designs, MES systems, RFID streams, SCADA systems, shop floor sensors, ERP systems, supplier receipts, machine data, assembly line sensors, data historians, work orders. Use cases: PREVENTATIVE MAINTENANCE • SUPPLY CHAIN OPTIMIZATION • YIELD MAXIMIZATION • QUALITY CONTROL • RECALL AVOIDANCE
• 96. Actionable Intelligence Enhances Public Sector Efficiency. Data sources: historical archives, cyber threat metadata, vehicle telemetry data, disease outbreaks, natural disasters, social media, work orders, meeting notes, voter rolls, public benefits claims, financial audits, extreme weather alerts. Use cases: PUBLIC TRANSPORTATION • INFRASTRUCTURE MAINTENANCE • PUBLIC HEALTH • NATIONAL DEFENSE • HOMELAND SECURITY
• 97. Why are you here now?
• 98. THANK YOU! Because This is Your BigData Year! 2020