© Cloudera, Inc. All rights reserved.
Is your Big Data journey stalling?
Take the Leap with Capgemini and Cloudera
Industrializing your transition to the Modern Data Landscape
|
Speakers
Andrea Capodicasa
Senior Solution Architect
Insights & Data
Goutham Belliappa
Big Data practice leader
Insights & Data
Alex Gutow
Senior Manager, Product Marketing
Agenda
• The Case for Change
• Industrializing the Change
• Adoption
• Q&A
Capgemini Insights & Data Global Practice
Global reach with over 13,000 professionals across 40+ countries
• Over 500 Big Data & Data Science professionals, including 100+ Hadoop-certified consultants
• We employ more than 13,000 information management specialist practitioners, deployed across Capgemini's global network
• We were recognised again by Gartner as one of the 4 leading information service providers globally
• Capgemini Insights & Data has operated as a global practice since 2015, delivering business & IT insights and data services
• Capgemini has a global reach and local presence in 44 countries, in over 100 languages
[Map: Capgemini locations by region (North America, Latin America, Western Europe, Eastern Europe, Middle East & Africa, India, Asia Pacific), including Canada, USA, Mexico, Brazil, Argentina, Morocco, Saudi Arabia, South Africa, India, China and Australia; European countries: Austria, Finland, France, Italy, Germany, Norway, Sweden, Netherlands, Poland, Spain, Switzerland, UK. Centers of Excellence in Mumbai and Bangalore.]
The case for change
Information Trends: What are we seeing in the marketplace?
Recent years have brought unprecedented changes to the information landscape. Each of these “disruptors” has individual momentum, and collectively they represent a significant opportunity to improve an organization's effectiveness.
Successful CIOs and leaders consciously take these trends into consideration when planning
the evolution of their information architecture.
Empower the business by focusing from the “user down”, not the “system up”.
• Ms. Agility killed Mr. Waterfall: Modeling business requirements months or even years in advance, with IT delivering a multi-year rollout plan for a solution that may no longer fit a fast-changing business environment, is a thing of the past
• Cloud Computing: The availability of “finished” business functions in the cloud gives organizations tremendous opportunities while increasing IT information challenges
• Open Source: Open source architecture provides substantial development and complexity cost savings compared with legacy software packages
• As a Service: Software as a Service offerings in Big Data, data transformation & finished analytics are removing the infrastructure bottlenecks of servers, software and maintenance that obstruct speed to market
• Security & Privacy: The proliferation of web-connected IP devices creates a “hyper-evolving” potential for cyber breaches; privacy laws create compliance challenges with mobile devices
• Business Meta-Data: Traditionally, data dictionaries have been single-purpose and technically focused. As data becomes more valuable and the same information is used in multiple ways, the need for business metadata will become critical
• Social Computing: Social computing has produced data whose segments are loosely connected and whose correlations are at times non-intuitive, requiring new ways to mine it and derive insights
• In Memory Analytics: Massive in-memory databases running intensely complex analytics are highly scalable: change anything, anytime, and simultaneously compare the results of multiple scenarios in seconds
• “Real” Analytics: The transition from historical, hindsight indicators to insight and foresight indicators and visualizations
Customers are Looking for a Guide
Cloudera Enterprise
Making Hadoop Fast, Easy, and Secure
A new kind of data platform
• One place for unlimited data
• Unified, multi-framework data access
Cloudera makes it
• Fast for business
• Easy to manage
• Secure without compromise
Hybrid Deployment Flexibility: Public Cloud, Private Cloud, Hybrid Environments
[Diagram: platform stack for structured and unstructured data, with Operations, Data Management, Unified Services (Resource Management, Security), Process/Analyze/Serve (Batch, Stream, SQL, Search, Other), Store (Filesystem, Relational, NoSQL, Other) and Integrate]
The traditional approach to BI & Analytics is a bottleneck in the operational value chain
Traditional BI & Analytics approach:
• Centralised BI teams are too monolithic and divorced from business operations
• Insights latency
• Reporting on the past, with limited ability to predict and prescribe what is needed now
• Each new business question requires more time to crunch the right data
• Heavy duplication of operational data throughout the BI layers & systems
• Diluted data quality & governance create risks of security breaches, compliance issues & risk exposure
• Significant costs, both infrastructure and people
• Limited ability to scale with organic data volume growth or increasing data complexity
The insights-driven enterprise puts information at the centre and insights “at the point of action”
Next-generation approach:
• A next-generation data management platform enabling a pervasive, real-time “insights & data fabric” serving operations
• Standardized & cost-effective data management, allowing high agility on insights and the ability to “ask any question”
• Operational applications provide data and integrate insights back in a continuous improvement loop
• Operations integrate predicted best outcomes to optimise business processes, automatically where possible
• Ability to detect and catch events on the fly that require immediate action (e.g. fraud detection), for optimal reaction or proactive action
• Coherent management of platforms & data management processes, with insights & data science skills embedded directly in the operational units for maximum impact
• Optimized total cost of ownership (TCO) with a rationalized and simplified data landscape
Key challenges blur the vision of both the target and the journey to the insights-driven enterprise
Challenges addressed
• “Which data should we retain and/or which data could we archive?”
• “I don't know how to drive value from my data”
• “Can I decrease costs by moving my data (landscape) to the cloud or As-A-Service?”
• “How mature is my data landscape in comparison to the best industry trends?”
• “I have been told to ‘do something’ about big data analytics but don't know where to start”
• “Can the Business Intelligence landscape be optimized to derive the maximum value out of it?”
• “Our data landscape is scattered, complex and very expensive; can we fix it?”
Value created
A modern data strategy will enable:
• Reduced complexity: rationalizing the data strategy to meet demand
• Lower cost: reducing the operating cost of your data strategy
• Increased agility and better time to market: more speed in the development of new information applications
• More/better insights and return on intelligence: easier to derive meaningful insights and enable business transformation
• Less risk: reducing the complexity of the data strategy
• Data security & privacy: making your data strategy compliant with rules and regulations
Industrializing the change
Capgemini's Leap Data Transformation Framework
Modules overview: Misura, Diligent, Idem, Blend, Papillon, Virtu
Essence (semantic layer consolidation)
• Analyze the existing semantic layer of the architecture
• Identify potential functional overlap and produce recommendations for consolidation
Data concierge
• Business Information Catalog
• Self-service ingestion, distillation and analytics
• Data Operations Services
Lifecycle phases: Estimation, Discovery, Design/Build, Testing
• Agile environment provisioning
• Continuous Integration lifecycle
One-Click leap
• Estimate the transformation effort
• Optimize/reduce the transformation scope
• Optimize reporting design
• Optimize ETL semantic design
• Optimize SQL
• Industrialize end-to-end testing
Diligent / Blend Applications
Business Problem
• Large and complex DW estates have been built over the last 20 years or so, and the infrastructure hosting them may need updating
• A number of reports and underlying tables will be duplicated or no longer used; decommissioning them saves valuable resources
• Users are reluctant to give up “their” reports/data when migration programmes occur
Solution
• A scientific and objective approach to measuring which data are actually used
• Diligent BO Audit data explorer to identify interactions between users and SAP BusinessObjects (BO) Universes/reports and tables
• Diligent BO Metadata gathering module to extract Universe and report information
• Blend Report merger to identify report reductions
• Blend XML Generator to create Pentaho reporting cubes from Diligent-gathered metadata
Accelerator Results (Diligent, Blend)
• Scope reduction by identifying current BO reports that are not used: up to 40% found with one of our customers
• Scope reduction by identifying reports that are duplicates or share a number of data items
• Automated method to migrate BO reports to Pentaho, reducing workload and errors
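The two scope-reduction ideas above, finding BO reports nobody opens and finding reports that share most of their data items, can be sketched in a few lines. This is a minimal illustration, not Diligent or Blend: it assumes audit events are simple (user, report) pairs and that each report's data items have already been extracted as a set; the function names and the 0.7 overlap threshold are hypothetical.

```python
"""Hypothetical sketch of Diligent/Blend-style scope reduction.

Assumes audit events are (user, report) pairs and that each report's
data items are already known as a set; both structures are illustrative,
not the tools' actual formats.
"""
from itertools import combinations


def unused_reports(all_reports, audit_events):
    """Return reports that never appear in the audit log."""
    used = {report for _user, report in audit_events}
    return sorted(set(all_reports) - used)


def jaccard(a, b):
    """Jaccard similarity between two sets of data items."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def overlapping_reports(report_items, threshold=0.8):
    """Return report pairs whose data items overlap at or above threshold."""
    pairs = []
    for r1, r2 in combinations(sorted(report_items), 2):
        score = jaccard(report_items[r1], report_items[r2])
        if score >= threshold:
            pairs.append((r1, r2, round(score, 2)))
    return pairs


if __name__ == "__main__":
    reports = {
        "sales_monthly": {"region", "month", "revenue"},
        "sales_monthly_v2": {"region", "month", "revenue", "margin"},
        "hr_headcount": {"department", "headcount"},
    }
    audit = [("alice", "sales_monthly"), ("bob", "sales_monthly_v2")]
    print(unused_reports(reports, audit))
    print(overlapping_reports(reports, threshold=0.7))
```

With the sample data, `hr_headcount` is reported as unused and the two sales reports overlap with a Jaccard score of 0.75, making them candidates for merging. Jaccard similarity is used here only because it is a simple, well-known set-overlap measure; the slide does not say how Blend scores report overlap.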
IDEM-DA
Business Problem
• The customer has very strict security and normalisation requirements when loading their data: they need different obfuscation types for different semantic types, e.g. names, phone numbers, social security numbers, etc.
• Left as a manual activity, this would require laborious, time-consuming identification across hundreds of thousands of columns, a costly and error-prone activity
Solution
• Automated identification of table columns for encryption and standardisation
• Automated creation of ETL metadata spreadsheets that drive the Data Acquisition Pentaho jobs for data migration
Accelerator Results
• Manual generation of the metadata spreadsheet: several days to weeks
• IDEM-DA: 15 minutes to 2 hours
• Manual eyeballing of data is prone to human error and can take hours to several days
• IDEM-DA: approximately 70% reduction, and more accurate identification of known types
Project manager of the Data Migration project: “IDEM-DA is the only way forward”
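The step of turning detected semantic types into a metadata spreadsheet that drives the load jobs might look roughly like the sketch below. The rule names (`MASK_KEEP_LAST_3`, `SHA256_HASH`, `REDACT`), the CSV columns and the masking logic are hypothetical: the slide does not describe IDEM-DA's spreadsheet layout or the obfuscation types the customer required, and nothing here reflects the Pentaho job format.

```python
"""Hypothetical sketch: emit an ETL metadata spreadsheet (CSV) mapping each
column's detected semantic type to an obfuscation rule.

Rule names and masking logic are illustrative assumptions, not IDEM-DA's
actual configuration.
"""
import csv
import hashlib
import io

# Assumed policy: which obfuscation rule applies to each semantic type.
OBFUSCATION_RULES = {
    "MOBILE_NO": "MASK_KEEP_LAST_3",
    "EMAIL": "SHA256_HASH",
    "ADDRESS": "REDACT",
    "UNKNOWN": "PASS_THROUGH",
}


def obfuscate(value, rule):
    """Apply one of the assumed obfuscation rules to a single value."""
    if rule == "MASK_KEEP_LAST_3":
        # Replace everything except the last three characters with '*'.
        return "*" * max(len(value) - 3, 0) + value[-3:]
    if rule == "SHA256_HASH":
        return hashlib.sha256(value.encode("utf-8")).hexdigest()
    if rule == "REDACT":
        return "[REDACTED]"
    return value  # PASS_THROUGH or any unrecognised rule


def metadata_csv(table, column_types):
    """Build CSV text with one row per column: table, column, type, rule."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["table", "column", "semantic_type", "obfuscation_rule"])
    for column, sem_type in column_types.items():
        rule = OBFUSCATION_RULES.get(sem_type, "PASS_THROUGH")
        writer.writerow([table, column, sem_type, rule])
    return buf.getvalue()


if __name__ == "__main__":
    detected = {"mob_no": "MOBILE_NO", "email": "EMAIL",
                "free_text_field": "ADDRESS", "serial_id": "UNKNOWN"}
    print(metadata_csv("customers", detected), end="")
    print(obfuscate("07710232931", OBFUSCATION_RULES["MOBILE_NO"]))
```

Keeping the type-to-rule policy in one table means a new semantic type only needs a new mapping entry, not new ETL code. Note that an unsalted SHA-256 of an email is pseudonymisation rather than strong anonymisation; a real deployment would likely use keyed hashing or tokenisation.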
IDEM-DA
IDEM-DA is a module used to support ETL from legacy data warehouses into modern architectures.
Example table:
Column Name       Dataset                                            Semantic Type
mob_no            07710232931, 07083210302                           MOBILE_NO
email             example@hotmail.com, hello@gmail.com               EMAIL
free_text_field   My address is 12 lucky street, London, E12 2TF     Address
serial_id         11234, 22313, 3231313                              UNKNOWN
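The example table can be reproduced with a simple rule-based detector. This is a stand-in, not IDEM-DA's method (the slides do not say how it detects types): it assumes each semantic type is recognised by a regular expression over sample values, with a hypothetical 80% match threshold. The patterns assume UK-style 11-digit numbers starting with 0 and use a UK postcode as an address hint. Labels are upper-cased, so the table's “Address” becomes `ADDRESS`.

```python
"""Hypothetical rule-based stand-in for IDEM-DA semantic type detection.

A column gets the first semantic type whose regular expression matches at
least `min_share` of its non-empty sample values. Patterns and threshold
are illustrative assumptions.
"""
import re

PATTERNS = [
    # Assumed UK-style 11-digit number starting with 0.
    ("MOBILE_NO", re.compile(r"^0\d{10}$")),
    ("EMAIL", re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")),
    # A UK postcode anywhere in the value is taken as a hint of an address.
    ("ADDRESS", re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b")),
]


def detect_semantic_type(values, min_share=0.8):
    """Return the first type matching at least min_share of non-empty values."""
    samples = [v.strip() for v in values if v and v.strip()]
    if not samples:
        return "UNKNOWN"
    for label, pattern in PATTERNS:
        hits = sum(1 for v in samples if pattern.search(v))
        if hits / len(samples) >= min_share:
            return label
    return "UNKNOWN"


if __name__ == "__main__":
    example = {
        "mob_no": ["07710232931", "07083210302"],
        "email": ["example@hotmail.com", "hello@gmail.com"],
        "free_text_field": ["My address is 12 lucky street, London, E12 2TF"],
        "serial_id": ["11234", "22313", "3231313"],
    }
    for column, values in example.items():
        print(column, detect_semantic_type(values))
```

Run on the four example columns, it prints MOBILE_NO, EMAIL, ADDRESS and UNKNOWN respectively, matching the table. The free-text column is only classified because it happens to contain a postcode, which shows why pure regex rules struggle with unstructured fields.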
IDEM-ES
Business Problem
• The customer has a load pattern called “cutover+delta”: historical tables are updated with daily files
• Although many tables have most of their columns with similar names, matching them manually would require time-consuming identification across hundreds of thousands of columns, an error-prone activity
Solution
• Machine-learning-based solution (human-supervised) to automatically identify similarity between columns, using:
  – Column name similarity (n-grams)
  – Column content similarity (n-grams)
  – Content-agnostic column distribution (histograms)
• Open architecture to automatically evaluate the best model (600+ tested)
• Automated creation of INSERT INTO ETL scripts
Accelerator Results
• Expected acceleration of around 30-50%; SQL INSERT statements to create the current view can be generated automatically
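The column-matching and script-generation steps can be sketched as follows. The slide names three signals (name n-grams, content n-grams, a content-agnostic histogram) but not how they are combined; this sketch assumes character trigrams compared with Jaccard similarity, a histogram of value lengths compared by histogram intersection, equal weights, greedy one-to-one matching and a hypothetical 0.4 acceptance threshold. IDEM-ES is described as evaluating many models under human supervision, so treat this as one illustrative model, not its implementation.

```python
"""Hypothetical IDEM-ES-style column matcher and INSERT generator.

Signals (from the slide): column-name n-grams, column-content n-grams and
a content-agnostic histogram. Weights, trigram size, the length histogram,
greedy matching and the 0.4 threshold are illustrative assumptions.
"""


def ngrams(text, n=3):
    """Character n-grams of a lower-cased string padded with spaces."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(max(len(padded) - n + 1, 1))}


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def content_ngrams(values, n=3):
    """Union of the n-grams of every sample value in a column."""
    grams = set()
    for value in values:
        grams |= ngrams(str(value), n)
    return grams


def length_histogram(values, bins=10):
    """Content-agnostic profile: share of values per length bucket."""
    counts = [0] * bins
    for value in values:
        counts[min(len(str(value)), bins - 1)] += 1
    total = sum(counts) or 1
    return [c / total for c in counts]


def histogram_similarity(h1, h2):
    """Histogram intersection: 1.0 for identical distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def column_score(name1, values1, name2, values2):
    """Equal-weight average of the three similarity signals."""
    return (
        jaccard(ngrams(name1), ngrams(name2))
        + jaccard(content_ngrams(values1), content_ngrams(values2))
        + histogram_similarity(length_histogram(values1), length_histogram(values2))
    ) / 3


def match_columns(delta, history, min_score=0.4):
    """Greedily map each delta column to its best unused historical column."""
    mapping, used = {}, set()
    for d_name, d_values in delta.items():
        candidates = [
            (column_score(d_name, d_values, h_name, h_values), h_name)
            for h_name, h_values in history.items()
            if h_name not in used
        ]
        if candidates:
            score, best = max(candidates)
            if score >= min_score:
                mapping[d_name] = best
                used.add(best)
    return mapping


def insert_statement(target, source, mapping):
    """INSERT INTO target (...) SELECT ... FROM source for the mapping."""
    target_cols = ", ".join(mapping.values())
    source_cols = ", ".join(mapping.keys())
    return f"INSERT INTO {target} ({target_cols}) SELECT {source_cols} FROM {source};"


if __name__ == "__main__":
    history = {"customer_id": ["1001", "1002"],
               "mobile_number": ["07710232931", "07083210302"]}
    delta = {"cust_id": ["1003", "1004"], "mobile_no": ["07700900123"]}
    mapping = match_columns(delta, history)
    print(mapping)
    print(insert_statement("customer_history", "customer_delta", mapping))
```

For this sample delta feed the sketch maps `cust_id` to `customer_id` and `mobile_no` to `mobile_number`, then emits `INSERT INTO customer_history (customer_id, mobile_number) SELECT cust_id, mobile_no FROM customer_delta;`. Identifiers are interpolated directly into the SQL, so they should come from trusted catalog metadata, and scores close to the threshold are the ones a human reviewer would confirm.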
Virtu – Data Testing Framework
Business Problem
• Testing data migrations, and more generally the integrity of data transformations in large-scale BI/DW estates, is complicated
• Thousands of objects are moved during a migration and, once in production, loaded every day; this can produce hundreds of defects, and without an automated system, keeping track of all of them becomes a daunting task
• Continuous monitoring of data quality (DQ) performance and regression error history is essential to maintain acceptable levels of quality
Solution
• A complete end-to-end testing framework that accelerates the configuration, execution and evaluation of tests for large-scale BI domains
• A web UI for maximum user-friendliness in configuration
• A scheduler engine to launch configurable batches of tests
• A real-time defect manager for timely defect issuing and progress checks
• A DQ dashboard for monitoring state and progress
Benefits
• Customers can easily plan and execute a large number of checks, fully controlling their lifecycle (creation, modification, decommissioning)
• Configurable engine to store defect details, for maximum visibility and transparency on errors and their resolutions
• Native connection to modern defect management systems (Jira), easily extensible to any system with a reachable API
• DQ dashboard gives real-time, drillable information on the current DQ state
• Compatible with 3 system types: Oracle, Impala & MySQL
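A configurable batch of checks with defect capture, the core loop of a framework like Virtu, can be sketched as follows. This is not Virtu: in-memory sqlite3 databases stand in for the Oracle, Impala and MySQL connections, each check simply compares a source query result with a target query result, and the `Check`/`Defect` structures are hypothetical (there is no scheduler, UI or Jira integration here).

```python
"""Hypothetical sketch of a Virtu-style batch of data checks.

Each check runs one SQL query against the source and one against the
target; any mismatch becomes a defect record. sqlite3 stands in for real
source/target systems; Check and Defect are illustrative structures.
"""
import sqlite3
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    source_sql: str
    target_sql: str


@dataclass
class Defect:
    check: str
    source_result: list
    target_result: list


def run_batch(checks, source_conn, target_conn):
    """Run every check and return a Defect for each result mismatch."""
    defects = []
    for check in checks:
        src = source_conn.execute(check.source_sql).fetchall()
        tgt = target_conn.execute(check.target_sql).fetchall()
        if src != tgt:
            defects.append(Defect(check.name, src, tgt))
    return defects


if __name__ == "__main__":
    source = sqlite3.connect(":memory:")
    target = sqlite3.connect(":memory:")
    for conn in (source, target):
        conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
    source.executemany("INSERT INTO customers VALUES (?, ?)",
                       [(1, "a@x.com"), (2, "b@x.com"), (3, None)])
    # Simulate a migration that dropped one row.
    target.executemany("INSERT INTO customers VALUES (?, ?)",
                       [(1, "a@x.com"), (3, None)])
    checks = [
        Check("row_count", "SELECT COUNT(*) FROM customers",
              "SELECT COUNT(*) FROM customers"),
        Check("null_emails",
              "SELECT COUNT(*) FROM customers WHERE email IS NULL",
              "SELECT COUNT(*) FROM customers WHERE email IS NULL"),
    ]
    for defect in run_batch(checks, source, target):
        print(defect)
```

In the demo the simulated migration drops one row, so only the `row_count` check produces a defect; the null-email count matches on both sides. Keeping checks as data (name plus two queries) is what makes them easy to create, modify and decommission in bulk.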
Adoption
Leap Data Transformation Framework is the result of a client co-innovation process and has delivered efficiencies on large projects
Approach
• A Capgemini client in the Public Sector is building a Business Data Lake (BDL) to support all digital channel interactions and to rationalize/optimize its legacy IT Business Intelligence landscape on top of the new Big Data architecture
• In the scope of the IT Rationalization project, 10+ data warehouses, hundreds of analytical business services and thousands of BO reports must be moved on top of the BDL, for thousands of business users throughout the organization
• In this context, the Leap Data Transformation Framework was used on a first business scope
• Leap is a framework consisting of a transformation methodology and accelerators across the transformation lifecycle, which can operate at scale:
  – The methodology is modular and covers all phases of transformation
  – Elements of the Discovery phase were automated
  – Design and Build process automation (metadata driven) and application deployment controls delivered development efficiencies and scalability
  – A metadata-driven test automation framework reduced initial test effort and subsequent regression test activities
  – A Continuous Development process
  – Platform application stack deployment efficiencies
Key Outcomes (Accelerator Results)
An end-to-end, fact-based transformation framework to deliver IT Rationalization on top of Big Data architectures:
• Diligent: 40% reduction of the transformation scope
• Idem, Papillon, Blend: 15% efficiency in the design/build process through use of:
  – Semi-automated ETL code optimizer
  – Semi-automated SQL optimizer
  – Semi-automated report optimizer
• Virtu: 10% efficiency in the test development process (1st pass) & 30% efficiency in regression testing through an automated test & assurance framework
Use cases for Capgemini's Leap Data Transformation Framework for optimized business data lakes
Replatforming
• For advanced clients embracing the potential of modern architectures
• Opportunity to transform, simplify and rationalize an organization's data landscape for optimized TCO
• The full Leap Data Transformation suite enables risk and cost reduction and works well in an agile approach
Legacy Discovery/DW optimization
• For clients needing better visibility of their current data assets before moving to Big Data
• Leap Data Transformation Framework can help optimize current data management processes, substantially reduce transformation scope, identify the optimal platform for each workload and shape a future project for success
Managing existing BI & moving to modern architectures
• Capgemini takes over the current BI estate and modernizes it through its NextGen BISC approach
• For clients with redundant and expensive DW estates who are concerned about the risks of moving to modern architectures
• The full Leap Data Transformation Framework suite is a key element in optimizing TCO and ensuring quality in the transformation process
Testing
• For clients needing to automate their data testing in big data environments or large relational environments
• Tools can automate the testing lifecycle for both big data and traditional relational DW estates
Replatforming legacy BI applications requires strong strategies for user adoption and decommissioning
The user adoption strategy
Strong user adoption strategy
• End users understand the new value they will get out of the new system
• They are empowered to use it
• Their success spreads to new initiatives
• They quickly forget all about the old & slow stuff
Weak user adoption strategy
• End users fear the new system will impact their capacity to do their jobs
• The known feels safer than the new
• First tests on the new system disappoint, and any failure goes viral
• Evolutions still run on the old system, “just in case”
The kill strategy
Strong kill strategy
• Systems are killed according to the roadmap; costs linked to unused HW & SW are recovered
• IT & business impacts are anticipated, managed and communicated
• Energy is focused on the new system
Weak kill strategy
• First systems are shut down ignoring business constraints, impacting operations
• Endless hours are spent comparing the old and the new and explaining differences
• Unprepared board escalations when unplanned impacts arise
Sample table of contents for the output of a 4-week Data Warehouse Optimization roadmap based on LEAP
• Data Extract & Staging
• Data Management & EDW
• Semantic Layer
• Sandbox & Analytics
• Operational Analytics
• Data Virtualization Layer
• Master Data Management
• Metadata Management
• Data Distribution Layer
• Our Understanding
• Big Data Trends in the Heavy Equipment/Farm Industry
• Technology Principles
• Reference Architecture
  – Conceptual Architecture
  – Architecture Components
• Technology Choice Points
  – ETL tool comparison
  – EMR vs. Hadoop
• ETL & Data Offloading Plan
  – Project Structure, Sequence, Sprints
  – Assumptions
  – Collaborative Planning & Prep
• Logical Architecture
• Business Value Proposition
• Current State Architecture
• End State Architecture
• Current State + 6 months Architecture
• Current State + 12 months Architecture
• Current State + 18 months Architecture
What’s next?
Contact our experts
Schedule a discovery session with our experts
Schedule a first assessment of the value of Leap for your organization
Goutham Belliappa
Goutham.belliappa@capgemini.com
https://www.linkedin.com/in/gouthambelliappa
Andrea Capodicasa
Andrea.capodicasa@capgemini.com
Duane Garrett
duane@cloudera.com

Oracle on premises and oracle cloud - how to coexist webinar
 
5 Essential Practices of the Data Driven Organization
5 Essential Practices of the Data Driven Organization5 Essential Practices of the Data Driven Organization
5 Essential Practices of the Data Driven Organization
 
Data-Driven Organisation
Data-Driven OrganisationData-Driven Organisation
Data-Driven Organisation
 
Panaya Test Center – Auf zu postmodernem ERP Testing
Panaya Test Center – Auf zu postmodernem ERP TestingPanaya Test Center – Auf zu postmodernem ERP Testing
Panaya Test Center – Auf zu postmodernem ERP Testing
 
Startup CTO Role v3
Startup CTO Role v3Startup CTO Role v3
Startup CTO Role v3
 
The Role of the CTO in a Growing Organization
The Role of the CTO in a Growing OrganizationThe Role of the CTO in a Growing Organization
The Role of the CTO in a Growing Organization
 
The Role of CTO: A Rantifesto
The Role of CTO: A RantifestoThe Role of CTO: A Rantifesto
The Role of CTO: A Rantifesto
 
Prolifics Managed Services Offering
Prolifics Managed Services OfferingProlifics Managed Services Offering
Prolifics Managed Services Offering
 
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
Sören Eickhoff, Informatica GmbH, "Informatica Intelligent Data Lake – Self S...
 
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
AWS re:Invent 2016: FINRA in the Cloud: the Big Data Enterprise (ENT313)
 
Infosys
InfosysInfosys
Infosys
 
Cognizant's HCM Capabilities
Cognizant's HCM CapabilitiesCognizant's HCM Capabilities
Cognizant's HCM Capabilities
 

Semelhante a Is your big data journey stalling? Take the Leap with Capgemini and Cloudera

Capgemini Leap Data Transformation Framework with Cloudera
Capgemini Leap Data Transformation Framework with ClouderaCapgemini Leap Data Transformation Framework with Cloudera
Capgemini Leap Data Transformation Framework with ClouderaCapgemini
 
Data Virtualization, a Strategic IT Investment to Build Modern Enterprise Dat...
Data Virtualization, a Strategic IT Investment to Build Modern Enterprise Dat...Data Virtualization, a Strategic IT Investment to Build Modern Enterprise Dat...
Data Virtualization, a Strategic IT Investment to Build Modern Enterprise Dat...Denodo
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Nathan Bijnens
 
Accelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data VirtualizationAccelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data VirtualizationDenodo
 
Future of Making Things
Future of Making ThingsFuture of Making Things
Future of Making ThingsJC Davis
 
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)Denodo
 
Trends in Enterprise Advanced Analytics
Trends in Enterprise Advanced AnalyticsTrends in Enterprise Advanced Analytics
Trends in Enterprise Advanced AnalyticsDATAVERSITY
 
Get ahead of the cloud or get left behind
Get ahead of the cloud or get left behindGet ahead of the cloud or get left behind
Get ahead of the cloud or get left behindMatt Mandich
 
Looking to the Future: Embracing the Cloud for a More Modern Data Quality App...
Looking to the Future: Embracing the Cloud for a More Modern Data Quality App...Looking to the Future: Embracing the Cloud for a More Modern Data Quality App...
Looking to the Future: Embracing the Cloud for a More Modern Data Quality App...Precisely
 
KASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
KASHTECH AND DENODO: ROI and Economic Value of Data VirtualizationKASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
KASHTECH AND DENODO: ROI and Economic Value of Data VirtualizationDenodo
 
Data and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the CloudData and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the Cloudredmondpulver
 
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Precisely
 
MongoDB World 2019: Data Digital Decoupling
MongoDB World 2019: Data Digital DecouplingMongoDB World 2019: Data Digital Decoupling
MongoDB World 2019: Data Digital DecouplingMongoDB
 
Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...
Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...
Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...Denodo
 
What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?Xpand IT
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with HadoopPrecisely
 
Rick Mutsaers Informatica
Rick Mutsaers InformaticaRick Mutsaers Informatica
Rick Mutsaers InformaticaBigDataExpo
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnectaDigital
 

Semelhante a Is your big data journey stalling? Take the Leap with Capgemini and Cloudera (20)

Capgemini Leap Data Transformation Framework with Cloudera
Capgemini Leap Data Transformation Framework with ClouderaCapgemini Leap Data Transformation Framework with Cloudera
Capgemini Leap Data Transformation Framework with Cloudera
 
Data Virtualization, a Strategic IT Investment to Build Modern Enterprise Dat...
Data Virtualization, a Strategic IT Investment to Build Modern Enterprise Dat...Data Virtualization, a Strategic IT Investment to Build Modern Enterprise Dat...
Data Virtualization, a Strategic IT Investment to Build Modern Enterprise Dat...
 
Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)Data Mesh in Azure using Cloud Scale Analytics (WAF)
Data Mesh in Azure using Cloud Scale Analytics (WAF)
 
Data Analytics.pptx
Data Analytics.pptxData Analytics.pptx
Data Analytics.pptx
 
Accelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data VirtualizationAccelerate Cloud Migrations and Architecture with Data Virtualization
Accelerate Cloud Migrations and Architecture with Data Virtualization
 
Future of Making Things
Future of Making ThingsFuture of Making Things
Future of Making Things
 
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
Rethink Your 2021 Data Management Strategy with Data Virtualization (ASEAN)
 
Trends in Enterprise Advanced Analytics
Trends in Enterprise Advanced AnalyticsTrends in Enterprise Advanced Analytics
Trends in Enterprise Advanced Analytics
 
Get ahead of the cloud or get left behind
Get ahead of the cloud or get left behindGet ahead of the cloud or get left behind
Get ahead of the cloud or get left behind
 
Looking to the Future: Embracing the Cloud for a More Modern Data Quality App...
Looking to the Future: Embracing the Cloud for a More Modern Data Quality App...Looking to the Future: Embracing the Cloud for a More Modern Data Quality App...
Looking to the Future: Embracing the Cloud for a More Modern Data Quality App...
 
KASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
KASHTECH AND DENODO: ROI and Economic Value of Data VirtualizationKASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
KASHTECH AND DENODO: ROI and Economic Value of Data Virtualization
 
Data and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the CloudData and Application Modernization in the Age of the Cloud
Data and Application Modernization in the Age of the Cloud
 
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
Foundational Strategies for Trust in Big Data Part 1: Getting Data to the Pla...
 
MongoDB World 2019: Data Digital Decoupling
MongoDB World 2019: Data Digital DecouplingMongoDB World 2019: Data Digital Decoupling
MongoDB World 2019: Data Digital Decoupling
 
Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...
Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...
Maximizing Oil and Gas (Data) Asset Utilization with a Logical Data Fabric (A...
 
What's New in Pentaho 7.0?
What's New in Pentaho 7.0?What's New in Pentaho 7.0?
What's New in Pentaho 7.0?
 
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy:  A Simple, Scalable Solution for Getting Started with HadoopBig Data Made Easy:  A Simple, Scalable Solution for Getting Started with Hadoop
Big Data Made Easy: A Simple, Scalable Solution for Getting Started with Hadoop
 
Rick Mutsaers Informatica
Rick Mutsaers InformaticaRick Mutsaers Informatica
Rick Mutsaers Informatica
 
Connecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud PlatformConnecta Event: Big Query och dataanalys med Google Cloud Platform
Connecta Event: Big Query och dataanalys med Google Cloud Platform
 
Keyrus US Information
Keyrus US InformationKeyrus US Information
Keyrus US Information
 

Mais de Cloudera, Inc.

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxCloudera, Inc.
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera, Inc.
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards FinalistsCloudera, Inc.
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Cloudera, Inc.
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Cloudera, Inc.
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Cloudera, Inc.
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Cloudera, Inc.
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Cloudera, Inc.
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Cloudera, Inc.
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Cloudera, Inc.
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Cloudera, Inc.
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Cloudera, Inc.
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Cloudera, Inc.
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformCloudera, Inc.
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Cloudera, Inc.
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Cloudera, Inc.
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Cloudera, Inc.
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Cloudera, Inc.
 

Mais de Cloudera, Inc. (20)

Partner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptxPartner Briefing_January 25 (FINAL).pptx
Partner Briefing_January 25 (FINAL).pptx
 
Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists Cloudera Data Impact Awards 2021 - Finalists
Cloudera Data Impact Awards 2021 - Finalists
 
2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists2020 Cloudera Data Impact Awards Finalists
2020 Cloudera Data Impact Awards Finalists
 
Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019Edc event vienna presentation 1 oct 2019
Edc event vienna presentation 1 oct 2019
 
Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19Machine Learning with Limited Labeled Data 4/3/19
Machine Learning with Limited Labeled Data 4/3/19
 
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19Data Driven With the Cloudera Modern Data Warehouse 3.19.19
Data Driven With the Cloudera Modern Data Warehouse 3.19.19
 
Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19Introducing Cloudera DataFlow (CDF) 2.13.19
Introducing Cloudera DataFlow (CDF) 2.13.19
 
Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19Introducing Cloudera Data Science Workbench for HDP 2.12.19
Introducing Cloudera Data Science Workbench for HDP 2.12.19
 
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
Shortening the Sales Cycle with a Modern Data Warehouse 1.30.19
 
Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19Leveraging the cloud for analytics and machine learning 1.29.19
Leveraging the cloud for analytics and machine learning 1.29.19
 
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
Modernizing the Legacy Data Warehouse – What, Why, and How 1.23.19
 
Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18Leveraging the Cloud for Big Data Analytics 12.11.18
Leveraging the Cloud for Big Data Analytics 12.11.18
 
Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3Modern Data Warehouse Fundamentals Part 3
Modern Data Warehouse Fundamentals Part 3
 
Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2Modern Data Warehouse Fundamentals Part 2
Modern Data Warehouse Fundamentals Part 2
 
Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1Modern Data Warehouse Fundamentals Part 1
Modern Data Warehouse Fundamentals Part 1
 
Extending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the PlatformExtending Cloudera SDX beyond the Platform
Extending Cloudera SDX beyond the Platform
 
Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18Federated Learning: ML with Privacy on the Edge 11.15.18
Federated Learning: ML with Privacy on the Edge 11.15.18
 
Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360Analyst Webinar: Doing a 180 on Customer 360
Analyst Webinar: Doing a 180 on Customer 360
 
Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18Build a modern platform for anti-money laundering 9.19.18
Build a modern platform for anti-money laundering 9.19.18
 
Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18Introducing the data science sandbox as a service 8.30.18
Introducing the data science sandbox as a service 8.30.18
 

Último

Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprisepreethippts
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesŁukasz Chruściel
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...Technogeeks
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Hr365.us smith
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...OnePlan Solutions
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...OnePlan Solutions
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaHanief Utama
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Mater
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesPhilip Schwarz
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Natan Silnitsky
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf31events.com
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 

Último (20)

Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 
Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)Recruitment Management Software Benefits (Infographic)
Recruitment Management Software Benefits (Infographic)
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
Tech Tuesday - Mastering Time Management Unlock the Power of OnePlan's Timesh...
 
React Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief UtamaReact Server Component in Next.js by Hanief Utama
React Server Component in Next.js by Hanief Utama
 
Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)Ahmed Motair CV April 2024 (Senior SW Developer)
Ahmed Motair CV April 2024 (Senior SW Developer)
 
Folding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a seriesFolding Cheat Sheet #4 - fourth in a series
Folding Cheat Sheet #4 - fourth in a series
 
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
Taming Distributed Systems: Key Insights from Wix's Large-Scale Experience - ...
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 

Is your big data journey stalling? Take the Leap with Capgemini and Cloudera

  • 1. Is your Big Data journey stalling? Take the Leap with Capgemini and Cloudera: Industrializing your transition to the Modern Data Landscape
  • 2. Speakers: Andrea Capodicasa, Senior Solution Architect, Insights & Data; Goutham Belliappa, Big Data Practice Leader, Insights & Data; Alex Gutow, Senior Manager, Product Marketing
  • 3. Agenda: The Case for Change; Industrializing the Change; Adoption; Q&A
  • 4. Capgemini Insights & Data Global Practice. Global reach, with over 13,000 professionals across 40+ countries, including over 500 Big Data & Data Science professionals and 100+ Hadoop-certified consultants. We employ more than 13,000 information management specialist practitioners, deployed across Capgemini’s global network. Gartner has again recognised us as one of the four leading information service providers globally. The Capgemini Insights & Data Global Practice has delivered business & IT insights and data services since 2015. Capgemini has a global reach and local presence in 44 countries and over 100 languages. Map: Canada, USA, Mexico, Brazil, Argentina, Saudi Arabia, South Africa, Morocco, India (Centers of Excellence in Mumbai and Bangalore), China and Australia; regions: Western Europe, Eastern Europe, Middle East & Africa, Latin America, North America, Asia Pacific. Europe: Austria, Finland, France, Italy, Germany, Norway, Sweden, Netherlands, Poland, Spain, Switzerland, UK.
  • 5. 5© Cloudera, Inc. All rights reserved. | The case for change
  • 6. Information Trends: What are we seeing in the marketplace? Recent years have brought unprecedented changes to the information landscape. Each of these “disruptors” has individual momentum, and collectively they represent a significant opportunity to improve an organization’s effectiveness. Successful CIOs and leaders consciously take these trends into consideration when planning the evolution of their information architecture. • Ms. Agility killed Mr. Waterfall: Empower the business by focusing from the “user down”, not the “system up”. The days of modeling business requirements months or even years in advance, with IT delivering a multi-year plan to roll out a solution that may no longer apply in a fast-changing business environment, are long gone. • Cloud Computing: The availability of “finished” business functions in the cloud provides organizations with tremendous opportunities while increasing IT information challenges. • Open Source: Open source architecture provides substantial development and complexity cost savings vs. legacy software packages. • As a Service: Software-as-a-Service offerings in Big Data, data transformation and finished analytics remove the infrastructure bottlenecks of servers, software and maintenance that obstruct speed to market. • Security & Privacy: The proliferation of web-connected IP devices creates a “hyper-evolving” cyber-breach potential for organizations; privacy laws create compliance challenges with mobile devices. • Business Meta-Data: Traditionally, data dictionaries have been single-purpose and technically focused. As data becomes more valuable and the same information is used in multiple ways, the need for business metadata will become critical. • Social Computing: Social computing has resulted in data where segments are loosely connected and correlations are at times non-intuitive, requiring new ways to mine data and derive insights. • In-Memory Analytics: Massive in-memory databases with intensely complex analytics are highly scalable: change anything, anytime, and simultaneously compare the results of multiple scenarios in seconds. • “Real” Analytics: The transition from historical, hindsight indicators to insight and foresight indicators and visualizations.
  • 7. Customers Are Looking for a Guide
  • 8. Cloudera Enterprise: Making Hadoop Fast, Easy, and Secure. A new kind of data platform: • One place for unlimited data • Unified, multi-framework data access. Cloudera makes it: • Fast for business • Easy to manage • Secure without compromise. Hybrid deployment flexibility: public cloud, private cloud, hybrid environments. [Diagram: Operations; Data Management; Unified Services (Resource Management, Security); Process, Analyze, Serve (Batch, Stream, SQL, Search, Other); Store (Filesystem, Relational, NoSQL, Other); Integrate; Structured and Unstructured data.]
  • 9. The traditional approach to BI & Analytics is a bottleneck in the operational value chain. Traditional BI & Analytics approach: • Centralised BI teams are too monolithic and divorced from business operations • Insights latency • Reporting on the past, with limited ability to predict and prescribe what is needed now • Each new business question asked means more time required to crunch the right data • Heavy duplication of operational data throughout the BI layers & systems • Diluted data quality & governance create risks of security breaches, compliance issues & risk exposure • Significant costs: infrastructure and people • Limited ability to scale with either organic data volume growth or increasing data complexity
  • 10. The Insights-driven enterprise puts information at the centre and insights “at the point of action”. Next-generation approach: • A next-generation data management platform enabling a pervasive, real-time “insights & data fabric” serving operations • Standardized & cost-effective data management, allowing high agility on insights and the ability to “ask any question” • Operational applications provide data and integrate insights back in a continuous improvement loop • Operations integrate predicted best outcomes to optimise business processes, automatically where possible • Ability to detect and catch events on the fly that require immediate action (e.g. fraud detection), for optimal reaction or proactive action • Coherent management of platforms & data management processes, with insights & data science skills embedded directly in the operational units for maximum impact • Optimized total cost of ownership (TCO) with a rationalized and simplified data landscape
  • 11. Key challenges blur the vision of both the target and the journey to the Insights-driven enterprise. [Diagram: Operations; Data Management; Unified Services; Process, Analyze, Serve; Store; Integrate.] Challenges addressed: “Which data should we retain and/or which data could we archive?” “I don’t know how to drive value from my data.” “Can I decrease costs by moving my data (landscape) to the cloud or As-a-Service?” “How mature is my data landscape in comparison to the best industry trends?” “I have been told to ‘do something’ about big data analytics but don’t know where to start.” “Can the Business Intelligence landscape be optimized to derive the maximum value out of it?” “Our data landscape is scattered, complex and very expensive; can we fix it?” Value created: a modern data strategy will enable: • Reduced complexity: rationalizing the data strategy to meet demand • Lower cost: reducing the operating cost of your data strategy • Increased agility and better time to market: more speed in the development of new information applications • More/better insights and return on intelligence: easier derivation of meaningful insights, enabling business transformation • Less risk: reducing the complexity of the data strategy • Data security & privacy: making your data strategy compliant with rules and regulations
  • 12. Industrializing the Change
  • 13. Capgemini’s Leap Data Transformation Framework: modules overview. Modules: Misura, Diligent, Idem, Blend, Papillon, Virtu. Transformation phases: Estimation, Discovery, Design/Build, Testing. Capabilities: • Optimize/reduce transformation scope • Optimize reporting design • Optimize SQL • Industrialize end-to-end testing • Estimate the transformation effort • Optimize ETL semantic design. Essence (semantic layer consolidation): • Analyze the existing semantic layer of the architecture • Identify potential functional overlap and produce recommendations for consolidation. Data Concierge: • Business Information Catalog • Self-service ingestion, distillation, analytics • Data Operations Services. One-Click Leap: • Agile environment provisioning • Continuous Integration lifecycle
  • 14. Diligent / Blend Applications. Business Problem: • Large and complex DW estates have been built over the last 20 years or so, and the infrastructure hosting them may need updating • A number of reports and underlying tables will be duplicated or no longer used; they can be decommissioned, saving valuable resources • Users are reluctant to give up “their” reports/data when migration programmes occur. Solution: • Scope reduction by identifying current BO reports that are not used: up to 40% discovered with one of our customers • Scope reduction by identifying reports that are duplicates or share a number of data items • An automated method to migrate BO reports to Pentaho, reducing workload and errors • A scientific and objective approach to measuring which data are actually used • Diligent BO Audit Data Explorer, to identify interactions between users and Universes/reports and tables • Diligent BO Metadata Gathering module, to extract Universe and report information • Blend Report Merger, to identify report reductions • Blend XML Generator, to create Pentaho reporting cubes from metadata gathered by Diligent. Accelerators: Diligent, Blend
  • 15. IDEM-DA. Business Problem: • The customer has very strict security and normalisation requirements when loading their data, and needs different obfuscation types for different semantic types, e.g. names, phone numbers, social security numbers • Left as a manual activity, this would imply a laborious and time-consuming identification of hundreds of thousands of columns: a costly and error-prone activity. Solution: • Automated identification of table columns for encryption and standardisation • Automated creation of ETL metadata spreadsheets, which drive the Data Acquisitions Pentaho jobs for data migration. Accelerator (Idem) Results: • Manual generation of the metadata spreadsheet: several days to weeks; with IDEM-DA: 15 minutes to 2 hours • Manual eyeballing of data is prone to human error and can take hours to several days; with IDEM-DA: approximately 70% reduction and more accurate identification of known types. Project manager of the Data Migration project: “IDEM-DA is the only way forward”
  • 16. IDEM-DA example table (Idem). IDEM-DA is a module used to support ETL from legacy data warehouses into a modern architecture.
      Column Name      | Sample Data                                     | Semantic Type
      mob_no           | 07710232931, 07083210302                        | MOBILE_NO
      email            | example@hotmail.com, hello@gmail.com            | EMAIL
      free_text_field  | My address is 12 lucky street, London, E12 2TF  | Address
      serial_id        | 11234, 22313, 3231313                           | UNKNOWN
  • 17. IDEM-ES. Business Problem: • The customer has a load pattern called “cutover + delta”: historical tables are updated with daily files • Although many tables have most of their columns with similar names, left as a manual activity this would imply a time-consuming identification of hundreds of thousands of columns, an error-prone activity. Solution: • A machine-learning-based solution to automatically identify similarity between columns (human-supervised), using column name similarity (n-grams), column content similarity (n-grams) and content-agnostic column distribution (histograms) • An open architecture to automatically evaluate the best model (600+ tested) • Automated creation of INSERT INTO ETL scripts. Accelerator (Idem) Results: • Expected acceleration of around 30-50% • Can automatically generate SQL INSERT statements to create the current view
  • 18. IDEM-ES (Idem)
  • 19. IDEM-ES (Idem)
  • 20. Virtu: Data Testing Framework. Business Problem: • Testing data migrations, and the integrity of data transformations in general, in large-scale BI/DW estates is complicated • Thousands of objects are moved during a migration, and once in production and loaded every day they may produce hundreds of defects; without an automated system to track them all, this can become a daunting task • Continuous monitoring of DQ performance and regression error history is essential to maintain acceptable levels of quality. Solution: • A complete end-to-end testing framework that accelerates the configuration, execution and evaluation of tests for large-scale BI domains, comprising: a web UI for maximum user-friendliness in configuration; a scheduler engine to launch configurable batches of tests; a real-time defect manager for timely defect issuing and progress checks; and a DQ dashboard for monitoring state and progress. Benefits: • Customers can easily plan and execute a large number of checks, fully controlling their lifecycle (creation, modification, decommissioning) • A configurable engine stores defect details for maximum visibility and transparency on errors and their resolutions • Native connection to modern defect management systems (Jira), easily extendable to any system with a reachable API • The DQ dashboard gives real-time, drillable information on the current DQ state • Compatible with 3 system types: Oracle, Impala & MySQL
  • 21. Virtu: Testing Framework
  • 22. Virtu: Testing Framework
  • 23. Adoption
  • 24. The Leap Data Transformation Framework is the result of a client co-innovation process and has delivered efficiencies on large projects. Approach: • A Capgemini Public Sector client is building a Business Data Lake (BDL) to support all digital channel interactions and to rationalize/optimize its legacy IT Business Intelligence landscape on top of the new Big Data architecture • In the scope of the IT Rationalization project, 10+ data warehouses, hundreds of analytical business services, and thousands of BO reports must be moved onto the BDL, for thousands of business users throughout the organization • In this context, the Leap Data Transformation Framework was used on a first business scope • Leap is a framework consisting of a transformation methodology and accelerators across the transformation lifecycle, able to operate at scale: the methodology is modular and covers all phases of transformation; elements of the Discovery phase were automated; Design and Build process automation (metadata-driven) and application deployment controls delivered development efficiencies and scalability; a metadata-driven test automation framework reduced initial test effort and subsequent regression test activities; a Continuous Development process; platform application stack deployment efficiencies. Key Outcomes: an end-to-end, fact-based transformation framework to deliver IT Rationalization on top of Big Data architectures. Accelerator Results: • 40% reduction of the transformation scope (Diligent) • 15% efficiency in the design/build process through a semi-automated ETL code optimizer, SQL optimizer and report optimizer (Idem, Papillon, Blend) • 10% efficiency in the test development process (first pass) and 30% efficiency in regression testing through an automated test & assurance framework (Virtu)
  • 25. Use cases for Capgemini’s Leap Data Transformation Framework for optimized business data lakes. Replatforming: • For advanced clients embracing the potential of modern architectures • An opportunity to transform, simplify and rationalize an organization’s data landscape for optimized TCO • The full Leap Data Transformation suite enables risk and cost reduction and works well in an agile approach. Legacy Discovery/DW Optimization: • For clients needing better visibility of their current data assets before moving to Big Data • The Leap Data Transformation Framework can help optimize current data management processes, substantially reduce transformation scope, identify the optimal platform for each workload, and shape a future project for success. Managing Existing BI & Moving to Modern Architectures: • Capgemini takes over the current BI estate and modernizes it through its NextGen BISC approach • For clients with redundant and expensive DW estates who are concerned about the risks of moving to modern architectures • The full Leap Data Transformation Framework suite is a key element in optimizing TCO and ensuring quality in the transformation process. Testing: • For clients needing to automate their data testing in big data environments or large relational environments • The tools can automate the testing lifecycle for both big data and traditional relational DW estates
  • 26. Replatforming legacy BI applications requires strong strategies for user adoption and decommissioning. The User Adoption Strategy. Strong: • End users understand the new value they will get out of the new system • They are empowered to use it • Their success spreads to new initiatives • They quickly forget all about the old, slow stuff. Weak: • End users fear the new system will impact their capacity to do their jobs • The known feels safer than the new • First tests on the new system disappoint, and any failure goes viral • Evolutions still run on the old system, “just in case”. The Kill Strategy. Strong: • Systems are killed according to the roadmap, and costs linked to unused HW & SW are recovered • IT & business impacts are anticipated, managed and communicated • Energy is focused on the new. Weak: • The first systems are shut down ignoring business constraints, impacting operations • Endless hours are spent comparing the old and the new and explaining differences • Unprepared board escalations occur when unplanned impacts arise
  • 27. Sample table of contents for the output of a 4-week Data Warehouse Optimization roadmap based on LEAP: • Our Understanding • Big Data Trends in the Heavy Equipment/Farm Industry • Technology Principles • Reference Architecture: Conceptual Architecture; Architecture Components • Technology Choice Points: ETL tool comparison; EMR vs. Hadoop • ETL & Data Offloading Plan: Project Structure, Sequence, Sprints; Assumptions; Collaborative Planning & Prep • Logical Architecture • Business Value Proposition • Current State Architecture • End State Architecture • Current State + 6 months Architecture • Current State + 12 months Architecture • Current State + 18 months Architecture. Architecture layers: Data Extract & Staging; Data Management & EDW; Semantic Layer; Sandbox & Analytics; Operational Analytics; Data Virtualization Layer; Master Data Management; Metadata Management; Data Distribution Layer
  • 28. What’s Next?
  • 29. Contact our experts: • Schedule a discovery session with our experts • Schedule a first assessment of the value of Leap for your organization. Goutham Belliappa: Goutham.belliappa@capgemini.com, https://www.linkedin.com/in/gouthambelliappa. Andrea Capodicasa: Andrea.capodicasa@capgemini.com. Duane Garrett: duane@cloudera.com
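The Diligent/Blend slide (slide 14) describes two scope-reduction ideas: flagging BO reports that audit data shows nobody uses, and spotting reports that duplicate or share data items. The Python sketch below illustrates both under stated assumptions. The `Report` structure, the 180-day idle threshold, the use of Jaccard overlap on data items, and all sample data are illustrative choices, not the actual logic of Diligent or Blend.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import combinations


@dataclass(frozen=True)
class Report:
    name: str
    data_items: frozenset  # hypothetical: columns/objects the report reads


def unused_reports(reports, last_access, today, max_idle_days=180):
    """Names of reports with no recorded access within max_idle_days."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        r.name for r in reports
        if last_access.get(r.name) is None or last_access[r.name] < cutoff
    )


def jaccard(a, b):
    """Share of data items two reports have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_candidates(reports, threshold=0.7):
    """Report pairs whose data items overlap by at least `threshold`."""
    pairs = []
    for r1, r2 in combinations(reports, 2):
        score = jaccard(r1.data_items, r2.data_items)
        if score >= threshold:
            pairs.append((r1.name, r2.name, round(score, 2)))
    return pairs


# Hypothetical report inventory and last-access dates from an audit log.
reports = [
    Report("sales_daily", frozenset({"region", "amount", "date"})),
    Report("sales_daily_v2", frozenset({"region", "amount", "date", "channel"})),
    Report("hr_headcount", frozenset({"dept", "headcount"})),
]
last_access = {"sales_daily": date(2015, 11, 1), "sales_daily_v2": date(2016, 6, 1)}
today = date(2016, 6, 30)

print(unused_reports(reports, last_access, today))  # ['hr_headcount', 'sales_daily']
print(merge_candidates(reports))  # [('sales_daily', 'sales_daily_v2', 0.75)]
```

Both lists are candidates for review rather than automatic decommissioning, which fits the slide's point that users are reluctant to give up "their" reports.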
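The IDEM-DA slides (15 and 16) describe detecting each column's semantic type so that the right obfuscation rule can be written into an ETL metadata spreadsheet. The slides do not say how IDEM-DA detects types, so the following is only a rule-based Python sketch of the idea. The regular expressions (UK mobile and postcode formats), the 80% majority threshold, and the obfuscation rule names `mask_digits`, `hash` and `redact` are all hypothetical. The sample values are copied from the slide 16 table.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical rules tuned to the slide's sample values (UK formats).
FULL_VALUE_RULES = {
    "MOBILE_NO": re.compile(r"07\d{9}"),
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}
# Simplified UK postcode: free text that contains one is treated as an address.
POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b")

# Hypothetical obfuscation rule names written into the metadata sheet.
OBFUSCATION = {"MOBILE_NO": "mask_digits", "EMAIL": "hash",
               "ADDRESS": "redact", "UNKNOWN": "none"}


def value_type(value):
    """Semantic type of a single value."""
    v = value.strip()
    for sem_type, rule in FULL_VALUE_RULES.items():
        if rule.fullmatch(v):
            return sem_type
    return "ADDRESS" if POSTCODE.search(v) else "UNKNOWN"


def column_type(values, min_share=0.8):
    """Majority type of a column sample, if it is dominant enough."""
    sem_type, hits = Counter(value_type(v) for v in values).most_common(1)[0]
    return sem_type if hits / len(values) >= min_share else "UNKNOWN"


def metadata_sheet(columns):
    """CSV text with one row per column: name, type, obfuscation rule."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["column_name", "semantic_type", "obfuscation"])
    for name, values in columns.items():
        sem_type = column_type(values)
        writer.writerow([name, sem_type, OBFUSCATION[sem_type]])
    return out.getvalue()


# Sample values from the slide's example table.
columns = {
    "mob_no": ["07710232931", "07083210302"],
    "email": ["example@hotmail.com", "hello@gmail.com"],
    "free_text_field": ["My address is 12 lucky street, London, E12 2TF"],
    "serial_id": ["11234", "22313", "3231313"],
}
print(metadata_sheet(columns))
```

Deciding at the column level from a sample, rather than per value, keeps one stray value from changing a column's obfuscation rule. The sketch writes the slide's "Address" type as `ADDRESS` for consistency with the other labels.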
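IDEM-ES (slide 17) matches columns between historical tables and daily files, then generates INSERT INTO scripts. The Python sketch below covers only the column-name signal. It uses Jaccard similarity over character trigrams as a stand-in for the n-gram similarity the slide mentions. Content similarity, histograms and model selection are left out. The table and column names, the `#` padding and the 0.3 threshold are assumptions.

```python
def ngrams(text, n=3):
    """Character n-grams of a column name, padded so word edges count."""
    s = f"#{text.lower()}#"
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def name_similarity(a, b, n=3):
    """Jaccard similarity of the two names' character n-gram sets."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    return len(ga & gb) / len(union) if union else 0.0


def match_columns(target_cols, source_cols, threshold=0.3):
    """Map each history column to its most similar daily-file column.

    Columns scoring below `threshold` are left unmapped for human review.
    """
    mapping = {}
    for t in target_cols:
        best = max(source_cols, key=lambda s: name_similarity(t, s))
        if name_similarity(t, best) >= threshold:
            mapping[t] = best
    return mapping


def insert_into_sql(target_table, source_table, mapping):
    """INSERT INTO ... SELECT statement covering the matched columns."""
    targets = ", ".join(mapping)
    sources = ", ".join(mapping.values())
    return (f"INSERT INTO {target_table} ({targets})\n"
            f"SELECT {sources}\nFROM {source_table};")


# Hypothetical history table and daily file layouts.
history = ["customer_id", "customer_name", "balance_amt", "last_updated"]
daily = ["cust_id", "cust_name", "balance_amount", "load_ts"]
mapping = match_columns(history, daily)
print(mapping)
print(insert_into_sql("hist_customer", "stg_customer_daily", mapping))
```

Here `last_updated` falls below the threshold and stays unmapped, which is where the human supervision the slide calls for would come in. Nothing in the sketch prevents two history columns from mapping to the same daily column, so a fuller tool would also need a one-to-one assignment step.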
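Virtu (slide 20) plans and runs batches of checks and records a defect for each failure. A minimal source-versus-target reconciliation loop in that spirit is sketched below. In-memory SQLite stands in for the Oracle, Impala and MySQL connections the slide lists. The `Check` and `Defect` classes and the sample queries are hypothetical. Jira integration, scheduling and the DQ dashboard are not shown.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Check:
    name: str
    source_sql: str  # scalar query run on the legacy side
    target_sql: str  # scalar query run on the migrated side


@dataclass
class Defect:
    check: str
    expected: object
    actual: object


def scalar(conn, sql):
    """First column of the first row returned by `sql`."""
    return conn.execute(sql).fetchone()[0]


def run_checks(source, target, checks):
    """Run every check and return one Defect per mismatch."""
    defects = []
    for c in checks:
        expected, actual = scalar(source, c.source_sql), scalar(target, c.target_sql)
        if expected != actual:
            defects.append(Defect(c.name, expected, actual))
    return defects


# In-memory SQLite databases stand in for the legacy and target systems.
source, target = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
for conn, rows in [(source, [(1, "a@x.com"), (2, "b@x.com"), (3, None)]),
                   (target, [(1, "a@x.com"), (2, None)])]:
    conn.execute("CREATE TABLE customer (id INTEGER, email TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)

checks = [
    Check("row_count", "SELECT COUNT(*) FROM customer",
          "SELECT COUNT(*) FROM customer"),
    Check("null_emails", "SELECT COUNT(*) FROM customer WHERE email IS NULL",
          "SELECT COUNT(*) FROM customer WHERE email IS NULL"),
    Check("max_id", "SELECT MAX(id) FROM customer",
          "SELECT MAX(id) FROM customer"),
]
for d in run_checks(source, target, checks):
    print(d)
```

Keeping each check as data rather than code is what lets a framework like this plan, schedule and retire large batches of checks. It also lets the same loop push defects to a tracker.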

Editor's Notes

  1. Speaker: Goutham
  2. 6
  3. Speaker: Goutham
  4. Speaker: Alexandra. Let’s talk a bit about this new architecture that complements and extends existing investments. An enterprise data hub can store unlimited data, cost-effectively and reliably, for as long as you need, and lets users access that data in a variety of ways. Data can be collected, stored, processed, explored, modeled, and served in one unified platform. Cloudera’s enterprise data hub, powered by Apache Hadoop, the popular open source distributed data platform, is differentiated in several crucial areas. We provide: leading query performance; the enterprise management and governance that you require of all your mission-critical infrastructure; comprehensive, transparent, compliance-ready security at the core; and an open source platform that is also built on open standards, with projects supported by multiple vendors to ensure sustainability, portability, and compatibility. Our platform offers flexible deployment options, whether on-premises or in the cloud. Cheat sheet version: our enterprise data hub is one place for unlimited data; accessible to anyone; connected to the systems you already depend on; secure, governed, managed & compliant; built on open source and open standards; deployed however you want; and coupled with the support and enablement you need to succeed. Important note: our EDH emphasizes “unified analytics” over “unified data”. It’s not practical or probable that customers will actually unify all their data. Much of it lives in the cloud, on storage (e.g. Isilon) or in remote datacenters, is of uncertain value relative to the cost of moving it to a hub, or is subject to security mandates that preclude collocation. We enable customers to gather unlimited data while bringing diverse processing and analytics to that data.
  5. Speaker: Alexandra
  6. Speaker: Alexandra. Value drivers!
  7. Speaker: Alexandra. How can I get value from data? What data do I keep? Lots of separate, complex, expensive systems: do I need them? Is my business set up to be competitive? Compliant and productionalize using real data.
  8. Speaker: Goutham
  9. Speaker: Andrea
  10. Speaker: Andrea
  11. Speaker: Andrea
  12. Speaker: Andrea
  13. Speaker: Andrea
  14. Speaker: Andrea
  15. Speaker: Andrea
  16. Speaker: Andrea
  17. Speaker: Andrea
  18. Speaker: Goutham
  19. Speaker: Goutham
  20. Speaker: Goutham
  21. Speaker: Goutham