SlideShare uma empresa Scribd logo
1 de 35
@joe_Caserta#ChiefDataNY
Balancing
Data Governance
and
Innovation
Presented by:
Joe Caserta
@joe_Caserta#ChiefDataNY
@joe_Caserta#ChiefDataNY
About Caserta Concepts
• Consulting Data Innovation and Modern Data Engineering
• Award-winning company
• Internationally recognized work force
• Strategy, Architecture, Implementation, Governance
• Innovation Partner
• Strategic Consulting
• Advanced Architecture
• Build & Deploy
• Leader in Enterprise Data Solutions
• Big Data Analytics
• Data Warehousing
• Business Intelligence
• Data Science
• Cloud Computing
• Data Governance
@joe_Caserta#ChiefDataNY
Client Portfolio
Retail/eCommerce
& Manufacturing
Digital Media/AdTech
Education & Services
Finance. Healthcare
& Insurance
@joe_Caserta#ChiefDataNY
The Future of Data is Today
As a Mindful Cyborg, Chris
Dancy utilizes up to
700 sensors, devices,
applications, and services to
track, analyze, and optimize as
many areas of his existence.
Data quantification enables
him to see the connections of
otherwise invisible data,
resulting in dramatic upgrades
to his health, productivity, and
quality of life.
@joe_Caserta#ChiefDataNY
The Evolution of Data Analytics
Descriptive
Analytics
Diagnostic
Analytics
Predictive
Analytics
Prescriptive
Analytics
What
happened?
Why did it
happen?
What will
happen?
How can we make
It happen?
Data Analytics Sophistication
BusinessValue
Source: Gartner
Reports  Correlations  Predictions  Recommendations
@joe_Caserta#ChiefDataNY
The Evolution of Data Analytics
Source: Gartner
Reports  Correlations  Predictions  Recommendations
Cognitive Computing / Cognitive Data Analytics
@joe_Caserta#ChiefDataNY
Traditional Data Analytics Methods
• Design – Top Down, Bottom Up
• Customer Interviews and requirements gathering
• Data Profiling
• Create Data Models
• Facts and Dimensions
• Extract Transform Load (ETL)
• Copy data from sources to data warehouse
• Data Governance
• Stewardship, business rules, data quality
• Put a BI Tool on Top
• Design semantic layer
• Develop reports
@joe_Caserta#ChiefDataNY
A Day in the Life
• Onboarding new data is difficult!
• Rigid Structures and Data Governance
• Disconnected/removed from business requirements:
“Hey – I need to analyze some new data”
 IT Conforms and profiles the data
 Loads it into dimensional models
 Builds a semantic layer nobody is going to use
 Creates a dashboard we hope someone will notice
..and then you can access your data 3-6 months later to see if it has value!
@joe_Caserta#ChiefDataNY
Houston, we have a Problem: Data Sprawl
• There is one application for every 5-10 employees generating copies of
the same files leading to massive amounts of duplicate idle data strewn all
across the enterprise. - Michael Vizard, ITBusinessEdge.com
• Employees spend 35% of their work time searching for information...
finding what they seek 50% of the time or less.
- “The High Cost of Not Finding Information,” IDC
@joe_Caserta#ChiefDataNY
@joe_Caserta#ChiefDataNY
@joe_Caserta#ChiefDataNY
OLD WAY:
• Structure  Ingest  Analyze
• Fixed Capacity
• Monolithic
NEW WAY:
• Ingest  Analyze  Structure
• Dynamic Capacity
• Ecosystem
RECIPE:
• Data Lake
• Cloud
• Polyglot Data Landscape
The Paradigm Shift
Big Data is not the problem,
It’s the Change Agent
@joe_Caserta#ChiefDataNY
Enrollments
Claims
Finance
ETL
Ad-Hoc Query
Horizontally Scalable Environment - Optimized for Analytics
Data Lake
Canned Reporting
Big Data Analytics
NoSQL
DatabasesETL
Ad-Hoc/Canned
Reporting
Traditional BI
Spark MapReduce Pig/Hive
N1 N2 N4N3 N5
Hadoop Distributed File System (HDFS)
Traditional
EDW
Others…
The Evolution of Data Analytics
Data Science
@joe_Caserta#ChiefDataNY
Innovation is the only sustainable competitive advantage a company can have
Innovations may fail, but companies that don’t innovate will fail
@joe_Caserta#ChiefDataNY
@joe_Caserta#ChiefDataNY
Technology:
• Scalable distributed storage  Hadoop, S3
• Pluggable fit-for-purpose processing  Spark, EMR
Functional Capabilities:
• Remove barriers from data ingestion and analysis
• Storage and processing for all data
• Tunable Governance
@joe_Caserta#ChiefDataNY
@joe_Caserta#ChiefDataNY
Govern
Speed
Access
Ensure
• Govern/Secure Data
• Make Accessible/Available
• Ensure Quality
• Instant Delivery
The CDO Quandary
@joe_Caserta#ChiefDataNY
Data Munging Versus Reporting
Data Governance
AvailabilityRequirement
Fast
Slow
Minimum Maximum
Does Data munging in a data science
lab need the same restrictive
governance and enterprise reporting?
@joe_Caserta#ChiefDataNY
•This is the ‘people’ part. Establishing Enterprise Data Council, Data Stewards, etc.Organization
•Definitions, lineage (where does this data come from), business definitions, technical
metadataMetadata
•Identify and control sensitive data, regulatory compliancePrivacy/Security
•Data must be complete and correct. Measure, improve, certifyData Quality and Monitoring
•Policies around data frequency, source availability, etc.Business Process Integration
•Ensure consistent business critical data i.e. Members, Providers, Agents, etc.Master Data Management
•Data retention, purge schedule, storage/archiving
Information Lifecycle
Management (ILM)
Data Governance
@joe_Caserta#ChiefDataNY
•This is the ‘people’ part. Establishing Enterprise Data Council, Data Stewards, etc.Organization
•Definitions, lineage (where does this data come from), business definitions, technical
metadataMetadata
•Identify and control sensitive data, regulatory compliancePrivacy/Security
•Data must be complete and correct. Measure, improve, certifyData Quality and Monitoring
•Policies around data frequency, source availability, etc.Business Process Integration
•Ensure consistent business critical data i.e. Members, Providers, Agents, etc.Master Data Management
•Data retention, purge schedule, storage/archiving
Information Lifecycle
Management (ILM)
Data Governance
• Add Big Data to overall framework and assign responsibility
• Add data scientists to the Stewardship program
• Assign stewards to new data sets (twitter, call center logs, etc.)
• Graph databases are more flexible than relational
• Lower latency service required
• Distributed data quality and matching algorithms
• Data Quality and Monitoring (probably home grown, drools?)
• Quality checks not only SQL: machine learning, Pig and Map Reduce
• Acting on large dataset quality checks may require distribution
• Larger scale
• New datatypes
• Integrate with Hive Metastore, HCatalog, home grown tables
• Secure and mask multiple data types (not just tabular)
• Deletes are more uncommon (unless there is regulatory requirement)
• Take advantage of compression and archiving (like AWS Glacier)
• Data detection and masking on unstructured data upon ingest
• Near-zero latency, DevOps, Core component of business operations
for Big Data
@joe_Caserta#ChiefDataNY
The Big Data Pyramid
Ingest Raw
Data
Organize, Define,
Complete
Munging, Blending
Machine Learning
Data Quality and Monitoring
Metadata, ILM , Security
Data Catalog
Data Integration
Fully Governed ( trusted)
Arbitrary/Ad-hoc Queries and
Reporting
Usage Pattern Data Governance
Metadata, ILM,
Security
@joe_Caserta#ChiefDataNY
Peeling back the layers… The Landing Area
• Source data in it’s full fidelity
• Programmatically Loaded
• Partitioned for data processing
• Little governance other than catalog and ILM (Security and Retention)
Consumers: ETL Processes, Applications
@joe_Caserta#ChiefDataNY
Data Lake
• Enriched, lightly integrated
• Data accessible via Hive or ‘raw’
• Either processed into tabular relations
• Or via Hive Serdes directly upon Raw Data
• Partitioned for data access
• Additional Governance
• Consumers: Data Scientists, ETL Processes,
Applications, Data Analysts
@joe_Caserta#ChiefDataNY
Data Science Workspace
• No barrier for onboarding and analysis of new data
• Blending of new data with entire Data Lake, including the Big Data
Warehouse
• Data Scientists enrich data with insight
Consumers: Data Scientists
@joe_Caserta#ChiefDataNY
Big Data Warehouse
• Data is Fully Governed
• Data is Structured
• Partitioned/tuned for data access
• Governance includes a guarantee of completeness and
accuracy, security
Big
Data
Warehouse
Consumers: Data Scientists, ETL Processes, Applications,
Data Analysts, and Business Users (the masses)
@joe_Caserta#ChiefDataNY
The Refinery
BDW
Data Science
Workspace
Data Lake
Landing Area
Cool
new
data
New
Insights
• The feedback loop between Data Science and Data Warehouse is critical
• Successful work products of science must Graduate into the appropriate
layers of the Data Lake
@joe_Caserta#ChiefDataNY
Define and Find Your Data
• Data Classification
• Import/Define business taxonomy
• Capture/Automate relationships between data sets
• Integrate metadata with other systems
• Centralized Auditing
• Security access information for every application with data
• Operational information for execution
• Search & Lineage (Browse)
• Predefined navigation paths to explore data
• Text-based search for data elements across data ecosystem
• Browse visualization of data lineage
• Security & Policy Engine
• Rationalize compliance policy at run-time
• Prevent data derivation based on classification (re-classification)
Key Requirements
• Automatic data-
discovery
• Metadata tagging
• Classification
@joe_Caserta#ChiefDataNY
Caution: Assembly Required
 Some of the most hopeful tools are brand new or in
incubation!
 Enterprise big data implementations typically combine
products with custom built components
Tools
People, Processes and Business commitment is still critical!
Data Integration Data Catalog & Governance Emerging Solutions
@joe_Caserta#ChiefDataNY
Existing On-Premise Solution
• Challenges with operations of data servers in Data Center
• Increasing infrastructure complexity
• Keeping up with data growth
Cloud Advantages
• Reduced upfront capital investment
• Faster speed to value
• Elasticity
“Those that go out and buy expensive
infrastructure find that the problem scope and
domain shift really quickly. By the time they get
around to answering the original question, the
business has moved on.” - Matt Wood, AWS
Move to the Cloud?
@joe_Caserta#ChiefDataNY
Come out and Play
CIL - Caserta
Innovations Lab
Experience
Big Data Warehousing Meetup
• Meet monthly to share data best
practices, experiences
• 3,300+ Members
http://www.meetup.com/Big-Data-Warehousing/
Examples of Previous Topics
• Data Governance, Compliance &
Security in Hadoop w/Cloudera
• Real Time Trade Data Monitoring
with Storm & Cassandra
• Predictive Analytics
• Exploring Big Data Analytics
Techniques w/Datameer
• Using a Graph DB for MDM &
Relationship Mgmt
• Data Science w/Claudia
Perlcih & Revolution Analytics
• Processing 1.4 Trillion Events
in Hadoop
• Building a Relevance Engine
using Hadoop, Mahout & Pig
• Big Data 2.0 – YARN Distributed
ETL & SQL w/Hadoop
• Intro to NoSQL w/10GEN
Next Meetup:
December 9, 2015
Co-host:
@joe_Caserta#ChiefDataNY
Thank You / Q&A
Joe Caserta
President, Caserta Concepts
joe@casertaconcepts.com
@joe_Caserta
@joe_Caserta#ChiefDataNY
The Data Scientist Winning Trifecta
Modern Data
Engineering/Data
Preparation
Domain
Knowledge/Business
Expertise
Advanced
Mathematics/
Statistics
@joe_Caserta#ChiefDataNY
Electronic Medical Records (EMR) Analytics
Hadoop Data LakeEdge Node
`
100k
files
variant 1..n
…
variant 1..n
HDFS
Put
Netezza DW
Sqoop
Pig EMR
Processor
UDF
Library
Provider table
(parquet)
Member table
(parquet)
Python Wrapper
Provider table
Member table
Forqlift
Sequence
Files
…
variant 1..n
Sequence
Files
…
15 More
Entities
(parquet)
More
Dimensions
And
Facts
• Receive Electronic Medial Records from various providers in various formats
• Address Hadoop ‘small file’ problem
• No barrier for onboarding and analysis of new data
• Blend new data with Data Lake and Big Data Warehouse
• Machine Learning
• Text Analytics
• Natural Language Processing
• Reporting
• Ad-hoc queries
• File ingestion
• Information Lifecycle Mgmt

Mais conteúdo relacionado

Mais procurados

Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for EveryoneCaserta
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Caserta
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation Caserta
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsCaserta
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure CloudCaserta
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteCaserta
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Caserta
 
Defining and Applying Data Governance in Today’s Business Environment
Defining and Applying Data Governance in Today’s Business EnvironmentDefining and Applying Data Governance in Today’s Business Environment
Defining and Applying Data Governance in Today’s Business EnvironmentCaserta
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Caserta
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
 
Cloud and Analytics -- 2020 sparksummit
Cloud and Analytics -- 2020 sparksummitCloud and Analytics -- 2020 sparksummit
Cloud and Analytics -- 2020 sparksummitMing Yuan
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...DataWorks Summit
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
A Year in Review - Building a Comprehensive Data Management Program
A Year in Review - Building a Comprehensive Data Management ProgramA Year in Review - Building a Comprehensive Data Management Program
A Year in Review - Building a Comprehensive Data Management ProgramDataWorks Summit
 
Agile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and UncertaintyAgile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and UncertaintyTamrMarketing
 
Michael Stonebraker: Big Data, Disruption, and the 800 Pound Gorilla in the ...
Michael Stonebraker:  Big Data, Disruption, and the 800 Pound Gorilla in the ...Michael Stonebraker:  Big Data, Disruption, and the 800 Pound Gorilla in the ...
Michael Stonebraker: Big Data, Disruption, and the 800 Pound Gorilla in the ...TamrMarketing
 
Data Management vs. Data Governance Program
Data Management vs. Data Governance ProgramData Management vs. Data Governance Program
Data Management vs. Data Governance ProgramDATAVERSITY
 

Mais procurados (20)

Big Data Boom
Big Data BoomBig Data Boom
Big Data Boom
 
Making Big Data Easy for Everyone
Making Big Data Easy for EveryoneMaking Big Data Easy for Everyone
Making Big Data Easy for Everyone
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
Data Governance and Analytics
Data Governance and AnalyticsData Governance and Analytics
Data Governance and Analytics
 
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
Integrating the CDO Role Into Your Organization; Managing the Disruption (MIT...
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
Moving Past Infrastructure Limitations
Moving Past Infrastructure LimitationsMoving Past Infrastructure Limitations
Moving Past Infrastructure Limitations
 
Benefits of the Azure Cloud
Benefits of the Azure CloudBenefits of the Azure Cloud
Benefits of the Azure Cloud
 
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing KeynoteArchitecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
Architecting Data For The Modern Enterprise - Data Summit 2017, Closing Keynote
 
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
Creating a DevOps Practice for Analytics -- Strata Data, September 28, 2017
 
Defining and Applying Data Governance in Today’s Business Environment
Defining and Applying Data Governance in Today’s Business EnvironmentDefining and Applying Data Governance in Today’s Business Environment
Defining and Applying Data Governance in Today’s Business Environment
 
Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics Building a New Platform for Customer Analytics
Building a New Platform for Customer Analytics
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 
Cloud and Analytics -- 2020 sparksummit
Cloud and Analytics -- 2020 sparksummitCloud and Analytics -- 2020 sparksummit
Cloud and Analytics -- 2020 sparksummit
 
A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...A modern, flexible approach to Hadoop implementation incorporating innovation...
A modern, flexible approach to Hadoop implementation incorporating innovation...
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
A Year in Review - Building a Comprehensive Data Management Program
A Year in Review - Building a Comprehensive Data Management ProgramA Year in Review - Building a Comprehensive Data Management Program
A Year in Review - Building a Comprehensive Data Management Program
 
Agile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and UncertaintyAgile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
Agile Leadership: Guiding DataOps Teams Through Rapid Change and Uncertainty
 
Michael Stonebraker: Big Data, Disruption, and the 800 Pound Gorilla in the ...
Michael Stonebraker:  Big Data, Disruption, and the 800 Pound Gorilla in the ...Michael Stonebraker:  Big Data, Disruption, and the 800 Pound Gorilla in the ...
Michael Stonebraker: Big Data, Disruption, and the 800 Pound Gorilla in the ...
 
Data Management vs. Data Governance Program
Data Management vs. Data Governance ProgramData Management vs. Data Governance Program
Data Management vs. Data Governance Program
 

Destaque

Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopBig Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopCaserta
 
Data Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalData Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalCaserta
 
Data Driven Decisions - Big Data Warehousing Meetup, FICO
Data Driven Decisions - Big Data Warehousing Meetup, FICOData Driven Decisions - Big Data Warehousing Meetup, FICO
Data Driven Decisions - Big Data Warehousing Meetup, FICOCaserta
 
Neo4j Solutions - Master Data Management
Neo4j Solutions - Master Data ManagementNeo4j Solutions - Master Data Management
Neo4j Solutions - Master Data ManagementCaserta
 
DGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data QualityDGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data QualityCaserta
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaCaserta
 
Deploying a Governed Data Lake
Deploying a Governed Data LakeDeploying a Governed Data Lake
Deploying a Governed Data LakeWaterlineData
 
Big MDM Part 2: Using a Graph Database for MDM and Relationship Management
Big MDM Part 2: Using a Graph Database for MDM and Relationship ManagementBig MDM Part 2: Using a Graph Database for MDM and Relationship Management
Big MDM Part 2: Using a Graph Database for MDM and Relationship ManagementCaserta
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkCaserta
 
Webinar: Initiating a Customer MDM/Data Governance Program
Webinar: Initiating a Customer MDM/Data Governance ProgramWebinar: Initiating a Customer MDM/Data Governance Program
Webinar: Initiating a Customer MDM/Data Governance ProgramDATAVERSITY
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Caserta
 
The Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics SolutionThe Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics SolutionSenturus
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseCaserta
 

Destaque (13)

Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with HadoopBig Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
Big Data 2.0: YARN Enablement for Distributed ETL & SQL with Hadoop
 
Data Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobalData Quality in the Data Hub with RedPointGlobal
Data Quality in the Data Hub with RedPointGlobal
 
Data Driven Decisions - Big Data Warehousing Meetup, FICO
Data Driven Decisions - Big Data Warehousing Meetup, FICOData Driven Decisions - Big Data Warehousing Meetup, FICO
Data Driven Decisions - Big Data Warehousing Meetup, FICO
 
Neo4j Solutions - Master Data Management
Neo4j Solutions - Master Data ManagementNeo4j Solutions - Master Data Management
Neo4j Solutions - Master Data Management
 
DGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data QualityDGIQ 2015 The Fundamentals of Data Quality
DGIQ 2015 The Fundamentals of Data Quality
 
Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with Cloudera
 
Deploying a Governed Data Lake
Deploying a Governed Data LakeDeploying a Governed Data Lake
Deploying a Governed Data Lake
 
Big MDM Part 2: Using a Graph Database for MDM and Relationship Management
Big MDM Part 2: Using a Graph Database for MDM and Relationship ManagementBig MDM Part 2: Using a Graph Database for MDM and Relationship Management
Big MDM Part 2: Using a Graph Database for MDM and Relationship Management
 
Mastering Customer Data on Apache Spark
Mastering Customer Data on Apache SparkMastering Customer Data on Apache Spark
Mastering Customer Data on Apache Spark
 
Webinar: Initiating a Customer MDM/Data Governance Program
Webinar: Initiating a Customer MDM/Data Governance ProgramWebinar: Initiating a Customer MDM/Data Governance Program
Webinar: Initiating a Customer MDM/Data Governance Program
 
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
Big Data Warehousing Meetup: Dimensional Modeling Still Matters!!!
 
The Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics SolutionThe Science of Predictive Maintenance: IBM's Predictive Analytics Solution
The Science of Predictive Maintenance: IBM's Predictive Analytics Solution
 
The Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's EnterpriseThe Rise of the CDO in Today's Enterprise
The Rise of the CDO in Today's Enterprise
 

Semelhante a Balancing Data Governance and Innovation

Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureCaserta
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeCaserta
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceCaserta
 
Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsCaserta
 
Deliveinrg explainable AI
Deliveinrg explainable AIDeliveinrg explainable AI
Deliveinrg explainable AIGary Allemann
 
Digital intelligence satish bhatia
Digital intelligence satish bhatiaDigital intelligence satish bhatia
Digital intelligence satish bhatiaSatish Bhatia
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big DataUmair Shafique
 
When the business needs intelligence (15Oct2014)
When the business needs intelligence   (15Oct2014)When the business needs intelligence   (15Oct2014)
When the business needs intelligence (15Oct2014)Dipti Patil
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementTony Bain
 
Predictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing MeetupPredictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing MeetupCaserta
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefitsRicky Barron
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedDunn Solutions Group
 
DBAs - Is Your Company’s Personal and Sensitive Data Safe?
DBAs - Is Your Company’s Personal and Sensitive Data Safe?DBAs - Is Your Company’s Personal and Sensitive Data Safe?
DBAs - Is Your Company’s Personal and Sensitive Data Safe?DevOps.com
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityPrecisely
 
All Together Now: A Recipe for Successful Data Governance
All Together Now: A Recipe for Successful Data GovernanceAll Together Now: A Recipe for Successful Data Governance
All Together Now: A Recipe for Successful Data GovernanceInside Analysis
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeMicrosoft
 

Semelhante a Balancing Data Governance and Innovation (20)

Incorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic ArchitectureIncorporating the Data Lake into Your Analytic Architecture
Incorporating the Data Lake into Your Analytic Architecture
 
Big Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data LakeBig Data: Setting Up the Big Data Lake
Big Data: Setting Up the Big Data Lake
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Architecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment OptionsArchitecting for Big Data: Trends, Tips, and Deployment Options
Architecting for Big Data: Trends, Tips, and Deployment Options
 
How to build a successful Data Lake
How to build a successful Data LakeHow to build a successful Data Lake
How to build a successful Data Lake
 
Deliveinrg explainable AI
Deliveinrg explainable AIDeliveinrg explainable AI
Deliveinrg explainable AI
 
Digital intelligence satish bhatia
Digital intelligence satish bhatiaDigital intelligence satish bhatia
Digital intelligence satish bhatia
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
When the business needs intelligence (15Oct2014)
When the business needs intelligence   (15Oct2014)When the business needs intelligence   (15Oct2014)
When the business needs intelligence (15Oct2014)
 
Big data
Big dataBig data
Big data
 
Big Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data ManagementBig Data, NoSQL, NewSQL & The Future of Data Management
Big Data, NoSQL, NewSQL & The Future of Data Management
 
Predictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing MeetupPredictive Analytics - Big Data Warehousing Meetup
Predictive Analytics - Big Data Warehousing Meetup
 
Data lake benefits
Data lake benefitsData lake benefits
Data lake benefits
 
The Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They NeedThe Data Lake and Getting Buisnesses the Big Data Insights They Need
The Data Lake and Getting Buisnesses the Big Data Insights They Need
 
DBAs - Is Your Company’s Personal and Sensitive Data Safe?
DBAs - Is Your Company’s Personal and Sensitive Data Safe?DBAs - Is Your Company’s Personal and Sensitive Data Safe?
DBAs - Is Your Company’s Personal and Sensitive Data Safe?
 
Data Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data QualityData Profiling: The First Step to Big Data Quality
Data Profiling: The First Step to Big Data Quality
 
All Together Now: A Recipe for Successful Data Governance
All Together Now: A Recipe for Successful Data GovernanceAll Together Now: A Recipe for Successful Data Governance
All Together Now: A Recipe for Successful Data Governance
 
Derfor skal du bruge en DataLake
Derfor skal du bruge en DataLakeDerfor skal du bruge en DataLake
Derfor skal du bruge en DataLake
 
Big_Data.pptx
Big_Data.pptxBig_Data.pptx
Big_Data.pptx
 

Mais de Caserta

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingCaserta
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Caserta
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017Caserta
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Caserta
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Caserta
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the CloudCaserta
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by DatabricksCaserta
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupCaserta
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWSCaserta
 

Mais de Caserta (9)

Using Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven MarketingUsing Machine Learning & Spark to Power Data-Driven Marketing
Using Machine Learning & Spark to Power Data-Driven Marketing
 
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
Data Intelligence: How the Amalgamation of Data, Science, and Technology is C...
 
General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017General Data Protection Regulation - BDW Meetup, October 11th, 2017
General Data Protection Regulation - BDW Meetup, October 11th, 2017
 
Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)Introduction to Data Science (Data Summit, 2017)
Introduction to Data Science (Data Summit, 2017)
 
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
Looker Data Modeling in the Age of Cloud - BDW Meetup May 2, 2017
 
Big Data Analytics on the Cloud
Big Data Analytics on the CloudBig Data Analytics on the Cloud
Big Data Analytics on the Cloud
 
Not Your Father's Database by Databricks
Not Your Father's Database by DatabricksNot Your Father's Database by Databricks
Not Your Father's Database by Databricks
 
Introducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing MeetupIntroducing Kudu, Big Data Warehousing Meetup
Introducing Kudu, Big Data Warehousing Meetup
 
Real Time Big Data Processing on AWS
Real Time Big Data Processing on AWSReal Time Big Data Processing on AWS
Real Time Big Data Processing on AWS
 

Último

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 

Último (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 

Balancing Data Governance and Innovation

  • 3. @joe_Caserta#ChiefDataNY About Caserta Concepts • Consulting Data Innovation and Modern Data Engineering • Award-winning company • Internationally recognized work force • Strategy, Architecture, Implementation, Governance • Innovation Partner • Strategic Consulting • Advanced Architecture • Build & Deploy • Leader in Enterprise Data Solutions • Big Data Analytics • Data Warehousing • Business Intelligence • Data Science • Cloud Computing • Data Governance
  • 4. @joe_Caserta#ChiefDataNY Client Portfolio Retail/eCommerce & Manufacturing Digital Media/AdTech Education & Services Finance. Healthcare & Insurance
  • 5. @joe_Caserta#ChiefDataNY The Future of Data is Today As a Mindful Cyborg, Chris Dancy utilizes up to 700 sensors, devices, applications, and services to track, analyze, and optimize as many areas of his existence. Data quantification enables him to see the connections of otherwise invisible data, resulting in dramatic upgrades to his health, productivity, and quality of life.
  • 6. @joe_Caserta#ChiefDataNY The Evolution of Data Analytics Descriptive Analytics Diagnostic Analytics Predictive Analytics Prescriptive Analytics What happened? Why did it happen? What will happen? How can we make It happen? Data Analytics Sophistication BusinessValue Source: Gartner Reports  Correlations  Predictions  Recommendations
  • 7. @joe_Caserta#ChiefDataNY The Evolution of Data Analytics Source: Gartner Reports  Correlations  Predictions  Recommendations Cognitive Computing / Cognitive Data Analytics
  • 8. @joe_Caserta#ChiefDataNY Traditional Data Analytics Methods • Design – Top Down, Bottom Up • Customer Interviews and requirements gathering • Data Profiling • Create Data Models • Facts and Dimensions • Extract Transform Load (ETL) • Copy data from sources to data warehouse • Data Governance • Stewardship, business rules, data quality • Put a BI Tool on Top • Design semantic layer • Develop reports
  • 9. @joe_Caserta#ChiefDataNY A Day in the Life • Onboarding new data is difficult! • Rigid Structures and Data Governance • Disconnected/removed from business requirements: “Hey – I need to analyze some new data”  IT Conforms and profiles the data  Loads it into dimensional models  Builds a semantic layer nobody is going to use  Creates a dashboard we hope someone will notice ..and then you can access your data 3-6 months later to see if it has value!
  • 10. @joe_Caserta#ChiefDataNY Houston, we have a Problem: Data Sprawl • There is one application for every 5-10 employees generating copies of the same files leading to massive amounts of duplicate idle data strewn all across the enterprise. - Michael Vizard, ITBusinessEdge.com • Employees spend 35% of their work time searching for information... finding what they seek 50% of the time or less. - “The High Cost of Not Finding Information,” IDC
  • 13. @joe_Caserta#ChiefDataNY OLD WAY: • Structure  Ingest  Analyze • Fixed Capacity • Monolithic NEW WAY: • Ingest  Analyze  Structure • Dynamic Capacity • Ecosystem RECIPE: • Data Lake • Cloud • Polyglot Data Landscape The Paradigm Shift Big Data is not the problem, It’s the Change Agent
  • 14. @joe_Caserta#ChiefDataNY Enrollments Claims Finance ETL Ad-Hoc Query Horizontally Scalable Environment - Optimized for Analytics Data Lake Canned Reporting Big Data Analytics NoSQL DatabasesETL Ad-Hoc/Canned Reporting Traditional BI Spark MapReduce Pig/Hive N1 N2 N4N3 N5 Hadoop Distributed File System (HDFS) Traditional EDW Others… The Evolution of Data Analytics Data Science
  • 15. @joe_Caserta#ChiefDataNY Innovation is the only sustainable competitive advantage a company can have Innovations may fail, but companies that don’t innovate will fail
  • 17. @joe_Caserta#ChiefDataNY Technology: • Scalable distributed storage  Hadoop, S3 • Pluggable fit-for-purpose processing  Spark, EMR Functional Capabilities: • Remove barriers from data ingestion and analysis • Storage and processing for all data • Tunable Governance
  • 19. @joe_Caserta#ChiefDataNY Govern Speed Access Ensure • Govern/Secure Data • Make Accessible/Available • Ensure Quality • Instant Delivery The CDO Quandary
  • 20. @joe_Caserta#ChiefDataNY Data Munging Versus Reporting Data Governance AvailabilityRequirement Fast Slow Minimum Maximum Does Data munging in a data science lab need the same restrictive governance and enterprise reporting?
  • 21. @joe_Caserta#ChiefDataNY •This is the ‘people’ part. Establishing Enterprise Data Council, Data Stewards, etc.Organization •Definitions, lineage (where does this data come from), business definitions, technical metadataMetadata •Identify and control sensitive data, regulatory compliancePrivacy/Security •Data must be complete and correct. Measure, improve, certifyData Quality and Monitoring •Policies around data frequency, source availability, etc.Business Process Integration •Ensure consistent business critical data i.e. Members, Providers, Agents, etc.Master Data Management •Data retention, purge schedule, storage/archiving Information Lifecycle Management (ILM) Data Governance
  • 22. @joe_Caserta#ChiefDataNY •This is the ‘people’ part. Establishing Enterprise Data Council, Data Stewards, etc.Organization •Definitions, lineage (where does this data come from), business definitions, technical metadataMetadata •Identify and control sensitive data, regulatory compliancePrivacy/Security •Data must be complete and correct. Measure, improve, certifyData Quality and Monitoring •Policies around data frequency, source availability, etc.Business Process Integration •Ensure consistent business critical data i.e. Members, Providers, Agents, etc.Master Data Management •Data retention, purge schedule, storage/archiving Information Lifecycle Management (ILM) Data Governance • Add Big Data to overall framework and assign responsibility • Add data scientists to the Stewardship program • Assign stewards to new data sets (twitter, call center logs, etc.) • Graph databases are more flexible than relational • Lower latency service required • Distributed data quality and matching algorithms • Data Quality and Monitoring (probably home grown, drools?) • Quality checks not only SQL: machine learning, Pig and Map Reduce • Acting on large dataset quality checks may require distribution • Larger scale • New datatypes • Integrate with Hive Metastore, HCatalog, home grown tables • Secure and mask multiple data types (not just tabular) • Deletes are more uncommon (unless there is regulatory requirement) • Take advantage of compression and archiving (like AWS Glacier) • Data detection and masking on unstructured data upon ingest • Near-zero latency, DevOps, Core component of business operations for Big Data
  • 23. @joe_Caserta#ChiefDataNY The Big Data Pyramid Ingest Raw Data Organize, Define, Complete Munging, Blending Machine Learning Data Quality and Monitoring Metadata, ILM , Security Data Catalog Data Integration Fully Governed ( trusted) Arbitrary/Ad-hoc Queries and Reporting Usage Pattern Data Governance Metadata, ILM, Security
  • 24. @joe_Caserta#ChiefDataNY Peeling back the layers… The Landing Area • Source data in it’s full fidelity • Programmatically Loaded • Partitioned for data processing • Little governance other than catalog and ILM (Security and Retention) Consumers: ETL Processes, Applications
  • 25. @joe_Caserta#ChiefDataNY Data Lake • Enriched, lightly integrated • Data accessible via Hive or ‘raw’ • Either processed into tabular relations • Or via Hive Serdes directly upon Raw Data • Partitioned for data access • Additional Governance • Consumers: Data Scientists, ETL Processes, Applications, Data Analysts
  • 26. @joe_Caserta#ChiefDataNY Data Science Workspace • No barrier for onboarding and analysis of new data • Blending of new data with entire Data Lake, including the Big Data Warehouse • Data Scientists enrich data with insight Consumers: Data Scientists
  • 27. @joe_Caserta#ChiefDataNY Big Data Warehouse • Data is Fully Governed • Data is Structured • Partitioned/tuned for data access • Governance includes a guarantee of completeness and accuracy, security Big Data Warehouse Consumers: Data Scientists, ETL Processes, Applications, Data Analysts, and Business Users (the masses)
  • 28. @joe_Caserta#ChiefDataNY The Refinery BDW Data Science Workspace Data Lake Landing Area Cool new data New Insights • The feedback loop between Data Science and Data Warehouse is critical • Successful work products of science must Graduate into the appropriate layers of the Data Lake
  • 29. @joe_Caserta#ChiefDataNY Define and Find Your Data • Data Classification • Import/Define business taxonomy • Capture/Automate relationships between data sets • Integrate metadata with other systems • Centralized Auditing • Security access information for every application with data • Operational information for execution • Search & Lineage (Browse) • Predefined navigation paths to explore data • Text-based search for data elements across data ecosystem • Browse visualization of data lineage • Security & Policy Engine • Rationalize compliance policy at run-time • Prevent data derivation based on classification (re-classification) Key Requirements • Automatic data- discovery • Metadata tagging • Classification
  • 30. @joe_Caserta#ChiefDataNY Caution: Assembly Required  Some of the most hopeful tools are brand new or in incubation!  Enterprise big data implementations typically combine products with custom built components Tools People, Processes and Business commitment is still critical! Data Integration Data Catalog & Governance Emerging Solutions
  • 31. @joe_Caserta#ChiefDataNY Existing On-Premise Solution • Challenges with operations of data servers in Data Center • Increasing infrastructure complexity • Keeping up with data growth Cloud Advantages • Reduced upfront capital investment • Faster speed to value • Elasticity “Those that go out and buy expensive infrastructure find that the problem scope and domain shift really quickly. By the time they get around to answering the original question, the business has moved on.” - Matt Wood, AWS Move to the Cloud?
  • 32. @joe_Caserta#ChiefDataNY Come out and Play CIL - Caserta Innovations Lab Experience Big Data Warehousing Meetup • Meet monthly to share data best practices, experiences • 3,300+ Members http://www.meetup.com/Big-Data-Warehousing/ Examples of Previous Topics • Data Governance, Compliance & Security in Hadoop w/Cloudera • Real Time Trade Data Monitoring with Storm & Cassandra • Predictive Analytics • Exploring Big Data Analytics Techniques w/Datameer • Using a Graph DB for MDM & Relationship Mgmt • Data Science w/Claudia Perlcih & Revolution Analytics • Processing 1.4 Trillion Events in Hadoop • Building a Relevance Engine using Hadoop, Mahout & Pig • Big Data 2.0 – YARN Distributed ETL & SQL w/Hadoop • Intro to NoSQL w/10GEN Next Meetup: December 9, 2015 Co-host:
  • 33. @joe_Caserta#ChiefDataNY Thank You / Q&A Joe Caserta President, Caserta Concepts joe@casertaconcepts.com @joe_Caserta
  • 34. @joe_Caserta#ChiefDataNY The Data Scientist Winning Trifecta Modern Data Engineering/Data Preparation Domain Knowledge/Business Expertise Advanced Mathematics/ Statistics
  • 35. @joe_Caserta#ChiefDataNY Electronic Medical Records (EMR) Analytics Hadoop Data LakeEdge Node ` 100k files variant 1..n … variant 1..n HDFS Put Netezza DW Sqoop Pig EMR Processor UDF Library Provider table (parquet) Member table (parquet) Python Wrapper Provider table Member table Forqlift Sequence Files … variant 1..n Sequence Files … 15 More Entities (parquet) More Dimensions And Facts • Receive Electronic Medial Records from various providers in various formats • Address Hadoop ‘small file’ problem • No barrier for onboarding and analysis of new data • Blend new data with Data Lake and Big Data Warehouse • Machine Learning • Text Analytics • Natural Language Processing • Reporting • Ad-hoc queries • File ingestion • Information Lifecycle Mgmt