SlideShare uma empresa Scribd logo
1 de 20
BUILDING A TRUE
ENTERPRISE DATA
GOVERNANCE PLATFORM
FOR THE MODERN MACHINE
LEARNING ERA
Subash DSouza
Director Big Data & Ops,
Warner Bros – Data
Intelligence Group
BIO
Organizer - Data Con LA formerly known as Big Data
Day LA
Organizer - LA Big Data Users Group
Organizer - LA Apache Spark Users Group
Advisor - Anthill Studio
Investor – FOSSA
Data Governance PMC Member – The Linux Foundation
– ODPi
Founder – Archangel Technology Consultants LLC
ATTRIBUTES OF
ENTERPRISE
DATA
GOVERNANCE
ENTERPRISE DATA GOVERNANCE
CHALLENGES
4
Poor data quality
costs real money
1
Process efficiency is
negatively impacted
by poor data
governance
2
Full potential benefits
of new systems not
be realized because
of poor data
governance
3
Decision making is
negatively affected by
poor data governance
4
ENTERPRISE DATA GOVERNANCE
OBJECTIVES
5
Guide information management decision-makingGuide
Ensure information is consistently defined and well understoodEnsure
Increase the use and trust of data as an organization assetIncrease
Improve consistency of projects across the organizationImprove
Ensure regulatory complianceEnsure
Eliminate data risksEliminate
MASTER DATA
MANAGEMENT
PROBLEMS
6
Discovery - cannot find the right information
Integration - cannot manipulate and combine
information
Dissemination - cannot consume information
Insight - cannot extract value and knowledge from
information
Management – cannot manage and control
information volumes and growth
MASTER DATA
MANAGEMENT
ISSUES
7
52% of users don’t have
confidence in their
information
59% of managers miss
information they should
have used
42% of managers use
wrong information at
least once a week
75% of CIOs believe they
can strengthen their
competitive advantage
by better using and
managing enterprise
data
78% of CIOs want to
improve the way they
use and manage their
data
Only 15% of CIOs
believe that their data is
currently
comprehensively well
manage
META DATA
MANAGEMENT
8
Metadata needs to be managed to ensure ...
Availability: metadata
needs to be stored where it
can be accessed and
indexed so it can be
found.
Quality: metadata needs to
be of consistent quality so
users know that it can be
trusted.
Persistence: metadata
needs to be kept over
time.
Open License: metadata
should be available under
a public domain license to
enable its reuse.
Metadata is structured information that describes,
explains, locates, or otherwise makes it easier to
retrieve, use, or manage an information resource.
Metadata is often called data about data or
information about information.
META DATA
MANAGEMENT
LIFECYCLE
The metadata lifecycle is larger than the data
lifecycle:
 Metadata may be created before data is created or captured,
e.g. to inform about data that will be available in the future.
 Metadata needs to be kept after data has been removed, e.g. to
inform about data that has been decommissioned or
withdrawn.
9
DATA LINEAGE
10
DATA QUALITY ISSUES
11
DATA QUALITY LIFECYCLE
12
IMPORTANC
E OF A DATA
CATALOG
13
14
DATA SECURITY & COMPLIANCE
PII, GDPR, Retention, Oversight, Reporting
DATA CATALOG
Data Discovery, Classification, AI, ML
DATA QUALITY
Data Cleansing, Data Completeness
DATA LINEAGE
Attribution, Survivorship, Feedback Loops
METADATA MANAGEMENT
Data Tagging, Federated Search
MDM (MASTER DATA MANAGEMENT)
Taxonomy, Single Version of the Truth
DATA GOVERNANCE PROJECT / SERVICE SCOPE
Value add service for data security, retention and
compliance
Crawl, introspect, discover and classify data
Clean, correct, validate and triage data
Track and record data source, path and life cycle
Manage searchable properties associated with data
Service for the ingestion, cleansing and curation of
data
BUSINESS VALUE ADD
15
Revenue Agility
Cost Compliance
• Reduce risk
• Control access to data
• Adhere to government and corporate
regulations
• Manage customer privacy preferences
• Automate manual business processes
• Reduce data errors
• Eliminate duplication of efforts and
technologies
• IT system consolidation initiatives
• Recognize new business opportunities
• Increase business reach by identifying
synergy
• Improve marketing capabilities
• Faster time to market
• Consolidate data from silos
• Meet demands of new business channels
• Grow with the business
• Identify key relationships and hierarchies
PROPOSED SOLUTION
16
Metadata Management
Data Lineage
Data Catalog
Data Security
Data Quality
Master Data Management
RulesEngine
Workflow&Audit
Applications
Data Data
SERVICE ORIENTED ARCHITECTURE
Communicating Truth to Others
SOA Patterns:
● Service Provider
● Service Broker, Registry, Repository
● Service Requester / Recipient
● Incoming & outgoing Services
Defining Concepts:
● Business Value
● Strategic Goals
● Intrinsic Interoperability
● Shared Services
● Flexibility
● Evolutionary Refinement
Orchestrated Service Types:
● Push & Pull
● RESTful Services
● SOAP Services
● Database Connectivity
● Flat File Support
DETAILED SERVICE OFFERING
ARCHITECTURE
17
Rules Engine
SOA Service Oriented Architecture
Auditing
InfrastructureNetwork Storage Messaging Transport
Data
Catalog.
Data
Classify
Data
Discovery
Data
Quality
Quality
Cleansing
Data
Lineage
Data
Source
Workflow
Feedback
Loop
Metadata
Mgmt.
Mgmt.
Strategy
Metadata
Storage
Metadata
Capture
Master Data
Mgmt.
Data
Architectur
e
Data
Quality
Taxonomy
Distribution
FTP
HTTP
TCP / UDP
DATABASE
Data
Security
B2B / Internal Users B2C / External Users
Web
Services
Search
Internal Up Stream Systems External Down Stream Systems User Interfaces
Complianc
e
Encryption
Obfuscate
/
Anonymize
Access
Controle
Business Process Orchestration
Tag
FILE
PII / GDPR
Service
Registries
Messaging
Document
Logical
Ordering
Publish
Complete
Accuracy
Consistenc
y
Metadata
Integration
Business
Intelligence
Distributio
n &
Feedback
ProcessingDatabase
MDM USE CASE – DATA
CLEANSING
18
Upstream/SourceSystems
MDM Components
MDM Master Data Curation
Review,
Curate &
Approve
Clean MDM Master
DB Exports
(as needed)
DB ETL Feed
(as needed)
MDM Curators MDM Users
(Optional)
Data
Merge
MDM Users
Source
System 1
Source
System 2
Source
System
N
Target
System 1
MDM
Portal
(Optional)
MDM DATA EXPOLITATION
Adding Business Value
ISDATACLEAN???
AUTOMATED DATA
CLEANSING /
TRIAGE
Data Support
Staff
Technical
Support Staff
MANUAL DATA
CLEANSING /
TRIAGE
Unresolved
ETL
ETL
RESOLVED RESOLVED
Is Data
Clean
Automate
Technical
Cleanse
Logical Structural
IS DATA CLEAN & RESOLVED?
Structural Data
Target
System
N
RETURN CLEAN DATA BACK TO SOURCE SYSTEMS VIA SOA SERVICES
MicroservicesData
Staging
ETL
PROPOSED DATA GOVERNANCE
SOLUTION MATURITY MODEL
Q&A
Contact me
 @sawjd22
 linkedin.com/in/sawjd
 sawjd@yahoo.com

Mais conteúdo relacionado

Mais procurados

CCPA Compliance for Analytics and Data Science Use Cases with Databricks and ...
CCPA Compliance for Analytics and Data Science Use Cases with Databricks and ...CCPA Compliance for Analytics and Data Science Use Cases with Databricks and ...
CCPA Compliance for Analytics and Data Science Use Cases with Databricks and ...Jeff Kelly
 
Privacera Databricks CCPA Webinar Feb 2020
Privacera Databricks CCPA Webinar Feb 2020Privacera Databricks CCPA Webinar Feb 2020
Privacera Databricks CCPA Webinar Feb 2020Privacera
 
Data Sheet - Manage unstructured data growth with Symantec Data Insight
Data Sheet - Manage unstructured data growth with Symantec Data InsightData Sheet - Manage unstructured data growth with Symantec Data Insight
Data Sheet - Manage unstructured data growth with Symantec Data InsightSymantec
 
The Merger is Happening, Now What Do We Do?
The Merger is Happening, Now What Do We Do?The Merger is Happening, Now What Do We Do?
The Merger is Happening, Now What Do We Do?DATUM LLC
 
Data Privacy in the DMBOK - No Need to Reinvent the Wheel
Data Privacy in the DMBOK - No Need to Reinvent the WheelData Privacy in the DMBOK - No Need to Reinvent the Wheel
Data Privacy in the DMBOK - No Need to Reinvent the WheelDATAVERSITY
 
Big Data Governance in Hadoop Environments with Cloudera Navigatorfeb2017meetu
Big Data Governance in Hadoop Environments with Cloudera Navigatorfeb2017meetuBig Data Governance in Hadoop Environments with Cloudera Navigatorfeb2017meetu
Big Data Governance in Hadoop Environments with Cloudera Navigatorfeb2017meetuEmre Sevinç
 
Top 10 Best Practices for Implementing Data Classification
Top 10 Best Practices for Implementing Data ClassificationTop 10 Best Practices for Implementing Data Classification
Top 10 Best Practices for Implementing Data ClassificationWatchful Software
 
SharePoint Information Governance
SharePoint Information GovernanceSharePoint Information Governance
SharePoint Information GovernanceGalaxy Consulting
 
Deliver Data Governance with a “Yes”
Deliver Data Governance with a “Yes”Deliver Data Governance with a “Yes”
Deliver Data Governance with a “Yes”Jean-Michel Franco
 
Three Cool Things You Can Do with Standards
Three Cool Things You Can Do with StandardsThree Cool Things You Can Do with Standards
Three Cool Things You Can Do with StandardsMatt Turner
 
The Metadata Secret in Your Data
The Metadata Secret in Your DataThe Metadata Secret in Your Data
The Metadata Secret in Your DataEverteam
 
Symantec Data Insight for Storage
Symantec Data Insight for StorageSymantec Data Insight for Storage
Symantec Data Insight for StorageSymantec
 
Are Your Data Ready for GDPR? (with MAPR and Talend)
Are Your Data Ready for GDPR? (with MAPR and Talend)Are Your Data Ready for GDPR? (with MAPR and Talend)
Are Your Data Ready for GDPR? (with MAPR and Talend)Jean-Michel Franco
 
Practical steps to GDPR compliance
Practical steps to GDPR compliance Practical steps to GDPR compliance
Practical steps to GDPR compliance Jean-Michel Franco
 
Five Elements of Effective Data Access Governance
Five Elements of Effective Data Access Governance  Five Elements of Effective Data Access Governance
Five Elements of Effective Data Access Governance Privacera
 
Big data services slideshare - agilisium 2.0 - v1.0
Big data services   slideshare - agilisium 2.0 - v1.0Big data services   slideshare - agilisium 2.0 - v1.0
Big data services slideshare - agilisium 2.0 - v1.0Agilisium Consulting
 
BigID, OneTrust, IAPP Webinar: Bridging the Privacy Office with IT
BigID, OneTrust, IAPP Webinar: Bridging the Privacy Office with ITBigID, OneTrust, IAPP Webinar: Bridging the Privacy Office with IT
BigID, OneTrust, IAPP Webinar: Bridging the Privacy Office with ITBigID Inc
 
A Dynamic Data Catalog for Autonomy and Self-Service
A Dynamic Data Catalog for Autonomy and Self-ServiceA Dynamic Data Catalog for Autonomy and Self-Service
A Dynamic Data Catalog for Autonomy and Self-ServiceDenodo
 

Mais procurados (20)

CCPA Compliance for Analytics and Data Science Use Cases with Databricks and ...
CCPA Compliance for Analytics and Data Science Use Cases with Databricks and ...CCPA Compliance for Analytics and Data Science Use Cases with Databricks and ...
CCPA Compliance for Analytics and Data Science Use Cases with Databricks and ...
 
Privacera Databricks CCPA Webinar Feb 2020
Privacera Databricks CCPA Webinar Feb 2020Privacera Databricks CCPA Webinar Feb 2020
Privacera Databricks CCPA Webinar Feb 2020
 
2. Smart Data Discovery
2. Smart Data Discovery2. Smart Data Discovery
2. Smart Data Discovery
 
Data Sheet - Manage unstructured data growth with Symantec Data Insight
Data Sheet - Manage unstructured data growth with Symantec Data InsightData Sheet - Manage unstructured data growth with Symantec Data Insight
Data Sheet - Manage unstructured data growth with Symantec Data Insight
 
The Merger is Happening, Now What Do We Do?
The Merger is Happening, Now What Do We Do?The Merger is Happening, Now What Do We Do?
The Merger is Happening, Now What Do We Do?
 
Data Privacy in the DMBOK - No Need to Reinvent the Wheel
Data Privacy in the DMBOK - No Need to Reinvent the WheelData Privacy in the DMBOK - No Need to Reinvent the Wheel
Data Privacy in the DMBOK - No Need to Reinvent the Wheel
 
Big Data Governance in Hadoop Environments with Cloudera Navigatorfeb2017meetu
Big Data Governance in Hadoop Environments with Cloudera Navigatorfeb2017meetuBig Data Governance in Hadoop Environments with Cloudera Navigatorfeb2017meetu
Big Data Governance in Hadoop Environments with Cloudera Navigatorfeb2017meetu
 
Top 10 Best Practices for Implementing Data Classification
Top 10 Best Practices for Implementing Data ClassificationTop 10 Best Practices for Implementing Data Classification
Top 10 Best Practices for Implementing Data Classification
 
SharePoint Information Governance
SharePoint Information GovernanceSharePoint Information Governance
SharePoint Information Governance
 
Deliver Data Governance with a “Yes”
Deliver Data Governance with a “Yes”Deliver Data Governance with a “Yes”
Deliver Data Governance with a “Yes”
 
Three Cool Things You Can Do with Standards
Three Cool Things You Can Do with StandardsThree Cool Things You Can Do with Standards
Three Cool Things You Can Do with Standards
 
The Metadata Secret in Your Data
The Metadata Secret in Your DataThe Metadata Secret in Your Data
The Metadata Secret in Your Data
 
Needtoknow
NeedtoknowNeedtoknow
Needtoknow
 
Symantec Data Insight for Storage
Symantec Data Insight for StorageSymantec Data Insight for Storage
Symantec Data Insight for Storage
 
Are Your Data Ready for GDPR? (with MAPR and Talend)
Are Your Data Ready for GDPR? (with MAPR and Talend)Are Your Data Ready for GDPR? (with MAPR and Talend)
Are Your Data Ready for GDPR? (with MAPR and Talend)
 
Practical steps to GDPR compliance
Practical steps to GDPR compliance Practical steps to GDPR compliance
Practical steps to GDPR compliance
 
Five Elements of Effective Data Access Governance
Five Elements of Effective Data Access Governance  Five Elements of Effective Data Access Governance
Five Elements of Effective Data Access Governance
 
Big data services slideshare - agilisium 2.0 - v1.0
Big data services   slideshare - agilisium 2.0 - v1.0Big data services   slideshare - agilisium 2.0 - v1.0
Big data services slideshare - agilisium 2.0 - v1.0
 
BigID, OneTrust, IAPP Webinar: Bridging the Privacy Office with IT
BigID, OneTrust, IAPP Webinar: Bridging the Privacy Office with ITBigID, OneTrust, IAPP Webinar: Bridging the Privacy Office with IT
BigID, OneTrust, IAPP Webinar: Bridging the Privacy Office with IT
 
A Dynamic Data Catalog for Autonomy and Self-Service
A Dynamic Data Catalog for Autonomy and Self-ServiceA Dynamic Data Catalog for Autonomy and Self-Service
A Dynamic Data Catalog for Autonomy and Self-Service
 

Semelhante a Data Science Salon 2018 - Building a true enterprise data governance platform for the modern machine learning era by Subash D'Souza

Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaCaserta
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceDATAVERSITY
 
Increasing Agility Through Data Virtualization
Increasing Agility Through Data VirtualizationIncreasing Agility Through Data Virtualization
Increasing Agility Through Data VirtualizationDenodo
 
Defining and Applying Data Governance in Today’s Business Environment
Defining and Applying Data Governance in Today’s Business EnvironmentDefining and Applying Data Governance in Today’s Business Environment
Defining and Applying Data Governance in Today’s Business EnvironmentCaserta
 
Strata NYC 2015 - Transamerica and INFA v1
Strata NYC 2015 - Transamerica and INFA v1Strata NYC 2015 - Transamerica and INFA v1
Strata NYC 2015 - Transamerica and INFA v1Vishal Bamba
 
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineQlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineSrikanth Sharma Boddupalli
 
How a Logical Data Fabric Enhances the Customer 360 View
How a Logical Data Fabric Enhances the Customer 360 ViewHow a Logical Data Fabric Enhances the Customer 360 View
How a Logical Data Fabric Enhances the Customer 360 ViewDenodo
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on HadoopCaserta
 
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)Denodo
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation Caserta
 
ADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and ComparisonADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and ComparisonDATAVERSITY
 
Achieving a Single View of Business – Critical Data with Master Data Management
Achieving a Single View of Business – Critical Data with Master Data ManagementAchieving a Single View of Business – Critical Data with Master Data Management
Achieving a Single View of Business – Critical Data with Master Data ManagementDATAVERSITY
 
Ensuring Data Quality and Lineage in Cloud Migration - Dan Power
Ensuring Data Quality and Lineage in Cloud Migration - Dan PowerEnsuring Data Quality and Lineage in Cloud Migration - Dan Power
Ensuring Data Quality and Lineage in Cloud Migration - Dan PowerMolly Alexander
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It? Caserta
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data ScienceCaserta
 
DoD Data Quality Challenges
DoD Data Quality ChallengesDoD Data Quality Challenges
DoD Data Quality ChallengesJay j
 
The Missing Link in Enterprise Data Governance - Automated Metadata Management
The Missing Link in Enterprise Data Governance - Automated Metadata ManagementThe Missing Link in Enterprise Data Governance - Automated Metadata Management
The Missing Link in Enterprise Data Governance - Automated Metadata ManagementDATAVERSITY
 
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...Big Data Week
 
Best Practices To Build a Data Lake
Best Practices To Build a Data LakeBest Practices To Build a Data Lake
Best Practices To Build a Data LakeFibonalabs
 

Semelhante a Data Science Salon 2018 - Building a true enterprise data governance platform for the modern machine learning era by Subash D'Souza (20)

Data Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with ClouderaData Governance, Compliance and Security in Hadoop with Cloudera
Data Governance, Compliance and Security in Hadoop with Cloudera
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 
Increasing Agility Through Data Virtualization
Increasing Agility Through Data VirtualizationIncreasing Agility Through Data Virtualization
Increasing Agility Through Data Virtualization
 
Defining and Applying Data Governance in Today’s Business Environment
Defining and Applying Data Governance in Today’s Business EnvironmentDefining and Applying Data Governance in Today’s Business Environment
Defining and Applying Data Governance in Today’s Business Environment
 
Data Governance for Enterprises
Data Governance for EnterprisesData Governance for Enterprises
Data Governance for Enterprises
 
Strata NYC 2015 - Transamerica and INFA v1
Strata NYC 2015 - Transamerica and INFA v1Strata NYC 2015 - Transamerica and INFA v1
Strata NYC 2015 - Transamerica and INFA v1
 
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipelineQlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
Qlik wp 2021_q3_data_governance_in_the_modern_data_analytics_pipeline
 
How a Logical Data Fabric Enhances the Customer 360 View
How a Logical Data Fabric Enhances the Customer 360 ViewHow a Logical Data Fabric Enhances the Customer 360 View
How a Logical Data Fabric Enhances the Customer 360 View
 
Intro to Data Science on Hadoop
Intro to Data Science on HadoopIntro to Data Science on Hadoop
Intro to Data Science on Hadoop
 
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
Denodo’s Data Catalog: Bridging the Gap between Data and Business (APAC)
 
The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation The Data Lake - Balancing Data Governance and Innovation
The Data Lake - Balancing Data Governance and Innovation
 
ADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and ComparisonADV Slides: Data Pipelines in the Enterprise and Comparison
ADV Slides: Data Pipelines in the Enterprise and Comparison
 
Achieving a Single View of Business – Critical Data with Master Data Management
Achieving a Single View of Business – Critical Data with Master Data ManagementAchieving a Single View of Business – Critical Data with Master Data Management
Achieving a Single View of Business – Critical Data with Master Data Management
 
Ensuring Data Quality and Lineage in Cloud Migration - Dan Power
Ensuring Data Quality and Lineage in Cloud Migration - Dan PowerEnsuring Data Quality and Lineage in Cloud Migration - Dan Power
Ensuring Data Quality and Lineage in Cloud Migration - Dan Power
 
What Data Do You Have and Where is It?
What Data Do You Have and Where is It? What Data Do You Have and Where is It?
What Data Do You Have and Where is It?
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
DoD Data Quality Challenges
DoD Data Quality ChallengesDoD Data Quality Challenges
DoD Data Quality Challenges
 
The Missing Link in Enterprise Data Governance - Automated Metadata Management
The Missing Link in Enterprise Data Governance - Automated Metadata ManagementThe Missing Link in Enterprise Data Governance - Automated Metadata Management
The Missing Link in Enterprise Data Governance - Automated Metadata Management
 
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
 
Best Practices To Build a Data Lake
Best Practices To Build a Data LakeBest Practices To Build a Data Lake
Best Practices To Build a Data Lake
 

Mais de Data Con LA

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA
 

Mais de Data Con LA (20)

Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynotes
Data Con LA 2022 KeynotesData Con LA 2022 Keynotes
Data Con LA 2022 Keynotes
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup ShowcaseData Con LA 2022 - Startup Showcase
Data Con LA 2022 - Startup Showcase
 
Data Con LA 2022 Keynote
Data Con LA 2022 KeynoteData Con LA 2022 Keynote
Data Con LA 2022 Keynote
 
Data Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendationsData Con LA 2022 - Using Google trends data to build product recommendations
Data Con LA 2022 - Using Google trends data to build product recommendations
 
Data Con LA 2022 - AI Ethics
Data Con LA 2022 - AI EthicsData Con LA 2022 - AI Ethics
Data Con LA 2022 - AI Ethics
 
Data Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learningData Con LA 2022 - Improving disaster response with machine learning
Data Con LA 2022 - Improving disaster response with machine learning
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Data Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentationData Con LA 2022 - Real world consumer segmentation
Data Con LA 2022 - Real world consumer segmentation
 
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
Data Con LA 2022 - Modernizing Analytics & AI for today's needs: Intuit Turbo...
 
Data Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWSData Con LA 2022 - Moving Data at Scale to AWS
Data Con LA 2022 - Moving Data at Scale to AWS
 
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AIData Con LA 2022 - Collaborative Data Exploration using Conversational AI
Data Con LA 2022 - Collaborative Data Exploration using Conversational AI
 
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
Data Con LA 2022 - Why Database Modernization Makes Your Data Decisions More ...
 
Data Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data ScienceData Con LA 2022 - Intro to Data Science
Data Con LA 2022 - Intro to Data Science
 
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing EntertainmentData Con LA 2022 - How are NFTs and DeFi Changing Entertainment
Data Con LA 2022 - How are NFTs and DeFi Changing Entertainment
 
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
Data Con LA 2022 - Why Data Quality vigilance requires an End-to-End, Automat...
 
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
Data Con LA 2022-Perfect Viral Ad prediction of Superbowl 2022 using Tease, T...
 
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...Data Con LA 2022- Embedding medical journeys with machine learning to improve...
Data Con LA 2022- Embedding medical journeys with machine learning to improve...
 
Data Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with KafkaData Con LA 2022 - Data Streaming with Kafka
Data Con LA 2022 - Data Streaming with Kafka
 

Último

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 

Último (20)

Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

Data Science Salon 2018 - Building a true enterprise data governance platform for the modern machine learning era by Subash D'Souza

  • 1. BUILDING A TRUE ENTERPRISE DATA GOVERNANCE PLATFORM FOR THE MODERN MACHINE LEARNING ERA Subash DSouza Director Big Data & Ops, Warner Bros – Data Intelligence Group
  • 2. BIO Organizer - Data Con LA formerly known as Big Data Day LA Organizer - LA Big Data Users Group Organizer - LA Apache Spark Users Group Advisor - Anthill Studio Investor – FOSSA Data Governance PMC Member – The Linux Foundation – ODPi Founder – Archangel Technology Consultants LLC
  • 4. ENTERPRISE DATA GOVERNANCE CHALLENGES 4 Poor data quality costs real money 1 Process efficiency is negatively impacted by poor data governance 2 Full potential benefits of new systems not be realized because of poor data governance 3 Decision making is negatively affected by poor data governance 4
  • 5. ENTERPRISE DATA GOVERNANCE OBJECTIVES 5 Guide information management decision-makingGuide Ensure information is consistently defined and well understoodEnsure Increase the use and trust of data as an organization assetIncrease Improve consistency of projects across the organizationImprove Ensure regulatory complianceEnsure Eliminate data risksEliminate
  • 6. MASTER DATA MANAGEMENT PROBLEMS 6 Discovery - cannot find the right information Integration - cannot manipulate and combine information Dissemination - cannot consume information Insight - cannot extract value and knowledge from information Management – cannot manage and control information volumes and growth
  • 7. MASTER DATA MANAGEMENT ISSUES 7 52% of users don’t have confidence in their information 59% of managers miss information they should have used 42% of managers use wrong information at least once a week 75% of CIOs believe they can strengthen their competitive advantage by better using and managing enterprise data 78% of CIOs want to improve the way they use and manage their data Only 15% of CIOs believe that their data is currently comprehensively well manage
  • 8. META DATA MANAGEMENT 8 Metadata needs to be managed to ensure ... Availability: metadata needs to be stored where it can be accessed and indexed so it can be found. Quality: metadata needs to be of consistent quality so users know that it can be trusted. Persistence: metadata needs to be kept over time. Open License: metadata should be available under a public domain license to enable its reuse. Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.
  • 9. META DATA MANAGEMENT LIFECYCLE The metadata lifecycle is larger than the data lifecycle:  Metadata may be created before data is created or captured, e.g. to inform about data that will be available in the future.  Metadata needs to be kept after data has been removed, e.g. to inform about data that has been decommissioned or withdrawn. 9
  • 13. IMPORTANC E OF A DATA CATALOG 13
  • 14. 14 DATA SECURITY & COMPLIANCE PII, GDPR, Retention, Oversight, Reporting DATA CATALOG Data Discovery, Classification, AI, ML DATA QUALITY Data Cleansing, Data Completeness DATA LINEAGE Attribution, Survivorship, Feedback Loops METADATA MANAGEMENT Data Tagging, Federated Search MDM (MASTER DATA MANAGEMENT) Taxonomy, Single Version of the Truth DATA GOVERNANCE PROJECT / SERVICE SCOPE Value add service for data security, retention and compliance Crawl, introspect, discover and classify data Clean, correct, validate and triage data Track and record data source, path and life cycle Manage searchable properties associated with data Service for the ingestion, cleansing and curation of data
  • 15. BUSINESS VALUE ADD 15 Revenue Agility Cost Compliance • Reduce risk • Control access to data • Adhere to government and corporate regulations • Manage customer privacy preferences • Automate manual business processes • Reduce data errors • Eliminate duplication of efforts and technologies • IT system consolidation initiatives • Recognize new business opportunities • Increase business reach by identifying synergy • Improve marketing capabilities • Faster time to market • Consolidate data from silos • Meet demands of new business channels • Grow with the business • Identify key relationships and hierarchies
  • 16. PROPOSED SOLUTION 16 Metadata Management Data Lineage Data Catalog Data Security Data Quality Master Data Management RulesEngine Workflow&Audit Applications Data Data SERVICE ORIENTED ARCHITECTURE Communicating Truth to Others SOA Patterns: ● Service Provider ● Service Broker, Registry, Repository ● Service Requester / Recipient ● Incoming & outgoing Services Defining Concepts: ● Business Value ● Strategic Goals ● Intrinsic Interoperability ● Shared Services ● Flexibility ● Evolutionary Refinement Orchestrated Service Types: ● Push & Pull ● RESTful Services ● SOAP Services ● Database Connectivity ● Flat File Support
  • 17. DETAILED SERVICE OFFERING ARCHITECTURE 17 Rules Engine SOA Service Oriented Architecture Auditing InfrastructureNetwork Storage Messaging Transport Data Catalog. Data Classify Data Discovery Data Quality Quality Cleansing Data Lineage Data Source Workflow Feedback Loop Metadata Mgmt. Mgmt. Strategy Metadata Storage Metadata Capture Master Data Mgmt. Data Architectur e Data Quality Taxonomy Distribution FTP HTTP TCP / UDP DATABASE Data Security B2B / Internal Users B2C / External Users Web Services Search Internal Up Stream Systems External Down Stream Systems User Interfaces Complianc e Encryption Obfuscate / Anonymize Access Controle Business Process Orchestration Tag FILE PII / GDPR Service Registries Messaging Document Logical Ordering Publish Complete Accuracy Consistenc y Metadata Integration Business Intelligence Distributio n & Feedback ProcessingDatabase
  • 18. MDM USE CASE – DATA CLEANSING 18 Upstream/SourceSystems MDM Components MDM Master Data Curation Review, Curate & Approve Clean MDM Master DB Exports (as needed) DB ETL Feed (as needed) MDM Curators MDM Users (Optional) Data Merge MDM Users Source System 1 Source System 2 Source System N Target System 1 MDM Portal (Optional) MDM DATA EXPOLITATION Adding Business Value ISDATACLEAN??? AUTOMATED DATA CLEANSING / TRIAGE Data Support Staff Technical Support Staff MANUAL DATA CLEANSING / TRIAGE Unresolved ETL ETL RESOLVED RESOLVED Is Data Clean Automate Technical Cleanse Logical Structural IS DATA CLEAN & RESOLVED? Structural Data Target System N RETURN CLEAN DATA BACK TO SOURCE SYSTEMS VIA SOA SERVICES MicroservicesData Staging ETL
  • 20. Q&A Contact me  @sawjd22  linkedin.com/in/sawjd  sawjd@yahoo.com