Data Profiling:
The First Step to Big Data Quality
Harald Smith, Dir. Product Marketing
Housekeeping
Webcast Audio
• Today’s webcast audio is streamed through your computer speakers.
• If you need technical assistance with the web interface or audio,
please reach out to us using the chat window.
Questions Welcome
• Submit your questions at any time during the presentation
using the chat window.
• Our team will reach out to you to answer them following the
presentation.
Recording and slides
• This webcast is being recorded. You will receive an
email following the webcast with a link to download
both the recording and the slides.
Speaker
Harald Smith
• Director of Product Marketing, Syncsort
• 20+ years in Information Management with a focus on
data quality, integration, and governance
• Co-author of Patterns of Information Management
• Author of two Redbooks on Information Governance
and Data Integration
• Blog author: “Data Democratized”
Only 35% of senior executives have a
high level of trust in the
accuracy of their Big Data
Analytics
KPMG 2016 Global CEO Outlook
92% of
executives are concerned
about the negative impact of
data and analytics on
corporate reputation
KPMG 2017 Global CEO Outlook
80% of AI/ML projects are stalling
due to poor data quality
Dimensional Research, 2019
Big Data Needs
Data Quality
“Societal trust in business is
arguably at an all-time low
and, in a world increasingly
driven by data and
technology,
reputations and brands are
ever harder to protect.”
EY “Trust in Data and Why it Matters”, 2017.
The importance of data
quality in the enterprise:
• Decision making
• Customer centricity
• Compliance
• Machine learning & AI
“The magic of machine learning is that you build a
statistical model based on the most valid dataset for
the domain of interest.
If the data is junk, then you’ll be building a junk
model that will not be able to do its job.”
James Kobielus
SiliconANGLE Wikibon
Lead Analyst for Data Science, Deep Learning, App Development
2018
Data Quality Challenges with Machine Learning
Incorrect, Incomplete, Mis-Formatted, and Sparse “Dirty Data” –
Mistakes and errors are almost never the patterns you’re looking for in
a data set. Sparse data generates other issues. Correcting and
standardizing will tend to boost the signal, but must account for bias.
Missing context – Many data sources lack context around location or
population segments. Unless enriched with other data sets, (e.g.
geospatial, demographics, or firmographics data), some ML algorithms
will not be usable.
Multiple copies – If your data comes from many sources, as it often
does, it may contain multiple records of information about the same
person, company, product or other entity. Removing duplicates and
enhancing the overall depth and accuracy of knowledge about a single
entity can make a huge difference.
Spurious correlations – Just as missing context may hinder some ML
algorithms, inclusion of already correlated data (e.g. city and postal
code) may result in overfitting of ML algorithms.
Correcting data problems vastly increases a data set’s usefulness for machine learning.
But data analysts may not be aware of
specific data quality issues that must be
addressed to support machine learning.
Traditional data quality processes are
an effective method to identify defects.
Understanding Big Data Quality
Data Profiling
The set of analytical techniques that
evaluate actual data content (vs.
metadata) to provide a complete view
of each data element in a data source.
Provides summarized inferences, and
details of value and pattern frequencies
to quickly gain data insights.
Business Rules
The data quality or validation rules that
help ensure that data is “fit for use” in
its intended operational and decision-
making contexts.
Covers the accuracy, completeness,
consistency, relevance, timeliness and
validity of data.
Five Key Steps to effective Data Profiling
These are not new, but good to reiterate in the
context of Big Data:
1. How do you want to analyze the data?
2. What should you review? (there's a lot of stuff)
3. What should you look for? (based on data “type”)
4. When should you build rules? (laser-focus; CDEs)
5. What needs to be communicated?
1. How do you want to analyze the data?
Universal DQ best practices:
Understand the End Goal
• How does the business intend to
use the data (i.e. what’s the use
case)?
• Empower users (“Who”) to gain
new clarity into the core problem
(“Why”)
• What will the data be used for?
• What defines the Fitness for your
Purpose?
Establish Scope
• Ask the “right questions” about the
use case and the data (not just
“what” and “how”)
• What data is relevant to the effort?
• Big Data or other, you need to set
boundaries for the work
Understand Context
• How does the business define the
data?
• What are the important
characteristics and context of the
data?
• What are the Critical Data
Elements?
• What qualities will you need to
address, or leave alone?
• “High-quality data” definition will
vary by business problem
“If you don’t know what you want to
get out of the data, how can you
know what data you need – and
what insight you’re looking for?”
Wolf Ruzicka, Chairman of the Board at EastBanc Technologies,
Blog post: June 1, 2017, “Grow A Data Tree Out Of The “Big Data”
Swamp”
“Never lead with a data set;
lead with a question.”
Anthony Scriffignano, Chief Data Scientist, Dun & Bradstreet
Forbes Insights, May 31, 2017, “The Data Differentiator”
To Sample or not to Sample?
Sampling helps with:
• Data Integration
• Source-to-target mapping
• Data Modeling
• Discovering Correlations
When the focus is on the structure of the data
❖ REMEMBER: your target is a statistically
valid sample!
❖ ~16k records gives you 99% confidence
with a margin of error of 1% for 100B
records
❖ ~66k records gives you 99% confidence
with a margin of error of 0.5% for the same 100B records
Full Volume needed with:
• Data Quality
• Data Governance
• Regulatory Compliance
• Finding Outliers and Issues
with Content
• “Needles in the haystack”
When the focus is on the quality of or risks
within the data
❖ Focus on critical data elements and
leverage tools that scale to data volume
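The sample sizes quoted for structural analysis can be reproduced with the standard formula for estimating a proportion, n = z² · p(1 − p) / e², plus a finite-population correction. This is a minimal sketch: the slide does not say how its figures were derived, so the formula choice, the worst-case p = 0.5, and the function name sample_size are assumptions.

```python
import math
from statistics import NormalDist

def sample_size(confidence, margin, population=None, p=0.5):
    """Records needed to estimate a proportion within +/- margin."""
    # Two-sided z-score for the requested confidence level
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite-population correction; negligible for very large sources
        n = n / (1 + (n - 1) / population)
    return math.ceil(n)

n_1pct = sample_size(0.99, 0.01, population=100_000_000_000)
n_half_pct = sample_size(0.99, 0.005, population=100_000_000_000)
print(n_1pct, n_half_pct)
```

Under these assumptions the results land at about 16.6k and 66.3k records, in line with the ~16k and ~66k figures on the slide. Note that the population size barely matters at this scale; the margin of error drives the sample size.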
Big Data at scale distributes data across many
nodes – not necessarily with other relevant data!
• Processing routines must apply same approach and logic each
time
• Implications for profiling, joining, sorting, and matching data,
whether for enrichment, verification against trusted sources, or a
consolidated single view
Data Quality functions must be performed in a consistent manner,
no matter where actual processing takes place, how the data is
segmented, and what the data volume is.
• Data quality cleansing and preparation routines have to be
reproduced at scale, both to get the data ready to train machine
learning models, and to comply with business regulations.
• Critical to establishing, building, and maintaining trust
Scaling Data Quality best practices:
Consistent processing at scale
Source: HP Analyst Briefing
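One way to picture "consistent processing at scale" is a profile that is computed per partition and then merged, with the same standardization rule applied everywhere. The sketch below is illustrative only: the normalize rule, the sample values, and the function names are hypothetical, and plain Python lists stand in for data segments on different nodes.

```python
from collections import Counter

def normalize(value):
    """One standardization rule, applied identically on every node."""
    return value.strip().upper() if value is not None else None

def profile_partition(values):
    """Value frequencies for a single partition of the data."""
    return Counter(normalize(v) for v in values)

def merge_profiles(profiles):
    """Combine per-partition frequencies into one overall profile."""
    total = Counter()
    for profile in profiles:
        total.update(profile)
    return total

records = ["NY", "ny ", "CA", None, "ca", " NY", "TX"]

whole = profile_partition(records)
split_a = merge_profiles([profile_partition(records[:3]),
                          profile_partition(records[3:])])
split_b = merge_profiles([profile_partition(records[:5]),
                          profile_partition(records[5:])])
print(whole == split_a == split_b)
```

However the records are segmented, the merged profile matches the profile of the full set, which is the property the slide asks of data quality functions.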
2. What do you want to review?
Common Data Quality Measurements
What measures can we take advantage of?
1. Completeness – Are the relevant fields populated?
2. Integrity – Does the data maintain an internal structural
integrity or a relational integrity across sources?
3. Uniqueness – Are keys or records unique?
4. Validity – Does the data have the correct values?
• Code and reference values
• Valid ranges
• Valid value combinations
5. Consistency – Is the data at consistent levels of
aggregation or does it have consistent valid values
over time?
6. Timeliness – Did the data arrive in a time period
that makes it useful or usable?
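Several of these measurements reduce to simple ratios. The following is a minimal sketch of completeness, uniqueness, and validity; the field names, sample rows, and the VALID_STATUS reference set are hypothetical, and the treatment of empty strings as missing is an assumption.

```python
rows = [
    {"customer_id": "C001", "status": "A", "email": "a@example.com"},
    {"customer_id": "C002", "status": "X", "email": None},
    {"customer_id": "C002", "status": "I", "email": "c@example.com"},
    {"customer_id": "C004", "status": "A", "email": ""},
]
VALID_STATUS = {"A", "I"}  # hypothetical reference values

def populated(rows, field):
    """Values of `field` that are neither None nor empty."""
    return [r[field] for r in rows if r.get(field) not in (None, "")]

def completeness(rows, field):
    """Share of rows where the field is populated."""
    return len(populated(rows, field)) / len(rows)

def uniqueness(rows, field):
    """Share of populated values that are distinct."""
    values = populated(rows, field)
    return len(set(values)) / len(values)

def validity(rows, field, allowed):
    """Share of populated values found in the reference set."""
    values = populated(rows, field)
    return sum(v in allowed for v in values) / len(values)

print(completeness(rows, "email"),
      uniqueness(rows, "customer_id"),
      validity(rows, "status", VALID_STATUS))
```

A production version would also need to handle fields with no populated values, which would divide by zero here.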
New data, new data quality challenges
• 3rd Party and external data with unknown provenance or relevance
• Bias in the data – whether in collection, extraction, or other processing
• Data without standardized structure or formatting
• Continuously streaming data
• Disjointed data (e.g. gaps in receipt)
• Consistency and verification of data sources
• Changes and transformation applied to data (i.e. does it really
represent the original input)
New Data Quality Problems
“34 percent of bankers in our survey report that their organization
has been the target of adversarial AI at least once, and 78 percent
believe automated systems create new risks, such as fake data,
external data manipulation, and inherent bias.”
Accenture Banking Technology Vision 2018
• Contextual visualizations
• Value and pattern distributions
• Attribute summaries and metadata
• Sort and filter to quickly find data
of interest
• Detail drilldowns to any content
Let Data Profiling guide you
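A pattern distribution can be sketched by mapping each character to a class and counting the resulting shapes. The convention below (digits become 9, letters become A, everything else is kept) is a common one but is an assumption here, as are the sample phone values.

```python
from collections import Counter

def pattern(value):
    """Reduce a value to its shape: digits -> 9, letters -> A."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)

phones = ["555-0100", "555-0101", "5550102", "555-01O3", "n/a"]
pattern_freq = Counter(pattern(p) for p in phones)

# Most frequent patterns first; rare patterns are candidates for review
for shape, count in pattern_freq.most_common():
    print(shape, count)
```

The letter O typed in place of a zero shows up as its own low-frequency pattern, which is hard to spot when scanning raw values.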
3. What should you look for?
Common Data Types
What variances do you need awareness of?
1. Identifiers – data that uniquely identifies something
2. Indicators – data that flags a specific condition
3. Dates – data that identifies a point in time
4. Quantities – data that identifies an amount or value of something
5. Codes – data that segments other data
6. Text – data that describes or names something
Identifiers
Use cases:
• Business Operations
• 360 View of Entity
• BI Reporting (incl. EDW)
• Analytics
• AI/ML
Examples:
• Customer ID
• National ID / Passport #
• Social Security # / Tax ID
• Product ID
What to look for:
• 100% Complete
• All Unique values
• Anomalous patterns
• Numeric vs. String
Notes:
• Needs full volume assessment
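The identifier checks listed above can be sketched in a few lines. The sample IDs are hypothetical, and treating an empty string as a missing value is an assumption.

```python
from collections import Counter

ids = ["1001", "1002", "1002", "A-1004", "", "1006"]

# 100% complete: positions of missing identifiers
missing = [i for i, v in enumerate(ids) if not v]

# All unique: values that occur more than once
counts = Counter(v for v in ids if v)
duplicates = sorted(v for v, n in counts.items() if n > 1)

# Numeric vs. string: values that break an all-digits expectation
non_numeric = sorted(v for v in counts if not v.isdigit())

print(missing, duplicates, non_numeric)
```

Because a single duplicate or missing key matters, checks like these are run against the full volume rather than a sample.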
Indicators (aka Flags)
Use cases:
• Business Operations
• 360 View of Entity
• BI Reporting (incl. EDW)
• Governance and Compliance
• Analytics
• AI/ML
Examples:
• True / False (or T/F)
• Yes / No (or Y/N)
• 1 / 0
What to look for:
• Binary Values only
• Consistent pattern
• No mixing of “Y” vs “YES”
• If NULL occurs, it must count as
one of the two binary values
• Skews in frequency
distributions
Notes:
• May need segmentation, filtering, or
grouping via business rules to resolve or
clarify discrepancies
• Often are triggers for other conditions –
look for use in business rules, but likely
occur downstream
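For indicators, a frequency count answers most of the questions above at once. This is a sketch on hypothetical flag values; the expected pair Y/N is an assumption.

```python
from collections import Counter

flags = ["Y", "N", "Y", "YES", "y", None, "Y", "Y", "Y", "Y"]
EXPECTED = {"Y", "N"}  # hypothetical expected pair

freq = Counter(flags)
distinct = set(freq)

# Binary values only: exactly two distinct values, counting NULL as a value
is_binary = len(distinct) == 2

# No mixing of "Y" vs "YES": populated values outside the expected pair
mixed_forms = sorted(v for v in distinct
                     if v is not None and v not in EXPECTED)

# Skew: share of records held by the most frequent value
skew = max(freq.values()) / len(flags)

print(is_binary, mixed_forms, round(skew, 2))
```

Here the column fails the binary test because "YES", "y", and NULL appear alongside Y and N.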
Codes
Use cases:
• Business Operations
• 360 View of Entity
• BI Reporting (incl. EDW)
• Governance and Compliance
• Analytics
• AI/ML
Examples:
• Account Status
• Credit Rating
• Diagnosis/Procedure Codes
• Order Status
• Postal Code
What to look for:
• Expected values
• Consistent patterns
• No mixing of “A” vs “active”
• NULL values
• Skews in frequency
distributions
Notes:
• May need segmentation, filtering, or
grouping via business rules to resolve or
clarify discrepancies
• Often are triggers for or from other
conditions – look for use in business rules
• May correlate to other fields
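Code fields can be checked against expected values and against the fields they correlate with. In this sketch the rows, the EXPECTED_STATUS set, and the idea that one postal code should map to one city spelling are all illustrative assumptions.

```python
from collections import defaultdict

rows = [
    {"status": "A", "postal": "10001", "city": "New York"},
    {"status": "active", "postal": "10001", "city": "New York"},
    {"status": "C", "postal": "94105", "city": "San Francisco"},
    {"status": None, "postal": "94105", "city": "San Fran"},
]
EXPECTED_STATUS = {"A", "C"}  # hypothetical reference values

# Expected values and consistent patterns ("A" vs "active")
unexpected = sorted({r["status"] for r in rows
                     if r["status"] is not None
                     and r["status"] not in EXPECTED_STATUS})

# NULL values
null_count = sum(1 for r in rows if r["status"] is None)

# Correlation to other fields: postal codes with more than one city value
cities_by_postal = defaultdict(set)
for r in rows:
    cities_by_postal[r["postal"]].add(r["city"])
inconsistent = {p: sorted(c) for p, c in cities_by_postal.items()
                if len(c) > 1}

print(unexpected, null_count, inconsistent)
```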
Dates
Use cases:
• Business Operations
• BI Reporting (incl. EDW)
• Governance and Compliance
• Analytics
• AI/ML
Examples:
• Birth Date
• Departure Date
• Order Date
• Shipping Date
• Timestamp
What to look for:
• Skews in frequency
distributions
• E.g. 01/01/2001
• Anomalous patterns
• Numeric vs. String
• Unusual values
• Missing values and gaps
Notes:
• May need segmentation, filtering, or
grouping via business rules to resolve or
clarify
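A quick date profile separates values that do not parse from values that parse but pile up suspiciously on one day. This sketch assumes an expected ISO format (YYYY-MM-DD); the sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime

raw = ["2019-03-01", "2001-01-01", "2001-01-01", "2001-01-01",
       "03/04/2019", "", "2019-02-30", "2019-03-05"]

def parse_iso(value):
    """Return a date for YYYY-MM-DD input, or None if it does not parse."""
    try:
        return datetime.strptime(value, "%Y-%m-%d").date()
    except ValueError:
        return None

parsed = [parse_iso(v) for v in raw]

# Anomalous patterns, missing values, and impossible dates
unparsed = [v for v, d in zip(raw, parsed) if d is None]

# Skews in the frequency distribution, e.g. a default date
freq = Counter(d for d in parsed if d is not None)
top_date, top_count = freq.most_common(1)[0]
top_share = top_count / sum(freq.values())

print(unparsed, top_date, top_share)
```

A single date holding a large share of the valid values, as 2001-01-01 does here, usually points to a system default rather than a real event.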
Quantities
Use cases:
• Business Operations
• BI Reporting (incl. EDW)
• Governance and Compliance
• Analytics
• AI/ML
Examples:
• Amount (e.g. item count, amount due)
• Price
• Sales
• Total (e.g. order total)
What to look for:
• Skews in frequency
distributions
• Anomalous patterns
• Excessively high (or low)
values
Notes:
• May need segmentation, filtering, or
grouping via business rules to resolve or
clarify
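Excessively high or low quantities can be flagged with a simple interquartile-range fence. The 1.5 × IQR rule is one common convention and is an assumption here, not something the slide prescribes; the amounts are hypothetical.

```python
from statistics import quantiles

amounts = [19.99, 24.50, 22.00, 18.75, 21.30, 23.10, 20.00, 9999.00, -5.00]

# Quartiles of the observed values
q1, _, q3 = quantiles(amounts, n=4)
iqr = q3 - q1

# Values outside the fences are candidates for review, not automatic errors
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = sorted(a for a in amounts if a < low or a > high)

print(outliers)
```

Whether a flagged value is wrong depends on context: a negative amount may be a valid refund, which is where segmenting by business rule comes in.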
Text
Use cases:
• Business Operations
• Building blocks for other
identifiers!
• 360 View of Entity
• Governance and Compliance
• Analytics
• AI/ML
Examples:
• Name
• Address
• Product Description
• Claim Description
What to look for:
• Missing Values
• Frequency of patterns /
Anomalous patterns
• Existence of numerics
• Values <= 5 characters
• Compound values
• Unusual, recurring values
• “Do not use”
Notes:
• Look for correlations with Code values
that indicate specific conditions (e.g.
values used for testing purposes)
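The text checks above translate into a handful of filters. This is a sketch on hypothetical name values; the separators used to detect compound values are an assumption and would need tuning for real data.

```python
from collections import Counter

names = ["Acme Corp", "DO NOT USE", "Bob", "Jane Smith / John Smith",
         "", "Test", "DO NOT USE", "Unit 42"]

# Missing values
missing = sum(1 for n in names if not n.strip())

# Values <= 5 characters
short = [n for n in names if 0 < len(n.strip()) <= 5]

# Existence of numerics
with_digits = [n for n in names if any(ch.isdigit() for ch in n)]

# Compound values (more than one entity in a single field)
compound = [n for n in names
            if any(sep in n for sep in ("/", "&", " and "))]

# Unusual, recurring values such as "Do not use"
recurring = [v for v, c in
             Counter(n.upper() for n in names if n.strip()).items()
             if c > 1]

print(missing, short, with_digits, compound, recurring)
```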
4. When do you build rules?
Focus on:
• Critical Data Elements (data quality dimensions)
• Policy-based conditions (e.g. regulatory
compliance)
• Correlated data conditions (e.g. If x, then y)
• Filtering and segmenting data (refining
evaluations; investigating root cause)
Build Rules for Defined Conditions
• Validate critical requirements within or
across data sources
• Build common rules that can be readily
tested and shared
• Evaluate and remediate issues
• Take action on incorrect data and defaults
• Create flags for subsequent use in marking
or remediating data
• Filter result sets and export for additional
use
Benefits of Business Rules
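A business rule can be expressed as a small named function, which makes it easy to test, share, and use for flagging. The sketch below shows one correlated condition ("if x, then y") and one validity rule; the order fields, rule names, and the dq_flags column are hypothetical.

```python
def shipped_has_ship_date(row):
    """If status is SHIPPED, then ship_date must be populated."""
    return row["status"] != "SHIPPED" or bool(row["ship_date"])

def total_non_negative(row):
    """Order totals must not be negative."""
    return row["total"] >= 0

RULES = {
    "shipped_has_ship_date": shipped_has_ship_date,
    "total_non_negative": total_non_negative,
}

orders = [
    {"order_id": 1, "status": "SHIPPED", "ship_date": "2019-03-01", "total": 40.0},
    {"order_id": 2, "status": "SHIPPED", "ship_date": None, "total": 15.0},
    {"order_id": 3, "status": "OPEN", "ship_date": None, "total": -5.0},
]

def evaluate(rows, rules):
    """Attach the names of failed rules to each row as flags."""
    results = []
    for row in rows:
        failed = [name for name, rule in rules.items() if not rule(row)]
        results.append({**row, "dq_flags": failed})
    return results

flagged = evaluate(orders, RULES)

# Filter the result set for remediation or export
exceptions = [r for r in flagged if r["dq_flags"]]
print([(r["order_id"], r["dq_flags"]) for r in exceptions])
```

Keeping the flags on the record, rather than dropping failed rows, supports the later steps on the slide: marking, remediating, and exporting.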
5. What should you communicate?
Culture of Data Literacy
• “Democratization of Data” requires cultural support
• Empowered to ask questions about the data
• Trained to understand and use data
• Trained to understand how to approach and evaluate data quality
• Traditional data, new data, machine learning requirements, …
• Understand the business context of the data
Program of Data Governance
• Provide the processes and practices necessary for success
• Measure, monitor, and improve
• Continuous iteration and development
Center of Excellence/Knowledge Base
• Where do you go to find answers?
• Who can help show you how?
Communicate!
• Annotate what you’ve found
• Identify the subject and add a description that is meaningful
• Utilize flags, tags, and other indicators to help others distinguish
types and severity of issues
• Integrate into data governance and BI tools for maximum visibility
Annotate Results with Findings
Summary
Evaluating Big Data
It is challenging to keep the end
goal in mind
• Data comes from multiple
disparate systems & sources
• The number of touchpoints for
policies and rules has grown
• There is a higher demand and
expectation for seeing data
quality in context.
• You need to assess and measure
the data content if you
5 Key Steps
• Remember the end goal – ask
questions, use best practices,
and establish scope & context
• Consider what criteria and
dimensions are needed
• Focus your attention based on
the type of data and the use case
• Build rules when necessary to
get laser-focused
• Determine what needs to be
communicated and delivered
Gaining insight and measurement of data quality is more critical than ever!
Unlock Efficiency With Your Address Data Today For a Smarter Tomorrow
Unlock Efficiency With Your Address Data Today For a Smarter TomorrowUnlock Efficiency With Your Address Data Today For a Smarter Tomorrow
Unlock Efficiency With Your Address Data Today For a Smarter TomorrowPrecisely
 
Navigating Cloud Trends in 2024 Webinar Deck
Navigating Cloud Trends in 2024 Webinar DeckNavigating Cloud Trends in 2024 Webinar Deck
Navigating Cloud Trends in 2024 Webinar DeckPrecisely
 
Mainframe Sort Operations: Gaining the Insights You Need for Peak Performance
Mainframe Sort Operations: Gaining the Insights You Need for Peak PerformanceMainframe Sort Operations: Gaining the Insights You Need for Peak Performance
Mainframe Sort Operations: Gaining the Insights You Need for Peak PerformancePrecisely
 
Preventing Downtime with Better IT Operations Management
Preventing Downtime with Better IT Operations ManagementPreventing Downtime with Better IT Operations Management
Preventing Downtime with Better IT Operations ManagementPrecisely
 

Data Profiling: The First Step to Big Data Quality

  • 1. Data Profiling: The First Step to Big Data Quality Harald Smith, Dir. Product Marketing
  • 2. Housekeeping Webcast Audio • Today’s webcast audio is streamed through your computer speakers. • If you need technical assistance with the web interface or audio, please reach out to us using the chat window. Questions Welcome • Submit your questions at any time during the presentation using the chat window. • Our team will reach out to you to answer them following the presentation. Recording and slides • This webcast is being recorded. You will receive an email following the webcast with a link to download both the recording and the slides.
  • 3. Speaker Harald Smith • Director of Product Marketing, Syncsort • 20+ years in Information Management with a focus on data quality, integration, and governance • Co-author of Patterns of Information Management • Author of two Redbooks on Information Governance and Data Integration • Blog author: “Data Democratized”
  • 4. Only 35% of senior executives have a high level of trust in the accuracy of their Big Data Analytics KPMG 2016 Global CEO Outlook 92% of executives are concerned about the negative impact of data and analytics on corporate reputation KPMG 2017 Global CEO Outlook 80% of AI/ML projects are stalling due to poor data quality Dimensional Research, 2019 Big Data Needs Data Quality “Societal trust in business is arguably at an all-time low and, in a world increasingly driven by data and technology, reputations and brands are ever harder to protect.” EY “Trust in Data and Why it Matters”, 2017. The importance of data quality in the enterprise: • Decision making • Customer centricity • Compliance • Machine learning & AI
  • 5. “The magic of machine learning is that you build a statistical model based on the most valid dataset for the domain of interest. If the data is junk, then you’ll be building a junk model that will not be able to do its job.” James Kobielus, SiliconANGLE Wikibon Lead Analyst for Data Science, Deep Learning, App Development, 2018
  • 6. Data Quality Challenges with Machine Learning Incorrect, Incomplete, Mis-Formatted, and Sparse “Dirty Data” – Mistakes and errors are almost never the patterns you’re looking for in a data set. Sparse data generates other issues. Correcting and standardizing will tend to boost the signal, but must account for bias. Missing context – Many data sources lack context around location or population segments. Unless enriched with other data sets, (e.g. geospatial, demographics, or firmographics data), some ML algorithms will not be usable. Multiple copies – If your data comes from many sources, as it often does, it may contain multiple records of information about the same person, company, product or other entity. Removing duplicates and enhancing the overall depth and accuracy of knowledge about a single entity can make a huge difference. Spurious correlations – Just as missing context may hinder some ML algorithms, inclusion of already correlated data (e.g. city and postal code) may result in overfitting of ML algorithms. Correcting data problems vastly increases a data set’s usefulness for machine learning. But data analysts may not be aware of specific data quality issues that must be addressed to support machine learning. Traditional data quality processes are an effective method to identify defects.
  • 7. Understanding Big Data Quality Data Profiling The set of analytical techniques that evaluate actual data content (vs. metadata) to provide a complete view of each data element in a data source. Provides summarized inferences, and details of value and pattern frequencies to quickly gain data insights. Business Rules The data quality or validation rules that help ensure that data is “fit for use” in its intended operational and decision- making contexts. Covers the accuracy, completeness, consistency, relevance, timeliness and validity of data.
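A minimal sketch of what "value and pattern frequencies" can look like in practice, using only the Python standard library. The pattern notation (`A` for letters, `9` for digits), the function names, and the sample postal codes are assumptions made for illustration; they are not taken from the webcast or from any particular profiling product.

```python
from collections import Counter


def to_pattern(value: str) -> str:
    """Map each character to a class symbol: A = letter, 9 = digit, other characters kept."""
    out = []
    for ch in value:
        if ch.isalpha():
            out.append("A")
        elif ch.isdigit():
            out.append("9")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Summarize one column: counts, value frequencies, and pattern frequencies."""
    non_null = [v for v in values if v is not None and v != ""]
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "value_freq": Counter(non_null),
        "pattern_freq": Counter(to_pattern(v) for v in non_null),
    }


# Hypothetical sample data: one value with a letter, one ZIP+4 value, one missing
postal_codes = ["10001", "94105", "1000A", None, "10001", "94105-1234"]
profile = profile_column(postal_codes)
for pattern, count in profile["pattern_freq"].most_common():
    print(pattern, count)
```

Rare patterns at the bottom of the frequency list are usually the first place to look for formatting defects.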
  • 8. Five Key Steps to effective Data Profiling These are not new, but good to reiterate in the context of Big Data: 1. How do you want to analyze the data? 2. What should you review? (there's a lot of stuff) 3. What should you look for? (based on data “type”) 4. When should you build rules? (laser-focus; CDEs) 5. What needs to be communicated?
  • 9. 1. How do you want to analyze the data?
  • 10. Universal DQ best practices: Understand the End Goal • How does the business intend to use the data (i.e. what’s the use case)? • Empower users (“Who”) to gain new clarity into the core problem (“Why”) • What will the data be used for? • What defines the Fitness for your Purpose? Establish Scope • Ask the “right questions” about the use case and the data (not just “what” and “how”) • What data is relevant to the effort? • Big Data or other, you need to set boundaries for the work Understand Context • How does the business define the data? • What are the important characteristics and context of the data? • What are the Critical Data Elements? • What qualities will you need to address, or leave alone? • “High-quality data” definition will vary by business problem “If you don’t know what you want to get out of the data, how can you know what data you need – and what insight you’re looking for?” Wolf Ruzicka, Chairman of the Board at EastBanc Technologies, Blog post: June 1, 2017, “Grow A Data Tree Out Of The “Big Data” Swamp”
  • 11. “Never lead with a data set; lead with a question.” Anthony Scriffignano, Chief Data Scientist, Dun & Bradstreet Forbes Insights, May 31, 2017, “The Data Differentiator”
  • 12. To Sample or not to Sample? Sampling helps with: • Data Integration • Source-to-target mapping • Data Modeling • Discovering Correlations When the focus is on the structure of the data ❖ REMEMBER: your target is a statistically valid sample! ❖ ~16k records gives you 99% confidence with a margin of error of 1% for 100B records ❖ ~66k records gives you 99% confidence with a margin of error of 0.5% for the same 100B records Full Volume needed with: • Data Quality • Data Governance • Regulatory Compliance • Finding Outliers and Issues with Content • “Needles in the haystack” When the focus is on the quality of or risks within the data ❖ Focus on critical data elements and leverage tools that scale to data volume
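The sample sizes quoted on this slide can be reproduced approximately with the standard sample-size formula for a proportion, plus a finite population correction. The sketch below assumes the most conservative proportion (p = 0.5) and simple random sampling; the slide does not state how its figures were derived, so treat this as one plausible derivation, not the presenter's method.

```python
import math
from statistics import NormalDist
from typing import Optional


def sample_size(confidence: float, margin: float,
                population: Optional[int] = None, p: float = 0.5) -> int:
    """Records needed to estimate a proportion within +/- margin at the given confidence."""
    # Two-sided z-score for the confidence level (about 2.576 for 99%)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction; negligible when the population is huge
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


population = 100_000_000_000  # 100B records, as on the slide
print(sample_size(0.99, 0.01, population))   # roughly 16.6k records
print(sample_size(0.99, 0.005, population))  # roughly 66.3k records
```

Halving the margin of error roughly quadruples the sample, while the population size barely matters at this scale, which is why a sample in the tens of thousands can describe the structure of billions of records.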
  • 13. Big Data at scale distributes data across many nodes – not necessarily with other relevant data! • Processing routines must apply same approach and logic each time • Implications for profiling, joining, sorting, and matching data, whether for enrichment, verification against trusted sources, or a consolidated single view Data Quality functions must be performed in a consistent manner, no matter where actual processing takes place, how the data is segmented, and what the data volume is. • Data quality cleansing and preparation routines have to be reproduced at scale, both to get the data ready to train machine learning models, and to comply with business regulations. • Critical to establishing, building, and maintaining trust Scaling Data Quality best practices: Consistent processing at scale Source: HP Analyst Briefing
  • 14. 2. What do you want to review?
  • 15. Common Data Quality Measurements What measures can we take advantage of? 1. Completeness – Are the relevant fields populated? 2. Integrity – Does the data maintain an internal structural integrity or a relational integrity across sources? 3. Uniqueness – Are keys or records unique? 4. Validity – Does the data have the correct values? • Code and reference values • Valid ranges • Valid value combinations 5. Consistency – Is the data at consistent levels of aggregation or does it have consistent valid values over time? 6. Timeliness – Did the data arrive in a time period that makes it useful or usable?
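Three of these measurements (completeness, uniqueness, validity) reduce to simple ratios over a field. The sketch below shows one way to compute them in plain Python; the record layout, the field names `customer_id` and `status`, and the allowed status codes are hypothetical, and the exact definitions (for example, whether uniqueness is measured against populated values only) are assumptions that a real program would need to agree on.

```python
def _populated(records, field):
    """Values of the field that are neither None nor blank."""
    return [r.get(field) for r in records if r.get(field) not in (None, "")]


def completeness(records, field):
    """Share of records in which the field is populated."""
    return len(_populated(records, field)) / len(records) if records else 0.0


def uniqueness(records, field):
    """Distinct populated values divided by populated values (1.0 means all unique)."""
    values = _populated(records, field)
    return len(set(values)) / len(values) if values else 0.0


def validity(records, field, allowed):
    """Share of populated values that belong to the allowed reference set."""
    values = _populated(records, field)
    return sum(1 for v in values if v in allowed) / len(values) if values else 0.0


# Hypothetical records: one duplicate key, one missing key, one unknown status code
records = [
    {"customer_id": "C001", "status": "A"},
    {"customer_id": "C002", "status": "X"},
    {"customer_id": "C002", "status": "I"},
    {"customer_id": None, "status": ""},
]

print(completeness(records, "customer_id"))
print(uniqueness(records, "customer_id"))
print(validity(records, "status", allowed={"A", "I"}))
```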
  • 16. New data, new data quality challenges • 3rd Party and external data with unknown provenance or relevance • Bias in the data – whether in collection, extraction, or other processing • Data without standardized structure or formatting • Continuously streaming data • Disjointed data (e.g. gaps in receipt) • Consistency and verification of data sources • Changes and transformation applied to data (i.e. does it really represent the original input) New Data Quality Problems “34 percent of bankers in our survey report that their organization has been the target of adversarial AI at least once, and 78 percent believe automated systems create new risks, such as fake data, external data manipulation, and inherent bias.” Accenture Banking Technology Vision 2018
  • 17. • Contextual visualizations • Value and pattern distributions • Attribute summaries and metadata • Sort and filter to quickly find data of interest • Detail drilldowns to any content Let Data Profiling guide you
  • 18. 3. What should you look for?
  • 19. Common Data Types What variances do you need awareness of? 1. Identifiers – data that uniquely identifies something 2. Indicators – data that flags a specific condition 3. Dates – data that identifies a point in time 4. Quantities – data that identifies an amount or value of something 5. Codes – data that segments other data 6. Text – data that describes or names something
  • 20. Identifiers Use cases: • Business Operations • 360 View of Entity • BI Reporting (incl. EDW) • Analytics • AI/ML Examples: • Customer ID • National ID / Passport # • Social Security # / Tax ID • Product ID What to look for: • 100% Complete • All Unique values • Anomalous patterns • Numeric vs. String Notes: • Needs full volume assessment
  • 21. Indicators (aka Flags) Use cases: • Business Operations • 360 View of Entity • BI Reporting (incl. EDW) • Governance and Compliance • Analytics • AI/ML Examples: • True / False (or T/F) • Yes / No (or Y/N) • 1 / 0 What to look for: • Binary Values only • Consistent pattern • No mixing of “Y” vs “YES” • If NULL occurs, it must be one of the binary values • Skews in frequency distributions Notes: • May need segmentation, filtering, or grouping via business rules to resolve or clarify discrepancies • Often are triggers for other conditions – look for use in business rules, but likely occur downstream
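The indicator checks listed above (binary values only, no mixing of representations, NULL handling, frequency skew) can be sketched as a single pass over the column. The list of accepted representation pairs and the choice to compare case-insensitively are assumptions for this example; adjust both to the conventions of the source system.

```python
from collections import Counter

# Hypothetical set of accepted representations; a column should use exactly one pair
KNOWN_PAIRS = [{"Y", "N"}, {"YES", "NO"}, {"T", "F"}, {"TRUE", "FALSE"}, {"1", "0"}]


def check_indicator(values):
    """Report null count, value frequencies, and whether one representation is used."""
    cleaned = [str(v).strip().upper() for v in values
               if v is not None and str(v).strip() != ""]
    freq = Counter(cleaned)
    observed = set(freq)
    consistent = bool(observed) and any(observed <= pair for pair in KNOWN_PAIRS)
    return {
        "null_count": len(values) - len(cleaned),
        "frequencies": freq,
        "single_representation": consistent,
    }


mixed = check_indicator(["Y", "N", "YES", "Y", None, "N"])
clean = check_indicator([1, 0, 1, 1])
print(mixed["single_representation"], mixed["null_count"])
print(clean["single_representation"], clean["frequencies"])
```

The frequency counts also expose skew: a flag that is "1" in nearly every record may be a default rather than a recorded fact.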
  • 22. Codes Use cases: • Business Operations • 360 View of Entity • BI Reporting (incl. EDW) • Governance and Compliance • Analytics • AI/ML Examples: • Account Status • Credit Rating • Diagnosis/Procedure Codes • Order Status • Postal Code What to look for: • Expected values • Consistent patterns • No mixing of “A” vs “active” • NULL values • Skews in frequency distributions Notes: • May need segmentation, filtering, or grouping via business rules to resolve or clarify discrepancies • Often are triggers for or from other conditions – look for use in business rules • May correlate to other fields
  • 23. Dates Use cases: • Business Operations • BI Reporting (incl. EDW) • Governance and Compliance • Analytics • AI/ML Examples: • Birth Date • Departure Date • Order Date • Shipping Date • Timestamp What to look for: • Skews in frequency distributions • E.g. 01/01/2001 • Anomalous patterns • Numeric vs. String • Unusual values • Missing values and gaps Notes: • May need segmentation, filtering, or grouping via business rules to resolve or clarify
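A small sketch of the date checks above: it separates missing values, values that do not parse in the expected format (including numeric-looking strings), and dates that occur suspiciously often, such as a default like 01/01/2001. The expected format `%m/%d/%Y` and the 50% skew threshold are assumptions chosen for the example.

```python
from collections import Counter
from datetime import datetime


def profile_dates(values, fmt="%m/%d/%Y", skew_threshold=0.5):
    """Split a date column into missing, unparseable, and over-represented values."""
    parsed, unparsed, missing = [], [], 0
    for v in values:
        if v is None or v == "":
            missing += 1
            continue
        try:
            parsed.append(datetime.strptime(v, fmt).date())
        except (ValueError, TypeError):
            # Wrong format, impossible date, or a non-string value
            unparsed.append(v)
    freq = Counter(parsed)
    suspects = [d for d, n in freq.items()
                if n > 1 and n / len(parsed) >= skew_threshold]
    return {"missing": missing, "unparsed": unparsed, "suspect_defaults": suspects}


# Hypothetical column with a repeated default, a numeric-style date, and an impossible date
dates = ["01/01/2001", "03/15/2019", "01/01/2001", "20190316",
         "", "01/01/2001", "07/04/2020", "02/30/2020"]
result = profile_dates(dates)
print(result)
```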
  • 24. Quantities Use cases: • Business Operations • BI Reporting (incl. EDW) • Governance and Compliance • Analytics • AI/ML Examples: • Amount (e.g. item count, amount due) • Price • Sales • Total (e.g. order total) What to look for: • Skews in frequency distributions • Anomalous patterns • Excessively high (or low) values Notes: • May need segmentation, filtering, or grouping via business rules to resolve or clarify
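One common way to surface "excessively high (or low) values" is Tukey's interquartile-range fences. This is a technique substituted here for illustration; the slide does not prescribe a method, and the multiplier of 1.5 and the sample amounts are assumptions.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical order quantities with one extreme value and one negative value
amounts = [10, 12, 11, 13, 12, 11, 10, 14, 12, 5000, -3]
outliers = iqr_outliers(amounts)
print(sorted(outliers))
```

Because outliers are, by definition, rare, this kind of check is one of the cases where the full data volume is needed and a sample may miss the problem records.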
  • 25. Text Use cases: • Business Operations • Building blocks for other identifiers! • 360 View of Entity • Governance and Compliance • Analytics • AI/ML Examples: • Name • Address • Product Description • Claim Description What to look for: • Missing Values • Frequency of patterns / Anomalous patterns • Existence of numerics • Values <= 5 characters • Compound values • Unusual, recurring values • “Do not use” Notes: • Look for correlations with Code values that indicate specific conditions (e.g. values used for testing purposes)
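The text checks above can be expressed as per-value flags that are then counted across the column. In the sketch below, the placeholder list and the flag names are hypothetical; only the checks themselves (missing values, values of five characters or fewer, presence of numerics, recurring values such as "Do not use") come from the slide.

```python
from collections import Counter

# Hypothetical placeholder values; build the real list from frequent values found in profiling
PLACEHOLDERS = {"do not use", "test", "n/a", "unknown"}


def text_flags(value):
    """Return the list of profiling flags raised by a single text value."""
    if value is None or not value.strip():
        return ["missing"]
    text = value.strip()
    flags = []
    if len(text) <= 5:
        flags.append("short")
    if any(ch.isdigit() for ch in text):
        flags.append("has_digits")
    if text.lower() in PLACEHOLDERS:
        flags.append("placeholder")
    return flags


names = ["Maria Lopez", "DO NOT USE", "Bob2", None, "test", "Do not use"]
flag_counts = Counter(flag for name in names for flag in text_flags(name))
print(flag_counts)
```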
  • 26. 4. When do you build rules?
  • 27. Focus on: • Critical Data Elements (data quality dimensions) • Policy-based conditions (e.g. regulatory compliance) • Correlated data conditions (e.g. If x, then y) • Filtering and segmenting data (refining evaluations; investigating root cause) Build Rules for Defined Conditions
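A correlated data condition of the form "If x, then y" can be sketched as a small reusable rule. The field names (`order_status`, `shipping_date`) and the rule itself are hypothetical examples, and the helper names are not from any specific rules engine.

```python
def if_then(condition, requirement):
    """Build a rule that passes unless the condition holds and the requirement fails."""
    def rule(record):
        return (not condition(record)) or requirement(record)
    return rule


def evaluate(records, rule):
    """Return the records that violate the rule."""
    return [r for r in records if not rule(r)]


# Hypothetical rule: a shipped order must carry a shipping date
shipped_has_date = if_then(
    condition=lambda r: r.get("order_status") == "SHIPPED",
    requirement=lambda r: bool(r.get("shipping_date")),
)

orders = [
    {"order_id": 1, "order_status": "SHIPPED", "shipping_date": "2019-03-15"},
    {"order_id": 2, "order_status": "SHIPPED", "shipping_date": None},
    {"order_id": 3, "order_status": "OPEN", "shipping_date": None},
]
failures = evaluate(orders, shipped_has_date)
print([r["order_id"] for r in failures])
```

Keeping the condition and the requirement as separate functions makes a rule easy to test on its own and to share across data sources.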
  • 28. • Validate critical requirements within or across data sources • Build common rules that can be readily tested and shared • Evaluate and remediate issues • Take action on incorrect data and defaults • Create flags for subsequent use in marking or remediating data • Filter result sets and export for additional use Benefits of Business Rules
  • 29. 5. What should you communicate?
  • 30. Culture of Data Literacy • “Democratization of Data” requires cultural support • Empowered to ask questions about the data • Trained to understand and use data • Trained to understand how to approach and evaluate data quality • Traditional data, new data, machine learning requirements, … • Understand the business context of the data Program of Data Governance • Provide the processes and practices necessary for success • Measure, monitor, and improve • Continuous iteration and development Center of Excellence/Knowledge Base • Where do you go to find answers? • Who can help show you how? Communicate!
  • 31. • Annotate what you’ve found • Identify the subject and add a description that is meaningful • Utilize flags, tags, and other indicators to help others distinguish types and severity of issues • Integrate into data governance and BI tools for maximum visibility Annotate Results with Findings
  • 32. Summary Evaluating Big Data It is challenging to keep the end goal in mind • Data comes from multiple disparate systems & sources • The number of touchpoints for policies and rules has grown • There is a higher demand and expectation for seeing data quality in context. • You need to assess and measure the data content if you 5 Key Steps • Remember the end goal – ask questions, use best practices, and establish scope & context • Consider what criteria and dimensions are needed • Focus your attention based on the type of data and the use case • Build rules when necessary to get laser-focused • Determine what needs to be communicated and delivered Gaining insight and measurement of data quality is more critical than ever!