These are the slides from my 2013 SQL Saturday presentations in Mountain View and Sacramento. I suggest you view the (newer) videos, as they cover all that material and more. However, here is the session description these slides cover:
A recent survey by InformationWeek found that data quality is the greatest barrier to BI adoption in enterprises. MDS addresses this challenge with modeling, validation, alerting and security capabilities. In this presentation, you will learn how to use MDS to model your data to ensure correctness, update it with changes from your ERP, and create workflows with notifications. Next you will learn the capabilities of DQS and see how it addresses data standardization, completeness and other challenges. You will then see how to use them together to enable Enterprise Information Management. BI professionals will come away knowing how to use tools that address the greatest risk to the success of BI projects: data quality.
2. Mark Gschwind
Independent Consultant
Business Intelligence practitioner, manager since 1995
Over 50 BI projects
Data Warehousing/Cubing/Reporting/Data Mining/EIM
MCP, certified in Oracle Essbase, Melissa Data MVP
Working with clients on EIM since 2008
mark@gschwindconsulting.com
find me on
www.linkedin.com/in/markgschwind
Blog Site:
www.marksbiblog.com
3. Agenda
Enterprise Information Management (EIM)
What is it and why do we need it?
Microsoft EIM, 3 technologies working together
DQS
• Capabilities
• Demo
SSIS
MDS
• Capabilities
• Demo
EIM=DQS+MDS+SSIS
Wrap up
Questions
8. DQS: What is Data Quality?
Data Quality represents the degree to which the
data is suitable for business usages
Data Quality is built through People + Processes +
Technology
Bad Data → Bad Business
“Poor data quality can cost companies 15%
to 25% (or more) of their operating budget”
- Larry English (International Data Quality Expert)
9. Common Data Quality Issues
Data Quality | Issue | Sample Data Problem
Standard | Are data elements consistently defined and understood? | Gender code = M, F, U in one system and Gender code = 0, 1, 2 in another system
Complete | Is all necessary data present? | 20% of customers' last names are blank, 50% of zip codes are 99999
Accurate | Does the data accurately represent reality or a verifiable source? | A Supplier is listed as 'Active' but went out of business six years ago
Valid | Do data values fall within acceptable ranges? | Salary values should be between 60,000-120,000
Unique | Data appears several times | Both John Ryan and Jack Ryan appear in the system. Are they the same person?
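As a rough illustration of how the Standard, Complete and Valid checks in the table above could be expressed in code, here is a minimal Python sketch. It is not how DQS implements them. The `Customer` fields, the `GENDER_MAP` code mapping and the placeholder zip list are assumptions made for the example; only the 60,000-120,000 salary range and the 99999 zip code come from the slide. Accurate and Unique are left out because they need an external reference source or record matching.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Customer:
    # Hypothetical record layout, for illustration only
    last_name: Optional[str]
    zip_code: Optional[str]
    gender: Optional[str]
    salary: Optional[float]


# Assumed mapping from one system's numeric codes to the other's letters
GENDER_MAP = {"0": "M", "1": "F", "2": "U"}
VALID_GENDERS = {"M", "F", "U"}
PLACEHOLDER_ZIPS = {"99999"}
SALARY_RANGE = (60_000, 120_000)


def check(record: Customer) -> List[str]:
    """Return a list of data quality issues found in one record."""
    issues = []
    # Complete: required values must be present and not placeholders
    if not (record.last_name or "").strip():
        issues.append("Complete: last name is blank")
    if record.zip_code in PLACEHOLDER_ZIPS:
        issues.append("Complete: zip code is a placeholder value")
    # Standard: conform both coding schemes to one, then validate
    gender = GENDER_MAP.get(record.gender, record.gender)
    if gender not in VALID_GENDERS:
        issues.append("Standard: unrecognized gender code")
    # Valid: value must fall within the acceptable range
    low, high = SALARY_RANGE
    if record.salary is not None and not low <= record.salary <= high:
        issues.append("Valid: salary outside acceptable range")
    return issues
```

Calling `check(Customer("", "99999", "X", 50000))` reports two Complete issues, one Standard issue and one Valid issue, while a clean record returns an empty list.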
10. Common Issues DQS Addresses
Cleansing (Completeness, Accuracy, Conformity, Consistency)

Before
Name | Gender | Street | House # | Zip code | City | State | D.O.B
John Doe | Male | 60th street | 45 | | New York | New York | 08/12/64
Jane Doe | Male | Jonathan ln | 36 | 10023 | Poughkeepsy | NY | 21-dec-1954

After
Name | Gender | Street | House # | Zip code | City | State | D.O.B
John Doe | Male | E 60th St | 45W | 10022 | New York | NY | 08/12/64
Jane Doe | Female | Jonathan Lane | 36 | 10023 | Poughkeepsie | NY | 12/21/54

Matching (Uniqueness)

Before
Name | Address | Postal Code | City | State
John Smith | 545 S Valley View Drive # 136 | 34563 | Anytown | New York
Margaret & John smith | 545 Valley View ave unit 136 | 34563-2341 | Anytown | New York
Maggie Smith | 545 S Valley View Dr | | Anytown | New York
John Smith | 545 Valley Drive St. | 34253 | NY | NY

After
Name | Address | Zip Code | City | State | Cluster
John Smith | 545 S Valley View Drive # 136 | 34563 | Anytown | New York | 1
Margaret & John smith | 545 Valley View ave unit 136 | 34563-2341 | Anytown | New York | 1
Maggie Smith | 545 S Valley View Dr | | Anytown | New York | 1
John Smith | 545 Valley Drive St. | 34253 | NY | NY | 2
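To make the matching example above more concrete, the following Python sketch groups the four Smith records into clusters using a weighted similarity score. This is only an illustration of the idea, not the DQS matching algorithm: the noise-word list, the 0.7/0.3 weights and the 0.8 threshold are all assumptions chosen for this example.

```python
import re

# Assumed noise words: directionals, unit markers and street types
NOISE = {"n", "s", "e", "w", "unit", "apt", "dr", "drive",
         "ave", "avenue", "st", "street"}

RECORDS = [
    {"name": "John Smith", "address": "545 S Valley View Drive # 136", "city": "Anytown"},
    {"name": "Margaret & John smith", "address": "545 Valley View ave unit 136", "city": "Anytown"},
    {"name": "Maggie Smith", "address": "545 S Valley View Dr", "city": "Anytown"},
    {"name": "John Smith", "address": "545 Valley Drive St.", "city": "NY"},
]


def address_tokens(address):
    """Lowercase the address, split into words and drop noise words."""
    words = re.findall(r"[a-z0-9]+", address.lower())
    return {w for w in words if w not in NOISE}


def similarity(a, b):
    """Weighted score: address token overlap (Jaccard) plus exact city match."""
    ta, tb = address_tokens(a["address"]), address_tokens(b["address"])
    union = ta | tb
    jaccard = len(ta & tb) / len(union) if union else 0.0
    same_city = a["city"].strip().lower() == b["city"].strip().lower()
    return 0.7 * jaccard + 0.3 * (1.0 if same_city else 0.0)


def assign_clusters(records, threshold=0.8):
    """Greedy clustering: join the first cluster with a close-enough member."""
    clusters = []  # each cluster is a list of records
    ids = []       # cluster number assigned to each input record
    for rec in records:
        for number, members in enumerate(clusters, start=1):
            if any(similarity(rec, m) >= threshold for m in members):
                members.append(rec)
                ids.append(number)
                break
        else:
            clusters.append([rec])
            ids.append(len(clusters))
    return ids
```

With these assumed weights, `assign_clusters(RECORDS)` puts the first three records in cluster 1 and the last one in cluster 2, mirroring the Cluster column above. In practice the weights and threshold would need tuning against a sample of known matches.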
11. DQS Use Cases
• One-Time cleanups
o Merge/Migrate multiple divisional CRMs into one
• Continuous Process with Steward Intervention
o Vendor master with continuous trickle of data
o Customer master with incomplete data
• Continuous Process with Minimal Intervention
o Database marketing mailing list
15. MDS: What is Master Data?
Continuous quality management
Ease of use for business users (not just IT)
Effective sharing (producing and consuming)
Centralized maintenance, by different departments
Changes that keep pace with the business
Master Data contains different attributes for
different departments
(marketing, finance, operations, business
groups…)
The challenge: To make a trusted single source
of business data used across multiple
systems, applications, and processes
16. MDS Use Cases
Operational Data Management
• Central data records mgmt and consumption sourced by other operational systems
• A company has adopted 6 “best of breed” systems from different vendors. They need to be able to propagate the correct customer information to each system in a consistent way.
• MDS provides a platform for central schema, integration points and validation for SI/ISV/Internal IT to develop a custom solution
Data Warehouse / Data Marts Mgmt
• Enable business users to manage the dimensions and hierarchies of DW / Data Marts
• The IT department has built a data warehouse and reporting platform, but business users complain about the correctness of the dimensions and lack of agility in making updates.
• MDS empowers the business users to manage dimensions themselves while IT can govern the changes
Regulatory
• Enable security management and auditing of data used for regulatory reporting
• There are 3 G/L systems whose G/L accounts need to be consolidated and rolled up to create financial statements for regulatory reporting to several countries
• MDS enables an approval process for changes with role-based security and transactional auditing of all changes
18. MDS Capabilities
Master Data Stewardship
• Excel Add-In
• Web UI
Capabilities
• Modeling: Entities, Attributes, Hierarchies
• Validation: Authoring business rules to ensure data correctness
• Versioning
• Role-based Security and Transaction Annotation
• Workflow / Notifications
• Data Matching (DQS Integrated)
Enabling Integration & Sharing
• Loading batched data through Staging Tables
• Consuming data through Views
• Registering to changes through APIs
• Connected systems: External (CRM, ..), Excel, DWH
19. MDS Architecture
[Architecture diagram; components shown:]
• MDS Database: Entity Based Staging Tables, Subscription Views
• IIS Service, MDS Service (WCF)
• Excel Add-In, Web UI (Silverlight)
• Workflow / Notifications
• Cleansing and Matching (DQS)
• SSIS
• External systems: CRM/ERP, DWH, BI (OLAP, Excel, PowerPivot)
• BizTalk / Others
21. Business Rules
Business Rules are expressions and actions that
can govern the conduct of business processes*
Enable data governance by:
-- Enforcing data standards
-- Alerting users to data quality issues
-- Creating simple workflows
Have limitations, but can be extended
*EIM = DQS+MDS+SSIS+People+Process
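The "expressions and actions" idea can be sketched in a few lines of Python. This is a conceptual illustration only and does not use or mirror the MDS business rule engine. The `BusinessRule` class, the attribute names (`Country`, `ListPrice`) and the messages are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Member = Dict[str, object]  # one master data record, keyed by attribute name


@dataclass
class BusinessRule:
    name: str
    condition: Callable[[Member], bool]  # the IF part (expression)
    action: Callable[[Member], str]      # the THEN part; returns a message


def default_country(member: Member) -> str:
    # Enforce a data standard by filling in a default value
    member["Country"] = "US"
    return "Country defaulted to US"


def flag_missing_price(member: Member) -> str:
    # Alert a user to a data quality issue instead of changing the data
    return f"Invalid: {member['Name']} has no list price; notify data steward"


RULES = [
    BusinessRule("Default country", lambda m: not m.get("Country"), default_country),
    BusinessRule("Price required", lambda m: m.get("ListPrice") is None, flag_missing_price),
]


def apply_rules(member: Member, rules: List[BusinessRule]) -> List[str]:
    """Run each rule whose condition holds and collect the resulting messages."""
    return [rule.action(member) for rule in rules if rule.condition(member)]
```

Applying `RULES` to `{"Name": "Road Bike", "Country": "", "ListPrice": None}` fills in the country and returns one message per rule that fired. Routing the second message to an approver is the kind of simple workflow the slide refers to.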
22. Security
Functional area permissions
Model/Entity level permissions provide column-level security
Hierarchy permissions allow row-level security
Use AD groups, not individual users
Only use Hierarchy permissions if row-level
security is required
24. Key Takeaways
SQL Server has tools to address EIM, the biggest
impediment to BI success
EIM is People + Processes enabled by Technology
Editor's Notes
Working with these EIM technologies for 5 years, 7 implementations.
How many people are using MDS or DQS? How many people are using something else for MDM? Need to start with a little background…
http://reports.informationweek.com/cart/index/downloadasset/id/8574 “2013 Analytics & Information Management Trends” (in 2012 it was “2012 BI and Information Management Trends”). Data quality was the top barrier in 2011 as well.
Today I will show you 3 tools that address these top 3 impediments to success
Microsoft has 3 tools that work together to address these challenges. These technologies + People + Processes is the MSFT strategy to produce accurate, trustworthy data. MDS appeared in 2008 R2 (acquired Stratature), DQS in 2012 (acquired Zoomix). Integration of these products is a work-in-progress.
Data Quality is kind of like doing the dishes; a lot of work you don’t get much credit for
Larry English claims that “Poor data quality can cost companies 15% to 25% (or more) of their operating budget”. Good discussion on the cost of bad data is here: http://dataqualitybook.com/?p=300
<skip>
Now, how to address DQ use cases
DQS is a knowledge-driven data quality solution, i.e. you must know some things about your data in order to cleanse it: rules to identify valid values, lists of valid values, etc. Create a process to continually improve the KB. Reference data from the Azure Marketplace.
Transition: from a “Continuous Process with Steward Intervention” use case to “Continuous Process with Minimal Intervention”. Map values to a KB + domains in DQS; can do a conditional split on bad values, etc.
Transition: we’ve gone through 2 legs of EIM (DQS and SSIS), now the 3rd leg, MDS… Most of us know what master data is, but stating some things about it will help frame our discussion. Because of its importance, it can be in the center of many business processes and hence must be effectively shared for both producing and consuming. What MDS does is enable these different groups to bring their objects together so they can be cared for centrally. Once an organization has this, it can be used in a number of scenarios.
Explaining by saying where it ends up
Let’s talk about MDS’s capabilities for addressing these use cases. In the center we have our data steward, who uses the MDS web UI and Excel add-in to continuously maintain data quality. Modeling an enterprise’s master data objects is a capability brought to the data stewardship process, as well as… DQS: some integration, won’t be showing tonight. Data Quality Services was acquired from Zoomix in 2008. MDS was acquired from Stratature in 2007.
Now let’s talk about the underlying technologies supporting these capabilities. A requirement for any MDM system these days is that it has to be SOAP-enabled, to interact with ERPs like SAP and Oracle. The Windows Communication Foundation (WCF) is an application programming interface (API) in the .NET Framework for building connected, service-oriented applications. The Excel add-in communicates through WCF; the Web UI uses Silverlight 5 (new in 2012, and it enhances performance). BizTalk allows organizations to more easily connect disparate systems with over 25 multi-platform adapters and a robust messaging infrastructure. External systems can interact with MDS either through WCF to the MDS service, or more directly with SQL tables. Mention the database can be SQL 2008 or SQL 2012.
DEMOS TO DO: TileSample
Slide Goal: Review what was said. These technologies + People + Processes is the MSFT strategy to produce accurate, trustworthy data.