Integrating data across systems has been a perpetual challenge. Unfortunately, current technology-focused solutions have not helped IT improve its dismal project success statistics. Data warehouses, BI implementations, and general analytical efforts achieve the same levels of success as other IT projects: approximately one third are considered successes when measured against price, schedule, or functionality objectives. The first step is determining the appropriate analysis approach to the data system integration challenge. The second step is understanding the strengths and weaknesses of the various approaches; proper analysis at this stage makes the eventual technology selection far more accurate. Only when these two steps are accomplished can the third step, properly matching the problem to the available capabilities, be achieved and true business value be delivered. This webinar will illustrate that good systems development frequently depends on at least three data management disciplines to provide a solid foundation.
Takeaways:
• Data system integration challenge analysis
• Understanding a range of data system integration technologies, including the problem space (BI, Analytics, Big Data), data (Warehousing, Vault, Cube), and alternative approaches (Virtualization, Linked Data, Portals, Meta-models)
• Understanding foundational data warehousing & BI concepts based on the Data Management Body of Knowledge (DMBOK)
• How to utilize data warehousing & BI in support of business strategy
2. Premise
Copyright 2014 by Data Blueprint
Two types of listeners …
1. Interested in how to approach the subject of warehousing data
– Need to integrate disparate data
– Need a more holistic view of business operations
– Management just discovered data warehouses and wants you to "build one"
2. Have complex and/or messy data warehouse practices
– Want to improve them
3. Data Warehousing Strategies
1. Warehousing data in the context of data management
2. Motivation for integration technologies (reporting->BI->Analytics)
3. Warehouse integration technologies
4. Three warehousing architecture foci
5. The use of meta models
6. Guiding principles & best practices
5. You can accomplish Advanced Data Practices without becoming proficient in the Foundational Data Management Practices; however, this will:
• Take longer
• Cost more
• Deliver less
• Present greater risk
(with thanks to Tom DeMarco)
Data Management Practices Hierarchy
Advanced Data Practices
• MDM
• Mining
• Big Data
• Analytics
• Warehousing
• SOA
Foundational Data Management Practices
• Data Platform/Architecture
• Data Governance
• Data Quality
• Data Operations
• Data Management Strategy
(Figure labels: Technologies, Capabilities)
6. What is data management?
Figure: Sources → Data Governance, Data Engineering, Data Storage, Data Delivery (Specialized Team Skills) → Uses/Reuses
Understanding the current and future data needs of an enterprise and making that data effective and efficient in supporting business activities
Aiken, P., Allen, M. D., Parker, B., Mattia, A., "Measuring Data Management's Maturity: A Community's Self-Assessment," IEEE Computer (research feature), April 2007
Data management practices connect data sources and uses in an organized and efficient manner:
• Storage
• Engineering
• Delivery
• Governance
When executed, engineering, storage, and delivery implement governance.
Note: the figure does not depict data reuse well.
7. DMM℠ Structure of 5 Integrated DM Practice Areas
• Maintain fit-for-purpose data, efficiently and effectively
• Manage data coherently
• Manage data assets professionally
• Data architecture implementation
• Data engineering implementation
• Organizational support
9. DAMA DM BoK & CDMP
• Data Management Body of Knowledge (DMBOK)
– Published by DAMA International, the professional association for data managers (40 chapters worldwide)
– Organized around the primary data management functions, which focus on delivering data to the organization, and several environmental elements
• Certified Data Management Professional (CDMP)
– Series of 3 exams by DAMA International and ICCP
– Membership in a distinct group of fellow professionals
– Recognition for specialized knowledge in a choice of 17 specialty areas
– For more information, please visit:
• www.dama.org, www.iccp.org
10. Data Warehousing & Business Intelligence Management
11. Warehousing data in the context of data management
This section assumes you have:
• An overarching data strategy
• A strategy for becoming familiar with "big data technologies"
• Decided not to make needed data available (by integrating or storing it)
• Decided to increase (or decrease) the complexity of existing DM practices
• Decided to learn more about this DM BoK slice
Figure: the data management framework from slide 6 (Sources → Data Governance, Data Engineering, Data Storage, Data Delivery, Specialized Team Skills → Uses/Reuses)
13. Typical System Evolution
Multiple sources of (for example) customer data:
• Payroll Application (3rd GL) → Payroll Data (database)
• R & D Applications (researcher supported, no documentation) → R & D Data (raw)
• Mfg. Applications (contractor supported) → Mfg. Data (home-grown database)
• Finance Application (3rd GL, batch system, no source) → Finance Data (indexed)
• Marketing Application (4th GL, query facilities, no reporting, very large) → Marketing Data (external database)
• Personnel App. (20 years old, un-normalized data) → Personnel Data (database)
14. ... Then Integrate
Figure: Payroll Data (database), R & D Data (raw), Mfg. Data (home-grown database), Finance Data (indexed), Marketing Data (external database), and Personnel Data (database) integrated into Organizational Data
15. ... Then Re-architect
Figure: the same six data stores (Payroll, R & D, Mfg., Finance, Marketing, Personnel) re-architected around Organizational Data
16. An organization's integration needs ...
Figure: Software Packages 1 through 6 connected through a shared Data Architecture
... map between and across software packages
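One way to read "map between and across software packages" is a canonical (hub-and-spoke) data model: each package is mapped once to a shared vocabulary instead of to every other package. The sketch below assumes that reading; the package names, field names, and the `CANONICAL_MAPS` structure are hypothetical.

```python
# Hypothetical field mappings: package-specific field name -> canonical field name.
CANONICAL_MAPS = {
    "payroll": {"EMP_NO": "person_id", "EMP_NAME": "full_name"},
    "personnel": {"pers_id": "person_id", "name": "full_name", "dept": "department"},
}


def to_canonical(package: str, record: dict) -> dict:
    """Rename a package's fields into the shared (canonical) vocabulary, dropping unmapped ones."""
    mapping = CANONICAL_MAPS[package]
    return {mapping[field]: value for field, value in record.items() if field in mapping}


def from_canonical(package: str, record: dict) -> dict:
    """Rename canonical fields back into a package's own field names."""
    reverse = {canonical: local for local, canonical in CANONICAL_MAPS[package].items()}
    return {reverse[field]: value for field, value in record.items() if field in reverse}


def translate(source: str, target: str, record: dict) -> dict:
    """Move a record from one package to another by way of the canonical model."""
    return from_canonical(target, to_canonical(source, record))


def mappings_needed(n_packages: int) -> tuple:
    """(point-to-point, canonical): n*(n-1) directed mappings vs. one mapping per package."""
    return n_packages * (n_packages - 1), n_packages
```

For the six packages on the slide, `mappings_needed(6)` returns `(30, 6)`, which is the usual argument for putting a data architecture in the middle rather than wiring packages together pairwise.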
18. Hemophilia Management Analytics
Descriptive
Ask: What happened? What is happening?
Find: Structured data
Show: Profiles, Bar/pie charts, Narrative
Predictive
Ask: What will happen? Why will it happen?
Find: Structured/unstructured data
Show: Risk Profiles, Pros/Cons, Care Recs
Prescriptive
Ask: What should I do? Why should I do it?
Find: Unstructured/structured data
Show: Strategic Goals, Support Recs
19. Target Isn't Just Predicting Pregnancies
http://rmportal.performedia.com/node/1373 and http://www.predictiveanalyticsworld.com/patimes/target-really-predict-teens-pregnancy-inside-story/ http://rmportal.performedia.com/rm/paw10/gallery_01#1373
20. Basics
• Users can "drill" anywhere
• The entire collection (the "cube") is accessible
• From summaries down to transaction-level detail
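Drilling from summaries down to transaction-level detail can be sketched in plain Python: the same fact rows are totaled at a coarse level (region), re-totaled one level down (facility within region), and finally listed as the underlying transactions. The fact rows and the `rollup`/`drill_down` helpers are hypothetical, not any OLAP product's API.

```python
from collections import defaultdict

# Hypothetical transaction-level facts: (region, facility, revenue).
FACTS = [
    ("NE", "Boston", 120.0),
    ("NE", "Boston", 80.0),
    ("NE", "Albany", 50.0),
    ("SE", "Atlanta", 200.0),
]


def rollup(facts, depth):
    """Sum revenue grouped by the first `depth` dimension columns (1 = region, 2 = facility)."""
    totals = defaultdict(float)
    for *dims, revenue in facts:
        totals[tuple(dims[:depth])] += revenue
    return dict(totals)


def drill_down(facts, path):
    """Return the transaction-level rows underneath one summary cell, e.g. ["NE", "Boston"]."""
    return [row for row in facts if row[: len(path)] == tuple(path)]
```

A user might start at `rollup(FACTS, 1)`, notice the NE total, move to `rollup(FACTS, 2)`, and end at `drill_down(FACTS, ["NE", "Boston"])` to see the two individual transactions.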
21. Sample questions …
• Cancer patient revenue across all facilities
• Revenue for diseases this year versus last year in the NE region
• Total costs and revenue at the top 10 facilities
• Emphasis on the "cube"
– N dimensions
• Permits different users to "slice and dice" subsets of data
• Viewing from different perspectives
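The sample questions above are slices of one cube viewed along different dimensions. A minimal sketch, assuming a cube held as a list of fact rows (the facilities, diseases, and revenue figures are hypothetical): fixing one dimension is a slice, fixing several is a dice, and the result is then totaled along whichever dimensions the user wants to see.

```python
from collections import defaultdict

# Hypothetical fact rows: one dict per (facility, region, disease, year) cell.
FACTS = [
    {"facility": "F1", "region": "NE", "disease": "cancer", "year": 2014, "revenue": 100.0},
    {"facility": "F1", "region": "NE", "disease": "cancer", "year": 2015, "revenue": 130.0},
    {"facility": "F2", "region": "NE", "disease": "diabetes", "year": 2015, "revenue": 40.0},
    {"facility": "F3", "region": "SE", "disease": "cancer", "year": 2015, "revenue": 90.0},
]


def select(facts, **fixed):
    """Slice (one fixed dimension) or dice (several) by keeping only matching rows."""
    return [row for row in facts if all(row[dim] == value for dim, value in fixed.items())]


def totals_by(facts, dims):
    """Aggregate revenue along the chosen dimensions: the perspective the user pivots to."""
    totals = defaultdict(float)
    for row in facts:
        totals[tuple(row[dim] for dim in dims)] += row["revenue"]
    return dict(totals)
```

"Revenue for diseases this year versus last year in the NE region" becomes `totals_by(select(FACTS, region="NE"), ("disease", "year"))`, and "cancer patient revenue across all facilities" becomes `totals_by(select(FACTS, disease="cancer"), ("facility",))`.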
23. Portfolio Analysis
• Bank accounts are of varying value and risk
• Cube by
– Social status
– Geographical location
– Net value, etc.
• Strategy or goal: balance the return on each loan against the risk of default
• How to evaluate the portfolio as a whole?
– The least risky loans may be to the very wealthy, but there are very few of them
– There are many poorer customers, but they carry greater risk
• The solution may combine types of analyses
– When to lend, and what interest rate to charge
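Balancing return against default risk across cube segments can be illustrated with a deliberately simplified one-period expected-profit calculation, (1 − p)·r − p·LGD per unit of principal, where p is the probability of default and LGD the fraction lost on default. This formula, the `Segment` fields, and every number below are assumptions for illustration; real credit models also account for funding cost, term, and correlation between defaults.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """A hypothetical slice of the loan cube (e.g. by social status and location)."""
    name: str
    loans: int                 # number of loans available in this segment
    principal: float           # average principal per loan
    rate: float                # interest rate charged
    default_prob: float        # assumed probability of default over the loan term
    loss_given_default: float  # assumed fraction of principal lost if the borrower defaults


def expected_profit(seg: Segment) -> float:
    """Simplified one-period expected profit for the whole segment (ignores funding cost)."""
    per_loan = seg.principal * (
        (1 - seg.default_prob) * seg.rate - seg.default_prob * seg.loss_given_default
    )
    return seg.loans * per_loan


def break_even_rate(default_prob: float, loss_given_default: float) -> float:
    """Interest rate at which the expected profit above is zero: p * LGD / (1 - p)."""
    return default_prob * loss_given_default / (1 - default_prob)


wealthy = Segment("wealthy", 10, 1_000_000, 0.04, 0.01, 0.5)
mass = Segment("mass market", 5000, 10_000, 0.12, 0.08, 0.6)
```

With these made-up numbers the few wealthy borrowers yield about 346,000 while the many riskier borrowers yield about 3,120,000, and `break_even_rate` gives the floor for "what interest rate to charge" in each segment.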
24. CarMax Example Job Posting
15 years ago, CarMax started as a way to make the car buying experience simple, fair, and fun. Today CarMax is a FORTUNE 500 retailer and one of FORTUNE’s “100 Best Companies to Work For.” And we are hiring talented individuals who are interested in:
--solving original, wide-ranging, and open-ended business problems
--not only discovering new insights, but successfully implementing them
--making a significant mark on a growing company
--developing the fundamental skills for a rewarding business career
If that sounds like you, the Strategy Analyst position is the unique opportunity you’ve been looking for. The strategy team at CarMax currently consists of over 40 analysts, many of whom are recent college graduates from top schools with a variety of academic backgrounds (computer science, economics, English, engineering, journalism, math, political science). These analysts lead advances and decisions in several key business areas:
-Inventory and pricing: what is the optimal selection of inventory, how do we acquire it, what should we pay for it, what should we price it for?
-Expansion planning: which markets should we enter and how do we store those markets? Will each $10-30 million store investment generate a sufficient economic return?
-Credit strategy: how can our bank (CarMax Auto Finance) approve more customers for loans and convert more approvals to sales?
-Marketing and consumer insight: how do we reach our customers, increase traffic to our stores, and best use the internet to drive sales and build our brand?
-Industry and competitive research: what middle- and long-term risks are we exposed to, and how best do we prepare to respond?
-Production: how do we increase vehicle reconditioning quality while reducing cost and production time?
-Sales process and workforce: what is the best way to serve customers in our stores, and how do we manage, motivate and compensate our sales team?
Even early in your career at CarMax, you will have the responsibility to own an area of the business and will be expected to improve it. For example, one undergraduate recruit used data analysis to reformulate our retail pricing strategy, pitched and sold his idea to the senior executive team, and implemented a new system nationwide in his first 6 months with the company. That is the kind of impact you can make at CarMax. And as you do this, you will work closely with the senior executives and analytical managers to develop the fundamental and advanced skills that underpin a successful career in business. In fact, most of our managers in the strategy group started at CarMax as analysts, and our VP of Strategy and Analysis started his career here through our undergraduate recruiting program. While an MBA is not required to advance or contribute at CarMax, analysts who have chosen to pursue a business degree have enjoyed superior acceptance rates at their first choice schools, including Harvard, Chicago, UVa, Columbia, and Duke.
Your opportunities to develop, contribute, and lead as an analyst at CarMax are as great as the company’s opportunity to grow. While CarMax is already the largest used car retailer in the country (with over $8 billion in sales and over 90 superstores across the country), we have only 2% of the 1 to 6-year-old used car market, which, at $280 billion annually, is bigger than the home improvement or consumer electronics industries. CarMax is already growing at 15% a year, and over the next 10 years plans to have 250-300 stores and achieve $25+ billion in annual sales. As an analyst, you can be an integral part of that growth, all while enjoying a casual and friendly environment, a diverse group of talented associates, a healthy work-life balance, and excellent compensation and benefits.
An ideal candidate will have
--Demonstrated top caliber analytic and problem solving skills
--History of achievement demonstrated by top 15% GPA, with a quantitative major(s), and/or other recognition such as scholarships, awards, honor societies
--Passion for business and desire to develop into a strong business leader
We encourage you to apply. For more information, please visit us at the career fair, on our website (www.carmax.com/collegerecruiting), or email us at college_recruiting@carmax.com
25. Polling Question #1
• Have you started (or do you already have) data warehousing, data marts, and/or other warehouse-style forms of integration?
a. Last year (2014)
b. This year (2015)
c. Next year (2016)
d. Nope
28. Warehousing Definitions
• Inmon:
– "A subject oriented, integrated, time variant, and non-volatile collection of summary and detailed historical data used to support the strategic decision-making processes of the organization."
• Kimball:
– "A copy of transaction data specifically structured for query and analysis."
• Key concepts focus on:
– Subjects
– Transactions
– Non-volatility
– Restructuring
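Two terms in Inmon's definition, "time variant" and "non-volatile", can be made concrete with an append-only store: every load is stamped with its load date, nothing is updated in place, and historical questions are answered "as of" a date. The `NonVolatileStore` class below is a hypothetical illustration of the idea, not a warehouse product's API.

```python
from datetime import date


class NonVolatileStore:
    """Append-only, time-stamped history: loads add rows and never update or delete them."""

    def __init__(self):
        self._rows = []  # list of (load_date, key, attributes)

    def load(self, load_date: date, key: str, attributes: dict) -> None:
        # Time variant: every row carries the date it entered the warehouse.
        self._rows.append((load_date, key, dict(attributes)))

    def as_of(self, key: str, when: date):
        """Return the attributes that were current for `key` on date `when`, or None."""
        candidates = [(d, attrs) for d, k, attrs in self._rows if k == key and d <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda pair: pair[0])[1]

    def history(self, key: str):
        """Every version ever loaded for `key`, oldest first."""
        return [(d, attrs) for d, k, attrs in sorted(self._rows, key=lambda r: r[0]) if k == key]
```

Loading a customer's city in 2014 and again in 2015 leaves both rows in place, so `as_of` can still report the 2014 value, which an operational system that overwrites the record could not.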
32. MetaMatrix Integration Example
• EII (Enterprise Information Integration)
– Sits between ETL and EAI; delivers tailored views of information to users at the time they are required
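The EII idea, delivering a tailored view at request time instead of copying data into a warehouse first, can be sketched as a function that federates two live sources when it is called. The dictionaries below stand in for hypothetical source systems; this illustrates the pattern only and does not reflect MetaMatrix's actual interfaces.

```python
# Hypothetical source systems, each queried live rather than copied into a warehouse.
CRM = {"C1": {"name": "Ada Lovelace", "region": "NE"}}
BILLING = {"C1": [120.0, 80.0], "C2": [15.0]}


def customer_view(customer_id: str) -> dict:
    """Federate CRM and billing data for one customer at the moment it is requested."""
    profile = CRM.get(customer_id, {})
    invoices = BILLING.get(customer_id, [])
    return {
        "customer_id": customer_id,
        "name": profile.get("name"),      # None if the CRM has no record
        "region": profile.get("region"),
        "balance": sum(invoices),
    }
```

Because nothing is copied, a new invoice appended to `BILLING` shows up in the very next call to `customer_view`; the trade-off is that every request pays the cost of reaching the sources.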
33. Linked Data
Linked Data is about using the Web to connect related data that wasn't previously linked, or using the Web to lower the barriers to linking data currently linked using other methods. More specifically, Wikipedia defines Linked Data as "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF."
linkeddata.org
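The core of Linked Data is that every fact is a subject-predicate-object triple whose terms are URIs, so data from different places can be joined simply by following a shared URI. A minimal sketch using plain Python tuples (the `example.org` URIs and vocabulary are hypothetical; real deployments publish RDF and query it with SPARQL):

```python
# Hypothetical URIs; real Linked Data would publish these as RDF on the Web.
EX = "http://example.org/"
TRIPLES = {
    (EX + "customer/42", EX + "vocab#name", "Ada Lovelace"),
    (EX + "customer/42", EX + "vocab#livesIn", "http://dbpedia.org/resource/Boston"),
    (EX + "order/7", EX + "vocab#placedBy", EX + "customer/42"),
}


def match(subject=None, predicate=None, obj=None):
    """Return triples matching a pattern; None acts as a wildcard, like a SPARQL variable."""
    return sorted(
        t for t in TRIPLES
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )
```

Following the link from an order to its customer is two pattern matches: find the `placedBy` object of the order, then look up that URI's `name`. The `livesIn` triple points at an external URI, which is how a second dataset would be linked in.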
34. Health Care Provider Data Warehouse
"A roomful of MBAs can accomplish this analysis faster!"
• 1.8 million members
• 1.4 million providers
• 800,000 providers with no key
• 29% of prov_ssn values ≠ 9 digits
• 2.2% of prov_number values = 9 digits (required)
• 1 user
• $30 million
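Statistics like "29% prov_ssn ≠ 9 digits" come from simple data profiling, which is cheap to run before any warehouse is built. A minimal sketch of that kind of check; the provider rows are hypothetical, and only the field names `prov_ssn` and `prov_number` come from the slide.

```python
import re

# Hypothetical provider rows; the slide's warehouse held about 1.4 million of them.
PROVIDERS = [
    {"prov_ssn": "123456789", "prov_number": "987654321"},
    {"prov_ssn": "12345678", "prov_number": "000111222"},
    {"prov_ssn": "123-45-6789", "prov_number": ""},
    {"prov_ssn": None, "prov_number": "55555"},
]

NINE_DIGITS = re.compile(r"\d{9}")


def pct_not_nine_digits(rows, field):
    """Percentage of rows whose `field` is missing or is not exactly nine digits."""
    bad = sum(1 for r in rows if not NINE_DIGITS.fullmatch(r.get(field) or ""))
    return 100.0 * bad / len(rows)
```

On these four sample rows, 75% of `prov_ssn` values and 50% of `prov_number` values fail the nine-digit rule; run against the real tables, the same check surfaces the quality problems before $30 million is spent.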
37. Reframing the question
• From: How shall we build this data warehouse?
– (Worse) … What should go into this warehouse?
• To: How can warehousing capabilities solve this specific business challenge?
– (Better still) … How can warehousing capabilities solve this class of business challenges?
• Other examples
– Are you ready for a data warehouse?
✓ Foundational practices
– Will you get it right the first time?
✓ Is the business environment constantly evolving?
✓ Do you have an agreed-upon enterprise-wide vocabulary?
– Is your data warehouse intended to be the enterprise's auditable system of record?
✓ Extract, transform, and load requirements
✓ Data transformation requirements
– How fast do you need results?
✓ Performance of inserts vs. reads
✓ Project deliverables
39. Inmon Implementation/3NF
Figure: data sources (operational systems, flat files) feed the warehouse (metadata, summary data, raw data), which feeds data marts (Purchasing, Sales, Inventory) used for analysis, reporting, and mining
40. Third Normal Form
• Each non-key attribute in the relation is a fact about the key
– Highly normalized structure
• Use Cases
– Transactional Systems
– Operational Data Stores
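A small third-normal-form schema shows both sides of the trade-off listed on the next slide: attributes live only with the key they describe and referential integrity is enforced, but reassembling a report row takes joins. A minimal sketch using Python's built-in `sqlite3`; the tables and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    -- Each non-key column describes only its own table's key.
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        city        TEXT NOT NULL
    );
    CREATE TABLE product (
        product_id INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        unit_price REAL NOT NULL
    );
    -- Orders reference customers and products instead of repeating their attributes.
    CREATE TABLE sales_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
        product_id  INTEGER NOT NULL REFERENCES product(product_id),
        quantity    INTEGER NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Ada', 'Boston'), (2, 'Grace', 'Richmond');
    INSERT INTO product VALUES (10, 'Widget', 2.5);
    INSERT INTO sales_order VALUES (100, 1, 10, 4), (101, 2, 10, 2);
    """
)

# Reassembling a report row requires joins across the normalized tables.
rows = conn.execute(
    """
    SELECT c.name, p.title, o.quantity * p.unit_price AS amount
    FROM sales_order o
    JOIN customer c ON c.customer_id = o.customer_id
    JOIN product  p ON p.product_id  = o.product_id
    ORDER BY o.order_id
    """
).fetchall()
```

An order pointing at a nonexistent customer is rejected with `sqlite3.IntegrityError`, which is the "enforced referential integrity" benefit; the three-way join is the cost that grows with data volume.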
41. Third Normal Form: Pros and Cons
Neo4j.com
• Pros
– Easily understood by business and end users
– Reduced data redundancy
– Enforced referential integrity
– Indexed attributes/flexible querying
• Cons
– Joins can be expensive
– Does not scale
42. Kimball Implementation/Dimensional
43. Star Schema
• Composed of “fact tables” that contain quantitative data and any number of adjoining “dimension” tables
• Optimized for business reporting
• Use Cases
– OLAP (Online Analytical Processing)
– BI
Wikipedia
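A star schema puts the measures in one fact table and the descriptive context in dimension tables, so a typical business question becomes "filter and group on dimension attributes, sum the facts". A minimal sketch with Python's built-in `sqlite3`; the table names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive context, one row per member.
    CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
    -- Fact table: quantitative measures plus a foreign key to each dimension.
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date(date_key),
        store_key   INTEGER REFERENCES dim_store(store_key),
        product_key INTEGER REFERENCES dim_product(product_key),
        units       INTEGER,
        revenue     REAL
    );
    INSERT INTO dim_date VALUES (20140101, 2014, 1), (20150101, 2015, 1);
    INSERT INTO dim_store VALUES (1, 'NE'), (2, 'SE');
    INSERT INTO dim_product VALUES (1, 'Sedan'), (2, 'Truck');
    INSERT INTO fact_sales VALUES
        (20140101, 1, 1, 3, 60000.0),
        (20150101, 1, 1, 4, 84000.0),
        (20150101, 1, 2, 1, 30000.0),
        (20150101, 2, 2, 2, 58000.0);
    """
)

# A typical star join: filter and group on dimension attributes, sum the facts.
revenue_by_year_ne = conn.execute(
    """
    SELECT d.year, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_date  d ON d.date_key  = f.date_key
    JOIN dim_store s ON s.store_key = f.store_key
    WHERE s.region = 'NE'
    GROUP BY d.year
    ORDER BY d.year
    """
).fetchall()
```

Every join goes from the fact table straight to one dimension, which keeps queries simple and fast; the flip side, noted on the next slide, is that a question needing a measure or dimension not modeled here requires a design change.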
44. Star Schema Pros and Cons
• Pros
– Simple design
– Fast queries
– Most major DBMSs are optimized for star schema designs
• Cons
– Questions must be built into the design
– Data marts are often centralized on one fact table
46. Data Vault
Bukhantsov.org
• Designed to facilitate long-term historical storage, focusing on ease of implementation
• Retains data lineage information (source/date)
• “All the data, all the time”: a hybrid of the Inmon and Kimball approaches
• Composed of Hubs (lists of business keys that do not change often), Links (associations/transactions between hubs), and Satellites (descriptive attributes associated with hubs and links)
• Use Cases
– Data Warehousing
– Complete auditability
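The hub/link/satellite split and the lineage columns can be sketched with Python's built-in `sqlite3`. Assumptions: the table names, columns, and rows are hypothetical, and the MD5 hash keys follow a common Data Vault 2.0 convention rather than anything stated on the slide. Loads are insert-only, so every change adds a satellite row tagged with its load date and source.

```python
import hashlib
import sqlite3


def hash_key(*business_keys: str) -> str:
    """Deterministic key derived from business key(s), a common Data Vault 2.0 convention."""
    return hashlib.md5("|".join(business_keys).encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hubs: stable business keys plus lineage (when first seen, from which source).
    CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_no TEXT UNIQUE,
                               load_date TEXT, record_source TEXT);
    CREATE TABLE hub_account  (account_hk TEXT PRIMARY KEY, account_no TEXT UNIQUE,
                               load_date TEXT, record_source TEXT);
    -- Link: an association between hubs, with its own lineage columns.
    CREATE TABLE link_customer_account (link_hk TEXT PRIMARY KEY, customer_hk TEXT,
                                        account_hk TEXT, load_date TEXT, record_source TEXT);
    -- Satellite: descriptive attributes, insert-only and historized by load_date.
    CREATE TABLE sat_customer (customer_hk TEXT, load_date TEXT, record_source TEXT,
                               name TEXT, city TEXT, PRIMARY KEY (customer_hk, load_date));
    """
)


def load_customer(customer_no, account_no, name, city, load_date, source):
    """Hubs and links are inserted once; the satellite gains a new row on every load."""
    c_hk, a_hk = hash_key(customer_no), hash_key(account_no)
    conn.execute("INSERT OR IGNORE INTO hub_customer VALUES (?, ?, ?, ?)",
                 (c_hk, customer_no, load_date, source))
    conn.execute("INSERT OR IGNORE INTO hub_account VALUES (?, ?, ?, ?)",
                 (a_hk, account_no, load_date, source))
    conn.execute("INSERT OR IGNORE INTO link_customer_account VALUES (?, ?, ?, ?, ?)",
                 (hash_key(customer_no, account_no), c_hk, a_hk, load_date, source))
    conn.execute("INSERT INTO sat_customer VALUES (?, ?, ?, ?, ?)",
                 (c_hk, load_date, source, name, city))


load_customer("C-42", "A-1", "Ada", "Richmond", "2014-01-01", "CRM")
load_customer("C-42", "A-1", "Ada", "Boston", "2015-01-01", "Billing")
```

After the two loads there is still one hub row and one link row, but two satellite rows, each recording where and when it came from. Answering "what is the current city?" therefore needs an extra query over the satellite, which is the complexity pushed to the "back end".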
47. Data Vault Pros and Cons
• Pros
– Simple integration
– Houses immense amounts of data with excellent performance
– Full data lineage captured
• Cons
– Complication is pushed to the “back end”
– Can be difficult for many data workers to set up
– No widespread ETL tool support yet
49. Polling Question #2
• Which of these do you have?
a. A single enterprise data warehouse
b. Coordinated data marts
c. Both
d. Uncoordinated efforts
e. None
52. Metadata Data Model
• SCREEN ELEMENT: screen element id #, data item id #, screen element descr.
• INTERFACE ELEMENT: interface element id #, data item id #, interface element descr.
• INPUT ELEMENT: input element id #, data item id #, input element descr.
• OUTPUT ELEMENT: output element id #, data item id #, output element descr.
• MODEL VIEW: model view element id #, data item id #, model view element descr.
• DEPENDENCY: dependency elem id #, data item id #, process id #, dependency description
• CODE: code id #, data item id #, stored data item #, code location
• INFORMATION: information id #, data item id #, information descr., information request
• PROCESS: process id #, data item id #, process description
• USER TYPE: user type id #, data item id #, information id #, user type description
• LOCATION: location id #, information id #, printout element id #, process id #, stored data items id #, user type id #, location description
• PRINTOUT ELEMENT: printout element id #, data item id #, printout element descr.
• STORED DATA ITEM: stored data item id #, data item id #, location id #, stored data description
• DATA ITEM: data item id #, data item description
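Because nearly every entity in this metadata model carries a `data item id #`, the model supports impact analysis: given one data item, list every screen, interface, printout, and storage location that touches it. A minimal sketch covering a subset of the model; collapsing the several *_ELEMENT entities into one `Element` class, and all sample ids, are simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A row from any *_ELEMENT entity: its kind, its id, and the data item it carries."""
    kind: str          # e.g. "SCREEN", "INTERFACE", "PRINTOUT"
    element_id: str
    data_item_id: str
    description: str


@dataclass(frozen=True)
class StoredDataItem:
    """A row from STORED DATA ITEM: where a data item is physically kept."""
    stored_data_item_id: str
    data_item_id: str
    location_id: str


def where_used(data_item_id, elements, stored_items):
    """Impact analysis: every element and storage location that references a data item."""
    used_in = sorted((e.kind, e.element_id) for e in elements if e.data_item_id == data_item_id)
    stored_at = sorted(s.location_id for s in stored_items if s.data_item_id == data_item_id)
    return used_in, stored_at


# Hypothetical metadata rows.
elements = [
    Element("SCREEN", "S1", "DI-7", "Customer name field"),
    Element("PRINTOUT", "P3", "DI-7", "Invoice name line"),
    Element("INTERFACE", "I2", "DI-9", "Billing feed"),
]
stored = [
    StoredDataItem("SD1", "DI-7", "LOC-HQ"),
    StoredDataItem("SD2", "DI-7", "LOC-DR"),
]
```

`where_used("DI-7", elements, stored)` lists the printout and the screen plus both storage locations, which is the kind of question a warehouse team needs answered before changing a source field.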
53. Overview of CWM Metamodel
• Warehouse Management: Warehouse Process, Warehouse Operation
• Analysis: Transformation, OLAP, Data Mining, Information Visualization, Business Nomenclature
• Resources: Object-Oriented (ObjectModel), Relational, Record-Oriented, Multidimensional, XML
• Foundation: Business Information, Data Types, Expressions, Keys and Index, Software Deployment, Type Mapping
• ObjectModel (Core, Behavioral, Relationships, Instance)
http://www.omg.org/technology/documents/modeling_spec_catalog.htm
54. Marco & Jennings's Complete Meta Data Model
Source: http://dmreview.com/article_sub.cfm?articleID=1000941 (used with permission)
57. Data Reengineering Leverage
Figure labels: Data Management Practices; Duplicated but ETLed Data (quality & transformations applied); "Warehoused" Data; Marts; Analytics Practices; Learning/Feedback
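"Duplicated but ETLed data (quality & transformations applied)" can be sketched as a small extract-transform-load pass: quality rules reject bad rows, transformations standardize keys, amounts, and dates, and the rejects are kept as learning/feedback for the source owners. The source rows, formats, and rules below are hypothetical.

```python
from datetime import datetime

# Hypothetical extract from two source systems with different conventions.
SOURCE_ROWS = [
    {"src": "finance", "cust": " 0042 ", "amount": "1,200.50", "date": "03/15/2015"},
    {"src": "mfg", "cust": "42", "amount": "300", "date": "2015-03-16"},
    {"src": "mfg", "cust": "", "amount": "75", "date": "2015-03-16"},  # missing key
]


def transform(row):
    """Apply quality rules and standardizing transformations; return None to reject the row."""
    cust = row["cust"].strip()
    if not cust:
        return None  # quality rule: a customer key is required
    # Standardize dates: in this sample, finance uses MM/DD/YYYY and mfg uses ISO 8601.
    fmt = "%m/%d/%Y" if "/" in row["date"] else "%Y-%m-%d"
    return {
        "customer_id": int(cust),  # " 0042 " and "42" become the same key
        "amount": float(row["amount"].replace(",", "")),
        "date": datetime.strptime(row["date"], fmt).date().isoformat(),
        "source": row["src"],  # keep lineage
    }


def etl(rows):
    """Extract -> transform -> load; rejected rows are kept for learning/feedback."""
    warehouse, rejects = [], []
    for row in rows:
        clean = transform(row)
        if clean is None:
            rejects.append(row)
        else:
            warehouse.append(clean)
    return warehouse, rejects
```

The two surviving rows now share customer 42 and ISO dates, so marts and analytics can combine them; the rejected row is the feedback that flows back to data management practices.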
59. Data Warehousing & Business Intelligence Management
60. Questions?
It’s your turn!
Use the chat feature or Twitter (#dataed) to submit
your questions to Peter and Steven now.
61. Upcoming Events
January Webinar:
Developing a Data-centric Strategy & Roadmap
January 12, 2016 @ 2:00 PM – 3:30 PM ET
(11:00 AM-12:30 PM PT)
February Webinar:
The Importance of Master Data Management
February 9, 2016 @ 2:00 PM – 3:30 PM ET
(11:00 AM-12:30 PM PT)
Sign up here:
• www.datablueprint.com/webinar-schedule
• www.Dataversity.net
67. 6 Best Practices for Data Warehousing
1. Do some initial architecture envisioning.
2. Model the details just in time (JIT).
3. Prove the architecture early.
4. Focus on usage.
5. Organize your work by requirements.
6. Active stakeholder participation.
http://www.agiledata.org/essays/dataWarehousingBestPractices.html
69. 5 Key Business Intelligence Trends
1. There's so much data, but too little insight. More data translates to a greater need to manage it and make it actionable.
2. Market consolidation means fewer choices for business intelligence users.
3. Business intelligence expands from the board room to the front lines. Increasingly, business intelligence tools will be available at all levels of the corporation.
4. The convergence of structured and unstructured data will create better business intelligence.
5. Applications will provide new views of business intelligence data. The next generation of business intelligence applications is moving beyond pie charts and bar charts into more visual depictions of data and trends.
http://www.cio.com/article/150450/Five_Key_Business_Intelligence_Trends_You_Need_to_Know?page=2&taxonomyId=30