This presentation is part of my work for the course 'Heterogeneous and Distributed Information Systems' at TU Berlin within the IT4BI (Information Technology for Business Intelligence) master programme.
Metadata in Business Intelligence
Jose Luis Lopez Pino
Database Systems and Information Management
Technische Universität Berlin
January 28, 2014
v1.2
Table of Contents
1 Metadata
  What is Metadata?
  Metadata for Information Systems
2 Business Intelligence
  What is Business Intelligence?
  Business Intelligence in a Nutshell
  The Dimensional Fact Model
  Data Warehousing
3 Metadata in BI
  Motivation
  Classification
  The Four Commandments of BI Metadata
4 Examples
  ROLAP and Metadata
  Oracle Administration Tool
5 Research
  Metadata and Interoperability
  Platform-Independent Models
  Metadata in Multiversion DWH
6 Big Data
  Examples
  Some Thoughts about Metadata and Hadoop
7 Conclusions
  10 Reasons why Metadata matters in BI
  Final Conclusions
Metadata
Business Intelligence
Metadata in BI
Examples
Research
Big Data
Conclusions
What is Metadata?
“Metadata is a set of data that describes and gives information about other data.”
— Oxford Dictionary
“Metadata is explicitly managed data describing other data or system elements to support their documentation, reusability and interoperation.” [1]
[1] Susanne Busse, Ralf-Detlef Kutsche, Ulf Leser, and Herbert Weber. Federated information systems: Concepts, terminology and architectures. Citeseer, 1999.
Metadata for Information Systems
Technical metadata: describes the technical access mechanisms of components.
Logical metadata: relates to the schemas and their logical relationships.
Metamodels: support the interoperability of schemas in different data models.
Semantic metadata: helps to describe the semantics of concepts.
Quality-related metadata: describes source-specific properties of information systems regarding their quality.
Infrastructure metadata: helps users to find relevant data.
User-related metadata: describes the responsibilities and preferences of the users.
What is Business Intelligence?
Processing and organizing data in order to extract information, and using this information to make business decisions.
“Business intelligence (BI) is an umbrella term that includes the applications, infrastructure and tools, and best practices that enable access to and analysis of information to improve and optimize decisions and performance.”
— Gartner
Business Intelligence in a Nutshell I
OLTP: information system oriented to small, interactive operations.
ETL: process that consists of extracting, transforming and loading data.
Data warehouse: central repository of data used for reporting and analysis.
Data mart: contains a subset of the information in a data warehouse, tailored to a single business view.
OLAP: technique to analyse multidimensional data.
ROLAP: using a relational database to perform OLAP analysis.
MDX: query language for multidimensional data.
Data mining: discovering patterns in data.
Business Intelligence in a Nutshell II
Data visualization: representation of data to make it more meaningful and/or attractive.
Decision support: tools that facilitate making decisions based on data.
Data-driven business: companies led by a strategy based on data.
The Dimensional Fact Model I
Fact: an event that is relevant to the decision-making process.
Measure: a numerical attribute of a fact.
Dimension: categorizes the data into a finite number of slots.
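To make the three concepts concrete, here is a minimal Python sketch built on an invented "sale" fact. Its dimensions are product, store and month, and its measures are quantity and revenue. Summing one measure while grouping by one dimension is the basic OLAP roll-up. The class, field names and sample values are hypothetical, and this is plain code rather than the graphical DFM notation itself.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SaleFact:
    """One occurrence of a hypothetical 'sale' fact."""
    # Dimensions: each takes its value from a finite domain (a "slot").
    product: str
    store: str
    month: str
    # Measures: numerical attributes of the fact.
    quantity: int
    revenue: float


def aggregate(facts, dimension, measure):
    """Sum one measure over all facts, grouped by the value of one dimension."""
    totals = defaultdict(float)
    for fact in facts:
        totals[getattr(fact, dimension)] += getattr(fact, measure)
    return dict(totals)


facts = [
    SaleFact("milk", "Berlin", "2014-01", 10, 9.90),
    SaleFact("bread", "Berlin", "2014-01", 5, 7.50),
    SaleFact("milk", "Munich", "2014-01", 3, 2.97),
]
print(aggregate(facts, "store", "quantity"))  # {'Berlin': 15.0, 'Munich': 3.0}
```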
Data Warehousing
Copyright 2013 Toon Calders http://goo.gl/ds8nZc
Metadata Management in Data Warehousing
Copyright 2014 LINGARO http://goo.gl/Wfxsni
Motivation: Quotes I
“Metadata is a vital element of the data warehouse.”
— William Inmon [2]
“Metadata is the DNA of the data warehouse.”
— Ralph Kimball [3]
“Metadata is analogous to the data warehouse encyclopedia.”
— Ralph Kimball [3]
[2] William H. Inmon. Metadata in the Data Warehouse. Morgan Kaufmann, 2000.
[3] Ralph Kimball. The data warehouse lifecycle toolkit: expert methods for designing, developing, and deploying data warehouses. Wiley, 1998.
Motivation: Quotes II
“The fact that metadata drives the warehouse is the literal truth. If you think you won't use metadata, you are mistaken.”
— Ralph Kimball [4]
“In the scope of data warehousing, meta-data plays an essential role because it specifies source, values, usage and features of data warehouse data and defines how data can be changed and processed at every architecture layer.”
— Matteo Golfarelli, Stefano Rizzi [5]
[4] Ralph Kimball. The data warehouse lifecycle toolkit: expert methods for designing, developing, and deploying data warehouses. Wiley, 1998.
[5] M. Golfarelli and S. Rizzi. Data Warehouse Design: Modern Principles and Methodologies. McGraw-Hill, 2009.
Metadata is everywhere!
Meaning of the objects.
User profiles.
Security permissions.
Usage statistics.
Logical model.
Relation between physical and logical objects.
DBMS metadata: tables, indexes, FKs, PKs, etc.
Reporting / Data analysis objects.
Transformations of the data.
Data sources and data targets.
Query logs.
ETL logs.
Materialized information.
Classification
1. Technical metadata:
Describes the physical objects that make up the data warehouse.
Tables, fields, indexes, sources, targets, transformations, etc.
2. Business metadata:
Describes the contents of the data warehouse in a way that is accessible for conducting day-to-day business. [6]
Facts, dimensions, logical relationships, etc.
3. Process metadata:
Describes the operations executed on the warehouse and their results.
Results of the ETL process, query logging, etc.
[6] William H. Inmon, Bonnie O'Neil, and Lowell Fryman. Business Metadata: Capturing Enterprise Knowledge. Morgan Kaufmann, 2010.
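The three categories can be pictured as tags on entries in a single repository. The Python sketch below does exactly that: each entry records its kind, the object it describes, and a description. Every object name and value in the sketch is invented for illustration and is not taken from any real product's repository.

```python
from dataclasses import dataclass
from enum import Enum


class MetadataKind(Enum):
    TECHNICAL = "technical"  # physical objects: tables, fields, indexes, transformations
    BUSINESS = "business"    # facts, dimensions, logical relationships
    PROCESS = "process"      # results of ETL runs, query logs


@dataclass
class MetadataEntry:
    kind: MetadataKind
    subject: str
    description: str


# Hypothetical repository content; object names and values are invented.
repository = [
    MetadataEntry(MetadataKind.TECHNICAL, "SALES_F.AMOUNT", "NUMBER(12,2), NOT NULL"),
    MetadataEntry(MetadataKind.BUSINESS, "Revenue", "Amount invoiced to customers, in EUR"),
    MetadataEntry(MetadataKind.PROCESS, "load_sales", "Nightly load: 1200 rows loaded, 3 rejected"),
]


def subjects_of_kind(entries, kind):
    """List the subjects described by metadata of one kind."""
    return [entry.subject for entry in entries if entry.kind is kind]


print(subjects_of_kind(repository, MetadataKind.BUSINESS))  # ['Revenue']
```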
The Four Commandments of BI Metadata
A data warehouse's likelihood of success is greatly increased by following Ralph Kimball's advice: [7]
1. Be aware of what metadata you keep.
2. Centralize it where possible.
3. Track your metadata.
4. Keep it up to date.
[7] Ralph Kimball. The data warehouse lifecycle toolkit: expert methods for designing, developing, and deploying data warehouses. Wiley, 1998.
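The four commandments are organizational advice rather than an algorithm. Still, one hypothetical way to read them as code is a single registry that enforces all four:

- it knows every item (1);
- it is the only place where metadata is stored (2);
- it logs every change (3);
- it reports items that have not been refreshed recently (4).

The class, item names and the 30-day threshold are assumptions made for this sketch. Nothing here comes from Kimball's book.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class CentralMetadataRegistry:
    """Hypothetical single repository for all metadata items (commandments 1 and 2)."""
    entries: dict = field(default_factory=dict)     # name -> current value
    updated_at: dict = field(default_factory=dict)  # name -> time of last refresh
    history: list = field(default_factory=list)     # (time, name, old, new): commandment 3

    def register(self, name, value, when):
        """Record a value; every actual change is appended to the history."""
        old = self.entries.get(name)
        if old != value:
            self.history.append((when, name, old, value))
        self.entries[name] = value
        self.updated_at[name] = when

    def stale(self, now, max_age):
        """Names not refreshed within max_age (commandment 4: keep it up to date)."""
        return sorted(name for name, t in self.updated_at.items() if now - t > max_age)


registry = CentralMetadataRegistry()
registry.register("Revenue.definition", "Invoiced amount in EUR", datetime(2014, 1, 1))
registry.register("Revenue.definition", "Invoiced amount net of VAT, in EUR", datetime(2014, 1, 20))
registry.register("Customer.source", "CRM export", datetime(2013, 11, 1))
print(registry.stale(datetime(2014, 1, 28), timedelta(days=30)))  # ['Customer.source']
```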
ROLAP and Metadata
Figure: PostgreSQL’s ROLAP server translates an MDX query into SQL
ROLAP and Metadata
SELECT
  Expenses."Expenses per day" saw_0,
  Expenses."Days with expenses" saw_1,
  Expenses."Total Expenses" saw_2,
  Period."Year" saw_3
FROM "HR - Travel Expenses"
ORDER BY saw_3
Figure: MDX Query
ROLAP and Metadata
select
  sum(case when T1757.ZD_NUM = 0 then 0 else (T1757.ZMDTACE_NAC_IM +
      T1757.ZMDTACO_NAC_IM + T1757.ZD_NAC_IM + T1757.ZCOMD_NAC_IM +
      T1757.ZCOMDDIC_IM + T1757.ZMDTACE_EXT_IM + T1757.ZMDTACO_EXT_IM +
      T1757.ZD_EXT_IM + T1757.ZCOMD_EXT_IM) / nullif(T1757.ZD_NUM, 0) end) as c1,
  sum(T1757.ZD_NUM) as c2,
  sum(T1757.ZCLV_032 + T1757.ZCLV_132) as c3,
  T623.YEAR as c4
from
  SYSADM.PS_ZOBI_CALENDA_VW T623,
  SYSADM.PS_ZOBI_DS_TBL T1757
where (T623.MONTH_OF_YEAR = T1757.MONTH_OF_YEAR and T1757.ZID_COL = 'T'
  and T623.MONTH_OF_YEAR <= 201206 and T623.YEAR between 2012 - 2 and 2012)
group by T623.YEAR
order by c4
Figure: SQL Query
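The two queries show how a request phrased only in business names becomes physical SQL with joins, aggregations and grouping. The metadata supplies everything the user did not write. The Python sketch below imitates that idea in miniature, under these assumptions:

- A hand-written dictionary plays the role of the repository. It maps each logical column to a physical expression and says whether that column is a measure.
- A fixed join path stands in for the subject area.
- The schema, data and mapping are hypothetical. This is not the repository format of any real ROLAP server.

```python
import sqlite3

# Hypothetical repository: logical column -> (physical expression, is_measure).
LOGICAL_TO_PHYSICAL = {
    "Total Expenses": ("SUM(f.amount)", True),
    "Days with expenses": ("SUM(f.days)", True),
    "Year": ("c.year", False),
}
# Hypothetical join path of the subject area.
SUBJECT_AREA_FROM = "expenses_fact f JOIN calendar c ON f.month_key = c.month_key"


def to_physical_sql(logical_columns):
    """Translate a request over business names into physical SQL using the mapping."""
    select_items, group_by = [], []
    for i, name in enumerate(logical_columns):
        expr, is_measure = LOGICAL_TO_PHYSICAL[name]
        select_items.append(f"{expr} AS c{i + 1}")
        if not is_measure:
            group_by.append(expr)  # non-measure columns become grouping keys
    sql = f"SELECT {', '.join(select_items)} FROM {SUBJECT_AREA_FROM}"
    if group_by:
        keys = ", ".join(group_by)
        sql += f" GROUP BY {keys} ORDER BY {keys}"
    return sql


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (month_key INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE expenses_fact (month_key INTEGER, amount REAL, days INTEGER);
    INSERT INTO calendar VALUES (201101, 2011), (201201, 2012), (201202, 2012);
    INSERT INTO expenses_fact VALUES (201101, 100.0, 2), (201201, 50.0, 1), (201202, 70.0, 3);
""")
sql = to_physical_sql(["Year", "Total Expenses", "Days with expenses"])
print(sql)
for row in conn.execute(sql):
    print(row)  # (2011, 100.0, 2) then (2012, 120.0, 4)
```

Changing the physical table names would only require editing the two mapping constants. The logical request stays the same, which is the portability argument made on the following slides.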
Oracle Administration Tool
Figure: The physical layer stores the technical metadata, while the other two layers (Business Model and Mapping, Presentation) store the business metadata.
Advantages
Abstraction: data analysts do not need to know the complex data sources involved in the system; they only worry about the business question, not about how to answer it.
Portability: changes to the physical model do not affect the logical model.
Security: a strong security policy allows administrators to restrict users' access to information they are not authorized to see.
Customization: the information is adapted to the user.
Azriel Marla and Bob Ertl. Oracle Fusion Middleware Metadata Repository Builder's Guide for Oracle Business Intelligence Enterprise Edition, 11g Release 1 (11.1.1), 2011.
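The security and customization advantages both come from metadata that sits between the user and the data. In the toy Python sketch below, a per-role dictionary decides two things: which business columns a user may query, and under which display name each column appears. The role names, columns and dictionary layout are hypothetical. This is not how the Oracle BI repository actually stores permissions.

```python
# Hypothetical presentation-layer metadata: for each role, the business columns it
# may query and the display name it sees (security and customization in one place).
PRESENTATION_LAYER = {
    "hr_manager": {"Employee": "Employee", "Salary": "Salary", "Department": "Department"},
    "analyst": {"Department": "Dept.", "Headcount": "Headcount"},
}


def display_columns(role, requested):
    """Return the display names of the requested columns, or refuse hidden ones."""
    allowed = PRESENTATION_LAYER.get(role, {})
    denied = [column for column in requested if column not in allowed]
    if denied:
        raise PermissionError(f"role {role!r} may not query: {', '.join(denied)}")
    return [allowed[column] for column in requested]


print(display_columns("analyst", ["Department", "Headcount"]))  # ['Dept.', 'Headcount']
```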
Metadata and Interoperability
The BI environment is composed of a wide variety of tools.
Complex bridges are crucial to integrate metadata among them.
A standard is needed to facilitate interoperability and integration.
Some attempts:
Open Information Model (OIM) by the Meta Data Coalition.
Common Warehouse Metamodel (CWM) by the OMG.
OIM was integrated into CWM.
Suggestion: use domain ontologies to establish semantic mappings between different data marts.
Stefano Rizzi, Alberto Abelló, Jens Lechtenbörger, and Juan Trujillo. Research in data warehouse modeling and design: dead or alive? In Proceedings of the 9th ACM International Workshop on Data Warehousing and OLAP, pages 3–10. ACM, 2006.
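The ontology suggestion can be illustrated in its simplest form. Each data mart annotates its local attributes with concepts from a shared domain ontology, and attributes annotated with the same concept are mapped to each other. The mart contents below are hypothetical. Real ontology-based approaches also use relations such as subsumption, which this exact-match sketch ignores.

```python
# Hypothetical annotations: each data mart maps its local attribute names
# to concepts of a shared domain ontology.
SALES_MART = {"cust_id": "Customer", "prod_code": "Product", "amount": "Revenue"}
MARKETING_MART = {"client": "Customer", "item": "Product", "campaign": "Campaign"}


def semantic_mapping(mart_a, mart_b):
    """Pair the attributes of two marts that are annotated with the same concept."""
    attribute_by_concept = {concept: attr for attr, concept in mart_b.items()}
    return {
        attr: attribute_by_concept[concept]
        for attr, concept in mart_a.items()
        if concept in attribute_by_concept
    }


print(semantic_mapping(SALES_MART, MARKETING_MART))  # {'cust_id': 'client', 'prod_code': 'item'}
```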
How Standards Proliferate
Figure: XKCD http://xkcd.com/927/
OIM vs. CWM
Both are metadata standards for data warehousing.
OIM's scope is wider: it is not restricted to metadata.
Good for technical metadata, not for business metadata.
OIM is limited to relational data.
Using CWM, metadata exchange between tools that use the
XMI standard is automatic.
Thomas Vetterli, Anca Vaduva, and Martin Staudt. Metadata standards for data warehousing: open information model vs. common warehouse metadata. ACM SIGMOD Record, 29(3):68–75, 2000.
Platform-Independent Models
The problem: OLAP metadata must be provided to bridge the gap between the conceptual and the logical model, and this metadata depends on the platform.
The solution:
Define an OLAP algebra that provides semantics for multidimensional models.
Derive the logical design automatically, for any platform.
Model Driven Architecture: derive the metadata from the conceptual model.
Jesús Pardillo, Jose-Norberto Mazón, and Juan Trujillo. Bridging the semantic gap in OLAP models: platform-independent queries. In Proceedings of the ACM 11th International Workshop on Data Warehousing and OLAP, pages 89–96. ACM, 2008.
Jesús Pardillo, Jose-Norberto Mazón, and Juan Trujillo. Towards the automatic generation of analytical end-user tools metadata for data warehouses. In Sharing Data, Information and Knowledge, pages 203–206. Springer, 2008.
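The model-driven idea is to keep one platform-independent description and derive the platform-specific artifacts from it. The Python sketch below shows a deliberately tiny version: a hypothetical `Cube` description is turned into a relational star schema in the SQLite dialect. It is a toy model-to-text transformation, not the OLAP algebra or the MDA transformations of the cited papers. A second platform would simply need another generator function over the same `Cube`.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Cube:
    """Hypothetical platform-independent description of a fact and its dimensions."""
    fact: str
    measures: list
    dimensions: list


def to_star_schema_ddl(cube):
    """One possible platform-specific mapping: a relational star schema (SQLite dialect)."""
    statements = [
        f"CREATE TABLE dim_{dim} ({dim}_key INTEGER PRIMARY KEY, {dim}_name TEXT)"
        for dim in cube.dimensions
    ]
    keys = ", ".join(f"{dim}_key INTEGER REFERENCES dim_{dim}({dim}_key)" for dim in cube.dimensions)
    measures = ", ".join(f"{m} REAL" for m in cube.measures)
    statements.append(f"CREATE TABLE fact_{cube.fact} ({keys}, {measures})")
    return statements


sales = Cube("sales", ["quantity", "revenue"], ["product", "store"])
conn = sqlite3.connect(":memory:")
for statement in to_star_schema_ddl(sales):
    conn.execute(statement)  # the generated DDL is executed to check that it is valid
```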
Metadata in Multiversion DWH
Multiversion DWH:
It keeps track of the changes in both the schema and the data.
Metadata becomes more complex and more useful in these systems.
Proposal:
Use a metamodel to manage different versions of the DWH.
Use a metamodel to detect changes in the external data
sources.
Robert Wrembel and Bartosz Bebel. Metadata management in a
multiversion data warehouse. In On the Move to Meaningful Internet Systems
2005: CoopIS, DOA, and ODBASE, pages 1347–1364. Springer, 2005
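Detecting changes in an external source can be sketched as a comparison between the schema snapshot stored as metadata and the current one. The Python sketch below does this, and creates a new warehouse version only when a difference is found. The snapshot format (`{table: {column: type}}`) and the function names are assumptions. It is far simpler than the metamodel proposed by Wrembel and Bebel.

```python
def diff_schema(old, new):
    """Compare two {table: {column: type}} snapshots and list the changes."""
    changes = []
    for table in sorted(old.keys() | new.keys()):
        if table not in new:
            changes.append(("drop_table", table, None))
        elif table not in old:
            changes.append(("add_table", table, None))
        else:
            old_cols, new_cols = old[table], new[table]
            for col in sorted(old_cols.keys() | new_cols.keys()):
                if col not in new_cols:
                    changes.append(("drop_column", table, col))
                elif col not in old_cols:
                    changes.append(("add_column", table, col))
                elif old_cols[col] != new_cols[col]:
                    changes.append(("change_type", table, col))
    return changes


versions = []  # metadata: list of (version_id, schema snapshot)


def register_version(schema):
    """Store a new DWH version only when the source schema actually changed."""
    if versions and not diff_schema(versions[-1][1], schema):
        return versions[-1][0]
    versions.append((len(versions) + 1, schema))
    return len(versions)


v1 = {"customer": {"id": "INTEGER", "name": "TEXT"}}
v2 = {"customer": {"id": "INTEGER", "name": "TEXT", "email": "TEXT"}, "orders": {"id": "INTEGER"}}
print(diff_schema(v1, v2))  # [('add_column', 'customer', 'email'), ('add_table', 'orders', None)]
```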
Examples: HDFS
The NameNode stores all the metadata in a single point.
It keeps all the metadata in memory.
This can become a problem when storing a vast number of small files. [14]
[14] Grant Mackey, Saba Sehrish, and Jun Wang. Improving metadata management for small files in HDFS. In IEEE International Conference on Cluster Computing and Workshops (CLUSTER '09), pages 1–4. IEEE, 2009.
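A back-of-the-envelope estimate makes the small-files problem concrete. The Python sketch below compares NameNode memory for the same 1 TiB stored as 1 MiB files versus 1 GiB files. It rests on two assumptions, neither of which comes from the slide:

- Every file, directory and block costs roughly 150 bytes of NameNode heap. This is a commonly quoted rule of thumb.
- The block size is 128 MiB, the default in Hadoop 2.

Directories are ignored for simplicity.

```python
import math

BYTES_PER_OBJECT = 150  # assumed rule-of-thumb NameNode cost per file or block object


def namenode_bytes(num_files, file_size, block_size=128 * 1024**2):
    """Rough NameNode heap estimate: one object per file plus one per block."""
    blocks_per_file = max(1, math.ceil(file_size / block_size))
    return num_files * (1 + blocks_per_file) * BYTES_PER_OBJECT


TIB = 1024**4
small = namenode_bytes(TIB // 1024**2, 1024**2)  # 1 TiB stored as 1 MiB files
large = namenode_bytes(TIB // 1024**3, 1024**3)  # 1 TiB stored as 1 GiB files
print(small, large)  # 314572800 1382400
```

Under these assumptions, the small-file layout needs about 300 MiB of heap for the same data that the large-file layout describes in under 1.5 MiB. Because all of this metadata lives in one process's memory, the difference matters.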
Examples: Query Planner
Figure: Apache Drill architecture: http://goo.gl/icZctF
Examples: Table and Storage Management Layer
Figure: HCatalog http://goo.gl/7E1xLc
Examples: Authorization to Data and Metadata
Figure: Apache Sentry: http://goo.gl/zAsIyk
Some Thoughts about Metadata and Hadoop
Technical metadata is necessary.
Hadoop is rapidly becoming a mature platform, and hence metadata will become more relevant in the coming years.
Metadata seems to be a perfect fit for the heterogeneous
Hadoop ecosystem.
10 Reasons why Metadata matters in BI
1. It’s everywhere!
2. It meets the disparate needs of the data warehouse's technical, administrative, and business user groups.
3. It contains information at least as valuable as regular data.
4. It is used to describe the semantics of concepts.
5. It facilitates the extraction, transformation and load process.
6. It improves data security.
7. It hides implementation details.
8. It lets us customize how the user sees the data.
9. It fosters interoperability among systems.
10. It allows us to design portable solutions.
Final Conclusions
1. Metadata matters.
2. Metadata is everywhere: you can't get out of Dodge.
3. Research is alive.
4. Metadata management is less painful when using the right tools.
5. Big data challenges are eased by metadata.