The document discusses data governance and outlines several key points:
1) Data governance is about bringing business and IT together to govern data as a key enterprise asset and ensure there is a common understanding of what data means.
2) Existing tools and approaches are insufficient for handling today's data complexity, and semantic technology can help by clarifying the meaning of data elements.
3) Effective data governance requires a combination of technology, organizational structure, methodology, and culture to define roles and processes for validating and reconciling data across stakeholders.
10. World View: Information is meaningful data. Information is today’s currency.
12. Solution: Data Governance. Bringing business and IT together to govern data as an Enterprise Asset. What does my data mean? How can I involve all stakeholders? How do I operationalize DG?
14. The Closed World Syndrome: requirements and functionality known and specific; data model agreed locally and refers to organisational concepts; usually cryptically stored in proprietary format (vendor lock-in); only understood by designer; designed for the purpose of one organisation; all facts about the domain are already stored; facts not stored are presumed false.
17. Limits of Data Integration in The Extended Enterprise: users, usage context, and applications largely unknown a priori; ontologies refer to language-neutral, context-independent concepts agreed by the community; systems must combine by interoperation.
18. Sounds familiar? What does “Customer” mean? “Customer” is a type of Party of Person that orders at least two Product Items per Year. So “Customer” refers to a class with attributes Pname, Paddress, ...? ...and a Party can either be an Individual or a Company... Aha, and what types of Product Item exist?
26. Cyc Cyc is an artificial intelligence project that attempts to assemble a comprehensive ontology and knowledge base of everyday common sense knowledge, with the goal of enabling AI applications to perform human-like reasoning. Source: http://en.wikipedia.org/wiki/Cyc
28. Wikipedia "an effort to create and distribute a free encyclopedia of the highest possible quality to every single person on the planet in their own language" Source: http://en.wikipedia.org/wiki/Wikipedia
30. Too much freedom? Evolution evolving. (1) December 3, 2001: initial version. (2) July 13, 2002: from controversial to commonly accepted in 2 hours. (3) October 1, 2002: debut of a biology grad student at Harvard, good for a total of 79 edits over 3 years. (4) August 9, 2004: black line indicates deletion as vandalism (half of all vandalisms are corrected within 5 minutes). (5) March 29, 2005: longest point, discussion to reduce to neutral point of view. (6) September 19, 2005: edit war, with rollbacks themselves rolled back several times. From IBM Watson Research.
41. If airplanes were like your systems… Courtesy of Poppy Quintal (see www.aecma.org/Publications/SEnglish/senglish.htm)
42. … would you still board them? Courtesy of Poppy Quintal (see www.aecma.org/Publications/SEnglish/senglish.htm)
Editor's Notes
Define your business concepts, facts & rules as a “shared business language” in a clear and formal way, understood by both business and IT, using open standards; Manage roles & responsibilities using stewards & stakeholders, including the complexity of organizations and their constant change; Collaborate with all stakeholders to ensure usable data for all people and systems involved, across geographical and organizational boundaries; Validate data against business definitions & rules to ensure reliability and correctness.
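The last point, validating data against business definitions and rules, can be illustrated with a minimal sketch. It takes the “Customer” definition from slide 18 (a Party that orders at least two Product Items per Year) and checks records against it. The class, field, and function names are hypothetical and chosen only for illustration; they do not describe any particular tool.

```python
from dataclasses import dataclass

# Threshold taken from the "Customer" definition on slide 18.
MIN_ITEMS_PER_YEAR = 2


@dataclass
class Party:
    # Hypothetical record layout, assumed for this sketch.
    name: str
    items_ordered_per_year: int


def is_customer(party: Party) -> bool:
    """Apply the shared business definition of 'Customer' to one record."""
    return party.items_ordered_per_year >= MIN_ITEMS_PER_YEAR


def validate_customers(parties):
    """Return the names of records labelled as customers that break the rule."""
    return [p.name for p in parties if not is_customer(p)]


labelled_as_customers = [Party("Acme", 5), Party("Jan", 1)]
print(validate_customers(labelled_as_customers))
```

Keeping the rule in one named definition, rather than repeating the threshold in every application, is what allows business and IT to check data against the same meaning.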
By applying ICT, the physical information system is partly replaced by a computerised information subsystem. To define this technical part, the designer observes the information system as-is, forms a conceptual interpretation of it, and represents that interpretation as a set of software functionality specifications and a data model. Language is essential to bridge between reality and its modelling concepts, and is concerned with syntax, semantics, and pragmatics. In this example the real-world personnel record is represented as a table structure that relates the concept employee to its name, address, etcetera. Application software provides the human actors with a graphical interface to access and manipulate the personnel database. We call this a closed information system. It is designed for the purpose of one organisation. The software requirements and functionality are known a priori, and the data model is agreed locally and refers to organisational concepts. Information systems usually suffer from a closed-world syndrome. They were designed on the naive assumption that they already store all possible facts about the domain. Facts not in their database are presumed to be false. The facts are also stored in a format fully understood only by the designer. Hence it is assumed that there will never be a need for data exchange with other systems.
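The difference between presuming a missing fact false and treating it as unknown can be sketched in a few lines. This is only an illustration of the closed-world assumption described above; the fact store, names, and functions are hypothetical.

```python
from enum import Enum


class Answer(Enum):
    TRUE = "true"
    FALSE = "false"
    UNKNOWN = "unknown"


# Hypothetical personnel facts stored as (subject, predicate, object) triples.
FACTS = {
    ("alice", "works_in", "brussels"),
    ("bob", "works_in", "paris"),
}


def ask_closed_world(fact):
    """Closed world: a fact that is not stored is presumed false."""
    return Answer.TRUE if fact in FACTS else Answer.FALSE


def ask_open_world(fact):
    """Open world: a fact that is not stored is merely unknown."""
    return Answer.TRUE if fact in FACTS else Answer.UNKNOWN


query = ("carol", "works_in", "paris")
print(ask_closed_world(query).value)
print(ask_open_world(query).value)
```

Under the closed-world reading the system asserts that Carol does not work in Paris, although all it really knows is that nobody recorded it.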
For many years, information systems have been designed with a closed-world assumption. However, in today’s information-centric economy it is becoming increasingly important that information systems are able to communicate with each other. Consider the integration of the information systems of two HR departments of the same company, one in Brussels and one in Paris. In the underlying computerised information systems the syntax and semantics of the data differ, because their designs were based on different assumptions. For example, the two data models use different labels to refer to the entities that describe the personnel information of employees. The designers have to align their data models and make this alignment explicit in order to integrate their systems.
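One way to make such an alignment explicit is to map each department's local labels onto a shared set of concept names. The sketch below assumes invented column labels for the Brussels and Paris systems; none of these names come from the original systems.

```python
# Hypothetical local labels mapped to shared concept names.
BRUSSELS_TO_SHARED = {"werknemer_naam": "employee_name", "adres": "address"}
PARIS_TO_SHARED = {"nom_employe": "employee_name", "adresse": "address"}


def to_shared(record, mapping):
    """Rename local labels to shared concept names; fail on unmapped labels."""
    unmapped = set(record) - set(mapping)
    if unmapped:
        raise KeyError(f"no alignment for labels: {sorted(unmapped)}")
    return {mapping[label]: value for label, value in record.items()}


brussels_row = {"werknemer_naam": "An Peeters", "adres": "Wetstraat 1, Brussels"}
paris_row = {"nom_employe": "Luc Martin", "adresse": "1 Rue de Rivoli, Paris"}

merged = [
    to_shared(brussels_row, BRUSSELS_TO_SHARED),
    to_shared(paris_row, PARIS_TO_SHARED),
]
print(merged)
```

Raising an error on an unmapped label, instead of silently dropping it, keeps the alignment explicit: every local term must be agreed upon before records are combined.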
All too often, people see MDM as a problem you solve once through a so-called “golden record” that integrates the underlying databases. MDM is perceived as a piece of technology you install, as opposed to a discipline that you need to pursue. This approach conflicts with the inherently dynamic nature of the data space, where errors caused by manual input are not uncommon and new valuable facts about a business entity may show up every hour. Consequently, the chance of building an information governance platform on these premises alone that is scalable and sustainable over longer periods of time is very low.
Under an open-system assumption, the usage context and applications are not known in advance. As communities evolve, new data processing requirements emerge. A one-off integration of the underlying information systems is therefore not possible; the systems must be combined dynamically through interoperation. To establish semantic interoperability between these systems, there is a need for an abstraction layer that refers to language-neutral and context-independent concept types. We will call this an ontology. Usually, ontologies are managed by a select team of people with a technical background. However, the definition and evolution of an ontology should not be based on organisational assumptions, but on the shared and agreed needs of the community. Bridging this gap introduces many challenges.
The goal is to involve communities in the evolution of their ontologies. The basic principle of CBOE is the co-evolution of three first-class citizens of the community: the social interactions, the underlying information systems, and the ontology that establishes semantic interoperability. Therefore we must consider both social and technical aspects, and the gap in between. To bridge the gap, a viable approach must put into practice the activities needed to identify common needs from ad-hoc social interactions, and bring the community stakeholders together to find an ontological agreement that supports these needs. We call this collaborative approach Business Semantics Management.
Current solutions fail to bridge this gap. So-called metadata management solutions operate mostly on the technical level and do not help business users enforce their rules and vocabularies. As a result, the management of business glossaries and of technical metadata are not aligned. Moreover, enterprises utilise several tools offered by different vendors, which are typically not compatible with each other. As a result, the meaning of data is governed by a profusion of walled gardens of technical metadata.
To empower data governance, one needs to introduce a full cycle that co-evolves business definitions with their technical metadata counterparts used for semantic interoperability.
This feedback loop is implemented by a methodology that outlines the different activities involved. In this way, data governance becomes a true discipline.