This presentation was given at a joint BCS/DAMA event on 20/6/13 that discussed different aspects of assessing data quality and the role that data quality dimensions can play. In it, Tim King of LSC Group gave an overview of ISO 8000 and the standards perspective on assessing data quality.
The video of this presentation is available at https://www.youtube.com/watch?v=kftnEO_A49c
The Great Data Debate (3) ISO 8000: Systemic and systematic data quality, T. King
1. The Great Data Debate – Do data quality dimensions have a place in assessing data quality?
DAMA UK / BCS Data Management Specialist Group – 20th June 2013
3. ISO 8000: Systemic & systematic data quality
Dr. Timothy M. KING CEng CITP FIMechE FBCS DIC ACGI
IKM Principal Consultant, LSC Group
Convenor, ISO/TC184/SC4/WG13
DAMA / BCS DSMG
Do data quality dimensions have a place in assessing data quality?
2013-06-20
4. The context
• ISO/TC184/SC4
– "Industrial data"
– sub-committee of ISO/TC184 – "Automation systems & integration"
– founded July 1984
• standards for exchange, sharing & archiving of industrial data
– ISO 10303 – Product data representation & exchange
– ISO 13584 – Parts library
– ISO 15531 – Industrial manufacturing management data
– ISO 15926 – Integration of life-cycle data for process plants
– ISO 16739 – Data sharing in the construction & facility management industries
– ISO 17506 – 3D visualization of industrial data
– ISO 18629 – Process specification language
– ISO 18876 – Integration of industrial data for exchange, access & sharing
– ISO 22745 – Open technical dictionaries & their application to master data
– ISO 29002 – Exchange of characteristic data
5. The context
• ISO/TC184/SC4/WG13 "Industrial data quality" has been developing ISO 8000 "Data quality" since 2006
6. ISO/TC184/SC4/WG13
• "Industrial data quality"
• founded 2006
• three face-to-face meetings per year
– two in parallel with parent committee ISO/TC184/SC4
• teleconferences using Webex
– provided by ISO, with free dial-in for all participants
• e-mail distribution list
– 150+ experts (including academics, engineers, scientists, consultants)
– 20+ countries
– manufacturing, logistics, mining, health, finance
• typical attendance at meetings of 15 to 20 individuals
8. What is data quality?
• ... lost upon entry into orbit around Mars
• the Executive Summary from the Mishap Investigation Board identified that the primary cause of the accident was a data quality issue …
The Mars Climate Orbiter
"thruster performance data in English units was used … the data … was required to be in metric units per existing software interface documentation"
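The Mars Climate Orbiter failure comes down to a number that was used without its unit. A minimal sketch of one defence, assuming every value travels with an explicit unit: the consumer refuses any value that is not stated in the unit its interface documentation requires, instead of silently using the number. `Quantity`, `require_unit` and `to_newton_seconds` are hypothetical names for illustration; the only fixed fact is the definition 1 lbf = 4.4482216152605 N.

```python
from dataclasses import dataclass

# 1 pound-force is defined as exactly 4.4482216152605 newtons,
# so 1 lbf*s of impulse is 4.4482216152605 N*s.
LBF_S_TO_N_S = 4.4482216152605


class UnitMismatch(ValueError):
    """Raised when a value is not stated in the unit the interface requires."""


@dataclass(frozen=True)
class Quantity:
    """Hypothetical helper: a number that always carries its unit."""
    value: float
    unit: str  # "N*s" or "lbf*s" in this sketch


def to_newton_seconds(q: Quantity) -> Quantity:
    """Convert an impulse to N*s, the metric unit the interface documents."""
    if q.unit == "N*s":
        return q
    if q.unit == "lbf*s":
        return Quantity(q.value * LBF_S_TO_N_S, "N*s")
    raise UnitMismatch(f"no conversion from {q.unit}")


def require_unit(q: Quantity, required: str) -> float:
    """Release the bare number only if it is in the required unit."""
    if q.unit != required:
        raise UnitMismatch(f"expected {required}, got {q.unit}")
    return q.value


impulse = Quantity(1.0, "lbf*s")
try:
    require_unit(impulse, "N*s")  # rejected rather than silently misread
except UnitMismatch as err:
    print(err)  # expected N*s, got lbf*s
print(require_unit(to_newton_seconds(impulse), "N*s"))  # 4.4482216152605
```

The design choice is that the bare number is only released at the boundary, after the unit has been checked, so English-unit data cannot reach a metric interface unnoticed.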
9. What is data quality?
• spare part in warehouse but not recorded in computer: number in stock = 0
• data has no sensible interpretation: length of bolt = "green"
• self-intersecting curve in CAD file
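The examples on this slide fail in different ways, and that affects how they can be detected. A sketch with a hypothetical stock record and a hypothetical rule that a bolt length must read as a positive number followed by a length unit: "green" fails a format check outright, whereas a recorded stock of 0 is perfectly well formed and is only exposed by comparing the record with reality (here, an assumed physical count). The self-intersecting CAD curve would need a geometric check and is not covered.

```python
from dataclasses import dataclass

# Hypothetical set of units accepted for a length value.
LENGTH_UNITS = {"mm", "cm", "m", "in"}


@dataclass
class PartRecord:
    """Hypothetical stock record, holding values exactly as entered."""
    part_id: str
    number_in_stock: object
    bolt_length: object


def length_is_interpretable(value: object) -> bool:
    """True only for '<positive number> <length unit>', e.g. '50 mm'."""
    if not isinstance(value, str):
        return False
    parts = value.split()
    if len(parts) != 2 or parts[1] not in LENGTH_UNITS:
        return False
    try:
        return float(parts[0]) > 0
    except ValueError:
        return False


def stock_is_well_formed(value: object) -> bool:
    """A stock level must be a non-negative whole number (bool excluded)."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def stock_matches_count(record: PartRecord, physical_count: int) -> bool:
    """Only a comparison with reality can reveal an unrecorded part."""
    return record.number_in_stock == physical_count


record = PartRecord("P-001", number_in_stock=0, bolt_length="green")
print(length_is_interpretable(record.bolt_length))    # False: no sensible interpretation
print(stock_is_well_formed(record.number_in_stock))   # True: the format is fine...
print(stock_matches_count(record, physical_count=1))  # False: ...but the value is wrong
```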
10. What is data quality?
• ISO/IEC 25012 (Software engineering data quality model)
• ISO/IEC 15288 (Systems engineering)
• Accenture
• US Defense Logistics Information Service
• Butler Group
• Korean Database Promotion Centre
• Shell
• UK MOD Acquisition Management System
• DGIQ (German Data & Information Quality Association)
• IAIDQ (International Association for Information & Data Quality)
11. What is data quality?
accessibility, accessibility / security, accuracy, appropriate amount of data, authenticity, availability, believability, changeability, clarity, compatibility, complete, completeness, compliance, concise representation, conciseness, confidential, confidentiality, conformance with business rules, congruity, consistency, consistent representation, correctness, cost / benefit, credibility, currency, current, currentness, ease of manipulation, efficiency, flexibility, free of error, inaccurate, integrity, interpretability, legible, liability, necessity, objectivity, outdated, portability, precision, protection, recoverability, redundancy, redundant, referential integrity, relevance, relevancy, relevant, reputation, retrievability, safety, security, sufficiency, timeliness, timeliness / timely, traceability, unanimity, understandability, usability, utility, utilization, validity, validity of data content, validity of format, value added, verifiable
12. ISO/IEC 25012 (Software engineering data quality model)
13. IAIDQ (International Association for Information & Data Quality)
14. What is data quality?
• ISO/IEC 25012 (Software engineering data quality model)
• IAIDQ (International Association for Information & Data Quality)
16. The fundamentals of quality
[Figure: ISO 9000:2005, "A process-based quality management system": customer requirements (input) → product realization → product (output) → customer satisfaction; management responsibility; resource management; measurement, analysis & improvement; continual improvement of the quality management system; accountability]
17. Information & data quality
[Figure: the ISO 9000:2005 process-based quality management system model, annotated:]
• for data processes, "product" is data
• product quality is conformance to requirements; data quality is conformance to data requirements
• a process focus is the basis on which to build in quality
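If data quality is conformance to data requirements, the data requirement has to be explicit enough to check against. A minimal sketch, assuming a data requirement is simply a list of named properties, each marked required or optional and paired with a validity test; `PropertyRequirement`, `nonconformances` and the bolt requirement are hypothetical and do not follow the ISO 8000 or ISO 22745 representation of data requirements.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class PropertyRequirement:
    """One property in a (hypothetical) data requirement."""
    name: str
    required: bool
    is_valid: Callable[[Any], bool]


def nonconformances(record: dict, requirement: list[PropertyRequirement]) -> list[str]:
    """List every way a record fails to conform to the data requirement."""
    problems = []
    for prop in requirement:
        if prop.name not in record:
            if prop.required:
                problems.append(f"{prop.name}: missing")
            continue
        if not prop.is_valid(record[prop.name]):
            problems.append(f"{prop.name}: invalid value {record[prop.name]!r}")
    return problems


# Hypothetical data requirement for a bolt.
BOLT_REQUIREMENT = [
    PropertyRequirement("length_mm", True, lambda v: isinstance(v, (int, float)) and v > 0),
    PropertyRequirement("thread", True, lambda v: isinstance(v, str) and v.startswith("M")),
    PropertyRequirement("finish", False, lambda v: v in {"plain", "zinc"}),
]

print(nonconformances({"length_mm": 50, "thread": "M8"}, BOLT_REQUIREMENT))  # []
print(nonconformances({"length_mm": "green"}, BOLT_REQUIREMENT))
# ["length_mm: invalid value 'green'", 'thread: missing']
```

Returning the full list of non-conformances, rather than stopping at the first, gives something that can be counted and tracked over time.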
18. The different perspectives on information & data quality
business processes
• the primary, core processes of interest to the user, involving making decisions & achieving outcomes for which the user is responsible
• examples of these processes include designing an aircraft, recruiting a new member of staff, extinguishing a fire, manufacturing ice cream etc.
19. The different perspectives on information & data quality
business processes
information management
• the means by which data are made available to ensure the right person at the right time can make the right decision as part of a particular business process
• ISO 15288 identifies the following tasks as forming information management: generate, collect, transform, retain, retrieve, disseminate & dispose
DAMA-DMBOK Guide
• data governance
• data architecture management
• data development
• database operations management
• data security management
• reference & master data management
• data warehousing & business intelligence management
• document & content management
• meta data management
• data quality management
20. The different perspectives on information & data quality
business processes
information management
data enable processes; processes create data
resources enable information management
• any component by which to achieve the required outcomes of information management
• these resources include people, software & hardware
21. The different perspectives on information & data quality
business processes
information management
data enable processes; processes create data
resources enable information management
process focus: quality management & process maturity – ISO 9000, ISO 15504 (ISO 33000)
data focus: quality = conformance of data to requirements; three types of quality:
• syntactic
• semantic
• pragmatic
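The data focus lists three types of quality. The sketch assumes one common reading of them: syntactic quality is conformance to a specified syntax, semantic quality is correspondence to what the data represents, and pragmatic quality is suitability for a particular use. The date format, the recorded and actual dates, and the need for day precision are all hypothetical.

```python
import re
from datetime import date

# Hypothetical data requirement: dates as ISO 8601 'YYYY-MM-DD' or 'YYYY-MM'.
DATE_PATTERN = re.compile(r"^(\d{4})-(\d{2})(?:-(\d{2}))?$")


def syntactic(value: str) -> bool:
    """Form only: does the value follow the agreed syntax?"""
    return DATE_PATTERN.match(value) is not None


def semantic(value: str, true_date: date) -> bool:
    """Meaning: is the value consistent with the real-world date it describes?"""
    m = DATE_PATTERN.match(value)
    if m is None:
        return False
    year, month, day = m.groups()
    if (int(year), int(month)) != (true_date.year, true_date.month):
        return False
    return day is None or int(day) == true_date.day


def pragmatic(value: str, needs_day_precision: bool) -> bool:
    """Use: is the value suitable for this particular purpose?"""
    m = DATE_PATTERN.match(value)
    return m is not None and (m.group(3) is not None or not needs_day_precision)


delivery = "2013-06"        # recorded delivery date (hypothetical)
actual = date(2013, 6, 20)  # what actually happened (hypothetical)
print(syntactic(delivery))                            # True
print(semantic(delivery, actual))                     # True
print(pragmatic(delivery, needs_day_precision=True))  # False
print(syntactic("20/6/13"))                           # False
```

A value can pass the first two checks and still fail the third: "2013-06" is well formed and true, yet not usable for scheduling work on a specific day.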
22. ISO 8000 – In-scope list
• The following are within the scope of ISO 8000:
– principles of data quality;
– characteristics of data that determine its quality;
– requirements for achieving data quality;
– requirements for the representation of data requirements, measurement methods, and inspection results for the purposes of data quality;
– frameworks for measuring and improving data quality.
23. The parts of ISO 8000
General
Information & data focus
Process focus
24. The parts of ISO 8000
General
– 1 Overview, principles & general requirements
– 2 Terminology
– 3 Taxonomy
25. The parts of ISO 8000
Information & data focus
– 8 Information quality: Concepts & measuring
– 9 Information quality: Relationship to other standards
– 10 Exchange of data: Syntax, semantic encoding & conformance to data specification
– 20 Exchange of data: Provenance
– 30 Exchange of data: Accuracy
– 40 Exchange of data: Completeness
– 100 Master data: Overview
– 102 Master data: Terminology
– 110 Master data: Exchange of characteristic data: Syntax, semantic encoding & conformance to data specification
– 120 Master data: Provenance
– 130 Master data: Accuracy
– 140 Master data: Completeness
– 311 Usage guide for ISO 10303-59 (Product data quality-shape)
26. The parts of ISO 8000
Process focus
– 60 Data quality management: The overview of process assessment
– 61 Data quality management: Process reference model
– 62 Data quality management: Process maturity assessment model
– 63 Data quality management: Measurement framework
– 150 Master data: Quality management framework
27. Some complications
• "information" & "data"
– definitions from ISO/IEC 2382-1:1993
• data: "re-interpretable representation of information in a formalized manner suitable for communication, interpretation, or processing"
• information: "knowledge concerning objects, such as facts, events, things, processes, or ideas, including concepts, that within a certain context has a particular meaning"
• attributes? dimensions? does data have colour?
– try reading warning notices in red text when wearing night vision goggles …
– multiple layers to the issue
• ISO/IEC 25012: "Software engineering data quality model"
29. ISO 8000-120
ISO 8000 in implementation form (courtesy of PiLog)
• ERP
• data integration: load data; capture provenance data; map metadata to eOTD; convert to ISO 22745-40 data stream
• master data cleansing:
1. Identify reference data
2. Identify or assign class
3. Assign data requirement
4. Map properties (attributes)
5. Identify & standardize values
6. Obtain missing data (enrich)
7. Validate data
• create multilingual descriptions
• identify potential duplicates
• ISO 22745 managed ontology: terminology, data requirements, classifications, description rules
• ECCMA managed ontology: terminology (eOTD), data requirements (eDRR), classifications (eCLR)
• master data warehouse: portable master data with provenance
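The cleansing steps above can be sketched as one function over a single item, assuming tiny in-memory stand-ins for the managed ontology (class keywords, data requirements, property mappings, standard values). Every table and name here is hypothetical; step 1 (identify reference data) is omitted because the slide does not say what it involves, and step 7 checks only completeness. Each value's source is recorded alongside it, echoing the capture of provenance data.

```python
from dataclasses import dataclass, field

# Hypothetical, tiny stand-ins for a managed ontology.
CLASS_BY_KEYWORD = {"tire": "TIRE, PNEUMATIC", "tyre": "TIRE, PNEUMATIC"}
DATA_REQUIREMENT = {"TIRE, PNEUMATIC": ["manufacturer", "rim_diameter_in"]}
PROPERTY_MAP = {"mfr": "manufacturer", "rim": "rim_diameter_in"}
STANDARD_VALUES = {"bs": "Bridgestone", "bridge stone": "Bridgestone"}


@dataclass
class Item:
    description: str
    raw: dict                                       # attributes as loaded from the source
    cls: str = ""
    values: dict = field(default_factory=dict)      # property -> standardized value
    provenance: dict = field(default_factory=dict)  # property -> where the value came from
    missing: list = field(default_factory=list)


def cleanse(item: Item, supplier_data: dict) -> Item:
    words = item.description.lower().split()
    # Step 2: identify or assign a class (here, by keyword in the description).
    item.cls = next((CLASS_BY_KEYWORD[w] for w in words if w in CLASS_BY_KEYWORD), "UNCLASSIFIED")
    # Step 3: the class determines which data requirement applies.
    required = DATA_REQUIREMENT.get(item.cls, [])
    for source_name, value in item.raw.items():
        # Step 4: map source attributes to ontology properties.
        prop = PROPERTY_MAP.get(source_name)
        if prop is None:
            continue
        # Step 5: standardize values against agreed terms.
        if isinstance(value, str):
            value = STANDARD_VALUES.get(value.lower(), value)
        item.values[prop] = value
        item.provenance[prop] = "ERP"
    # Step 6: obtain missing data (enrich), recording its provenance.
    for prop in required:
        if prop not in item.values and prop in supplier_data:
            item.values[prop] = supplier_data[prop]
            item.provenance[prop] = "supplier"
    # Step 7: validate; in this sketch, only completeness against the requirement.
    item.missing = [p for p in required if p not in item.values]
    return item


item = cleanse(Item("Tyre BS 435/R25", {"mfr": "BS"}), {"rim_diameter_in": 25})
print(item.cls, item.values, item.missing)
# TIRE, PNEUMATIC {'manufacturer': 'Bridgestone', 'rim_diameter_in': 25} []
```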
30. Rigorous statement & exchange of requirements
Data requester ↔ Data provider ↔ Sub
• on each link of the chain:
– Request for data: eOTD-q-xml, ISO 22745-35
– Data exchange: eOTD-r-xml, ISO 22745-40
• Data requirement: eOTD-i-xml, ISO 22745-30
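The slide shows the same request/exchange pair on each link between requester, provider and "Sub". A sketch of that chain, in which a provider passes on, unchanged, the part of a requirement it cannot meet itself. `RequestForData` and `DataExchange` stand in for the ISO 22745-35 request and the ISO 22745-40 exchange but do not follow their XML formats; reading "Sub" as a sub-supplier, and all names and property identifiers, are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RequestForData:
    """Stand-in for a request for data: which properties are wanted for an item."""
    item_ref: str
    required_properties: tuple


@dataclass(frozen=True)
class DataExchange:
    """Stand-in for the reply: the property values actually supplied."""
    item_ref: str
    values: dict


class Provider:
    def __init__(self, known: dict, upstream: Optional["Provider"] = None):
        self.known = known        # item_ref -> {property: value}
        self.upstream = upstream  # next party down the supply chain, if any

    def respond(self, request: RequestForData) -> DataExchange:
        held = self.known.get(request.item_ref, {})
        values = {p: held[p] for p in request.required_properties if p in held}
        missing = tuple(p for p in request.required_properties if p not in values)
        if missing and self.upstream is not None:
            # Pass the unmet part of the requirement on to the sub-supplier.
            reply = self.upstream.respond(RequestForData(request.item_ref, missing))
            values.update(reply.values)
        return DataExchange(request.item_ref, values)


def unmet(request: RequestForData, reply: DataExchange) -> list:
    """Properties the requester asked for but did not receive."""
    return [p for p in request.required_properties if p not in reply.values]


sub = Provider({"TYRE-1": {"tread_pattern": "VHB"}})
provider = Provider({"TYRE-1": {"manufacturer": "Bridgestone"}}, upstream=sub)
request = RequestForData("TYRE-1", ("manufacturer", "tread_pattern", "tkph_rating"))
reply = provider.respond(request)
print(reply.values)           # {'manufacturer': 'Bridgestone', 'tread_pattern': 'VHB'}
print(unmet(request, reply))  # ['tkph_rating']
```

Because the requirement is stated once and forwarded as-is, the requester can check the final reply against exactly what it asked for, whoever in the chain supplied each value.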
31. Inventory rationalization as a result of ISO 8000
Common ERP descriptions:
• 52368965412 – Tire Bridgestone 435/95 R25
• 56329845 – Tyre BS 435/R25 Standard Purpose E3 2 Star Radial
• 125435 – Bridge Stone 25inch 435/95
• 965123465 – Tyre Bridgestone Part Number 12345
Standardised Long Description:
Tire: Pneumatic, Vehicular: Service Type for Which Designed: Loader Tire Rim Nominal Diameter: 25" Tire Width: 445mm Aspect Ratio: 0.95 Tire Ply Arrangement: Radial Ply Rating: 2* Tire & Rim Association Number: E3 Tread Material: Standard Tire Air Retention Method: Tubeless Tire Load Index and Speed Symbol: NA Tread Pattern: VHB TKPH Rating: 80
Standardised Short Description:
Tire Pneumatic: Loader 25" 445mm 0.95 2*
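Once items are described by standardized property values rather than free text, both the short description and duplicate detection become mechanical. A sketch under stated assumptions: the property names and their order in the short description are read off the slide's standardised descriptions rather than taken from real description rules, and the four ERP records are assumed to have been cleansed to identical values.

```python
from collections import defaultdict

# Assumed: which properties appear in the short description, and in what order.
SHORT_DESCRIPTION_PROPERTIES = ["service_type", "rim_diameter", "width", "aspect_ratio", "ply_rating"]

# Property values read off the standardised long description (subset).
TIRE = {
    "service_type": "Loader",
    "rim_diameter": '25"',  # 25-inch rim
    "width": "445mm",
    "aspect_ratio": "0.95",
    "ply_rating": "2*",
    "tread_pattern": "VHB",
}


def short_description(class_name: str, values: dict) -> str:
    """Build a short description from property values, in a fixed order."""
    parts = [values[p] for p in SHORT_DESCRIPTION_PROPERTIES if p in values]
    return f"{class_name}: " + " ".join(parts)


def potential_duplicates(items: dict) -> list:
    """Group material numbers whose standardized property values are identical."""
    groups = defaultdict(list)
    for material_no, values in items.items():
        groups[tuple(sorted(values.items()))].append(material_no)
    return [sorted(nums) for nums in groups.values() if len(nums) > 1]


print(short_description("Tire Pneumatic", TIRE))  # Tire Pneumatic: Loader 25" 445mm 0.95 2*
# Assumed: cleansing has mapped all four ERP records to the same property values.
items = {n: dict(TIRE) for n in ["52368965412", "56329845", "125435", "965123465"]}
print(potential_duplicates(items))  # [['125435', '52368965412', '56329845', '965123465']]
```

Grouping on exact property values only finds duplicates after values are standardized; "BS" and "Bridge Stone" would otherwise never match.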
32. The benefits of ISO 8000
• vague data requirements → explicit, measurable data requirements
• human-readable requirements → computer-processable requirements
• requirements differ from project to project → classified, common types of requirement
• repeated cleansing of same non-conformances → data right, first & every time
• ad hoc approaches to validation → recommended types of validation
33. Conclusions
• systematic
– alignment with ISO 9000 principles of quality
– driven by explicit, robust data requirements
• systemic
– errors in data fields as a symptom of the real problem
– sustainable quality from the enterprise strategy downwards
34. Useful links
• ISO
– http://www.iso.org/iso/home.html
• ISO/TC184/SC4/WG13
– http://isotc.iso.org/livelink/livelink?func=ll&objId=8838237&objAction=browse&sort=name
• BSI AMT/4 "Industrial data & manufacturing interfaces"
– http://standardsdevelopment.bsigroup.com/Home/Committee/50001757
• LSC Group
– http://www.lsc.co.uk/