UKSG Conference
                   April 2013
Phil Nicolson
Data Governance
• What is Data Governance
• What is Data Quality
• The challenges
• Data governance programme
• A publisher approach
• The outcome: Book author example
• ICEDIS
• Summary
Data governance
“I think that the key issue here, is that the information is probably incorrect, inaccurate and in a form that almost certainly shouldn't have been used”
Dr John Thomson, cardiologist at Leeds General Infirmary, Sky News, 30/3/2013
Data Governance – a definition
• Data governance is defined as the processes, policies, standards, organisation, and technologies required to manage and ensure the availability, accessibility, quality, consistency, auditability, and security of data
Data Quality - definitions
• Data are of high quality "if they are fit for their intended uses in operations, decision making and planning"
• Data are deemed of high quality if they correctly represent the real-world construct to which they refer
Data Quality
• Data quality attributes:
   • Accurate
   • Reliable
   • Complete
   • Appropriate
   • Timely
   • Credible
   • Up-to-date
The challenge: Data Sources
• Multiple data sources – ‘system’ data silos
• Multiple locations – ‘geographic’ data silos
• Data entered through multiple channels
• Data entered by different people
The challenge: Data Sources
Typical publisher systems:
• Financial system
• CRM/Sales database
• Authentication system
• Fulfilment
• Usage statistics
• Submissions system
• Author database
• …..
Data can be entered by:
• Organisation staff
• Authors
• Society members
• Agents in the supply chain
• 3rd party organisations
• …..
The challenge: Institutions
• UCL:
   • University College London (UK)
   • Université Catholique de Louvain (Belgium)
   • Universidad Cristiana Latinoamericana (Ecuador)
   • University College Lillebælt (Denmark)
   • Centro Universitario Celso Lisboa (Brazil)
   • Union County Library (USA)
• NPL:
   • National Physical Laboratory (UK)
   • National Physical Laboratory (India)
• York Uni.
   • University of York (UK)
   • York University (Canada)
• Northeastern University:
   • Northeastern University (Boston, USA)
   • Northeastern University (Shenyang, China)
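The ambiguity listed above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory list of (abbreviation, name, country) tuples holding a subset of the examples from this slide, and shows that a lookup by abbreviation or by name alone returns more than one candidate. The tuple layout and the function name are illustrative assumptions, not taken from any real system.

```python
from collections import defaultdict

# A subset of the examples on the slide; the tuple layout is an assumption.
INSTITUTIONS = [
    ("UCL", "University College London", "UK"),
    ("UCL", "Université Catholique de Louvain", "Belgium"),
    ("UCL", "Universidad Cristiana Latinoamericana", "Ecuador"),
    ("NPL", "National Physical Laboratory", "UK"),
    ("NPL", "National Physical Laboratory", "India"),
]


def candidates_by_abbreviation(records):
    """Group institutions by abbreviation to expose ambiguous lookups."""
    index = defaultdict(list)
    for abbreviation, name, country in records:
        index[abbreviation].append((name, country))
    return dict(index)


index = candidates_by_abbreviation(INSTITUTIONS)
for abbreviation, matches in index.items():
    print(f"{abbreviation}: {len(matches)} possible institutions")
```

Note that the two NPL entries share the same full name as well, so even a complete name is not enough to tell them apart without a country or an identifier.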
The challenge: Individuals
How can we uniquely identify individuals? Of the 700,000 individuals known to the RSC in 2012 there were:
• Smith: ~1,500
• Jones: ~1,000
• Li: >10,000
Consequences of poor data
Biggest obstacle(s) to data quality improvement in your organization?
Lack of accountability and responsibility for data quality: 55.4%
Too many information silos: 51.8%
Lack of awareness or communication of the magnitude of data quality problems: 51.4%
Lack of common understanding of what data quality means: 50.2%
Lack of awareness or communication of the opportunities associated with high quality data: 45.0%
Lack of senior leadership in tackling data quality issues: 44.2%
Lack of data quality policies, plans, and procedures: 42.2%
Perception that data quality is an IT issue only rather than an organisation wide issue: 41.8%
Source: The State of Information and Data Quality 2012 Industry Survey & Report (IAIDQ), Understanding how Organizations Manage the Quality of their Information and Data Assets. Pierce, Yonke, Malik, Nagaraj
Data Governance – why it is vital
“processes, policies, standards… ensure quality and consistency”
• Increase consistency and confidence in our decision making
• Maximise the income generation potential of our data
• Provide excellent customer service
• Designate accountability for information quality
• Minimise or eliminate re-work
• Optimise staff effectiveness
• Decrease the risk of regulatory fines
• Improve data security
Data is one of the most valuable assets within an organisation
Data governance – a new culture
Data governance programme
Plan & prioritise
• Sponsorship: director level sponsor?
• Programme management: business or IT driven?
• Organisational structure: local, national, international?
• Scope: focus on the most important data?
• Ownership: who are the business owners of critical data?
• New system implementation: protect investment
Plan & prioritise
• Resources: dedicated staff?
• Funding: which area of the business will fund the programme?
• Business drivers: what are the major business drivers?
• Barriers: what are the main barriers (cultural, funding, resources, priorities etc.) and can they be mitigated?
Audit & Analyse
• Audit existing data quality
• Review all relevant systems
• How poor is it?
   • Incomplete data
   • Invalid
   • Out of date
   • ….
Clean existing data
• Prioritise
• Quick wins
• Highlight progress
• What can be automated?
• Introduce unique identifiers
Identifiers available
People:
• International Standard Name Identifier (ISNI)
• Open Researcher and Contributor ID (ORCID)
• Scopus Author Identifier
• ResearcherID
Organisations:
• International Standard Name Identifier (ISNI)
• Ringgold ID
• DUNS Number (D&B) and other business and finance IDs
• MDR PID Numbers and other marketing IDs
• Library of Congress MARC Code List for Organizations
ISNI
ISNI is designed to be a “bridge identifier”
[Diagram: two columns, each showing an ISNI Number linked to a Party ID (Party ID 1, Party ID 2) and to that party's proprietary information and/or metadata]
Author IDs
• ORCID is designed to persistently identify and disambiguate scholarly researchers and attach them to research output
• ORCID identifiers utilize a format compliant with the ISNI ISO standard
• ISNI has reserved a block of identifiers for use by ORCID, so there will be no overlaps in assignments
• Recorded as http://orcid.org/0000-0001-2345-6789

http://about.orcid.org/
http://www.isni.org/
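Because the identifier format above ends in a check character, a captured ORCID can be sanity-checked before it is stored. The sketch below assumes the ISO 7064 MOD 11-2 check-digit scheme that ORCID documents for its identifiers. The function names are illustrative, and passing the check only shows that the string is well formed, not that it belongs to the right researcher.

```python
def orcid_check_digit(base_digits):
    """Compute the MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Accept either the bare identifier or the URL form shown on the slide."""
    identifier = orcid.rsplit("/", 1)[-1].replace("-", "")
    if len(identifier) != 16:
        return False
    base, check = identifier[:15], identifier[15].upper()
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == check


print(is_valid_orcid("http://orcid.org/0000-0001-2345-6789"))
print(is_valid_orcid("0000-0001-2345-6788"))
```

The first call checks the example from the slide; the second changes only the final character, which a check of this kind is designed to catch.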
Use cases
• Disambiguation of researchers and connection to all their research
• Links to contributors, editors, compilers and others involved in the research process
• Embed IDs into research workflows and the supply chain
• Integrate systems
Institutional IDs
• Ringgold is an ISNI Registration Agency
• Unique institutional ID number maps data across systems
• ISNI numbers should be used across the scholarly supply chain to:
   • Disambiguate institutional records
   • Eradicate duplication of data
   • Map institutions into their hierarchy
   • Link systems using the institutional ID as the lynchpin
Minimising the impact of data silos
• Standard identifiers (both individual and institution) can be used to break down silos by enabling better system linking
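As a rough illustration of linking systems through a shared identifier, the sketch below joins records from two hypothetical silos (a CRM export and an author database) on an `institution_id` field. The field names, the record layout and the `ID-001` style values are placeholders invented for this example; they are not real Ringgold or ISNI identifiers.

```python
def link_by_identifier(crm_records, author_records, id_field="institution_id"):
    """Group records from two silos under the identifier they share."""
    linked = {}
    sources = (("crm", crm_records), ("authors", author_records))
    for source_name, records in sources:
        for record in records:
            identifier = record.get(id_field)
            if identifier is None:
                continue  # records without an identifier cannot be linked
            entry = linked.setdefault(identifier, {"crm": [], "authors": []})
            entry[source_name].append(record)
    return linked


# Made-up sample data; names differ between systems but the ID matches.
crm = [{"institution_id": "ID-001", "account": "Univ. of York"}]
authors = [
    {"institution_id": "ID-001", "author": "A. Smith", "affiliation": "University of York"},
    {"institution_id": "ID-002", "author": "B. Jones", "affiliation": "York University"},
    {"institution_id": None, "author": "C. Li", "affiliation": "York"},
]
linked = link_by_identifier(crm, authors)
for identifier, entry in linked.items():
    print(identifier, len(entry["crm"]), "CRM record(s),", len(entry["authors"]), "author(s)")
```

The design point is that the join key is the identifier, not the institution name, so "Univ. of York" and "University of York" end up together while "York University" stays separate.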
Improve data capture
• Data quality policy
• Web forms
• Closer collaboration with 3rd parties to encourage use of industry standard identifiers such as ISNI or ORCID
Data capture - data quality policy
• Design to ensure accuracy, quality and consistency
• Individual responsibilities:
   • All staff are responsible for the accuracy and consistency of data
   • Capture data in such a way that it is uniquely identifiable and easily shared within the organisation and with 3rd parties
   • Records relating to individuals
   • Records relating to institutions
   • Reporting of inaccuracies to Data Owners
• Data owners' responsibilities:
   • All source data systems must have a designated Data Owner
   • Data owner retains overall responsibility for all records within their source data system
Improve data capture – web forms
• Required fields
• Validation
• Address validation – postcode lookup
• Institution validation – institution lookup
• ‘Internal’ and ‘external’ web form consistency
• Language barriers
• Help and hints
• Free-text fields
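Three of the points above (required fields, validation and institution lookup) can be sketched as a server-side check on a submitted form. This is a minimal example under assumptions: the form arrives as a dictionary, the field names are hypothetical, the email pattern is intentionally loose, and the institution lookup is represented by a plain set of known identifiers rather than a real lookup service.

```python
import re

# Hypothetical field names for an author sign-up form.
REQUIRED_FIELDS = ("name", "email", "institution_id")
# Loose shape check only; it does not prove the address exists.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def _clean(form, field):
    """Return the field as a stripped string, treating None as empty."""
    return str(form.get(field) or "").strip()


def validate_submission(form, known_institution_ids):
    """Return a dict of field -> problem; an empty dict means the form passed."""
    errors = {}
    for field in REQUIRED_FIELDS:
        if not _clean(form, field):
            errors[field] = "required"
    email = _clean(form, "email")
    if email and not EMAIL_PATTERN.match(email):
        errors["email"] = "invalid format"
    institution_id = _clean(form, "institution_id")
    if institution_id and institution_id not in known_institution_ids:
        errors["institution_id"] = "not found in institution lookup"
    return errors


known_ids = {"ID-001"}  # placeholder identifiers, not real ones
good = {"name": "A. Smith", "email": "a.smith@example.org", "institution_id": "ID-001"}
bad = {"name": "", "email": "bad-address", "institution_id": "ID-999"}
print(validate_submission(good, known_ids))
print(validate_submission(bad, known_ids))
```

Applying the same function to both ‘internal’ and ‘external’ forms is one way to keep the two consistent.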
On-going monitoring
• Dashboards
• Regular audits
• Metrics – Institutional Linking Rate
• Staff awareness
• Reporting of errors
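The slide names the Institutional Linking Rate metric without defining it. One plausible reading, used here purely as an assumption, is the percentage of records in a data source that carry an institutional identifier. The field name `institution_id` and the sample values are hypothetical.

```python
def institutional_linking_rate(records, id_field="institution_id"):
    """Percentage of records that carry an institutional identifier."""
    if not records:
        return 0.0
    linked = sum(1 for record in records if record.get(id_field))
    return 100.0 * linked / len(records)


# Made-up sample: three of four records are linked to an institution.
sample_records = [
    {"author": "A. Smith", "institution_id": "ID-001"},
    {"author": "B. Jones", "institution_id": "ID-002"},
    {"author": "C. Li", "institution_id": None},
    {"author": "D. Patel", "institution_id": "ID-001"},
]
rate = institutional_linking_rate(sample_records)
print(f"Institutional linking rate: {rate:.1f}%")
```

Tracking this figure per data source at each audit gives a simple trend line for a dashboard.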
A publisher example
• Develop a Data Governance Programme
   • Data ‘champion’
   • Engagement – at all levels
   • Ownership – at all levels
   • Allocate necessary resources
   • Guidelines/Policy - Data quality policy
   • Processes put in place
   • Education - raise awareness
   • New staff – training on Data Governance and its wider impact
   • Change of culture
A publisher example
• Ringgold and DataSalon client
   • All institutional records contain Ringgold Identifiers
   • System linking via Individual and Institutional identifiers
   • Data (both good and bad) visible to all via MasterVision
   • Use of data governance dashboards
   • Tidying of existing data
   • Simple reporting of incorrect data across organisation
   • New data captured correctly
Author database
1.       Create a data governance dashboard to
         monitor problem areas:
     •      Book authors with no related institution
     •      Unknown book authors
     •      Author records without an affiliation entry
     •      Author records with commas in the
            affiliation entry
     •      Book authors without an email address
     •      Book authors with an invalid email address

2.       Correct problem records in existing data
     •      Dashboard clearly highlighted all records of
            concern and these records were corrected
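The dashboard checks listed in step 1 can be approximated with a small rule set. This sketch is not the publisher's actual dashboard: the record fields are hypothetical, "unknown book author" is assumed to mean a record with no name, and the email pattern is a loose shape check.

```python
import re
from collections import Counter

# Loose shape check only; it does not prove the address exists.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def dashboard_flags(author):
    """Return the list of problem areas that apply to one author record."""
    flags = []
    if not author.get("institution_id"):
        flags.append("no related institution")
    if not (author.get("name") or "").strip():
        flags.append("unknown author")  # assumed meaning: name is missing
    affiliation = (author.get("affiliation") or "").strip()
    if not affiliation:
        flags.append("no affiliation entry")
    elif "," in affiliation:
        flags.append("comma in affiliation entry")
    email = (author.get("email") or "").strip()
    if not email:
        flags.append("no email address")
    elif not EMAIL_PATTERN.match(email):
        flags.append("invalid email address")
    return flags


def build_dashboard(authors):
    """Count how many records fall into each problem area."""
    return dict(Counter(flag for author in authors for flag in dashboard_flags(author)))


# Made-up sample records for illustration only.
book_authors = [
    {"name": "A. Smith", "institution_id": "ID-001",
     "affiliation": "University of York", "email": "a.smith@example.org"},
    {"name": "", "institution_id": None,
     "affiliation": "Dept of Chemistry, York", "email": "not-an-email"},
]
print(build_dashboard(book_authors))
```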
Author database
3.       Ensure new records are created correctly
     •      Raise staff understanding of the importance of capturing data correctly and
            the impact it has across the organisation as a whole (data silos)
     •      Training covering data governance

4.       Ensure appropriate Ringgold coverage
     •      Where institutions were discovered in the Author database that didn’t exist
            within Identify these were reported to Ringgold. This not only means that
            individual authors can be linked to the new institution but that any
            individuals in other data sources at the same institution can be linked. This
            benefits all users of our data and potentially highlights new sales
            opportunities.

5.       Monitor data quality on an on-going basis
     •      Books data governance dashboard updated on a weekly basis.
Author database – results
[Chart: y-axis from 70.00% to 100.00%; series labels: All data sources, ANKO]
10% will never link:
• Missing data (old records)
• Institution no longer exists
• Retired author
• Genuinely no related institution
End of process:
• 15% increase in authors linked to institutions - information valuable in supporting all areas of the business
• Ready for data migration
ICEDIS
• The international standards organization EDItEUR is working to encourage improvements in the ways that "party" information is communicated
• Some parts of the supply chain continue to send unstructured name & address records, making matching, disambiguation and automatic ingest near impossible
• ICEDIS has collaborated with EDItEUR to develop a highly structured data model for exchanging names, addresses and standard identifiers
• The group has recently been validating the model by means of a "paper pilot", using a small library of about 100 name & address types
• An XML schema and HTML documentation are freely available
www.editeur.org
www.editeur.org/138/Structured-Name-and-Address-Model
info@editeur.org
Summary
• Your data is a very valuable asset when managed correctly
• Establishing a data governance programme will enable you to gain maximum benefit from that data
• Data governance is as much about changing the culture of an organisation as it is about processes and procedures
• It will take time but the benefits can be enormous
Phil Nicolson
Data Manager
Ringgold Inc.
phil.nicolson@ringgold.com

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 

Rubbish in Rubbish out: applying good data governance techniques to gain maximum benefit from publisher data

  • 1. UKSG Conference April 2013 Phil Nicolson
  • 2. Data Governance  What is Data Governance  What is Data Quality  The challenges  Data governance programme  A publisher approach  The outcome: Book author example  ICEDIS  Summary
  • 3. Data governance “I think that the key issue here, is that the information is probably incorrect, inaccurate and in a form that almost certainly shouldn't have been used” Dr John Thomson cardiologist at Leeds General Infirmary, Sky News 30/3/2013
  • 4. Data Governance – a definition  Data governance is defined as the processes, policies, standards, organisation, and technologies required to manage and ensure the availability, accessibility, quality, consistency, auditability, and security of data
  • 5. Data Quality - definitions  Data are of high quality "if they are fit for their intended uses in operations, decision making and planning"  Data are deemed of high quality if they correctly represent the real-world construct to which they refer
  • 6. Data Quality  Data quality attributes:  Accurate  Reliable  Complete  Appropriate  Timely  Credible  Up-to-date
  • 7. The challenge: Data Sources  Multiple data sources – ‘system’ data silos  Multiple locations – ‘geographic’ data silos  Data entered through multiple channels  Data entered by different people
  • 8. The challenge: Data Sources Typical publisher systems: Data can be entered by:  Financial system  Organisation staff  CRM/Sales database  Authors  Authentication system  Society members  Fulfilment  Agents in the supply chain  Usage statistics  3rd party organisations  Submissions system  …..  Author database  …..
  • 9. The challenge: Institutions  UCL:  University College London (UK)  Université Catholique de Louvain (Belgium)  Universidad Cristiana Latinoamericana (Ecuador)  University College Lillebælt (Denmark)  Centro Universitario Celso Lisboa (Brazil)  Union County Library (USA)  NPL:  National Physical Laboratory (UK)  National Physical Laboratory (India)  York Uni.  University of York (UK)  York University (Canada)  Northeastern University:  Northeastern University (Boston, USA)  Northeastern University (Shenyang, China)
  • 10. The challenge: Individuals How can we uniquely identify individuals? Of the 700,000 individuals known to the RSC in 2012 there were:  Smith:  ~1,500  Jones:  ~1,000  Li:  >10,000
  • 12. Biggest obstacle(s) to data quality improvement in your organization?  Lack of accountability and responsibility for data quality: 55.4%  Too many information silos: 51.8%  Lack of awareness or communication of the magnitude of data quality problems: 51.4%  Lack of common understanding of what data quality means: 50.2%  Lack of awareness or communication of the opportunities associated with high quality data: 45.0%  Lack of senior leadership in tackling data quality issues: 44.2%  Lack of data quality policies, plans, and procedures: 42.2%  Perception that data quality is an IT issue only rather than an organisation wide issue: 41.8% Source: The State of Information and Data Quality 2012 Industry Survey & Report (IAIDQ), Understanding how Organizations Manage the Quality of their Information and Data Assets. Pierce, Yonke, Malik, Nagaraj
  • 13. Data Governance – why it is vital “processes, policies, standards… ensure quality and consistency”  Increase consistency and confidence in our decision making  Maximise the income generation potential of our data  Provide excellent customer service  Designating accountability for information quality  Minimising or eliminating re-work  Optimise staff effectiveness  Decreasing the risk of regulatory fines  Improving data security Data is one of the most valuable assets within an organisation
  • 14. Data governance – a new culture
  • 16. Plan & prioritise  Sponsorship: director level sponsor?  Program management: business or IT driven?  Organisational structure: local, national, international?  Scope: focus on the most important data?  Ownership: who are the business owners of critical data?  New system implementation: protect investment
  • 17. Plan & prioritise  Resources: dedicated staff?  Funding: which area of the business will fund the program?  Business drivers: what are the major business drivers?  Barriers: what are the main barriers (cultural, funding, resources, priorities etc.) and can they be mitigated?
  • 18. Audit & Analyse  Audit existing data quality  Review all relevant systems  How poor is it?  Incomplete data  Invalid  Out of date  ….
  • 19. Clean existing data  Prioritise  Quick wins  Highlight progress  What can be automated?  Introduce unique identifiers
  • 20. Identifiers available  People:  International Standard Name Identifier (ISNI)  Open Researcher and Contributor ID (ORCID)  Scopus Author Identifier  ResearcherID  Organisations:  International Standard Name Identifier (ISNI)  Ringgold ID  DUNS Number (D&B) and other business and finance IDs  MDR PID Numbers and other marketing IDs  Library of Congress MARC Code List for Organizations
  • 21. ISNI  ISNI is designed to be a “bridge identifier” [Diagram labels: ISNI Number; Party ID 1 and Party ID 2, each with Proprietary Information and/or Metadata]
  • 22. Author IDs  ORCID is designed to persistently identify and disambiguate scholarly researchers and attach them to research output  ORCID identifiers utilize a format compliant with the ISNI ISO standard  ISNI has reserved a block of identifiers for use by ORCID, so there will be no overlaps in assignments  Recorded as http://orcid.org/0000-0001-2345-6789 http://about.orcid.org/ http://www.isni.org/
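The 16-character ORCID format can be checked mechanically before an identifier is stored. The sketch below assumes the ISO 7064 MOD 11-2 check character used by ISNI-format identifiers; the function names are illustrative only and are not taken from the slides.

```python
DIGITS = "0123456789"


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(identifier: str) -> bool:
    """Accept the bare, hyphenated, or URL form and verify the check character."""
    # Keep only the part after the last "/" so URL forms are handled too.
    compact = identifier.strip().rsplit("/", 1)[-1].replace("-", "").upper()
    if len(compact) != 16:
        return False
    if any(ch not in DIGITS for ch in compact[:15]):
        return False
    return orcid_check_character(compact[:15]) == compact[15]
```

A check like this only catches mistyped identifiers; it does not confirm that the identifier has been assigned to the right researcher.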
  • 23. Use cases  Disambiguation of researchers and connection to all their research  Links to contributors, editors, compilers and others involved in the research process  Embed IDs into research workflows and the supply chain  Integrate systems
  • 24. Institutional IDs  Ringgold is an ISNI Registration Agency  Unique institutional ID number maps data across systems  ISNI numbers should be used across the scholarly supply chain to:  Disambiguate institutional records  Eradicate duplication of data  Map institutions into their hierarchy  Link systems using the institutional ID as the lynchpin
  • 25. Minimising the impact of data silos  Standard identifiers (both individual and institution) can be used to break down silos by enabling better system linking.
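As a rough illustration of linking systems through a shared identifier, the sketch below joins two hypothetical silos (a CRM list and a usage log) on an institution ID. The record layouts, field names and ID values are invented for the example and do not come from any real system.

```python
from collections import defaultdict

# Hypothetical records from two silos; names differ, identifiers agree.
crm_records = [
    {"name": "University of York", "institution_id": "ID-0001"},
    {"name": "York University", "institution_id": "ID-0002"},
]
usage_records = [
    {"label": "York Uni.", "institution_id": "ID-0002", "downloads": 120},
    {"label": "Univ. of York", "institution_id": "ID-0001", "downloads": 75},
    {"label": "York", "institution_id": "ID-0001", "downloads": 5},
]


def link_by_identifier(crm, usage):
    """Total usage per CRM institution, matched on the shared identifier."""
    totals = defaultdict(int)
    for row in usage:
        totals[row["institution_id"]] += row["downloads"]
    return {record["name"]: totals[record["institution_id"]] for record in crm}
```

Matching on the free-text labels ("York Uni.", "York") would be ambiguous between the two institutions; matching on the identifier is not.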
  • 26. Improve data capture  Data quality policy  Web forms  Closer collaboration with 3rd parties to encourage use of industry standard identifiers such as ISNI or ORCID
  • 27. Data capture - data quality policy  Design to ensure accuracy, quality and consistency  Individual responsibilities:  All staff are responsible for the accuracy and consistency of data  Capture data in such a way that it is uniquely identifiable and easily shared within the organisation and with 3rd parties  Records relating to individuals  Records relating to institutions  Reporting of inaccuracies to Data Owners  Data owners’ responsibilities:  All source data systems must have a designated Data Owner  Data owner retains overall responsibility for all records within their source data system
  • 28. Improve data capture – web forms  Required fields  Validation  Address validation – postcode lookup  Institution validation – institution lookup  ‘Internal’ and ‘external’ web form consistency  Language barriers  Help and hints  Free-text fields
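A minimal sketch of the required-field, validation and institution-lookup checks listed above, as they might run when a web form is submitted. The field names, the deliberately loose email pattern and the in-memory set standing in for an institution lookup are all assumptions made for illustration.

```python
import re

REQUIRED_FIELDS = ("name", "email", "institution_id")
# Loose pattern: catches obvious typos, not every invalid address.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
# Stand-in for a real institution lookup service (hypothetical IDs).
KNOWN_INSTITUTION_IDS = {"ID-0001", "ID-0002"}


def clean(form: dict, field: str) -> str:
    """Return the field as a stripped string, treating None as empty."""
    return str(form.get(field) or "").strip()


def validate_submission(form: dict) -> list:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        if not clean(form, field):
            errors.append(f"{field} is required")
    email = clean(form, "email")
    if email and not EMAIL_PATTERN.match(email):
        errors.append("email is not in a valid format")
    institution_id = clean(form, "institution_id")
    if institution_id and institution_id not in KNOWN_INSTITUTION_IDS:
        errors.append("institution_id not found in lookup")
    return errors
```

Running the same function behind both 'internal' and 'external' forms is one way to keep the two consistent.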
  • 29. On-going monitoring  Dashboards  Regular audits  Metrics – Institutional Linking Rate  Staff awareness  Reporting of errors
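The slides do not define the Institutional Linking Rate. One plausible reading, assumed here, is the percentage of records that carry an institutional identifier; the field name `institution_id` is hypothetical.

```python
def institutional_linking_rate(records):
    """Percentage of records that carry a non-empty institution identifier."""
    if not records:
        return 0.0
    linked = sum(1 for record in records if record.get("institution_id"))
    return 100.0 * linked / len(records)
```

Tracked at regular intervals, a figure like this gives a dashboard a single number to watch as cleaning work progresses.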
  • 30. A publisher example  Develop a Data Governance Programme  Data ‘champion’  Engagement – at all levels  Ownership – at all levels  Allocate necessary resources  Guidelines/Policy - Data quality policy  Processes put in place  Education - raise awareness  New staff – training on Data Governance and their wider impact  Change of culture
  • 31. A publisher example  Ringgold and DataSalon client  All institutional records contain Ringgold Identifiers  System linking via Individual and Institutional identifiers  Data (both good and bad) visible to all via MasterVision  Use of data governance dashboards  Tidying of existing data  Simple reporting of incorrect data across organisation  New data captured correctly
  • 32. Author database 1. Create a data governance dashboard to monitor problem areas: • Book authors with no related institution • Unknown book authors • Author records without an affiliation entry • Author records with commas in the affiliation entry • Book authors without an email address • Book authors with an invalid email address 2. Correct problem records in existing data • Dashboard clearly highlighted all records of concern and these records were corrected
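The dashboard checks in step 1 could be approximated with a few per-record rules, sketched below. The record fields and the loose email pattern are assumptions; the "unknown book authors" check is left out because the slides do not say how such records are recognised.

```python
import re
from collections import Counter

# Loose pattern: flags obvious problems, not every invalid address.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def problem_flags(author: dict) -> list:
    """Return the dashboard categories that apply to one author record."""
    flags = []
    if not author.get("institution_id"):
        flags.append("no related institution")
    affiliation = (author.get("affiliation") or "").strip()
    if not affiliation:
        flags.append("no affiliation entry")
    elif "," in affiliation:
        flags.append("comma in affiliation entry")
    email = (author.get("email") or "").strip()
    if not email:
        flags.append("no email address")
    elif not EMAIL_PATTERN.match(email):
        flags.append("invalid email address")
    return flags


def dashboard_counts(authors) -> Counter:
    """Count how many records fall into each problem category."""
    return Counter(flag for author in authors for flag in problem_flags(author))
```

The per-record flags double as a worklist for step 2, since each flagged record names the problem to correct.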
  • 33. Author database 3. Ensure new records are created correctly • Raise staff understanding of the importance of capturing data correctly and the impact it has across the organisation as a whole (data silos) • Training covering data governance 4. Ensure appropriate Ringgold coverage • Where institutions were discovered in the Author database that didn’t exist within Identify, these were reported to Ringgold. This means not only that individual authors can be linked to the new institution but also that any individuals in other data sources at the same institution can be linked. This benefits all users of our data and potentially highlights new sales opportunities. 5. Monitor data quality on an on-going basis • Books data governance dashboard updated on a weekly basis.
  • 34. Author database – results [Chart: axis 70.00% to 100.00%; labels: All data sources, ANKO] 10% will never link: • Missing data (old records) • Institution no longer exists • Retired author • Genuinely no related institution End of process: • 15% increase in authors linked to institutions - information valuable in supporting all areas of the business • Ready for data migration
  • 35. ICEDIS  The international standards organization EDItEUR is working to encourage improvements in the ways that "party" information is communicated  Some parts of the supply chain continue to send unstructured name & address records, making matching, disambiguation and automatic ingest near impossible  ICEDIS has collaborated with EDItEUR to develop a highly structured data model for exchanging names, addresses and standard identifiers.  The group has recently been validating the model by means of a "paper pilot", using a small library of about 100 name & address types  An XML schema and HTML documentation are freely available www.editeur.org www.editeur.org/138/Structured-Name-and-Address-Model info@editeur.org
  • 36. Summary  Your data is a very valuable asset when managed correctly  Establishing a data governance programme will enable you to gain maximum benefit from that data  Data governance is as much about changing the culture of an organisation as it is about processes and procedures  It will take time but the benefits can be enormous
  • 37. Phil Nicolson Data Manager Ringgold Inc. phil.nicolson@ringgold.com

Editor's Notes

  1. Smith: 1,418; Jones: 982; Li: 9,500+; RSC 700,000 individuals
  2. Data amnesty
  3. Quick wins – something as simple as standardising country names
  4. DUNS: MDR:
  5. RSC - ScholarOne
  6. C Able example; 3rd party fulfilment house