This webinar discusses improving data quality through a data normalization and auditing process. It recommends creating a data policy, filling in missing data elements, and standardizing naming conventions. The webinar outlines a three-step process for data cleanup: 1) create a data policy, 2) fill in missing data, and 3) organize data consistently. It also describes Ringgold's auditing service which uses experts to match records to standardized identifiers and provide additional metadata. The auditing process results in clean, structured data that can enhance customer records and support business decisions.
2. Today’s topic, in context
Your Business
Data Governance Program
Healthy Records
IDs
4. Agenda
1. Get Ready to Clean
2. Tidy Your Data in Three Steps
3. What to do Next
4. Ringgold’s Audit Service
5. One: Get Ready
• What problem are you trying to solve?
• Service issues, new journal, system migration?
• Prioritize your records.
• Geo, type of customer, scope of records?
• Gather tools & resources.
• People, information sources, time, money.
6. Remember……
Back up your original data, and work from a copy.
Make any changes there, then re-import.
Keep a copy of the pre-cleaned data in case you experience problems.
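One way to follow this advice, as a minimal sketch: copy the exported file before touching it. The function name, the `_working` suffix, and the example file name are illustrative assumptions, not part of the webinar.

```python
import shutil
from pathlib import Path

def make_working_copy(original, suffix="_working"):
    """Copy the original export next to itself and return the copy's path."""
    source = Path(original)
    # e.g. customers.csv -> customers_working.csv (naming is an assumption)
    working = source.with_name(f"{source.stem}{suffix}{source.suffix}")
    shutil.copy2(source, working)  # copy2 also preserves file timestamps
    return working

# Usage (hypothetical file name):
# working_file = make_working_copy("customers.csv")
```

All cleanup edits then go into the working copy, and the untouched original stays available if the re-import causes problems.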
8. Step 1: Create a Data Policy Document
A. Decide which elements are required
Which must be filled in now as part of the cleanup effort, or for the creation of any new records
B. Define each data element
Make a list including all fields of a record, and define what each means (and what it doesn’t)
9. C. Set naming conventions
To abbreviate or not to abbreviate?
UCL:
University College London (UK)
Université Catholique de Louvain (Belgium)
Universidad Cristiana Latinoamericana (Ecuador)
University College Lillebælt (Denmark)
Centro Universitario Celso Lisboa (Brazil)
Union County Library (USA)
English or Native Language?
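The Step 1 decisions (which elements are required, and what each one means) can also be written down in machine-readable form so that records can be checked against them. The sketch below is only an illustration: the field names and definitions are hypothetical, not taken from the webinar.

```python
# Hypothetical data policy: field name -> (required?, definition).
DATA_POLICY = {
    "customer_name": (True, "Official name of the subscribing organization"),
    "city": (True, "City of the organization's primary address"),
    "country": (True, "Country of the primary address"),
    "address_2": (False, "Department or building; never a city or postcode"),
}

def missing_required(record):
    """Return the required fields that are absent or blank in a record."""
    return [
        field
        for field, (required, _definition) in DATA_POLICY.items()
        if required and not str(record.get(field) or "").strip()
    ]

record = {"customer_name": "Torbay Hospital", "city": "", "address_2": "Newton road"}
print(missing_required(record))  # ['city', 'country']
```

Keeping the definitions next to the required flag means the same document serves both staff and any validation script.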
10. Tip #1: Don’t reinvent the wheel
Use existing authority files where possible when defining data elements or setting naming conventions
ISO Country Code List (ISO 3166)
UN/LOCODE for Cities
Carnegie Classifications Data File
Standard Identifiers (ISNI, Ringgold ID)
Europa World of Learning for institution names
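As an illustration of leaning on an authority file, the sketch below maps free-text country values to ISO 3166-1 alpha-2 codes. The alias table is a small hand-built assumption for this example; a real project would load the full ISO list and extend the aliases as new variants turn up.

```python
# Small illustrative alias table; the codes are ISO 3166-1 alpha-2.
COUNTRY_ALIASES = {
    "united kingdom": "GB", "uk": "GB", "great britain": "GB", "scotland": "GB",
    "netherlands": "NL", "holland": "NL",
    "germany": "DE", "deutschland": "DE",
    "japan": "JP", "jp": "JP",
    "finland": "FI",
    "new zealand": "NZ",
}

def to_iso_country(value):
    """Return the ISO code for a free-text country, or None if unrecognized."""
    # Lowercase, drop full stops, and collapse repeated whitespace.
    key = " ".join(value.lower().replace(".", "").split())
    return COUNTRY_ALIASES.get(key)

print(to_iso_country("Deutschland"))  # DE
print(to_iso_country("Holand"))       # None
```

Returning `None` for unrecognized values, rather than guessing, leaves misspellings such as "Holand" for a person to review.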
13. Step 3: Put everything in its proper place
Fields: ID | Title | FirstName | LastName | Company | Address1 | Address 2 | City | State | Postal | Country | Title
Sample records:
55897 | Lt. | Worf | High Altitude Warfare School | 778 North Ring Road, Gulmarg | India | Weaponry Weekly
587 | Capt. | Jean | Luc Picard | Star Fleet | 1 Ellipse Way | San Franisco | Calif | 90044 | United States | Leadership Quarterly
55632 | Dr. | Beverly | Crusher | Katholieke Univ Leuven Departement Molecularie Celbiologie | 5963 Central University Road | Leuven | 3000 | Belgium | Journal of Interspecies Bioengineering
57741 | Com. | William | T. Riker | University of Birmingham | Shakespeare Institute | Main Library | Edgbaston | Stratford on-Avon | UK | Leadership Quarterly
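Values that have landed in the wrong field can often be spotted with a few simple checks before anyone reviews records by hand. The sketch below is a hedged example: the two rules and the field names `City` and `Postal` are assumptions chosen for illustration, and real data will need rules tuned to its own quirks.

```python
import re

def flag_misplaced(record):
    """Return (field, value, reason) tuples for values that look out of place."""
    problems = []
    city = (record.get("City") or "").strip()
    postal = (record.get("Postal") or "").strip()
    # A city made up only of digits, spaces, or hyphens is probably a postcode.
    if city and re.fullmatch(r"[\d\s-]+", city):
        problems.append(("City", city, "looks like a postal code"))
    # A postal value with no digits at all is probably a country or city.
    if postal and not any(ch.isdigit() for ch in postal):
        problems.append(("Postal", postal, "contains no digits"))
    return problems

print(flag_misplaced({"City": "90044", "Postal": "UK"}))
```

The function only flags suspects; deciding where each value really belongs is still a job for a person.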
14. Three: Next Steps
1. Evangelize the data policy document
2. Encourage & enforce good data governance practices where you can
3. Keep the ball rolling…
15. ….and maintain good data hygiene
Review new records on a regular basis
Determine if additional data sets need attention
Educate new staff
Show tangible ROI
17. Going beyond a quick “spring clean”
Multiple accounts for each customer
Multiple internal IT systems, different divisions, methods of data capture, etc.
Lack of information about which accounts are related
Missing key pieces of information
Lack of standard identifiers
18. What goes into a Ringgold audit?
Identify Database
Regional & Language Expertise
Defined Process
Data & IT Professionals
19. Audit Process
Receive files from client
Normalise data (de-duplication and automatching)
Researcher checks and matches to Ringgold IDs, hierarchy etc.
Researcher creates new IDs for unidentified organizations
Data split into countries
Data assigned to appropriate country expert
Data uploaded to Identify system
Client sent encrypted file via FTP with IDs and metadata
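To make the "normalise data (de-duplication and automatching)" step more concrete, here is a minimal sketch using only Python's standard library. It is not Ringgold's actual method: the normalization rules, the 0.8 similarity cutoff, and the function names are all assumptions for illustration.

```python
import difflib
import re

def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", name.lower()).split())

def dedupe(names):
    """Keep the first spelling seen for each normalized name."""
    seen = {}
    for name in names:
        seen.setdefault(normalize(name), name)
    return list(seen.values())

def automatch(name, authority, cutoff=0.8):
    """Return the closest authority name, or None to leave it for a researcher."""
    keys = {normalize(entry): entry for entry in authority}
    hits = difflib.get_close_matches(normalize(name), list(keys), n=1, cutoff=cutoff)
    return keys[hits[0]] if hits else None

authority = ["Osaka Furitsu Seijinbyo Center", "University of Aberdeen"]
print(dedupe(["Torbay Hospital", "TORBAY  HOSPITAL.", "MHH"]))
print(automatch("Osaka Furitsu Seijinbyou Ctr", authority))
print(automatch("MHH", authority))
```

Names that return `None` (such as the bare acronym "MHH") are exactly the ones that need a researcher with regional and language expertise.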
21. Audit deliverables
Lots & lots of definitive, clean, structured data about your subscribers.
A series of files which provide the complete Identify details for each of your accounts
Unique Ringgold ID number
Institutional hierarchy
Additional metadata: size, tier, industry sector, consortia membership, alternative names, etc.
22. Data: Before
Customer Account | Customer Name | Address 1 | Address 2 | City | State | PostCode | Country
39824 | Swets | Info Services | | Aberdeen | | AB24 3FX | Scotland
3994 | Antoni Van Leeuwenhoekhuis | Plesmanlaan 121 | | Amsterdam | | 1006 BE | Holand
86101 | Christchurch School of Medicine | 2 Riccarton Ave | | Christ Church | | | New Zealand
98897 | Pohjois Karjalan Keskussairaala | Tikkamäentie | 16 | Jeonsu | | 80210 | Finland
90438 | Keskusairalaa Pohojois Karl | Tikkamäentie 16 | | Joensuu | | | Finland
57700 | WFH St Joseph | P.O. Box 284 | | Milwaukee | Wisc. | 53210-9988 |
51648 | Torbay Hospital | Newton road, Devon, Torquay, TQ2 7AA Great Britain | | | | |
20466 | MHH | Saueramferweg | 5 | Hannover | | 30625 | Deutschland
79256 | Serials Dept. | 1-11-18 Chiharacho | | Izumi | | 594-1101 | Japan
88279 | Osaka Prefectural Medical Center | 1-9-10 Kyomachibori | | Habikino | | 583-8588 | JP
63641 | Osaka Furitsu Seijinbyou Ctr | 1-1-50 Fukushima Nishiku | | Osaka | Osaka | 537-8511 | Japan
23. Data: After
Customer Account | Customer Name | RIN Number | RIN Name | RIN City | RIN State | RIN Zip | RIN Country | RIN Url | RIN Sector | RIN Tier | RIN Type | RIN Number next level up | RIN Name next level up
39824 | Swets | 1019 | University of Aberdeen | Aberdeen | | AB24 3FX | United Kingdom | www.abdn.ac.uk | academic | academic | A4 | |
3994 | Antoni Van Leeuwenhoekhuis | 1228 | Nederlands Kanker Instituut - Antoni van Leeuwenhoek Ziekenhuis | Amsterdam | | 1006 BE | Netherlands | www.nki.nl | academic/hospital | academic | H5 | 1229 | Institute for Research in Extramural Medicine
86101 | Christchurch School of Medicine | 2494 | University of Otago Christchurch School of Medicine and Health Sciences | Christchurch | | 8140 | New Zealand | www.chmeds.ac.nz | academic/medsch | academic | M3 | 58994 | University of Otago Faculty of Medicine
98897 | Pohjois Karjalan Keskussairaala | 4152 | Pohjois-Karjalan sairaanhoito- ja sosiaalipalvelujen kuntayhtyma | Joensuu | | 80210 | Finland | www.pkssk.fi | public/health | public | P2 | |
90438 | Keskusairalaa Pohojois Karl | 4152 | Pohjois-Karjalan sairaanhoito- ja sosiaalipalvelujen kuntayhtyma | Joensuu | | 80210 | Finland | www.pkssk.fi | public/health | public | P3 | |
57700 | WFH St Joseph | 5509 | Wheaton Franciscan Healthcare - Saint Joseph | Milwaukee | WI | 532109988 | United States | www.mywheaton.org/stjoseph | hospital | hospital | H2 | 5510 | Wheaton Franciscan Healthcare
51648 | Torbay Hospital | 7993 | South Devon Healthcare NHS Foundation Trust | Torquay | | TQ2 7AA | United Kingdom | www.sdhct.nhs.uk | hospital | hospital | H2 | 53247 | South West Strategic Health Authority
20466 | MHH | 9177 | Medizinische Hochschule Hannover | Hannover | NI | 30625 | Germany | www.mhhannover.de | academic/medsch | academic | M3 | |
79256 | Serials Dept. | 13608 | Osaka Furitsu Boshi Hoken Sogo Iryo Center | Izumi | | 594-1101 | Japan | www.mch.pref.osaka.jp | hospital/children | hospital | H5 | 183158 | Osaka Furitsu Byoin Kiko
88279 | Osaka Prefectural Medical Center | 13615 | Osaka Furitsu Kokyuki Allergy Iryo Center | Habikino | | 583-8588 | Japan | www.ra.opho.jp | hospital | hospital | H2 | 183158 | Osaka Furitsu Byoin Kiko
63641 | Osaka Furitsu Seijinbyou Ctr | 53312 | Osaka Furitsu Seijinbyo Center | Osaka | Osaka | 537-8511 | Japan | www.mc.pref.osaka.jp | hospital | hospital | H2 | 183158 | Osaka Furitsu Byoin Kiko
24. Auditing Use Cases
Understand & analyze your customer base
Disambiguate institutions & find duplicate accounts
Reveal institutional relationships with hierarchies
Enhance customer records with Identify metadata
Support pricing decisions & policies
Your audited records can act as an authority file of institutions in any system: editorial, MSS submissions, CRMs, financial, fulfillment, etc.
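Two of these use cases, finding duplicate accounts and revealing institutional relationships, come down to grouping accounts by a shared identifier once every record carries one. The sketch below is an illustration only: it uses a handful of account numbers and Ringgold IDs from the "Data: After" slide, and the tuple layout and function name are assumptions.

```python
from collections import defaultdict

# (customer account, RIN, next-level-up RIN or None); layout is an assumption.
audited = [
    ("98897", 4152, None),
    ("90438", 4152, None),
    ("79256", 13608, 183158),
    ("88279", 13615, 183158),
    ("57700", 5509, 5510),
]

def group_accounts(rows, key_index):
    """Group account numbers by the ID at key_index; keep groups of 2 or more."""
    groups = defaultdict(list)
    for row in rows:
        key = row[key_index]
        if key is not None:
            groups[key].append(row[0])
    return {key: accounts for key, accounts in groups.items() if len(accounts) > 1}

duplicates = group_accounts(audited, 1)  # accounts that share one institution
siblings = group_accounts(audited, 2)    # accounts that share a parent
print(duplicates)  # {4152: ['98897', '90438']}
print(siblings)    # {183158: ['79256', '88279']}
```

The same grouping by parent ID is a starting point for pricing decisions that depend on how accounts roll up into larger organizations.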
25. To sum up:
It’s absolutely worth it
You can do it
…But you may not want to do it alone
26. Questions?
We’d love to hear from you, and help you develop your roadmap to better data health.
Set naming conventions for organization names, addresses, etc., in as many fields as you can.
Abbreviation: Once you abbreviate anything, it’s easy to start abbreviating everything (see the UCL example). If you must abbreviate due to fixed-length fields, be very clear about which words may be abbreviated and how. (Of course, a standard unique identifier like ISNI or RIN, or a separate “Official Name” field, means you may not need to standardize all organization names anyway.)
English vs. native language: This applies to cities and countries (Germany/Deutschland, Venice/Venezia, Beijing/Peking) as well as institution names. If you are an international organization with staff in different countries, native names may be more helpful, as you are likely to be creating records in multiple languages. Just be consistent.
You can google any of these for more information.
You can always go back and refine this, or define additional “nice to have” fields. It is tempting to make the style sheet a huge part of the project, to strive for perfection, and to try to cover every possible data element. Keep it to the most important bits for now, and get your project off the ground.
Continue to improve your data, and create a regular review process and schedule for new records. Renewal season is a good time for this: lots of new records are being added, but they are all high-value records.
What can you do if you get orders with no identifiable institution name?
• Push back on whoever submitted the order, and ask for better information.
• Match on ID number (the agent’s or your own customer number), and import only the new title/product order information rather than overwriting the organization name and address fields.
Determine if additional data sets or records need attention; these simple steps can be applied to any data set.
Educate new staff on the importance of clean data. Continue to evangelize and get buy-in from those affected by poor data, or those who rely on good data to make key decisions. Show the results of your data improvement efforts (examples: time saved, revenue preserved or increased, ROI).
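The advice to match on ID number and import only the order information can be sketched as follows. This is a minimal illustration: the record layout, the `customer_id` key, and the `ORDER_FIELDS` list are hypothetical, and a real import would run against your own system's fields.

```python
ORDER_FIELDS = ("title", "product")  # hypothetical order-level fields

def apply_order(customers, order):
    """Attach an incoming order to the customer matched by ID.

    Only the order fields are imported, so the cleaned organization name
    and address on the existing record are never overwritten.
    """
    customer = customers.get(order.get("customer_id"))
    if customer is None:
        return False  # unknown ID: push back and ask for better information
    line = {field: order.get(field) for field in ORDER_FIELDS}
    customer.setdefault("orders", []).append(line)
    return True

customers = {"51648": {"name": "South Devon Healthcare NHS Foundation Trust"}}
order = {"customer_id": "51648", "name": "Torbay Hosp", "title": "Leadership Quarterly"}
print(apply_order(customers, order))  # True
print(customers["51648"]["name"])     # South Devon Healthcare NHS Foundation Trust
```

Orders that return `False` are the ones to send back to whoever submitted them.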