From Ambition to Go Live

Richard Wallis
Schema.org & Structured Web Data Consultant – Developer Advocate at Google via Adecco UK
From Ambition to Go Live
The National Library Board of Singapore’s Journey
to an operational Linked Data Management
and Discovery System
Richard Wallis
Evangelist and Founder
Data Liberate
richard.wallis@dataliberate.com
@dataliberate
15th Semantic Web in Libraries Conference
SWIB23 - 12th September 2023 - Berlin
Independent Consultant, Evangelist & Founder
W3C Community Groups:
• Bibframe2Schema (Chair) – Standardised conversion path(s)
• Schema Bib Extend (Chair) - Bibliographic data
• Schema Architypes (Chair) - Archives
• Financial Industry Business Ontology – Financial schema.org
• Tourism Structured Web Data (Co-Chair)
• Schema Course Extension
• Schema IoT Community
• Educational & Occupational Credentials in Schema.org
richard.wallis@dataliberate.com — @dataliberate
40+ Years – Computing
30+ Years – Cultural Heritage technology
20+ Years – Semantic Web & Linked Data
Worked With:
• Google – Schema.org vocabulary, site, extensions, documentation and community
• OCLC – Global library cooperative
• FIBO – Financial Industry Business Ontology Group
• Various Clients – Implementing/understanding Linked Data, Schema.org:
National Library Board Singapore
British Library — Stanford University — Europeana
National Library Board Singapore

Public Libraries
Network of 28 Public Libraries, including 2 partner libraries*
Reading Programmes and Initiatives
Programmes and Exhibitions targeted at Singapore communities
*Partner libraries are partner owned and funded but managed by NLB or NLB’s subsidiary, Libraries and Archives Solutions Pte Ltd. Library@Chinatown and the Lifelong Learning Institute Library are partner libraries.

National Archives
Transferred from NHB to NLB in Nov 2012
Custodian of Singapore’s Collective Memory: responsible for the collection, preservation and management of Singapore’s public and private archival records
Promotes public interest in our nation’s history and heritage

National Library
Preserving Singapore’s print and literary heritage, and intellectual memory
Reference Collections
Legal Deposit (including electronic)
Reference Collection
• Over 560,000 Singapore & SEA items
• Over 147,000 Chinese, Malay & Tamil language items
• Over 62,000 Social Sciences & Humanities items
• Over 39,000 Science & Technology items
• Over 53,000 Arts items
• Over 19,000 Rare Materials items

Archival Materials
• Over 290,000 Government files & Parliament papers
• Over 190,000 Audiovisual & sound recordings
• Over 70,000 Maps & building plans
• Over 1.14m Photographs
• Over 35,000 Oral history interviews
• Over 55,000 Speeches & press releases
• Over 7,000 Posters

National Library Board Singapore
Lending Collection
• Over 5m print collection
• Over 2.4m music tracks
• 78 databases
• Over 7,400 e-newspaper and e-magazine titles
• Over 8,000 e-learning courses
• Over 1.7m e-books and audiobooks
National Library Board Online Services
The Ambition
• To enable the discovery & display of entities from different sources
in a combined interface
• To bring together resources physical and digital
• To bring together diverse systems across the National Library,
National Archives, and Public Libraries in a Linked Data Environment
• To provide a staff interface to view and manage all entities, their
descriptions and relationships
The Journey Starts (March 2021)
A Linked Data Management System (LDMS)
• Cloud-based System
• Management Interface
• Source data curated at source
– ILS, Authorities, CMS, NAS
• Linked Data
• Public Interface – Google friendly
– Schema.org
• Shared standards
– Bibframe, Schema.org
Contract Awarded
metaphactory platform
Low-code knowledge graph platform
Semantic knowledge modeling
Semantic search & discovery
AWS Partner
Public sector partner
Singapore based
Linked Data, Structured data, Semantic
Web, bibliographic metadata, Schema.org
and management systems consultant
Initial Challenges
• Data – lots of it!
– Record-based formats: MARC-XML, DC-XML, CSV
• Entities described in the data – lots and lots!
• Regular updates – daily, weekly, monthly
• Automatic ingestion
• Not the single source of truth
• Manual update capability
• Entity-based search & discovery
• Web friendly – Schema.org output
Basic Data Model
• Linked Data
– BIBFRAME to capture detail of bibliographic records
– Schema.org to deliver structured data for search engines
– Schema.org representation of CMS, NAS, TTE data
– Schema.org enrichment of BIBFRAME
• Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph
– All entities described using Schema.org as a minimum.
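To make the “web friendly” goal concrete, here is a minimal sketch of the kind of Schema.org JSON-LD a discovery page can embed for search engines; the entity URI, title, and author values are invented for illustration and are not NLB data:

```python
import json

def to_jsonld(entity):
    """Wrap an entity description in a schema.org JSON-LD context."""
    return json.dumps({"@context": "https://schema.org", **entity}, indent=2)

# Hypothetical entity: every value below is illustrative only.
book = {
    "@type": "Book",
    "@id": "https://example.org/entity/work1",   # hypothetical entity URI
    "name": "An Example History of Singapore",
    "author": {"@type": "Person", "name": "Example Author"},
}
jsonld = to_jsonld(book)
```

Embedded in a `<script type="application/ld+json">` block, this is the structured data search engines consume.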
Data Data Data!

Data Source   Source Records   Entity Count   Update Frequency
ILS           1.4m             56.8m          Daily
CMS           82k              228k           Weekly
NAS           1.6m             6.7m           Monthly
TTE           3k               317k           Monthly
Total         3.1m             70.4m
Data Ingest Pipelines – ILS
• Source data in MARC-XML files
• Step 1: marc2bibframe2 scripts
– Open source – shared by Library of Congress
– “standard” approach
– Output BIBFRAME as individual RDF/XML files
• Step 2: bibframe2schema.org script
– Open source – Bibframe2Schema.org Community Group
– SPARQL-based script
– Output Schema.org enriched individual BIBFRAME RDF files for loading into
Knowledge graph triplestore
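The step 2 enrichment is done by the Bibframe2Schema.org community’s SPARQL script. As a rough illustration of the pattern only (not the actual script), the same idea can be shown in plain Python over triples held as tuples; the identifiers and title below are invented:

```python
# Illustrative namespaces (real ones for BIBFRAME and schema.org).
BF = "http://id.loc.gov/ontologies/bibframe/"
SCHEMA = "http://schema.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def enrich(triples):
    """Add schema.org typing and names alongside existing BIBFRAME triples."""
    out = set(triples)
    for s, p, o in triples:
        if p == RDF_TYPE and o == BF + "Work":
            out.add((s, RDF_TYPE, SCHEMA + "CreativeWork"))
        if p == BF + "mainTitle":
            out.add((s, SCHEMA + "name", o))
    return out

# Hypothetical BIBFRAME fragment, as produced by step 1.
bf_triples = {
    ("urn:work:1", RDF_TYPE, BF + "Work"),
    ("urn:work:1", BF + "mainTitle", "An Example Title"),
}
enriched = enrich(bf_triples)
```

Note the BIBFRAME triples are kept: the output is BIBFRAME *enriched with* Schema.org, not replaced by it.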
Data Ingestion Pipelines – TTE (Authorities)
• Source data in bespoke CSV format
• Exported in collection-based files
– People, places, organisations, time periods, etc.
• Interrelated references, so needed to be considered as a whole
• Bespoke Python script
• Creating Schema.org entity descriptions
• Output RDF format file for loading into Knowledge graph triplestore
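A minimal sketch of such a CSV-to-Schema.org conversion; the real TTE export columns are not shown here, so the field names, sample rows, and URI scheme below are invented for illustration:

```python
import csv, io

# Hypothetical sample of a collection-based TTE export.
SAMPLE = """id,type,name,birth_date
tte-001,Person,Lee Kuan Yew,1923-09-16
tte-002,Place,Chinatown,
"""

TYPE_MAP = {"Person": "http://schema.org/Person", "Place": "http://schema.org/Place"}

def rows_to_triples(text):
    """Turn CSV rows into (subject, predicate, object) tuples using Schema.org."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        uri = f"https://example.org/tte/{row['id']}"   # hypothetical URI scheme
        triples.append((uri, "rdf:type", TYPE_MAP[row["type"]]))
        triples.append((uri, "schema:name", row["name"]))
        if row["birth_date"]:                          # optional attribute
            triples.append((uri, "schema:birthDate", row["birth_date"]))
    return triples

triples = rows_to_triples(SAMPLE)
```

In practice the files are processed together, since the collections cross-reference one another.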
Data Ingestion Pipelines – NAS & CMS
• Source data in DC-XML files
• Step 1: DC-XML to DC-Terms RDF
– Bespoke XSLT script to translate from DC record to DCT-RDF
• Step 2: TTE lookup
– Lookup identified values against TTE entities in Knowledge graph
• use URI if matched
• Step 3: Schema.org entity creation
– SPARQL script creates Schema.org representations of DCT entities
• Output individual RDF format files for loading into Knowledge graph
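Step 2 can be sketched as a lookup that swaps a literal value for the URI of a matching TTE entity, keeping the literal when nothing matches; the index contents and property names here are illustrative, not actual LDMS data:

```python
# Hypothetical index of TTE entities already in the knowledge graph,
# keyed by a normalised name.
tte_index = {"lee kuan yew": "https://example.org/tte/tte-001"}

def reconcile(value):
    """Return the TTE URI for a literal if one matches, else the literal."""
    return tte_index.get(value.strip().lower(), value)

# Hypothetical DC-Terms record after the step 1 XSLT conversion.
record = {"dct:creator": "Lee Kuan Yew", "dct:subject": "Unknown Society"}
resolved = {p: reconcile(v) for p, v in record.items()}
```

Matched values become links into the graph; unmatched ones survive as literals for later curation.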
Technical Architecture (simplified)
Hosted on Amazon Web Services
[Diagram: source data import driven by batch scripts and import control; pipeline processing loads a GraphDB cluster; the Discovery Interface (DI) provides public access and the Data Management Interface (DMI) provides curator access.]
A need for entity reconciliation…
• Lots (and lots and lots) of source entities – 70.4 million entities
• Lots of duplication
– Lee, Kuan Yew – 1st Prime Minister of Singapore
• 160 individual entities in ILS source data
– Singapore Art Museum
• Entities from source data
• 21 CMS, 1 NAS, 66 ILS, 1 TTE
• Users only want 1 of each!
70.4 (million) into 6.1 (million) entities does go!
• We know that now – but how did we get there?
• Requirements hurdles to be cleared:
– Not a single source of truth
– Regular automatic updates – add / update / delete
– Manual management of combined entities
• Suppression of incorrect or ‘private’ attributes from display
• Addition of attributes not in source data
• Creating / breaking relationships between entities
• Near real-time updates
Adaptive Data Model Concepts
• Source entities
– Individual representation of source data
• Aggregation entities
– Tracking relationships between source entities for the same thing
– No copying of attributes
• Primary Entities
– Searchable by users
– Displayable to users
– Consolidation of aggregated source data & managed attributes
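The three layers can be sketched in a few lines; the structure, identifiers, and field names below are illustrative only, not the actual LDMS schema:

```python
# Source entities: kept exactly as ingested from each system.
sources = [
    {"id": "ils:p160", "name": "Lee, Kuan Yew", "birthDate": "1923-09-16"},
    {"id": "tte:001", "name": "Lee Kuan Yew", "sameAs": "http://example.org/lky"},
]

# Aggregation entity: only records which sources describe the same thing --
# no attributes are copied.
aggregation = {"id": "agg:1", "members": [s["id"] for s in sources]}

def build_primary(agg, source_index, overrides=None):
    """Consolidate aggregated source attributes into a displayable primary entity."""
    primary = {"id": agg["id"].replace("agg:", "primary:")}
    for sid in agg["members"]:
        for k, v in source_index[sid].items():
            if k != "id":
                primary.setdefault(k, v)   # first source wins in this sketch
    primary.update(overrides or {})        # manually managed attributes on top
    return primary

index = {s["id"]: s for s in sources}
primary = build_primary(aggregation, index)
```

Because the aggregation layer holds only relationships, a curator can merge or split aggregations and the primary entity is simply rebuilt, leaving the source entities untouched.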
Entity Matching
• Candidate matches based on schema:name
– Lucene index matching
– Refined by Levenshtein similarity
– Same entity type
– Entity-type-specific rules, e.g.:
• Work: name + creator / contributor / author / sameAs
• Person: name + birthDate / deathDate / sameAs
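These rules can be sketched with a plain-Python Levenshtein distance; the similarity threshold and sample records are illustrative, not the tuned production values:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,         # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def similarity(a, b):
    """Normalise edit distance to a 0..1 similarity score."""
    if not a and not b:
        return 1.0
    return 1 - levenshtein(a, b) / max(len(a), len(b))

def candidate_match(e1, e2, threshold=0.85):   # threshold is illustrative
    """Same type, similar name, plus a type-specific rule for Person entities."""
    if e1["type"] != e2["type"]:
        return False
    if similarity(e1["name"].lower(), e2["name"].lower()) < threshold:
        return False
    if e1["type"] == "Person":
        d1, d2 = e1.get("birthDate"), e2.get("birthDate")
        if d1 and d2 and d1 != d2:
            return False
    return True

a = {"type": "Person", "name": "Lee, Kuan Yew", "birthDate": "1923-09-16"}
b = {"type": "Person", "name": "Lee Kuan Yew", "birthDate": "1923-09-16"}
```

In the LDMS the Lucene index first narrows the candidates; similarity scoring of this kind then refines them.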
The entity iceberg
[Diagram: Primary entities sit above the waterline, exposed for Discovery; Aggregation and Source entities sit below, populated by the Ingestion Pipelines; Management spans all layers.]
Data Management Interface
• Rapidly developed – easily modified
• Using metaphactory’s semantic visual interface
• A collaboration environment for data curators
• Pre-publish entity management
• Search & discovery – emulates DI
• Dashboard view
• Entity management / Creation
Discovery Interface (DI)
LDMS System Live – December 2022
• Live – data curation team using DMI
– Entity curation
– Manually merging & splitting aggregations
• Live – data ingest pipelines
– Daily / Weekly / Monthly
– Delta update exports from source systems
• Live – Discovery Interface ready for deployment
• Live – Support
– Issue resolution & updates
Discovery Interface
• Live and ready for deployment
– Performance, load and user tested
• Designed & developed by NLB Data Team to demonstrate
characteristics and benefits of a Linked Data service
• NLB services rolled-out by National Library & Public Library divisions
– Recently identified a service for Linked Data integration
• Currently undergoing fine tuning to meet service’s UI requirements
Daily Delta Ingestion & Reconciliation
Some Challenges Addressed on the Way
• Inaccurate source data
– Not apparent in source systems
– Very obvious when aggregated in LDMS
• E.g. Subject and Person URIs duplicated
– Identified and analysed in DMI
– Reported to source curators and fixed
– Updated in next delta update
– Data issues identified in LDMS – source data quality enhanced
Some Challenges Addressed on the Way
• Asian name matching
– Often consisting of several short 3–4 letter names
– Lucene matches not of much help
– Introduction of Levenshtein similarity matching helped
– Tuning is challenging – trading off European & Asian names
From Ambition to Go Live – A Summary
• Ambition based on previous years of trials, experimentation, and understanding of
Linked Data potential
• Ambition to deliver a production LDMS to provide Linked Data curation and
management capable of delivering a public discovery service
• With experienced commercial partners & tools as part of a globally distributed team
– Benefiting from Open Source community developments where appropriate
• Unique and challenging requirements
– Distributed sources of truth in disparate data formats
– Auto updating, consolidated entity view – plus individual entity management
– Web friendly – providing structured data for search engines
• Live and operational – actively delivering benefits since December 2022
  • 3. 3 National Library Board Singapore Public Libraries Network of 28 Public Libraries, including 2 partner libraries* Reading Programmes and Initiatives Programmes and Exhibitions targeted at Singapore communities *Partner libraries are libraries which are partner owned and funded but managed by NLB/NLB’s subsidiary Libraries and Archives Solutions Pte Ltd. Library@Chinatown and the Lifelong Learning Institute Library are Partner libraries. National Archives Transferred from NHB to NLB in Nov 2012 Custodian of Singapore’s Collective Memory: Responsible for Collection, Preservation and Management of Singapore’s Public and Private Archival Records Promotes Public Interest in our Nation’s History and Heritage National Library Preserving Singapore’s Print and Literary Heritage, and Intellectual memory Reference Collections Legal Deposit (including electronic)
  • 4. 4 Over 560,000 Singapore & SEA items Over 147,000 Chinese, Malay & Tamil Languages items Reference Collection Over 62,000 Social Sciences & Humanities items Over 39,000 Science & Technology items Over 53,000 Arts items Over 19,000 Rare Materials items Archival Materials Over 290,000 Government files & Parliament papers Over 190,000 Audiovisual & sound recordings Over 70,000 Maps & building plans Over 1.14m Photographs Over 35,000 Oral history interviews Over 55,000 Speeches & press releases Over 7,000 Posters National Library Board Singapore Over 5m print collection Over 2.4m music tracks 78 databases Over 7,400 e-newspapers and e-magazines titles Over 8,000 e-learning courses Over 1.7m e-books and audio books Lending Collection
  • 5. 5 National Library Board Online Services
  • 6. 7 The Ambition • To enable the discovery & display of entities from different sources in a combined interface • To bring together resources, physical and digital • To bring together diverse systems across the National Library, National Archives, and Public Libraries in a Linked Data Environment • To provide a staff interface to view and manage all entities, their descriptions and relationships
  • 10. 11 The Journey Starts (March 2021) A Linked Data Management System (LDMS) • Cloud-based System • Management Interface • Source data curated at source – ILS, Authorities, CMS, NAS • Linked Data • Public Interface – Google friendly – Schema.org • Shared standards – BIBFRAME, Schema.org
  • 17. 18 Contract Awarded metaphactory platform Low-code knowledge graph platform Semantic knowledge modeling Semantic search & discovery AWS Partner Public sector partner Singapore based Linked Data, Structured data, Semantic Web, bibliographic metadata, Schema.org and management systems consultant
  • 18. 19 Initial Challenges • Data – lots of it! – Record-based formats: MARC-XML, DC-XML, CSV • Entities described in the data – lots and lots! • Regular updates – daily, weekly, monthly • Automatic ingestion • Not the single source of truth • Manual update capability • Entity-based search & discovery • Web friendly – Schema.org output
  • 26. 27 Basic Data Model • Linked Data – BIBFRAME to capture detail of bibliographic records – Schema.org to deliver structured data for search engines – Schema.org representation of CMS, NAS, TTE data – Schema.org enrichment of BIBFRAME • Schema.org as the ‘lingua franca’ vocabulary of the Knowledge graph – All entities described using Schema.org as a minimum.
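The "lingua franca" role of Schema.org described above can be illustrated with a minimal entity description. This is a hypothetical sketch (the identifier, URL and property choices are invented, not NLB's actual output) of the kind of Schema.org structured data the LDMS exposes for search engines:

```python
import json

# A minimal, hypothetical Schema.org description of a person entity.
# The @id URI is invented for illustration; Lee Kuan Yew appears later
# in the deck as an example of a heavily duplicated source entity.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "@id": "https://example.org/entity/person/12345",
    "name": "Lee Kuan Yew",
    "description": "1st Prime Minister of Singapore",
}
print(json.dumps(person, indent=2))
```

Because every entity carries at least a Schema.org description, the knowledge graph can be queried and published uniformly even where richer BIBFRAME detail also exists.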
  • 31. 32 Data Data Data!
    Data Source | Source Records | Entity Count | Update Frequency
    ILS         | 1.4m           | 56.8m        | Daily
    CMS         | 82k            | 228k         | Weekly
    NAS         | 1.6m           | 6.7m         | Monthly
    TTE         | 3k             | 317k         | Monthly
    Total       | 3.1m           | 70.4m        |
  • 32. 33 Data Ingest Pipelines – ILS • Source data in MARC-XML files • Step 1: marc2bibframe2 scripts – Open source – shared by Library of Congress – “standard” approach – Output BIBFRAME as individual RDF-XML files • Step 2: bibframe2schema.org script – Open source – Bibframe2Schema.org Community Group – SPARQL-based script – Output Schema.org enriched individual BIBFRAME RDF files for loading into Knowledge graph triplestore
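The two-step shape of the ILS pipeline (MARC records in, BIBFRAME triples out, then a Schema.org enrichment pass) can be caricatured in a few lines of stdlib Python. This is a toy illustration of the data flow only, not the actual marc2bibframe2 XSLT or the bibframe2schema.org SPARQL script; the namespaces, URIs and the tiny record are invented:

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
record_xml = f"""
<record xmlns="{MARC_NS}">
  <datafield tag="245"><subfield code="a">A History of Singapore</subfield></datafield>
</record>"""

def marc_to_bibframe(xml_text, work_uri):
    """Step 1 (caricature of marc2bibframe2): MARC-XML -> BIBFRAME-ish triples."""
    root = ET.fromstring(xml_text)
    title = root.find(
        f".//{{{MARC_NS}}}datafield[@tag='245']/{{{MARC_NS}}}subfield[@code='a']"
    ).text
    return [(work_uri, "http://id.loc.gov/ontologies/bibframe/title", title)]

def bibframe_to_schema(triples):
    """Step 2 (caricature of the bibframe2schema.org pass): keep the
    BIBFRAME triples and add Schema.org equivalents alongside them."""
    enriched = list(triples)
    for s, p, o in triples:
        if p.endswith("bibframe/title"):
            enriched.append((s, "https://schema.org/name", o))
    return enriched

triples = bibframe_to_schema(marc_to_bibframe(record_xml, "https://example.org/work/1"))
for s, p, o in triples:
    print(s, p, repr(o))
```

The key design point survives the simplification: step 2 enriches rather than replaces, so the loaded graph holds both the BIBFRAME detail and its Schema.org rendering.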
  • 35. 36 Data Ingestion Pipelines – TTE (Authorities) • Source data in bespoke CSV format • Exported in collection-based files – People, Places, Organisations, Time periods, etc. • Interrelated references so needed to be considered as a whole • Bespoke Python script • Creating Schema.org entity descriptions • Output RDF format file for loading into Knowledge graph triplestore
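A bespoke CSV-to-RDF conversion of the kind described above can be sketched with the stdlib `csv` module. The column layout, type mapping and URIs below are invented for illustration; the real TTE export format is bespoke to NLB and not reproduced here:

```python
import csv
import io

# Invented sample rows standing in for a TTE collection export.
tte_csv = """id,type,label
P001,person,Lee Kuan Yew
PL01,place,Singapore
"""

TYPE_MAP = {"person": "https://schema.org/Person", "place": "https://schema.org/Place"}

def tte_to_ntriples(text, base="https://example.org/tte/"):
    """Emit one rdf:type and one schema:name statement per CSV row,
    as N-Triples lines ready for bulk loading into a triplestore."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        uri = f"<{base}{row['id']}>"
        lines.append(
            f"{uri} <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <{TYPE_MAP[row['type']]}> ."
        )
        lines.append(f'{uri} <https://schema.org/name> "{row["label"]}" .')
    return "\n".join(lines)

print(tte_to_ntriples(tte_csv))
```

The interrelated-references point from the slide is why a script like this processes the collection files together: cross-file links can only be resolved once every entity has been assigned its URI.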
Data Ingestion Pipelines – NAS & CMS
• Source data in DC-XML files
• Step 1: DC-XML to DC-Terms RDF – bespoke XSLT script translating each DC record to DCT-RDF
• Step 2: TTE lookup – identified values are looked up against TTE entities in the knowledge graph; the URI is used where a match is found
• Step 3: Schema.org entity creation – SPARQL script creating Schema.org representations of the DCT entities
• Output: individual RDF files for loading into the knowledge graph
Technical Architecture (simplified)
• Hosted on Amazon Web Services
• Source data import – batch scripts, import control, etc.
• Pipeline processing
• GraphDB cluster (multiple nodes)
• Public access – Discovery Interface (DI)
• Curator access – Data Management Interface (DMI)
A Need for Entity Reconciliation…
• Lots (and lots and lots) of source entities – 70.4 million
• Lots of duplication:
– Lee, Kuan Yew – 1st Prime Minister of Singapore: 160 individual entities in the ILS source data
– Singapore Art Museum: entities from source data – 21 CMS, 1 NAS, 66 ILS, 1 TTE
• Users only want one of each!
70.4 (million) into 6.1 (million) entities does go!
• We know that now – but how did we get there?
• Requirement hurdles to be cleared:
– No single source of truth
– Regular automatic updates – add / update / delete
– Manual management of combined entities:
• Suppression of incorrect or 'private' attributes from display
• Addition of attributes not in the source data
• Creating / breaking relationships between entities
– Near real-time updates
Adaptive Data Model Concepts
• Source entities – individual representations of the source data
• Aggregation entities – track relationships between source entities describing the same thing; no copying of attributes
• Primary entities – searchable by and displayable to users; a consolidation of aggregated source data and managed attributes
Entity Matching
• Candidate matches based on schema:name values
– Lucene index matching
– Refined by Levenshtein similarity
– Must be the same entity type
– Entity-type-specific rules, e.g.:
• Work: name + creator / contributor / author / sameAs
• Person: name + birthDate / deathDate / sameAs
Data Management Interface
• Rapidly developed – easily modified
• Built on metaphactory's semantic visual interface
• A collaboration environment for data curators
• Pre-publish entity management
• Search & discovery – emulates the DI
• Dashboard view
• Entity management / creation
LDMS System Live – December 2022
• Live – data curation team using the DMI
– Entity curation
– Manually merging & splitting aggregations
• Live – data ingest pipelines
– Daily / weekly / monthly delta-update exports from the source systems
• Live – Discovery Interface ready for deployment
• Live – support: issue resolution & updates
Discovery Interface
• Live and ready for deployment – performance, load and user tested
• Designed & developed by the NLB Data Team to demonstrate the characteristics and benefits of a Linked Data service
• NLB services are rolled out by the National Library & Public Library divisions
– A service has recently been identified for Linked Data integration
• Currently being fine-tuned to meet that service's UI requirements
Daily Delta Ingestion & Reconciliation
Some Challenges Addressed on the Way
• Inaccurate source data
– Not apparent in the source systems, but very obvious when aggregated in the LDMS – e.g. duplicated Subject and Person URIs
– Identified and analysed in the DMI, reported to source curators and fixed, then corrected in the next delta update
– Data issues identified in the LDMS have enhanced source data quality
Some Challenges Addressed on the Way
• Asian name matching
– Names often consist of several short 3–4 letter parts
– Lucene matches are not much help
– Introducing Levenshtein similarity matching helped
– Tuning is challenging – a trade-off between European & Asian names
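The tuning trade-off described above can be made concrete: the same small character difference weighs much more heavily in a short name than in a long one, so a single similarity threshold behaves differently for short Asian names and longer European ones. This sketch uses `difflib.SequenceMatcher` as a stand-in for a Levenshtein-style ratio; the names are invented examples.

```python
# Illustrates why one similarity threshold is hard to tune across
# short and long names. SequenceMatcher.ratio() stands in for a
# normalised Levenshtein similarity; names are invented.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Two distinct people whose short names differ by only two characters:
short_pair = similarity("Tan Ah Kow", "Tan Ah Kee")
# A long name pair with a comparably small difference:
long_pair = similarity("Alexandra Richardson", "Alexandre Richardson")
print(round(short_pair, 2), round(long_pair, 2))
```

Any threshold loose enough to merge the long-name pair risks merging the short-name pair too, which is exactly the trade-off the slide describes.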
From Ambition to Go Live – A Summary
• Ambition based on previous years of trials, experimentation, and a growing understanding of Linked Data's potential
• Ambition to deliver a production LDMS providing Linked Data curation and management, capable of delivering a public discovery service
• With experienced commercial partners & tools as part of a globally distributed team
– Benefiting from open-source community developments where appropriate
• Unique and challenging requirements:
– Distributed sources of truth in disparate data formats
– An auto-updating, consolidated entity view – plus individual entity management
– Web friendly – providing structured data for search engines
• Live and operational, actively delivering benefits, since December 2022
From Ambition to Go Live
The National Library Board of Singapore's Journey to an operational Linked Data Management and Discovery System
Richard Wallis – Evangelist and Founder, Data Liberate
richard.wallis@dataliberate.com – @dataliberate
15th Semantic Web in Libraries Conference, SWIB23 – 12th September 2023 – Berlin