Linked Library Data in the wild
Technical Lead for Prism Phil John Introductions...
So, what’s Prism then? Introductions...
a next generation discovery interface Prism Introductions
(yes…even configuration settings) Built entirely on Linked Data Prism
Discovery of library  catalogue resources Prism but grander plans afoot...
...some future sources... Prism
 archives/records (e.g. DS Calm)
 thesis repositories
 rare items/special collections
 and more!
MARC 21 → RDF Performs data conversion Prism
this ensures it keeps in sync with the LMS Initial “bulk” conversion then periodic “delta” files Prism
provided by a suite of RESTful web services Borrower/Availability data pulled from LMS “live” Prism
just add .rss to collections or .rdf/.nt/.ttl/.json to items Linked Data API Prism
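The suffix convention on this slide can be sketched as a pair of URL helpers. This is only an illustration: the base URL and the path shapes are hypothetical, and only the suffixes (.rss for collections; .rdf, .nt, .ttl and .json for items) come from the slide.

```python
# Hypothetical base URL; the real Prism hostnames and paths are not given in the slides.
BASE = "https://prism.example.org"

# Suffixes named on the slide for item resources.
ITEM_FORMATS = {"rdf", "nt", "ttl", "json"}


def item_url(item_path: str, fmt: str) -> str:
    """Return the URL of an item in one of the Linked Data serialisations."""
    if fmt not in ITEM_FORMATS:
        raise ValueError(f"unsupported item format: {fmt}")
    return f"{BASE}/{item_path.strip('/')}.{fmt}"


def collection_feed_url(collection_path: str) -> str:
    """Return the RSS feed URL for a collection (for example a search result)."""
    return f"{BASE}/{collection_path.strip('/')}.rss"
```

For example, item_url("items/750785", "ttl") would yield the Turtle form of a record under these assumed paths.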
The Challenges Prism
Extracting data from MARC 21 The Challenges
Some quotes... Extracting Data from MARC 21 ...cataloguers may want to look away now
...and even if it does, there are millions of existing records that we’ll want to convert MARC 21 is not going away anytime soon... Extracting Data from MARC 21
How are we approaching it? Extracting Data from MARC 21
By tackling it in small chunks! Extracting Data from MARC 21
We’ve created a solution that... Extracting Data from MARC 21
 compartmentalises code for different sections
 provides robustness
 is performant
 allows us to experiment
fires events when it encounters a MARC 21 data structure; very strict with syntax MARC 21 Parser Extracting Data from MARC 21
listens for MARC 21 data structures and hands control over to one or more handlers Event Observer Extracting Data from MARC 21
know how to convert MARC 21 structures and fields into linked data Bibliographic Handlers Extracting Data from MARC 21
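The parser / observer / handler split described on the last three slides might look roughly like the sketch below. It is not Prism's code: the class and function names, the (tag, value) field shape and the triple shape are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]                  # assumed (subject, predicate, object) shape
Handler = Callable[[str, str], List[Triple]]   # handler(tag, value) -> triples


class Observer:
    """Listens for MARC 21 structures and hands control to registered handlers."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def register(self, tag: str, handler: Handler) -> None:
        self._handlers[tag].append(handler)

    def notify(self, tag: str, value: str) -> List[Triple]:
        triples: List[Triple] = []
        for handler in self._handlers.get(tag, []):
            triples.extend(handler(tag, value))
        return triples


def parse(fields: Iterable[Tuple[str, str]], observer: Observer) -> List[Triple]:
    """Fire one event per field; reject malformed tags (the 'strict' part)."""
    triples: List[Triple] = []
    for tag, value in fields:
        if not (len(tag) == 3 and tag.isdigit()):
            raise ValueError(f"malformed MARC 21 tag: {tag!r}")
        triples.extend(observer.notify(tag, value))
    return triples


def title_handler(tag: str, value: str) -> List[Triple]:
    """A toy bibliographic handler for the 245 field."""
    return [("record", "title", value.strip())]
```

Because each handler only sees the fields it registered for, a new section of the record can be tackled by adding a handler without touching the others, which is the "small chunks" approach in practice.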
So, where are we up to? Extracting Data from MARC 21
we tackled this one first as it allows us to reason more fully about the record Format (and duration) Extracting Data from MARC 21
In theory quite easy... Format
...in practice not so much... Format
 DVD and LaserDisc share(d) a code
 LC slow(ish) to support new formats in M21
 limited use of control field (007) codings...
 ...so need to parse text from 3xx, 5xx fields
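A minimal sketch of that fallback, assuming the coded value sits at position 04 of a videorecording 007 and that free text from 3xx/5xx fields is passed in as plain strings. The three code values are my reading of the MARC 21 bibliographic format and are worth checking against the specification; the text patterns are illustrative, not the rules Prism uses.

```python
import re
from typing import Iterable, Optional

# Assumed 007/04 values for videorecordings (verify against the MARC 21 spec).
VIDEO_FORMAT_CODES = {"g": "LaserDisc", "s": "Blu-ray", "v": "DVD"}

# Illustrative patterns for free-text fields such as 300 and 538.
TEXT_PATTERNS = [
    (re.compile(r"blu-?\s?ray", re.IGNORECASE), "Blu-ray"),
    (re.compile(r"\bdvd\b", re.IGNORECASE), "DVD"),
    (re.compile(r"laser\s?disc", re.IGNORECASE), "LaserDisc"),
]


def detect_format(field_007: Optional[str], text_fields: Iterable[str]) -> Optional[str]:
    """Prefer the 007 coding, but fall back to free text when it is missing or ambiguous."""
    coded = None
    if field_007 and len(field_007) > 4 and field_007[0] == "v":
        coded = VIDEO_FORMAT_CODES.get(field_007[4])
    # The LaserDisc code was also used for DVDs, so do not trust it on its own.
    if coded and coded != "LaserDisc":
        return coded
    for text in text_fields:
        for pattern, fmt in TEXT_PATTERNS:
            if pattern.search(text):
                return fmt
    return coded
```

With the 007 value vd||s|||| from the Goodfellas record shown later in the deck, this returns "Blu-ray" without needing the text fields at all.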
Which gives us...
an important part of the record to model, or so I’ve been told Title Extracting Data from MARC 21
Quite tricky because... Title
 ‡c must be last subfield in a 245...
 ...so sometimes data from ‡n / ‡p is in ‡c instead...
 ...which means we can’t just drop the ‡c
Now with more title
sounds easy...acronyms from EAN to UPC describing 13 digit codes...right? Identifier Extracting Data from MARC 21
what are all those other things doing in the ‡a? ...STOP! Identifier
Identifier “For a hardbound resource, there is no attempt to use a consistent term other than to use one that conveys the condition intelligibly.” Library of Congress Rule Interpretation 1.8
(and then validate whatever’s left) So we need to parse them out Identifier
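One way to picture "parse them out, then validate whatever's left" is the sketch below: split a trailing parenthesised qualifier such as (pbk.) off the ‡a, then check the remaining 13-digit code with the standard EAN-13 / ISBN-13 check digit. The function names are hypothetical, and 10-digit ISBNs are deliberately left out to keep the example short.

```python
import re
from typing import Optional, Tuple


def split_identifier(subfield_a: str) -> Tuple[Optional[str], Optional[str]]:
    """Separate the code from a trailing qualifier such as '(pbk.)'."""
    match = re.match(r"\s*([0-9][0-9Xx-]*)\s*(?:\((.*?)\)\s*)?$", subfield_a)
    if not match:
        return None, None
    code = match.group(1).replace("-", "")
    return code, match.group(2)


def ean13_is_valid(digits: str) -> bool:
    """Check a 13-digit code: weights alternate 1, 3 over the first 12 digits."""
    if not re.fullmatch(r"[0-9]{13}", digits):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(digits[:12]))
    return (10 - total % 10) % 10 == int(digits[12])
```

For instance, split_identifier("9780306406157 (pbk.)") separates the code from the binding note, and ean13_is_valid then accepts the code.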
LDR: 01425ngm a22005058  4504
001: 750785
003: xxxxxxx
005: 20090824164118.0
007: vd||s||||
008: 080623s2007    enk||| e          v|eng d
020:  ,   | $c Retail (S24.99) |
024: 3,   | $a 7321900108089 |
028: 4, 0 | $a BDY10808 | $b Warner Home Video |
029:  ,   | $a 7321900108089 |
082:  ,   | $a 812
245: 0, 0 | $a Goodfellas | $h [videorecording] / | $c directed by Martin Scorsese ; music by Christopher Brooks
260:  ,   | $b Warner Home Video, | $c 2007. |
300:  ,   | $a 1 Blu-Ray (139 min.) : | $b col. |
306:  ,   | $a 021900 |
366:  ,   | $b 20070611 |
511:  ,   | $a Starring Robert De Niro, Ray Liotta and Joe Pesci
521: 8,   | $a BBFC code: 18. |
538:  ,   | $a Blu-Ray. |
700: 1,   | $a Scorsese, Martin |
700: 1,   | $a Brooks, Christopher |
852:  ,   | $b John Harvard | $c BLU-RAY DISC | $m 18 | $z , $z Blu Ray Disc. 18Cert
Phew, this one’s easy, no (pbk), (hbk) or even (pbk. , alk. paper) to contend with
Now we can start performing lookups against other sources!
hardest of the lot... Author Extracting Data from MARC 21
...why? Author
 Rowling, J.K. vs Rowling, Joanne K.
 Few records with relator term in 100/700 ‡e...
 ...so we have to parse that from the 245 ‡c...
 ...and we don’t just deal with English records.
we’ve licensed the names/subjects authority files, and created RDF from them Library of Congress to the rescue! Author
LDR: 01425ngm a22005058  4504
001: 750785
003: xxxxxxx
005: 20090824164118.0
007: vd||s||||
008: 080623s2007    enk||| e          v|eng d
020:  ,   | $c Retail (S24.99) |
024: 3,   | $a 7321900108089 |
028: 4, 0 | $a BDY10808 | $b Warner Home Video |
029:  ,   | $a 7321900108089 |
082:  ,   | $a 812
245: 0, 0 | $a Goodfellas | $h [videorecording] / | $c directed by Martin Scorsese ; music by Christopher Brooks
260:  ,   | $b Warner Home Video, | $c 2007. |
300:  ,   | $a 1 Blu-Ray (139 min.) : | $b col. |
306:  ,   | $a 021900 |
366:  ,   | $b 20070611 |
511:  ,   | $a Starring Robert De Niro, Ray Liotta and Joe Pesci
521: 8,   | $a BBFC code: 18. |
538:  ,   | $a Blu-Ray. |
700: 1,   | $a Scorsese, Martin |
700: 1,   | $a Brooks, Christopher | $e music
852:  ,   | $b John Harvard | $c BLU-RAY DISC | $m 18 | $z , $z Blu Ray Disc. 18Cert
A contrived example (sorry!) with and without relator terms
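When a 700 has no ‡e, the role has to come out of the 245 ‡c text. A rough sketch of that, assuming English "role by name" phrasing separated by semicolons as in the record above; the function name is hypothetical, and real records (and other languages) would need many more patterns than this.

```python
import re
from typing import List, Tuple

# Matches phrases of the form "<role> by <name>", e.g. "directed by Martin Scorsese".
ROLE_PATTERN = re.compile(r"^(?P<role>.+?)\s+by\s+(?P<name>.+)$", re.IGNORECASE)


def parse_responsibility(subfield_c: str) -> List[Tuple[str, str]]:
    """Split a 245 ‡c into (role, name) pairs; role is '' when none is found."""
    results: List[Tuple[str, str]] = []
    for part in subfield_c.split(";"):
        part = part.strip()
        if not part:
            continue
        match = ROLE_PATTERN.match(part)
        if match:
            results.append((match.group("role").strip().lower(), match.group("name").strip()))
        else:
            results.append(("", part))
    return results
```

Applied to the ‡c above, this gives a "directed" role for Martin Scorsese and a "music" role for Christopher Brooks, which can then be attached to the matching 700 entries.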
Hope you can all read this at the back!
A closer look at Authority Matching Author
Some requirements: Author
 ...(able to process 2M records in several hours)
 requires accuracy
 must handle pseudonyms and variant spellings
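One simple way to meet the speed and variant-spelling requirements is to normalise every known form of a name once, up front, and match by dictionary lookup. The sketch below assumes that approach; it is not a description of Prism's matcher, the authority identifiers are made up, and it ignores the harder accuracy problem of two different people sharing a normalised form.

```python
import re
import unicodedata
from typing import Dict, Iterable, Optional, Tuple


def normalise(name: str) -> str:
    """Lower-case, strip diacritics and collapse punctuation/whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", stripped.lower()).strip()


def build_index(authorities: Iterable[Tuple[str, Iterable[str]]]) -> Dict[str, str]:
    """Map every normalised variant form (including pseudonyms) to its authority id."""
    index: Dict[str, str] = {}
    for authority_id, forms in authorities:
        for form in forms:
            index[normalise(form)] = authority_id
    return index


def match(name: str, index: Dict[str, str]) -> Optional[str]:
    """Return the authority id for a heading, or None when there is no match."""
    return index.get(normalise(name))
```

Each lookup is a constant-time dictionary access, so the cost of a 2M-record run is dominated by normalisation rather than by searching the authority file.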
You can tell J.K. Rowling is successful, she’s been translated lots
Language/Alternate Graphical Representation Extracting Data from MARC 21
Nice “high impact” feature Language

  • 77. both forms can be searched for
  • 79. tagged with an ISO-639-2 language and masquerading as the field listed in ‡6 Passes 880s back into Observer Language
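The "masquerading" step can be sketched as follows: read the linked tag out of the 880's ‡6, drop the ‡6 itself, and hand the remaining subfields back under that tag together with a language code. The function name and the dictionary-of-subfields shape are assumptions (a real record can repeat subfields), and the language code is taken as an input here because the slides do not say how it is determined.

```python
import re
from typing import Dict, Optional, Tuple

# ‡6 linkage: linked tag, occurrence number, then optional script id and orientation.
LINKAGE = re.compile(
    r"^(?P<tag>\d{3})-(?P<occurrence>\d{2})(?:/(?P<script>[^/]+))?(?:/(?P<orientation>.+))?$"
)


def relabel_880(
    subfields: Dict[str, str], language: str
) -> Optional[Tuple[str, Dict[str, str], str]]:
    """Return (linked tag, subfields without ‡6, language), or None if ‡6 is unusable."""
    match = LINKAGE.match(subfields.get("6", ""))
    if not match:
        return None
    payload = {code: value for code, value in subfields.items() if code != "6"}
    return match.group("tag"), payload, language
```

The returned tag can then be fed back into the observer exactly as if a 245 (or 100, 700 and so on) had been encountered, so the existing handlers produce a second, language-tagged form of the same data.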
  • 84. it’s part of the reason we use Linked Data...but it’s got some challenges at the moment Using/Linking to External Datasets The Challenges
  • 86. ...or worse, is taken offline permanently?
  • 87. can we trust this data?
  • 89. ...or, if that’s not practical, proxy requests using a caching proxy such as Squid
  • 90. if using Wikipedia and worried about vandalism...
  • 92. ...or – what we’d like to see happen to Linked Library Data The Future...
  • 93. especially on the peripheries – authority data, author information, links to other resources More library data as LOD The Future
  • 94. seriously – this would make our lives so much simpler LMS vendors adopting LOD The Future
  • 95. LOD replacing MARC 21 as the standard representation of bibliographic records The Future
  • 97. Photo Credits
    Slide 15 - http://www.flickr.com/photos/gammaman/5241860326/
    Slide 21 - http://www.flickr.com/photos/agizienski/3778965891/
    Slide 40 - http://www.flickr.com/photos/54409200@N04/5070012761/
    Slide 42 - http://www.flickr.com/photos/proimos/4199675334/
    Slide 48 - http://www.flickr.com/photos/maveric2003/91198458/
    Slide 63 - http://richard.cyganiak.de/2007/10/lod/
    Slide 67 - http://www.flickr.com/photos/markchapmanphoto/5139429152/
    Slide 72 - http://www.flickr.com/photos/-bast-/349497988/