Linked Library Data in the wild
Introductions... Phil John, Technical Lead for Prism
Introductions... So, what’s Prism then?
Introductions: Prism, a next generation discovery interface
Prism: Built entirely on Linked Data (yes… even configuration settings)
Prism: Discovery of library catalogue resources, but grander plans afoot...
Prism: ...some future sources...
 archives/records (e.g. DS Calm)
 thesis repositories
 rare items/special collections
 and more!
Prism: Performs data conversion, MARC 21 to RDF
Prism: Initial “bulk” conversion then periodic “delta” files; this ensures it keeps in sync with the LMS
Prism: Borrower/Availability data pulled from LMS “live”, provided by a suite of RESTful web services
Prism: Linked Data API. Just add .rss to collections or .rdf/.nt/.ttl/.json to items
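As a rough illustration of the suffix convention, the sketch below builds the alternate-representation URL for an item or a collection. The host name, the paths and the media types attached to each suffix are assumptions made for the example; the slides only name the suffixes themselves.

```python
from urllib.parse import urlsplit, urlunsplit

# Suffixes named on the slide; the media types are assumed, not taken from Prism.
ITEM_FORMATS = {
    "rdf": "application/rdf+xml",
    "nt": "application/n-triples",
    "ttl": "text/turtle",
    "json": "application/json",
}
COLLECTION_FORMATS = {"rss": "application/rss+xml"}


def _with_suffix(url, suffix, allowed):
    """Append .suffix to the path of url, keeping any query string intact."""
    if suffix not in allowed:
        raise ValueError(f"unsupported suffix: {suffix!r}")
    parts = urlsplit(url)
    path = parts.path.rstrip("/") + "." + suffix
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


def item_url(url, suffix):
    """URL of an item in one of the linked data serialisations."""
    return _with_suffix(url, suffix, ITEM_FORMATS)


def collection_url(url, suffix="rss"):
    """URL of a collection (for example a result list) as a feed."""
    return _with_suffix(url, suffix, COLLECTION_FORMATS)


# Hypothetical addresses, for illustration only.
print(item_url("https://prism.example.org/items/750785", "ttl"))
print(collection_url("https://prism.example.org/search?q=goodfellas"))
```

Keeping the suffix on the path rather than in the query string means the same address works in a browser, in curl and as a link target in other datasets.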
Prism: The Challenges
The Challenges: Extracting data from MARC 21
Extracting Data from MARC 21: Some quotes... (cataloguers may want to look away now)
Extracting Data from MARC 21: MARC 21 is not going away anytime soon... and even if it does, there are millions of existing records that we’ll want to convert
Extracting Data from MARC 21: How are we approaching it?
Extracting Data from MARC 21: By tackling it in small chunks!
Extracting Data from MARC 21: We’ve created a solution that...
 compartmentalises code for different sections
 provides robustness
 is performant
 allows us to experiment
Extracting Data from MARC 21: MARC 21 Parser. Fires events when it encounters a MARC 21 data structure; very strict with syntax
Extracting Data from MARC 21: Event Observer. Listens for MARC 21 data structures and hands control over to one or more handlers
Extracting Data from MARC 21: Bibliographic Handlers. Know how to convert MARC 21 structures and fields into linked data
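The three pieces above could fit together roughly as follows. This is a minimal sketch, not Prism's code: the class and function names, the dict-based field representation and the `dct:title` predicate are all assumptions chosen for the example, and the "parser" here only walks fields that have already been decoded.

```python
from collections import defaultdict


class EventObserver:
    """Routes each field event to every handler registered for its tag."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, tag, handler):
        self._handlers[tag].append(handler)

    def notify(self, tag, subfields, triples):
        for handler in self._handlers[tag]:
            handler(subfields, triples)


def parse_record(fields, observer):
    """Toy parser: fires one event per field and rejects malformed tags."""
    triples = []
    for tag, subfields in fields:
        if not (len(tag) == 3 and tag.isdigit()):
            raise ValueError(f"malformed tag: {tag!r}")  # strict with syntax
        observer.notify(tag, subfields, triples)
    return triples


def title_handler(subfields, triples):
    """Hypothetical bibliographic handler for 245: emits a title triple."""
    title = subfields.get("a", "").strip(" /:")
    if title:
        triples.append(("_:record", "dct:title", title))


observer = EventObserver()
observer.register("245", title_handler)
record = [
    ("245", {"a": "Goodfellas", "h": "[videorecording] /"}),
    ("300", {"a": "1 Blu-Ray (139 min.) :"}),  # no handler yet, so ignored
]
print(parse_record(record, observer))
```

Because handlers are registered per tag, a new field can be tackled by adding one handler without touching the parser or the other handlers, which is the "small chunks" idea.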
Extracting Data from MARC 21: So, where are we up to?
Extracting Data from MARC 21: Format (and duration). We tackled this one first as it allows us to reason more fully about the record
Format: In theory quite easy...
Format: ...in practice not so much...
 DVD and LaserDisc share(d) a code
 LC slow(ish) to support new formats in MARC 21
 limited use of control field (007) codings...
 ...so need to parse text from 3xx, 5xx fields
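The fallback described in the bullets above can be sketched as: trust the 007 when it carries a usable code, otherwise look for format words in the 3xx/5xx text. The code table covers only a handful of videorecording values (007 position 04), and the keyword patterns and function names are assumptions for illustration. Duration is read from a 306 ‡a in hhmmss form.

```python
import re

# 007/04 for videorecordings; a deliberately partial table.
VIDEO_FORMAT_007 = {"b": "VHS", "g": "LaserDisc", "s": "Blu-ray", "v": "DVD"}

# Assumed keyword heuristics for free text in 300/538 and similar fields.
TEXT_PATTERNS = [
    (re.compile(r"blu[\s-]?ray", re.I), "Blu-ray"),
    (re.compile(r"\bdvd\b", re.I), "DVD"),
    (re.compile(r"laser\s?disc", re.I), "LaserDisc"),
    (re.compile(r"\bvhs\b", re.I), "VHS"),
]


def detect_video_format(field_007, text_fields):
    """Prefer the coded 007 value; fall back to scanning descriptive text."""
    if len(field_007) > 4 and field_007[0] == "v":
        coded = VIDEO_FORMAT_007.get(field_007[4])
        if coded:
            return coded
    for text in text_fields:
        for pattern, label in TEXT_PATTERNS:
            if pattern.search(text):
                return label
    return None


def duration_minutes(field_306a):
    """Convert a 306 subfield a value (hhmmss) to whole minutes."""
    if not re.fullmatch(r"\d{6}", field_306a):
        return None
    hours, minutes, seconds = (int(field_306a[i:i + 2]) for i in (0, 2, 4))
    return (hours * 3600 + minutes * 60 + seconds) // 60


print(detect_video_format("vd||s||||", []))
print(detect_video_format("vd|||||||", ["1 Blu-Ray (139 min.) :", "Blu-Ray."]))
print(duration_minutes("021900"))
```

Returning `None` rather than guessing keeps an unknown format distinguishable from a detected one, so later handlers can decide how to treat it.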
Which gives us...
Extracting Data from MARC 21: Title. An important part of the record to model, or so I’ve been told
Title: Quite tricky because...
 ‡c must be last subfield in a 245...
 ...so sometimes data from ‡n / ‡p is in ‡c instead...
 ...which means we can’t just drop the ‡c
Now with more title
Extracting Data from MARC 21: Identifier. Sounds easy... acronyms from EAN to UPC describing 13 digit codes... right?
Identifier: ...STOP! What are all those other things doing in the ‡a?
Identifier: “For a hardbound resource, there is no attempt to use a consistent term other than to use one that conveys the condition intelligibly.” (Library of Congress Rule Interpretation 1.8)
Identifier: So we need to parse them out (and then validate whatever’s left)
LDR: 01425ngm a22005058  4504
001: 750785
003: xxxxxxx
005: 20090824164118.0
007: vd||s||||
008: 080623s2007    enk||| e          v|eng d
020:  ,   | $c Retail (S24.99) |
024: 3,   | $a 7321900108089 |
028: 4, 0 | $a BDY10808 | $b Warner Home Video |
029:  ,   | $a 7321900108089 |
082:  ,   | $a 812
245: 0, 0 | $a Goodfellas | $h [videorecording] / | $c directed by Martin Scorsese ; music by Christopher Brooks
260:  ,   | $b Warner Home Video, | $c 2007. |
300:  ,   | $a 1 Blu-Ray (139 min.) : | $b col. |
306:  ,   | $a 021900 |
366:  ,   | $b 20070611 |
511:  ,   | $a Starring Robert De Niro, Ray Liotta and Joe Pesci
521: 8,   | $a BBFC code: 18. |
538:  ,   | $a Blu-Ray. |
700: 1,   | $a Scorsese, Martin |
700: 1,   | $a Brooks, Christopher |
852:  ,   | $b John Harvard | $c BLU-RAY DISC | $m 18 | $z , $z Blu Ray Disc. 18Cert
Phew, this one’s easy, no (pbk), (hbk) or even (pbk. , alk. paper) to contend with
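A minimal sketch of "parse them out, then validate whatever’s left": strip parenthesised qualifiers such as (pbk.) from the ‡a, then check the remaining digits with the standard EAN-13 and ISBN-10 check-digit rules. The function names are assumptions, and the ISBN string in the usage lines is an illustrative input rather than data from the slides; the EAN is the 024 ‡a from the record above.

```python
import re

QUALIFIER = re.compile(r"\([^)]*\)")


def split_identifier(raw):
    """Return (candidate, qualifiers) from a raw subfield a value."""
    qualifiers = [q.strip("() ") for q in QUALIFIER.findall(raw)]
    candidate = re.sub(r"[\s-]", "", QUALIFIER.sub("", raw)).upper()
    return candidate, qualifiers


def valid_ean13(code):
    """EAN-13 / ISBN-13: weights 1,3,1,3... over the first twelve digits."""
    if not re.fullmatch(r"\d{13}", code):
        return False
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(code[:12]))
    return (10 - total % 10) % 10 == int(code[12])


def valid_isbn10(code):
    """ISBN-10: weights 10 down to 1, sum divisible by 11, X counts as 10."""
    if not re.fullmatch(r"\d{9}[\dX]", code):
        return False
    total = sum((10 - i) * (10 if c == "X" else int(c)) for i, c in enumerate(code))
    return total % 11 == 0


candidate, qualifiers = split_identifier("7321900108089")
print(candidate, qualifiers, valid_ean13(candidate))

candidate, qualifiers = split_identifier("0747532745 (pbk.)")
print(candidate, qualifiers, valid_isbn10(candidate))
```

Anything that fails validation after stripping is better kept as an unverified literal than used as a lookup key against other sources.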
Now we can start performing lookups against other sources!
Extracting Data from MARC 21: Author. Hardest of the lot...
Author: ...why?
 Rowling, J.K. vs Rowling, Joanne K.
 Few records with relator term in 100/700 ‡e...
 ...so we have to parse that from the 245 ‡c...
 ...and we don’t just deal with English records.
Author: Library of Congress to the rescue! We’ve licensed the names/subjects authority files, and created RDF from them
LDR: 01425ngm a22005058  4504
001: 750785
003: xxxxxxx
005: 20090824164118.0
007: vd||s||||
008: 080623s2007    enk||| e          v|eng d
020:  ,   | $c Retail (S24.99) |
024: 3,   | $a 7321900108089 |
028: 4, 0 | $a BDY10808 | $b Warner Home Video |
029:  ,   | $a 7321900108089 |
082:  ,   | $a 812
245: 0, 0 | $a Goodfellas | $h [videorecording] / | $c directed by Martin Scorsese ; music by Christopher Brooks
260:  ,   | $b Warner Home Video, | $c 2007. |
300:  ,   | $a 1 Blu-Ray (139 min.) : | $b col. |
306:  ,   | $a 021900 |
366:  ,   | $b 20070611 |
511:  ,   | $a Starring Robert De Niro, Ray Liotta and Joe Pesci
521: 8,   | $a BBFC code: 18. |
538:  ,   | $a Blu-Ray. |
700: 1,   | $a Scorsese, Martin |
700: 1,   | $a Brooks, Christopher | $e music
852:  ,   | $b John Harvard | $c BLU-RAY DISC | $m 18 | $z , $z Blu Ray Disc. 18Cert
A contrived example (sorry!) with and without relator terms
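For the record above, one way to recover a missing relator term is to read "role by name" phrases out of the 245 ‡c and match them to the 700 headings. The sketch below is English-only and inverts names naively (last word is taken as the surname), so it illustrates the idea rather than a production approach; all function names are assumptions.

```python
import re

ROLE_BY = re.compile(r"^(?P<role>.+?)\s+by\s+(?P<names>.+)$", re.I)


def roles_from_245c(statement):
    """Map each name in a statement of responsibility to its role phrase."""
    roles = {}
    for part in statement.split(";"):
        found = ROLE_BY.match(part.strip().rstrip(". "))
        if not found:
            continue
        for name in re.split(r"\s*(?:,|\band\b)\s*", found.group("names")):
            if name:
                roles[name.strip()] = found.group("role").strip().lower()
    return roles


def invert(name):
    """'Martin Scorsese' becomes 'Scorsese, Martin' (naive surname guess)."""
    parts = name.split()
    return name if len(parts) < 2 else f"{parts[-1]}, {' '.join(parts[:-1])}"


def relator_for(heading, subfield_e, statement):
    """Use the coded relator term when present, else infer it from 245 c."""
    if subfield_e:
        return subfield_e
    inferred = {invert(n): r for n, r in roles_from_245c(statement).items()}
    return inferred.get(heading.strip(" ,."))


statement = "directed by Martin Scorsese ; music by Christopher Brooks"
print(relator_for("Scorsese, Martin", None, statement))
print(relator_for("Brooks, Christopher", "music", statement))
```

Multi-word surnames and statements in other languages would need per-language patterns, which is part of why this field is the hardest of the lot.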
Hope you can all read this at the back!
Author: A closer look at Authority Matching
Author: Some requirements:
 ...(able to process 2M records in several hours)
 requires accuracy
 must handle pseudonyms and variant spellings
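One simple way to meet requirements like these is an in-memory index from normalised name forms (preferred and variant) to authority identifiers, so each heading costs a single dictionary lookup. This is a sketch under assumptions: the normalisation rules, the example URI and the policy of refusing ambiguous matches are illustrative choices, not a description of how Prism does it.

```python
import re
import unicodedata


def normalise(name):
    """Lower-case, strip diacritics and collapse punctuation and spacing."""
    decomposed = unicodedata.normalize("NFKD", name)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    return re.sub(r"[^a-z0-9]+", " ", plain.lower()).strip()


def build_index(authorities):
    """authorities maps an authority URI to its preferred and variant labels."""
    index = {}
    for uri, labels in authorities.items():
        for label in labels:
            index.setdefault(normalise(label), set()).add(uri)
    return index


def match(heading, index):
    """Return the single matching URI, or None if absent or ambiguous."""
    hits = index.get(normalise(heading), set())
    return next(iter(hits)) if len(hits) == 1 else None


# Hypothetical authority data, for illustration only.
authorities = {
    "http://example.org/authority/rowling": ["Rowling, J. K.", "Rowling, Joanne K."],
}
index = build_index(authorities)
print(match("Rowling, J.K.", index))
print(match("Rowling, Joanne K.", index))
print(match("Rowling, John", index))
```

Refusing to pick when two authorities share a normalised form trades some recall for accuracy; such headings can be queued for a slower, more careful pass.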
You can tell J.K. Rowling is successful, she’s been translated lots
Extracting Data from MARC 21: Language/Alternate Graphical Representation
Language: Nice “high impact” feature

 both forms can be searched for
Language: Passes 880s back into Observer, tagged with an ISO-639-2 language and masquerading as the field listed in ‡6
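The 880 handling can be sketched as follows: read the linked tag from the ‡6 linkage (linking tag, occurrence number, optional script code and orientation), then hand the field back to the dispatcher under that tag with a language attached. The function names and the `dispatch` callback are assumptions, and the language code is passed in by the caller because the slides do not say how it is determined.

```python
import re
from typing import NamedTuple, Optional

# Subfield 6 layout: tag-occurrence[/script[/orientation]], e.g. 245-01/(N
LINKAGE = re.compile(
    r"^(?P<tag>\d{3})-(?P<occ>\d{2})(?:/(?P<script>[^/]+))?(?:/(?P<orient>r))?$"
)


class Linkage(NamedTuple):
    tag: str
    occurrence: str
    script: Optional[str]


def parse_linkage(subfield_6):
    """Split a linkage subfield into linked tag, occurrence and script code."""
    found = LINKAGE.match(subfield_6.strip())
    if not found:
        raise ValueError(f"unparseable linkage: {subfield_6!r}")
    return Linkage(found.group("tag"), found.group("occ"), found.group("script"))


def redispatch_880(subfields, language, dispatch):
    """Re-fire an 880 as if it were the field named in its subfield 6."""
    link = parse_linkage(subfields["6"])
    payload = {code: value for code, value in subfields.items() if code != "6"}
    dispatch(link.tag, payload, language)


# Illustrative input; the title and language code are not from the slides.
redispatch_880(
    {"6": "245-01/(N", "a": "Гарри Поттер и философский камень"},
    "rus",
    lambda tag, payload, language: print(tag, language, payload),
)
```

Because the field re-enters the normal dispatch path, the existing title, author and other handlers process the alternate script without any special cases of their own.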
The Challenges: Using/Linking to External Datasets. It’s part of the reason we use Linked Data... but it’s got some challenges at the moment
 ...or worse, is taken offline permanently?
 can we trust this data?
 ...or, if that’s not practical, proxy requests using a caching proxy such as Squid
 if using Wikipedia and worried about vandalism...
The Future... (or, what we’d like to see happen to Linked Library Data)
The Future: More library data as LOD, especially on the peripheries: authority data, author information, links to other resources
The Future: LMS vendors adopting LOD. Seriously, this would make our lives so much simpler
The Future: LOD replacing MARC 21 as the standard representation of bibliographic records
Photo Credits
 Slide 15 - http://www.flickr.com/photos/gammaman/5241860326/
 Slide 21 - http://www.flickr.com/photos/agizienski/3778965891/
 Slide 40 - http://www.flickr.com/photos/54409200@N04/5070012761/
 Slide 42 - http://www.flickr.com/photos/proimos/4199675334/
 Slide 48 - http://www.flickr.com/photos/maveric2003/91198458/
 Slide 63 - http://richard.cyganiak.de/2007/10/lod/
 Slide 67 - http://www.flickr.com/photos/markchapmanphoto/5139429152/
 Slide 72 - http://www.flickr.com/photos/-bast-/349497988/