Slides from a seminar on Digital Cultural Heritage given to two programmes at the UCL Institute for Sustainable Heritage: the MSc Sustainable Heritage and the MRes Science and Engineering in Arts, Heritage and Archaeology.
2. www.bl.uk 2
The Digital Scholarship Team
A cross-disciplinary mix of curators,
digital history and humanities scholars,
librarians, data & computer scientists.
Founded in 2010, we support the
innovative use of the British Library's digital
collections and data through:
• Getting content in digital form and
online
• Offering digital research support and
guidance
• Supporting collaborative projects
• Running events, competitions, and
awards
3. www.bl.uk 3
The British Library is the
national library of the UK
and by many counts one
of the largest research
libraries in the world.
By law (Legal Deposit) a
copy of every UK and
Ireland print publication
must be given to the
British Library by its
publishers. In 2013 this was
extended to digital publications.
4. www.bl.uk 4
Well over 150 million items
are currently stored in
London and in York.
The building in St Pancras
can seat 1,200 researchers
at any one time across 11
reading rooms.
If you saw 5 items a day it
would take you 80,000
years to see the whole
collection.
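As a rough check of that figure, the arithmetic can be sketched as below. The assumptions are simplifications that are not on the slide: exactly 150 million items (the slide says "well over") and a 365-day year with no days off.

```python
# Rough check of the "80,000 years" figure.
# Assumptions (not from the slide): exactly 150 million items, 365 days a year.
ITEMS = 150_000_000
ITEMS_PER_DAY = 5
DAYS_PER_YEAR = 365


def years_to_view(items: int, per_day: int, days_per_year: int = DAYS_PER_YEAR) -> float:
    """Return the number of years needed to view `items` at `per_day` items a day."""
    return items / per_day / days_per_year


years = years_to_view(ITEMS, ITEMS_PER_DAY)
print(f"About {years:,.0f} years")
```

Under these assumptions the result is a little over 82,000 years, which is consistent with the rounded figure of 80,000.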
16. www.bl.uk 16
Printers “fill the world with
pamphlets and books that
are foolish, ignorant,
malignant, libelous, mad,
impious and subversive;
and such is the flood that
even things that might
have done some good lose
all their goodness,”
…wrote Erasmus, early 16th
century
19. www.bl.uk 19
Not as simple as point and click…
Activity/Overhead | Description
Technological Infrastructure | Includes equipment (scanners, computers), software and suitable space for digitisation
Selection | Choosing material to be digitised. This includes rights clearance.
Description | Cataloguing, description, indexing and the creation of management information
Conservation | Care, handling, packaging, transport and conservation of the material.
Preparation | Making objects and books ready to be digitised: for example unbinding
Conversion to master digital formats | Scanning, digital photography or audio and video encoding
Production of intermediates | For example, access copies.
Quality Management | Error checking and correction
Storage/maintenance | Storage and management of digital assets for use and preservation. Format migration.
http://nickpoole.org.uk/wp-content/uploads/2011/12/digiti_report.pdf
20. www.bl.uk 20
Why we digitise, how we prioritise
• Is it unique to the British Library collection and does it have a particular
relevance to UK cultural heritage?
• What additional user benefits and outcomes might digitisation bring to the
object (accessibility & discoverability, reconstruction, reunification, creative
re-use, scientific enquiry, authentication)?
• Could it help preserve the original object? Or document the original
object before going out on loan, or to be sold overseas?
• Does someone with a lot of £ want to digitise it? Strategic Partnerships
can support huge collections getting online; they also bring commercial
opportunities, revenue generation and technical expertise.
21. www.bl.uk 21
What are ways in which we
might make a digital
collection truly
“accessible”?
22. www.bl.uk 22
• Open Rights
• Open platforms
• Cataloguing & description
• APIs to share
• User Experience Testing
• Full text (Optical Character Recognition, Transcription, Named
entity recognition)
• Multiple language support (Translations)
• Exhibitions & Interpretations
• Search engine optimisation (Schema.org)
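To illustrate the Schema.org point, a catalogue record can be exposed to search engines as a JSON-LD snippet embedded in the item's web page. The sketch below is only an illustration and not the Library's actual implementation: the `to_json_ld` helper and the example.org URL are hypothetical, and the title is the 1825 item shown elsewhere in these slides.

```python
import json


def to_json_ld(title: str, year: int, url: str) -> str:
    """Build a minimal Schema.org 'Book' description as a JSON-LD string."""
    record = {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "datePublished": str(year),
        "url": url,  # placeholder URL, assumed for illustration only
    }
    return json.dumps(record, indent=2, ensure_ascii=False)


snippet = to_json_ld(
    "Pleasing tales designed to improve the understanding, "
    "and direct the conduct of young persons",
    1825,
    "https://example.org/items/pleasing-tales",
)

# The snippet would sit inside the HTML of the item page.
print(f'<script type="application/ld+json">\n{snippet}\n</script>')
```

A real record would carry many more properties (author, language, rights); the three used here are just enough to show the shape.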
23. www.bl.uk 23
• A dataset represents a distinct collection of data, ideally packaged,
preserved and made accessible for enquiry.
• Humanities data might be sets of bibliographic information, images,
image processing details, texts, texts with mark-up and annotations
etc.
• No one knows our collections data better. This expertise is essential
in digital scholarship.
• We can play a part by creating reliable datasets for reuse by
researchers, which also allows us to respond more quickly to frequent,
similar requests.
• Easy access to data and datasets which researchers can trust
enables new research
Accessibility=Data
24. www.bl.uk 24
• The Library has spent the last two decades creating digital assets
through digitisation and preserving born-digital objects, and will
continue to do so far into the future.
• We can now do much more than use technology to simply discover
these digital objects and must embrace the opportunities afforded
by analysing these digital collections at scale.
• If scholars view our archives as an infinite pool of multiple layers of
loosely held data from which new research questions can be wrung
then so must we.
• The Digital Research Team and BL Labs therefore aim to provide services
beyond simple resource discovery, that is, beyond helping to
point a single user to a single item or object via a catalogue.
The Digital Research View
25. www.bl.uk 25
The emergence of the new digital humanities isn’t an isolated
academic phenomenon. The institutional and disciplinary changes
are part of a larger cultural shift, inside and outside the
academy, a rapid cycle of emergence and convergence in
technology and culture
Steven E. Jones, The Emergence of the Digital Humanities (2014)
http://lisacharlotterost.github.io/2015/06/20/Searching-through-the-years/
27. www.bl.uk 27
Political Meetings Mapper
Dr. Katrina Navickas, a self-professed
luddite, wanted to know how many, and
where, Chartist movement meetings took
place in the 19th Century and if there was a
more efficient way to extract this information
programmatically from our digitised
newspapers, rather than by hand.
5,519 meetings held from 1838 to 1850
discovered in 462 towns and villages across
the UK!
These will be added to her existing findings:
http://protesthistory.org.uk/the-story-1789-1848/database-of-meetings
“I was able to do in minutes with a python code what
I’d spent the last ten years trying to do by hand!”
-Dr. Katrina Navickas, BL Labs Winner 2015
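The kind of programmatic extraction described on this slide can be sketched with a regular expression run over OCR text. This is not Dr. Navickas's code: the sample sentences, the assumed phrasing ("will be held at" / "will take place at") and the `extract_meetings` helper are all hypothetical, and real newspaper OCR would need far more forgiving patterns.

```python
import re
from collections import Counter

# Hypothetical OCR text; not taken from the British Library's newspapers.
SAMPLE_TEXT = """
A public meeting of the Chartists will be held at the Market Place, Manchester, on Monday evening.
The weekly meeting will take place at the Temperance Hall, Leeds, on Tuesday next.
Mr. Smith lectured on the Charter to a crowded audience.
"""

# Assumed notice phrasing: "... meeting ... will be held at <venue>, <town>".
# [^.]*? keeps the match inside a single sentence.
MEETING_PATTERN = re.compile(
    r"meeting\b[^.]*?\bwill (?:be held|take place) at (?:the )?"
    r"(?P<venue>[A-Z][\w' ]+?), (?P<town>[A-Z][a-z]+)\b"
)


def extract_meetings(text: str) -> list[tuple[str, str]]:
    """Return (venue, town) pairs for every meeting notice found in `text`."""
    return [(m.group("venue"), m.group("town")) for m in MEETING_PATTERN.finditer(text)]


meetings = extract_meetings(SAMPLE_TEXT)
meetings_per_town = Counter(town for _, town in meetings)

for venue, town in meetings:
    print(f"{venue} ({town})")
```

The town names could then be passed to a geocoder to place each meeting on a map, and counts such as `meetings_per_town` give the "how many, and where" summary.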
32. www.bl.uk 32
‘Early users of medieval books
of hours and prayer books left
signs of their reading in the form
of fingerprints in the margins.
The darkness of their
fingerprints correlates to
the intensity of their use
and handling. A
densitometer -- a machine that
measures the darkness of a
reflecting surface -- can reveal
which texts a reader favored.’
Kathryn M. Rudy, ‘Dirty Books:
Quantifying Patterns of Use in Medieval
Manuscripts Using a Densitometer’,
Journal of Historians of Netherlandish Art
(2010)
34. www.bl.uk 34
Where do we find the £? Some case
studies…
• Public/Private Partnerships: (Microsoft) 19th Century Books Online
• Charitable Funds: (Arcadia) Endangered Archives Programme
• Internal budgets: (though mostly just for stabilisation/preservation)
Libcrowds
• Grants: (Newton/AHRC) Two Centuries of Indian Print
• Government (and gamblers): (Heritage Lottery Fund) Unlocking our
Sound Heritage
• Individual donations: Support Us
37. www.bl.uk 37
Endangered Archives Programme
Through an annual competition, EAP grants
provide funding to preserve social and
cultural archival material that is in danger of
destruction, neglect or physical deterioration
world-wide.
To date, the EAP has awarded 290 grants in
80 countries, preserving cultural and social
archives across Africa, Asia, Europe,
the Americas and Oceania.
EAP266: History of Bolama, the first capital of Portuguese
Guinea (1879-1941), as reflected in the Guinean National
Historical Archives
http://sounds.bl.uk/World-and-traditional-music/Syliphone-record-label-collection
40. www.bl.uk 40
Big Data History of Music
How can vast amounts of bibliographic data held by research libraries be unlocked for
music researchers to analyse?
Can this data be interrogated in ways that challenge the traditional narratives of music
history?
Analyses and visualisations
exposed previously
uncharted patterns in the
history of music, for instance
the rise and fall of music
printing in 16th- and 17th-
century Europe (huge dips in
output in Venice were down
to plague and war).
https://www.royalholloway.ac.uk/music/research/abigdatahistoryofmusic/home.aspx
41. www.bl.uk 41
The pilot will see over 4,000 items dating from
1713 to 1914, mostly Bengali,
digitised and catalogued
http://www.bl.uk/press-releases/2015/november/unlocking-indias-printed-heritage
A dedicated Digital Curator supports
computationally driven research, such as
text mining, by creating and curating
datasets for inclusion on data.bl.uk
and by providing digital skills
training.
Two Centuries of Indian Print
Right: Pleasing tales designed to improve the understanding, and
direct the conduct of young persons, 1825
43. www.bl.uk 43
The “West and the rest”
Buttressed by the rise of data science, faculty
across humanities fields have harnessed
search algorithms and optical character
recognition (OCR) to conduct research on an
unprecedented scale. Petabytes, not pages,
are now the unit of analysis. Yet the majority of
these tools only handle Latin script.
“Digital databases and text corpora – the ‘raw
material’ of text mining and computational text
analysis – are far more abundant for English
and other Latin alphabetic scripts than they are
for Chinese, Japanese, Korean, Sanskrit, Hindi,
Arabic and other non-Latin orthographies,”
Mullaney said. Troves of unread primary
sources lie dormant because no text mining
technology exists to parse them…
http://news.stanford.edu/thedish/2016/10/17/digital-humanities-scholars-receive-mellon-support/
45. www.bl.uk 45
Food for thought
• What are some ways biases might manifest themselves in
digitised archives? Think about accessibility, formats,
collection decisions, languages…
• How might technology & digitisation impact on our
understanding of history?
• How might digital technology help or hinder bias in the
archive?
• What active things can we as professionals do to
mitigate/reverse bias in the archive?
46. www.bl.uk 46
A note on skills
The Digital Scholarship Training
Programme is an internal staff
training initiative by the Digital Curator
team that launched in November
2012.
It helps us to situate our collections and
expertise in the realm of digital
research, and to explore opportunities and
challenges.
Library Carpentry &
Programming Historian are great
places to start!
47. www.bl.uk 47
Internal Staff Courses
• 101 This is Digital Scholarship
• 103 Digitisation at the British Library
• 105 Crowdsourcing in Libraries, Museums and Cultural Heritage Institutions
• 107 Data Visualisation for Analysis in Scholarly Research
• 108 Geocoding Historical Information and Digital Mapping
• 109 Data on the Web: Mash-ups, APIs and The Semantic Web
• 118 Cleaning up Data
Our Hack & Yacks
• Handwritten Text Recognition with Transkribus
• From Paper Maps to the Web: A DIY Digital Maps Primer
• Literary & Historical Network Analysis using Gephi
• Interactive writing platforms: Twine and Inklewriter