1. Doing Digital Research
@ British Library
An intro to the Digital Research Team
Pre-1600 Doctoral Open Day 2017
2. www.bl.uk 2
Defining Digital Research
Using computational methods
either to answer existing research
questions or to challenge existing
theoretical paradigms.
Geotagging
Data Visualisation
Data Mining
Georeferencing
Digital Mapping
Crowdsourcing
Text mining
Collaboration
3.
The Digital Research Team is a cross-disciplinary
mix of curators, researchers, librarians and
programmers supporting the creation and innovative
use of the British Library's digital collections.
http://bl.uk/digital
@BL_DigiSchol
Meet the Digital Scholarship Team
4.
The Digital Research Team
We support researchers in the innovative
use of the British Library's digital collections
and data by:
• Offering digital research training and
guidance
• Supporting collaborative projects
• Running events, competitions, and awards
• Behind-the-scenes work to get content
digitised and online
8.
Unique Digital Projects
International Dunhuang Project (IDP) http://idp.bl.uk/
A ground-breaking international collaboration to make information and
images of all manuscripts, paintings, textiles and artefacts from Dunhuang
and archaeological sites of the Eastern Silk Road freely available on the
Internet.
Endangered Archives Programme (EAP) http://eap.bl.uk/
Preserves archives in danger of destruction, neglect or physical
deterioration worldwide. The archival material relates to a pre-modern period
of a society's history, typically any period before industrialisation. Digital
collections include newspapers, periodicals, audio and audio-visual material,
photographs and rare printed books.
Hebrew Manuscripts Project: 3,000 digitised manuscripts spanning 1,000
years. Digital Curator Adi Keinan-Schoonbaert explored 3D modelling,
annotations, data visualisations, image processing and spatial representations.
11.
Big Data History of Music
How can vast amounts of bibliographic data held by research libraries
be unlocked for music researchers to analyse?
Can this data be interrogated in ways that challenge the traditional
narratives of music history?
Analyses and visualisations exposed previously
uncharted patterns in the history of music, for
instance the rise and fall of music printing in
16th- and 17th-century Europe (huge dips in output
in Venice were down to plague and war).
14.
“I was able to do in minutes with a
python code what I’d spent the last
ten years trying to do by hand!”
Dr. Katrina Navickas, BL Labs
Winner 2015
Political Meetings Mapper
15.
Combining Text Analysis and Geographic
Information Systems to investigate the
representation of disease in nineteenth-
century newspapers
Goal: analyse the geographies in large corpora while remaining sensitive to
the subtleties and nuances within the texts (over 377 million words from the
London-based newspaper The Era, 1838–1900)
Spatial Humanities: Texts, GIS, Places at Lancaster University, with Paul
Atkinson (historian), Ian Gregory (digital humanities), Andrew Hardie
(linguistics), Daniel Kershaw (computer science), Amelia Joulain-Jay
(linguistics), Catherine Porter (geography) and Paul Rayson (computer
science).
16.
Digital/computational techniques
Combining techniques from Geographical Information Systems (GIS) and
corpus linguistics to create a set of methods they call Geographical Text
Analysis (GTA).
GIS is effectively a mapping and database technology that is typically
used with quantitative sources.
Corpus linguistics is concerned with analysing large textual collections
using a combination of quantitative and qualitative approaches.
Collocation asks which words are found near a search term,
allowing us to understand which themes are associated with one another.
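A minimal Python sketch of window-based collocation, assuming a simple lower-case word tokenizer and a fixed window of tokens either side of the search term. It returns raw co-occurrence counts only; corpus tools normally rank collocates with an association statistic (such as mutual information or log-likelihood), which this sketch does not compute. The names `tokenize` and `collocates` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lower-case alphabetic tokens (deliberately simple)."""
    return re.findall(r"[a-z]+", text.lower())


def collocates(tokens, node, window=3):
    """Count words occurring within `window` tokens either side of each
    occurrence of the search term `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # skip the search term itself
                counts[tokens[j]] += 1
    return counts


if __name__ == "__main__":
    toks = tokenize("Cholera broke out in the town. The cholera epidemic spread fast.")
    print(collocates(toks, "cholera", window=2).most_common(3))
```

In a study like the one above, the search term would be a disease word and the window would run over millions of tokens; the counting logic stays the same.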
Geoparsing allows us to identify place-names in the text and assign
them coordinates.
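A naive gazetteer-lookup geoparser in Python, for illustration only. It assumes a small hypothetical gazetteer mapping lower-case names to approximate (latitude, longitude) pairs, and tries the longest candidate name first so that multi-word names such as "Newcastle upon Tyne" are not split. Real geoparsers also disambiguate between places that share a name (there is more than one Newcastle), which this sketch does not attempt.

```python
import re

# Hypothetical toy gazetteer: lower-case place name -> approximate (lat, lon).
GAZETTEER = {
    "london": (51.507, -0.128),
    "manchester": (53.480, -2.243),
    "newcastle upon tyne": (54.978, -1.618),
}


def geoparse(text, gazetteer, max_len=3):
    """Return (place name as written, coordinates) for each gazetteer match."""
    tokens = re.findall(r"[A-Za-z]+", text)
    lowered = [t.lower() for t in tokens]
    hits = []
    i = 0
    while i < len(tokens):
        # Try the longest span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(lowered[i:i + n])
            if candidate in gazetteer:
                hits.append((" ".join(tokens[i:i + n]), gazetteer[candidate]))
                i += n
                break
        else:
            i += 1  # no place-name starts at this token
    return hits


if __name__ == "__main__":
    print(geoparse("Fever reported in London and Newcastle upon Tyne.", GAZETTEER))
```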
17.
Virtual Mappa used the DM image
annotation software. Simple markup
tools applicable to almost any visual-
textual document.
- Cotton Tiberius B V, f.56v
- Royal 14 C, f.1v-2r
- Harley 3667, f.8v
- Add 28681, f.9r
Maps transcribed and translated
through annotations linked to roll-over
markers on map images. Full-text
searchable. Multiple manuscripts from
different repositories in one view.
Virtual Mappa Project: Online, Annotated
Medieval Mappaemundi
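The annotation model described on this slide (transcriptions and translations attached to markers on map images, searchable as full text across manuscripts) can be sketched as a small Python data structure. This is a hypothetical illustration, not the DM software's actual data model; the field names, the fractional marker coordinates and the example records are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """One roll-over marker on a map image (hypothetical data model)."""
    shelfmark: str      # manuscript identifier
    folio: str          # folio of the map image
    x: float            # marker position as a fraction of image width (0-1)
    y: float            # marker position as a fraction of image height (0-1)
    transcription: str  # text as written on the map
    translation: str    # English translation


def search(annotations, query):
    """Case-insensitive substring search over transcriptions and translations."""
    q = query.lower()
    return [
        a for a in annotations
        if q in a.transcription.lower() or q in a.translation.lower()
    ]


def by_manuscript(annotations):
    """Group annotations by shelfmark, e.g. to show several manuscripts in one view."""
    groups = {}
    for a in annotations:
        groups.setdefault(a.shelfmark, []).append(a)
    return groups


if __name__ == "__main__":
    # Invented example records, not real transcriptions.
    notes = [
        Annotation("Example MS 1", "f.1r", 0.5, 0.5, "Ierusalem", "Jerusalem"),
        Annotation("Example MS 2", "f.2v", 0.2, 0.8, "Roma", "Rome"),
    ]
    print(search(notes, "jerus"))
```

Searching both the transcription and the translation lets a user find a place whether they type the medieval form or the modern English name.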
18.
Pelagios: Enabling Linked Ancient Geodata
Collaborative project to transcribe & map Classical and
Medieval placenames from digital texts and manuscript
images. Allows for visualisation of place and space in historical
documents.
BL contributed >350 digitised
images of medieval materials,
plus one digital palaeographer!
19.
Find out more
• Humanist mailing list http://dhhumanist.org/
• IHR Digital History Seminar http://ihrdighist.blogs.sas.ac.uk/
• Digital Classicist mailing list, events
http://www.digitalclassicist.org/
• BL Labs Awards
http://labs.bl.uk/British+Library+Labs+Awards
21.
Got a question? Get in touch!
www.bl.uk/digital
digitalresearch@bl.uk
@BL_DigiSchol
Editor's Notes
Set up in 2010, the team was formed to focus on the changing research landscape in the digital realm. It is now embedded in collection areas and, as you’ll see later, some team members join the Library explicitly as part of major digitisation projects.
Main activities:
Getting content in digital form and online
Collaborations, Competitions & Awards
Digital research support and guidance
As part of its work to open its data to wider use, the British Library is making copies of some of its datasets available for research and creative purposes.
We aim to describe collections in terms of their data format (images, full text, metadata, etc.), licences, temporal and geographic scope, originating purpose (e.g. specific digitisation projects or exhibitions) and collection, and related subjects or themes.
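A hypothetical Python sketch of a dataset description record covering the facets listed above (format, licence, temporal and geographic scope, originating purpose, collection, subjects), with a simple filter. The field names and the `find` helper are assumptions for illustration, not the schema used by the British Library's data site.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetDescription:
    """Hypothetical description of one published dataset."""
    title: str
    data_format: str           # e.g. "images", "full text", "metadata"
    licence: str
    start_year: int            # temporal scope (inclusive)
    end_year: int
    geographic_scope: str
    originating_purpose: str   # e.g. a digitisation project or exhibition
    collection: str
    subjects: list[str] = field(default_factory=list)


def find(datasets, data_format=None, year=None, subject=None):
    """Return datasets matching every filter that is not None."""
    results = []
    for ds in datasets:
        if data_format is not None and ds.data_format != data_format:
            continue
        if year is not None and not (ds.start_year <= year <= ds.end_year):
            continue
        if subject is not None and subject not in ds.subjects:
            continue
        results.append(ds)
    return results
```

Keeping temporal scope as a start and end year makes "which datasets cover 1850?" a simple range check.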
This site is a 'beta', and is in the early stages of development. If you have questions or feedback about this site or our open data work, please email digitalresearch@bl.uk.
We'd also love to hear what you've done or made with the data.
If you are interested in getting collection-level metadata for your digital research purposes, contact us!
Research Question:
The project brought together, for the first time, the world's biggest datasets about published sheet music, music manuscripts and classical concerts (in excess of 5 million records) for statistical analysis, manipulation and visualisation. The aim was to unlock musical-bibliographical data held by libraries in order to create new research opportunities. The project cleaned and enhanced aspects of the British Library catalogues of printed and manuscript music, which are now available as open data from www.bl.uk/bibliographic/download.html, and piloted big data research techniques on these and five other datasets.
Source Collections:
Data from seven existing databases and catalogues were used as the basis of this project: the British Library's catalogues of printed and manuscript music; the bibliographies created by Répertoire International des Sources Musicales (RISM) that list European music printed 1500-1800 and music manuscripts in European libraries; and the RISM UK Music Manuscripts Database and the Concert Programmes Project database.
Digital/Computational Techniques:
Data wrangling using OpenRefine and MarcEdit. Data visualisation using Google Fusion Tables and Palladio.
Project slides: http://www.slideshare.net/historyspot/ihr-big-data-history-of-music-9-june15
Outcome: Analyses and visualisations of these datasets exposed previously uncharted patterns in the history of music, for instance involving the rise and fall of music printing in 16th- and 17th-century Europe (huge dips in output in Venice were down to plague and war!), or the rise of nationalist colourings in music of the late 18th and early 19th centuries. The detection of these long-term trends permits new ways of linking music history to wider histories of culture, economics, society and politics.
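The kind of trend-spotting described in the outcome (such as dips in printing output) can be sketched in Python as counting catalogue records per decade for one place of publication and flagging sharp falls. The project itself used OpenRefine, MarcEdit, Google Fusion Tables and Palladio; this is a stand-in, and the record fields (`place`, `year`), the 50% threshold and the example records are all hypothetical.

```python
from collections import Counter


def output_by_decade(records, place):
    """Count records published in `place`, grouped by decade.
    Decades with no records between the first and last are filled with 0."""
    counts = Counter(
        rec["year"] // 10 * 10
        for rec in records
        if rec.get("place") == place and rec.get("year") is not None
    )
    if not counts:
        return {}
    first, last = min(counts), max(counts)
    return {d: counts.get(d, 0) for d in range(first, last + 10, 10)}


def find_dips(series, threshold=0.5):
    """Return decades whose output is below `threshold` times the previous decade's."""
    decades = sorted(series)
    return [
        cur for prev, cur in zip(decades, decades[1:])
        if series[cur] < threshold * series[prev]
    ]


if __name__ == "__main__":
    # Invented records, for illustration only.
    records = (
        [{"place": "City A", "year": 1565}] * 10
        + [{"place": "City A", "year": 1577}] * 3
        + [{"place": "City A", "year": 1583}] * 9
        + [{"place": "City B", "year": 1575}] * 5
    )
    series = output_by_decade(records, "City A")
    print(series, find_dips(series))
```

Filling empty decades with zero matters: without it, a decade with no surviving output would simply vanish from the series instead of showing up as a dip. Explaining a dip (plague, war) still needs historical interpretation.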
https://www.youtube.com/watch?v=tp4y-_VoXdA
The images of the relevant pages of the Northern Star were run through an Optical Character Recognition (OCR) program (ABBYY FineReader 12) and the resulting text was checked manually.
We developed a set of Python scripts to extract and geocode the place of each meeting using a gazetteer of places, and to parse the date of the meeting.
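A minimal Python sketch of this extraction step, assuming meeting notices that name a place found in a gazetteer and give an explicit date such as "4th June, 1838". The gazetteer entries (approximate coordinates), the date pattern and the `parse_notice` function are hypothetical; they are not the project's actual code. Relative dates, OCR errors and notices naming several places would need extra handling.

```python
import re
from datetime import date

# Hypothetical gazetteer of meeting places -> approximate (lat, lon).
MEETING_PLACES = {
    "bradford": (53.795, -1.759),
    "leeds": (53.800, -1.549),
    "halifax": (53.722, -1.858),
}

MONTHS = {
    name: number
    for number, name in enumerate(
        ["january", "february", "march", "april", "may", "june", "july",
         "august", "september", "october", "november", "december"],
        start=1,
    )
}

# Matches explicit dates such as "4th June, 1838" or "12 May 1839".
DATE_RE = re.compile(r"(\d{1,2})(?:st|nd|rd|th)?\s+([A-Za-z]+),?\s+(\d{4})")


def parse_notice(text, gazetteer=MEETING_PLACES):
    """Return the first gazetteer place, its coordinates and the first valid date."""
    place = next(
        (w for w in re.findall(r"[A-Za-z]+", text) if w.lower() in gazetteer),
        None,
    )
    coords = gazetteer[place.lower()] if place else None

    when = None
    for match in DATE_RE.finditer(text):
        month = MONTHS.get(match.group(2).lower())
        if month is None:  # the word after the number is not a month name
            continue
        try:
            when = date(int(match.group(3)), month, int(match.group(1)))
            break
        except ValueError:  # impossible date, e.g. an OCR-garbled "31st February"
            continue
    return {"place": place, "coords": coords, "date": when}


if __name__ == "__main__":
    print(parse_notice("A public meeting will be held in Bradford on 4th June, 1838."))
```

Returning `None` for anything the sketch cannot resolve keeps doubtful notices easy to pull out for the kind of manual checking described above.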
The project has been developing Geographical Information Systems (GIS) and corpus linguistics techniques and applying them to studies concerned with Lake District literature and nineteenth-century social history.