2. BEGINNERS DIGITAL HUMANITIES/SUBJECT
LIBRARIAN BOOT CAMP
What is Digital Humanities? Hack (building scholarly digital
editions, projects) vs. Yack (theory)
Two areas within the Hack part of DH: textual encoding with TEI (Text
Encoding Initiative) XML, and text mining
We all use digital tools now: what differentiates something as uniquely
digital humanities (vs. 'traditional') scholarship? Digital humanities
scholarship leverages the digital medium, i.e., creates something that
could not be duplicated in analog formats; if it could be reproduced
in analog with no loss, it's not DH
The research team (scholars, programmers, librarians) is characteristic
of DH, and probably necessary to it, but new to the humanities, which had a
tradition (if not a completely accurate one) of the "lone wolf" scholar
Pointers toward resources for getting started in DH will be on the
THATCamp STL site next week!
Session leaders: Chris Freeland and Andrew Rouner
3. POTENTIAL LITERATURE
We looked at Markov chain random text generation.
Playing around with a "rhymer" script led to a discussion of lexical
resources for text generation.
We looked at a version of the "dada engine", which generates texts by
applying a vocabulary to a grammar.
We briefly surveyed John Cage's "mesostics".
All of this led to a discussion of quantitative measures for literary creativity.
Resources used in the session are available
at http://ada.artsci.wustl.edu/dada/
Session leader: Stephen Pentecost
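The Markov chain idea from this session can be sketched in a few lines of Python. This is a minimal illustration, not the script used in the session: the sample sentence, the function names build_chain and generate, and the word-level (rather than character-level) chain are all assumptions made for the example.

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each run of `order` words to the list of words that follow it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return dict(chain)

def generate(chain, length=12, seed=None):
    """Walk the chain from a random starting state, stopping at a dead end."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            break  # no word ever followed this state in the source text
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)

# Hypothetical sample text, chosen only to keep the example small.
SAMPLE = "the cat sat on the mat and the dog sat on the log"
chain = build_chain(SAMPLE)
print(generate(chain, length=10, seed=1))
```

Every adjacent pair of words in the output also occurs in the source text, which is why the result tends to look locally plausible but globally aimless. Raising order makes the output follow the source more closely.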
4. BUILDING A SEMI-AUTOMATIC GEOCODING PROGRAM
FOR TEXT DOCUMENTS
Andrew introduced the concept of geocoding place references in text documents.
Aaron said the technology to do this already exists.
Jeff demonstrated Viewshare, Library of Congress open source software, which he used
for mapping important place references in oral histories compiled for the Missouri
State Historical Society.
Aaron described how Clavin is based on a gazetteer, which lets you look up
coordinates for real place names.
The problem is that it won't recognize historical places that no longer exist. It also will not
function at the fine grain of street addresses. There was discussion about the need to
create a gazetteer for St. Louis to incorporate lost landscapes and street addresses.
Brian demonstrated Open Calais, a name recognition tool, and explained how he used
it to map locations in St. Louis Beacon articles through the Google API.
Anupam asked about different types of geographical data output, other than web-based
displays.
The session wrapped up with some playing around with the Clavin demo to make it usable.
Session leader: Andrew Hurley
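The gazetteer approach discussed in this session can be illustrated with a small Python sketch. It is not how Clavin or Open Calais work internally: the GAZETTEER dictionary, the find_places function, and the sample sentence are hypothetical, and the coordinates are rough illustrative values. The point is that a locally built gazetteer can include a lost place (here Mill Creek Valley, a demolished St. Louis neighborhood) that a general-purpose gazetteer might not contain.

```python
import re

# Hypothetical mini-gazetteer: place name -> (latitude, longitude).
# Coordinates are approximate and for illustration only.
GAZETTEER = {
    "St. Louis": (38.627, -90.199),
    "Jefferson City": (38.577, -92.174),
    "Mill Creek Valley": (38.63, -90.22),  # lost neighborhood, rough location
}

def find_places(text, gazetteer):
    """Return (name, character offset, coordinates) for each gazetteer hit."""
    hits = []
    for name, coords in gazetteer.items():
        for match in re.finditer(re.escape(name), text):
            hits.append((name, match.start(), coords))
    # Sort by position so a reviewer can check hits in reading order.
    return sorted(hits, key=lambda hit: hit[1])

text = "She moved from Jefferson City to Mill Creek Valley in St. Louis."
for name, offset, (lat, lon) in find_places(text, GAZETTEER):
    print(f"{offset:3d}  {name}: {lat}, {lon}")
```

Exact string matching like this is only the automatic half of a semi-automatic workflow. A person would still need to confirm each candidate, since a name can be ambiguous or can refer to a person rather than a place.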
5. SLU CENTER FOR DIGITAL HUMANITIES PT. 1
SLU CDH Origin story
accidental opportunities come from casting a wide net
pursue impossible ideas and you might make connections that make it possible
Collaborative spirit leaves the door open
Linked Open Data ideas can support interoperability and future collaboration, communication, or data reuse even if it is not
exposed to the world
WashU Libraries
sharing experiences and seeking solutions for the lack of support for DLXS
working with technologies like Fedora, Hydra
the power of finding user groups and library communities
WashU
Unique struggles with 20th, 21st century texts
publishing incomplete biographical text is a DH project that can best exist as an interactive digital object
financing, copyright, access control can interrupt standards and interoperability
even requests are not standardized and change from institution to institution and country to country
Finding tools that support standards or that help mediate the IPR by remotely fetching images or supporting remote annotation
so access can be used without violating rights can help
Session leader: Patrick Cuba
6. SLU CENTER FOR THE DIGITAL HUMANITIES PT. 2
Mizzou
Faces challenges for Digital Humanities support where sciences are prominent and geography is isolating
Digital Humanities may allow for more distant collaboration where interests overlap
Sometimes the DH projects need to precede institutional support until a critical mass of interest exists on campus
Webster
Film project for annotated documentaries or user-guided stories
Tradamus (SLU-CDH) took from others to find standards and directions
Sharing obstacles with peers can aid in the discovery of tangential tools that nearly meet challenges and can serve as a starting point for new projects
Visualization tools for moving through a graph may assist in composition or user interface
The LittleBigPlanet game allows users to move around well-defined visual components to create an experience, and the community
reshares compositions (crowd-sourced documentary possibilities)
Eastern Illinois
Past Tracker and Localities projects are great resources which would benefit from an update
challenges include a rotating grad position in charge of working on the project, lack of time at the institution, and decentralized resources for
working with DH projects
Contact with other institutions revealed on-campus resources that may be available
When creating this as a DH project, tracking the history of the project itself may be worthwhile, both for popular interest and as an aid to
future scholars
Session leader: Patrick Cuba
7. BLURRING THE BOUNDARIES BETWEEN
SCHOLARSHIP
1. Open-source tools in Digi Hum: calls on the public to do creative work with material
An example: http://t-pen.org/TPEN/
an example: http://rapgenius.com
2. Crowd-sourcing & social media
3. Community, broader impacts in digi-hum projects/products/methods
4. How to convince students? How to incorporate into class construction?
5. How do faculty involve students and still maintain the quality and integrity of the
original project goals (this is true outside of the student context too, at the
community level)? Faculty-student collaboration? Faculty-student
guidance/direction? Both?
6. The "subject" as another type of community?
7. Academic/faculty/scholar collaboration
8. Futures? Communities for scholarly peer review in DigiHum, simultaneous, long-distance scholar input (using Wikipedia as an example of the beginnings of this)
Session leader: Kristine Hildebrandt
8. STL LAMS
Going forward, the TECHO (Technology Exchange Humanities Cultural
Organizations) group should:
continue meeting with a focus on projects; if it becomes merely a “sharing group,”
participants will lose interest, whereas projects require commitments
identify a better platform for collaboration than Google Groups, and at the same
time should have a public-facing resource, so interested parties can contact the
group to join in (possibly WordPress)
build its network and collaborators
begin planning ongoing, informal training on relevant platforms and standards
Session leader: Andrew Rouner
9. XML, OAC, RDF, JSON-LD AND THE KING STOOD: THE
UNIVERSE IS METADATA:
TEI is a great schema for description and interoperability, but XML is limiting in too many ways
overlapping ranges are not allowed when annotating
XML document does not resemble simulated original
metadata in headers and in-line tags are artificially different
massive XML documents must be parsed and processed for relevant or wanted information
RDF sought to fix some of the problems, but RDF-XML still stumbles
OAC (openannotation.org) removes the description, conversation, and linking from the original
digital object
solves all the listed problems of XML, leaves some common issues of vocab, convention, and data fragility
allows for TEI or DC or any vocabulary to be used in description
creates an independent digital object that can be stored, queried, or resolved from any location
complex chains of annotations and selectors can describe a resource so well that even if an original image
or text becomes unavailable, the annotations can still recreate meaning
OAC abandons the idea that annotations should be easily human readable in favor of machine-navigable
triples that can be passed easily between and within digital applications
Thinking in oa:Annotations instead of XML allows for new possibilities
SharedCanvas (shared-canvas.org) extends OAC and creates a sc:Canvas object for reference which
has no content and is only annotated
Tradamus (SLU-CDH project) creates digital editions whose text is only
Session leader: Patrick Cuba
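The contrast drawn in this session between in-line XML tags and stand-off annotation can be made concrete with a short Python sketch. The dictionaries below are only loosely modeled on Open Annotation: the key names, the urn:example source identifier, the sample sentence, and the helper functions are simplified assumptions, not the OAC serialization. Each annotation points at a character range in the text instead of wrapping it in tags, so two ranges can overlap without nesting, which well-formed XML elements cannot do.

```python
TEXT = "The king stood and the court fell silent."

def make_annotation(source, start, end, body):
    """Build a stand-off annotation targeting text[start:end] of `source`."""
    return {
        "type": "Annotation",
        "body": body,
        "target": {
            "source": source,
            "selector": {"type": "TextPositionSelector",
                         "start": start, "end": end},
        },
    }

def selected_text(annotation, text):
    """Resolve an annotation's selector against the text it targets."""
    selector = annotation["target"]["selector"]
    return text[selector["start"]:selector["end"]]

def overlaps(a, b):
    """True if the two annotations select intersecting character ranges."""
    sa, sb = a["target"]["selector"], b["target"]["selector"]
    return sa["start"] < sb["end"] and sb["start"] < sa["end"]

# Hypothetical source identifier and annotation bodies.
clause = make_annotation("urn:example:text-1", 0, 14, "first clause")
phrase = make_annotation("urn:example:text-1", 9, 28, "phrase of interest")

print(selected_text(clause, TEXT))  # The king stood
print(selected_text(phrase, TEXT))  # stood and the court
print(overlaps(clause, phrase))     # True
```

Because the annotations live apart from the text, they can be stored, queried, or passed between applications independently. The cost is the one noted above: they are easier for machines to navigate than for people to read.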
10. QGIS
Introduced QGIS and the history of the project
Discussed types of GIS possible with the software
Demonstrated how to search for data and add simple data to a QGIS project
Outlined various ways QGIS is similar to and different from ArcGIS
Session leader: Aaron Addison
11. DIGITAL PEDAGOGY
Even in instructional settings where teaching DH is not the primary goal, DH or simply
technology-assisted projects (as basic as creating sites, blogging, tweeting) can
encourage students to interact, take ownership of content, teach peers, & learn important
lessons about source documentation & context
Ongoing projects in particular are great for incorporating new/young/uneducated
students, giving them built-in peer teaching, engagement, bigger sense of purpose, &
responsibility to "real" audience outside classroom (examples from participants:
http://widewideworlddigitaledition.siue.edu/
http://talus.artsci.wustl.edu/spenserArchivePrototype/)
Combining content/theory & making/DH in one course is challenging: many
approaches, incl. one hands-on session & one lecture each week, an additional lab
option, periodic technical bootcamps throughout semester, or a DH-customized lab track
of a larger survey course – none of them perfect, all requiring institutional support!
DH playing field is absolutely not level: digital divide an issue in different institutional
contexts, and not all languages can claim the level of successful digitization that English
literature enjoys – so how can those of us who teach and/or study foreign languages
expand the definition of DH to include basic digitization & translation projects that will be
useful to them? Should we recenter DH to address socioeconomic & linguistic
difference, especially if these are topics we encounter regularly in our classrooms?
(possible example of richly multilingual project:
http://library.princeton.edu/projects/bluemountain/)
Session leader: Wendy Love Anderson
12. INTEGRATING NEW TECHNOLOGIES INTO FIRST
GENERATION DIGITIZATION PROJECTS
Problem of intellectual stewardship: who is custodian of an archive?
Should you share files, cede ownership?
How do you ensure usability in the future? Front-end vs. back-end?
Uniformity of standards: metadata should talk across platforms, archives.
"We all want our stuff to work with other people's stuff to have better
scholarship;
is the underlying issue that we should be agitating to change the rules?"
Session leader: Malgorzata Rymsza-Pawlowska
13. SPATIAL HUMANITIES
The discussion revolved around ways in which digital spatial tools have enhanced, or might in the
future enhance, scholarship. The early part of the discussion focused on GIS
mapping. There was also some discussion about 3D digital environments toward
the end of the session.
Campers identified several types of research that lend themselves to electronic spatial
analysis:
Research involving data produced by crowd sourcing.
Research involving massive amounts of data.
Research about diffusion processes.
Research attempting to flesh out the physical dimensions of a place.
Research about material objects and architectural elements that can be reconstructed in
3D.
Limitations of employing spatial digital tools included:
Temporal analysis is difficult to display through maps.
Data collection and input, along with the building of 3D environments, are resource
intensive, and there is the danger that such enterprises will be monopolized by
corporate behemoths like G*****.
The discussion ended on the subject of the portability of geographical data and issues
of access.
Session leader: Andrew Hurley
14. UNSTRUCTURED DATA
• Types of NoSQL DBs and other Big Data technologies
• Application and use cases in Humanities
• Crowdsourcing data
• Word spotting
• Data mining of archives
• Need to be sure we are asking the right questions
• Importance of metadata for all processes
Session leader: Aaron Addison
15. WORDPRESS
WordPress can be used as a full content management system. It's not just a
blogging platform.
Some example WordPress sites:
http://taylorfamilyinstitute.wustl.edu
http://mallinckrodt-academy.org/
http://historyofmedicine.wustl.edu/
The Advanced Custom Fields plugin makes it easy to enter and display data for
site-specific types of content.
For developers, WordPress strikes a good balance between flexibility and ease
of use.
WordPress is very popular. As free, open source software, it has a low barrier to
entry. Its huge installed base makes it easy to find hosting, technical
support, themes, and plugins.
The easiest way to get started with WordPress is to sign up for an account at
wordpress.com.
Session leader: Brian Marston
16. TIME SERIES
The session on Databases Before Digital drew a small group for a
discussion that spent some time on questions of how to improve
methods of working with tabular textual material that OCR often doesn't
handle well, but also included shared curiosity about the history of how
people have organized data and bureaucracies. There was
some overlap with earlier discussions of 19th-century St. Louis city
directories and what might be done with them in the form of a structured
digital historical resource. The session ended early to enable
participants to attend other sessions of interest at the same time.
The session on Time Series delved into questions of modeling and
visualization, and became a fascinating speculative conversation. We
discussed how to represent spans of time and how to deal with fuzzy
and unknown data. Simile timeline tools
Session leader: Doug Knox
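One way to represent spans of time with fuzzy or unknown bounds, as discussed in the Time Series session, is to store an earliest and latest possible value for each end of the span. The Python sketch below is only an illustration under that assumption: the FuzzySpan class, its field names, and the example years are hypothetical and not taken from any tool mentioned in the session. None stands for a bound that is simply unknown.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class FuzzySpan:
    """A span whose start and end are each known only within a range of years."""
    label: str
    earliest_start: Optional[int]  # None means the bound is unknown
    latest_start: Optional[int]
    earliest_end: Optional[int]
    latest_end: Optional[int]

    def possibly_includes(self, year: int) -> bool:
        """True unless the year falls outside the widest possible span."""
        after = self.earliest_start is None or year >= self.earliest_start
        before = self.latest_end is None or year <= self.latest_end
        return after and before

    def definitely_includes(self, year: int) -> bool:
        """True only if the year lies within the narrowest certain span."""
        if self.latest_start is None or self.earliest_end is None:
            return False  # without both inner bounds nothing is certain
        return self.latest_start <= year <= self.earliest_end

# Hypothetical record: began between 1848 and 1850, still ongoing in 1860,
# end date unknown.
residence = FuzzySpan("residence", 1848, 1850, 1860, None)
print(residence.possibly_includes(1849))    # True
print(residence.definitely_includes(1849))  # False
print(residence.definitely_includes(1855))  # True
```

Keeping "possibly" and "definitely" as separate questions lets a visualization draw the certain core of a span solidly and the uncertain margins differently, rather than forcing a single start and end date.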
17. ATTRIBUTION AND COLLABORATION
Facing the challenges of attribution and credit in a digital world
Traditional publishing offers monolithic intellectual objects marked with citation conventions
Digital objects record micro-contributions and allow for chaining of annotations
precise citation and criticism becomes possible
crowd-sourced or collaborative work can be assembled by groups, rather than simply mass contributed and then
munged into cohesion by a single editing entity
if an editorial decision is discredited, it becomes easier to find dependent opinions and revise them
It introduces many scenarios we cannot resolve
How do we discriminate between users who contribute different types of work?
datasets
sparse, but critical editorial choices
advanced transcription and collation
helpful visualizations
proof-reading and corrective changes
linking, citation, and supportive annotation
How do we balance quality against quantity?
an RA may have created 95% of the annotations (editorial acts), but the PI may 'own' the critical, controversial, or significant 5%
The act of reviewing and accepting an annotation doesn't necessarily change the credit of the
contributor, but establishes some editorial hegemony
Different institutions attach very different values to work like data
collection, cataloging, transcription, collation, key-finding, inter-linking, etc.
Session leader: Patrick Cuba
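Two points from this session, counting micro-contributions and finding opinions that depend on a discredited editorial decision, can be sketched in Python. Everything here is a hypothetical illustration: the ANNOTATIONS records, their field names (creator, kind, depends_on), and the two functions are assumptions for the example, not the data model of any project named above.

```python
from collections import Counter, deque

# Hypothetical chained annotations; depends_on lists the ids each one builds on.
ANNOTATIONS = [
    {"id": "a1", "creator": "RA", "kind": "transcription", "depends_on": []},
    {"id": "a2", "creator": "PI", "kind": "editorial choice", "depends_on": ["a1"]},
    {"id": "a3", "creator": "RA", "kind": "collation", "depends_on": ["a2"]},
    {"id": "a4", "creator": "RA", "kind": "citation", "depends_on": ["a3"]},
    {"id": "a5", "creator": "RA", "kind": "proof-reading", "depends_on": ["a1"]},
]

def credit_by_creator(annotations):
    """Count annotations per creator (a measure of quantity, not significance)."""
    return Counter(a["creator"] for a in annotations)

def dependents_of(root_id, annotations):
    """Breadth-first search for every annotation built on `root_id`."""
    children = {}
    for a in annotations:
        for parent in a["depends_on"]:
            children.setdefault(parent, []).append(a["id"])
    found, seen, queue = [], {root_id}, deque([root_id])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found

print(credit_by_creator(ANNOTATIONS))    # RA: 4, PI: 1
print(dependents_of("a2", ANNOTATIONS))  # ['a3', 'a4']
```

The raw count shows the problem raised above: the RA made most of the annotations, but the single editorial choice by the PI is the one that other work depends on. A count alone cannot express that difference in weight.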