UVA MDST 3703 Marking-Up a Text 2012-09-13
1. Studio 3: Marking Up a Text
Prof. Alvarado
MDST 3703/7703
13 September 2012
2. Business
• Can everyone access their Home
Directory from their desktop?
– i.e. not just from the web interface
• The required McCarty reading was
“‘Knowing true things by what their
mockeries be’: Modelling in the
Humanities”
– My reference to “thick” vs. “deep” in lecture
was from the other reading
– Sorry for the confusion!
3. Review
• URLs to your Home Directory pages
look like this:
– http://people.virginia.edu/~NETID/RESOURCE
• Where …
NETID = Your UVA Net ID (e.g. rca2t) and
RESOURCE = The path and filename of the
file
• E.g. index.html OR mydirectory/file.html
• Identical to what is under public_html
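The URL pattern above can be sketched in a few lines of Python. This is only an illustration: the function name `home_page_url` and the default of `index.html` are assumptions, and only the `http://people.virginia.edu/~NETID/RESOURCE` pattern comes from the slide.

```python
from urllib.parse import quote

BASE = "http://people.virginia.edu"


def home_page_url(netid: str, resource: str = "index.html") -> str:
    """Build the public URL for a file stored under public_html.

    `resource` is the path relative to public_html, e.g.
    "index.html" or "mydirectory/file.html".
    """
    # quote() escapes spaces and similar characters but keeps "/" as is,
    # so the directory structure under public_html is preserved.
    return f"{BASE}/~{netid}/{quote(resource)}"


print(home_page_url("rca2t"))
print(home_page_url("rca2t", "mydirectory/file.html"))
```

The path after `~NETID` mirrors the path under `public_html`, so a file saved as `public_html/mydirectory/file.html` is served at `.../~NETID/mydirectory/file.html`.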
5. Review
• Documents, Texts, and Levels
• Data models: Networks, Trees, Tables
• Latent Hypertext and Intertext
6. Documents, Texts, and Levels
• Documents and texts are different
– Related as medium is to message
– The message (text) is independent but must
always exist as part of a medium
• Documents are things like
books, memos, etc.
– They have a material form and a basic content
model
• Texts are more complicated
– They are linguistic and therefore related
to grammar, poetics, and meaning
8. The Theory of Levels
We can think of the various aspects of
documents and text as forming levels
DOCUMENT
– Layout and Style: Physical interface
– Structure: Physical and logical structure
– Content: Text, image, etc.
TEXT
– Syntagm: Strings of characters and patterns
– Structure: Grammar, pragmatics, etc.
– Meaning: Intertextual connections in the mind
10. Three Model Types
• We have seen three major models so far
– Networks (hypertext), Trees (OHCO), and Tables
• Networks
– Non-linear relations across lexia
– HTML
• Trees
– Linear and nested relations within lexia
– TEI
• Tables
– Elements of lexia extracted and classified
– Relational databases, spreadsheets
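As a rough sketch, the three model types can be mirrored by three ordinary Python data structures. All lexia names and values below are made-up placeholders, and the helper `leaves` is hypothetical; the point is only the shape of each model.

```python
# Illustrative data only: the lexia names and contents are placeholders.

# Network: non-linear links across lexia (as with HTML hyperlinks).
network = {
    "lexia_a": ["lexia_b", "lexia_c"],
    "lexia_b": ["lexia_a"],
    "lexia_c": [],
}

# Tree: linear, nested structure within one lexia (as with TEI / OHCO).
# Each node is (label, children); a leaf is a plain string.
tree = ("chapter", [
    ("paragraph", ["sentence 1", "sentence 2"]),
    ("paragraph", ["sentence 3"]),
])

# Table: elements extracted from lexia and classified into columns.
table = [
    {"lexia": "lexia_a", "type": "person", "value": "name 1"},
    {"lexia": "lexia_a", "type": "place", "value": "place 1"},
    {"lexia": "lexia_b", "type": "person", "value": "name 2"},
]


def leaves(node):
    """Walk the tree in document order and return its leaf strings."""
    if isinstance(node, str):
        return [node]
    _label, children = node
    out = []
    for child in children:
        out.extend(leaves(child))
    return out


# Select every row classified as a person, regardless of lexia.
people = [row["value"] for row in table if row["type"] == "person"]

print(network["lexia_a"])  # lexia linked from lexia_a
print(leaves(tree))        # text in reading order
print(people)
```

The network answers "what does this lexia link to?", the tree preserves reading order and nesting, and the table supports queries by category.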
12. Intertextuality is latent hypertext
Goal of markup is to “surface”
latent hypertext and make it
available for analysis and
interpretation
13. Today’s Exercise
• We will mark up part of a primary
source
– The beginning of an edition of Jane Austen’s
Persuasion
• We will develop a TEI-like content
model and use HTML to do the markup
• Then we will mark up the text for
intertextual content
• Procedures posted on blog
14. New Concepts
• We will use POSH
– “Plain Old Semantic HTML”
• Use CLASS and ID attributes in your
elements
– <p class="extract"> … </p>
• Use SPAN and DIV elements to handle
cases where HTML does not provide
an explicit element
– <div class="page"> … </div>
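To show why CLASS and ID attributes matter, here is a minimal sketch that parses a small POSH fragment with Python's standard `html.parser` and lists every element that carries a class. The fragment, its text, the `id` value, and the `person` class are hypothetical placeholders; only `extract` and `page` appear on the slide.

```python
from html.parser import HTMLParser

# Hypothetical POSH fragment. The text, the id "p1", and the class
# "person" are placeholders, not taken from the course materials.
SAMPLE = """
<div class="page" id="p1">
  <p class="extract">Quoted passage goes here.</p>
  <p>Ordinary paragraph mentioning <span class="person">a character</span>.</p>
</div>
"""


class ClassCollector(HTMLParser):
    """Collect (tag, class, id) for every element that has a class."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        attr_map = dict(attrs)
        if "class" in attr_map:
            self.found.append((tag, attr_map["class"], attr_map.get("id")))


collector = ClassCollector()
collector.feed(SAMPLE)
for item in collector.found:
    print(item)
```

Because the semantic labels live in the markup itself, a short script like this can pull out pages, extracts, or marked-up names without any knowledge of how the page is styled. Note that attribute values use straight quotes; curly quotes pasted from a word processor would become part of the value.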