1. Introduction to Text Encoding and the Text Encoding Initiative (TEI) Richard Wisneski Head, Bibliographic/Metadata Services Kelvin Smith Library Case Western Reserve University 2009-2010
6. Quick Example
<lg>
  <head>After <del>an</del><add>the <del>unsolv’d</del></add> argument</head>
  <l><del>The</del><add><del>Coming in,</del> A group of</add> little children, and their <lb/>ways and chatter, flow in <del>upon me</del></l>
  <l>Like <add>welcome</add> rippling water o'er my <lb/>heated <add>nerves and</add> flesh.</l>
</lg>
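To see what the <add> and <del> markup makes possible, here is a minimal sketch, assuming Python's standard xml.etree.ElementTree module. It rebuilds the final reading of the first line by dropping deleted text and keeping added text. The function name final_reading is made up for illustration; it is not part of any TEI tool.

```python
import xml.etree.ElementTree as ET

# First <l> of the quick example, copied from the slide.
SAMPLE = (
    "<l><del>The</del><add><del>Coming in,</del> A group of</add> "
    "little children, and their <lb/>ways and chatter, flow in "
    "<del>upon me</del></l>"
)

def final_reading(elem):
    """Return the text of elem with <del> content dropped and <add> content kept."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag != "del":          # skip deleted material entirely
            parts.append(final_reading(child))
        parts.append(child.tail or "")  # text following a child is always kept
    return "".join(parts)

line = ET.fromstring(SAMPLE)
reading = " ".join(final_reading(line).split())  # collapse runs of whitespace
print(reading)
```

The same markup could just as easily be processed the other way round (keep <del>, drop <add>) to show the earlier draft, which is the point of encoding both.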
22. Level 1 Encoding: Characteristics
<div1> or <div>: There should be only one child of <body>: a single <div> (or <div1>).
<ab>: There should be only one child of the <div> (or <div1>): a single <ab> wrapping all OCR text. If the text is ever “upgraded” to Level 3 or higher, the <ab> element will be replaced by structural elements like <p> and <table>.
<pb>: Required in Level 1. Page images can be linked to the text by specifying a JPEG or other image file as the value of the facs= attribute. Page numbers can be supplied with the n= attribute to record the number that is on the page.
The Task Force sees the use of METS here as having a tremendous advantage. METS/TEI page-turning documentation will be included in the near future.
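The Level 1 rules above are simple enough to check by script. The following is a rough sketch, assuming Python's standard xml.etree.ElementTree module. The sample document, its image file names, and the function name level1_problems are all hypothetical, and the <teiHeader> is left out to keep the sample short.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"  # TEI P5 namespace in ElementTree notation

# Hypothetical Level 1 body: one <div>, one <ab>, a <pb/> for every page.
LEVEL1_SAMPLE = """
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div>
        <ab>
          <pb n="1" facs="00000001.jpg"/>
          uncorrected OCR text of page one
          <pb n="2" facs="00000002.jpg"/>
          uncorrected OCR text of page two
        </ab>
      </div>
    </body>
  </text>
</TEI>
"""

def level1_problems(root):
    """Return a list of departures from the Level 1 rules described above."""
    body = root.find(f"{TEI}text/{TEI}body")
    if body is None:
        return ["no <body> element found"]
    divs = list(body)
    if len(divs) != 1 or divs[0].tag not in (TEI + "div", TEI + "div1"):
        return ["<body> should have exactly one child: <div> or <div1>"]
    problems = []
    blocks = list(divs[0])
    if len(blocks) != 1 or blocks[0].tag != TEI + "ab":
        problems.append("the <div> should have exactly one child: <ab>")
    if root.find(f".//{TEI}pb") is None:
        problems.append("at least one <pb> is required")
    return problems

problems = level1_problems(ET.fromstring(LEVEL1_SAMPLE))
print(problems or "looks like Level 1")
```

This only tests the three structural rules listed on the slide; it is no substitute for validating against a schema.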
24. Level 2 Encoding: Characteristics
All elements specified in Level 1 plus the following:
<front>, <back>: Optional.
<div1> or <div>: If no type= attribute is specified, a type= value of "section" should be presumed.
<head>: Required if present.
<ab>: At least one container element is required.
<fw>: Running heads; can be automatically generated.
28. Level 3 Encoding: Characteristics
All elements specified in Levels 1 and 2 plus the following:
<front>, <back>: Required if present.
<div>: Required if present; type attribute is recommended.
<floatingText>: Recommended if present.
<p>: Required for paragraph breaks in prose.
<lg> and <l>: Required for identifying groups of lines and lines, respectively.
<list> and <item>: May be used in this level to indicate ordered and unordered list structures.
<table>, <row>, and <cell>: May be used to indicate table structures.
<figure>: Required to indicate figures other than page images.
<hi>: Required to indicate changes in typeface; rend attribute is optional.
<note>: All notes must be encoded. It is also recommended that notes that extend beyond one page be combined into one <note> element. Marginal notes without a reference should occur at the beginning of the paragraph to which they refer, with "margin" as the value of the place attribute.
31. Level 3 Encoding: Verse Example
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="VAA2383">
  <teiHeader> [stuff] </teiHeader>
  <text>
    <front>
      <titlePage>[text]</titlePage>
      <div type="dedication">[text]</div>
      <div type="contents">[text]</div>
    </front>
    <body>
      <div type="book">
        <head>[book title]</head>
        <div type="part">
          <head>[section title]</head>
          <div type="poem">
            <head>THE DAYS GONE BY.</head>
            <lg>
              <l n="1">O the days gone by! O the days gone by!</l>
              <l n="2">The apples in the orchard, and the pathway through the rye;</l>
              <l n="3">The chirrup of the robin, and the whistle of the quail</l>
              <l n="4">As he piped across the meadows sweet as any nightingale;</l>
            </lg>
            <lg>[lines of poetry]</lg>
            <lg>[lines of poetry]</lg>
          </div>
        </div>
      </div>
    </body>
  </text>
</TEI>
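Once verse is marked up with <lg> and <l>, the lines can be pulled out by number. A small sketch, assuming Python's standard xml.etree.ElementTree module and a cut-down copy of the poem above (only the first two lines, with the TEI namespace declared on the <div> so the fragment parses on its own):

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}  # prefix used in the queries below

POEM = """
<div xmlns="http://www.tei-c.org/ns/1.0" type="poem">
  <head>THE DAYS GONE BY.</head>
  <lg>
    <l n="1">O the days gone by! O the days gone by!</l>
    <l n="2">The apples in the orchard, and the pathway through the rye;</l>
  </lg>
</div>
"""

poem = ET.fromstring(POEM)
title = poem.find("tei:head", NS).text
# Each entry pairs the optional n= value with the text of the line.
lines = [(l.get("n"), l.text) for l in poem.findall("tei:lg/tei:l", NS)]

print(title)
for number, text in lines:
    print(number, text)
```

Because n= is optional, l.get("n") returns None for any line that has no number.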
33. Level 4 Encoding: Characteristics
All elements specified in Levels 1, 2 and 3 plus the following, among others (see the TEI BPG Guidelines):
<titlePage> and child elements: Required if present.
<group>: Required to encode a collection of independent texts that are regarded as a single group for processing or other purposes.
<emph>, <foreign>, <gloss>, <term>, or <title>: Recommended to identify typographically distinct text.
<epigraph>, <quote>, <said>, <mentioned>, or <soCalled>: Recommended to represent speech, thought, quotation, etc.
<sic>, <corr>, or <choice>: Recommended to encode errors or typos.
<add>, <del>, <gap>, and <unclear>: Recommended to encode material that is omitted, added, marked for deletion, or is illegible, invisible, or inaudible.
<opener>, <dateline>, <salute>, <closer>, <signed>, <postscript>: Required to indicate specific parts of letters.
<sp>, <speaker>, and <stage>: Required to encode different dramatic structures.
<sp> and <speaker>: Required to encode oral history interviews.
50. Session 2: Text Encoding and the Text Encoding Initiative (TEI) Richard Wisneski Head, Bibliographic/Metadata Services Kelvin Smith Library Case Western Reserve University 2009-2010
Show the search features of each. For Whitman, click “manuscripts” (under “poetry manuscripts”).
The TEI Consortium was founded in 2000. Members pay an annual fee, which pays for editorial work, outreach, and workshops. KSL-CWRU is a member.
Text encoding was born out of New Criticism, but is more structuralist in nature.
Regarding the 1st point, think of text encoding as akin to an edition of a text.
Regarding the 2nd point, there is no one right answer, but there do exist wrong answers.
Regarding the 3rd point, it is expected that individual projects will remove elements, constrain attribute values, add new elements, and even import schemas from other namespaces.
Regarding the 1st point: text encoding uses XML because it is non-proprietary, requires no specialized software or hardware, and is meant to be long-lasting.
2nd point: have an agreed-upon metadata and markup language that will work across collections and projects.
3rd point: these texts are not static, but rather meant to be built upon by a community of scholars.
TEI grew out of a need, recognized in 1987, to create international standards for textual markup. Members pay an annual fee, which pays for editorial work, outreach, and workshops. KSL-CWRU is a member.
TEI is intended to serve an international community:
Broad range of methods and approaches.
Participation from member institutions around the world.
Support for multilingual versions of the TEI Guidelines: Chinese, French, German, Japanese, Spanish, others in the future.
Code specifications include: every element has a start and an end tag; no elements overlap; the document has a single root element (e.g. book; see upcoming slide).
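These rules are what an XML parser means by “well-formed,” so any parser can test them. A minimal sketch, assuming Python's standard xml.etree.ElementTree module; the sample strings and the function name is_well_formed are invented for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False if it raises a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

examples = {
    "properly nested": "<book><title>Leaves of Grass</title></book>",
    "missing end tag": "<book><title>Leaves of Grass</book>",
    "overlapping elements": "<book><l><hi>text</l></hi></book>",
    "two root elements": "<book/><book/>",
    "case mismatch": "<book><Title>text</title></book>",
}

results = {label: is_well_formed(text) for label, text in examples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'NOT well-formed'}")
```

Only the first sample passes. Well-formedness says nothing about whether the elements are legal TEI; that is what validation against a schema is for.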
NOTES: Element names ARE case-sensitive. Elements are also known as “tags.” Attributes are to elements as adjectives are to nouns. Elements have an open and a close tag, except for empty elements, such as <pb />. Elements must be properly nested.
We’ll use the Roma tool for this later on
Not too important to understand all of this. GO TO PRACTICE
Began in 1994. Major shift occurred in 2002 with P4 encoding.
LEVEL 1: Texts at Level 1 can be created and encoded by fully automated means, using uncorrected OCR of page images ("dirty OCR"), exporting from existing electronic text files, or actually not including any text at all. Level 1 texts are not intended to be adequate for textual analysis; they are more likely to be suited to the goals of a preservation unit or mass digitization initiative.
LEVEL 2: Level 2 encoding requires some human intervention to identify each textual division and heading. Level 2 texts do not require any specialist knowledge or manual intervention below the section level.
LEVELS 1 AND 2: neither is meant to have the text stand apart from the page images.
LEVEL 3: first attempt to have the text stand alone from the page images.
<ab> = anonymous block
<ab> = anonymous block; <fw> = forme work
<front>[titlepage information, table of contents, prefaces, etc.][optional]</front>
<ab> = anonymous block, NOT <p> tags. No <p> tags.
The facs= attribute is used without a METS record; the xml:id attribute is used WITH a METS document.
<front>[titlepage information, table of contents, prefaces, etc.][optional]</front> <ab> = anonymous block, NOT <p> tags No <p> tags Not a good idea to use full file paths for facs= attribute
This is the level KSL is using
N.B. You can also use numbered divs. The maximum is 7. The example to the left is invalid; the <div1> and <div2> tags are there just to show that the option exists
The n= attribute for <l> is optional.
This is the level KSL is using
Click the link to see the full example HAND OUT “SOME COMMON P5 TAGS”
Ask: what do you think would need to be encoded here?
Ask: what do you think would need to be encoded here?
<front>[titlepage information, table of contents, prefaces, etc.][optional]</front>
<ab> = anonymous block, NOT <p> tags; <fw> = forme work. No <p> tags.
Not good practice to use file paths for the facs= attribute.
<pb> comes after the <div>. <fw> removed. xml:id is used with a METS document; facs= is used without a METS document.
<hi rend="italics">: the rend attribute is optional.
<biblStruct> can be in the TEI header or in a separate TEI file, referenced in this TEI document (it makes more sense to do the latter). Take note of <q> (it can be missed in this example). GO TO PRACTICE
In the local context, a TEI Header gives metadata about the TEI document, its source, and its provenance. The TEI Header may be used for metadata exchange, to automatically create indexes (author lists, title lists) for a collection of TEI documents, and to aid in browsing heterogeneous TEI documents. TEI Headers may also be used as a basis for other metadata records (such as MARC or Dublin Core), though generation of other formats may require human intervention because they often are more granular, or have different granularity, than TEI Headers.
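As a rough illustration of building a title/author index from headers, here is a sketch assuming Python's standard xml.etree.ElementTree module. The header content is placeholder data, the function name index_entry is hypothetical, and mapping <title> and <author> to Dublin Core-style "title" and "creator" keys is a simplifying assumption, not a full crosswalk.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Placeholder header with the three required parts of <fileDesc>.
HEADER = """
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt>
      <title>Sample Title</title>
      <author>Author, Sample</author>
    </titleStmt>
    <publicationStmt><p>Placeholder publication statement</p></publicationStmt>
    <sourceDesc><p>Placeholder source description</p></sourceDesc>
  </fileDesc>
</teiHeader>
"""

def index_entry(header):
    """Pull the title and author out of a <teiHeader> for a simple index."""
    stmt = header.find("tei:fileDesc/tei:titleStmt", NS)
    return {
        "title": stmt.findtext("tei:title", default="", namespaces=NS),
        "creator": stmt.findtext("tei:author", default="", namespaces=NS),
    }

entry = index_entry(ET.fromstring(HEADER))
print(entry)
```

Running this over every file in a collection would give the raw material for author and title lists. Real headers vary, so expect to clean up the results by hand.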
Distribute spreadsheet
Show how I got to the MARC display. Be aware that other components may have to go into the header, depending on your project (e.g. working with verse). Also requires appropriate schema elements and attributes. GO TO PRACTICE TO CREATE A TEI HEADER