A version of my presentation concerning textual criticism and computational editing methods, prepared for an MA seminar on historical research methods.
Wednesday, March 18, 2009
12. • Modern historical approaches are a recent
thing
• Not unheard of before, but not standard
• This profoundly affects the way in which
histories have reached us
16. 600 years later
• a bunch of error-ridden copies
• or only a few error-ridden copies
• or only a name check in a book about
something else
18. Apparatus example
St. Stephenʼs Church in Nijmegen
Nobilis itaque comes Otto imperio et dominio Novimagensi sibi, ut praefer-
tur, impignoratis et commissis proinde praeesse cupiens, anno liiii superius 1254
descripto, mense Iunio, una cum iudice, scabinis ceterisque civibus civitatis
Novimagensis, pro ipsius et inhabitantium in ea necessitate, commodo et utili-
5 tate, ut ecclesia eius parochialis extra civitatem sita destrueretur et infra muros
transferretur ac de novo construeretur, a reverendo patre domino Conrado de
Hofsteden, archiepiscopo Coloniensi, licentiam, et a venerabilibus dominis de-
cano et capitulo sanctorum Apostolorum Coloniensi, ipsius ecclesiae ab antiquo
veris et pacificis patronis, consensum, citra tamen praeiudicium, damnum aut
10 gravamen iurium et bonorum eorundem, impetravit.
Et exinde liberum locum eiusdem civitatis qui dicitur Hundisbrug, de prae-
libati Wilhelmi Romanorum regis, ipsius fundi domini, consensu, ad aedifican-
dum et consecrandum ecclesiam et coemeterium, eisdem decano et capitulo de
expresso eiusdem civitatis assensu libera contradiderunt voluntate, obligantes
15 se ipsi comes et civitas dictis decano et capitulo, quod in recompensationem
illius areae infra castrum et portam, quae fuit dos ecclesiae, in qua plebanus
habitare solebat—quae tunc per novum fossatum civitatis est destructa—aliam
aream competentem et ecclesiae novae, ut praefertur, aedificandae satis conti-
guam, ipsi plebano darent et assignarent. Et desuper apud dictam ecclesiam
20 sanctorum Apostolorum est littera sigillis ipsorum Ottonis comitis et civitatis
Novimagensis sigillata.
3 p. 227 R 4 p. 97 N 6 p. 129 D 12 f. 72v M 13 p. 228 R 20 p. 130 D
2 proinde ] primum D 5 ecclesia eius ] ecclesia D: eius eius H extra civitatem om. H
infra ] intra D 6 transferretur ] transferreretur NH 7 Hofsteden ] Hostede D: Hosteden
H Coloniensi ] Colononiensi H dominis ] viris H 8 Coloniensi ] Coloniae H 10 iurium ]
virium D 11 liberum ] librum H qui ] quae D Hundisbrug ] Hundisburch D: Hunsdisbrug
R 12 regis ] imperatoris D 13 et consecrandum om. H eisdem ] eiusdem D 15 comes ]
comites D dictis om. H 17 tunc ] nunc H 18 ut. . . aedificandae om. H 18–19 contiguam ]
contiguum M 19 apud om. H 20 est ] et H littera ] litteram H 21 Novimagensis ]
Novimagii D sigillata ] sigillis communita H
24. Who needs this and
why?
• Historians look for one thing
• Linguists look for other things
• Others will be interested too
36. Surviving manuscripts
• Oldest full manuscript is Venice 887
• Next oldest is Vienna 574
• 24 of 42 (4 of 6 fragments) copied before
1700
37. Extant manuscripts of the Chronicle
[Bar chart: number of manuscripts and fragments by century copied (pre-16th, 16th, 17th, 18th, 19th); vertical axis from 0 to 28]
38. Two manuscript groups
Group 1: like Venice 887
• Text generally complete (to 1162)
• Transmitted with the Life of St. Nerses (Mesrop the Priest)
Group 2: like Vienna 574
• Text truncated near the year 1096/7
• Transmitted with a specific long sequence of texts
43. Matenadaran 1896
• Copied in 1689
• Uniquely preserves two passages of text
• These lacunae known to other copyists
• But lots of manuscripts are older. Hm.
53. Transcription
• The most time-consuming part
• Ideal solution would be optical character
recognition (OCR)
• No OCR for manuscripts, yet
54. Start with a manuscript
55. Into plain text
դ. Թուխտ սիրոյ և միաբանութեան, շարագրեցեալ կղէմէս աստուածաբան վարդապետէ։
Առաջաբանութիւն։
Նա զի արդ գրիչս իմ անյարմարս կարօղ լինիցի երբէկ պատմագրիլ, ըստ պատշաճի
զմեծամեծ յիշելիսն։ Որք ի վաղ ժամանակի անտի պատահեցան յեկեղեցին հայոց. և զի
արդ մեք անհընագէտս յանձնառնցուք ճառել զանցեալ ծածկագոյն խորհուրդս այլասեռ
ազգի, մինչև ցայժմ ո՜չ կարաց Ֆրանկ պատմիչ ոք զբուռն հարել ի սոյնպիսի օտար
պատմագրութիւնս։ Բայց սակայն յուսացեալ ի յօգնութիւն սրբոյ աստուածածնին
յօժարապէտս ախորժեսցուք համարձակիլ և ի յայս անհոռն ծովս մտանել...
56. XML solution: TEI
<!-- ... -->
<div n="4">
<head><hi rend="red">Թուխտ սիրոյ և միաբանութե<ex>ան</ex>, շարագրեց<lb/>
եալ կղէմէս ա<ex>ստուա</ex>ծաբան վարդապետէ։<lb/>
Առաջաբանութիւն։ </hi>
</head>
<p><hi rend="ornament">Ն</hi><hi rend="red">ա զի արդ գրիչս իմ անյարմարս</hi> <lb/>
կարօղ լինիցի երբէկ պատմագրիլ, <expan>ըստ</expan> <lb/>
պատշաճի զմեծամեծ յիշելիսն։ Որք <lb/>
ի վաղ ժամանակի անտի պատահեցան <lb/>
յեկեղեցին հայոց. և զի արդ մեք անհընագէտս <lb/>
յանձնառնցուք ճառել զանցեալ ծածկագոյն <lb/>
խորհուրդս այլասեռ ազգի, մինչև ցայժմ ո՜չ <lb/>
կարաց Ֆրանկ պատմիչ ոք զբուռն հարել <lb/>
ի սոյնպիսի օտար պատմագրութի<ex>ւն</ex>ս։ Բայց սա<lb/>
կայն յուսացեալ ի յօգնութի<ex>ւն</ex> սրբոյ ա<ex>ստուա</ex>ծածնին <lb/>
յօժարապէտս ախորժեսցուք համարձակիլ և ի <lb/>
յայս անհոռն ծովս մտանել։ ... <lb/>
</p>
</div>
<!-- ... -->
61. Perl to the rescue #1
• XML is a terrible thing to edit
• I want a transcription markup that I can
convert to TEI XML later
• Not a solution you’ll like, but I’ll show it to
you anyway
64. TEI markup
[172]
զօ՛րացն և զօրավարացն և ազգն հոռոմոց իւրոց քաջութեան զան
դարձ փաղչելն արարին պարծանք նմանեացն վատ՛ հովուաց,
ո՛ր յորժամ զգայլն տեսանէ փաղչի, սակայն հոռոմք յան
ջանս ջանացին, ո՛ր լուր զպարիսպ ամրութեան տանս հայոց
քակեա՛լ կործանեցին, և զպարսիկք ի վերայ արձակեցին սրով, և
զամենայն յաղթու՛թիւնն իւրոց համարեցան, և ինքեանք անպատկառելի
երեսօք, կուրտ՛ զօրավարք, և ներքինի զօրօք զհայ՛ք պահել
ջանա+յ+ին, մինչև պարսիկք յան±-(blot)տ-+տ+±էր տեսին զ^ամենայն^ արևելք.
և յայնժամ մեծաւ՛ զօրութեամբ զօրացնն այ՛լազգիքն, որ ի
մէկ տարո՛յ հասան մինչև ի դու՛ռն կոստանդնուպօլիս, և
առին զամենայն աշխարհն ±-հայոց-+(overwrite)հոռոմոց+±, զքաղաքս
ծովեզերաց և զկղզիս նոցա,
և արա՛րին զազգն յունաց որպէս զբա՛նդարգեալս ի ներս
ի կոստանդնուպօլիս. և յորժամ առա՛ւ հայք ի յունաց, ար
գելաւ՛ ամենայն չարութիւնն հոռոմո՛ց, յազգէն հայոց, և զկնի այսօր
իկ հնարեցան այ՛լ կերպիւ պատերազմ յարուցանեալ ^ընդ^ ազգն
հա՛յոց, նստան ի քննութիւն հաւ՛ատոյ, և այսու ատեա՛լ
անարգեցին զհանդէս պատերազմի և զօրմարտի, և զկռիւս և
զաղմու՛կս յեկեղեցի աստուծոյ կարգեալ հաստատեցին. ի պարսից
պատերազմէն յօժարութեամբ փախչին, և զամենայն ճշմարիտ
հաւատացեալքս
քրիստոսի ի հաւատոյն ջանան խափանել և խաղխտել, վասն զի յորժամ
այր քաջ զօրաւ՛որ գտանէին, զաչսն խաւարեցուցանէին, և կամ
ի ծով ձգեա՛լ խեղդամահ սատակէին. և այ՛ն էր փո՛յթ յօժարութեան
65. TEI markup
հ±-այ-+(overwrite)ոռոմ+ոց,
<seg type="word">
<subst>հ<del>այ</del>
<add place="overwrite">
<ex resp="#tla">ո</ex>ռ
<ex resp="#tla">ո</ex>մ</add>
</subst>ոց,</seg>
66. TEI markup
հոռոմոց իւրոց քաջութեան
<seg type="word">հ
<ex resp="#tla">ո</ex>ռ
<ex resp="#tla">ո</ex>մոց</seg>
<seg type="word">իւր
<ex resp="#tla">ո</ex>ց</seg>
<seg type="word">ք
<ex resp="#tla">ա</ex>ջ
<ex resp="#tla">ո</ex>ւ
<ex resp="#tla">թ</ex>ե
<ex resp="#tla">ան</ex></seg>
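The step from the transcription shorthand to TEI can be sketched in a few lines. This is not the Perl script from the talk: it is a minimal Python illustration, and the reading of the shorthand (`±-DELETED-+(PLACE)ADDED+±` for a scribal substitution) is an assumption inferred from the slides. The function names are hypothetical, and the sketch handles substitutions only, not abbreviations or plain additions. It also leaves the unaffected letters of the word outside the `subst` element.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumed shape of the shorthand: ±-DELETED-+(PLACE)ADDED+± (closing ± optional here).
SUBSTITUTION = re.compile(
    r"±-(?P<deleted>[^-]*)-\+(?:\((?P<place>[^)]*)\))?(?P<added>[^+]*)\+±?"
)


def substitution_to_tei(match):
    """Render one matched substitution as <subst><del/><add/></subst>."""
    place = match.group("place")
    place_attr = f" place={quoteattr(place)}" if place else ""
    return (
        f"<subst><del>{escape(match.group('deleted'))}</del>"
        f"<add{place_attr}>{escape(match.group('added'))}</add></subst>"
    )


def word_to_tei(word):
    """Wrap one transcribed word in <seg type="word">, expanding substitutions."""
    pieces, position = [], 0
    for match in SUBSTITUTION.finditer(word):
        # Text before the substitution is copied through, XML-escaped.
        pieces.append(escape(word[position:match.start()]))
        pieces.append(substitution_to_tei(match))
        position = match.end()
    pieces.append(escape(word[position:]))
    return '<seg type="word">' + "".join(pieces) + "</seg>"


print(word_to_tei("±-հայոց-+(overwrite)հոռոմոց+±,"))
```

The point of a shorthand like this is that the transcriber types a handful of sigla and a script produces the verbose XML afterwards.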
68. Collation
"The collation of manuscripts requires the infuriating accuracy of a pedant and the obsessive stamina of an idiot. It is therefore an ideal task for a computer."
—Peter Robinson, “Collation and Textual Criticism”, LLC vol. 4 no. 2, 1989
72. Collation
• need to align words with each other
• ...across many manuscripts
• ...even when the words aren’t exactly the
same
(e.g. “յաշխարհին” vs. “աշխարհն”)
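To make the alignment problem concrete, here is a minimal Python sketch for two witnesses. The talk does not say which algorithm the collation tool uses; this is an ordinary Needleman-Wunsch global alignment with a fuzzy match score from the standard library's `difflib`. The threshold and gap penalty are illustrative guesses, and aligning many manuscripts at once takes more than this pairwise step.

```python
from difflib import SequenceMatcher

GAP = None  # marks "this witness has no word at this position"


def similarity(a, b):
    """Similarity of two words between 0.0 and 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def match_score(a, b, threshold):
    """Reward similar words, penalise pairing words that are too different."""
    score = similarity(a, b)
    return score if score >= threshold else -1.0


def align(left, right, threshold=0.6, gap_penalty=-0.5):
    """Globally align two word lists; returns a list of (left, right) pairs."""
    n, m = len(left), len(right)
    # score[i][j] is the best score for aligning left[:i] with right[:j].
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = score[i - 1][0] + gap_penalty
    for j in range(1, m + 1):
        score[0][j] = score[0][j - 1] + gap_penalty
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + match_score(left[i - 1], right[j - 1], threshold),
                score[i - 1][j] + gap_penalty,
                score[i][j - 1] + gap_penalty,
            )
    # Walk back from the bottom-right corner to recover the alignment.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == (
            score[i - 1][j - 1] + match_score(left[i - 1], right[j - 1], threshold)
        ):
            pairs.append((left[i - 1], right[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or score[i][j] == score[i - 1][j] + gap_penalty):
            pairs.append((left[i - 1], GAP))
            i -= 1
        else:
            pairs.append((GAP, right[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


print(align(["in", "the", "world"], ["the", "worlde"]))
```

Because the match score is fuzzy rather than exact, a pair such as “յաշխարհին” and “աշխարհն” still lands in the same row.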
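73. [Alignment table: one row per word, one column per manuscript. "om." marks a word the manuscript lacks; the gaps are placed in columns 2 and 5, following the coded matrix on slide 88.]
յայսմ | այս | յայսմ | այս | այս
ամենայն | ամենայն | ամի | ամենայն | ամենայն
եղելոցն, | եղելոց | եղելոց | եղելոց | եղելոցս
նստուցանեն | նստուցանեն | նստուցանեն | նստուցանեն | նստուցանեն
զաթոռ | զաթոռ | յաթոռ | զաթոռ | զաթոռ
հայրապետութեան | հայրապետութեան | հայրապետութեան | հայրապետութեանն | հայրապետութեան
ի | om. | ի | ի | om.
թաւբլուր | om. | թաւաբլուրն։ | թաւբլուր | om.
եւ | om. | եւ | եւ | om.
կացեալ | om. | կացեալ | կացեալ | om.
անդ | om. | անդ | անդ | om.
զամս | om. | զամս | զամս | om.
գ | om. | գ, | գ | om.
եւ | om. | եւ | եւ | om.
ընդ | om. | ընդ | ընդ | om.
ամենայն | om. | ամենայն | ամենայն | om.
զ | om. | զ | վեց | om.
ամ | om. | ամ, | ամ | om.
կալեալ | om. | կալեալ | կալեալ | om.
զաթոռ | om. | զաթոռ | զաթոռ | om.
հայրապետութեանն | om. | հայրապետութեան | հայրապետութեանն | om.
տէր | տէր | տէր | տէր | զտէր
խաչիկ։ | խաչիկ։ | խաչիկն։ | խաչիկ։ | խաչիկ։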
76. Our text apparatus
այս ամենայն եղելոցն նստուցանեն զաթոռ
1
ի թաւբլուր,
1 այս] յայսմ AC 1 ամենայն] ամի C 1 եղելոցն] եղելոց BDE եղելոցս C
1 զաթոռ] յաթոռ C 2 ի թաւբլուր] om. BE
...
79. New text apparatus
յայսմ ամի եղելոցն նստուցանեն զաթոռ
1
ի թաւբլուր,
1 յայսմ] այս BDE 1 ամի] ամենայն ABDE 1 եղելոցն] եղելոց BDE եղելոցս C
1 զաթոռ] յաթոռ C 2 ի թաւբլուր] om. BE
...
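The two apparatus slides differ only in which readings were chosen for the base text, which is why the apparatus can be generated rather than typed. A minimal Python sketch follows, assuming the collation is available as rows of aligned readings (`None` for an omission) and that the witnesses carry the sigla A to E. The function name and data layout are hypothetical, and line numbering is left out.

```python
from collections import defaultdict


def apparatus_entries(rows, sigla, base):
    """Build 'lemma] variant SIGLA' entries for every row with rejected readings.

    rows:  aligned readings, one list per word position, in sigla order
    sigla: one letter per witness, e.g. "ABCDE"
    base:  the reading chosen for the edited text at each position
    """
    entries = []
    for readings, lemma in zip(rows, base):
        variants = defaultdict(list)  # rejected reading -> witnesses that have it
        for siglum, reading in zip(sigla, readings):
            if reading != lemma:
                variants["om." if reading is None else reading].append(siglum)
        if variants:
            parts = [f"{reading} {''.join(wits)}" for reading, wits in variants.items()]
            entries.append(f"{lemma}] " + " ".join(parts))
    return entries


rows = [
    ["յայսմ", "այս", "յայսմ", "այս", "այս"],
    ["ամենայն", "ամենայն", "ամի", "ամենայն", "ամենայն"],
]
# Same collation, two different editorial choices for the base text.
print(apparatus_entries(rows, "ABCDE", ["այս", "ամենայն"]))
print(apparatus_entries(rows, "ABCDE", ["յայսմ", "ամի"]))
```

Changing the editor's choice only changes the `base` argument; the collation itself is untouched.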
86. Stemma construction
• Better stemma through analysis of collation results
• Borrows statistical models from evolutionary biology
• “Maximum parsimony” based upon DNA of specimens
• Manuscripts are specimens
• Biologists have DNA sequences; we have words.
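To show what parsimony means in practice, the sketch below scores one candidate tree with Fitch's algorithm, the textbook way of counting the fewest changes a tree requires. It is a Python illustration under stated assumptions: the manuscript names `ms1` to `ms5` and both trees are made up, and the character strings are the first six rows of the coded table read column by column. A real analysis searches over many trees for the lowest score, and a stemma may also place surviving manuscripts at internal nodes, which this does not model.

```python
def fitch_score(tree, characters):
    """Minimum number of reading changes needed to explain `characters` on `tree`.

    tree:       nested 2-tuples whose leaves are manuscript names
    characters: manuscript name -> string with one coded reading per position
    """
    n_sites = len(next(iter(characters.values())))
    changes = 0

    def possible_states(node, site):
        nonlocal changes
        if isinstance(node, str):
            # A leaf: the reading actually found in that manuscript.
            return {characters[node][site]}
        left, right = node
        a = possible_states(left, site)
        b = possible_states(right, site)
        shared = a & b
        if shared:
            return shared
        # No reading in common: at least one change happened below this node.
        changes += 1
        return a | b

    for site in range(n_sites):
        possible_states(tree, site)
    return changes


characters = {
    "ms1": "AAAAAA",
    "ms2": "BABAAA",
    "ms3": "ABBABA",
    "ms4": "BABAAB",
    "ms5": "BACAAA",
}
tree_one = ((("ms2", "ms4"), "ms5"), ("ms1", "ms3"))
tree_two = ((("ms1", "ms2"), "ms3"), ("ms4", "ms5"))
print(fitch_score(tree_one, characters), fitch_score(tree_two, characters))
```

The tree with the lower score needs fewer copying changes to explain the readings, so parsimony prefers it.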
88. A B A B B
A A B A A
A B B B C
A A A A A
A A B A A
A A A B A
A O A A O
A O B A O
A O A A O
A O A A O
A O A A O
A O A A O
A O A A O
A O A A O
A O A A O
A O A A O
A O A B O
A O A A O
A O A A O
A O A A O
A O B A O
A A A A B
A A B A A
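The matrix above can be derived mechanically from the aligned words. A minimal Python sketch, assuming each row of the collation is a list of readings with `None` for an omission, and that trailing punctuation should not count as a difference (an assumption that matches the coded rows shown). The function name is hypothetical.

```python
from string import ascii_uppercase

# Letters for distinct readings; "O" is reserved for omissions.
LETTERS = [letter for letter in ascii_uppercase if letter != "O"]
# Punctuation ignored when comparing readings (includes the Armenian full stop).
TRAILING = ",.;:։"


def code_row(readings):
    """Code one row of aligned readings: A for the first distinct reading, B for the next, O for a gap."""
    codes = {}
    row = []
    for reading in readings:
        if reading is None:
            row.append("O")
            continue
        key = reading.strip(TRAILING)
        if key not in codes:
            codes[key] = LETTERS[len(codes)]
        row.append(codes[key])
    return " ".join(row)


print(code_row(["յայսմ", "այս", "յայսմ", "այս", "այս"]))
print(code_row(["գ", None, "գ,", "գ", None]))
```

Each column of the coded table then plays the part that a DNA sequence plays for a biological specimen.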
90. [Stemma diagram. Branch annotations: "gaps appear", "chapter divisions appear", "text truncated".]
Manuscripts in the diagram: F (1617), B (1623), X (1669), A (1689), Matenadaran 3520 (17th c.), O (ca. 1702), Matenadaran 2644 (1844), W (1601), V (1590-1600), J (1617), D (1647), Jerusalem 1869 edition*, H (17th c.), Z (17th c.), Y (17th c.), K (1699), L (1660), I (1664), Matenadaran 3071 (1651-61), Bzommar 644 (1775-1805), Venice 986 (1830-35)
Non-fragmentary manuscripts omitted: Paris 191, 200; Jerusalem 3651; Matenadaran 2855, 2899, 3380, 6605, 8159, 8232, 8894; Rome 25; Vienna 243, 246
*Based on Jerusalem mss. 1051, 1107
98. Online publication
• XML can also be turned into HTML for online
publication
• This gives:
• searchable text
• easy updates
• configurable set of variants
• links to manuscript images where available
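Turning the XML into HTML is mostly a matter of walking the tree and mapping each element to a web equivalent. Here is a minimal Python sketch using the standard library. The element-to-HTML mapping and the CSS class names are assumptions for illustration only, and it expects TEI without an XML namespace, which real TEI files usually do declare.

```python
import xml.etree.ElementTree as ET
from html import escape


def tei_to_html(node):
    """Recursively render a small subset of TEI elements as an HTML string."""
    if node.tag == "lb":
        rendered = "<br/>"  # manuscript line break
    else:
        inner = escape(node.text or "") + "".join(tei_to_html(child) for child in node)
        if node.tag == "ex":
            # Letters supplied by the editor when expanding an abbreviation.
            rendered = f'<span class="ex">{inner}</span>'
        elif node.tag == "hi":
            rendered = f'<span class="{escape(node.get("rend", ""))}">{inner}</span>'
        elif node.tag == "head":
            rendered = f"<h2>{inner}</h2>"
        elif node.tag == "p":
            rendered = f"<p>{inner}</p>"
        else:
            rendered = inner  # unknown elements: keep the text, drop the tag
    # Text that follows the element inside its parent.
    return rendered + escape(node.tail or "")


sample = '<p><hi rend="red">ab<ex>cd</ex></hi><lb/>ef</p>'
print(tei_to_html(ET.fromstring(sample)))
```

Because editorial expansions keep their own class, a stylesheet on the page can show or hide them, which is one way to make the display configurable.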
I’m going to start off by asking a very simple question.
How do we know what happened in the past?
[ASK: We read histories. We read literature. We look at art. We look at archaeology. We listen to people.]
So about these histories we read. Where do they come from?
Hint: "the library" is not the answer.
Or, well, it is an answer. The history in Lucius' library may have had a few differences from the one in Brutus'.
Why was that?
No printing press. No mechanical form of copying.
[woodcut slide] This was your printer.
Trouble is, humans don’t make very good machines.
- People are kind of bad at copying:
- they are reading as they go,
- they skip lines,
- they change dialects in the next village,
- they take shortcuts...
... because [TRANS] the monastery is cold and [TRANS] the food is terrible and [TRANS] the tea ladies are unfriendly and [TRANS] they really just don't want to be there anymore.
- (Think I'm kidding? You should read some of the notes that the copyists left in their books.)
Very important to remember - the things we care about in a historical source, including preservation of the original, are *not* the things they cared about several hundred years ago. [TRANS] You can find exceptions, especially in the study of classical texts, but it wasn’t the rule. This would seem obvious, but you would be amazed how often people still lose sight of this in professional scholarship.
[TRANS]
So what does this mean? For one thing, people weren't so concerned about preserving the original.
- Need a copy of another book? Out of parchment?
- Pick a book you don't want anymore and scrape off the ink.
- Was that the original copy of some history that is going to be massively important in 600 years? How would you know? Anyway Arnulf down the road has another copy; they can read his.
So now it's 600 years later and the study of history has become more rigorous. We want to know what our author actually wrote, but we don't have the original anymore. [TRANS] If we're lucky, we just have a bunch of error-ridden copies.
[TRANS]
If we're unlucky, we only have one error-ridden copy.
[TRANS]
If we're really unlucky, we just have a reference to the history in someone else's book.
Textual criticism is the field of taking all these sources and constructing the basic foundation of history. (It means something similar but slightly different in the field of literature, but hey we’re all historians here.) In some sense it's the preserve of detail-obsessed nerds like me, but every historian needs to know what the field is all about, and what the challenges are, so you can make your own informed decision about the value of a source.
The product of textual criticism on a particular text is a “critical edition”. This will have a version of the text, chosen from the available alternatives, according to whatever criteria the editor thinks best. Sometimes you may wish to politely disagree with the choices that the editor has made...
This is why a critical edition also contains [TRANS] a specially formatted block of footnotes called an "apparatus criticus", which gives all the readings from the manuscripts that were rejected from the base text.
This is also fiddly and irritating to format. People have written entire word processors just to do this. But it is the whole point of a critical edition really. [TRANS CLOSEUP]
- e.g. “in line 2, where it says ‘proinde’, manuscript D has ‘primum’”
- “in line 6, where it says ‘transferretur’, mss N and H have ‘transferreretur’”
- “in line 13, where it says ‘et consecrandum’, that’s missing in ms H”
- who cares about what?
- Historians want to know what happened. They need to know whether the text said “the man bit the dog” or “the dog bit the man”.
- Linguists and philologists want to know how it was said. They need to know whether the author spelled “potato” with an extra E.
- Other people will want other things. Someone making a historical atlas might care very much about every variation in spelling of place names, and not care about extra Es in “potato”.
- Trouble is, to date, what sort of critical edition you get depends entirely on who’s doing it. The editor has had to decide for him/herself what is “interesting”, and anything that isn’t “interesting” isn’t included.
So how does this play out in practice?
I shall illustrate with the text I know best, a 12th-century Armenian chronicle, which is a beautiful example by virtue of being equally irrelevant to all of you.
[SLIDE] First a little historical background. (This is Anatolia, in case it isn’t very clear.) Matthew of Edessa was an Armenian priest who lived and
[TRANS] wrote in Edessa; chronicle begins 952
[TRANS] covers good times up to 1045
[TRANS] covers the migration of the Armenian nobility to Cappadocia as the Seljuks rampaged around
starts to talk about his own time, and in particular the First Crusade, but also the...
[TRANS] rise of new Armenian lords in Kesoun, Raban, and...
[TRANS] Cilicia, the last of which would eventually become the Cilician Kingdom of Armenia, which would be pretty important after Matthew was dead.
The Chronicle was written in the 1130s. Those were exciting times if you lived in Edessa. This is why Matthew wrote the history he did. It turns out that we are really glad he did, because he was there to see the Crusader princes come through and he had a rather different viewpoint than anyone named Baldwin or anyone named Ioannes.
So he wrote this history, and we need an edition. How do you do that? First you have to find out what there is to work with. Traditionally, you look at all the manuscripts, and then you make a choice (often arbitrary, or based only on age of manuscript) about which one you'll base your edition on.
Then you return to the roots of scholarship. You copy it out (by longhand, typewriter, word processor, or spreadsheet), maybe one word per line, and you note everything you observe about that word.
Is it abbreviated? Is it misspelled? Is it at a line boundary? A page boundary? Is there a margin note pointing to the word? Is it scratched out?
Maybe these things won't be important, but you never know when they will be. That missing pair of words at the end of the line might be the proof you need that this manuscript was copied from another manuscript, for example.
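If you keep these observations in a structured form rather than in scribbled notes, you can ask questions of them later. A minimal Python sketch of what one record per word might look like; the field names are illustrative assumptions, not the format used for this edition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WordRecord:
    """Everything noted about one word of one manuscript."""
    text: str
    abbreviated: bool = False
    misspelled: bool = False
    ends_line: bool = False
    ends_page: bool = False
    margin_note: Optional[str] = None
    struck_out: bool = False


def line_final_words(records):
    """Words at the end of a manuscript line, where copying slips often happen."""
    return [record.text for record in records if record.ends_line]


transcription = [
    WordRecord("Nobilis"),
    WordRecord("itaque", abbreviated=True),
    WordRecord("comes", ends_line=True),
    WordRecord("Otto", margin_note="later hand"),
]
print(line_final_words(transcription))
```

With records like these, checking whether a missing pair of words sat at the end of a line in another manuscript becomes a lookup rather than a trip back to the reading room.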
Then you do the same thing with all the other manuscripts. By the end, you are cold and hungry and the tea ladies are unfriendly and you really just don't want to be there anymore. On the other hand, you know the text *really* well by now.
So then you have to go through your transcription one last time, sick of it as you are, and make decisions about which words will go into your edited text.
In my case, the Chronicle was copied a lot, and it was thrown away a lot. The earliest copy we have is from more than 400 years later. Today there are 42 manuscripts. Of those, 6 are short extracts from the history, which leaves 36 manuscripts that need to be at least looked at closely.
The oldest manuscript I know about is Venice 887. It is held by the Mekhitarist monastery in Venice, and was copied sometime between 1590 and 1600. [TRANS] The next oldest is Vienna 574, held by the Mekhitarist monastery there, and dates from 1601. [TRANS] More than half the non-fragmentary manuscripts date to the seventeenth century, which suggests that multiple older copies, now lost, served as exemplars.
This chart shows the distribution of extant manuscripts of the chronicle. You can see the sudden proliferation of copies of this text in the seventeenth century;
[digression about plant kingdom biology chart]
the characteristics of the surviving copies strongly suggest that there were many copies made before 1600 that have now been lost.
Once I’ve made my list of manuscripts, the next step is to see if I can find patterns. Were some of them obviously copied from others? Do some of them have really obvious features in common?
This is what I was able to find.
The two oldest manuscripts represent two distinct groups, which are pretty easy to spot.
- First (V887) group has complete text, shares parchment with (10th c.) Nerses
- Second (W574) group has truncated text, shares parchment with long specific set of texts that I won’t bore you with
- The colophons in many of these manuscripts show that the copyists were aware of the truncation. Aha, that’s a Clue.
There is one manuscript from the first group that is particularly odd.
Matenadaran manuscript 1896
[TRANS] copied in 1689—many years after our two group leaders.
[TRANS] Preserves 2 longish passages of text that appear nowhere else.
[TRANS] Many of the other manuscripts contain marginal notes that show awareness of gaps. In fact, this one left some room for the gaps, *and then went back and filled them in*. There’s a Clue if I ever saw one.
[TRANS] And yet it’s the only one of the 42 that has these bits, and it’s nowhere near the oldest.
- complicated and twisted manuscript tradition. Normally I’m supposed to look at all this information, think about it for a while, and then draw a stemma.
Stemma is my best guess at a manuscript family tree—shows copy relationships between mss.
- The more mss I can find that were copied from others I have, the fewer I have to transcribe for the edition. Win.
- This is so snarled up that I can’t even begin to draw a stemma though. Lose.
- But sometimes, rarely, scribes are helpful. When that happens, win.
I’ll just have to plunge in and make the edition without having the stemma yet.
There are four steps to making a critical edition: [CLICK] transcription, [CLICK] collation / text analysis, [CLICK] editing, and [CLICK] publication. The first two of these, and to a lesser extent the third, are so horrendously tedious that I spent a while looking desperately for shortcuts. This brings us to the other thing I’m supposed to talk about today...
...how to coax the computer into doing my work for me.
What’s about to follow is a rather detailed case study in how computers can take a particular academic task and make it a lot easier and a lot less tedious. Many of you will never find yourselves making a critical edition, but the point of computer techniques in humanities is to look for anything at all, no matter how small, that is mindless and repetitive and let the computer deal with it so you don’t have to.
So finally I had some manuscripts. I plunged into the transcription. [TRANS] It turns out that this takes a huge amount of time. [TRANS] What I really want is for the computer to read it for me. This actually does exist, and is called “optical character recognition” (OCR for short). It is what Google uses to scan in all those books and make them searchable. [TRANS] Unfortunately there's not yet any such thing as OCR for manuscripts. Not only were the monks in the scriptoria not machines; their handwriting wasn’t as regular as a printing press either.
This means that the transcription itself is still horrible, mind-numbing, and time-consuming. But the transcription is the only horrible part I have to do, and there are a few things I learned along the way.
So here we have a manuscript page. Just like the history itself has nothing to do with your own fields, I’ve even managed to pick a script that none of you can read. So you’ll just have to take my word for it.
I initially just typed out my texts into plain text files, in something like Notepad or TextEdit or what have you. No red font, no formatting, just the words. I figured this would make it easier for the computer to compare words later across all the different manuscripts.
So I can transcribe the text, but I already have a dilemma:
- Standardize spelling?
- Unbroken lines?
- How to record deletions & additions?
- How to record page breaks? Section divisions? Etc.
Either I lose information that I need, or I invent some way to represent all these fiddly variations. On the other hand, I don’t want to cause trouble when I get the computer, which doesn't understand Armenian, to collate the results.
Naturally, someone threw XML at the problem. XML is the eXtensible Markup Language, and is a really useful way of representing and transferring anything that you want the computer to process, and especially anything you might want the computer to display differently in different application windows, or to different people.
This is a fragment of XML using the guidelines of the Text Encoding Initiative, or TEI.
- Easy comparison of section / paragraph divisions
- Can keep, and use, all sorts of metadata later in the program
- Can output the collation result back to TEI XML. More on that later.
[cmp with next slide]
[cmp with prev slide]
But something even better. What if I want to look at texts in languages other than Armenian? [Shocking.] Languages whose words aren’t divided by whitespace?
- TEI tells me what’s a word, so that I don’t have to assume whitespace split
- Now the programs I write can be language-independent
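To make the "TEI tells me what's a word" point concrete, here is a minimal sketch in Python (not the Perl I actually wrote, and not my real files). It assumes a transcription in which every word is wrapped in a TEI `<w>` element; the sample text and the function name `extract_words` are made up for illustration.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Made-up sample: three words, with a page break between the second and third.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body><p>
    <w>յայսմ</w> <w>ամի</w><pb n="2"/>
    <w>աշխարհն</w>
  </p></body></text>
</TEI>"""


def extract_words(tei_xml: str) -> list[str]:
    """Return the text of every <w> element, in document order."""
    root = ET.fromstring(tei_xml)
    # itertext() also collects text nested inside child elements of <w>
    return ["".join(w.itertext()).strip() for w in root.iter(f"{{{TEI_NS}}}w")]


words = extract_words(SAMPLE)
print(words)

# No whitespace needed: the markup alone decides where words begin and end.
unspaced = '<p xmlns="http://www.tei-c.org/ns/1.0"><w>ab</w><w>cd</w></p>'
print(extract_words(unspaced))
```

Nothing in the function knows anything about Armenian, or about spaces, which is the whole point.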
So it's a nice solution, but at the same time it's a new problem.
Who wants to write and edit XML by hand?
[TRANS] One of the first parts of this I wrote was to save myself the trouble of typing transcriptions like this, while recording all of the information that I want to keep in my TEI files. [TRANS] The thing I came up with is a very good example of when you don’t want computer scientists like me getting involved in history.
So instead of
The trouble is that I find myself needing a long irritating snippet of XML like this for a single little scribal correction. (I know you can’t read it, but [explain the pieces])
Now since I think like a computer programmer, and since I have become reasonably familiar with the bits of TEI I need, I just transformed it all into code whose main advantage is that it’s quick to type. If you’re me.
Yes I do know how to type that weird little plus-minus character.
It also helps my transcription in that the file I end up with
is a better visual match to the manuscript I'm transcribing than either the plain-text or the TEI XML versions. [GO TO CLOSEUP]
This is the example I showed before—a relatively complicated transcription records the fact that the copyist originally wrote “Hayoc’” (which means Armenians), and corrected it to “Hrromoc’” (which means Romans) by writing over the original word.
This shows a relatively simple way to indicate that the copyist abbreviated words, and supply an expansion. You want both to record which letters are actually there, and what you interpret the word to be. When I have finished transcribing, I pass the file to my computer program, and it writes out the XML for me.
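The general idea of expanding a quick-to-type shorthand into TEI can be sketched like this. To be clear, the notation below is invented for this example and is not the shorthand on my slides: `[abbr|expansion]` for an abbreviation and `{old=>new}` for an overwritten correction are both assumptions, as are the function names. The TEI elements themselves (`choice`, `abbr`, `expan`, `subst`, `del`, `add`) are real; the `rend` value is free text I picked.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical shorthand, one token per word (no spaces inside a token):
#   [abbr|expansion]  -> abbreviation plus editorial expansion
#   {old=>new}        -> scribe wrote "old", then corrected it to "new"
ABBR = re.compile(r"^\[(?P<abbr>[^|\]]+)\|(?P<expan>[^\]]+)\]$")
CORR = re.compile(r"^\{(?P<old>[^}]+?)=>(?P<new>[^}]+)\}$")


def token_to_tei(token: str) -> str:
    """Turn one shorthand token into a TEI <w> element (as a string)."""
    if m := ABBR.match(token):
        inner = (f'<choice><abbr>{escape(m["abbr"])}</abbr>'
                 f'<expan>{escape(m["expan"])}</expan></choice>')
    elif m := CORR.match(token):
        inner = (f'<subst><del rend="overwritten">{escape(m["old"])}</del>'
                 f'<add>{escape(m["new"])}</add></subst>')
    else:
        inner = escape(token)  # an ordinary word
    return f"<w>{inner}</w>"


def line_to_tei(line: str) -> str:
    """Convert a whitespace-separated line of shorthand tokens."""
    return " ".join(token_to_tei(t) for t in line.split())


tei = line_to_tei("{Hayoc'=>Hrromoc'} [dns|dominus]")
print(tei)
```

The trade-off is the one I complain about in a moment: a private notation is fast for the person who invented it and opaque to everyone else, so the TEI output, not the shorthand, is what should be kept and shared.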
It’s a horrible solution actually. But there is nothing better, and this is one of the main problems with XML in the humanities. Was at a conf all about TEI in November, and usability is a big issue. I want us to think about this later. Also, even with all the computational aids we can throw at it, transcription is tedious and time-consuming. That means that, until we live far enough in the future for OCR to work like a dream, if you ever find yourself having to transcribe a text, you should do your bit to do a good enough job that no one else will ever have to repeat the work.
So now I have some pretty well-formatted XML files, and I want to start the collation process. This means I need to extract the words.
When I gave this talk at YAPC, I launched into a rant here about how much I hate XML, and admitted that my “processing” was really nothing more than “yanking the words back out into plaintext and parsing it that way.” But it turns out that, as irritating as XML is to parse, and as badly documented as XML::LibXML is, and as over-engineered a solution as it is, and as much as I just generally dislike it, the XML format is too useful to ignore, and I had to fix up the core of what I wrote over the summer to be able to handle the new XML information.
[READ QUOTE] This observation was made by a man called Peter Robinson, who walked this path long before I did. He wrote a program too. It works pretty well, I'm told. Unfortunately it only works on Mac OS 9, and it doesn’t support non-Western languages very well. Time to reinvent the wheel. Only this time, I have Unicode, and I have Perl.
[people still do this without unicode. seriously guys.]
So this is what my collator needs to do. It should [B] align words with each other, [B] across many manuscripts, [B] even in the case of words such as աշխարհն (the land) and յաշխարհին (in the land) which are similar but not quite the same.
So it has to do something like this. Remember back at the beginning I said people often do their collation in a spreadsheet? Here I am getting the computer to do my collation for me in what you can then pretend is a spreadsheet.
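For the curious, the core of "align words that are similar but not quite the same" can be sketched in a few lines of Python. This is not my collator: it handles only two witnesses, it uses a longest-common-subsequence style alignment with the standard library's `difflib` as the similarity measure, and the 0.7 threshold is a number picked for this example. The word lists are assembled from words that appear elsewhere in this talk, not quoted from a manuscript.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.7) -> bool:
    """Fuzzy word match; the threshold is an assumption, tune it per language."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def align(base: list[str], other: list[str], gap: str = "") -> list[tuple[str, str]]:
    """Pair up the words of two witnesses, inserting gaps where one lacks a word."""
    n, m = len(base), len(other)
    # score[i][j] = most word pairs that can be matched in base[:i] and other[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            hit = 1 if similar(base[i - 1], other[j - 1]) else 0
            score[i][j] = max(score[i - 1][j - 1] + hit,
                              score[i - 1][j],
                              score[i][j - 1])
    # Walk back from the corner to recover the rows of the "spreadsheet".
    rows, i, j = [], n, m
    while i > 0 and j > 0:
        if similar(base[i - 1], other[j - 1]) and score[i][j] == score[i - 1][j - 1] + 1:
            rows.append((base[i - 1], other[j - 1]))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j]:
            rows.append((base[i - 1], gap))
            i -= 1
        else:
            rows.append((gap, other[j - 1]))
            j -= 1
    while i > 0:
        rows.append((base[i - 1], gap))
        i -= 1
    while j > 0:
        rows.append((gap, other[j - 1]))
        j -= 1
    rows.reverse()
    return rows


rows = align(["այս", "ամենայն", "աշխարհն"], ["այս", "յաշխարհին"])
for left, right in rows:
    print(f"{left or '-':<12}{right or '-'}")
```

Each tuple is one spreadsheet row; extending this to many manuscripts at once is where the real work is.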
But wait! Now I have the word alignment, I can spit it all back out into XML...[NEXT]
...back out into TEI format. This uses a TEI module specifically for “text criticism”. They really did think of everything.
[describe the word, the reading, etc.]
And then when I (the editor) do my job, I will mark one of these as a lemma:
which represents my decision about which word should go into the text.
- trivial from this to generate an apparatus like we saw before:
And if I decide I made a dumb mistake, and that the text should really start with յայսմ ամի (in this year) rather than այս ամենայն... (all this)
I can fire up my editing program [NEXT]
and change the lemma: [NEXT]
and the new generated text would do the right thing.
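Here is roughly what that looks like in code, as a hedged sketch rather than my editing program. It builds TEI-style `app` entries with `rdg` readings, promotes one reading to `lem`, and regenerates the running text from whichever readings are currently marked. The witness sigla A, B, C and the function names are invented, and the TEI namespace is left off to keep it short.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def make_app(readings: dict[str, list[str]], lemma: Optional[str] = None) -> ET.Element:
    """Build one apparatus entry; readings maps each reading to its witness sigla."""
    app = ET.Element("app")
    for text, sigla in readings.items():
        tag = "lem" if text == lemma else "rdg"
        el = ET.SubElement(app, tag, wit=" ".join(f"#{s}" for s in sigla))
        el.text = text
    return app


def set_lemma(app: ET.Element, lemma: str) -> None:
    """Change the editor's choice: exactly the matching reading becomes <lem>."""
    for el in app:
        el.tag = "lem" if el.text == lemma else "rdg"


def edited_text(apps: list[ET.Element]) -> str:
    """Regenerate the running text from the current lemmas."""
    return " ".join(app.findtext("lem", default="[?]") for app in apps)


apps = [
    make_app({"այս": ["A", "B"], "յայսմ": ["C"]}, lemma="այս"),
    make_app({"ամենայն": ["A", "B"], "ամի": ["C"]}, lemma="ամենայն"),
]
before = edited_text(apps)

# The editor changes her mind about how the text should start.
set_lemma(apps[0], "յայսմ")
set_lemma(apps[1], "ամի")
after = edited_text(apps)

print(before, "->", after)
print(ET.tostring(apps[0], encoding="unicode"))
```

Because the text is always generated from the apparatus, changing my mind never means retyping anything.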
Of course, someday this will not look so much like it was written for a computer scientist. Bear with me.
But the idea is something like this. When the words have been aligned, the program will join together as long a chain as possible of different sets of words, and ask me which one is best.
- Will provide a way to mark orthographic variants & misspellings
- Computer will remember the things I’ve marked, and not ask me again
- Eventually it will only need to ask me about variants that require a judgment call
And that is the essence of digital techniques. Get the computer to handle anything that isn’t a judgment call.
So now we can come back to the problem of a stemma. Part of the process of making a critical edition is to figure out, as best you can, the manuscript stemma—the family tree. It turns out that I can use the collation results I’ve just produced to help me with this.
Manuscripts aren’t living organisms, BUT
- have ancestors - the manuscripts from which they were copied;
- have descendants - the manuscripts that were copied from them.
- Sometimes have more than one ancestor. That’s called contamination. Let’s not think about that for now.
- [TRANS] comparison to living things turns out to be helpful
- while we medievalists have been slaving away with parchment and inkpots in libraries, biologists have come up with
[TRANS] a statistical method called “maximum parsimony”
- MP takes a bunch of genetic data (DNA), encoded as characters (those familiar letters) and produces the evolutionary tree that requires the fewest changes to get back to a common ancestor
- [TRANS] Have “organisms” (mss); need family tree
- Mss don’t have DNA, but [TRANS] they have words.
- I have collation program that matches like words together. Pretend each occurrence of a similar word is a DNA base; feed to biologists’ statistical package. Voilà.
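What "requires the fewest changes" means can be shown with a small sketch. Given one candidate tree, Fitch's algorithm counts the minimum number of changes each character needs; a parsimony package does this for a great many candidate trees and keeps the cheapest. The code below only does the counting step, on a toy matrix with invented sigla A to D, and makes no claim about how any particular package works internally.

```python
def fitch_changes(tree, states: dict[str, str]) -> int:
    """Minimum number of changes for ONE character on a fixed binary tree.

    tree   -- nested 2-tuples with leaf names (strings) at the tips
    states -- leaf name -> the state observed in that manuscript
    """
    def visit(node):
        if isinstance(node, str):            # a leaf: its state is known
            return {states[node]}, 0
        left, right = node
        lset, lcost = visit(left)
        rset, rcost = visit(right)
        common = lset & rset
        if common:                           # children can agree: no change here
            return common, lcost + rcost
        return lset | rset, lcost + rcost + 1  # they disagree: one change

    return visit(tree)[1]


def tree_length(tree, matrix: dict[str, str]) -> int:
    """Total changes over all characters (columns) of the matrix."""
    width = len(next(iter(matrix.values())))
    return sum(
        fitch_changes(tree, {name: seq[i] for name, seq in matrix.items()})
        for i in range(width)
    )


# Toy data: four manuscripts, three variant locations each.
matrix = {"A": "AAB", "B": "AAB", "C": "ABA", "D": "BBA"}
tree1 = (("A", "B"), ("C", "D"))
tree2 = (("A", "C"), ("B", "D"))

print(tree_length(tree1, matrix), tree_length(tree2, matrix))
```

The tree that groups A with B and C with D needs fewer changes, so it is the more parsimonious explanation of this toy data.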
So here I have the words lined up with each other and can tell which ones are similar and which are different. All I have to do is pretend it's DNA, and assign a letter to each variant
and suddenly I have a dataset that I can feed to a statistical analysis program.
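The "pretend it's DNA" step is simple enough to sketch. Assuming a collation table with one row per manuscript and one column per aligned word position, each distinct reading in a column gets the next free letter. The sigla, the sample readings, and the decision to write an empty cell as `?` (missing data) are all assumptions of this example; whether an omission should instead count as a variant of its own is an editorial choice, and a real run would first group spelling variants together.

```python
from string import ascii_uppercase


def encode_columns(table: dict[str, list[str]], missing: str = "?") -> dict[str, str]:
    """Turn aligned readings into one letter string per manuscript.

    Within each column the first distinct reading becomes 'A', the next 'B',
    and so on. Raises IndexError if a column has more than 26 distinct readings.
    """
    sigla = list(table)
    width = len(table[sigla[0]])
    encoded = {s: [] for s in sigla}
    for col in range(width):
        codes: dict[str, str] = {}          # reading -> letter, reset per column
        for s in sigla:
            reading = table[s][col]
            if reading == "":               # empty cell: treated as missing data
                encoded[s].append(missing)
                continue
            if reading not in codes:
                codes[reading] = ascii_uppercase[len(codes)]
            encoded[s].append(codes[reading])
    return {s: "".join(chars) for s, chars in encoded.items()}


table = {
    "A": ["այս", "ամենայն", "աշխարհն"],
    "B": ["այս", "ամենայն", "յաշխարհին"],
    "C": ["յայսմ", "ամի", ""],
}
encoded = encode_columns(table)
for siglum, seq in encoded.items():
    print(siglum, seq)
```

The letters mean nothing across columns; all the analysis cares about is which manuscripts agree with which at each position.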
- Result from the mss I’ve so far transcribed looks like this. The blue bit is the “Vienna” group of truncated mss; the rest can be thought of as the “Venice” group, though it turns out that a few members of that group have a lot more in common with each other than with any of the others.
- Still requires editorial interpretation,
- no accounting for relative dates of mss
- no accounting for possibility of “living ancestor”
- But if I use the knowledge I have about the mss to orient the tree and collapse some nodes...
I end up with [SLIDE: new stemma] this.
Part of it, [TRANS] here, is our Vienna set of truncated mss
- The remainder of the tree not a very coherent group. The Venice group is pretty much “everything else”.
- One manuscript in middle has 2 arrows into it; it was copied from more than one ms. That is contamination. I had to run 2 comparisons on 2 different chunks of text to find that.
- Plausible picture of transmission history
- Confirms the impression of lots of lost copies -- almost none of these were copied from each other
[discourse here on Lachmann if there is time]
- came up with original manual method of stemma analysis
- his methods are superseded by this genetic analysis
- but he’d be really excited by this because he always believed it was possible to rigorously derive the best edition
- his dream won’t ever really work, but we are pushing it as close as we can to machine-generated editions
So this is it! I have a critical edition, and I have a pretty new stemma of all of the manuscripts. I’m ready to rush to publication.
This is another thing that XML makes ridiculously easy. The whole point of XML is that it can easily be transformed into Web pages, or into Word documents, or into book publishing format, or whatever else you need. I don’t ever actually have to fight with a spell checker.
Let’s come back to that whole “online” thing. XML can also be turned into HTML
- Gives features like [TRANS] searchable text.
- All well & good about mss digitisation and online access, but isn’t it better if you can find what you’re looking for?
[TRANS] frequent updates / corrections,
[TRANS] configurable display of variants (what is “significant”? who’s asking?),
[TRANS] links to original MS images for people that *really* care
This is a big thing that’s getting a lot of attention in the digital humanities right now
- really exciting world
- lots of people doing lots of interesting things so I don’t have to
[COPYRIGHT??]