1. Using GIS to explore historical texts:
Examples from Lake District literature and the
Registrar General’s Reports
Ian Gregory
Lancaster University
Acknowledgements:
Alistair Baron, Patricia Murrieta-Flores, Andrew Hardie, and Paul Rayson (Lancaster)
Claire Grover (Edinburgh) – providing access to the georeferenced Histpop data
Richard Deswarte – help with the Histpop data
6. Literary Mapping of the Lakes
• British Academy funded pilot project
with David Cooper and Sally Bushell
• Two tours of the Lake District
– Thomas Gray, 1769 (9,000 words)
• Proto-Picturesque
– ST Coleridge, 1802 (10,000 words)
• Romantic
• Aims:
– Can we create a GIS of text?
– What can it offer to literary research?
• Method:
– Texts typed up by hand
– Places tagged manually
– Conversion
– Analysis
7. Place names coded in XML
<p in_text="Y">On Sunday Augt. 1st - half after 12 I had a Shirt, cravat, 2 pair of
Stockings, a little paper & half a dozen Pens, a German Book (Voss's Poems)
& a little Tea & Sugar, with my Night Cap, packed up in my natty green oil-
skin, neatly squared, and put into my <format format_type="I">net</format>
Knapsack / and the Knap-sack on my back & the Besom stick in my hand, which
for want of a better, and in spite of <person>Mrs C.</person> &
<person>Mary</person>, who both raised their voices against it, especially as I left
the Besom scattered on the Kitchen Floor, off I sallied - over the
Bridge<my_comment><pl_name visited="Y">Greta Bridge,
Keswick</pl_name></my_comment>, thro' the Hop-Field, thro' the <pl_name
visited="Y">Prospect Bridge</pl_name> at <pl_name
visited="Y">Portinscale</pl_name>, so on by the tall Birch that grows out of the
center of the huge Oak, along into <pl_name visited="Y">Newlands</pl_name>--
<pl_name visited="Y">Newlands</pl_name>is indeed a lovely Place-the houses…
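A minimal sketch of how the tagged place names might be pulled out of markup like this. It assumes the file is well-formed XML, so the sample below is a shortened, hypothetical fragment with `&` escaped as `&amp;`. The element and attribute names `pl_name` and `visited` come from the excerpt above; the function name is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened fragment modelled on the excerpt above; not the project's actual file.
SAMPLE = """<p in_text="Y">off I sallied - over the
Bridge<my_comment><pl_name visited="Y">Greta Bridge,
Keswick</pl_name></my_comment>, thro' the <pl_name
visited="Y">Prospect Bridge</pl_name> at <pl_name
visited="Y">Portinscale</pl_name> &amp; along into
<pl_name visited="Y">Newlands</pl_name></p>"""


def extract_place_names(xml_text):
    """Return (name, visited) pairs for every pl_name element, in document order."""
    root = ET.fromstring(xml_text)
    pairs = []
    for element in root.iter("pl_name"):
        # Collapse line breaks and repeated spaces inside the tagged name.
        name = " ".join((element.text or "").split())
        pairs.append((name, element.get("visited")))
    return pairs


places = extract_place_names(SAMPLE)
for name, visited in places:
    print(name, visited)
```

The resulting list of names is what would then be matched against a gazetteer to obtain coordinates.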
8. Convert to a GIS
OS 1:50,000 gazetteer – all places on 1:50,000 maps
• Accuracy
• Spelling problems
• Disambiguation
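The spelling and disambiguation problems listed above can be illustrated with a small sketch. It is not the project's actual matching procedure: it uses Python's standard `difflib` for approximate spelling matches, and the gazetteer entries and coordinates are hypothetical placeholders, not Ordnance Survey values.

```python
import difflib

# Hypothetical mini-gazetteer: name -> list of candidate (easting, northing) pairs.
# Coordinates are placeholders for illustration only.
GAZETTEER = {
    "Keswick": [(100, 200)],
    "Portinscale": [(90, 210)],
    "Newlands": [(80, 190), (400, 500)],  # one name, two places: needs disambiguation
}


def lookup(name, cutoff=0.8):
    """Return (matched_name, candidates); matched_name is None if nothing is close enough."""
    if name in GAZETTEER:
        return name, GAZETTEER[name]
    # Fall back to the closest spelling above the similarity cutoff.
    close = difflib.get_close_matches(name, list(GAZETTEER), n=1, cutoff=cutoff)
    if close:
        return close[0], GAZETTEER[close[0]]
    return None, []


exact = lookup("Keswick")
variant = lookup("Portinscal")   # variant spelling resolved by fuzzy match
ambiguous = lookup("Newlands")   # more than one candidate location
missing = lookup("Xyz")
```

A name that returns more than one candidate still has to be resolved by other evidence, for example by preferring the candidate nearest to the other places mentioned in the same passage.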
14. Physical Characteristics of Tours
[Figure: charts for the two tours, Gray and Coleridge (STC).
Altitude of mentions: % of mentions by height band (0 to 99, 100 to 199, 200 to 299, 300 to 399, 400 to 499, 500 to 599, 600 to 699, 700 to 799, 800+), split into Visited and Didn't visit/Unclear, one chart each for Gray and Coleridge.
Population density: Pop. Density for STC Not visited, STC Visited, Gray Not visited and Gray Visited, shown on Normal and Logged scales.]
15. Close Reading with Internet Mapping
http://www.lancs.ac.uk/mappingthelakes
http://www.lancs.ac.uk/mappingthelakes/v2
16. The Histpop Collection
• Covers the printed Census reports and the Registrar General’s Annual Reports, 1801-1937
• Nearly 13,000,000 words
• Georeferenced by C. Grover (University of Edinburgh)
• Here we are concerned only with the Registrar General’s Reports, 1851-1911
• Total: 3,750,000 words
• England & Wales: 2,000,000 words
• http://www.histpop.org
18. Place-name instances, 1850s
[Figure: two panels, "Density Smoothing" and "Cluster identification: standard deviations of density". Source: www.histpop.org]
19. Extract place-names
Ranked by word frequency      Ranked by kernel density
Place name        Cnt         Place name        Density   Cnt
North Shields     300         Bermondsey        .5849       6
London            294         Newington         .5842       4
Durham            207         Spitalfields      .5835       1
Nottingham        193         Whitechapel       .5835       1
Liverpool         171         Stepney           .5823       2
Hawarden          145         Rotherhithe       .5809       5
Grantham          131         London            .5803     294
Cardington        125         Shoreditch        .5794       1
Linslade          121         Bethnal Green     .5788       4
Wakefield         121         Camberwell        .5787      12
                              58th: Southwick   .3498       1
                              (nr Sunderland)
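A minimal sketch of the two steps named on the previous slide, density smoothing and expressing density in standard deviations. The slides do not say which kernel or bandwidth was used, so the Gaussian kernel, the bandwidth, and all coordinates below are assumptions for illustration.

```python
import math
import statistics


def kernel_density(x, y, points, bandwidth):
    """Gaussian kernel density at (x, y) from weighted points [(px, py, weight), ...]."""
    total = 0.0
    for px, py, weight in points:
        squared_distance = (x - px) ** 2 + (y - py) ** 2
        total += weight * math.exp(-squared_distance / (2 * bandwidth ** 2))
    return total / (2 * math.pi * bandwidth ** 2)


def standard_scores(values):
    """Express each value as standard deviations from the mean (assumes values are not all equal)."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(value - mean) / sd for value in values]


# Hypothetical mentions: a heavily mentioned place, a neighbour with few
# mentions, and an isolated place with few mentions.
mentions = [(0.0, 0.0, 294), (0.5, 0.0, 6), (50.0, 50.0, 1)]
near = kernel_density(0.5, 0.0, mentions, bandwidth=1.0)
far = kernel_density(50.0, 50.0, mentions, bandwidth=1.0)
scores = standard_scores([near, far])
```

Because each location borrows weight from its neighbours, a place with few mentions of its own can still sit in a high-density area, which is consistent with Bermondsey (6 mentions) ranking alongside London (294) in the table above.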
20. Collocation
• “In Southwick and Monkwearmouth offensive nuisances
abound.”
• “At Royton, in Oldham, where the drainage was imperfect,
typhoid fever was prevalent”
• “The deaths in the Liverpool workhouse, in the Mount
Pleasant sub-district of Liverpool, were above 100 more than
in the same period of the two previous years, owing chiefly to
an epidemic of measles among children of German emigrants
temporarily located in this institution; there were also 101
deaths from typhus, nearly all of which occurred in the
workhouse.”
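Quotations like these can be found by collecting the words that occur near each place-name. The sketch below is a simplified illustration, not the tool used in the project: it handles single-word place-names only, and the window size and function name are assumptions.

```python
import re


def collocates(text, place_names, window=10):
    """Map each place-name to the words found within `window` tokens of it."""
    tokens = [token.lower() for token in re.findall(r"[A-Za-z]+", text)]
    result = {}
    for place in place_names:
        target = place.lower()
        for index, token in enumerate(tokens):
            if token == target:
                start = max(0, index - window)
                end = index + window + 1
                # Words before and after the place-name, excluding the name itself.
                neighbours = tokens[start:index] + tokens[index + 1:end]
                result.setdefault(place, []).extend(neighbours)
    return result


SENTENCE = "In Southwick and Monkwearmouth offensive nuisances abound."
found = collocates(SENTENCE, ["Southwick"], window=5)
print(found["Southwick"])
```

Counting the collected words across every mention of a place then gives a first indication of what the corpus says about it.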
22. Most common words in clusters
• Uses Mutual Information scores – top 10 for each cluster, excluding place-names, numbers,
and punctuation
• 1 (North-East): Fog, took [changes in rainfall or temperature took place], largest [changes in
weather], least [as largest], dense [weather related], greatest [weather], observatory, Asiatic
[cholera], Halos [lunar or solar], thunder. WEATHER
• 2 (Wakefield): Falls, rain, seen [meteorological phenomena or “swallows”], reading, fell [snow
or rain], number [met. readings], June, March. WEATHER
• 3 (South Lancs): declining [marriages, births or mortality], incorporated [boundary changes],
noted [health or weather], cubic [cubic feet – earth movement for sanitation], workhouse,
sail [Irish emigrants sailing from Liverpool], observatory, aurora, salutary [salutary effects
that led to death], took [weather]. MIXED
• 4 (Oxon to Beds): cuckoo [was first heard], infirmary, Regius Professor, intermittent
[intermittent fevers], sleet, solar, halos, least [rainfall or temperature], heard [thunder],
thunder. WEATHER
• 5 (London): changed [changed water supply], anemometer, exclusively [supplied by one
water company], hospital, command [front matter], Junction [Grand Junction Water
Company], Company [almost always water company], pipes, Bills [Bills of Mortality], asylum,
sewage. WATER SUPPLY
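The slide does not give the exact Mutual Information formula used. A common corpus-linguistic form is MI = log2(observed / expected), where "observed" is a word's frequency in the text associated with a cluster and "expected" is its frequency in the whole corpus scaled by the cluster's share of the corpus. The sketch below assumes that form, and assumes the cluster text is a subset of the corpus; the tokens are made up for illustration.

```python
import math
from collections import Counter


def mi_scores(cluster_tokens, corpus_tokens):
    """Return {word: log2(observed / expected)} for every word in the cluster text."""
    cluster_counts = Counter(cluster_tokens)
    corpus_counts = Counter(corpus_tokens)
    cluster_share = len(cluster_tokens) / len(corpus_tokens)
    scores = {}
    for word, observed in cluster_counts.items():
        # Expected frequency if the word were spread evenly across the corpus.
        expected = corpus_counts[word] * cluster_share
        scores[word] = math.log2(observed / expected)
    return scores


# Hypothetical tokens: the cluster text is the first half of the corpus.
corpus = ["fog", "fog", "rain", "water", "water", "pipes", "water", "rain"]
cluster = corpus[:4]
scores = mi_scores(cluster, corpus)
top_words = sorted(scores, key=scores.get, reverse=True)
```

MI favours words that are rare overall but concentrated in one cluster, so in practice a minimum-frequency threshold is often applied before taking the top 10.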
26. Comparing texts with statistics
[Figure: chart of % (0 to 40) by Urban Level (1 to 8) for three series: Mentions of measles, Districts, Population]

Urban   % national   Sample areas
level   pop (1911)
1        9.4         Stow on the Wold (Glou), Whitchurch (Hants.), Hexham (N’humb), Oakham (Rutland), Northallerton (N.Rid.), Holbeach (Lincs)
2       13.0         Cockermouth (Cumb), Chippenham (Wilts), Bridport (Dorset), Bangor (Carn), Alton (Hants), Pembroke (Pembs)
3       17.8         Guildford (Surrey), Redruth (Corn), York (E.Rid), Bucklow (Chesh), Chorley (Lancs), Maidstone (Kent)
4       18.7         Swansea, Canterbury, Hastings, Rochdale, Bolton, Wolverhampton
5       18.0         Sheffield, Leeds, Oxford, Southampton, Coventry, Edmonton (Mdlsex)
6       11.9         Exeter, Hull, Nottingham, Portsmouth, Leicester, Salford (Lancs)
7        9.0         Most of London, also Manchester, Liverpool and Birmingham
8        2.1         Only London, mainly East End
27. Do mentions of “Diarrhoea, dysentery and cholera”
correlate with deaths from these diseases?
Correlations between IMRchdidy and Mchdiady (N = 626)
                    Correlation Coefficient   Sig. (1-tailed)
Kendall's tau_b     .225**                    .000
Spearman's rho      .290**                    .000
**. Correlation is significant at the 0.01 level (1-tailed).
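For readers unfamiliar with the statistic, the sketch below shows how Kendall's tau-b can be computed from paired values, here standing in for mentions and death rates per district. It is a plain pairwise implementation written for illustration, not the statistical package that produced the table above, and the data are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_b(xs, ys):
    """Kendall's tau-b for paired values, with the usual correction for ties."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(xs, ys)), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue                # tied on both variables: counts in neither term
        elif dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1         # both variables order the pair the same way
        else:
            discordant += 1
    untied = concordant + discordant
    # Assumes at least one pair is untied on each variable (non-zero denominator).
    denominator = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    return (concordant - discordant) / denominator


# Hypothetical values for four districts.
mentions = [1, 2, 3, 4]
death_rates = [1, 3, 2, 4]
tau = kendall_tau_b(mentions, death_rates)
```

A value of 1 means the two variables rank the districts identically and -1 means the rankings are reversed, so the reported .225 indicates a weak positive association.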
28. Geographical Text Analysis
• Combination of Corpus Linguistics and GIS allows us to:
– 1. Geographical approach:
• Ask where the corpus is talking about
• Identify place-names in areas that the corpus concentrates on
• Find out what it is saying about these places
– 2. Theme of interest approach:
• Find out which places are associated with our theme
• Find out what it is saying in relation to this theme
• Find out what other themes are associated with these places
• Compare geography of place-name mentions with statistical evidence to
explore biases in sources