Mining and mapping places with multiple names
James Butler & Christopher Donaldson
Lancaster University
Corpus of Lake District Literature
• 80 texts, comprising more than 1,500,000 words
• Mixture of canonical and non-canonical literature about the Lake District, mainly from c18 and c19 (78 out of 80 works)
• Mixture of genres, including guidebooks, travelogues, novels, poems, journals, and private letters
[Timeline figure with dates 1688, 1789, 1837 and 1901; the three periods contain 34 texts (650K words), 22 texts (250K words) and 22 texts (613K words)]
Sample sentence collocation: beautiful
‘Again entering the boat, we passed up the channel between Lord’s Island the shore, from whence beautiful prospects are obtained of the majestic form of Skiddaw, with the woods of Castlehead and Cockshot Park in the foreground.’ (Edward Baines, A Companion to the Lakes [1829] 121.)
±5 tokens: No place-names identified
±10 tokens: 2 place-names identified – Lord’s Island & Skiddaw
Within sentence: 4 place-names identified – Lord’s Island, Skiddaw, Castlehead & Cockshot Park.
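The ±5 / ±10 / whole-sentence comparison can be reproduced with a short sketch. It assumes a simple regex tokenizer that counts punctuation marks as tokens (under that assumption the counts match the slide), a hypothetical four-entry gazetteer, and that a multi-word name is "in the window" if any of its tokens falls inside it. `tokenize`, `find_places` and `places_near` are illustrative names, not part of the project's pipeline.

```python
import re

# Hypothetical four-entry gazetteer, for illustration only.
GAZETTEER = ["Lord's Island", "Skiddaw", "Castlehead", "Cockshot Park"]

# A word (keeping internal straight or curly apostrophes) or a single punctuation mark.
TOKEN_RE = re.compile(r"\w+(?:['’]\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def normalise(token):
    """Lower-case a token and unify curly and straight apostrophes."""
    return token.replace("’", "'").lower()


def find_places(tokens, gazetteer):
    """Return (name, start, end) spans of gazetteer names in tokens, end exclusive."""
    norm = [normalise(t) for t in tokens]
    spans = []
    for name in gazetteer:
        parts = [normalise(p) for p in tokenize(name)]
        n = len(parts)
        for i in range(len(norm) - n + 1):
            if norm[i:i + n] == parts:
                spans.append((name, i, i + n))
    return sorted(spans, key=lambda span: span[1])


def places_near(sentence, node, gazetteer, window=None):
    """Names within +/- window tokens of any occurrence of node.

    window=None means anywhere in the sentence.
    """
    tokens = tokenize(sentence)
    positions = [i for i, t in enumerate(tokens) if normalise(t) == node.lower()]
    if not positions:
        return []
    found = []
    for name, start, end in find_places(tokens, gazetteer):
        # A name counts if any of its tokens overlaps [p - window, p + window].
        near = window is None or any(
            start <= p + window and end - 1 >= p - window for p in positions
        )
        if near and name not in found:
            found.append(name)
    return found


SENTENCE = (
    "Again entering the boat, we passed up the channel between Lord’s Island "
    "the shore, from whence beautiful prospects are obtained of the majestic "
    "form of Skiddaw, with the woods of Castlehead and Cockshot Park in the "
    "foreground."
)

for w in (5, 10, None):
    print(w, places_near(SENTENCE, "beautiful", GAZETTEER, w))
```

Counting punctuation as tokens is itself a design choice: with a word-only tokenizer, "Island" would fall inside the ±5 window, which illustrates how sensitive fixed token windows are to tokenization.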
Average sentence length
Lake District corpus = 29.8 words
British National Corpus (BNC) = 16 words
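Average sentence length is total words divided by number of sentences. A minimal sketch, assuming a naive splitter that ends a sentence at ".", "!" or "?" followed by whitespace and counts alphabetic word tokens only; the slide's figures were presumably produced with a proper tokenizer, and this sketch will mis-split abbreviations such as "Mr.".

```python
import re

# Naive boundary: ., ! or ? followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Alphabetic words, keeping internal straight or curly apostrophes.
WORD = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")


def mean_sentence_length(text):
    """Mean number of word tokens per sentence (punctuation not counted)."""
    sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
    counts = [len(WORD.findall(s)) for s in sentences]
    return sum(counts) / len(counts) if counts else 0.0


sample = "The lake was calm. We rowed across to the island and back again!"
print(mean_sentence_length(sample))  # (4 + 9) / 2 = 6.5
```

As a rough illustration of why this matters for collocation: a ±5 window spans at most 11 tokens, about a third of an average corpus sentence (29.8 words) but roughly two-thirds of an average BNC sentence (16 words).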
from C. Grover, et al., ‘Use of the Edinburgh Geoparser for Georeferencing Digitized
Historical Collections’, Phil. Trans. R. Soc. A 368 (2010) 3875–89.
Diagram of the Edinburgh Geoparser System
Example of input/output from the Edinburgh Geoparser System
Geo-referenced Data from the Edinburgh Geoparser
Geo-referenced Data, Corrected
Bowness: ‘the curved headland’, from ON bogi/OE boga ‘bow’ and ON nes/OE naess ‘headland’
*Variant Historical Spellings: Bownus, Bawnas, Bonas, Bonus, Boulness
cf. D. Whaley, A Dictionary of Lake District Place Names (Nottingham: English Place-Name Society, 2006), 42.
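A gazetteer that knows only the modern form "Bowness" will miss the historical spellings listed above. A minimal sketch of a variant-aware tagger, assuming a hand-built variant list like the one on the slide: each spelling is matched as a whole, case-sensitive word and mapped back to its headword. Case sensitivity is a deliberate assumption here, since "Bonus" is also an ordinary English word. The `VARIANTS` table, `tag_variants` function and example sentence are hypothetical.

```python
import re

# Hypothetical variant table built from the spellings listed for Bowness.
VARIANTS = {
    "Bowness": ["Bowness", "Bownus", "Bawnas", "Bonas", "Bonus", "Boulness"],
}

# Reverse index: spelling -> headword.
SPELLING_TO_HEADWORD = {
    spelling: head for head, spellings in VARIANTS.items() for spelling in spellings
}

# Longest spellings first, so a longer form wins where two alternatives overlap.
PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(s) for s in sorted(SPELLING_TO_HEADWORD, key=len, reverse=True))
    + r")\b"
)


def tag_variants(text):
    """Return (matched spelling, headword, character offset) for each hit."""
    return [
        (m.group(0), SPELLING_TO_HEADWORD[m.group(0)], m.start())
        for m in PATTERN.finditer(text)
    ]


# Invented sentence; lower-case "bonus" is deliberately not matched.
text = "We lodged at Bownus, and next morning rode on to Boulness; no bonus there."
for spelling, head, offset in tag_variants(text):
    print(offset, spelling, "->", head)
```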
Some common issues when geo-referencing with a generic gazetteer:
Spatial misattribution
Onomastic misassumption
Incorrect weighting
And this is just for the items that are found!
An extract of our custom, manually collected gazetteer for the corpus
Columns: Unique ID | Topog. Cat. | Primary Name | Secondary Names | Regional Placement
CONISTON (lake): Thurstan, Coniston Lake, Coniston Water, Thurston, Conistone, Conistone Lake, Cunnistone Lake, Thurston Lake, Coniston Mere, Lake of Coniston, Conis- ton, Conyngs Tun, Conyngeston, Thorstane's watter, Turstinus.
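The extract suggests one record per place with a unique ID, a topographical category, a primary name, a list of secondary names and a regional placement. A minimal sketch of such a record and of an alias index that resolves any listed name to its record; the ID value "hypothetical-001" is a placeholder (real IDs and the regional placement are not shown in the extract), and case-insensitive lookup is an assumption rather than the project's documented method.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GazetteerEntry:
    """One place record, mirroring the columns in the gazetteer extract."""
    unique_id: str
    topog_cat: str
    primary_name: str
    secondary_names: list[str] = field(default_factory=list)
    regional_placement: str | None = None  # not shown in the extract

    def all_names(self):
        return [self.primary_name, *self.secondary_names]


def build_alias_index(entries):
    """Map every lower-cased name to the IDs of the entries that use it."""
    index: dict[str, list[str]] = {}
    for entry in entries:
        for name in entry.all_names():
            ids = index.setdefault(name.lower(), [])
            if entry.unique_id not in ids:
                ids.append(entry.unique_id)
    return index


coniston = GazetteerEntry(
    unique_id="hypothetical-001",  # placeholder ID
    topog_cat="lake",
    primary_name="Coniston",
    secondary_names=[
        "Thurstan", "Coniston Lake", "Coniston Water", "Thurston", "Conistone",
        "Conistone Lake", "Cunnistone Lake", "Thurston Lake", "Coniston Mere",
        "Lake of Coniston", "Conis- ton", "Conyngs Tun", "Conyngeston",
        "Thorstane's watter", "Turstinus",
    ],
)

index = build_alias_index([coniston])
print(index["thurston lake"])  # ['hypothetical-001']
```

Mapping each name to a list of IDs rather than a single ID leaves room for names shared by several places, such as a lake and the village it is named after.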
Geospatial categories were chosen for their flexibility and their degree of universal referential specificity
An extract from the latest iteration of the corpus, allowing referential relationships between places to be analysed in far greater detail.
[Figure: annotated corpus extract; category labels shown include Lake, Vale, Specific - Farm, and Waterfall]
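The final speaker note mentions geospatial relationships between environmental types and connective strengths between paired locations. One simple way to operationalise this, offered only as an assumption, is to count how often two places co-occur in the same sentence and aggregate those counts by category pair. The place-to-category mapping and the per-sentence place lists below are invented for illustration; `pair_strengths` is a hypothetical helper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical place -> category mapping, using categories named on the slide.
CATEGORY = {
    "Coniston": "Lake",
    "Windermere": "Lake",
    "Borrowdale": "Vale",
    "Lodore": "Waterfall",
}


def pair_strengths(sentences):
    """Count sentence-level co-occurrences of place pairs and of category pairs.

    Each sentence is given as the list of places already identified in it.
    """
    place_pairs = Counter()
    category_pairs = Counter()
    for places in sentences:
        unique = sorted(set(places))  # count each pair at most once per sentence
        for a, b in combinations(unique, 2):
            place_pairs[(a, b)] += 1
            category_pairs[tuple(sorted((CATEGORY[a], CATEGORY[b])))] += 1
    return place_pairs, category_pairs


sentences = [
    ["Borrowdale", "Lodore"],
    ["Lodore", "Borrowdale", "Windermere"],
    ["Coniston", "Windermere"],
]
places, categories = pair_strengths(sentences)
print(places[("Borrowdale", "Lodore")])  # 2
print(categories[("Lake", "Lake")])      # 1
```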

Speaker Notes

  1. Overview of corpus…
  2. Our interest in finding what attributes are given to places mentioned…
  3. The Edinburgh Geoparser: an NLP tool on which we’ve relied
  4. What the Geoparser does…
  5. The Geoparser output is a bit ropey…
  6. Much correction required…
  7. One of the chief reasons for the poor performance of the geoparser is place-name variation…
  8. Geospatial relationships between environmental types as well as connective strengths between any paired locations.