1
A Service Perspective: Unlocking metadata to enhance
discoverability and context
Peter Burnhill
Director, EDINA National Data Centre,
University of Edinburgh, Scotland UK
Text Mining for Scholarly Communications and Repositories
Joint Workshop, Manchester, 28-29 October 2009
3
Time & Place For Text Mining
1. The service imperative: online services for education
2. Show time: text mining applications
3. Some concluding remarks, and an unveiling!
4
Time & Place For Text Mining
1. The service imperative: online services for education
– enhancing discoverability of online information objects
– providing context for the analysis of those data
Transforming projects into services for research
[& learning/teaching]
2. Show time: text mining applications
A. Geo-enabling via geo-parsing & geo-tagging
B. Geo-activating metadata; evoking the event
C. Enriching audio-visual material by event-tagging
3. Some concluding remarks, and an unveiling!
Using text mining to Unlock space/time co-ordinates
For Words, numbers, pictures, sounds!
5
Service context: Edinburgh and beyond
Edinburgh University Data Library, set up in 1983 [pre-EDINA]
– 25 years as ‘an online library of research data’
Based in Centre for Application Software & Technology (PLU)
• databases and graphics; access to programmers
• Linking spatially-referenced datasets to geographic boundaries
• interest in hypertext, ‘actionable’ metadata & SGML (TEI)
Hosted at Edinburgh Regional Computing Centre
• Leveraging multi-access computing & networking infrastructure
Faculty & what we would now call ICT Services worked closely together
• GIS; Joint project activity; movement of staff back and forth
• I have history as a researcher and a senior lecturer
But initial focus of Data Library was on numbers
• encoded data interpretable for secondary analysis via codebook
(NB codebook for datasets as structured text)
6
Present National Service context: UK and beyond
EDINA, a national academic data centre
With stated mission...
to enhance the productivity of research, learning and teaching
in UK higher and further education
• Provides networked access to a range of online resources
• Focus is on service but also undertake R&D (projects into services)
– deliver over 20 online services
– have about 10 major projects (including services in development)
– deploy about 80 staff, in two locations (Edinburgh & Warrington)
• These include (although not commented on here):
– Technical support for UK Access Management Federation
– Work to ensure continuity of access to digital content (esp. journal content)
[Diagram labels: UK funding councils; Research Councils UK; JISC Sub-Committees; JISC Collections; National Data Centres, acting as platform for network-level services & helping to build the JISC Integrated Information Environment; research, learning & teaching in UK universities & colleges]
8
Service context: UK and beyond
EDINA was designated in 1995/6, coincident with the
emergence of the Web, with a need to shift further beyond
encoded numbers into indexed words
New task was to host and provide user interfaces for leading
Abstract and Indexing databases for scholarly literature
• BIOSIS, Compendex, INSPEC, Art Abstracts, Periodical Contents,
Index to The Times, CABI, etc
But always an eye on developing as UK Geographic Data Unit
• UKBORDERS (boundaries and geo-reference database) for ESRC
• Digimap (Ordnance Survey mapping data) for JISC
Switch to ‘What EDINA Does Now’
• Community Report & Website
10
Reading & Reference Room: supporting scholarly communication
the Depot: international Open Access facility to support self deposit of peer-reviewed papers
SUNCAT: UK serials union catalogue: what’s held where
No longer host specialist Abstract & Index databases …
11
Map & Data Place: Geo-data Specialists
• Services: Digimap, UKBORDERS, AgCensus, Go-Geo!, GeoCrossWalk
• Projects, such as
– EuroGeonames (funded under EU eContentplus)
– SEE-GEO – A&A of geo (grid) web services
– COMPASS – semantic discovery of geo-services
– DIaD – Data Integration and Dissemination
• History of Service Hosting
• Community engagement and outreach e.g.
– Chair OGC Universities WG
– Sit on Scottish Government GI steering group
– Acting as contact point between OGC & e-Science GRID communities
– Lead on INSPIRE Directive for UK HE
12
Digimap - One Architecture, Many Services (2004+)
[Architecture diagram; recoverable labels]
• Layers: User tools (clients); Common application servers; Common data(base) servers; Common hardware infrastructure
• Data sources: Ordnance Survey digital map data; Historic map images; Geological map data; Marine map data; Data from another Supplier
• Servers: Vector map server; Raster map server; Database server; File based data server
• Clients: Digimap Classic; Digimap Carto; Historic map viewer; Data Download Client; Web Services (e.g. to 3rd party Portal); end-user desktop/browser
14
Resource Discovery using Go-Geo!
‘federated’ search portal using geography
• User defines the area of interest as well as subject of interest
• Go-Geo! does ‘federated’ search of 3rd party indexes & data
services, bringing back links to geo-referenced data sources
• promotes awareness of geospatial aspect of problems
– facilitates understanding
– provides access to geographically related resources
• Part of larger strategy
– to enhance geo-parsing (extract place names from documents)
– geo-tagging (ensure names have geo-feet)
– cross-walk across terminology (place/area/address/co-ordinates)
15
A portal (Go-Geo!) needs a resolver (GeoCrossWalk)
bridging the different vocabularies of geography:
‘Find resources for this postcode’ used to geo-reference survey data
[Diagram labels: GeoCrossWalk Server; Portal service; Content Providers A, B and C; Place names; Parish names; Coordinate footprints]
Example values shown: Post code L34 0HS; Knowsley; 340900,392300 - 347217,397660; BX003
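A minimal sketch of what such a resolver does, assuming a tiny in-memory table. The single record reuses the example values shown on this slide; treating them as one linked record is an assumption, and the names `CROSSWALK`, `normalise` and `resolve` are hypothetical, not the GeoCrossWalk interface.

```python
from typing import Optional

# One illustrative record linking the vocabularies shown on the slide.
# Assumption: these values describe the same place; field names are made up.
CROSSWALK = [
    {
        "postcode": "L34 0HS",
        "place_name": "Knowsley",
        "area_code": "BX003",
        # (min_easting, min_northing, max_easting, max_northing)
        "footprint": (340900, 392300, 347217, 397660),
    },
]


def normalise(term: str) -> str:
    """Compare terms case-insensitively and ignoring spaces."""
    return "".join(term.split()).upper()


def resolve(term: str, source: str, target: str) -> Optional[object]:
    """Translate a term from one geographic vocabulary to another."""
    for record in CROSSWALK:
        if normalise(str(record[source])) == normalise(term):
            return record[target]
    return None  # no record uses this term in the source vocabulary


print(resolve("l34 0hs", "postcode", "footprint"))
print(resolve("Knowsley", "place_name", "postcode"))
```

A portal could call something like `resolve` once per content provider, translating the user's postcode into whichever vocabulary that provider indexes by.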
16
Sound & Pictures: access to new data sources
• 20th Century is the first fully audio-visual century
– With new forms of research material to use and to master
• EDINA as platform for downloadable film, video and audio
– Licensed for use in learning, teaching and research
– Wide range of subject coverage, including documentary film
1. Film & Sound Online
– 600 hours of film, digitised for downloading
2. NewsFilm Online
– 3000 hours of material from ITN & Reuters
– Over 4TBs of clips to download
• Plus Education Image Gallery of still photography
• Visual and Sound Materials Portal
• Discovering all sorts of audio-visual material
17
Part 2: Showcase for R&D
Three examples of using text mining to Unlock
space/time co-ordinates: presented here
as Exhibits A, B and C
A. Geo-enabling via geo-parsing & geo-tagging
B. Geo-activating metadata; evoking the event
C. Enriching audio-visual material by event-tagging
18
Geo-enabling via geo-parsing & geo-tagging
Part of overall strategic purpose
– to enhance discoverability of online resources
– to provide context for their analysis
• geo-parsing (to extract place names from documents)
• geo-tagging (to ensure names have geo-feet)
• cross-walking across terminology
Q: What’s special about the spatial?
A: (Geo-)referencing!
Exhibit A
19
Back to Basics: What is geospatial data?
• data with direct reference to position on surface of the earth
• either latitude / longitude
• or grid coordinates
• or it is textual, eg placename, administrative area, address …
[Figure labels: Spatial reference; Attributes]
Simplest application is to parse & put place names on a map!
We use Statistical Accounts of Scotland as Reference Text Test-bed
22
Re-thinking Geo-parsing as Text-mining …
From an NLP point of view: automatic processing of a text to
1. identify the strings that are place names
2. check each place name against an authority file (gazetteer)
3. select among multiple entries, choose the ‘correct’ one
QED: each place name in the text is linked to an entry in a gazetteer
which provides information about the place including lat/long
Text Analysis Strategy:
• Information extraction/text mining techniques
• Rule-based, not statistical
• Pipeline of linguistic analysis components, starting with simple
components and becoming progressively more complex
• Each component adds information in the form of XML mark-up
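The strategy above can be sketched as a chain of small rule-based components, each adding XML mark-up to what the previous one produced. This is a toy illustration, not the Edinburgh Geoparser: the gazetteer entries, the capitalisation rule, the attribute names and all function names are assumptions, and "keep the first entry" stands in for real disambiguation.

```python
import re
import xml.etree.ElementTree as ET

# Toy gazetteer (hypothetical entries; coordinates are approximate).
GAZETTEER = {
    "Leith": [{"lat": 55.98, "long": -3.17}],
    "Edinburgh": [{"lat": 55.95, "long": -3.19}],
}


def tokenise(text: str) -> ET.Element:
    """Component 1: wrap every word or punctuation mark in a <w> element."""
    doc = ET.Element("text")
    for token in re.findall(r"\w+|[^\w\s]", text):
        ET.SubElement(doc, "w").text = token
    return doc


def mark_candidates(doc: ET.Element) -> ET.Element:
    """Component 2 (rule): a capitalised token is a possible place name."""
    for w in doc.iter("w"):
        if w.text[0].isupper():
            w.set("cand", "yes")
    return doc


def lookup(doc: ET.Element) -> ET.Element:
    """Component 3: check candidates against the gazetteer, keep the first entry."""
    for w in doc.iter("w"):
        entries = GAZETTEER.get(w.text, []) if w.get("cand") else []
        if entries:
            w.set("lat", str(entries[0]["lat"]))
            w.set("long", str(entries[0]["long"]))
    return doc


doc = lookup(mark_candidates(tokenise("The port of Leith lies near Edinburgh.")))
print(ET.tostring(doc, encoding="unicode"))
```

Note that "The" is marked as a candidate by the crude rule but gains no coordinates, because the gazetteer check rejects it.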
23
The Edinburgh Geoparser
• Began as part of the GeoCrossWalk project. Link made between EDINA and
Language Technology Group at UoE School of Informatics
• Testing place name recognition in Scotland-centric documents (Statistical
Accounts).
1. Use of GeoCrossWalk gazetteer as lexicon but no linking to gazetteer
2. Demonstrator for place name recognition and GeoCrossWalk gazetteer
access but no disambiguation
3. Demonstrator for place name recognition and GeoCrossWalk and GeoNames
gazetteer access plus disambiguation
• GeoDigRef project refined and configured the system for JISC-funded
projects of digitisation of key collections:
a) Histpop: 200,000 pages of census and registration material for the British Isles.
b) BOPCRIS: 18th Century Official Parliamentary Publications Portal 1688 - 1834
c) British Library Sound Archive
d) The Stormont Papers
Edinburgh Geoparser: System Overview
[Pipeline diagram; recoverable labels]
Input formats: .txt, .html, .xml
Format conversion
Text Analysis: Tokenisation, POS tagging, Lemmatisation, Named Entity Recognition
Geo-referencing: Gazetteer lookup, Resolution
Output files: .gaz.xml, .geotagged.xml
25
Output from Gazetteer Lookup
• <placename> element containing a number of
candidate <place> elements from the gazetteer.
• These elements have as attributes:
– lat (latitude)
– long (longitude)
– gazref (two-part id: gazetteer name and id returned by the gazetteer, e.g. geonames:12345)
– in-cc (ISO country code, where available)
– type (feature type)
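As an illustration of how a client might read such output, here is a short sketch using Python's standard XML parser. The sample document is invented: only the element and attribute names come from the list above, while the `name` attribute, ids, coordinates and type codes are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup output; attribute names follow the list above,
# the ids, coordinates and types are invented for illustration.
SAMPLE = """
<placename name="Perth">
  <place lat="56.39" long="-3.43" gazref="geonames:11111" in-cc="GB" type="ppl"/>
  <place lat="-31.95" long="115.86" gazref="geonames:22222" in-cc="AU" type="ppl"/>
</placename>
"""


def candidates(xml_text: str) -> list:
    """Return one dict per candidate <place>, with numeric coordinates."""
    root = ET.fromstring(xml_text)
    result = []
    for place in root.findall("place"):
        # Split the two-part id into gazetteer name and gazetteer id.
        gazetteer, _, ident = place.get("gazref").partition(":")
        result.append({
            "gazetteer": gazetteer,
            "id": ident,
            "lat": float(place.get("lat")),
            "long": float(place.get("long")),
            "country": place.get("in-cc"),  # None where not available
            "type": place.get("type"),
        })
    return result


for cand in candidates(SAMPLE):
    print(cand)
```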
26
Resolution Stage [a longer look at Q&A]
• Try to augment information about each candidate place.
– by consulting lists of large places: from geonames and Wikipedia.
– if place in the list with the same name and similar latitude and longitude
(within one degree), assume it's the right one.
• The information added is population and (containing) country
• Then apply a number of heuristics:
– Feature type: we prefer populated places to "facilities".
– Population: we prefer bigger places (for newspaper text,
we found that > 90% could be correctly identified using this alone).
– Contextual information: collocations indicating containment and proximity
("London, England", "Leith near Edinburgh").
– Parameters from the user: restricting the results to a certain region, or at
least strongly preferring that region.
– Clustering: places in a document are often close together - try minimising
the bounding box.
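A rough sketch of how some of these heuristics could be combined into a single score. The candidate records, the type codes `ppl` and `fac`, the weights and the function names are all assumptions made for illustration; the contextual and clustering heuristics are left out to keep the example short.

```python
import math

# Hypothetical candidates for one place name; all values are illustrative.
CANDIDATES = [
    {"name": "Perth", "type": "ppl", "population": 47000, "country": "GB"},
    {"name": "Perth", "type": "ppl", "population": 2000000, "country": "AU"},
    {"name": "Perth", "type": "fac", "population": 0, "country": "GB"},
]


def score(cand: dict, preferred_country: str = "") -> float:
    """Combine the heuristics into one number; the weights are arbitrary."""
    total = 0.0
    if cand["type"] == "ppl":  # prefer populated places to facilities
        total += 1.0
    total += math.log10(cand["population"] + 1) / 10  # prefer bigger places
    if preferred_country and cand["country"] == preferred_country:
        total += 2.0  # user parameter: strongly prefer a region
    return total


def resolve(cands: list, preferred_country: str = "") -> dict:
    """Pick the highest-scoring candidate."""
    return max(cands, key=lambda c: score(c, preferred_country))


print(resolve(CANDIDATES)["country"])
print(resolve(CANDIDATES, preferred_country="GB")["country"])
```

With no user preference the larger place wins on population; restricting to a region overrides that, which mirrors the "strongly preferring that region" option.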
29
Geo-activated metadata; evoking the event
Purpose
• to extract and re-present information, including geo-spatial and
temporal referencing from archaeological site records
– A way of linking items and collections
– Beyond (but also enabling) simple mapping
* Improving discoverability and providing context
• This uses NLP techniques across different sources
– Find and categorise entities (NER); Find and label relations (RE)
– Combining in a unified format: an RDF graph
– Linking entities common across sources; discovering new connections
• Focus on the ‘has Event’ relation of an excavation
– Many (all?) cultural collections have (hidden) space/time co-ordinates
This is taken from work by Kate Byrne,
also illustrated in the poster on display as:
"Populating the Semantic Web with Relations from Text"
Exhibit B
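To make the idea of a unified graph concrete, here is a minimal sketch that stores extracted entities and relations as subject-predicate-object triples in plain Python. It is not the method used in the work cited above: every identifier, predicate name and value is invented, and a real system would use an RDF library and proper URIs.

```python
# Each triple is (subject, predicate, object); identifiers are invented.
triples = set()


def add(subject: str, predicate: str, obj: str) -> None:
    """Record one relation in the graph."""
    triples.add((subject, predicate, obj))


# Entities and relations as NER and RE might extract them from two sources.
add("site:42", "hasEvent", "event:excavation-1")
add("event:excavation-1", "hasDate", "1967")
add("event:excavation-1", "hasAgent", "person:j-smith")
add("site:42", "hasLocation", "place:orkney")
add("photo:7", "depicts", "site:42")  # second source, same site identifier


def about(entity: str) -> list:
    """All triples in which the entity is subject or object."""
    return sorted(t for t in triples if entity in (t[0], t[2]))


for triple in about("site:42"):
    print(triple)
```

Because both sources use the same identifier for the site, the photograph becomes linked to the excavation event without either source stating that connection directly.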
An archaeological site record: a technical summary
Parse to ‘mark up’ archaeological site record (metadata)
34
Enriching audio-visual material via event-tagging
Purpose: to find ways to leverage contextual metadata for audio-visual material
• initial focus on (digitised) 20th Century newsfilm footage
• Overcoming inherent problem of sparse metadata that inhibits
discoverability
– using ancillary information in the metadata
– evoking ‘has Event’ relation
– exploiting interoperability
– finding related text for mining and so auto-creating metadata
* to improve discoverability and provide/enhance context
This is a set of ideas for future work
between EDINA, UoE/LTG and NaCTeM
Exhibit C
Sparse Metadata
Very Short Clip
No Soundtrack
The only data we have:
• 1st October 1955
• Cyprus
• Disturbance (street disturbance)
• British soldiers
• Broadcast on TV News
Related material in archives, collections and web services (open or authenticated) can be compared
The Times Archive seems to have several relevant items as part of full page images…
Two Cyprus stories in the Times Archive lead to a rich contextual artefact.
“Cyprus 1955” in Life Magazine reveals a short US perspective and a pictorial context
Bringing in text and links to other resources allows a much richer view onto the
original news film and allows exploration of both the topic and the wider context.
41
Some Concluding Remarks
1) Michael Buckland spoke of two traditions in
Information Science …
42
document tradition & computation tradition
“considerable simplification, … helpful to think … of two traditions, or
mentalities, even cultures, co-exist in area of Information Science
1. “Approaches based on a concern with documents, with signifying
records: archives, bibliography, documentation, librarianship,
records management, and the like
2. “approaches based on uses for formal techniques, whether
mechanical (such as punch cards and data-processing
equipment) or mathematical (as in algorithmic procedures).”
Michael Buckland, UC Berkeley, 1998
http://people.ischool.berkeley.edu/~buckland/asis62.html
43
Re-learning basics of information retrieval
Combine document & computation traditions to
contribute to growth of digital library
• Using ancillary information
• to sort/filter etc in the ‘too many’ problem
• as leverage in the ‘too few’ problem
• exploiting the digital properties of data
• to add value, impose/extract meaning, do sums and visualise
• Exploiting interoperability and ‘server vs client’
44
Some Concluding Remarks
1. Buckland spoke of two traditions
2. What meaning is there locked away in numbers,
words, pictures and sounds?
45
“Numbers, Words, Pictures & Sounds …
… soon all will be digital and accessed from afar”
IASSIST Conference, 1990
a) Words as text are full of meaning, including key space/time
co-ordinates ripe for extracting and geo-enabling.
b) The meaning in Numbers is often encoded, but datasets
have codebooks and other associated documentation –
these texts could provide the key to unlock some co-ordinates
c) The richness of meaning in audio-visual material, sounds &
pictures, is pre-theoretic,
• requiring more of us, including some lateral thinking, as
illustrated in Exhibit C.
• But then these digital and online resources can be discovered
and regarded as evidence
47
Some Concluding Remarks
1. Buckland spoke of two traditions
2. What meaning is there locked away in numbers,
words, pictures and sounds?
3. Now is the Time & Place For Text Mining
4. Time for an unveiling:
Unlock
51
Time for me to stop
Hoping that I have left some space/place for questions
Thank you
Acknowledgements
to Claire Grover & Kate Byrne (School of Informatics)
to Jo Walsh (EDINA)
• And to all their colleagues who did the work
• And, of course, to JISC for funding, and to you as taxpayers …

Using a dumb identifier to do smart thingsUsing a dumb identifier to do smart things
Using a dumb identifier to do smart things
 
Reliance project introduction
Reliance project introductionReliance project introduction
Reliance project introduction
 
WEBINAR: "How to manage your data to make them open and fair"
WEBINAR:  "How to manage your data to make them open and fair"  WEBINAR:  "How to manage your data to make them open and fair"
WEBINAR: "How to manage your data to make them open and fair"
 
Ecloud copenhagen-130625074823-phpapp01
Ecloud copenhagen-130625074823-phpapp01Ecloud copenhagen-130625074823-phpapp01
Ecloud copenhagen-130625074823-phpapp01
 
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the CloudEuropeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
Europeana Cloud Work Package 1: Assessing Researchers' Needs in the Cloud
 
ICT project idea: the Danube Data Cube
ICT project idea: the Danube Data Cube ICT project idea: the Danube Data Cube
ICT project idea: the Danube Data Cube
 
The Developing Needs for e-infrastructures
The Developing Needs for e-infrastructuresThe Developing Needs for e-infrastructures
The Developing Needs for e-infrastructures
 
Archiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award Ceremony
 
Archiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award CeremonyArchiver pilot phase kick off Award Ceremony
Archiver pilot phase kick off Award Ceremony
 
Science Demonstrator Session: Social and Earth Sciences
Science Demonstrator Session: Social and Earth SciencesScience Demonstrator Session: Social and Earth Sciences
Science Demonstrator Session: Social and Earth Sciences
 
Presentations from ICT 2015 in Lisbon
Presentations from ICT 2015 in LisbonPresentations from ICT 2015 in Lisbon
Presentations from ICT 2015 in Lisbon
 
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...
Alastair Dunning, Europeana Cloud: The Project and the Challenges of Assessin...
 
Preparing for CoreTrustSeal Accreditation: FAIR Data, Trust Principles and Cu...
Preparing for CoreTrustSeal Accreditation: FAIR Data, Trust Principles and Cu...Preparing for CoreTrustSeal Accreditation: FAIR Data, Trust Principles and Cu...
Preparing for CoreTrustSeal Accreditation: FAIR Data, Trust Principles and Cu...
 
INSPIRE - ensuring access or continuity of access?
INSPIRE - ensuring access or continuity of access?INSPIRE - ensuring access or continuity of access?
INSPIRE - ensuring access or continuity of access?
 
Bonares presentation oct2016v2
Bonares presentation oct2016v2Bonares presentation oct2016v2
Bonares presentation oct2016v2
 

Mais de EDINA, University of Edinburgh

We have the technology... We have the data... What next?
We have the technology... We have the data... What next?We have the technology... We have the data... What next?
We have the technology... We have the data... What next?EDINA, University of Edinburgh
 
Reference Rot in Theses: A HiberActive Pilot - 10x10 session for Repository F...
Reference Rot in Theses: A HiberActive Pilot - 10x10 session for Repository F...Reference Rot in Theses: A HiberActive Pilot - 10x10 session for Repository F...
Reference Rot in Theses: A HiberActive Pilot - 10x10 session for Repository F...EDINA, University of Edinburgh
 
If I Googled You, What Would I Find? Managing your digital footprint - Nicola...
If I Googled You, What Would I Find? Managing your digital footprint - Nicola...If I Googled You, What Would I Find? Managing your digital footprint - Nicola...
If I Googled You, What Would I Find? Managing your digital footprint - Nicola...EDINA, University of Edinburgh
 
Managing your Digital Footprint : Taking control of the metadata and tracks a...
Managing your Digital Footprint : Taking control of the metadata and tracks a...Managing your Digital Footprint : Taking control of the metadata and tracks a...
Managing your Digital Footprint : Taking control of the metadata and tracks a...EDINA, University of Edinburgh
 
Social media and blogging to develop and communicate research in the arts and...
Social media and blogging to develop and communicate research in the arts and...Social media and blogging to develop and communicate research in the arts and...
Social media and blogging to develop and communicate research in the arts and...EDINA, University of Edinburgh
 
Enhancing your research impact through social media - Nicola Osborne
Enhancing your research impact through social media - Nicola OsborneEnhancing your research impact through social media - Nicola Osborne
Enhancing your research impact through social media - Nicola OsborneEDINA, University of Edinburgh
 
Social Media in Marketing in Support of Your Personal Brand - Nicola Osborne
Social Media in Marketing in Support of Your Personal Brand - Nicola OsborneSocial Media in Marketing in Support of Your Personal Brand - Nicola Osborne
Social Media in Marketing in Support of Your Personal Brand - Nicola OsborneEDINA, University of Edinburgh
 
Best Practice for Social Media in Teaching & Learning Contexts - Nicola Osborne
Best Practice for Social Media in Teaching & Learning Contexts - Nicola OsborneBest Practice for Social Media in Teaching & Learning Contexts - Nicola Osborne
Best Practice for Social Media in Teaching & Learning Contexts - Nicola OsborneEDINA, University of Edinburgh
 
Digimap for Schools: Introduction to an ICT based cross curricular resource f...
Digimap for Schools: Introduction to an ICT based cross curricular resource f...Digimap for Schools: Introduction to an ICT based cross curricular resource f...
Digimap for Schools: Introduction to an ICT based cross curricular resource f...EDINA, University of Edinburgh
 

Mais de EDINA, University of Edinburgh (20)

The Making of the English Landscape:
The Making of the English Landscape: The Making of the English Landscape:
The Making of the English Landscape:
 
Spatial Data, Spatial Humanities
Spatial Data, Spatial HumanitiesSpatial Data, Spatial Humanities
Spatial Data, Spatial Humanities
 
Land Cover Map 2015
Land Cover Map 2015Land Cover Map 2015
Land Cover Map 2015
 
We have the technology... We have the data... What next?
We have the technology... We have the data... What next?We have the technology... We have the data... What next?
We have the technology... We have the data... What next?
 
Reference Rot in Theses: A HiberActive Pilot - 10x10 session for Repository F...
Reference Rot in Theses: A HiberActive Pilot - 10x10 session for Repository F...Reference Rot in Theses: A HiberActive Pilot - 10x10 session for Repository F...
Reference Rot in Theses: A HiberActive Pilot - 10x10 session for Repository F...
 
GeoForum EDINA report 2017
GeoForum EDINA report 2017GeoForum EDINA report 2017
GeoForum EDINA report 2017
 
If I Googled You, What Would I Find? Managing your digital footprint - Nicola...
If I Googled You, What Would I Find? Managing your digital footprint - Nicola...If I Googled You, What Would I Find? Managing your digital footprint - Nicola...
If I Googled You, What Would I Find? Managing your digital footprint - Nicola...
 
Moray housemarch2017
Moray housemarch2017Moray housemarch2017
Moray housemarch2017
 
Uniof stirlingmarch2017secondary
Uniof stirlingmarch2017secondaryUniof stirlingmarch2017secondary
Uniof stirlingmarch2017secondary
 
Uniof glasgow jan2017_secondary
Uniof glasgow jan2017_secondaryUniof glasgow jan2017_secondary
Uniof glasgow jan2017_secondary
 
Managing your Digital Footprint : Taking control of the metadata and tracks a...
Managing your Digital Footprint : Taking control of the metadata and tracks a...Managing your Digital Footprint : Taking control of the metadata and tracks a...
Managing your Digital Footprint : Taking control of the metadata and tracks a...
 
Social media and blogging to develop and communicate research in the arts and...
Social media and blogging to develop and communicate research in the arts and...Social media and blogging to develop and communicate research in the arts and...
Social media and blogging to develop and communicate research in the arts and...
 
Enhancing your research impact through social media - Nicola Osborne
Enhancing your research impact through social media - Nicola OsborneEnhancing your research impact through social media - Nicola Osborne
Enhancing your research impact through social media - Nicola Osborne
 
Social Media in Marketing in Support of Your Personal Brand - Nicola Osborne
Social Media in Marketing in Support of Your Personal Brand - Nicola OsborneSocial Media in Marketing in Support of Your Personal Brand - Nicola Osborne
Social Media in Marketing in Support of Your Personal Brand - Nicola Osborne
 
Best Practice for Social Media in Teaching & Learning Contexts - Nicola Osborne
Best Practice for Social Media in Teaching & Learning Contexts - Nicola OsborneBest Practice for Social Media in Teaching & Learning Contexts - Nicola Osborne
Best Practice for Social Media in Teaching & Learning Contexts - Nicola Osborne
 
SCURL and SUNCAT serials holdings comparison service
SCURL and SUNCAT serials holdings comparison serviceSCURL and SUNCAT serials holdings comparison service
SCURL and SUNCAT serials holdings comparison service
 
Big data in Digimap
Big data in DigimapBig data in Digimap
Big data in Digimap
 
Digimap for Schools: Introduction to an ICT based cross curricular resource f...
Digimap for Schools: Introduction to an ICT based cross curricular resource f...Digimap for Schools: Introduction to an ICT based cross curricular resource f...
Digimap for Schools: Introduction to an ICT based cross curricular resource f...
 
Digimap Update - Geoforum 2016 - Guy McGarva
Digimap Update - Geoforum 2016 - Guy McGarvaDigimap Update - Geoforum 2016 - Guy McGarva
Digimap Update - Geoforum 2016 - Guy McGarva
 
Data, Data, Data - Geoforum 2016 - Phil Bartie
Data, Data, Data - Geoforum 2016 - Phil BartieData, Data, Data - Geoforum 2016 - Phil Bartie
Data, Data, Data - Geoforum 2016 - Phil Bartie
 

Último

ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 

Último (20)

ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Raw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptxRaw materials used in Herbal Cosmetics.pptx
Raw materials used in Herbal Cosmetics.pptx
 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
 
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptxYOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
YOUVE_GOT_EMAIL_PRELIMS_EL_DORADO_2024.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 

Unlocking metadata through text mining

  • 1. 1 A Service Perspective: Unlocking metadata to enhance discoverability and context Peter Burnhill Director, EDINA National Data Centre, University of Edinburgh, Scotland UK Text Mining for Scholarly Communications and Repositories Joint Workshop, Manchester, 28-29 October 2009
  • 2. 2 A Service Perspective: Unlocking metadata to enhance discoverability and context Time & Place For Text Mining Peter Burnhill Director, EDINA National Data Centre, University of Edinburgh, Scotland UK Text Mining for Scholarly Communications and Repositories Joint Workshop, Manchester, 28-29 October 2009
  • 3. 3 Time & Place For Text Mining 1. The service imperative: online services for education 2. Show time: text mining applications 3. Some concluding remarks, and an unveiling!
  • 4. 4 Time & Place For Text Mining 1. The service imperative: online services for education – enhancing discoverability of online information objects – providing context for the analysis of those data Transforming projects into services for research [& learning/teaching] 2. Show time: text mining applications A. Geo-enabling via geo-parsing & geo-tagging B. Geo-activating metadata; evoking the event C. Enriching audio-visual material by event-tagging 3. Some concluding remarks, and an unveiling! Using text mining to Unlock space/time co-ordinates For Words, numbers, pictures, sounds!
  • 5. 5 Service context: Edinburgh and beyond Edinburgh University Data Library, set up in 1983 [pre-EDINA] – 25 years as ‘an online library of research data’ Based in Centre for Application Software & Technology (PLU) • databases and graphics; access to programmers • Linking spatially-referenced datasets to geographic boundaries • interest in hypertext, ‘actionable’ metadata & SGML (TEI) Hosted at Edinburgh Regional Computing Centre • Leveraging multi-access computing & networking infrastructure Faculty & what we would now call ICT Services worked closely together • GIS; Joint project activity; movement of staff back and forth • I have history as a researcher and a senior lecturer But initial focus of Data Library was on numbers • encoded data interpretable for secondary analysis via codebook (NB codebook for datasets as structured text)
  • 6. 6 Present National Service context: UK and beyond EDINA, a national academic data centre With stated mission... to enhance the productivity of research, learning and teaching in UK higher and further education • Provides networked access to a range of online resources • Focus is on service but also undertake R&D (projects services) – deliver over 20 online services – have about 10 major projects (including services in development) – deploy about 80 staff, in two locations (Edinburgh & Warrington) • These include (although not commented on here): – Technical support for UK Access Management Federation – Work to ensure continuity of access to digital content (esp. journal content)
  • 7. UK funding councils JISC Sub-Committees JISC Collections acting as platform for network-level services & helping to build the JISC Integrated Information Environment research, learning & teaching in UK universities & colleges Research Councils UK National Data Centres
  • 8. 8 Service context: UK and beyond EDINA was designated in 1995/6, coincident with the emergence of the Web, with the need to shift further beyond encoded numbers into indexed words New task was to host and provide user interfaces for leading Abstract and Indexing databases for scholarly literature • BIOSIS, Compendex, INSPEC, Art Abstracts, Periodical Contents, Index to The Times, CABI, etc But always an eye on developing as UK Geographic Data Unit • UKBORDERS (boundaries and geo-reference database) for ESRC • Digimap (Ordnance Survey mapping data) for JISC Switch to ‘What EDINA Does Now’ • Community Report & Website
  • 10. 10 Reading & Reference Room: supporting scholarly communication the Depot international Open Access facility to support self deposit of peer-reviewed papers SUNCAT UK serials union catalogue: what’s held where No longer host specialist Abstract & Index databases …
  • 11. 11 Map & Data Place: Geo-data Specialists • Services: Digimap, UKBORDERS, AgCensus, Go-Geo!, GeoCrossWalk • Projects, such as – EuroGeonames (funded under EU e-Content.Plus) – SEE-GEO – A&A of geo (grid) web services – COMPASS – semantic discovery of geo-services – DIaD – Data Integration and Dissemination • History of Service Hosting • Community engagement and outreach e.g. – Chair OGC Universities WG – Sit on Scottish Government GI steering group – Acting as contact point between OGC & e-Science GRID communities – Lead on INSPIRE Directive for UK HE
  • 12. 12 User tools (clients) Common application servers Common data(base) servers Common hardware infrastructure Ordnance Survey digital map data Historic map images Geological map data Marine map data Data from another Supplier Vector map server Raster map server Database server File based data server Historic map viewer Digimap Classic end-user desktop/browser Digimap Carto Data Download Client Web Services e.g. to 3rd party Portal Digimap - One Architecture, Many Services (2004+)
  • 14. 14 Resource Discovery using Go-Geo! ‘federated’ search portal using geography • User defines the area of interest as well as subject of interest • Go-Geo! does ‘federated’ search of 3rd party indexes & data services, bringing back links to geo-referenced data sources • promotes awareness of geospatial aspect of problems – facilitates understanding – provides access to geographically related resources • Part of larger strategy – to enhance geo-parsing (extract place names from documents) – geo-tagging (ensure names have geo-feet) – cross-walk across terminology (place/area/address/co-ordinates)
  • 15. 15 A portal (Go-Geo!) needs a resolver (GeoCrossWalk) GeoCrossWalk Server Content Provider A Content Provider B Coordinate footprints Content Provider C Parish names Place names Portal service Post code: L34 0HS bridging the different vocabularies of geography: ‘Find resources for this postcode’ used to geo-reference survey data Knowsley 340900,392300 - 347217,397660 BX003
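The resolver idea on this slide can be sketched as a lookup that turns one geographic vocabulary (a postcode) into another (a named area with a coordinate footprint). This is a minimal illustration only, not the GeoCrossWalk implementation: the class and function names are hypothetical, the single table row reuses the values shown on the slide, and the pairing of postcode, area name, bounding box and the code BX003 is assumed from the slide layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Footprint:
    """A named area with a bounding box in grid coordinates (hypothetical type)."""
    name: str
    code: str
    min_easting: int
    min_northing: int
    max_easting: int
    max_northing: int

    def contains(self, easting: int, northing: int) -> bool:
        """True if the grid point falls inside the bounding box."""
        return (self.min_easting <= easting <= self.max_easting
                and self.min_northing <= northing <= self.max_northing)


# Toy crosswalk table: one row, using the values printed on the slide.
# The association between these values is an assumption.
CROSSWALK = {
    "L34 0HS": Footprint("Knowsley", "BX003", 340900, 392300, 347217, 397660),
}


def normalise_postcode(raw: str) -> str:
    """Upper-case and collapse whitespace so variant spellings match."""
    return " ".join(raw.upper().split())


def resolve_postcode(raw: str) -> Optional[Footprint]:
    """Map a postcode to its area footprint, or None if unknown."""
    return CROSSWALK.get(normalise_postcode(raw))
```

A portal could then pass the returned bounding box to each content provider that indexes its holdings by coordinates rather than by postcode.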
  • 16. 16 Sound & Pictures: access to new data sources • 20th Century is the first fully audio-visual century – With new forms of research material to use and to master • EDINA as platform for downloadable film, video and audio – Licensed for use in learning, teaching and research – Wide range of subject coverage, including documentary film 1. Film & Sound Online – 600 hours of film, digitised for downloading 2. NewsFilm Online – 3000 hours of material from ITN & Reuters – Over 4TBs of clips to download • Plus Education Image Gallery of still photography • Visual and Sound Materials Portal • Discovering all sorts of audio-visual material
  • 17. 17 Part 2: Showcase for R&D Three examples of using text mining to Unlock space/time co-ordinates: presented here as Exhibits A, B and C A. Geo-enabling via geo-parsing & geo-tagging B. Geo-activating metadata; evoking the event C. Enriching audio-visual material by event-tagging
  • 18. 18 Geo-enabling via geo-parsing & geo-tagging Part of overall strategic purpose – to enhance discoverability of online resources – to provide context for their analysis • geo-parsing (to extract place names from documents) • geo-tagging (to ensure names have geo-feet) • cross-walking across terminology Q: What’s special about the spatial? A: (Geo-)referencing! Exhibit A
  • 19. 19 Back to Basics: What is geospatial data? • data with direct reference to position on surface of the earth • either latitude / longitude • or grid coordinates • or it is textual, eg placename, administrative area, address … Spatial reference Attributes
  • 20. Simplest application is to parse and put place names on a map! We use Statistical Accounts of Scotland as Reference Text Test-bed
  • 22. 22 Re-thinking Geo-parsing as Text-mining … From an NLP point of view: automatic processing of a text to 1. identify the strings that are place names 2. check each place name against an authority file (gazetteer) 3. select among multiple entries, choosing the ‘correct’ one QED: each place name in the text is linked to an entry in a gazetteer which provides information about the place including lat/long Text Analysis Strategy: • Information extraction/text mining techniques • Rule-based, not statistical • Pipeline of linguistic analysis components, starting with simple components and becoming progressively more complex • Each component adds information in the form of XML mark-up
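The three steps on this slide (identify, look up, select) can be sketched as a tiny rule-based pipeline. This is an illustrative toy, not the Edinburgh Geoparser: the gazetteer entries, their `toy:` identifiers and the "prefer a given country" selection rule are all assumptions, and the coordinates are rough values for illustration only.

```python
import re

# Toy gazetteer (hypothetical ids, approximate coordinates).
GAZETTEER = {
    "Perth": [
        {"gazref": "toy:1", "lat": 56.4, "long": -3.4, "country": "GB"},
        {"gazref": "toy:2", "lat": -31.9, "long": 115.9, "country": "AU"},
    ],
    "Leith": [
        {"gazref": "toy:3", "lat": 56.0, "long": -3.2, "country": "GB"},
    ],
}


def identify(text):
    """Step 1: a simple rule. Capitalised tokens that appear in the lexicon."""
    tokens = re.findall(r"\b[A-Z][a-z]+\b", text)
    return [t for t in tokens if t in GAZETTEER]


def lookup(name):
    """Step 2: fetch all candidate gazetteer entries for a name."""
    return GAZETTEER.get(name, [])


def select(candidates, prefer_country="GB"):
    """Step 3: pick one candidate. Here: first entry in the preferred country."""
    for candidate in candidates:
        if candidate["country"] == prefer_country:
            return candidate
    return candidates[0] if candidates else None


def geoparse(text, prefer_country="GB"):
    """Run the three steps and link each place name to one gazetteer entry."""
    return {name: select(lookup(name), prefer_country) for name in identify(text)}
```

For example, `geoparse("The parish of Leith lies north of Perth.")` links "Leith" and "Perth" to the GB entries, while "The" is ignored because it is not in the lexicon.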
  • 23. 23 The Edinburgh Geoparser • Began as part of the GeoCrossWalk project. Link made between EDINA and Language Technology Group at UoE School of Informatics • Testing place name recognition in Scotland-centric documents (Statistical Accounts). 1. Use of GeoCrossWalk gazetteer as lexicon but no linking to gazetteer 2. Demonstrator for place name recognition and GeoCrossWalk gazetteer access but no disambiguation 3. Demonstrator for place name recognition and GeoCrossWalk and GeoNames gazetteer access plus disambiguation • GeoDigRef project refined and configured the system for JISC-funded projects of digitisation of key collections: a) Histpop: 200,000 pages of census and registration material for the British Isles. b) BOPCRIS: 18th Century Official Parliamentary Publications Portal 1688 - 1834 c) British Library Sound Archive d) The Stormont Papers
  • 25. 25 Output from Gazetteer Lookup • <placename> element containing a number of candidate <place> elements from the gazetteer. • These elements have as attributes: – lat (latitude) – long (longitude) – gazref (two-part id: gazetteer name and id returned by gazetteer, e.g. geonames:12345) – in-cc (ISO country code, where available) – type (feature type)
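The mark-up described on this slide can be sketched with Python's standard `xml.etree.ElementTree`. The element and attribute names (`placename`, `place`, `lat`, `long`, `gazref`, `in-cc`, `type`) come from the slide; the `name` attribute on `<placename>`, the helper function and the sample values are assumptions for illustration, not the Geoparser's actual output.

```python
import xml.etree.ElementTree as ET


def placename_element(name, candidates):
    """Build a <placename> element holding one <place> child per candidate."""
    # The "name" attribute is an assumption; the slide does not specify it.
    element = ET.Element("placename", {"name": name})
    for candidate in candidates:
        attributes = {
            "lat": str(candidate["lat"]),
            "long": str(candidate["long"]),
            "gazref": candidate["gazref"],  # "gazetteer:id", e.g. geonames:12345
            "type": candidate["type"],
        }
        # in-cc is only present "where available".
        if candidate.get("in-cc"):
            attributes["in-cc"] = candidate["in-cc"]
        ET.SubElement(element, "place", attributes)
    return element


# Hypothetical candidates: ids, coordinates and types are placeholders.
sample = placename_element("Perth", [
    {"lat": 56.4, "long": -3.4, "gazref": "geonames:12345",
     "in-cc": "GB", "type": "ppl"},
    {"lat": -31.9, "long": 115.9, "gazref": "toy:2", "type": "ppl"},
])
xml_text = ET.tostring(sample, encoding="unicode")
```

Keeping every candidate in the mark-up, rather than choosing one at lookup time, leaves the choice to a later resolution stage.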
  • 26. 26 Resolution Stage [a longer look at Q&A] • Try to augment information about each candidate place. – by consulting lists of large places: from geonames and Wikipedia. – if there is a place in the list with the same name and similar latitude and longitude (within one degree), assume it's the right one. • The information added is population and (containing) country • Then apply a number of heuristics: – Feature type: we prefer populated places to "facilities". – Population: we prefer bigger places (for newspaper text, we found that > 90% could be correctly identified using this alone). – Contextual information: collocations indicating containment and proximity ("London, England", "Leith near Edinburgh"). – Parameters from the user: restricting the results to a certain region, or at least strongly preferring that region. – Clustering: places in a document are often close together - try minimising the bounding box.
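A few of the heuristics on this slide can be sketched as an augmentation step followed by a ranking key. This is a simplified illustration under stated assumptions: the ordering of the heuristics (user region first, then feature type, then population) is a choice made here, the contextual and clustering heuristics are omitted, and all names, population figures and coordinates are rough placeholders.

```python
def similar_position(a, b, tolerance=1.0):
    """True if two places are within `tolerance` degrees in both lat and long."""
    return (abs(a["lat"] - b["lat"]) <= tolerance
            and abs(a["long"] - b["long"]) <= tolerance)


def augment(candidate, large_places):
    """Copy population and country from a matching entry in a large-places list."""
    for big in large_places:
        if big["name"] == candidate["name"] and similar_position(candidate, big):
            return {**candidate,
                    "population": big["population"],
                    "country": big["country"]}
    return candidate


def rank_key(candidate, preferred_country=None):
    """Sort key: user region, then populated place over facility, then size."""
    return (
        preferred_country is not None
        and candidate.get("country") == preferred_country,
        candidate.get("type") == "ppl",
        candidate.get("population", 0),
    )


def resolve(candidates, large_places, preferred_country=None):
    """Augment every candidate, then return the best-ranked one."""
    augmented = [augment(c, large_places) for c in candidates]
    return max(augmented, key=lambda c: rank_key(c, preferred_country))


# Illustrative data only: round-number populations, approximate coordinates.
LARGE_PLACES = [
    {"name": "Perth", "lat": -31.95, "long": 115.86,
     "population": 2_000_000, "country": "AU"},
    {"name": "Perth", "lat": 56.40, "long": -3.43,
     "population": 47_000, "country": "GB"},
]
CANDIDATES = [
    {"name": "Perth", "gazref": "toy:1", "lat": 56.4, "long": -3.4, "type": "ppl"},
    {"name": "Perth", "gazref": "toy:2", "lat": -31.9, "long": 115.9, "type": "ppl"},
    {"name": "Perth", "gazref": "toy:4", "lat": 40.0, "long": -80.0, "type": "fac"},
]
```

With no user parameter, `resolve(CANDIDATES, LARGE_PLACES)` prefers the larger place; passing `preferred_country="GB"` overrides population, mirroring the "strongly preferring that region" option.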
  • 29. 29 Geo-activated metadata; evoking the event Purpose • to extract and re-present information, including geo-spatial and temporal referencing from archaeological site records – A way of linking items and collections – Beyond (but also enabling) simple mapping * Improving discoverability and providing context • This uses NLP techniques across different sources – Find and categorise entities (NER); Find and label relations (RE) – Combining in a unified format: an RDF graph – Linking entities common across sources; discovering new connections • Focus on the ‘has Event’ relation of an excavation – Many (all?) cultural collections have (hidden) space/time co-ordinates This is taken from work by Kate Byrne, also illustrated in the poster on display as: "Populating the Semantic Web with Relations from Text" Exhibit B
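The idea of combining extracted relations into one graph and linking entities shared across sources can be sketched with plain subject-predicate-object triples. This is a toy stand-in for an RDF graph, not the method used in the work cited on the slide: every identifier and predicate name below (including `hasEvent`, `hasPlace`, `hasTime`, `mentions`) is a hypothetical placeholder.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def merge(*sources: Set[Triple]) -> Set[Triple]:
    """Union several triple sets; identical identifiers link up automatically."""
    return set().union(*sources)


def objects(graph: Set[Triple], subject: str, predicate: str):
    """All objects reachable from `subject` via `predicate`, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def subjects(graph: Set[Triple], predicate: str, obj: str):
    """All subjects that point at `obj` via `predicate`, sorted."""
    return sorted(s for s, p, o in graph if p == predicate and o == obj)


# Relations extracted from a site record (placeholder identifiers).
site_record = {
    ("ex:site1", "hasEvent", "ex:excavation1"),
    ("ex:excavation1", "hasPlace", "ex:placeA"),
    ("ex:excavation1", "hasTime", "ex:periodX"),
}
# Relations extracted from a second source that mentions the same event.
other_source = {
    ("ex:report7", "mentions", "ex:excavation1"),
}
graph = merge(site_record, other_source)
```

Because both sources use the same identifier for the excavation, a query on the merged graph connects the report to the site's place and time, which neither source states on its own.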
  • 30. An archaeological site record: a technical summary
  • 31. Parse to ‘mark up’ archaeological site record (metadata)
  • 34. 34 Enriching audio-visual material via event-tagging Purpose: to find ways to leverage contextual metadata for audio-visual material • initial focus on (digitised) 20th Century newsfilm footage • Overcoming inherent problem of sparse metadata that inhibits discoverability – using ancillary information in the metadata – evoking ‘has Event’ relation – exploiting interoperability – finding related text for mining and so auto-creating metadata * to improve discoverability and provide/enhance context This is a set of ideas for future work between EDINA, UoE/LTG and NaCTeM Exhibit C
  • 35. Sparse Metadata: Very Short Clip, No Soundtrack. The only data we have: • 1st October 1955 • Cyprus • Disturbance (street disturbance) • British soldiers • Broadcast on TV News
  • 36. Related material in archives, collections and web services (open or authenticated) can be compared
  • 37. The Times Archive seems to have several relevant items as part of full page images…
  • 39. Two Cyprus stories in the Times Archive lead to a rich contextual artefact.
  • 40. “Cyprus 1955” in Life Magazine reveals a short US perspective and a pictorial context. Bringing in text and links to other resources allows a much richer view onto the original news film and allows exploration of both the topic and the wider context.
  • 41. Some Concluding Remarks 1) Michael Buckland spoke of two traditions in Information Science …
  • 42. document tradition & computation tradition “considerable simplification, … helpful to think … of two traditions, or mentalities, even cultures, co-exist in area of Information Science” 1. “Approaches based on a concern with documents, with signifying records: archives, bibliography, documentation, librarianship, records management, and the like” 2. “approaches based on uses for formal techniques, whether mechanical (such as punch cards and data-processing equipment) or mathematical (as in algorithmic procedures).” Michael Buckland, UC Berkeley, 1998 http://people.ischool.berkeley.edu/~buckland/asis62.html
  • 43. Re-learning basics of information retrieval: combine document & computation traditions to contribute to the growth of the digital library • Using ancillary information – to sort/filter etc. in the ‘too many’ problem – as leverage in the ‘too few’ problem • Exploiting the digital properties of data – to add value, impose/extract meaning, do sums and visualise • Exploiting interoperability and ‘server vs client’
  • 44. Some Concluding Remarks 1. Buckland spoke of two traditions 2. What meaning is there locked away in numbers, words, pictures and sounds?
  • 45. “Numbers, Words, Pictures & Sounds … soon all will be digital and accessed from afar” IASSIST Conference, 1990 a) Words as text are full of meaning, including key space/time co-ordinates ripe for extracting and geo-enabling. b) The meaning in Numbers is often encoded, but datasets have codebooks and other associated documentation; these texts could provide the key to unlock some co-ordinates. c) The richness of meaning in audio-visual material, sounds & pictures, is pre-theoretic, • requiring more of us, including some lateral thinking, as illustrated in Exhibit C. • But then these digital and online resources can be discovered and regarded as evidence.
  • 46. Some Concluding Remarks 1. Buckland spoke of two traditions 2. What meaning is there locked away in numbers, words, pictures and sounds? 3. Now is the Time & Place For Text Mining
  • 47. Some Concluding Remarks 1. Buckland spoke of two traditions 2. What meaning is there locked away in numbers, words, pictures and sounds? 3. Now is the Time & Place For Text Mining 4. Time for an unveiling:
  • 51. Time for me to stop, hoping that I have left some space/place for questions. Thank you. Acknowledgements to Claire Grover & Kate Byrne (School of Informatics) and to Jo Walsh (EDINA) • And to all their colleagues who did the work • And, of course, to JISC for funding, and to you as taxpayers …