This presentation explores emerging trends in linking scholarly literature to data. It discusses entity linking and data linking, presenting examples of how publishers and indexers are connecting local data repositories to internationally hosted related articles. The talk addresses a theoretical framework with four quadrants ranging from easiest to hardest to apply to digital humanities. Examples demonstrate linking publications to supplemental datasets, automating connections between publications and whole datasets, linking to entities, and automated entity recognition without manual markup.
Connecting Publications & Data: Raising visibility of local data collections through linking with international publication databases
1. Abstract: Connecting locally hosted data repositories to internationally hosted related articles has never
been easier. APIs and other web services are becoming standardized at the same time that new linking
standards, such as DataCite DOIs, are being adopted, so new ways to distribute and mash up content are now
possible. This presentation will explore emerging trends in linking scholarly literature to data. Both entity
linking and data linking will be discussed. Examples will demonstrate how publishers and A&I vendors are
employing these technologies in cooperation with local data repositories.
__________________________________________
Before I get started, I would like to take a minute to set some expectations for this talk. The examples used
will primarily be about the hard sciences. My challenge to you is to figure out how to apply these technologies
and methods to the digital humanities.
2. This is a theoretical framework for looking at the different ways that publications can be connected
to data.
This is also the agenda for the talk. I will first speak about the top left quadrant and then work my
way to the bottom right. This means starting from the easiest to apply to the humanities and
working through to the hardest.
3. This quadrant is primarily about linking publications to supplemental data.
4. Supplemental data submitted as a file with an article is the traditional way. It has its place, but that
is not what I am talking about today.
5. Instead, new tools now enable display and direct manipulation of data in new and interesting ways.
This example is an application that displays KML files on a Google Map:
http://www.applications.sciverse.com/action/appDetail/298231?zone=main&pageOrigin=appGallery&activity=display
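As a rough illustration of the first step such a map viewer has to perform, the sketch below reads point placemarks out of a KML document using only the Python standard library. It is not the code of the application shown on the slide; the sample document and the function name `placemark_points` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# KML 2.2 namespace, needed to address elements by name.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}

# Hypothetical sample standing in for a supplementary KML file.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>Site A</name>
      <Point><coordinates>20.46,44.82,0</coordinates></Point>
    </Placemark>
    <Placemark>
      <name>Site B</name>
      <Point><coordinates>19.84,45.25,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""


def placemark_points(kml_text):
    """Return (name, latitude, longitude) for every point placemark."""
    root = ET.fromstring(kml_text)
    points = []
    for placemark in root.iter("{http://www.opengis.net/kml/2.2}Placemark"):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext("kml:Point/kml:coordinates", namespaces=KML_NS)
        if coords is None:
            continue  # skip placemarks that are not simple points
        # KML stores coordinates as longitude,latitude[,altitude].
        lon, lat = (float(value) for value in coords.strip().split(",")[:2])
        points.append((name, lat, lon))
    return points
```

The resulting list of names and coordinates is what a mapping widget would need in order to place markers; the drawing itself is left out here.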
6. Next on the agenda is automating the connection between publications and whole supplementary
or related datasets.
7. One example of this is the PANGAEA app, which searches the PANGAEA APIs by article DOI, retrieves the
coordinates of where the supplementary data was collected, and charts them on a Google map displayed
directly on the ScienceDirect article page.
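The lookup described here can be sketched as follows. This is only an illustration of the idea, not the PANGAEA app or its API: the in-memory `RECORDS` list stands in for the repository's web service, and the DOIs, the `DatasetRecord` type, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    dataset_doi: str
    article_doi: str
    latitude: float
    longitude: float


# Hypothetical records; a real app would request these from the
# repository's web service instead of holding them in memory.
RECORDS = [
    DatasetRecord("10.9999/example.dataset.1", "10.9999/example.article.1", 44.82, 20.46),
    DatasetRecord("10.9999/example.dataset.2", "10.9999/example.article.1", 45.25, 19.84),
    DatasetRecord("10.9999/example.dataset.3", "10.9999/example.article.2", 43.32, 21.90),
]


def normalize_doi(doi):
    """Strip common prefixes and lowercase, since DOIs are case-insensitive."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def markers_for_article(article_doi, records):
    """Return one map marker per dataset linked to the given article DOI."""
    wanted = normalize_doi(article_doi)
    return [
        {"dataset_doi": r.dataset_doi, "lat": r.latitude, "lng": r.longitude}
        for r in records
        if normalize_doi(r.article_doi) == wanted
    ]
```

Calling `markers_for_article("doi:10.9999/EXAMPLE.article.1", RECORDS)` returns the two markers for the first hypothetical article, which a map component could then plot on the article page.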
8. This also works on Scopus record pages (so for lots of publishers and journals). Once the decision was
made to put it on Scopus as well, it took the PANGAEA developer less than 24 hours to implement. This
was enabled by the SciVerse Applications platform.
9. Users can link through to the main record for the dataset on PANGAEA. One thing I would like to
mention here is that there is also a DOI for the dataset. This was done through DataCite.
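Because the dataset has a DOI, linking to it does not require knowing where the repository hosts the record. A minimal sketch, assuming the public `https://doi.org/` resolver and a made-up DOI, might build the link like this (the function name `doi_to_url` is an assumption for this example):

```python
from urllib.parse import quote


def doi_to_url(doi):
    """Build a resolver link for a DOI, percent-encoding unsafe characters."""
    # The slash between prefix and suffix is kept; other reserved
    # characters in the suffix are percent-encoded.
    return "https://doi.org/" + quote(doi.strip(), safe="/")


# Hypothetical dataset DOI, used only for illustration.
print(doi_to_url("10.9999/example.dataset.1"))
```

The same link keeps working if the repository later moves the record, as long as the DOI registration is updated.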
10. So what is DataCite, and why is it important? It provides DOIs for datasets, which makes it very
important for creating links to data in repositories.
11. Takeaway points: The International DOI Foundation enables CrossRef to give out DOIs. DataCite
is roughly equivalent to CrossRef. You can learn more at the DataCite website. A central institution in
Serbia might want to become a Member Institute.
12. So those were examples of linking to whole datasets and displaying them in new and interesting
ways. Next to discuss is linking to entities.
13. Traditional linking involves an author marking up an entity such as a protein so that it can be easily
linked to additional information about that entity in a different database. While this is useful, it is
not what I wish to share with you today. Why make a user follow a link when…
14. You can now embed a 3D interactive model of the protein directly in context in the article. In this
example the PDB Protein Viewer is embedded directly in the article.
15. In this example an author adds key structures to the article and they are then embedded using
Reaxys information and software.
17. The last examples still required an author to manually mark up entities. Through text analysis and
mining, this is no longer always necessary.
18. In this example, our partner NextBio automatically recognizes entities in the text of the
article.
* Easily extendable to new / other entities
* Works retrospectively on older content
* Does create recall / precision errors
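To make the recall and precision point concrete, here is a small sketch of dictionary-based recognition. It is a deliberately simple stand-in, not the method NextBio uses; the lexicon, the sample sentence, and the function names are assumptions made for this example.

```python
import re

# Hypothetical lexicon; extending it is how new entity types get added.
LEXICON = {"BRCA1": "gene", "CAT": "gene", "insulin": "protein"}

TEXT = "BRCA1 and p53 were measured after insulin treatment and a CAT scan."


def recognize(text, lexicon):
    """Return (start, end, term, type) for each lexicon term found in text."""
    if not lexicon:
        return []
    # Longest terms first so that longer names win over their prefixes.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    return [(m.start(), m.end(), m.group(0), lexicon[m.group(0)])
            for m in pattern.finditer(text)]


def precision_recall(predicted, gold):
    """Compare predicted terms with a hand-checked gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


found = [term for _, _, term, _ in recognize(TEXT, LEXICON)]
# "CAT" in "CAT scan" is a false positive (hurts precision);
# "p53" is missing from the lexicon, a false negative (hurts recall).
print(found, precision_recall(found, {"BRCA1", "p53", "insulin"}))
```

Because nothing in the article needs to be marked up in advance, the same function can be run over older content unchanged, which is the retrospective benefit noted above.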
19. Not only can it display them in the sidebar, but the application framework enables adding links to
the entities in the text on the fly.
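Adding links on the fly can be pictured as a rewrite of the article text at display time. The sketch below is an assumption about how this might be done in general, not a description of the application framework itself; the `example.org` URLs and the function name `link_entities` are hypothetical.

```python
import html
import re

# Hypothetical mapping from recognized entity to a database page.
ENTITY_URLS = {
    "BRCA1": "https://example.org/entity/BRCA1",
    "insulin": "https://example.org/entity/insulin",
}


def link_entities(text, entity_urls):
    """Return HTML in which each known entity is wrapped in a link."""
    if not entity_urls:
        return html.escape(text)
    terms = sorted(entity_urls, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the plain text between entities so it is safe to display.
        parts.append(html.escape(text[last:match.start()]))
        term = match.group(0)
        parts.append('<a href="%s">%s</a>' % (html.escape(entity_urls[term]), html.escape(term)))
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)
```

The stored article is never modified; the links exist only in the rendered view, so the entity list can be updated without touching the published text.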
20. A reader can then click those links for additional information from multiple databases.
21. 1. The application colours and tags gene, protein, and molecule names.
2. Clicking a term shows a summary of its features (such as a sequence or 2D structure).
3. The user can click on links in the pop-up that lead out to more information.
23. * To summarize, we started with very traditional linking of datasets, where an author submits the dataset with the
article. One example of how this can be improved was the interactive map viewer that displays supplementary KML
files rather than simply attaching the files to the article.
* Next we discussed automated linking to datasets. This included the example of searching PANGAEA APIs for
related datasets and then displaying the locations where the data was collected. This will be driven by new standards
such as DataCite.
* Third, authors manually mark up entities that can be linked to in other databases. Now it is possible to embed
content from other databases using APIs.
* Last is totally automated entity recognition using text analysis and mining. Again, information from third-party
databases can be embedded directly in the article itself.
* While I haven’t spoken too much about the technologies enabling these new ways of linking articles to data, one
example is the SciVerse Application Framework, which now enables all of the examples discussed today.
http://www.applications.sciverse.com/action/userhome
24. I would like to close with the same questions I opened with. Thank you.