Profiling systems have achieved notable adoption by research institutions.1 Multi-site search of research profiling systems has substantially evolved since the first deployment of systems such as DIRECT2Experts.2 CTSAsearch is a federated search engine using VIVO-compliant Linked Open Data (LOD) published by members of the NIH-funded Clinical and Translational Science Award (CTSA) consortium and other interested parties. Sixty-four institutions are currently included, spanning six distinct platforms and three continents (North America, Europe and Australia). In aggregate, CTSAsearch has data on 150-300 thousand unique researchers and their 10 million publications. The public interface is available at http://research.icts.uiowa.edu/polyglot.
1. Clinical and Translational Science Institute / CTSI at the University of California, San Francisco
CrossLinks – Towards a Single VIVO Profile Ecosystem
David Eichmann and Eric Meeks
VIVO Conference, August 2015
6. Technical Items
• Built in Java with Fuseki (twice) and uses the foaf, geo & the experimental r2r ontologies: https://github.com/CTSIatUCSF/Crosslinks
• If you want this, it is freely available as an ORNG gadget or through a public API as JSON-LD
• Data sourced from CTSAsearch, DBpedia and RNS systems (images and sometimes RDF)
7.
8. Statistics
The data is quite a few months old….
• SELECT (COUNT(DISTINCT ?a) AS ?n) WHERE { ?r <http://ucsf.edu/ontology/r2r#hasAffiliation> ?a } -> 38 RNS systems
• SELECT (COUNT(DISTINCT ?r) AS ?n) WHERE { ?r <http://ucsf.edu/ontology/r2r#hasAffiliation> ?a } -> 159539 researcher URIs
• SELECT (COUNT(DISTINCT ?r) AS ?n) WHERE { ?r <http://ucsf.edu/ontology/r2r#hasAffiliation> ?a . ?r <http://xmlns.com/foaf/0.1/publications> ?cw } -> 88955 researchers with publications
• SELECT (COUNT(DISTINCT ?r) AS ?n) WHERE { ?r <http://ucsf.edu/ontology/r2r#hasAffiliation> ?a . ?r <http://xmlns.com/foaf/0.1/publications> ?cw . ?er <http://xmlns.com/foaf/0.1/publications> ?cw . ?er <http://ucsf.edu/ontology/r2r#hasAffiliation> ?ea FILTER(?ea != ?a) } -> 45030 researchers with external co-authors
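To make the four counts above concrete, here is a minimal Python sketch that computes the same statistics over a tiny in-memory list of triples. The two predicate URIs come from the queries on this slide; the `ex:` researcher, site and publication identifiers and the `crosslink_stats` helper are made up for illustration and are not part of Crosslinks.

```python
# Predicates taken from the slide's queries.
AFFIL = "http://ucsf.edu/ontology/r2r#hasAffiliation"
PUBS = "http://xmlns.com/foaf/0.1/publications"

# Toy (subject, predicate, object) triples; every "ex:" name is invented.
triples = [
    ("ex:alice", AFFIL, "ex:siteA"),
    ("ex:bob", AFFIL, "ex:siteB"),
    ("ex:carol", AFFIL, "ex:siteA"),
    ("ex:dan", AFFIL, "ex:siteB"),
    ("ex:alice", PUBS, "ex:pub1"),
    ("ex:bob", PUBS, "ex:pub1"),    # shared with alice, who is at another site
    ("ex:carol", PUBS, "ex:pub2"),  # no co-author from another site
]


def crosslink_stats(triples):
    """Compute the four counts from the slide over a list of triples."""
    affiliation = {}  # researcher -> set of affiliations
    authors_of = {}   # publication -> set of researchers
    for s, p, o in triples:
        if p == AFFIL:
            affiliation.setdefault(s, set()).add(o)
        elif p == PUBS:
            authors_of.setdefault(o, set()).add(s)

    with_pubs = set()
    with_external = set()
    for authors in authors_of.values():
        for r in authors:
            if r not in affiliation:
                continue  # the queries require ?r to have an affiliation
            with_pubs.add(r)
            for er in authors:
                # Mirrors FILTER(?ea != ?a): some affiliation pair must differ.
                if any(a != ea
                       for a in affiliation[r]
                       for ea in affiliation.get(er, ())):
                    with_external.add(r)
                    break

    return {
        "systems": len(set().union(*affiliation.values())),
        "researchers": len(affiliation),
        "with_publications": len(with_pubs),
        "with_external_coauthors": len(with_external),
    }


if __name__ == "__main__":
    print(crosslink_stats(triples))
```

As in the last query, a researcher counts as having an external co-author as soon as one shared publication links them to someone with a different affiliation.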
9. Why This Matters
• It’s cool. It looks nice, it has icons and thumbnails and a map.
• One of the first “full cycle” VIVO LOD features.
• IMDb-style user experience, better traffic to our web sites.
• Researchers are already collaborating across walls, we need to show this.
• The “egocentric view of the ecosystem” is a powerful use case.
10. Next Steps
• Grants and Clinical Trials
• Publication centric view
• Improve disambiguation services
• Active collaborations & other networks
• Activity streams (pump.io)
• Refresh the data
This is real linked open data. Linked in the literal sense. You click on a link here and it takes you to some data. And it’s not our data, it’s somebody else’s data. That’s the whole point of LOD.
We grab geocode data from DBpedia as well as other things.
Data reuse
Crosslinks evolved in phases. Started with the idea of creating a registry of URIs to researchers via disambiguation.
Next we decided to grab the data ourselves from the various RNS web sites we knew about.
Even though there is a standard way of ingesting LOD once you find the URIs, we have no standard way of finding the URIs from a system (and it really is ridiculous), so we had to build different crawlers for VIVO, Loki, Profiles (depending on the version) and threw in Stanford CAP while we were at it.
Found out Dave had already done this so just started using his data. Mostly….
These favicons were hard to get, harder than they should be. In the RNS world we care so much about semantic data standards but we are not so great with common web best practices (a later talk).
City of Hope! We love them because they cherish this feature and they are from Monrovia, CA. If that rings a bell it’s because you shop at Trader Joe’s.
Their URIs have IP addresses in them.
First form used CSV files, then RDBMS, then finally RDF.
Data is available as a public web service. Plug in your URI and request “serialization format” and we will spit out the results.
Note the:
hasIcon
Lat
Long
That’s all from DBpedia.
Also note the “workinfohomepage” for SEO-friendly pretty URLs.
The image is a locally stored thumbnail to protect against downtime.
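As a rough illustration of the "plug in your URI and request a serialization format" idea, the sketch below only builds a request URL. The endpoint `https://example.org/crosslinks/api` and the parameter names `uri` and `format` are placeholders invented for this example; the real service's address and parameters are not given in these notes and may differ.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not the real Crosslinks API).
BASE_URL = "https://example.org/crosslinks/api"


def build_request_url(researcher_uri, serialization="JSON-LD"):
    """Return a GET URL asking for one researcher's crosslinks data.

    The parameter names "uri" and "format" are assumptions.
    """
    query = urlencode({"uri": researcher_uri, "format": serialization})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    # The researcher URI below is a made-up example.
    print(build_request_url("http://profiles.example.edu/profile/123"))
```

The researcher URI is percent-encoded because it is itself a full URL being passed as a query parameter.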
VIVO isn’t about “linkable open data”, it’s about linked open data. We need to figure out more ways to build these links; producing linkable content is not enough. Crosslinks is just a start (more on this tomorrow). A cool, pretty start with nice images but just a start. L for Linked, not Linkable.
There is something to be said about creating a system where your site being good can help my site be better.
Researchers like having profile pages that showcase their work. We’ve seen that, and when media links to our site it’s a real win. Many of them are working in a crosslinks fashion and they aren’t doing it to put hyperlinks on their profile page, they’re doing it because it’s an effective way of getting work done (like Dave and myself). If our pages are supposed to reflect their work, we should really show those connections.
This is going to take some time and crosslinks is just a small step towards building this out. By ecosystem we mean big, not just one network but the network of networks, which is how our researchers actually do work. Because of network effects, bigger is better. A social networking system that recognizes the user in the context of a meaningful network is a powerful thing. Can be powerful for messaging, powerful for search, powerful for marketing & content delivery. Facebook knows this, LinkedIn knows this. And it has to be real; MySpace didn’t know this and that’s why it didn’t fare so well. The internet is a big place and it’s just more efficient when the tools you use online know who you are and how you are connected to things (if you trust those tools).
A single system like Facebook or LinkedIn cannot deliver this to our users as well as we can, because of provenance. Our systems will always be more real, and more valuable. And they will also be distributed because we don’t all work at the same place. Building out a network of networks (VIVO ecosystem) is the only way to have your cake (the provenance) and eat it too (the full network), and we are in a very special position to build this out.
Industry can’t do it. We have an opportunity to build out the really cool and useful network.
It’s a real shame PubMed does not support LOD and link out to all the author URIs we’ve created. But they don’t, so we all need to do it. Should be able to do this in “real time” with Dave’s data and our UI (a good combo).
Add the same for grants and clinical trials
When you find the same publication owned by two different URIs with the same or similar name, you can see if it’s a “same as” issue or an incorrectly claimed pub, and fix it.
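One way to surface those cases for review could look like the sketch below. It groups authorship claims by publication and flags pairs of different URIs whose names look alike. The claim tuples, the `find_conflicts` helper and the 0.85 similarity threshold are assumptions for illustration, not the disambiguation service itself. Deciding whether a flagged pair is a "same as" or a wrongly claimed publication is still left to a person.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    """Case-insensitive similarity ratio between two names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_conflicts(claims, threshold=0.85):
    """Flag publications claimed by two different URIs with similar names.

    claims: iterable of (author_uri, author_name, publication_id) tuples.
    threshold: arbitrary cut-off chosen for this sketch.
    Returns a list of (publication_id, uri_1, uri_2) tuples to review.
    """
    by_pub = {}  # publication_id -> {author_uri: author_name}
    for uri, name, pub in claims:
        by_pub.setdefault(pub, {})[uri] = name

    conflicts = []
    for pub, authors in by_pub.items():
        for (u1, n1), (u2, n2) in combinations(sorted(authors.items()), 2):
            if name_similarity(n1, n2) >= threshold:
                conflicts.append((pub, u1, u2))
    return conflicts


if __name__ == "__main__":
    # Made-up claims: two sites each list a "Jane Smith" on the same paper.
    sample = [
        ("http://a.example/p/1", "Jane Q. Smith", "pmid:1"),
        ("http://b.example/p/9", "Jane Smith", "pmid:1"),
        ("http://c.example/p/3", "Robert Jones", "pmid:1"),
    ]
    for conflict in find_conflicts(sample):
        print(conflict)
```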