Outreach: A critical intersection of libraries and technology
Facilitating Open Science and Research Discovery via VIVO and the Semantic Web
1. Facilitating Open Science and Research Discovery via VIVO and the Semantic Web
Kristi Holmes, PhD
Bioinformaticist
Becker Medical Library
http://vivo.wustl.edu/display/n4754
Twitter: @kristiholmes
December 5, 2011
Facilitating Open Science and Research Discovery via VIVO and the Semantic Web by Kristi L. Holmes is licensed under
a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
3. Public, structured linked data about investigators' interests, activities, and accomplishments, and tools to use that data to advance science
4. What is VIVO?
• An open-source semantic web application that enables the discovery of research and scholarship across disciplines in an institution.
• Populated with detailed profiles of faculty and researchers, displaying items such as publications, teaching, service, and professional affiliations.
• A powerful search functionality for locating people and information within or across institutions.
5. A VIVO profile allows you to:
• Find potential colleagues by research area, authorship, and collaborations.
• Showcase credentials, expertise, skills, and professional achievements.
• Connect within focus areas and geographic expertise.
• Simplify reporting tasks and link data to external applications, e.g., to generate biosketches or CVs.
• Publish the URL or link the profile to other applications.
• Display visualizations of complex research networks and relationships.
6. VIVO harvests data from verified sources
Internal data sources (I):
• HR Directory
• Office of Sponsored Research
• Institutional Repositories
• Registrar System
• Faculty Activity Systems
• Events and Seminars
External data sources (I):
• Publication warehouses, e.g. PubMed, Web of Science, and more.
• Grant databases, e.g. NSF/NIH
• National Organizations: AAAS, AMA, etc.
Faculty and unit administrators can then add additional information to their profile. (M)
Data stored as RDF triples using standard ontology
VIVO data is available for reuse by web pages, applications, and
other consumers both within and outside the institution.
7. How does VIVO store data?
Information is stored using the Resource Description Framework (RDF) and
data are structured in the form of “triples” as subject-predicate-object.
Concepts and their relationships use a shared ontology to facilitate the
harvesting of data from multiple sources.
Figure: example triples (Subject, Predicate, Object)
• Jane Smith, is member of, Dept. of Genetics; College of Medicine
• Jane Smith, has affiliations with, Genetics Institute
• Jane Smith, author of, Journal article; Book; Book chapter
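The subject-predicate-object structure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the statements are taken from the Jane Smith example on the slide, plain strings stand in for the URIs a real triple store would use, and the `Triple` type and `match` helper are hypothetical names, not part of VIVO.

```python
from typing import NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Statements from the slide's example diagram (plain strings, not real URIs).
TRIPLES = [
    Triple("Jane Smith", "is member of", "Dept. of Genetics"),
    Triple("Jane Smith", "has affiliations with", "Genetics Institute"),
    Triple("Jane Smith", "author of", "Journal article"),
    Triple("Jane Smith", "author of", "Book chapter"),
]


def match(triples, subject: Optional[str] = None,
          predicate: Optional[str] = None, obj: Optional[str] = None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Everything Jane Smith is recorded as having authored.
authored = [t.obj for t in match(TRIPLES, subject="Jane Smith", predicate="author of")]
print(authored)
```

Because every fact has the same three-part shape, data harvested from different sources can be appended to the same list and queried with the same pattern matching.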
10. Using VIVO data
By storing data in VIVO in RDF and using standard
ontologies, the information in VIVO can either be
displayed in a human readable web page or delivered
directly to other systems as RDF. This allows the open
researcher data in VIVO to be harvested, aggregated, and
integrated into the Linked Open Data cloud.
VIVO enables authoritative data about researchers to become part of the Linked Data cloud.
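As a rough sketch of the "human-readable page or RDF" idea, the same statements can be rendered two ways. The `http://example.org/` URIs, the `memberOf` property, and the helper functions below are assumptions made for illustration; they are not VIVO's actual URIs, ontology terms, or code. The RDF output uses the N-Triples line format.

```python
import html

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/"  # hypothetical namespace, not a real VIVO instance

PERSON = BASE + "individual/jane-smith"
# Each statement: (subject IRI, predicate IRI, object, object_is_iri)
STATEMENTS = [
    (PERSON, RDFS_LABEL, "Jane Smith", False),
    (PERSON, BASE + "ontology/memberOf", BASE + "individual/dept-genetics", True),
]


def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslash, quote, and newline."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(statements) -> str:
    """Machine-readable view: one N-Triples line per statement."""
    lines = []
    for s, p, o, o_is_iri in statements:
        obj = f"<{o}>" if o_is_iri else literal(o)
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


def to_html(statements) -> str:
    """Human-readable view: a simple HTML list built from the same data."""
    items = "".join(
        f"<li>{html.escape(p)}: {html.escape(o)}</li>"
        for _s, p, o, _is_iri in statements
    )
    return f"<ul>{items}</ul>"


ntriples = to_ntriples(STATEMENTS)
page = to_html(STATEMENTS)
print(ntriples)
print(page)
```

Both views are generated from one set of statements, which is what lets other systems harvest the RDF while people browse the page.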
11. The Semantic Web & Researcher Networking
• Increasing recognition of the value of semantic web standards
• Increasing momentum in support of semantic web
technologies to facilitate research discovery
• Recommendations for researcher networking recently
endorsed by the CTSA Consortium Steering Committee
represent a new standard in researcher networking.
– Read more at http://vivoweb.org/blog
• Examples of applications that consume these rich data
include: visualizations, enhanced multi-site search, and
VIVO Searchlight. Other utilities are in development across a
wide range of topic areas.
12. Notable SemWeb projects
• DBpedia is a community effort to extract structured information from Wikipedia and to make this
information available on the Web.
• NextBio is a database consolidating high-throughput life sciences experimental data tagged and
connected via biomedical ontologies.
• GoPubMed is a semantic search engine for the life sciences. It uses the Gene Ontology (GO) and the
Medical Subject Headings (MeSH) to semantically filter millions of biomedical abstracts from
MEDLINE.
• Open PHACTS will create an open innovative platform, Open Pharmacological Space, which will be
freely accessible for knowledge discovery and verification. Open PHACTS will provide a growing
body of data on small molecules, their pharmacological profiles, pharmacokinetics, biological
targets and pathways in a semantically interoperable format. Aligning and integrating proprietary
and public data sources into a single system is currently a very difficult and time-consuming task,
repeated across companies, institutes and academic laboratories.
• Open Government initiatives
• Publications efforts
• DOD
• Federal Profiling
• Many others
21. VIVO Collaboration
University of Florida: Mike Conlon (VIVO and UF PI), Beth Auten, Michael Barbieri, Chris Barnes, Kaitlin Blackburn, Cecilia Botero, Kerry Britt, Erin Brooks, Amy Buhler, Ellie Bushhousen, Linda Butson, Chris Case, Christine Cogar, Valrie Davis, Mary Edwards, Nita Ferree, Rolando Garcia-Milan, George Hack, Chris Haines, Sara Henning, Rae Jesano, Margeaux Johnson, Meghan Latorre, Yang Li, Jennifer Lyon, Paula Markes, Hannah Norton, James Pence, Narayan Raum, Nicholas Rejack, Alexander Rockwell, Sara Russell Gonzalez, Nancy Schaefer, Dale Scheppler, Nicholas Skaggs, Matthew Tedder, Michele R. Tennant, Cary Thomas, Michaeleen Trimarchi, Stephen Williams
Indiana University: Katy Borner (IU PI), Kavitha Chandrasekar, Bin Chen, Shanshan Chen, Ryan Cobine, Jeni Coffey, Suresh Deivasigamani, Ying Ding, Russell Duhon, Jon Dunn, Poornima Gopinath, Julie Hardesty, Brian Keese, Namrata Lele, Micah Linnemeier, Nianli Ma, Robert H. McDonald, Asik Pradhan Gongaju, Mark Price, Michael Stamper, Yuyin Sun, Chintan Tank, Alan Walsh, Brian Wheeler, Feng Wu, Angela Zoss
Cornell University: Dean Krafft (Cornell PI), Manolo Bevia, Jim Blake, Nick Cappadona, Brian Caruso, Jon Corson-Rikert, Elly Cramer, Medha Devare, Elizabeth Hines, Huda Khan, Depak Konidena, Brian Lowe, Joseph McEnerney, Holly Mistlebauer, Stella Mitchell, Anup Sawant, Christopher Westling, Tim Worrall, Rebecca Younes
Washington University School of Medicine in St. Louis: Rakesh Nagarajan (WUSTL PI), Kristi L. Holmes, Caerie Houchins, George Joseph, Sunita B. Koul, Leslie D. McIntosh
Weill Cornell Medical College: Curtis Cole (Weill PI), Paul Albert, Victor Brodsky, Mark Bronnimann, Adam Cheriff, Oscar Cruz, Dan Dickinson, Richard Hu, Chris Huang, Itay Klaz, Kenneth Lee, Peter Michelini, Grace Migliorisi, John Ruffing, Jason Specland, Tru Tran, Vinay Varughese, Virgil Wong
The Scripps Research Institute: Gerald Joyce (Scripps PI), Catherine Dunn, Sam Katkov, Brant Kelley, Paula King, Angela Murrell, Barbara Noble, Alicia Turner
Ponce School of Medicine: Richard J. Noel, Jr. (Ponce PI), Ricardo Espada Colon, Damaris Torres Cruz, Michael Vega Negrón
This project is funded by the National Institutes of Health, U24 RR029822, “VIVO: Enabling National Networking of Scientists”.
22. Acknowledgements
Funding:
• VIVO, NIH award U24 RR029822
• Washington University Institute of Clinical and Translational Sciences, NIH award UL1 RR024992
Collaborations:
• Washington University ICTS, Departments
• VIVO colleagues from across the country
• Becker Library colleagues
• Library colleagues everywhere
Questions:
• holmeskr@wustl.edu
• Twitter: @kristiholmes
• http://vivo.wustl.edu/display/n4754
Thanks!
I work on an open-source, semantic web-based project called VIVO. Our work on VIVO is concerned with representing information about people. Of course this includes the people we likely have in mind right now, who typically participate in some way in the scholarly environment. But increasingly we’ve also had a lot of interest from less traditional participants, such as citizen scientists. It is important to represent a person’s interests, efforts, and areas of expertise: Who are they? What do they do? What do they study and contribute to our understanding of the world around us?
At each implementation, VIVO enables research discovery – providing verifiable information about research and researchers. Each institution provides its own VIVO system and data. Local governance determines data to be provided. Across institutions VIVO provides a uniform semantic structure to enable a new class of tools using the data to advance science.
What is VIVO? It’s a semantic web application with rich profiles that display publications, teaching, service, and professional affiliations, plus faceted search for fast and meaningful results. What do you mean by “Semantic Web”? A group of methods and technologies that allow machines to understand the meaning, or “semantics”, of information on the World Wide Web. Goal of VIVO: Improve all of science by providing the means for sharing and using current, accurate, and precise information regarding scientists’ interests, activities, and accomplishments. Foster team science by providing tools for identifying potential collaborators. Improve collaboration by creating tools that consume this data and repurpose it to enhance new and existing teams. VIVO is not limited to science: at Cornell, VIVO covers all disciplines across the entire institution.
Profiles are largely created via automated data feeds, but can be customized to suit the needs of the individual. Information is open source (free) and is stored in a framework that allows for exporting to other applications. Profiles are richer in content than typical [web pages or] social networking sites and will rank higher in general internet searches.
VIVO harvests much of its data automatically from verified sources, reducing the need for manual data entry, centralizing information, and providing an integrated source. Because much of the data in VIVO profiles is ingested from authoritative sources, it is accurate and current. The rich information in VIVO profiles can be repurposed and shared with other institutional web pages and consumers, reducing cost and increasing efficiency across the institution. Private or sensitive information is never imported into VIVO; only public information is stored and displayed. Data is housed and maintained at the local institutions, where it can be updated on a regular basis. Search results are faceted so information can be located rapidly, with less time spent sorting through results. So where do we get our information for VIVO? So far, agencies, repositories, and aggregators have been identified as sources for VIVO.
Each element (subject, predicate, object) is governed by ontologies with semantics. VIVO 1.2 includes an ontology module representing research resources such as biological specimens, human studies, instruments, organisms, protocols, reagents, and research opportunities. This module is aligned with the top-level ontology classes and properties from the NIH-funded eagle-i Project (https://www.eagle-i.org/home/). We’re also developing extensions to the ontology that would allow more diverse types of efforts to be included in the profiles (blogs are currently available, wiki edits are not), which is very important for microattribution and nanopublication efforts.
VIVO uses linked open data concepts to provide data as RDF at URIs for each scientist. This is critically important for building a web of data. Predicates have addresses, and sites point to objects in other triple stores. This makes it possible to resolve queries across triple stores, e.g. “show investigators whose genetic work is implicated in breast cancer.” VIVO won’t have information linkages between breast cancer and disease. Other resources will. But VIVO can link to external sources: “Mike worksOn GeneY”. So where does data about interests, activities, and accomplishments come from? Archives. Data aggregators. Publishers. Institutional repositories. So now we turn to tools.
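A query resolved across two triple stores might look roughly like the sketch below. Only “Mike worksOn GeneY” comes from the talk; the second investigator, the `implicatedIn` predicate, the contents of the external store, and the function name are hypothetical, and real stores would be queried with SPARQL over URIs rather than with Python sets.

```python
# Triples held locally in a VIVO instance ("Mike worksOn GeneY" is from the talk).
VIVO_STORE = {
    ("Mike", "worksOn", "GeneY"),
    ("Jane", "worksOn", "GeneX"),  # hypothetical second investigator
}

# Triples held by an external resource that VIVO links to (all hypothetical).
EXTERNAL_STORE = {
    ("GeneY", "implicatedIn", "BreastCancer"),
    ("GeneX", "implicatedIn", "Diabetes"),
}


def investigators_for_disease(disease: str) -> list:
    """Join the two stores on the shared gene identifier."""
    # Step 1: ask the external store which genes are implicated in the disease.
    genes = {s for s, p, o in EXTERNAL_STORE if p == "implicatedIn" and o == disease}
    # Step 2: ask the VIVO store who works on any of those genes.
    return sorted(s for s, p, o in VIVO_STORE if p == "worksOn" and o in genes)


result = investigators_for_disease("BreastCancer")
print(result)
```

The join works only because both stores use the same identifier for the gene, which is the role shared URIs play in linked open data.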
There is a strong open source development component to the project. This is reflected in part by the top-notch applications that were submitted to a recent call for applications by the project.
Miles Worthington. Image from Dr. Barend Mons, Scientific Director of the Netherlands Bioinformatics Institute. Allows experts to be found, but also ties the object to specific concepts.
Nick Benik at Harvard
There are many beautiful visualizations, developed by Katy Borner’s group at Indiana University. These include co-author and co-investigator networks and even temporal visualizations, which allow discovery of grants and publications by defined groups over time within and beyond an institution. Most recently, the visualization team implemented a Science Map visualization, which allows users to visually explore the scientific strengths of a university, school, department, or person in the VIVO instance. Users will be able to see where an organization or person’s interests lie across 13 major scientific disciplines or 554 sub-disciplines, and will be able to see how these disciplines and sub-disciplines interrelate with one another on the map of science.
DEEP SEMANTIC SEARCH: While searches for people are an obvious requirement for researcher networking, we don't want to limit ourselves to searching for people. VIVO's ontology-based data model is not limited to profiles of people, but includes organizations, events, publications, grants, and many other types of data. This enables VIVO to represent the relationships among people and other types of data as an interconnected network that can be accessed in many ways.
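Searching across typed records and grouping the hits by type can be sketched as follows. This is a toy illustration, not VIVO's search implementation: the records, the type names, and the `search` and `facets` helpers are all hypothetical.

```python
from collections import Counter

# Hypothetical records; each has a label and a type drawn from the data model.
RECORDS = [
    {"label": "Jane Smith", "type": "Person"},
    {"label": "Dept. of Genetics", "type": "Organization"},
    {"label": "Genetics seminar series", "type": "Event"},
    {"label": "Genetics of model organisms", "type": "Publication"},
]


def search(records, term: str):
    """Case-insensitive substring match on the label, across all types."""
    term = term.lower()
    return [r for r in records if term in r["label"].lower()]


def facets(hits) -> Counter:
    """Count hits per type so results can be narrowed by facet."""
    return Counter(r["type"] for r in hits)


hits = search(RECORDS, "genetics")
print(facets(hits))
```

A search for "genetics" here returns an organization, an event, and a publication rather than only people, and the per-type counts give the facets a user could click to narrow the results.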
Core project development is augmented with contributions and feedback from other developers across multiple institutions on SourceForge. The open source community around VIVO is robust and dedicated. SourceForge also offers an open environment to share materials and ideas related to implementation and adoption. More and more content is added every day.
As you can see, the VIVO project itself is a rather large, geographically dispersed team: 7 institutions. Project areas: development, implementation, ontology, and outreach. It is an inspiring, hard-working group of people whom I am grateful to know and collaborate with on the project.