
Embracing Semantic Technology for Better Metadata Authoring in Biomedicine (SWAT4LS 2017 Conference)

The Center for Expanded Data Annotation and Retrieval (CEDAR) has developed a suite of tools and services that allow scientists to create and publish metadata describing scientific experiments. Using these tools and services—referred to collectively as the CEDAR Workbench—scientists can collaboratively author metadata and submit them to public repositories. A key focus of our software is semantically enriching metadata with ontology terms. The system combines emerging technologies, such as JSON-LD and graph databases, with modern software development technologies, such as microservices and container platforms. The result is a suite of user-friendly, Web-based tools and REST APIs that provide a versatile end-to-end solution to the problems of metadata authoring and management. This talk presents the architecture of the CEDAR Workbench and focuses on the technology choices made to construct an easily usable, open system that allows users to create and publish semantically enriched metadata in standard Web formats.

  1. Embracing Semantic Technology for Better Metadata Authoring in Biomedicine. Attila L. Egyedi, Martin O’Connor, Marcos Martínez-Romero, Debra Willrett, Josef Hardi, John Graybeal and Mark Musen. Biomedical Informatics Research, Stanford University, Stanford, California, USA.
  2. What are metadata? Data that describe data. Metadata are crucial for:
     • Finding experimental datasets online
     • Understanding how the experiments were performed
     • Reusing the data to perform new analyses
  3. What do metadata look like?
  4. Metadata quality is poor. A single attribute, age, appears under many different names: age; Age; AGE; `Age; age (after birth); age (in years); age (y); age (year); age (years); Age (years); Age (Years); age (yr); age (yr-old); age (yrs); Age (yrs); age [y]; age [year]; age [years]; age in years; age of patient; Age of patient; age of subjects; age(years); Age(years); Age(yrs.); Age, year; age, years; age, yrs; age.year; age_years.
  5. Metadata quality is poor. An analysis of metadata from NCBI’s BioSample repository:

     | Value type | Invalid (%) | Example values |
     |---|---|---|
     | Boolean | 73% | nonsmoker, former-smoker |
     | Integer | 26% | JM52, UVPgt59.4, pig |
     | Ontology term | 68% | presumed normal, wild_type |

     Source: Gonçalves, R. S. et al. (2017). Metadata in the BioSample Online Repository are Impaired by Numerous Anomalies. SemSci 2017 Workshop, co-located with ISWC 2017, Vienna, Austria.
  6. Metadata authoring is hard
  7. Our solution: CEDAR
     • A web application for metadata management and submission
     • Goal: overcome the impediments to creating high-quality metadata
  8. CEDAR metadata pipeline: DESIGN TEMPLATE → FILL IN METADATA → SUBMIT METADATA
     • Template authors (e.g., standards committees) design templates in the Template Designer.
     • Metadata authors (e.g., scientists) fill in metadata in the Metadata Editor.
     • Templates and metadata are kept in the Metadata Repository; metadata are submitted to public databases (e.g., LINCS).
     • Screenshots of the Template Designer (https://cedar.metadatacenter.org/templates/edit/https://repo.metadatacenter.org/templates/ab105771-564e-42a1-9be4-5a63891…) and the Metadata Editor (https://cedar.metadatacenter.org/instances/edit/https://repo.metadatacenter.org/template-instances/d4f1059e-8e27-4166-902f-…), with the sample values "A sample study", "Acute stress disorder", "Stanford University", "John Doe" and "Longitudinal".
  9. CEDAR system architecture (the diagram distinguishes CEDAR components from third-party components and marks some services as internal-access only)
     • Front end: Template Designer, Metadata Editor, Resource Manager
     • Open services: User Service (user profiles, user management), Auth. Service (Keycloak; user authorization), Resource Service (resource management), Workspace Service, Group Service, Template Service, Value Recommender Service (intelligent authoring), Terminology Service (controlled terms from NCBO BioPortal), Submission Service (metadata export to public databases r1, r2, …, rn), Messaging Service, Worker Service
     • Storage: Metadata Repository (MongoDB) holding templates, elements, fields and metadata, which follow the Metadata Template Model; Folders, Groups & Permissions (Neo4j DB); Users (MongoDB); Messages (MySQL); Search Engine (Elasticsearch); Queues & Caching (Redis)
  10. CEDAR Template Model: templates, elements and fields are represented in JSON Schema + JSON-LD; metadata are represented in JSON-LD. See O’Connor et al.: An open repository model for acquiring knowledge about scientific experiments. Proceedings of the 20th International Conference on Knowledge Engineering and Knowledge Management (EKAW 2016), 2016.
      Layers: Storage, Open Services, Front End, Template Model
  11. Infrastructure layer: Metadata Repository. The Study 1 and Study 2 metadata are both linked to the BioSample template through isBasedOn.
  12. Infrastructure layer: Folders, Groups & Permissions. The example graph connects users (Bob, CEDAR Admin), a group (Everybody), folders (/, Users, Bob, Studies), a template (BioSample) and metadata (Study 1, Study 2) through relations such as CONTAINS, OWNS, MEMBEROF, ADMINISTERS and CANREAD. It supports:
      • Folders
      • Permissions
      • Sharing
  13. Services layer: Authentication
  14. Services layer
      • Resource Service: core metadata repository service
      • Terminology Service: ontology repositories
      • Value Recommender Service: metadata recommendations
      • Submission Service: submission to public repositories (db1, db2, …, dbn)
  15. Services layer
  16. Resource Manager
      • Organize
      • Share
      • Submit
  17. Template Designer
      • Create
      • Reuse
      • Annotate
      • Constrain
  18. Template Designer
      • Search BioPortal
      • Pick an ontology branch
  19. Template Designer: the constraint is created
  20. Metadata Editor
      • Fill in
      • Validate
      See Martínez-Romero, M. et al.: Fast and accurate metadata authoring using ontology-based recommendations. Proceedings of AMIA 2017 Annual Symposium, 2017.
  21. CEDAR publishes semantically rich metadata. A metadata instance in JSON-LD:
      `{ "@context": { "Title": "schema:name", "Disorder": "ocre:OCRE900086" }, "@id": "http://example.org/1234", "@type": "ncit:C63536", "Title": { "@value": "A sample study" }, "Disorder": { "@id": "doid:DOID_8986", "rdfs:label": "narcolepsy" } }`
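  22. CEDAR publishes semantically rich metadata. The same instance as an RDF graph:
      • <http://example.org/1234> rdf:type ncit:C63536
      • <http://example.org/1234> schema:name "A sample study"
      • <http://example.org/1234> ocre:OCRE900086 doid:DOID_8986
      • doid:DOID_8986 rdfs:label "narcolepsy"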
  23. Who we work with: AIRR Community
  24. Summary
      • Authoring metadata is hard and time-consuming
      • Authoring semantic metadata is even harder
      • The CEDAR Workbench provides a pipeline for creating high-quality, semantically rich metadata
      Key technology choices: Template Model, JSON Schema, JSON-LD, Neo4j, microservices, Docker
  25. facebook.com/MetadataCenter, @metadatacenter, https://cedar.metadatacenter.org, channel: Metadata Center, github.com/metadatacenter, plus.google.com/+MetadataCenterOrg
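
As a rough illustration of the value-type checks behind slide 5, the Python sketch below tests free-text values against two of the table's types. The accepted Boolean spellings and the helper names (`is_valid_boolean`, `is_valid_integer`, `invalid_fraction`) are assumptions made for this example; they are not the criteria used by Gonçalves et al.

```python
# Hypothetical validity checks for free-text values that are supposed to
# have a declared type. The accepted Boolean spellings are an assumption.
BOOLEAN_SPELLINGS = {"true", "false", "yes", "no"}


def is_valid_boolean(value: str) -> bool:
    """Accept only a small, assumed set of Boolean spellings."""
    return value.strip().lower() in BOOLEAN_SPELLINGS


def is_valid_integer(value: str) -> bool:
    """Accept strings that Python's int() can parse."""
    try:
        int(value.strip())
    except ValueError:
        return False
    return True


def invalid_fraction(values, check) -> float:
    """Share of values that fail the given check (0.0 for an empty list)."""
    if not values:
        return 0.0
    return sum(1 for v in values if not check(v)) / len(values)


# The invalid examples come from the slide 5 table; "Yes" and "42" are made up.
print(is_valid_boolean("nonsmoker"), is_valid_boolean("Yes"))  # False True
print([is_valid_integer(v) for v in ["JM52", "UVPgt59.4", "pig", "42"]])
# [False, False, False, True]
```

A real pipeline would apply such checks per attribute across a whole repository; the sketch only shows the per-value test.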
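
Slides 10 and 11 describe templates written in JSON Schema plus JSON-LD, and metadata instances linked to their template by isBasedOn. The sketch below is a heavily simplified, hypothetical version of that idea: a template whose JSON Schema `properties` name the fields, and a checker for instances that store each value under `@value` (as on slide 21). The template IRI, the field names, the flat `isBasedOn` key and `check_instance` are illustrative assumptions; real CEDAR templates carry much more structure.

```python
# Hypothetical, heavily simplified template: the JSON Schema part lists the
# fields and the type that each field's "@value" must have.
TEMPLATE = {
    "@id": "https://example.org/templates/biosample",  # made-up IRI
    "type": "object",
    "properties": {
        "Title": {"type": "string"},
        "Age": {"type": "integer"},
    },
    "required": ["Title"],
}

# Map the JSON Schema type names used above to Python types.
PY_TYPES = {"string": str, "integer": int}


def check_instance(instance: dict, template: dict) -> list:
    """Return a list of problems; an empty list means the instance conforms."""
    problems = []
    if instance.get("isBasedOn") != template["@id"]:
        problems.append("instance is not based on this template")
    for name in template["required"]:
        if name not in instance:
            problems.append(f"missing required field {name!r}")
    for name, spec in template["properties"].items():
        if name not in instance:
            continue
        value = instance[name].get("@value")
        expected = PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"{name!r} should be {spec['type']}, got {value!r}")
    return problems


study1 = {"isBasedOn": TEMPLATE["@id"], "Title": {"@value": "Study 1"}, "Age": {"@value": 42}}
study2 = {"isBasedOn": TEMPLATE["@id"], "Age": {"@value": "pig"}}

print(check_instance(study1, TEMPLATE))  # []
print(check_instance(study2, TEMPLATE))
# ["missing required field 'Title'", "'Age' should be integer, got 'pig'"]
```

Keeping the template as the single source of field names and types is what lets every instance based on it be checked the same way.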
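
Slide 12 stores folders, groups and permissions as a graph. A minimal in-memory sketch of a read-permission check over such a graph follows. It assumes that owning a folder, or being granted CANREAD on it, extends to everything the folder contains, and that group membership is one level deep. The edge list, the user Alice and the function `can_read` are hypothetical; CEDAR itself keeps this graph in Neo4j.

```python
from collections import deque

# Hypothetical edges loosely modeled on slide 12: (source, relation, target).
# A plain list stands in for the Neo4j graph.
EDGES = [
    ("user:Bob", "MEMBEROF", "group:Everybody"),
    ("user:Alice", "MEMBEROF", "group:Everybody"),
    ("folder:/", "CONTAINS", "folder:Users"),
    ("folder:Users", "CONTAINS", "folder:Bob"),
    ("user:Bob", "OWNS", "folder:Bob"),
    ("folder:Bob", "CONTAINS", "folder:Studies"),
    ("folder:Studies", "CONTAINS", "metadata:Study1"),
    ("folder:Studies", "CONTAINS", "metadata:Study2"),
    ("user:CEDARAdmin", "OWNS", "template:BioSample"),
    ("group:Everybody", "CANREAD", "template:BioSample"),
]

# Assumed: owning a resource implies being able to read it.
GRANTS_READ = {"OWNS", "CANREAD"}


def can_read(user: str, resource: str, edges=EDGES) -> bool:
    """True if the user, or a group it belongs to, owns or can read the
    resource or any folder that (transitively) contains it."""
    principals = {user} | {t for s, r, t in edges if s == user and r == "MEMBEROF"}
    queue, seen = deque([resource]), set()
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        if any(s in principals and r in GRANTS_READ and t == node for s, r, t in edges):
            return True
        # Move up to every folder that contains this node.
        queue.extend(s for s, r, t in edges if r == "CONTAINS" and t == node)
    return False


print(can_read("user:Bob", "metadata:Study1"))       # True (Bob owns folder:Bob)
print(can_read("user:Alice", "metadata:Study1"))     # False
print(can_read("user:Alice", "template:BioSample"))  # True (via group:Everybody)
```

Because the check walks upward from the resource, sharing a folder shares its contents without copying permissions onto each item.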
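
Slides 18 and 19 show a field being constrained to a branch of an ontology found through BioPortal. Assuming that a term is acceptable when it is the branch root or one of its descendants, the check reduces to walking up the is-a hierarchy. In this sketch the `ex:` terms, the single-parent hierarchy and the helper names (`in_branch`, `value_allowed`) are hypothetical stand-ins for a real terminology lookup.

```python
# Hypothetical is-a hierarchy (child -> parent), standing in for an ontology
# branch that would really be fetched from an ontology repository.
PARENT = {
    "ex:SleepDisorder": "ex:Disease",
    "ex:Narcolepsy": "ex:SleepDisorder",
    "ex:Influenza": "ex:Disease",
    "ex:Mouse": "ex:Organism",
}


def in_branch(term: str, branch_root: str, parent=PARENT) -> bool:
    """True if term is branch_root or one of its descendants."""
    seen = set()
    while term is not None and term not in seen:  # guard against cycles
        if term == branch_root:
            return True
        seen.add(term)
        term = parent.get(term)
    return False


# A constraint as picked in the designer: values must come from ex:Disease.
DISORDER_CONSTRAINT = {"field": "Disorder", "branch": "ex:Disease"}


def value_allowed(value: str, constraint: dict) -> bool:
    return in_branch(value, constraint["branch"])


print(value_allowed("ex:Narcolepsy", DISORDER_CONSTRAINT))  # True
print(value_allowed("ex:Mouse", DISORDER_CONSTRAINT))       # False
```

Real ontologies allow several parents per class, so a production check would search all ancestors rather than follow one parent link.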
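
Slide 20 cites ontology-based value recommendations in the Metadata Editor. The published method is not reproduced here. As a swapped-in, much simpler technique, the sketch below ranks candidate values for a field by how often they occur in earlier records that agree with the fields already filled in. The records, field names and the `recommend` function are hypothetical.

```python
from collections import Counter

# Hypothetical, previously entered metadata records.
PAST_RECORDS = [
    {"organism": "Homo sapiens", "tissue": "liver"},
    {"organism": "Homo sapiens", "tissue": "liver"},
    {"organism": "Homo sapiens", "tissue": "blood"},
    {"organism": "Mus musculus", "tissue": "brain"},
    {"organism": "Mus musculus", "tissue": "brain"},
    {"organism": "Mus musculus", "tissue": "liver"},
]


def recommend(field, context, records=PAST_RECORDS, k=3):
    """Rank values for `field` by frequency in past records that agree
    with every field/value pair already filled in (`context`)."""
    matching = [r for r in records if all(r.get(f) == v for f, v in context.items())]
    counts = Counter(r[field] for r in matching if field in r)
    return [value for value, _ in counts.most_common(k)]


print(recommend("tissue", {"organism": "Mus musculus"}))  # ['brain', 'liver']
print(recommend("tissue", {}))  # ['liver', 'brain', 'blood']
```

Conditioning on the fields already entered is what makes the ranking change as the author fills in the form.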
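
Slides 21 and 22 show the same metadata as JSON-LD and as an RDF graph. The sketch below derives the slide 22 triples from the slide 21 document with a hand-rolled converter (`to_triples`, hypothetical) that only understands the JSON-LD features used on the slide and keeps compact IRIs such as `schema:name` unexpanded. It is a toy under those assumptions; a full JSON-LD processor (for example, the PyLD library) would also expand IRIs against a complete `@context`.

```python
import json

# The slide 21 instance, with its "@context" object properly opened.
DOC = json.loads("""
{
  "@context": {"Title": "schema:name", "Disorder": "ocre:OCRE900086"},
  "@id": "http://example.org/1234",
  "@type": "ncit:C63536",
  "Title": {"@value": "A sample study"},
  "Disorder": {"@id": "doid:DOID_8986", "rdfs:label": "narcolepsy"}
}
""")


def literal(value) -> str:
    """Render a literal the way the slide does, in double quotes."""
    return f'"{value}"'


def to_triples(node: dict, context: dict) -> list:
    """Flatten a node into (subject, predicate, object) triples.

    Covers only the subset used on the slide: a flat @context of term
    aliases, @id, a single @type, {"@value": ...} literals, nested nodes
    with an @id, and plain strings (treated as literals).
    """
    subject = node["@id"]
    triples = []
    for key, value in node.items():
        if key in ("@context", "@id"):
            continue
        if key == "@type":
            triples.append((subject, "rdf:type", value))
            continue
        predicate = context.get(key, key)  # resolve aliases such as "Title"
        if isinstance(value, dict) and "@value" in value:
            triples.append((subject, predicate, literal(value["@value"])))
        elif isinstance(value, dict) and "@id" in value:
            triples.append((subject, predicate, value["@id"]))
            triples.extend(to_triples(value, context))  # describe the nested node
        else:
            triples.append((subject, predicate, literal(value)))
    return triples


for triple in to_triples(DOC, DOC["@context"]):
    print(*triple)
# http://example.org/1234 rdf:type ncit:C63536
# http://example.org/1234 schema:name "A sample study"
# http://example.org/1234 ocre:OCRE900086 doid:DOID_8986
# doid:DOID_8986 rdfs:label "narcolepsy"
```

The `@context` is what turns the human-friendly keys "Title" and "Disorder" into ontology-backed predicates, which is why the JSON view and the graph view carry the same meaning.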
