8. What does it mean?
❓ Problem: too much data for any single institution, yet not enough data to make new discoveries.
❗ Key obstacle: discovering data.
💡 Solution: a federated system capable of executing cross-dataset and cross-institution queries.
9. • initiative started in 2014 across many groups within GA4GH
• an experiment to test the willingness to share data in the simplest of all technical contexts
• a simple web service
• receives questions of the form "Do you have information about this mutation?"
• responds with yes or no (optionally with additional metadata)
• design principles
• A beacon has to be technically simple.
• A beacon has to minimize the risks associated with genomic data sharing.
• It has to be possible to make a beacon publicly available.
Beacon Project
https://beacon-project.io/
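The yes/no contract above can be sketched as a tiny in-memory service; all class and method names here are illustrative, not taken from the actual Beacon specification:

```java
import java.util.Set;

// Hypothetical in-memory beacon: the whole protocol is one yes/no question,
// "Do you have information about this mutation?". Names are illustrative.
public class SimpleBeacon {
    // Known variants keyed as "chromosome:position:ref>alt"
    private final Set<String> knownVariants;

    public SimpleBeacon(Set<String> knownVariants) {
        this.knownVariants = knownVariants;
    }

    // Answer yes (true) or no (false) about a single mutation.
    public boolean query(String chromosome, long position, String ref, String alt) {
        return knownVariants.contains(chromosome + ":" + position + ":" + ref + ">" + alt);
    }
}
```

Because the answer carries a single bit, the service stays technically simple and limits the risk of leaking genomic data, which is exactly the design goal stated above.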
10. • no formal specification
• receives questions of the form "Do you have information about this mutation?"
• responds with yes or no
• 4 public beacons, each API differing in:
• request method
• supported parameters
• parameter names
• chromosome identifiers
• positional base
• assembly notation
• supported alleles
• dataset support
• response format
• data included in the response
Standard: Before Beacon Network
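The fragmentation can be illustrated with two request builders asking the same question; both URL shapes below are invented for illustration and do not belong to any real beacon:

```java
// Two made-up pre-standard beacons encoding the same question differently:
// different parameter names, positional base, chromosome identifiers and
// assembly notation, as listed above.
public class LegacyRequests {
    // Beacon A: "chrom"/"pos" names, 0-based position, assembly implied.
    static String beaconA(String chrom, long pos0, String allele) {
        return "https://a.example/query?chrom=" + chrom
                + "&pos=" + pos0 + "&allele=" + allele;
    }

    // Beacon B: "chromosome"/"coordinate" names, "chr" prefix,
    // 1-based position, explicit assembly parameter.
    static String beaconB(String chrom, long pos0, String allele) {
        return "https://b.example/beacon?chromosome=chr" + chrom
                + "&coordinate=" + (pos0 + 1) + "&alt=" + allele
                + "&assembly=GRCh37";
    }
}
```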
13. • 2015
• true/false/overlap/null response
• datasets
• data use conditions
• self description
• complex (9 records)
• format: Avro
• not well adopted
• not polished enough
Standard: 0.2
14. • 2016
• simplified 0.2
• true/false/null response
• data model improvements: extended metadata and response, improved support for datasets and cross-dataset queries, data versioning
• modular and extensible
• tooling
• format: Avro → Proto3
• based on real needs, successful
Standard: 0.3
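The tri-state answer in 0.3 (true/false/null) can be modeled with a nullable Boolean; this is a sketch under assumed names, not the spec's actual schema:

```java
// Sketch of a 0.3-style per-dataset answer: true = variant present,
// false = queried and absent, null = the dataset gave no answer.
public class DatasetResponse {
    final String datasetId;
    final Boolean exists; // nullable: null means "no answer", not "no"

    DatasetResponse(String datasetId, Boolean exists) {
        this.datasetId = datasetId;
        this.exists = exists;
    }

    String summary() {
        if (exists == null) return datasetId + ": null (not answered)";
        return datasetId + ": " + (exists ? "yes" : "no");
    }
}
```

Distinguishing "no" from "no answer" matters for cross-dataset queries: a dataset that could not respond should not be counted as evidence of absence.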
15. • 2018
• stable and more flexible
• support for more complex mutations
• improved error handling
• improved data use conditions
• developer experience
• format: Proto3 → OpenAPI
Standard: 0.4
16. • promoted 0.4
• extended documentation and best practices
Standard: 1.0
Finally ready! 🎉
19. Service
• communication with other subsystems
• query normalization
• aggregators
• participant resolution
• query distribution
• audit trail
• L1 parallelization
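Fanning one query out to many beacons (the L1 parallelization above) can be sketched with `CompletableFuture`; the beacon identifiers and query function are placeholders:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// Sketch: submit the same query to every registered beacon concurrently
// and collect the per-beacon answers. A real service would add timeouts,
// an audit trail and error handling around each call.
public class QueryDistributor {
    static Map<String, Boolean> distribute(List<String> beacons,
                                           Function<String, Boolean> queryBeacon) {
        List<CompletableFuture<Map.Entry<String, Boolean>>> futures = beacons.stream()
                .map(b -> CompletableFuture.supplyAsync(
                        () -> Map.entry(b, queryBeacon.apply(b))))
                .collect(Collectors.toList());
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }
}
```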
20. Processor
• executing a query against a beacon and processing its response
• management of a flexible, dynamic and easily extensible query execution pipeline
• pipeline stages resolution (CDI and EJB)
• L2 parallelization
• cross-assembly query handling
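The pipeline the processor manages (converter, requester, fetcher, parser) can be sketched as composed stages; the interface is illustrative, since the deck resolves real stages via CDI and EJB:

```java
import java.util.function.Function;

// Sketch of a four-stage query execution pipeline: each stage is a function
// and the processor composes them. Stage roles follow the slides; the
// function-based wiring here is illustrative.
public class Pipeline {
    static <Q, R> Function<Q, R> compose(Function<Q, String> convert,      // query -> beacon parameters
                                         Function<String, String> request, // parameters -> request URI
                                         Function<String, String> fetch,   // URI -> raw response
                                         Function<String, R> parse) {      // raw response -> normalized answer
        return convert.andThen(request).andThen(fetch).andThen(parse);
    }
}
```

Keeping each stage independent is what makes the pipeline "flexible, dynamic and easily extensible": a new beacon format only needs its own converter and parser.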
22. Requester
• second stage in the query execution pipeline
• constructing beacon requests based on their URIs and parameters produced by the converters
23. Fetcher
• third stage in the query execution pipeline
• unit actually talking to the API of beacons
• submitting requests over the network and obtaining the raw response
24. Parser
• last stage in the pipeline
• extracting information of interest from the raw response obtained by a fetcher
• dealing with various formats
• handling metadata, multiple responses, errors
• response normalization
• parallelized
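Response normalization in the parser could look like the sketch below; both raw response shapes are invented examples, not formats of any real beacon:

```java
// Sketch: map two invented raw response styles onto one normalized answer.
// Returns true/false when the answer is recoverable, null otherwise
// (unparseable payload or an error response).
public class ResponseParser {
    static Boolean normalize(String raw) {
        String s = raw.trim().toLowerCase();
        if (s.contains("\"exists\": true") || s.equals("yes")) return Boolean.TRUE;
        if (s.contains("\"exists\": false") || s.equals("no")) return Boolean.FALSE;
        return null;
    }
}
```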