B07-GenomeContent-Biomart
1. BioMart 0.8 offers new tools, more
interfaces, and increased flexibility
through plugins
Junjun Zhang
BOSC 2011, Vienna, Austria
July 15, 2011
2. BioMart: an open source federated
data management system
• Widely used by public/private biological databases
• Quickly makes in-house data accessible online
• User-friendly and flexible querying interfaces: web
GUI and programmatic access API (REST, Perl,
biomaRt, etc.)
• Automated data conversion tool
• Effortlessly federate in-house datasets with existing
public BioMart datasets
www.biomart.org
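The programmatic access mentioned above is typically driven by a small XML query document that names a dataset, the filters to apply, and the attributes to return. The sketch below only builds such a document; it does not contact any server. The element names (Query, Dataset, Filter, Attribute) follow the commonly seen BioMart query layout, but the helper `build_query`, the dataset name, and the attribute and filter names are illustrative assumptions, and the attributes expected on the Query element vary between BioMart releases.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, attributes, filters):
    """Build a BioMart-style XML query string for a single dataset.

    Hypothetical helper: element layout is an assumption, not a
    release-specific specification.
    """
    query = ET.Element("Query")
    ds = ET.SubElement(query, "Dataset", {"name": dataset})
    # Filters restrict which rows come back.
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Attributes choose which columns come back.
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


xml_query = build_query(
    dataset="hsapiens_gene_ensembl",  # illustrative dataset name
    attributes=["ensembl_gene_id", "external_gene_name"],
    filters={"chromosome_name": "17"},
)
print(xml_query)
```

The resulting string would then be sent to a BioMart server through whichever REST endpoint that server documents.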
3. BioMart 0.8 new features
• Integrated Java application makes it possible to build a
BioMart data source, configure querying and presentation
interfaces, and deploy a BioMart server from a single tool
(MartConfigurator)
• Supports more RDBMSs (MS SQL Server and DB2, in addition to
MySQL, PostgreSQL, and Oracle)
• Creates a ‘virtual mart’ from a 3NF normalized source database
without materialization
• New diverse Web GUIs and APIs provide added flexibility and
ease of use
• Link indexing and parallel querying optimizations
• Supports several security features (HTTPS, OpenID and OAuth
protocols) for managing sensitive data
• Extendable plugin framework for analysis and visualization
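One way to picture a ‘virtual mart’ built without materialization is an ordinary SQL view: the denormalized, query-friendly shape is computed from the normalized tables on demand instead of being copied into a new table. The sketch below uses SQLite purely as an analogy; the table names, the view name `gene_mart`, and the toy rows are assumptions, and this is not a description of how MartConfigurator implements virtual marts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized (3NF-style) source tables.
    CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT);
    CREATE TABLE transcript (
        transcript_id INTEGER PRIMARY KEY,
        gene_id INTEGER REFERENCES gene(gene_id),
        biotype TEXT
    );
    INSERT INTO gene VALUES (1, 'TP53'), (2, 'KRAS');
    INSERT INTO transcript VALUES
        (10, 1, 'protein_coding'),
        (11, 1, 'retained_intron'),
        (20, 2, 'protein_coding');

    -- The "virtual mart": a flat shape computed at query time,
    -- with no second copy of the data stored.
    CREATE VIEW gene_mart AS
        SELECT g.symbol, t.transcript_id, t.biotype
        FROM gene AS g JOIN transcript AS t ON t.gene_id = g.gene_id;
""")

rows = conn.execute(
    "SELECT symbol, transcript_id FROM gene_mart "
    "WHERE biotype = ? ORDER BY transcript_id",
    ("protein_coding",),
).fetchall()
print(rows)
```

Because the view stores nothing, changes to the source tables are visible to the next query immediately, at the cost of running the join each time.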
4. Basic BioMart Concepts – the
Power of Simplicity
Building or querying a BioMart data source only requires an
understanding of a few basic concepts:
• DataSource
• DataMart
• DataSet
• Attribute
• Filter
• AccessPoint (new)
• Analysis (new)
• Parameter (new)
BioMart hides the complexity of the underlying database schema
and federation mechanism.
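The relationship between three of these concepts can be sketched in a few lines: a DataSet holds rows, an Attribute names a column to return, and a Filter names a column to restrict on. The classes below are a toy model written for illustration only; they are not BioMart's own classes, and the `query` method and sample rows are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str  # a column the user can ask for


@dataclass
class Filter:
    name: str  # a column the user can restrict on
    value: str


@dataclass
class DataSet:
    name: str
    rows: list = field(default_factory=list)  # each row is a dict

    def query(self, attributes, filters=()):
        """Return the requested attributes for rows passing every filter."""
        selected = [
            row for row in self.rows
            if all(str(row.get(f.name)) == f.value for f in filters)
        ]
        return [tuple(row[a.name] for a in attributes) for row in selected]


genes = DataSet("genes", rows=[
    {"symbol": "TP53", "chromosome": "17"},
    {"symbol": "KRAS", "chromosome": "12"},
    {"symbol": "BRCA1", "chromosome": "17"},
])
result = genes.query([Attribute("symbol")], [Filter("chromosome", "17")])
print(result)
```

The caller never sees how the rows are stored, which is the point of the slide: the user works with datasets, attributes and filters rather than with the schema behind them.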
13. Special GUI - MartReport
Ensembl
KEGG
Reactome
Mutation frequencies from
cancer projects with data
distributed around the globe
COSMIC
Pancreatic Expression Database
(PED)
Breast Cancer Campaign Tissue Bank
(BCCTB)
14. Special GUI - MartAnalysis
Most affected pathways
15. Special GUI – MartAnalysis
Genomic sequence retrieval tool
The sequence retrieval tool is implemented as a server-side
analysis plugin
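The core step of any sequence retrieval tool is turning a set of coordinates into the matching stretch of sequence. A minimal sketch of that step follows, assuming 1-based inclusive coordinates and a strand flag of 1 or -1; the function names and the toy sequence are hypothetical, and the slides do not describe the plugin's actual interface.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def fetch_sequence(chromosome_seq, start, end, strand=1):
    """Return the subsequence for 1-based inclusive coordinates.

    strand=-1 returns the reverse complement of the same interval.
    """
    if not 1 <= start <= end <= len(chromosome_seq):
        raise ValueError("coordinates outside the sequence")
    sub = chromosome_seq[start - 1:end]
    return sub if strand == 1 else reverse_complement(sub)


toy = "ATGCGTAACC"  # toy stand-in for a chromosome
forward = fetch_sequence(toy, 2, 5)
reverse = fetch_sequence(toy, 2, 5, strand=-1)
print(forward, reverse)
```

Running this step on the server keeps the full genome sequence next to the data source, so only the requested fragment has to be sent to the client.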
17. Several large collaborative projects are
using BioMart for data management
• BioMart Central Portal (http://central.biomart.org)
• International Cancer Genome Consortium (http://dcc.icgc.org)
• POPCURE (collaboration with Pfizer, controlled access)
18. BioMart Central Portal (central.biomart.org)
First-of-its-kind, community-driven effort to provide unified
access to dozens of biological databases spanning genomics,
proteomics, model organisms, cancer data, and more
20. International Cancer Genome Consortium Data Portal
[Figure: world map of ICGC projects. Regions shown: Canada, United States,
Mexico, United Kingdom, EU / United Kingdom, France, EU / France, Germany,
Italy, Spain, China, India, Japan and Australia, each labelled with the
tumor types it studies. Tumor types named on the map include pancreatic,
breast, prostate, bladder, blood, bone, brain, pediatric brain, cervical,
colon, rectal, endometrial, esophageal, gastric, head and neck, renal,
liver, lung, ovarian, oral and skin cancers, malignant lymphoma, chronic
lymphocytic leukemia and chronic myeloid disorders.]
GOALS: To obtain a comprehensive description of genomic, transcriptomic, and
epigenomic changes in 50 different tumor types and/or subtypes that are of clinical
and societal importance across the globe. 500 tumor and matched control samples will
be analyzed per tumor type. At present, 12 countries have joined the ICGC. Data will be
generated by institutions all over the world.
To make the data available rapidly and with minimal restrictions, in order to accelerate
research into the causes and control of cancer.
23. Future Directions
• Creation of BioMart Central Registry to improve
coordination between BioMart servers. It will be a
permanent resource where BioMart data providers can
register their data models, data sources and services.
• Enhancing the data transformation module for building
BioMart databases from non-RDBMS data sources (e.g.
flat data files, XML data files, etc.) with high scalability
and flexibility.
• Enhancing the plugin system to allow various forms of
data analysis and visualization. Third parties are
encouraged to develop plugins to extend the capabilities
of the system.
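To illustrate what building a queryable table from a flat data file involves, the sketch below loads a small tab-separated file into an SQLite table and runs one aggregate query over it. It is a toy example under stated assumptions: the column names, the table name `mutation_mart`, and the sample rows are invented, and it does not represent the planned BioMart transformation module.

```python
import csv
import io
import sqlite3

# Stand-in for a tab-separated flat file on disk (invented rows).
flat_file = io.StringIO(
    "gene\tsample\tmutation\n"
    "TP53\tS1\tR175H\n"
    "KRAS\tS1\tG12D\n"
    "TP53\tS2\tR248Q\n"
)

conn = sqlite3.connect(":memory:")
reader = csv.DictReader(flat_file, delimiter="\t")
columns = reader.fieldnames

# Create one TEXT column per header field, quoting each column name.
column_defs = ", ".join(f'"{c}" TEXT' for c in columns)
conn.execute(f"CREATE TABLE mutation_mart ({column_defs})")

# Stream the rows in with a parameterized insert.
placeholders = ", ".join("?" for _ in columns)
conn.executemany(
    f"INSERT INTO mutation_mart VALUES ({placeholders})",
    ([row[c] for c in columns] for row in reader),
)

counts = conn.execute(
    "SELECT gene, COUNT(*) FROM mutation_mart GROUP BY gene ORDER BY gene"
).fetchall()
print(counts)
```

A real transformation module would also need type inference, indexing and batching for large files, which this sketch leaves out.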
24. The BioMart team
Joachim Baran
Anthony Cros
Jonathan Guberman
Jack Hsu
Yong Liang
Elena Rivkin
Brett Whitty
Marie Wong-Erasmus
Long Yao
Syed Haider
Junjun Zhang
Arek Kasprzyk
For support: users@biomart.org