BioMart 0.8 offers new tools, more
interfaces, and increased flexibility
through plugins


             Junjun Zhang
       BOSC 2011, Vienna, Austria
             July 15, 2011
BioMart: an open source federated
data management system
•  Widely used by public/private biological databases

•  Quickly make in-house data accessible online

•  User-friendly and flexible querying interfaces: web
   GUI and programmatic access APIs (REST, Perl,
   biomaRt, etc.)

•  Automated data conversion tool

•  Effortlessly federate in-house datasets with existing
   public BioMart datasets

                www.biomart.org

BioMart 0.8 new features
 •  Integrated Java application makes it possible to build a
    BioMart data source, configure querying and presentation
    interfaces, and deploy a BioMart server from a single tool
    (MartConfigurator)

 •  Support for more RDBMSs (MS SQL Server and DB2, in addition to
    MySQL, PostgreSQL, and Oracle)

 •  Create a ‘virtual mart’ from a 3NF-normalized source database
    without materialization

 •  New, diverse web GUIs and APIs provide added flexibility and
    ease of use

 •  Link indexing and parallel querying optimizations

 •  Support for several security features (HTTPS, OpenID, and OAuth
    protocols) for managing sensitive data

 •  Extensible plugin framework for analysis and visualization
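As an illustration of the parallel-querying idea only, here is a minimal Python sketch that sends one query to several sources at once with a thread pool and merges the answers per gene. The source names, toy values and simulated latency are hypothetical. This is not BioMart 0.8's query engine, whose implementation (including its link indexing) is not described in this talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_source(delay, rows):
    """Return a hypothetical query function for one data source."""
    def run(gene_ids):
        time.sleep(delay)  # stand-in for database or network latency
        return {g: rows[g] for g in gene_ids if g in rows}
    return run


# Toy sources; in practice each could be a separate database or BioMart server.
SOURCES = {
    "expression": make_source(0.3, {"TP53": 5.1, "KRAS": 3.3}),
    "mutations": make_source(0.3, {"TP53": 12, "BRCA1": 4}),
}


def federated_query(gene_ids):
    """Query every source concurrently and merge the hits per gene."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = {name: pool.submit(fn, gene_ids) for name, fn in SOURCES.items()}
        results = {name: fut.result() for name, fut in futures.items()}
    merged = {}
    for source, hits in results.items():
        for gene, value in hits.items():
            merged.setdefault(gene, {})[source] = value
    return merged


if __name__ == "__main__":
    start = time.perf_counter()
    print(federated_query(["TP53", "KRAS"]))
    # Both 0.3 s sources run at the same time, so this is near 0.3 s, not 0.6 s.
    print(f"{time.perf_counter() - start:.2f} s")
```

The total latency is roughly that of the slowest source rather than the sum of all sources, which is the benefit parallel querying is after.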
Basic BioMart Concepts – the
Power of Simplicity
Building or querying a BioMart data source only requires
understanding a few basic concepts:

•  DataSource
•  DataMart
•  DataSet
•  Attribute
•  Filter
•  AccessPoint (new)
•  Analysis (new)
•  Parameter (new)

BioMart hides the complexity of the underlying database schema and
federation mechanism.
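The way these concepts nest can be pictured with a minimal Python sketch. The containment below (a data source holding marts, a mart holding datasets plus access points that expose a chosen subset of attributes and filters, and an analysis taking parameters) is our simplified reading of the slide, not BioMart's actual object model; every class, field and function name is hypothetical apart from the `gene_sequence` analysis name, which appears later in this talk.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Attribute:
    name: str  # something users can retrieve, e.g. a gene ID column


@dataclass
class Filter:
    name: str  # something users can restrict results on, e.g. a chromosome


@dataclass
class DataSet:
    name: str
    attributes: List[Attribute] = field(default_factory=list)
    filters: List[Filter] = field(default_factory=list)


@dataclass
class AccessPoint:
    # A user-facing view that exposes only some attributes and filters.
    name: str
    visible_attributes: Set[str] = field(default_factory=set)
    visible_filters: Set[str] = field(default_factory=set)


@dataclass
class DataMart:
    name: str
    datasets: List[DataSet] = field(default_factory=list)
    access_points: List[AccessPoint] = field(default_factory=list)


@dataclass
class DataSource:
    name: str  # e.g. one relational database
    marts: List[DataMart] = field(default_factory=list)


@dataclass
class Parameter:
    name: str
    value: str


@dataclass
class Analysis:
    name: str  # e.g. "gene_sequence"
    parameters: List[Parameter] = field(default_factory=list)


def exposed_attributes(dataset: DataSet, ap: AccessPoint) -> List[str]:
    """Attributes of `dataset` that users see through access point `ap`."""
    return [a.name for a in dataset.attributes if a.name in ap.visible_attributes]


if __name__ == "__main__":
    genes = DataSet(
        "gene",
        attributes=[Attribute("gene_id"), Attribute("symbol"), Attribute("internal_note")],
        filters=[Filter("chromosome")],
    )
    public = AccessPoint("public", {"gene_id", "symbol"}, {"chromosome"})
    source = DataSource("in_house_db", [DataMart("genes_mart", [genes], [public])])
    print(exposed_attributes(genes, public))  # ['gene_id', 'symbol']
```

Keeping the access point separate from the dataset lets one dataset be published through several views without copying data, which is how we read the role of the new AccessPoint concept.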
A BioMart dataset is organized in a reversed
star schema




A 3NF-normalized database can be converted to a
reversed star schema




                                   [Figure: Source schema; Reverse star schema]
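The conversion can be illustrated with a minimal sketch using Python's built-in sqlite3 module, assuming a toy three-table normalized source. Table names follow what we understand to be the classic BioMart convention (`<dataset>__<content>__main` for main tables, `<dataset>__<content>__dm` for dimension tables, `_key` suffix on join columns), but all table and column names here are hypothetical. One-to-one data is folded into the main table, and one-to-many data goes into a dimension table that points back to the main table's key, which as we understand it is what makes the star "reversed". As a further assumption, the ‘virtual mart’ idea is approximated by a SQL view that copies nothing; BioMart 0.8's own mechanism may differ.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Toy 3NF source schema: each fact is stored in exactly one table.
cur.executescript("""
CREATE TABLE species (species_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, symbol TEXT,
                   species_id INTEGER REFERENCES species(species_id));
CREATE TABLE transcript (transcript_id INTEGER PRIMARY KEY, stable_id TEXT,
                         gene_id INTEGER REFERENCES gene(gene_id));
INSERT INTO species VALUES (1, 'Homo sapiens');
INSERT INTO gene VALUES (10, 'TP53', 1), (20, 'KRAS', 1);
INSERT INTO transcript VALUES (100, 'TP53-201', 10), (101, 'TP53-202', 10),
                              (200, 'KRAS-201', 20);
""")

cur.executescript("""
-- Main table: one row per gene, one-to-one data denormalized into it.
CREATE TABLE toy__gene__main AS
SELECT g.gene_id AS gene_id_key, g.symbol, s.name AS species
FROM gene g JOIN species s ON s.species_id = g.species_id;

-- Dimension table: one-to-many data, pointing back to the main table key.
CREATE TABLE toy__transcript__dm AS
SELECT t.gene_id AS gene_id_key, t.stable_id AS transcript_stable_id
FROM transcript t;
""")

# 'Virtual' alternative: a view over the source tables, nothing is copied.
cur.execute("""
CREATE VIEW toy__gene__main_virtual AS
SELECT g.gene_id AS gene_id_key, g.symbol, s.name AS species
FROM gene g JOIN species s ON s.species_id = g.species_id
""")

# A typical mart query: attributes from main + dm, filtered on main.
rows = cur.execute("""
SELECT m.symbol, d.transcript_stable_id
FROM toy__gene__main m JOIN toy__transcript__dm d USING (gene_id_key)
WHERE m.symbol = 'TP53'
ORDER BY d.transcript_stable_id
""").fetchall()
print(rows)  # [('TP53', 'TP53-201'), ('TP53', 'TP53-202')]
```

Every mart query then has the same simple shape (main table joined to at most a few dimension tables on one key), which is the main reason for the denormalized layout.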
BioMart system components

                    [Figure: system components diagram; labels include Client-side Plugin and Query Engine / Plugin]
MartConfigurator – an integrated tool
for setting up, configuring and
managing a BioMart server




BioMart 0.8 provides several data querying GUIs
                    MartForm




BioMart 0.8 provides several data querying GUIs

                    MartWizard




BioMart 0.8 provides several data querying GUIs

                    MartExplorer




Programmatic access API query syntax at the click
of a button
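Besides copying the generated syntax from the GUI, a client can build a query itself. The Python sketch below, using only the standard-library ElementTree module, builds a query in the classic BioMart XML shape (a `<Query>` containing a `<Dataset>` with `<Filter>` and `<Attribute>` children). That shape, the function name `build_query`, and the dataset, attribute and filter names in the example are assumptions. Version-specific settings on the root element (output format, header, row limit and so on) are passed through rather than guessed, so compare the result against the query your own server's GUI generates.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def build_query(dataset: str, attributes: List[str],
                filters: Optional[Dict[str, str]] = None,
                **query_attrs: str) -> str:
    """Build a BioMart-style XML query string (hypothetical helper).

    Extra keyword arguments become attributes of the root <Query> element;
    which ones a server expects depends on its BioMart version.
    """
    query = ET.Element("Query", dict(query_attrs))
    dataset_el = ET.SubElement(query, "Dataset", {"name": dataset})
    for name, value in (filters or {}).items():
        ET.SubElement(dataset_el, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(dataset_el, "Attribute", {"name": name})
    # ElementTree escapes &, <, > and quotes in attribute values for us.
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    print(build_query("hypothetical_gene_dataset",
                      ["gene_id", "symbol"],
                      {"chromosome": "17"}))
```

Building the document with an XML library rather than string concatenation avoids malformed queries when filter values contain characters such as `&` or quotes.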




Special GUI - MartReport
Ensembl
KEGG
Reactome


Mutation frequencies from
cancer projects with data
distributed around the globe



COSMIC




Pancreatic Expression Database
(PED)
Breast Cancer Campaign Tissue Bank
(BCCTB)
Special GUI - MartAnalysis
                 Most affected pathways




Special GUI – MartAnalysis
      Genomic sequence retrieval tool




                                        The sequence retrieval
                                        tool is implemented
                                        as a server-side
                                        analysis plugin
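The server-side plugin idea can be sketched in Python under heavy assumptions: plugins register under the analysis name used in queries, receive the query's Parameter values as string keyword arguments, and run against the mart's data. The registry, the `analysis_plugin` decorator, `run_analysis`, the toy genome, and the semantics assumed for `seq_type="gene_flank"` (return only the upstream flanking sequence, reverse-complemented for minus-strand genes) are all hypothetical; BioMart 0.8's actual plugin interface is not shown in this talk.

```python
from typing import Callable, Dict

# Hypothetical toy data standing in for a mart: one chromosome and two genes
# with 1-based inclusive coordinates and a strand.
GENOME = {"chr1": "ACGTTGCAAGGCCTTAACGG"}
GENES = {
    "geneA": ("chr1", 9, 12, "+"),
    "geneB": ("chr1", 13, 15, "-"),
}

PLUGINS: Dict[str, Callable[..., Dict[str, str]]] = {}


def analysis_plugin(name: str):
    """Decorator registering a function as the server-side analysis `name`."""
    def register(func):
        PLUGINS[name] = func
        return func
    return register


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


@analysis_plugin("gene_sequence")
def gene_sequence(seq_type: str = "gene_flank", upstream_flank: str = "0") -> Dict[str, str]:
    """Return each gene's upstream flank (only seq_type='gene_flank' is sketched)."""
    if seq_type != "gene_flank":
        raise ValueError(f"unsupported seq_type: {seq_type!r}")
    n = int(upstream_flank)  # query parameters arrive as strings
    flanks = {}
    for gene, (chrom, start, end, strand) in GENES.items():
        seq = GENOME[chrom]
        if strand == "+":
            # the n bases just before the gene, clipped at the chromosome start
            flanks[gene] = seq[max(0, start - 1 - n):start - 1]
        else:
            # upstream of a minus-strand gene lies after its end; report 5'->3'
            flanks[gene] = reverse_complement(seq[end:end + n])
    return flanks


def run_analysis(name: str, **params: str) -> Dict[str, str]:
    """Dispatch an Analysis query to its registered plugin."""
    if name not in PLUGINS:
        raise KeyError(f"no analysis plugin named {name!r}")
    return PLUGINS[name](**params)


if __name__ == "__main__":
    print(run_analysis("gene_sequence", seq_type="gene_flank", upstream_flank="3"))
    # {'geneA': 'GCA', 'geneB': 'GTT'}
```

A name-keyed registry keeps the query engine independent of individual analyses: adding a tool means registering one more function, not changing the dispatcher.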




New query type - Analysis
Query against the ‘affected_pathways’ analysis:
<Query>
       <Analysis name="affected_pathways" dataset="gene_oicrPanc">
                <Parameter name="biotype" value="protein_coding"/>
                <Parameter name="file_type" value="png"/>
                <Parameter name="img_height" value="8000"/>
                <Parameter name="img_width" value="12000"/>
       </Analysis>
</Query>

Query against the ‘gene_sequence’ sequence retrieval tool:
<Query>
       <Analysis name="gene_sequence">
                <Parameter name="seq_type" value="gene_flank"/>
                <Parameter name="upstream_flank" value="500"/>
       </Analysis>
</Query>
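On the receiving side, an Analysis query like the ones above has to be split into a plugin name, an optional dataset and a set of parameters. A minimal sketch of that step with Python's standard-library ElementTree follows; the function name `parse_analysis_query` and its return shape are hypothetical, not BioMart's server code. A strict XML parser rejects typographic quotes such as ” around attribute values, so queries must use plain ASCII quotes.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

AFFECTED_PATHWAYS_QUERY = """
<Query>
  <Analysis name="affected_pathways" dataset="gene_oicrPanc">
    <Parameter name="biotype" value="protein_coding"/>
    <Parameter name="file_type" value="png"/>
    <Parameter name="img_height" value="8000"/>
    <Parameter name="img_width" value="12000"/>
  </Analysis>
</Query>
"""


def parse_analysis_query(xml_text: str) -> Tuple[str, Optional[str], Dict[str, str]]:
    """Split an Analysis query into (analysis name, dataset or None, parameters)."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "Query":
        raise ValueError(f"expected <Query>, got <{root.tag}>")
    analysis = root.find("Analysis")
    if analysis is None or analysis.get("name") is None:
        raise ValueError("query has no named <Analysis> element")
    # Parameter values stay strings; each plugin converts what it needs.
    params = {p.get("name"): p.get("value") for p in analysis.findall("Parameter")}
    return analysis.get("name"), analysis.get("dataset"), params


if __name__ == "__main__":
    name, dataset, params = parse_analysis_query(AFFECTED_PATHWAYS_QUERY)
    print(name, dataset, params["file_type"], int(params["img_width"]))
    # affected_pathways gene_oicrPanc png 12000
```

The `dataset` attribute is optional, matching the second query above, where `gene_sequence` is called without one.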


Several large collaborative projects are
using BioMart for data management


•  BioMart Central Portal (http://central.biomart.org)

•  International Cancer Genome Consortium (http://dcc.icgc.org)

•  POPCURE (collaboration with Pfizer, controlled access)




BioMart Central Portal    (central.biomart.org)




                         First-of-its-kind, community-driven effort
                         to provide unified access to dozens of
                         biological databases spanning genomics,
                         proteomics, model organisms, cancer
                         data, and more
BioMart Portal provides access to a collection
of data sources




                                       “Master/Slave”-like architecture
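The “master/slave”-like arrangement can be sketched as a portal that keeps a table of which registered server provides which dataset and forwards each query to that server. This is a hypothetical Python illustration of the routing idea only: the class names, the in-memory "servers" and the row format are our assumptions, and real sources would be separate, remote systems.

```python
from typing import Dict, List


class SourceServer:
    """Hypothetical stand-in for one data source registered with the portal."""

    def __init__(self, name: str, datasets: Dict[str, List[dict]]):
        self.name = name
        self.datasets = datasets  # dataset name -> rows (dicts of column values)

    def query(self, dataset: str, filters: Dict[str, object]) -> List[dict]:
        # Keep rows whose columns equal every requested filter value.
        return [row for row in self.datasets[dataset]
                if all(row.get(col) == val for col, val in filters.items())]


class Portal:
    """The 'master': maps each dataset to the server that provides it."""

    def __init__(self):
        self._routes: Dict[str, SourceServer] = {}

    def register(self, server: SourceServer) -> None:
        for dataset in server.datasets:
            if dataset in self._routes:
                owner = self._routes[dataset].name
                raise ValueError(f"dataset {dataset!r} already provided by {owner}")
            self._routes[dataset] = server

    def list_datasets(self) -> List[str]:
        return sorted(self._routes)

    def query(self, dataset: str, **filters: object) -> List[dict]:
        server = self._routes.get(dataset)
        if server is None:
            raise KeyError(f"unknown dataset {dataset!r}")
        return server.query(dataset, filters)  # forwarded to the 'slave'


if __name__ == "__main__":
    portal = Portal()
    portal.register(SourceServer("server_a", {"genes": [
        {"symbol": "TP53", "chrom": "17"}, {"symbol": "KRAS", "chrom": "12"}]}))
    portal.register(SourceServer("server_b", {"mutations": [
        {"symbol": "TP53", "count": 10}]}))
    print(portal.list_datasets())           # ['genes', 'mutations']
    print(portal.query("genes", chrom="12"))  # [{'symbol': 'KRAS', 'chrom': '12'}]
```

Users see one catalogue of datasets, while each source keeps its own data and only the portal needs to know where everything lives.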




International Cancer Genome Consortium Data Portal
        [Figure: world map of ICGC member projects, listing the tumor types studied in Canada, the United States, Mexico, the United Kingdom, the EU, France, Germany, Italy, Spain, India, China, Japan, and Australia]
   GOALS: To obtain a comprehensive description of genomic, transcriptomic, and
   epigenomic changes in 50 different tumor types and/or subtypes that are of clinical
   and societal importance across the globe. 500 tumor and matched control samples will
   be analyzed per tumor type. At present, 12 countries have joined ICGC. Data will be
   generated by institutions all over the world.

   To make the data available rapidly and with minimal restrictions, in order to
   accelerate research into the causes and control of cancer.
ICGC Data Portal Architecture




          “Peer-to-Peer”-like architecture
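By contrast with the portal's “master/slave” model, a “peer-to-peer”-like arrangement has no single master: each node serves its own data and can forward a query to the nodes it knows about. The hypothetical Python sketch below gathers one gene's toy counts from every reachable node, using a visited set so that cycles in the peer graph do not cause infinite recursion. Node names, data and the forwarding scheme are assumptions, not the ICGC portal's implementation.

```python
from typing import Dict, List, Optional, Set


class Node:
    """Hypothetical peer: serves its own data and knows some other peers."""

    def __init__(self, name: str, counts: Dict[str, int]):
        self.name = name
        self.counts = counts  # gene -> toy count held locally by this node
        self.peers: List["Node"] = []

    def connect(self, other: "Node") -> None:
        # Links are symmetric: either side may forward queries to the other.
        self.peers.append(other)
        other.peers.append(self)

    def query(self, gene: str, visited: Optional[Set[str]] = None) -> Dict[str, int]:
        """Collect {node name: count} for `gene` from every reachable node."""
        visited = set() if visited is None else visited
        if self.name in visited:
            return {}
        visited.add(self.name)
        result = {self.name: self.counts.get(gene, 0)}
        for peer in self.peers:
            result.update(peer.query(gene, visited))
        return result


if __name__ == "__main__":
    a = Node("node_a", {"KRAS": 7})
    b = Node("node_b", {"KRAS": 2, "TP53": 5})
    c = Node("node_c", {"TP53": 3})
    a.connect(b)
    b.connect(c)
    c.connect(a)  # a cycle is fine thanks to the visited set
    print(a.query("TP53"))  # {'node_a': 0, 'node_b': 5, 'node_c': 3}
```

Any node can act as the entry point and still return the network-wide answer, which suits a consortium where data is generated and hosted by institutions in many countries.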




(dcc.icgc.org)




Future Directions

•  Creation of a BioMart Central Registry to improve
   coordination between BioMart servers. It will be a
   permanent resource where BioMart data providers can
   register their data models, data sources, and services.

•  Enhancing the data transformation module for building
   BioMart databases from non-RDBMS data sources (e.g.,
   flat files and XML files) with high scalability
   and flexibility.

•  Enhancing the plugin system to allow various forms of
   data analysis and visualization. Third parties are
   encouraged to develop plugins to extend the capabilities
   of the system.
The BioMart team
    Joachim Baran
    Anthony Cros
    Jonathan Guberman
    Jack Hsu
    Yong Liang
    Elena Rivkin
    Brett Whitty
    Marie Wong-Erasmus
    Long Yao
    Syed Haider
    Junjun Zhang
    Arek Kasprzyk

For support: users@biomart.org

Slack Application Development 101 Slides
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 

B07-GenomeContent-Biomart

  • 1. BioMart 0.8 offers new tools, more interfaces, and increased flexibility through plugins. Junjun Zhang, BOSC 2011, Vienna, Austria, July 15, 2011
  • 2. BioMart: an open source federated data management system • Widely used by public and private biological databases • Quickly makes in-house data accessible online • User-friendly and flexible querying interfaces: a web GUI and programmatic access APIs (REST, Perl, biomaRt, etc.) • Automated data conversion tool • Effortlessly federates in-house datasets with existing public BioMart datasets. www.biomart.org
  • 3. BioMart 0.8 new features • An integrated Java application (MartConfigurator) makes it possible to build a BioMart data source, configure querying and presentation interfaces, and deploy a BioMart server from a single tool • Supports more RDBMSs (MS SQL Server and DB2, in addition to MySQL, PostgreSQL, and Oracle) • Creates a 'virtual mart' from a 3NF-normalized source database without materialization • New, diverse web GUIs and APIs provide added flexibility and ease of use • Link indexing and parallel querying optimizations • Supports several security features (HTTPS, OpenID, and OAuth protocols) for managing sensitive data • Extensible plugin framework for analysis and visualization
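The slide lists link indexing as an optimization but does not say how it works. As a minimal sketch, assuming it behaves like a hash index on the attribute that links two datasets (for example a shared gene identifier), joining two datasets becomes one index build plus constant-time lookups per row. The rows and the `gene_id` link key below are hypothetical, not BioMart's internal format.

```python
from collections import defaultdict


def build_link_index(rows, key):
    """Map each link-key value to the list of rows carrying it (a hash index)."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def link_datasets(left_rows, right_rows, key):
    """Join two datasets on a shared link attribute using a prebuilt index."""
    index = build_link_index(right_rows, key)
    joined = []
    for left in left_rows:
        # .get() avoids inserting empty entries into the defaultdict
        for right in index.get(left[key], []):
            joined.append({**left, **right})
    return joined


# Hypothetical rows from two federated datasets that share "gene_id"
genes = [{"gene_id": "G1", "symbol": "KRAS"}, {"gene_id": "G2", "symbol": "TP53"}]
mutations = [{"gene_id": "G1", "sample": "S1"}, {"gene_id": "G1", "sample": "S2"}]
print(link_datasets(genes, mutations, "gene_id"))
```

Building the index on one side once keeps the join linear in the total number of rows, instead of comparing every pair of rows.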
  • 4. Basic BioMart Concepts – the Power of Simplicity. Building or querying a BioMart data source requires understanding only a few basic concepts: • DataSource • DataMart • DataSet • Attribute • Filter • AccessPoint (new) • Analysis (new) • Parameter (new). BioMart hides the complexity of the underlying database schema and the federation mechanism.
  • 5. A BioMart dataset is organized as a reverse star schema
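To make the reverse star schema concrete, here is a minimal sketch using Python's built-in `sqlite3`. A central main table holds one row per gene; a dimension table holds many rows per gene and points back to the main table's key, so a query that filters on the main table and asks for an attribute from the dimension table becomes a single join. The table and column names imitate BioMart's `__main`/`__dm` naming as we understand it and are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Central "main" table: one row per gene, the focus of the dataset.
cur.execute("CREATE TABLE gene__main (gene_id_key INTEGER PRIMARY KEY, "
            "stable_id TEXT, biotype TEXT)")
# Dimension table: many rows per gene, each keyed back to the main table.
cur.execute("CREATE TABLE gene__go__dm (gene_id_key INTEGER "
            "REFERENCES gene__main(gene_id_key), go_term TEXT)")

cur.executemany("INSERT INTO gene__main VALUES (?, ?, ?)",
                [(1, "ENSG0001", "protein_coding"), (2, "ENSG0002", "lincRNA")])
cur.executemany("INSERT INTO gene__go__dm VALUES (?, ?)",
                [(1, "GO:0005515"), (1, "GO:0003924"), (2, "GO:0005515")])

# Filter on a main-table column, return an attribute from the dimension table.
rows = cur.execute(
    "SELECT m.stable_id, d.go_term FROM gene__main AS m "
    "JOIN gene__go__dm AS d ON d.gene_id_key = m.gene_id_key "
    "WHERE m.biotype = ? ORDER BY d.go_term",
    ("protein_coding",),
).fetchall()
print(rows)
```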
  • 6. A 3NF-normalized database can be converted to a reverse star schema [Figure panels: Source schema; Reverse star schema]
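The conversion can be pictured as a denormalizing join over the 3NF source. The sketch below, again with `sqlite3` and hypothetical table names, builds the same main table two ways: materialized with `CREATE TABLE ... AS`, and as a view evaluated at query time. The view only loosely approximates the 'virtual mart' idea from slide 3; how BioMart 0.8 actually implements virtual marts is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A small 3NF source schema: biotype names are factored into their own table.
cur.executescript("""
CREATE TABLE biotype (biotype_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, stable_id TEXT,
                   biotype_id INTEGER REFERENCES biotype(biotype_id));
INSERT INTO biotype VALUES (1, 'protein_coding'), (2, 'lincRNA');
INSERT INTO gene VALUES (10, 'ENSG0001', 1), (11, 'ENSG0002', 2);
""")

# Denormalizing join that flattens the source into a main-table shape.
DENORMALIZE = """
SELECT g.gene_id AS gene_id_key, g.stable_id, b.name AS biotype
FROM gene AS g JOIN biotype AS b ON b.biotype_id = g.biotype_id
"""

# Materialized: rows are physically copied into a new table.
cur.execute("CREATE TABLE gene__main AS " + DENORMALIZE)
# Virtual: the same shape exposed as a view, recomputed on every query.
cur.execute("CREATE VIEW gene__main_virtual AS " + DENORMALIZE)

# A later change to the source is visible through the view but not the copy.
cur.execute("UPDATE biotype SET name = 'lncRNA' WHERE biotype_id = 2")
materialized = cur.execute(
    "SELECT biotype FROM gene__main ORDER BY gene_id_key").fetchall()
virtual = cur.execute(
    "SELECT biotype FROM gene__main_virtual ORDER BY gene_id_key").fetchall()
print(materialized, virtual)
```

The trade-off illustrated is the usual one: a materialized table is fast to query but must be rebuilt when the source changes, while a view stays current at the cost of running the join each time.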
  • 7. BioMart system components [Figure labels: Client-side plugin; Query Engine / Plugin]
  • 8. MartConfigurator – an integrated tool for setting up, configuring, and managing a BioMart server
  • 9. BioMart 0.8 provides several data-querying GUIs: MartForm
  • 10. BioMart 0.8 provides several data-querying GUIs: MartWizard
  • 11. BioMart 0.8 provides several data-querying GUIs: MartExplorer
  • 12. Programmatic access: the API query syntax is available at the click of a button
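Because the query syntax is plain XML, it can also be assembled in code. A minimal sketch with Python's `xml.etree.ElementTree`, assuming a BioMart 0.8-style query in which a `Dataset` element contains `Filter` and `Attribute` children. The root-element attributes (`client`, `processor`, `limit`, `header`) and the filter and attribute names are assumptions; copy the exact syntax from the button in the GUI before relying on it.

```python
import xml.etree.ElementTree as ET


def build_query(dataset, filters, attributes, processor="TSV"):
    """Assemble a Query/Dataset/Filter/Attribute XML document as a string."""
    # ASSUMPTION: root attribute names and values follow the 0.8 format
    query = ET.Element("Query", {"client": "example", "processor": processor,
                                 "limit": "-1", "header": "1"})
    ds = ET.SubElement(query, "Dataset", {"name": dataset})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Hypothetical filter and attribute names on the dataset from slide 16
query_xml = build_query("gene_oicrPanc", {"biotype": "protein_coding"},
                        ["gene_id", "gene_name"])
print(query_xml)
```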
  • 13. Special GUI – MartReport [Figure: a report combining Ensembl, KEGG, Reactome, COSMIC, the Pancreatic Expression Database (PED), and the Breast Cancer Campaign Tissue Bank (BCCTB), including mutation frequencies from cancer projects with data distributed around the globe]
  • 14. Special GUI – MartAnalysis [Figure: most affected pathways]
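The slide shows the result, not the method. As a hedged illustration of what a 'most affected pathways' ranking could compute, the sketch below counts, for each pathway, how many of its member genes appear in a set of mutated genes, then sorts pathways by that count. The pathway memberships are toy data, and a production analysis would likely normalize for pathway size or apply a statistical test; this is not the plugin's documented algorithm.

```python
from collections import Counter


def rank_affected_pathways(mutated_genes, pathway_members, top=None):
    """Rank pathways by the number of member genes that carry a mutation."""
    mutated = set(mutated_genes)
    counts = Counter()
    for pathway, members in pathway_members.items():
        hits = len(mutated & set(members))
        if hits:
            counts[pathway] = hits
    # Ties keep the insertion order of pathway_members
    return counts.most_common(top)


# Toy gene-to-pathway memberships (illustrative only)
pathways = {
    "MAPK signaling": ["KRAS", "BRAF", "MAP2K1"],
    "p53 signaling": ["TP53", "CDKN2A"],
    "TGF-beta signaling": ["SMAD4"],
}
ranking = rank_affected_pathways(["KRAS", "TP53", "CDKN2A", "SMAD4"], pathways)
print(ranking)
```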
  • 15. Special GUI – MartAnalysis: genomic sequence retrieval tool. The sequence retrieval tool is implemented as a server-side analysis plugin.
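Retrieving an upstream flank (the `gene_flank` and `upstream_flank` parameters on slide 16) comes down to strand-aware coordinate arithmetic. A minimal sketch, assuming 1-based inclusive gene coordinates and that a 'gene flank' means only the bases upstream of the gene, not the gene itself. The function name and the toy sequence are hypothetical.

```python
# Complement table for DNA letters (upper and lower case)
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def upstream_flank(chrom_seq, start, end, strand, flank):
    """Return up to `flank` bases immediately upstream of a feature.

    start/end are 1-based inclusive forward-strand coordinates; strand is
    1 or -1. On the reverse strand, upstream lies past `end`, and the result
    is reverse-complemented so it reads 5' to 3' along the gene.
    """
    if strand == 1:
        lo = max(0, start - 1 - flank)  # clip at the sequence start
        return chrom_seq[lo:start - 1]
    if strand == -1:
        hi = min(len(chrom_seq), end + flank)  # clip at the sequence end
        return chrom_seq[end:hi].translate(COMPLEMENT)[::-1]
    raise ValueError("strand must be 1 or -1")


chrom = "AACCGGTTAC"  # hypothetical 10-base "chromosome"
print(upstream_flank(chrom, 5, 6, 1, 3))   # forward-strand gene at 5..6
print(upstream_flank(chrom, 3, 4, -1, 3))  # reverse-strand gene at 3..4
```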
  • 16. New query type – Analysis
    Query against the 'affected_pathways' analysis:
      <Query>
        <Analysis name="affected_pathways" dataset="gene_oicrPanc">
          <Parameter name="biotype" value="protein_coding"/>
          <Parameter name="file_type" value="png"/>
          <Parameter name="img_height" value="8000"/>
          <Parameter name="img_width" value="12000"/>
        </Analysis>
      </Query>
    Query against the 'gene_sequence' sequence retrieval tool:
      <Query>
        <Analysis name="gene_sequence">
          <Parameter name="seq_type" value="gene_flank"/>
          <Parameter name="upstream_flank" value="500"/>
        </Analysis>
      </Query>
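Analysis queries like these can be generated rather than typed by hand, which also avoids malformed XML such as typographic quotes around attribute values. A minimal sketch with Python's `xml.etree.ElementTree` that rebuilds the `affected_pathways` query from a dictionary and reads its parameters back. The helper names are hypothetical, and how the server validates parameters is not covered here.

```python
import xml.etree.ElementTree as ET


def build_analysis_query(analysis, params, dataset=None):
    """Build a <Query><Analysis><Parameter/>...</Analysis></Query> string."""
    query = ET.Element("Query")
    attrs = {"name": analysis}
    if dataset is not None:  # the gene_sequence example has no dataset
        attrs["dataset"] = dataset
    node = ET.SubElement(query, "Analysis", attrs)
    for name, value in params.items():
        # XML attribute values are strings, so numbers are converted here
        ET.SubElement(node, "Parameter", {"name": name, "value": str(value)})
    return ET.tostring(query, encoding="unicode")


def read_parameters(query_xml):
    """Parse a query and return {parameter name: value}."""
    root = ET.fromstring(query_xml)
    return {p.get("name"): p.get("value") for p in root.iter("Parameter")}


query_xml = build_analysis_query(
    "affected_pathways",
    {"biotype": "protein_coding", "file_type": "png",
     "img_height": 8000, "img_width": 12000},
    dataset="gene_oicrPanc",
)
print(read_parameters(query_xml))
```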
  • 17. Several large collaborative projects are using BioMart for data management • BioMart Central Portal (http://central.biomart.org) • International Cancer Genome Consortium (http://dcc.icgc.org) • POPCURE (collaboration with Pfizer, controlled access)
  • 18. BioMart Central Portal (central.biomart.org): a first-of-its-kind, community-driven effort to provide unified access to dozens of biological databases spanning genomics, proteomics, model organisms, cancer data, and more
  • 19. The BioMart Portal provides access to a collection of data sources [Figure: "master/slave"-like architecture]
  • 20. International Cancer Genome Consortium Data Portal [Figure: world map of ICGC member projects by country (Australia, Canada, China, EU/France, EU/United Kingdom, France, Germany, India, Italy, Japan, Mexico, Spain, United Kingdom, United States), covering tumor types including pancreatic, breast, prostate, liver, gastric, esophageal, colon, rectal, ovarian, cervical, endometrial, renal, bladder, lung, oral, head and neck, brain, bone, blood, and skin cancers, as well as malignant lymphoma, chronic lymphocytic leukemia, and chronic myeloid disorders] GOALS: To obtain a comprehensive description of genomic, transcriptomic, and epigenomic changes in 50 different tumor types and/or subtypes of clinical and societal importance across the globe; 500 tumor and matched control samples will be analyzed per tumor type. At present, 12 countries have joined the ICGC, and data will be generated by institutions all over the world. To make the data available rapidly and with minimal restrictions, to accelerate research into the causes and control of cancer.
  • 21. ICGC Data Portal architecture [Figure: "peer-to-peer"-like architecture]
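In a peer-to-peer layout, a client sends the same query to every participating server and merges what comes back; slide 3 lists parallel querying among the 0.8 optimizations. A minimal sketch with `concurrent.futures`, in which in-memory "peers" stand in for remote BioMart servers. The peer structure, the `gene` field, and merging by simple concatenation are assumptions; a real client would make network requests and handle timeouts and failed peers.

```python
from concurrent.futures import ThreadPoolExecutor


def query_peer(peer, gene):
    """Stand-in for a remote request: filter one peer's local rows by gene."""
    return [dict(row, source=peer["name"])
            for row in peer["rows"] if row["gene"] == gene]


def federated_query(peers, gene, max_workers=4):
    """Query all peers concurrently; results keep the order of `peers`."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_peer = list(pool.map(lambda peer: query_peer(peer, gene), peers))
    return [row for rows in per_peer for row in rows]


# Hypothetical peers holding mutation calls from different projects
peers = [
    {"name": "peer_a", "rows": [{"gene": "KRAS", "sample": "A1"}]},
    {"name": "peer_b", "rows": [{"gene": "TP53", "sample": "B1"},
                                {"gene": "KRAS", "sample": "B2"}]},
]
merged = federated_query(peers, "KRAS")
print(merged)
```

`Executor.map` returns results in input order even when peers finish at different times, so the merged list is deterministic.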
  • 23. Future Directions • Creating a BioMart Central Registry to improve coordination between BioMart servers. It will be a permanent resource where BioMart data providers can register their data models, data sources, and services. • Enhancing the data transformation module for building BioMart databases from non-RDBMS data sources (e.g., flat files and XML files) with high scalability and flexibility. • Enhancing the plugin system to allow various forms of data analysis and visualization. Third parties are encouraged to develop plugins that extend the system's capabilities.
  • 24. The BioMart team: Joachim Baran, Anthony Cros, Jonathan Guberman, Jack Hsu, Yong Liang, Elena Rivkin, Brett Whitty, Marie Wong-Erasmus, Long Yao, Syed Haider, Junjun Zhang, Arek Kasprzyk. For support: users@biomart.org