Introduction to InterMine
Infrastructure
Vivek Krishnakumar
LF Meeting 04/28/2015
InterMine in a nutshell
• Open-source data warehouse software
• Integration of complex biological data
• Parsers for common biological data formats
• Extensible framework for custom data
• Cookie-cutter interface, highly customizable
• Interact using sophisticated web query tools
• Programmatic access using web-service API
Open-source Project
• Source code available online
• Distributed under the GNU LGPL license
• GitHub Repo:
https://github.com/intermine/intermine
• GitHub Organization:
https://github.com/intermine
intermine / intermine
> bio
> biotestmine
> config
> flymine
> humanmine
> imbuild
> intermine
> testmodel
.gitignore
.travis.yml
LICENSE
LICENSE.LIBS
README.md
RELEASE_NOTES
Richard N. Smith et al. Bioinformatics 2012;28:3163-3165
InterMine system architecture
InterMine system architecture
Web Application
• Java Server Pages (JSP), HTML, JS, CSS
• Interfaces with Java Servlets and IM web-services
Web Server
• Tomcat 7.0.x, serves the Web application ARchive (WAR) file
• Ant-based build system using the Java SDK
Database Server
• PostgreSQL 9.2 or above
• range query, btree, gist enabled (refer docs here)
http://intermine.readthedocs.org/en/latest/system-requirements/
Data Model Overview
• Object-oriented data model
• Divided into classes, their attributes and
their relationships; defined in XML
• Represented as Java classes (pure Java
beans); auto-generated from XML,
automatically map to tables in schema
• Core data model; based on Sequence
Ontology (SO); refer: bio/core/core.xml
and bio/core/genomic_additions.xml
http://intermine.readthedocs.org/en/latest/data-model/overview/
Data Model Overview
<?xml version="1.0"?>
<model name="example" package="org.intermine.model.bio">
<class name="Protein" is-interface="true" extends="SequenceFeature">
<attribute name="name" type="java.lang.String"/>
<attribute name="accession" type="java.lang.String"/>
<collection name="features" referenced-type="NewFeature" reverse-reference="protein"/>
</class>
<class name="NewFeature" is-interface="true">
<attribute name="identifier" type="java.lang.String"/>
<attribute name="confidence" type="java.lang.Double"/>
<reference name="protein" referenced-type="Protein" reverse-reference="features"/>
</class>
</model>
Model expects standard Java names for classes and attributes
• classes: start with an upper case letter and be CamelCase, no underscores or spaces.
• fields (attributes, references, collections): should start with a lower case letter and be
lowerCamelCase, no underscores or spaces.
http://intermine.readthedocs.org/en/latest/data-model/model/
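The naming rules above can be checked mechanically before a model is used in a build. The following is a minimal sketch, not part of InterMine: the function name `check_model_names` and the two regular expressions are assumptions made for illustration, and the XML is the example model from this slide.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical patterns encoding the two naming rules from the slide.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")   # UpperCamelCase, no _ or spaces
FIELD_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")   # lowerCamelCase, no _ or spaces

MODEL_XML = """<?xml version="1.0"?>
<model name="example" package="org.intermine.model.bio">
  <class name="Protein" is-interface="true" extends="SequenceFeature">
    <attribute name="name" type="java.lang.String"/>
    <attribute name="accession" type="java.lang.String"/>
    <collection name="features" referenced-type="NewFeature" reverse-reference="protein"/>
  </class>
  <class name="NewFeature" is-interface="true">
    <attribute name="identifier" type="java.lang.String"/>
    <attribute name="confidence" type="java.lang.Double"/>
    <reference name="protein" referenced-type="Protein" reverse-reference="features"/>
  </class>
</model>
"""


def check_model_names(xml_text):
    """Return a list of naming problems found in a model XML string."""
    problems = []
    root = ET.fromstring(xml_text)
    for cls in root.iter("class"):
        cls_name = cls.get("name", "")
        if not CLASS_NAME.match(cls_name):
            problems.append(f"class '{cls_name}' is not UpperCamelCase")
        for field in cls:
            # Only the three field kinds named on the slide are checked.
            if field.tag not in ("attribute", "reference", "collection"):
                continue
            field_name = field.get("name", "")
            if not FIELD_NAME.match(field_name):
                problems.append(
                    f"{field.tag} '{cls_name}.{field_name}' is not lowerCamelCase"
                )
    return problems


print(check_model_names(MODEL_XML) or "all names follow the convention")
```

An empty result means every class and field name in the file matches the convention; each entry otherwise names the offending class or field.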
Creating & configuring a mine
• Build out scaffold for mine
$ cd git/intermine
$ bio/scripts/make_mine legumine
• Configure data to load and
post-processing steps to
run by customizing
project.xml
• Each <source /> element corresponds to a
directory under bio/sources/*, which
defines the parsers that retrieve the
data and encodes the rules for
integration
intermine / intermine
> bio
> biotestmine
> config
> flymine
> legumine
> dbmodel
> integrate
> postprocess
> webapp
> default.intermine.integrate.properties
> default.intermine.webapp.properties
> project.xml
> humanmine
> imbuild
> intermine
> testmodel
.gitignore
.travis.yml
LICENSE
LICENSE.LIBS
README.md
RELEASE_NOTES
http://intermine.readthedocs.org/en/latest/get-started/tutorial/#creating-a-new-mine
Creating & configuring a mine
<project type="bio">
<property name="target.model" value="genomic"/>
<property name="source.location" location="../bio/sources/"/>
<property name="common.os.prefix" value="common"/>
<property name="intermine.properties.file" value="legumine.properties"/>
<property name="default.intermine.properties.file" location="../default.intermine.integrate.properties"/>
<sources>
<source name="legumine-gff" type="legumine-gff">
<property name="gff3.taxonId" value="3880"/>
<property name="gff3.seqDataSourceName" value="LF"/>
<property name="gff3.dataSourceName" value="LF"/>
<property name="gff3.seqClsName" value="Chromosome"/>
<property name="gff3.dataSetTitle" value="Genome Annotation"/>
<property name="src.data.dir" location="/path/to/legumine/genome/gff/" />
</source>
:
:
</sources>
<post-processing>
<post-process name="create-references" />
<post-process name="create-chromosome-locations-and-lengths"/>
<post-process name="create-gene-flanking-features" />
:
:
</post-processing>
</project>
project.xml
http://intermine.readthedocs.org/en/latest/get-started/tutorial/#project-xml
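To see at a glance what a project.xml will load and which post-processing steps it will run, the file can be read with a standard XML parser. This is a minimal sketch using a trimmed copy of the example above; the helper name `summarize_project` is hypothetical and the script is not part of the InterMine tooling.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the project.xml shown on the slide.
PROJECT_XML = """<project type="bio">
  <property name="target.model" value="genomic"/>
  <sources>
    <source name="legumine-gff" type="legumine-gff">
      <property name="gff3.taxonId" value="3880"/>
      <property name="gff3.seqClsName" value="Chromosome"/>
      <property name="src.data.dir" location="/path/to/legumine/genome/gff/"/>
    </source>
  </sources>
  <post-processing>
    <post-process name="create-references"/>
    <post-process name="create-chromosome-locations-and-lengths"/>
    <post-process name="create-gene-flanking-features"/>
  </post-processing>
</project>
"""


def summarize_project(xml_text):
    """Return (sources, steps): sources as (name, type, properties) tuples,
    steps as the post-process names in document order."""
    root = ET.fromstring(xml_text)
    sources = []
    for source in root.findall("./sources/source"):
        props = {}
        for prop in source.findall("property"):
            # A property carries either a value= or a location= attribute.
            props[prop.get("name")] = prop.get("value") or prop.get("location")
        sources.append((source.get("name"), source.get("type"), props))
    steps = [pp.get("name") for pp in root.findall("./post-processing/post-process")]
    return sources, steps


sources, steps = summarize_project(PROJECT_XML)
for name, source_type, props in sources:
    print(f"source {name} (type {source_type}): {props}")
print("post-processing order:", steps)
```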
Data Sources and Sets
• InterMine provides a large library of data source parsers and
loaders, covering data types including, but not restricted to:
genome sequence (fasta)
annotation (gff)
ontology (go, so)
proteins (uniprot)
interactions (psi-mi)
pathway (kegg, reactome)
homologs (panther, compara, homologene)
publications (pubmed)
chado (sequence, stock)
• Custom sources can be written by following the tutorial:
http://intermine.readthedocs.org/en/latest/database/data-sources/custom/
or by referring to code from other mines
http://intermine.readthedocs.org/en/latest/database/data-sources/library/
Building a mine
• Each InterMine instance requires 3 PostgreSQL databases:
  - legumine: core db mapping to data model
  - items-legumine: db for storing intermediate Items during load
  - userprofile-legumine: db for storing user-specific data
• Running a build requires a special config file in
the user's home directory, containing db
connection params and other mine-specific
configs to override:
${HOME}/.intermine/legumine.properties
http://intermine.readthedocs.org/en/latest/get-started/tutorial/#properties-file
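The three database names follow a simple pattern derived from the mine name. As a minimal sketch, the helper below (hypothetical, not an InterMine script) lists them and prints the matching PostgreSQL `createdb` commands. It assumes the naming convention shown on this slide; connection options such as user or host are omitted and would depend on the local setup.

```python
def required_databases(mine_name):
    """Return the three database names for a mine, per the slide's convention."""
    return [
        mine_name,                    # core db mapping to the data model
        f"items-{mine_name}",         # intermediate Items during load
        f"userprofile-{mine_name}",   # user-specific data
    ]


def createdb_commands(mine_name):
    """Build one `createdb` command line per required database."""
    return [f"createdb {db}" for db in required_databases(mine_name)]


for command in createdb_commands("legumine"):
    print(command)
```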
Model Merging & Data Integration
Model Merging
• Each source contributes
towards the data model
• bio/core/core.xml is
always used as the base
for model merging
• The ant build-db
command consumes the
SOURCE_additions.xml
• Model is used to generate
tables, Java classes and
the webapp
http://intermine.readthedocs.org/en/latest/database/database-building/model-merging/
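The idea of model merging can be illustrated as a union of class definitions: the base model supplies the starting classes, and each additions file contributes new classes or extra fields on existing ones. The sketch below is only a conceptual illustration. The function `merge_models`, the tiny XML samples and the `<classes>` root element of the additions sample are assumptions, and the real merge performed by the build does more (for example, handling inheritance).

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the base model (not the real bio/core/core.xml).
CORE_XML = """<model name="genomic" package="org.intermine.model.bio">
  <class name="Protein" is-interface="true">
    <attribute name="name" type="java.lang.String"/>
  </class>
</model>"""

# Tiny stand-in for a SOURCE_additions.xml file (root element assumed).
ADDITIONS_XML = """<classes>
  <class name="Protein" is-interface="true">
    <attribute name="accession" type="java.lang.String"/>
    <collection name="features" referenced-type="NewFeature" reverse-reference="protein"/>
  </class>
  <class name="NewFeature" is-interface="true">
    <attribute name="identifier" type="java.lang.String"/>
    <reference name="protein" referenced-type="Protein" reverse-reference="features"/>
  </class>
</classes>"""


def merge_models(base_xml, *additions_xml):
    """Union the classes and fields of a base model and its additions.

    Returns {class name: {field name: field kind}}; the base is read first,
    so a field already defined there is never overwritten.
    """
    merged = {}
    for xml_text in (base_xml, *additions_xml):
        root = ET.fromstring(xml_text)
        for cls in root.iter("class"):
            fields = merged.setdefault(cls.get("name"), {})
            for field in cls:
                fields.setdefault(field.get("name"), field.tag)
    return merged


for class_name, fields in merge_models(CORE_XML, ADDITIONS_XML).items():
    print(class_name, fields)
```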
Data Integration
• Key(s) for class of object
defines equivalence for
objects of that class
• Primary key defines
field(s) used to search for
equivalence
• For objects which share
same primary key, fields
are merged and stored as
single object
http://intermine.readthedocs.org/en/latest/database/database-building/primary-keys/
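Key-based integration can be pictured as grouping records by their primary-key fields and folding each group into one object. The following is a minimal sketch of that idea only. The function `integrate`, the record layout and the gene identifiers are hypothetical, and conflicting values are resolved here by simply keeping the first non-empty one, which is a simplification rather than InterMine's actual behavior.

```python
def integrate(records, key_fields):
    """Merge records that share the same values for all key_fields.

    Records missing any key field cannot be tested for equivalence and are
    kept as separate objects. For each field, the first non-empty value wins.
    """
    merged = {}     # primary-key tuple -> merged object
    unkeyed = []    # records without a complete key
    for record in records:
        key = tuple(record.get(field) for field in key_fields)
        if any(value is None for value in key):
            unkeyed.append(dict(record))
            continue
        target = merged.setdefault(key, {})
        for field, value in record.items():
            if value is not None:
                target.setdefault(field, value)
    return list(merged.values()) + unkeyed


# Two hypothetical sources describing the same gene, plus one other gene.
from_gff = {"primaryIdentifier": "gene0001", "length": 1200, "symbol": None}
from_annotation = {"primaryIdentifier": "gene0001", "symbol": "ABC1"}
other_gene = {"primaryIdentifier": "gene0002", "symbol": "XYZ2"}

for obj in integrate([from_gff, from_annotation, other_gene], ["primaryIdentifier"]):
    print(obj)
```

The two records for `gene0001` collapse into a single object carrying both `length` and `symbol`, while `gene0002` stays separate.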
Post processing
• Operations are
performed on
integrated data
• Calculate/set fields
difficult to work with
while data loading,
because they require 2
or more sources to be
loaded already
• Order of steps is
somewhat important
<post-processing>
  <post-process name="create-references" />
  <post-process name="create-chromosome-locations-and-lengths"/>
  <post-process name="create-gene-flanking-features" />
  <post-process name="do-sources" />
  <post-process name="create-intron-features">
    <property name="organisms" value="3880"/>
  </post-process>
  <post-process name="transfer-sequences"/>
  <post-process name="populate-child-features"/>
  <post-process name="create-location-range-index" />
  <post-process name="create-overlap-view" />
  <post-process name="create-attribute-indexes"/>
  <post-process name="summarise-objectstore"/>
  <post-process name="create-search-index"/>
</post-processing>
http://intermine.readthedocs.org/en/latest/database/database-building/post-processing/
Building & deploying a mine
Two types of build mechanisms:
• Manual:
$ cd dbmodel && ant clean build-db ## initialize db
$ cd ../integrate && ant -Dsource=legumine-gff ## load a data source
$ ant -Dsource=legumine-chr-fasta ## load more sources
$ cd ../postprocess && ant ## run post-process steps
$ cd ../webapp ## build and deploy mine webapp
$ ant clean remove-webapp default release-webapp
• Automated:
$ ../bio/scripts/project_build -b -v localhost ~/legumine-dump
http://intermine.readthedocs.org/en/latest/database/database-building/build-script/
Lucene based search index
• Post-process "create-search-index" runs the
database indexing, zips and stores in db
• On webapp (first) load, index is unpacked
• By default, all id and text fields are ignored by the
indexer
• Uses the Apache Lucene whitespace analyzer to
identify word boundaries
• Control temp directory and classes/fields to be
ignored by altering the
MINE_NAME/dbmodel/resources/keyword_search.properties file
http://intermine.readthedocs.org/en/latest/webapp/keyword-search/
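Whitespace-based tokenization has a practical consequence for search: punctuation stays attached to words. The sketch below is not Lucene and not InterMine code. It is a small plain-Python inverted index, with a hypothetical `build_index` function and made-up objects, that splits text only on whitespace and skips a configurable set of ignored fields.

```python
from collections import defaultdict

# Made-up objects standing in for rows of the integrated database.
OBJECTS = {
    1: {"id": "1000001", "name": "protein kinase",
        "description": "Serine/threonine kinase, putative"},
    2: {"id": "1000002", "name": "kinase inhibitor",
        "description": "putative"},
}


def build_index(objects, ignored_fields=()):
    """Map each whitespace-delimited token to the ids of objects containing it."""
    index = defaultdict(set)
    for obj_id, fields in objects.items():
        for field, value in fields.items():
            if field in ignored_fields or not isinstance(value, str):
                continue
            # str.split() with no arguments splits on runs of whitespace only,
            # so "kinase," and "kinase" end up as different tokens.
            for token in value.split():
                index[token].add(obj_id)
    return dict(index)


index = build_index(OBJECTS, ignored_fields=("id",))
print(sorted(index))
print("kinase ->", sorted(index["kinase"]))
```

Because `"kinase,"` keeps its trailing comma, a search for `kinase` matches object 1 only through its name here, which shows why the choice of analyzer matters when deciding which fields to index.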
Alex Kalderimis et al. Nucl. Acids Res. 2014;42:W468-W472
InterMine web services
http://iodocs.labs.intermine.org
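As a rough illustration of programmatic access, the sketch below builds (but does not send) a request URL for a query against a mine's web services. Several things here are assumptions to verify against the web-service documentation: the base URL `https://mine.example.org/legumine` is a placeholder, the `/service/query/results` path with `query` and `format` parameters is written from memory of the InterMine REST API, and the helper name `build_query_url` is hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_query_url(base_url, views, constraints, fmt="json"):
    """Build a query-results URL from output columns and constraints.

    views:       list of paths to return, e.g. ["Gene.symbol"]
    constraints: list of (path, op, value) tuples
    """
    # The query itself is expressed as a small XML document.
    query = ET.Element("query", {"model": "genomic", "view": " ".join(views)})
    for path, op, value in constraints:
        ET.SubElement(query, "constraint", {"path": path, "op": op, "value": value})
    query_xml = ET.tostring(query, encoding="unicode")
    params = urlencode({"query": query_xml, "format": fmt})
    # Assumed endpoint path; check it against the mine's service listing.
    return f"{base_url.rstrip('/')}/service/query/results?{params}"


url = build_query_url(
    "https://mine.example.org/legumine",          # placeholder mine URL
    ["Gene.primaryIdentifier", "Gene.symbol"],
    [("Gene.organism.taxonId", "=", "3880")],
)
print(url)
```

The resulting URL could then be fetched with any HTTP client; keeping URL construction separate makes the query easy to inspect before it is sent.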
Federated Authentication
• Apart from the standard login scheme
(username/password), InterMine supports industry
standard OAuth2 based login flows, implemented
by Google, GitHub, Agave, etc.
• ThaleMine relies on this infrastructure to
authenticate users against the araport.org tenant
registered within the Agave infrastructure
• Documentation available here:
http://intermine.readthedocs.org/en/latest/webapp/properties/web-properties/#openauth2-settings-aka-openid-connect
Friendly reference mines
• FlyMine: https://github.com/intermine/intermine/
• ThaleMine: https://github.com/Arabidopsis-Information-Portal/intermine/
• MedicMine: https://github.com/jcvi-plant-genomics/intermine/
• PhytoMine: https://github.com/JoeCarlson/intermine/
Summary
• Advantages
  - InterMine is a powerful biological data warehouse
  - Performs complex data integration
  - Allows fast and flexible querying
  - Well-documented programmatic interface
  - Cookie-cutter, user-friendly web interface
  - Facilitates cross-talk between “mines”
• Caveats
  - Adding more data requires a full database rebuild (incremental loading
    is not possible) because of the integration step
• About InterMine:
  - Developed by the Micklem Lab at the University of Cambridge, UK
  - Written in Java, backed by PostgreSQL, deployed under Tomcat
  - Documentation and downloads available at http://www.intermine.org
Acknowledgments
• InterMine Team
  - Gos Micklem
  - Julie Sullivan
  - Alex Kalderimis
  - Richard Smith
  - Sergio Contrino
  - Josh Heimbach
  - et al.
• Araport Team
  - Chris Town
  - Jason Miller
  - Matt Vaughn
  - Maria Kim
  - Svetlana Karamycheva
  - Erik Ferlanti
  - Chia-Yi Cheng
  - Benjamin Rosen
  - Irina Belyaeva
THANK YOU

Introduction to InterMine Infrastructure

  • 1. Introduction to InterMine Infrastructure Vivek Krishnakumar LF Meeting 04/28/2015
  • 2. InterMine in a nutshell • Open-source data warehouse software • Integration of complex biological data • Parsers for common biological data formats • Extensible framework for custom data • Cookie-cutter interface, highly customizable • Interact using sophisticated web query tools • Programmatic access using web-service API
  • 3. Open-source Project • Source code available online • Distributed with the GNU LGPL license • GitHub Repo: https://github.com/intermine/int ermine • GitHub Organization: https://github.com/intermine intermine / intermine > bio > biotestmine > config > flymine > humanmine > imbuild > intermine > testmodel .gitignore .travis.yml LICENSE LICENSE.LIBS README.md RELEASE_NOTES
  • 4. InterMine system architecture (Richard N. Smith et al. Bioinformatics 2012;28:3163-3165)
  • 5. InterMine system architecture
      Web Application
      • Java Server Pages (JSP), HTML, JS, CSS
      • Interfaces with Java Servlets and IM web-services
      Web Server
      • Tomcat 7.0.x, serves the Web application ARchive (WAR) file
      • ant-based build system using the Java SDK
      Database Server
      • PostgreSQL 9.2 or above
      • range query, btree, gist enabled (refer to the docs: http://intermine.readthedocs.org/en/latest/system-requirements/)
  • 6. Data Model Overview
      • Object-oriented data model
      • Divided into classes, their attributes and their relationships; defined in XML
      • Represented as Java classes (pure Java beans); auto-generated from XML, automatically mapped to tables in the schema
      • Core data model; based on Sequence Ontology (SO); refer to bio/core/core.xml and bio/core/genomic_additions.xml
      http://intermine.readthedocs.org/en/latest/data-model/overview/
  • 7. Data Model Overview
        <?xml version="1.0"?>
        <model name="example" package="org.intermine.model.bio">
          <class name="Protein" is-interface="true" extends="SequenceFeature">
            <attribute name="name" type="java.lang.String"/>
            <attribute name="accession" type="java.lang.String"/>
            <collection name="features" referenced-type="NewFeature" reverse-reference="protein"/>
          </class>
          <class name="NewFeature" is-interface="true">
            <attribute name="identifier" type="java.lang.String"/>
            <attribute name="confidence" type="java.lang.Double"/>
            <reference name="protein" referenced-type="Protein" reverse-reference="features"/>
          </class>
        </model>
      The model expects standard Java names for classes and fields:
      • classes: should start with an upper-case letter and be CamelCase, no underscores or spaces.
      • fields (attributes, references, collections): should start with a lower-case letter and be lowerCamelCase, no underscores or spaces.
      http://intermine.readthedocs.org/en/latest/data-model/model/
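The naming rules on this slide can be checked mechanically. The following is a minimal sketch in Python, not part of InterMine itself: the function name `check_model_names` and the two regular expressions are assumptions made for illustration, and the check covers only the capitalisation and "no underscores or spaces" rules stated above.

```python
import re
import xml.etree.ElementTree as ET

# Model XML copied from the slide above.
MODEL_XML = """<?xml version="1.0"?>
<model name="example" package="org.intermine.model.bio">
  <class name="Protein" is-interface="true" extends="SequenceFeature">
    <attribute name="name" type="java.lang.String"/>
    <attribute name="accession" type="java.lang.String"/>
    <collection name="features" referenced-type="NewFeature" reverse-reference="protein"/>
  </class>
  <class name="NewFeature" is-interface="true">
    <attribute name="identifier" type="java.lang.String"/>
    <attribute name="confidence" type="java.lang.Double"/>
    <reference name="protein" referenced-type="Protein" reverse-reference="features"/>
  </class>
</model>"""

# Hypothetical patterns: letters and digits only, so underscores and
# spaces are rejected; only the case of the first letter differs.
CLASS_NAME = re.compile(r"^[A-Z][A-Za-z0-9]*$")
FIELD_NAME = re.compile(r"^[a-z][A-Za-z0-9]*$")
FIELD_TAGS = ("attribute", "reference", "collection")


def check_model_names(xml_text):
    """Return a list of human-readable naming problems (empty if none)."""
    problems = []
    root = ET.fromstring(xml_text)
    for cls in root.iter("class"):
        class_name = cls.get("name", "")
        if not CLASS_NAME.match(class_name):
            problems.append(f"class {class_name!r} is not CamelCase")
        for field in cls:
            if field.tag not in FIELD_TAGS:
                continue
            field_name = field.get("name", "")
            if not FIELD_NAME.match(field_name):
                problems.append(
                    f"{field.tag} {class_name}.{field_name!r} is not lowerCamelCase"
                )
    return problems


if __name__ == "__main__":
    for problem in check_model_names(MODEL_XML):
        print(problem)
```

Running the check against a model file before building the database gives earlier feedback than a failed build would, under the assumption that these two rules are the ones that matter.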
  • 8. Creating & configuring a mine
      • Build out the scaffold for the mine:
        $ cd git/intermine
        $ bio/scripts/make_mine legumine
      • Configure the data to load and the post-processing steps to run by customizing project.xml
      • Each <source /> element corresponds to a directory under bio/sources/*, which defines the parsers that retrieve the data and encodes the rules for integration
      Repository layout after scaffolding (intermine / intermine):
        > bio
        > biotestmine
        > config
        > flymine
        > legumine
            > dbmodel
            > integrate
            > postprocess
            > webapp
            > default.intermine.integrate.properties
            > default.intermine.webapp.properties
            > project.xml
        > humanmine
        > imbuild
        > intermine
        > testmodel
        .gitignore
        .travis.yml
        LICENSE
        LICENSE.LIBS
        README.md
        RELEASE_NOTES
      http://intermine.readthedocs.org/en/latest/get-started/tutorial/#creating-a-new-mine
  • 9. Creating & configuring a mine
      project.xml:
        <project type="bio">
          <property name="target.model" value="genomic"/>
          <property name="source.location" location="../bio/sources/"/>
          <property name="common.os.prefix" value="common"/>
          <property name="intermine.properties.file" value="legumine.properties"/>
          <property name="default.intermine.properties.file" location="../default.intermine.integrate.properties"/>
          <sources>
            <source name="legumine-gff" type="legumine-gff">
              <property name="gff3.taxonId" value="3880"/>
              <property name="gff3.seqDataSourceName" value="LF"/>
              <property name="gff3.dataSourceName" value="LF"/>
              <property name="gff3.seqClsName" value="Chromosome"/>
              <property name="gff3.dataSetTitle" value="Genome Annotation"/>
              <property name="src.data.dir" location="/path/to/legumine/genome/gff/"/>
            </source>
            : :
          </sources>
          <post-processing>
            <post-process name="create-references"/>
            <post-process name="create-chromosome-locations-and-lengths"/>
            <post-process name="create-gene-flanking-features"/>
            : :
          </post-processing>
        </project>
      http://intermine.readthedocs.org/en/latest/get-started/tutorial/#project-xml
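To see at a glance what a project.xml will load and which post-processing steps it will run, the file can be summarised with a few lines of Python. This is a rough sketch using only the standard library; `summarize_project` is a hypothetical helper, and the embedded XML is a shortened copy of the slide's example with the elided parts (": :") simply left out.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the project.xml shown on the slide; elided
# sources and post-process steps are omitted, not guessed.
PROJECT_XML = """<project type="bio">
  <property name="target.model" value="genomic"/>
  <sources>
    <source name="legumine-gff" type="legumine-gff">
      <property name="gff3.taxonId" value="3880"/>
      <property name="gff3.seqClsName" value="Chromosome"/>
      <property name="src.data.dir" location="/path/to/legumine/genome/gff/"/>
    </source>
  </sources>
  <post-processing>
    <post-process name="create-references"/>
    <post-process name="create-chromosome-locations-and-lengths"/>
    <post-process name="create-gene-flanking-features"/>
  </post-processing>
</project>"""


def summarize_project(xml_text):
    """Return (sources, steps) extracted from a project.xml string.

    sources maps each source name to its type and properties;
    steps is the ordered list of post-process names.
    """
    root = ET.fromstring(xml_text)
    sources = {}
    for source in root.findall("./sources/source"):
        properties = {}
        for prop in source.findall("property"):
            # A property carries either a value or a location attribute.
            properties[prop.get("name")] = prop.get("value") or prop.get("location")
        sources[source.get("name")] = {
            "type": source.get("type"),
            "properties": properties,
        }
    steps = [
        step.get("name")
        for step in root.findall("./post-processing/post-process")
    ]
    return sources, steps


if __name__ == "__main__":
    sources, steps = summarize_project(PROJECT_XML)
    for name, info in sources.items():
        print(name, info["type"], info["properties"])
    print("post-processing order:", steps)
```

Because the parser is strict, a malformed attribute (for example a typographic quote pasted from a slide) fails immediately with a parse error rather than later in the build.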
  • 10. Data Sources and Sets
      • InterMine provides a large library of data source parsers and loaders, covering data types including, but not restricted to:
          genome sequence (fasta)
          annotation (gff)
          ontology (go, so)
          proteins (uniprot)
          interactions (psi-mi)
          pathway (kegg, reactome)
          homologs (panther, compara, homologene)
          publications (pubmed)
          chado (sequence, stock)
      • Custom sources can be written by following the tutorial (http://intermine.readthedocs.org/en/latest/database/data-sources/custom/) or by referring to code from other mines
      http://intermine.readthedocs.org/en/latest/database/data-sources/library/
  • 11. Building a mine
      • Each InterMine instance requires 3 PostgreSQL databases:
          • legumine: core db mapping to the data model
          • items-legumine: db for storing intermediate Items during load
          • userprofile-legumine: db for storing user-specific data
      • Running the build requires a special config file in the user's home directory, containing db connection parameters and other mine-specific settings to override: ${HOME}/.intermine/legumine.properties
      http://intermine.readthedocs.org/en/latest/get-started/tutorial/#properties-file
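The naming pattern on this slide (one core database plus `items-` and `userprofile-` variants, and a properties file under `${HOME}/.intermine/`) can be captured in a small helper. The sketch below is illustrative only: the function names and the role labels `core`, `items` and `userprofile` are assumptions, and the emitted commands rely on nothing more than PostgreSQL's standard `createdb` utility with default connection settings.

```python
from pathlib import PurePosixPath


def database_names(mine):
    """Return the three database names the slide lists for a mine."""
    return {
        "core": mine,                          # maps to the data model
        "items": f"items-{mine}",              # intermediate Items during load
        "userprofile": f"userprofile-{mine}",  # user-specific data
    }


def properties_path(mine, home):
    """Return the per-mine properties file location under the home directory."""
    return PurePosixPath(home) / ".intermine" / f"{mine}.properties"


def createdb_commands(mine):
    """Return one `createdb` command line per required database."""
    return [f"createdb {name}" for name in database_names(mine).values()]


if __name__ == "__main__":
    for command in createdb_commands("legumine"):
        print(command)
    print(properties_path("legumine", "/home/alice"))
```

The commands are only printed, not executed, so the output can be reviewed (and user or host options added) before anything touches the database server.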
  • 12. Model Merging & Data Integration
      Model Merging
      • Each source contributes towards the data model
      • bio/core/core.xml is always used as the base for model merging
      • The ant build-db command consumes the SOURCE_additions.xml files
      • The model is used to generate tables, Java classes and the webapp
      http://intermine.readthedocs.org/en/latest/database/database-building/model-merging/
      Data Integration
      • The key(s) defined for a class determine equivalence between objects of that class
      • The primary key defines the field(s) used to search for equivalent objects
      • For objects that share the same primary key, fields are merged and stored as a single object
      http://intermine.readthedocs.org/en/latest/database/database-building/primary-keys/
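The integration rule on this slide (objects with the same primary key are merged into one) can be illustrated with plain dictionaries. This is a simplified sketch, not InterMine's implementation: the `integrate` function, the field names and the identifiers are hypothetical, and conflicts are resolved here by letting later records overwrite earlier ones, which is an assumption made only to keep the example short.

```python
def integrate(records, key_fields):
    """Merge records that share the same values for key_fields.

    records    -- iterable of dicts, e.g. one per object per source
    key_fields -- list of field names that form the primary key
    Returns the merged objects in first-seen order.
    """
    merged = {}
    for record in records:
        # Objects are equivalent when all key fields match.
        key = tuple(record[field] for field in key_fields)
        target = merged.setdefault(key, {})
        for field, value in record.items():
            if value is not None:
                # Simplification: a later source overwrites an earlier one.
                target[field] = value
    return list(merged.values())


if __name__ == "__main__":
    # Hypothetical gene records coming from two different sources.
    from_gff = [
        {"primaryIdentifier": "gene0001", "length": 1200},
        {"primaryIdentifier": "gene0002", "length": 800},
    ]
    from_annotation = [
        {"primaryIdentifier": "gene0001", "symbol": "ABC1"},
    ]
    for gene in integrate(from_gff + from_annotation, ["primaryIdentifier"]):
        print(gene)
```

The first gene ends up as a single object carrying both `length` and `symbol`, which is the behaviour the slide describes for objects sharing a primary key.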
  • 13. Post processing
      • Operations are performed on the integrated data
      • Calculate or set fields that are difficult to populate during data loading because they require two or more sources to be loaded already
      • The order of the steps is somewhat important
        <post-processing>
          <post-process name="create-references"/>
          <post-process name="create-chromosome-locations-and-lengths"/>
          <post-process name="create-gene-flanking-features"/>
          <post-process name="do-sources"/>
          <post-process name="create-intron-features">
            <property name="organisms" value="3880"/>
          </post-process>
          <post-process name="transfer-sequences"/>
          <post-process name="populate-child-features"/>
          <post-process name="create-location-range-index"/>
          <post-process name="create-overlap-view"/>
          <post-process name="create-attribute-indexes"/>
          <post-process name="summarise-objectstore"/>
          <post-process name="create-search-index"/>
        </post-processing>
      http://intermine.readthedocs.org/en/latest/database/database-building/post-processing/
  • 14. Building & deploying a mine
      Two types of build mechanisms:
      • Manual:
        $ cd dbmodel && ant clean build-db                    ## initialize db
        $ cd ../integrate && ant -Dsource=legumine-gff        ## load data sources
        $ ant -Dsource=legumine-chr-fasta                     ## load more sources
        $ cd ../postprocess && ant                            ## run post-process steps
        $ cd ../webapp                                        ## build mine webapp
        $ ant clean remove-webapp default release-webapp
      • Automated:
        $ ../bio/scripts/project_build -b -v localhost ~/legumine-dump
      http://intermine.readthedocs.org/en/latest/database/database-building/build-script/
  • 15. Lucene based search index
      • The "create-search-index" post-process runs the database indexing, then zips the index and stores it in the db
      • On (first) webapp load, the index is unpacked
      • By default, all id and text fields are ignored by the indexer
      • Uses the Apache Lucene whitespace analyzer to identify word boundaries
      • Control the temp directory and the classes/fields to be ignored by altering the MINE_NAME/dbmodel/resources/keyword_search.properties file
      http://intermine.readthedocs.org/en/latest/webapp/keyword-search/
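To make the effect of whitespace-based word boundaries concrete, here is a toy inverted index in Python. It is not Lucene and does not reproduce InterMine's indexer; `whitespace_tokens`, `build_index` and the sample records are hypothetical. The sketch only shows two ideas from the slide: tokens are split on whitespace alone (so punctuation and case are preserved), and selected fields can be excluded from indexing.

```python
from collections import defaultdict


def whitespace_tokens(text):
    """Split on runs of whitespace only; no lowercasing, no punctuation handling."""
    return text.split()


def build_index(documents, ignored_fields=()):
    """Build a token -> set of document ids mapping.

    documents      -- dict of doc_id -> dict of field name -> value
    ignored_fields -- field names the indexer should skip
    """
    index = defaultdict(set)
    for doc_id, fields in documents.items():
        for name, value in fields.items():
            if name in ignored_fields:
                continue
            for token in whitespace_tokens(str(value)):
                index[token].add(doc_id)
    return index


if __name__ == "__main__":
    # Hypothetical records; "id" stands in for an ignored field.
    documents = {
        1: {"id": 1001, "name": "zinc finger protein"},
        2: {"id": 1002, "name": "RING finger"},
        3: {"id": 1003, "name": "heat-shock  protein"},
    }
    index = build_index(documents, ignored_fields={"id"})
    print(sorted(index["finger"]))
    print(sorted(index["protein"]))
    print("heat-shock" in index, "heat" in index)
```

Note that "heat-shock" stays a single token under this scheme, so a search for "heat" alone would not match it; this is the kind of behaviour to keep in mind when interpreting keyword search results.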
  • 16. InterMine web services (Alex Kalderimis et al. Nucl. Acids Res. 2014;42:W468-W472) http://iodocs.labs.intermine.org
  • 17. Federated Authentication
      • Apart from the standard login scheme (username/password), InterMine supports industry-standard OAuth2-based login flows, as implemented by Google, GitHub, Agave, etc.
      • ThaleMine relies on this infrastructure to authenticate users against the araport.org tenant registered within the Agave infrastructure
      • Documentation available here: http://intermine.readthedocs.org/en/latest/webapp/properties/web-properties/#openauth2-settings-aka-openid-connect
  • 18. Friendly reference mines
      • FlyMine: https://github.com/intermine/intermine/
      • ThaleMine: https://github.com/Arabidopsis-Information-Portal/intermine/
      • MedicMine: https://github.com/jcvi-plant-genomics/intermine/
      • PhytoMine: https://github.com/JoeCarlson/intermine/
  • 19. Summary
      • Advantages
          • InterMine is a powerful biological data warehouse
          • Performs complex data integration
          • Allows fast and flexible querying
          • Well-documented programmatic interface
          • Cookie-cutter, user-friendly web interface
          • Facilitates cross-talk between “mines”
      • Caveats
          • Adding more data requires a full database rebuild (incremental loading is not possible) because of the integration step
      • About InterMine:
          • Developed by the Micklem Lab at the University of Cambridge, UK
          • Written in Java, backed by PostgreSQL, deployed under Tomcat. Documentation and downloads available at http://www.intermine.org
  • 20. Acknowledgments
      • InterMine Team: Gos Micklem, Julie Sullivan, Alex Kalderimis, Richard Smith, Sergio Contrino, Josh Heimbach, et al.
      • Araport Team: Chris Town, Jason Miller, Matt Vaughn, Maria Kim, Svetlana Karamycheva, Erik Ferlanti, Chia-Yi Cheng, Benjamin Rosen, Irina Belyaeva

Editor's Notes

  1. Top-level directories of the repository:
      • bio: code to deal with biological data, including data sources
      • flymine: config used to create FlyMine
      • testmodel: non-biological test data model used for testing core InterMine
      • imbuild: ant-based build system, do not edit anything
      • intermine: the core (generic) InterMine code to work with any data model
  2. Architecture components:
      • ObjectStore: custom Java object/relational mapping system, optimized for read-only database performance
      • Query optimizer: pre-computed tables joining connected data from different tables, improves PostgreSQL performance
  3. Attributes can be primitives (int, float, string), references to other objects in the database, or collections of other objects in the database.
  4. All elements of the model extend the core InterMineObject, which has a unique field ‘id’. Only the classes defined in the model are searchable.
  5. Directories inside a mine:
      • dbmodel: information about the data model and ant targets relating to the model and database creation
      • integrate: ant targets for loading data
      • postprocess: ant targets to run post-processing operations on data
      • webapp: basic configuration and commands for building and deploying the web application
  6. Summary of web services available through InterMine.