Integrated omics analysis pipeline for model organism with Cytoscape, Kozo Nishida
1. Integrated omics analysis pipeline for model organism with Cytoscape
Kozo Nishida
RIKEN, Quantitative Biology Center (QBiC)
Kozo Nishida @ RECOMB2014, Nov 11, 2014 1
2. Goal
Reproducible and modifiable omics analysis pipeline in a single environment
[Figure: pipeline stages: Omics experiment, Omics data analysis, Pathway data integration, Network analysis. Image sources: www.genome.jp; Rohn, 2012]
3. Why?
• Each process is separate, so the whole analysis pipeline is NOT easy to reproduce.
• This is especially hard when a process must be modified and the results aggregated.
• Cytoscape is good software for pathway data integration and network analysis, but…
• it is NOT the best fit for the whole analysis pipeline: a Java app is NOT easy to modify.
• R is common for omics data preprocessing and analysis.
• Python is good for data aggregation.
• Both can be used for data integration and network analysis.
5. Seamless, reproducible, and modifiable IPython notebook environment
• Cytoscape is controlled from the IPython notebook
• Low-level access to Cytoscape with the cyREST app
• Omics analysis with Bioconductor R packages
• Pathway data integration with Python and a graph database
• KEGG-based pathway data integration with the KEGGscape app
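As a rough illustration of the notebook-to-Cytoscape link listed above, here is a minimal Python sketch. It assumes cyREST is listening at its default address (http://localhost:1234/v1) and that posting Cytoscape.js-style JSON to the networks resource creates a network. The helper names (to_cyjs, create_network_request) and the toy node ids are hypothetical and are not taken from the original pipeline.

```python
import json
import urllib.request

# Assumption: cyREST's default base URL; change it if your setup differs.
BASE = "http://localhost:1234/v1"


def to_cyjs(name, edges):
    """Build a Cytoscape.js-style network dict from (source, target) pairs."""
    node_ids = sorted({node for edge in edges for node in edge})
    return {
        "data": {"name": name},
        "elements": {
            "nodes": [{"data": {"id": n}} for n in node_ids],
            "edges": [{"data": {"source": s, "target": t}} for s, t in edges],
        },
    }


def create_network_request(network, base=BASE):
    """Prepare (but do not send) the POST request that would create the network."""
    return urllib.request.Request(
        url=base + "/networks",
        data=json.dumps(network).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical toy network: two gene nodes linked through one compound node.
network = to_cyjs("toy pathway", [("geneA", "compound1"), ("compound1", "geneB")])
request = create_network_request(network)
# With Cytoscape and cyREST running, the request could be sent with:
#   urllib.request.urlopen(request)
```

Keeping request construction separate from sending means the notebook cell can be inspected and rerun even when Cytoscape is not open, which suits the trial-and-error style the pipeline aims for.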
6. cyREST and KEGGscape app
• cyREST gives scripting languages an interface to Cytoscape
• cyREST is useful and suitable for KEGG-based pathway data integration
• KEGGscape supports importing KEGG pathway XML (KGML) into Cytoscape
• Difference from CytoKEGG and CyKEGGparser
• CytoKEGG and CyKEGGparser have several additional features, but they are specialized for their own purposes and leave some pathways unsupported.
• KEGGscape simply imports and reconstructs each KEGG pathway as it is, for as many pathways as KEGG provides. (Currently supports all KEGG pathways.)
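To give a feel for what a KGML file carries, the following Python sketch reads gene entries from a small KGML-like fragment with the standard library. The fragment is hand-written and heavily trimmed, and its organism code, ids, and gene names are placeholders rather than real KEGG content. The function name gene_entries is hypothetical, and this is not how KEGGscape itself is implemented.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed KGML-like fragment for illustration only.
# Organism code "xxx" and all ids/names are placeholders, not KEGG data.
KGML = """
<pathway name="path:xxx00000" org="xxx" number="00000">
  <entry id="1" name="xxx:g1 xxx:g2" type="gene">
    <graphics name="g1, g2" type="rectangle"/>
  </entry>
  <entry id="2" name="cpd:C00000" type="compound">
    <graphics name="C00000" type="circle"/>
  </entry>
</pathway>
"""


def gene_entries(kgml_text):
    """Map each gene-type entry id to the list of gene ids in its name attribute."""
    root = ET.fromstring(kgml_text.strip())
    return {
        entry.get("id"): entry.get("name").split()
        for entry in root.findall("entry")
        if entry.get("type") == "gene"
    }


genes = gene_entries(KGML)
print(genes)  # {'1': ['xxx:g1', 'xxx:g2']}
```

One gene-type entry can list several gene ids, so a pathway node is better treated as a set of genes than as a single gene when data are mapped onto it.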
7. Demo for E. coli 1
Mapping differentially expressed genes (between WT and lrp-) to KEGG
8. Demo for E. coli 2
Mapping E. coli drug targets to KEGG
9. Another example: Arabidopsis thaliana
Mapping a time-series metabolome profile to KEGG (http://goo.gl/jk01HP)
10. Conclusions, Future work
• Constructed a reproducible (and flexible) omics analysis pipeline with the cyREST app.
• You can replace KEGG with WikiPathways, Reactome, or other pathway databases
• Packaging Python and R utility functions
• py2cytoscape (github.com/idekerlab/py2cytoscape)
• More example IPython notebooks!!
• Contributions are welcome; please see github.com/idekerlab/cy-rest-python
11. Acknowledgments
• The Cytoscape consortium
• Keiichiro Ono (UCSD)
• cyREST, KEGGscape
• Atsushi Fukushima (RIKEN CSRS)
• AtMetExpress Arabidopsis thaliana metabolome database
• Jun Sese (AIST CBRC)
• Mentoring in “Tool Prototype for Integrated Database Analysis” project
This project is supported by the National Bioscience Database Center (NBDC), Japan
Editor's Notes
I’m Kozo Nishida, from RIKEN, Japan.
I would like to introduce a new omics analysis environment project for Cytoscape.
My project goal is to realize a reproducible and modifiable omics analysis pipeline in a single environment.
These processes are the components of the pipeline.
The reason I am doing this project is that
each process is separate, so the whole analysis pipeline is NOT easy to reproduce.
This is especially hard when you need to modify a process and aggregate the results that connect them.
Of course, Cytoscape is good for the latter part of the pipeline, but
it is NOT the best fit for the whole analysis pipeline,
because the pipeline needs flexibility, and a Java app requires compiling and is NOT easy to modify.
For the former part of the pipeline, the R language is common for omics data preprocessing and analysis.
The Python language is good for data aggregation and can be used for general purposes.
With these languages, it is easy to modify the pipeline through trial and error.
So I implemented a pipeline like the one in this image.
Cytoscape users usually control Cytoscape through the GUI.
But in my case, Cytoscape is controlled programmatically from the IPython notebook through the cyREST app.
I leave omics analysis to Bioconductor packages, and pathway data integration to Python and a graph database.
The main pathway integration target is KEGG.
You need to install the cyREST and KEGGscape apps to reproduce our pipeline.
Python needs the cyREST interface to control Cytoscape.
cyREST is useful and suitable for KEGG-based pathway data integration.
Cytoscape does not support KEGG pathways by default,
so you need to install the KEGGscape app.
There are also the CytoKEGG and CyKEGGparser apps for KEGG pathway support in Cytoscape 3.
But these are specialized for their own workflows, and you may find it difficult to control them from Python.
So I recommend KEGGscape, which simply imports and reconstructs each KEGG pathway as it is.
KEGGscape currently supports all KEGG pathways.
Next I will show you two demos for E. coli.
The first is a pipeline for mapping differentially expressed genes (between WT and the lrp mutant strain) to KEGG.
First I import a KEGG pathway from the IPython notebook.
At this stage the pipeline is not finished, so no data is integrated yet.
Next I run the whole pipeline, and the table of differentially expressed genes is merged.
The nodes highlighted in yellow are enzyme nodes that include differentially expressed genes.
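The merge step in this demo can be pictured with a small Python sketch: each pathway node carries a list of gene ids, and a node is flagged when any of them is in the differentially expressed set. All ids below are placeholders, the column names (is_de, de_genes) and the function name flag_enzyme_nodes are hypothetical, and the demo notebook may organize this step differently.

```python
def flag_enzyme_nodes(node_genes, de_genes):
    """Return one table row per node, marking nodes that contain a DE gene."""
    de = set(de_genes)
    rows = []
    for node_id, genes in sorted(node_genes.items()):
        hits = sorted(de.intersection(genes))
        rows.append({
            "id": node_id,
            "is_de": bool(hits),          # True if any gene in the node is DE
            "de_genes": ",".join(hits),   # which genes triggered the flag
        })
    return rows


# Hypothetical inputs: node id -> gene ids (for example, read from KGML
# entries) and a list of differentially expressed gene ids.
node_genes = {"1": ["xxx:g1", "xxx:g2"], "2": ["xxx:g3"], "3": ["xxx:g4"]}
de_genes = ["xxx:g2", "xxx:g4", "xxx:g9"]

rows = flag_enzyme_nodes(node_genes, de_genes)
highlighted = [row["id"] for row in rows if row["is_de"]]
print(highlighted)  # ['1', '3']
```

Rows like these could then be written to the Cytoscape node table, where a visual style keyed on the flag column would produce highlighting such as the yellow nodes in the demo.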
The second is a pipeline for mapping E. coli drug targets in DrugBank to KEGG.
Again, I first import a KEGG pathway from the IPython notebook.
At this stage the pipeline is not finished yet, so no data is integrated.
Next I run the whole pipeline, and the E. coli drug targets in DrugBank are mapped to the KEGG pathway.
This column lists the drugs, and the next column lists the target protein, given as a KEGG gene product.
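The drug and target columns can be thought of as the result of a join between drug-target pairs and pathway nodes. Below is a minimal Python sketch of such a join. It assumes the targets have already been converted to KEGG-style gene ids, which the real pipeline would have to handle separately. The drug names, gene ids, and function names are all hypothetical placeholders, not DrugBank or KEGG data.

```python
from collections import defaultdict


def drugs_by_target(drug_target_pairs):
    """Index drugs by the gene id of their target."""
    index = defaultdict(set)
    for drug, target in drug_target_pairs:
        index[target].add(drug)
    return index


def drug_rows(node_genes, index):
    """List (drug, target gene, node id) rows for targets present in the pathway."""
    rows = []
    for node_id, genes in sorted(node_genes.items()):
        for gene in genes:
            # .get avoids inserting empty entries into the defaultdict
            for drug in sorted(index.get(gene, ())):
                rows.append((drug, gene, node_id))
    return rows


# Hypothetical placeholder data.
pairs = [("drug_A", "xxx:g1"), ("drug_B", "xxx:g1"),
         ("drug_A", "xxx:g3"), ("drug_C", "xxx:g7")]
node_genes = {"1": ["xxx:g1", "xxx:g2"], "2": ["xxx:g3"]}

rows = drug_rows(node_genes, drugs_by_target(pairs))
for row in rows:
    print(row)
```

Targets that do not appear in the imported pathway (drug_C here) simply produce no row, so the same table can be reused across different pathways.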
These pipelines are independent, and you can combine them.
But here, for simplicity, I separated them into two movies.
RIKEN has rich resources for plant metabolomes, so I’m also trying to construct a more complicated pipeline for Arabidopsis thaliana.
I cannot show you all the metabolome data yet, but you can see a sample metabolome-mapping IPython notebook here.
I have shown you some reproducible omics analysis pipelines built with the cyREST app.
I used KEGG as the example, but you can replace the target pathway database with WikiPathways, Reactome, or other databases.
Of course, you can integrate all of these data with Python.
I’ve just started this project, so the Python and R packaging is not finished yet.
I hope to contribute to the py2cytoscape project.
Increasing the number of example notebooks is also important for this project.
If you are interested in contributing notebooks, please see this URL.