This document describes a new type of heatmap called a "CoolMap" that allows for flexible multi-scale exploration of molecular network data. CoolMaps allow data to be collapsed and aggregated at different levels of a hierarchical tree, enabling visualization and pattern discovery across scales. This approach addresses limitations of conventional heatmaps and enables linking data to existing biological knowledge. Several case studies demonstrate how CoolMaps can provide new insights into gene expression, nutrition, DNA methylation, glucose monitoring, and network data. The core concepts and near-ready software releases are presented, along with acknowledgments.
31. Heatmap… What is it?
‘CoolMap.. I, am your father’
¤ One of the most popular ways of visualizing tabular data
¤ Columns on X, rows on Y, values mapped to color
¤ Trees for hierarchical clustering, or groups, are often drawn along the sides
¤ Great format for visual exploration and pattern discovery
¤ Used along with node-edge network views such as
Cytoscape-clusterExplorer
¤ The paradigm remains largely unchanged
The American Statistician, 2009
PNAS Dec. 8, 1998 Vol. 95 No. 25 14863-14868
Czekanowski (1909), Brinton (1914), Loua (1873), Eisen (1998)
12k citations
32. The Good, the Bad, and the Ugly… of conventional heatmaps
¤ The Good
¤ Mapping number to color makes it intuitive
¤ Clustering patterns become conspicuous and interpretable
¤ The Bad
¤ Increasingly difficult to visualize and explore big datasets
¤ Difficult for data other than numeric
¤ The Ugly
¤ Difficult to incorporate existing annotations such as pathways and ontologies
¤ Difficult to visualize high-level relationships such as overall pathway to
pathway correlations
The “Figure 1” Phenomenon
33. There are known knowns, and there are known unknowns.
PLoS Genet. 2008 Mar 14;4(3):e1000034 | BMC Bioinformatics. 2011; 12(Suppl 1)
How do we relate the unknown to the known:
From observed patterns to existing knowledge interactively and intuitively?
35. The CoolMap Solution:
Nuts and Bolts
¤ Core concept: ‘Collapsible Heatmap’
¤ The tree nodes can be expanded/collapsed at any level:
¤ Think about a two-way multi tree
¤ Collapsed data are represented using aggregation functions (mean,
median, etc.)
¤ The aggregation enables the user to explore data at multiple levels:
¤ Identify potential signals from high level aggregated views
¤ Expand nodes of interest, while keeping the surrounding context
Using mean to collapse four
numeric cells
The two way tree can be expanded and
collapsed at multiple levels
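A minimal sketch of the collapse step, assuming a small numeric matrix and a mean aggregator. The data, the node names (rowGroup1, colGroup1, etc.) and the collapse function are hypothetical illustrations, not CoolMap's actual API. Each visible row or column node lists the base indices it covers, so a collapsed node covers several and an expanded leaf covers one.

```python
from statistics import mean

# Hypothetical toy 'base' matrix of numeric values (4 rows x 4 columns).
base = [
    [1.0, 3.0, 0.0, 2.0],
    [5.0, 7.0, 4.0, 6.0],
    [2.0, 2.0, 1.0, 1.0],
    [4.0, 4.0, 3.0, 3.0],
]

# Visible nodes of the row tree and the column tree (names are assumptions).
# rowGroup1 and colGroup1 are collapsed; the other nodes are expanded leaves.
row_nodes = {"rowGroup1": [0, 1], "r2": [2], "r3": [3]}
col_nodes = {"colGroup1": [0, 1], "c2": [2], "c3": [3]}


def collapse(base, row_nodes, col_nodes, aggregate=mean):
    """Build the view: one aggregated value per (row node, column node)."""
    view = {}
    for row_name, rows in row_nodes.items():
        for col_name, cols in col_nodes.items():
            cells = [base[r][c] for r in rows for c in cols]
            view[(row_name, col_name)] = aggregate(cells)
    return view


view = collapse(base, row_nodes, col_nodes)
# Four numeric cells (1, 3, 5, 7) collapsed into one cell using the mean.
print(view[("rowGroup1", "colGroup1")])
```

Expanding rowGroup1 would amount to replacing its entry in row_nodes with one entry per child; the other nodes stay as they are, which is how the surrounding context is kept.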
36. CoolMap: Core Design Concepts
¤ Extensible Interfaces:
¤ A loader that imports custom data objects into a ‘base’ matrix
¤ An aggregator that transforms a group of ‘base’ data objects into a ‘view’ data object
¤ A renderer that draws the ‘view’ data object in the designated region of the interactive view
Example:
¤ Gene expression values of all genes in pathway A, sample group B, aggregated using median, and rendered in color:
[0.5, 1, 2.1, 3.2, 4.3] → [2.1]
¤ Nucleotide sequences belonging to the same transcription factor binding site, aggregated using IUPAC consensus code to a single letter, and rendered in text:
[A,A,A,A,T] → [A] → A
¤ The ‘base’ matrix can use a variety of data structures, such as arrays, lists, sparse matrices or even remote
services
¤ Flexible Row/Column Ontological Trees:
¤ Multiple-inheritance tree
¤ Genes or metabolites may be shared by multiple pathways or ontological terms, and may
occur more than once.
¤ Trees from different sources
¤ Side by side comparison of different ontologies (GO, KEGG, Hierarchical Clustering)
¤ Trees may be used at any level
¤ Tree nodes at any level can be inserted into any place in the tree.
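A minimal sketch of the aggregator and renderer roles, using the two examples on this slide. All function names here are hypothetical and are not CoolMap's actual interfaces. The sequence aggregator is a plain majority vote, which is a simplification: a strict IUPAC consensus could return an ambiguity code instead of a single base.

```python
from collections import Counter
from statistics import median
from typing import Sequence


def median_aggregator(values: Sequence[float]) -> float:
    """Collapse a group of numeric 'base' values into one 'view' value."""
    return median(values)


def majority_base_aggregator(bases: Sequence[str]) -> str:
    """Collapse a group of nucleotides into the most frequent base.

    Simplification (assumption): ambiguity codes are not produced here.
    """
    return Counter(bases).most_common(1)[0][0]


def render_text(view_value) -> str:
    """Hypothetical text renderer: the cell shows the value as a string."""
    return str(view_value)


expression_view = median_aggregator([0.5, 1, 2.1, 3.2, 4.3])
consensus_view = majority_base_aggregator(["A", "A", "A", "A", "T"])

print(render_text(expression_view))
print(render_text(consensus_view))
```

Because the aggregator only sees a group of base objects and returns one view object, the same tree and view machinery can serve numeric data, sequences, or other custom types.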
37. Near-ready Releases
¤ CoolMap Core
¤ Core interfaces, data structures and utility functions for base matrix, view
matrix, ontology trees, renderers, interactive view panels, etc.
¤ CoolMap Application
¤ An application with auxiliary modules such as dynamic multiple dataset
synchronization, searcher, filters, sorters, data persistence etc.
¤ Followed many best practices from Cytoscape
¤ CoolMap Cytoscape Prototype Plugin
¤ A Cytoscape plugin that enables two way communication between
Cytoscape and CoolMap
Our classroom user study of a group of undergraduate students
with preliminary computer and bioinformatics background shows:
65% found it easy or not difficult to learn
74% highly enjoyed or enjoyed the software
39. Case Study 1: Eisen Yeast Data
Eisen (1998)
Gene expression fold change of selected gene groups and experiment conditions
CoolMap makes it easier to interpret data from the higher concept levels
CoolMap
40. Case Study 1: Eisen Yeast Data (cont’d)
CoolMap reveals more than meets the eye from conventional heatmaps
The peculiar outlier sample of spo5 2
Fold change reversed across many pathways
Easier to identify in the aggregated view
41. Case Study 1: Eisen Yeast Data (cont’d)
Using CoolMap’s multi-view link functions to compare different ontology definitions
Left: GO 6096: Glycolysis. Right: Eisen’s annotated Glycolysis cluster.
Integrate existing knowledge with observed data for hypothesis generation
42. Case Study 2: Diet Induced Differential Gene Expression
¤ Individuals fed Saturated Fatty Acid (SFA) and Monounsaturated Fatty Acid (MUFA) diets demonstrate differential gene expression over an 8-week span
¤ The authors picked a list of immune-related genes and showed up-regulation of these genes
The American Journal of Clinical Nutrition 90, 1656-64 (2009)
CoolMap
43. Case Study 2: Diet Induced Differential Gene Expression (cont’d)
Probe level expression profiles can be maintained
44. Case Study 2: Diet Induced Differential Gene Expression (cont’d)
Using ontology groups (genders) leads to new discoveries: up-regulated gene groups, and gender-specific responses with weaker patterns. Total of 25k probes.
Panels: Up-regulated clusters, Female-specific, Male-specific
45. Case Study 3: Mother-Child Nutrition Data (Unpublished)
¤ The aggregated group view makes the data much easier to interpret at the concept level
¤ We can immediately identify that:
¤ BCAA AcylCarnitines (0.45), Long Chain AcylCarnitines (0.34), PPARa methylation (0.52) and ESR methylation (0.32) are highly correlated between mother and child
Burant C. Unpublished data
46. Case Study 3: Mother-Child Nutrition Data (Unpublished)
PPARa: One Level Down
¤ Validation
¤ Boxplot overlay (left) and expanded view (right) show that the high correlation (mean 0.52) is unlikely to result from error, outliers or noise
¤ Strong association of PPARa methylation levels in mother and child.
¤ Hypothesis
¤ As PPARa regulates genes involved in cell proliferation, cell differentiation and inflammation
responses, the expression profile of these genes may also be correlated in mother and child.
http://www.ncbi.nlm.nih.gov/gene/5465
Burant C. Unpublished data
47. Case Study 3: Mother-Child Nutrition Data (Unpublished)
BCAA AcylCarnitines
¤ The mother-child correlation is lower (mean 0.45)
¤ The BCAA AcylCarnitines intra-child group has a larger variance than the mother group
¤ While C3 is highly correlated, C4 has low correlation
48. Case Study 4: DNA Methylation
Missing values and ragged data (unpublished)
¤ Sparse or Ragged matrix
¤ Normalized methylation data: every gene has a different number of methylation sites.
¤ Collapsing by cell line (Caski.1 and Caski.2 cell lines) reveals the aggregated (mean, etc.) normalized
methylation value. Expansion by cell line reveals details for each methylation site.
Sartor M. Unpublished data
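A minimal sketch of aggregating ragged data with missing values, assuming a mean aggregator that skips missing entries. The gene names, site values and the collapsed_value helper are hypothetical; only the cell line names come from the slide.

```python
from statistics import mean

# Hypothetical ragged data: each gene has a different number of methylation
# sites, and None marks a missing measurement.
sites = {
    "geneA": {"Caski.1": [0.25, 0.75, None], "Caski.2": [0.5, 0.25, 0.75]},
    "geneB": {"Caski.1": [1.0], "Caski.2": [None]},
}


def collapsed_value(values):
    """Mean of the values that are present; None if nothing was measured."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


# Collapsed view: one aggregated value per gene and cell line.
collapsed = {
    gene: {line: collapsed_value(vals) for line, vals in by_line.items()}
    for gene, by_line in sites.items()
}

print(collapsed["geneA"]["Caski.1"])  # mean of the two measured sites
print(collapsed["geneB"]["Caski.2"])  # no measured site, so None
```

Returning None for an empty group keeps "no data" distinct from a real value of zero, so a renderer can draw such cells differently.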
49. Case Study 5: Continuous Glucose Monitoring (CGM)
Display glucose level at:
• a variety of time resolutions:
From 5 min to 1 month
• and sample groups:
age groups, gender
Link hypoglycemia events to blood
sugar changes.
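A minimal sketch of viewing glucose readings at a coarser time resolution, assuming 5-minute readings in mg/dL and a threshold of 70 mg/dL for flagging low values. The readings, the threshold and the downsample function are illustrative assumptions, not values or code from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings: (minutes since start, glucose in mg/dL), 5 min apart.
readings = [(0, 100), (5, 90), (10, 80), (15, 70),
            (20, 60), (25, 60), (30, 80), (35, 100)]

LOW_THRESHOLD = 70  # assumed cutoff for a low reading, in mg/dL


def downsample(readings, bin_minutes, aggregate=mean):
    """Group readings into fixed-width time bins and aggregate each bin."""
    bins = defaultdict(list)
    for minute, value in readings:
        bin_start = (minute // bin_minutes) * bin_minutes
        bins[bin_start].append(value)
    return {start: aggregate(vals) for start, vals in sorted(bins.items())}


mean_view = downsample(readings, 20)                 # average per 20 minutes
min_view = downsample(readings, 20, aggregate=min)   # lowest value per bin
low_bins = [start for start, low in min_view.items() if low < LOW_THRESHOLD]

print(mean_view)
print(low_bins)
```

In this toy data the second bin has a mean above the threshold while its minimum is below it, which is why switching the aggregator or expanding the bin matters when looking for low-glucose events.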
50. Case Study 6: Sequence Analysis Example
¤ Interactive Consensus sequence exploration:
CRP (Catabolite Activator Protein) binding site, 49 sequences in dozens of promoters | ChIP-seq
¤ Extend CoolMap: Loader, Aggregator, Renderer [Annotator]
Full Sequence View
Sequence Logo
Consensus View
Consensus View with base percentage overlay
Consensus View with GC content overlay
Genome Res. 2004 June; 14(6): 1188-1190
51. Case Study 7: Network Analysis
¤ Link Cytoscape with CoolMap:
¤ Network nodes link to CoolMap views by ID, attribute name, etc.
¤ Relate patterns identified in an experiment to curated networks (an alternative to JTreeView); create correlation matrices from Cytoscape numeric attributes
¤ Use pathways and ontologies to view sub-network to sub-network connectivity
¤ Cluster the network based on attributes, and compare unsupervised clustering vs. annotated pathways and ontologies
Need two monitors!
52. Case Study 7: Network Analysis (cont’d)
Top Left: MAPK pathway in ‘galFiltered.cys’ network from Cytoscape
Bottom Left: Part of the same network shown as an adjacency matrix arranged by pathway, with sum as the
aggregator. Each cell shows the number of edges within a pathway or the number of edges between two
pathways. A good ‘community’ clustering will have most of the green dots along the diagonal
Right: The same view with MAPK pathway expanded, showing dense intra-cluster connectivity
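A minimal sketch of the sum aggregator on an adjacency matrix, assuming an undirected toy network in which every node belongs to exactly one pathway. The node names, pathway names and the pathway_edge_counts function are hypothetical.

```python
# Hypothetical toy network: undirected edges and a node-to-pathway map.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
pathway_of = {"a": "P1", "b": "P1", "c": "P1", "d": "P2", "e": "P2"}


def pathway_edge_counts(edges, pathway_of):
    """Count edges for each pair of pathways.

    This equals summing the adjacency-matrix cells that fall inside each
    pathway-by-pathway block, counting every undirected edge once.
    """
    counts = {}
    for u, v in edges:
        key = tuple(sorted((pathway_of[u], pathway_of[v])))
        counts[key] = counts.get(key, 0) + 1
    return counts


counts = pathway_edge_counts(edges, pathway_of)
within = sum(n for (p, q), n in counts.items() if p == q)
between = sum(n for (p, q), n in counts.items() if p != q)

print(counts)
print(within, between)
```

Pairs with the same pathway on both sides are the diagonal cells of the collapsed view, so a grouping with many within-pathway edges and few between-pathway edges puts most of the weight on the diagonal.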
53. Case Study 7: Network Analysis (cont’d)
Left: A correlation matrix can be created from gal expression profiles, and pathways can then be
used to arrange it into a condensed concept correlation view. Hierarchical
clustering can be run at the concept level.
Right: The selected region contains nodes that are annotated with the KEGG pathway ‘Cell
cycle’ and are close to each other in the network
54. Acknowledgement
Thank you!
Primary Advisor
Dr Fan Meng
Committee Mentors
Dr Brian D. Athey (Co-chair)
Dr Charles F. Burant and his lab
Dr Barbara Mirel
Dr Maureen Sartor
Testers
Usability testers and software testers, fellow Bioinformatics brethren.
Development
Please contact me if you are interested in development or testing:
sugang@umich.edu