2. Exploring the human disease network
The Diseasome website is an explorer of disease/disorder relationships and an example of innovative
map‐oriented scientific work. Built by a team of researchers and engineers, it uses the Human Disease
Network dataset and supports intuitive knowledge discovery by mapping the dataset's complexity.
Diseasome – Mathieu Bastian, Sébastien Heymann Online Information 2009, London
3. Original data
Official paper
The Human Disease Network
Goh K‐I, Cusick ME, Valle D, Childs B, Vidal M, Barabási A‐L (2007)
Proc Natl Acad Sci USA 104:8685‐8690
Link: http://www.pnas.org/content/104/21/8685.full
Data retrieved as linked data
Link: http://www4.wiwiss.fu-berlin.de/diseasome/
Network‐like organization
• 526 diseases and 903 genes in the main sub‐graph
• nodes = disease or gene
• edges = gene‐disorder associations, which reveal a common genetic origin
• 22 different disease categories: Bone, Cancer, Cardiovascular, etc.
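This organization is a bipartite graph: every edge joins a disease node to a gene node. The Python sketch below shows one way to hold such data as two adjacency maps, assuming the associations are available as (disease, gene) pairs. The names, categories and pairs are hypothetical placeholders, not the actual Diseasome data, and build_bipartite is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical toy associations (disease, gene); not real Diseasome data.
ASSOCIATIONS = [
    ("Disease A", "GENE1"),
    ("Disease A", "GENE2"),
    ("Disease B", "GENE2"),
    ("Disease C", "GENE3"),
]

# Hypothetical category label for each disease node (the real data has 22 categories).
CATEGORIES = {"Disease A": "Cancer", "Disease B": "Bone", "Disease C": "Cardiovascular"}


def build_bipartite(pairs):
    """Return two adjacency maps: disease -> genes and gene -> diseases."""
    disease_to_genes = defaultdict(set)
    gene_to_diseases = defaultdict(set)
    for disease, gene in pairs:
        # Each edge links one disease node to one gene node, never two nodes of the same kind.
        disease_to_genes[disease].add(gene)
        gene_to_diseases[gene].add(disease)
    return dict(disease_to_genes), dict(gene_to_diseases)


disease_to_genes, gene_to_diseases = build_bipartite(ASSOCIATIONS)
print(len(disease_to_genes), "diseases,", len(gene_to_diseases), "genes")  # → 3 diseases, 3 genes
```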
Medical application example
Understanding the spread of obesity: NYT visualization, 2007
Network Medicine — From Obesity to the "Diseasome",
Albert‐László Barabási
Link: http://content.nejm.org/cgi/content/full/357/4/404
4. http://diseasome.eu
The website is a portal for online resources based on data exploration, and contains:
An interactive map
Intuitive access to specific gene/disease documents. The technology is provided by Linkfluence, a
research institute specializing in social web studies.
A poster
Printable network of diseases for collaborative analysis and communication.
An expert tool
Embedded graph visualization and manipulation software, Gephi, for advanced exploration.
A book
How are information technologies changing the way biologists work? What benefits can we expect? The
Diseasome website serves as a practical showcase for "Biologie ‐ L'ère numérique" (Biology ‐ The digital
era), directed by Magali Roux at INIST‐CNRS.
5. Map overview
The map shows the relations between diseases and genes
as nodes and edges.
Colored nodes represent diseases; white nodes
represent genes. The edges represent associations
between diseases and genes, or relations between
diseases that have a gene in common.
A disease node’s color indicates the category it belongs to,
and its size indicates its hub degree
(overall number of outbound links).
The pale grey zones on the map indicate a high
density of links. The more links a node sends to
gene nodes, the bigger it appears on the map.
The diseasome network
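The disease-to-disease links and the size rule described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind the map: the disease and gene names are hypothetical, two diseases are linked when they share at least one gene (weighted by the number of shared genes), and the linear scaling from gene-link count to node size, with its min_size and max_size bounds, is an assumed choice.

```python
from itertools import combinations

# Hypothetical toy data: disease -> set of associated genes.
DISEASE_GENES = {
    "Disease A": {"GENE1", "GENE2"},
    "Disease B": {"GENE2"},
    "Disease C": {"GENE3"},
    "Disease D": {"GENE1", "GENE2", "GENE3"},
}


def disease_links(disease_genes):
    """Link two diseases whenever they share at least one gene; weight = shared gene count."""
    links = {}
    for a, b in combinations(sorted(disease_genes), 2):
        shared = disease_genes[a] & disease_genes[b]
        if shared:
            links[(a, b)] = len(shared)
    return links


def node_sizes(disease_genes, min_size=5.0, max_size=30.0):
    """Scale each disease node by its number of gene links (assumed linear scale)."""
    degrees = {d: len(genes) for d, genes in disease_genes.items()}
    lo, hi = min(degrees.values()), max(degrees.values())
    span = (hi - lo) or 1  # avoid division by zero when all degrees are equal
    return {d: min_size + (k - lo) / span * (max_size - min_size) for d, k in degrees.items()}


print(disease_links(DISEASE_GENES))
print(node_sizes(DISEASE_GENES))
```

Here "Disease D", linked to three genes, gets the largest size, while "Disease C" is connected to the other diseases only through its single gene.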
6. How did we create the map?
Nodes are positioned on the map by a topological placement algorithm, i.e. each node is
positioned solely according to its linking pattern. Many software tools can do this. Gephi was
chosen for its high‐quality ForceAtlas layout algorithm.
Several compatible GEXF graph files were created from the original data. Graph layouts and rendering
were performed with the Gephi network visualization software. Isolated disorders are not shown, and
only the giant component was kept.
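Keeping only the giant component and exporting it as GEXF can be sketched with the Python standard library alone. This is a minimal, hypothetical example: the toy graph, the helper names giant_component and to_gexf, and the bare-bones GEXF 1.2 structure (nodes and edges only, with no attributes, colors or positions) are assumptions, not a description of the files actually produced for Diseasome.

```python
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical toy graph: "Disease E" is isolated and "X"-"Y" form a small separate component.
NODES = ["Disease A", "GENE1", "Disease B", "GENE2", "Disease E", "X", "Y"]
EDGES = [("Disease A", "GENE1"), ("Disease B", "GENE1"), ("Disease B", "GENE2"), ("X", "Y")]


def giant_component(nodes, edges):
    """Return the node set of the largest connected component (breadth-first search)."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), set()
    for start in nodes:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        seen.add(start)
        while queue:
            for nb in adj[queue.popleft()]:
                if nb not in seen:
                    seen.add(nb)
                    comp.add(nb)
                    queue.append(nb)
        if len(comp) > len(best):
            best = comp
    return best


def to_gexf(nodes, edges):
    """Serialize an undirected graph as a minimal GEXF 1.2 document."""
    root = ET.Element("gexf", xmlns="http://www.gexf.net/1.2draft", version="1.2")
    graph = ET.SubElement(root, "graph", mode="static", defaultedgetype="undirected")
    ids = {n: str(i) for i, n in enumerate(nodes)}
    xml_nodes = ET.SubElement(graph, "nodes")
    for n in nodes:
        ET.SubElement(xml_nodes, "node", id=ids[n], label=n)
    xml_edges = ET.SubElement(graph, "edges")
    for i, (u, v) in enumerate(edges):
        ET.SubElement(xml_edges, "edge", id=str(i), source=ids[u], target=ids[v])
    return ET.tostring(root, encoding="unicode")


keep = giant_component(NODES, EDGES)
kept_nodes = [n for n in NODES if n in keep]
kept_edges = [(u, v) for u, v in EDGES if u in keep and v in keep]
print(to_gexf(kept_nodes, kept_edges))
```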
Many algorithms can render in 2D the adjacency matrix of a graph, i.e. the matrix that describes any
graph. We used the ForceAtlas algorithm, which shares the same basic principle with all the others:
minimizing the system’s energy while maximizing the use of the space available for representing
the data. To minimize the system’s energy, one can for instance assume that nodes that are not linked
to each other push each other away, whereas nodes that are linked to each other attract each other.
Through iterative steps, the algorithm finds a balanced spatial placement that reflects the
inherent structure of the network.
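To make the attraction/repulsion principle concrete, here is a sketch of a generic force-directed layout in the Fruchterman-Reingold style. It is not ForceAtlas, whose force formulas differ; it only illustrates the principle the two share. The function name and its parameters (iterations, area, the initial temperature of 0.1 and the 0.98 cooling factor) are illustrative assumptions.

```python
import math
import random


def force_layout(nodes, edges, iterations=200, area=1.0, seed=42):
    """Generic spring-electrical layout (Fruchterman-Reingold style), not ForceAtlas itself."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)] for n in nodes}
    k = math.sqrt(area / len(nodes))  # ideal distance between linked nodes
    temperature = 0.1  # maximum displacement per step, cooled over time
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: every pair of nodes pushes apart with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction: linked nodes pull together with force d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the current temperature.
        for n in nodes:
            length = math.hypot(*disp[n]) or 1e-9
            step = min(length, temperature)
            pos[n][0] += disp[n][0] / length * step
            pos[n][1] += disp[n][1] / length * step
        temperature *= 0.98
    return {n: tuple(p) for n, p in pos.items()}


# Usage: a five-node path a-b-c-d-e.
path = ["a", "b", "c", "d", "e"]
layout = force_layout(path, list(zip(path, path[1:])))
print(layout)
```

On this path, linked neighbours settle roughly one ideal distance apart, while repulsion pushes the two ends of the path far from each other. The cooling temperature is what lets the iterations converge to a stable placement instead of oscillating.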
These positioning principles call for the following reading conventions:
• A node’s position on the map depends solely on its links. A node has no predefined position,
the latter being the result of its relations with other nodes. This means that a node with no links
at all cannot be positioned on the map;
• North, East, South and West don’t matter. The displayed space is not based on the cardinal system
(North, East, South, West), which means that any relative left‐right or top‐down position is
purely arbitrary;
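The second convention can be checked numerically: rotating a layout changes every node's coordinates but leaves every pairwise distance unchanged, so only distances, not directions, carry information. In this Python sketch the positions are hypothetical, and rotate and pairwise_distances are illustrative helpers.

```python
import math


def rotate(layout, angle):
    """Rotate every node position about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return {n: (c * x - s * y, s * x + c * y) for n, (x, y) in layout.items()}


def pairwise_distances(layout):
    """Euclidean distance between every pair of nodes."""
    names = sorted(layout)
    return {
        (a, b): math.dist(layout[a], layout[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


# Hypothetical positions, e.g. the output of a force-directed layout.
layout = {"Disease A": (0.0, 0.0), "GENE1": (1.0, 0.0), "Disease B": (1.5, 1.0)}
before = pairwise_distances(layout)
after = pairwise_distances(rotate(layout, math.pi / 2))
print(all(math.isclose(before[p], after[p]) for p in before))  # → True
```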
7. Map interactions
• Search by gene or disease name
• Zoom in/out
• Node selection, with a graphical distinction between inbound and outbound links
• Filtering by disease category
• Viewing the distribution of categories on the map as a pie chart or a bar chart
Map interface
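The category filter and the category distribution amount to simple grouping operations. The following Python sketch assumes each disease node carries a single category label; the labels and the helper names are hypothetical, and the text bars only stand in for the site's pie and bar charts.

```python
from collections import Counter

# Hypothetical disease-node categories; the real map uses 22 categories.
NODE_CATEGORY = {
    "Disease A": "Cancer",
    "Disease B": "Cancer",
    "Disease C": "Bone",
    "Disease D": "Cardiovascular",
}


def filter_by_category(node_category, category):
    """Keep only the disease nodes belonging to one category."""
    return sorted(n for n, c in node_category.items() if c == category)


def category_shares(node_category):
    """Fraction of disease nodes per category: the data behind a pie or bar chart."""
    counts = Counter(node_category.values())
    total = sum(counts.values())
    return {c: counts[c] / total for c in sorted(counts)}


print(filter_by_category(NODE_CATEGORY, "Cancer"))  # → ['Disease A', 'Disease B']
for cat, share in category_shares(NODE_CATEGORY).items():
    # One text bar per category, 20 characters = 100%.
    print(f"{cat:<15} {'#' * round(share * 20)} {share:.0%}")
```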
8. Access to online related documents and databases
Clicking a selected node’s label opens a page aggregating related resources:
• original linked data on D2R server
• TermSciences, the INIST‐CNRS terminological database for science
• MeSH, the Medical Subject Headings vocabulary
• Wikipedia
Resources for a disease
Map
Concept tree on TermSciences
9. Printable poster
The poster shares results and supports collaborative work
by facilitating discussions about the data or the view.
A high‐resolution printable PDF is available for
communication and collaborative exploration.
Poster
10. Expert tool
Users can create their own view of the data by
launching the Gephi applet in the browser.
This helps them understand how the map was built
and propose graphical alternatives.
Gephi is open source software available at
http://gephi.org.
Gephi software
11. Usages and goals
Usages
• Finding the “proximity” of diseases linked by shared activated genes
• Browsing related documents from scientific databases: TermSciences, MeSH, OMIM
Goals
• Proposing an alternative user experience
• Allowing graphical exploration, reading and document discovery
• Promoting the book Biologie ‐ L'ère numérique (Biology ‐ The digital era) directed by Magali Roux
(ISBN: 978‐2‐271‐06779‐1)
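For the first usage listed above, one plausible way to score the “proximity” of two diseases is the Jaccard index of their gene sets: the number of shared genes divided by the number of distinct genes involved. This measure is an illustrative choice, not necessarily what the Diseasome site computes; the data and the helper names jaccard and closest_diseases are hypothetical.

```python
# Hypothetical toy data: disease -> set of associated genes.
DISEASE_GENES = {
    "Disease A": {"GENE1", "GENE2"},
    "Disease B": {"GENE2"},
    "Disease C": {"GENE3"},
    "Disease D": {"GENE1", "GENE2", "GENE3"},
}


def jaccard(a, b):
    """Shared genes divided by all distinct genes of the two diseases."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_diseases(disease, disease_genes, top=3):
    """Rank the other diseases by gene-set similarity to `disease`."""
    genes = disease_genes[disease]
    scored = [
        (other, jaccard(genes, other_genes))
        for other, other_genes in disease_genes.items()
        if other != disease
    ]
    # Drop diseases with no shared gene, then sort by decreasing similarity (ties by name).
    scored = [(o, s) for o, s in scored if s > 0]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]


print(closest_diseases("Disease A", DISEASE_GENES))
```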
12. Benefits for scientific document databases
A map is a tool of power: a complex world reduced to a flat surface, and an object whose shapes a user
can master and understand. A key issue remains how to read and interpret maps correctly.
Benefits
• Access to weakly or non‐ordered documents with complex relationships.
• Intuitive knowledge discovery.
• Speed up document searching with graphical signs.
“Expedition Zukunft”, the German train
presenting the map of science (Kevin W.
Boyack, Katy Börner, & Richard Klavans), 2009
13. Perspectives and conclusion
Diseasome is an attempt to outline future designs of scientific information systems.
Innovative items
• Designing document relationships as a way to
capture non‐trivial research contexts.
• Graph visualization is currently used to
represent different kinds of networks (social,
biological, physical, transport) and to map
scientific publications.
• Data with a network‐like organization may
reveal properties that are only observable and measurable
through a network‐based approach in analysis and
visualization systems.
• Maps allow different kinds of data
and dimensions to be integrated for exploration and
manipulation.
Multi‐level networks, A.L. Barabasi
14. Diseasome
Thank you
Mathieu Bastian, Sebastien Heymann
INIST‐CNRS, France
December 2009