Bioinformatics to reveal the logic 
of life 
M. Gonzalo Claros Díaz 
Dpto Biología Molecular y Bioquímica 
Plataforma Andaluza de Bioinformática 
@MGClaros 
Centro de Bioinnovación 
http://about.me/mgclaros/
http://www.scbi.uma.es 
The meaning/logic of life 
There are many reflections about life 
Genetics 
Philosophy 
Religion 
Physics 
And many 
more
A living being for some scientists 
The cell is a kind 
of black box
Molecular biology provides some logic… 
How to select the few
combinations that make sense?
A hierarchical logic… 
the way back cannot be predicted
In fact, a complex logic full of interactions
Metabolism offers another source of logic 
Other sciences were also interested in the logic of life
Bioinformatics = integration 
http://bioinformatics.biol.ntnu.edu.tw/sher/Teaching.html
Bioinformatics receives and gives new data and insights 
Biology, Computer science, Statistics
The living being is the result of all observations and cannot be inferred from biased observations
A living being for some scientists 
The cell is a kind 
of black box
A living being for a bioinformatician 
Life ontology
So, we begin to understand 
Other scientists
Bioinformatician 
Biotechnologist
Bioinformatics emerged with data accumulation 
Regarding data, informatics lags behind biology
Therefore, biology and informatics are interdependent 
http://www.genomicglossaries.com/presentation/SLAgenomics.asp
Without mobility issues 
Some logic in living beings based on 
bioinformatics 
Bioinformatics integration in alcohol-induced disorders
Through integration and modeling, these studies would allow us to better exploit the complexity 
of genomic and functional genomic data and to extract their biological and clinical significance 
http://pubs.niaaa.nih.gov/publications/arh311/5-11.htm
Drug discovery was expensive
Classic approach: experimental drugs were chemically synthesized and then tested in animals
Bioinformatics approach (ligand database): only candidate drugs are synthesized. A cost-effective approach
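The bioinformatics approach on this slide can be illustrated with a minimal sketch, assuming each entry of the ligand database already carries a predicted binding score. The `Ligand` class, the `select_candidates` function, the score scale (lower means tighter predicted binding) and every value in `LIGAND_DB` are hypothetical and invented for illustration; they do not come from the talk or from any real screening tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ligand:
    name: str
    predicted_affinity: float  # hypothetical score: lower = tighter predicted binding


def select_candidates(ligands, max_score, top_n):
    """Keep ligands whose predicted score passes the cutoff, best first."""
    passing = [lig for lig in ligands if lig.predicted_affinity <= max_score]
    passing.sort(key=lambda lig: lig.predicted_affinity)
    return passing[:top_n]


# Made-up ligand database, for illustration only.
LIGAND_DB = [
    Ligand("ligand_A", -9.1),
    Ligand("ligand_B", -4.2),
    Ligand("ligand_C", -7.8),
    Ligand("ligand_D", -6.0),
    Ligand("ligand_E", -8.4),
]

# Only the shortlisted ligands would go on to chemical synthesis.
candidates = select_candidates(LIGAND_DB, max_score=-7.0, top_n=2)
print([lig.name for lig in candidates])
```

The point of the sketch is the ordering of the steps: the whole database is ranked in silico, and only the short list is synthesized and tested.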
Nobel Prize in Chemistry 2013
Biochemist
Theoretical chemist, biophysicist, biochemist
For the development of computational models to understand and predict chemical processes
"This Nobel Prize is the first given to work in computational biology, indicating that the field has matured and is on a par with experimental biology"
The blog of PLOS Computational Biology
http://blogs.plos.org/biologue/2013/10/18/the-significance-of-the-2013-nobel-prize-in-chemistry-and-the-challenges-ahead/
A cell was full of molecular cascades 
Divergent cascades Convergent cascades
Then, a cell was a subway map 
Subway map designed by Claudia Bentley. 
Web design by Nick Allin. 
Edited by Cath Brooksbank and Sandra Clark. 
© 2002 Nature Publishing Group. 
http://www.nature.com/nrc/poster/subpathways/index.html
Finally, a cell is a network 
Cell network complexity increases with whole-organism
complexity. Key nodes reveal key functions
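To make the idea of "key nodes" concrete, here is a minimal sketch that counts how many partners each node has (its degree) and reports the most connected ones as hubs. The edge list `TOY_EDGES`, the node names and the `min_degree` threshold are assumptions made up for this example; they do not describe any real cell network.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Turn an undirected edge list into a node -> set-of-neighbours map."""
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:  # self-interactions are ignored when counting neighbours
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def hubs(adjacency, min_degree):
    """Return nodes with at least min_degree neighbours, most connected first."""
    degree = {node: len(neighbours) for node, neighbours in adjacency.items()}
    selected = [node for node, k in degree.items() if k >= min_degree]
    return sorted(selected, key=lambda node: (-degree[node], node))


# Hypothetical toy network, for illustration only.
TOY_EDGES = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("A", "E"),
    ("B", "C"), ("E", "F"), ("F", "G"),
]

network = build_adjacency(TOY_EDGES)
print(hubs(network, min_degree=3))
```

Degree is only the simplest measure of a key node; other measures, such as betweenness centrality, can rank the same nodes differently.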
Transcription factor network explains some cancers

[Slide image: first pages of the article below. Recoverable text follows.]

Topology, tinkering and evolution of the human transcription factor network
Carlos Rodriguez-Caso (1,2), Miguel A. Medina (2) and Ricard V. Solé (1,3)
1 ICREA-Complex Systems Laboratory, Universitat Pompeu Fabra, Barcelona, Spain
2 Department of Molecular Biology and Biochemistry, Faculty of Sciences, Universidad de Málaga, Spain
3 Santa Fe Institute, Santa Fe, New Mexico, USA
FEBS Journal 272 (2005) 6423–6434. © 2005 The Authors. Journal compilation © 2005 FEBS
doi:10.1111/j.1742-4658.2005.05041.x
(Received 5 August 2005, revised 25 October 2005, accepted 31 October 2005)
Keywords: human; molecular evolution; protein interaction; tinkering; transcription factor network
Correspondence: Ricard V. Solé, ICREA - Complex System Laboratory, Universitat Pompeu Fabra, Dr Aiguader 80, 08003 Barcelona, Spain. Fax: +34 93 221 3237. Tel: +34 93 542 2821. E-mail: ricard.sole@upf.edu
Abbreviations: ER, Erdős-Rényi; HTFN, human transcription factor network; SF, scale free; SW, small world; TF, transcription factor.

Abstract
Patterns of protein interactions are organized around complex heterogeneous networks. Their architecture has been suggested to be of relevance in understanding the interactome and its functional organization, which pervades cellular robustness. Transcription factors are particularly relevant in this context, given their central role in gene regulation. Here we present the first topological study of the human protein–protein interacting transcription factor network built using the TRANSFAC database. We show that the network exhibits scale-free and small-world properties with a hierarchical and modular structure, which is built around a small number of key proteins. Most of these proteins are associated with proliferative diseases and are typically not linked to each other, thus reducing the propagation of failures through compartmentalization. Network modularity is consistent with common structural and functional features and the features are generated by two distinct evolutionary strategies: amplification and shuffling of interacting domains through tinkering and acquisition of specific interacting regions. The function of the regulatory complexes may have played an active role in choosing one of them.

Excerpts from the article text
Living cells are composed of a large number of different molecules interacting with each other to yield complex spatial and temporal patterns. Unfortunately, this reality is seldom captured by traditional and molecular biology approaches. A shift from molecular to modular biology seems unavoidable [1] as biological systems are defined by complex networks of interacting components. [...] Early topological studies of cellular networks revealed that genomic, proteomic and metabolic maps share characteristic features with other real-world networks [8–12]. Protein networks, also called interactomes, were studied thanks to a massive two-hybrid system screening in unicellular Saccharomyces cerevisiae [9] and, more recently, in Drosophila melanogaster [13] [...]

[...] allow the formation of supramolecular activator or inhibitory complexes, depending on their components and possible combinations.

Transcription factors (TFs) are an essential subset of interacting proteins responsible for the control of gene expression. They interact with DNA regions and tend to form transcriptional regulatory complexes. Thus, the final effect of one of these complexes is determined by its TF composition.

The number of TFs varies among organisms, although it appears to be linked to the organism’s complexity. Around 200–300 TFs are predicted for Escherichia coli [18] and Saccharomyces [19,20]. By contrast, comparative analysis in multicellular organisms shows that the predicted number of TFs reaches 600–820 in C. elegans and D. melanogaster [20,21], and 1500–1800 in Arabidopsis (1200 cloned sequences) [20–22]. For humans, around 1500 TFs have been documented [21] and it is estimated that there are 2000–3000 [21,23]. Such an increase in the number of TFs is associated with higher control of gene regulation [24]. Interestingly, such an increase is based on the use of the same structural types of proteins. Human transcription factors are predominantly Zn fingers, followed by homeobox and basic helix–loop–helix [21]. Phylogenetic studies have shown that the amplification and shuffling of protein domains determine the growth of certain transcription factor families [25–28].

[...] or via control of TF expression, less connected factors may also be relevant to cell survival.

Functional and structural patterns from topology
In order to reveal the mechanisms that shape the structure of HTFN, we studied its topological modularity in relation to the function and structure of TFs from available information. From a structural point of view, the overabundance of self-interactions is associated with a majority group of 55% of basic helix–loop–helix (bHLH) and leucine zippers (bZip), 17.5% of Zn fingers and 22.5% corresponding to a more heterogeneous [...] a complex, by varying their function and affinity to DNA. This is the case of the bHLH–bZip proto-oncogene c-myc [44], or the Zn finger retinoid X receptor RXR [45].
From a topological viewpoint, connections by self-interacting domains would imply high clustering and modularity, because all these proteins share the same rules and they have the potential to give a highly interconnected subgraph (i.e. a module). According to this, the high clustering of HTFN (see Fig. 1) could be explained as a by-product of the overabundance of self-interacting domains.
We wondered whether the HTFN modular architecture [...]

Fig. 1. Human transcription factor network built from data extracted from the TRANSFAC 8.2 database. Numbered black filled nodes are the highest connected transcription factors. 1, TATA-binding protein (TBP); 2, p53; 3, p300; 4, retinoid X receptor α (RXRα); 5, retinoblastoma protein (pRB); 6, nuclear factor NF-κB p65 subunit (RelA); 7, c-jun; 8, c-myc; 9, c-fos. [...] filtering according to criteria given in Experimental [...]

Table 2. Description and functionality of transcription factor hubs. Transcription factor (TF), degree (k), betweenness centrality (b).
TF | Description | Associated disease | k | b × 10³
TBP | Basal transcription machinery initiator | Spinocerebellar ataxia [40] | 27 | 17.3
p53 | Tumor suppressor protein | Proliferative disease [68] | 23 | 18.5
p300 | Coactivator. Histone acetyltransferase | May play a role in epithelial cancer [69] | 18 | 20.2
RXR-α | Retinoid X-α receptor | Hepatocellular carcinoma [70] | 18 | 8
pRB | Retinoblastoma suppressor protein. Tumour suppressor protein | Proliferative disease. Bladder cancer. Osteosarcoma [71] | 15 | 27.1
RelA | NF-κB pathway | Hepatocyte apoptosis and foetal death [72] | 14 | 6.6
c-jun | AP-1 complex (activator). Proto-oncogene | Proliferative disease [73] | 14 | 4.1
c-myc | Activator. Proto-oncogene | Proliferative disease [74] | 13 | 10.5
c-fos | AP-1 complex (activator). Proto-oncogene | Proliferative disease [75] | 12 | 2
At least nine transcription factors lead to cancer if their
function is affected
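The nine hubs can be compared using the degree (k) and betweenness centrality (b × 10³) values printed in Table 2 of Rodriguez-Caso et al. on this slide. The sketch below only re-sorts those published numbers; the variable names are illustrative, and "RXR-alpha" is an ASCII spelling of RXR-α chosen for the code.

```python
# (transcription factor, degree k, betweenness b x 10^3), as printed in Table 2.
HUBS = [
    ("TBP", 27, 17.3),
    ("p53", 23, 18.5),
    ("p300", 18, 20.2),
    ("RXR-alpha", 18, 8.0),
    ("pRB", 15, 27.1),
    ("RelA", 14, 6.6),
    ("c-jun", 14, 4.1),
    ("c-myc", 13, 10.5),
    ("c-fos", 12, 2.0),
]

# Rank the same hubs by each measure; ties keep their Table 2 order.
by_degree = sorted(HUBS, key=lambda row: row[1], reverse=True)
by_betweenness = sorted(HUBS, key=lambda row: row[2], reverse=True)

print("Most connected:       ", [name for name, _, _ in by_degree[:3]])
print("Highest betweenness:  ", [name for name, _, _ in by_betweenness[:3]])
```

The two rankings differ: TBP has the most partners, whereas pRB has the highest betweenness with only 15 partners, so the two measures pick out different nodes.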
If I know the gene network of a process
THEN
I can predict which genes are really essential
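One simple way to turn this statement into a computation is an in silico knockout: remove each node in turn and measure how much of the network stays connected. This is a minimal sketch under strong assumptions. The network `TOY_NETWORK` is invented for the example, and the size of the largest connected component is only a topological proxy for essentiality, not an experimental result or the method used in the talk.

```python
from collections import deque


def largest_component_size(adjacency, removed=None):
    """Size of the largest connected component, optionally ignoring one node."""
    seen = set() if removed is None else {removed}
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over one component
            node = queue.popleft()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def knockout_impact(adjacency):
    """Largest surviving component after removing each node in turn."""
    return {node: largest_component_size(adjacency, removed=node)
            for node in adjacency}


# Hypothetical toy gene network, for illustration only.
TOY_NETWORK = {
    "A": {"B", "C", "D", "E"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
    "E": {"A", "F"},
    "F": {"E", "G"},
    "G": {"F"},
}

impact = knockout_impact(TOY_NETWORK)
most_disruptive = min(impact, key=impact.get)
print(most_disruptive, impact[most_disruptive])
```

In this toy network, removing the hub fragments the graph the most, which is the intuition behind predicting essential genes from network topology.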
shows that the predicted number of TFs reaches 
600–820 in C. elegans and D. melanogaster [20,21], and 
1500–1800 in Arabidopsis (1200 cloned sequences) 
[20–22]. For humans, around 1500 TFs have been 
documented [21] and it is estimated that there are 
2000–3000 [21,23]. Such an increase in the number of 
TFs is associated with higher control of gene regula-tion 
or via control of TF expression, less connected factors 
may also be relevant to cell survival. 
[24]. Interestingly, such an increase is based on 
Functional and structural patterns from topology 
In order to reveal the mechanisms that shape the struc-ture 
the use of the same structural types of proteins. 
Human transcription factors are predominantly Zn fin-gers, 
followed by homeobox and basic helix–loop–helix 
[21]. Phylogenetic studies have shown that the amplifi-cation 
http://www.scbi.uma.es 
and shuffling of protein domains determine the 
growth of certain transcription factor families [25–28]. 
Fig. 1. Human transcription factor network built from data extracted 
from the TRANSFAC 8.2 database. Numbered black filled nodes 
are the highest connected transcription factors. 1, TATA-binding 
protein (TBP); 2, p53; 3, p300; 4, retinoid X receptor a (RXRa); 5, 
retinoblastoma protein (pRB); 6, nuclear factor NFjB p65 subunit 
(RelA); 7, c-jun; 8, c-myc; 9, c-fos. 
26 
filtering according to criteria given in Experimental 
Topology, tinkering and evolution of the human 
transcription factor network 
Carlos Rodriguez-Caso1,2, Miguel A. Medina2 and Ricard V. Sole´ 1,3 
1 ICREA-Complex Systems Laboratory, Universitat Pompeu Fabra, Barcelona, Spain 
2 Department of Molecular Biology and Biochemistry, Faculty of Sciences, Universidad de Ma´laga, Spain 
3 Santa Fe Institute, Santa Fe, New Mexico, USA 
Living cells are composed of a large number of differ-ent 
molecules interacting with each other to yield com-plex 
spatial and temporal patterns. Unfortunately, this 
reality is seldom captured by traditional and molecular 
biology approaches. A shift from molecular to modular 
biology seems unavoidable [1] as biological systems are 
defined by complex networks of interacting compo-nents. 
Early topological studies of cellular networks 
revealed that genomic, proteomic and metabolic maps 
share characteristic features with other real-world 
networks [8–12]. Protein networks, also called inter-actomes, 
were studied thanks to a massive two-hybrid 
system screening in unicellular Saccharomyces cerevisiae 
[9] and, more recently, in Drosophila melanogaster [13] 
Keywords 
human; molecular evolution; protein 
interaction; tinkering; transcription factor 
network 
Correspondence 
Ricard V. Sole´ , ICREA - Complex System 
Laboratory, Universitat Pompeu Fabra, 
Dr Aiguader 80, 08003 Barcelona, Spain 
Fax: +34 93 221 3237 
Tel: +34 93 542 2821 
E-mail: ricard.sole@upf.edu 
(Received 5 August 2005, revised 25 
October 2005, accepted 31 October 2005) 
doi:10.1111/j.1742-4658.2005.05041.x 
Patterns of protein interactions are organized around complex heterogene-ous 
networks. Their architecture has been suggested to be of relevance in 
understanding the interactome and its functional organization, which per-vades 
cellular robustness. Transcription factors are particularly relevant in 
this context, given their central role in gene regulation. Here we present the 
first topological study of the human protein–protein interacting transcrip-tion 
factor network built using the TRANSFAC database. We show that 
the network exhibits scale-free and small-world properties with a hierarchi-cal 
and modular structure, which is built around a small number of key 
proteins. Most of these proteins are associated with proliferative diseases 
and are typically not linked to each other, thus reducing the propagation 
of failures through compartmentalization. Network modularity is consistent 
with common structural and functional features and the features are gener-ated 
by two distinct evolutionary strategies: amplification and shuffling of 
interacting domains through tinkering and acquisition of specific interact-ing 
regions. The function of the regulatory complexes may have played an 
active role in choosing one of them. 
Fe Institute, Santa Fe, New Mexico, USA 
cells are composed of a large number of differ-ent 
molecules interacting with each other to yield com-plex 
spatial and temporal patterns. Unfortunately, this 
is seldom captured by traditional and molecular 
approaches. A shift from molecular to modular 
seems unavoidable [1] as biological systems are 
by complex networks of interacting compo-nents. 
Early topological studies of cellular networks 
revealed that genomic, proteomic and metabolic maps 
share characteristic features with other real-world 
networks [8–12]. Protein networks, also called inter-actomes, 
were studied thanks to a massive two-hybrid 
system screening in unicellular Saccharomyces cerevisiae 
[9] and, more recently, in Drosophila melanogaster [13] 
Keywords 
molecular evolution; protein 
interaction; tinkering; transcription factor 
Correspondence 
Sole´ , ICREA - Complex System 
Laboratory, Universitat Pompeu Fabra, 
Aiguader 80, 08003 Barcelona, Spain 
93 221 3237 
93 542 2821 
ricard.sole@upf.edu 
Received 5 August 2005, revised 25 
2005, accepted 31 October 2005) 
10.1111/j.1742-4658.2005.05041.x 
Patterns of protein interactions are organized around complex heterogeneous 
networks. Their architecture has been suggested to be of relevance in 
understanding the interactome and its functional organization, which pervades 
cellular robustness. Transcription factors are particularly relevant in 
this context, given their central role in gene regulation. Here we present the 
first topological study of the human protein–protein interacting transcription 
factor network built using the TRANSFAC database. We show that 
the network exhibits scale-free and small-world properties with a hierarchical 
and modular structure, which is built around a small number of key 
proteins. Most of these proteins are associated with proliferative diseases 
and are typically not linked to each other, thus reducing the propagation 
of failures through compartmentalization. Network modularity is consistent 
with common structural and functional features and the features are generated 
by two distinct evolutionary strategies: amplification and shuffling of 
interacting domains through tinkering and acquisition of specific interacting 
regions. The function of the regulatory complexes may have played an 
active role in choosing one of them. 
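The topological terms used in this abstract (degree, hubs, clustering) can be made concrete with a few lines of code. The following is a minimal sketch on a hypothetical toy interaction list, not the TRANSFAC-derived network; the node names and edges are invented for illustration only.

```python
from collections import Counter
from itertools import combinations

def build_graph(edges):
    """Return an undirected adjacency map; self-interactions add no link."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def local_clustering(adj, node):
    """Fraction of a node's neighbour pairs that are themselves linked."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def degree_distribution(adj):
    """Map each degree k to the number of nodes having that degree."""
    return Counter(len(neighbours) for neighbours in adj.values())

# Hypothetical toy interactions (illustrative names, not real proteins).
TOY_EDGES = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"),
             ("a", "b"), ("c", "d"), ("d", "e")]
adj = build_graph(TOY_EDGES)
hub = max(adj, key=lambda n: len(adj[n]))  # most connected node
print(hub, degree_distribution(adj), local_clustering(adj, hub))
```

In a scale-free network the degree distribution is dominated by low-degree nodes with a few highly connected hubs; the toy graph only shows how the quantities are computed, not that property.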
[…] of HTFN, we studied its topological modularity 
in relation to the function and structure of TFs from 
available information. From a structural point of view, 
the overabundance of self-interactions is associated 
with a majority group of 55% of basic helix–loop–helix 
(bHLH) and leucine zippers (bZip), 17.5% of Zn 
fingers and 22.5% corresponding to a more heterogeneous 
[…] a complex, by varying their function and affinity to 
DNA. This is the case of the bHLH–bZip proto-oncogene 
c-myc [44], or the Zn finger retinoid X receptor 
RXR [45]. 
From a topological viewpoint, connections by self-interacting 
domains would imply high clustering and 
modularity, because all these proteins share the same 
rules and they have the potential to give a highly interconnected 
subgraph (i.e. a module). According to this, 
the high clustering of HTFN (see Fig. 1) could be 
explained as a by-product of the overabundance of 
self-interacting domains. 
We wondered whether the HTFN modular architecture […] 
[…]     Tumour suppressor protein                  Proliferative disease. Bladder cancer. Osteosarcoma [71]   15   27.1 
RelA    NF-κB pathway                              Hepatocyte apoptosis and foetal death [72]                 14   6.6 
c-jun   AP-1 complex (activator). Proto-oncogene   Proliferative disease [73]                                 14   4.1 
c-myc   Activator. Proto-oncogene                  Proliferative disease [74]                                 13   10.5 
c-fos   AP-1 complex (activator). Proto-oncogene   Proliferative disease [75]                                 12   2 
At least 9 transcription factors lead to cancer if their 
function is affected
Biomarkers can be obtained from the observation of bioinformatics networks 
http://www.scbi.uma.es 
27 
Breast cancer
Gene signatures to cancer diagnosis 
http://www.scbi.uma.es 
28 
Journal of Biomedical Informatics 49 (2014) 32–44 
Robust gene signatures from microarray data using genetic algorithms enriched with biological pathway keywords 
R.M. Luque-Baena a,*, D. Urda a,b, M. Gonzalo Claros c, L. Franco a,b, J.M. Jerez a,b 
a Departamento de Lenguajes y Ciencias de la Computación, University of Málaga, Bulevar Louis Pasteur, 35, 29071 Málaga, Spain 
b Instituto de Investigación Biomédica de Málaga (IBIMA), Málaga, Spain 
c Supercomputing and Bioinformatics Centre, University of Málaga, C/ Severo Ochoa, 34, 29590 Málaga, Spain 
Article history: Received 24 July 2013. Accepted 16 January 2014. Available online 27 January 2014 
Keywords: DNA analysis. Evolutionary algorithms. Biological enrichment. Feature selection 
Abstract 
Genetic algorithms are widely used in the estimation of expression profiles from microarrays data. However, 
these techniques are unable to produce stable and robust solutions suitable to use in clinical and biomedical 
studies. This paper presents a novel two-stage evolutionary strategy for gene feature selection 
combining the genetic algorithm with biological information extracted from the KEGG database. A comparative 
study is carried out over public data from three different types of cancer (leukemia, lung cancer 
and prostate cancer). Even though the analyses only use features having KEGG information, the results 
demonstrate that this two-stage evolutionary strategy increased the consistency, robustness and accuracy 
of a blind discrimination among relapsed and healthy individuals. Therefore, this approach could facilitate 
the definition of gene signatures for the clinical prognosis and diagnostic of cancer diseases in a near 
future. Additionally, it could also be used for biological knowledge discovery about the studied disease. 
© 2014 Elsevier Inc. All rights reserved. 
1. Introduction 
[…] domain of DNA microarrays. Genetic algorithms (GAs) [13–18], 
as a particular case of evolutionary models, use classification techniques […] 
Table 5 
Performance comparison among the “Filter + GA + Pathway” combined strategy and three well-known filtering methods (Cons, IG and ReliefF). ACC and number of genes 
(mean ± std) are reported for LDA and SVM classifiers on the three analyzed datasets. 
Strategy                      LDA                             SVM 
                              ACC             #Genes          ACC             #Genes 
Leukemia 
Filter + GA + Pathway 05340   97.13 ± 1.16    31.83 ± 1.86    93.87 ± 2.02    30.82 ± 1.62 
Filter + GA + Pathway 04640   96.38 ± 1.26    4.47 ± 0.71     94.86 ± 1.13    4.05 ± 0.80 
Cons                          85.85 ± 8.55    1.84 ± 0.51     88.24 ± 5.95    1.84 ± 0.51 
IG                            93.13 ± 4.40    9 ± 0           93.36 ± 4.33    9 ± 0 
ReliefF                       93.31 ± 4.37    9 ± 0           90.48 ± 5.15    9 ± 0 
Lung 
Filter + GA + Pathway 04144   98.09 ± 0.68    4.29 ± 0.53     96.25 ± 0.97    4.15 ± 0.57 
Filter + GA + Pathway 04530   98.26 ± 0.46    3.84 ± 0.46     97.05 ± 0.90    3.55 ± 0.64 
Cons                          94.08 ± 3.36    1.84 ± 0.42     94.57 ± 2.55    1.84 ± 0.42 
IG                            98.68 ± 1.51    22 ± 0          98.88 ± 1.39    22 ± 0 
ReliefF                       97.89 ± 1.81    22 ± 0          98.47 ± 1.43    22 ± 0 
Prostate 
Filter + GA + Pathway 00980   91.37 ± 1.15    8.27 ± 0.83     87.96 ± 2.39    11.15 ± 2.10 
Filter + GA + Pathway 00480   90.80 ± 1.36    14.30 ± 2.63    88.90 ± 2.29    26.24 ± 4.02 
Cons                          81.51 ± 7.57    3.20 ± 0.67     82.49 ± 6.72    3.20 ± 0.67 
IG                            91.66 ± 4.07    12 ± 0          85.86 ± 4.86    12 ± 0 
ReliefF                       90.22 ± 4.53    12 ± 0          88.50 ± 5.17    12 ± 0
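To give a feel for how a genetic algorithm searches for a small gene signature, here is a minimal sketch. It is not the two-stage strategy of the paper: the fitness function is a stand-in (a hypothetical per-gene relevance score minus a size penalty) where the paper evaluates classifiers, and all names, scores and parameter values are assumptions made for illustration.

```python
import random

def fitness(mask, relevance, penalty=0.1):
    """Toy fitness: summed relevance of selected genes minus a size penalty."""
    selected = [r for bit, r in zip(mask, relevance) if bit]
    return sum(selected) - penalty * len(selected)

def evolve(relevance, pop_size=20, generations=30, mutation_rate=0.05, seed=0):
    """Return (best_mask, best_fitness_per_generation)."""
    rng = random.Random(seed)
    n = len(relevance)
    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=lambda m: fitness(m, relevance), reverse=True)
        history.append(fitness(population[0], relevance))
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0][:]]           # elitism: keep the best mask
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene toggles with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    best = max(population, key=lambda m: fitness(m, relevance))
    history.append(fitness(best, relevance))
    return best, history

# Hypothetical relevance scores for 8 genes (invented values).
RELEVANCE = [0.9, 0.05, 0.8, 0.02, 0.01, 0.7, 0.03, 0.04]
best_mask, history = evolve(RELEVANCE)
print(best_mask, round(history[-1], 3))
```

Because the best mask is copied unchanged into each new generation, the best fitness never decreases; the size penalty is what pushes the search towards short signatures such as the 4-gene solutions in Table 5.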
If I have determined a gene signature 
THEN 
I can know which is the disease
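The idea of going from a gene signature to a diagnosis can be sketched with a very small classifier. The example below uses a nearest-centroid rule, which is a simpler stand-in for the LDA and SVM classifiers reported in Table 5; the expression values and class labels are hypothetical.

```python
import math

def centroid(samples):
    """Mean expression vector of a list of samples."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]

def train(labelled):
    """labelled maps a class label to expression vectors over the signature genes."""
    return {label: centroid(vectors) for label, vectors in labelled.items()}

def predict(model, sample):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: math.dist(model[label], sample))

# Hypothetical expression levels for a 3-gene signature (invented values).
TRAINING = {
    "leukemia": [[8.1, 2.0, 5.5], [7.9, 2.4, 5.1]],
    "healthy":  [[3.0, 6.2, 5.0], [3.4, 5.8, 5.3]],
}
model = train(TRAINING)
print(predict(model, [7.5, 2.5, 5.2]))
print(predict(model, [3.1, 6.1, 5.0]))
```

Only the genes in the signature are measured for a new sample, which is what makes a short signature attractive in practice.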
Cancer signatures to reveal prognosis 
http://www.scbi.uma.es 
29 
A microRNA Signature Associated with Early Recurrence in Breast Cancer 
Luis G. Pérez-Rivas1☯, José M. Jerez2☯, Rosario Carmona3, Vanessa de Luque1, Luis Vicioso4, 
M. Gonzalo Claros3,5, Enrique Viguera6, Bella Pajares1, Alfonso Sánchez1, Nuria Ribelles1, 
Emilio Alba1, José Lozano1,5* 
1 Laboratorio de Oncología Molecular, Servicio de Oncología Médica, Instituto de Biomedicina de Málaga (IBIMA), Hospital Universitario Virgen de la Victoria, Málaga, 
Spain, 2 Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain, 3 Plataforma Andaluza de Bioinformática, Universidad de 
Málaga, Málaga, Spain, 4 Servicio de Anatomía Patológica, Instituto de Biomedicina de Málaga (IBIMA), Hospital Universitario Virgen de la Victoria, Málaga, Spain, 
5 Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, Málaga, Spain, 6 Departamento de Biología Celular, Genética y Fisiología Animal, Universidad de 
Málaga, Málaga, Spain 
Abstract 
Recurrent breast cancer occurring after the initial treatment is associated with poor outcome. A bimodal relapse pattern 
after surgery for primary tumor has been described with peaks of early and late recurrence occurring at about 2 and 5 years, 
respectively. Although several clinical and pathological features have been used to discriminate between low- and high-risk 
patients, the identification of molecular biomarkers with prognostic value remains an unmet need in the current 
management of breast cancer. Using microarray-based technology, we have performed a microRNA expression analysis in 
71 primary breast tumors from patients that either remained disease-free at 5 years post-surgery (group A) or developed 
early (group B) or late (group C) recurrence. Unsupervised hierarchical clustering of microRNA expression data segregated 
tumors in two groups, mainly corresponding to patients with early recurrence and those with no recurrence. Microarray 
data analysis and RT-qPCR validation led to the identification of a set of 5 microRNAs (the 5-miRNA signature) differentially 
expressed between these two groups: miR-149, miR-10a, miR-20b, miR-30a-3p and miR-342-5p. All five microRNAs were 
down-regulated in tumors from patients with early recurrence. We show here that the 5-miRNA signature defines a high-risk 
group of patients with shorter relapse-free survival and has predictive value to discriminate non-relapsing versus early-relapsing 
patients (AUC = 0.993, p-value < 0.05). Network analysis based on miRNA-target interactions curated by public 
databases suggests that down-regulation of the 5-miRNA signature in the subset of early-relapsing tumors would result in 
an overall increased proliferative and angiogenic capacity. In summary, we have identified a set of recurrence-related 
microRNAs with potential prognostic value to identify patients who will likely develop metastasis early after primary breast 
surgery. 
Citation: Pérez-Rivas LG, Jerez JM, Carmona R, de Luque V, Vicioso L, et al. (2014) A microRNA Signature Associated with Early Recurrence in Breast Cancer. PLoS 
ONE 9(3): e91884. doi:10.1371/journal.pone.0091884 
Editor: Sonia Rocha, University of Dundee, United Kingdom 
Received November 11, 2013; Accepted February 14, 2014; Published March 14, 2014 
Copyright: © 2014 Pérez-Rivas et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits 
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Funding: This work was supported by a grant from the Spanish Society of Medical Oncology (SEOM, to NR) and by grants from the Spanish Ministerio de 
Economía, (SAF2010-20203 to J.L and TIN2010-16556 to J.J) and from the Junta de Andalucía (TIN-4026, to JJ). The funders had no role in study design, data 
collection and analysis, decision to publish, or preparation of the manuscript. 
Competing Interests: The authors have declared that no competing interests exist. 
* E-mail: jlozano@uma.es 
☯ These authors contributed equally to this work. 
PLOS ONE | www.plosone.org 1 March 2014 | Volume 9 | Issue 3 | e91884 
Introduction 
Breast cancer comprises a group of heterogeneous diseases that 
can be classified based on both clinical and molecular features [1–5]. 
Improvements in the early detection of primary tumors and the 
development of novel targeted therapies, together with the 
systematic use of adjuvant chemotherapy, has drastically reduced 
mortality rates and increased disease-free survival (DFS) in breast 
cancer. Still, about one third of patients undergoing breast tumor 
excision will develop metastases, the major life-threatening event 
which is strongly associated with poor outcome [6,7]. 
The risk of relapse after tumor resection is not constant over 
time. A detailed examination of large series of long-term follow-up 
studies over the last two decades reveals a bimodal hazard function 
with two peaks of early and late recurrence occurring at 1.5 and 5 
years, respectively, followed by a nearly flat plateau in which the 
risk of relapse tends to zero [8–10]. A causal link between tumor 
surgery and the bimodal pattern of recurrence has been proposed 
by some investigators (i.e. an iatrogenic effect) [11]. According to 
that model, surgical removal of the primary breast tumor would 
accelerate the growth of dormant metastatic foci by altering the 
balance between circulating pro- and anti-angiogenic factors 
[9,11–14]. Such hypothesis is supported by the fact that the two 
peaks of relapse are observed regardless other factors than surgery, 
such as the axillary nodal status, the type of surgery or the 
administration of adjuvant therapy. Although estrogen receptor 
(ER)-negative tumors are commonly associated with a higher risk 
of early relapse [15], the bimodal distribution pattern is observed 
with independence of the hormone receptor status [16]. Other 
studies also suggest that the dynamics of tumor relapse may be a […] 
MiR-149 was the most significant miRNA downregulated in 
group B, as determined by microarray hybridization and by RT-qPCR. 
This miRNA has been described as a TS-miR that 
regulates the expression of genes associated with cell cycle, 
invasion or migration and its downregulation has been observed in 
several tumor diseases, including gastric cancer and breast cancer 
[70,77–81]. Down-regulation of miR-149 can occur epigenetically […] 
[…] post-recurrence survival [100], likely because it targets AKT1 
mRNA [101]. 
In sum, the available bibliographic data suggests that down-regulation 
of miR-149, miR-30a-3p, miR-20b, miR-10a and 
miR-342-5p in primary breast tumors could confer them enhanced 
proliferative, angiogenic and invasive potentials. 
Prognostic value of the 5-miRNA signature. The relationship 
between expression of the 5-miRNA signature and RFS was 
examined by a survival analysis. Figure 3A shows a Kaplan-Meier 
graph for the whole series of patients included in the study. Due to 
the intrinsic characteristics of the cohort, decreases in the RFS are 
only observed in the intervals 0–24 and 50–60 months 
(corresponding to groups B and C, respectively). We next grouped 
the tumors according to their 5-miRNA signature status in two 
different groups. One group included those tumors with all five 
miRNAs simultaneously downregulated (FC > 2 and p < 0.05) and 
a second group included those tumors not having all five miRNAs 
downregulated. A survival analysis was performed using clinical 
data from the corresponding patients. As shown in Figure 3B, the 
Kaplan-Meier graphs for the two groups demonstrate that the 
5-miRNA signature defines a “high risk” group of patients with a 
shorter RFS (Peto-Peto test with p-value = 0.02, when comparing […] 
[…] early recurrence in breast cancer. Hierarchical clustering of the 71 tumor samples based […] 
lower expression levels of the 5-miRNA signature defines a distinct cluster 2b which mainly includes […] 
contrary, most patients with good prognosis (group A) had tumors with normal or higher-than […] 
different cluster 1b (“low risk”). 
[…] patients with different RFS. A) Kaplan-Meier graph for the whole patient cohort included in […] 
overall down-regulation of the 5-miRNA signature (i.e. those from cluster 2b in Fig. 2) were […] 
RFS was calculated (red line). RFS was also calculated for the remaining patients in the cohort […] 
that the 5-miRNA signature specifically discriminates tumors with an overall higher risk of early 
recurrence. 
Figure 4. Receiver operating characteristic curve (ROC) for 
early breast cancer recurrence by the 5-miRNA signature 
status. ROC curves generated using the prognosis information and 
expression levels of the 5-miRNA signature can discriminate between […] 
A miRNA Signature Predictive of Early Recurrence
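The AUC quoted in the abstract summarises a ROC curve as a single number: the probability that a randomly chosen relapsing patient receives a higher risk score than a randomly chosen non-relapsing one. A minimal sketch of that calculation follows; the scores are hypothetical and are not the study's data.

```python
def roc_auc(scores_pos, scores_neg):
    """Probability that a positive case outscores a negative one (ties count half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical risk scores (invented values): higher means more signature down-regulation.
EARLY_RELAPSE = [0.9, 0.8, 0.6]
NO_RELAPSE = [0.7, 0.3, 0.2, 0.1]
auc = roc_auc(EARLY_RELAPSE, NO_RELAPSE)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random guessing and 1.0 to perfect separation of the two groups.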
If I know which genes ARE expressed 
THEN 
I can know which output WILL be obtained
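The survival analysis quoted on this slide combines two steps: deciding whether a tumor carries the signature (all five miRNAs down-regulated, FC > 2 and p < 0.05) and estimating relapse-free survival for a group of patients. A minimal sketch of both follows, assuming that fold change is given as the magnitude of down-regulation; the follow-up times and events are hypothetical, and the estimator is the standard Kaplan-Meier product, not the authors' code.

```python
def signature_down(fold_changes, p_values, fc_cut=2.0, p_cut=0.05):
    """True when every miRNA passes both the fold-change and the p-value threshold."""
    return all(fc > fc_cut and p < p_cut for fc, p in zip(fold_changes, p_values))

def kaplan_meier(times, events):
    """Return [(time, survival)] for each time with at least one relapse.

    events[i] is True for a relapse and False for a censored observation.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        relapsed = sum(1 for ti, e in zip(times, events) if ti == t and e)
        censored = sum(1 for ti, e in zip(times, events) if ti == t and not e)
        if relapsed:
            survival *= 1.0 - relapsed / at_risk
            curve.append((t, survival))
        at_risk -= relapsed + censored
    return curve

# Hypothetical follow-up in months for six patients (invented values).
TIMES = [6, 12, 12, 20, 24, 60]
EVENTS = [True, True, False, True, False, False]
curve = kaplan_meier(TIMES, EVENTS)
print([(t, round(s, 3)) for t, s in curve])
print(signature_down([2.5, 3.1, 2.2, 4.0, 2.8], [0.01, 0.02, 0.04, 0.001, 0.03]))
```

Running the estimator separately on patients with and without the signature gives the two curves that a test such as Peto-Peto then compares.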
Characterization of complex variations in cancer 
http://www.scbi.uma.es 
30 
NATURE BIOTECHNOLOGY ADVANCE ONLINE PUBLICATION 
© 2014 Nature America, Inc. All rights reserved. 
ANALYSIS 
[…] for structural variants of different sizes (Supplementary Table 2). For 
the present comparison, we ran them as described in their companies’ 
corresponding publication or website. 
We first observed that the calling of somatic SNVs was nearly optimal 
and within the same range in Mutect and SMUFIN, with sensitivities 
of 97% and 92%, and specificities of 93% and 99%, respectively 
(Table 1 and Supplementary Table 3). On the other hand, the calling 
efficiency of somatic structural variants varied greatly between different 
methods, revealing clear differences when compared to SMUFIN. 
Some methods reached reasonable levels of sensitivity when the evaluation 
was restricted to the range of structural variants they were 
designed to detect (Pindel and Delly), but these dropped drastically 
when compared against the complete catalog of structural variations 
in the tumor (Supplementary Table 4). By contrast, SMUFIN was […] 
[Figure: workflow in four panels (a–d). Recoverable labels: 
tumor and normal genome sequencing (FASTQ file, reads, quality filters); quaternary sequence tree; 
comparison of normal and tumor reads and identification of potential breakpoints (tumor-specific reads with potential breakpoints, single and double orientation breakpoints); 
construction of breakpoint blocks (overlapping and complementary reads from the normal genome, unambiguous extension of normal and mutated tumor alleles); 
definition and classification of variants (SNVs, small SVs, breakpoint and variant sequence for large SVs, inversions, deletions, insertions, 100 nt extension of the variant and normal sequences around the breakpoint); 
assigning reference coordinates (mapping of normal sequences with BWA, independent mapping of normal sequences flanking the breakpoint).]
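The first step named in the figure labels, comparing tumor and normal reads to find tumor-specific ones, can be illustrated with a simple k-mer comparison. This is only a sketch of the general idea and not the quaternary sequence tree used by the published method; it ignores reverse complements and sequencing errors, and the reads are invented.

```python
def kmers(read, k):
    """Set of all substrings of length k in a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}

def tumor_specific_reads(tumor_reads, normal_reads, k=5):
    """Return tumor reads containing at least one k-mer never seen in the normal sample."""
    normal_kmers = set()
    for read in normal_reads:
        normal_kmers |= kmers(read, k)
    return [read for read in tumor_reads if kmers(read, k) - normal_kmers]

# Hypothetical reads: the second tumor read carries a single-base change (A to T).
NORMAL = ["ACGTACGTAC"]
TUMOR = ["ACGTACGTAC", "ACGTTCGTAC"]
candidates = tumor_specific_reads(TUMOR, NORMAL)
print(candidates)
```

Reads flagged this way are only candidates; the later steps in the figure (breakpoint blocks, variant definition, reference mapping) are what turn them into called variants.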
If I know the polymorphisms of a person 
http://www.scbi.uma.es 
30 
© 2014 Nature America, Inc. All rights reserved. 
ANALYSIS 
for structural variants of different sizes (Supplementary Table 2). For 
the present comparison, we ran them as described in their companies’ 
corresponding publication or website. 
We first observed that the calling of somatic SNVs was nearly optimal 
and within the same range in Mutect and SMUFIN, with sensitivities 
of 97% and 92%, and specificities of 93% and 99%, respectively 
(Table 1 and Supplementary Table 3). On the other hand, the calling 
efficiency of somatic structural variants varied greatly between different 
methods, revealing clear differences when compared to SMUFIN. 
Some methods reached reasonable levels of sensitivity when the evaluation 
was restricted to the range of structural variants they were 
designed to detect (Pindel and Delly), but these dropped drastically 
when compared against the complete catalog of structural variations 
in the tumor (Supplementary Table 4). By contrast, SMUFIN was 
NATURE BIOTECHNOLOGY ADVANCE ONLINE PUBLICATION 
[Figure: SMUFIN workflow, panels a to d. Panel titles: Tumor and normal genome sequencing; Construction of breakpoint blocks; Definition and classification of variants; Assigning reference coordinates. 
Recoverable labels: FASTQ reads pass quality filters and are stored in a quaternary sequence tree; comparison of normal and tumor reads identifies tumor-specific reads with potential breakpoints (single or double orientation); breakpoint blocks are built with overlapping and complementary reads from the normal genome; small variants (SNVs, short insertions; smaller than the read size) and the breakpoint and variant sequence of large SVs (larger than the read size) are defined, extending the variant and normal sequences 100 nt around the breakpoint; normal sequences are mapped to the reference genome with BWA, and for large SVs the normal sequences flanking the breakpoint are mapped independently.] 
THEN 
I can predict which disease he WILL suffer
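The read-comparison step named in the figure above (finding reads that occur only in the tumor sample) can be illustrated with a toy sketch. This is not SMUFIN's implementation, which relies on a quaternary sequence tree; here plain Python sets of k-mers stand in for the tree. The reads, the value of `k`, and the function names `kmers` and `tumor_specific_reads` are hypothetical and chosen only for illustration.

```python
def kmers(read, k):
    """Return the set of all substrings of length k found in a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def tumor_specific_reads(tumor_reads, normal_reads, k=5):
    """Return the tumor reads that contain at least one k-mer never seen in the normal reads.

    Simplification (assumption): reverse complements and sequencing errors are ignored.
    """
    normal_kmers = set()
    for read in normal_reads:
        normal_kmers |= kmers(read, k)
    # A read is a candidate when some of its k-mers are missing from the normal sample.
    return [read for read in tumor_reads if kmers(read, k) - normal_kmers]


# Toy data: the second tumor read differs from the normal sequence by one base.
normal = ["ACGTACGTAC", "GTACGTACGT"]
tumor = ["ACGTACGTAC", "ACGTAGGTAC"]
candidates = tumor_specific_reads(tumor, normal, k=5)
print(candidates)
```

In this toy case only the read carrying the base change is reported as a candidate; a real caller would still have to classify the variant and assign reference coordinates, as the figure describes.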
http://www.scbi.uma.es 
Personalised medicine 
31 
A needle in a haystack WAS FOUND
Linking unrelated diseases 
http://www.scbi.uma.es 
32 
Alzheimer patients tend to be free of cancer, and cancer 
patients tend to be free of mental diseases 
Molecular Evidence for the Inverse Comorbidity between 
Central Nervous System Disorders and Cancers Detected 
by Transcriptomic Meta-analyses 
Kristina Ibáñez1., César Boullosa1., Rafael Tabarés-Seisdedos2, Anaïs Baudot3*, Alfonso Valencia1* 
1 Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain, 2 Department of Medicine, University of Valencia, 
CIBERSAM, INCLIVA, Valencia, Spain, 3 Aix-Marseille Université, CNRS, I2M, UMR 7373, Marseille, France 
Abstract 
There is epidemiological evidence that patients with certain Central Nervous System (CNS) disorders have a lower than 
expected probability of developing some types of Cancer. We tested here the hypothesis that this inverse comorbidity is 
driven by molecular processes common to CNS disorders and Cancers, and that are deregulated in opposite directions. We 
conducted transcriptomic meta-analyses of three CNS disorders (Alzheimer’s disease, Parkinson’s disease and Schizophrenia) 
and three Cancer types (Lung, Prostate, Colorectal) previously described with inverse comorbidities. A significant overlap was 
observed between the genes upregulated in CNS disorders and downregulated in Cancers, as well as between the genes 
downregulated in CNS disorders and upregulated in Cancers. We also observed expression deregulations in opposite 
directions at the level of pathways. Our analysis points to specific genes and pathways, the upregulation of which could 
increase the incidence of CNS disorders and simultaneously lower the risk of developing Cancer, while the downregulation 
of another set of genes and pathways could contribute to a decrease in the incidence of CNS disorders while increasing the 
Cancer risk. These results reinforce the previously proposed involvement of the PIN1 gene, Wnt and P53 pathways, and 
reveal potential new candidates, in particular related with protein degradation processes. 
SCZ: schizophrenia 
AD: Alzheimer disease 
PD: Parkinson disease 
CRC: colorectal cancer 
PC: prostate cancer 
LC: lung cancer 
Citation: Ibáñez K, Boullosa C, Tabarés-Seisdedos R, Baudot A, Valencia A (2014) Molecular Evidence for the Inverse Comorbidity between Central Nervous 
System Disorders and Cancers Detected by Transcriptomic Meta-analyses. PLoS Genet 10(2): e1004173. doi:10.1371/journal.pgen.1004173 
Editor: Marshall S. Horwitz, University of Washington, United States of America 
Received September 16, 2013; Accepted December 30, 2013; Published February 20, 2014 
Copyright: © 2014 Ibáñez et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits 
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Funding: This work was supported by a Fellowship from Obra Social la Caixa grant to KI (http://obrasocial.lacaixa.es/laCaixaFoundation/home_en.html), FPI grant 
BES-2008-006332 to CB and grant BIO2012 to AV Group. The funders had no role in study design, data collection and analysis, decision to publish, or preparation 
of the manuscript. 
Competing Interests: The authors have declared that no competing interests exist. 
* E-mail: anais.baudot@univ-amu.fr (AB); avalencia@cnio.es (AV) 
. These authors contributed equally to this work. 
Introduction 
Epidemiological evidences point to a lower-than-expected 
probability of developing some types of Cancer in certain CNS 
Results and Discussion 
For each CNS disorder and Cancer type independently, we 
undertook meta-analyses from a large collection of microarray 
together with these external factors (for review, see [3–7]). In 
particular, we propose the deregulation in opposite directions of a 
common set of genes and pathways as an underlying cause of 
inverse comorbidities. 
To investigate the biological plausibility of this hypothesis, a 
basic initial step is to establish the existence of inverse gene 
expression deregulations (i.e., down- versus up-regulations) in CNS 
disorders and Cancers. Towards this objective, we have performed 
integrative meta-analyses of collections of gene expression data, 
publically available for AD, PD and SCZ, and Lung (LC), 
Colorectal (CRC) and Prostate (PC) Cancers. Clinical and 
epidemiological data previously reported inverse comorbidities for 
these complex disorders, according to population studies assessing 
the Cancer risks among patients with CNS disorders [8–17]. 
significant overlaps (Fisher’s exact test, corrected p-value (q-value) 
< 0.05, see Methods) between the DEGs upregulated in 
CNS disorders and those downregulated in Cancers. Similarly, 
DEGs downregulated in CNS disorders overlapped significantly 
with DEGs upregulated in Cancers (Figure 1A). Significant 
overlaps between DEGs deregulated in opposite directions in CNS 
disorders and Cancers are still observed while setting more 
stringent cutoffs for the detection of DEGs (q-values lower than 
0.005, 0.0005, 0.00005 and 0.000005, Figure S1). A significant 
overlap between DEGs deregulated in the same direction was only 
identified in the case of CRC and PD upregulated genes 
(Figure 1A). 
A molecular interpretation of the inverse comorbidity between CNS 
disorders and Cancers could be that the downregulation of certain 
PLOS Genetics | www.plosgenetics.org 1 February 2014 | Volume 10 | Issue 2 | e1004173 
Inverse Comorbidity among Cancer and CNS Disorders 
Comparing 
differentially 
expressed genes
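The excerpt above tests whether the overlap between two lists of differentially expressed genes is larger than expected by chance, using Fisher's exact test. The following is a minimal sketch of the one-sided version of that test, written with the standard library only. The gene names, set sizes, and the function name `overlap_p_value` are invented for illustration, and the multiple-testing correction (q-values) used in the paper is not reproduced here.

```python
from math import comb


def overlap_p_value(universe, set_a, set_b):
    """One-sided Fisher's exact test for the overlap of two gene sets.

    Returns the probability of seeing an overlap at least as large as the
    observed one if set_b were drawn at random from the universe
    (upper tail of the hypergeometric distribution).
    """
    a = set_a & universe
    b = set_b & universe
    n_total = len(universe)
    observed = len(a & b)
    max_overlap = min(len(a), len(b))
    tail = sum(
        comb(len(a), x) * comb(n_total - len(a), len(b) - x)
        for x in range(observed, max_overlap + 1)
    )
    return tail / comb(n_total, len(b))


# Hypothetical toy data: 20 measured genes, two lists of 5 genes sharing 4.
universe = {f"g{i}" for i in range(20)}
up_in_cns = {"g0", "g1", "g2", "g3", "g4"}
down_in_cancer = {"g0", "g1", "g2", "g3", "g10"}
p = overlap_p_value(universe, up_in_cns, down_in_cancer)
print(f"p = {p:.4f}")
```

With real data the universe would be the set of genes measured in both studies, and the resulting p-values would need to be corrected for the number of comparisons before being read as q-values.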
Mental and cancer diseases are really connected 
19 genes 74 genes 
cancer ↓↓ 
http://www.scbi.uma.es 
33 
(Figure 2, Figure S2, Table S3). The inverse relationship 
between the levels of expression deregulations of these pathways 
possibly suggests opposite roles in CNS disorders and Cancers. 
Figure 3). Hence, global regulations of cellular activity may 
account for a protective effect between inversely comorbid 
diseases. 
Figure 2. KEGG pathways significantly deregulated in Central Nervous System (CNS) disorders and Cancer types. KEGG pathways [24] 
significantly up- and downregulated in each disease were identified using the GSEA method [34] (q-value < 0.05). The significant pathways were 
compared between the 6 diseases and combined in a network representation. Node pie charts are coloured according to the pathway status as 
Cancer upregulated (yellow), Cancer downregulated (blue), CNS disorder upregulated (green) and CNS disorder downregulated (red). The green/blue 
and yellow/red associations thus correspond to pathways deregulated in opposite directions in CNS disorders and Cancers. Pathway labels are 
coloured according to their classifications provided by KEGG [24], as: Metabolism (green), Genetic Information Processing (yellow), Cellular Process 
(pink), Environmental Information Processing (red) and Organismal Systems (dark red). All networks are available at bioinfo.cnio.es/people/cboullosa/ 
validation/cytoscape/Ibanezetal.zip, in cytoscape format (http://www.cytoscape.org/). 
doi:10.1371/journal.pgen.1004173.g002 
PLOS Genetics | www.plosgenetics.org 4 February 2014 | Volume 10 | Issue 2 | e1004173 
Typical cancer 
functions 
Typical mental 
disease functions 
↑↑ cancer 
↓↓ mental disease 
mental disease↑↑ 
Since 93 genes are inversely expressed in 
cancer and CNS disorders 
THEN 
I can explain the inverse correlation between both diseases
After basic research, translational research is easy 
http://www.scbi.uma.es 
34
Higher vertebrates have conserved genomes 
Chimpanzee 
http://www.scbi.uma.es 
35 
The bonobo genome compared with the chimpanzee and human 
genomes 
Kay Prüfer et al. 
Nature 486, 527–531 (28 June 2012) 
The zebrafish reference genome sequence and its relationship to the 
human genome 
Kerstin Howe et al. 
Nature 496, 498–503 (25 April 2013) 
70% of protein-coding human genes are related to genes found in the zebrafish 
84% of genes known to be associated with human disease have a zebrafish 
counterpart
Genome plasticity in bacteria 
http://www.scbi.uma.es 
36 
Estimating the size of the bacterial pan-genome 
Lapierre & Gogarten 
Trends in Genetics 25(3), 2009, Pages 107–110 
Pangenomics – an avenue to improved industrial starter cultures and probiotics 
Garrigues et al. 
Current Opinion in Biotechnology 2013, 24:187–191
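The pan-genome idea behind this slide can be made concrete with a small sketch: the pan-genome is the union of the genes found in any strain, and the core genome is the intersection shared by all of them. The strain names, gene names, and the function `pan_and_core` are hypothetical; real analyses first cluster genes into orthologous families, which is assumed to have been done already.

```python
def pan_and_core(genomes):
    """Return (pan-genome, core genome) for a mapping of strain name to set of gene families."""
    gene_sets = list(genomes.values())
    if not gene_sets:
        return set(), set()
    pan = set().union(*gene_sets)            # genes present in at least one strain
    core = set.intersection(*gene_sets)      # genes present in every strain
    return pan, core


# Hypothetical toy strains of one bacterial species.
strains = {
    "strain_A": {"dnaA", "gyrB", "lacZ"},
    "strain_B": {"dnaA", "gyrB", "bla"},
    "strain_C": {"dnaA", "gyrB", "lacZ", "tetA"},
}
pan, core = pan_and_core(strains)
accessory = pan - core  # genes that only some strains carry
print(len(pan), len(core), sorted(accessory))
```

Adding more strains can only grow the pan-genome and shrink the core genome, which is why estimating the pan-genome size requires sampling many genomes.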
Minimum number of genes for a living organism 
500 genes 
If I know the minimal gene number of an organism 
http://www.scbi.uma.es 
37 
1354 genes 
Giovannoni et al., (2005) 
Science 309: 1242-1245 
THEN 
I can design artificial organisms for biotechnological purposes
There aren’t new genes but duplicated genes 
http://www.scbi.uma.es 
38 
The number of gene families plateaus with genome size 
Figure 3.15 Because many genes are 
duplicated, the number of different gene 
families is much less than the total 
number of genes. The histogram compares 
the total number of genes with the number 
of distinct gene families. 
GENOME SIZE HAS NOTHING TO DO WITH GENE NUMBER 
VARIABILITY AMONG GENOMES ARISES FROM A NUMBER OF DIFFERENT SOURCES 
HIGH-THROUGHPUT TECHNOLOGIES OVERVIEW
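The figure caption on this slide contrasts the total number of genes with the number of distinct gene families. A minimal sketch of that count follows, assuming each gene has already been assigned to a family; the gene names, family names, and the function `family_summary` are made up for illustration.

```python
from collections import Counter


def family_summary(gene_to_family):
    """Return (total genes, distinct families, names of families with more than one member)."""
    sizes = Counter(gene_to_family.values())
    duplicated = sorted(name for name, n in sizes.items() if n > 1)
    return len(gene_to_family), len(sizes), duplicated


# Hypothetical annotation: six genes that fall into only three families.
annotation = {
    "hbA": "globin",
    "hbB": "globin",
    "mb": "globin",
    "act1": "actin",
    "act2": "actin",
    "tp53": "p53",
}
total_genes, total_families, duplicated_families = family_summary(annotation)
print(total_genes, total_families, duplicated_families)
```

Here six genes collapse into three families, two of which have duplicated members, which is the pattern the histogram on the slide summarizes at genome scale.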
We are not able to predict which kind of organism is 
produced when having the genome sequence 
A living being is more than the sum of its components 
http://www.scbi.uma.es 
39 
?
We can now relate facial shapes with genes 
Introduction 
The craniofacial complex is initially modulated by precisely-timed 
embryonic gene expression and molecular interactions 
mediated through complex pathways [1]. As humans grow, 
hormones and biomechanical factors also affect many parts of 
the face [2,3]. The inability to systematically summarize facial 
variation has impeded the discovery of the determinants and 
correlates of face shape. In contrast to genomic technologies, 
systematic and comprehensive phenotyping has lagged. This is 
especially so in the context of multipartite traits such as the human 
face. In typical genome-wide association studies (GWAS) today 
phenotypes are summarized as univariate variables, which is 
inherently limiting for multivariate traits, which, by definition 
cannot be expressed with single variables. Current state-of-the-art 
genetic association studies for facial traits are limited in their 
description of facial morphology [4–7]. These analyses start from a 
sparse set of anatomical landmarks (these being defined as “a point 
of correspondence on an object that matches between and within 
populations”), which overlooks salient features of facial shape. 
Subsequently, either a set of conventional morphometric measurements 
such as distances and angles are extracted, which 
drastically oversimplify facial shape, or a set of principal 
components (PCs) are extracted using principal components 
analysis (PCA) on the shape-space obtained with superimposition 
techniques, where each PC is assumed to represent a distinct 
morphological trait. Here we describe a novel method that 
facilitates the compounding of all PCs into a single scalar variable 
customized to relevant independent variables including, sex, 
genomic ancestry, and genes. Our approach combines placing 
PLOS Genetics | www.plosgenetics.org 1 March 2014 | Volume 10 | Issue 3 | e1004224 
http://www.scbi.uma.es 
40 
Modeling 3D Facial Shape from DNA 
Peter Claes1, Denise K. Liberton2, Katleen Daniels1, Kerri Matthes Rosana2, Ellen E. Quillen2, 
Laurel N. Pearson2, Brian McEvoy3, Marc Bauchet2, Arslan A. Zaidi2, Wei Yao2, Hua Tang4, 
Gregory S. Barsh4,5, Devin M. Absher5, David A. Puts2, Jorge Rocha6,7, Sandra Beleza4,8, 
Rinaldo W. Pereira9, Gareth Baynam10,11,12, Paul Suetens1, Dirk Vandermeulen1, Jennifer K. Wagner13, 
James S. Boster14, Mark D. Shriver2* 
1 Medical Image Computing, ESAT/PSI, Department of Electrical Engineering, KU Leuven, Medical Imaging Research Center, KU Leuven & UZ Leuven, iMinds-KU Leuven 
Future Health Department, Leuven, Belgium, 2 Department of Anthropology, Penn State University, University Park, Pennsylvania, United States of America, 3 Smurfit 
Institute of Genetics, Dublin, Ireland, 4 Department of Genetics, Stanford University, Palo Alto, California, United States of America, 5 HudsonAlpha Institute for 
Biotechnology, Huntsville, Alabama, United States of America, 6 CIBIO: Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Porto, 
Portugal, 7 Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal, 8 IPATIMUP: Instituto de Patologia e Imunologia Molecular da 
Universidade do Porto, Porto, Portugal, 9 Programa de Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasilia, Brasil, 10 School 
of Paediatrics and Child Health, University of Western Australia, Perth, Australia, 11 Institute for Immunology and Infectious Diseases, Murdoch University, Perth, Australia, 
12 Genetic Services of Western Australia, King Edward Memorial Hospital, Perth, Australia, 13 Center for the Integration of Genetic Healthcare Technologies, University of 
Pennsylvania, Philadelphia, Pennsylvania, United States of America, 14 Department of Anthropology, University of Connecticut, Storrs, Connecticut, United States of 
America 
Abstract 
Human facial diversity is substantial, complex, and largely scientifically unexplained. We used spatially dense quasi-landmarks 
to measure face shape in population samples with mixed West African and European ancestry from three 
locations (United States, Brazil, and Cape Verde). Using bootstrapped response-based imputation modeling (BRIM), we 
uncover the relationships between facial variation and the effects of sex, genomic ancestry, and a subset of craniofacial 
candidate genes. The facial effects of these variables are summarized as response-based imputed predictor (RIP) variables, 
which are validated using self-reported sex, genomic ancestry, and observer-based facial ratings (femininity and 
proportional ancestry) and judgments (sex and population group). By jointly modeling sex, genomic ancestry, and 
genotype, the independent effects of particular alleles on facial features can be uncovered. Results on a set of 20 genes 
showing significant effects on facial features provide support for this approach as a novel means to identify genes affecting 
normal-range facial features and for approximating the appearance of a face from genetic markers. 
Citation: Claes P, Liberton DK, Daniels K, Rosana KM, Quillen EE, et al. (2014) Modeling 3D Facial Shape from DNA. PLoS Genet 10(3): e1004224. doi:10.1371/ 
journal.pgen.1004224 
Editor: Daniela Luquetti, Seattle Children’s Research Institute, United States of America 
Received September 12, 2013; Accepted January 22, 2014; Published March 20, 2014 
Copyright: © 2014 Claes et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits 
unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. 
Funding: This investigation was supported by grants to MDS from Science Foundation of Ireland Walton Fellowship (04.W4/B643); to MDS and DAP from the 
National Institute Justice (2008-DN-BX-K125); to JKW from the NIH/National Human Genome Research Institute (K99HG006446); to DKL from the National Science 
Foundation (BCS-0851815) and from the Wenner Gren Foundation (Fieldwork Grant 7967). PC is partly supported by the Flemish Institute for the Promotion of 
Innovation by Science and Technology in Flanders (IWT Vlaanderen), the Research Program of the Fund for Scientific Research - Flanders (Belgium) (FWO), the 
Research Fund KU Leuven and SB was supported by the Portuguese Institution “Fundação para a Ciência e a Tecnologia” [FCT; PTDC/BIABDE/64044/2006 
(project) and SFRH/BPD/21887/2005 (post-doc grant)] and by a Dean’s Postdoctoral Fellowship at Stanford University. The funders had no role in study design, 
data collection and analysis, decision to publish, or preparation of the manuscript. 
Competing Interests: The authors have declared that no competing interests exist. 
* E-mail: mds17@psu.edu 
Figure 4. Relationships between the ancestry and sex RIP variables and their initial predictor variables. (A) RIP-A with genomic 
ancestry; genomic ancestry is calculated using the core panel of 68 AIMs and RIP-A is calculated using this ancestry estimate on the set of three 
populations combined (N = 592). Populations are indicated as shown in the legend with United States participants shown with black circles, Brazilians 
with red circles, and Cape Verdeans with blue circles. (B) Histograms of RIP-S by self-reported sex. 
doi:10.1371/journal.pgen.1004224.g004
We have found the treasure coffer, but… 
http://www.scbi.uma.es 
http://www.slideshare.net/MGonzaloClaros 41
We have found the treasure coffer, but… 
http://www.scbi.uma.es 
http://www.slideshare.net/MGonzaloClaros 41

Mais conteúdo relacionado

Mais procurados

Cimetta et al., 2013
Cimetta et al., 2013Cimetta et al., 2013
Cimetta et al., 2013Fran Flores
 
exRNA Atlas and deconvolution tools at the transition from ERCC1 to ERCC2
exRNA Atlas and deconvolution tools at the transition from ERCC1 to ERCC2exRNA Atlas and deconvolution tools at the transition from ERCC1 to ERCC2
exRNA Atlas and deconvolution tools at the transition from ERCC1 to ERCC2Roger Alexander
 
Ecocyc database
Ecocyc databaseEcocyc database
Ecocyc databaseShiv Kumar
 
Protein protein interaction
Protein protein interactionProtein protein interaction
Protein protein interactionAashish Patel
 
A Systems Biology Approach to Natural Products Research
A Systems Biology Approach to Natural Products ResearchA Systems Biology Approach to Natural Products Research
A Systems Biology Approach to Natural Products ResearchHuda Nazeer
 
Systems biology & Approaches of genomics and proteomics
 Systems biology & Approaches of genomics and proteomics Systems biology & Approaches of genomics and proteomics
Systems biology & Approaches of genomics and proteomicssonam786
 
Pcmd bioinformatics-lecture i
Pcmd bioinformatics-lecture iPcmd bioinformatics-lecture i
Pcmd bioinformatics-lecture iMuhammad Younis
 
Cytoscape: Gene coexppression and PPI networks
Cytoscape: Gene coexppression and PPI networksCytoscape: Gene coexppression and PPI networks
Cytoscape: Gene coexppression and PPI networksBITS
 
Metagenomics as a tool for biodiversity and health
Metagenomics as a tool for biodiversity and healthMetagenomics as a tool for biodiversity and health
Metagenomics as a tool for biodiversity and healthAlberto Dávila
 
Plant system biology
Plant system biologyPlant system biology
Plant system biologySubaParanie
 
Metabolomics
MetabolomicsMetabolomics
Metabolomicspriya1111
 
Protein-protein interaction (PPI)
Protein-protein interaction (PPI)Protein-protein interaction (PPI)
Protein-protein interaction (PPI)N Poorin
 
Gene regulatory networks
Gene regulatory networksGene regulatory networks
Gene regulatory networksMadiheh
 
Systems Biology Approaches to Cancer
Systems Biology Approaches to CancerSystems Biology Approaches to Cancer
Systems Biology Approaches to CancerRaunak Shrestha
 
Comparative genomics @ sid 2003 format
Comparative genomics @ sid 2003 formatComparative genomics @ sid 2003 format
Comparative genomics @ sid 2003 formatsidjena70
 
Epigeneticsand methylation
Epigeneticsand methylationEpigeneticsand methylation
Epigeneticsand methylationShubhda Roy
 
Report on System Biology Funding from BMBF
Report on System Biology Funding from BMBFReport on System Biology Funding from BMBF
Report on System Biology Funding from BMBFEuroBioForum
 
An Evolutionary and Structural Analysis of the Connective Tissue Growth Facto...
An Evolutionary and Structural Analysis of the Connective Tissue Growth Facto...An Evolutionary and Structural Analysis of the Connective Tissue Growth Facto...
An Evolutionary and Structural Analysis of the Connective Tissue Growth Facto...Ashley Kennedy
 

Mais procurados (20)

Metabolomics
MetabolomicsMetabolomics
Metabolomics
 
Cimetta et al., 2013
Cimetta et al., 2013Cimetta et al., 2013
Cimetta et al., 2013
 
exRNA Atlas and deconvolution tools at the transition from ERCC1 to ERCC2
exRNA Atlas and deconvolution tools at the transition from ERCC1 to ERCC2exRNA Atlas and deconvolution tools at the transition from ERCC1 to ERCC2
exRNA Atlas and deconvolution tools at the transition from ERCC1 to ERCC2
 
Ecocyc database
Ecocyc databaseEcocyc database
Ecocyc database
 
Protein protein interaction
Protein protein interactionProtein protein interaction
Protein protein interaction
 
A Systems Biology Approach to Natural Products Research
A Systems Biology Approach to Natural Products ResearchA Systems Biology Approach to Natural Products Research
A Systems Biology Approach to Natural Products Research
 
Systems biology & Approaches of genomics and proteomics
 Systems biology & Approaches of genomics and proteomics Systems biology & Approaches of genomics and proteomics
Systems biology & Approaches of genomics and proteomics
 
Pcmd bioinformatics-lecture i
Pcmd bioinformatics-lecture iPcmd bioinformatics-lecture i
Pcmd bioinformatics-lecture i
 
Cytoscape: Gene coexppression and PPI networks
Cytoscape: Gene coexppression and PPI networksCytoscape: Gene coexppression and PPI networks
Cytoscape: Gene coexppression and PPI networks
 
Metagenomics as a tool for biodiversity and health
Metagenomics as a tool for biodiversity and healthMetagenomics as a tool for biodiversity and health
Metagenomics as a tool for biodiversity and health
 
Plant system biology
Plant system biologyPlant system biology
Plant system biology
 
Metabolomics
MetabolomicsMetabolomics
Metabolomics
 
Topology
TopologyTopology
Topology
 
Protein-protein interaction (PPI)
Protein-protein interaction (PPI)Protein-protein interaction (PPI)
Protein-protein interaction (PPI)
 
Gene regulatory networks
Gene regulatory networksGene regulatory networks
Gene regulatory networks
 
Systems Biology Approaches to Cancer
Systems Biology Approaches to CancerSystems Biology Approaches to Cancer
Systems Biology Approaches to Cancer
 
Comparative genomics @ sid 2003 format
Comparative genomics @ sid 2003 formatComparative genomics @ sid 2003 format
Comparative genomics @ sid 2003 format
 
Epigeneticsand methylation
Epigeneticsand methylationEpigeneticsand methylation
Epigeneticsand methylation
 
Report on System Biology Funding from BMBF
Report on System Biology Funding from BMBFReport on System Biology Funding from BMBF
Report on System Biology Funding from BMBF
 
An Evolutionary and Structural Analysis of the Connective Tissue Growth Facto...
An Evolutionary and Structural Analysis of the Connective Tissue Growth Facto...An Evolutionary and Structural Analysis of the Connective Tissue Growth Facto...
An Evolutionary and Structural Analysis of the Connective Tissue Growth Facto...
 

Semelhante a Bioinformatics and the logic of life

Undergraduate Research Grant
Undergraduate Research GrantUndergraduate Research Grant
Undergraduate Research GrantKaitlin Zoccola
 
Ellison MolBioSys b905602e published (2)
Ellison MolBioSys b905602e published (2)Ellison MolBioSys b905602e published (2)
Ellison MolBioSys b905602e published (2)Dr David Ellison
 
Applications of bioinformatics, main by kk sahu
Applications of bioinformatics, main by kk sahuApplications of bioinformatics, main by kk sahu
Applications of bioinformatics, main by kk sahuKAUSHAL SAHU
 
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MININGANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MININGijbbjournal
 
Deep learning methods in metagenomics: a review
Deep learning methods in metagenomics: a reviewDeep learning methods in metagenomics: a review
Deep learning methods in metagenomics: a reviewssuser6fc73c
 
Unveiling the role of network and systems biology in drug discovery
Unveiling the role of network and systems biology in drug discoveryUnveiling the role of network and systems biology in drug discovery
Unveiling the role of network and systems biology in drug discoverychengcheng zhou
 
PNAS-2013-Barr-10771-6
PNAS-2013-Barr-10771-6PNAS-2013-Barr-10771-6
PNAS-2013-Barr-10771-6Rita Auro
 
Bioinformatics
BioinformaticsBioinformatics
BioinformaticsAmna Jalil
 
Yeast two hybrid
Yeast two hybridYeast two hybrid
Yeast two hybridhina ojha
 
A Cell-Cycle Knowledge Integration Framework
A Cell-Cycle Knowledge Integration FrameworkA Cell-Cycle Knowledge Integration Framework
A Cell-Cycle Knowledge Integration FrameworkLisa Muthukumar
 
Technology R&D Theme 3: Multi-scale Network Representations
Technology R&D Theme 3: Multi-scale Network RepresentationsTechnology R&D Theme 3: Multi-scale Network Representations
Technology R&D Theme 3: Multi-scale Network RepresentationsAlexander Pico
 
A statistical framework for multiparameter analysis at the single cell level
A statistical framework for multiparameter analysis at the single cell levelA statistical framework for multiparameter analysis at the single cell level
A statistical framework for multiparameter analysis at the single cell levelShashaanka Ashili
 

Semelhante a Bioinformatics and the logic of life (20)

Undergraduate Research Grant
Undergraduate Research GrantUndergraduate Research Grant
Undergraduate Research Grant
 
Ellison MolBioSys b905602e published (2)
Ellison MolBioSys b905602e published (2)Ellison MolBioSys b905602e published (2)
Ellison MolBioSys b905602e published (2)
 
Applications of bioinformatics, main by kk sahu
Applications of bioinformatics, main by kk sahuApplications of bioinformatics, main by kk sahu
Applications of bioinformatics, main by kk sahu
 
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
Bioinformatics and the logic of life

  • 1. Bioinformatics to reveal the logic of life. M. Gonzalo Claros Díaz, Dpto. Biología Molecular y Bioquímica, Plataforma Andaluza de Bioinformática, Centro de Bioinnovación. @MGClaros, http://about.me/mgclaros/
  • 3. There are many reflections about life: genetics, philosophy, religion, physics, and many more. http://www.scbi.uma.es
  • 4. A living being for some scientists: the cell is a kind of black box.
  • 5. Molecular biology provides some logic… How do we select the few combinations that make sense?
  • 6. A hierarchical logic… The way back cannot be predicted.
  • 7. In fact, a complex logic full of interactions.
  • 8. Metabolism offers another source of logic.
  • 9. Other sciences were also interested in the logic of life.
  • 10. Bioinformatics = integration. http://bioinformatics.biol.ntnu.edu.tw/sher/Teaching.html
  • 11. Bioinformatics receives and gives new data and insights (biology, computer science, statistics). The living being is the result of all observations and cannot be inferred from biased observations.
  • 12. A living being for some scientists: the cell is a kind of black box.
  • 13. A living being for a bioinformatician: life ontology.
  • 14. So, we begin to understand: other scientists, bioinformatician, biotechnologist.
  • 15. Bioinformatics emerged with data accumulation.
  • 16. Regarding data, informatics lags behind biology.
  • 17. Therefore, biology and informatics are interdependent. http://www.genomicglossaries.com/presentation/SLAgenomics.asp
  • 19. Some logic in living beings based on bioinformatics.
  • 20. Bioinformatics integration in alcohol-induced disorders. "Through integration and modeling, these studies would allow us to better exploit the complexity of genomic and functional genomic data and to extract their biological and clinical significance." http://pubs.niaaa.nih.gov/publications/arh311/5-11.htm
  • 21. Drug discovery was expensive. Classic approach: experimental drugs were chemically synthesized and then tested in animals.
  • 22. Drug discovery was expensive. Classic approach: experimental drugs were chemically synthesized and then tested in animals. Bioinformatics approach: candidates are selected from a ligand database and only candidate drugs are synthesized. A cost-effective approach.
  • 23. Nobel Prize in Chemistry 2013. Laureate profiles: biochemist, theoretical chemist, biophysicist, biochemist. Awarded for the development of computational models to understand and predict chemical processes. http://blogs.plos.org/biologue/2013/10/18/the-significance-of-the-2013-nobel-prize-in-chemistry-and-the-challenges-ahead/
  • 24. Nobel Prize in Chemistry 2013. "This Nobel Prize is the first given to work in computational biology, indicating that the field has matured and is on a par with experimental biology" (the blog of PLOS Computational Biology). Laureate profiles: biochemist, theoretical chemist, biophysicist, biochemist. Awarded for the development of computational models to understand and predict chemical processes. http://blogs.plos.org/biologue/2013/10/18/the-significance-of-the-2013-nobel-prize-in-chemistry-and-the-challenges-ahead/
  • 25. A cell was full of molecular cascades: divergent cascades and convergent cascades.
  • 26. Then, a cell was a subway map. Subway map designed by Claudia Bentley. Web design by Nick Allin. Edited by Cath Brooksbank and Sandra Clark. © 2002 Nature Publishing Group. http://www.nature.com/nrc/poster/subpathways/index.html
  • 27. Finally, a cell is a network. Cell network complexity increases with whole-organism complexity. Key nodes reveal key functions.
  • 28. Transcription factor network explains some cancers.
      Source: Carlos Rodriguez-Caso, Miguel A. Medina and Ricard V. Solé, "Topology, tinkering and evolution of the human transcription factor network", FEBS Journal 272 (2005) 6423–6434, doi:10.1111/j.1742-4658.2005.05041.x. (Received 5 August 2005, revised 25 October 2005, accepted 31 October 2005.)
      Affiliations: 1 ICREA-Complex Systems Laboratory, Universitat Pompeu Fabra, Barcelona, Spain; 2 Department of Molecular Biology and Biochemistry, Faculty of Sciences, Universidad de Málaga, Spain; 3 Santa Fe Institute, Santa Fe, New Mexico, USA.
      Correspondence: Ricard V. Solé, ICREA - Complex System Laboratory, Universitat Pompeu Fabra, Dr Aiguader 80, 08003 Barcelona, Spain. Fax: +34 93 221 3237. Tel: +34 93 542 2821. E-mail: ricard.sole@upf.edu
      Keywords: human; molecular evolution; protein interaction; tinkering; transcription factor network.
      Abbreviations: ER, Erdős-Rényi; HTFN, human transcription factor network; SF, scale free; SW, small world; TF, transcription factor.
      Abstract: Patterns of protein interactions are organized around complex heterogeneous networks. Their architecture has been suggested to be of relevance in understanding the interactome and its functional organization, which pervades cellular robustness. Transcription factors are particularly relevant in this context, given their central role in gene regulation. Here we present the first topological study of the human protein–protein interacting transcription factor network built using the TRANSFAC database. We show that the network exhibits scale-free and small-world properties with a hierarchical and modular structure, which is built around a small number of key proteins. Most of these proteins are associated with proliferative diseases and are typically not linked to each other, thus reducing the propagation of failures through compartmentalization. Network modularity is consistent with common structural and functional features and the features are generated by two distinct evolutionary strategies: amplification and shuffling of interacting domains through tinkering and acquisition of specific interacting regions. The function of the regulatory complexes may have played an active role in choosing one of them.
      Introduction (excerpt): Living cells are composed of a large number of different molecules interacting with each other to yield complex spatial and temporal patterns. Unfortunately, this reality is seldom captured by traditional and molecular biology approaches. A shift from molecular to modular biology seems unavoidable [1] as biological systems are defined by complex networks of interacting components. Early topological studies of cellular networks revealed that genomic, proteomic and metabolic maps share characteristic features with other real-world networks [8–12]. Protein networks, also called interactomes, were studied thanks to a massive two-hybrid system screening in unicellular Saccharomyces cerevisiae [9] and, more recently, in Drosophila melanogaster [13] […]
      […] allow the formation of supramolecular activator or inhibitory complexes, depending on their components and possible combinations. Transcription factors (TFs) are an essential subset of interacting proteins responsible for the control of gene expression. They interact with DNA regions and tend to form transcriptional regulatory complexes. Thus, the final effect of one of these complexes is determined by its TF composition. The number of TFs varies among organisms, although it appears to be linked to the organism's complexity. Around 200–300 TFs are predicted for Escherichia coli [18] and Saccharomyces [19,20]. By contrast, comparative analysis in multicellular organisms shows that the predicted number of TFs reaches 600–820 in C. elegans and D. melanogaster [20,21], and 1500–1800 in Arabidopsis (1200 cloned sequences) [20–22]. For humans, around 1500 TFs have been documented [21] and it is estimated that there are 2000–3000 [21,23]. Such an increase in the number of TFs is associated with higher control of gene regulation [24]. Interestingly, such an increase is based on the use of the same structural types of proteins. Human transcription factors are predominantly Zn fingers, followed by homeobox and basic helix–loop–helix [21]. Phylogenetic studies have shown that the amplification and shuffling of protein domains determine the growth of certain transcription factor families [25–28].
      Functional and structural patterns from topology (excerpt): In order to reveal the mechanisms that shape the structure of HTFN, we studied its topological modularity in relation to the function and structure of TFs from available information. From a structural point of view, the overabundance of self-interactions is associated with a majority group of 55% of basic helix–loop–helix (bHLH) and leucine zippers (bZip), 17.5% of Zn fingers and 22.5% corresponding to a more heterogeneous […] a complex, by varying their function and affinity to DNA. This is the case of the bHLH–bZip proto-oncogene c-myc [44], or the Zn finger retinoid X receptor RXR [45]. From a topological viewpoint, connections by self-interacting domains would imply high clustering and modularity, because all these proteins share the same rules and they have the potential to give a highly interconnected subgraph (i.e. a module). According to this, the high clustering of HTFN (see Fig. 1) could be explained as a by-product of the overabundance of self-interacting domains. We wondered whether the HTFN modular architecture […] or via control of TF expression, less connected factors may also be relevant to cell survival.
      Fig. 1. Human transcription factor network built from data extracted from the TRANSFAC 8.2 database. Numbered black filled nodes are the highest connected transcription factors: 1, TATA-binding protein (TBP); 2, p53; 3, p300; 4, retinoid X receptor α (RXRα); 5, retinoblastoma protein (pRB); 6, nuclear factor NFκB p65 subunit (RelA); 7, c-jun; 8, c-myc; 9, c-fos.
      Table 2. Description and functionality of transcription factor hubs. Columns: transcription factor (TF); description; associated disease; degree (k); betweenness centrality (b × 10³).
        TBP; basal transcription machinery initiator; spinocerebellar ataxia [40]; k = 27; b = 17.3
        p53; tumor suppressor protein; proliferative disease [68]; k = 23; b = 18.5
        p300; coactivator, histone acetyltransferase; may play a role in epithelial cancer [69]; k = 18; b = 20.2
        RXR-α; retinoid X-α receptor; hepatocellular carcinoma [70]; k = 18; b = 8
        pRB; retinoblastoma suppressor protein, tumour suppressor protein; proliferative disease, bladder cancer, osteosarcoma [71]; k = 15; b = 27.1
        RelA; NF-κB pathway; hepatocyte apoptosis and foetal death [72]; k = 14; b = 6.6
        c-jun; AP-1 complex (activator), proto-oncogene; proliferative disease [73]; k = 14; b = 4.1
        c-myc; activator, proto-oncogene; proliferative disease [74]; k = 13; b = 10.5
        c-fos; AP-1 complex (activator), proto-oncogene; proliferative disease [75]; k = 12; b = 2
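The two hub measures reported in Table 2, degree (k) and betweenness centrality (b), can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the node names and edges form a hypothetical toy network, not the TRANSFAC-derived HTFN, and the betweenness values are left unnormalised, so they are not comparable with the b × 10³ column of the paper. Betweenness is computed with Brandes' algorithm for unweighted, undirected graphs, which may differ from the procedure the authors used.

```python
from collections import deque

# Hypothetical toy interaction network (NOT data from TRANSFAC or the paper).
EDGES = [
    ("A", "B"), ("A", "C"), ("A", "D"),
    ("D", "E"), ("E", "F"), ("E", "G"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of node pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def degree(graph):
    """Degree k: number of direct interaction partners of each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def betweenness(graph):
    """Unnormalised betweenness centrality (Brandes, unweighted, undirected)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                      # nodes in order of BFS discovery
        preds = {v: [] for v in graph}  # predecessors on shortest paths
        sigma = {v: 0 for v in graph}   # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies from the farthest nodes back to source.
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


if __name__ == "__main__":
    g = build_graph(EDGES)
    k, b = degree(g), betweenness(g)
    for node in sorted(g):
        print(node, k[node], b[node])
```

In this toy network, node D has only two partners yet lies on as many shortest paths as the better-connected nodes A and E. The same kind of contrast appears in Table 2, where pRB has a lower degree than TBP or p53 but the highest betweenness.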
  • 29. Transcription factor network explains some cancers
    Source: Carlos Rodriguez-Caso (1,2), Miguel A. Medina (2) and Ricard V. Solé (1,3). Topology, tinkering and evolution of the human transcription factor network. FEBS Journal 272 (2005) 6423–6434. doi:10.1111/j.1742-4658.2005.05041.x. © 2005 The Authors. Journal compilation © 2005 FEBS. (Received 5 August 2005, revised 25 October 2005, accepted 31 October 2005)
    Affiliations: 1 ICREA-Complex Systems Laboratory, Universitat Pompeu Fabra, Barcelona, Spain; 2 Department of Molecular Biology and Biochemistry, Faculty of Sciences, Universidad de Málaga, Spain; 3 Santa Fe Institute, Santa Fe, New Mexico, USA
    Correspondence: Ricard V. Solé, ICREA - Complex System Laboratory, Universitat Pompeu Fabra, Dr Aiguader 80, 08003 Barcelona, Spain. Fax: +34 93 221 3237. Tel: +34 93 542 2821. E-mail: ricard.sole@upf.edu
    Keywords: human; molecular evolution; protein interaction; tinkering; transcription factor network
    Abbreviations: ER, Erdős-Rényi; HTFN, human transcription factor network; SF, scale free; SW, small world; TF, transcription factor.
    Abstract: Patterns of protein interactions are organized around complex heterogeneous networks. Their architecture has been suggested to be of relevance in understanding the interactome and its functional organization, which pervades cellular robustness. Transcription factors are particularly relevant in this context, given their central role in gene regulation. Here we present the first topological study of the human protein–protein interacting transcription factor network built using the TRANSFAC database. We show that the network exhibits scale-free and small-world properties with a hierarchical and modular structure, which is built around a small number of key proteins. Most of these proteins are associated with proliferative diseases and are typically not linked to each other, thus reducing the propagation of failures through compartmentalization. Network modularity is consistent with common structural and functional features and the features are generated by two distinct evolutionary strategies: amplification and shuffling of interacting domains through tinkering and acquisition of specific interacting regions. The function of the regulatory complexes may have played an active role in choosing one of them.
    Introduction (excerpts): Living cells are composed of a large number of different molecules interacting with each other to yield complex spatial and temporal patterns. Unfortunately, this reality is seldom captured by traditional and molecular biology approaches. A shift from molecular to modular biology seems unavoidable [1] as biological systems are defined by complex networks of interacting components. Early topological studies of cellular networks revealed that genomic, proteomic and metabolic maps share characteristic features with other real-world networks [8–12]. Protein networks, also called interactomes, were studied thanks to a massive two-hybrid system screening in unicellular Saccharomyces cerevisiae [9] and, more recently, in Drosophila melanogaster [13] [...]
    [...] allow the formation of supramolecular activator or inhibitory complexes, depending on their components and possible combinations. Transcription factors (TFs) are an essential subset of interacting proteins responsible for the control of gene expression. They interact with DNA regions and tend to form transcriptional regulatory complexes. Thus, the final effect of one of these complexes is determined by its TF composition. The number of TFs varies among organisms, although it appears to be linked to the organism's complexity. Around 200–300 TFs are predicted for Escherichia coli [18] and Saccharomyces [19,20]. By contrast, comparative analysis in multicellular organisms shows that the predicted number of TFs reaches 600–820 in C. elegans and D. melanogaster [20,21], and 1500–1800 in Arabidopsis (1200 cloned sequences) [20–22]. For humans, around 1500 TFs have been documented [21] and it is estimated that there are 2000–3000 [21,23]. Such an increase in the number of TFs is associated with higher control of gene regulation [24]. Interestingly, such an increase is based on the use of the same structural types of proteins. Human transcription factors are predominantly Zn fingers, followed by homeobox and basic helix–loop–helix [21]. Phylogenetic studies have shown that the amplification and shuffling of protein domains determine the growth of certain transcription factor families [25–28].
    [...] or via control of TF expression, less connected factors may also be relevant to cell survival.
    Functional and structural patterns from topology (excerpts): In order to reveal the mechanisms that shape the structure of HTFN, we studied its topological modularity in relation to the function and structure of TFs from available information. From a structural point of view, the overabundance of self-interactions is associated with a majority group of 55% of basic helix–loop–helix (bHLH) and leucine zippers (bZip), 17.5% of Zn fingers and 22.5% corresponding to a more heterogeneous [...] a complex, by varying their function and affinity to DNA. This is the case of the bHLH–bZip proto-oncogene c-myc [44], or the Zn finger retinoid X receptor RXR [45]. From a topological viewpoint, connections by self-interacting domains would imply high clustering and modularity, because all these proteins share the same rules and they have the potential to give a highly interconnected subgraph (i.e. a module). According to this, the high clustering of HTFN (see Fig. 1) could be explained as a by-product of the overabundance of self-interacting domains. We wondered whether the HTFN modular architecture [...]
    Table 2. Description and functionality of transcription factor hubs. Transcription factor (TF), degree (k), betweenness centrality (b).
      TF | Description | Associated disease | k | b × 10³
      TBP | Basal transcription machinery initiator | Spinocerebellar ataxia [40] | 27 | 17.3
      p53 | Tumor suppressor protein | Proliferative disease [68] | 23 | 18.5
      p300 | Coactivator. Histone acetyltransferase | May play a role in epithelial cancer [69] | 18 | 20.2
      RXR-α | Retinoid X-α receptor | Hepatocellular carcinoma [70] | 18 | 8
      pRB | Retinoblastoma suppressor protein. Tumour suppressor protein | Proliferative disease. Bladder cancer. Osteosarcoma [71] | 15 | 27.1
      RelA | NF-κB pathway | Hepatocyte apoptosis and foetal death [72] | 14 | 6.6
      c-jun | AP-1 complex (activator). Proto-oncogene | Proliferative disease [73] | 14 | 4.1
      c-myc | Activator. Proto-oncogene | Proliferative disease [74] | 13 | 10.5
      c-fos | AP-1 complex (activator). Proto-oncogene | Proliferative disease [75] | 12 | 2
    Fig. 1. Human transcription factor network built from data extracted from the TRANSFAC 8.2 database [...] filtering according to criteria given in Experimental [...]. Numbered black filled nodes are the highest connected transcription factors. 1, TATA-binding protein (TBP); 2, p53; 3, p300; 4, retinoid X receptor α (RXRα); 5, retinoblastoma protein (pRB); 6, nuclear factor NF-κB p65 subunit (RelA); 7, c-jun; 8, c-myc; 9, c-fos.
    At least 9 transcription factors lead to cancer if their function is affected
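The degree (k) and betweenness centrality (b) columns of Table 2 can be illustrated with a minimal Python sketch. The five-node graph and its node names are hypothetical and are not taken from TRANSFAC. Betweenness is computed here with Brandes' algorithm for unweighted graphs and left unnormalised, so its scale may differ from the one used in the paper.

```python
from collections import deque

# Hypothetical toy network (NOT TRANSFAC data): H is a hub, C bridges to D.
toy = {
    "H": ["A", "B", "C"],
    "A": ["H"],
    "B": ["H"],
    "C": ["H", "D"],
    "D": ["C"],
}

def degree(graph):
    """Degree k: number of interaction partners of each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}

def betweenness(graph):
    """Unnormalised betweenness b: number of shortest paths between other
    node pairs that pass through each node (Brandes' algorithm)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        order = []                       # nodes in order of discovery
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        sigma = {v: 0 for v in graph}    # number of shortest paths from source
        dist = {v: -1 for v in graph}
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:                     # breadth-first search
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while order:                     # back-propagate dependencies
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # In an undirected graph every pair is visited from both ends.
    return {v: b / 2 for v, b in score.items()}

k = degree(toy)
b = betweenness(toy)
print(k)
print(b)
```

In this toy graph the hub H has the highest degree and the highest betweenness, while the bridge C has a modest degree but non-zero betweenness, which is why the table reports both measures.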
  • 30. Transcription factor network explains some cancers
    Shown over the same article page as slide 29 (Rodriguez-Caso et al., FEBS Journal 272 (2005) 6423–6434, with Table 2 and Fig. 1).
    IF I know the gene network of a process, THEN I can predict which genes are really essential
    At least 9 transcription factors lead to cancer if their function is affected
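The "if I know the network, then I can predict which genes are essential" idea can be sketched as an in silico knockout: remove one node at a time and see how much of the network stays connected. This is only one simple, assumed proxy for essentiality and not the method of the cited paper; the graph and node names are hypothetical.

```python
# Hypothetical toy network (NOT TRANSFAC data): H is a hub, C bridges to D.
toy = {
    "H": ["A", "B", "C"],
    "A": ["H"],
    "B": ["H"],
    "C": ["H", "D"],
    "D": ["C"],
}

def largest_component(graph, removed=None):
    """Size of the largest connected component after deleting `removed`."""
    seen = {removed} if removed is not None else set()
    best = 0
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:                     # depth-first traversal
            node = stack.pop()
            size += 1
            for neighbour in graph[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best

# Knock out each node in turn; the smaller the surviving component,
# the more the network depends on that node.
remaining = {node: largest_component(toy, node) for node in toy}
most_critical = min(remaining, key=remaining.get)
print(remaining)
print(most_critical)
```

Removing the hub H fragments the toy network the most, which mirrors the slide's point that damage to a few highly connected factors has disproportionate effects.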
  • 31. Biomarkers can be obtained from the observation of bioinformatics networks. Breast cancer
  • 32. Gene signatures for cancer diagnosis
    Source: R.M. Luque-Baena (a,*), D. Urda (a,b), M. Gonzalo Claros (c), L. Franco (a,b), J.M. Jerez (a,b). Robust gene signatures from microarray data using genetic algorithms enriched with biological pathway keywords. Journal of Biomedical Informatics 49 (2014) 32–44. Journal homepage: www.elsevier.com/locate/yjbin. © 2014 Elsevier Inc. All rights reserved.
    Affiliations: a Departamento de Lenguajes y Ciencias de la Computación, University of Málaga, Bulevar Louis Pasteur, 35, 29071 Málaga, Spain; b Instituto de Investigación Biomédica de Málaga (IBIMA), Málaga, Spain; c Supercomputing and Bioinformatics Centre, University of Málaga, C/ Severo Ochoa, 34, 29590 Málaga, Spain
    Article history: Received 24 July 2013. Accepted 16 January 2014. Available online 27 January 2014
    Keywords: DNA analysis; Evolutionary algorithms; Biological enrichment; Feature selection
    Abstract: Genetic algorithms are widely used in the estimation of expression profiles from microarrays data. However, these techniques are unable to produce stable and robust solutions suitable to use in clinical and biomedical studies. This paper presents a novel two-stage evolutionary strategy for gene feature selection combining the genetic algorithm with biological information extracted from the KEGG database. A comparative study is carried out over public data from three different types of cancer (leukemia, lung cancer and prostate cancer). Even though the analyses only use features having KEGG information, the results demonstrate that this two-stage evolutionary strategy increased the consistency, robustness and accuracy of a blind discrimination among relapsed and healthy individuals. Therefore, this approach could facilitate the definition of gene signatures for the clinical prognosis and diagnostic of cancer diseases in a near future. Additionally, it could also be used for biological knowledge discovery about the studied disease.
    1. Introduction (fragment): [...] domain of DNA microarrays. Genetic algorithms (GAs) [13–18], as a particular case of evolutionary models, use classification techniques [...]
    Table 5. Performance comparison among the "Filter + GA + Pathway" combined strategy and three well-known filtering methods (Cons, IG and ReliefF). ACC and number of genes (mean ± std) are reported for LDA and SVM classifiers on the three analyzed datasets.
      Strategy | LDA ACC | LDA #Genes | SVM ACC | SVM #Genes
      Leukemia
      Filter + GA + Pathway 05340 | 97.13 ± 1.16 | 31.83 ± 1.86 | 93.87 ± 2.02 | 30.82 ± 1.62
      Filter + GA + Pathway 04640 | 96.38 ± 1.26 | 4.47 ± 0.71 | 94.86 ± 1.13 | 4.05 ± 0.80
      Cons | 85.85 ± 8.55 | 1.84 ± 0.51 | 88.24 ± 5.95 | 1.84 ± 0.51
      IG | 93.13 ± 4.40 | 9 ± 0 | 93.36 ± 4.33 | 9 ± 0
      ReliefF | 93.31 ± 4.37 | 9 ± 0 | 90.48 ± 5.15 | 9 ± 0
      Lung
      Filter + GA + Pathway 04144 | 98.09 ± 0.68 | 4.29 ± 0.53 | 96.25 ± 0.97 | 4.15 ± 0.57
      Filter + GA + Pathway 04530 | 98.26 ± 0.46 | 3.84 ± 0.46 | 97.05 ± 0.90 | 3.55 ± 0.64
      Cons | 94.08 ± 3.36 | 1.84 ± 0.42 | 94.57 ± 2.55 | 1.84 ± 0.42
      IG | 98.68 ± 1.51 | 22 ± 0 | 98.88 ± 1.39 | 22 ± 0
      ReliefF | 97.89 ± 1.81 | 22 ± 0 | 98.47 ± 1.43 | 22 ± 0
      Prostate
      Filter + GA + Pathway 00980 | 91.37 ± 1.15 | 8.27 ± 0.83 | 87.96 ± 2.39 | 11.15 ± 2.10
      Filter + GA + Pathway 00480 | 90.80 ± 1.36 | 14.30 ± 2.63 | 88.90 ± 2.29 | 26.24 ± 4.02
      Cons | 81.51 ± 7.57 | 3.20 ± 0.67 | 82.49 ± 6.72 | 3.20 ± 0.67
      IG | 91.66 ± 4.07 | 12 ± 0 | 85.86 ± 4.86 | 12 ± 0
      ReliefF | 90.22 ± 4.53 | 12 ± 0 | 88.50 ± 5.17 | 12 ± 0
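The "ACC (mean ± std)" entries of Table 5 summarise the classification accuracy over repeated runs. A minimal Python sketch of how such a summary can be produced is given below. The labels and predictions are invented for illustration, and the use of the sample standard deviation is an assumption, since the slide does not state which estimator the paper used.

```python
from statistics import mean, stdev

def accuracy(y_true, y_pred):
    """Percentage of samples whose predicted class matches the true class."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * hits / len(y_true)

def summarise(values):
    """Format repeated measurements as 'mean ± std' (sample std, assumed)."""
    return f"{mean(values):.2f} ± {stdev(values):.2f}"

# Hypothetical data: 10 patients (1 = relapsed, 0 = healthy) and the
# predictions of a classifier in three independent runs.
y_true = [1, 1, 0, 0, 1, 0, 1, 0, 1, 0]
runs = [
    [1, 1, 0, 0, 1, 0, 1, 0, 1, 0],   # no mistakes
    [1, 1, 0, 0, 1, 0, 1, 0, 1, 1],   # one mistake
    [0, 1, 0, 0, 1, 0, 1, 0, 1, 1],   # two mistakes
]

accuracies = [accuracy(y_true, y_pred) for y_pred in runs]
print(summarise(accuracies))  # prints "90.00 ± 10.00"
```

A small standard deviation, as reported for the "Filter + GA + Pathway" rows, indicates that the selected signature behaves consistently from run to run.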
  • 33. Gene signatures for cancer diagnosis
    Shown over the same article page as slide 32 (Luque-Baena et al., Journal of Biomedical Informatics 49 (2014) 32–44, with Table 5).
    IF I have determined a gene signature, THEN I can know which disease it is
  • 34. Cancer signatures to reveal prognosis
    Source: Pérez-Rivas LG, Jerez JM, Carmona R, de Luque V, Vicioso L, et al. (2014) A microRNA Signature Associated with Early Recurrence in Breast Cancer. PLoS ONE 9(3): e91884. doi:10.1371/journal.pone.0091884. PLOS ONE | www.plosone.org, March 2014 | Volume 9 | Issue 3 | e91884. Running head: A miRNA Signature Predictive of Early Recurrence
    Authors: Luis G. Pérez-Rivas (1), José M. Jerez (2), Rosario Carmona (3), Vanessa de Luque (1), Luis Vicioso (4), M. Gonzalo Claros (3,5), Enrique Viguera (6), Bella Pajares (1), Alfonso Sánchez (1), Nuria Ribelles (1), Emilio Alba (1), José Lozano (1,5,*). Luis G. Pérez-Rivas and José M. Jerez contributed equally to this work. * E-mail: jlozano@uma.es
    Affiliations: 1 Laboratorio de Oncología Molecular, Servicio de Oncología Médica, Instituto de Biomedicina de Málaga (IBIMA), Hospital Universitario Virgen de la Victoria, Málaga, Spain; 2 Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain; 3 Plataforma Andaluza de Bioinformática, Universidad de Málaga, Málaga, Spain; 4 Servicio de Anatomía Patológica, Instituto de Biomedicina de Málaga (IBIMA), Hospital Universitario Virgen de la Victoria, Málaga, Spain; 5 Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, Málaga, Spain; 6 Departamento de Biología Celular, Genética y Fisiología Animal, Universidad de Málaga, Málaga, Spain
    Editor: Sonia Rocha, University of Dundee, United Kingdom. Received November 11, 2013; Accepted February 14, 2014; Published March 14, 2014
    Copyright: © 2014 Pérez-Rivas et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
    Funding: This work was supported by a grant from the Spanish Society of Medical Oncology (SEOM, to NR) and by grants from the Spanish Ministerio de Economía (SAF2010-20203 to J.L and TIN2010-16556 to J.J) and from the Junta de Andalucía (TIN-4026, to JJ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
    Competing Interests: The authors have declared that no competing interests exist.
    Abstract: Recurrent breast cancer occurring after the initial treatment is associated with poor outcome. A bimodal relapse pattern after surgery for primary tumor has been described with peaks of early and late recurrence occurring at about 2 and 5 years, respectively. Although several clinical and pathological features have been used to discriminate between low- and high-risk patients, the identification of molecular biomarkers with prognostic value remains an unmet need in the current management of breast cancer. Using microarray-based technology, we have performed a microRNA expression analysis in 71 primary breast tumors from patients that either remained disease-free at 5 years post-surgery (group A) or developed early (group B) or late (group C) recurrence. Unsupervised hierarchical clustering of microRNA expression data segregated tumors in two groups, mainly corresponding to patients with early recurrence and those with no recurrence. Microarray data analysis and RT-qPCR validation led to the identification of a set of 5 microRNAs (the 5-miRNA signature) differentially expressed between these two groups: miR-149, miR-10a, miR-20b, miR-30a-3p and miR-342-5p. All five microRNAs were down-regulated in tumors from patients with early recurrence. We show here that the 5-miRNA signature defines a high-risk group of patients with shorter relapse-free survival and has predictive value to discriminate non-relapsing versus early-relapsing patients (AUC = 0.993, p-value < 0.05). Network analysis based on miRNA-target interactions curated by public databases suggests that down-regulation of the 5-miRNA signature in the subset of early-relapsing tumors would result in an overall increased proliferative and angiogenic capacity. In summary, we have identified a set of recurrence-related microRNAs with potential prognostic value to identify patients who will likely develop metastasis early after primary breast surgery.
    Introduction (excerpts): Breast cancer comprises a group of heterogeneous diseases that can be classified based on both clinical and molecular features [1–5]. Improvements in the early detection of primary tumors and the development of novel targeted therapies, together with the systematic use of adjuvant chemotherapy, have drastically reduced mortality rates and increased disease-free survival (DFS) in breast cancer. Still, about one third of patients undergoing breast tumor excision will develop metastases, the major life-threatening event which is strongly associated with poor outcome [6,7]. The risk of relapse after tumor resection is not constant over time. A detailed examination of large series of long-term follow-up studies over the last two decades reveals a bimodal hazard function with two peaks of early and late recurrence occurring at 1.5 and 5 years, respectively, followed by a nearly flat plateau in which the risk of relapse tends to zero [8–10]. A causal link between tumor surgery and the bimodal pattern of recurrence has been proposed by some investigators (i.e. an iatrogenic effect) [11]. According to that model, surgical removal of the primary breast tumor would accelerate the growth of dormant metastatic foci by altering the balance between circulating pro- and anti-angiogenic factors [9,11–14]. Such a hypothesis is supported by the fact that the two peaks of relapse are observed regardless of factors other than surgery, such as the axillary nodal status, the type of surgery or the administration of adjuvant therapy. Although estrogen receptor (ER)-negative tumors are commonly associated with a higher risk of early relapse [15], the bimodal distribution pattern is observed independently of the hormone receptor status [16]. Other studies also suggest that the dynamics of tumor relapse may be a [...]
    Results and discussion (excerpts): MiR-149 was the most significant miRNA downregulated in group B, as determined by microarray hybridization and by RT-qPCR. This miRNA has been described as a TS-miR that regulates the expression of genes associated with cell cycle, invasion or migration and its downregulation has been observed in several tumor diseases, including gastric cancer and breast cancer [70,77–81]. Down-regulation of miR-149 can occur epigenetical[...]
    [...] post-recurrence survival [100], likely because it targets AKT1 mRNA [101]. In sum, the available bibliographic data suggests that down-regulation of miR-149, miR-30a-3p, miR-20b, miR-10a and miR-342-5p in primary breast tumors could confer them enhanced proliferative, angiogenic and invasive potentials.
    Prognostic value of the 5-miRNA signature. The relationship between expression of the 5-miRNA signature and RFS was examined by a survival analysis. Figure 3A shows a Kaplan-Meier graph for the whole series of patients included in the study. Due to the intrinsic characteristics of the cohort, decreases in the RFS are only observed in the intervals 0–24 and 50–60 months (corresponding to groups B and C, respectively). We next grouped the tumors according to their 5-miRNA signature status in two different groups. One group included those tumors with all five miRNAs simultaneously downregulated (FC > 2 and p < 0.05) and a second group included those tumors not having all five miRNAs downregulated. A survival analysis was performed using clinical data from the corresponding patients. As shown in Figure 3B, the Kaplan-Meier graphs for the two groups demonstrate that the 5-miRNA signature defines a "high risk" group of patients with a shorter RFS (Peto-Peto test with p-value = 0.02, when comparing [...]
    Figure caption fragments (hierarchical clustering): [...] early recurrence in breast cancer. Hierarchical clustering of the 71 tumor samples based [...] lower expression levels of the 5-miRNA signature defines a distinct cluster 2b which mainly includes [...] contrary, most patients with good prognosis (group A) had tumors with normal or higher-than [...] different cluster 1b ("low risk").
    Figure caption fragments (survival): [...] patients with different RFS. A) Kaplan-Meier graph for the whole patient cohort included in [...] overall down-regulation of the 5-miRNA signature (i.e. those from cluster 2b in Fig. 2) were [...] RFS was calculated (red line). RFS was also calculated for the remaining patients in the cohort [...] that the 5-miRNA signature specifically discriminates tumors with an overall higher risk of early [...]
    Figure 4. Receiver operating characteristic curve (ROC) for early breast cancer recurrence by the 5-miRNA signature status. ROC curves generated using the prognosis information and expression levels of the 5-miRNA signature can discriminate between [...]
  • 35. Cancer signatures to reveal prognosis http://www.scbi.uma.es 29
A microRNA Signature Associated with Early Recurrence in Breast Cancer. Luis G. Pérez-Rivas1, José M. Jerez2, Rosario Carmona3, Vanessa de Luque1, Luis Vicioso4, M. Gonzalo Claros3,5, Enrique Viguera6, Bella Pajares1, Alfonso Sánchez1, Nuria Ribelles1, Emilio Alba1, José Lozano1,5*. PLoS ONE 9(3): e91884 (2014). doi:10.1371/journal.pone.0091884
Introduction Breast cancer comprises a group of heterogeneous diseases that can be classified based on both clinical and molecular features [1–5]. Improvements in the early detection of primary tumors and the development of novel targeted therapies, together with the systematic use of adjuvant chemotherapy, have drastically reduced mortality rates and increased disease-free survival (DFS) in breast cancer. Still, about one third of patients undergoing breast tumor excision will develop metastases, the major life-threatening event, which is strongly associated with poor outcome [6,7]. The risk of relapse after tumor resection is not constant over time. A detailed examination of large series of long-term follow-up studies over the last two decades reveals a bimodal hazard function with two peaks of early and late recurrence occurring at 1.5 and 5 years, respectively, followed by a nearly flat plateau in which the risk of relapse tends to zero [8–10]. A causal link between tumor surgery and the bimodal pattern of recurrence has been proposed by some investigators (i.e. an iatrogenic effect) [11]. According to that model, surgical removal of the primary breast tumor would accelerate the growth of dormant metastatic foci by altering the balance between circulating pro- and anti-angiogenic factors [9,11–14]. This hypothesis is supported by the fact that the two peaks of relapse are observed regardless of factors other than surgery, such as the axillary nodal status, the type of surgery or the administration of adjuvant therapy. Although estrogen receptor (ER)-negative tumors are commonly associated with a higher risk of early relapse [15], the bimodal distribution pattern is observed independently of the hormone receptor status [16]. Other studies also suggest that the dynamics of tumor relapse may be a [...]
Results (fragments visible on the slide): MiR-149 was the most significant miRNA downregulated in group B, as determined by microarray hybridization and by RT-qPCR. This miRNA has been described as a TS-miR that regulates the expression of genes associated with cell cycle, invasion or migration, and its downregulation has been observed in several tumor diseases, including gastric cancer and breast cancer [70,77–81]. [...]
Figure caption fragments (cropped on the slide): [...] early recurrence in breast cancer. Hierarchical clustering of the 71 tumor samples based [...] lower expression levels of the 5-miRNA signature defines a distinct cluster 2b which mainly includes [...] contrary, most patients with good prognosis (group A) had tumors with normal or higher-than [...] different cluster 1b ("low risk"). [...] the 5-miRNA signature specifically discriminates tumors with an overall higher risk of early recurrence.
If I know which genes ARE expressed THEN I can know which output WILL be obtained
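The AUC quoted in the abstract above can be illustrated with a small rank-based calculation. This is only a sketch under stated assumptions: the expression values are invented toy numbers, the function name `auc` is hypothetical, and using the negated mean expression as a risk score is a simplification, not the authors' procedure.

```python
def auc(positive_scores, negative_scores):
    """Rank-based AUC: probability that a random positive case scores
    higher than a random negative case (ties count as 0.5)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical mean log2 expression of the five miRNAs per tumor (toy values).
early_relapse = [-2.1, -1.8, -0.4]
no_relapse = [0.3, -0.5, 1.0]

# The signature is down-regulated in early relapse, so a lower expression
# is treated here as a higher risk score (assumption for illustration).
risk_early = [-x for x in early_relapse]
risk_none = [-x for x in no_relapse]

toy_auc = auc(risk_early, risk_none)
print(round(toy_auc, 3))
```

An AUC of 1.0 would mean every early-relapsing tumor has a higher risk score than every non-relapsing one; 0.5 would mean the score carries no information.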
  • 36. Characterization of complex variations in cancer http://www.scbi.uma.es 30
Nature Biotechnology, Advance Online Publication (Analysis). © 2014 Nature America, Inc. All rights reserved.
[...] for structural variants of different sizes (Supplementary Table 2). For the present comparison, we ran them as described in their corresponding publication or website. We first observed that the calling of somatic SNVs was nearly optimal and within the same range in Mutect and SMUFIN, with sensitivities of 97% and 92%, and specificities of 93% and 99%, respectively (Table 1 and Supplementary Table 3). On the other hand, the calling efficiency of somatic structural variants varied greatly between different methods, revealing clear differences when compared to SMUFIN. Some methods reached reasonable levels of sensitivity when the evaluation was restricted to the range of structural variants they were designed to detect (Pindel and Delly), but these dropped drastically when compared against the complete catalog of structural variations in the tumor (Supplementary Table 4). By contrast, SMUFIN was [...]
Figure (method workflow, panels a–d; labels recovered from the image): Tumor and normal genome sequencing (FASTQ file, reads, quality filters). Quaternary sequence tree: comparison of normal and tumor reads and identification of potential breakpoints; reads in tumor-specific branches; overlapping and complementary reads from the normal genome. Construction of breakpoint blocks (single orientation breakpoint, double orientation breakpoint, undefined breakpoint blocks). Definition and classification of variants: small variants (SNVs, short insertions, inversions, deletions, insertions) and large SVs (breakpoint and variant sequence, with extension of the variant and normal sequences around the breakpoint). Assigning reference coordinates: mapping of normal sequences to the reference genome (BWA) and independent mapping of the normal sequences flanking the breakpoint (BWA).
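The sensitivity and specificity percentages quoted above can be read with the textbook definitions sketched below. The counts are invented toy numbers chosen to reproduce 97% and 93%, and the function names are hypothetical; the paper may define its metrics differently for variant calling.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real variants that the caller reports."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-variant sites that the caller correctly leaves uncalled."""
    return true_neg / (true_neg + false_pos)


# Toy confusion counts (assumptions for illustration, not data from the paper).
sens = sensitivity(true_pos=97, false_neg=3)
spec = specificity(true_neg=93, false_pos=7)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```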
  • 37. Characterization of complex variations in cancer http://www.scbi.uma.es 30 If I know the polymorphisms of a person THEN I can predict which disease he WILL suffer from
  • 38. Personalised medicine http://www.scbi.uma.es 31 A needle in a haystack WAS FOUND
  • 39. Linking unrelated diseases http://www.scbi.uma.es 32 Alzheimer patients tend to be free of cancer, and cancer patients tend to be free of mental diseases
  • 40. Linking unrelated diseases http://www.scbi.uma.es 32 Alzheimer patients tend to be free of cancer, and cancer patients tend to be free of mental diseases
Molecular Evidence for the Inverse Comorbidity between Central Nervous System Disorders and Cancers Detected by Transcriptomic Meta-analyses
Kristina Ibáñez1, César Boullosa1, Rafael Tabarés-Seisdedos2, Anaïs Baudot3*, Alfonso Valencia1*
1 Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain, 2 Department of Medicine, University of Valencia, CIBERSAM, INCLIVA, Valencia, Spain, 3 Aix-Marseille Université, CNRS, I2M, UMR 7373, Marseille, France
Abstract There is epidemiological evidence that patients with certain Central Nervous System (CNS) disorders have a lower than expected probability of developing some types of Cancer. We tested here the hypothesis that this inverse comorbidity is driven by molecular processes common to CNS disorders and Cancers, and that are deregulated in opposite directions. We conducted transcriptomic meta-analyses of three CNS disorders (Alzheimer's disease, Parkinson's disease and Schizophrenia) and three Cancer types (Lung, Prostate, Colorectal) previously described with inverse comorbidities. A significant overlap was observed between the genes upregulated in CNS disorders and downregulated in Cancers, as well as between the genes downregulated in CNS disorders and upregulated in Cancers. We also observed expression deregulations in opposite directions at the level of pathways. Our analysis points to specific genes and pathways, the upregulation of which could increase the incidence of CNS disorders and simultaneously lower the risk of developing Cancer, while the downregulation of another set of genes and pathways could contribute to a decrease in the incidence of CNS disorders while increasing the Cancer risk. These results reinforce the previously proposed involvement of the PIN1 gene, Wnt and P53 pathways, and reveal potential new candidates, in particular related with protein degradation processes.
Abbreviations: SCZ: schizophrenia; AD: Alzheimer disease; PD: Parkinson disease; CRC: colorectal cancer; PC: prostate cancer; LC: lung cancer
Citation: Ibáñez K, Boullosa C, Tabarés-Seisdedos R, Baudot A, Valencia A (2014) Molecular Evidence for the Inverse Comorbidity between Central Nervous System Disorders and Cancers Detected by Transcriptomic Meta-analyses. PLoS Genet 10(2): e1004173. doi:10.1371/journal.pgen.1004173
Editor: Marshall S. Horwitz, University of Washington, United States of America
Received September 16, 2013; Accepted December 30, 2013; Published February 20, 2014
Copyright: © 2014 Ibáñez et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by a Fellowship from Obra Social la Caixa grant to KI (http://obrasocial.lacaixa.es/laCaixaFoundation/home_en.html), FPI grant BES-2008-006332 to CB and grant BIO2012 to AV Group. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist. * E-mail: anais.baudot@univ-amu.fr (AB); avalencia@cnio.es (AV). These authors contributed equally to this work.
Introduction Epidemiological evidences point to a lower-than-expected probability of developing some types of Cancer in certain CNS [...] together with these external factors (for review, see [3–7]). In particular, we propose the deregulation in opposite directions of a common set of genes and pathways as an underlying cause of inverse comorbidities. To investigate the biological plausibility of this hypothesis, a basic initial step is to establish the existence of inverse gene expression deregulations (i.e., down- versus up-regulations) in CNS disorders and Cancers. Towards this objective, we have performed integrative meta-analyses of collections of gene expression data, publicly available for AD, PD and SCZ, and Lung (LC), Colorectal (CRC) and Prostate (PC) Cancers. Clinical and epidemiological data previously reported inverse comorbidities for these complex disorders, according to population studies assessing the Cancer risks among patients with CNS disorders [8–17].
Results and Discussion For each CNS disorder and Cancer type independently, we undertook meta-analyses from a large collection of microarray [...] significant overlaps (Fisher's exact test, corrected p-value (q-value) < 0.05, see Methods) between the DEGs upregulated in CNS disorders and those downregulated in Cancers. Similarly, DEGs downregulated in CNS disorders overlapped significantly with DEGs upregulated in Cancers (Figure 1A). Significant overlaps between DEGs deregulated in opposite directions in CNS disorders and Cancers are still observed while setting more stringent cutoffs for the detection of DEGs (q-values lower than 0.005, 0.0005, 0.00005 and 0.000005, Figure S1). A significant overlap between DEGs deregulated in the same direction was only identified in the case of CRC and PD upregulated genes (Figure 1A). A molecular interpretation of the inverse comorbidity between CNS disorders and Cancers could be that the downregulation of certain [...]
Inverse Comorbidity among Cancer and CNS Disorders
Comparing differentially expressed genes
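The overlap test mentioned above (Fisher's exact test on two lists of differentially expressed genes) can be sketched as a one-sided hypergeometric tail. The gene counts below are invented for illustration, the function name `overlap_p_value` is hypothetical, and the multiple-testing correction that turns p-values into the q-values used in the paper is not included.

```python
from math import comb


def overlap_p_value(universe, size_a, size_b, overlap):
    """One-sided Fisher's exact test: probability of seeing at least
    `overlap` shared genes when `size_b` genes are drawn at random from a
    universe that contains `size_a` genes of interest."""
    total = comb(universe, size_b)
    largest = min(size_a, size_b)
    tail = sum(
        comb(size_a, k) * comb(universe - size_a, size_b - k)
        for k in range(overlap, largest + 1)
    )
    return tail / total


# Toy numbers (assumptions): 20000 measured genes, 500 upregulated in a CNS
# disorder, 600 downregulated in a cancer type, 40 genes in both lists.
p = overlap_p_value(universe=20000, size_a=500, size_b=600, overlap=40)
print(f"p = {p:.3g}")
```

With these toy sizes only about 15 shared genes would be expected by chance, so an overlap of 40 gives a small p-value.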
  • 41. Mental and cancer diseases are really connected http://www.scbi.uma.es 33
[...] (Figure 2, Figure S2, Table S3). The inverse relationship between the levels of expression deregulations of these pathways possibly suggests opposite roles in CNS disorders and Cancers. [...] Figure 3). Hence, global regulations of cellular activity may account for a protective effect between inversely comorbid diseases.
Figure 2. KEGG pathways significantly deregulated in Central Nervous System (CNS) disorders and Cancer types. KEGG pathways [24] significantly up- and downregulated in each disease were identified using the GSEA method [34] (q-value < 0.05). The significant pathways were compared between the 6 diseases and combined in a network representation. Node pie charts are coloured according to the pathway status as Cancer upregulated (yellow), Cancer downregulated (blue), CNS disorder upregulated (green) and CNS disorder downregulated (red). The green/blue and yellow/red associations thus correspond to pathways deregulated in opposite directions in CNS disorders and Cancers. Pathway labels are coloured according to their classifications provided by KEGG [24], as: Metabolism (green), Genetic Information Processing (yellow), Cellular Process (pink), Environmental Information Processing (red) and Organismal Systems (dark red). All networks are available at bioinfo.cnio.es/people/cboullosa/validation/cytoscape/Ibanezetal.zip, in Cytoscape format (http://www.cytoscape.org/). doi:10.1371/journal.pgen.1004173.g002
Slide labels: Typical cancer functions; Typical mental disease functions
  • 42. Mental and cancer diseases are really connected http://www.scbi.uma.es 33
Slide labels: Typical cancer functions; Typical mental disease functions; 19 genes; 74 genes; cancer ↓↓, mental disease ↑↑; cancer ↑↑, mental disease ↓↓
Since 93 genes are inversely expressed in cancer and CNS disorders THEN I can explain the inverse correlation between both diseases
  • 43. After basic research, translational research is easy http://www.scbi.uma.es 34
  • 44. Higher vertebrates have conserved genomes http://www.scbi.uma.es 35 Chimpanzee. "The bonobo genome compared with the chimpanzee and human genomes", Kay Prüfer et al. Nature 486, 527–531 (28 June 2012). "The zebrafish reference genome sequence and its relationship to the human genome", Kerstin Howe et al. Nature 496, 498–503 (25 April 2013). 70% of protein-coding human genes are related to genes found in the zebrafish; 84% of genes known to be associated with human disease have a zebrafish counterpart
  • 45. Genome plasticity in bacteria http://www.scbi.uma.es 36 "Estimating the size of the bacterial pan-genome", Lapierre & Gogarten, Trends in Genetics 25(3), 2009, pages 107–110. "Pangenomics – an avenue to improved industrial starter cultures and probiotics", Garrigues et al. Current Opinion in Biotechnology 2013, 24:187–191
  • 46. Minimum number of genes for a living organism http://www.scbi.uma.es 37 1354 genes Giovannoni et al., (2005) Science 309: 1242-1245 500 genes
  • 47. Minimum number of genes for a living organism http://www.scbi.uma.es 37 1354 genes Giovannoni et al., (2005) Science 309: 1242-1245 500 genes If I know the minimal gene number of an organism THEN I can design artificial organisms for biotechnological purposes
  • 48. There aren't new genes but duplicated genes http://www.scbi.uma.es 38 The number of gene families plateaus with genome size. Figure 3.15 Because many genes are duplicated, the number of different gene families is much less than the total number of genes. The histogram compares the total number of genes with the number of distinct gene families. GENOME SIZE HAS NOTHING TO DO WITH GENE NUMBER. VARIABILITY AMONG GENOMES ARISES FROM A NUMBER OF DIFFERENT SOURCES. HIGH-THROUGHPUT TECHNOLOGIES OVERVIEW
  • 49. We are not able to predict which kind of organism is produced when having the genome sequence http://www.scbi.uma.es 39 ?
  • 50. We are not able to predict which kind of organism is produced when having the genome sequence A living being is more than the sum of its components http://www.scbi.uma.es 39 ?
  • 51. We can now relate facial shapes with genes http://www.scbi.uma.es 40
    Modeling 3D Facial Shape from DNA
    Peter Claes1, Denise K. Liberton2, Katleen Daniels1, Kerri Matthes Rosana2, Ellen E. Quillen2, Laurel N. Pearson2, Brian McEvoy3, Marc Bauchet2, Arslan A. Zaidi2, Wei Yao2, Hua Tang4, Gregory S. Barsh4,5, Devin M. Absher5, David A. Puts2, Jorge Rocha6,7, Sandra Beleza4,8, Rinaldo W. Pereira9, Gareth Baynam10,11,12, Paul Suetens1, Dirk Vandermeulen1, Jennifer K. Wagner13, James S. Boster14, Mark D. Shriver2*
    1 Medical Image Computing, ESAT/PSI, Department of Electrical Engineering, KU Leuven, Medical Imaging Research Center, KU Leuven UZ Leuven, iMinds-KU Leuven Future Health Department, Leuven, Belgium
    2 Department of Anthropology, Penn State University, University Park, Pennsylvania, United States of America
    3 Smurfit Institute of Genetics, Dublin, Ireland
    4 Department of Genetics, Stanford University, Palo Alto, California, United States of America
    5 HudsonAlpha Institute for Biotechnology, Huntsville, Alabama, United States of America
    6 CIBIO: Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Porto, Portugal
    7 Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal
    8 IPATIMUP: Instituto de Patologia e Imunologia Molecular da Universidade do Porto, Porto, Portugal
    9 Programa de Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasilia, Brasil
    10 School of Paediatrics and Child Health, University of Western Australia, Perth, Australia
    11 Institute for Immunology and Infectious Diseases, Murdoch University, Perth, Australia
    12 Genetic Services of Western Australia, King Edward Memorial Hospital, Perth, Australia
    13 Center for the Integration of Genetic Healthcare Technologies, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America
    14 Department of Anthropology, University of Connecticut, Storrs, Connecticut, United States of America
    Abstract: Human facial diversity is substantial, complex, and largely scientifically unexplained. We used spatially dense quasi-landmarks to measure face shape in population samples with mixed West African and European ancestry from three locations (United States, Brazil, and Cape Verde). Using bootstrapped response-based imputation modeling (BRIM), we uncover the relationships between facial variation and the effects of sex, genomic ancestry, and a subset of craniofacial candidate genes. The facial effects of these variables are summarized as response-based imputed predictor (RIP) variables, which are validated using self-reported sex, genomic ancestry, and observer-based facial ratings (femininity and proportional ancestry) and judgments (sex and population group). By jointly modeling sex, genomic ancestry, and genotype, the independent effects of particular alleles on facial features can be uncovered. Results on a set of 20 genes showing significant effects on facial features provide support for this approach as a novel means to identify genes affecting normal-range facial features and for approximating the appearance of a face from genetic markers.
    Citation: Claes P, Liberton DK, Daniels K, Rosana KM, Quillen EE, et al. (2014) Modeling 3D Facial Shape from DNA. PLoS Genet 10(3): e1004224. doi:10.1371/journal.pgen.1004224
    Editor: Daniela Luquetti, Seattle Children's Research Institute, United States of America
    Received September 12, 2013; Accepted January 22, 2014; Published March 20, 2014
    Copyright: © 2014 Claes et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
    Funding: This investigation was supported by grants to MDS from Science Foundation of Ireland Walton Fellowship (04.W4/B643); to MDS and DAP from the National Institute Justice (2008-DN-BX-K125); to JKW from the NIH/National Human Genome Research Institute (K99HG006446); to DKL from the National Science Foundation (BCS-0851815) and from the Wenner Gren Foundation (Fieldwork Grant 7967). PC is partly supported by the Flemish Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT Vlaanderen), the Research Program of the Fund for Scientific Research - Flanders (Belgium) (FWO), the Research Fund KU Leuven and SB was supported by the Portuguese Institution "Fundação para a Ciência e a Tecnologia" [FCT; PTDC/BIABDE/64044/2006 (project) and SFRH/BPD/21887/2005 (post-doc grant)] and by a Dean's Postdoctoral Fellowship at Stanford University. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
    Competing Interests: The authors have declared that no competing interests exist.
    * E-mail: mds17@psu.edu
    Introduction: The craniofacial complex is initially modulated by precisely-timed embryonic gene expression and molecular interactions mediated through complex pathways [1]. As humans grow, hormones and biomechanical factors also affect many parts of the face [2,3]. The inability to systematically summarize facial variation has impeded the discovery of the determinants and correlates of face shape. In contrast to genomic technologies, systematic and comprehensive phenotyping has lagged. This is especially so in the context of multipartite traits such as the human face. In typical genome-wide association studies (GWAS) today phenotypes are summarized as univariate variables, which is inherently limiting for multivariate traits, which, by definition cannot be expressed with single variables. Current state-of-the-art genetic association studies for facial traits are limited in their description of facial morphology [4–7]. These analyses start from a sparse set of anatomical landmarks (these being defined as "a point of correspondence on an object that matches between and within populations"), which overlooks salient features of facial shape. Subsequently, either a set of conventional morphometric measurements such as distances and angles are extracted, which drastically oversimplify facial shape, or a set of principal components (PCs) are extracted using principal components analysis (PCA) on the shape-space obtained with superimposition techniques, where each PC is assumed to represent a distinct morphological trait. Here we describe a novel method that facilitates the compounding of all PCs into a single scalar variable customized to relevant independent variables including, sex, genomic ancestry, and genes. Our approach combines placing …
    PLOS Genetics | www.plosgenetics.org | March 2014 | Volume 10 | Issue 3 | e1004224
    Figure 4. Relationships between the ancestry and sex RIP variables and their initial predictor variables. (A) RIP-A with genomic ancestry; genomic ancestry is calculated using the core panel of 68 AIMs and RIP-A is calculated using this ancestry estimate on the set of three populations combined (N = 592). Populations are indicated as shown in the legend with United States participants shown with black circles, Brazilians with red circles, and Cape Verdeans with blue circles. (B) Histograms of RIP-S by self-reported sex. doi:10.1371/journal.pgen.1004224.g004
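    The paper excerpt on slide 51 mentions extracting principal components (PCs) by PCA on a shape-space of superimposed landmarks. As a rough illustration of that conventional step only (not the BRIM method of the paper), here is a minimal Python sketch. It assumes the landmark configurations are already superimposed, and the function name `shape_pca` and the synthetic "faces" are made up for this example.

```python
import numpy as np

def shape_pca(landmarks):
    """PCA on landmark configurations that are assumed to be already aligned.

    landmarks: array of shape (n_faces, n_landmarks, 3).
    Returns (scores, components, explained_variance_ratio).
    """
    n_faces = landmarks.shape[0]
    X = landmarks.reshape(n_faces, -1)        # one flattened row per face
    Xc = X - X.mean(axis=0)                   # centre on the mean shape
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U * s                            # PC scores of every face
    variance = s**2 / (n_faces - 1)           # variance carried by each PC
    return scores, Vt, variance / variance.sum()

# Synthetic data (an assumption for illustration): a mean shape plus one
# dominant mode of variation and a little measurement noise.
rng = np.random.default_rng(0)
n_faces, n_landmarks = 50, 10
mean_shape = rng.normal(size=(n_landmarks, 3))
mode = rng.normal(size=(n_landmarks, 3))
weights = rng.normal(size=(n_faces, 1, 1))
noise = 0.01 * rng.normal(size=(n_faces, n_landmarks, 3))
faces = mean_shape + weights * mode + noise

scores, components, ratio = shape_pca(faces)
print("variance explained by PC1:", round(float(ratio[0]), 4))
```

    Each face ends up described by several PC scores, one per component. This is the multivariate description that, according to the excerpt, the authors compound into a single scalar variable.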
  • 52. We have found the treasure coffer, but… http://www.scbi.uma.es http://www.slideshare.net/MGonzaloClaros 41
  • 53. We have found the treasure coffer, but… http://www.scbi.uma.es http://www.slideshare.net/MGonzaloClaros 41