Bioinformatics and the logic of life

Bioinformatics to reveal the logic of life
M. Gonzalo Claros Díaz
Dpto Biología Molecular y Bioquímica
Plataforma Andaluza de Bioinformática
@MGClaros
1
Centro de Bioinnovación
http://about.me/mgclaros/

http://www.scbi.uma.es
The meaning/logic of life
2

There are many reflections about life
3
Genetics
Philosophy
Religion
Physics
And many more

A living being for some scientists
4
The cell is a kind of black box

Molecular biology provides some logic…
5
How to select the few combinations that make sense?

A hierarchical logic…
6
the way back cannot be predicted

In fact, a complex logic full of interactions
7

Metabolism offers another source of logic
8

Other sciences were also interested in the logic of life
9

Bioinformatics = integration
10
http://bioinformatics.biol.ntnu.edu.tw/sher/Teaching.html

Bioinformatics receives and gives new data and insights
11
Biology, Computer science, Statistics
The living being is the result of all observations and cannot be inferred from biased observations

A living being for some scientists
12
The cell is a kind of black box

A living being for a bioinformatician
13
Life ontology

So, we begin to understand
14
Other scientists
Bioinformatician
Biotechnologist

Bioinformatics emerged with data accumulation
15

Regarding data, informatics lags behind biology
16

Therefore, biology and informatics are interdependent
17
http://www.genomicglossaries.com/presentation/SLAgenomics.asp

Without mobility issues
18

Some logic in living beings based on bioinformatics
19

Bioinformatics integration in alcohol-induced disorders
20
Through integration and modeling, these studies would allow us to better exploit the complexity of genomic and functional genomic data and to extract their biological and clinical significance
http://pubs.niaaa.nih.gov/publications/arh311/5-11.htm

Drug discovery was expensive
21
Classic approach: experimental drugs were chemically synthesized and then tested in animals
Bioinformatics approach (ligand database): only candidate drugs are synthesized. A cost-effective approach

Nobel Prize in Chemistry 2013
22
Laureates' backgrounds: theoretical chemist, biophysicist, biochemist
For the development of computational models to understand and predict chemical processes
"This Nobel Prize is the first given to work in computational biology, indicating that the field has matured and is on a par with experimental biology"
The blog of PLOS Computational Biology
http://blogs.plos.org/biologue/2013/10/18/the-significance-of-the-2013-nobel-prize-in-chemistry-and-the-challenges-ahead/

A cell was full of molecular cascades
23
Divergent cascades Convergent cascades

Then, a cell was a subway map
24
Subway map designed by Claudia Bentley.
Web design by Nick Allin.
Edited by Cath Brooksbank and Sandra Clark.
© 2002 Nature Publishing Group.
http://www.nature.com/nrc/poster/subpathways/index.html

Finally, a cell is a network
25
Cell network complexity increases with whole-organism complexity. Key nodes reveal key functions

Transcription factor network explains some cancers
26
C. Rodriguez-Caso, M.A. Medina and R.V. Solé. Topology, tinkering and evolution of the human transcription factor network. FEBS Journal 272 (2005) 6423–6434. doi:10.1111/j.1742-4658.2005.05041.x
Authors: Carlos Rodriguez-Caso (1,2), Miguel A. Medina (2) and Ricard V. Solé (1,3)
1 ICREA-Complex Systems Laboratory, Universitat Pompeu Fabra, Barcelona, Spain
2 Department of Molecular Biology and Biochemistry, Faculty of Sciences, Universidad de Málaga, Spain
3 Santa Fe Institute, Santa Fe, New Mexico, USA
Correspondence: Ricard V. Solé, ICREA - Complex System Laboratory, Universitat Pompeu Fabra, Dr Aiguader 80, 08003 Barcelona, Spain. Fax: +34 93 221 3237. Tel: +34 93 542 2821. E-mail: ricard.sole@upf.edu
(Received 5 August 2005, revised 25 October 2005, accepted 31 October 2005)
Keywords: human; molecular evolution; protein interaction; tinkering; transcription factor network
Abbreviations: ER, Erdös-Rényi; HTFN, human transcription factor network; SF, scale free; SW, small world; TF, transcription factor.
Abstract: Patterns of protein interactions are organized around complex heterogeneous networks. Their architecture has been suggested to be of relevance in understanding the interactome and its functional organization, which pervades cellular robustness. Transcription factors are particularly relevant in this context, given their central role in gene regulation. Here we present the first topological study of the human protein–protein interacting transcription factor network built using the TRANSFAC database. We show that the network exhibits scale-free and small-world properties with a hierarchical and modular structure, which is built around a small number of key proteins. Most of these proteins are associated with proliferative diseases and are typically not linked to each other, thus reducing the propagation of failures through compartmentalization. Network modularity is consistent with common structural and functional features and the features are generated by two distinct evolutionary strategies: amplification and shuffling of interacting domains through tinkering and acquisition of specific interacting regions. The function of the regulatory complexes may have played an active role in choosing one of them.
Excerpt (introduction): Living cells are composed of a large number of different molecules interacting with each other to yield complex spatial and temporal patterns. Unfortunately, this reality is seldom captured by traditional and molecular biology approaches. A shift from molecular to modular biology seems unavoidable [1] as biological systems are defined by complex networks of interacting components. […] Early topological studies of cellular networks revealed that genomic, proteomic and metabolic maps share characteristic features with other real-world networks [8–12]. Protein networks, also called interactomes, were studied thanks to a massive two-hybrid system screening in unicellular Saccharomyces cerevisiae [9] and, more recently, in Drosophila melanogaster [13] […]
Excerpt (transcription factors): […] allow the formation of supramolecular activator or inhibitory complexes, depending on their components and possible combinations. Transcription factors (TFs) are an essential subset of interacting proteins responsible for the control of gene expression. They interact with DNA regions and tend to form transcriptional regulatory complexes. Thus, the final effect of one of these complexes is determined by its TF composition. The number of TFs varies among organisms, although it appears to be linked to the organism's complexity. Around 200–300 TFs are predicted for Escherichia coli [18] and Saccharomyces [19,20]. By contrast, comparative analysis in multicellular organisms shows that the predicted number of TFs reaches 600–820 in C. elegans and D. melanogaster [20,21], and 1500–1800 in Arabidopsis (1200 cloned sequences) [20–22]. For humans, around 1500 TFs have been documented [21] and it is estimated that there are 2000–3000 [21,23]. Such an increase in the number of TFs is associated with higher control of gene regulation [24]. Interestingly, such an increase is based on the use of the same structural types of proteins. Human transcription factors are predominantly Zn fingers, followed by homeobox and basic helix–loop–helix [21]. Phylogenetic studies have shown that the amplification and shuffling of protein domains determine the growth of certain transcription factor families [25–28].
Excerpt (functional and structural patterns from topology): In order to reveal the mechanisms that shape the structure of HTFN, we studied its topological modularity in relation to the function and structure of TFs from available information. From a structural point of view, the overabundance of self-interactions is associated with a majority group of 55% of basic helix–loop–helix (bHLH) and leucine zippers (bZip), 17.5% of Zn fingers and 22.5% corresponding to a more heterogeneous […] a complex, by varying their function and affinity to DNA. This is the case of the bHLH–bZip proto-oncogen c-myc [44], or the Zn finger retinoid X receptor RXR [45]. From a topological viewpoint, connections by self-interacting domains would imply high clustering and modularity, because all these proteins share the same rules and they have the potential to give a highly interconnected subgraph (i.e. a module). According to this, the high clustering of HTFN (see Fig. 1) could be explained as a by-product of the overabundance of self-interacting domains. […] or via control of TF expression, less connected factors may also be relevant to cell survival.
Fig. 1. Human transcription factor network built from data extracted from the TRANSFAC 8.2 database. Numbered black filled nodes are the highest connected transcription factors. 1, TATA-binding protein (TBP); 2, p53; 3, p300; 4, retinoid X receptor α (RXRα); 5, retinoblastoma protein (pRB); 6, nuclear factor NFκB p65 subunit (RelA); 7, c-jun; 8, c-myc; 9, c-fos.
Table 2. Description and functionality of transcription factor hubs. Transcription factor (TF), degree (k), betweenness centrality (b).
TF      Description                                                    Associated disease                                         k    b × 10³
TBP     Basal transcription machinery initiator                        Spinocerebellar ataxia [40]                                27   17.3
p53     Tumor suppressor protein                                       Proliferative disease [68]                                 23   18.5
P300    Coactivator. Histone acetyltransferase                         May play a role in epithelial cancer [69]                  18   20.2
RXR-α   Retinoid X-α receptor                                          Hepatocellular carcinoma [70]                              18   8
pRB     Retinoblastoma suppressor protein. Tumour suppressor protein   Proliferative disease. Bladder cancer. Osteosarcoma [71]   15   27.1
RelA    NF-κB pathway                                                  Hepatocyte apoptosis and foetal death [72]                 14   6.6
c-jun   AP-1 complex (activator). Proto-oncogen                        Proliferative disease [73]                                 14   4.1
c-myc   Activator. Proto-oncogen                                       Proliferative disease [74]                                 13   10.5
c-fos   AP-1 complex (activator). Proto-oncogen                        Proliferative disease [75]                                 12   2

At least 9 transcription factors can lead to cancer if their function is affected
If I know the gene network of a process
THEN
I can predict which genes are really essential
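The idea of finding key nodes from the network alone can be illustrated with the two measures used for the hub transcription factors, degree (k) and betweenness centrality (b). The following is only a minimal sketch in Python, assuming an unweighted, undirected network. The node names, the toy edge list and the function names are hypothetical; nothing here is taken from TRANSFAC or from the paper. Betweenness is computed with Brandes' algorithm and left unnormalised.

```python
from collections import deque

# Hypothetical toy network (adjacency lists); NOT real transcription factor data.
toy_network = {
    "HUB": ["a", "b", "c", "d"],
    "a": ["HUB"],
    "b": ["HUB"],
    "c": ["HUB", "d"],
    "d": ["HUB", "c"],
}

def degree(graph):
    """Degree k: number of direct partners of each node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}

def betweenness(graph):
    """Unnormalised betweenness centrality (Brandes' algorithm, undirected graph)."""
    score = {v: 0.0 for v in graph}
    for source in graph:
        stack = []                       # nodes in order of non-decreasing distance
        preds = {v: [] for v in graph}   # predecessors on shortest paths
        sigma = {v: 0 for v in graph}    # number of shortest paths from source
        dist = {v: -1 for v in graph}
        sigma[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                     # breadth-first search from source
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}  # dependency accumulation
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair of nodes was counted twice in an undirected graph.
    return {v: s / 2 for v, s in score.items()}

k = degree(toy_network)
b = betweenness(toy_network)
for node in sorted(toy_network, key=lambda n: (-k[n], -b[n])):
    print(node, k[node], b[node])
```

In this toy graph the node called HUB has both the highest degree and the highest betweenness, which is the kind of pattern that would flag a candidate essential node; in a real network the two rankings need not agree.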

Biomarkers can be obtained from the observation of bioinformatics networks
27
Breast cancer

Gene signatures for cancer diagnosis
28
R.M. Luque-Baena (a), D. Urda (a,b), M. Gonzalo Claros (c), L. Franco (a,b), J.M. Jerez (a,b). Robust gene signatures from microarray data using genetic algorithms enriched with biological pathway keywords. Journal of Biomedical Informatics 49 (2014) 32–44
a Departamento de Lenguajes y Ciencias de la Computación, University of Málaga, Bulevar Louis Pasteur, 35, 29071 Málaga, Spain
b Instituto de Investigación Biomédica de Málaga (IBIMA), Málaga, Spain
c Supercomputing and Bioinformatics Centre, University of Málaga, C/ Severo Ochoa, 34, 29590 Málaga, Spain
Article history: Received 24 July 2013. Accepted 16 January 2014. Available online 27 January 2014
Keywords: DNA analysis; Evolutionary algorithms; Biological enrichment; Feature selection
Abstract: Genetic algorithms are widely used in the estimation of expression profiles from microarrays data. However, these techniques are unable to produce stable and robust solutions suitable to use in clinical and biomedical studies. This paper presents a novel two-stage evolutionary strategy for gene feature selection combining the genetic algorithm with biological information extracted from the KEGG database. A comparative study is carried out over public data from three different types of cancer (leukemia, lung cancer and prostate cancer). Even though the analyses only use features having KEGG information, the results demonstrate that this two-stage evolutionary strategy increased the consistency, robustness and accuracy of a blind discrimination among relapsed and healthy individuals. Therefore, this approach could facilitate the definition of gene signatures for the clinical prognosis and diagnostic of cancer diseases in a near future. Additionally, it could also be used for biological knowledge discovery about the studied disease.
© 2014 Elsevier Inc. All rights reserved.
Table 5. Performance comparison among the "Filter + GA + Pathway" combined strategy and three well-known filtering methods (Cons, IG and ReliefF). ACC and number of genes (mean ± std) are reported for LDA and SVM classifiers on the three analyzed datasets.
Strategy                      LDA ACC        LDA #Genes     SVM ACC        SVM #Genes
Leukemia
Filter + GA + Pathway 04640   96.38 ± 1.26   4.47 ± 0.71    94.86 ± 1.13   4.05 ± 0.80
Filter + GA + Pathway 05340   97.13 ± 1.16   31.83 ± 1.86   93.87 ± 2.02   30.82 ± 1.62
Cons                          85.85 ± 8.55   1.84 ± 0.51    88.24 ± 5.95   1.84 ± 0.51
IG                            93.13 ± 4.40   9 ± 0          93.36 ± 4.33   9 ± 0
ReliefF                       93.31 ± 4.37   9 ± 0          90.48 ± 5.15   9 ± 0
Lung
Cons                          94.08 ± 3.36   1.84 ± 0.42    94.57 ± 2.55   1.84 ± 0.42
IG                            98.68 ± 1.51   22 ± 0         98.88 ± 1.39   22 ± 0
ReliefF                       97.89 ± 1.81   22 ± 0         98.47 ± 1.43   22 ± 0
Prostate
Cons                          81.51 ± 7.57   3.20 ± 0.67    82.49 ± 6.72   3.20 ± 0.67
IG                            91.66 ± 4.07   12 ± 0         85.86 ± 4.86   12 ± 0
ReliefF                       90.22 ± 4.53   12 ± 0         88.50 ± 5.17   12 ± 0
If I have determined a gene signature
THEN
I can know which is the disease
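How a small gene signature can be used to assign a sample to a class, and how an accuracy (ACC) figure is obtained, can be sketched as follows. This is a rough illustration in Python only: it uses a nearest-centroid classifier with leave-one-out validation as a stand-in, not the LDA or SVM classifiers or the genetic algorithm of the paper, and the expression values, class labels and function names are all invented for the example.

```python
import math

# Hypothetical expression values for a 3-gene signature; labels are placeholders.
samples = [
    ("class_1", [1.0, 0.9, 0.2]),
    ("class_1", [1.1, 1.0, 0.1]),
    ("class_1", [0.9, 1.1, 0.3]),
    ("class_2", [0.1, 0.2, 1.0]),
    ("class_2", [0.2, 0.1, 0.9]),
    ("class_2", [0.0, 0.3, 1.1]),
]

def centroid(vectors):
    """Mean expression profile of a group of samples."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def predict(training, profile):
    """Assign the profile to the class whose centroid is closest (Euclidean)."""
    by_label = {}
    for label, vector in training:
        by_label.setdefault(label, []).append(vector)
    centroids = {label: centroid(vs) for label, vs in by_label.items()}
    return min(centroids, key=lambda label: math.dist(centroids[label], profile))

def leave_one_out_accuracy(data):
    """Classify each sample using all the others as training data."""
    hits = 0
    for i, (label, vector) in enumerate(data):
        training = data[:i] + data[i + 1:]
        if predict(training, vector) == label:
            hits += 1
    return hits / len(data)

accuracy = leave_one_out_accuracy(samples)
print(f"Leave-one-out accuracy: {accuracy:.2f}")
```

With such cleanly separated toy data every held-out sample is classified correctly; real microarray data are far noisier, which is why the table reports mean ± std over repeated evaluations.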

Cancer signatures to reveal prognosis
29
Pérez-Rivas LG, Jerez JM, Carmona R, de Luque V, Vicioso L, et al. (2014) A microRNA Signature Associated with Early Recurrence in Breast Cancer. PLoS ONE 9(3): e91884. doi:10.1371/journal.pone.0091884
PLOS ONE | www.plosone.org, March 2014 | Volume 9 | Issue 3 | e91884
Authors: Luis G. Pérez-Rivas (1,†), José M. Jerez (2,†), Rosario Carmona (3), Vanessa de Luque (1), Luis Vicioso (4), M. Gonzalo Claros (3,5), Enrique Viguera (6), Bella Pajares (1), Alfonso Sánchez (1), Nuria Ribelles (1), Emilio Alba (1), José Lozano (1,5,*)
1 Laboratorio de Oncología Molecular, Servicio de Oncología Médica, Instituto de Biomedicina de Málaga (IBIMA), Hospital Universitario Virgen de la Victoria, Málaga, Spain
2 Departamento de Lenguajes y Ciencias de la Computación, Universidad de Málaga, Málaga, Spain
3 Plataforma Andaluza de Bioinformática, Universidad de Málaga, Málaga, Spain
4 Servicio de Anatomía Patológica, Instituto de Biomedicina de Málaga (IBIMA), Hospital Universitario Virgen de la Victoria, Málaga, Spain
5 Departamento de Biología Molecular y Bioquímica, Universidad de Málaga, Málaga, Spain
6 Departamento de Biología Celular, Genética y Fisiología Animal, Universidad de Málaga, Málaga, Spain
* E-mail: jlozano@uma.es
† These authors contributed equally to this work.
Editor: Sonia Rocha, University of Dundee, United Kingdom
Received November 11, 2013; Accepted February 14, 2014; Published March 14, 2014
Copyright: © 2014 Pérez-Rivas et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Funding: This work was supported by a grant from the Spanish Society of Medical Oncology (SEOM, to NR) and by grants from the Spanish Ministerio de Economía (SAF2010-20203 to J.L and TIN2010-16556 to J.J) and from the Junta de Andalucía (TIN-4026, to JJ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
Abstract: Recurrent breast cancer occurring after the initial treatment is associated with poor outcome. A bimodal relapse pattern after surgery for primary tumor has been described with peaks of early and late recurrence occurring at about 2 and 5 years, respectively. Although several clinical and pathological features have been used to discriminate between low- and high-risk patients, the identification of molecular biomarkers with prognostic value remains an unmet need in the current management of breast cancer. Using microarray-based technology, we have performed a microRNA expression analysis in 71 primary breast tumors from patients that either remained disease-free at 5 years post-surgery (group A) or developed early (group B) or late (group C) recurrence. Unsupervised hierarchical clustering of microRNA expression data segregated tumors in two groups, mainly corresponding to patients with early recurrence and those with no recurrence. Microarray data analysis and RT-qPCR validation led to the identification of a set of 5 microRNAs (the 5-miRNA signature) differentially expressed between these two groups: miR-149, miR-10a, miR-20b, miR-30a-3p and miR-342-5p. All five microRNAs were down-regulated in tumors from patients with early recurrence. We show here that the 5-miRNA signature defines a high-risk group of patients with shorter relapse-free survival and has predictive value to discriminate non-relapsing versus early-relapsing patients (AUC = 0.993, p-value < 0.05). Network analysis based on miRNA-target interactions curated by public databases suggests that down-regulation of the 5-miRNA signature in the subset of early-relapsing tumors would result in an overall increased proliferative and angiogenic capacity. In summary, we have identified a set of recurrence-related microRNAs with potential prognostic value to identify patients who will likely develop metastasis early after primary breast surgery.
Excerpt (Introduction): Breast cancer comprises a group of heterogeneous diseases that can be classified based on both clinical and molecular features [1–5]. Improvements in the early detection of primary tumors and the development of novel targeted therapies, together with the systematic use of adjuvant chemotherapy, has drastically reduced mortality rates and increased disease-free survival (DFS) in breast cancer. Still, about one third of patients undergoing breast tumor excision will develop metastases, the major life-threatening event which is strongly associated with poor outcome [6,7].
The risk of relapse after tumor resection is not constant over time. A detailed examination of large series of long-term follow-up studies over the last two decades reveals a bimodal hazard function with two peaks of early and late recurrence occurring at 1.5 and 5 years, respectively, followed by a nearly flat plateau in which the risk of relapse tends to zero [8–10]. A causal link between tumor surgery and the bimodal pattern of recurrence has been proposed by some investigators (i.e. an iatrogenic effect) [11]. According to that model, surgical removal of the primary breast tumor would accelerate the growth of dormant metastatic foci by altering the balance between circulating pro- and anti-angiogenic factors [9,11–14]. Such hypothesis is supported by the fact that the two peaks of relapse are observed regardless other factors than surgery, such as the axillary nodal status, the type of surgery or the administration of adjuvant therapy. Although estrogen receptor (ER)-negative tumors are commonly associated with a higher risk of early relapse [15], the bimodal distribution pattern is observed with independence of the hormone receptor status [16]. Other studies also suggest that the dynamics of tumor relapse may be a […]
Excerpt (Results): MiR-149 was the most significant miRNA downregulated in group B, as determined by microarray hybridization and by RT-qPCR. This miRNA has been described as a TS-miR that regulates the expression of genes associated with cell cycle, invasion or migration and its downregulation has been observed in several tumor diseases, including gastric cancer and breast cancer [70,77–81]. Down-regulation of miR-149 can occur epigenetically […]
[…] post-recurrence survival [100], likely because it targets AKT1 mRNA [101]. In sum, the available bibliographic data suggests that down-regulation of miR-149, miR-30a-3p, miR-20b, miR-10a and miR342-5p in primary breast tumors could confer them enhanced proliferative, angiogenic and invasive potentials.
Prognostic value of the 5-miRNA signature. The relationship between expression of the 5-miRNA signature and RFS was examined by a survival analysis. Figure 3A shows a Kaplan-Meier graph for the whole series of patients included in the study. Due to the intrinsic characteristics of the cohort, decreases in the RFS are only observed in the intervals 0–24 and 50–60 months (corresponding to groups B and C, respectively). We next grouped the tumors according to their 5-miRNA signature status in two different groups. One group included those tumors with all five miRNAs simultaneously downregulated (FC > 2 and p < 0.05) and a second group included those tumors not having all five miRNAs downregulated. A survival analysis was performed using clinical data from the corresponding patients. As shown in Figure 3B, the Kaplan-Meier graphs for the two groups demonstrate that the 5-miRNA signature defines a "high risk" group of patients with a shorter RFS (Peto-Peto test with p-value = 0.02, when comparing […]
Figure 4. Receiver operating characteristic curve (ROC) for early breast cancer recurrence by the 5-miRNA signature status. ROC curves generated using the prognosis information and expression levels of the 5-miRNA signature can discriminate between […]
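The AUC quoted for the signature summarises a ROC curve in a single number: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The short Python sketch below shows that calculation by direct pairwise comparison (equivalent to the Mann-Whitney U statistic divided by the number of pairs). The scores and the function name are made up for illustration and have no relation to the patient data of the study.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs in which the positive case scores higher; ties count as half."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Hypothetical risk scores (higher = more likely to relapse early).
early_relapse = [0.9, 0.8, 0.7]
no_relapse = [0.6, 0.75, 0.1]

value = auc(early_relapse, no_relapse)
print(f"AUC = {value:.3f}")
```

An AUC of 0.5 corresponds to a score that carries no information, and 1.0 to perfect separation of the two groups.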

Cancer signatures to reveal prognosis
Introduction
Breast cancer comprises a group of heterogeneous diseases that
can be classified based on both clinical and molecular features [1–
5]. Improvements in the early detection of primary tumors and the
development of novel targeted therapies, together with the
systematic use of adjuvant chemotherapy, has drastically reduced
mortality rates and increased disease-free survival (DFS) in breast
cancer. Still, about one third of patients undergoing breast tumor
excision will develop metastases, the major life-threatening event
which is strongly associated with poor outcome [6,7].
early recurrence in breast cancer. Hierarchical clustering of the 71 tumor samples based
lower expression levels of the 5-miRNA signature defines a distinct cluster 2b wich mainly includes
contrary, most patients with good prognosis (group A) had tumors with normal or higher-than
different cluster 1b (‘‘low risk’’).
The risk of relapse after tumor resection is not constant over
time. A detailed examination of large series of long-term follow-up
studies over the last two decades reveals a bimodal hazard function
with two peaks of early and late recurrence occurring at 1.5 and 5
Table 2). MiR-
RT-qPCR data
2). Next, we re-clustered
signature. As
B were clearly
discriminates tumors with an overall higher risk of early
recurrence.
The 5-miRNA signature
PLOS ONE | www.plosone.org 1 March 2014 | Volume 9 | Issue 3 | e91884
included most of the
A in cluster 1b
risk). Of note, the
group C (72.8%),
MiR-149 was the most significant miRNA downregulated in
group B, as determined by microarray hybridization and by RT-qPCR.
This miRNA has been described as a TS-miR that
regulates the expression of genes associated with cell cycle,
invasion or migration and its downregulation has been observed in
several tumor diseases, including gastric cancer and breast cancer
[70,77–81]. Down-regulation of miR-149 can occur epigenetical-
29
years, respectively, followed by a nearly flat plateau in which the
risk of relapse tends to zero [8–10]. A causal link between tumor
surgery and the bimodal pattern of recurrence has been proposed
by some investigators (i.e. an iatrogenic effect) [11]. According to
that model, surgical removal of the primary breast tumor would
accelerate the growth of dormant metastatic foci by altering the
balance between circulating pro- and anti-angiogenic factors
[9,11–14]. Such hypothesis is supported by the fact that the two
peaks of relapse are observed regardless other factors than surgery,
such as the axillary nodal status, the type of surgery or the
administration of adjuvant therapy. Although estrogen receptor
(ER)-negative tumors are commonly associated with a higher risk
of early relapse [15], the bimodal distribution pattern is observed
with independence of the hormone receptor status [16]. Other
studies also suggest that the dynamics of tumor relapse may be a
A microRNA Signature Associated with Early Recurrence
in Breast Cancer
Luis G. Pe´ rez-Rivas1., Jose´ M. Jerez2., Rosario Carmona3, Vanessa de Luque1, Luis Vicioso4,
M. Gonzalo Claros3,5, Enrique Viguera6, Bella Pajares1, Alfonso Sańchez1, Nuria Ribelles1,
Emilio Alba1, Jose´ Lozano1,5*
1 Laboratorio de Oncologıá Molecular, Servicio de Oncologıá Me´dica, Instituto de Biomedicina de Ma´laga (IBIMA), Hospital Universitario Virgen de la Victoria, Ma´laga,
Spain, 2 Departamento de Lenguajes y Ciencias de la Computacio´ n, Universidad de Ma´laga, Ma´laga, Spain, 3 Plataforma Andaluza de Bioinforma´ tica, Universidad de
Ma´laga, Ma´laga, Spain, 4 Servicio de Anatomıá Patolo´ gica, Instituto de Biomedicina de Ma´laga (IBIMA), Hospital Universitario Virgen de la Victoria, Ma´laga, Spain,
5 Departmento de Biologıá Molecular y Bioquı´mica, Universidad de Ma´laga, Ma´laga, Spain, 6 Departmento of Biologıá Celular, Gene´ tica y Fisiologıá Animal, Universidad de
Ma´laga, Ma´laga, Spain
signature specifically
Abstract
Recurrent breast cancer occurring after the initial treatment is associated with poor outcome. A bimodal relapse pattern
after surgery for primary tumor has been described with peaks of early and late recurrence occurring at about 2 and 5 years,
respectively. Although several clinical and pathological features have been used to discriminate between low- and high-risk
patients, the identification of molecular biomarkers with prognostic value remains an unmet need in the current
management of breast cancer. Using microarray-based technology, we have performed a microRNA expression analysis in
71 primary breast tumors from patients that either remained disease-free at 5 years post-surgery (group A) or developed
early (group B) or late (group C) recurrence. Unsupervised hierarchical clustering of microRNA expression data segregated
tumors in two groups, mainly corresponding to patients with early recurrence and those with no recurrence. Microarray
data analysis and RT-qPCR validation led to the identification of a set of 5 microRNAs (the 5-miRNA signature) differentially
expressed between these two groups: miR-149, miR-10a, miR-20b, miR-30a-3p and miR-342-5p. All five microRNAs were
down-regulated in tumors from patients with early recurrence. We show here that the 5-miRNA signature defines a high-risk
group of patients with shorter relapse-free survival and has predictive value to discriminate non-relapsing versus early-relapsing
patients (AUC = 0.993, p-value < 0.05). Network analysis based on miRNA-target interactions curated by public
databases suggests that down-regulation of the 5-miRNA signature in the subset of early-relapsing tumors would result in
an overall increased proliferative and angiogenic capacity. In summary, we have identified a set of recurrence-related
microRNAs with potential prognostic value to identify patients who will likely develop metastasis early after primary breast
surgery.
Citation: Pérez-Rivas LG, Jerez JM, Carmona R, de Luque V, Vicioso L, et al. (2014) A microRNA Signature Associated with Early Recurrence in Breast Cancer. PLoS
ONE 9(3): e91884. doi:10.1371/journal.pone.0091884
Editor: Sonia Rocha, University of Dundee, United Kingdom
Received November 11, 2013; Accepted February 14, 2014; Published March 14, 2014
Copyright: © 2014 Pérez-Rivas et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
Funding: This work was supported by a grant from the Spanish Society of Medical Oncology (SEOM, to NR) and by grants from the Spanish Ministerio de
Economía, (SAF2010-20203 to J.L and TIN2010-16556 to J.J) and from the Junta de Andalucía (TIN-4026, to JJ). The funders had no role in study design, data
collection and analysis, decision to publish, or preparation of the manuscript.
patients with different RFS. A) Kaplan-Meier graph for the whole patient cohort included in
overall down-regulation of the 5-miRNA signature (i.e. those from cluster 2b in Fig. 2) were
RFS was calculated (red line). RFS was also calculated for the remaining patients in the cohort
that the 5-miRNA signature specifically discriminates tumors with an overall higher risk of early
post-recurrence survival [100], likely because it targets AKT1
mRNA [101].
In sum, the available bibliographic data suggests that down-regulation
of miR-149, miR-30a-3p, miR-20b, miR-10a and
miR-342-5p in primary breast tumors could confer them enhanced
proliferative, angiogenic and invasive potentials.
Prognostic value of the 5-miRNA signature. The relationship
between expression of the 5-miRNA signature and RFS was
examined by a survival analysis. Figure 3A shows a Kaplan-Meier
graph for the whole series of patients included in the study. Due to
the intrinsic characteristics of the cohort, decreases in the RFS are
only observed in the intervals 0–24 and 50–60 months
(corresponding to groups B and C, respectively). We next grouped
the tumors according to their 5-miRNA signature status in two
different groups. One group included those tumors with all five
miRNAs simultaneously downregulated (FC > 2 and p < 0.05) and
a second group included those tumors not having all five miRNAs
downregulated. A survival analysis was performed using clinical
data from the corresponding patients. As shown in Figure 3B, the
Kaplan-Meier graphs for the two groups demonstrate that the 5-
miRNA signature defines a ‘‘high risk’’ group of patients with a
shorter RFS (Peto-Peto test with p-value = 0.02, when comparing
Figure 4. Receiver operating characteristic curve (ROC) for
early breast cancer recurrence by the 5-miRNA signature
status. ROC curves generated using the prognosis information and
expression levels of the 5-miRNA signature can discriminate between
If I know which genes ARE expressed
THEN
I can know which output WILL be obtained
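
The grouping rule and the AUC quoted in the excerpt above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the per-tumor dictionaries, the function names `has_signature` and `auc`, and all numbers are hypothetical, and the sketch assumes that a fold-change and a p-value are available for each miRNA in each tumor.

```python
from typing import Dict, List

# The five miRNAs named in the abstract.
SIGNATURE = ["miR-149", "miR-10a", "miR-20b", "miR-30a-3p", "miR-342-5p"]


def has_signature(fold_down: Dict[str, float], p_values: Dict[str, float],
                  fc_cutoff: float = 2.0, p_cutoff: float = 0.05) -> bool:
    """Return True when all five miRNAs are down-regulated beyond both cutoffs.

    fold_down holds the fold-change of down-regulation (hypothetical input).
    """
    return all(fold_down[m] > fc_cutoff and p_values[m] < p_cutoff
               for m in SIGNATURE)


def auc(scores_pos: List[float], scores_neg: List[float]) -> float:
    """Area under the ROC curve, computed as the probability that a positive
    case scores higher than a negative case (ties count one half)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical tumors: every miRNA strongly down-regulated in tumor_a,
# one miRNA below the fold-change cutoff in tumor_b.
tumor_a = ({m: 3.0 for m in SIGNATURE}, {m: 0.01 for m in SIGNATURE})
tumor_b = ({**{m: 3.0 for m in SIGNATURE}, "miR-149": 1.2},
           {m: 0.01 for m in SIGNATURE})

# Hypothetical risk scores for early-relapsing and non-relapsing patients.
early_scores = [0.9, 0.8, 0.7]
no_relapse_scores = [0.1, 0.4, 0.8]
roc_area = auc(early_scores, no_relapse_scores)
```

Under these toy inputs `has_signature(*tumor_a)` is true and `has_signature(*tumor_b)` is false. The pairwise definition of AUC is used here because it needs no plotting or external library.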

Characterization of complex variations in cancer
30
© 2014 Nature America, Inc. All rights reserved.
ANALYSIS
for structural variants of different sizes (Supplementary Table 2). For
the present comparison, we ran them as described in their companies’
corresponding publication or website.
We first observed that the calling of somatic SNVs was nearly optimal
and within the same range in Mutect and SMUFIN, with sensitivities
of 97% and 92%, and specificities of 93% and 99%, respectively
(Table 1 and Supplementary Table 3). On the other hand, the calling
efficiency of somatic structural variants varied greatly between different
methods, revealing clear differences when compared to SMUFIN.
Some methods reached reasonable levels of sensitivity when the evaluation
was restricted to the range of structural variants they were
designed to detect (Pindel and Delly), but these dropped drastically
when compared against the complete catalog of structural variations
in the tumor (Supplementary Table 4). By contrast, SMUFIN was
NATURE BIOTECHNOLOGY ADVANCE ONLINE PUBLICATION
[Figure: workflow diagram, panels a to d. Recoverable labels: tumor and normal genome sequencing (FASTQ file, reads, quality filters); comparison of normal and tumor reads in a quaternary sequence tree and identification of potential breakpoints (tumor-specific reads with potential breakpoints); construction of breakpoint blocks (single-orientation and double-orientation breakpoints, undefined breakpoint blocks); definition and classification of variants (SNVs, small SVs, breakpoints of large SVs; insertions, deletions, inversions); assigning reference coordinates by mapping normal sequences to the reference genome with BWA.]
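
The sensitivity and specificity figures quoted in the excerpt above can be made concrete with a small Python sketch. It assumes the textbook definitions (the excerpt does not state how the paper counts true negatives), and the function name `sensitivity_specificity`, the candidate positions and the variant sets are all hypothetical.

```python
from typing import Set, Tuple


def sensitivity_specificity(called: Set[int], truth: Set[int],
                            universe: Set[int]) -> Tuple[float, float]:
    """Compare called variant positions with a truth set.

    universe is the set of all candidate positions, needed to count
    true negatives (an assumption of this sketch).
    """
    tp = len(called & truth)              # real variants that were called
    fn = len(truth - called)              # real variants that were missed
    fp = len(called - truth)              # calls that are not real variants
    tn = len(universe - called - truth)   # positions correctly left uncalled
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical example: 20 candidate positions, 5 real variants, 5 calls.
universe = set(range(20))
truth = {1, 2, 3, 4, 5}
called = {1, 2, 3, 4, 9}
sens, spec = sensitivity_specificity(called, truth, universe)
```

With these toy sets, four of the five real variants are recovered and one call is spurious, so sensitivity is 4/5 and specificity is 14/15.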

Characterization of complex variations in cancer
If I know the polymorphisms of a person
THEN
I can predict which disease he or she WILL suffer

Personalised medicine
31
A needle in a haystack WAS FOUND

Linking unrelated diseases
32
Alzheimer's patients tend to be free of cancer, and cancer
patients tend to be free of mental diseases

Molecular Evidence for the Inverse Comorbidity between
Central Nervous System Disorders and Cancers Detected
by Transcriptomic Meta-analyses
Kristina Ibáñez1, César Boullosa1, Rafael Tabarés-Seisdedos2, Anaïs Baudot3*, Alfonso Valencia1*
1 Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain, 2 Department of Medicine, University of Valencia,
CIBERSAM, INCLIVA, Valencia, Spain, 3 Aix-Marseille Université, CNRS, I2M, UMR 7373, Marseille, France
Abstract
There is epidemiological evidence that patients with certain Central Nervous System (CNS) disorders have a lower than
expected probability of developing some types of Cancer. We tested here the hypothesis that this inverse comorbidity is
driven by molecular processes common to CNS disorders and Cancers, and that are deregulated in opposite directions. We
conducted transcriptomic meta-analyses of three CNS disorders (Alzheimer’s disease, Parkinson’s disease and Schizophrenia)
and three Cancer types (Lung, Prostate, Colorectal) previously described with inverse comorbidities. A significant overlap was
observed between the genes upregulated in CNS disorders and downregulated in Cancers, as well as between the genes
downregulated in CNS disorders and upregulated in Cancers. We also observed expression deregulations in opposite
directions at the level of pathways. Our analysis points to specific genes and pathways, the upregulation of which could
increase the incidence of CNS disorders and simultaneously lower the risk of developing Cancer, while the downregulation
of another set of genes and pathways could contribute to a decrease in the incidence of CNS disorders while increasing the
Cancer risk. These results reinforce the previously proposed involvement of the PIN1 gene, Wnt and P53 pathways, and
reveal potential new candidates, in particular related with protein degradation processes.
SCZ: schizophrenia
AD: Alzheimer disease
PD: Parkinson disease
CRC: colorectal cancer
PC: prostate cancer
LC: lung cancer
Citation: Ibáñez K, Boullosa C, Tabarés-Seisdedos R, Baudot A, Valencia A (2014) Molecular Evidence for the Inverse Comorbidity between Central Nervous
System Disorders and Cancers Detected by Transcriptomic Meta-analyses. PLoS Genet 10(2): e1004173. doi:10.1371/journal.pgen.1004173
Editor: Marshall S. Horwitz, University of Washington, United States of America
Received September 16, 2013; Accepted December 30, 2013; Published February 20, 2014
Copyright: © 2014 Ibáñez et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
Funding: This work was supported by a Fellowship from Obra Social la Caixa grant to KI (http://obrasocial.lacaixa.es/laCaixaFoundation/home_en.html), FPI grant
BES-2008-006332 to CB and grant BIO2012 to AV Group. The funders had no role in study design, data collection and analysis, decision to publish, or preparation
of the manuscript.
* E-mail: anais.baudot@univ-amu.fr (AB); avalencia@cnio.es (AV)
Introduction
Epidemiological evidences point to a lower-than-expected
probability of developing some types of Cancer in certain CNS
Results and Discussion
For each CNS disorder and Cancer type independently, we
undertook meta-analyses from a large collection of microarray
together with these external factors (for review, see [3–7]). In
particular, we propose the deregulation in opposite directions of a
common set of genes and pathways as an underlying cause of
inverse comorbidities.
To investigate the biological plausibility of this hypothesis, a
basic initial step is to establish the existence of inverse gene
expression deregulations (i.e., down- versus up-regulations) in CNS
disorders and Cancers. Towards this objective, we have performed
integrative meta-analyses of collections of gene expression data,
publically available for AD, PD and SCZ, and Lung (LC),
Colorectal (CRC) and Prostate (PC) Cancers. Clinical and
epidemiological data previously reported inverse comorbidities for
these complex disorders, according to population studies assessing
the Cancer risks among patients with CNS disorders [8–17].
significant overlaps (Fisher’s exact test, corrected p-value (q-value)
< 0.05, see Methods) between the DEGs upregulated in
CNS disorders and those downregulated in Cancers. Similarly,
DEGs downregulated in CNS disorders overlapped significantly
with DEGs upregulated in Cancers (Figure 1A). Significant
overlaps between DEGs deregulated in opposite directions in CNS
disorders and Cancers are still observed while setting more
stringent cutoffs for the detection of DEGs (qvalues lower than
0.005, 0.0005, 0.00005 and 0.000005, Figure S1). A significant
overlap between DEGs deregulated in the same direction was only
identified in the case of CRC and PD upregulated genes
(Figure 1A).
A molecular interpretation of the inverse comorbidity between CNS
disorders and Cancers could be that the downregulation of certain
PLOS Genetics | www.plosgenetics.org 1 February 2014 | Volume 10 | Issue 2 | e1004173
Inverse Comorbidity among Cancer and CNS Disorders
Comparing
differentially
expressed genes
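
The overlap test mentioned in the excerpt above (Fisher's exact test on two lists of differentially expressed genes) can be sketched in Python as a one-sided hypergeometric tail. This is an illustration under stated assumptions, not the authors' code: the function name `overlap_p_value`, the gene counts and the background size are hypothetical, and the multiple-testing correction that yields the q-value is left out.

```python
from math import comb


def overlap_p_value(n_universe: int, n_a: int, n_b: int, n_overlap: int) -> float:
    """One-sided Fisher's exact test for the overlap of two gene lists.

    Returns P(X >= n_overlap) where X is the number of genes shared by
    list A (n_a genes) and list B (n_b genes) when both are drawn from a
    background of n_universe genes and are independent of each other.
    """
    total = comb(n_universe, n_b)
    tail = sum(comb(n_a, k) * comb(n_universe - n_a, n_b - k)
               for k in range(n_overlap, min(n_a, n_b) + 1))
    return tail / total


# Hypothetical example: 10 background genes, 3 up-regulated in a CNS
# disorder, 3 down-regulated in a cancer, all 3 shared.
p_full_overlap = overlap_p_value(10, 3, 3, 3)
# Asking for an overlap of at least zero genes always gives probability 1.
p_any_overlap = overlap_p_value(10, 3, 3, 0)
```

In the toy case only one of the 120 possible draws gives a complete overlap, so the p-value is 1/120. In a real analysis the background would be the set of genes measured in both studies.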

Mental and cancer diseases are really connected
33
(Figure 2, Figure S2, Table S3). The inverse relationship
between the levels of expression deregulations of these pathways
possibly suggests opposite roles in CNS disorders and Cancers.
Figure 3). Hence, global regulations of cellular activity may
account for a protective effect between inversely comorbid
diseases.
Figure 2. KEGG pathways significantly deregulated in Central Nervous System (CNS) disorders and Cancer types. KEGG pathways [24]
significantly up- and downregulated in each disease were identified using the GSEA method [34] (q-value < 0.05). The significant pathways were
compared between the 6 diseases and combined in a network representation. Node pie charts are coloured according to the pathway status as
Cancer upregulated (yellow), Cancer downregulated (blue), CNS disorder upregulated (green) and CNS disorder downregulated (red). The green/blue
and yellow/red associations thus correspond to pathways deregulated in opposite directions in CNS disorders and Cancers. Pathway labels are
coloured according to their classifications provided by KEGG [24], as: Metabolism (green), Genetic Information Processing (yellow), Cellular Process
(pink), Environmental Information Processing (red) and Organismal Systems (dark red). All networks are available at
bioinfo.cnio.es/people/cboullosa/validation/cytoscape/Ibanezetal.zip, in Cytoscape format (http://www.cytoscape.org/).
doi:10.1371/journal.pgen.1004173.g002
Typical cancer
functions
Typical mental
disease functions

Mental and cancer diseases are really connected
19 genes 74 genes
cancer ↓↓
33
↑↑ cancer
↓↓ mental disease
mental disease↑↑
Since 93 genes are inversely expressed in
cancer and CNS disorders
THEN
I can explain the inverse correlation between both diseases

After basic research, translational research is easy
34

Higher vertebrates have conserved genomes
Chimpanzee
35
The bonobo genome compared with the chimpanzee and human
genomes
Kay Prüfer et al.
Nature 486, 527–531 (28 June 2012)
The zebrafish reference genome sequence and its relationship to the
human genome
Kerstin Howe et al.
Nature 496, 498–503 (25 April 2013)
70% of protein-coding human genes are related to genes found in the zebrafish
84% of genes known to be associated with human disease have a zebrafish
counterpart

Genome plasticity in bacteria
36
Estimating the size of the bacterial pan-genome
Lapierre and Gogarten
Trends in Genetics 25(3), 2009, Pages 107–110
Pangenomics – an avenue to improved industrial starter cultures and probiotics
Garrigues et al.
Current Opinion in Biotechnology 2013, 24:187–191
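
The pan-genome idea behind the two references above can be shown with a very small Python sketch. It assumes each strain is already reduced to a set of gene-family identifiers; the function name `pan_and_core`, the strain names and the gene labels are hypothetical, and real pan-genome estimates rely on ortholog clustering and statistical extrapolation that this sketch does not attempt.

```python
from typing import Dict, Set, Tuple


def pan_and_core(genomes: Dict[str, Set[str]]) -> Tuple[Set[str], Set[str]]:
    """Return (pan-genome, core genome) for at least one strain.

    The pan-genome is the union of all gene sets, and the core genome
    is the set of genes present in every strain.
    """
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)
    core = set.intersection(*gene_sets)
    return pan, core


# Hypothetical strains described by gene-family labels.
strains = {
    "strain1": {"a", "b", "c"},
    "strain2": {"a", "b", "d"},
    "strain3": {"a", "e"},
}
pan, core = pan_and_core(strains)
accessory = pan - core  # genes missing from at least one strain
```

Here the pan-genome holds five gene families while the core genome holds only one, which is the kind of gap that the slide title calls genome plasticity.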

Minimum number of genes for a living organism
37
1354 genes
Giovannoni et al., (2005)
Science 309: 1242-1245
500 genes

If I know the minimal gene number of an organism
THEN
I can design artificial organisms for biotechnological purposes

There aren’t new genes but duplicated genes
38
The number of gene families plateaus with genome size
Figure 3.15 Because many genes are
duplicated, the number of different gene
families is much less than the total
number of genes. The histogram compares
the total number of genes with the number
of distinct gene families.
Genome size has nothing to do with gene number
Variability among genomes arises from a number of different sources
High-throughput technologies overview

We are not able to predict which kind of organism is
produced when having the genome sequence
39
?

A living being is more than the sum of its components

We can now relate facial shapes with genes
* E-mail: mds17@psu.edu
40
Introduction
The craniofacial complex is initially modulated by precisely-timed
embryonic gene expression and molecular interactions
mediated through complex pathways [1]. As humans grow,
hormones and biomechanical factors also affect many parts of
the face [2,3]. The inability to systematically summarize facial
variation has impeded the discovery of the determinants and
correlates of face shape. In contrast to genomic technologies,
systematic and comprehensive phenotyping has lagged. This is
especially so in the context of multipartite traits such as the human
face. In typical genome-wide association studies (GWAS) today
phenotypes are summarized as univariate variables, which is
inherently limiting for multivariate traits, which, by definition
cannot be expressed with single variables. Current state-of-the-art
genetic association studies for facial traits are limited in their
description of facial morphology [4–7]. These analyses start from a
sparse set of anatomical landmarks (these being defined as “a point
of correspondence on an object that matches between and within
populations”), which overlooks salient features of facial shape.
Subsequently, either a set of conventional morphometric measurements
such as distances and angles are extracted, which
drastically oversimplify facial shape, or a set of principal
components (PCs) are extracted using principal components
analysis (PCA) on the shape-space obtained with superimposition
techniques, where each PC is assumed to represent a distinct
morphological trait. Here we describe a novel method that
facilitates the compounding of all PCs into a single scalar variable
customized to relevant independent variables including, sex,
genomic ancestry, and genes. Our approach combines placing
PLOS Genetics | www.plosgenetics.org 1 March 2014 | Volume 10 | Issue 3 | e1004224
Modeling 3D Facial Shape from DNA
Peter Claes1, Denise K. Liberton2, Katleen Daniels1, Kerri Matthes Rosana2, Ellen E. Quillen2,
Laurel N. Pearson2, Brian McEvoy3, Marc Bauchet2, Arslan A. Zaidi2, Wei Yao2, Hua Tang4,
Gregory S. Barsh4,5, Devin M. Absher5, David A. Puts2, Jorge Rocha6,7, Sandra Beleza4,8,
Rinaldo W. Pereira9, Gareth Baynam10,11,12, Paul Suetens1, Dirk Vandermeulen1, Jennifer K. Wagner13,
James S. Boster14, Mark D. Shriver2*
1 Medical Image Computing, ESAT/PSI, Department of Electrical Engineering, KU Leuven, Medical Imaging Research Center, KU Leuven UZ Leuven, iMinds-KU Leuven
Future Health Department, Leuven, Belgium, 2 Department of Anthropology, Penn State University, University Park, Pennsylvania, United States of America, 3 Smurfit
Institute of Genetics, Dublin, Ireland, 4 Department of Genetics, Stanford University, Palo Alto, California, United States of America, 5 HudsonAlpha Institute for
Biotechnology, Huntsville, Alabama, United States of America, 6 CIBIO: Centro de Investigação em Biodiversidade e Recursos Genéticos, Universidade do Porto, Porto,
Portugal, 7 Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Porto, Portugal, 8 IPATIMUP: Instituto de Patologia e Imunologia Molecular da
Universidade do Porto, Porto, Portugal, 9 Programa de Pós-Graduação em Ciências Genômicas e Biotecnologia, Universidade Católica de Brasília, Brasilia, Brasil, 10 School
of Paediatrics and Child Health, University of Western Australia, Perth, Australia, 11 Institute for Immunology and Infectious Diseases, Murdoch University, Perth, Australia,
12 Genetic Services of Western Australia, King Edward Memorial Hospital, Perth, Australia, 13 Center for the Integration of Genetic Healthcare Technologies, University of
Pennsylvania, Philadelphia, Pennsylvania, United States of America, 14 Department of Anthropology, University of Connecticut, Storrs, Connecticut, United States of
America
Abstract
Human facial diversity is substantial, complex, and largely scientifically unexplained. We used spatially dense quasi-landmarks
to measure face shape in population samples with mixed West African and European ancestry from three
locations (United States, Brazil, and Cape Verde). Using bootstrapped response-based imputation modeling (BRIM), we
uncover the relationships between facial variation and the effects of sex, genomic ancestry, and a subset of craniofacial
candidate genes. The facial effects of these variables are summarized as response-based imputed predictor (RIP) variables,
which are validated using self-reported sex, genomic ancestry, and observer-based facial ratings (femininity and
proportional ancestry) and judgments (sex and population group). By jointly modeling sex, genomic ancestry, and
genotype, the independent effects of particular alleles on facial features can be uncovered. Results on a set of 20 genes
showing significant effects on facial features provide support for this approach as a novel means to identify genes affecting
normal-range facial features and for approximating the appearance of a face from genetic markers.
Citation: Claes P, Liberton DK, Daniels K, Rosana KM, Quillen EE, et al. (2014) Modeling 3D Facial Shape from DNA. PLoS Genet 10(3): e1004224. doi:10.1371/
journal.pgen.1004224
Editor: Daniela Luquetti, Seattle Children’s Research Institute, United States of America
Received September 12, 2013; Accepted January 22, 2014; Published March 20, 2014
Copyright: © 2014 Claes et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits
Funding: This investigation was supported by grants to MDS from Science Foundation of Ireland Walton Fellowship (04.W4/B643); to MDS and DAP from the
National Institute Justice (2008-DN-BX-K125); to JKW from the NIH/National Human Genome Research Institute (K99HG006446); to DKL from the National Science
Foundation (BCS-0851815) and from the Wenner Gren Foundation (Fieldwork Grant 7967). PC is partly supported by the Flemish Institute for the Promotion of
Innovation by Science and Technology in Flanders (IWT Vlaanderen), the Research Program of the Fund for Scientific Research - Flanders (Belgium) (FWO), the
Research Fund KU Leuven and SB was supported by the Portuguese Institution “Fundação para a Ciência e a Tecnologia” [FCT; PTDC/BIABDE/64044/2006
(project) and SFRH/BPD/21887/2005 (post-doc grant)] and by a Dean’s Postdoctoral Fellowship at Stanford University. The funders had no role in study design,
data collection and analysis, decision to publish, or preparation of the manuscript.
Figure 4. Relationships between the ancestry and sex RIP variables and their initial predictor variables. (A) RIP-A with genomic
ancestry; genomic ancestry is calculated using the core panel of 68 AIMs and RIP-A is calculated using this ancestry estimate on the set of three
populations combined (N = 592). Populations are indicated as shown in the legend with United States participants shown with black circles, Brazilians
with red circles, and Cape Verdeans with blue circles. (B) Histograms of RIP-S by self-reported sex.

We have found the treasure coffer, but…
http://www.slideshare.net/MGonzaloClaros 41
