This is the presentation of my PhD thesis defence. It describes two applications of network theory to improve the methods to understand genetic adaptation in the human genome.
Thesis defence of Dall'Olio Giovanni Marco. Applications of network theory to human population genetics: from pathways to genotype networks
1. Applications of network theory to human population genetics: from pathways to genotype networks
Giovanni Marco Dall'Olio
Pompeu Fabra University, Barcelona
Advisors: Jaume Bertranpetit and Hafid Laayouni
2. Acknowledgments
● I would like to thank:
– My PhD supervisors, Jaume Bertranpetit and Hafid Laayouni
– My committee: Dr. Mauro Santos, Dr. Ricard Solé, Prof. Guido Barbujani, Dr. Ferran Casals, Dra. Yolanda Espinosa
– The Evolutionary Systems Biology group at UPF
– The Institut de Biologia Evolutiva
3. Topics
● Context and motivations
● My research:
– Annotating the N-Glycosylation pathway
– Pathway approach on the N-Glycosylation pathway
– The Genotype Network Approach
– The Human Selection Browser and Biostar
● Conclusions
4. Context of the thesis
● The first anatomically modern humans appeared about 200,000 years ago
● How can we understand the signals of genetic adaptation in our genome since then?
10. The Pathway approach
● Genes are organized in pathways
● Any selective constraint will be distributed among all the genes of a pathway
11. Distribution of Selection forces in a pathway
● Some positions of the pathway will be more likely to have stronger signals of selection
12. Pathway Approach - outline
● Build a Network representation of a pathway
● Execute a test for positive selection on each gene
● Determine how the signals of selection are distributed on the network
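The three steps of the outline can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the gene names, the edges and the per-gene scores are hypothetical placeholders, not data from the thesis, and node degree is used as one possible network property.

```python
from collections import defaultdict

# Hypothetical toy pathway: each pair links two genes of the pathway.
# Gene names and scores are placeholders, not data from the thesis.
edges = [("geneA", "geneB"), ("geneB", "geneC"),
         ("geneB", "geneD"), ("geneD", "geneE")]
# Hypothetical result of a per-gene selection test (step 2 of the outline).
scores = {"geneA": 0.2, "geneB": 2.5, "geneC": 0.4, "geneD": 1.8, "geneE": 0.1}

# Step 1: build an undirected network as an adjacency map.
network = defaultdict(set)
for a, b in edges:
    network[a].add(b)
    network[b].add(a)

# Step 3: relate a network property (here, node degree) to the scores.
degree = {gene: len(neigh) for gene, neigh in network.items()}

def mean_score(genes):
    """Average selection score over a list of genes."""
    genes = list(genes)
    return sum(scores[g] for g in genes) / len(genes)

hubs = [g for g, d in degree.items() if d >= 2]
leaves = [g for g, d in degree.items() if d < 2]
print("degree:", degree)
print("mean score, degree >= 2:", mean_score(hubs))
print("mean score, degree < 2:", mean_score(leaves))
```

In a real analysis the scores would come from a test such as FST or iHS, and the comparison would be a proper statistical test rather than a difference of means.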
13. Pathway approach on the N-Glycosylation pathway
● Asparagine N-Glycosylation is a metabolic pathway for a type of protein modification
● The structure of this pathway is easy to represent as a network
14. N-glycosylation - upstream part
● Produces a single sugar called “N-Glycan precursor”
● This sugar is required for the proper folding of most membrane proteins
Adapted from Stanley, P., Schachter, H., & Taniguchi, N. (2009). N-Glycans. Essentials of Glycobiology.
15. N-Glycosylation and protein folding
● The product of the upstream part of N-glycosylation is used as a signal to distinguish folded and unfolded proteins
[Figure: folded protein vs. unfolded protein]
16. N-glycosylation - downstream part
● Complex pathway composed of thousands of reactions
● Produces multiple glycans, important for cell-to-cell interactions
Hossler, P., Mulukutla, B. C., & Hu, W.-S. (2007). Systems analysis of N-glycan processing in mammalian cells. PloS one, 2(1), e713. doi:10.1371/journal.pone.0000713
17. Glycans on the cell surface
● The surface of a cell is similar to a forest of glycosylated proteins
● Each organism and cell has a specific repertoire of glycans
A. Doerr, Glycoproteomics. Nature Methods, 2011. doi:10.1038/nmeth.1821
22. Erroneous annotation in String
● There are two genes with the symbol ALG2:
– ALG2 (Asparagine Linked Glycosylation 2)
– ALG-2 (Apoptosis Linked Gene – 2)
● In String, these two were confused
23. Ambiguous interpretation of the term N-Glycosylation in GO
[Figure: “N-Glycosylated pathway” and “N-Glycosylated protein” merged]
27. Methods used
● The FST index → measure of population differentiation
● The iHS test → identification of signals of recent positive selection
28. FST – Population differentiation
● FST is a measure of population differentiation
● If the FST between two populations is 1, it means that the two populations are fixed for different alleles
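A minimal sketch of why FST reaches 1 when two populations are fixed for different alleles. It uses the textbook heterozygosity-based definition, FST = (HT - HS) / HT, for one biallelic SNP and two equally sized populations; this is an assumption made for illustration, since the slides do not say which estimator was used in the thesis. The function name is hypothetical.

```python
def fst_two_populations(p1, p2):
    """Heterozygosity-based FST for one biallelic SNP, two equal-sized populations.

    p1, p2: frequency of the same allele in each population.
    """
    p_total = (p1 + p2) / 2
    # Expected heterozygosity if the two populations were pooled.
    h_total = 2 * p_total * (1 - p_total)
    # Mean expected heterozygosity within each population.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0  # SNP is not variable at all; FST is undefined, report 0
    return (h_total - h_within) / h_total

print(fst_two_populations(1.0, 0.0))  # fixed for different alleles
print(fst_two_populations(0.5, 0.5))  # identical frequencies
print(fst_two_populations(0.8, 0.4))  # intermediate differentiation
```

The first call gives 1 because no heterozygosity is left within either population, while all of it remains between them.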
29. Signatures of population differentiation in the N-Glycosylation pathway
● FST signals are concentrated in the downstream part, and in substrate biosynthesis
30. Population Differentiation and network position
● Node degree correlates with the distribution of FST signals
● Genes with high FST are generally more connected
31. iHS and Long range haplotypes
● A selective sweep may cause the appearance of long homozygous haplotypes at a high frequency
● Example: a long homozygous haplotype present in the LCT gene in Northern European populations
Vitti et al., Trends in Genetics, 2012
32. iHS and Long range haplotypes
● iHS: Compares the Extended Haplotype Homozygosity decay (EHH decay) between ancestral and derived allele
Voight et al., PLoS Genetics 2006
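The EHH decay that iHS compares can be illustrated with a small sketch. EHH between a core SNP and a more distant position is the probability that two randomly chosen haplotypes carrying the same core allele are identical over the whole interval. The haplotypes below are hypothetical toy data, and the sketch stops at the decay itself: the full iHS statistic, which integrates the decay for each allele, takes the log ratio and standardises it, is not implemented here.

```python
from collections import Counter
from math import comb

def ehh(haplotypes, core, end):
    """EHH between the core SNP and position `end` (both inclusive).

    Probability that two randomly chosen haplotypes are identical
    over the whole interval.
    """
    n = len(haplotypes)
    if n < 2:
        return 0.0
    lo, hi = min(core, end), max(core, end)
    counts = Counter(h[lo:hi + 1] for h in haplotypes)
    return sum(comb(c, 2) for c in counts.values()) / comb(n, 2)

# Hypothetical haplotypes (0 = ancestral, 1 = derived); core SNP at index 0.
haps = ["1000", "1000", "1000", "1001", "0000", "0100", "0010", "0111"]
core = 0
derived = [h for h in haps if h[core] == "1"]
ancestral = [h for h in haps if h[core] == "0"]

for end in range(core, 4):
    print(end, ehh(derived, core, end), ehh(ancestral, core, end))
```

In this toy example the haplotypes carrying the derived allele stay identical for longer than those carrying the ancestral allele, which is the pattern expected after a recent sweep on the derived allele.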
33. Signatures of selection in the N-Glycosylation pathway
● No difference in the distribution of iHS signals between upstream and downstream
34. Signatures of selection in the N-Glycosylation pathway
● GCS1: redirects to protein folding quality control
● MGAT3: redirects to Hybrid Glycans
● MAN2A1: redirects to Complex Glycans
35. Pathway approach on N-Glycosylation
● There is a difference in the patterns of population differentiation between the two parts of the N-Glycosylation pathway
● Signals of positive selection are more likely on key genes
● One of the few works applying the pathway approach to human genetics
37. The Genotype Network approach
● Genotype Networks have been used to study the “innovability” and evolvability of a genetic system
● They had never been applied to population genetics data, because they require too much data!
39. Genotype Networks - theory
● John Maynard Smith: the concept of a Protein Space, which is explored by populations
“if evolution by natural selection is to occur, functional proteins [or DNA sequences] must form a continuous network which can be traversed by unit mutational steps without passing through nonfunctional intermediates”
42. Genotype Networks help reconcile Neutralism and Selectionism
● Cycles of Neutral evolution alternate with cycles of Selection
● Even neutral or negative mutations can be beneficial in the long run, because they allow populations to explore the genotype space
43. The Genotype Network - definitions
● The Genotype Space of a region of 5 SNPs can be represented as a network
● Each node is a possible genotype, and edges connect nodes that differ at only one position
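The definition above can be made concrete with a short sketch that enumerates the Genotype Space of 5 biallelic SNPs. Coding each genotype as a string of 0s and 1s is an assumption made for illustration, and the function name is hypothetical.

```python
from itertools import product

def genotype_space(n_snps):
    """Return (nodes, edges) of the space of all 0/1 strings of length n_snps."""
    nodes = ["".join(bits) for bits in product("01", repeat=n_snps)]
    edges = []
    for node in nodes:
        for i, allele in enumerate(node):
            if allele == "0":  # flip 0 -> 1 only, so each edge is listed once
                edges.append((node, node[:i] + "1" + node[i + 1:]))
    return nodes, edges

nodes, edges = genotype_space(5)
print(len(nodes), len(edges))  # 32 nodes, 80 edges
```

With 5 SNPs there are 2^5 = 32 possible genotypes, each with 5 one-step neighbours, which gives 32 × 5 / 2 = 80 edges.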
44. The Genotype Network - definitions
● Green nodes are sequences observed in a population
● This is the Genotype Network of a population
45. Average Path Length of a Genotype Network
● This figure represents two populations
● The yellow one has a higher Average Path Length than the blue one
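A minimal sketch of how the Average Path Length of a Genotype Network could be computed, assuming genotypes are coded as 0/1 strings and that paths may only pass through observed genotypes. It averages over pairs of genotypes that are connected to each other; how VCF2Space treats disconnected components is not stated in the slides, so that choice is an assumption. The two toy populations are hypothetical.

```python
from collections import deque

def neighbours(node):
    """All genotypes that differ from `node` at exactly one position."""
    flip = {"0": "1", "1": "0"}
    return [node[:i] + flip[a] + node[i + 1:] for i, a in enumerate(node)]

def average_path_length(observed):
    """Mean shortest-path length over connected pairs of observed genotypes."""
    observed = set(observed)
    total, pairs = 0, 0
    for source in observed:
        # Breadth-first search restricted to observed genotypes.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for nxt in neighbours(current):
                if nxt in observed and nxt not in dist:
                    dist[nxt] = dist[current] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

compact = ["000", "001", "010", "100"]    # a star around 000
stretched = ["000", "001", "011", "111"]  # a chain
print(average_path_length(compact), average_path_length(stretched))
```

Both toy populations contain four genotypes, but the chain is more spread out in the genotype space, so its average path length is higher.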
46. Average Degree
● This population has a high Average Degree
● It is more robust to mutations
● This population has a low Average Degree
● Mutations are more likely to fall outside the Genotype Network
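The link between Average Degree and robustness can be sketched as follows, assuming genotypes are coded as 0/1 strings. The degree of a genotype is the number of its single-mutation neighbours that are also observed in the population, so a high average means that a random mutation is more likely to land on another observed genotype. The function name and the two toy populations are hypothetical.

```python
def average_degree(observed):
    """Mean number of observed one-mutation neighbours per observed genotype."""
    observed = set(observed)
    flip = {"0": "1", "1": "0"}
    total = 0
    for node in observed:
        for i, allele in enumerate(node):
            if node[:i] + flip[allele] + node[i + 1:] in observed:
                total += 1
    return total / len(observed) if observed else 0.0

dense = ["000", "001", "010", "011"]   # a square: every node has 2 neighbours
sparse = ["000", "011", "101", "110"]  # no two genotypes are one step apart
print(average_degree(dense), average_degree(sparse))
```

In the dense population, 2 of the 3 possible mutations of each genotype stay inside the network; in the sparse one, every mutation falls outside it.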
48. The VCF2Space library
● Suite of Python scripts to calculate Genotype Networks from a VCF file
● ~400,000 lines of code
● ~350 unit tests
49. Splitting the genome into windows of 11 SNPs
● Fewer than 11 SNPs → networks are too small and condensed
● More than 11 SNPs → networks are too large and sparse
[Figure: small network vs. large network]
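A rough sketch of the windowing step, assuming haplotypes are stored as 0/1 strings and that windows are consecutive and non-overlapping. Whether the thesis used overlapping windows, and how it handled a trailing partial window, is not stated in the slides, so both choices here are assumptions. The function name is hypothetical and is not part of VCF2Space.

```python
def split_into_windows(haplotypes, window_size=11):
    """Cut each haplotype string into consecutive, non-overlapping windows.

    Returns one list of sub-haplotypes per window; a trailing partial
    window is dropped.
    """
    n_snps = len(haplotypes[0])
    windows = []
    for start in range(0, n_snps - window_size + 1, window_size):
        windows.append([h[start:start + window_size] for h in haplotypes])
    return windows

haps = ["0" * 25, "01" * 12 + "0"]  # two hypothetical haplotypes of 25 SNPs
windows = split_into_windows(haps)
print(len(windows), [len(w[0]) for w in windows])
```

Each window then yields its own Genotype Network, in a space of 2^11 = 2048 possible genotypes.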
51. Genotype Network properties of the human genome
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&hubUrl=http://bioevo.upf.edu/~gdallolio/genotype_space/hub.txt
52. Coding & Non-Coding regions
● Coding regions have higher average path length and degree than non-coding regions
54.
● Coding networks: high average path length and degree
● Non-coding networks: low average path length and degree
● Recent selection: lower average path length and degree
57. Other works: The Human Selection Browser
● We applied 21 tests for positive selection to the 1000 Genomes dataset
– FST, CLR, iHS, etc.
● This dataset will be published and made freely available as a genome browser
58. Other works: Biostar
● An online forum for bioinformatics
● About 150,000 visits per month
● Helped thousands of bioinformaticians!
60. Conclusions (I)
● We developed two applications of network theory to the study of human population genetics.
● We produced a network model of the N-Glycosylation pathway, contributing it to the Reactome database and improving the annotations in other databases.
● We showed that the downstream part of the N-Glycosylation pathway shows more signatures of genetic differentiation than the upstream part. This is compatible with the role and structure of this part of the pathway.
● We showed that key genes of the N-Glycosylation pathway, such as GCS1, MGAT3 and MAN2A1, show signatures of recent positive selection in human populations.
61. Conclusions (II)
● We produced a suite of Python scripts, called VCF2Space, to apply the concept of Genotype Networks to Single Nucleotide Polymorphism data
● Our genome-wide application of Genotype Networks showed that coding regions tend to have networks with higher average degree and path length than non-coding regions
● We contributed positively to the bioinformatics community, providing resources such as the 1000 Genomes Selection Browser and Biostar
66. N-glycosylation – how does it work
● All the N-glycans are generated from a single sugar with a very conserved structure, called N-glycan precursor
[Figure: N-glycan precursor → signal for folded proteins; millions of different glycans]
67. The FST test
● Almost all the highest signals of FST are in genes of the downstream part
68. The iHS test
● GCS1 in EUR
● MAN2A1 in SSAFR and EASIA
● MGAT3 in EASIA
69. Combining p-values
● Fisher's combination test
● ZF follows a χ2(2K) distribution
● SNPs from the same gene may violate the assumption of independence, but the method is still robust to errors
From Peng et al., Eur J Hum Genet. 2010
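Fisher's combination test can be sketched with the standard library alone. The statistic is -2 times the sum of the natural logarithms of the K p-values; under independence it follows a χ2 distribution with 2K degrees of freedom, whose upper tail has a closed form because 2K is even. The function name is hypothetical and the p-values in the example are made up for illustration.

```python
from math import exp, log

def fisher_combined_pvalue(pvalues):
    """Combine K p-values with Fisher's method.

    Returns (statistic, combined p-value), where the statistic is
    -2 * sum(ln p_i) and is compared to a chi-square with 2K degrees
    of freedom.
    """
    k = len(pvalues)
    statistic = -2.0 * sum(log(p) for p in pvalues)
    # Closed-form upper tail of a chi-square with an even number (2K) of
    # degrees of freedom: exp(-x/2) * sum_{i<K} (x/2)^i / i!
    half = statistic / 2.0
    term, tail = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        tail += term
    return statistic, exp(-half) * tail

# Hypothetical p-values for three SNPs of the same gene.
statistic, combined = fisher_combined_pvalue([0.01, 0.02, 0.03])
print(statistic, combined)
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check of the closed form.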
70. Comparing upstream and downstream N-Glycosylation
● χ2 test comparing the number of events observed in each part of the pathway against the number expected if there were no pathway structure
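A minimal sketch of such a χ2 test with two categories (upstream and downstream), hence one degree of freedom. The expectation under "no pathway structure" is assumed here to be proportional to the number of genes in each part; the slides do not specify how the expected counts were derived, and the counts in the example are hypothetical, not results from the thesis.

```python
from math import erfc, sqrt

def chi_square_two_parts(observed, n_genes):
    """Chi-square test (1 degree of freedom) for two parts of a pathway.

    observed: events (for example, significant genes) seen in each part.
    n_genes:  number of genes in each part, used to set the expectation
              under the null hypothesis of no pathway structure.
    """
    total_events = sum(observed)
    total_genes = sum(n_genes)
    expected = [total_events * n / total_genes for n in n_genes]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = erfc(sqrt(statistic / 2.0))  # upper tail of chi-square, df = 1
    return expected, statistic, p_value

# Hypothetical counts as (upstream, downstream); not results from the thesis.
expected, statistic, p_value = chi_square_two_parts(observed=[2, 18],
                                                    n_genes=[40, 60])
print(expected, statistic, p_value)
```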
71. How to convert genotypes to networks
● Two haplotypes per individual
● Reference allele → 0; Alternative allele → 1
Individual 1: AC AC AA GG TT TG CA TG
Ancestral alleles: A A A G T T C T
haplotype a: 00000000
haplotype b: 11000111
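The conversion in this example can be reproduced with a short sketch. It assumes the genotypes are phased, with the first letter of each pair on haplotype a and the second on haplotype b. The allele coded as 0 is passed in as a parameter, because the bullet speaks of the reference allele while the example row lists ancestral alleles. The function name is hypothetical and is not taken from VCF2Space.

```python
def encode_individual(genotypes, zero_alleles):
    """Turn phased genotypes into two binary haplotype strings.

    genotypes:    one two-letter string per SNP, e.g. "AC" (first letter
                  on haplotype a, second letter on haplotype b).
    zero_alleles: the allele coded as 0 at each SNP.
    """
    hap_a = "".join("0" if g[0] == z else "1"
                    for g, z in zip(genotypes, zero_alleles))
    hap_b = "".join("0" if g[1] == z else "1"
                    for g, z in zip(genotypes, zero_alleles))
    return hap_a, hap_b

genotypes = "AC AC AA GG TT TG CA TG".split()
zero_alleles = "A A A G T T C T".split()
print(encode_individual(genotypes, zero_alleles))  # ('00000000', '11000111')
```

Each resulting string is one node of the Genotype Space for this 8-SNP region.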