2. • Humans and all other living organisms carry a digital blueprint:
a linear sequence built from 4 small chemical compounds, named
nucleotides, which together constitute their DNA.
• Particular combinations of nucleotides specify the key
qualitative and quantitative instructions for synthesizing the
essential structural and functional components of the cell.
• These components are built from different combinations of
20 molecules named amino acids.
• Amino acids, in turn, are linked to each other to form more
complex molecules named proteins.
5. Genetic code (mRNA codon and the amino acid it specifies)
UUU Phe    UCU Ser    UAU Tyr     UGU Cys
UUC Phe    UCC Ser    UAC Tyr     UGC Cys
UUA Leu    UCA Ser    UAA STOP    UGA STOP
UUG Leu    UCG Ser    UAG STOP    UGG Trp
CUU Leu    CCU Pro    CAU His     CGU Arg
CUC Leu    CCC Pro    CAC His     CGC Arg
CUA Leu    CCA Pro    CAA Gln     CGA Arg
CUG Leu    CCG Pro    CAG Gln     CGG Arg
AUU Ile    ACU Thr    AAU Asn     AGU Ser
AUC Ile    ACC Thr    AAC Asn     AGC Ser
AUA Ile    ACA Thr    AAA Lys     AGA Arg
AUG Met    ACG Thr    AAG Lys     AGG Arg
GUU Val    GCU Ala    GAU Asp     GGU Gly
GUC Val    GCC Ala    GAC Asp     GGC Gly
GUA Val    GCA Ala    GAA Glu     GGA Gly
GUG Val    GCG Ala    GAG Glu     GGG Gly
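As an illustration of how the codon table is read, here is a minimal Python sketch that translates an mRNA string three bases at a time. The function name `translate`, the one-letter amino acid codes (F for Phe, M for Met, and so on) and the example sequence are assumptions made for this sketch; the codon assignments themselves are those of the table.

```python
from itertools import product

# Standard genetic code in one-letter amino acid symbols ("*" = STOP),
# listed in the same order as the table: first base U, C, A, G, then
# second base U, C, A, G, then third base U, C, A, G.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("UCAG", repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Translate an mRNA string codon by codon until a STOP codon."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[start:start + 3]]
        if amino_acid == "*":  # STOP codon: end of the protein
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical example: AUG (Met), UUU (Phe), GGC (Gly), UAA (STOP)
print(translate("AUGUUUGGCUAA"))  # MFG
```

The table is generated from a compact string rather than typed out codon by codon, which keeps the sketch short and makes the four-by-four-by-four layout of the code explicit.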
6. • The building blocks of DNA and proteins, and the system that
translates one chemical language into the other, are conserved,
but the order of these building blocks varies widely among
organisms and individuals.
• This is because DNA and the proteins derived from it are not
static entities. Instead, DNA is subject to a variety of
heritable changes known as mutations.
• Mutations often arise as copying errors during DNA replication.
Although the fidelity of DNA replication is strikingly high,
misincorporation occurs at a certain frequency, known as the
mutation rate.
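The idea of a mutation rate can be sketched as a toy simulation: each base is copied faithfully except with a small probability of misincorporation. The function names, the template sequence and the rate used below are hypothetical and chosen only for illustration; they are not measured values.

```python
import random

BASES = "ACGT"


def replicate(template, mutation_rate, rng):
    """Copy a DNA string, misincorporating each base with the given probability."""
    copy = []
    for base in template:
        if rng.random() < mutation_rate:
            # Copying error: substitute one of the three other bases.
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def count_differences(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(0)        # fixed seed so the run is repeatable
template = "ACGT" * 250       # hypothetical 1000-base template
copy = replicate(template, mutation_rate=0.01, rng=rng)  # illustrative rate
print(count_differences(template, copy), "differences in", len(template), "bases")
```

Only substitutions are modelled here; insertions, deletions and repair are left out to keep the sketch short.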
7. • Modern humans originated ~100,000 years ago from pre-modern
humans and are a relatively homogeneous species that has
expanded dramatically during its recent evolutionary history.
• Two unrelated human individuals are identical at about 99.9%
of their DNA and thus differ at about 0.1%.
• This means that comparing the DNA of two unrelated individuals
reveals approximately one difference every 1000 nucleotides
(our genome consists of two copies of about 3.3 billion
nucleotides).
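The arithmetic behind these figures can be worked out in a few lines. The numbers are taken from the slide; the variable names are assumptions made for this sketch.

```python
HAPLOID_GENOME_SIZE = 3_300_000_000  # nucleotides in one genome copy (from the slide)
NUCLEOTIDES_PER_DIFFERENCE = 1_000   # one difference every 1000 nucleotides

# Expected number of differing positions when comparing one genome copy
# from each of two unrelated individuals.
differences_per_copy = HAPLOID_GENOME_SIZE // NUCLEOTIDES_PER_DIFFERENCE

# The same figure expressed as a fraction of the genome.
fraction_different = 1 / NUCLEOTIDES_PER_DIFFERENCE

print(f"{differences_per_copy:,} differences per genome copy")  # 3,300,000
print(f"{fraction_different:.1%} of positions differ")          # 0.1%
```

So a 0.1% difference, small as it sounds, corresponds to roughly 3.3 million differing positions per genome copy.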
8. This genetic variation has important medical consequences:
In simple Mendelian traits, the relationship between the causal
genetic variant (genotype) and the disease state is deterministic.
In a complex trait such as MS, the disease state results from
interactions between multiple genotypes and the environment.
The influence of any individual causal allele tends to be modest
and the relationship between the causal variant and the disease
state is probabilistic.
11. POSSIBLE MEANINGS OF AN ASSOCIATION
PRIMARY ASSOCIATION WITH THE CAUSAL VARIANT
SECONDARY ASSOCIATION DUE TO PROXIMITY
SPURIOUS ASSOCIATION DUE TO POPULATION SUBSTRUCTURE
13. The imperfect genome-wide search
All gene chip arrays contain SNPs chosen on the basis of the
linkage disequilibrium (LD) observed in HapMap populations, a
catalogue of ~3 million SNPs genotyped in Europeans,
Asians, and Africans
Studying a subset of 500,000 or 1 million SNPs is limiting
Power to detect disease associations at a locus increases with
the r² between the typed (tested) and untyped (causative) SNPs:
the lower the r², the lower the power
[Figure: a causative variant next to a tested variant]
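To make the role of r² concrete, here is a minimal sketch that computes r² between two biallelic SNPs from phased haplotypes, together with the commonly used rule of thumb that testing a tag SNP instead of the causative SNP inflates the required sample size by about 1/r². The haplotype counts and function names are hypothetical, and the 1/r² rule is an approximation rather than an exact power calculation.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic SNPs.

    `haplotypes` is a list of (a, b) pairs, where a and b are 0 or 1 and
    give the allele carried at the tested and the causative SNP.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def inflated_sample_size(n_if_causal_typed, r2):
    """Approximate sample size needed when only the tag SNP is typed."""
    return n_if_causal_typed / r2


# Hypothetical sample of 10 haplotypes: the two SNPs usually travel together.
haplotypes = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0), (0, 1)]
r2 = r_squared(haplotypes)
print(f"r^2 = {r2:.2f}")
print(f"samples needed: {inflated_sample_size(1000, r2):.0f} instead of 1000")
```

With these made-up counts r² is 0.36, so a study that would need 1000 samples with the causative SNP typed directly would need nearly three times as many when relying on the tag.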
16. Why a sequencing project in Sardinia?
[Figure: comparison of European and Mediterranean populations.
Labels: Croatia, Ukraine, Sardinia, Hungary, Poland, Catalonia,
Basque Country, Georgia, Andalusia, Corsica, Sicily,
North-Central Italy, Albania, Calabria, Greece, Lebanon, Turkey]
17. Why a sequencing project in Sardinia?
[Figure: comparison of populations. Labels: Saami, Udmurt, Mari,
Czech and Slovakian, Dutch, Ukrainian, French, Polish, Hungarian,
Croatian, Georgian, Central-Northern Italian, Macedonian, Albanian,
Spanish Basques, Calabrian, Turkish, Syrian, Andalusian, Greek,
Lebanese, Morocco]
18. What samples to sequence in Sardinia?
• ProgeNIA study
• Case-Control studies
• Future work
20. ProgeNIA/SardiNIA project
6,148 individuals, aged 14-102 years
95% are known to have all grandparents born in Sardinia
711 pedigrees, up to 5 generations deep
Largest family: 625 phenotyped individuals
>34,000 relative pairs
Pilia et al., PLoS Genet. 2006
28. How many samples to sequence?
• Is it necessary to sequence all the people analysed?
29. • Observed genotypes
• Inferred sharing of DNA stretches along the chromosome
• Missing genotypes inferred from chromosome sharing
Chen and Abecasis, AJHG 2008
Burdick et al., Nat. Genet. 2006
30. 1) Identify Match Among Reference
Individuals in study sample
. . A A . . . . . . . . A . . . . A . . .
. . G A . . . . . . . . C . . . . A . . .
Observed HapMap Chromosomes
C G A G A T C T C C T T C T T C T G T G C
C G A A A T C T C C C G A C C T C A T G G
C C A A G C T C T T T T C T T C T G T G C
C G A A G C T C T T T T C T T C T G T G C
C G A G A C T C T C C G A C C T T A T G C
T G G A A T C T C C C G A C C T C A T G G
C G A G A T C T C C C G A C C T T G T G C
C G A G A C T C T T T T C T T T T A T A C
C G A G A C T C T C C G A C C T C G T G C
C G G A G C T C T T T T C T T C T G T G C
34. 2) Phase Chromosome
Individuals in study sample
. . A A . . . . . . . . A . . . . A . . .
. . G A . . . . . . . . C . . . . A . . .
Observed HapMap Chromosomes
C G A G A T C T C C T T C T T C T G T G C
C G A A A T C T C C C G A C C T C A T G G
C C A A G C T C T T T T C T T C T G T G C
C G A A G C T C T T T T C T T C T G T G C
C G A G A C T C T C C G A C C T T A T G C
T G G A A T C T C C C G A C C T C A T G G
C G A G A T C T C C C G A C C T T G T G C
C G A G A C T C T T T T C T T T T A T A C
C G A G A C T C T C C G A C C T C G T G C
C G G A G C T C T T T T C T T C T G T G C
35. 3) Impute Missing Genotypes
Individuals in study sample
C G A A A T C T C C C G A C C T C A T G G
C G G A G C T C T T T T C T T T T A T G C
Observed HapMap Chromosomes
C G A G A T C T C C T T C T T C T G T G C
C G A A A T C T C C C G A C C T C A T G G
C C A A G C T C T T T T C T T C T G T G C
C G A A G C T C T T T T C T T C T G T G C
C G A G A C T C T C C G A C C T T A T G C
T G G A A T C T C C C G A C C T C A T G G
C G A G A T C T C C C G A C C T T G T G C
C G A G A C T C T T T T C T T T T A T A C
C G A G A C T C T C C G A C C T C G T G C
C G G A G C T C T T T T C T T C T G T G C
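The three steps (match, phase, impute) can be sketched in Python as a small dynamic-programming search. This is a deliberately simplified stand-in, not the software used in the study: it looks for the pair of paths through the reference chromosomes that explains the observed genotypes with the fewest switches between reference chromosomes, then reads the missing alleles off those paths. The function name `impute` and the two cost parameters are assumptions made for this sketch; the reference chromosomes and the four observed genotypes are those shown on the slides.

```python
from itertools import product

# Observed HapMap chromosomes, as listed on the slides.
REFERENCE = [
    "CGAGATCTCCTTCTTCTGTGC", "CGAAATCTCCCGACCTCATGG",
    "CCAAGCTCTTTTCTTCTGTGC", "CGAAGCTCTTTTCTTCTGTGC",
    "CGAGACTCTCCGACCTTATGC", "TGGAATCTCCCGACCTCATGG",
    "CGAGATCTCCCGACCTTGTGC", "CGAGACTCTTTTCTTTTATAC",
    "CGAGACTCTCCGACCTCGTGC", "CGGAGCTCTTTTCTTCTGTGC",
]

# Unphased genotypes of the study individual at the typed positions
# (0-based index along the chromosome); all other positions are missing.
GENOTYPES = {2: ("A", "G"), 3: ("A", "A"), 12: ("A", "C"), 17: ("A", "A")}


def impute(genotypes, reference, switch_cost=1, mismatch_cost=5):
    """Return two full-length haplotypes built as mosaics of the reference.

    Viterbi-style search: a state is a pair (i, j) of reference chromosomes
    being copied at a position. Changing i or j costs `switch_cost` each;
    disagreeing with an observed genotype costs `mismatch_cost`.
    Both costs are hypothetical tuning values.
    """
    n_sites = len(reference[0])
    states = list(product(range(len(reference)), repeat=2))

    def emission(state, pos):
        if pos not in genotypes:          # untyped position: nothing to match
            return 0
        i, j = state
        pair = sorted((reference[i][pos], reference[j][pos]))
        return 0 if pair == sorted(genotypes[pos]) else mismatch_cost

    def transition(a, b):
        return switch_cost * ((a[0] != b[0]) + (a[1] != b[1]))

    cost = {s: emission(s, 0) for s in states}
    back_pointers = []
    for pos in range(1, n_sites):
        new_cost, pointers = {}, {}
        for s in states:
            best = min(states, key=lambda p: cost[p] + transition(p, s))
            new_cost[s] = cost[best] + transition(best, s) + emission(s, pos)
            pointers[s] = best
        cost = new_cost
        back_pointers.append(pointers)

    # Trace the cheapest path back from the last position to the first.
    state = min(states, key=cost.get)
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    path.reverse()

    hap_a = "".join(reference[i][pos] for pos, (i, _) in enumerate(path))
    hap_b = "".join(reference[j][pos] for pos, (_, j) in enumerate(path))
    return hap_a, hap_b


hap_a, hap_b = impute(GENOTYPES, REFERENCE)
print(hap_a)
print(hap_b)
```

Several mosaics explain these four genotypes equally well, so the haplotypes printed here need not coincide with the ones drawn on the slide. Methods used in practice weigh the alternatives probabilistically instead of committing to a single cheapest path.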
36. Recent updates
We used whole-genome sequences of 52 Europeans available
from the 1000 Genomes Project to infer ~6.6 million markers
in individuals typed with the higher-density chip...
...then, by imputation, we extended the 6.6 million
markers to all individuals and performed a GWAS
This:
Provides fine mapping of previously discovered loci
May reveal new loci that were poorly tagged by the previous set
of SNPs
37. GWAS findings
Almost all of the loci detected by GWAS explain only a small
fraction of the heritability
Trait     Heritability    So far explained
HbF       ~60%            ~17%
Height    ~80%            ~4%
BMI       ~40%            ~1%
The smaller the effect size, the larger the sample size required to
maintain adequate power
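The link between effect size and sample size can be sketched with a standard normal approximation for a quantitative trait, in which a variant explaining a fraction q² of the trait variance needs roughly (z_alpha + z_power)² (1 - q²) / q² samples. The function name, the significance threshold of 5e-8 and the 80% power target are assumptions made for this sketch, not values from the slides.

```python
from statistics import NormalDist


def required_sample_size(variance_explained, alpha=5e-8, power=0.80):
    """Approximate N needed to detect a variant explaining a given
    fraction of trait variance (two-sided test, normal approximation)."""
    standard_normal = NormalDist()
    z_alpha = -standard_normal.inv_cdf(alpha / 2)  # significance threshold
    z_power = standard_normal.inv_cdf(power)       # desired power
    q2 = variance_explained
    return (z_alpha + z_power) ** 2 * (1 - q2) / q2


# Hypothetical effect sizes: 1%, 0.1% and 0.01% of trait variance.
for q2 in (0.01, 0.001, 0.0001):
    print(f"variance explained {q2:.2%}: N ~ {required_sample_size(q2):,.0f}")
```

Under this approximation, each tenfold drop in the variance explained raises the required sample size by about tenfold.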
40. ProgeNIA Team Lanusei-Cagliari
Manuela Uda Monica Lai
Serena Sanna Anna Cau
Eleonora Porcu Barbara Deiana
Ilenia Zara Monica Balloi
Carlo Sidore Maria Grazia Piras
Maristella Steri Gianluca Usala
Marco Masala Antonella Mulas
Gianmauro Cuccuru Andrea Maschio
Angelo Scuteri Fabio Busonero
Marco Orrù Sandra Lai
Maria Grazia Pilia Mariano Dei
Danilo Fois
Liana Ferreli Laura Crisponi
Francesco Loi Silvia Naitza
Caterina Flore
Simona Foddi
Giuseppe Pilia, creator and founder of the ProgeNIA project