An integrated map of genetic
variation from 1,092 human
          genomes
   The 1000 Genomes Project Consortium
       http://www.1000genomes.org

   Nature 491, 56–65 (01 November 2012)
Primary goal
• to create a complete and detailed
  catalogue of human genetic
  variations, which in turn can be used for
  association studies relating genetic
  variation to disease.
Primary goal
• to discover >95% of the variants (e.g.
  SNPs, CNVs, indels) with minor allele
  frequencies as low as 1% across the
  genome and 0.1–0.5% in gene regions
• to estimate the population
  frequencies, haplotype backgrounds and
  linkage disequilibrium patterns of variant
  alleles
Secondary goals
• support of better SNP and probe selection for
  genotyping platforms in future studies
• improvement of the human reference sequence.
• the completed database will be a useful tool for
  studying regions under selection, variation in
  multiple populations and understanding the
  underlying processes of mutation and
  recombination.
Project design
• to sequence each sample to about 4X coverage;
  at this depth sequencing cannot provide the
  complete genotype of each sample, but should
  allow the detection of most variants with
  frequencies as low as 1%.
• Combining the data from 2500 samples should
  allow highly accurate estimation (imputation) of
  the variants and genotypes for each sample that
  were not seen directly by the light sequencing.
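A rough sketch of why light sequencing works once samples are pooled. It assumes read depth follows a Poisson distribution, that each haplotype receives half of the 4X depth, and that a variant counts as "seen" when at least 2 reads carry it. The Poisson model, the 2-read threshold and all names below are illustrative assumptions, not the project's actual calling method, and sequencing error is ignored.

```python
import math

def poisson_sf(k_min: int, mean: float) -> float:
    """Return P(X >= k_min) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(k_min))
    return 1.0 - below

COVERAGE = 4.0       # average depth per sample (from the slide)
N_SAMPLES = 2500     # planned number of samples (from the slide)
ALLELE_FREQ = 0.01   # variant frequency of 1% (from the slide)
MIN_READS = 2        # assumed threshold for "seeing" a variant

# Single heterozygous sample: the variant haplotype gets COVERAGE / 2 reads on average.
p_single = poisson_sf(MIN_READS, COVERAGE / 2)

# Pooled samples: expected variant-carrying chromosomes and reads across the cohort.
expected_copies = 2 * N_SAMPLES * ALLELE_FREQ
expected_reads = expected_copies * COVERAGE / 2
p_pooled = poisson_sf(MIN_READS, expected_reads)

print(f"single sample: {p_single:.3f}, pooled: {p_pooled:.3f}")
```

Under these assumptions a single sample often misses a variant it carries, while the pooled reads from the whole cohort make detection of a 1% variant close to certain. That matches the design on the slide: discovery across the pool, then imputation of individual genotypes.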
Project design / Stages
• The 1000 genomes full project has been
  divided into phases to represent the dispersed
  nature of the sample collection.
Project design / Stages / Pilot
Three pilot studies provided data to inform the
design of the full-scale project:
• Pilot 1: low coverage pilot (2-4X, WGS of 180
  samples)
• Pilot 2: high coverage pilot (20-60X, WGS of 2
  mother-father-adult child trios)
• Pilot 3: the exon targeted pilot (50X, 1000 gene
  regions in 900 samples)
The pilot was completed in 2009.
Project design / Stages / Phase 1
Phase 1 represents low coverage and exome
data analysis available for the first 1092
samples.
DONE!
Results published in Nature 491, 56–65 (01
November 2012)
But that's not all!
Project design / Stages / Phase 2
• Phase 2 represents an expanded set of
  samples, around 1700 in number (the sequence
  data has been finalized).
• This data is being used for method development
  to both improve on existing methods from phase
  1 and also develop new methods to handle
  features like multi allelic variant sites and true
  integration of complex variation and structural
  variants.
Project design / Stages / Phase 3
• Phase 3 represents 2500 samples including
  new African samples and samples from South
  Asia. The new methods developed in phase 2
  will be applied to this data set and a final
  catalogue of variation will be released.
Amounts of Data
• Full genomic sequence of 1,700 individuals is
  now available (200TB of genomic data).
Amounts of Data
• > 2 human genomes every 24 hours
• 60-fold more sequence data than what has
  been published in DNA databases over the
  past 25 years.
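For a sense of scale, the figures above can be turned into an average data volume per individual. This is a back-of-the-envelope sketch that assumes decimal units (1 TB = 10^12 bytes) and that the 200 TB is spread evenly over the 1,700 individuals. The variable names are illustrative.

```python
TOTAL_BYTES = 200e12     # 200 TB of genomic data (from the slide), decimal units assumed
N_INDIVIDUALS = 1700     # sequenced individuals (from the slide)

# Average volume per individual in gigabytes (1 GB = 10^9 bytes).
per_individual_gb = TOTAL_BYTES / N_INDIVIDUALS / 1e9

print(f"about {per_individual_gb:.0f} GB per individual")
```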
Samples
• 14 populations
• 4 Ancestry-based groups
Samples / Ancestry-based groups
• Europe (IBS (Iberian Populations in Spain), GBR (British
  from England and Scotland), CEU (Utah residents with
  ancestry from northern and western Europe), FIN
  (Finnish in Finland), TSI (Toscani in Italia));
• East Asia (JPT (Japanese in Tokyo, Japan), CHB (Han
  Chinese in Beijing, China), CHS (Han Chinese South));
• Africa (ASW (African Ancestry in SW USA), YRI (Yoruba
  in Ibadan, Nigeria), LWK (Luhya in Webuye, Kenya));
• Americas (MXL (Mexican Ancestry in Los
  Angeles, CA, USA), PUR (Puerto Ricans in Puerto
  Rico), CLM (Colombians in Medellin, Colombia)).
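The grouping above can be written down as a small lookup table, which is handy when labelling samples by ancestry group. This is a minimal sketch: the dictionary name and layout are illustrative choices, while the codes and groups are taken from the slide.

```python
from collections import Counter

# Population code -> ancestry-based group, as listed on the slide.
POPULATIONS = {
    "IBS": "Europe", "GBR": "Europe", "CEU": "Europe", "FIN": "Europe", "TSI": "Europe",
    "JPT": "East Asia", "CHB": "East Asia", "CHS": "East Asia",
    "ASW": "Africa", "YRI": "Africa", "LWK": "Africa",
    "MXL": "Americas", "PUR": "Americas", "CLM": "Americas",
}

# Number of populations in each ancestry-based group.
group_sizes = Counter(POPULATIONS.values())

print(len(POPULATIONS), "populations in", len(group_sizes), "groups")
for group, size in group_sizes.items():
    print(f"{group}: {size}")
```

The totals agree with the "Samples" slide: 14 populations in 4 ancestry-based groups.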
Data
• combination of low-coverage (2–6x) whole-
  genome sequence data, targeted deep (50–100x)
  exome sequence data and dense SNP genotype
  data.
• the approach was augmented with statistical
  methods for selecting higher-quality variant calls
  from candidates obtained using multiple
  algorithms, and for integrating SNPs, indels and
  larger structural variants within a single framework
• A key goal of the 1000 Genomes Project was
  to identify more than 95% of SNPs at 1%
  frequency in a broad set of populations.
• Our current resource includes ~50%, 98% and
  99.7% of the SNPs with frequencies of
  ~0.1%, 1.0% and 5.0%, respectively, in ~2,500
  UK-sampled genomes.
Genetic variation
• 3.60 million single nucleotide polymorphisms
  (SNPs), of which 24,000 were in GENCODE
  (coding) regions
• 344,000 small indels (440 coding) which gives a
  ratio of 1:10 with SNPs in human genomes, and
  demonstrates the strong selection against indels
  in coding regions.
• 717 large deletions (the most confident category
  of SVs that we currently can detect), of which 39
  overlapped GENCODE regions.
• Most common variants (94% of variants with
  frequency ≥5%) were known before the
  current phase of the project and had their
  haplotype structure mapped through earlier
  projects.
• Only 62% of variants in the range 0.5–5% and
  13% of variants with frequencies of <0.5% had
  been described previously.
• Variants present at 10% and above across the
  entire sample are almost all found in all of the
  populations studied.
• By contrast, 17% of low-frequency variants in
  the range 0.5–5% were observed in a single
  ancestry group, and 53% of rare variants at
  0.5% were observed in a single population.
• The derived allele frequency distribution
  shows substantial divergence between
  populations below a frequency of 40%, such
  that individuals from populations with
  substantial African ancestry carry up to three
  times as many low-frequency variants (0.5–
  5%) as those of European or East Asian
  origin, reflecting ancestral bottlenecks in non-
  African populations.
• However, individuals from all populations
  show an enrichment of rare variants (<0.5%
  frequency), reflecting recent explosive
  increases in population size and the effects of
  geographic differentiation.
• Variants present twice across the entire
  sample (referred to as f2 variants), typically
  the most recent of informative mutations, are
  found within the same population in 53% of
  cases
• However, between-population sharing
  identifies recent historical connections.
• At the most highly conserved coding
  sites, 85% of non-synonymous variants and
  more than 90% of stop-gain and splice-
  disrupting variants are below 0.5% in
  frequency, compared with 65% of
  synonymous variants.
• Individuals typically carry more than 2,500
  non-synonymous variants at conserved
  positions, of which 20–40 are likely to be
  damaging (2–5 of which are rare), and 150
  loss-of-function (LOF) variants (splice-site
  variants, stop gains, frameshift indels), of
  which 10–20 are rare.
• Counting rare variants only, each individual
  carries 130–400 non-synonymous variants, 10–20
  LOF variants, 2–5 damaging mutations, and 1–2
  variants identified previously from cancer
  genome sequencing.
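The claim of selection against indels in coding regions, made with the SNP and indel counts earlier in this section, can be checked with simple arithmetic. The sketch below uses only the four counts quoted on that slide. The variable names and the "depletion" summary are illustrative, not a statistic reported in the source.

```python
# Counts quoted on the "Genetic variation" slide.
snps_total, snps_coding = 3_600_000, 24_000
indels_total, indels_coding = 344_000, 440

# SNPs per indel, genome-wide and within coding regions.
genome_ratio = snps_total / indels_total
coding_ratio = snps_coding / indels_coding

# How much rarer indels are, relative to SNPs, in coding regions.
depletion = coding_ratio / genome_ratio

print(f"genome-wide: {genome_ratio:.1f} SNPs per indel")
print(f"coding:      {coding_ratio:.1f} SNPs per indel")
print(f"indels are about {depletion:.1f}-fold depleted in coding regions")
```

Genome-wide the ratio is close to the 1:10 quoted on the slide, whereas in coding regions there are more than 50 SNPs per indel, consistent with strong selection against coding indels.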
Bonus Track
• The non-synonymous to synonymous ratio
  among rare (<0.5%) variants is typically in the
  range 1–2, and among common variants in the
  range 0.5–1.5, suggesting that 25–50% of rare
  non-synonymous variants are deleterious.
• However, the segregating rare load among
  gene groups in KEGG pathways varies
  substantially.
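One way to read the 25–50% estimate is as the excess of non-synonymous variants among rare variants relative to common ones. The sketch below assumes that common variants serve as a roughly neutral baseline and pairs the matching ends of the two quoted ranges. The formula and the function name are an illustrative interpretation, not a method stated in the source.

```python
def excess_fraction(rare_ratio: float, common_ratio: float) -> float:
    """Fraction of rare non-synonymous variants in excess of the common-variant baseline."""
    return (rare_ratio - common_ratio) / rare_ratio

# Lower ends of the quoted ranges: rare ratio 1, common ratio 0.5.
low_end = excess_fraction(1.0, 0.5)

# Upper ends of the quoted ranges: rare ratio 2, common ratio 1.5.
high_end = excess_fraction(2.0, 1.5)

print(f"excess at lower ends: {low_end:.0%}, at upper ends: {high_end:.0%}")
```

Under this reading the two ends give 50% and 25%, which spans the range quoted on the slide.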

Mais conteúdo relacionado

Mais procurados

Broad Retreat Poster
Broad Retreat PosterBroad Retreat Poster
Broad Retreat PosterEric Ma
 
The human genome project vlad mike mike leo duff
The human genome project vlad mike mike leo duffThe human genome project vlad mike mike leo duff
The human genome project vlad mike mike leo duffguest73a974
 
2015. Jason Wallace. Applying high throughput genomics to crops for the devel...
2015. Jason Wallace. Applying high throughput genomics to crops for the devel...2015. Jason Wallace. Applying high throughput genomics to crops for the devel...
2015. Jason Wallace. Applying high throughput genomics to crops for the devel...FOODCROPS
 
Human Genome Project
Human Genome Project Human Genome Project
Human Genome Project rheajain25
 
Lecture 3 -the diversity of genomes and the tree of life
Lecture 3 -the diversity of genomes and the tree of lifeLecture 3 -the diversity of genomes and the tree of life
Lecture 3 -the diversity of genomes and the tree of lifeEmmanuel Aguon
 
c elegans genome, life cycle and model organism
 c elegans genome, life cycle and model organism c elegans genome, life cycle and model organism
c elegans genome, life cycle and model organismSubhradeep sarkar
 
Complete assignment on human Genome Project
Complete assignment on human Genome ProjectComplete assignment on human Genome Project
Complete assignment on human Genome Projectaafaq ali
 
The trivial case of the missing heritability
The trivial case of the missing heritabilityThe trivial case of the missing heritability
The trivial case of the missing heritabilityMax Moldovan
 
Whole genome sequencing of arabidopsis thaliana
Whole genome sequencing of arabidopsis thalianaWhole genome sequencing of arabidopsis thaliana
Whole genome sequencing of arabidopsis thalianaBhavya Sree
 
The human genome project
The human genome projectThe human genome project
The human genome projectSahil Biswas
 
Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...
Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...
Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...Mick Watson
 
The human genome project
The human genome projectThe human genome project
The human genome project14pascba
 
2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotes2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotesc.titus.brown
 
2 chapter 5 genes and chromosome
2 chapter 5   genes and chromosome2 chapter 5   genes and chromosome
2 chapter 5 genes and chromosomea alice
 
Human genome project
Human genome projectHuman genome project
Human genome project15cookho
 
Genome Sequencing Project
Genome Sequencing ProjectGenome Sequencing Project
Genome Sequencing Projectguestd53a1
 

Mais procurados (20)

2013 Cornell's Plant Breeding and Genetic Seminar Series
2013 Cornell's Plant Breeding and Genetic Seminar Series2013 Cornell's Plant Breeding and Genetic Seminar Series
2013 Cornell's Plant Breeding and Genetic Seminar Series
 
Broad Retreat Poster
Broad Retreat PosterBroad Retreat Poster
Broad Retreat Poster
 
The human genome project vlad mike mike leo duff
The human genome project vlad mike mike leo duffThe human genome project vlad mike mike leo duff
The human genome project vlad mike mike leo duff
 
2015. Jason Wallace. Applying high throughput genomics to crops for the devel...
2015. Jason Wallace. Applying high throughput genomics to crops for the devel...2015. Jason Wallace. Applying high throughput genomics to crops for the devel...
2015. Jason Wallace. Applying high throughput genomics to crops for the devel...
 
Human Genome Project
Human Genome Project Human Genome Project
Human Genome Project
 
Lecture 3 -the diversity of genomes and the tree of life
Lecture 3 -the diversity of genomes and the tree of lifeLecture 3 -the diversity of genomes and the tree of life
Lecture 3 -the diversity of genomes and the tree of life
 
c elegans genome, life cycle and model organism
 c elegans genome, life cycle and model organism c elegans genome, life cycle and model organism
c elegans genome, life cycle and model organism
 
Complete assignment on human Genome Project
Complete assignment on human Genome ProjectComplete assignment on human Genome Project
Complete assignment on human Genome Project
 
The trivial case of the missing heritability
The trivial case of the missing heritabilityThe trivial case of the missing heritability
The trivial case of the missing heritability
 
Whole genome sequencing of arabidopsis thaliana
Whole genome sequencing of arabidopsis thalianaWhole genome sequencing of arabidopsis thaliana
Whole genome sequencing of arabidopsis thaliana
 
The human genome project
The human genome projectThe human genome project
The human genome project
 
Magic population
Magic populationMagic population
Magic population
 
Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...
Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...
Discovery and Annotation of Novel Proteins from Rumen Gut Metagenomic Sequenc...
 
The human genome project
The human genome projectThe human genome project
The human genome project
 
Unlocking the value and use potential of genetic resources
Unlocking the value and use potential of genetic resourcesUnlocking the value and use potential of genetic resources
Unlocking the value and use potential of genetic resources
 
2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotes2013 ucdavis-smbe-eukaryotes
2013 ucdavis-smbe-eukaryotes
 
Metagenomics
MetagenomicsMetagenomics
Metagenomics
 
2 chapter 5 genes and chromosome
2 chapter 5   genes and chromosome2 chapter 5   genes and chromosome
2 chapter 5 genes and chromosome
 
Human genome project
Human genome projectHuman genome project
Human genome project
 
Genome Sequencing Project
Genome Sequencing ProjectGenome Sequencing Project
Genome Sequencing Project
 

Semelhante a Genetic Variation Map from 1,092 Human Genomes

IInvestigation of the genetic basis of adaptation
IInvestigation of the genetic basis of adaptationIInvestigation of the genetic basis of adaptation
IInvestigation of the genetic basis of adaptationPhilippe Henry
 
Human genetic diversity and origin of major human groups
Human genetic diversity and origin of major human groupsHuman genetic diversity and origin of major human groups
Human genetic diversity and origin of major human groupsMayank Sagar
 
Population genetics.pptx
Population genetics.pptxPopulation genetics.pptx
Population genetics.pptxprashanthbabu31
 
Inherited disease II IQ-17 Aug 2010.pptx
Inherited disease II IQ-17 Aug 2010.pptxInherited disease II IQ-17 Aug 2010.pptx
Inherited disease II IQ-17 Aug 2010.pptxAmanda783100
 
gnomAD, LoFs, and drug targets
gnomAD, LoFs, and drug targetsgnomAD, LoFs, and drug targets
gnomAD, LoFs, and drug targetsKonrad Karczewski
 
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...VHIR Vall d’Hebron Institut de Recerca
 
The server of the Spanish Population Variability
The server of the Spanish Population VariabilityThe server of the Spanish Population Variability
The server of the Spanish Population VariabilityJoaquin Dopazo
 
Case studies of HTS / NGS applications
Case studies of HTS / NGS applicationsCase studies of HTS / NGS applications
Case studies of HTS / NGS applicationsrjorton
 
Bioinformatics and NGS for advancing in hearing loss research
Bioinformatics and NGS for advancing in hearing loss researchBioinformatics and NGS for advancing in hearing loss research
Bioinformatics and NGS for advancing in hearing loss researchJoaquin Dopazo
 
Human genome project
Human genome projectHuman genome project
Human genome projectsabahayat3
 
Human genome project
Human genome projectHuman genome project
Human genome projectRakesh R
 
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Manikhandan Mudaliar
 
Human genetic variation and its contribution to complex traits
Human genetic variation and its contribution to complex traitsHuman genetic variation and its contribution to complex traits
Human genetic variation and its contribution to complex traitsgroovescience
 
Chapter 7 genome structure, chromatin, and the nucleosome (1)
Chapter 7   genome structure, chromatin, and the nucleosome (1)Chapter 7   genome structure, chromatin, and the nucleosome (1)
Chapter 7 genome structure, chromatin, and the nucleosome (1)Roger Mendez
 
Mar Gonzales Porta, One gene One transcript, fged_seattle_2013
Mar Gonzales Porta, One gene One transcript, fged_seattle_2013Mar Gonzales Porta, One gene One transcript, fged_seattle_2013
Mar Gonzales Porta, One gene One transcript, fged_seattle_2013Functional Genomics Data Society
 
Population Genetics_Dr. Ashwin Atkulwar
Population Genetics_Dr. Ashwin AtkulwarPopulation Genetics_Dr. Ashwin Atkulwar
Population Genetics_Dr. Ashwin AtkulwarAshwin Atkulwar
 
Global patterns of insect diiversity, distribution and evolutionary distinctness
Global patterns of insect diiversity, distribution and evolutionary distinctnessGlobal patterns of insect diiversity, distribution and evolutionary distinctness
Global patterns of insect diiversity, distribution and evolutionary distinctnessAlison Specht
 
Introduction to haplotype blocks .pptx
Introduction to haplotype blocks .pptxIntroduction to haplotype blocks .pptx
Introduction to haplotype blocks .pptxFatma Sayed Ibrahim
 

Semelhante a Genetic Variation Map from 1,092 Human Genomes (20)

IInvestigation of the genetic basis of adaptation
IInvestigation of the genetic basis of adaptationIInvestigation of the genetic basis of adaptation
IInvestigation of the genetic basis of adaptation
 
Human genetic diversity and origin of major human groups
Human genetic diversity and origin of major human groupsHuman genetic diversity and origin of major human groups
Human genetic diversity and origin of major human groups
 
Population genetics.pptx
Population genetics.pptxPopulation genetics.pptx
Population genetics.pptx
 
Inherited disease II IQ-17 Aug 2010.pptx
Inherited disease II IQ-17 Aug 2010.pptxInherited disease II IQ-17 Aug 2010.pptx
Inherited disease II IQ-17 Aug 2010.pptx
 
CSHL
CSHLCSHL
CSHL
 
gnomAD, LoFs, and drug targets
gnomAD, LoFs, and drug targetsgnomAD, LoFs, and drug targets
gnomAD, LoFs, and drug targets
 
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...
NGS Applications II (UEB-UAT Bioinformatics Course - Session 2.1.3 - VHIR, Ba...
 
The server of the Spanish Population Variability
The server of the Spanish Population VariabilityThe server of the Spanish Population Variability
The server of the Spanish Population Variability
 
Case studies of HTS / NGS applications
Case studies of HTS / NGS applicationsCase studies of HTS / NGS applications
Case studies of HTS / NGS applications
 
Bioinformatics and NGS for advancing in hearing loss research
Bioinformatics and NGS for advancing in hearing loss researchBioinformatics and NGS for advancing in hearing loss research
Bioinformatics and NGS for advancing in hearing loss research
 
Human genome project
Human genome projectHuman genome project
Human genome project
 
Human genome project
Human genome projectHuman genome project
Human genome project
 
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
Variant (SNP) calling - an introduction (with a worked example, using FreeBay...
 
Human genetic variation and its contribution to complex traits
Human genetic variation and its contribution to complex traitsHuman genetic variation and its contribution to complex traits
Human genetic variation and its contribution to complex traits
 
Lisbon genome diversity
Lisbon genome diversityLisbon genome diversity
Lisbon genome diversity
 
Chapter 7 genome structure, chromatin, and the nucleosome (1)
Chapter 7   genome structure, chromatin, and the nucleosome (1)Chapter 7   genome structure, chromatin, and the nucleosome (1)
Chapter 7 genome structure, chromatin, and the nucleosome (1)
 
Mar Gonzales Porta, One gene One transcript, fged_seattle_2013
Mar Gonzales Porta, One gene One transcript, fged_seattle_2013Mar Gonzales Porta, One gene One transcript, fged_seattle_2013
Mar Gonzales Porta, One gene One transcript, fged_seattle_2013
 
Population Genetics_Dr. Ashwin Atkulwar
Population Genetics_Dr. Ashwin AtkulwarPopulation Genetics_Dr. Ashwin Atkulwar
Population Genetics_Dr. Ashwin Atkulwar
 
Global patterns of insect diiversity, distribution and evolutionary distinctness
Global patterns of insect diiversity, distribution and evolutionary distinctnessGlobal patterns of insect diiversity, distribution and evolutionary distinctness
Global patterns of insect diiversity, distribution and evolutionary distinctness
 
Introduction to haplotype blocks .pptx
Introduction to haplotype blocks .pptxIntroduction to haplotype blocks .pptx
Introduction to haplotype blocks .pptx
 

Mais de Grigory Sapunov

AI Hardware Landscape 2021
AI Hardware Landscape 2021AI Hardware Landscape 2021
AI Hardware Landscape 2021Grigory Sapunov
 
What's new in AI in 2020 (very short)
What's new in AI in 2020 (very short)What's new in AI in 2020 (very short)
What's new in AI in 2020 (very short)Grigory Sapunov
 
Artificial Intelligence (lecture for schoolchildren) [rus]
Artificial Intelligence (lecture for schoolchildren) [rus]Artificial Intelligence (lecture for schoolchildren) [rus]
Artificial Intelligence (lecture for schoolchildren) [rus]Grigory Sapunov
 
Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Grigory Sapunov
 
Deep learning: Hardware Landscape
Deep learning: Hardware LandscapeDeep learning: Hardware Landscape
Deep learning: Hardware LandscapeGrigory Sapunov
 
Modern neural net architectures - Year 2019 version
Modern neural net architectures - Year 2019 versionModern neural net architectures - Year 2019 version
Modern neural net architectures - Year 2019 versionGrigory Sapunov
 
AI - Last Year Progress (2018-2019)
AI - Last Year Progress (2018-2019)AI - Last Year Progress (2018-2019)
AI - Last Year Progress (2018-2019)Grigory Sapunov
 
Практический подход к выбору доменно-адаптивного NMT​
Практический подход к выбору доменно-адаптивного NMT​Практический подход к выбору доменно-адаптивного NMT​
Практический подход к выбору доменно-адаптивного NMT​Grigory Sapunov
 
Deep Learning: Application Landscape - March 2018
Deep Learning: Application Landscape - March 2018Deep Learning: Application Landscape - March 2018
Deep Learning: Application Landscape - March 2018Grigory Sapunov
 
Sequence learning and modern RNNs
Sequence learning and modern RNNsSequence learning and modern RNNs
Sequence learning and modern RNNsGrigory Sapunov
 
Введение в Deep Learning
Введение в Deep LearningВведение в Deep Learning
Введение в Deep LearningGrigory Sapunov
 
Введение в машинное обучение
Введение в машинное обучениеВведение в машинное обучение
Введение в машинное обучениеGrigory Sapunov
 
Введение в архитектуры нейронных сетей / HighLoad++ 2016
Введение в архитектуры нейронных сетей / HighLoad++ 2016Введение в архитектуры нейронных сетей / HighLoad++ 2016
Введение в архитектуры нейронных сетей / HighLoad++ 2016Grigory Sapunov
 
Artificial Intelligence - Past, Present and Future
Artificial Intelligence - Past, Present and FutureArtificial Intelligence - Past, Present and Future
Artificial Intelligence - Past, Present and FutureGrigory Sapunov
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Grigory Sapunov
 

Mais de Grigory Sapunov (20)

Transformers in 2021
Transformers in 2021Transformers in 2021
Transformers in 2021
 
AI Hardware Landscape 2021
AI Hardware Landscape 2021AI Hardware Landscape 2021
AI Hardware Landscape 2021
 
NLP in 2020
NLP in 2020NLP in 2020
NLP in 2020
 
What's new in AI in 2020 (very short)
What's new in AI in 2020 (very short)What's new in AI in 2020 (very short)
What's new in AI in 2020 (very short)
 
Artificial Intelligence (lecture for schoolchildren) [rus]
Artificial Intelligence (lecture for schoolchildren) [rus]Artificial Intelligence (lecture for schoolchildren) [rus]
Artificial Intelligence (lecture for schoolchildren) [rus]
 
Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)Transformer Zoo (a deeper dive)
Transformer Zoo (a deeper dive)
 
Transformer Zoo
Transformer ZooTransformer Zoo
Transformer Zoo
 
BERTology meets Biology
BERTology meets BiologyBERTology meets Biology
BERTology meets Biology
 
Deep learning: Hardware Landscape
Deep learning: Hardware LandscapeDeep learning: Hardware Landscape
Deep learning: Hardware Landscape
 
Modern neural net architectures - Year 2019 version
Modern neural net architectures - Year 2019 versionModern neural net architectures - Year 2019 version
Modern neural net architectures - Year 2019 version
 
AI - Last Year Progress (2018-2019)
AI - Last Year Progress (2018-2019)AI - Last Year Progress (2018-2019)
AI - Last Year Progress (2018-2019)
 
Практический подход к выбору доменно-адаптивного NMT​
Практический подход к выбору доменно-адаптивного NMT​Практический подход к выбору доменно-адаптивного NMT​
Практический подход к выбору доменно-адаптивного NMT​
 
Deep Learning: Application Landscape - March 2018
Deep Learning: Application Landscape - March 2018Deep Learning: Application Landscape - March 2018
Deep Learning: Application Landscape - March 2018
 
Sequence learning and modern RNNs
Sequence learning and modern RNNsSequence learning and modern RNNs
Sequence learning and modern RNNs
 
Введение в Deep Learning
Введение в Deep LearningВведение в Deep Learning
Введение в Deep Learning
 
Введение в машинное обучение
Введение в машинное обучениеВведение в машинное обучение
Введение в машинное обучение
 
Введение в архитектуры нейронных сетей / HighLoad++ 2016
Введение в архитектуры нейронных сетей / HighLoad++ 2016Введение в архитектуры нейронных сетей / HighLoad++ 2016
Введение в архитектуры нейронных сетей / HighLoad++ 2016
 
Artificial Intelligence - Past, Present and Future
Artificial Intelligence - Past, Present and FutureArtificial Intelligence - Past, Present and Future
Artificial Intelligence - Past, Present and Future
 
Multidimensional RNN
Multidimensional RNNMultidimensional RNN
Multidimensional RNN
 
Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016Deep Learning and the state of AI / 2016
Deep Learning and the state of AI / 2016
 

Último

A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Genetic Variation Map from 1,092 Human Genomes

  • 1. An integrated map of genetic variation from 1,092 human genomes The 1000 Genomes Project Consortium http://www.1000genomes.org Nature 491, 56–65 (01 November 2012)
  • 2. Primary goal • to create a complete and detailed catalogue of human genetic variations, which in turn can be used for association studies relating genetic variation to disease.
  • 3. Primary goal • to discover >95 % of the variants (e.g. SNPs, CNVs, indels) with minor allele frequencies as low as 1% across the genome and 0.1-0.5% in gene regions • to estimate the population frequencies, haplotype backgrounds and linkage disequilibrium patterns of variant alleles
  • 4. Secondary goals • support of better SNP and probe selection for genotyping platforms in future studies • improvement of the human reference sequence. • the completed database will be a useful tool for studying regions under selection, variation in multiple populations and understanding the underlying processes of mutation and recombination.
  • 5. Project design • to sequence each sample to about 4X coverage; at this depth sequencing cannot provide the complete genotype of each sample, but should allow the detection of most variants with frequencies as low as 1%. • Combining the data from 2500 samples should allow highly accurate estimation (imputation) of the variants and genotypes for each sample that were not seen directly by the light sequencing.
  • 6. Project design / Stages • The 1000 genomes full project has been divided into phases to represent the dispersed nature of the sample collection.
  • 7. Project design / Stages / Pilot Three pilot studies provided data to inform the design of the full-scale project: • Pilot 1: low coverage pilot (2-4X, WGS of 180 samples) • Pilot 2: high coverage pilot (20-60X, WGS of 2 mother-father-adult child trios) • Pilot 3: the exon targeted pilot (50X, 1000 gene regions in 900 samples) The pilot was completed in 2009.
  • 8. Project design / Stages / Phase 1 Phase 1 represents low coverage and exome data analysis available for the first 1092 samples.
  • 9. Project design / Stages / Phase 1 Phase 1 represents low coverage and exome data analysis available for the first 1092 samples. DONE! Results published in Nature 491, 56–65 (01 November 2012)
  • 10. But that's not all!
  • 11. Project design / Stages / Phase 2 • Phase 2 represents an expanded set of samples, around 1700 in number (the sequence data has been finalized). • This data is being used for method development, both to improve on existing methods from phase 1 and to develop new methods to handle features like multi-allelic variant sites and true integration of complex variation and structural variants.
  • 12. Project design / Stages / Phase 3 • Phase 3 represents 2500 samples including new African samples and samples from South Asia. The new methods developed in phase 2 will be applied to this data set and a final catalogue of variation will be released.
  • 13. Amounts of Data • Full genomic sequence of 1,700 individuals is now available (200TB of genomic data).
  • 14. Amounts of Data • > 2 human genomes every 24 hours • 60-fold more sequence data than what has been published in DNA databases over the past 25 years.
  • 15. Samples • 14 populations • 4 Ancestry-based groups
  • 16.
  • 17. Samples / Ancestry-based groups • Europe (IBS (Iberian Populations in Spain), GBR (British from England and Scotland), CEU (Utah residents with ancestry from northern and western Europe), FIN (Finnish in Finland), TSI (Toscani in Italia)); • East Asia (JPT (Japanese in Tokyo, Japan), CHB (Han Chinese in Beijing, China), CHS (Han Chinese South)); • Africa (ASW (African Ancestry in SW USA), YRI (Yoruba in Ibadan, Nigeria), LWK (Luhya in Webuye, Kenya)); • Americas (MXL (Mexican Ancestry in Los Angeles, CA, USA), PUR (Puerto Ricans in Puerto Rico), CLM (Colombians in Medellin, Colombia)).
  • 18. Data • combination of low-coverage (2–6x) whole-genome sequence data, targeted deep (50–100x) exome sequence data and dense SNP genotype data. • the approach was augmented with statistical methods for selecting higher quality variant calls from candidates obtained using multiple algorithms, and to integrate SNP, indel and larger structural variants within a single framework.
  • 19.
  • 20.
  • 21. • A key goal of the 1000 Genomes Project was to identify more than 95% of SNPs at 1% frequency in a broad set of populations. • Our current resource includes ~50%, 98% and 99.7% of the SNPs with frequencies of ~0.1%, 1.0% and 5.0%, respectively, in ~2,500 UK sampled genomes.
  • 22.
  • 23. Genetic variation • 3.60 million single nucleotide polymorphisms (SNPs), of which 24,000 were in GENCODE (coding) regions • 344,000 small indels (440 coding), which gives an indel-to-SNP ratio of about 1:10 in human genomes; the much lower ratio in coding regions (440 to 24,000) demonstrates the strong selection against indels in coding regions. • 717 large deletions (the most confident category of SVs that we currently can detect), of which 39 overlapped GENCODE regions.
  • 24. • Most common variants (94% of variants with frequency ≥5%) were known before the current phase of the project and had their haplotype structure mapped through earlier projects. • Only 62% of variants in the range 0.5–5% and 13% of variants with frequencies of <0.5% had been described previously.
  • 25.
  • 26.
  • 27. • Variants present at 10% and above across the entire sample are almost all found in all of the populations studied. • By contrast, 17% of low-frequency variants in the range 0.5–5% were observed in a single ancestry group, and 53% of rare variants at 0.5% were observed in a single population.
  • 28.
  • 29. • The derived allele frequency distribution shows substantial divergence between populations below a frequency of 40%, such that individuals from populations with substantial African ancestry carry up to three times as many low-frequency variants (0.5–5%) as those of European or East Asian origin, reflecting ancestral bottlenecks in non-African populations.
  • 30. • However, individuals from all populations show an enrichment of rare variants (<0.5% frequency), reflecting recent explosive increases in population size and the effects of geographic differentiation.
  • 31.
  • 32. • Variants present twice across the entire sample (referred to as f2 variants), typically the most recent of informative mutations, are found within the same population in 53% of cases. • However, between-population sharing identifies recent historical connections.
  • 33.
  • 34. • At the most highly conserved coding sites, 85% of non-synonymous variants and more than 90% of stop-gain and splice-disrupting variants are below 0.5% in frequency, compared with 65% of synonymous variants.
  • 35. • Individuals typically carry more than 2,500 non-synonymous variants at conserved positions, of which 20-40 are likely to be damaging (2-5 of which are rare), and 150 loss-of-function variants (splice site variants, stop gains, frameshift indels), of which 10-20 are rare. • 130–400 non-synonymous variants per individual, 10–20 LOF variants, 2–5 damaging mutations, and 1–2 variants identified previously from cancer genome sequencing.
  • 36.
  • 38. • The non-synonymous to synonymous ratio among rare (<0.5%) variants is typically in the range 1–2, and among common variants in the range 0.5–1.5, suggesting that 25–50% of rare non-synonymous variants are deleterious. • However, the segregating rare load among gene groups in KEGG pathways varies substantially.
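One way to read the arithmetic behind the 25–50% figure on the last slide is sketched below. It assumes that the non-synonymous to synonymous ratio among common variants approximates the ratio for variation that selection tolerates, so the excess seen among rare variants is the deleterious share. That assumption, the function name and the choice of which range endpoints to pair are illustrative only; the slides do not state how the consortium derived the figure.

```python
def excess_nonsynonymous_fraction(rare_ratio: float, common_ratio: float) -> float:
    """Share of rare non-synonymous variants above the common-variant baseline.

    Assumption (not from the slides): the NS/S ratio among common variants is
    what tolerated variation looks like, so 1 - common/rare is the excess.
    """
    if rare_ratio <= 0:
        raise ValueError("rare_ratio must be positive")
    # Clamp at zero: a rare ratio below the common ratio implies no excess.
    return max(0.0, 1.0 - common_ratio / rare_ratio)


# Hypothetical pairings taken from inside the ranges quoted on the slide
# (rare: 1-2, common: 0.5-1.5).
for rare, common in [(2.0, 1.0), (2.0, 1.5), (1.0, 0.5)]:
    share = excess_nonsynonymous_fraction(rare, common)
    print(f"rare NS/S={rare}, common NS/S={common}: excess share={share:.2f}")
```

Under this reading, pairings such as 2 versus 1 or 2 versus 1.5 give shares of 0.50 and 0.25, matching the two ends of the range quoted on the slide.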