This document discusses the evolution of human genetic studies of craniofacial disorders such as orofacial clefts. It describes how studies have progressed from descriptive family studies and Mendelian genetics in the early 20th century to modern genome-wide association studies and genomic sequencing. Key developments include the ability to collect and analyze DNA from families to study linkage, perform candidate gene studies and genome-wide linkage analyses. The advent of widespread genomic sequencing in 2000 enabled unbiased genome-wide association studies to identify risk loci rather than genes. One example described used a GWAS to identify multiple gene loci associated with the age of emergence of primary teeth.
2. Objectives:
To explain the evolution of human genetic studies of
craniofacial disorders including linkage and genome
wide association studies.
3. Evolution of Human Genetic Studies
on Orofacial Clefts
Pregenetics Era up to Mendel
Dawn of Genetics – 1900-1910
Genetics Era (Pregenomics) – 1910-2000
Genomics Era – Human genome
sequence 2000 (21st century)
Marazita, M. L., 2012. The evolution of human genetic studies of cleft lip and cleft palate. Annu Rev Genomics Hum Genet. 13, 263-83.
4. The Causes of Orofacial Clefting in
the Pregenetics era
Folklore explanations
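Exposure to an eclipse during pregnancy was believed to lead to OFC
Eating rabbit during pregnancy was believed to lead to CL
Descriptive family studies - the first was published in 1757 and
concerned a family with several affected members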
Marazita, M. L., 2012. The evolution of human genetic studies of cleft lip and cleft palate. Annu Rev Genomics Hum Genet. 13, 263-83.
5. Genetics Era (Pregenomics)
Early identification of cases with orofacial clefts and
application of statistical methods to estimate frequency
of occurrence
Systematic development of family data sets consisting of
cases and their unaffected parents and siblings. At a
minimum a triad is desirable (parents and affected
offspring)
Multifactorial threshold model to explain OFC because
clefting does not follow the pattern of a single gene
mutation.
Segregation analysis in the 1970s
Marazita, M. L., 2012. The evolution of human genetic studies of cleft lip and cleft palate. Annu Rev Genomics Hum Genet. 13, 263-83.
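The multifactorial threshold model on this slide can be illustrated with a small simulation. This is only a sketch under assumed values: the number of loci, the allele frequency of 0.5, the size of the environmental term and the cut-off of 2 standard deviations are all hypothetical and are not taken from Marazita (2012). Each person's liability is the sum of many small genetic contributions plus an environmental term, and only people whose liability exceeds the threshold are affected.

```python
import random
import statistics


def simulate_prevalence(n_people=5000, n_loci=20, env_sd=2.0,
                        threshold_sd=2.0, seed=1):
    """Return the fraction of simulated people whose liability exceeds a threshold.

    All default values are illustrative assumptions, not estimates for OFC.
    """
    rng = random.Random(seed)
    liabilities = []
    for _ in range(n_people):
        # Genetic part: count risk alleles over 2 copies of each locus,
        # each copy carrying the risk allele with probability 0.5.
        risk_alleles = sum(1 for _ in range(2 * n_loci) if rng.random() < 0.5)
        # Environmental part: a normally distributed contribution.
        liabilities.append(risk_alleles + rng.gauss(0.0, env_sd))

    # The threshold sits a fixed number of standard deviations above the mean.
    cutoff = (statistics.mean(liabilities)
              + threshold_sd * statistics.pstdev(liabilities))
    affected = sum(1 for value in liabilities if value > cutoff)
    return affected / n_people


if __name__ == "__main__":
    print("Simulated prevalence:", simulate_prevalence())
```

No single locus decides the outcome here, which is why a trait of this kind does not segregate in families like a single-gene mutation.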
6. Collection of DNA from families
Once DNA was banked it was possible to
study the linkage of the trait (i.e. normal or
abnormal traits can be studied)
Candidate genes (association and linkage
studies)
Genome-wide linkage studies
7. What Has Changed in the last 15
years?
A draft of the human genome sequence was completed in 2000. The
genome encodes roughly 19,000 protein-coding genes.
Each gene is composed of introns and exons as well as a
regulatory region.
Mutations can occur anywhere in the genome. How can
we find them? It is like looking for a needle in a haystack.
Advances in genetic technologies over the past decade
and specifically in the way we analyze genomes allow the
detection of mutations with higher efficiency and
accuracy.
Marazita, M. L., 2012. The evolution of human genetic studies of cleft lip and cleft palate. Annu Rev Genomics Hum Genet. 13, 263-83.
8. Even with single gene mutations,
genotype-phenotype correlations are
difficult to predict
Variable expressivity - Variation in the degree to which tissues
are affected among individuals carrying the same mutation.
Incomplete penetrance - Some carriers of the mutation show no clinical
signs, so presentation ranges from “full blown” to apparently unaffected.
Animal models (particularly mouse models), provide many
spectacular examples of phenotypic variation due to the influence of
epigenetic factors. Embryos with the identical genetic mutation can
have a wide range of phenotypes (see below).
• Liu W, Selever J, Murali D, Sun X, Brugger SM, Ma L, Schwartz RJ, Maxson R, Furuta Y, Martin JF. 2005. Threshold-specific requirements for Bmp4 in mandibular development. Dev Biol 283:282-293.
9. Linkage – an unbiased approach
Genetic linkage analysis is a powerful tool to detect the
chromosomal location of disease genes.
It is based on the observation that genes that reside
physically close on a chromosome remain linked during
meiosis.
The aim of linkage analysis is to identify a marker that co-
segregates with the gene of interest and so can be used
to track the gene within a family without actually knowing
the mutation.
Stefan M. Pulst, 1999. Genetic Linkage Analysis. Arch Neurol. 56, 667-672
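Evidence for co-segregation of a marker with a disease gene is usually summarized as a LOD score. The sketch below assumes the simplest textbook case, in which every meiosis can be scored as recombinant or non-recombinant; the function name and the example counts are hypothetical. It compares the likelihood of the data at a recombination fraction theta with the likelihood under free recombination (theta = 0.5).

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD score for phase-known meioses at recombination fraction theta.

    LOD = log10( L(theta) / L(0.5) ), where
    L(theta) = theta**recombinants * (1 - theta)**(meioses - recombinants).
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must be in the interval (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")

    non_recombinants = meioses - recombinants
    log_l_theta = (recombinants * math.log10(theta)
                   + non_recombinants * math.log10(1.0 - theta))
    log_l_null = meioses * math.log10(0.5)
    return log_l_theta - log_l_null


if __name__ == "__main__":
    # Hypothetical family data: 1 recombinant observed in 10 informative meioses.
    for theta in (0.05, 0.1, 0.2, 0.3, 0.5):
        print(theta, round(lod_score(1, 10, theta), 3))
```

By convention a LOD score of 3 or more is taken as evidence for linkage, so scores from several families are typically added together before drawing a conclusion.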
10. Genomics era now permits
scanning of the entire genome for
variation in an unbiased manner
Genome-wide association studies
(GWAS)
http://www.genome.gov/20019523
Marazita, M. L., 2012. The evolution of human genetic studies of cleft lip and cleft palate. Annu Rev Genomics Hum Genet. 13, 263-83.
11. Two approaches are used for finding gene
mutations in coding regions
Whole exome sequencing - unbiased; all exonic sequences
are studied
Sequencing of candidate genes that are thought to cause
disease. This is biased and may not discover the causative
genes.
Methodology
12. Methods to find gene mutations in
either non-coding or coding regions
Start with a group of individuals that have a trait such as
late eruption of primary teeth
Comparative genomic hybridization - unbiased
Whole genome sequencing - unbiased
Single nucleotide polymorphism analysis – can be biased
to certain SNPs already found to be associated with
disease or unbiased
Genome wide association studies – unbiased
13. Genome-wide association studies
Genome-wide association studies identify loci and not
genes per se and cannot easily identify loci at which there
are many rare risk alleles in any given population. Rather,
this approach is designed to find loci that fit the common
disease–common variant hypothesis of human disease.
This approach relies on the foundation of data produced
by the International Human HapMap Project and the fact
that genetic variance at one locus can predict with high
probability genetic variance at an adjacent locus
John Hardy and Andrew Singleton, 2009. Genomewide Association Studies and Human Disease. N Engl J Med 360, 1759-68
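The statement that variation at one locus can predict variation at an adjacent locus is usually quantified as linkage disequilibrium, often with the statistic r². The sketch below is a generic illustration, not the HapMap project's own software; the function name and the example frequencies are hypothetical. It takes the frequency of the haplotype carrying allele A at the first locus and allele B at the second, plus the two allele frequencies.

```python
def ld_r_squared(p_ab, p_a, p_b):
    """Return r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying both allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    # D measures how far the observed haplotype frequency departs from
    # what independence of the two loci would predict.
    d = p_ab - p_a * p_b
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator == 0.0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denominator


if __name__ == "__main__":
    # Hypothetical frequencies: the two alleles always travel together.
    print(ld_r_squared(p_ab=0.3, p_a=0.3, p_b=0.3))
    # Hypothetical frequencies: the two loci are independent.
    print(ld_r_squared(p_ab=0.09, p_a=0.3, p_b=0.3))
```

An r² close to 1 means a genotyped SNP is a good proxy for its neighbour, which is why an associated SNP marks a locus rather than a specific causal gene.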
14. Genome-wide association studies
Benefits
No initial hypothesis is required
Uses digital and additive data
Encourages the formation of collaborative consortia
Rules out specific genetic associations
Provides data on the ancestry of each subject, which
assists in matching case subjects with control
subjects
Provides data on both sequence and copy-number
variations
John Hardy and Andrew Singleton, 2009. Genomewide Association Studies and Human Disease. N Engl J Med 360, 1759-68
15. Genome-wide association studies
Misconceptions:
Thought to provide data on all genetic variability
associated with disease, when in reality only common
alleles with large effects are identified
Thought to screen out alleles with a small effect size,
when in reality such findings may still be very useful in
determining pathogenic biochemical pathways, even
though low-risk alleles may be of little predictive value
John Hardy and Andrew Singleton, 2009. Genomewide Association Studies and Human Disease. N Engl J Med 360, 1759-68
16. Other challenges of GWAS studies
Finds loci, not genes, which can complicate the
identification of pathogenic changes on an associated
haplotype
Detects only alleles that are common (>5%) in a
population
John Hardy and Andrew Singleton, 2009. Genomewide Association Studies and Human Disease. N Engl J Med 360, 1759-68
Class, what is a locus (plural: loci)?
17. Experimental design challenges of
GWAS
Large sample sizes require patient recruitment from many
international sites
Analysis is complex - large-scale computing and complex
statistical analysis are required to analyze the data
Cost of obtaining and preparing DNA is low but cost
of sequencing is still relatively high
Complex traits that are caused by variation in multiple genes
and interactions with the environment are still difficult to study
Ethnicity affects the variation in DNA sequence, therefore it is
important to draw controls from the same population as those
with the condition.
Replication in different populations is important
18. How Does It Work?
http://www.genome.gov/20019523
19. An example where GWAS was used
to find gene variation correlated
with age of emergence of the first
primary tooth
Pillas, et al., 2010. Genome-wide association study reveals multiple loci associated with primary tooth development during infancy. PLoS Genet. 6, e1000856.
20. Study design
DNA was collected from two cohorts of children, one from
the UK and one from Finland
Time of first tooth eruption & number of teeth by one year
of age were recorded by the parents in a questionnaire
(UK) or by observations made by public health
professionals (Finland)
4,564 individuals from the Northern Finland Birth Cohort
(NFBC 1966) and 1,396 individuals from the Avon
Longitudinal Study of Parents and Children in Bristol, UK
(ALSPAC)
Tested 300,766 SNPs from two cohorts
Pillas, et al., 2010. Genome-wide association study reveals multiple loci associated with primary tooth development during infancy. PLoS Genet. 6, e1000856.
21. Results
Linkage disequilibrium plots showing several genes
that reached genome-wide significance
EDA mutations cause
Ectodermal Dysplasia
Pillas, et al., 2010. Genome-wide association study reveals multiple loci associated with primary tooth development during infancy. PLoS Genet. 6, e1000856.
22. Candidate genes in the top loci
Pillas, et al., 2010. Genome-wide association study reveals multiple loci associated with primary tooth development during infancy. PLoS Genet. 6, e1000856.
23. GWAS has identified a strong genetic
component to the variation in timing
of primary tooth emergence
Since primary teeth develop starting in the late embryonic
period (6-7 weeks) their timing of emergence is less
affected by postnatal influences
Emergence of permanent teeth may not show as strong a
signal for genetic control
Other local factors including function, caries, space loss,
pulpal necrosis and the child wiggling their teeth could
also affect timing of permanent tooth emergence
24. Next week, we will see how GWAS
has been used to identify regions in
the genome that are associated with
increased risk of non-syndromic
orofacial clefting
25. References
Cox, T. C., Luquetti, D. V., Cunningham, M. L., 2013. Perspectives and challenges in
advancing research into craniofacial anomalies. Am J Med Genet C Semin Med Genet.
163C, 213-7.
Marazita, M. L., 2012. The evolution of human genetic studies of cleft lip and cleft
palate. Annu Rev Genomics Hum Genet. 13, 263-83.
Liu W, Selever J, Murali D, Sun X, Brugger SM, Ma L, Schwartz RJ, Maxson R, Furuta Y,
Martin JF. 2005. Threshold-specific requirements for Bmp4 in mandibular development.
Dev Biol 283:282-293.
Stefan M. Pulst, 1999. Genetic Linkage Analysis. Arch Neurol. 56, 667-672
John Hardy and Andrew Singleton, 2009. Genomewide Association Studies and
Human Disease. N Engl J Med 360, 1759-68
Pillas, D., Hoggart, C. J., Evans, D. M., O'Reilly, P. F., Sipila, K., Lahdesmaki, R.,
Millwood, I. Y., Kaakinen, M., Netuveli, G., Blane, D., Charoen, P., Sovio, U., Pouta, A.,
Freimer, N., Hartikainen, A. L., Laitinen, J., Vaara, S., Glaser, B., Crawford, P., Timpson,
N. J., Ring, S. M., Deng, G., Zhang, W., McCarthy, M. I., Deloukas, P., Peltonen, L.,
Elliott, P., Coin, L. J., Smith, G. D., Jarvelin, M. R., 2010. Genome-wide association
study reveals multiple loci associated with primary tooth development during infancy.
PLoS Genet. 6, e1000856.
Editor's Notes
Pregenetics era:
Comprises the years leading up to approximately 1900, when Mendel’s research was rediscovered, translated, and championed by Bateson
Dawn of Genetics:
Genetics as a scientific discipline began in the early 1900s, with the debates of Mendel proponents such as Bateson against Pearson and the other members of the Galton laboratory.
Genetic Era:
The genetics era (pregenomics) began in approximately 1910 with the development of a wide range of statistical tools for genetic analysis in families
Genomics Era:
Following the successful completion of the Human Genome Project (HGP) in 2000, the twenty-first century has clearly been the genomics era, distinguished from the genetics era by the concentration on research tools made possible by the HGP and/or required for maximal utilization of (single-gene) traits. The new genomics tools are ideal for dissecting common, complex (non-single-gene) traits such as OFC, and OFC researchers have embraced these tools.
Folklore:
The Aztecs believed that eclipses occurred because a bite had been taken out of the moon, and so exposure to an eclipse during pregnancy
could lead to OFC
In China there was a common belief that eating rabbit during pregnancy could lead to a “hare lip” (an outmoded term for CL)
Descriptive family studies:
Since the discovery that OFC is familial, descriptive or observational family studies have been published; the first publication
came in 1757, and concerned a family with several affected members.
More than a century later, Darwin cited a paper by Sproule and mentioned “the transmission during a century of hare-lip with a cleft-palate” in his discussion of variation in plants and animals
EXPLAIN FURTHER how no DNA was collected in the pregenomics era, see section 3.3.4 and give examples of other non-DNA based genetic markers
The elucidation of DNA as the genetic molecule midway through the twentieth century led to the field of molecular genetics and eventually genomics.
Early molecular genetics advances were important for OFC because of the development of improved genetic markers (such as restriction fragment length polymorphisms and microsatellites) for family studies of candidate genes and for genome scans
Initial studies of candidate genes for OFC and other traits were done by utilizing non-DNA-based genetic markers such as the ABO blood group and Human leukocyte antigen HLA (assessed serologically) and employed statistical methods such as association analysis in case/control series or linkage analysis in multiplex families or affected relative pairs.
(I – K) Ventral views of the isolated mandible of wild type (I) and mutants (J, K). Arrowhead denotes the abnormal distal mandible in the mild mutant
In the more severe, Class 1 mutants, the majority of the mandible was absent with a small remnant of uncertain identity located near the base of the skull (Figs. 2C, D, J)
In the less severe Class 2 mutants, the proximal mandible was formed but distally truncated with no evidence of incisor tooth formation (Figs. 2I – K). From these data, we conclude that Bmp4, although not the sole functioning Bmp ligand in the distal mandible, provides a critical signal for mandibular morphogenesis.
Unbiased because:
analysed carefully to avoid any bias that could result in a false linkage prediction
Rules out specific genetic associations
Provides data on the ancestry of each subject, which assists in matching case subjects with control subjects
Provides data on both sequence and copy-number variations
Two genetic loci are said to be in linkage if the alleles at the two loci are so close together on the same chromosome that the chances of them separating by a crossover event (recombination) during Meiosis is small (http://www.practical-haemostasis.com/Genetics/linkage_analysis1.html)
A genome is an organism's complete set of DNA, including all of its genes. Each genome contains all of the information needed to build and maintain that organism. In humans, a copy of the entire genome—more than 3 billion DNA base pairs—is contained in all cells that have a nucleus.
An allele is a variant form of a gene. Some genes have a variety of different forms, which are located at the same position, or genetic locus, on a chromosome. Humans are called diploid organisms because they have two alleles at each genetic locus, with one allele inherited from each parent.
Unbiased because:
A priori, no hypothesis is required
Genome-wide association studies
A genome-wide association study is an approach that involves rapidly scanning markers across the complete sets of DNA, or genomes, of many people to find genetic variations associated with a particular disease. (http://www.genome.gov/20019523)
Comparison of GWAS and genomewide linkage results
Association studies are more sensitive than linkage studies in detecting common variants of small effect size, but linkage studies are more robust in detecting etiologic genes that exhibit allelic heterogeneity—if multiple different variants (especially rare variants) within a gene can lead to OFC, linkage is much more likely to detect such genes.
Study samples differ for the two approaches: the linkage studies were enriched in familial cases, which make up approximately 20%–30% of CL/P samples, and the association studies were enriched in sporadic cases.
Sequencing and -omics in OFC
Researchers are currently applying both whole-genome and whole-exome sequencing to studies of common, complex traits, including OFC, now that the costs have become more reasonable.
There are also other “-omics” approaches including epigenomics, proteomics, and metabolomics that will be available in the future
Phenotyping/phenomics
Describing what is observed (phenotype)
Comparative genomic hybridization is a molecular cytogenetic method for analysing copy number variations (CNVs) relative to ploidy level in the DNA of a test sample compared to a reference sample, without the need for culturing cells.
Whole genome sequencing is a laboratory process that determines the complete DNA sequence of an organism's genome at a single time. This entails sequencing all of an organism's chromosomal DNA as well as DNA contained in the mitochondria and, for plants, in the chloroplast.
Single nucleotide polymorphism is a variation in a single nucleotide which may occur at some specific position in the genome, where each variation is present to some appreciable degree within a population (e.g. >1%).
A genome-wide association study is an examination of many common genetic variants in different individuals to see if any variant is associated with a trait. GWASs typically focus on associations between single-nucleotide polymorphisms (SNPs) and traits like major diseases.
Although genomewide association studies are increasingly popular, they present logistical and technical challenges. The primary challenge lies in selecting a disease or a trait suitable for analysis. A successful analysis is more likely when the phenotype of interest can be sensitively and specifically diagnosed or measured. For such studies, extremely large sample series are required, involving thousands of case subjects and control subjects
A locus (plural: loci) is the specific location or position of a gene or DNA sequence on a chromosome.
DEALING WITH THE CHALLENGES POSED BY PHENOTYPIC VARIABILITY
Even with the extraordinary power to detect causative mutations that is afforded by different genetic technologies, we still cannot predict who will present with a particular craniofacial condition even when carrying an identified causative mutation.
Understanding the major factors that contribute to these issues endures as one of the biggest challenges in craniofacial medicine.
A single nucleotide polymorphism, often abbreviated to SNP (pronounced snip; plural snips), is a variation in a single nucleotide which may occur at some specific position in the genome, where each variation is present to some appreciable degree within a population (e.g. >1%).
How Does It Work ?
To carry out a genome-wide association study, researchers use two groups of participants: people with the disease being studied and similar people without the disease. Researchers obtain DNA from each participant, usually by drawing a blood sample or by rubbing a cotton swab along the inside of the mouth to harvest cells.
Each person's complete set of DNA, or genome, is then purified from the blood or cells, placed on tiny chips and scanned on automated laboratory machines. The machines quickly survey each participant's genome for strategically selected markers of genetic variation, which are called single nucleotide polymorphisms, or SNPs.
If certain genetic variations are found to be significantly more frequent in people with the disease compared to people without disease, the variations are said to be "associated" with the disease. The associated genetic variations can serve as powerful pointers to the region of the human genome where the disease-causing problem resides.
However, the associated variants themselves may not directly cause the disease. They may just be "tagging along" with the actual causal variants. For this reason, researchers often need to take additional steps, such as sequencing DNA base pairs in that particular region of the genome, to identify the exact genetic change involved in the disease.
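The comparison described above, whether a variant is significantly more frequent in people with the disease than in people without it, can be sketched for a single SNP as a 2x2 allele-count test. This is a minimal illustration under stated assumptions: the counts are invented, the function name is hypothetical, and a real GWAS would repeat a test of this kind for every SNP with adjustment for ancestry and other covariates.

```python
import math


def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    Returns (chi_square, p_value, odds_ratio).
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    row_col_product = (a + b) * (c + d) * (a + c) * (b + d)
    if row_col_product == 0 or b * c == 0:
        raise ValueError("every row and column needs non-zero counts")

    chi_square = n * (a * d - b * c) ** 2 / row_col_product
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    odds_ratio = (a * d) / (b * c)
    return chi_square, p_value, odds_ratio


if __name__ == "__main__":
    # Invented counts: 500 cases and 500 controls, two alleles per person.
    chi_square, p_value, odds_ratio = allelic_association(300, 700, 200, 800)
    print(f"chi-square = {chi_square:.2f}, p = {p_value:.2e}, OR = {odds_ratio:.2f}")
```

A small p-value says only that the allele is associated with case status; as the notes point out, the tested SNP may simply be tagging along with the causal variant.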
Specifically tested for associations with time of first tooth eruption & number of teeth by one year of age
These phenotypes are relevant to later tooth development because teeth largely acquire their final form at a very early age
Tested 300,766 SNPs from two cohorts:
4,564 individuals from the Northern Finland Birth Cohort (NFBC 1966)
1,518 individuals from the Avon Longitudinal Study of Parents and Children (ALSPAC)
Results for the two cohorts were combined
DNA samples as well as age of first tooth appearance were available for 4,564 individuals from the Northern Finland Birth Cohort (NFBC 1966)
Review the presentation by Charis, Tooth Development II, Objective 2, to find out which syndrome is caused by EDA mutations
EDA was fundamental in forming the first teeth in organism
Mutations cause hypohidrotic ectodermal dysplasia & disorders of tooth agenesis
Figure 1. Linkage disequilibrium and association at loci reaching genome-wide significance for primary tooth development in meta-analysis of NFBC1966 and ALSPAC.
KCNJ2 gene region for time to first tooth eruption
EDA gene region for time to first tooth.
MSRB3 gene region for time to first tooth.
IGF2BP1 gene region for number of teeth at 12 months (SNP with high P at 44000 kb is that near HOXB2, rs6504340). Note: This is a gene-rich region, so most genes are omitted to simplify the plot.
RAD51L1 gene region for number of teeth at 12 months. -log10 p-value is plotted against genomic position (NCBI build 36). Most significant SNP in each region is plotted in blue, r2 with top SNP is colour coded red (0.8 – 1.0), orange (0.5 – 0.8), yellow (0.2 – 0.5), and white (<0.2). Gene annotations are based on Genome Browser (RefSeq Genes) and arrows represent direction of transcription. Recombination rate is estimated by LDhat using HapMap CEU sample. All r2 values are calculated in NFBC1966.
Five genetic loci were identified at genome-wide significance (P < 5x10^-8)
The genes at the loci identified have roles in organogenesis, growth & development, and cancer
KCNJ2 – teeth, jaws, palates, ears, fingers, toes
EDA – teeth, hair, sweat glands, salivary glands
IGF2BP1 – intestines
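The genome-wide significance level of 5x10^-8 quoted in these notes is commonly explained as a Bonferroni correction of a 0.05 error rate for roughly one million independent tests of common variants. The sketch below shows that arithmetic; the function name is hypothetical, and applying the same correction to the 300,766 SNPs tested here is shown only for comparison, not as the threshold the study authors used.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


if __name__ == "__main__":
    # Conventional assumption: about one million independent common-variant tests.
    print(bonferroni_threshold(0.05, 1_000_000))
    # For comparison: correcting only for the SNPs genotyped in this study.
    print(bonferroni_threshold(0.05, 300_766))
```

Because so many SNPs are tested at once, a p-value that would look convincing in a single-gene study is far from sufficient in a GWAS.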
A Southern blot is a method used in molecular biology for detection of a specific DNA sequence in DNA samples. Southern blotting combines transfer of electrophoresis-separated DNA fragments to a filter membrane and subsequent fragment detection by probe hybridization.
A Western blot is an adaptation of the Southern blot procedure, used to identify specific amino-acid sequences in proteins.
Deoxyribonucleic acid (DNA) is a molecule that carries most of the genetic instructions used in the development, functioning and reproduction of all known living organisms and many viruses.
The HapMap:
Is a catalog of common genetic variants that occur in human beings.
It describes what these variants are, where they occur in our DNA, and how they are distributed among people within populations and among populations in different parts of the world.