Presentation made to the staff of Keygene, NV, in Wageningen, The Netherlands.
Genomic selection and systems biology – lessons from dairy cattle breeding
1. J. B. Cole
Animal Improvement Programs Laboratory
Agricultural Research Service, USDA
Beltsville, MD 20705-2350, USA
john.cole@ars.usda.gov
Keygene N.V., Wageningen, The Netherlands, 28 May 2013
2. Dairy Cattle
9 million cows in US
Attempt to have a calf born every year
Replaced after 2 or 3 years of milking
Bred using artificial insemination
Popular bulls have 10,000+ progeny
Cows can have many progeny through
superovulation and embryo transfer
3. Lifecycle of bull
Parents selected
Dam inseminated
Embryo transferred to recipient
Bull born
Semen collected (1 y)
Daughters born (9 m later)
Daughters have calves (2 y later)
Bull receives progeny test (5 y)
Genomic test
4. Phenotypes recorded
Monthly recording
Milk, fat, and protein yields
Somatic cell count (udder health)
Visual appraisal for type traits
Breed associations record pedigree
Calving difficulty and stillbirth
5. Available data
Type of data                  Number of records
Cows with lactation data      28,394,976
Lactations                    68,373,863
Individual test days          508,574,532
Dystocia records              20,770,758
Animals in pedigree file      58,893,009
Genotyped bulls               105,654
Genotyped cows                276,173
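As a quick consistency check on the counts above, the two genotype rows can be summed and the lactation rows divided. This is only an illustrative sketch; the variable names are not from the presentation.

```python
# Counts copied from the "Available data" slide.
genotyped_bulls = 105_654
genotyped_cows = 276_173
lactations = 68_373_863
cows_with_lactation_data = 28_394_976

# Total genotyped animals and mean number of lactations per recorded cow.
total_genotyped = genotyped_bulls + genotyped_cows
lactations_per_cow = lactations / cows_with_lactation_data

print(f"Genotyped animals: {total_genotyped:,}")
print(f"Mean lactations per cow with data: {lactations_per_cow:.2f}")
```

The total agrees with the 381,827 genotyped animals quoted with the genotype chart.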
6. Many animals have been genotyped
[Figure: number of genotypes (y-axis, 0 to 300,000) for bulls and cows by evaluation date (YYMM), 1004 through 1304]
381,827 genotyped animals
7. How does genetic selection work?
ΔG = genetic gain each year
reliability = how certain we are about our estimate of an animal’s genetic merit (genomics can ↑)
selection intensity = how “picky” we are when making mating decisions (management can ↑)
genetic variance = variation in the population due to genetics (we can’t really change this)
generation interval = time between generations (genomics can ↓)
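The slide's formula itself did not survive extraction. A minimal sketch, assuming it is the standard breeder's equation, where accuracy is the square root of reliability and the genetic standard deviation is the square root of genetic variance. The function name and all numbers are hypothetical.

```python
import math

def genetic_gain_per_year(reliability, selection_intensity,
                          genetic_variance, generation_interval):
    """Breeder's equation: yearly gain = i * r * sigma_g / L."""
    accuracy = math.sqrt(reliability)          # r: correlation of estimated and true merit
    genetic_sd = math.sqrt(genetic_variance)   # sigma_g
    return selection_intensity * accuracy * genetic_sd / generation_interval

# Hypothetical numbers, chosen only to show the effect of a shorter interval.
progeny_test = genetic_gain_per_year(0.64, 2.0, 100.0, 5.0)
genomic = genetic_gain_per_year(0.64, 2.0, 100.0, 2.5)
print(progeny_test, genomic)
```

With everything else held fixed, halving the generation interval doubles the yearly gain, which is why the interval term matters so much for genomics.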
8. Calculation of genomic evaluations
Deregressed PTA derived from traditional
evaluations of predictor animals
Allele substitution effects estimated for
45,188 SNP
Polygenic effect estimated for genetic
variation not captured by SNP
Selection index combines genomic information with
traditional information not included in the genomic evaluation
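A minimal sketch of how the SNP and polygenic pieces listed above might combine for one animal, assuming genotypes coded as 0, 1, or 2 copies of an allele and purely additive effects. All values are made up. The slide does not describe the actual coding, the centering by allele frequency, or the index weights, so none of that is modeled here.

```python
def direct_genomic_value(genotypes, substitution_effects, polygenic_effect=0.0):
    """Sum allele substitution effects over SNP, then add the polygenic term."""
    if len(genotypes) != len(substitution_effects):
        raise ValueError("need exactly one effect per SNP")
    snp_part = sum(g * a for g, a in zip(genotypes, substitution_effects))
    return snp_part + polygenic_effect

# Toy example with 4 SNP instead of 45,188.
genotypes = [0, 1, 2, 1]            # hypothetical allele counts
effects = [0.5, -0.2, 0.1, 0.3]     # hypothetical substitution effects
value = direct_genomic_value(genotypes, effects, polygenic_effect=0.4)
print(round(value, 3))
```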
9. Many chips are available
[Figure: chip images labeled HD, 50K V2, LD, and GGP HD]
BovineSNP50
− Version 1: 54,001 SNP
− Version 2: 54,609 SNP
− 45,188 used in evaluations
High-density (HD)
− 777,962 SNP
− Only 50K SNP used
Low-density (LD)
− 6,909 SNP
GeneSeek Genomic Profiler (GGP) & GGP-HD
10. What is a SNP genotype worth?
For protein yield (h² = 0.30), the SNP genotype provides information equivalent to an additional 34 daughters
Pedigree is equivalent to information on about 7 daughters
11. What is a SNP genotype worth?
And for daughter pregnancy rate (h² = 0.04), SNP = 131 daughters
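One way to put daughter equivalents such as the 34 and 131 above on a reliability scale is the textbook progeny-test relationship REL = n / (n + k), with k = (4 − h²) / h². This is a sketch under that assumption and it ignores the pedigree contribution. The slides do not say how the daughter counts were derived.

```python
def reliability_from_daughters(n_daughters, heritability):
    """Reliability of a sire evaluation based on n daughters with one record each."""
    k = (4.0 - heritability) / heritability
    return n_daughters / (n_daughters + k)

protein = reliability_from_daughters(34, 0.30)   # protein yield
dpr = reliability_from_daughters(131, 0.04)      # daughter pregnancy rate
print(f"protein: {protein:.2f}, DPR: {dpr:.2f}")
```

Under this formula a low-heritability trait needs far more daughters to reach the same reliability, which is consistent with the larger daughter count quoted for daughter pregnancy rate.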
12. High-density SNP chip
Currently only 50K subset of SNP used
Some increase in accuracy from better
tracking of QTL possible
Realized gains have been small
Potential for across-breed evaluations
Requires few new HD genotypes once
adequate base for imputation developed
13. Low-density SNP chip
6,909 SNP mostly from SNP50 chip
Evenly spaced across 30 chromosomes
Addresses performance issues with 3K
while providing low-cost genotyping
Provides over 98% accuracy imputing
50K genotypes
14. Parentage validation and discovery
Parent-progeny conflicts detected
Animal checked against all other genotypes
Reported to breeds and requesters
Correct sire usually detected
Maternal grandsire checking
SNP at a time checking
Haplotype checking more accurate
Breeds moving to accept SNP in place of
microsatellites
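SNP-at-a-time parentage checking is commonly done by counting opposite homozygotes, since a parent with genotype 0 cannot have progeny with genotype 2 at the same SNP. A minimal sketch, assuming 0/1/2 allele-count coding with None for missing calls. The 1% threshold is a hypothetical placeholder, not the value used in the US evaluations.

```python
def count_conflicts(parent, progeny):
    """Count SNP where parent and progeny are opposite homozygotes (0 vs 2)."""
    conflicts = 0
    for p, o in zip(parent, progeny):
        if p is None or o is None:      # skip missing calls
            continue
        if abs(p - o) == 2:
            conflicts += 1
    return conflicts

def is_plausible_parent(parent, progeny, max_conflict_rate=0.01):
    """Accept the parent if the conflict rate over called SNP is low enough."""
    called = sum(1 for p, o in zip(parent, progeny)
                 if p is not None and o is not None)
    if called == 0:
        return False
    return count_conflicts(parent, progeny) / called <= max_conflict_rate

calf = [1, 1, 2, 1, 0, 2]
true_sire = [0, 1, 2, 2, 0, 1]
wrong_sire = [2, 1, 0, 0, 2, 1]
print(count_conflicts(true_sire, calf), count_conflicts(wrong_sire, calf))
```

Parent discovery is then the same test run against every other genotype on file, keeping the candidates that pass.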
15. Imputation
Based on splitting the genotype into
individual chromosomes
Missing SNP assigned by tracking
inheritance from ancestors and
descendants
Imputed dams increase predictor
population
3K, LD, & 50K genotypes merged by
imputing SNP not on LD or 3K
16. Genotypes and haplotypes
Genotypes indicate how many copies of
each allele were inherited
Haplotypes indicate which alleles are on
which chromosome
Observed genotypes partitioned into the
two unknown haplotypes
Pedigree haplotyping uses relatives
Population haplotyping finds matching
allele patterns
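The partitioning problem can be made concrete with a small sketch, assuming genotypes coded as 0/1/2 allele counts and haplotypes as 0/1 alleles. It enumerates every unordered haplotype pair consistent with one genotype. It is not how findhap or any production program works; it only shows why heterozygous SNP create ambiguity that relatives or population patterns must resolve.

```python
from itertools import product

def haplotype_pairs(genotype):
    """List unordered haplotype pairs whose allele counts sum to the genotype."""
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for choice in product((0, 1), repeat=len(het_sites)):
        hap1 = [g // 2 for g in genotype]   # homozygous sites: 0 -> 0, 2 -> 1
        hap2 = list(hap1)
        for site, allele in zip(het_sites, choice):
            hap1[site] = allele             # one haplotype gets the allele,
            hap2[site] = 1 - allele         # the other gets its complement
        pairs.add(tuple(sorted((tuple(hap1), tuple(hap2)))))
    return sorted(pairs)

for pair in haplotype_pairs([2, 1, 0, 1]):
    print(pair)
```

A genotype with h heterozygous SNP has 2^(h−1) possible pairs (for h ≥ 1), so the ambiguity grows quickly with segment length.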
17. Haplotyping program – findhap.f90
Begin with population haplotyping
Divide chromosomes into segments,
~250 to 75 SNP / segment
List haplotypes by genotype match
Similar to fastPhase, IMPUTE
End with pedigree haplotyping
Detect crossover, fix noninheritance
Impute nongenotyped ancestors
20. Recessive defect discovery
Check for homozygous haplotypes
7 to 90 expected but none observed
5 of top 11 are potentially lethal
936 to 52,449 carrier sire-by-carrier
MGS fertility records
3.1% to 3.7% lower conception rates
Some slightly higher stillbirth rates
Confirmed Brachyspina same way
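To see why "expected but none observed" is strong evidence, here is a rough sketch. It assumes random mating, so that the expected number of homozygotes is N × q² for haplotype frequency q, and treats the observed count as Poisson. The actual analysis used carrier sire-by-carrier MGS matings, which this simplification does not model. The function names and the example population are hypothetical.

```python
import math

def expected_homozygotes(n_animals, haplotype_frequency):
    """Expected homozygous animals under random mating: N * q^2."""
    return n_animals * haplotype_frequency ** 2

def prob_none_observed(expected):
    """Poisson probability of seeing zero homozygotes by chance."""
    return math.exp(-expected)

# Hypothetical population: 10,000 genotyped animals, haplotype frequency 3%.
example = expected_homozygotes(10_000, 0.03)

# The slide's range of expected counts, 7 to 90.
for expected in (7, 90):
    print(expected, prob_none_observed(expected))
```

Even at the low end of the range, seeing no homozygotes by chance alone would be very unlikely under these assumptions.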
21. Impact on producers
Young-bull evaluations with accuracy of
early first-crop evaluations
AI organizations marketing genomically
evaluated 2-year-olds
Rate of genetic improvement may
increase by up to 50%
Studs reducing progeny-test programs
22. Why genomics works in dairy
Extensive historical data available
Well-developed genetic evaluation program
Widespread use of AI sires
Progeny test programs
High-valued animals, worth the cost of
genotyping
Long generation interval which can be reduced
substantially by genomics
23. Where do we go from here?
We found a few QTL
Most traits show infinitesimal
inheritance
Dominance effects also are small
What about epistasis?
Systems biology – gene/protein/
transcription factor networks
24. We confirmed known QTL
Cole, J.B. et al. 2009. Distribution and location of genetic effects for dairy traits. ICAR Tech Ser. 13:355–360.
25. Gene set enrichment analysis-SNP
GWAS results: SNP ranked by significance (L)
Gene pathways (G): SNP in pathway genes (S)
− Includes all SNP, S, that are included in L
Score increases for each Li in S
− Score increase is proportional to SNP test statistic
− The more SNP in S that appear near the top of L, the higher the Enrichment Score
Permutation test and FDR
− Nominal p-value corrected for multiple testing
Pathways with moderate effects
Holden et al., 2008 (Bioinformatics 24:2784–2785)
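The scoring step can be sketched as a weighted running sum over the ranked list: the score rises in proportion to the test statistic at each SNP in the pathway set and falls by a constant at each SNP outside it. This is a simplified illustration of the general GSEA idea, not the GSEA-SNP software, and the permutation test and FDR steps are omitted. The `weight` parameter and the toy data are assumptions.

```python
def enrichment_score(ranked_stats, in_set, weight=1.0):
    """Running-sum enrichment score over SNP ranked from most to least significant.

    ranked_stats: test statistics in ranked order (list L).
    in_set: booleans marking which ranked SNP fall in pathway genes (set S).
    """
    hit_total = sum(abs(s) ** weight for s, hit in zip(ranked_stats, in_set) if hit)
    n_miss = sum(1 for hit in in_set if not hit)
    if hit_total == 0 or n_miss == 0:
        raise ValueError("need at least one SNP inside and one outside the set")
    running, best = 0.0, 0.0
    for stat, hit in zip(ranked_stats, in_set):
        if hit:
            running += abs(stat) ** weight / hit_total   # step up at pathway SNP
        else:
            running -= 1.0 / n_miss                      # step down elsewhere
        if abs(running) > abs(best):
            best = running                               # keep the largest deviation
    return best

stats = [5.0, 4.0, 3.0, 2.0, 1.0]
top_heavy = enrichment_score(stats, [True, True, False, False, False])
bottom_heavy = enrichment_score(stats, [False, False, False, True, True])
print(top_heavy, bottom_heavy)
```

A set concentrated at the top of the list scores near +1 and one concentrated at the bottom scores near −1. In practice the score's significance comes from recomputing it on permuted data.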
26. Association weight matrix
Find gene coexpression networks (Fortes et al., 2010)
Select SNP by significance, correlation, dist’n, etc.
− Favor intragenic SNP significant across traits
Construct weight matrix
− Rows are SNP, columns are traits
− Cells are normalized z-score of the additive
effect of ith SNP on jth trait
Significant correlations are identified using PCIT
(Reverter and Chan, 2008) and visualized
− Cells randomly permuted as control
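A minimal sketch of the matrix construction, assuming a small table of additive SNP effects (rows are SNP, columns are traits) standardized within each trait. PCIT is not implemented here; a plain Pearson correlation between SNP rows is swapped in only to show where the pairwise step sits. All numbers are made up.

```python
import math

def column_z_scores(matrix):
    """Standardize each trait column to mean 0 and SD 1 across SNP."""
    n_rows = len(matrix)
    z_columns = []
    for col in zip(*matrix):
        mean = sum(col) / n_rows
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n_rows - 1))
        z_columns.append([(x - mean) / sd for x in col])
    return [list(row) for row in zip(*z_columns)]

def pearson(x, y):
    """Pearson correlation between two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical additive effects: 4 SNP (rows) by 3 traits (columns).
effects = [
    [0.8, 0.5, -0.1],
    [0.7, 0.6, 0.0],
    [-0.9, -0.4, 0.3],
    [0.1, -0.7, -0.2],
]
awm = column_z_scores(effects)
r_01 = pearson(awm[0], awm[1])   # co-association of SNP 0 and SNP 1 across traits
print(round(r_01, 3))
```

The permutation control would rerun the same correlation step on a copy of the matrix with its cells shuffled.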
27. Can we identify regulatory networks?
[Figure: candidate genes and pathways that affect age at puberty common to both breeds]
Fortes et al., 2011 (J. Animal Sci. 89:1669-1683. doi:10.2527/jas.2010-3681)
28. Network analysis
[Figure with three panels]
Gene network – the red center identifies highly connected nodes.
Subnetwork of interacting transcription factors from the puberty network.
Subnetwork of interacting transcription factors from a collection of mouse and human data. (Validation step.)
Fortes et al., 2011 (J. Animal Sci. 89:1669-1683. doi:10.2527/jas.2010-3681)
29. Enriched pathways
Fortes et al., 2011 (J. Animal Sci. 89:1669-1683. doi:10.2527/jas.2010-3681)
30. Transcription factor network
Yellow genes were submitted to database.
Other nodes were mined from FunCoup.
Red: protein-protein interaction
Blue: mRNA coexpression
Fortes et al., 2011 (J. Animal Sci. 89:1669-1683. doi:10.2527/jas.2010-3681)
31. How do we rank allele effects?
GSEA and AWM require that we order
SNP on some criterion
p-values (actual or nominal)
q-values (false discovery rate)
Not all models provide p-values
Allele substitution effects (not so good)
Scaled substitution effects (better)
It’s not clear (to me) which is best
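For the false-discovery-rate option above, a minimal sketch using Benjamini-Hochberg adjusted p-values. This is a common stand-in and not identical to Storey-style q-values; the presentation does not say which procedure was used. The p-values are invented.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices from smallest p
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                          # walk from largest p down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min                       # enforce monotonicity
    return adjusted

p_values = [0.01, 0.04, 0.03, 0.5]       # hypothetical SNP p-values
adjusted = benjamini_hochberg(p_values)
ranking = sorted(range(len(p_values)), key=lambda i: adjusted[i])
print(adjusted, ranking)
```

Because the adjustment preserves the order of the raw p-values, it changes where a significance cutoff falls rather than the ranking that GSEA or AWM would consume.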
32. Aren’t P-values easy?
Single SNP, fixed-effects model
Inflation of error variances
Spurious associations
e.g., Plink
Multiple SNP, mixed-effects model
Accounts for population structure
e.g., TASSEL, GoldenHelix SVS
33. A recent example from dairy
Extreme birth weights are associated
with increased risk of stillbirth and
calving difficulty
Birth weights are not measured on most
dairy farms in the US
With German colleagues, we developed a
predictor based on traits we do measure
34. GWAS for birth weight PTA
[Figure: GWAS results for birth weight PTA]
Cole et al. (2013), unpublished data
35. KEGG pathways for birth weight
What does regulation of the actin cytoskeleton have to do with birth weight in cattle? That is, do these results make sense?
Maybe… these pathways may be involved in establishment & maintenance of pregnancy, as well as coordination of growth and development.
Cole et al. (2013), unpublished data
36. A new project
The Brown Swiss, Holstein, and Jersey
breeds experience dystocia at different
rates
We are applying the AWM method of
Fortes et al. to these data
The goal is to identify gene networks…
Common to all breeds
Different by breed
37. We have divergent populations
Cole et al., 2005 (J. Dairy Sci. 88(4):1529–1539)
38. Challenges
Annotation
This is a mess in the cow
The reference assembly may not be
representative of all taurine cows
Validation
Doing functional genomics with large
mammals is expensive – who pays?
When have we proven something?
39. Conclusions
We’re not going to find big QTL for most
traits
We may identify gene networks affecting
complex phenotypes
We’re learning how much we don’t know
about functional genomics in the cow
Validation remains a problem
40. Partners
iBMAC Consortium
Illumina
Marylinn Munson
Cindy Lawley
Christian Haudenschild
BARC
Curt Van Tassell
Lakshmi Matukumalli
Tad Sonstegard
Missouri
Jerry Taylor
Bob Schnabel
Stephanie McKay
Alberta
Steve Moore
USMARC – Clay Center
Tim Smith
Mark Allan
Funding Agencies
USDA/NRI/CSREES
2006-35616-16697
2006-35205-16888
2006-35205-16701
USDA/ARS
1265-31000-081D
1265-31000-090D
5438-31000-073D
Merial
Stewart Bauck
NAAB
Gordon Doak
ABS Global
Accelerated Genetics
Alta Genetics
CRI/Genex
Select Sires
Semex Alliance
Taurus Service
41. Questions?
http://gigaom.com/2012/05/31/t-mobile-pits-its-math-against-verizons-the-loser-common-sense/shutterstock_76826245/