Assessing the impact of transposable element variation on mouse phenotypes and traits
1. Thomas Keane, WTSI 14th May, 2011
Thomas Keane
Vertebrate Resequencing Informatics
Wellcome Trust Sanger Institute
Cambridge, UK
Christoffer Nellåker and Chris Ponting
MRC Functional Genomics Unit
University of Oxford
Oxford, UK
2. Thomas Keane, WTSI 14th May, 2011
Transposable Elements (TEs)
Transposons are segments of DNA that can move within the genome
A minimal ‘genome’ – ability to replicate and change location
Dominate landscape of mammalian genomes
38-45% of rodent and primate genomes
Genome size proportional to number of TEs
Class 1 (RNA intermediate) and 2 (DNA intermediate)
Potent genetic mutagens
Disrupt expression of genes
Genome reorganisation and evolution
Transduction of flanking sequence
Transposable elements (TEs) active amongst laboratory mouse strains
Mouse Genomes Project: Whole genome sequencing of 17 key
laboratory mouse strains
13 classical laboratory strains and 4 wild derived inbred strains
Average of ~25x Illumina sequencing per strain
3. Thomas Keane, WTSI 14th May, 2011
Agouti Mouse Model
Dolinoy PNAS 2007;104:13056–13061
4. Thomas Keane, WTSI 14th May, 2011
Mouse TEs
3 main classes of TEs in mouse genome
Long interspersed nuclear elements (LINE)
Short interspersed nuclear elements (SINE)
Endogenous retrovirus superfamily (ERV)
ETn, IAP, MuLV, IS2, MaLR, VL30, RLTR
Key questions
What is the true extent and distribution of TEs in the germline of laboratory mouse
strains?
What can we learn about the selective pressure acting on TEs maintained in the
germline?
How much phenotypic variation, and how many complex traits, can we associate with TEs?
5. Thomas Keane, WTSI 14th May, 2011
TE Calling
Terminology
B6+: Present in the reference genome
B6-: Not present in reference
TEV: Transposable element variant
Computational calling methods
B6+
SVMerge* pipeline: Integrate calls from several read-pair based SV ‘deletion’ (!) callers
(Kim Wong, WTSI)
B6-
RetroSeq** pipeline developed
Identifies discordant mate pairs and compares to a library of known TE sequences
Size estimation
Full length element (~5-8kb) vs. solo LTR (<1kb)
30-40x physical coverage long fragment (~3kb) end reads (15 strains)
Test if insertion point spanned by 3kb fragment read pairs
*Wong K, Keane TM, Stalker J, Adams DJ (2010) SVMerge: Enhanced structural variant and breakpoint detection by integration of multiple
detection methods and local assembly, Genome Biology, 11:R128
**RetroSeq available from https://github.com/tk2/RetroSeq
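The B6- approach described above (discordant mate pairs compared against a TE library) can be illustrated with a small sketch. This is not RetroSeq's actual code. The `ReadPair` record, the function names, the thresholds, and the pre-computed `mate_te_hit` field are all hypothetical; a real pipeline would read alignments from BAM files and align the mates against TE sequences itself.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadPair:
    """Hypothetical, simplified view of one paired-end read."""
    name: str
    chrom: str                  # chromosome of the uniquely mapped (anchor) read
    anchor_pos: int             # position of the anchor read
    mate_chrom: Optional[str]   # None if the mate is unmapped
    mate_pos: Optional[int]
    mate_te_hit: Optional[str]  # TE family the mate matches (assumed pre-computed)


def is_discordant(pair, max_insert=1000):
    """A pair is discordant if the mate is unmapped, on another chromosome, or too far away."""
    if pair.mate_chrom is None or pair.mate_pos is None:
        return True
    if pair.mate_chrom != pair.chrom:
        return True
    return abs(pair.mate_pos - pair.anchor_pos) > max_insert


def candidate_insertions(pairs, window=500, min_support=2, max_insert=1000):
    """Group discordant anchors whose mates hit a TE into fixed windows (a crude clustering)."""
    support = defaultdict(list)
    for p in pairs:
        if is_discordant(p, max_insert) and p.mate_te_hit is not None:
            bin_start = (p.anchor_pos // window) * window
            support[(p.chrom, bin_start, p.mate_te_hit)].append(p.name)
    # Keep only windows supported by enough independent pairs.
    return {k: v for k, v in support.items() if len(v) >= min_support}


# Toy data, invented for illustration only.
pairs = [
    ReadPair("r1", "chr1", 10_100, None, None, "IAP"),
    ReadPair("r2", "chr1", 10_250, "chr7", 5_000, "IAP"),
    ReadPair("r3", "chr1", 10_300, "chr1", 10_600, None),  # concordant pair
    ReadPair("r4", "chr2", 500, None, None, "SINE"),       # only one supporting pair
]
calls = candidate_insertions(pairs)
print(calls)
```

Fixed windows are used here only to keep the sketch short; clustering anchors by distance and orientation would be a more realistic choice.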
6. Thomas Keane, WTSI 14th May, 2011
B6+ TEV Example
C57BL/6NJ strain has the ERV
Absent in DBA/2J strain
Read pairs spanning the flanks denote absence
DBA/2J
C57BL/6NJ
7. Thomas Keane, WTSI 14th May, 2011
B6- TEV Example
NOD/ShiLtJ
Full length (~8kb) IAP insertion
Not spanned by 3kb fragment
reads
3kb fragments
Zoomed into breakpoint
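The size test used in this example (is the insertion point spanned by ~3kb fragment read pairs?) can be sketched as below. The function names, the `margin` and `min_spanning` thresholds, and the status labels are assumptions made for illustration, not the project's actual implementation. The reasoning is that a ~3kb fragment can only bridge an insertion shorter than the fragment itself, so spanning pairs suggest a short element such as a solo LTR, while a lack of spanning pairs is consistent with a full-length element.

```python
def spans(pair_start, pair_end, breakpoint, margin=50):
    """True if the read pair brackets the breakpoint with some margin on both sides."""
    return pair_start + margin <= breakpoint <= pair_end - margin


def size_status(fragments, breakpoint, min_spanning=2):
    """Assign a coarse size status from long-fragment read pairs.

    fragments: list of (start, end) reference coordinates of mapped ~3kb read pairs.
    """
    n_spanning = sum(spans(start, end, breakpoint) for start, end in fragments)
    if n_spanning >= min_spanning:
        return "<3kb"                      # short element, e.g. a solo LTR
    return ">=3kb (possible full length)"  # no evidence that a 3kb fragment bridges it


# Toy coordinates, invented for illustration only.
short_case = size_status([(1000, 3800), (1500, 4200), (5000, 8000)], breakpoint=3000)
long_case = size_status([(5000, 8000)], breakpoint=3000)
print(short_case, "|", long_case)
```

Requiring more than one spanning pair is a guard against chimeric or mis-mapped fragments; the right threshold would depend on the physical coverage available.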
9. Thomas Keane, WTSI 14th May, 2011
Callset Validation
B6+
Manually annotated all of Chr19 across 8 strains (Flint group, Oxford)
PCR validation of 250 randomly selected calls across 8 strains
B6-
PCR validation of 109 calls across 8 strains (Binnaz Yalcin, Oxford)
Initially SINE false positive rate found to be high
Further filtering of low complexity, microsatellites, simple repeats was required
Reduced false positive rate from ~30% to 9%
False negative rate determined by examining the strain distribution pattern (SDP) from PCR data
Size status assignment accurate
>95% of SINEs assigned <3kb status
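The extra filtering step mentioned above can be pictured as dropping calls that fall inside annotated low-complexity or simple-repeat intervals, then recomputing the fraction of PCR-tested calls that fail to validate. The sketch below uses hypothetical data structures and invented toy values; it does not reproduce the project's filters or its validation numbers.

```python
def in_masked_region(chrom, pos, masked):
    """True if pos lies inside any half-open [start, end) masked interval on chrom."""
    return any(start <= pos < end for start, end in masked.get(chrom, []))


def filter_calls(calls, masked):
    """Keep only calls whose insertion point is outside masked repeat intervals."""
    return [c for c in calls if not in_masked_region(c[0], c[1], masked)]


def false_positive_rate(pcr_results):
    """Fraction of PCR-tested calls that did not validate (False = failed)."""
    if not pcr_results:
        raise ValueError("no PCR results supplied")
    return sum(1 for ok in pcr_results if not ok) / len(pcr_results)


# Toy values, invented for illustration only.
masked = {"chr19": [(1000, 1200), (5000, 5100)]}  # e.g. microsatellite intervals
calls = [("chr19", 1100, "SINE"), ("chr19", 3000, "SINE"), ("chr19", 5050, "SINE")]
kept = filter_calls(calls, masked)
rate = false_positive_rate([True, True, False, True])
print(kept, rate)
```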
10. Thomas Keane, WTSI 14th May, 2011
Structure of ERV Families
[Figure: chart panels for the ERV families MuLV, VL30, IS2, ETn, RLTR1B, RLTR45, IAP, RLTR10 and MaLR; panel titles and axis labels were not recoverable]
IAP Type I, 7.3 kb (full length): 5’ LTR (~430 nt), gag-pol genes (usually defective), 3’ LTR
Solo LTR element: produced by recombination of the flanking LTRs
13. Thomas Keane, WTSI 14th May, 2011
Density and Orientation within Genes
Distinct anti-sense bias observed in all types
Significantly different bias in first introns between ERVs vs SINEs
Orientation bias remains constant despite divergence of element
Biphasic selection process
Assuming no sense/anti-sense insertion bias
Implies that half of sense orientated ERVs and one third of SINE/LINEs are deleterious
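The step from an orientation bias to a deleterious fraction can be made explicit with a short calculation. It assumes, as stated above, that insertions occur equally often in both orientations, and additionally that essentially all anti-sense insertions are retained (an assumption made here for illustration). Under those assumptions the fraction of sense insertions lost is 1 - (sense count / anti-sense count). The counts below are hypothetical and chosen only to reproduce the stated fractions.

```python
def implied_deleterious_fraction(sense_count, antisense_count):
    """Fraction of sense-oriented insertions presumed lost to selection.

    Assumes equal insertion rates in both orientations and that anti-sense
    insertions are all retained, so the anti-sense count estimates how many
    sense insertions originally occurred.
    """
    if antisense_count <= 0:
        raise ValueError("antisense_count must be positive")
    return 1.0 - sense_count / antisense_count


# Hypothetical counts: a 1:2 sense:anti-sense ratio implies half are lost,
# and a 2:3 ratio implies one third are lost.
erv_like = implied_deleterious_fraction(100, 200)
sine_line_like = implied_deleterious_fraction(200, 300)
print(erv_like, sine_line_like)
```

In other words, "half deleterious" corresponds to about 33% of surviving elements being in the sense orientation, and "one third deleterious" to about 40%.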
[Figure: plots of TE density and orientation within genes; axis labels and legends were not recoverable]
14. Thomas Keane, WTSI 14th May, 2011
QTLs associated with TEs
Table 3: QTLs associated with SVs
Phenotype | Chr | SV start | SV stop | Ancestral event | Gene | SV overlap | LogP
Mean platelet volume | 1 | 175158884 | 175158885 | insertion | Fcer1a | upstream | 52.833
OFT Total activity | 2 | 144402772 | 144402974 | SINE insertion | Sec23b | intron | 15.721
Hippocampus cellular proliferation marker | 4 | 49690364 | 49690365 | SINE insertion | Grin3a | intron | 20.119
Home cage activity | 4 | 108951264 | 108951265 | ERV insertion | Eps15 | upstream | 15.922
T-cells: %CD3 | 4 | 130038389 | 130038390 | SINE insertion | Snrnp40 | intron | 12.129
Wound healing | 7 | 90731819 | 90731820 | ERV insertion | Tmc3 | upstream | 22.216
Red cells: mean cellular haemoglobin | 7 | 111398000 | 111480000 | insertion | Trim5 | exon | 13.016
Red cells: mean cellular haemoglobin | 7 | 111504957 | 111505193 | deletion | Trim30b | UTR | 12.806
Red cells: mean cellular volume | 8 | 87957244 | 87957245 | LINE insertion | 4921524J17Rik | upstream | 18.141
Serum urea concentration | 11 | 115106122 | 115106250 | deletion | Tmem104 | UTR | 13.404
Hippocampus cellular proliferation marker | 13 | 113783196 | 113783359 | deletion | Gm6320 | upstream | 17.456
T-cells: CD4/CD8 ratio | 17 | 34483680 | 34483681 | deletion | H2-Ea | upstream | 82.858
Start and stop coordinates are given for build 37 of the mouse genome, so that insertions into the reference are given as consecutive base pairs (columns headed SV start and SV stop). The part of the gene overlapped is reported in the column headed SV overlap. LogP is the negative logarithm of the P-value for association between the SV and the phenotype as assessed in outbred HS mice (ref. 22).
Yalcin et al, under review
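Because the LogP column in the table is a negative logarithm of the P-value, the values can be converted back to P-values for comparison. The sketch below assumes a base-10 logarithm, which the caption does not state explicitly, and the function names are hypothetical.

```python
import math


def logp_to_pvalue(logp):
    """Convert LogP to a P-value, assuming LogP = -log10(P)."""
    return 10.0 ** (-logp)


def pvalue_to_logp(p):
    """Inverse conversion, under the same base-10 assumption."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log10(p)


# LogP values taken from the table: H2-Ea (82.858) and Snrnp40 (12.129).
strongest = logp_to_pvalue(82.858)
weakest = logp_to_pvalue(12.129)
print(strongest < weakest)
```

A larger LogP therefore means a smaller P-value, so the H2-Ea association is the strongest in the table and the Snrnp40 association the weakest.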
SINE deletion
16. Thomas Keane, WTSI 14th May, 2011
Conclusions
Unprecedented catalog (>100k) of mouse TEV elements identified
False positive and negative rates are low
Wild derived strains contain significantly more TEs
Evolutionary context shows expansion of ERVs in mouse lineage
Distinct anti-sense bias for all elements within genes
Estimate that half of sense orientated ERVs and one third of SINE/LINEs are deleterious
17. Thomas Keane, WTSI 14th May, 2011
Acknowledgements
Mouse TE Project
Christoffer Nellåker (Oxford)
Wayne Frankel (Jax)
Chris Ponting (Oxford)
Mouse Genomes Project
Sanger
Petr Danecek
Kim Wong
David Adams
Richard Durbin
Sanger Sequencing Teams
EBI
Ewan Birney
Wellcome Trust Centre Oxford
Jonathan Flint et al.
Binnaz Yalcin
Avigail Agam
Richard Mott
Jackson Lab
Laura Reinholdt
Leah Rae Donahue
Further Information
http://www.sanger.ac.uk/mousegenomes
Contacts
thomas.keane@sanger.ac.uk
christoffer.nellaker@gmail.com
chris.ponting@dpag.ox.ac.uk