Proteome bioinformatics and genetics for associating proteins with grain phenotype
1. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
Rudi Appels,
Centre for Comparative Genomics, Murdoch University, Australia
Paula Moolhuijzen, Brett Chapman, Wujun Ma, Dean Diepeveen, Matthew
Bellgard,
Centre for Comparative Genomics, Murdoch University and Department
of Food and Agriculture WA, Australia.
Yueming Yan, Shunli Wang,
Capital Normal University, Beijing
Angela Juhasz,
Agricultural Institute, Martonvásár, Hungary
Frank Bekes,
FBFD Pty Ltd, Beecroft, Sydney, Australia 2119
CENTRE FOR
COMPARATIVE GENOMICS
2. Centre for Comparative Genomics (CCG) at Murdoch University
Supercomputer
• Stage 1A Pawsey Centre (SKA)
• Ranked 87 in the world
• 9600 cores
4. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
• Genome sequencing and high resolution genetic maps of wheat
• Integrating new wheat protein level analyses
• Translating research findings to industry – the Decision Matrix
5. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
• The integration of new efforts to obtain reference sequences for bread
wheat and barley genomes is accelerating gene discovery.
• Locations of traits and proteins on DNA sequence assemblies via
genetic maps define gene networks
• The genomic resources are refining molecular marker development and
mapping strategies for combining yield with quality attributes of the
grain that meet market requirements
6. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
• Genome sequencing and high resolution genetic maps of wheat
• Integrating new wheat protein level analyses
• Translating research findings to industry – the Decision Matrix
7. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
Locations of proteins within a genetic map can be determined.
One of the first examples was published by Amiour et al (2003) using 2D gels to identify chromosomal locations of amphiphilic proteins from wheat grains.
Later Chen et al (2007) carried out mapping using MALDI-TOF defined peaks of gliadin.
Progress in the DNA sequencing of the wheat transcribed genes now allows higher resolution maps to be established.
Amiour N, et al (2003) Theor. Appl. Genet. 108: 62–72.
Chen J, et al (2007) Rapid Comm Mass Spectrometry 21: 2913–2917
8. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
2007 – 2012
Suites of genomic resources and knowledge have been established to provide
the foundation for sequencing the wheat and barley genomes
• International Wheat Genome Sequencing Consortium (www.wheatgenome.org)
• UK WISP consortium (www.wheatisp.org)
• International Barley Sequencing Consortium (www.barleygenome.org)
• European TriticeaeGenome FP7 project (www.triticeaegenome.eu)
The initiatives built on long standing resources such as:
• KOMUGI in Japan (www.shigen.nig.ac.jp/wheat/komugi/)
• Graingenes in the USA (wheat.pw.usda.gov/GG2/index.shtml)
• Extensive EST collections (ITEC http://avena.pw.usda.gov/genome/)
9. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
• Reducing the complexity of the
wheat genome through flow
sorting of chromosome arms has
formed the basis for the
international effort to produce a
reference sequence for the variety
Chinese Spring
• All the chromosome arms now
have a completed survey sequence
analysis. This provides a pool of
DNA contigs that can be used to
anchor gene sequences and
proteins to chromosome arms
10. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
The array technologies to assay single nucleotide polymorphisms (SNPs) are now
establishing genetic maps with 2000-3000 molecular markers.
Figure: map for chromosomes 1A, 1B, 1D, from a cross, Avalon x Cadenza
Allen AM, Barker GLA, Berry ST, Coghill JA, Gwilliam R, Kirby S, Robinson P, Brenchley RC, D’Amore R,
McKenzie N, Waite D, Hall A, Bevan M, Hall N, Edwards KJ (2011). Transcript-specific, single-nucleotide
polymorphism discovery and linkage analysis in hexaploid bread wheat (Triticum aestivum L.). Plant Biotechnology
Journal 2011: 1–14
11. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
Chromosome 7A
The 9000 SNP array (“chip”) technology for assaying SNPs has been used to
establish a 2000 molecular marker map for a set of 225 double haploid lines
from a Westonia x Kauz cross.
A large study in Australia is examining progeny from a complex cross (MAGIC,
currently a 4-way cross using Baxter, Yitpi, Westonia and Chara; 1500 lines, with
markers from a 9K SNP chip and markers from a 90K chip planned). This work is
being done at CSIRO with Colin Cavanagh.
For an 8-way cross using Baxter, Yitpi, Westonia, AC Barrie (Canada), Alsen (US),
Pastor (CIMMYT), Xiaoyan 54 (China), and Volcani (Israel), 5000 lines are
being characterized.
12. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
In a large population of 5,000 lines (as required for accurate mapping) it is not
feasible to phenotype all progeny.
The marker information can be used to define families of progeny for
phenotyping.
For the 1500 lines from the 4-way MAGIC cross, a population of 370 families has
been defined for phenotyping (in duplicated/randomized designs). While we
are still in the middle of this analysis (which includes milling yield), some QTL for %
wet gluten at the LMW-glutenin locus of chromosome 1B are evident.
It is interesting that in the high resolution maps the QTL may not be exactly
superimposed on the LMW-glutenin locus.
13. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
GluStar system for “wet gluten” measurements on 4.5 g flour
• MAGIC and assignment of a QTL for % wet gluten to 1B near the LMW
glutenin locus but not coincident with it
• The high density of markers allows a fine resolution of map location when
1,500 progeny are analyzed
Tomoshozi S, Budapest University of Technology and Economy; http://www.labintern.hu
14. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
To determine protein fingerprints as a “phenotype” we have explored MALDI-TOF
as a means for increasing the number of lines we can analyse.
Low molecular weight glutenins
Li et al (2010). BMC Plant Biology 10:124
16. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
High molecular weight glutenins (70,000– 90,000 Da)
Li et al (2009). Cereal Sci. 50: 295-301; Gao L et al (2010). J Ag Food Chem 58: 2777–2786
18. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
The MALDI-TOF based analyses of the LMW and HMW glutenins have
provided a good basis for establishing a high throughput analysis for breeding
programs. This analysis now runs as a fee-for-service (Saturn Biotech;
AUS$6/sample).
The glutenin subunit protein loci we know to date, however, can only account
for approximately 60% of the variation in measured grain quality attributes.
More detailed genetic analysis is yielding new information.
19. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
Chromosome 1D
Map based on DH lines from a Westonia x Kauz cross
Figure: chromosome 1D map showing MALDI-TOF peaks L29183, L33288 and L33529
The classic designation of the LMW glutenin locus of Westonia on
chromosome 1D is LMWG-D3c (in addition to A3c, B3h).
The Kauz designation is not known.
Peaks from:
Westonia = L33288
Kauz = L29183, L33529
Peaks found in LMWG-D3c (based on
Li et al 2010):
33021
33290
33453
Li et al (2010). BMC Plant Biology 10:124
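The comparison between an observed peak and the reference peaks for an allele can be sketched as a nearest-mass lookup. This is only an illustration: it assumes that a label such as "L33288" denotes a peak at 33288 Da, and the function name is hypothetical rather than part of any published workflow.

```python
def nearest_reference_peak(observed, reference_peaks):
    """Return (closest reference mass, absolute difference in Da)."""
    closest = min(reference_peaks, key=lambda ref: abs(ref - observed))
    return closest, abs(closest - observed)

# Masses as listed on the slide; reading "L33288" as 33288 Da is an assumption.
lmwg_d3c_peaks = [33021, 33290, 33453]
westonia_peak = 33288

match, delta = nearest_reference_peak(westonia_peak, lmwg_d3c_peaks)
print(f"Westonia {westonia_peak} -> reference {match} (difference {delta} Da)")
```

Under that reading, the Westonia peak lies 2 Da from the 33290 reference peak listed for LMWG-D3c.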
21. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
Chromosome 7A
Classical mapping of LMW-glutenin loci defined the chromosome 1A, 1B and 1D
loci based on single dimension SDS PAGE technology (Gupta and Shepherd,
1994) and it was noted then that the protein family was complex.
We now find that some of the peaks in the MALDI-TOF are mapping to other
chromosomes such as chromosome 7A.
We used our wheat proteome database to see if we could identify the L32831
and L31965 proteins.
Figure labels: L32831, L31965
Gupta and Shepherd (1994). Two-step one-dimensional SDS-PAGE
analysis of LMW subunits of glutenin. I. Variation and genetic control of
the subunits in hexaploid wheats. Theor. Appl. Genet. 80: 65-74
22. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
Chromosome 7A
In this analysis we are accessing a complex
part of the LMW glutenin protein
spectrum that was not available for
analysis in the earlier SDS gel-based
studies
L32831
L31965
24. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
Criteria for database search:
(1) Qualitative – amino acid composition (occurrence of QQQ etc) consistent
with being co-extracted with LMW-glutenins (gliadins removed beforehand)
(2) Quantitative – molecular weight within 10 dalton
Query: L31965
Hits (database entry, molecular weight):
IWGSC_4DS_v1_2275417.fa.genscan.pep.1 31960
IWGSC_2AL_v1_6356128.fa.genscan.pep.2 31960
IWGSC_4BS_v1_4917914.fa.genscan.pep.1 31960
IWGSC_1AL_v2_3915175.fa.genscan.pep.1 31960
Komugi_AJ133603_1 31960
IWGSC_3B_v1_10586963.fa.genscan.pep.1 31961
IWGSC_5DS_v1_2734070.fa.genscan.pep.1 31961
IWGSC_2BS_v1_5247743.fa.genscan.pep.3 31961
>Komugi_AJ133603_1 AJ133603 7209247 [Triticum aestivum] Triticum aestivum mRNA for alpha-gliadin storage protein, clone alpha-9
MVRVTVPQLQPQNPSQQQPQEQ
VPLVQQQQFLGQQQPFPPQQPYP
QPQPFPSQQPYLQLQPFPQPQLP
YSQPQPFRPQQPYPQPQPQYSQP
QQPISQQQQQQQQQQQQQQQQ
QQQQQQQILQQILQQQLIPCMDV
VLQQHNIVHGRSQVLQQSTYQLL
QELCCQHLWQIPEQSQCQAIHNV
VHAIILHQQQKQQQQPSSQVSFQ
QPLQQYPLGQGSFRPSQQNPQAQ
GSVQPQQLPQFEEIRNLALQTLPA
MCNVYIPPYCTIAPFGIFGTNYR
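The two search criteria can be sketched as a simple filter: a mass window of 10 Da around the query peak, plus a check for a run of glutamines (QQQ). This is a minimal sketch, not the search tool actually used. It assumes the query label "L31965" corresponds to 31965 Da, the function and class names are hypothetical, and the two entries marked hypothetical are invented only to show the filter rejecting candidates.

```python
import re
from typing import NamedTuple

class Candidate(NamedTuple):
    name: str
    mass: float      # molecular weight in Da
    sequence: str    # amino acid sequence, one-letter code

def within_mass_window(query_mass, candidate_mass, tolerance=10.0):
    """Quantitative criterion: molecular weight within `tolerance` Da."""
    return abs(candidate_mass - query_mass) <= tolerance

def has_glutamine_run(sequence, min_run=3):
    """Qualitative criterion: at least one run of `min_run` or more Q residues."""
    return re.search("Q{%d,}" % min_run, sequence) is not None

def search(query_mass, candidates, tolerance=10.0, min_run=3):
    """Return names of candidates passing both criteria."""
    return [
        c.name
        for c in candidates
        if within_mass_window(query_mass, c.mass, tolerance)
        and has_glutamine_run(c.sequence, min_run)
    ]

candidates = [
    # Mass and sequence fragment taken from the slide.
    Candidate("Komugi_AJ133603_1", 31960, "QQPISQQQQQQQQQQQQQQQQ"),
    # Hypothetical entries, for illustration only.
    Candidate("hypothetical_no_Q_run", 31962, "MKTAVLALLAIVATTA"),
    Candidate("hypothetical_too_heavy", 32100, "PQQQPFPQQ"),
]

print(search(31965, candidates))
```

Only the first entry passes both tests: the second has no glutamine run and the third lies outside the 10 Da window.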
25. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
Criteria for database search:
(1) Qualitative – amino acid composition (occurrence of QQQ etc) consistent
with being co-extracted with LMW-glutenins (gliadins removed beforehand)
(2) Quantitative – molecular weight within 10 dalton
Query: L32831
Hits (database entry, molecular weight):
IWGSC_4BL_v1_6996674.fa.genscan.pep.4 31980
Solomon_Q8H0J4_WHEAT 31934
Solomon_B2ZRD2_WHEAT 32829
>Solomon_B2ZRD2_WHEAT B2ZRD2 SubName: Full=Alpha-gliadin; [Triticum aestivum (Wheat).]
MKTFLILALLAIVATTATTAGRVPVPQL
QPQNPSQQQPQEQVPLVQQQQFLGQ
QQPFPPQQPYPQPQPFPSQQPYLQLQP
FPQPQLPYSQPQPFRPQQPYPQPQPQY
SQPQQPISQQQQQQQQQQQQQQQEQ
QILQQILQQQLIPCMDVVLQQHNIAH
GRSQVLQQSTYQLLQELCCQHLWQIP
EQSQCQAIHNVVHAIILHQQQKQQQQ
PSSQFSFQQPLQQYPLGQGSSRPSQQN
PQAQGSVQPQQLPQFEEIRNLALQTLP
AMCNVYIPPYCTIAPFGIFGTN
26. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
Chromosome 7A
This analysis suggests that there are probably
more genetic loci for major seed storage proteins
than we have found to date.
Genome sequencing and proteome analyses,
combined with genetic mapping can define these
new loci and provide molecular markers for
breeding and selection.
It turns out that a 1980 report did find
LMWG/gliadins on 4B and 7A
Salcedo G, Prada J, Sanchez-Monge R, Aragoncillo C (1980). Aneuploid analysis of low
molecular weight gliadins from wheat. Theor Appl Genet 56: 65-69
L32831
L31965
27. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
Chromosome 7A
The “hits” on chromosome 7A will be resolved
as we have now started to sequence this
chromosome, as a national project in Australia.
This is part of the International Wheat
Genome Sequencing Consortium (IWGSC) in
which different countries around the world are
doing a chromosome each.
L32831
L31965
28. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
• Genome sequencing and high resolution genetic maps of wheat
• Integrating new wheat protein level analyses
• Translating research findings to industry – the Decision Matrix
29. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
The Wheat Proteome database:
Motivation : wheat genome, transcriptome and proteome studies are now advanced
and need a reference proteome database for
• annotating the genes in the wheat genome
• assigning peptides, obtained from high level proteomic analyses, to wheat proteins
Content of proteins/peptides:
• wheat/Triticum entries from SwissProt, UniProt, TrEMBL (2,690)
• translation from the KOMUGI full-length cDNA collection (13,717)
• peptides from INRA (France), USDA (USA), CNU (China) labs (still sorting out a
final non-redundant set)
• IWGSC-genome-wide-sequence (GWS) gene model translations (144,920)
30. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
The Wheat Proteome database:
(1) Translations of conserved genes.
The IWGSC-GWS database for each chromosome arm typically identifies 4000-9000
genic sequences per chromosome. These include gene fragments and pseudogenes.
Following their identification, genes conserved between wheat, Brachypodium, rice,
sorghum and barley (Klaus Mayer “chromosome zipper”) can be clustered into
syntenic groups.
(2) Non-redundant proteins/peptides known to originate from wheat
30-40% of the gene complement in wheat and barley does not reside in the conserved
syntenic gene order space.
All genes and protein/peptide sequences need to be anchored to the IWGSC-GWS
chromosome arm DNA sequences. So far only 205 KOMUGI translations and 6 from
the SwissProt/UniProt/TrEMBL dataset have been anchored to the IWGSC-GWS
translations, so there is quite a bit of curation to carry out.
31. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
• Genome sequencing and high resolution genetic maps of wheat
• Integrating new wheat protein level analyses
• Translating research findings to industry – the Decision Matrix
32. To complete this presentation it is important to consider translating research findings to
industry.
(1) Further stream-lining of the MALDI-TOF scoring of wheat proteins
(2) Assigning a toxicity score to specific proteins in considering celiac and wheat allergy
reactions to wheat flour
The aim is to be able to enter specific features of the wheat grain as a number into a
Decision Matrix.
Decision Matrix:
Weights assigned to features
Features (matrix columns): Genome fingerprint, Gene marker, Protein marker, Other traits
Final column: selection index values
For each breeding line (matrix rows) the feature score (matrix columns) is multiplied
by the feature weight. These are then added to provide a selection index (SI).
This SI is used to rank breeding lines or suitability for an end-product in industry.
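The selection index described here is a weighted sum: for breeding line i with feature scores x_ij and feature weights w_j, SI_i = sum over j of w_j * x_ij. A minimal sketch follows. The line names, scores and weights are hypothetical placeholders chosen only to show the arithmetic, and the function name is not from the presentation.

```python
def selection_index(scores, weights):
    """Weighted sum of feature scores for one breeding line."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(score * weight for score, weight in zip(scores, weights))

# Column order follows the slide: genome fingerprint, gene marker,
# protein marker, other traits. All numbers below are hypothetical.
weights = [1, 2, 5, 2]
breeding_lines = {
    "line_A": [3, 1, 4, 2],
    "line_B": [2, 4, 1, 3],
    "line_C": [4, 2, 3, 1],
}

indices = {name: selection_index(row, weights) for name, row in breeding_lines.items()}
ranked = sorted(indices, key=indices.get, reverse=True)

for name in ranked:
    print(name, indices[name])
```

Changing the weights for a different end-product re-ranks the same breeding lines without re-scoring them, which is the point of keeping scores and weights separate.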
33. (1) Further stream-lining of the MALDI-TOF scoring of wheat proteins: we are following
the MALDIquant process described by Sebastian Gibb (IMISE, University of Leipzig)
Figure panels: 1: raw; 2: variance stabilization; 3: smoothing;
4: baseline correction; 5: peak detection; 6: peak plot
Dean Diepeveen
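The processing steps listed in the figure panels can be sketched in plain Python. This is not MALDIquant (an R package) and does not reproduce its algorithms: the square-root transform, moving-average smoothing, rolling-minimum baseline and local-maximum peak picking below are simple stand-ins chosen for illustration, and the spectrum is synthetic.

```python
import math

def _window(values, i, half_window):
    """Slice of `values` centred on index i, truncated at the ends."""
    lo = max(0, i - half_window)
    hi = min(len(values), i + half_window + 1)
    return values[lo:hi]

def stabilize_variance(intensities):
    """Step 2: square-root transform of the raw intensities."""
    return [math.sqrt(max(v, 0.0)) for v in intensities]

def smooth(values, half_window=2):
    """Step 3: moving-average smoothing."""
    out = []
    for i in range(len(values)):
        w = _window(values, i, half_window)
        out.append(sum(w) / len(w))
    return out

def correct_baseline(values, half_window=50):
    """Step 4: subtract a rolling-minimum estimate of the baseline."""
    return [v - min(_window(values, i, half_window)) for i, v in enumerate(values)]

def detect_peaks(mz, values, half_window=5, min_height=1.0):
    """Step 5: keep local maxima that rise above `min_height`."""
    return [
        (mz[i], v)
        for i, v in enumerate(values)
        if v >= min_height and v == max(_window(values, i, half_window))
    ]

# Step 1: a synthetic "raw" spectrum with two Gaussian peaks on a sloping baseline.
mz = [31000 + 5 * i for i in range(600)]

def synthetic_intensity(m):
    baseline = 100.0 - 0.004 * (m - 31000)
    peak_a = 400.0 * math.exp(-((m - 31965) ** 2) / (2 * 15.0 ** 2))
    peak_b = 250.0 * math.exp(-((m - 32830) ** 2) / (2 * 15.0 ** 2))
    return baseline + peak_a + peak_b

raw = [synthetic_intensity(m) for m in mz]
processed = correct_baseline(smooth(stabilize_variance(raw)))
peaks = detect_peaks(mz, processed)
print([m for m, _ in peaks])
```

The window sizes and height threshold are arbitrary here and would need tuning against real spectra.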
34. (2) Assigning a toxicity score to specific proteins in considering celiac disease (CD) and
wheat allergy (WA) reactions to wheat flour
Proof of concept by Angela Juhasz and Frank Bekes, carried out on
the data set published by DuPont et al (2011)
Every protein in the wheat grain defined by DuPont et al
(2011) was assigned a toxicity score, which is the result of the
amount of protein in the grain x the number of epitopes
present that are known to relate to CD and/or WA
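The toxicity score defined here is a product of two numbers per protein. A minimal sketch of that arithmetic follows. The protein names, amounts and epitope counts are hypothetical placeholders, not values from DuPont et al (2011), and the unit of "amount" is left unspecified as it is on the slide.

```python
def toxicity_score(amount, epitope_count):
    """Amount of protein in the grain x number of known CD/WA epitopes."""
    return amount * epitope_count

# (protein, amount in grain, number of CD/WA epitopes); all values hypothetical.
proteins = [
    ("protein_1", 2.0, 3),
    ("protein_2", 5.0, 0),
    ("protein_3", 0.5, 8),
]

scores = {name: toxicity_score(amount, n) for name, amount, n in proteins}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(name, score)
```

An abundant protein with no known epitopes scores zero, while a minor protein carrying many epitopes can still score highly.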
36. Proteome bioinformatics and genetics for associating
proteins with grain phenotype
• Genome sequencing and high resolution genetic maps of wheat
• Integrating new wheat protein level analyses
• Translating research findings to industry – the Decision Matrix
The proteins of the wheat grain form a significant
phenotype in breeding, industry processing and
marketing, and will become more important in
defining the product