Proteome bioinformatics and genetics for associating proteins with grain phenotype

Rudi Appels,
Centre for Comparative Genomics, Murdoch University, Australia

Paula Moolhuijzen, Brett Chapman, Wujun Ma, Dean Diepeveen, Matthew
Bellgard,
Centre for Comparative Genomics, Murdoch University and Department
of Food and Agriculture WA, Australia.

Yueming Yan, Shunli Wang,
Capital Normal University, Beijing

Angela Juhasz,
Agricultural Institute, Martonvásár, Hungary

Frank Bekes,
FBFD Pty Ltd, Beecroft, Sydney, Australia 2119

CENTRE FOR
COMPARATIVE GENOMICS

Centre for Comparative Genomics (CCG) at Murdoch University
Supercomputer
• Stage 1A Pawsey
Centre (SKA)
• Ranked 87 in the
world
• 9600 cores

• Genome sequencing and high resolution genetic maps of wheat

• Integrating new wheat protein level analyses

• Translating research findings to industry – the Decision Matrix


• The integration of new efforts to obtain reference sequences for bread
wheat and barley genomes is accelerating gene discovery.

• Locations of traits and proteins on DNA sequence assemblies via
genetic maps define gene networks

• The genomic resources are refining molecular marker development and
mapping strategies for combining yield with quality attributes of the
grain that meet market requirements

Locations of proteins within a
genetic map can be determined

One of the first examples was
published by Amiour et al (2003) using
2D gels to identify chromosomal
locations of amphiphilic proteins
from wheat grains.

Later Chen et al (2007) carried out
mapping using MALDI-TOF defined
peaks of gliadin

Progress in the DNA sequencing of
the wheat transcribed genes
now allows higher resolution maps
to be established

Amiour N, et al (2003) Theor. Appl. Genet. 108: 62–72
Chen J, et al (2007) Rapid Comm Mass Spectrometry 21: 2913–2917


2007 – 2012
Suites of genomic resources and knowledge have been established to provide
the foundation for sequencing the wheat and barley genomes

• International Wheat Genome Sequencing Consortium (www.wheatgenome.org)

• UK WISP consortium (www.wheatisp.org)

• International Barley Sequencing Consortium (www.barleygenome.org)

• European TriticeaeGenome FP7 project (www.triticeaegenome.eu)

The initiatives built on long standing resources such as:

• KOMUGI in Japan (www.shigen.nig.ac.jp/wheat/komugi/)

• Graingenes in the USA (wheat.pw.usda.gov/GG2/index.shtml)

• Extensive EST collections (ITEC http://avena.pw.usda.gov/genome/)


• Reducing the complexity of the
wheat genome through flow
sorting of chromosome arms has
formed the basis for the
international effort to produce a
reference sequence for the variety
Chinese Spring

• All the chromosome arms now
have a completed survey sequence
analysis. This provides a pool of
DNA contigs that can be used to
anchor gene sequences and
proteins to chromosome arms


The array technologies to
assay single nucleotide
polymorphisms (SNPs) are now
establishing genetic maps with
2000-3000 molecular markers.

Map for chromosomes
1A, 1B and 1D from an
Avalon x Cadenza cross

Allen AM, Barker GLA, Berry ST, Coghill JA, Gwilliam R, Kirby S, Robinson P, Brenchley RC, D’Amore R,
McKenzie N, Waite D, Hall A, Bevan M, Hall N, Edwards KJ (2011). Transcript-specific, single-nucleotide
polymorphism discovery and linkage analysis in hexaploid bread wheat (Triticum aestivum L.). Plant Biotechnology
Journal 2011: 1–14

Chromosome 7A

The 9000 SNP array (“chip”) technology for assaying
SNPs has been used to establish a 2000 molecular
marker map for a set of 225 doubled haploid lines from a
Westonia x Kauz cross.

A large study in Australia is examining progeny from a
complex cross (MAGIC, currently a 4-way cross using
Baxter, Yitpi, Westonia, Chara, 1500 lines, with markers
from a 9K SNP chip and markers from a 90K chip
planned). This work is at CSIRO with Colin Cavanagh.

In an 8-way cross using Baxter, Yitpi, Westonia, AC
Barrie (Canada), Alsen (US), Pastor (CIMMYT),
Xiaoyan 54 (China), and Volcani (Israel), 5000 lines are
being characterized.


In a large population of 5,000 lines (as required for accurate mapping) it is not
feasible to phenotype all progeny

The marker information can be used to define families of progeny for
phenotyping

For the 1500 lines from the 4-way MAGIC cross, a population of 370 families has
been defined for phenotyping (in duplicated/randomized designs). While we
are still in the middle of this analysis (which includes milling yield), some QTL for %
wet gluten at the LMW-glutenin locus of chromosome 1B are evident.

It is interesting that in the high resolution maps the QTL may not be exactly
superimposed on the LMW-glutenin locus.


GluStar system for “wet gluten”
measurements on 4.5 g flour

• MAGIC and assignment of a QTL
for % wet gluten to
1B near the LMW
glutenin locus but
not coincident with it

• The high density of
markers allows a
fine resolution of
map location when
1,500 progeny are
analyzed

Tömösközi S, Budapest University of Technology and Economics; http://www.labintern.hu

To determine protein fingerprints as a “phenotype” we have explored MALDI-
TOF as a means for increasing the number of lines we can analyse.
Low molecular weight glutenins

Li et al (2010). BMC Plant Biology 10:124


High molecular weight glutenins (70,000– 90,000 Da)

Li et al (2009). Cereal Sci. 50: 295-301; Gao L et al (2010). J Ag Food Chem 58: 2777–2786

HMW-GS     Mr (Da) deduced from coding gene   Mr (Da) by MALDI-TOF
1Ax2*      86309                              86200
1Bx6       Unknown                            86500
1Bx7       82524                              82300
1Bx7OE     83134                              82900
1Bx7b*     Unknown                            82600
1Bx13      Unknown                            83000
1Bx14      84012                              83600
1Bx17      78607                              77900, 78400
1Bx20      Unknown                            82100
1Dx2       87022                              87000
1Dx3       Unknown                            85400
1Dx5       88128                              87900
1By8       75156                              74900
1By8a*     Unknown                            74800
1By8b*     Unknown                            75000
1By9       73515                              73300
1By15      75733                              74900
1By16      Unknown                            76900
1By18      Unknown                            75000
1By20      Unknown                            74900
1Dy10      67473                              67300
1Dy12      68652                              68300

Li et al (2009) Cereal Sci. 50: 295-301
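The comparison in the table above can be made explicit with a short calculation. The following is a minimal Python sketch, not part of the original analysis: it takes the subunits for which both a gene-deduced and a MALDI-TOF mass are listed and reports the difference in Da and as a percentage of the deduced mass. The helper name `mass_deviation` is illustrative only, and 1Bx17 is left out because the table lists two MALDI-TOF values for it.

```python
# (subunit, Mr deduced from coding gene, Mr by MALDI-TOF), copied from the
# table above. Subunits with an unknown deduced mass are omitted, and 1Bx17
# is skipped because two MALDI-TOF values are listed for it.
masses = [
    ("1Ax2*", 86309, 86200),
    ("1Bx7", 82524, 82300),
    ("1Bx7OE", 83134, 82900),
    ("1Bx14", 84012, 83600),
    ("1Dx2", 87022, 87000),
    ("1Dx5", 88128, 87900),
    ("1By8", 75156, 74900),
    ("1By9", 73515, 73300),
    ("1By15", 75733, 74900),
    ("1Dy10", 67473, 67300),
    ("1Dy12", 68652, 68300),
]


def mass_deviation(deduced, measured):
    """Return (difference in Da, difference as % of the deduced mass)."""
    diff = measured - deduced
    return diff, 100.0 * diff / deduced


deviations = {name: mass_deviation(d, m) for name, d, m in masses}

for name, (diff, pct) in deviations.items():
    print(f"{name:8s} {diff:6d} Da  {pct:6.2f} %")
```

For every subunit included here the MALDI-TOF value is lower than the gene-deduced value, with the largest gap at 1By15.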


The MALDI-TOF based analyses of the LMW and HMW glutenins have
provided a good basis for establishing a high throughput analysis for breeding
programs. This analysis now runs as a fee-for-service (Saturn Biotech;
AUS$6/sample).

The glutenin subunit protein loci known to date, however, can account
for only approximately 60% of the variation in measured grain quality attributes.

More detailed genetic analysis is yielding new information

Chromosome 1D

Map based on DH lines from a
Westonia x Kauz cross
(mapped peaks: L29183, L33288, L33529)

The classic designation of the LMW
glutenin locus of Westonia on
chromosome 1D is LMWG-D3c (in
addition to A3c, B3h).

The Kauz designation is not known

Peaks from:
Westonia = L33288
Kauz = L29183, L33529

Peaks found in LMWG-D3c (based on
Li et al 2010):
33021
33290
33453
Li et al (2010). BMC Plant Biology 10:124
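Comparing the mapped peaks with the reference peaks for LMWG-D3c amounts to a nearest-mass lookup. The sketch below is illustrative only: the function name `match_peaks` is hypothetical, and the 10 Da tolerance is an assumption borrowed from the database search criterion used later in this presentation, not a value stated for this comparison.

```python
# Reference peaks for LMWG-D3c and mapped peaks per parent, from the slide.
REFERENCE_PEAKS = {"LMWG-D3c": [33021, 33290, 33453]}
OBSERVED_PEAKS = {"Westonia": [33288], "Kauz": [29183, 33529]}

# Assumption: a 10 Da window, as in the database search criterion.
TOLERANCE_DA = 10


def match_peaks(observed, reference, tolerance):
    """Pair each observed peak with its nearest reference peak if close enough."""
    matches = []
    for peak in observed:
        nearest = min(reference, key=lambda ref: abs(ref - peak))
        if abs(nearest - peak) <= tolerance:
            matches.append((peak, nearest))
    return matches


for parent, peaks in OBSERVED_PEAKS.items():
    hits = match_peaks(peaks, REFERENCE_PEAKS["LMWG-D3c"], TOLERANCE_DA)
    print(parent, hits)
```

Under this assumed tolerance the Westonia peak pairs with the 33290 reference peak, while neither Kauz peak falls within the window of any LMWG-D3c peak.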

Chromosome 7A

Classical mapping of LMW-glutenin loci defined the
chromosome 1A, 1B and 1D loci based on single
dimension SDS PAGE technology (Gupta and Shepherd,
1994) and it was noted then that the protein family was
complex.

We now find some of the peaks in the MALDI-TOF are
mapping to other chromosomes such as chromosome
7A

We used our wheat proteome data base to see if we
could identify the L32831 and L31965 proteins

L32831
L31965
Gupta and Shepherd (1994). Two-step one-dimensional SDS-PAGE
analysis of LMW subunits of glutenin. I. Variation and genetic control of
the subunits in hexaploid wheats. Theor. Appl. Genet. 80: 65-74

Chromosome 7A

In this analysis we are accessing a complex
part of the LMW glutenin protein
spectrum that was not available for
analysis in the earlier SDS gel-based
studies

L32831
L31965

Criteria for database search:

(1) Qualitative – amino acid composition (occurrence of QQQ etc) consistent
with being co-extracted with LMW-glutenins (gliadins removed beforehand)
(2) Quantitative – molecular weight within 10 Da

Query: L31965

Database entries listed (name, molecular weight in Da):
IWGSC_4DS_v1_2275417.fa.genscan.pep.1 31960
IWGSC_2AL_v1_6356128.fa.genscan.pep.2 31960
IWGSC_4BS_v1_4917914.fa.genscan.pep.1 31960
IWGSC_1AL_v2_3915175.fa.genscan.pep.1 31960
Komugi_AJ133603_1 31960
IWGSC_3B_v1_10586963.fa.genscan.pep.1 31961
IWGSC_5DS_v1_2734070.fa.genscan.pep.1 31961
IWGSC_2BS_v1_5247743.fa.genscan.pep.3 31961

>Komugi_AJ133603_1 AJ133603 7209247 [Triticum aestivum] Triticum aestivum mRNA for alpha-gliadin storage protein, clone alpha-9
MVRVTVPQLQPQNPSQQQPQEQ
VPLVQQQQFLGQQQPFPPQQPYP
QPQPFPSQQPYLQLQPFPQPQLP
YSQPQPFRPQQPYPQPQPQYSQP
QQPISQQQQQQQQQQQQQQQQ
QQQQQQQILQQILQQQLIPCMDV
VLQQHNIVHGRSQVLQQSTYQLL
QELCCQHLWQIPEQSQCQAIHNV
VHAIILHQQQKQQQQPSSQVSFQ
QPLQQYPLGQGSFRPSQQNPQAQ
GSVQPQQLPQFEEIRNLALQTLPA
MCNVYIPPYCTIAPFGIFGTNYR

Query: L32831

Database entries listed (name, molecular weight in Da):
IWGSC_4BL_v1_6996674.fa.genscan.pep.4 31980
Solomon_Q8H0J4_WHEAT 31934
Solomon_B2ZRD2_WHEAT 32829

>Solomon_B2ZRD2_WHEAT B2ZRD2 SubName: Full=Alpha-gliadin; [Triticum aestivum (Wheat).]
MKTFLILALLAIVATTATTAGRVPVPQL
QPQNPSQQQPQEQVPLVQQQQFLGQ
QQPFPPQQPYPQPQPFPSQQPYLQLQP
FPQPQLPYSQPQPFRPQQPYPQPQPQY
SQPQQPISQQQQQQQQQQQQQQQEQ
QILQQILQQQLIPCMDVVLQQHNIAH
GRSQVLQQSTYQLLQELCCQHLWQIP
EQSQCQAIHNVVHAIILHQQQKQQQQ
PSSQFSFQQPLQQYPLGQGSSRPSQQN
PQAQGSVQPQQLPQFEEIRNLALQTLP
AMCNVYIPPYCTIAPFGIFGTN
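The two search criteria (a QQQ-type motif and a molecular weight within 10 Da of the query peak) can be sketched as a simple filter over a sequence collection. This is a hedged illustration rather than the search actually run against the wheat proteome database: the function names are hypothetical, the toy sequences are invented for the example, and the mass is an approximate average mass computed from standard residue masses with no allowance for signal peptide removal or modifications.

```python
# Approximate average residue masses (Da) for the 20 standard amino acids.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one water per chain for the free termini


def average_mass(sequence):
    """Approximate average molecular weight of an unmodified peptide chain."""
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in sequence) + WATER_MASS


def search(database, query_mass, tolerance=10.0, motif="QQQ"):
    """Return (name, mass) for entries passing both criteria, closest first."""
    hits = []
    for name, sequence in database.items():
        mass = average_mass(sequence)
        if motif in sequence and abs(mass - query_mass) <= tolerance:
            hits.append((name, mass))
    return sorted(hits, key=lambda hit: abs(hit[1] - query_mass))


# Hypothetical toy entries, far shorter than real storage proteins.
toy_database = {
    "toy_glutenin_like": "MKTQQQPFPQQQ",
    "toy_no_motif": "MKTFLILALLAIV",
    "toy_wrong_mass": "QQQ",
}
toy_query_mass = average_mass("MKTQQQPFPQQQ") + 3.0

for name, mass in search(toy_database, toy_query_mass):
    print(name, round(mass, 1))
```

Applying the qualitative filter before the mass comparison mirrors the order of the criteria on the slide; in a real search the query mass would be the MALDI-TOF peak value, for example 31965 or 32831.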

Chromosome 7A

This analysis suggests that there are probably
more genetic loci for major seed storage proteins
than we have found to date.

Genome sequencing and proteome analyses,
combined with genetic mapping can define these
new loci and provide molecular markers for
breeding and selection.

It turns out that a 1980 report did find
LMWG/gliadins on 4B and 7A

Salcedo G, Prada J, Sanchez-Monge R,
Aragoncillo C (1980). Aneuploid analysis of low
molecular weight gliadins from wheat. Theor
Appl Genet 56: 65-69

L32831
L31965

Chromosome 7A

The “hits” on chromosome 7A will be resolved
as we have now started to sequence this
chromosome, as a national project in Australia.

This is part of the International Wheat
Genome Sequencing Consortium (IWGSC) in
which different countries around the world are
doing a chromosome each.

L32831
L31965

The Wheat Proteome database:

Motivation: wheat genome, transcriptome and proteome studies are now advanced
and need a reference proteome database for

• annotating the genes in the wheat genome

• assigning peptides, obtained from high level proteomic analyses, to wheat proteins

Content of proteins/peptides:

• wheat/Triticum entries from SwissProt, UniProt, TrEMBL (2,690)

• translation from the KOMUGI full-length cDNA collection (13,717)

• peptides from INRA (France), USDA (USA), CNU (China) labs (still sorting out a
final non-redundant set)

• IWGSC-genome-wide-sequence (GWS) gene model translations (144,920)

The Wheat Proteome database:

(1) Translations of conserved genes.

The IWGSC-GWS database for each chromosome arm typically identifies 4000-9000
genic sequences per chromosome. These include gene fragments and pseudogenes.

Following their identification, genes conserved between wheat, Brachypodium, rice,
sorghum and barley (Klaus Mayer “chromosome zipper”) can be clustered into
syntenic groups.

(2) Non-redundant proteins/peptides known to originate from wheat

30-40% of the gene complement in wheat and barley does not reside in the conserved
syntenic gene order space

All genes and protein/peptide sequences need to be anchored to the IWGSC-GWS
chromosome arm DNA sequences. So far only 205 KOMUGI translations and 6 from
the SwissProt/UniProt/TrEMBL dataset have been anchored to the IWGSC-GWS
translations, so there is quite a bit of curation to carry out.

To complete this presentation it
is important to consider
translating research findings to
industry.

(1) Further stream-lining of the
MALDI-TOF scoring of wheat
proteins

(2) Assigning a toxicity score to
specific proteins in considering
celiac and wheat allergy
reactions to wheat flour

The aim is to be able to enter
specific features of the wheat grain
as a number into a Decision Matrix

[Figure: Decision Matrix. Weights assigned to features; feature columns:
Genome fingerprint, Gene marker, Protein marker, Other traits; final
column: selection index values]

For each breeding line
(matrix rows) the
feature score (matrix
columns) is multiplied
by the feature weight.
These are then added
to provide a selection
index (SI)

This SI is used to rank
breeding lines or
suitability for an end-
product in industry
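The selection index described here is a weighted sum across the feature columns of the Decision Matrix. The following minimal Python sketch shows the arithmetic only; the feature keys follow the four column headings on the slide, while the weights, scores and line names are invented placeholders, not values from the study.

```python
# Hypothetical feature weights (one per Decision Matrix column).
weights = {
    "genome_fingerprint": 1,
    "gene_marker": 2,
    "protein_marker": 3,
    "other_traits": 1,
}

# Hypothetical feature scores: one row per breeding line.
decision_matrix = {
    "line_A": {"genome_fingerprint": 1, "gene_marker": 0, "protein_marker": 1, "other_traits": 1},
    "line_B": {"genome_fingerprint": 1, "gene_marker": 1, "protein_marker": 1, "other_traits": 0},
    "line_C": {"genome_fingerprint": 0, "gene_marker": 1, "protein_marker": 0, "other_traits": 1},
}


def selection_index(scores, feature_weights):
    """Sum of feature score x feature weight for one breeding line."""
    return sum(scores[feature] * weight for feature, weight in feature_weights.items())


def rank_lines(matrix, feature_weights):
    """Breeding lines ordered from highest to lowest selection index."""
    return sorted(
        matrix,
        key=lambda line: selection_index(matrix[line], feature_weights),
        reverse=True,
    )


for line in rank_lines(decision_matrix, weights):
    print(line, selection_index(decision_matrix[line], weights))
```

Changing the weights re-ranks the same lines for a different end-product without rescoring any features, which is the practical appeal of keeping scores and weights separate.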

(1) Further stream-lining of the MALDI-TOF scoring of wheat proteins: we are following
the MALDIquant process described by Sebastian Gibb (IMISE, University of Leipzig)

[Figure panels: 1: raw; 2: variance stabilization; 3: smoothing;
4: baseline correction; 5: peak detection; 6: peak plot]
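The sequence of processing steps can be sketched in a few lines of Python. This is not MALDIquant itself, which is an R package with its own algorithms; each function below is a deliberately simple stand-in (square-root transform, moving average, rolling-minimum baseline, local-maximum peak picking), and all names, window sizes, thresholds and the synthetic spectrum are assumptions made for illustration.

```python
import math


def window(values, i, half_window):
    """Slice of values centred on index i, truncated at the ends."""
    return values[max(0, i - half_window): min(len(values), i + half_window + 1)]


def stabilize_variance(intensities):
    """Step 2: square-root transform."""
    return [math.sqrt(max(v, 0.0)) for v in intensities]


def moving_average(intensities, half_window=2):
    """Step 3: smoothing with a simple moving average."""
    return [sum(w) / len(w)
            for w in (window(intensities, i, half_window) for i in range(len(intensities)))]


def remove_baseline(intensities, half_window=10):
    """Step 4: subtract a rolling minimum as a crude baseline estimate."""
    return [v - min(window(intensities, i, half_window))
            for i, v in enumerate(intensities)]


def detect_peaks(masses, intensities, half_window=3, min_intensity=1.0):
    """Step 5: keep local maxima that rise above a fixed threshold."""
    return [(masses[i], v) for i, v in enumerate(intensities)
            if v >= min_intensity and v == max(window(intensities, i, half_window))]


def find_peaks(masses, raw):
    """Steps 2 to 5 applied in order to a raw spectrum."""
    corrected = remove_baseline(moving_average(stabilize_variance(raw)))
    return detect_peaks(masses, corrected)


# Synthetic spectrum (step 1, raw): sloping baseline plus two Gaussian peaks.
masses = [33000 + 10 * i for i in range(61)]
raw = [100 + 0.05 * (m - 33000)
       + 900 * math.exp(-((m - 33290) / 20) ** 2)
       + 400 * math.exp(-((m - 33450) / 20) ** 2)
       for m in masses]

peak_masses = [m for m, _ in find_peaks(masses, raw)]
print(peak_masses)
```

Real spectra carry noise, so a production pipeline would replace the fixed intensity threshold with a noise-based criterion; the fixed value is used here only to keep the sketch short.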

Dean Diepeveen

(2) Assigning a toxicity score to specific proteins in considering celiac disease (CD) and
wheat allergy (WA) reactions to wheat flour

Proof of concept by Angela Juhasz and Frank Bekes, carried out on
the data set published by DuPont et al (2011)

Every protein in the wheat grain defined by DuPont et al
(2011) was assigned a toxicity score, which is the product of the
amount of protein in the grain and the number of epitopes
present that are known to relate to CD and/or WA
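The toxicity score is a product of two numbers per protein, which the following Python sketch spells out. It is an illustration under stated assumptions, not the procedure used by Juhasz and Bekes: the function names are hypothetical, the two epitope strings are placeholders standing in for a curated CD/WA epitope list, and the protein names, sequences and amounts are invented.

```python
# Placeholder epitope motifs; a curated CD/WA epitope list would replace these.
EPITOPES = ["PFPQPQLPY", "PQPQLPYPQ"]


def count_epitopes(sequence, epitopes):
    """Count every occurrence of every epitope, overlapping matches included."""
    total = 0
    for epitope in epitopes:
        start = sequence.find(epitope)
        while start != -1:
            total += 1
            start = sequence.find(epitope, start + 1)
    return total


def toxicity_score(amount, sequence, epitopes):
    """Amount of protein in the grain x number of known epitopes present."""
    return amount * count_epitopes(sequence, epitopes)


# Hypothetical proteins: (relative amount in grain, toy sequence).
proteins = {
    "toy_gliadin": (3, "AAPFPQPQLPYPQAA"),
    "toy_non_toxic": (5, "MKTFLILALLAIV"),
}

scores = {name: toxicity_score(amount, seq, EPITOPES)
          for name, (amount, seq) in proteins.items()}
print(scores)
```

Because the score is multiplicative, an abundant protein with no known epitopes scores zero, while a less abundant protein carrying several epitopes can still rank high.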

• Genome sequencing and high resolution genetic maps of wheat

• Integrating new wheat protein level analyses

• Translating research findings to industry – the Decision Matrix

The proteins of the wheat grain form a significant
phenotype in breeding, industry processing and
marketing, and will become more important in
defining the product
