2. Bioinformatic data analysis – comparison
from three human studies using different
Affymetrix platforms
Agnieszka M. Lichanska¹, Sheryl Maher²,³, Nguyen Pham¹, Timothy Pan¹,², and Saso Ivanovski⁴
¹School of Dentistry, ²Institute for Molecular Biosciences, ³Australian Biosecurity CRC for Emerging Infectious Diseases, The University of Queensland, and ⁴School of Dentistry, Griffith University, Brisbane, Australia
3. Dr. Agnieszka Lichanska, UQ
Overview
• Studies and starting hypothesis
• Analysis tools
• Results from bioinformatics
• Validation
• Future studies
• How well do the new exon arrays characterize
gene expression?
4. Dr. Agnieszka Lichanska, UQ
Studies
1. Comparison of gene expression between
periodontal ligament cells and gingival
cells
2. Functions of nuclear IGFBP5
3. Identification of biological processes
induced in osteoblasts by LPS
5. Dr. Agnieszka Lichanska, UQ
Array analysis
• Affymetrix Platform
– Hu133A arrays - using Ambion MessageAmp and Enzo IVT kit
– Human Exon 1.0 ST arrays - using the new GeneChip WT cDNA
amplification kit
• Analysis
– MAS, DMT, Spotfire
– Partek
– GO Browser (Affymetrix), Pathway Miner, DAVID, Onto-Tools,
Clover, PAINT, MSCAN, CpGPro
• Data validation
– Real time PCR, cell-based assays, other methods
6. Dr. Agnieszka Lichanska, UQ
Study 1 - periodontal ligament cells and
gingival cells
• The objective was to identify markers for
periodontal ligament cells that can be used to
develop periodontitis therapies.
• Knowledge about the regulation of gene
expression in those tissues is limited.
• There is extensive functional knowledge about the role of the
different cells in the periodontium.
• Hu133A arrays
7. Dr. Agnieszka Lichanska, UQ
Questions
1. What regulates the differential gene
expression in those tissues?
2. Does differential methylation play a
role in regulating expression?
3. Can we identify markers for each of the
tissues?
8. Dr. Agnieszka Lichanska, UQ
Identification of differentially
expressed genes
Total differentially expressed genes - 292
Genes with CpG islands – 121
Up in Ligament – 112 genes
Down in Ligament – 180 genes
Genes with CpG islands – 70
Identification of differentially expressed genes:
MAS 5.0 – presence/absence calls
DMT - number of concordant changes
Spotfire - ANOVA analysis
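The CpG island counts on this slide depend on how an island is defined. The slides do not state which criteria CpGPro applied, so the sketch below is only an illustration that assumes the commonly cited Gardiner-Garden and Frommer thresholds (length of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6). The function names and the test sequences are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC content, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # CpG dinucleotides on the given strand
    gc_content = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / N
    obs_exp = (cpg * n) / (c * g) if c and g else 0.0
    return gc_content, obs_exp


def looks_like_cpg_island(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Apply the assumed thresholds to one candidate promoter region."""
    gc_content, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_content > min_gc and obs_exp > min_obs_exp


if __name__ == "__main__":
    # Made-up sequences, not promoter regions from the study
    print(looks_like_cpg_island("CG" * 100))
    print(looks_like_cpg_island("AT" * 100))
```

In practice a sliding window over each promoter would be needed; the sketch scores a whole region at once to keep the criteria visible.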
9. Dr. Agnieszka Lichanska, UQ
Biological processes (DAVID functional annotation tool)
[Figure: biological processes up in ligament and down in ligament]
10. Dr. Agnieszka Lichanska, UQ
Elk-1
Gene Name Predicted Elk-1 Cluster
PAINT MSCAN
CYP51A1
EGR1
HSPE1
KPNB1
MAGOH
MET
PAWR
PLCB4
PPP1CB
RNF5
SNRPD1
SNRPG
TAF11
TDG
GLG1
SIP1
FUBP3
ADAMTS1
KIAA0152
COX17
CDC42EP3
PDLIM5
PAPOLA
EBNA1BP2
U2AF2
DHRS7B
C14orf109
LSM3
TPRKB
C14orf111
MRPL35
LSM8
ENAH
C13orf10
YRDC
ZNF587
Legend:
Prediction of Elk-1 transcription factor binding site clusters in gene by both PAINT and MSCAN
Prediction of Elk-1 transcription factor binding site clusters in gene by PAINT only
Prediction of Elk-1 transcription factor binding site clusters in gene by MSCAN only
[Figures: PAINT analysis; MSCAN analysis]
12. Dr. Agnieszka Lichanska, UQ
Conclusions
• A lot of additional information can be mined from
the array datasets, such as what can regulate
differential gene expression
• Promoter analysis can be particularly useful in
cases where little is known about the system
• As with all other analyses, multiple tools
have to be used, as no single tool provides all the
information.
• The output formats can be difficult to manipulate
• The hypotheses obviously have to be validated in
vitro
14. Dr. Agnieszka Lichanska, UQ
Study 2 - Functions of nuclear IGFBP5 in
osteoblasts
• The objective was to identify the genes regulated by nuclear
translocation of IGFBP5
• IGFBP5 Functions:
– It is the main IGFBP in the bone
– It induces proliferation of osteoblasts in vitro
– It can act through IGF-dependent or IGF-independent
mechanisms
– It is also associated with breast cancer progression
– It is known to interact with FHL2 and RAR/RXR in the nucleus
– The target genes regulated by IGFBP5 are not known
• Array platform: Human Exon 1.0 ST arrays
16. Dr. Agnieszka Lichanska, UQ
Time course of IGFBP5
[Figure: confocal Z-sections of the IGFBP5 time course. Panels: A. 2 hour, B. 4 hour, C. 8 hour, D. 24 hour, E. 48 hour, F. no treatment. Concentration of IGFBP-5: 625 ng/mL. Staining: α-nucleolin, isotype control, α-IGFBP5.]
17. Dr. Agnieszka Lichanska, UQ
Affymetrix Human Exon 1.0 ST Array
• Exon-level detection: differentiate differentially spliced
transcripts of each gene
• Gene-level detection: all probesets are summarised into an
expression value of all transcripts from the same gene
• Each exon is covered by one probeset, which contains 4 probes
• Each gene is covered by around 40 probes
(www.affymetrix.com)
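The gene-level idea above, in which all probesets of a gene are summarised into one expression value, can be sketched as follows. This is only an illustration: it takes a plain median across probesets, which is an assumption and not the summarisation algorithm used by the Affymetrix or Partek software. The probeset IDs, gene names, signal values and the function `summarise_gene_level` are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def summarise_gene_level(probeset_signals, probeset_to_gene):
    """Collapse probeset-level (exon-level) signals into one value per gene.

    Probesets without a gene assignment are skipped.
    """
    by_gene = defaultdict(list)
    for probeset_id, signal in probeset_signals.items():
        gene = probeset_to_gene.get(probeset_id)
        if gene is not None:
            by_gene[gene].append(signal)
    return {gene: median(values) for gene, values in by_gene.items()}


# Made-up log2 signals for five probesets
signals = {"PS1": 7.0, "PS2": 7.4, "PS3": 9.0, "PS4": 5.0, "PS5": 6.1}
gene_of = {"PS1": "GENE_X", "PS2": "GENE_X", "PS3": "GENE_X", "PS4": "GENE_Y"}

gene_level = summarise_gene_level(signals, gene_of)
print(gene_level)
```

A probeset that departs from its gene's summary value, such as PS3 here, is the kind of signal that an exon-level (alternative splicing) analysis would examine instead of averaging away.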
18. Dr. Agnieszka Lichanska, UQ
Sample preparation of 3’UTR vs exon arrays
[Figure: sample preparation workflows, with labelled steps "rRNA reduction step" and "2nd cycle cDNA synthesis" (www.affymetrix.com)]
19. Dr. Agnieszka Lichanska, UQ
Analysis of the new Affymetrix exon
arrays
• Experimental QC
– rRNA reduction
– IVT yield
– cDNA yield
– Fragmentation of cDNA
• Analytical QC
– Box plot - actually best done in Expression Console (Affymetrix)
– Histogram analysis
– PCA analysis
• Analysis
– Exon alternative splicing (visualized with gene model)
– Gene level analysis (visualized with a bar chart)
• Output
– Splicing - gives Transcript cluster ID
– Gene level - gives Probeset IDs
– Entrez Gene ID has to be retrieved from Affymetrix to use in functional analysis
• Functional analysis
– GO analysis
– Pathway mapping
– Promoter analysis
– CpG islands
24. Dr. Agnieszka Lichanska, UQ
Study 3 - Identification of biological
processes induced in osteoblasts by LPS
• The objective was to determine how LPS modulates function of
osteoblasts
• Osteoblasts express Toll-like receptors 2, 3, 4, 5 and 9, with TLR4
being the main receptor for bacterial LPS
• In periodontitis, tissue loss includes bone loss, but the changes
induced by bacteria remain unclear
• LPS is used in this study as a model for infection
Questions
• What transcriptional events are induced by LPS?
• What is the mechanism of induction of apoptosis in
osteoblasts in response to LPS?
25. Dr. Agnieszka Lichanska, UQ
PCA analysis of the LPS experiment
QC analysis using PCA
PCA is not really useful for
separating the dataset into
groups after the ANOVA analysis
28. Dr. Agnieszka Lichanska, UQ
Conclusions
• IGFBP5 study has identified biological processes
– expected to be regulated by IGFBP5 treatment - cell cycle,
proliferation
– Also some unexpected ones - RNA splicing and transcriptional
regulators
• LPS study has provided us with clues as to the mechanisms
by which LPS regulates osteoblast function
– There is upregulation of osteoclast stimulating factors, e.g.
CSF-1
– There is downregulation of genes involved in proliferation
– There is also upregulation of apoptotic genes
– This suggests that a number of mechanisms can potentially
be involved in the apoptosis known to occur in osteoblasts in
response to LPS
30. Dr. Agnieszka Lichanska, UQ
3’UTR arrays
These arrays let us analyze expression at the gene level
only, and because the probe sets were selected
mainly in the 3’UTR region, they give us limited
information about gene expression.
31. Dr. Agnieszka Lichanska, UQ
Splicing data - differences between a strong inducer (LPS)
and a weak inducer (IGFBP5)
[Figure panels: IGFBP5, LPS]
32. Dr. Agnieszka Lichanska, UQ
Final conclusions
• Exon arrays provide us with much more information than
3’UTR ones
• The analysis of new whole genome arrays and exon arrays
can be combined by using the same hybridization cocktail
• The new probe synthesis method has eliminated the need for
Test arrays, which were required when using cRNA on the arrays.
• The data analysis can use the entire dataset, or can focus on
alternative splicing or gene levels
• The output at the moment is difficult to manipulate
• Gene Ontology and pathway mapping of exon arrays can be
done through the same tool, the DAVID Functional annotation
tool.
• Not all public tools cater for the exon arrays yet.
33. Dr. Agnieszka Lichanska, UQ
Acknowledgements
QBI:
Virginia Nink,
Paul Beatus
IMB:
Sheryl Maher,
Elisabetta d’Aniello
School of Dentistry:
Thor Friis
Timothy Pan
Nguyen Pham
Griffith University:
Dr Saso Ivanovski
Millennium Science: Robert Henke, Jeremy Preston
Spotfire: Andrew Khoo
Partek: Michael Venezia
This work was supported by the UQ ECR grant, ADRF and Eli Lilly
Foundation grant
34. Dr. Agnieszka Lichanska, UQ
ComBio 22-26th September, Sydney
MGED/AMATA - 3-5th September, Brisbane
Editor's Notes
292 genes changed by less than -2-fold or more than 2-fold; 60% have CpG islands (that is the average in the genome).
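The fold-change cutoff described in this note can be sketched as follows. This is a minimal illustration in Python, not the MAS 5.0, DMT and Spotfire workflow used in the study. The gene names and signal values are made up, and `signed_fold_change` and `filter_by_fold_change` are hypothetical helper names.

```python
def signed_fold_change(ligament, gingival):
    """Return +x for an x-fold increase in ligament and -x for an x-fold decrease."""
    ratio = ligament / gingival
    return ratio if ratio >= 1 else -1.0 / ratio


def filter_by_fold_change(expression, cutoff=2.0):
    """Return (up, down) gene lists for changes beyond +/- `cutoff`-fold."""
    up, down = [], []
    for gene, (ligament, gingival) in expression.items():
        fc = signed_fold_change(ligament, gingival)
        if fc > cutoff:
            up.append(gene)
        elif fc < -cutoff:
            down.append(gene)
    return up, down


# Made-up linear-scale signals: (ligament, gingival)
example = {
    "GENE_A": (800.0, 200.0),  # 4-fold up
    "GENE_B": (100.0, 450.0),  # 4.5-fold down
    "GENE_C": (300.0, 250.0),  # 1.2-fold, below the cutoff
}
up, down = filter_by_fold_change(example)
print("Up in ligament:", up)
print("Down in ligament:", down)
```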
Only processes with p-values less than 0.05 are shown. Up-regulation of RNA metabolism and biopolymer metabolism seems to support the fact that periodontal ligament (PDL) produces high levels of proteins, especially ECM (extracellular matrix) proteins.
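A p-value for a biological process of this kind is usually an over-representation test. DAVID reports a modified Fisher exact p-value (the EASE score); the sketch below uses the plain hypergeometric upper tail instead, as a simplified stand-in, and all counts in it are made up rather than taken from the study.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the annotation.

    k: annotated genes in the list, n: list size,
    K: annotated genes in the background, N: background size.
    """
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


# Hypothetical counts: 12 of 112 listed genes carry a term that
# annotates 400 of 14500 background genes.
p_value = hypergeom_upper_tail(k=12, n=112, K=400, N=14500)
print(f"p = {p_value:.3g}", "(shown)" if p_value < 0.05 else "(not shown)")
```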
As one of the objectives was to identify what regulates the differential gene expression, we used a number of tools to analyze all 292 promoter regions. PAINT relies on the TRANSFAC database to identify potential transcription factor binding sites. It provides a clustering type of output as well as a list of potential factors. We identified ELK-1 as over-represented in a number of genes, and NKX2.5 as an under-represented TF. As a confirmation, we wanted to use a different method to identify binding sites. MSCAN identified ELK-1 sites in some of the same genes.
We also wanted some statistical validation of the identified sites. CLOVER provides both statistical and graphical output. Here is just an example of one of the genes with a number of TF binding sites identified; only the ones with a p-value < 0.05 are indicated on the graph. ELK-1 was also identified through this method, in addition to binding sites for FREAC7, DOF3, PAX4 and HMG-IV. Differential expression analysis identified FREAC2 as being differentially expressed, so it is possible that this transcription factor is upregulated and can also regulate expression of some of the other genes through the FREAC7 binding sites.
IGFBP5 is one of the 6 IGFBPs and the main IGFBP in bone, and it has been shown to affect osteoblasts, osteoclasts and chondrocytes. As IGFBP5 has a nuclear localisation signal and has been shown to be present in the nucleus, we wanted to identify the effects it has on gene expression. In vitro, IGFBP5 has been shown to induce osteoblast proliferation but to be downregulated during differentiation. IGFBP5 has also been associated with breast and prostate cancer. FHL2 (a transcription factor) and RAR/RXR (retinoic acid receptors) were identified as interaction partners by yeast two-hybrid screen. IGFBP5 does not bind to DNA, so it has to bind to a TF.
We wanted to assay the effects of exogenous IGFBP5 on osteoblast function and gene expression, so we needed a model system. We therefore checked the expression levels of IGFBP5 by real time PCR in a range of cells and identified MG63 cells as the best model for our study.
We needed to perform dose and time response experiments, as neither had been established previously. We determined 625 ng/ml to be the best dose and used this amount for the time course. The protein is just detectable at 2 hours, its concentration in the nucleus increases over the next few hours, and by 8 hours it appears to be localized to nucleoli. To test this we used an anti-nucleolin antibody, but as both antibodies are mouse monoclonals we could not do a colocalization study. To show that IGFBP5 is in the nucleus we used confocal microscopy, which showed that IGFBP5 is present in the nucleus and in the perinuclear region (DO NOT SAY THAT TO THEM BUT IF SOMEONE ASKS IT COULD BE GOLGI BUT I AM NOT FULLY CONVINCED, WE WOULD NEED TO HAVE SOME MARKERS, THIS WILL BE HOPEFULLY DONE BY LUCAS WITH LABELLED IGFBP5).
Comparison of 3’UTR arrays (Hu133A), with probes designed in the 3’UTR, and exon arrays, with probes designed in all exons
Sample preparation comparison
QC analysis is similar to that for other Affy arrays, and in fact for any other arrays. The exon arrays allow analysis of alternative splicing or gene level analysis, depending on what we want to look at. Moreover, we can analyze either just the known genes or all of the genes on the arrays, which includes predicted and hypothetical genes. There are two types of output, transcript cluster IDs or probeset IDs; however, so far they cannot be used in the web-based tools, which require Affy IDs, Entrez IDs, GenBank accession numbers or other identifiers. The Entrez IDs have to be retrieved from the Affy website. Now let’s see what we are able to do with those arrays.
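The identifier conversion described in this note could be sketched roughly as below. The layout of the annotation table is an assumption: the column names `transcript_cluster_id` and `entrez_gene_id`, the `---` placeholder for a missing value, and every ID shown are hypothetical and would need to be matched to the actual file retrieved from Affymetrix.

```python
import csv
import io

# Hypothetical excerpt of an annotation table (made-up IDs)
ANNOTATION_CSV = (
    "transcript_cluster_id,entrez_gene_id\n"
    "1000001,1111\n"
    "1000002,---\n"
    "1000003,2222\n"
)


def load_mapping(handle):
    """Read transcript cluster ID -> Entrez Gene ID, skipping unannotated rows."""
    mapping = {}
    for row in csv.DictReader(handle):
        entrez = row["entrez_gene_id"].strip()
        if entrez and entrez != "---":
            mapping[row["transcript_cluster_id"]] = entrez
    return mapping


def to_entrez(ids, mapping):
    """Return (mapped Entrez IDs, input IDs that could not be mapped)."""
    mapped = [mapping[i] for i in ids if i in mapping]
    unmapped = [i for i in ids if i not in mapping]
    return mapped, unmapped


mapping = load_mapping(io.StringIO(ANNOTATION_CSV))
mapped, unmapped = to_entrez(["1000001", "1000002", "9999999"], mapping)
print("Entrez IDs for functional analysis:", mapped)
print("No Entrez ID available:", unmapped)
```

Keeping the unmapped IDs visible matters here, because predicted and hypothetical genes on the array may have no Entrez entry and would otherwise drop out of the functional analysis unnoticed.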
We used Partek Genomics Suite for analysis. PCA (principal component analysis) showed that the two datasets can be separated from each other. Below is the PCA for the ANOVA analyzed data (bottom graph). An interesting way of looking at the data is the chromosomal view. It shows that there are no particular areas of the genome that are linked to regulation of gene expression by IGFBP5. This view is particularly useful if a methylation study is done at the same time.
Here are just some of the data we identified in the analysis. POINT TO THE CELL DIVISION/CELL CYCLE, PROLIFERATION LINKED GENES
Not surprisingly, there was downregulation of ECM markers linked to differentiation. In addition to the GO analysis we wanted to identify the pathways involved, but because of the output format we could not use Pathway Miner, which requires GenBank accession numbers and signal log ratios. Instead we used the DAVID Functional annotation tool, which accepts Entrez IDs as input.
One of the top scoring pathways was the regulation of cytoskeleton, shown here. It indicates that in response to growth factors, such as FGF, there will be alteration of actin polymerization.
The last study I will show here was also done on the exon arrays. It was intended as a continuation of the first one, as its rationale was also periodontal. The LPS was from a Salmonella minnesota rough mutant.
PCA was useful in the first study we did, but in some cases, such as this one, it is not.
The volcano plot arranges genes along dimensions of biological and statistical significance. The first (horizontal) dimension is the fold change between the two groups (on a log scale, so that up and down regulation appear symmetric), and the second (vertical) axis represents the p-value for a t-test of differences between samples (most conveniently on a negative log scale, so smaller p-values appear higher up). The first axis indicates the biological impact of the change; the second indicates the statistical evidence, or reliability, of the change. The researcher can then make judgements about the most promising candidates for follow-up studies by trading off both these criteria by eye. With a good interactive program, it is possible to attach names to genes that appear promising.
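The two volcano plot axes described above can be computed as in the sketch below. It is an illustration only: the group means and p-values are made up, the cutoffs (2-fold, p < 0.05) are assumptions rather than the settings used in Partek, and `volcano_point` and `promising` are hypothetical names.

```python
from math import log2, log10


def volcano_point(mean_treated, mean_control, p_value):
    """Return (x, y): log2 fold change and -log10 of the p-value."""
    return log2(mean_treated / mean_control), -log10(p_value)


def promising(points, min_abs_log2_fc=1.0, max_p=0.05):
    """Pick genes that are far from zero on x and high up on y."""
    threshold = -log10(max_p)
    return [
        gene
        for gene, (x, y) in points.items()
        if abs(x) >= min_abs_log2_fc and y >= threshold
    ]


# Made-up values: (mean in treated group, mean in control group, t-test p-value)
data = {
    "GENE_A": (400.0, 100.0, 0.01),   # strongly up, reliable
    "GENE_B": (100.0, 400.0, 0.001),  # strongly down, reliable
    "GENE_C": (110.0, 100.0, 0.5),    # small, unreliable change
    "GENE_D": (300.0, 100.0, 0.2),    # large but unreliable change
}
points = {gene: volcano_point(*values) for gene, values in data.items()}
candidates = promising(points)
print(candidates)
```

On the log scale a 4-fold increase and a 4-fold decrease sit at +2 and -2, which is the symmetry the note refers to.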
Exon arrays allow us to look at the gene level or at differential splicing. Changes can be either significant, as with LPS, or confined to some exons, as in the case of IGFBP5.