Using the T-BioInfo Platform to analyze publicly available data from a scientific publication, we were able to further explore the determination of macrophage types based upon gene expression.
Pine Biotech - Profiling Tumor-associated Macrophages Using RNA-Seq
1. Profiling Tumor-associated Macrophages Using RNA-Seq
Based on “Expression Profiling of Macrophages Reveals Multiple Populations with Distinct
Biological Roles in an Immunocompetent Orthotopic Model of Lung Cancer.” (Journal of
Immunology 2016. doi:10.4049/jimmunol.1502364)
2. Introduction
Tumor-associated macrophages are a type of white blood cell found near
or inside tumors. There is evidence for their involvement in both pro-tumor
and anti-tumor processes. The current widely used classification for
macrophages is M1/M2. Using data from GSE76033 one can look at the
expression profiles of all of the macrophage samples to see if there is a
trend or grouping that can show genes specific to M1/M2 or subgroups
within those classifications.
Cells were taken from a cancerous lung and placed directly into the left
lung of immunocompetent mice. At 2 weeks and 3 weeks after the
injection procedure, the mice were sacrificed, along with uninjected
control mice [1]. Next, samples that have cell types in common between
time points were selected. Using an RNA-Seq pipeline, expression levels
were estimated for each gene and isoform across all samples.
Unsupervised analysis using principal component analysis (PCA) shows
groupings by time and macrophage type. A supervised approach using
Factor Regression Analysis shows the effects of different factors on genes
in the samples. In this way, specific genes and isoforms that are uniquely
expressed in macrophage subtypes were identified.
1. Poczobutt JM, De S, Yadav VK, et al. Expression Profiling of Macrophages Reveals Multiple
Populations with Distinct Biological Roles in an Immunocompetent Orthotopic Model of Lung
Cancer. J Immunol. 2016. doi:10.4049/jimmunol.1502364.
Run SiglecF CD11b Ly6G CD64 CD11c Cancer cell type Weeks
1 SRR3002645 SiglecF+ CD11c+ No Mac A 0
2 SRR3002646 SiglecF+ CD11c+ No Mac A 0
3 SRR3002647 SiglecF+ CD11c+ No Mac A 0
4 SRR3002648 CD11b+ Ly6G- CD64low CD11c+ No Mac B1 0
5 SRR3002649 CD11b+ Ly6G- CD64low CD11c+ No Mac B1 0
6 SRR3002650 CD11b+ Ly6G- CD64low CD11c+ No Mac B1 0
7 SRR3002651 CD11b+ Ly6G- CD64med CD11c- No Mac B2 0
8 SRR3002652 CD11b+ Ly6G- CD64med CD11c- No Mac B2 0
9 SRR3002653 CD11b+ Ly6G- CD64med CD11c- No Mac B2 0
10 SRR3002654 SiglecF+ CD11c+ Yes Mac A 2
11 SRR3002655 SiglecF+ CD11c+ Yes Mac A 2
12 SRR3002656 SiglecF+ CD11c+ Yes Mac A 2
13 SRR3002657 CD11b+ Ly6G- CD64med CD11c- Yes Mac B2 2
14 SRR3002658 CD11b+ Ly6G- CD64med CD11c- Yes Mac B2 2
15 SRR3002659 CD11b+ Ly6G- CD64med CD11c- Yes Mac B2 2
3. RNA-seq pipeline estimates expression levels for all annotated and non-annotated genomic elements
1. Mapping with TopHat
2. Finding isoforms with Cufflinks
3. Building a GTF file of isoforms with Cuffmerge
4. Mapping with Bowtie-2t on the new transcriptome
RSEM output tables of genes, isoforms and exons
are prepared for Machine Learning Analysis:
• Removing genomic elements that did not have any expression (all zeros) in the RSEM table
• Quantile Normalization
• Principal Component Analysis
• Factor Regression Analysis
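The two table-preparation steps named on this slide (dropping all-zero elements and quantile normalization) can be sketched as follows. This is a minimal illustration in plain Python, not the platform's implementation: the table layout (element ID mapped to one value per sample), the function names, and the toy numbers are all assumptions made for the example, and ties are broken by row order for simplicity.

```python
from statistics import mean


def drop_all_zero_rows(table):
    """Keep only genomic elements with at least one non-zero value."""
    return {gene: values for gene, values in table.items()
            if any(v != 0 for v in values)}


def quantile_normalize(table):
    """Give every sample (column) the same distribution of values.

    Each value is replaced by the mean, across samples, of the values
    that share its rank. Ties are broken by row order (a simplification).
    """
    genes = list(table)
    n_samples = len(next(iter(table.values())))
    columns = [[table[g][j] for g in genes] for j in range(n_samples)]
    # Mean of the k-th smallest value across all samples.
    rank_means = [mean(vals) for vals in zip(*(sorted(col) for col in columns))]
    normalized = {g: [0.0] * n_samples for g in genes}
    for j, col in enumerate(columns):
        order = sorted(range(len(genes)), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[genes[i]][j] = rank_means[rank]
    return normalized


# Toy table (made-up numbers): element ID -> values for three samples.
toy = {
    "geneA": [5.0, 4.0, 3.0],
    "geneB": [2.0, 1.0, 4.0],
    "geneC": [0.0, 0.0, 0.0],  # no expression anywhere, will be removed
    "geneD": [3.0, 4.5, 6.0],
    "geneE": [4.0, 2.0, 8.0],
}
filtered = drop_all_zero_rows(toy)
normalized = quantile_normalize(filtered)
```

After normalization every sample column contains the same set of values, only assigned to different elements, which is what makes samples comparable before PCA.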
4. Principal Component Analysis
Principal Component Analysis is a data reduction technique that represents the structure of the
dataset on a small number of principal components. The components that explain the largest
percentage of variability are chosen as principal.
Using gene and isoform expression profiles across all samples, PCA shows a good separation
between macrophage groups and time points. Grouping by type can be seen in the PCA graphs on the
bottom, with Mac A encircled in blue, Mac B1 and 2wk Mac B2 in red, and 3wk Mac B2 along with
Mac B3 in green. The graph on the bottom right shows the normal distribution of gene expression,
with no strong outliers.
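To make the percentages quoted in the PCA plot titles concrete, here is a small dependency-free sketch of PCA that returns sample coordinates and the fraction of variance explained by each component. It is an illustration only: the function name, the power-iteration approach, and the toy data are assumptions for this example, and a real analysis would normally use an established numerical library.

```python
import math


def pca_scores(samples, n_components=2, n_iter=500):
    """PCA of samples (rows) over genes (columns) via the sample Gram matrix.

    Returns (scores, explained): scores[k][i] is the coordinate of sample i
    on component k, explained[k] is the fraction of variance it explains.
    """
    n, p = len(samples), len(samples[0])
    gene_means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - gene_means[j] for j in range(p)] for row in samples]
    # Gram matrix of the centered data; its eigenvectors give sample coordinates.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in centered]
            for r1 in centered]
    total = sum(gram[i][i] for i in range(n))  # total variance (unscaled)
    scores, explained = [], []
    for _component in range(n_components):
        # Arbitrary but deterministic unit start vector for power iteration.
        start_norm = math.sqrt(sum((i + 1) ** 2 for i in range(n)))
        vec = [(i + 1) / start_norm for i in range(n)]
        for _ in range(n_iter):
            new = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in new))
            if norm == 0:
                break
            vec = [x / norm for x in new]
        eigenvalue = sum(
            vec[i] * sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)
        )
        scores.append([x * math.sqrt(max(eigenvalue, 0.0)) for x in vec])
        explained.append(eigenvalue / total)
        # Deflation: remove the component just found before searching for the next.
        gram = [[gram[i][j] - eigenvalue * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return scores, explained


# Toy data (made-up numbers): 4 samples x 3 genes, two obvious groups.
samples = [
    [1.0, 2.0, 0.5],
    [1.2, 1.8, 0.4],
    [5.0, 6.1, 0.6],
    [5.2, 5.9, 0.5],
]
scores, explained = pca_scores(samples)
```

In this toy case the first component carries almost all of the variance and places the first two samples on one side and the last two on the other. The sign of a component is arbitrary, so only the relative positions are meaningful.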
[Figure: Normal Distribution of Gene Expression]
[Figure: PCA (13.26%, 10.78%) of Isoform Expression of All Samples after Quantile Normalization. Points labeled NoC MacA, NoC MacB1, NoC MacB2, 2wk MacA, 2wk MacB2, 2wk MacB3, 3wk MacB2, 3wk MacB3, three replicates each.]
[Figure: PCA (16.04%, 13.21%) of Gene Expression of All Samples after Quantile Normalization. Points labeled 0wk MacA, 0wk MacB1, 0wk MacB2, 2wk MacA, 2wk MacB2, 2wk MacB3, 3wk MacB2, 3wk MacB3, three replicates each.]
5. PCA Comparison of Factor Groups
[Figure: PCA 28.65%/10.10% of 0wk/2wk MacA/B2 Genes over Samples. Groups: MacA 0wk, MacB2 0wk, MacA 2wk, MacB2 2wk.]
[Figure: PCA 21.80%/10.93% of 0wk/2wk MacA/B2 Isoforms over Samples. Groups: MacA 0wk, MacB2 0wk, MacA 2wk, MacB2 2wk.]
[Figure: PCA 15.05%/10.49% of 2wk/3wk MacB2/B3 Genes over Samples. Groups: MacB2 2wk, MacB3 2wk, MacB2 3wk, MacB3 3wk.]
[Figure: PCA 11.85%/10.47% of 2wk/3wk MacB2/B3 Isoforms over Samples. Groups: MacB2 2wk, MacB3 2wk, MacB2 3wk, MacB3 3wk.]
Because no combination of factors is present in all samples, we divided the project into two groups that share one overlapping set of replicates (MacB2 at 2 weeks). Even before
performing factor analysis, we can see groupings appear in the PCAs of the divided factor groups as outlined on slide 3. The factor analysis compared 0
weeks vs 2 weeks and Mac A vs Mac B2 in the first group, and 2 weeks vs 3 weeks and Mac B2 vs Mac B3 in the second.
6. Factor Regression Analysis
• MacA cells, in the presence or absence of the tumor,
were negative for Ly6C and weak for MHC II,
suggesting alveolar macrophages.
• MacB1 cells expressed low levels of Ly6C, and
MHC II expression varied, suggesting two cell
types.
• MacB2 cells expressed high levels of Ly6C and did
not express MHC II, which is typical for
monocytes.
• MacB3 cells were negative for Ly6C and expressed
high levels of MHC II, which is typical for
macrophages.
Run SiglecF CD11b Ly6G CD64 CD11c Cancer cell type Weeks
1 SRR3002645 SiglecF+ CD11c+ No Mac A 0
2 SRR3002646 SiglecF+ CD11c+ No Mac A 0
3 SRR3002647 SiglecF+ CD11c+ No Mac A 0
4 SRR3002648 CD11b+ Ly6G- CD64low CD11c+ No Mac B1 0
5 SRR3002649 CD11b+ Ly6G- CD64low CD11c+ No Mac B1 0
6 SRR3002650 CD11b+ Ly6G- CD64low CD11c+ No Mac B1 0
7 SRR3002651 CD11b+ Ly6G- CD64med CD11c- No Mac B2 0
8 SRR3002652 CD11b+ Ly6G- CD64med CD11c- No Mac B2 0
9 SRR3002653 CD11b+ Ly6G- CD64med CD11c- No Mac B2 0
10 SRR3002654 SiglecF+ CD11c+ Yes Mac A 2
11 SRR3002655 SiglecF+ CD11c+ Yes Mac A 2
12 SRR3002656 SiglecF+ CD11c+ Yes Mac A 2
13 SRR3002657 CD11b+ Ly6G- CD64med CD11c- Yes Mac B2 2
14 SRR3002658 CD11b+ Ly6G- CD64med CD11c- Yes Mac B2 2
15 SRR3002659 CD11b+ Ly6G- CD64med CD11c- Yes Mac B2 2
16 SRR3002660 CD11b+ Ly6G- CD64hi CD11c+ Yes Mac B3 2
17 SRR3002661 CD11b+ Ly6G- CD64hi CD11c+ Yes Mac B3 2
18 SRR3002662 CD11b+ Ly6G- CD64hi CD11c+ Yes Mac B3 2
19 SRR3002663 CD11b+ Ly6G- CD64med CD11c- Yes Mac B2 3
20 SRR3002664 CD11b+ Ly6G- CD64med CD11c- Yes Mac B2 3
21 SRR3002665 CD11b+ Ly6G- CD64med CD11c- Yes Mac B2 3
22 SRR3002666 CD11b+ Ly6G- CD64hi CD11c+ Yes Mac B3 3
23 SRR3002667 CD11b+ Ly6G- CD64hi CD11c+ Yes Mac B3 3
24 SRR3002668 CD11b+ Ly6G- CD64hi CD11c+ Yes Mac B3 3
F1: no cancer vs cancer
F2: MacA vs MacB2
F1: 2wk vs 3wk after tumor
F2: MacB2 vs MacB3
To use Factor Regression Analysis, every combination of
factor levels needs to be represented in the samples. In this
project, a number of time points and cell types are not
represented across all samples.
Because no combination of factors is present in all samples,
we divided the project into two groups that share one
overlapping set of replicates (MacB2 at 2 weeks).
The first analysis compared 0 weeks vs 2 weeks and
Mac A vs Mac B2; the second compared 2 weeks vs 3
weeks and Mac B2 vs Mac B3.
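The slides do not describe how Factor Regression Analysis is computed on the platform, so the following is only a sketch of the general idea under a stated assumption: an additive linear model with two 0/1 factors, fitted per gene. In a balanced, fully crossed design such as the first group (three replicates in each of the four F1 x F2 cells), the least-squares main effect of a factor reduces to the difference between the mean expression at its two levels. The function name, gene name, and expression values are made up for illustration.

```python
from statistics import mean


def main_effects(values, f1, f2):
    """Main effects of two 0/1 factors on one gene in a balanced design.

    values: expression of the gene in each sample.
    f1, f2: factor level (0 or 1) of each sample.
    Returns (effect of f1, effect of f2), each a difference of level means.
    """
    def effect(levels):
        high = [v for v, level in zip(values, levels) if level == 1]
        low = [v for v, level in zip(values, levels) if level == 0]
        return mean(high) - mean(low)

    return effect(f1), effect(f2)


# First group, in sample order: MacA 0wk, MacB2 0wk, MacA 2wk, MacB2 2wk.
# F1: no cancer (0) vs cancer (1); F2: MacA (0) vs MacB2 (1).
f1 = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
f2 = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]

# Hypothetical gene that depends on cell type but not on tumor presence.
gene_x = [8.0, 8.2, 7.8, 2.0, 2.2, 1.8, 8.1, 7.9, 8.0, 2.1, 1.9, 2.0]
effect_f1, effect_f2 = main_effects(gene_x, f1, f2)
```

For this made-up gene the F1 effect is close to zero and the F2 effect is close to -6, so it would rank as a cell-type gene rather than a tumor-response gene. If a cell of the design is empty, as happens when all 24 samples are taken together, the main effects can no longer be separated this way, which is why the samples were split into two groups.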
7. Top Genes influenced by Factors
[Figure: bar charts of the top genes from each Factor Regression Analysis comparison. Gene legends, one line per panel, in extraction order:]
Panel 1: ENSMUSG00000048078, ENSMUSG00000028031, ENSMUSG00000086503, ENSMUSG00000036446, ENSMUSG00000029838, ENSMUSG00000026069, ENSMUSG00000035799, ENSMUSG00000026204
Panel 2 (0wk and 2wk, MacA vs MacB2): ENSMUSG00000039934, ENSMUSG00000061397, ENSMUSG00000020838, ENSMUSG00000048834, ENSMUSG00000010651, ENSMUSG00000039013, ENSMUSG00000000794, ENSMUSG00000026065
Panel 3: ENSMUSG00000030000, ENSMUSG00000020950, ENSMUSG00000025738, ENSMUSG00000074480, ENSMUSG00000024968, ENSMUSG00000002459, ENSMUSG00000039476, ENSMUSG00000028197, ENSMUSG00000021136, ENSMUSG00000076614
Panel 4: ENSMUSG00000036905, ENSMUSG00000026548, ENSMUSG00000018920, ENSMUSG00000089929, ENSMUSG00000060586, ENSMUSG00000024663, ENSMUSG00000093809, ENSMUSG00000050777, ENSMUSG00000045404, ENSMUSG00000010307
A sample of the top genes from each Factor Regression Analysis comparison shows high expression
of the selected genes for one factor level (Mac A, top right) and low expression of the same genes in samples of
the other level (Mac B2, top right). Similar results, with a smaller expression gap, appear in the bottom
right between Mac B2 (low) and Mac B3 (high). The same comparison is made with time as the factor on the left.
Genes that survived Factor Analysis filtering and match pathway analysis for PPAR Pathways and
Cytokine-Cytokine Receptor Interaction are shown on the far right.
[Figure axes: panels for the first group are split into 0 weeks and 2 weeks with samples A and B2; panels for the second group are split into 2 weeks and 3 weeks with samples B2 and B3.]
[Figure: PPAR Pathways, samples grouped A, B2, A, B2. Genes: ENSMUSG00000030162, ENSMUSG00000010651, ENSMUSG00000015846, ENSMUSG00000030546, ENSMUSG00000022853, ENSMUSG00000028607, ENSMUSG00000002108, ENSMUSG00000015568, ENSMUSG00000026003, ENSMUSG00000025059, ENSMUSG00000062908, ENSMUSG00000020777, ENSMUSG00000031808, ENSMUSG00000024900, ENSMUSG00000002944]
[Figure: Cytokine-Cytokine Receptor Interaction, samples grouped B2, B3, B2, B3. Genes: ENSMUSG00000073889, ENSMUSG00000009185, ENSMUSG00000022514, ENSMUSG00000007613, ENSMUSG00000097328, ENSMUSG00000024401, ENSMUSG00000030745, ENSMUSG00000018920, ENSMUSG00000002897, ENSMUSG00000024620, ENSMUSG00000068227, ENSMUSG00000000791, ENSMUSG00000000489, ENSMUSG00000071714]
Panel titles, top row: 0wk vs 2wk: MacA and MacB2; 0wk and 2wk: MacA vs MacB2
Panel titles, bottom row: 2wk vs 3wk: MacB2 and MacB3; 2wk vs 3wk: MacB2 and MacB3
8. Pathway Analysis of MacA vs MacB2 Genes
The authors report that 12 of 16 PPAR signaling genes were
highly expressed in MacA cells. Using DAVID analysis, we
were able to find similar results. In the pathway graph,
genes with red stars were found only in our analysis,
genes with blue stars only in the authors'
analysis, and genes found in both are marked with grey
stars. Because all of the authors' genes were also in our
results, there are only red and grey stars, showing that our
analysis found additional genes in the PPAR Pathway.
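The star coloring described above is a plain set comparison between two gene lists. A minimal sketch, with a hypothetical function name and placeholder gene names that are not real results:

```python
def classify_pathway_genes(ours, authors):
    """Split pathway genes by which analysis reported them."""
    ours, authors = set(ours), set(authors)
    return {
        "red": sorted(ours - authors),   # found only in our analysis
        "blue": sorted(authors - ours),  # found only in the authors' analysis
        "grey": sorted(ours & authors),  # found in both analyses
    }


# Placeholder gene names for illustration only.
authors_genes = ["geneA", "geneB", "geneC"]
our_genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
stars = classify_pathway_genes(our_genes, authors_genes)
```

When one list fully contains the other, as reported on this slide, the "blue" group is empty and only red and grey stars remain.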
[Figure: PPAR Pathways, samples grouped A, B2, A, B2 over 0 weeks and 2 weeks. Genes: ENSMUSG00000030162, ENSMUSG00000010651, ENSMUSG00000015846, ENSMUSG00000030546, ENSMUSG00000022853, ENSMUSG00000028607, ENSMUSG00000002108, ENSMUSG00000015568, ENSMUSG00000026003, ENSMUSG00000025059, ENSMUSG00000062908, ENSMUSG00000020777]
9. Pathway Analysis of MacB2 and MacB3
The authors report that cluster B3 (genes highly expressed
in both MacB3-2wk and MacB3-3wk) was enriched
in pathways related to chemokine and cytokine
signaling. Using Factor Analysis (below) and DAVID
(left), we can see similar patterns. As on the previous
slide, our analysis found all the same genes as the
authors' analysis, and more.
[Figure: expression of cytokine-related genes, samples grouped B2, B3, B2, B3. Genes: ENSMUSG00000073889, ENSMUSG00000009185, ENSMUSG00000022514, ENSMUSG00000007613, ENSMUSG00000097328, ENSMUSG00000024401, ENSMUSG00000030745, ENSMUSG00000018920, ENSMUSG00000002897, ENSMUSG00000024620, ENSMUSG00000068227, ENSMUSG00000000791, ENSMUSG00000000489, ENSMUSG00000071714, ENSMUSG00000050395, ENSMUSG00000028362, ENSMUSG00000042333, ENSMUSG00000004296]
10. Conclusion
• The macrophage population plays a critical role in controlling tumors, including their
growth and progression. Understanding the model of cancer progression and the
interactions between cancer cells and the macrophages and adaptive immunity cells that are
present makes it possible to define gene expression signatures that could assist in
clinical determinations.
• Several biologically significant pathways were identified from gene expression after
Factor Analysis, including cytokine-cytokine receptor interaction and PPAR
signaling.
• The expression profiles of genes and isoforms were consistent, as can be seen in
the PCA plots on slide 5.
• Overall, this study demonstrates the complex interactions between the
macrophage population and tumors.
11. Data
All of the factor analysis data can be found in the following files.
• Genes:
• Unfiltered: expression_genes.txt
• MacA and MacB2: FA of MacA and MacB2
• MacB2 and MacB3: FA of MacB2 and MacB3
• Isoforms:
• Unfiltered: expression_isoforms.txt
• MacA and MacB2: FA of MacA and MacB2 isoforms
• MacB2 and MacB3: FA of MacB2 and MacB3 isoforms
• PCA: PCA
12. Educational Dataset
Running a full pipeline on unfiltered samples can take a long time and produce many
additional results that are difficult to interpret, such as unannotated genes and transcripts.
To simplify the project for educational use, we took the reads from all samples that
aligned to a selection of significant and insignificant genes and extracted them into small
FastQ files. On average, these files are 6.2% of the original size and take significantly less
time to run (approx. 3 hours).
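The extraction step can be pictured as filtering a FastQ file by read ID. The sketch below assumes standard four-line FastQ records and assumes the IDs of reads that aligned to the selected genes are already available as a set (in practice they would come from the alignment output); the function name and the toy reads are hypothetical.

```python
def filter_fastq(lines, keep_ids):
    """Yield the lines of every 4-line FastQ record whose read ID is in keep_ids."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            # Header looks like "@<read id> <optional description>".
            read_id = record[0][1:].split()[0]
            if read_id in keep_ids:
                yield from record
            record = []


# Toy input: three made-up reads, of which two aligned to selected genes.
reads = [
    "@read1 extra", "ACGT", "+", "IIII",
    "@read2", "GGCC", "+", "IIII",
    "@read3", "TTAA", "+", "IIII",
]
kept = list(filter_fastq(reads, {"read1", "read3"}))
```

Because the function works on any iterable of lines, the same code can be fed an open file handle instead of a list, so large FastQ files do not need to be loaded into memory.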
Significant genes are selected from the original set through Factor Regression Analysis, and
a selection of insignificant genes is also included to show students the difference in
significance. These smaller datasets are available for download at the links
below:
• MacA vs MacB2: http://pine-biotech.com/data/edu-macab2.tar.xz
• MacB2 vs MacB3: http://pine-biotech.com/data/edu-macb23.tar.xz
Editor's Notes
Background: based on data from (publication), where they did differential gene expression based on flow cytometry, etc (extract two good sentences)
Valid approach, significant results much smaller dataset, list of selected genes including significant and some insignificant,