This presentation is based on the article "Knowledge-Primed Neural Networks Enable Biologically Interpretable Deep Learning on Single-Cell Sequencing Data", as an application of artificial neural networks to gene regulatory networks in systems biology.
ANNs in Systems Biology
1. Knowledge-Primed Neural Networks Enable
Biologically Interpretable Deep Learning on
Single-Cell Sequencing Data
Course: Machine Learning and Drug Design
Nikolaus Fortelny and Christoph Bock
2. Knowledge-Primed Neural Networks (KPNNs)
6/3/2023 5:46 PM 2
Trained ANNs typically lack interpretability, i.e., the ability to provide
human-understandable, high-level explanations of how they transform inputs
(prediction attributes) into outputs which is a major limitation to the wider
application of deep learning in biology.
Figure 1: ANN vs. knowledge-primed neural networks (KPNNs)
Introduction Results & Discussion
Methodology Conclusion
3. Methodology
Figure 2: Pipeline for the construction of KPNN
Construction of general regulatory neural networks for human cell
Construction of application specific KNNs for deep learning
Applying Deep Learning Approach to KNNs
Optimization of Learning Method to Enhance KPNN interpretability
Training of the optimized KPNN using single cell RNA seq data
Comparing ANN and KPNN
4. 1. Constructing Generic KPNNs
• Integrates multiple signaling pathways and gene-regulatory interactions into
a single network.
• For the biological basis, large-scale, genome-wide datasets of transcription
factor target genes and protein signaling interactions from several public
databases were integrated into a broad network model of the potential
regulatory space in human cells.
• This general regulatory network model is represented as a directed graph.
6. 2. Construction of Application-Specific KPNNs for Deep Learning
• The resulting graphs are used for interpretable deep learning:
Measured gene expression values (input nodes) predict the regulatory
importance of connected transcription factors (hidden nodes), which predict
the regulatory importance of their connected signaling proteins (hidden nodes),
which then predict the regulatory importance and activation states of surface
receptors, which finally predict the output (phenotype) of the specific dataset.
Two KPNNs were constructed in this study:
1. TCR signaling (TCR KPNN), to predict TCR stimulation from gene
expression
2. Generalized KPNN (GEN KPNN) to predict cell states from single-cell
RNA-seq datasets independent of specific receptors.
7. 2. Construction of Application-Specific KPNNs for Deep Learning
[Figure: input genes G1–G3 connect to transcription factors TF1–TF4, which connect to signaling proteins SP1–SP4, which feed the output node (class 0/1).]
Objective | Database Used
TF – target gene pairs | Harmonizome & TRRUST
Signaling protein – target protein pairs | SIGNOR
Figure 4: Application specific KPNNs
Table 1: Databases used for dataset acquisition
8. 3. Datasets Used for Training KPNN
Interpretable deep learning on KPNNs was developed and evaluated using
simulated transcriptome profiles.
It was validated and benchmarked on the biological data of TCR
stimulation.
The model was further validated on four benchmark datasets:
1. HCA
2. LCH
3. AML
4. Glioblastoma
9. 3. Datasets Used for Training KPNN
• TCR Dataset
Downloaded from GEO (GSE137554).
Class values: TCR-stimulated (1) vs. TCR-unstimulated samples (0)
• Simulated Dataset
Expression counts for the unstimulated state were averaged for each gene to generate a baseline expression profile.
A positive or negative twofold change was then introduced in selected genes, resulting in a second (differential) gene expression profile.
Class values: TCR-stimulated (1) vs. TCR-unstimulated samples (0)
10. 4. Applying Deep Learning to KPNNs
1. Preprocessing the input data
2. Training the KPNN or ANN deep neural network
3. Prediction evaluation
Figure 5: Methodology of generic deep learning applied to KPNNs
11. 4.1. Preprocessing of Input
• Input data were split into a training set (60%), validation set (20%), and test set (20%) using numpy random.choice.
• Gene expression data were converted to log(TPM + 1) values and normalized to the 0–1 range.
• Normalization factors were calculated based on the training data and then applied to the validation and test data.
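The steps above can be sketched as follows, assuming a cells × genes TPM matrix; the function and variable names are illustrative, not the study's actual code:

```python
import numpy as np

def preprocess(expr, seed=0):
    """Sketch of the preprocessing step: 60/20/20 split, log(TPM + 1)
    transform, and 0-1 scaling fit on the training set only.
    `expr` is a (cells x genes) matrix of TPM values."""
    rng = np.random.default_rng(seed)
    n = expr.shape[0]
    # Assign each cell to train/validation/test (numpy-style random choice)
    split = rng.choice(["train", "val", "test"], size=n, p=[0.6, 0.2, 0.2])
    logged = np.log(expr + 1.0)              # log(TPM + 1)
    train = logged[split == "train"]
    # Normalization factors come from the training data only,
    # then are applied to validation and test data as well
    lo, hi = train.min(axis=0), train.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    norm = (logged - lo) / scale
    return {s: norm[split == s] for s in ("train", "val", "test")}
```

Fitting the normalization on the training split alone, as described above, avoids leaking information from the validation and test sets.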
12. 4.2. Network Training
Parameter | Configuration
Implementation | TensorFlow
Edge weights | Randomly initialized
Activation function | Sigmoid
Loss function | Weighted cross-entropy with L2 regularization
Weighted cross-entropy | To improve training in the presence of class imbalance, the cross-entropy of each sample was weighted by the number of samples of each class.
Class weight | (1/N_classes) / (N_x/N_samples), where N_classes is the number of classes, N_x is the number of samples of class x, and N_samples is the total number of samples
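The class-weight formula above can be computed directly; a minimal illustrative helper (not the study's code):

```python
import numpy as np

def class_weights(labels):
    """Class weight per the formula (1/N_classes) / (N_x/N_samples):
    under-represented classes receive proportionally larger weights."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    n_classes, n_samples = len(classes), labels.size
    weights = (1.0 / n_classes) / (counts / n_samples)
    return dict(zip(classes.tolist(), weights.tolist()))
```

For example, with labels [0, 0, 0, 1] the minority class 1 gets weight (1/2)/(1/4) = 2.0 and the majority class 0 gets (1/2)/(3/4) ≈ 0.67, so both classes contribute equally to the loss on average.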
Table 2: Configurations for training KPNNs
13. 4.2. Network Training
• Error rate: difference between the true label and the predicted class probability.
• Training was terminated with early stopping, triggered by a specific number ("patience") of epochs without considerable learning progress ("failed epochs").
• Epochs were counted as "failed epochs" if either
(i) the loss showed no significant improvement and remained relatively constant over a certain number of epochs, or
(ii) the validation-set error increased (indicative of overfitting).
• Models were saved during training if they achieved considerable learning progress compared to the previously saved model.
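A simplified sketch of this patience-based scheme; for brevity it collapses the two failure criteria above into one (any epoch without a considerable improvement over the best saved model counts as "failed"):

```python
def train_with_early_stopping(losses, patience=20, min_improvement=0.2):
    """Sketch of patience-based early stopping.
    `losses` yields validation-set errors per epoch (here a list).
    A model is "saved" when it beats the saved one by `min_improvement`
    (20% relative improvement); otherwise the epoch counts as "failed",
    and training stops after `patience` consecutive failed epochs."""
    best = float("inf")
    failed = 0
    saved_epochs = []
    for epoch, err in enumerate(losses):
        if err <= best * (1.0 - min_improvement):
            best = err                # considerable progress: save model
            saved_epochs.append(epoch)
            failed = 0
        else:
            failed += 1               # a "failed epoch"
        if failed >= patience:        # early stopping triggered
            break
    return saved_epochs, epoch
```

With the settings from the TCR KPNN slide (patience of 20 failed epochs, 20% relative improvement required to save), this terminates training once the validation error plateaus.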
14. 5. Optimization of the Learning Method to Enhance KPNN Interpretability
• Three modifications were implemented to enhance the biological interpretability of the trained KPNNs:
Dropout on hidden nodes, to improve robustness.
Dropout on input nodes, to increase quantitative interpretability.
Training on simulated control inputs, to normalize for uneven connectivity in biological networks.
15. 6. Training the KPNN on the Single-Cell RNA-Seq Dataset
• Raw read counts of all 1735 cells, class values (stimulated or
unstimulated), and an edge list encoding the TCR KPNN structure
were provided as input.
• Learning was stopped after 20 failed epochs.
• During learning, new models were saved if they reduced the validation
set error of the latest stored (i.e., so far best-performing) model by at
least 20%.
• Dropout rates from 0 to 50% were evaluated in steps of 10 percentage
points.
• Network training was repeated 90 times for each level of dropout.
16. Results and Discussion
Maria Hassan Khan
17. KPNNs Establish Deep Learning on Biological Networks
• Constructed KPNNs reflected the typical flow of information in cells:
Signals are transduced from receptors via signaling proteins to transcription
factors, which in turn induce changes in gene expression.
Figure 6: Mapping genome-wide regulatory networks to KPNNs
18. KPNNs Establish Deep Learning on Biological Networks
• KPNNs can be trained based on single-cell RNA-seq data for cells in different
states (e.g., subtypes of cancer and immune cells) or under different environmental
exposures (e.g., receptor-stimulated vs unstimulated cells)
• After successful completion of the learning process, each fitted edge weight is
taken as an indicator for the relevance of the corresponding regulatory relationship
in the investigated biological system.
• We then derive node weights from the edge weights, in order to identify signaling
proteins and transcription factors that are likely relevant in this biological system.
• KPNNs thus exploit the ability of deep learning algorithms to assign meaningful
weights across multiple hidden layers in KPNNs, thereby identifying and
prioritizing relevant regulatory proteins for experimental validation and biological
interpretation.
19. TCR Signaling
• KPNNs predicted TCR stimulation with a median receiver operating characteristic (ROC) area under the curve (AUC) of 0.984, whereas the ANN achieved 0.948.
Figure 7: Schematic diagram of TCR KPNN
20. KPNN Leading to Interpretable Deep Learning
Figure 8: Comparison of the number of layers in TCR KPNN vs. TCR ANN
21. KPNN Leading to Interpretable Deep Learning
Figure 9: Comparison of the node degrees of different layers in TCR KPNN vs. TCR ANN
22. Interpretability of KPNN Network Connectedness
Figure 10: Comparison of the number of outdegrees of different layers in TCR KPNN vs. TCR ANN
23. Interpretability of KPNN Network Connectedness
Figure 11: Comparison of the network connectedness of TCR KPNN vs. TCR ANN
24. Enhanced KPNN Interpretability via Optimized Learning Methods
• Applying generic deep learning to KPNNs instead of ANNs yielded high prediction accuracies, but three recurrent challenges arose in interpreting the learned models:
low reproducibility in the presence of network redundancy
lack of quantitative interpretability (i.e., node weights fail to quantitatively reflect the predictive power of individual nodes)
uneven connectivity inherent to biological networks, which affects node weights independent of training data
25. Enhanced KPNN Interpretability via Optimized Learning Methods
1. Hidden node dropout.
2. Input dropout.
3. Correction for network structure.
• These three methodological optimizations enhanced the biological
interpretability of KPNNs both on simulated and on real data.
• It is anticipated that similar optimizations will also be relevant to
other studies in the growing field of interpretable deep learning.
26. KPNNs Infer a Regulatory Model of T-Cell Receptor Stimulation
Figure 12: TCR stimulation model as inferred by KPNN
27. Conclusion
• Several technologies converged to enable interpretable deep learning on biological networks with single-cell RNA-seq data as input:
Large databases of signaling pathways and gene-regulatory interactions, built on high-throughput experiments and on manual curation of many studies.
Advances in single-cell sequencing make it possible to obtain transcriptome profiles at a scale that lets deep learning play out its strengths.
Deep learning, as a method for inferring hidden states in deep neural networks from large training datasets, allows unobserved states in complex biological networks to be inferred.
• By combining collections of potential regulatory interactions from public databases with single-cell RNA-seq data for the investigated biological system, deep neural networks may indeed be able to unravel those regulatory mechanisms that are relevant to the biological question of interest.
Deep learning provides a powerful method for predicting cell states from gene expression profiles.
Generic artificial neural networks (ANNs, top row) are "black boxes" that provide little insight into the biology that underlies a successful prediction, for two reasons: (i) hidden nodes and edges in an ANN have no biological equivalent, which makes it difficult to assign a biological interpretation to the weights of a fitted ANN model, and (ii) ANNs are inherently unstable, and very different networks can achieve similar prediction performance.
For certain specialized tasks with ample training data, ANNs already achieve prediction performance superior to that of human experts. However, the trained ANNs typically lack interpretability, i.e., the ability to provide human-understandable, high-level explanations of how they transform inputs (prediction attributes) into outputs (predicted class values). This lack of interpretability is a major limitation to the wider application of deep learning in biology and medicine—not only because it reduces trust and confidence in using such predictions for high-stakes applications such as clinical diagnostics [17, 18], but also because it misses important opportunities for data-driven biological discovery using deep learning.
Different datasets were combined by counting the number of datasets that support a connection for each transcription factor/target gene pair. To prioritize experimental over computational evidence, weights of 1.0 and 0.3 were used for experimental and computational support, respectively. For each gene, the transcription factors with the largest weighted number of datasets supporting the connection to the gene were retained. As a result of this procedure, 745 genes were targeted by more than 25 transcription factors; these genes were removed from the analysis to avoid biasing the network by including promiscuous or false-positive regulatory interactions.
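This weighting-and-filtering rule can be sketched as follows, assuming the database records have been flattened into (TF, gene, evidence) tuples, which is a hypothetical format for illustration only:

```python
from collections import defaultdict

# Weights from the text: experimental evidence outweighs computational
EVIDENCE_WEIGHT = {"experimental": 1.0, "computational": 0.3}

def filter_tf_targets(records, max_tfs=25):
    """Sketch of the TF-target filtering described above.
    `records` is an iterable of (tf, gene, evidence) tuples. For each
    gene, keep the TF(s) with the largest weighted number of supporting
    datasets, then drop genes still targeted by more than `max_tfs`
    TFs (promiscuous or likely false-positive targets)."""
    support = defaultdict(float)
    for tf, gene, evidence in records:
        support[(tf, gene)] += EVIDENCE_WEIGHT[evidence]
    per_gene = defaultdict(dict)
    for (tf, gene), w in support.items():
        per_gene[gene][tf] = w
    kept = {}
    for gene, tf_weights in per_gene.items():
        best = max(tf_weights.values())
        tfs = [tf for tf, w in tf_weights.items() if w == best]
        if len(tfs) <= max_tfs:   # remove promiscuously targeted genes
            kept[gene] = tfs
    return kept
```

The threshold of 25 TFs and the 1.0/0.3 weights come directly from the text; the data layout and function signature are assumptions.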
In the context of gene expression data analysis, a normalization factor is a value used to adjust gene expression measurements across samples to account for differences in total expression levels or other sources of technical variation. The normalization factor ensures that the expression levels of genes are comparable between samples, allowing for meaningful comparisons and downstream analyses.
Housekeeping Gene Normalization: Housekeeping gene normalization selects a set of genes known to be stably expressed across samples as reference genes. The expression values of the reference genes are then used to calculate a scaling factor that is applied to normalize the expression values of all genes in a sample.
The sigmoid function was chosen because of its similarity to "on" and "off" states in biological systems.
Weighted Cross-Entropy Loss with L2 regularization is a specific combination of loss functions used in certain scenarios, often in multiclass classification tasks. Here's a breakdown of each component:
Weighted Cross-Entropy Loss: Cross-entropy loss is a commonly used loss function for classification problems. It measures the dissimilarity between the predicted class probabilities and the true class labels. In weighted cross-entropy, different weights are assigned to each class to account for class imbalances or the importance of certain classes. This helps in giving more emphasis to the underrepresented classes or the classes of higher importance.
L2 Regularization (also known as weight decay): L2 regularization is a technique used to prevent overfitting in machine learning models. It adds a penalty term to the loss function that discourages large weights in the model. The penalty term is proportional to the square of the weights, hence the term "L2" or "ridge" regularization. By penalizing large weights, L2 regularization encourages the model to have smaller, more evenly distributed weights, which can help in reducing overfitting and improving generalization.
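Putting the two components together, a minimal NumPy sketch of a per-sample weighted binary cross-entropy with an L2 penalty; this is illustrative only, as the study used TensorFlow's built-in ops:

```python
import numpy as np

def weighted_ce_l2(y_true, y_pred, sample_weights, model_weights, l2=0.01):
    """Per-sample weighted binary cross-entropy plus an L2 penalty on the
    edge weights. `sample_weights` carries the class-imbalance weights;
    `model_weights` is a flat array of the network's edge weights."""
    eps = 1e-12
    y_pred = np.clip(y_pred, eps, 1.0 - eps)   # numerical stability
    ce = -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))
    data_term = np.mean(sample_weights * ce)   # class-weighted data fit
    penalty = l2 * np.sum(model_weights ** 2)  # "weight decay" term
    return data_term + penalty
```

The L2 term grows with the squared magnitude of the weights, so minimizing the total loss trades prediction accuracy against smaller, more evenly distributed weights, exactly the regularization effect described above.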
Training with Early Stopping: Early stopping refers to the practice of monitoring the model's performance during training and stopping the training process early if certain criteria are met. It allows for terminating training when further iterations are unlikely to improve the model's performance, thus preventing overfitting and saving computational resources.
Using the most recently saved model, the error on the (previously unseen) test set was assessed and receiver-operator characteristic curves were derived as the final performance metric.
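ROC AUC can be computed from ranks via the Mann-Whitney identity; a minimal sketch (no tie handling, so a toy version of the metric used here):

```python
import numpy as np

def roc_auc(labels, scores):
    """ROC AUC via the rank-sum identity: the probability that a randomly
    chosen positive sample scores higher than a randomly chosen negative.
    `labels` are 0/1; ties in `scores` are not handled in this sketch."""
    labels = np.asarray(labels)
    scores = np.asarray(scores)
    order = np.argsort(scores)
    ranks = np.empty_like(order, dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)   # ranks 1..n, ascending
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
```

An AUC of 1.0 means the classifier ranks every positive above every negative; 0.5 corresponds to random guessing.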
Dropout constitutes a modification of the learning algorithm where a given percentage of nodes is randomly set to zero in each training iteration and for each sample, temporarily removing the contribution of the affected nodes. Dropout was originally developed as a regularization technique to reduce overfitting [48]. Applied to KPNNs, dropout encourages the learning algorithm to spread weights across all relevant nodes, which reduces variability of node weights across network replicates (leading to improved robustness) and balances weights across input nodes (leading to node weights that more quantitatively reflect node importance). Dropout was implemented in our network training program using TensorFlow nn.dropout, which was applied separately to hidden nodes and to input nodes.
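The mechanism can be sketched in NumPy as inverted dropout; TensorFlow's nn.dropout behaves similarly, rescaling the surviving nodes by 1/(1 − rate):

```python
import numpy as np

def dropout(activations, rate, rng):
    """Each node is zeroed with probability `rate` (fresh mask per call,
    i.e., per training iteration and per sample); survivors are rescaled
    by 1/(1 - rate) so the expected activation is unchanged."""
    if rate == 0.0:
        return activations
    keep = rng.random(activations.shape) >= rate
    return np.where(keep, activations / (1.0 - rate), 0.0)
```

At evaluation time dropout is disabled (rate 0), so the full network is used for prediction.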
Normalizing for the uneven connectivity of biological networks
To normalize for the biological network structure and connectivity patterns of KPNNs, we trained the KPNNs not only on the actual single-cell transcriptome data, but also on artificial control inputs where all input nodes were set to values that are equally predictive of the class values. All input nodes carry equal importance in this scenario, hence the resulting node weights reflect only the inherent network structure of the KPNN. Control inputs thus quantify the effect of uneven connectivity on node weights and allow us to normalize for it. Control inputs were generated in the same way as the simulated single-cell transcriptome datasets used to develop our methodology (as described above), with one difference: while only a subset of inputs were selected as predictive in the simulated dataset, the control inputs were simulated such that all inputs were equally predictive. To that end, raw read counts were summed up across all cells corresponding to one class value (e.g., unstimulated cells in the TCR data) to generate an average transcriptome profile. A positive or negative (randomly selected with numpy random.choice) twofold change was added to all input genes to generate a second average transcriptome profile. Reads were then drawn from the two average transcriptome profiles using numpy random.binomial [120]. The number of reads drawn was based on the number of reads in each cell in the original data. For the TCR data and simulated data, the same number of cells as in the original dataset was simulated with control inputs.
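A rough sketch of this control-input simulation; the per-gene binomial draws are an approximation of the read sampling described above, and the names are illustrative:

```python
import numpy as np

def simulate_control_inputs(counts, reads_per_cell, rng):
    """`counts` is a (cells x genes) raw count matrix for one class value.
    Every gene receives a randomly signed twofold change, so all inputs
    are equally predictive; reads are then drawn per gene with binomial
    sampling from the two average profiles."""
    baseline = counts.sum(axis=0).astype(float)      # summed profile
    sign = rng.choice([-1, 1], size=baseline.size)   # random direction
    altered = np.where(sign > 0, baseline * 2.0, baseline / 2.0)

    def draw(profile, n_reads):
        p = profile / profile.sum()
        # per-gene binomial draws approximating sampling n_reads reads
        return rng.binomial(n_reads, p)

    class0 = np.array([draw(baseline, n) for n in reads_per_cell])
    class1 = np.array([draw(altered, n) for n in reads_per_cell])
    return class0, class1
```

Because every input gene is perturbed, node weights learned from these control inputs reflect only network structure, which is what makes them usable as a normalization baseline.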
Specifically, nodes with only one parent were never dropped, and dropout was limited to 10% for nodes with two parents and to 30% for nodes with three parents. Network training was repeated 90 times for each level of dropout.
The expression levels of the regulated genes were used as input nodes, whose values can be measured by single-cell RNA-seq.
Signaling proteins and transcription factors were modeled as hidden nodes, and we infer the corresponding node weights during the learning process.
Finally, receptors constitute the output nodes of the KPNN, capturing phenotypic cell states and the ability of individual cells to interact with their environment.
The learning process typically starts from a generic, fully connected, feedforward ANN—which is a directed acyclic graph organized in layers such that each node receives input from all nodes in the previous layer and sends its output to all nodes in the next layer. During the learning process, edge weights are randomly initiated and then updated iteratively based on training data, seeking to improve the accuracy with which the ANN transforms the inputs into corresponding class values. With enough training data, large multi-layer ANNs can learn highly complex relationships between prediction attributes and class values, without requiring any prior domain knowledge. However, the resulting trained ANNs lack interpretability, given that their nodes, edges, and weights do not correspond to meaningful domain concepts
To overcome the lack of interpretability of deep learning on ANNs, we sought to embed relevant biological domain knowledge directly into the neural networks that are trained by deep learning (Fig. 1). To that end, we replaced the fully connected ANNs of generic deep learning with networks derived from prior knowledge of biological networks, thereby creating “knowledge-primed neural networks” (KPNNs). In KPNNs, each node corresponds to a protein or a gene, and each edge corresponds to a potential regulatory relationship that has previously been observed in any biological context and annotated in public databases
Gene expression (input nodes) provides the input for transcription factors (hidden nodes), whose outputs are used by signaling proteins (hidden nodes) to predict TCR stimulation (output node).
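The core idea, a neural network whose weight matrix is masked so that only prior-knowledge edges can carry signal, can be sketched with toy shapes and illustrative names:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def kpnn_layer(inputs, weights, mask):
    """One KPNN layer as a sketch: `mask` is 1 only where a prior-knowledge
    edge exists (e.g., a gene -> transcription factor interaction from the
    databases), so absent edges contribute zero regardless of training.
    Shapes: inputs (n_in,), weights and mask (n_in, n_out)."""
    return sigmoid(inputs @ (weights * mask))
```

In a fully connected ANN the mask would be all ones; in a KPNN it is the sparse biological adjacency, which is what gives each fitted edge weight a biological meaning.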
To understand how the network structure of KPNNs may contribute to interpretable deep learning, we performed a systematic network analysis comparing the TCR KPNN with corresponding ANNs. As expected, the KPNN was much sparser than ANNs with the same number of nodes. For example, the best-performing ANN had 151-fold more edges than the KPNN (Additional file 1: Fig. S1d-f). However, the most striking differences between the KPNN and the corresponding ANNs referred to the structural properties of these networks. When we compared the KPNN to fully connected ANNs with the same number of nodes (fANN) and to sparse ANNs with the same number of edges as the KPNN (sANN, see the "Methods" section for details), the KPNN contained many deep connections ("shortcuts"), contrasting with the strictly layered network architecture of the ANNs (Fig. 2a, Additional file 1: Fig. S2a). Because each node in the fully connected ANNs is connected to all nodes of both the previous and the following layer, the distance from a given node to any node of the input layer is the same, depending only on the node's layer. In contrast, distances in the KPNN differed widely due to the presence of shortcuts that connect certain parts of the network much more directly with the input layer.
The KPNN was further characterized by unique patterns of network connectedness. Most notably, the distribution of node outdegrees (the outdegree measures the number of edges that leave a node) resembled an exponential distribution for the KPNN, while all nodes in the ANNs were connected to the same (constant) number of other nodes (Fig. 2b, Additional file 1: Fig. S2b-c). This approximately scale-free nature of the KPNN derives in part from the presence of highly connected "hubs" in biological networks [45], which link many nodes with lower connectivity. As a result of these connectivity patterns, the KPNN showed higher modularity than the ANNs, which is evident from its higher sensitivity to fragmentation upon edge removal compared to fully connected ANNs, and even to sparse ANNs.
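The outdegree comparison reduces to counting outgoing edges per node; a minimal sketch over a directed edge list:

```python
from collections import Counter

def outdegrees(edges):
    """Node outdegrees from a directed edge list of (parent, child) pairs.
    In a fully connected ANN every node in a layer has the same outdegree;
    a biological KPNN instead yields a skewed, hub-dominated distribution."""
    return dict(Counter(parent for parent, _ in edges))
```

Applied to the KPNN edge list, the resulting counts show a few highly connected hubs and many low-degree nodes, matching the approximately scale-free pattern described above.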
Training with hidden node dropout enhances the robustness of node weights in the presence of network redundancy, while input node dropout distinguishes between strongly predictive and weakly predictive nodes.
Calculation of node weights based on control inputs allows us to normalize for the effect of uneven connectivity in biological networks.
Together with the sparse structure of KPNNs and the fact that every node and every edge in the KPNNs has a corresponding biological interpretation, this learning method fosters interpretable deep learning on biological networks
When we trained the TCR KPNN with our optimized learning method, based on single-cell RNA-seq data for TCR stimulated vs unstimulated Jurkat cells [49], we obtained high prediction accuracies (Fig. 4a), comparable to those obtained by generic deep learning on KPNNs and on ANNs (Additional file 1: Fig. S1b). Moreover, we indeed observed the anticipated boost in biological interpretability of the trained KPNN model.