The KEDRI Integrated System for Personalised Modelling

The KEDRI Integrated System for Personalised Modelling:
Software development and experiment results
Prof. Nikola Kasabov, Dr. Raphael Hu, Gary Chen
The Knowledge Engineering and Discovery Research Institute (KEDRI)
Auckland University of Technology

www.kedri.info

23/11/2011 nkasabov@aut.ac.nz; rhu@aut.ac.nz

Overview

• Introduction
• The development of new algorithms and
methods for personalised modelling in KEDRI
• Software prototype demo
• Conclusion and future direction


KEDRI: The Knowledge Engineering and
Discovery Research Institute at AUT
(www.kedri.info)

• Established in 2002 by Prof. Nikola Kasabov
• Focus: novel information processing methods,
technologies and applications for discoveries across
different areas of science
• Methods are mainly based on personalised
modelling, brain information processing, evolution,
genetics and quantum physics



Computational Modelling Techniques

Global, local and personalised modelling are the three main approaches
to modelling and pattern discovery in machine learning [1].

Global modelling creates a model from the data which covers the entire
problem space and is represented by a single function, e.g. a regression
function, an RBF or MLP neural network, an SVM, etc.

Local modelling builds a set of local models from data, each representing a
sub-space (e.g. a cluster) of the whole problem space. These models can
be a set of rules or a set of local regressions, etc.

Personalised modelling uses transductive reasoning to create a specific
model for each single data point (e.g. a data vector, a patient record) within
a localised problem space.
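
A minimal Python sketch of the contrast between a global and a personalised model. The toy data, the nearest-centroid global model and the distance-weighted vote are illustrative assumptions, not the IMPM implementation.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def global_model(train):
    """Fit ONE model on all the data: here, one centroid per class."""
    sums, counts = {}, {}
    for x, label in train:
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
    # The returned function is applied unchanged to every new sample.
    return lambda q: min(centroids, key=lambda c: euclidean(q, centroids[c]))

def personalised_predict(train, query, k=3):
    """Build a model for ONE query from its k nearest samples only."""
    neighbours = sorted(train, key=lambda s: euclidean(s[0], query))[:k]
    votes = {}
    for x, label in neighbours:
        # Closer neighbours get a larger vote (hypothetical weighting).
        votes[label] = votes.get(label, 0.0) + 1.0 / (1.0 + euclidean(x, query))
    return max(votes, key=votes.get)

# Hypothetical toy data: class "A" forms two distant clusters, "B" sits between.
TRAIN = [((0, 0), "A"), ((0, 1), "A"), ((1, 0), "A"),
         ((10, 10), "A"), ((10, 11), "A"), ((11, 10), "A"),
         ((4, 4), "B"), ((4, 5), "B"), ((5, 4), "B")]
QUERY = (0.5, 0.5)

global_answer = global_model(TRAIN)(QUERY)
personal_answer = personalised_predict(TRAIN, QUERY, k=3)
```

In this toy case the single global function misplaces the query, because the class "A" centroid falls between its two clusters, while the model built only from the query's neighbourhood does not.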


Why Personalised Modelling?
• The issue with using global modelling for prediction problems:
a global model is derived from all the data available for the target problem and
then applied to any new patient, anywhere and at any time. Prediction and
treatment based on global models are effective for only some patients (approx. 70%) [2].

• Personalised modelling:
The rationale behind the personalised modelling paradigm is that, since each
person is different, the most effective treatment can only be based on a detailed
analysis of that particular patient.

• The availability of a variety of data that can be utilised:
DNA, RNA, protein expression, inheritance, disease, etc.

• The benefits of using personalised models for medical applications:
– To produce better classification and prediction results
– To create a profile for each individual
– To provide a potential improvement scenario for each individual, where possible

Research Objectives of Personalised Modelling

• To create accurate personalised computational models:
each model is specific to an individual and utilises the available information
from other individuals related to the same problem.

• To develop new algorithms and methods for personalised modelling.

• To apply the proposed algorithms and methods to data
from different sources:
gene expression data, protein data, SNP (single-nucleotide polymorphism)
data, clinical data, etc.


The Integrated Method for Personalised
Modelling (IMPM) for Data Analysis

[Figure: block diagram of the IMPM framework. Blocks: Data Repository; Feature selection; Similarity measurement; Neighbour creation; Learning models (e.g. risk probability evaluation, classification, etc.); Outcome visualisation (personalised disease profiling, risk probability); Optimisation (evolutionary computation, SNN).]

The proposed framework and system using IMPM for biomedical data analysis [2]

Optimisation
Coevolutionary algorithm (CEA):
A CEA is derived from the evolutionary algorithm. The individuals in a CEA
come from two or more populations, and their fitness values are assigned
based on their interactions with individuals from the other populations.

A sample of a simple 2-species coevolutionary model.
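
A 2-species coevolutionary loop could be sketched in Python as follows. This is a cooperative variant chosen for illustration: the objective `joint_cost`, the mutation scheme and all parameter values are assumptions, not the CEA used in IMPM.

```python
import random

def joint_cost(x, y):
    # Hypothetical coupled objective; its minimum is at x = 3, y = -1.
    return (x - 3) ** 2 + (y + 1) ** 2 + 0.5 * (x - 3) * (y + 1)

def evolve(pop, cost_of, rng, sigma=0.3):
    """One generation: keep the better half, refill with mutated copies."""
    ranked = sorted(pop, key=cost_of)
    parents = ranked[: len(pop) // 2]
    children = [p + rng.gauss(0.0, sigma) for p in parents]
    return parents + children  # parents stay sorted, so index 0 is the best

def coevolve(generations=300, size=20, seed=0):
    rng = random.Random(seed)
    xs = [rng.uniform(-10, 10) for _ in range(size)]  # species 1
    ys = [rng.uniform(-10, 10) for _ in range(size)]  # species 2
    best_x, best_y = xs[0], ys[0]
    for _ in range(generations):
        # Species 1 is scored against the current representative of species 2 ...
        xs = evolve(xs, lambda x: joint_cost(x, best_y), rng)
        best_x = xs[0]
        # ... and species 2 against the updated representative of species 1.
        ys = evolve(ys, lambda y: joint_cost(best_x, y), rng)
        best_y = ys[0]
    return best_x, best_y

best_x, best_y = coevolve()
```

The point of the sketch is that no individual has a fitness on its own: an `x` can only be scored once it is paired with a `y` from the other population, and vice versa.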


Software Architecture of IMPM

An example of software architecture of ISPM


An Integrated Optimisation System for
Personalised Modelling (IOSPM)

• Cross-platform – implemented in Qt, so it can be compiled on
different platforms, such as Microsoft Windows, Mac OS and Linux.

• Integrated – combines methods/functions written in different
languages (e.g. MATLAB, Python, Java, C/C++, etc.).

• Extensible – new methods/functions can easily be plugged in by
editing the system schema, from which the GUI is generated dynamically.
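
The schema-driven extensibility described above might look roughly like this in Python. The XML layout, the tag and attribute names and the widget mapping are all hypothetical; the actual IOSPM schema format is not shown in these slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema: one <method> entry per pluggable method.
SCHEMA = """
<methods>
  <method name="wkNN" language="MATLAB">
    <param name="k" type="int" default="5"/>
  </method>
  <method name="SVM" language="C++">
    <param name="kernel" type="choice" default="rbf"/>
    <param name="C" type="float" default="1.0"/>
  </method>
</methods>
"""

# Hypothetical mapping from parameter type to the kind of GUI control.
WIDGET_FOR_TYPE = {"int": "spin box", "float": "number field", "choice": "drop-down"}

def build_form_specs(schema_text):
    """Turn each <method> entry into a list of (label, widget, default) rows."""
    forms = {}
    for method in ET.fromstring(schema_text.strip()).findall("method"):
        rows = [(p.get("name"), WIDGET_FOR_TYPE[p.get("type")], p.get("default"))
                for p in method.findall("param")]
        forms[method.get("name")] = rows
    return forms

forms = build_form_specs(SCHEMA)
```

Under this assumption, plugging in a new method means adding one `<method>` entry to the schema; the form for it is derived from the entry rather than hand-coded.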


An overview of the IOSPM system

[Figure: component diagram of the IOSPM system.
Main GUI (UI): Select Data file(), Select Optimisation Method(), Select Modelling Method(), Select Data pre-processing().
Processing modules: Data Loading, Data Pre-processing, PM Optimisation.
Modelling methods: Spiking Neural Network, LibSVM, K-Nearest Neighbor, DENFIS, WKNN/WWKNN.
Visualisation GUI (UI): Select Visualisation Mode(), Create Results Report(); Data Report Generator; Visualisation Mode 1.
Connecting layers: INTERFACE 1, INTERFACE 2.
Implementation components: MATLAB package, Python package, Qt, XML GUI, executable, OpenGL, C++ code.]

Implementation of ISPM
Example contents of the modules are given below:

• Module for Neighbourhood Creation:
Euclidean distance method; Hamming distance method; Cosine distance
method; Kernel distance methods; other methods.

• Module for Classification/Prediction:
– Classification methods, such as: MLR, MLP, ECF, wkNN, wwkNN, TWNFI,
SVM, eSNN.
– Probability prediction methods, such as: DENFIS, TWNFI.

• Module for Optimisation:
Evolutionary computation (EC), quantum-inspired evolutionary algorithm, particle
swarm optimisation (PSO), quantum-inspired PSO, other methods.

• Module for Task Distribution Centre:
This module will control the whole optimisation process, communicate with
the user and visualise the results.
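
The neighbourhood creation module listed above could be sketched in Python as below, with three of the named distance measures passed in as interchangeable functions. The function names and the `create_neighbourhood` signature are illustrative assumptions, not the ISPM API.

```python
import math

def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hamming_distance(a, b):
    # Number of positions at which the two vectors differ.
    return sum(x != y for x, y in zip(a, b))

def cosine_distance(a, b):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def create_neighbourhood(samples, query, k, distance):
    """Return the indices of the k samples closest to the query."""
    order = sorted(range(len(samples)), key=lambda i: distance(samples[i], query))
    return order[:k]

# Hypothetical usage: swap the distance without touching the rest.
SAMPLES = [(0, 0), (5, 5), (1, 1)]
nearest_two = create_neighbourhood(SAMPLES, (0.9, 0.9), 2, euclidean_distance)
```

Passing the distance as a parameter keeps the neighbourhood logic independent of the measure, which fits the modular structure the slide describes.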


Global Modelling vs. Personalised Modelling

Colon cancer gene expression data

Model                 Overall accuracy   Class 1 accuracy   Class 2 accuracy
MLR (global)          72.58%             75.00%             68.18%
RBF (global)          79.03%             90.00%             59.09%
IMPM (personalised)   87.10%             90.00%             81.82%


Personalised Modelling for Bioinformatics Research

An example: applying PM on gene expression data for colon
cancer diagnosis

[Figure, panel (a): "Compact GA Evolution"; x axis: each bit represents one feature; y axis: generation.]
[Figure, panel (b): "Weighted importance of selected features"; x axis: index of genes (377, 1285, 1892, 419, 812, 1843, 1574, 350, 1991, 513, 561, 1863, 814, 809, 1069, 395, 462, 348); y axis: weighted importance.]

(a) The evolution of feature selection for sample #32 using 600 generations of GA optimisation;
(b) The weighted importance of the selected features for sample #32 after one run of the method;

Results from a simple experiment on colon cancer gene expression data


[Figure, panel (c): "Visualizing the results of PFS with 3 features"; axes: f377, f1285, f1892; legend: Blue (circle points) - actual value of this gene; Green upward triangle - Healthy; Red downward triangle - Diseased.]
[Figure, panel (d): x axis: Index of Selected Genes (377, 1285, 1892, 419, 812, 1843, 1574, 350, 1991, 513, 561, 1863, 814, 809, 1069, 395, 462, 348); y axis: Gene Expression Level; legend as in panel (c).]
[Figure, panel (e): x axis: feature indices (419, 377, 1423, 132, 1058, 1892, 1982, 350, 791, 1060, 1495, 49, 824, 892, 1296, 1863, 1924).]
[Figure, panel (f): "Colon cancer data - area under Curve: 0.87727"; x axis: Threshold; y axis: Classification Accuracy; legend: ROC Curve, Overall Accuracy, Class 1 Accuracy, Class 2 Accuracy.]

(c) Sample 32 (a blue dot) is plotted with its neighbouring samples (red triangles represent cancer samples and green triangles - control) in the 3D space of the top 3 gene variables from (b);
(d) The profile of sample #32 (blue dots) versus the average local profile of the control (green) and cancer (red triangles) using the features from (b);
(e) The 17 most frequently selected features for all samples - the method is run 20 times for each sample;
(f) The accuracy of personalised diagnosis across all 60 samples when the 17 markers from (e) are used in a leave-one-out cross validation; in case of ROC curve x axis represents false positive rate (1-specificity), while y axis is true positive rate (sensitivity); the area under curve is 0.87727 and the overall accuracy - 87.10%;
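
For readers unfamiliar with the ROC quantities named in (f), here is a minimal Python sketch of how ROC points and the area under the curve can be computed from classifier scores. The scores and labels are made-up toy values, not the colon cancer results.

```python
def roc_points(scores, labels):
    """Sweep the threshold from high to low and collect (FPR, TPR) pairs.

    labels are 1 (positive) or 0 (negative); both classes must be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Samples with equal scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))  # x = 1 - specificity, y = sensitivity
    return points

def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical toy scores: one negative is ranked above one positive.
toy_points = roc_points([0.9, 0.8, 0.7, 0.1], [1, 0, 1, 0])
toy_auc = area_under_curve(toy_points)
```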


Personalised Modelling (PM) for CVD Diagnosis and
Risk Prognosis
This study aims at personalised modelling for cardiovascular disease
(CVD) diagnosis.

[Figure: "The dependency of classification accuracy on number of neighbors"; x axis: Num of neighbors (k), 5 to 35; y axis: Classification accuracy, 0.65 to 0.9; curves: overallAcc, class 1 acc, class 2 acc.]

The PM method automatically optimises the number of neighbouring samples K,
which can be unique to each input sample or set to a single optimal value for all samples.
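
The two options for K can be illustrated with a small Python sketch. It swaps in a plain search over candidate values scored by leave-one-out accuracy; the search strategy, the `region` parameter and the toy data are assumptions, and the PM method's own optimiser is not reproduced here.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    nearest = sorted(train, key=lambda s: math.dist(s[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loo_accuracy(data, k):
    """Leave-one-out accuracy of a k-NN vote on the given samples."""
    hits = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        hits += knn_predict(rest, x, k) == label
    return hits / len(data)

def best_global_k(data, candidates):
    """One K for all samples: the candidate with the best overall accuracy."""
    return max(candidates, key=lambda k: loo_accuracy(data, k))

def best_personal_k(data, query, candidates, region=4):
    """A K for one query, scored only on the query's own neighbourhood.

    `region` (hypothetical) is the number of nearest samples used for
    scoring; every candidate K should be smaller than it.
    """
    local = sorted(data, key=lambda s: math.dist(s[0], query))[:region]
    return max(candidates, key=lambda k: loo_accuracy(local, k))

# Hypothetical one-dimensional toy data with two well-separated classes.
DATA = [((0.0,), "A"), ((1.0,), "A"), ((2.0,), "A"),
        ((10.0,), "B"), ((11.0,), "B"), ((12.0,), "B")]

k_for_all = best_global_k(DATA, [1, 3, 5])
k_for_query = best_personal_k(DATA, (0.5,), [1, 3])
```

On this toy set a K that is too large pulls in the other class and the leave-one-out accuracy collapses, which is the kind of dependency the plot above shows.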


Software Demo


Conclusion
• The proposed IMPM has a major advantage: the modelling process
starts with all relevant variables available for a person, rather than with
a fixed set of variables required by a global model.
• The proposed IMPM leads to better prognostic accuracy and a
computed personalised profile.
• With global optimisation using IMPM, a small set of variables (potential
markers) can be identified from the selected variable set across the
whole population.
• The proposed algorithms and models of IMPM are generic and, within
certain constraints, can potentially be incorporated into a variety of
applications for data analysis and knowledge discovery, such as
financial risk analysis, time series data prediction, etc.
• We hope that this study will motivate the application of personalised
modelling in different research areas.


Reference List:

1. Kasabov, N.: Global, local and personalized modelling and pattern discovery in
bioinformatics: An integrated approach. Pattern Recognition Letters 28(6) (2007) 673–685.
2. Shabo, A.: Health record banks: integrating clinical and genomic data into patient-centric
longitudinal and cross-institutional health records. Personalised Medicine 4(4) (2007) 453–455.
3. Kasabov, N., Hu, Y.: Integrated optimisation method for personalised modelling
and case study applications. Int. J. Functional Informatics and Personalised Medicine 3(3)
(2010) 236–256.


Questions?

