Analysis of
Gene Expression Data
     _______________________

            Jhoirene B. Clemente
       Algorithms and Complexity Lab
     University of the Philippines Diliman
Overview

● Definitions
● Clustering of Gene Expression Data
● Visualizations of Gene Expression Data
Definitions
Gene
Basic unit of heredity in a living organism.
It is normally a stretch of DNA that codes
for a type of protein or for an RNA chain
that has a function in the organism.

Gene Expression Data
Expression levels of the genes in an individual,
measured using a microarray.
Definitions
Gene Expression Data

    Gene    Gene Expression
    a
    b
    c
    ...
    n
Definitions
Gene Expression Data (1 sample, n genes)

    Gene    Gene Expression
    a
    b
    c
    ...
    n
Definitions
(n x m) Data Matrix: n genes (rows), m samples (columns)

    Gene    Sample 1    Sample 2    ...    Sample m
    a
    b
    c
    ...
    n
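As a rough illustration of the (n x m) data matrix, the sketch below stores genes as rows and samples as columns in a NumPy array. The gene names, sample names, and expression values are made up for the example and do not come from the slides.

```python
import numpy as np

# Hypothetical toy values: 4 genes (rows) measured in 3 samples (columns).
gene_names = ["a", "b", "c", "d"]
sample_names = ["Sample 1", "Sample 2", "Sample 3"]
data = np.array([
    [0.2, 0.5, -0.1],
    [-0.7, 0.0, 0.3],
    [0.9, -0.4, 0.6],
    [0.1, 0.1, -0.8],
])

n, m = data.shape          # n genes, m samples
gene_b_profile = data[1]   # expression of gene b across all m samples
sample_2 = data[:, 1]      # expression of all n genes in Sample 2
```

Reading a row gives one gene's profile across samples, and reading a column gives one sample's profile across genes.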
Clustering

Clustering is the unsupervised classification of
patterns, including observations, data sets, and
feature vectors, into groups called clusters,
such that objects in the same cluster are similar to
each other while objects in different clusters are
as dissimilar as possible.
Cluster Analysis
1. Preprocessing
    ● Filtering
    ● Normalization
2. Clustering
3. Analysis
Clustering
Partitional
●   K-means Algorithm
●   X-means Algorithm

Hierarchical
Clustering
Given the (n x m) data matrix, we can

●   Cluster the set of genes
●   Cluster the set of samples
●   Cluster the set of genes and samples
    simultaneously.
Data Set
The data set is time-series gene expression data from
a synchronized population of yeast.
Preprocessing
Filtering
 ● Removed genes not involved in cell cycle regulation
 ● Removed genes belonging to more than one group

Normalization
 ● All gene expression values range from -1.0 to 1.0.
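The slides do not say how the values were brought into the range -1.0 to 1.0. One common choice, assumed here purely for illustration, is a linear rescaling of each gene's row; the function name `scale_rows_to_unit_range` is hypothetical.

```python
import numpy as np

def scale_rows_to_unit_range(data):
    """Linearly rescale each row (gene) so its values span [-1.0, 1.0]."""
    data = np.asarray(data, dtype=float)
    lo = data.min(axis=1, keepdims=True)
    hi = data.max(axis=1, keepdims=True)
    span = hi - lo
    # A constant row has zero span; map it to 0.0 instead of dividing by zero.
    safe_span = np.where(span == 0, 1.0, span)
    scaled = 2.0 * (data - lo) / safe_span - 1.0
    return np.where(span == 0, 0.0, scaled)

# Hypothetical raw values: one varying gene and one constant gene.
raw = np.array([[1.0, 3.0, 5.0], [10.0, 10.0, 10.0]])
scaled = scale_rows_to_unit_range(raw)
```

Other normalizations (for example, dividing each row by its largest absolute value) would also keep values within the same range; which one the original analysis used is not stated.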
Data Set
Data matrix (384 genes and 17 samples) with 5
classifications.
Groupings are based on cell cycle phase activation.
Data Set
Group 1: Resting Phase
Group 2: First Growth Phase
Group 3: Synthesis Phase
Group 4: Second Growth Phase
Group 5: Cell Division
Clustering of genes
K-means Algorithm

Given n data points in R^d
1. Assign k initial centers of the k clusters
2. Assign all the data points to the nearest cluster
   (Euclidean distance, Manhattan distance, etc.)
3. Adjust the k centers
4. Repeat steps 2 and 3 until convergence
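The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the implementation used in the study: it assumes squared Euclidean distance, takes the first k points as initial centers, and treats "convergence" as the assignments no longer changing. The toy points are made up.

```python
import numpy as np

def kmeans(points, k, max_iter=100):
    """Minimal k-means sketch: returns (labels, centers)."""
    points = np.asarray(points, dtype=float)
    # Step 1 (assumption): use the first k points as the initial centers.
    centers = points[:k].copy()
    labels = np.full(len(points), -1)
    for _ in range(max_iter):
        # Step 2: assign each point to its nearest center
        # (squared Euclidean distance).
        diffs = points[:, None, :] - centers[None, :, :]
        dists = (diffs ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        # Step 4: stop once the assignments no longer change.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Step 3: move each center to the mean of its assigned points.
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers

# Hypothetical 2-D points forming two well-separated groups.
points = [[0, 0], [10, 10], [0, 1], [1, 0], [10, 11], [11, 10]]
labels, centers = kmeans(points, k=2)
```

For gene clustering, each row of the data matrix (one gene's expression profile) would play the role of a point.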
Clustering of genes
K-means Algorithm

k = 5, since we want to approximate the 5 classifications of the data set.
Clustering of genes
Initialization

Options for choosing the k initial centers:
1. Choose the first k centers that will maximize the
   distance between the clusters
2. Sort the distances between all the data points
   and then choose the k initial points at constant
   intervals from the sorted list
3. Use the first k points in the data set as the first k
   centers
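Two of these initialization options can be sketched as follows. `first_k_centers` follows option 3 directly. For option 1, the slides do not specify a procedure, so the sketch substitutes a greedy farthest-first heuristic, which spreads centers apart but does not guarantee a maximum; both function names are hypothetical. Option 2 is left out because the slide does not say which distances are sorted.

```python
import numpy as np

def first_k_centers(points, k):
    """Option 3: use the first k points in the data set as centers."""
    return np.asarray(points, dtype=float)[:k].copy()

def farthest_first_centers(points, k):
    """Greedy stand-in for option 1: repeatedly pick the point that is
    farthest from the centers chosen so far."""
    points = np.asarray(points, dtype=float)
    chosen = [0]  # assumption: start from the first point
    # Distance from every point to its nearest chosen center.
    nearest = np.linalg.norm(points - points[0], axis=1)
    while len(chosen) < k:
        idx = int(nearest.argmax())
        chosen.append(idx)
        dist_to_new = np.linalg.norm(points - points[idx], axis=1)
        nearest = np.minimum(nearest, dist_to_new)
    return points[chosen].copy()

# Hypothetical 2-D points.
pts = [[0, 0], [1, 0], [10, 0], [0, 10]]
simple = first_k_centers(pts, 3)
spread = farthest_first_centers(pts, 3)
```

On this toy input the farthest-first heuristic skips the point next to the origin and picks the two distant points instead.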
Clustering of genes
Result of k-means clustering with k = 5 (figure)
Clustering of genes
●   Clustering may suggest possible roles for genes
    with unknown functions.
●   Clustering the samples or experiments may shed
    light on new subtypes of diseases.
●   It may help identify which type of treatment is
    suited for a specific type of cancer.
●   It can support building genetic networks.
Visualization
Vector Fusion
Non-metric Multidimensional Scaling (nMDS)
Principal Components Analysis (PCA)
Vector fusion
Visualization technique that uses the Single point
broken line parallel algorithm
nMDS visualization
Input: dissimilarity matrix $\Delta = [\delta_{ij}]$ (actual distances)
 ● In nMDS, only the rank order of the entries is
   assumed to contain the significant information.
 ● Thus, the purpose of the non-metric MDS
   algorithm is to find a configuration of points
   whose distances reflect the rank order of the
   data as closely as possible.
 ● The transformation uses a nonparametric
   function f (monotone regression):

             $\hat{d}_{ij} = f(\delta_{ij})$ (pseudo-distance)
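The monotone regression step can be illustrated with the pool-adjacent-violators procedure, a standard way to fit the closest non-decreasing sequence in the least-squares sense. The sketch below is an assumption about how the pseudo-distances could be computed, not the author's code: pairs are ordered by their dissimilarity, and the current configuration distances are smoothed so they respect that rank order. Function names and input values are hypothetical.

```python
import numpy as np

def monotone_regression(values):
    """Pool adjacent violators: least-squares non-decreasing fit."""
    blocks = []  # each block holds [sum, count]
    for v in values:
        blocks.append([float(v), 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return np.array(fitted)

def pseudo_distances(dissimilarities, distances):
    """Pseudo-distances that follow the rank order of the dissimilarities."""
    dissimilarities = np.asarray(dissimilarities, dtype=float)
    distances = np.asarray(distances, dtype=float)
    order = np.argsort(dissimilarities, kind="stable")
    fitted = np.empty_like(distances)
    fitted[order] = monotone_regression(distances[order])
    return fitted

# Hypothetical values for four point pairs.
delta = [3.0, 1.0, 4.0, 2.0]   # input dissimilarities
dist = [2.0, 1.0, 4.0, 3.0]    # distances in the current configuration
d_hat = pseudo_distances(delta, dist)
```

Here the pairs with dissimilarities 2.0 and 3.0 have configuration distances in the wrong order (3.0 and 2.0), so both are replaced by their mean, 2.5.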
PCA
Vector fusion visualization (figure)
nMDS visualization (figures)
References
Clemente J., Salido J.A. (2010). "Non-Metric Multidimensional Scaling and
Vector Fusion Visualization of Cell Cycle Independent Gene Expressions for
Gene Function Analysis". Conference proceedings of the National Conference
on Information Technology for Education (NCITE) 2010; Philippine IT
Journal, Feb 2011 issue.

Clemente J. (2010). "Cluster Analysis for Identifying Genes Highly
Correlated with a Phenotype". Undergraduate thesis, Department of Computer
Science, University of the Philippines Diliman.
Thank you for
  Listening
