Azhar Ali Shah @ Interdisciplinary Optimization and Decision Making Journal Club (IODMJC), March 20, 2009
Overview
Azhar A Shah. Efficient algorithms for accurate hierarchical clustering of huge datasets: tackling the entire protein space
Introduction: authors
Introduction: Hierarchical Clustering
Introduction: about the topic
There is no guideline for selecting the best linkage method. In practice, people almost always use average linkage, i.e. UPGMA (Unweighted Pair Group Method using arithmetic Averages). Single linkage is scalable to large datasets, as it requires only O(1) edges in memory, but it is highly susceptible to outliers.
Introduction: UPGMA
Introduction: UPGMA - Sparse input
N=11 input singletons (vertices): {1,2,3,4,11,12,13,14,21,22,23} and 14 edges in the sparse input. The input is considered sparse since not all pairs are given, e.g. there is no edge between 1 and 22.
Clusters 1,2,3,4 form a clique A.
Clusters 11,12,13,14 are missing edge <11,14> to form clique B.
Clusters 21,22,23 are loosely connected to each other and to the cluster of clique A.
In total there are two connected components in the input graph: {1,2,3,4,21,22,23} (producing 6 merges for 7 vertices) and {11,12,13,14} (producing 3 merges for 4 vertices). The result is therefore a forest of two disjoint trees with 9 merges, rather than the full tree of N-1=10 merges.
UPGMA-input
90 23 1
70 23 22
50 22 21
30 14 13
20 14 12
12 13 12
11 13 11
1e+01 12 11
4e-10 4 3
1e-50 4 2
1e-80 3 2
2e-40 4 1
1e-40 3 1
1e-100 2 1
UPGMA-tree
32 99.167 31 26
31 85 29 23
30 50 28 14
29 50 22 21
28 11.5 27 13
27 10 12 11
26 1.33e-10 25 4
25 5e-41 24 3
24 1e-100 2 1
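The sparse example above can be traced with a short script. This is a naive sketch, not the efficient algorithm from the paper. It assumes that each UPGMA-input row reads "distance, vertex, vertex", that each UPGMA-tree row reads "parent, distance, child, child", and that a missing edge counts as distance 100. That last value (`PSI`) is not stated on the slide; it is inferred from the tree numbers, e.g. (30 + 20 + 100) / 3 = 50 for the merge of cluster 28 with 14. The function name `sparse_upgma` is hypothetical.

```python
PSI = 100.0  # ASSUMPTION: distance used for missing edges, inferred from the slide's tree

# Rows of UPGMA-input, read as (distance, vertex, vertex); 1e+01 is written as 10.
EDGES = [
    (90, 23, 1), (70, 23, 22), (50, 22, 21), (30, 14, 13), (20, 14, 12),
    (12, 13, 12), (11, 13, 11), (10, 12, 11), (4e-10, 4, 3), (1e-50, 4, 2),
    (1e-80, 3, 2), (2e-40, 4, 1), (1e-40, 3, 1), (1e-100, 2, 1),
]


def sparse_upgma(edges, psi=PSI):
    """Naive average linkage on a sparse edge list.

    Returns merges in the order they are made, as (parent, distance, child, child).
    Pairs of clusters with no stored edge are treated as being at distance psi,
    and merging stops once the smallest stored distance reaches psi.
    """
    vertices = sorted({v for _, a, b in edges for v in (a, b)})
    size = {v: 1 for v in vertices}
    # Average distance per unordered cluster pair; only pairs that share
    # at least one input edge are ever stored.
    dist = {frozenset((a, b)): float(d) for d, a, b in edges}
    next_id = max(vertices) + 1
    merges = []
    while dist:
        pair, d = min(dist.items(), key=lambda kv: kv[1])
        if d >= psi:
            break
        a, b = sorted(pair, reverse=True)
        del dist[pair]
        new = next_id
        next_id += 1
        size[new] = size[a] + size[b]
        # Every cluster that currently has an edge to a or to b.
        neighbours = {c for p in dist for c in p if a in p or b in p} - {a, b}
        for c in neighbours:
            da = dist.pop(frozenset((a, c)), psi)
            db = dist.pop(frozenset((b, c)), psi)
            # Size-weighted mean keeps the value equal to the average over all pairs.
            dist[frozenset((new, c))] = (size[a] * da + size[b] * db) / size[new]
        merges.append((new, d, a, b))
    return merges


if __name__ == "__main__":
    for parent, d, left, right in reversed(sparse_upgma(EDGES)):
        print(parent, f"{d:.5g}", left, right)
```

Under these assumptions the script makes nine merges with parent ids 24 to 32, and no merge ever joins {11,12,13,14} to the other component, which matches the forest of two trees described above.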
Research Problem: UPGMA
This data renders UPGMA impractical.
Methodology: 1) Sparse-UPGMA
Sparse-UPGMA can't cope with huge datasets, where an O(E) memory requirement is intolerable (e.g. Table 1).
UPGMA (mean):
New eq:
Time and memory improvement:
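The equations on this slide did not survive extraction. For orientation only, the standard textbook form of average linkage and its size-weighted merge update are given below. These are assumptions about what the labels refer to, not a reconstruction of the slide's own formulas. Here A, B and C are clusters and d(a,b) is the input dissimilarity between two items.

```latex
% Average-linkage (UPGMA) distance between clusters A and B
d(A,B) = \frac{1}{|A|\,|B|} \sum_{a \in A} \sum_{b \in B} d(a,b)

% Distance from the merged cluster A \cup B to any other cluster C
d(A \cup B,\, C) = \frac{|A|\, d(A,C) + |B|\, d(B,C)}{|A| + |B|}
```

The second form is useful because it needs only cluster sizes and the current cluster-level distances, not the original pairwise values.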
Methodology: 2) Multi-Round MC-UPGMA
Illustration of non-metric constraints imposed by BLAST sequence similarities (edges). False transitivity is possible due to CSKP_HUMAN.
Methodology: 2) Single-Round MC-UPGMA
Requires O(n) memory to hold the forming tree!
Methods
Jaccard Score
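The slide names the Jaccard score without its definition, and how the talk applied it is not recoverable from the text. As a reminder, the usual definition for two sets is the size of their intersection divided by the size of their union. The sketch below assumes it is used to compare one cluster's members with one reference group; the function name and the protein identifiers are hypothetical.

```python
def jaccard_score(cluster, reference):
    """Return |cluster & reference| / |cluster | reference| for two collections of ids."""
    cluster, reference = set(cluster), set(reference)
    union = cluster | reference
    if not union:
        # Two empty sets: the score is undefined; 0.0 is a convention chosen here.
        return 0.0
    return len(cluster & reference) / len(union)


if __name__ == "__main__":
    # Hypothetical ids, for illustration only.
    print(jaccard_score({"P1", "P2", "P3"}, {"P2", "P3", "P4"}))
```

A score of 1 means the cluster and the reference group contain exactly the same members, and 0 means they share none.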
Results
Smith–Waterman BLAST Sparse UPGMA With reduced dataset 220K 1.80M
200 clustering rounds on a single 4-CPU workstation with 4 GB of memory took about 1-2 days.
Observations
Cluster Card Page
View Proteins of Cluster
Keywords Appearances
Cluster Similarity Distribution
similarity matrix for the proteins in this cluster
 
 
 
 
