Automated Clustering Project
Miklos Vasarhelyi, Paul Byrnes, and Yunsen Wang
Presented by Deniz Appelbaum
Motivation
 The goal is to develop a program that automatically performs clustering and
outlier detection for a wide variety of numerically represented data.
Outline of program features
 Normalizes all data to be clustered
 Creates normalized principal components from the normalized data
 Automatically selects the necessary normalized principal components for use in actual
clustering and outlier detection
 Compares a variety of algorithms based upon the selected set of normalized principal
components
 Adopts the top performing model based upon silhouette coefficient values to perform
the final clustering and outlier detection procedures
 Produces relevant information and outputs throughout the process
Data normalization
 Data normalization
 Converts each numerically represented dimension to be clustered into the range [0, 1].
 Puts all numeric attributes on a common scale, so that no single attribute dominates the distance calculations used in clustering.
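As an illustration of the normalization step, the following is a minimal sketch of min-max scaling in plain Python. The function name and the sample credit limits are hypothetical, and the handling of a constant column is an assumption; the slides do not say how the program treats that case.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical credit limits, not taken from the study data.
credit_limits = [2200, 2500, 5000, 8500]
normalized = min_max_normalize(credit_limits)
```

The smallest value maps to 0 and the largest to 1, with every other value placed proportionally in between.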
Principal component analysis
 Principal component analysis (PCA) is a statistical procedure that uses an
orthogonal transformation to convert a set of observations of possibly correlated
variables into a set of values of linearly uncorrelated variables called principal
components.
 In this way, PCA can both reduce dimensionality and eliminate the problems
that arise when clustering data whose attributes are correlated.
 In the following slides, a random sample of 5,000 credit card customers is used to
demonstrate the automated clustering and outlier detection program
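To make the orthogonal transformation concrete, here is a minimal sketch of PCA for just two variables, where the eigenvalues of the covariance matrix have a closed form. It is an illustration only, not the program's implementation; the function name and the sample points are hypothetical.

```python
import math


def pca_2d(points):
    """Return (eigenvalues, components, scores) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Sample covariance matrix [[a, b], [b, c]].
    a = sum(x * x for x, _ in centered) / (n - 1)
    b = sum(x * y for x, y in centered) / (n - 1)
    c = sum(y * y for _, y in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first.
    half_trace = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    eigenvalues = [half_trace + radius, half_trace - radius]

    # Eigenvector for the largest eigenvalue; the second is orthogonal to it.
    if b != 0:
        v1 = (b, eigenvalues[0] - a)
    else:
        v1 = (1.0, 0.0) if a >= c else (0.0, 1.0)
    norm = math.hypot(v1[0], v1[1])
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])

    # Project each centered point onto the two components.
    scores = [(x * v1[0] + y * v1[1], x * v2[0] + y * v2[1]) for x, y in centered]
    return eigenvalues, [v1, v2], scores


# Hypothetical, strongly correlated sample.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
eigenvalues, components, scores = pca_2d(points)
```

The two score columns are uncorrelated, and the variance of the first column equals the largest eigenvalue, which is why a few leading components can stand in for the original attributes.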
Principal component analysis
 PCA initially results in four principal
components being generated from
the original data
 Using a cumulative data variability
threshold of 80% (default
specification), three principal
components are automatically
selected for analysis – they explain
the vast majority of data variability
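The automatic selection rule can be sketched as follows. The variances below are hypothetical (the slide does not report the actual values) and were chosen only so that three of four components are needed, as in the example. Whether the program compares with "at least" or "strictly more than" the threshold is also an assumption.

```python
def select_components(variances, threshold=0.80):
    """Return how many leading components are needed to reach the threshold."""
    total = sum(variances)
    cumulative = 0.0
    # Components are taken in order of decreasing variance.
    for count, variance in enumerate(sorted(variances, reverse=True), start=1):
        cumulative += variance
        if cumulative / total >= threshold:
            return count
    return len(variances)


# Hypothetical variances for four principal components.
example_variances = [1.6, 1.1, 0.8, 0.5]
n_selected = select_components(example_variances)
```

With these values the first two components explain 67.5% of the variability and the first three explain 87.5%, so three components are kept.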
Principal component analysis
 Scatter plot of PC1 and PC2
 In this view, the top 2 principal
components are plotted for each object in
two-dimensional space.
 As can be seen, a small subset of records
appears significantly more distant
from the vast majority of objects.
Clustering exploration/simulation process - examples
 Ward method
 Ward suggested a general agglomerative hierarchical clustering procedure, where the criterion for
choosing the pair of clusters to merge at each step is based on the optimal value of an objective function.
 Complete link method
 This method is also known as farthest neighbor clustering: the distance between two clusters is taken to be
the distance between their two most distant members. The result of the clustering can be visualized as a
dendrogram, which shows the sequence of cluster fusions and the distance at which each fusion took place.
 PAM (partitioning around medoids)
 The k-medoids algorithm is a clustering method related to the k-means algorithm and the medoid shift
algorithm. It is considered more robust to outliers than k-means because each cluster is represented by a
medoid (an actual data point) rather than by a mean.
 K-means
 k-means clustering aims to partition n observations into k clusters in which each observation belongs to
the cluster with the nearest mean, serving as a prototype of the cluster.
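As one example of the algorithms listed above, here is a naive sketch of agglomerative clustering with the complete link criterion. It is deliberately simple (far too slow for 5,000 records) and is not the program's implementation; the function name and the sample points are hypothetical.

```python
import math


def complete_link_clusters(points, k):
    """Naive agglomerative clustering with complete linkage, stopping at k clusters."""
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        # Complete link: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in c1 for j in c2)

    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        # Merge the closest pair of clusters.
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Hypothetical points forming two well-separated groups.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.9, 1.0), (1.0, 0.9)]
labels = complete_link_clusters(points, k=2)
```

Recording the linkage distance at each merge, instead of stopping at k clusters, would give the information a dendrogram displays.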
Clustering exploration results
 The result shown below is based upon a simulation exercise, whereby all four
algorithms are automatically compared on the data set (i.e., a random sample of 5,000
records from the credit card customer data). In this particular case, the best model is
found to be a two-cluster solution using the complete link hierarchical method. This is
the final model and is used for subsequent clustering and outlier detection.
 Best clustering result:
 The silhouette value can theoretically range from -1 to +1, with higher values indicative
of better cluster quality in terms of both cohesion and separation.
Best Method                  Number Of Clusters   Silhouette Value
complete link hierarchical   2                    0.753754205720575
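For reference, the silhouette value used to rank the models can be computed as sketched below, using Euclidean distance and averaging over all points. The sketch assumes at least two clusters and distinct points; scoring a single-member cluster as 0 is a common convention, not something the slides specify. The sample points and labels are hypothetical.

```python
import math


def mean_silhouette(points, labels):
    """Average silhouette coefficient over all points (needs at least two clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Convention: a point alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster (cohesion).
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster (separation).
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical points and cluster labels.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.9, 1.0), (1.0, 0.9)]
labels = [0, 0, 0, 1, 1]
score = mean_silhouette(points, labels)
```

Running the same function on the output of each candidate algorithm and keeping the highest score mirrors the model comparison described above.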
Complete-link Hierarchical clustering (1/2)
 The 5,000 instances are on the
x-axis. In moving vertically from
the x-axis, one can begin to see
how the actual clusters are
formed.
Plot of PCs with cluster assignment labels (1/3)
 In this view, the top two principal
components (i.e., PC1 and PC2) are
plotted for each object in two-
dimensional space.
 In the graph, there are two clusters, one
dark blue and the other light blue.
 The small subset of three records appears
substantially more different from the
majority of objects.
Plot of PCs with cluster assignment labels (2/3)
 In this view, PC1 and PC3 are plotted for
each object in two-dimensional space.
 In the graph, the two clusters are again
shown.
 It is once again evident that the small
subset of three records appears more
different from the majority of other
objects.
Plot of PCs with cluster assignment labels (3/3)
 In this view, PC2 and PC3 are
plotted for each object in two-
dimensional space.
 Cluster differences appear less
prominent from this perspective.
Principal components 3D scatterplot
 Cluster one represents the majority
class (black) while cluster two
represents the rare class (red).
 In this view, one can clearly see the
subset of three records (in red)
appearing more isolated from the other
objects.
Cluster 1 outlier plot
 In this view, an arbitrary cutoff is
inserted at the 99.9th percentile (red
horizontal line) so as to provide for
efficient identification of very irregular
records.
 Objects further from the x-axis are
more questionable.
 While all objects distant from the x-
axis might be worth investigating,
points above the cutoff should be
viewed as particularly suspicious.
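A percentile cutoff of this kind can be sketched as follows. The nearest-rank definition of a percentile is an assumption (statistical packages offer several definitions), and the scores are hypothetical stand-ins for whatever distance measure is plotted on the y-axis.

```python
import math


def percentile_cutoff(scores, percentile=99.9):
    """Nearest-rank percentile of a list of scores."""
    ordered = sorted(scores)
    # round() guards against floating-point noise before taking the ceiling.
    rank = math.ceil(round(percentile * len(ordered) / 100, 9))
    return ordered[max(rank, 1) - 1]


def flag_outliers(scores, percentile=99.9):
    """Return the indices of records whose score lies above the cutoff."""
    cutoff = percentile_cutoff(scores, percentile)
    return [i for i, s in enumerate(scores) if s > cutoff]


# Hypothetical outlier scores for 1,000 records.
scores = [i / 1000 for i in range(1000)]
flagged = flag_outliers(scores)
```

With 1,000 distinct scores and a 99.9th percentile cutoff, only the single most extreme record is flagged; lowering the percentile widens the set of records sent for review.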
Conclusion of Process
 At the conclusion of outlier detection, an output file for each cluster containing the unique
record identifier, original variables, normalized variables, principal components, normalized
principal components, cluster assignments, and Mahalanobis distance information can be
exported to facilitate further analyses and investigations.
 Cluster 2 – final output file of a subset of fields:
 Distinguishing features of cluster 2 records: 1) New accounts (age = 1 month), 2)
Very high incidence of late payments, and 3) Relatively high credit limits,
particularly given the account age and late payment issues.
Record   AccountAge   CreditLimit   AdditionalAssets   LatePayments   model.cluster   md
32430    1            2500          1                  3              2               5.83E-05
65470    1            8500          1                  4              2               0.002371778
78772    1            2200          0                  3              2               0.000442305
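The md column presumably holds the Mahalanobis distance mentioned above. As a rough illustration of that measure, the sketch below computes the squared Mahalanobis distance of each point from the sample mean for two variables. Which variables the program uses, and whether it reports the squared or unsquared distance, is not stated in the slides; the function name and sample points are hypothetical.

```python
def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) point from the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # must be non-zero for the inverse to exist

    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^T S^-1 d, written out with the closed-form 2x2 inverse.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Hypothetical points; the last one lies far from the rest.
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.25, 0.2), (0.2, 0.3), (0.9, 0.1)]
md_sq = mahalanobis_sq_2d(points)
```

Unlike plain Euclidean distance, this measure accounts for the spread and correlation of the variables, so a record is flagged only when it is unusual relative to the shape of its cluster.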

GAESI - Gestão em Automação e TI - 12th CONTECSI
 
Co-production: an opportunity toward better digital governance - 12th CONTECSI
 Co-production: an opportunity toward better digital governance - 12th CONTECSI  Co-production: an opportunity toward better digital governance - 12th CONTECSI
Co-production: an opportunity toward better digital governance - 12th CONTECSI
 
The Digital Transformation - Challenges and Opportunities for IS researchers ...
The Digital Transformation - Challenges and Opportunities for IS researchers ...The Digital Transformation - Challenges and Opportunities for IS researchers ...
The Digital Transformation - Challenges and Opportunities for IS researchers ...
 
Auditoria Contínua e o Sistema de Controle da Administração Pública Federal -...
Auditoria Contínua e o Sistema de Controle da Administração Pública Federal -...Auditoria Contínua e o Sistema de Controle da Administração Pública Federal -...
Auditoria Contínua e o Sistema de Controle da Administração Pública Federal -...
 
Big (huge) Data and a continuous and predictive audit: new evidence, new met...
 Big (huge) Data and a continuous and predictive audit: new evidence, new met... Big (huge) Data and a continuous and predictive audit: new evidence, new met...
Big (huge) Data and a continuous and predictive audit: new evidence, new met...
 
Text Mining and Continuous Assurance Kevin Moffitt - 12th CONTECSI 34th WCARS
Text Mining and Continuous Assurance Kevin Moffitt - 12th CONTECSI 34th WCARSText Mining and Continuous Assurance Kevin Moffitt - 12th CONTECSI 34th WCARS
Text Mining and Continuous Assurance Kevin Moffitt - 12th CONTECSI 34th WCARS
 
Federal Audit in Relation to Continuous Audit - 12th CONTECSI 34th WCARS
 Federal Audit in Relation to Continuous Audit - 12th CONTECSI 34th WCARS Federal Audit in Relation to Continuous Audit - 12th CONTECSI 34th WCARS
Federal Audit in Relation to Continuous Audit - 12th CONTECSI 34th WCARS
 
O Tribunal de Contas da União e a Auditoria Contínua - 12th CONTECSI 34th WCARS
O Tribunal de Contas da União e a Auditoria Contínua - 12th CONTECSI 34th WCARSO Tribunal de Contas da União e a Auditoria Contínua - 12th CONTECSI 34th WCARS
O Tribunal de Contas da União e a Auditoria Contínua - 12th CONTECSI 34th WCARS
 
Cenário atual da Auditoria Contínua em Bancos no Brasil Itaú-Unibanco Holding...
Cenário atual da Auditoria Contínua em Bancos no Brasil Itaú-Unibanco Holding...Cenário atual da Auditoria Contínua em Bancos no Brasil Itaú-Unibanco Holding...
Cenário atual da Auditoria Contínua em Bancos no Brasil Itaú-Unibanco Holding...
 
Auditoria Eletrônica: Automatização de procedimentos de auditoria através do ...
Auditoria Eletrônica: Automatização de procedimentos de auditoria através do ...Auditoria Eletrônica: Automatização de procedimentos de auditoria através do ...
Auditoria Eletrônica: Automatização de procedimentos de auditoria através do ...
 


Automated Clustering and Outlier Detection Project

  • 1. Automated Clustering Project
      Miklos Vasarhelyi, Paul Byrnes, and Yunsen Wang
      Presented by Deniz Appelbaum
  • 2. Motivation
      - The goal is to develop a program that automatically performs clustering and outlier detection for a wide variety of numerically represented data.
  • 3. Outline of program features
      - Normalizes all data to be clustered
      - Creates normalized principal components from the normalized data
      - Automatically selects the necessary normalized principal components for use in actual clustering and outlier detection
      - Compares a variety of algorithms based upon the selected set of normalized principal components
      - Adopts the top-performing model, based upon silhouette coefficient values, to perform the final clustering and outlier detection procedures
      - Produces relevant information and outputs throughout the process
  • 4. Data normalization
      - Converts each numerically represented dimension to be clustered into the range [0, 1].
      - A desirable procedure for preparing numeric attributes for clustering.
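The slide states only that each dimension is mapped into [0, 1]. A minimal sketch of one common way to do this, assuming min-max scaling (the slides do not name the method), is shown below. The function name `min_max_normalize` and the sample records are hypothetical.

```python
def min_max_normalize(rows):
    """Rescale each column of `rows` (equal-length numeric rows) into [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
            (value - lo) / (hi - lo) if hi > lo else 0.0
            for value, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Hypothetical records: (account age in months, credit limit).
sample = [[1, 2500], [13, 8500], [25, 2200]]
scaled = min_max_normalize(sample)
```

Mapping a constant column to 0.0 is a design choice made here for the sketch; a real program might instead drop such a column, since it carries no information for clustering.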
  • 5. Principal component analysis
      - Principal component analysis (PCA) is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.
      - In this way, PCA can both reduce dimensionality and eliminate the problems inherent in clustering data whose attributes are correlated.
      - In the following slides, a random sample of 5,000 credit card customers is used to demonstrate the automated clustering and outlier detection program.
  • 6. Principal component analysis
      - PCA initially generates four principal components from the original data.
      - Using a cumulative data variability threshold of 80% (the default specification), three principal components are automatically selected for analysis; together they explain the vast majority of data variability.
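The selection rule described above (keep the leading components until their cumulative share of variance reaches the threshold) can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: the function name `select_components` is hypothetical, and the variances are made-up numbers, not the values behind the slides.

```python
from itertools import accumulate


def select_components(variances, threshold=0.80):
    """Return how many leading components reach `threshold` cumulative
    share of total variance. `variances` must be sorted in descending order."""
    total = sum(variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    cumulative = accumulate(v / total for v in variances)
    for count, share in enumerate(cumulative, start=1):
        if share >= threshold:
            return count
    # Rounding can leave the final cumulative share just below the threshold.
    return len(variances)


# Hypothetical component variances (NOT the values from the slides).
example_variances = [2.0, 1.0, 0.6, 0.4]
n_selected = select_components(example_variances)
```

With these made-up variances the cumulative shares are roughly 50%, 75%, 90%, and 100%, so three of the four components are kept at the 80% default.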
  • 7. Principal component analysis
      - Scatter plot of PC1 and PC2.
      - In this view, the top two principal components are plotted for each object in two-dimensional space.
      - As can be seen, a small subset of records appears significantly more distant from, and different than, the vast majority of objects.
  • 8. Clustering exploration/simulation process - examples
      - Ward method: Ward suggested a general agglomerative hierarchical clustering procedure, where the criterion for choosing the pair of clusters to merge at each step is based on the optimal value of an objective function.
      - Complete link method: This method is also known as farthest neighbor clustering. The result of the clustering can be visualized as a dendrogram, which shows the sequence of cluster fusion and the distance at which each fusion took place.
      - PAM (partitioning around medoids): The k-medoids algorithm is a clustering method related to the k-means algorithm and the medoid shift algorithm. It is considered more robust to outliers than k-means because each cluster is represented by a medoid (an actual data object) rather than by a mean.
      - K-means: k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster.
  • 9. Clustering exploration results
      - The result shown below is based upon a simulation exercise in which all four algorithms are automatically compared on the data set (i.e., a random sample of 5,000 records from the credit card customer data). In this particular case, the best model is found to be a two-cluster solution using the complete link hierarchical method. This is the final model and is used for subsequent clustering and outlier detection.
      - The silhouette value can theoretically range from -1 to +1, with higher values indicative of better cluster quality in terms of both cohesion and separation.
      - Best clustering result:

        Best method                   Number of clusters   Silhouette value
        complete link hierarchical    2                    0.753754205720575
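To make the silhouette-based comparison concrete, here is a minimal sketch that computes the mean silhouette coefficient for several candidate labelings and keeps the best one. It assumes Euclidean distance; the function names, the toy points, and the two labelings (standing in for the outputs of different algorithms) are hypothetical and are not taken from the slides.

```python
from math import dist


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself adds nothing to the sum).
        a = sum(dist(point, other) for other in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(point, other) for other in members) / len(members)
            for other_label, members in clusters.items()
            if other_label != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def best_model(points, candidates):
    """Return (name, score) of the labeling with the highest silhouette."""
    scored = {name: silhouette_score(points, labels)
              for name, labels in candidates.items()}
    winner = max(scored, key=scored.get)
    return winner, scored[winner]


# Hypothetical data and labelings, for illustration only.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
candidates = {
    "labeling_a": [0, 0, 0, 1, 1, 1],
    "labeling_b": [0, 1, 0, 1, 0, 1],
}
winner, score = best_model(points, candidates)
```

This pairwise computation is quadratic in the number of points, which is acceptable for a sketch but may be slow on larger samples.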
  • 10. Complete-link hierarchical clustering (1/2)
      - The 5,000 instances are on the x-axis. Moving vertically from the x-axis, one can begin to see how the actual clusters are formed.
  • 11. Plot of PCs with cluster assignment labels (1/3)
      - In this view, the top two principal components (i.e., PC1 and PC2) are plotted for each object in two-dimensional space.
      - In the graph, there are two clusters, one dark blue and the other light blue.
      - The small subset of three records appears substantially different from the majority of objects.
  • 12. Plot of PCs with cluster assignment labels (2/3)
      - In this view, PC1 and PC3 are plotted for each object in two-dimensional space.
      - In the graph, the two clusters are again shown.
      - It is once again evident that the small subset of three records differs from the majority of other objects.
  • 13. Plot of PCs with cluster assignment labels (3/3)
      - In this view, PC2 and PC3 are plotted for each object in two-dimensional space.
      - Cluster differences appear less prominent from this perspective.
  • 14. Principal components 3D scatterplot
      - Cluster one represents the majority class (black), while cluster two represents the rare class (red).
      - In this view, one can clearly see the subset of three records (in red) appearing more isolated from the other objects.
  • 15. Cluster 1 outlier plot
      - In this view, an arbitrary cutoff is inserted at the 99.9th percentile (red horizontal line) to allow efficient identification of very irregular records.
      - Objects farther from the x-axis are more questionable.
      - While all objects distant from the x-axis might be worth investigating, points above the cutoff should be viewed as particularly suspicious.
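The percentile cutoff can be sketched as below. This assumes the plotted score is the Mahalanobis distance mentioned in the concluding slide, restricts the computation to two dimensions for brevity (so the covariance matrix can be inverted in closed form), and uses a linearly interpolated percentile. All function names and the toy data are hypothetical.

```python
from math import floor, sqrt


def mahalanobis_2d(points):
    """Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mean_x) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - mean_y) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mean_x, y - mean_y
        # Quadratic form with the closed-form inverse of the 2x2 covariance.
        squared = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(sqrt(max(squared, 0.0)))
    return distances


def percentile(values, pct):
    """Linearly interpolated percentile, with pct in [0, 100]."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * pct / 100
    lower = floor(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


def flag_outliers(points, pct=99.9):
    """Indices of points whose distance lies above the percentile cutoff."""
    distances = mahalanobis_2d(points)
    cutoff = percentile(distances, pct)
    return [i for i, d in enumerate(distances) if d > cutoff]


# Hypothetical data: a 7x7 grid plus one distant record at index 49.
grid = [(x, y) for x in range(-3, 4) for y in range(-3, 4)]
flagged = flag_outliers(grid + [(20, -20)])
```

Because the mean and covariance are estimated from the same records being scored, extreme records inflate the covariance and shrink their own distances; a robust estimator would be one way to reduce that effect.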
  • 16. Conclusion of process
      - At the conclusion of outlier detection, an output file for each cluster can be exported to facilitate further analyses and investigations. It contains the unique record identifier, original variables, normalized variables, principal components, normalized principal components, cluster assignments, and Mahalanobis distance information.
      - Distinguishing features of cluster 2 records: 1) new accounts (age = 1 month), 2) very high incidence of late payments, and 3) relatively high credit limits, particularly given the account age and late payment issues.
      - Cluster 2, final output file for a subset of fields:

        Record   AccountAge   CreditLimit   AdditionalAssets   LatePayments   model.cluster   md
        32430    1            2500          1                  3              2               5.83E-05
        65470    1            8500          1                  4              2               0.002371778
        78772    1            2200          0                  3              2               0.000442305