PCA Understanding Document
Theory :
PCA will be applied to the following data points.
X Y
2.5 2.4
0.5 0.7
2.2 2.9
1.9 2.2
3.1 3.0
2.3 2.7
2 1.6
1 1.1
1.5 1.6
1.1 0.9
Subtract the mean of each axis from every value on that axis (mean of X = 1.81, mean of Y = 1.91).
The mean-adjusted dataset is:
X Y
.69 .49
-1.31 -1.21
.39 .99
.09 .29
1.29 1.09
.49 .79
.19 -.31
-.81 -.81
-.31 -.31
-.71 -1.01
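The mean adjustment above can be sketched in plain Java as follows. This is a minimal illustration, not code from any of the libraries mentioned in this document; the class and method names (MeanCenterSketch, columnMeans, center) are hypothetical, and the data is assumed to be stored with one sample per row.

```java
// Hypothetical helper, assuming rows are samples and columns are dimensions.
final class MeanCenterSketch {

    /** Returns the mean of each column of the data. */
    static double[] columnMeans(double[][] data) {
        int cols = data[0].length;
        double[] means = new double[cols];
        for (double[] row : data) {
            for (int j = 0; j < cols; j++) {
                means[j] += row[j];
            }
        }
        for (int j = 0; j < cols; j++) {
            means[j] /= data.length;
        }
        return means;
    }

    /** Returns a copy of the data with the column mean subtracted from every value. */
    static double[][] center(double[][] data) {
        double[] means = columnMeans(data);
        double[][] centered = new double[data.length][means.length];
        for (int i = 0; i < data.length; i++) {
            for (int j = 0; j < means.length; j++) {
                centered[i][j] = data[i][j] - means[j];
            }
        }
        return centered;
    }
}
```

Calling MeanCenterSketch.center on the ten (X, Y) points of the example should reproduce the mean-adjusted table, up to floating-point rounding.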
Calculate the covariance matrix of the mean-adjusted data:
cov X Y
X 0.616555556 0.615444444
Y 0.615444444 0.716555556
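As a rough sketch, the covariance matrix can be computed from the mean-adjusted data as shown below. The class and method names are hypothetical, and the code assumes the sample covariance (division by the number of samples minus one), which is the convention that reproduces the values in this example.

```java
// Hypothetical helper: sample covariance of data that is already mean-adjusted.
final class CovarianceSketch {

    /**
     * centered: rows are samples, columns are dimensions, column means are zero.
     * Returns the d x d covariance matrix using the (n - 1) divisor.
     */
    static double[][] covariance(double[][] centered) {
        int n = centered.length;
        int d = centered[0].length;
        double[][] cov = new double[d][d];
        for (double[] row : centered) {
            for (int a = 0; a < d; a++) {
                for (int b = 0; b < d; b++) {
                    cov[a][b] += row[a] * row[b];
                }
            }
        }
        for (int a = 0; a < d; a++) {
            for (int b = 0; b < d; b++) {
                cov[a][b] /= (n - 1);
            }
        }
        return cov;
    }
}
```

The result is symmetric: the X-Y entry equals the Y-X entry, as in the table above.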
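Calculate the eigenvalues and the eigenvectors of the covariance matrix.
eigenvalues
0.0490833989
1.28402771
eigenvector 1 eigenvector 2
-0.735178656 -0.677873399
0.677873399 -0.735178656
Each column is a unit-length eigenvector, listed in the same order as the eigenvalues: eigenvector 1
belongs to 0.0490833989 and eigenvector 2 belongs to 1.28402771.
The eigenvector with the highest eigenvalue is the principal component of the data set; here that is
eigenvector 2.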
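For a 2 x 2 symmetric matrix such as the one in this example, the eigenvalues and eigenvectors have a closed form, sketched below. This is only an illustration for the two-dimensional case (larger problems need a numerical eigen-solver such as the one inside a matrix library), and the names are hypothetical. The sketch assumes the off-diagonal entry is non-zero. The sign of an eigenvector is arbitrary, so the result may be the negative of the vectors listed above.

```java
// Hypothetical helper: eigen-decomposition of the symmetric matrix [[a, b], [b, c]], b != 0.
final class Eigen2x2Sketch {

    /** Returns the two eigenvalues, largest first. */
    static double[] eigenvalues(double a, double b, double c) {
        double mid = (a + c) / 2.0;
        double radius = Math.sqrt(((a - c) / 2.0) * ((a - c) / 2.0) + b * b);
        return new double[] {mid + radius, mid - radius};
    }

    /** Returns a unit-length eigenvector for the given eigenvalue. */
    static double[] unitEigenvector(double a, double b, double lambda) {
        if (b == 0.0) {
            throw new IllegalArgumentException("sketch assumes a non-zero off-diagonal entry");
        }
        // (A - lambda I) v = 0 has the solution v = (b, lambda - a) when b != 0.
        double x = b;
        double y = lambda - a;
        double norm = Math.hypot(x, y);
        return new double[] {x / norm, y / norm};
    }
}
```

With a = 0.616555556, b = 0.615444444 and c = 0.716555556, the larger eigenvalue comes out close to 1.28402771 and its eigenvector has components of magnitude close to 0.677873399 and 0.735178656.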
Once the eigenvectors are found from the covariance matrix, the next step is to order them by
eigenvalue, highest to lowest. This gives you the components in order of significance. Now, if you
like, you can decide to ignore the components of lesser significance. You do lose some information,
but if the eigenvalues are small, you don't lose much. If you leave out some components, the final
data set will have fewer dimensions than the original. To be precise, if you originally have n
dimensions in your data, and so you calculate n eigenvectors and eigenvalues, and then you choose
only the first p eigenvectors, then the final data set has only p dimensions.
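One common way to decide how many components to keep is to look at the share of the total variance that the largest eigenvalues account for. The sketch below illustrates that idea; the threshold parameter and the names are assumptions made for illustration and do not come from the original text.

```java
// Hypothetical helper: pick p so that the p largest eigenvalues cover a target share of variance.
final class ComponentSelectionSketch {

    /**
     * eigenvalues: in any order. threshold: target share in (0, 1], e.g. 0.95 (an assumption).
     * Returns the smallest number of components whose eigenvalues sum to at least
     * threshold times the total.
     */
    static int componentsNeeded(double[] eigenvalues, double threshold) {
        double[] sorted = eigenvalues.clone();
        java.util.Arrays.sort(sorted); // ascending, so walk from the end
        double total = 0.0;
        for (double v : sorted) {
            total += v;
        }
        double running = 0.0;
        int count = 0;
        for (int k = sorted.length - 1; k >= 0; k--) {
            running += sorted[k];
            count++;
            if (running / total >= threshold) {
                return count;
            }
        }
        return sorted.length;
    }
}
```

In this example the larger eigenvalue accounts for roughly 96% of the total (1.28402771 out of about 1.33311), so a 95% target keeps one component and a 99% target keeps both.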
FeatureVector = [eig1 eig2 eig3 ...]
Ordered by eigenvalue, highest first, the feature vector for this example is:
eigenvector 1 eigenvector 2
-0.677873399 -0.735178656
-0.735178656 0.677873399
We can choose to leave out the smaller, less significant component and keep only a single column:
eigenvector 1
-0.677873399
-0.735178656
FinalData = RowFeatureVector * RowDataAdjust
where RowFeatureVector is the matrix of eigenvectors transposed, so that the eigenvectors are in the
rows with the most significant eigenvector at the top, and RowDataAdjust is the mean-adjusted data
transposed, i.e. each column holds one data item and each row holds a separate dimension.
Transformed Data (both eigenvectors)
PC1 PC2
-.827970186 -.175115307
1.77758033 .142857227
-.992197494 .384374989
-.274210416 .130417207
-1.67580142 -.209498461
-.912949103 .175282444
.0991094375 -.349824698
1.14457216 .0464172582
.438046137 .0177646297
1.22382056 -.162675287
Transformed Data (Single eigenvector)
PC1
-.827970186
1.77758033
-.992197494
-.274210416
-1.67580142
-.912949103
.0991094375
1.14457216
.438046137
1.22382056
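The projection step can be sketched as below. For convenience the sketch keeps one sample per row instead of transposing the data, which gives the same numbers as FinalData = RowFeatureVector * RowDataAdjust, only laid out with samples in rows. The class and method names are hypothetical.

```java
// Hypothetical helper: project mean-adjusted samples onto the chosen eigenvectors.
final class ProjectionSketch {

    /**
     * centered: rows are mean-adjusted samples.
     * rowFeatureVector: one eigenvector per row, most significant first.
     * Returns one row of transformed values per sample.
     */
    static double[][] project(double[][] centered, double[][] rowFeatureVector) {
        int n = centered.length;
        int p = rowFeatureVector.length;
        double[][] finalData = new double[n][p];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < p; k++) {
                double dot = 0.0;
                for (int j = 0; j < centered[i].length; j++) {
                    dot += rowFeatureVector[k][j] * centered[i][j];
                }
                finalData[i][k] = dot;
            }
        }
        return finalData;
    }
}
```

Passing only the first row of the feature vector gives the single-eigenvector result; passing both rows gives the two-column table.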
Example:
Suppose we have n features.
FinalData = SampleData (1 x n matrix) * EigenVector (n x 1 matrix)
= 1 x 1 matrix, i.e. the first eigenvector (of the n eigenvectors sorted in descending order of
eigenvalue) gives the first feature value after PCA. SampleData must be mean-adjusted first.
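To get the final data:
FinalData = RowFeatureVector * RowDataAdjust
To get the old data back:
RowDataAdjust = RowFeatureVector^(-1) * FinalData
The inverse exists only when all eigenvectors are kept. Because the eigenvectors are unit length and
mutually orthogonal, the inverse then equals the transpose, so in general:
RowDataAdjust = RowFeatureVector^T * FinalData
This is exact when all eigenvectors are kept and an approximation when some were dropped. Add the
original mean back to each dimension to recover the original data.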
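Getting the data back can be sketched as follows. The sketch multiplies by the transpose of the feature vector rather than computing a matrix inverse, which is valid when the eigenvectors are unit length and mutually orthogonal, and it adds the original means back at the end. The names are hypothetical.

```java
// Hypothetical helper: map one transformed sample back to the original space.
final class ReconstructionSketch {

    /**
     * finalRow: transformed values of one sample (one value per kept eigenvector).
     * rowFeatureVector: the kept eigenvectors, one per row, in the same order.
     * means: the original mean of each dimension.
     */
    static double[] reconstruct(double[] finalRow, double[][] rowFeatureVector, double[] means) {
        double[] original = means.clone(); // start from the mean, then add each component
        for (int k = 0; k < finalRow.length; k++) {
            for (int j = 0; j < original.length; j++) {
                original[j] += finalRow[k] * rowFeatureVector[k][j];
            }
        }
        return original;
    }
}
```

With both eigenvectors the first sample comes back as approximately (2.5, 2.4). With only the most significant eigenvector the result is an approximation that lies on the line through the mean along that eigenvector.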
Java Libraries :
1. java-statistical-analysis-tool (JSAT)
https://code.google.com/p/java-statistical-analysis-tool/source/browse/trunk/JSAT/src/jsat/datatransform/PCA.java?spec=svn414&r=414
License : GNU GPL v3
2. efficient-java-matrix-library (EJML)
https://code.google.com/p/efficient-java-matrix-library/wiki/PrincipleComponentAnalysisExample
License : GNU Lesser GPL
3. Michael Thomas Flanagan's Java Scientific Library
http://www.ee.ucl.ac.uk/~mflanaga/java/PCA.html
License : This library is no longer publicly available
Of these, efficient-java-matrix-library (EJML) is the one we can use commercially.
Explaining EJML :
The PCA example code at the link below can be used after adding the EJML jar to the classpath :
https://code.google.com/p/efficient-java-matrix-library/wiki/PrincipleComponentAnalysisExample
We can write a test class for this example class.
Process:
1. First, provide every data sample with:
pca.addSample(sample);
2. Then call : pca.computeBasis(n);
This is the main step; here n is the number of features we want to reduce the data to.
3. Now the computed eigenvectors can be used to transform a sample with the function :
sampleToEigenSpace( double[] sampleData )
Points to Note :
PCA is not very useful for data made up of 0's and 1's. Such data can easily be stored in a sparse
matrix format, which automatically reduces the memory requirement.
The PCA output, in contrast, generally contains no 0's, so it gains nothing from a sparse matrix
format.
So it is better not to use PCA when the data consists of 0's and 1's.
(We did not find any Java library that accepts a sparse matrix as input to PCA.)
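The loss of sparsity can already be seen in the mean-adjustment step, before any eigenvectors are involved. The sketch below is a small illustration with hypothetical names: it counts the non-zero entries of a 0/1 column before and after the mean is subtracted.

```java
// Hypothetical helper: shows that mean adjustment turns a sparse 0/1 column into a dense one.
final class SparsitySketch {

    /** Counts the entries that are not exactly zero. */
    static int countNonZero(double[] values) {
        int count = 0;
        for (double v : values) {
            if (v != 0.0) {
                count++;
            }
        }
        return count;
    }

    /** Returns a copy of the column with its mean subtracted from every entry. */
    static double[] center(double[] values) {
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.length;
        double[] centered = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            centered[i] = values[i] - mean;
        }
        return centered;
    }
}
```

For the column {0, 0, 0, 1} the mean is 0.25, so the adjusted column is {-0.25, -0.25, -0.25, 0.75}: one non-zero entry becomes four, and a sparse representation no longer saves any memory.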
Links :
http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf
