DIMENSION REDUCTION
Kazi Toufiq Wadud
kazitoufiq@gmail.com
Twitter: @KaziToufiqWadud
WHAT IS DIMENSION REDUCTION?
The process of converting a data set with a vast number of dimensions into a data set
with fewer dimensions, while ensuring that it conveys similar
information concisely.
DIMENSION REDUCTION: WHY?
Curse of Dimensionality
what is the curse?
UNDERSTANDING THE CURSE
Consider a 3-class pattern recognition problem.
1) Start with 1 dimension/feature
2) Divide the feature space into uniform bins
3) Compute the ratio of examples for each class at each bin
4) For a new example, find its bin and choose the predominant class in that bin
In this case, we decide to start with one feature and divide the real line into 3 bins
- but overlap exists between the classes, so let’s add a 2nd feature to improve discrimination.
UNDERSTANDING THE CURSE
 2 dimensions: moving to two dimensions increases the number of bins from 3 to 3² = 9
QUESTION: What should we keep constant?
The total number of examples? This results in a 2D scatter plot with reduced overlap but higher sparsity.
To address sparsity, what about keeping the density of examples per bin constant (say 3)? This increases the
number of examples from 9 to 27 (9 bins × 3 = 27, at least).
UNDERSTANDING THE CURSE
Moving to 3 features:
 The number of bins grows to 3³ = 27
 For the same number of examples, the 3D scatter plot is
almost empty
Constant density:
 To keep the initial density of 3 examples per bin, the required number of examples is 27 × 3 = 81
IMPLICATIONS OF CURSE OF
DIMENSIONALITY
Exponential growth with dimensionality in the number of examples
required to accurately estimate a function
In practice, the curse of dimensionality means that for a given sample size, there is a
maximum number of features above which the performance of a classifier will degrade
rather than improve.
In most cases, the information that is lost by discarding some features is compensated for by a
more accurate mapping in the lower-dimensional space.
MULTICOLLINEARITY
Multicollinearity is a state of very high
intercorrelations or inter-associations among the
independent variables. It is therefore a type of
disturbance in the data; if present, the
statistical inferences made about the data may not
be reliable.
UNDERSTANDING MULTI-COLLINEARITY
Let’s look at the image shown below.
It shows 2 dimensions, x1 and x2, which are,
let us say, measurements of the same objects -
in centimetres (x1) and inches (x2).
Now, if you were to use both these dimensions in machine learning,
they would convey similar information and introduce a lot of noise into the
system.
We are better off just using one dimension. Here we have converted
the data from 2D (x1 and x2) to 1D (z1), which has made the data
relatively easier to explain.
BENEFITS OF APPLYING DIMENSION
REDUCTION
 Data Compression; Reduction of storage space
 Less computing; Faster processing
 Removal of multi-collinearity (redundant features) to reduce noise for better
model fit
 Better visualization and interpretation
APPROACHES FOR DIMENSION
REDUCTION
 Feature selection:
choosing a subset of all the features
 Feature extraction:
creating new features by combining existing ones
In either case, the goal is to find a low-dimensional representation of the data that preserves (most of)
the information or structure in the data
COMMON METHODS FOR DIMENSION
REDUCTION
 MISSING VALUE
While exploring data, if we encounter missing values, what do we do? Our first step should be to
identify the reason, then impute the missing values or drop the variables using appropriate methods.
But what if we have too many missing values? Should we impute them or drop the variables?
Maybe dropping is a good option, because a variable with that many missing values carries little
information about the data set, and it would not help in improving the power of the model.
Next question: is there any threshold of missing values for dropping a variable?
It varies from case to case. If the information contained in the variable is not that much, you can
drop the variable if it has more than ~40-50% missing values.
COMMON METHODS FOR DIMENSION
REDUCTION
 MISSING VALUE
R code:
summary(data)
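A minimal sketch of how this check might be done in R, assuming the same data frame data used in the summary() call above; the 50% cut-off is only an illustrative threshold:
missing_rate <- colMeans(is.na(data))        # fraction of NAs in each column
sort(missing_rate, decreasing = TRUE)        # inspect the worst offenders
data_reduced <- data[, missing_rate <= 0.5]  # drop variables with more than 50% missing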
COMMON METHODS FOR DIMENSION
REDUCTION
 LOW VARIANCE
Let’s think of a scenario where we have a constant variable (all observations have the
same value, 5) in our data set. Do you think it can improve the power of the model?
Of course NOT, because it has zero variance. In the case of a high number of dimensions,
we should drop variables having low variance compared to others, because these
variables will not explain the variation in the target variable.
nearZeroVar
EXAMPLE AND CODE
 To identify these types of predictors, the following two metrics can be calculated:
- the frequency of the most prevalent value over the second most frequent value
(called the "frequency ratio"), which would be near one for well-behaved predictors
and very large for highly unbalanced data;
- the "percent of unique values", i.e. the number of unique values divided by the total
number of samples (times 100), which approaches zero as the granularity of the data
increases.
 If the frequency ratio is greater than a pre-specified threshold and the unique value
percentage is less than a threshold, we might consider a predictor to be near zero-
variance.
NEAR ZERO VARIANCE
library(caret)                                    # provides nearZeroVar() and the mdrr data
data(mdrr)                                        # loads mdrrDescr (predictors) and mdrrClass
data.frame(table(mdrrDescr$nR11))                 # an example of a highly unbalanced predictor
nzv <- nearZeroVar(mdrrDescr, saveMetrics = TRUE)
nzv[nzv$nzv,][1:10,]                              # inspect some of the flagged predictors
dim(mdrrDescr)
nzv <- nearZeroVar(mdrrDescr)                     # indices of near-zero-variance columns
filteredDescr <- mdrrDescr[, -nzv]                # drop them
dim(filteredDescr)
COMMON METHODS FOR DIMENSION
REDUCTION
 DECISION TREE
It can be used as an all-in-one solution to tackle multiple challenges such as missing
values, outliers and identifying significant variables.
- How? We need to understand how a Decision Tree works and the
concepts of Entropy and Information Gain.
DECISION TREE – DATA SET
DECISION TREE: WHAT DOES IT LOOK LIKE?
DECISION TREE AND ENTROPY
DECISION TREE: ENTROPY AND
INFORMATION GAIN
To build a decision tree, we need to calculate 2 types of entropy using frequency tables:
a) Entropy using the frequency table of one attribute
DECISION TREE: ENTROPY AND
INFORMATION GAIN
b) Entropy using the frequency table of two attributes:
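A small R sketch of both calculations, assuming the weather data has already been read into a data frame w with attribute columns (Outlook, Temperature, Humidity, Windy) and target Play, as in the R code later in this deck:
entropy <- function(x) {
  p <- table(x) / length(x)                         # class proportions
  -sum(p[p > 0] * log2(p[p > 0]))                   # E = -sum(p * log2(p))
}
entropy(w$Play)                                     # a) entropy from the target's own frequency table
cond <- sapply(split(w$Play, w$Outlook), entropy)   # entropy of Play within each Outlook value
wts  <- table(w$Outlook) / nrow(w)                  # share of rows in each Outlook value
sum(wts * cond)                                     # b) entropy of Play given Outlook (two-attribute table)
entropy(w$Play) - sum(wts * cond)                   # information gain of splitting on Outlook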
ENTROPY ….. CONTD.
ENTROPY CALCULATION
DECISION TREE: ENTROPY AND
INFORMATION GAIN
Similarly:
KEY POINTS:
 A branch with entropy 0 is a leaf node.
 A branch with entropy greater than 0 needs further splitting.
The ID3 algorithm is run recursively on the non-leaf branches until all data is classified.
DECISION TREE: PROS AND CONS
R CODE
## build a decision tree on the weather data and plot it
## (assumed: the tree behind `fit` is built with rpart on the weather data used later in this deck)
library(rpart)
library(rattle)                      # provides fancyRpartPlot()
w <- read.csv("weather.csv", header = TRUE)
fit <- rpart(Play ~ ., data = w)
fancyRpartPlot(fit)
## to calculate information gain
library(FSelector)
weights <- information.gain(Play ~ ., data = w)
print(weights)
subset <- cutoff.k(weights, 3)       # keep the 3 attributes with the highest gain
f <- as.simple.formula(subset, "Play")
print(f)
COMMON METHODS FOR DIMENSION
REDUCTION
 Random Forest
Similar to the decision tree is the Random Forest. I would also recommend using the built-in
feature importance provided by random forests to select a smaller subset of input
features.
Just be careful: random forests have a tendency to be biased towards variables that have
a larger number of distinct values, i.e. they favour numeric variables over binary/categorical ones.
HOW DOES RANDOM FOREST WORK?
WITH AND WITHOUT REPLACEMENT
HOW A RANDOM FOREST IS BUILT
R CODE
library(randomForest)
set.seed(12345)
w <- read.csv("weather.csv", header = TRUE)
str(w)
w <- w[names(w)[1:5]]                # keep only the first five columns
w$Play <- as.factor(w$Play)          # (assumed) ensure the target is a factor for classification
fit1 <- randomForest(Play ~ Outlook + Temperature + Humidity + Windy, data = w,
                     importance = TRUE, ntree = 20)
varImpPlot(fit1)                     # plot the variable importance
COMMON METHODS FOR DIMENSION
REDUCTION
 High Correlation
Dimensions exhibiting high correlation can lower the performance of the model.
Moreover, it is not good to have multiple variables carrying similar information or variation;
this is also known as “Multicollinearity”.
You can use a Pearson (continuous variables) or polychoric (discrete variables) correlation
matrix to identify the variables with high correlation, and select one of them using the
VIF (Variance Inflation Factor). Variables having a higher value (VIF > 5) can be dropped.
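A minimal sketch of the correlation-filtering step in R, using a built-in data set and the caret package purely for illustration (the 0.75 cut-off is an arbitrary choice):
library(caret)
X <- mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec")]   # illustrative numeric predictors
corr_mat <- cor(X)                                     # Pearson correlation matrix
high_corr <- findCorrelation(corr_mat, cutoff = 0.75)  # columns suggested for removal
X_reduced <- X[, -high_corr]                           # keep one variable from each correlated group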
IMPACT OF MULTICOLLINEARITY
 R demonstration with example
 13 independent variables against the response variable of crime rate
- a discrepancy is noticeable on studying the regression equation with regard to
expenditure on police services between the years 1959 and 1960. Why should police
expenditure in one year be associated with an increase in the crime rate and a decrease in the
previous year? It does not make sense.
IMPACT OF MULTICOLLINEARITY
Second: even though the F statistic is highly significant, which provides evidence for the presence
of a linear relationship between all 13 variables and the response variable, the β coefficients of the
expenditures for both 1959 and 1960 have non-significant t ratios. A non-significant t means there is no
slope! In other words, police expenditure appears to have no effect whatsoever on the crime rate!
HIGH CORRELATION – VIF
The most widely used diagnostic for multicollinearity is the variance inflation factor (VIF).
- The VIF may be calculated for each predictor by doing a linear regression of that predictor on all
the other predictors, and then obtaining the R² from that regression. The VIF is just 1/(1 − R²).
- It’s called the variance inflation factor because it estimates how much the variance of a
coefficient is “inflated” because of linear dependence with other predictors. Thus, a VIF of 1.8 tells
us that the variance (the square of the standard error) of a particular coefficient is 80% larger than
it would be if that predictor was completely uncorrelated with all the other predictors.
- The VIF has a lower bound of 1 but no upper bound. Authorities differ on how high the VIF has to
be to constitute a problem. Personally, I tend to get concerned when a VIF is greater than 2.50,
which corresponds to an R² of .60 with the other variables.
VIF – R CODE
 vif(fit)
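Here vif() comes from the car package and fit is assumed to be the regression model fitted earlier. A self-contained sketch on a built-in data set, with a manual check of the 1/(1 − R²) definition:
library(car)
fit <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)               # illustrative model
vif(fit)
r2 <- summary(lm(disp ~ hp + wt + qsec, data = mtcars))$r.squared   # regress one predictor on the rest
1 / (1 - r2)                                                        # should match vif(fit)["disp"]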
KEY POINTS:
 The variables with high VIFs are control variables, and the variables of interest
do not have high VIFs.
Let’s say the sample consists of U.S. colleges. The dependent variable is the graduation
rate, and the variable of interest is an indicator (dummy) for public vs. private. Two
control variables are average SAT scores and average ACT scores for entering
freshmen. These two variables have a correlation above .9, which corresponds to
VIFs of at least 5.26 for each of them. But the VIF for the public/private indicator is
only 1.04. So there’s no problem to be concerned about, and no need to delete one
or the other of the two controls.
http://statisticalhorizons.com/multicollinearity
COMMON METHODS FOR DIMENSION
REDUCTION
 Backward Feature Elimination/ Forward Feature Selection
 In this method, we start with all n dimensions. We compute the sum of squared errors
(SSR) after eliminating each variable in turn (n times). Then we identify the variable whose
removal has produced the smallest increase in the SSR and remove it, finally leaving
us with n − 1 input features (see the sketch below).
 Repeat this process until no other variables can be dropped.
Reverse to this, we can use the “Forward Feature Selection” method. In this method, we
select one variable and analyse the performance of the model when adding another variable.
Here, the selection of a variable is based on the larger improvement in model performance.
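A minimal sketch of the backward pass described above, for a regression setting; the data frame, the response name and the keep threshold are all illustrative assumptions:
backward_eliminate <- function(df, response, keep = 3) {
  predictors <- setdiff(names(df), response)
  while (length(predictors) > keep) {
    rss <- sapply(predictors, function(p) {                   # SSR after dropping each predictor
      f <- reformulate(setdiff(predictors, p), response = response)
      sum(resid(lm(f, data = df))^2)
    })
    predictors <- setdiff(predictors, names(which.min(rss)))  # drop the least harmful variable
  }
  predictors
}
backward_eliminate(mtcars, response = "mpg", keep = 3)         # e.g. keep the best 3 predictors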
COMMON METHODS FOR DIMENSION
REDUCTION
 Factor Analysis
Let’s say some variables are highly correlated. These variables can be grouped by their
correlations, i.e. all variables in a particular group can be highly correlated among themselves but
have low correlation with variables of other group(s). Here each group represents a single
underlying construct or factor. These factors are small in number compared to the large number of
dimensions. However, these factors are difficult to observe.
There are basically two methods of performing factor analysis:
 EFA (Exploratory Factor Analysis)
 CFA (Confirmatory Factor Analysis)
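A minimal EFA sketch with base R’s factanal(), on a built-in data set purely for illustration; the choice of variables, the two factors and the varimax rotation are all assumptions:
fa <- factanal(mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec", "carb")],
               factors = 2, rotation = "varimax")
print(fa$loadings, cutoff = 0.4)     # variables loading on the same factor form one group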
COMMON METHODS FOR DIMENSION
REDUCTION
 Principal Component Analysis (PCA)
In this technique, variables are transformed into a new set of variables, which are linear combinations of the original
variables. This new set of variables is known as the principal components. They are obtained in such a way that the
first principal component accounts for most of the possible variation of the original data, after which each succeeding
component has the highest possible remaining variance.
The second principal component must be orthogonal to the first principal component. In other words, it does its
best to capture the variance in the data that is not captured by the first principal component. For a two-dimensional
dataset, there can be only two principal components.
The principal components are sensitive to the scale of measurement; to fix this issue we should always
standardize variables before applying PCA. After applying PCA, the original variables of your data set lose their meaning.
If interpretability of the results is important for your analysis, PCA is not the right technique for your project.
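A minimal sketch in R with prcomp(), on a built-in data set purely for illustration; scale. = TRUE performs the standardization mentioned above:
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
summary(pca)        # proportion of variance explained by each component
pca$rotation        # loadings: how each original variable contributes to each component
head(pca$x)         # the transformed data (principal component scores)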
UNDERSTANDING PCA:
PREREQUISITE
Statisticians are usually concerned with taking a sample of a population.
Mean: the average of the sample values.
Standard Deviation: “the average distance from the mean of the data set to a point”.
Variance: the square of the standard deviation; another measure of the spread of the data.
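For reference, the usual sample formulas behind these terms (the slide’s formula images are not reproduced here), written in LaTeX notation:
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad
s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}, \qquad
s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}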
UNDERSTANDING PCA:
PREREQUISITE
Covariance: covariance is always measured between 2 dimensions.
Covariance Matrix: the matrix that holds the covariances between every pair of dimensions.
EXAMPLE: HOW TO CALCULATE
COVARIANCE
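The slide’s worked example is an image; a small R sketch with illustrative numbers, first applying the definition cov(X, Y) = Σ(Xᵢ − X̄)(Yᵢ − Ȳ)/(n − 1) by hand and then checking it against the built-in cov():
x <- c(2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1)   # illustrative measurements
y <- c(2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9)
n <- length(x)
sum((x - mean(x)) * (y - mean(y))) / (n - 1)               # covariance by the definition
cov(x, y)                                                  # should agree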
EIGENVECTORS & EIGENVALUES OF A
MATRIX
Eigenvector
Eigenvalue matrix
PROPERTIES:
- Eigenvectors and eigenvalues always come in pairs.
- Eigenvectors can only be found for square matrices, but not all square matrices have eigenvectors.
- For an n × n matrix that does have eigenvectors, there are n of them.
- Another property of eigenvectors is that even if I scale the vector by some amount
before I multiply it, I still get the same multiple of it as a result. This
is because if you scale a vector by some amount, all you are doing is making it longer, not changing its
direction.
PROPERTIES:
All the eigenvectors of a symmetric matrix (such as a covariance matrix) are perpendicular,
i.e. at right angles to each other, no matter how many dimensions you have. By the way,
another word for perpendicular, in maths talk, is orthogonal.
This is important:
it means we can express the data in terms of these perpendicular eigenvectors, instead of expressing
it in terms of the x and y axes.
PROPERTIES:
Another important thing to know is that when mathematicians find eigenvectors,
they like to find the eigenvectors whose length is exactly one. This is because, as
you know, the length of a vector doesn’t affect whether it’s an eigenvector or not,
whereas the direction does. So, in order to keep eigenvectors standard, whenever
we find an eigenvector we usually scale it to have a length of 1, so that all
eigenvectors have the same length. Here’s a demonstration from our example
above.
HOW DOES ONE GO ABOUT FINDING THESE
MYSTICAL EIGENVECTORS?
- Unfortunately, it’s only easy(ish) if you have a rather small matrix.
- The usual way to find the eigenvectors is by some complicated iterative method,
which is beyond the scope of this tutorial.
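In practice we can let software do it. A small sketch with base R’s eigen() on an illustrative symmetric 2 × 2 matrix, also checking the properties listed earlier:
A <- matrix(c(0.6166, 0.6154,
              0.6154, 0.7166), nrow = 2, byrow = TRUE)   # illustrative covariance-like matrix
e <- eigen(A)
e$values                               # eigenvalues, largest first
e$vectors                              # unit-length eigenvectors in the columns
A %*% e$vectors[, 1]                   # equals e$values[1] * e$vectors[, 1]
sum(e$vectors[, 1] * e$vectors[, 2])   # ~0: the eigenvectors are perpendicular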
REVISE: METHOD
 1. Get some data
 2. Subtract the mean
 3. Calculate the covariance matrix
METHOD ….CONTD
 4. Calculate the eigenvectors and eigenvalues of the covariance matrix
PLOT
METHOD:
 Step 5: Deriving the new data set
RowFeatureVector is the matrix with the eigenvectors in the columns, transposed
so that the eigenvectors are now in the rows, with the most significant eigenvector
at the top.
RowDataAdjust is the mean-adjusted data, transposed, i.e. the data
items are in the columns, with each row holding a separate dimension.
The new data set is then FinalData = RowFeatureVector × RowDataAdjust.
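A hedged end-to-end sketch of the method above in R, on a built-in numeric data set (rows = observations, columns = dimensions); keeping only the first k rows of RowFeatureVector would give the reduced representation:
X <- as.matrix(USArrests)                                    # illustrative data
RowDataAdjust <- t(scale(X, center = TRUE, scale = FALSE))   # step 2: subtract the mean, then transpose
C <- cov(t(RowDataAdjust))                                   # step 3: covariance matrix
e <- eigen(C)                                                # step 4: eigenvectors and eigenvalues
RowFeatureVector <- t(e$vectors)                             # most significant eigenvector in the first row
FinalData <- RowFeatureVector %*% RowDataAdjust              # step 5: the new data set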
PCA
Thanks
kazitoufiq@gmail.com
Twitter @KaziToufiqWadud