Visualization using tSNE
Yan Xu
Jun 7, 2013
Dimension Reduction Overview
Dimension reduction
• Parametric (LDA)
• Nonparametric
  - Linear (PCA)
  - Nonlinear
    - Global (ISOMAP, MDS)
    - Local (LLE, SNE)
tSNE (t-distributed Stochastic Neighbor Embedding)
• MDS → SNE (2002): local + probability
• SNE → sym SNE (2007): easier implementation
• SNE → UNI-SNE (2007): crowding problem
• tSNE (2008): more stable and faster solution
• Barnes-Hut-SNE (2013): O(N^2) → O(N log N)
MDS: Multi-Dimensional Scaling
• Multi-Dimensional Scaling arranges the low-dimensional points so as to
minimize the discrepancy between the pairwise distances in the original
space and the pairwise distances in the low-D space.

$$\text{Cost} = \sum_{i<j} \left( d_{ij} - \hat{d}_{ij} \right)^2$$

$$d_{ij} = \| x_i - x_j \|^2, \qquad \hat{d}_{ij} = \| y_i - y_j \|^2$$
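
As a minimal sketch of the MDS cost above, the following evaluates the sum of squared discrepancies for two candidate low-dimensional layouts. The toy points and the names `sq_dist`, `mds_cost`, `Y_good`, and `Y_bad` are illustrative assumptions, not part of the original slides.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def mds_cost(X, Y):
    """Sum over pairs i < j of (d_ij - d_hat_ij)^2, using squared distances
    for d_ij and d_hat_ij as in the formula above."""
    n = len(X)
    return sum(
        (sq_dist(X[i], X[j]) - sq_dist(Y[i], Y[j])) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )

# Hypothetical toy data: three collinear 3-D points and two 1-D layouts.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
Y_good = [(0.0,), (1.0,), (3.0,)]  # preserves every pairwise distance
Y_bad = [(0.0,), (2.0,), (3.0,)]   # distorts two of the three distances

print(mds_cost(X, Y_good), mds_cost(X, Y_bad))
```

A layout that preserves all pairwise distances has zero cost; any distortion is penalized quadratically, regardless of whether the pair is near or far.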
Sammon mapping from MDS
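$$\text{Cost} = \sum_{ij} \left( \frac{\| x_i - x_j \| - \| y_i - y_j \|}{\| x_i - x_j \|} \right)^2$$

Here \| x_i - x_j \| is the high-D distance and \| y_i - y_j \| is the low-D distance.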

It puts too much emphasis on getting very small distances exactly
right. It is slow to optimize and gets stuck in a different local
optimum on each run.
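
The emphasis on small distances can be seen numerically. The sketch below evaluates the Sammon cost for a single pair of points, once for a close pair and once for a distant pair, with the same absolute error in the low-D distance. The function names and toy values are assumptions for illustration, and the sum is taken over pairs i < j.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

def sammon_cost(X, Y):
    """Sum over pairs i < j of the squared relative distance error.
    Assumes no two high-D points coincide (d_high > 0)."""
    n = len(X)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            d_high = dist(X[i], X[j])
            d_low = dist(Y[i], Y[j])
            total += ((d_high - d_low) / d_high) ** 2
    return total

# Same absolute error (0.1) on a close pair and on a distant pair.
near = sammon_cost([(0.0,), (0.2,)], [(0.0,), (0.3,)])
far = sammon_cost([(0.0,), (10.0,)], [(0.0,), (10.1,)])
print(near, far)
```

Dividing by the high-D distance makes the close pair contribute far more to the cost than the distant pair, which is the behaviour the slide criticizes.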

Global to Local?
Maps that preserve local geometry
LLE (Locally Linear Embedding)
The idea is to make the local configurations of points in the low-dimensional
space resemble the local configurations in the high-dimensional space.

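$$\text{Cost} = \sum_i \Big\| x_i - \sum_{j \in N(i)} w_{ij} x_j \Big\|^2, \qquad \sum_{j \in N(i)} w_{ij} = 1$$

With the weights fixed:

$$\text{Cost} = \sum_i \Big\| y_i - \sum_{j \in N(i)} w_{ij} y_j \Big\|^2$$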

Find the y that minimize the cost subject to the constraint that the y have
unit variance on each dimension.
A probabilistic version of local MDS:
Stochastic Neighbor Embedding (SNE)
• It is more important to get local distances right than non-local ones.
• Stochastic neighbor embedding has a probabilistic way of deciding if
a pairwise distance is “local”.
• Convert each high-dimensional similarity into the probability that one
data point will pick the other data point as its neighbor.

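Probability of picking j given i in high D:

$$p_{j|i} = \frac{\exp\left(-\| x_i - x_j \|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\| x_i - x_k \|^2 / 2\sigma_i^2\right)}$$

Probability of picking j given i in low D:

$$q_{j|i} = \frac{\exp\left(-\| y_i - y_j \|^2\right)}{\sum_{k \neq i} \exp\left(-\| y_i - y_k \|^2\right)}$$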
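
The two neighbor distributions can be sketched in a few lines. This is an illustration under stated assumptions: the toy points, the fixed `sigma`, and the function names are hypothetical, and the sum in each denominator excludes the point itself.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def conditional_p(X, i, sigma):
    """p_{j|i} for every j: a Gaussian centred on X[i], with p_{i|i} = 0."""
    weights = [
        0.0 if j == i else math.exp(-sq_dist(X[i], X[j]) / (2.0 * sigma ** 2))
        for j in range(len(X))
    ]
    total = sum(weights)
    return [w / total for w in weights]

def conditional_q(Y, i):
    """q_{j|i} for every j in the low-D map (no sigma term, as above)."""
    weights = [
        0.0 if j == i else math.exp(-sq_dist(Y[i], Y[j]))
        for j in range(len(Y))
    ]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy data: point 1 is close to point 0, point 2 is far away.
X = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
Y = [(0.0,), (0.5,), (2.0,)]
p = conditional_p(X, 0, sigma=1.0)
q = conditional_q(Y, 0)
print(p, q)
```

Point 0 picks its near neighbor with probability close to 1 and the distant point with probability close to 0, which is how SNE decides in a soft way whether a distance is "local".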
Picking the radius of the Gaussian that is
used to compute the p’s
• We need to use different radii in different parts of the space so that
we keep the effective number of neighbors about constant.
• A big radius leads to a high entropy for the distribution over
neighbors of i. A small radius leads to a low entropy.
• So decide what entropy you want and then find the radius that
produces that entropy.
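• It's easier to specify the perplexity of the neighbor distribution:

$$p_{j|i} = \frac{\exp\left(-\| x_i - x_j \|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\| x_i - x_k \|^2 / 2\sigma_i^2\right)}$$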
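
The radius search described above can be sketched as a bisection on the Gaussian radius. The sketch uses the standard definition of perplexity as 2 raised to the Shannon entropy (in bits) of the neighbor distribution; the search bounds, iteration count, toy distances, and function names are assumptions for illustration.

```python
import math

def neighbor_distribution(sq_dists, sigma):
    """p_{j|i} over the other points, given squared distances from point i."""
    d_min = min(sq_dists)  # shift for numerical stability; cancels in the ratio
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(p):
    """2 ** H(p), with the entropy H measured in bits."""
    entropy = -sum(pj * math.log2(pj) for pj in p if pj > 0.0)
    return 2.0 ** entropy

def find_sigma(sq_dists, target, lo=1e-3, hi=1e3, iters=60):
    """Bisect on a log scale: a larger sigma gives a larger perplexity."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if perplexity(neighbor_distribution(sq_dists, mid)) > target:
            hi = mid
        else:
            lo = mid
    return math.sqrt(lo * hi)

# Hypothetical squared distances from one point to its five neighbors.
sq_dists = [1.0, 4.0, 9.0, 16.0, 25.0]
sigma = find_sigma(sq_dists, target=3.0)
print(sigma, perplexity(neighbor_distribution(sq_dists, sigma)))
```

Running the search once per data point gives each point its own radius, so points in dense regions get a small radius and points in sparse regions get a large one, while the effective number of neighbors stays roughly constant.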
The cost function for a low-dimensional
representation
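$$\text{Cost} = \sum_i KL(P_i \| Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$$

Gradient descent:

$$\frac{\partial C}{\partial y_i} = 2 \sum_j (y_i - y_j)\left( p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j} \right)$$

Gradient update with a momentum term:

$$Y^{(t)} = Y^{(t-1)} - \eta \frac{\partial C}{\partial Y} + \alpha(t) \left( Y^{(t-1)} - Y^{(t-2)} \right)$$

where \eta is the learning rate and \alpha(t) is the momentum.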
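
The cost, its gradient, and one momentum update can be sketched as follows. This is a minimal illustration rather than a reference implementation: it uses one shared radius for all points (sigma = 1, so 2σ² = 2), hypothetical toy data, and hypothetical function names. The descent step subtracts the gradient.

```python
import math

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def conditionals(points, two_sigma_sq=1.0):
    """Row i holds p_{j|i}; with two_sigma_sq = 1 the same formula gives q_{j|i}."""
    n = len(points)
    rows = []
    for i in range(n):
        w = [0.0 if j == i else math.exp(-sq_dist(points[i], points[j]) / two_sigma_sq)
             for j in range(n)]
        total = sum(w)
        rows.append([v / total for v in w])
    return rows

def sne_cost(P, Y):
    """Sum over i of KL(P_i || Q_i)."""
    Q = conditionals(Y)
    n = len(Y)
    return sum(P[i][j] * math.log(P[i][j] / Q[i][j])
               for i in range(n) for j in range(n) if i != j)

def sne_gradient(P, Y):
    """dC/dy_i = 2 * sum_j (y_i - y_j)(p_j|i - q_j|i + p_i|j - q_i|j)."""
    Q = conditionals(Y)
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if j == i:
                continue
            coeff = 2.0 * (P[i][j] - Q[i][j] + P[j][i] - Q[j][i])
            for d in range(dim):
                grad[i][d] += coeff * (Y[i][d] - Y[j][d])
    return grad

def momentum_step(Y, Y_prev, grad, eta, alpha):
    """Y_new = Y - eta * grad + alpha * (Y - Y_prev)."""
    return [[y - eta * g + alpha * (y - yp) for y, yp, g in zip(yi, ypi, gi)]
            for yi, ypi, gi in zip(Y, Y_prev, grad)]

# Hypothetical toy data: four 3-D points and an arbitrary 2-D starting map.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (3.0, 3.0, 3.0)]
Y = [[0.0, 0.1], [0.5, -0.2], [-0.3, 0.4], [1.0, 1.0]]
P = conditionals(X, two_sigma_sq=2.0)
grad = sne_gradient(P, Y)
Y_next = momentum_step(Y, Y, grad, eta=0.01, alpha=0.5)
print(sne_cost(P, Y), sne_cost(P, Y_next))
```

On the first step the previous map equals the current one, so the momentum term vanishes and the update reduces to plain gradient descent.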
Simpler version SNE: Turning conditional
probabilities into pairwise probabilities

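$$p_{ij} = \frac{\exp\left(-\| x_i - x_j \|^2 / 2\sigma^2\right)}{\sum_{k \neq l} \exp\left(-\| x_k - x_l \|^2 / 2\sigma^2\right)}$$

$$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}, \qquad \sum_j p_{ij} > \frac{1}{2n}$$

$$\text{Cost} = KL(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$$

$$\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)$$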
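
A short sketch of the symmetrization step, showing why an outlier still receives a total probability of at least 1/2n. The toy data (with one outlier), the shared `sigma`, and the function names are assumptions for illustration.

```python
import math

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def conditional_matrix(X, sigma=1.0):
    """Row i holds p_{j|i}; each row sums to 1 and the diagonal is 0."""
    n = len(X)
    rows = []
    for i in range(n):
        w = [0.0 if j == i else math.exp(-sq_dist(X[i], X[j]) / (2.0 * sigma ** 2))
             for j in range(n)]
        total = sum(w)
        rows.append([v / total for v in w])
    return rows

def symmetrize(P_cond):
    """p_ij = (p_{j|i} + p_{i|j}) / 2n."""
    n = len(P_cond)
    return [[(P_cond[i][j] + P_cond[j][i]) / (2.0 * n) for j in range(n)]
            for i in range(n)]

# Hypothetical toy data: three nearby points and one outlier (index 3).
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (6.0, 0.0)]
P = symmetrize(conditional_matrix(X))
row_sums = [sum(row) for row in P]
print(row_sums)
```

Because each conditional row sums to 1, every point contributes at least 1/2n to its own row of the joint matrix, so the outlier keeps a meaningful influence on the cost even though no other point picks it as a neighbor.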
MNIST database of handwritten digits (28×28 images)

Problem?
Why SNE does not have gaps between
classes
Crowding problem: the area accommodating moderately distant
datapoints is not large enough compared with the area
accommodating nearby datapoints.

A uniform background model (UNI-SNE) eliminates this effect and
allows gaps between classes to appear.
q_ij can never fall below $\frac{2\rho}{n(n-1)}$, where \rho is the mixing proportion of the uniform background.
From UNI-SNE to t-SNE
High dimension: Convert distances into probabilities using a
Gaussian distribution
Low dimension: Convert distances into probabilities using a
probability distribution that has much heavier tails than a Gaussian.
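Student's t-distribution
ν: the number of degrees of freedom
[Figure: standard normal distribution compared with the t-distribution with ν = 1]

$$q_{ij} = \frac{\left(1 + \| y_i - y_j \|^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \| y_k - y_l \|^2\right)^{-1}}$$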
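
The effect of the heavier tail can be checked directly. The sketch below compares the Gaussian kernel used by SNE in the low-D map with the Student-t kernel (one degree of freedom) used by t-SNE, and builds the normalized q_ij. Function names and the toy map are illustrative assumptions.

```python
import math

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def gaussian_kernel(d):
    """Unnormalized low-D similarity used by SNE: exp(-d^2)."""
    return math.exp(-d * d)

def student_kernel(d):
    """Unnormalized low-D similarity used by t-SNE: (1 + d^2)^-1."""
    return 1.0 / (1.0 + d * d)

def student_t_q(Y):
    """q_ij normalized over all pairs k != l, with q_ii = 0."""
    n = len(Y)
    w = [[0.0 if k == l else 1.0 / (1.0 + sq_dist(Y[k], Y[l])) for l in range(n)]
         for k in range(n)]
    z = sum(sum(row) for row in w)
    return [[v / z for v in row] for row in w]

# At a map distance of 3 the Gaussian similarity has almost vanished,
# while the Student-t similarity is still 0.1.
print(gaussian_kernel(3.0), student_kernel(3.0))

# Hypothetical 2-D map of three points.
Q = student_t_q([(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)])
```

Because the Student-t similarity decays only polynomially, moderately distant points can be placed much farther apart in the map for the same q_ij, which relieves the crowding problem.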
Compare tSNE with SNE and UNI-SNE

Optimization method for tSNE
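$$p_{j|i} = \frac{\exp\left(-\| x_i - x_j \|^2 / 2\sigma_i^2\right)}{\sum_{k \neq i} \exp\left(-\| x_i - x_k \|^2 / 2\sigma_i^2\right)}$$

$$q_{ij} = \frac{\left(1 + \| y_i - y_j \|^2\right)^{-1}}{\sum_{k \neq l} \left(1 + \| y_k - y_l \|^2\right)^{-1}}$$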
Optimization method for tSNE
Tricks:
1. Keep momentum term small until the map points have become
moderately well organized.
2. Use adaptive learning rate described by Jacobs (1988), which
gradually increases the learning rate in directions where the
gradient is stable.
3. Early compression: force map points to stay close together at the
start of the optimization.
4. Early exaggeration: multiply all the pij’s by 4, in the initial stages
of the optimization.
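
Tricks 1 and 4 can be sketched in a small optimization loop. This is a toy illustration under several assumptions: it uses the t-SNE gradient from the 2008 paper, dC/dy_i = 4 Σ_j (p_ij - q_ij)(y_i - y_j)(1 + ||y_i - y_j||²)^-1; a single shared radius instead of a per-point perplexity search; and hypothetical values for the learning rate, iteration counts, and momentum schedule. The adaptive learning rate and early compression are omitted.

```python
import math
import random

def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

def joint_p(X, sigma=1.0):
    """p_ij = (p_{j|i} + p_{i|j}) / 2n with one shared sigma (a simplification)."""
    n = len(X)
    cond = []
    for i in range(n):
        w = [0.0 if j == i else math.exp(-sq_dist(X[i], X[j]) / (2.0 * sigma ** 2))
             for j in range(n)]
        total = sum(w)
        cond.append([v / total for v in w])
    return [[(cond[i][j] + cond[j][i]) / (2.0 * n) for j in range(n)] for i in range(n)]

def student_t(Y):
    """Return the (1 + ||y_i - y_j||^2)^-1 kernel and the normalized q_ij."""
    n = len(Y)
    inv = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j])) for j in range(n)]
           for i in range(n)]
    z = sum(sum(row) for row in inv)
    return inv, [[v / z for v in row] for row in inv]

def kl_cost(P, Y):
    _, Q = student_t(Y)
    n = len(Y)
    return sum(P[i][j] * math.log(P[i][j] / Q[i][j])
               for i in range(n) for j in range(n) if i != j and P[i][j] > 0.0)

def tsne_gradient(P, Y):
    inv, Q = student_t(Y)
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            coeff = 4.0 * (P[i][j] - Q[i][j]) * inv[i][j]
            for d in range(dim):
                grad[i][d] += coeff * (Y[i][d] - Y[j][d])
    return grad

def run_tsne(X, iters=300, eta=0.1, exaggeration=4.0, exaggeration_iters=50,
             momentum_switch=100, seed=0):
    rng = random.Random(seed)
    P = joint_p(X)
    Y = [[rng.gauss(0.0, 1e-2) for _ in range(2)] for _ in X]
    Y_init = [row[:] for row in Y]
    Y_prev = [row[:] for row in Y]
    for t in range(iters):
        scale = exaggeration if t < exaggeration_iters else 1.0  # trick 4
        alpha = 0.5 if t < momentum_switch else 0.8              # trick 1
        P_t = [[scale * p for p in row] for row in P]
        grad = tsne_gradient(P_t, Y)
        Y_next = [[y - eta * g + alpha * (y - yp) for y, yp, g in zip(yi, ypi, gi)]
                  for yi, ypi, gi in zip(Y, Y_prev, grad)]
        Y_prev, Y = Y, Y_next
    return P, Y_init, Y

# Hypothetical toy data: two well-separated clusters of three 3-D points.
X = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
     (5.0, 5.0, 5.0), (6.0, 5.0, 5.0), (5.0, 6.0, 5.0)]
P, Y_init, Y_final = run_tsne(X)
print(kl_cost(P, Y_init), kl_cost(P, Y_final))
```

During the exaggerated phase the p_ij are too large to be matched by the q_ij, so points that belong together are pulled into tight groups and the groups move apart; once the scaling is removed, the map relaxes toward a minimum of the true cost.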
[Figure: 6000 MNIST digits embedded by Isomap, Sammon mapping, t-SNE, and Locally Linear Embedding]
tSNE vs Diffusion maps
Diffusion distance:

$$p_{ij}^{(1)} = \exp\left(-\| x_i - x_j \|^2\right)$$

Diffusion maps:

$$p_{ij}^{(t)} = \sum_{k=1}^{n} p_{ik}^{(t-1)} \, p_{kj}^{(t-1)}$$
Weakness
1. It is unclear how t-SNE performs on general dimensionality
reduction tasks.
2. The relatively local nature of t-SNE makes it sensitive to the curse
of the intrinsic dimensionality of the data.
3. It is not guaranteed to converge to a global optimum of its cost
function.
References:
t-SNE homepage:
http://homepage.tudelft.nl/19j49/t-SNE.html
Advanced Machine Learning: Lecture11: Non-linear Dimensionality Reduction
http://www.cs.toronto.edu/~hinton/csc2535/lectures.html

Plugin Ad: tSNE in Farsight
// Create the tSNE plot window, set the perplexity, attach the data, and show it.
splot = new SNEPlotWindow(this);
splot->setPerplexity(perplexity);
splot->setModels(table, selection);
splot->show();

How to write a Business Continuity PlanDatabarracks
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 

Último (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 

Visualization using tSNE

  • 2. Dimension Reduction Overview. [Diagram: taxonomy of dimension reduction methods] Parametric (LDA) vs. nonparametric; linear (PCA) vs. nonlinear; global (ISOMAP, MDS) vs. local (LLE, SNE). [Diagram: timeline toward tSNE (t-distributed Stochastic Neighbor Embedding)] MDS -> SNE (2002; local + probability) -> sym SNE and UNI-SNE (2007; easier implementation, crowding problem) -> tSNE (2008; more stable and faster solution) -> Barnes-Hut-SNE (2013; O(N^2) -> O(N log N)).
  • 3. MDS: Multi-Dimensional Scaling. Multi-Dimensional Scaling arranges the low-dimensional points so as to minimize the discrepancy between the pairwise distances in the original space and the pairwise distances in the low-D space: $\mathrm{Cost} = \sum_{i<j} (d_{ij} - \hat{d}_{ij})^2$, where $d_{ij} = \|x_i - x_j\|^2$ and $\hat{d}_{ij} = \|y_i - y_j\|^2$.
  • 4. Sammon mapping from MDS. $\mathrm{Cost} = \sum_{ij} \left( \frac{\|x_i - x_j\| - \|y_i - y_j\|}{\|x_i - x_j\|} \right)^2$, where $\|x_i - x_j\|$ is the high-D distance and $\|y_i - y_j\|$ is the low-D distance. It puts too much emphasis on getting very small distances exactly right. It's slow to optimize and also gets stuck in a different local optimum each time. Global to local?
  • 5. Maps that preserve local geometry: LLE (Locally Linear Embedding). The idea is to make the local configurations of points in the low-dimensional space resemble the local configurations in the high-dimensional space. $\mathrm{Cost} = \sum_i \| x_i - \sum_{j \in N(i)} w_{ij} x_j \|^2$, with $\sum_{j \in N(i)} w_{ij} = 1$. With fixed weights: $\mathrm{Cost} = \sum_i \| y_i - \sum_{j \in N(i)} w_{ij} y_j \|^2$. Find the $y$ that minimize the cost subject to the constraint that the $y$ have unit variance on each dimension.
  • 6. A probabilistic version of local MDS: Stochastic Neighbor Embedding (SNE). • It is more important to get local distances right than non-local ones. • Stochastic neighbor embedding has a probabilistic way of deciding if a pairwise distance is "local". • Convert each high-dimensional similarity into the probability that one data point will pick the other data point as its neighbor. Probability of picking j given i in high D: $p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$. Probability of picking j given i in low D: $q_{j|i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}$.
  • 7. Picking the radius of the Gaussian that is used to compute the p's. • We need to use different radii in different parts of the space so that we keep the effective number of neighbors about constant. • A big radius leads to a high entropy for the distribution over neighbors of i. A small radius leads to a low entropy. • So decide what entropy you want and then find the radius that produces that entropy. • It's easier to specify perplexity: $p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$.
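  • 8. The cost function for a low-dimensional representation. $\mathrm{Cost} = \sum_i KL(P_i \| Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$. Gradient descent: $\frac{\partial C}{\partial y_i} = 2 \sum_j (y_i - y_j)(p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j})$. Gradient update with a momentum term, controlled by a learning rate and a momentum coefficient.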
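  • 9. Simpler version of SNE: turning conditional probabilities into pairwise probabilities. Instead of $p_{ij} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma^2)}{\sum_{k \neq l} \exp(-\|x_k - x_l\|^2 / 2\sigma^2)}$, use $p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$, which ensures $\sum_j p_{ij} > \frac{1}{2n}$. $\mathrm{Cost} = KL(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$, with gradient $\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)$.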
  • 11. Why SNE does not have gaps between classes. Crowding problem: the area accommodating moderately distant datapoints is not large enough compared with the area accommodating nearby datapoints. A uniform background model (UNI-SNE) eliminates this effect and allows gaps between classes to appear: $q_{ij}$ can never fall below $\frac{2\rho}{n(n-1)}$, where $\rho$ is the mixing proportion of the uniform background.
  • 13. From UNI-SNE to t-SNE. High dimension: convert distances into probabilities using a Gaussian distribution. Low dimension: convert distances into probabilities using a probability distribution that has much heavier tails than a Gaussian, namely Student's t-distribution with one degree of freedom: $q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}$. [Figure: standard normal distribution vs. t-distribution with $\nu = 1$, where $\nu$ is the number of degrees of freedom.]
  • 14. Compare tSNE with SNE and UNI-SNE. [Figure: plots comparing the three methods.]
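  • 15. Optimization method for tSNE. High-D similarities: $p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$. Low-D similarities: $q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}$.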
  • 16. Optimization method for tSNE. Tricks: 1. Keep the momentum term small until the map points have become moderately well organized. 2. Use the adaptive learning rate scheme described by Jacobs (1988), which gradually increases the learning rate in directions where the gradient is stable. 3. Early compression: force map points to stay close together at the start of the optimization. 4. Early exaggeration: multiply all the $p_{ij}$ by 4 in the initial stages of the optimization.
  • 18. tSNE vs Diffusion maps. Diffusion distance: $p_{ij}^{(1)} = \exp(-\|x_i - x_j\|^2)$. Diffusion maps: $p_{ij}^{(t)} = \sum_{k=1}^{n} p_{ik}^{(t-1)} p_{kj}^{(t-1)}$.
  • 19. Weaknesses. 1. It's unclear how t-SNE performs on general dimensionality reduction tasks. 2. The relatively local nature of t-SNE makes it sensitive to the curse of the intrinsic dimensionality of the data. 3. It's not guaranteed to converge to a global optimum of its cost function.
  • 20. References: t-SNE homepage: http://homepage.tudelft.nl/19j49/t-SNE.html Advanced Machine Learning, Lecture 11: Non-linear Dimensionality Reduction: http://www.cs.toronto.edu/~hinton/csc2535/lectures.html Plugin Ad: tSNE in Farsight:
      splot = new SNEPlotWindow(this);
      splot->setPerplexity(perplexity);
      splot->setModels(table, selection);
      splot->show();
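
The radius selection described on slide 7 (choose a target perplexity, then find the Gaussian radius that produces it) can be sketched as a bisection search. This is a minimal illustration in plain Python, not the code used in the talk or in Farsight: the function names, the search bounds, the toy squared distances and the target perplexity of 3 are all assumptions made for the example.

```python
import math

def conditional_p(sq_dists, sigma):
    """p_{j|i} for one point i, given its squared distances to the other points."""
    # Shift by the smallest distance for numerical stability; the shift
    # cancels in the normalization, so the probabilities are unchanged.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2.0 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    return [w / total for w in weights]

def perplexity(p):
    """2 to the power of the Shannon entropy (in bits) of the distribution p."""
    entropy = -sum(v * math.log2(v) for v in p if v > 0.0)
    return 2.0 ** entropy

def find_sigma(sq_dists, target, lo=1e-6, hi=1e6, iters=100):
    """Bisection on a log scale; perplexity increases with sigma."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint of the bracket
        if perplexity(conditional_p(sq_dists, mid)) > target:
            hi = mid  # radius too big: entropy too high
        else:
            lo = mid  # radius too small: entropy too low
    return math.sqrt(lo * hi)

# Hypothetical squared distances from point i to seven other points.
sq_dists = [0.5, 1.0, 2.0, 4.0, 9.0, 16.0, 25.0]
sigma = find_sigma(sq_dists, target=3.0)
p = conditional_p(sq_dists, sigma)
print(sigma, perplexity(p))
```

Running the search once per data point gives each point its own radius, so points in dense regions get a small radius and points in sparse regions a large one, which keeps the effective number of neighbors roughly constant.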

Editor's Notes

  1. Perplexity is 2 to the power of the entropy of the distribution. It measures uncertainty and, in this case, can be interpreted as a smooth measure of the effective number of neighbors.
  2. The KL divergence of Q from P is a measure of the information lost when Q is used to approximate P. In the early stage of the optimization, Gaussian noise is added to the map points after each iteration. Gradually reducing the variance of this noise performs a type of simulated annealing that helps the optimization escape from poor local minima in the cost function. This requires sensible choices of the initial amount of Gaussian noise and the rate at which it decays. These choices interact with the amount of momentum and the step size employed in the gradient descent. Run the optimization several times on a data set to find appropriate values for the parameters.
  3. When $x_i$ is an outlier, all pairwise distances involving it are large, so $p_{ij}$ is very small for all j and the location of $y_i$ has little effect on the cost function. This point is therefore not well determined by the positions of the other map points. Points are pulled towards each other if the p's are bigger than the q's and repelled if the q's are bigger than the p's.
  4. If we want to model the small distances accurately in the map, most of the points at a moderate distance will have to be placed much too far away in the 2D map. Each of them then exerts a small attractive force, and the very large number of such forces crushes together the datapoints in the center of the map, preventing gaps from forming. With the uniform background of UNI-SNE, for datapoints far apart in the high-D space, q will always be larger than p, leading to slight repulsion. Optimization of UNI-SNE is tedious: optimizing the UNI-SNE cost function directly does not work, because two map points that are far apart will get all their q's from the uniform background. When p is large, no
  5. This allows a moderate distance in the high-D space to be faithfully modeled by a much larger distance in the map, which eliminates the unwanted attractive force.
  6. UNI-SNE: the repulsion is only strong when the pairwise distance between the points in the low-D map is already large. The strength of repulsion between dissimilar points is proportional to their pairwise distance in the low-D map, so they can move too far away. tSNE introduces long-range forces in low-D that can pull back together two similar points that get separated early on in the optimization.
  7. Sammon mapping: soft border between the local and global structure. tSNE determines the local neighborhood size for each datapoint separately, based on the local density of the data. Isomap: susceptibility to short-circuiting (connecting the wrong points because k is too large, leading to a drastically different low-D visualization), and it models large geodesic distances rather than small ones. Weakness of LLE: it is easy to cheat. The only thing that prevents all datapoints from collapsing into a single point is a constraint on the covariance of the low-D representation. In practice, this is often satisfied by placing most of the map points near the center of the map and using a few widely scattered points to maintain the variance. LLE and Isomap, being based on neighbor graphs, are not capable of visualizing data that consist of two or more separated submanifolds: they lose the relative similarities of the separate components.
  8. tSNE is now mostly used for visualization. It is not readily applicable to reducing data to d > 3 dimensions because of the heavy tails: in high-dimensional spaces, the heavy tails comprise a relatively large portion of the probability mass, which can lead to data representations that do not preserve the local structure of the data. Perplexity defines the neighborhood, and we end up with a different low-D layout if this parameter is not estimated well. tSNE needs several optimization parameters, but the same choice of optimization parameters can be used for a variety of different visualization tasks, so it is relatively stable.
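
The low-D similarities $q_{ij}$ from slides 13 and 15 and the attraction/repulsion behaviour described in the notes can be made concrete with a small sketch. It is written in plain Python for readability, not speed. The gradient used here, $\frac{\partial C}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)(1 + \|y_i - y_j\|^2)^{-1}$, is the standard t-SNE gradient and does not appear in the slide text; the matrix `P`, the map points `Y` and all function names are hypothetical values chosen for illustration.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((u - v) ** 2 for u, v in zip(a, b))

def low_d_affinities(Y):
    """Return (q, w): q_ij from a Student t kernel with one degree of freedom,
    and the unnormalized kernel values w_ij = 1 / (1 + ||y_i - y_j||^2)."""
    n = len(Y)
    w = [[0.0 if i == j else 1.0 / (1.0 + sq_dist(Y[i], Y[j]))
          for j in range(n)] for i in range(n)]
    total = sum(sum(row) for row in w)  # sum over all pairs k != l
    q = [[w[i][j] / total for j in range(n)] for i in range(n)]
    return q, w

def kl_cost(P, Y):
    """KL(P || Q) summed over all ordered pairs i != j."""
    q, _ = low_d_affinities(Y)
    n = len(Y)
    return sum(P[i][j] * math.log(P[i][j] / q[i][j])
               for i in range(n) for j in range(n)
               if i != j and P[i][j] > 0.0)

def tsne_gradient(P, Y):
    """Gradient of the cost with respect to every map point."""
    q, w = low_d_affinities(Y)
    n, dim = len(Y), len(Y[0])
    grad = [[0.0] * dim for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Positive coeff (p > q) attracts i to j under descent;
            # negative coeff (q > p) repels.
            coeff = 4.0 * (P[i][j] - q[i][j]) * w[i][j]
            for d in range(dim):
                grad[i][d] += coeff * (Y[i][d] - Y[j][d])
    return grad

# Hypothetical symmetric P (entries sum to 1): points 0,1 and 2,3 are neighbors.
P = [[0.0, 0.2, 0.025, 0.025],
     [0.2, 0.0, 0.025, 0.025],
     [0.025, 0.025, 0.0, 0.2],
     [0.025, 0.025, 0.2, 0.0]]
Y = [[0.0, 0.0], [1.0, 0.5], [-0.5, 2.0], [2.0, -1.0]]  # toy 2D map points
grad = tsne_gradient(P, Y)
print(kl_cost(P, Y), grad[0])
```

The extra factor $(1 + \|y_i - y_j\|^2)^{-1}$ is what distinguishes this gradient from the symmetric SNE gradient on slide 9: it damps the force between map points that are already far apart.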