Low-rank matrix approximations with Python
Christian Thurau
Table of Contents
1 Intro
2 The Basics
3 Matrix approximation
4 Some methods
5 Matrix Factorization with Python
6 Example & Conclusion
For Starters...
Observations
• Data matrix factorization has become an important tool in
information retrieval, data mining, and pattern recognition
• Nowadays, typical data matrices are HUGE
• Examples include:
• Gene expression data and microarrays
• Digital images
• Term by document matrices
• User ratings for movies, products, ...
• Graph adjacency matrices
Matrix Factorization
• given a matrix V
• determine matrices W and H
• such that V = WH or V ≈ WH
• characteristics such as the entries, shape, and rank of V, W, and H depend on the application context
The Basics
matrix factorization allows for:
• solving linear equations
• transforming data
• compressing data
matrix factorization facilitates subsequent processing in:
• information retrieval
• pattern recognition
• data mining
Low-rank Matrix Approximations
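• Approximate V:
$$V \approx WH$$
• where
$$V \in \mathbb{R}^{m \times n}, \quad W \in \mathbb{R}^{m \times k}, \quad H \in \mathbb{R}^{k \times n}$$
• and
$$\operatorname{rank}(W) \ll \operatorname{rank}(V), \quad k \ll \min(m, n)$$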
[Figure: V (m × n) drawn as the product of W (m × k) and H (k × n)]
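To make the shapes concrete, here is a minimal NumPy sketch (the sizes m, n, k are arbitrary illustrative choices, not taken from the talk). It builds W and H, forms V = WH, and compares how many numbers are needed to store V directly versus storing its two factors, which is where the compression comes from.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, k = 1000, 500, 10      # illustrative sizes with k << min(m, n)

W = rng.random((m, k))       # m x k
H = rng.random((k, n))       # k x n
V = W @ H                    # m x n, rank at most k

dense_entries = m * n              # numbers needed to store V directly
factored_entries = k * (m + n)     # numbers needed to store W and H instead
print(V.shape, np.linalg.matrix_rank(V))
print(dense_entries, factored_entries)  # 500000 15000
```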
Matrix Approximation
• If V = WH
• then
$$v_{i,j} = w_{i,*} h_{*,j} = \sum_{x=1}^{k} w_{i,x} h_{x,j}$$
[Figure: V = WH]
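The entry-wise formula can be checked numerically. A minimal sketch with small random matrices (sizes and the indices i, j are chosen only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.random((4, 3))  # m x k
H = rng.random((3, 5))  # k x n
V = W @ H               # m x n

# v_{i,j} = sum_{x=1}^{k} w_{i,x} h_{x,j}: row i of W times column j of H
i, j = 2, 4
v_ij = sum(W[i, x] * H[x, j] for x in range(W.shape[1]))
print(np.isclose(V[i, j], v_ij))  # True
```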
Matrix Approximation
• More importantly:
$$v_{*,j} = W h_{*,j} = \sum_{x=1}^{k} w_{*,x} h_{x,j}$$
• therefore
W ↔ "basis" matrix
H ↔ coefficient matrix
[Figure: a column of V shown as a weighted sum of the columns of W]
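The same identity read column-wise: a short NumPy sketch (random illustrative data) confirming that column j of V equals a weighted sum of the columns of W, with the weights taken from column j of H.

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.random((4, 3))  # columns of W act as k = 3 "basis" vectors
H = rng.random((3, 5))  # column j of H holds the coefficients for data point j
V = W @ H

j = 1
# v_{*,j} = W h_{*,j} = sum_x w_{*,x} h_{x,j}
combo = sum(H[x, j] * W[:, x] for x in range(W.shape[1]))
print(np.allclose(V[:, j], W @ H[:, j]), np.allclose(V[:, j], combo))  # True True
```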
On Matrix Factorization Methods
• matrix factorization ↔ data transformation
• matrix rank reduction ↔ data compression
• Common form: V = WH
• Broad range of methods:
• K-means clustering
• SVD/PCA
• Non-negative Matrix Factorization
• Archetypal Analysis
• Binary matrix factorization
• CUR decomposition
• ...
• Each method yields a unique view on data . . .
• . . . and is suited for different tasks
K-means Clustering [1]
• Baseline clustering method
• Constrained quadratic optimization problem:
$$\min_{W,H} \lVert V - WH \rVert^2 \quad \text{s.t.} \quad h_{k,i} \in \{0, 1\},\ \sum_k h_{k,i} = 1$$
• Find W, H using expectation maximization (in practice, Lloyd's alternating algorithm)
• Finding the optimal k-means partitioning is NP-hard
• Goal: group similar data points
• Interesting: k-means clustering is matrix factorization
[1] J.B. MacQueen, Some Methods for Classification and Analysis of Multivariate Observations, Berkeley Symposium on Mathematical Statistics and Probability, 1967
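As a sketch of the factorization view of k-means, assuming data points are stored as columns of V (as in the PyMF examples later on), Lloyd's algorithm alternates between setting H to one-hot assignments and setting the columns of W to cluster means. This is plain NumPy written for illustration, not PyMF's implementation; the function name `kmeans_factorize` is hypothetical.

```python
import numpy as np


def kmeans_factorize(V, k, n_iter=20, seed=0):
    """Lloyd-style k-means written as V ~ W H.

    V: (m, n) data matrix, one data point per column.
    Returns W (m, k) with cluster means as columns and
    H (k, n) with exactly one 1 per column (hard assignment).
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # initialise the basis with k distinct data points
    W = V[:, rng.choice(n, size=k, replace=False)].copy()
    for _ in range(n_iter):
        # "E-step": squared distance of every point to every centroid, shape (k, n)
        d2 = ((V[:, None, :] - W[:, :, None]) ** 2).sum(axis=0)
        H = np.zeros((k, n))
        H[d2.argmin(axis=0), np.arange(n)] = 1.0
        # "M-step": each non-empty column of W becomes the mean of its points
        counts = H.sum(axis=1)
        nonempty = counts > 0
        W[:, nonempty] = (V @ H.T)[:, nonempty] / counts[nonempty]
    return W, H


V = np.array([[0.0, 0.1, 5.0, 5.1],
              [0.0, 0.2, 5.0, 4.9]])   # four illustrative 2-D points as columns
W, H = kmeans_factorize(V, k=2)
print(W @ H)                           # every point replaced by its cluster mean
```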
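K-means Clustering is Matrix Factorization!
$$
XBA =
\begin{pmatrix}
x_{1,1} & x_{1,2} & x_{1,3} & \cdots & x_{1,n} \\
x_{2,1} & x_{2,2} & x_{2,3} & \cdots & x_{2,n} \\
x_{3,1} & x_{3,2} & x_{3,3} & \cdots & x_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{m,1} & x_{m,2} & x_{m,3} & \cdots & x_{m,n}
\end{pmatrix}
\begin{pmatrix}
b_{1,1} & b_{1,2} & b_{1,3} \\
b_{2,1} & b_{2,2} & b_{2,3} \\
b_{3,1} & b_{3,2} & b_{3,3} \\
\vdots & \vdots & \vdots \\
b_{n,1} & b_{n,2} & b_{n,3}
\end{pmatrix}
\begin{pmatrix}
0 & 1 & 1 & \cdots & 0 \\
1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}
$$
• i.e. for $X \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times 3}$, and $A \in \mathbb{R}^{3 \times n}$ as above, the product
$$XBA = MA$$
realizes an assignment $x_i \to m_j$, where $m_j = X b_j$ (here $x_i$ is the $i$-th column of $X$ and $b_j$ the $j$-th column of $B$)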
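The identity XBA = MA can be reproduced numerically. The slide does not spell out the entries of B; this sketch assumes b_{i,j} = 1/|C_j| when point i belongs to cluster j (and 0 otherwise), so that X b_j is the cluster mean m_j. The data and the cluster labels are made up for illustration.

```python
import numpy as np

# Six 2-D points stored as columns of X (m = 2, n = 6); illustrative data
X = np.array([[0.0, 0.2, 4.0, 4.2, 9.0, 9.4],
              [0.0, 0.4, 4.0, 3.8, 1.0, 1.2]])
labels = np.array([0, 0, 1, 1, 2, 2])   # hypothetical cluster label per column
k, n = 3, X.shape[1]

A = np.zeros((k, n))                    # 3 x n binary assignment matrix
A[labels, np.arange(n)] = 1.0
B = A.T / A.sum(axis=1)                 # n x 3: column j averages the points of cluster j

M = X @ B                               # m x 3: column j is the mean m_j = X b_j
approx = X @ B @ A                      # every column x_i replaced by its cluster mean
print(M)
```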
Example: K-means
[Figure: an image approximated as a weighted sum of cluster means with weights 0.0, 0.0, …, 1.0, …, 0.0, i.e. by a single mean]
• Similar images are grouped into k groups
• Approximate data by mapping each data point onto the mean of its cluster
Python Matrix Factorization Toolbox (PyMF) [2]
• Started in 2010 at Fraunhofer IAIS/University of Bonn
• Vast number of different methods!
• Supports hdf5/h5py and sparse matrices
How to factorize a data matrix V:
>>> import pymf
>>> import numpy as np
>>> data = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
>>> mdl = pymf.kmeans.Kmeans(data, num_bases=2)
>>> mdl.factorize(niter=10)  # optimize W and H
>>> V_approx = np.dot(mdl.W, mdl.H)  # V ≈ WH
[2] http://github.com/cthurau/pymf
Python Matrix Factorization Toolbox (PyMF) [2]
• Restarted development a few weeks back ;)
• Looking for contributors!
How to map new data onto an existing W:
>>> import pymf
>>> import numpy as np
>>> test_data = np.array([[1.0], [0.3]])
>>> mdl_test = pymf.kmeans.Kmeans(test_data, num_bases=2)
>>> mdl_test.W = mdl.W  # reuse the basis W learned by mdl above
>>> mdl_test.factorize(compute_w=False)  # keep W fixed, only compute H
>>> test_data_approx = np.dot(mdl.W, mdl_test.H)
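For k-means, keeping W fixed and solving only for H amounts to assigning each new point to its nearest basis vector. A NumPy-only sketch of that step follows; the helper `assign_to_basis` and the basis values in W are hypothetical and are not PyMF internals.

```python
import numpy as np


def assign_to_basis(W, V_new):
    """Map new columns onto a fixed k-means basis W (m x k).

    Returns a one-hot H (k x n_new) such that W @ H replaces every new
    data point by its nearest basis vector (cluster mean).
    """
    d2 = ((V_new[:, None, :] - W[:, :, None]) ** 2).sum(axis=0)  # (k, n_new)
    H = np.zeros_like(d2)
    H[d2.argmin(axis=0), np.arange(V_new.shape[1])] = 1.0
    return H


W = np.array([[0.0, 2.0],
              [0.0, 1.0]])              # two illustrative cluster means (columns)
test_data = np.array([[1.0], [0.3]])    # same test point as on the slide
H_test = assign_to_basis(W, test_data)
print(W @ H_test)                       # [[0.] [0.]]: nearest mean is (0, 0)
```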
Principal Component Analysis (PCA) [3]
• SVD/PCA are baseline matrix factorization methods
• Optimize:
$$\min_{W,H} \lVert V - WH \rVert^2 \quad \text{s.t.} \quad W^T W = I$$
• W is restricted to the singular vectors of V (an orthogonal matrix)
• Can (and usually does) violate non-negativity
• Goal: best possible matrix approximation for a given k
• Great for compression or filtering out noise!
[3] K. Pearson, On Lines and Planes of Closest Fit to Systems of Points in Space, Philosophical Magazine, 1901
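The PCA objective above is solved by a truncated SVD (Eckart-Young theorem). A minimal NumPy sketch, with random data and k = 5 chosen for illustration; `svd_factorize` is a hypothetical helper, not PyMF's PCA class. Classical PCA usually centres the data first; the sketch follows the uncentred objective exactly as written on the slide.

```python
import numpy as np


def svd_factorize(V, k):
    """Best rank-k approximation V ~ W H with orthonormal columns in W (W.T @ W = I)."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    W = U[:, :k]          # top-k left singular vectors of V
    H = W.T @ V           # coefficients; equal to diag(s[:k]) @ Vt[:k]
    return W, H, s


rng = np.random.default_rng(3)
V = rng.normal(size=(50, 40))
W, H, s = svd_factorize(V, k=5)

err = np.linalg.norm(V - W @ H)                        # Frobenius norm of the residual
print(np.allclose(W.T @ W, np.eye(5)))                 # True
print(np.isclose(err, np.sqrt(np.sum(s[5:] ** 2))))    # True: error from discarded singular values
```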
Example PCA
>>> from pymf.pca import PCA
>>> import numpy as np
>>> mdl = PCA(data, num_bases=2)  # data: the matrix V to factorize
>>> mdl.factorize()
>>> V_approx = np.dot(mdl.W, mdl.H)
• Usefulness for data analysis is questionable
• Basis vectors are usually not interpretable
[Figure: data V, its approximation V_approx, and the PCA basis vectors W]
Non-negative Matrix Factorization [4]
• For V ≥ 0, a constrained quadratic optimization problem:
$$\min_{W,H} \lVert V - WH \rVert^2 \quad \text{s.t.} \quad W \ge 0,\ H \ge 0$$
• A globally optimal solution provably exists, but no known algorithm is guaranteed to find it; exact NMF is NP-hard
• W often converges to parts-based (partial) representations
• Active area of research
• Goal: reconstruct data by independent parts
[4] D.D. Lee and H.S. Seung, Learning the Parts of Objects by Non-Negative Matrix Factorization, Nature, 401(6755), 1999
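One classic way to attack the NMF problem is the multiplicative update rule of Lee and Seung. The sketch below implements those updates for the squared error in plain NumPy; it is not necessarily what PyMF's NMF class does internally, and the helper name, the iteration count and the small `eps` guard against division by zero are illustrative assumptions.

```python
import numpy as np


def nmf_multiplicative(V, k, n_iter=200, eps=1e-10, seed=0):
    """Lee & Seung multiplicative updates for min ||V - W H||^2 s.t. W, H >= 0."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(n_iter):
        # element-wise ratios of non-negative terms keep every entry non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


rng = np.random.default_rng(4)
V = rng.random((20, 30))            # illustrative non-negative data
W, H = nmf_multiplicative(V, k=4)
print(W.min() >= 0, H.min() >= 0)   # True True
print(np.linalg.norm(V - W @ H))    # reconstruction error
```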
Example NMF
>>> from pymf.nmf import NMF
>>> import numpy as np
>>> mdl = NMF(data, num_bases=2)
>>> mdl.factorize(niter=50)  # number of iterations is passed to factorize()
>>> V_approx = np.dot(mdl.W, mdl.H)
• Additive combination of parts
• Interesting options for data analysis
[Figure: data V, its approximation V_approx, and the NMF basis vectors W]
Archetypal Analysis [5]
• Convexity-constrained quadratic optimization problem:
$$\min_{W,H} \lVert V - VWH \rVert^2 \quad \text{s.t.} \quad w_{l,i} \ge 0,\ \sum_l w_{l,i} = 1, \quad h_{k,i} \ge 0,\ \sum_k h_{k,i} = 1$$
• Reconstruct data by its archetypes, i.e. as convex combinations of polar opposites
• Yields novel and intuitive insights into data
• Great for interpretable data representations!
• O(n²), but efficient approximations for large data exist
[5] A. Cutler and L. Breiman, Archetypal Analysis, Technometrics 36(4), 1994
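Archetypal analysis can be sketched with alternating projected gradient steps: after each gradient step, every column of W and H is projected back onto the probability simplex using the standard sort-based simplex projection. This is a swapped-in technique for illustration, not Cutler and Breiman's original alternating least-squares procedure and not PyMF's AA code; the function names, the step sizes (1 / Lipschitz constant of each convex subproblem) and the iteration count are assumptions.

```python
import numpy as np


def project_columns_to_simplex(Y):
    """Euclidean projection of every column of Y onto {y >= 0, sum(y) = 1}."""
    d, c = Y.shape
    U = -np.sort(-Y, axis=0)                      # each column sorted descending
    css = np.cumsum(U, axis=0) - 1.0
    idx = np.arange(1, d + 1)[:, None]
    cond = U - css / idx > 0                      # true on a prefix of each column
    rho = d - 1 - np.argmax(cond[::-1], axis=0)   # last index where cond holds
    theta = css[rho, np.arange(c)] / (rho + 1)
    return np.maximum(Y - theta, 0.0)


def archetypal_analysis(V, k, n_iter=300, seed=0):
    """Approximately minimise ||V - V W H||^2 with column-stochastic W (n x k), H (k x n)."""
    rng = np.random.default_rng(seed)
    n = V.shape[1]
    W = project_columns_to_simplex(rng.random((n, k)))
    H = project_columns_to_simplex(rng.random((k, n)))
    g_norm = np.linalg.norm(V.T @ V, 2)
    for _ in range(n_iter):
        # H-step: projected gradient on ||V - Z H||^2 with the archetypes Z fixed
        Z = V @ W
        step_h = 1.0 / (2.0 * np.linalg.norm(Z.T @ Z, 2) + 1e-12)
        H = project_columns_to_simplex(H + step_h * 2.0 * Z.T @ (V - Z @ H))
        # W-step: projected gradient on ||V - V W H||^2 with H fixed
        step_w = 1.0 / (2.0 * g_norm * np.linalg.norm(H @ H.T, 2) + 1e-12)
        W = project_columns_to_simplex(W + step_w * 2.0 * V.T @ (V - V @ W @ H) @ H.T)
    return W, H


rng = np.random.default_rng(5)
V = rng.random((2, 30))            # 30 illustrative 2-D points as columns
W, H = archetypal_analysis(V, k=3)
archetypes = V @ W                 # each archetype is a convex combination of data points
```

Because each subproblem is convex and the step is at most 1 / Lipschitz constant, every step should not increase the error; the columns of H can then be read as mixing proportions over the archetypes.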
Example Archetypal Analysis
>>> from pymf.aa import AA
>>> import numpy as np
>>> mdl = AA(data, num_bases=2)
>>> mdl.factorize(niter=50)  # number of iterations is passed to factorize()
>>> V_approx = np.dot(mdl.W, mdl.H)
• Existing data points as basis vectors
• Convex combinations allow a probabilistic interpretation
[Figure: data V, its approximation V_approx, and the archetypes W]
Method Summary
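• Common form: V = WH (or V = VWH)
Method  | W constraint                   | H constraint                                | Outcome
--------|--------------------------------|---------------------------------------------|-------------
PCA     | $W^T W = I$                    | -                                           | compressed V
K-means | -                              | $h_{k,i} \in \{0, 1\}, \sum_k h_{k,i} = 1$ | groups
NMF     | $W \ge 0$                      | $H \ge 0$                                   | parts
AA      | $W \ge 0, \sum_l w_{l,i} = 1$  | $H \ge 0, \sum_k h_{k,i} = 1$               | opposites
• Works for more than just images ;)
• More complex constraints usually result in more complex solvers
• An active area of research deals with approximations for large data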
Large matrices: PyMF and h5py
>>> import h5py
>>> import numpy as np
>>> from pymf.sivm import SIVM  # uses [6]
>>> file = h5py.File('myfile.hdf5', 'w')
>>> file['dataset'] = np.random.random((100, 1000))
>>> file['W'] = np.random.random((100, 10))
>>> file['H'] = np.random.random((10, 1000))
>>> sivm_mdl = SIVM(file['dataset'], num_bases=10)
>>> sivm_mdl.W = file['W']  # W and H are stored in the HDF5 file
>>> sivm_mdl.H = file['H']
>>> sivm_mdl.factorize()
[6] Thurau, Kersting, and Bauckhage, Simplex Volume Maximization for Descriptive Web-Scale Matrix Factorization, CIKM 2010
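With data that does not fit in memory, even evaluating the approximation error should avoid forming the full m × n product WH. A minimal sketch, assuming W fits in memory and that V and H support column slicing such as `V[:, a:b]` (true for NumPy arrays and, in the usual way, for h5py datasets); `blockwise_sq_error` and the block size are hypothetical choices, not part of PyMF.

```python
import numpy as np


def blockwise_sq_error(V, W, H, block=256):
    """Squared Frobenius error ||V - W H||^2, computed one column block at a time.

    Only a (m x block) slice of V and of W @ H is in memory at any time.
    """
    n = V.shape[1]
    total = 0.0
    for start in range(0, n, block):
        stop = min(start + block, n)
        V_blk = np.asarray(V[:, start:stop])           # reads only this block
        R = V_blk - W @ np.asarray(H[:, start:stop])   # residual for these columns
        total += float(np.sum(R * R))
    return total


rng = np.random.default_rng(6)
V = rng.random((100, 1000))
W = rng.random((100, 10))
H = rng.random((10, 1000))
print(np.isclose(blockwise_sq_error(V, W, H), np.linalg.norm(V - W @ H) ** 2))  # True
```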
[7] Science, 2010: Vol. 330
Take Home Message
• Most clustering and data analysis methods are matrix approximations
• Imposed constraints shape the factorization
• Imposed constraints yield different views on data
• One of the most effective and versatile tools for data exploration!
• Python implementation → http://github.com/cthurau/pymf
Thank you for your attention!
christian.thurau@unbelievable-machine.com

Using Embeddings to Understand the Variance and Evolution of Data Science... ...
 
Deploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne BauerDeploying Data Science for Distribution of The New York Times - Anne Bauer
Deploying Data Science for Distribution of The New York Times - Anne Bauer
 
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam LermaGraph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma
 
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
Do Your Homework! Writing tests for Data Science and Stochastic Code - David ...
 
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo MazzaferroRESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
RESTful Machine Learning with Flask and TensorFlow Serving - Carlo Mazzaferro
 
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
Mining dockless bikeshare and dockless scootershare trip data - Stefanie Brod...
 
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven LottAvoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
Avoiding Bad Database Surprises: Simulation and Scalability - Steven Lott
 
Words in Space - Rebecca Bilbro
Words in Space - Rebecca BilbroWords in Space - Rebecca Bilbro
Words in Space - Rebecca Bilbro
 
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...End-to-End Machine learning pipelines for Python driven organizations - Nick ...
End-to-End Machine learning pipelines for Python driven organizations - Nick ...
 
Pydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica PuertoPydata beautiful soup - Monica Puerto
Pydata beautiful soup - Monica Puerto
 
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
1D Convolutional Neural Networks for Time Series Modeling - Nathan Janos, Jef...
 
Extending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will AydExtending Pandas with Custom Types - Will Ayd
Extending Pandas with Custom Types - Will Ayd
 
Measuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen HooverMeasuring Model Fairness - Stephen Hoover
Measuring Model Fairness - Stephen Hoover
 
What's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper SeaboldWhat's the Science in Data Science? - Skipper Seabold
What's the Science in Data Science? - Skipper Seabold
 
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
Applying Statistical Modeling and Machine Learning to Perform Time-Series For...
 
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-WardSolving very simple substitution ciphers algorithmically - Stephen Enright-Ward
Solving very simple substitution ciphers algorithmically - Stephen Enright-Ward
 
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
The Face of Nanomaterials: Insightful Classification Using Deep Learning - An...
 

Recently uploaded

Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制vexqp
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制vexqp
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.pptibrahimabdi22
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdftheeltifs
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangeThinkInnovation
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxchadhar227
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubaikojalkojal131
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATIONLakpaYanziSherpa
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...Bertram Ludäscher
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...nirzagarg
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabiaahmedjiabur940
 

Recently uploaded (20)

Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 

Low-rank matrix approximations in Python by Christian Thurau PyData 2014

  • 1. Low-rank matrix approximations with Python Christian Thurau
  • 2. Table of Contents: 1 Intro, 2 The Basics, 3 Matrix Approximation, 4 Some Methods, 5 Matrix Factorization with Python, 6 Example & Conclusion
  • 3. For Starters... Observations • Data matrix factorization has become an important tool in information retrieval, data mining, and pattern recognition • Nowadays, typical data matrices are HUGE • Examples include: • Gene expression data and microarrays • Digital images • Term-by-document matrices • User ratings for movies, products, ... • Graph adjacency matrices
  • 4. Matrix Factorization • Given a matrix V, determine matrices W and H such that V = WH or V ≈ WH • Characteristics such as the entries, shape, and rank of V, W, and H depend on the application context
  • 5. The Basics • Matrix factorization allows for: solving linear equations, transforming data, compressing data • Matrix factorization facilitates subsequent processing in: information retrieval, pattern recognition, data mining
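  • 6. Low-rank Matrix Approximations • Approximate V as $V \approx WH$, where $V \in \mathbb{R}^{m \times n}$, $W \in \mathbb{R}^{m \times k}$, $H \in \mathbb{R}^{k \times n}$ • and $\operatorname{rank}(W) \ll \operatorname{rank}(V)$, $k \ll \min(m, n)$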
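To make the compression argument concrete, the short NumPy sketch below compares how many entries V needs against W and H together, and checks that the product WH has rank at most k. The sizes m = 200, n = 300, k = 5 and the helper name `low_rank_storage` are illustrative assumptions, not part of PyMF.

```python
import numpy as np

def low_rank_storage(m, n, k):
    """Stored entries: (V alone, W and H together) for V (m x n) ~ W (m x k) H (k x n)."""
    return m * n, m * k + k * n

rng = np.random.default_rng(0)
m, n, k = 200, 300, 5
W = rng.random((m, k))   # m x k "basis" matrix
H = rng.random((k, n))   # k x n coefficient matrix
V = W @ H                # m x n matrix assembled from only k components

print(low_rank_storage(m, n, k))   # (60000, 2500)
print(np.linalg.matrix_rank(V))    # at most k
```

Storing the factors costs k(m + n) numbers instead of mn, which is where the "data compression" reading of rank reduction comes from.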
  • 7. Matrix Approximation • If $V = WH$ • then $v_{i,j} = w_{i,*} h_{*,j} = \sum_{x=1}^{k} w_{i,x} h_{x,j}$
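  • 8. Matrix Approximation • More importantly: $v_{*,j} = W h_{*,j} = \sum_{x=1}^{k} w_{*,x} h_{x,j}$, i.e. every column of V is a weighted sum of the columns of W • therefore W ↔ "basis" matrix, H ↔ coefficient matrix [Figure: V = WH, with one column of V drawn as a sum of weighted basis columns]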
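The entry-wise and column-wise identities from the last two slides can be checked numerically. This is a minimal sketch on random matrices; the sizes and the chosen indices i, j are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, k = 4, 6, 3
W = rng.random((m, k))   # columns of W act as "basis" vectors
H = rng.random((k, n))   # column j of H holds the coefficients of data point j
V = W @ H

i, j = 0, 2
v_ij = sum(W[i, x] * H[x, j] for x in range(k))  # v_{i,j} = sum_x w_{i,x} h_{x,j}
v_j = sum(W[:, x] * H[x, j] for x in range(k))   # v_{*,j} = sum_x w_{*,x} h_{x,j}

print(np.isclose(V[i, j], v_ij))  # True
print(np.allclose(V[:, j], v_j))  # True: column j is a weighted sum of W's columns
```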
  • 9. On Matrix Factorization Methods • Matrix factorization ↔ data transformation • Matrix rank reduction ↔ data compression • Common form: V = WH • Broad range of methods: • K-means clustering • SVD/PCA • Non-negative Matrix Factorization • Archetypal Analysis • Binary matrix factorization • CUR decomposition • ... • Each method yields a unique view on the data and is suited to different tasks
  • 10. K-means Clustering [1] • Baseline clustering method • Constrained quadratic optimization problem: $\min_{W,H} \|V - WH\|^2$ s.t. $h_{k,i} \in \{0, 1\}$, $\sum_k h_{k,i} = 1$ • Find W, H using expectation maximization • Optimal k-means partitioning is NP-hard • Goal: group similar data points • Interesting: k-means clustering is matrix factorization
    [1] J. B. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations", Berkeley Symposium on Mathematical Statistics and Probability, 1967.
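Under the binary, one-assignment-per-column constraint on H, alternating between the two sub-problems gives the familiar Lloyd iteration: with W fixed, the best H assigns each column to its nearest centroid; with H fixed, the best W holds the cluster means. The sketch below is a plain NumPy illustration of that reading, not PyMF's `Kmeans` implementation. The function name `kmeans_factorize` and the farthest-first initialization are assumptions of this sketch, chosen so the toy example behaves deterministically.

```python
import numpy as np

def kmeans_factorize(V, k, niter=20, seed=0):
    """Lloyd-style k-means on the columns of V, returned as factors V ~ W H.

    W (m x k) holds the centroids; H (k x n) is binary with exactly one 1 per column.
    """
    rng = np.random.default_rng(seed)
    n = V.shape[1]
    W = V[:, [rng.integers(n)]]                  # first centroid: a random column
    while W.shape[1] < k:                        # farthest-first initialization (assumption)
        d = ((V[:, None, :] - W[:, :, None]) ** 2).sum(axis=0).min(axis=0)
        W = np.hstack([W, V[:, [d.argmax()]]])
    for _ in range(niter):
        # Assignment step: squared distance from every centroid to every data column
        d = ((V[:, None, :] - W[:, :, None]) ** 2).sum(axis=0)  # shape (k, n)
        H = np.zeros((k, n))
        H[d.argmin(axis=0), np.arange(n)] = 1.0   # h_{k,i} in {0,1}, columns sum to 1
        # Update step: each centroid becomes the mean of its assigned columns
        counts = H.sum(axis=1)
        keep = counts > 0                          # leave empty clusters unchanged
        W[:, keep] = (V @ H.T)[:, keep] / counts[keep]
    return W, H

# Usage: two well-separated 2-D blobs of 50 points each, stored as columns (m=2, n=100)
rng = np.random.default_rng(42)
V = np.hstack([rng.normal(0.0, 0.1, (2, 50)), rng.normal(5.0, 0.1, (2, 50))])
W, H = kmeans_factorize(V, k=2)
print(np.linalg.norm(V - W @ H))  # small: every column is replaced by its centroid
```

The reconstruction WH is exactly "map each data point onto its cluster mean", which is why the slide can call k-means a matrix factorization.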
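  • 11. K-means Clustering is Matrix Factorization!
$$
\underbrace{\begin{pmatrix}
x_{1,1} & x_{1,2} & x_{1,3} & \cdots & x_{1,n} \\
x_{2,1} & x_{2,2} & x_{2,3} & \cdots & x_{2,n} \\
x_{3,1} & x_{3,2} & x_{3,3} & \cdots & x_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
x_{m,1} & x_{m,2} & x_{m,3} & \cdots & x_{m,n}
\end{pmatrix}}_{X}
\underbrace{\begin{pmatrix}
b_{1,1} & b_{1,2} & b_{1,3} \\
b_{2,1} & b_{2,2} & b_{2,3} \\
b_{3,1} & b_{3,2} & b_{3,3} \\
\vdots & \vdots & \vdots \\
b_{n,1} & b_{n,2} & b_{n,3}
\end{pmatrix}}_{B}
\underbrace{\begin{pmatrix}
0 & 1 & 1 & \cdots & 0 \\
1 & 0 & 0 & \cdots & 0 \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}}_{A}
$$
    • i.e. for $X \in \mathbb{R}^{m \times n}$, $B \in \mathbb{R}^{n \times 3}$, and $A \in \mathbb{R}^{3 \times n}$ as above, the product $XBA = MA$ with $M = XB$ realizes an assignment $x_i \to m_j$, where $m_j = X b_j$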
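One way to read this construction: if column j of B holds $1/|C_j|$ for the members of cluster $C_j$ and zeros elsewhere, then $M = XB$ contains the cluster means, and multiplying by the binary matrix A copies each mean into the columns assigned to it. The slide does not fix the entries of B, so the helper `centroid_factorization` below is a hypothetical illustration of that particular choice, using k = 2 clusters instead of 3.

```python
import numpy as np

def centroid_factorization(X, labels, k):
    """Return B (n x k) and A (k x n) such that X @ B holds the cluster means and
    X @ B @ A replaces every column of X by the mean of its cluster.

    Assumes every cluster 0..k-1 has at least one member.
    """
    n = X.shape[1]
    A = np.zeros((k, n))
    A[labels, np.arange(n)] = 1.0      # binary assignment matrix, one 1 per column
    B = A.T / A.sum(axis=1)            # column j of B: 1/|C_j| on members of cluster j
    return B, A

X = np.array([[1.0, 2.0, 10.0, 12.0],
              [0.0, 2.0, 10.0, 14.0]])   # four 2-D data points as columns
labels = np.array([0, 0, 1, 1])           # hypothetical cluster assignment
B, A = centroid_factorization(X, labels, k=2)
M = X @ B          # columns m_j = X b_j: (1.5, 1.0) and (11.0, 12.0)
X_approx = M @ A   # X B A = M A: each x_i mapped to its centroid
print(X_approx)
```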
  • 12. Example: K-means [Figure: an image approximated as a combination of cluster means with coefficients 0.0, 0.0, …, 1.0, …, 0.0, i.e. by a single cluster mean] • Similar images are grouped into k groups • Each data point is approximated by the mean of its cluster
  • 13. Python Matrix Factorization Toolbox (PyMF) [2] • Started in 2010 at Fraunhofer IAIS/University of Bonn • A large number of different methods! • Supports hdf5/h5py and sparse matrices • How to factorize a data matrix V:
    >>> import pymf
    >>> import numpy as np
    >>> data = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
    >>> mdl = pymf.kmeans.Kmeans(data, num_bases=2)
    >>> mdl.factorize(niter=10)          # optimize W and H
    >>> V_approx = np.dot(mdl.W, mdl.H)  # V ≈ WH
    [2] http://github.com/cthurau/pymf
  • 14. Python Matrix Factorization Toolbox (PyMF) [2] • Restarted development a few weeks back ;) • Looking for contributors! • How to map data onto W:
    >>> import pymf
    >>> import numpy as np
    >>> test_data = np.array([[1.0], [0.3]])
    >>> mdl_test = pymf.kmeans.Kmeans(test_data, num_bases=2)
    >>> mdl_test.W = mdl.W                # mdl.W -> existing basis W
    >>> mdl_test.factorize(compute_w=False)
    >>> test_data_approx = np.dot(mdl.W, mdl_test.H)
    [2] http://github.com/cthurau/pymf
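With W held fixed, as in the `compute_w=False` call on the slide, only H is left to optimize, and for the k-means constraints the best binary H simply assigns each new column to its nearest centroid. The NumPy sketch below illustrates that step on the slide's `test_data`; the two-column W is a made-up stand-in for a previously learned basis, and `assign_to_basis` is a hypothetical helper, not a PyMF function.

```python
import numpy as np

def assign_to_basis(W, V_new):
    """Binary coefficients H for new columns, given a fixed centroid matrix W.

    With W fixed, ||V_new - W H||^2 under the k-means constraints is minimized by
    assigning every new column to its nearest centroid.
    """
    d = ((V_new[:, None, :] - W[:, :, None]) ** 2).sum(axis=0)  # (k, n_new) squared distances
    H = np.zeros_like(d)
    H[d.argmin(axis=0), np.arange(V_new.shape[1])] = 1.0
    return H

W = np.array([[1.0, 0.0],
              [0.0, 1.0]])             # hypothetical basis: two centroids as columns
test_data = np.array([[1.0], [0.3]])
H = assign_to_basis(W, test_data)
print(H.ravel())   # [1. 0.]: the point is closest to the first centroid
print(W @ H)       # its approximation is that centroid
```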
  • 15. Principal Component Analysis (PCA) [3] • SVD/PCA are baseline matrix factorization methods • Optimize: $\min_{W,H} \|V - WH\|^2$ s.t. $W^T W = I$ • Restrict W to the leading singular vectors of V (orthonormal columns) • Can (and usually does) violate non-negativity • Goal: the best possible matrix approximation, in the least-squares sense, for a given k • Great for compression or filtering out noise!
    [3] K. Pearson, "On Lines and Planes of Closest Fit to Systems of Points in Space", Philosophical Magazine, 1901.
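For the orthonormality-constrained problem, a rank-k truncated SVD gives a minimizer: W holds the first k left singular vectors and $H = W^T V$. The sketch below applies `numpy.linalg.svd` directly to uncentered data (classical PCA would subtract the mean first) and checks both $W^T W = I$ and the Eckart-Young error formula, i.e. that the residual equals the square root of the sum of the discarded squared singular values. It is an illustration, not PyMF's `PCA` class.

```python
import numpy as np

def svd_factorize(V, k):
    """Rank-k factorization V ~ W H with orthonormal W (W^T W = I) via truncated SVD."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    W = U[:, :k]        # leading k left singular vectors
    H = W.T @ V         # equals diag(s[:k]) @ Vt[:k]
    return W, H, s

rng = np.random.default_rng(0)
V = rng.random((50, 40))
W, H, s = svd_factorize(V, k=5)
err = np.linalg.norm(V - W @ H)                       # Frobenius norm of the residual
print(np.allclose(W.T @ W, np.eye(5)))                # True
print(np.isclose(err, np.sqrt((s[5:] ** 2).sum())))   # True (Eckart-Young)
```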
  • 16. Example PCA:
    >>> from pymf.pca import PCA
    >>> import numpy as np
    >>> mdl = PCA(data, num_bases=2)
    >>> mdl.factorize()
    >>> V_approx = np.dot(mdl.W, mdl.H)
    • Usefulness for data analysis is questionable: the basis vectors are usually not interpretable
    [Figure: V ≈ V_approx and the PCA basis vectors W]
  • 17. Non-negative Matrix Factorization (NMF) [4] • For V ≥ 0, a constrained quadratic optimization problem: $\min_{W,H} \|V - WH\|^2$ s.t. $W \ge 0$, $H \ge 0$ • A globally optimal solution provably exists, but algorithms guaranteed to find it remain elusive; exact NMF is NP-hard • W often converges to parts-based (partial) representations • Active area of research • Goal: reconstruct data from independent parts
    [4] D. D. Lee and H. S. Seung, "Learning the Parts of Objects by Non-Negative Matrix Factorization", Nature, 401(6755), 1999.
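A common way to attack this problem is the multiplicative update rule associated with Lee and Seung for the squared Frobenius loss: each factor is multiplied entry-wise by a non-negative ratio, so non-negativity is preserved automatically and the loss does not increase. The sketch below is a plain NumPy version under stated assumptions (random initialization, a small `eps` against division by zero, a fixed iteration count); it is not necessarily what PyMF's `NMF` class does internally.

```python
import numpy as np

def nmf_multiplicative(V, k, niter=200, seed=0, eps=1e-10):
    """NMF V ~ W H with W, H >= 0, via multiplicative updates for ||V - W H||_F^2."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(niter):
        # Multiplying by non-negative ratios keeps every entry >= 0
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(3)
V = rng.random((30, 20))                         # non-negative data
W1, H1 = nmf_multiplicative(V, k=4, niter=1)
W, H = nmf_multiplicative(V, k=4)
print(np.linalg.norm(V - W @ H) < np.linalg.norm(V - W1 @ H1))  # True: loss went down
```

Because the updates only rescale entries, any entry initialized (or driven) to zero stays zero; that is one reason the method tends toward sparse, parts-like factors, and also why initialization matters.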
  • 18. Example NMF:
    >>> from pymf.nmf import NMF
    >>> import numpy as np
    >>> mdl = NMF(data, num_bases=2, iter=50)
    >>> mdl.factorize()
    >>> V_approx = np.dot(mdl.W, mdl.H)
    • Additive combination of parts • Interesting options for data analysis
    [Figure: V ≈ V_approx and the NMF basis vectors W]
  • 19. Archetypal Analysis (AA) [5] • Convexity-constrained quadratic optimization problem: $\min_{W,H} \|V - VWH\|^2$ s.t. $w_{l,i} \ge 0$, $\sum_l w_{l,i} = 1$, $h_{k,i} \ge 0$, $\sum_k h_{k,i} = 1$ • Reconstruct data from its archetypes, i.e. as convex combinations of extreme ("polar opposite") points, which are themselves convex combinations of data points • Yields novel and intuitive insights into data • Great for interpretable data representations! • $O(n^2)$, but efficient approximations for large data exist
    [5] A. Cutler and L. Breiman, "Archetypal Analysis", Technometrics, 36(4), 1994.
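The formulation above can be sketched with alternating projected gradient steps: take a gradient step on H (or W) with step size 1/L, where L bounds the Lipschitz constant of that block's gradient, then project every column back onto the probability simplex. This is a substitute technique chosen for brevity, not Cutler and Breiman's original alternating constrained least-squares procedure and not PyMF's `AA` solver; the function names, iteration count, and toy triangle data are assumptions. Here W is n × k, so VW holds the k archetypes.

```python
import numpy as np

def project_columns_to_simplex(M):
    """Euclidean projection of every column of M onto the probability simplex
    (the sort-based method of Duchi et al., 2008)."""
    k, n = M.shape
    U = -np.sort(-M, axis=0)                    # each column sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0
    idx = np.arange(1, k + 1)[:, None]
    cond = U - css / idx > 0                    # holds on a leading prefix of each column
    rho = k - np.argmax(cond[::-1], axis=0)     # length of that prefix
    theta = css[rho - 1, np.arange(n)] / rho    # per-column shift
    return np.maximum(M - theta, 0.0)

def archetypal_analysis(V, k, niter=300, seed=0):
    """Projected-gradient sketch of min ||V - V W H||_F^2 with column-stochastic
    W (n x k) and H (k x n). Returns the archetypes V W together with W and H."""
    rng = np.random.default_rng(seed)
    n = V.shape[1]
    W = project_columns_to_simplex(rng.random((n, k)))
    H = project_columns_to_simplex(rng.random((k, n)))
    norm_vtv = np.linalg.norm(V.T @ V, 2)       # spectral norm, reused for the W-step
    for _ in range(niter):
        Z = V @ W                               # current archetypes (m x k)
        # H-step: gradient 2 Z^T (Z H - V), Lipschitz constant 2 ||Z^T Z||_2
        L_h = 2.0 * np.linalg.norm(Z.T @ Z, 2) + 1e-12
        H = project_columns_to_simplex(H - 2.0 * Z.T @ (Z @ H - V) / L_h)
        # W-step: gradient 2 V^T (V W H - V) H^T, Lipschitz bound 2 ||V^T V||_2 ||H H^T||_2
        L_w = 2.0 * norm_vtv * np.linalg.norm(H @ H.T, 2) + 1e-12
        W = project_columns_to_simplex(W - 2.0 * V.T @ (V @ W @ H - V) @ H.T / L_w)
    return V @ W, W, H

rng = np.random.default_rng(1)
corners = np.array([[0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])                 # vertices of a 2-D triangle
V = corners @ rng.dirichlet(np.ones(3), size=200).T   # 200 points inside it, as columns
Z0, W0, H0 = archetypal_analysis(V, k=3, niter=0)     # initialization only
Z, W, H = archetypal_analysis(V, k=3)
print(np.linalg.norm(V - Z @ H) <= np.linalg.norm(V - Z0 @ H0))  # True
```

Each block step with step size at most 1/L cannot increase the objective, so the loop is monotone; it may still stop at a local optimum, since the joint problem in W and H is not convex.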
  • 20. Example Archetypal Analysis:
    >>> from pymf.aa import AA
    >>> import numpy as np
    >>> mdl = AA(data, num_bases=2, iter=50)
    >>> mdl.factorize()
    >>> V_approx = np.dot(mdl.W, mdl.H)
    • Basis vectors are built from existing data points • Convex combinations allow a probabilistic interpretation
    [Figure: V ≈ V_approx and the archetypes W]
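  • 21. Method Summary • Common form: V = WH (or V = VWH for AA)

    Method  | W constraint                              | H constraint                              | Outcome
    --------|-------------------------------------------|-------------------------------------------|-------------
    PCA     | -                                         | -                                         | compressed V
    K-means | -                                         | $h_{k,i} \in \{0,1\}$, $\sum_k h_{k,i} = 1$ | groups
    NMF     | $W \ge 0$                                 | $H \ge 0$                                 | parts
    AA      | $w_{l,i} \ge 0$, $\sum_l w_{l,i} = 1$     | $h_{k,i} \ge 0$, $\sum_k h_{k,i} = 1$     | opposites

    • Works not only for images ;) • More complex constraints usually require more complex solvers • An active area of research develops approximations for large data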
  • 22. Large matrices: PyMF and h5py
    >>> import h5py
    >>> import numpy as np
    >>> from pymf.sivm import SIVM  # uses [6]
    >>> file = h5py.File('myfile.hdf5', 'w')
    >>> file['dataset'] = np.random.random((100, 1000))
    >>> file['W'] = np.random.random((100, 10))
    >>> file['H'] = np.random.random((10, 1000))
    >>> sivm_mdl = SIVM(file['dataset'], num_bases=10)
    >>> sivm_mdl.W = file['W']
    >>> sivm_mdl.H = file['H']
    >>> sivm_mdl.factorize()
    [6] Thurau, Kersting, and Bauckhage, "Simplex Volume Maximization for Descriptive Web-Scale Matrix Factorization", CIKM 2010.
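The point of passing an h5py dataset is that the data need not fit in memory. As a generic illustration of that pattern (not of how SIVM itself works), the sketch below computes least-squares coefficients for a fixed basis W one column block at a time; it only reads `data[:, start:stop]`, which an h5py `Dataset` supports just like a NumPy array. The function name `coefficients_in_blocks` and the block size are hypothetical, and the example uses an in-memory array so that h5py is not required.

```python
import numpy as np

def coefficients_in_blocks(data, W, block_size=256):
    """Least-squares H = argmin ||data - W H||_F, computed one column block at a time.

    `data` only needs 2-D slicing and a `.shape` (a NumPy array, or e.g. an h5py
    Dataset), so the full matrix never has to be held in memory at once.
    """
    n = data.shape[1]
    H = np.empty((W.shape[1], n))
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        block = np.asarray(data[:, start:stop])                      # read only this slice
        H[:, start:stop] = np.linalg.lstsq(W, block, rcond=None)[0]  # columns are independent
    return H

rng = np.random.default_rng(0)
data = rng.random((100, 1000))   # in-memory stand-in for file['dataset']
W = rng.random((100, 10))        # hypothetical fixed basis
H = coefficients_in_blocks(data, W)
print(np.allclose(H, np.linalg.lstsq(W, data, rcond=None)[0]))  # True
```

Blocking works here because the least-squares problem separates over columns of the data; the block size only trades memory against the number of reads.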
  • 24. Take Home Message • Most clustering and data analysis methods are matrix approximations • Imposed constraints shape the factorization and yield different views on the data • One of the most effective and versatile tools for data exploration! • Python implementation → http://github.com/cthurau/pymf
  • 25. Thank you for your attention! christian.thurau@unbelievable-machine.com