Chiraz Nafouki

Aug 8, 2016 • 95 views

Graph Signal Processing: Handwritten Digits Recognition Via Community Detection

Abstract—Graph signal processing is an emerging field of research. When the structure of signals can be represented as a graph, it allows us to fully exploit their inherent structure. It has been shown that the normalized graph Laplacian matrix plays a major role in the characterization of a signal on a graph. In this paper we are interested in using the spectrum of this matrix to solve classical problems. More precisely, we aim to detect communities in order to recognize image digits. Indeed, we use the spectrum of the normalized graph Laplacian as a suitable method to detect two communities in a graph. We show that this method yields better results than many state-of-the-art algorithms. Then, we use the same spectrum to recognize handwritten digit images. We compare the spectral clustering method with some other classical algorithms, emphasizing the advantages of spectral clustering in community detection and semi-supervised classification applications.

Index Terms—Graph signal processing, Community detection, Digit recognition, Normalized graph Laplacian.

I. INTRODUCTION

In recent years, the analysis and processing of large-scale datasets using graphs has become very useful [8]. In fact, many kinds of data domains, such as social and economic networks, electric grids, neuronal networks and image databases, require a graph representation of their structure. Each of these structures usually carries information that flows between different elements of the network. For example, in a neural network, a neuron is activated after receiving an electric excitation, and the activation of a neuron usually influences the nearby neurons. In the case of economic networks, we can consider an economic crisis as a flow that spreads from one bank to another. The need to represent these phenomena has led to the development of a new field: graph signal processing.
Indeed, a continuous signal can be sampled at a specific frequency, and the resulting discrete signal is usually carried on a graph [10]. In this way, we obtain at the same time a representation of the structure of the network and of the information flowing through it. For instance, a sound signal can be represented on a linear or a ring graph, whereas a picture is usually represented on a grid graph where each pixel is linked to its four or eight nearest neighbors [8]. Weighted graphs are particularly suited to represent the links and similarities between the different elements of a network. The advantage of signals on graphs is that they can be processed in a way analogous to classical signal processing [1]. The main applications of graph signal processing today appear in the field of artificial intelligence, and especially in machine learning. Community detection and digit recognition are among the best-known applications in this domain [4]. For community detection, many methods have been used, but graph signal processing with the spectral clustering method seems to be more efficient. Furthermore, many algorithms are used nowadays to classify handwritten digits, such as the k-means algorithm [5], but graph signal processing can also be used for the same purpose. In this paper, we present a method based on graph signal processing, known as spectral clustering, to solve the community detection problem, and we provide a mathematical proof of this method. The same method is then applied to recognize handwritten digits from the MNIST database; this variant is based on the smoothness properties of signals defined on graphs. The remainder of the paper is as follows. In the next section, we provide some background on graph signal processing.
In Section III, we present the spectral clustering method applied to both community detection and handwritten digit recognition. Then, we discuss our results compared to the state of the art in order to identify both the advantages and the drawbacks of the spectral clustering method. Section V concludes the paper.

II. GRAPH SIGNAL PROCESSING

Let us first introduce notations. We consider a weighted, simple, undirected graph G = (V, E), where V represents the set of vertices and E the set of edges. Without loss of generality, we take V to be the set of integers between 1 and N = |V|. We equip G with an N × N adjacency matrix W defined as follows [9]:

W_{i,j} = \begin{cases} \text{the weight of the edge connecting } i \text{ and } j \\ 0 \text{ if no such edge exists} \end{cases}   (1)

When the edge weights are not naturally defined by an application, one common way to define the weight of an edge connecting vertices i and j is via a similarity function such as a distance:

W_{i,j} = \mathrm{dist}(i, j)   (2)

where dist(i, j) may represent a physical distance between two feature vectors describing the nodes i and j. We also define the N × N diagonal degree matrix D as:

D_{i,i} = d_i = \sum_{k=1}^{N} W_{i,k}   (3)

For instance, a social network can be represented by a weighted, simple, undirected graph, where the vertices are the individuals and the edges represent the friendship bond between two individuals. In this case, the degree matrix gives
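As a minimal illustration of Eqs. (1)–(3), the following NumPy sketch builds the adjacency matrix W and the diagonal degree matrix D for a toy weighted path graph (the graph and its weights are our own example, not taken from the paper):

```python
import numpy as np

# Toy weighted, simple, undirected graph on 4 vertices: a path 1-2-3-4.
N = 4
W = np.zeros((N, N))
edges = [(0, 1, 0.5), (1, 2, 1.0), (2, 3, 0.8)]  # (i, j, weight), 0-based indices
for i, j, w in edges:
    W[i, j] = W[j, i] = w  # undirected: W is symmetric (Eq. 1)

# Diagonal degree matrix D, with D[i, i] = d_i = sum_k W[i, k]  (Eq. 3)
D = np.diag(W.sum(axis=1))

print(np.diag(D))  # degrees: [0.5 1.5 1.8 0.8]
```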
us an idea of how important the friendship links of each individual are. We then introduce the non-normalized graph Laplacian L = D − W [9]. This matrix turns out to be of major importance, as it acts as a differentiation operator for a signal over a graph. Recall that a signal over a graph G is a vector x ∈ R^N whose i-th component represents the function value at the i-th vertex of V. The i-th component of the Laplacian of such a signal is:

(Lx)(i) = \sum_{j=1}^{N} W_{i,j}\,[x(i) - x(j)]   (4)

For example, in the case of the social network, a signal can represent a rumor: the individuals who received the rumor are given the value 1 and those who did not are given the value 0. We thus obtain a binary signal on the graph. When working with the L2-norm, it makes sense to use instead the normalized graph Laplacian, defined as [9]:

\mathcal{L} = D^{-1/2} \cdot L \cdot D^{-1/2}   (5)

Since the normalized (or standard) graph Laplacian is a real-valued symmetric matrix, it can be diagonalized in an orthonormal basis. We denote a corresponding set of orthonormal eigenvectors by {μ_l}_{l=1,2,...,N} and the set of associated real, non-negative eigenvalues by {λ_l}_{l=1,2,...,N}, ordered from the lowest eigenvalue to the largest. In particular, we have:

\mathcal{L}\mu_l = \lambda_l \mu_l   (6)

It is well known that [6]:

0 = \lambda_1 \leq \lambda_2 \leq \ldots \leq \lambda_N = \lambda_{max} \leq 2   (7)

The literature gives many results binding eigenvalues to properties of the graph. As an example, the number of connected components of the graph is given by the multiplicity of the eigenvalue zero: if the graph is connected, the multiplicity of the eigenvalue zero is one. Also, the highest eigenvalue is equal to 2 if and only if the graph is bipartite. The first eigenvector μ_1 has a closed form given by [6]:

\mu_1(i) = \sqrt{\frac{d(i)}{\sum_{u \in V} d(u)}}   (8)

In the case of regular graphs, all the vertices have the same degree, so μ_1 is a constant vector.
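The spectral bounds of Eq. (7) and the connectivity/bipartiteness facts can be checked numerically. The sketch below (a toy ring graph of our choosing) forms L = D − W and the normalized Laplacian of Eq. (5), then verifies that the lowest eigenvalue is 0 with multiplicity one (connected graph) and that the largest eigenvalue reaches 2 (an even ring is bipartite):

```python
import numpy as np

# Ring graph on N vertices with unit weights.
N = 8
W = np.zeros((N, N))
for i in range(N):
    W[i, (i + 1) % N] = W[(i + 1) % N, i] = 1.0

D = np.diag(W.sum(axis=1))
L = D - W                                       # non-normalized Laplacian
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_norm = D_inv_sqrt @ L @ D_inv_sqrt            # normalized Laplacian (Eq. 5)

lam, U = np.linalg.eigh(L_norm)  # eigh: ascending eigenvalues, orthonormal eigenvectors

# Connected graph: single zero eigenvalue; even ring is bipartite: lambda_max = 2.
print(np.isclose(lam[0], 0.0), np.isclose(lam[-1], 2.0))  # True True
```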
Eigenvectors of the normalized graph Laplacian extend the principles of the Fourier transform from classical signal processing. To understand this connection, recall that the classical Fourier transform of a signal f is given by:

\tilde{f}(\omega) = \int_{\mathbb{R}} f(x)\, e^{-i\omega x}\, dx   (9)

Since

\frac{d^2}{dx^2} e^{-i\omega x} = -\omega^2 e^{-i\omega x}   (10)

we can notice that e^{−iωx} is an eigenfunction of the Laplace operator d²/dx², associated with the eigenvalue −ω². On the other hand, we have:

\mathcal{L}\mu_l = \lambda_l \mu_l   (11)

so the frequencies in classical signal processing are analogous to the eigenvalues of the normalized Laplacian in graph signal processing. Consequently, the Fourier transform x̃ of a signal x on the graph G is defined as [6]:

\tilde{x}(\lambda_l) = \sum_{i=1}^{N} x(i)\, \mu_l^*(i)   (12)

where μ_l^* denotes the complex conjugate of the eigenvector μ_l. The inverse graph Fourier transform is defined as [8]:

x(i) = \sum_{l=1}^{N} \tilde{x}(\lambda_l)\, \mu_l(i)   (13)

Finally, to characterize the smoothness of a signal on the graph G, one can use the Dirichlet form [6]:

S(x) = \frac{x^T \mathcal{L} x}{\|x\|^2} = \frac{1}{\|x\|^2} \sum_{l=1}^{N} \lambda_l \langle x, \mu_l \rangle^2   (14)

The smaller S(x) is, the smoother the signal x.

Figure 1. A positive graph signal defined on the Petersen graph. The height of each blue bar represents the signal value at the vertex where the bar originates [8].

Figure 2. Representation of the 16-cycle graph Laplacian eigenvectors. The eigenvectors exhibit the sinusoidal characteristics of the Fourier transform basis. Signals defined on this graph are equivalent to classical discrete, periodic signals.
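The graph Fourier transform of Eq. (12) and the two expressions of the Dirichlet form in Eq. (14) can be related in a few lines. The sketch below uses a toy path graph of our own choosing and checks the identity x^T 𝓛 x = Σ_l λ_l ⟨x, μ_l⟩², together with the fact that μ_1 (proportional to the square root of the degrees, Eq. 8) is the smoothest possible signal:

```python
import numpy as np

# Toy path graph for illustration (an assumption, not the paper's digit graph).
N = 16
W = np.zeros((N, N))
for i in range(N - 1):
    W[i, i + 1] = W[i + 1, i] = 1.0
D = np.diag(W.sum(axis=1))
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(D)))
L_norm = D_inv_sqrt @ (D - W) @ D_inv_sqrt

lam, U = np.linalg.eigh(L_norm)      # columns of U are the eigenvectors mu_l

def gft(x):
    # x_tilde(lambda_l) = sum_i x(i) mu_l(i)   (Eq. 12, real eigenvectors)
    return U.T @ x

def smoothness(x):
    # S(x) = x^T L x / ||x||^2                 (Eq. 14)
    return x @ L_norm @ x / (x @ x)

x = np.random.default_rng(0).standard_normal(N)
# Eq. 14 identity: the quadratic form equals the lambda-weighted spectral energy.
print(np.isclose(x @ L_norm @ x, (lam * gft(x) ** 2).sum()))  # True
print(smoothness(np.sqrt(np.diag(D))) < 1e-9)                 # mu_1 has S = 0: True
```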
III. SPECTRAL CLUSTERING

A. Community Detection

Detecting communities is a problem with many variants in mathematics and computer science [4]. Over the years, several methods have been developed for data partitioning. One of these methods, known as "spectral clustering", uses some properties of the normalized graph Laplacian. In this section, we are interested in detecting communities using only the properties of the normalized graph Laplacian spectrum.

Considering a population of individuals that can be partitioned into two communities according to their properties, our aim is to detect these two communities. To achieve this, we consider a random graph characterized by two probabilities p ∈ [0, 1] and q ∈ [0, 1], where p is the probability of a link between two individuals belonging to the same community and q is the probability of a connection between two individuals belonging to different communities. The vertices of this graph are the individuals, the edges are the connections between them, and the weight of an edge is p if it is an intra-community link and q if it is an inter-community link. We suppose that p ≥ q. In the limit case (p, q) = (1, 0), the error rate of detecting the two communities should tend to zero; in the other limit case, p = q, the error rate should tend to 0.5.

If we take G to be the graph representing the population and 𝓛 its normalized graph Laplacian, the signs of the components of 𝓛's second eigenvector allow us to partition the set of vertices into two communities: components with the same sign belong to the same community. We provide a proof of this principle. The graph of the population is represented by a statistical mean adjacency matrix.
Instead of using a random graph with two different probabilities p and q, we can consider a complete simple graph where the weight of the edges linking two items of the same community is equal to p, and the weight of the edges linking two items of different communities is equal to q. This matrix is the statistical mean of the adjacency matrices of the Erdős–Rényi random graphs generated with the probabilities p and q respectively [2]. The adjacency matrix A corresponding to this situation is given in block form as:

A = \begin{pmatrix} p\,(J_{N^*} - I_{N^*}) & q\,J_{N^*} \\ q\,J_{N^*} & p\,(J_{N^*} - I_{N^*}) \end{pmatrix}   (15)

where I_{N*} denotes the identity matrix of size N*, J_{N*} the N* × N* matrix containing a one in each component, and N* the cardinality of each community (so that N = 2N*). The first N* vertices belong to one community and the others to the other. The graph considered is regular, so the degree matrix is a scalar matrix:

D = ((N^* - 1)\,p + N^* q)\, I_N   (16)

The graph Laplacian L = D − A is then given by:

L = \begin{pmatrix} N^*(p + q)\,I_{N^*} - p\,J_{N^*} & -q\,J_{N^*} \\ -q\,J_{N^*} & N^*(p + q)\,I_{N^*} - p\,J_{N^*} \end{pmatrix}   (17)

Since we are interested in finding the second eigenvector of 𝓛, and given that L and 𝓛 have the same eigenspaces (D being scalar), we only need the second eigenvector of L. To this end, we compute the characteristic polynomial of L, denoted χ_L, in order to obtain the eigenvalues and then their corresponding eigenspaces (in particular, the eigenspace associated with the second eigenvalue). We have:

\chi_L(x) = \det(L - x\,I_N) = \det \begin{pmatrix} M_1 & M_2 \\ M_2 & M_1 \end{pmatrix}   (18)

where M_1 = (N^*(p + q) - x)\,I_{N^*} - p\,J_{N^*} and M_2 = -q\,J_{N^*}. Since J_{N*} and I_{N*} commute, M_1 and M_2 commute, and consequently:

\chi_L(x) = \det(M_1^2 - M_2^2) = \det(M_1 - M_2) \cdot \det(M_1 + M_2)   (19)

We can verify that for p ≠ q:

\chi_L(x) \propto (p^2 - q^2)^{N^*} \cdot \chi_{J_{N^*}}\!\left(\frac{N^*(p + q) - x}{p - q}\right) \cdot \chi_{J_{N^*}}\!\left(\frac{N^*(p + q) - x}{p + q}\right)   (20)

and, knowing that:

\chi_{J_{N^*}}(x) \propto x^{N^* - 1} \cdot (x - N^*)   (21)

we can prove that:

\lambda_2 = 2 \cdot N^* \cdot q   (22)
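Eq. (22) can be verified numerically on the mean block adjacency matrix of Eq. (15). The sketch below (with arbitrary values of N*, p and q chosen for illustration) builds A and L = D − A and checks that the second smallest eigenvalue equals 2N*q:

```python
import numpy as np

# Mean two-community model: intra-community weight p, inter-community weight q.
Nstar, p, q = 50, 0.8, 0.2
J = np.ones((Nstar, Nstar))
I = np.eye(Nstar)
A = np.block([[p * (J - I), q * J],
              [q * J,       p * (J - I)]])  # Eq. 15
D = np.diag(A.sum(axis=1))                  # regular graph: scalar degree matrix
L = D - A

lam = np.linalg.eigvalsh(L)                 # ascending eigenvalues
print(np.isclose(lam[1], 2 * Nstar * q))    # True: lambda_2 = 2 N* q  (Eq. 22)
```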
Expressing ker(L − λ_2 I_N) explicitly, we find that the second eigenvector of L takes the form:

\mu_2 = [\underbrace{x_1, x_1, \ldots, x_1}_{N^*}, \underbrace{x_2, x_2, \ldots, x_2}_{N^*}]   (23)

The graph is regular, so μ_1 is a constant vector, and thanks to the orthogonality of the eigenvector basis {μ_l}_{l=1,2,...,N} we have:

\langle \mu_1, \mu_2 \rangle = 0   (24)

where ⟨•, •⟩ denotes the classical dot product in a Euclidean space. We conclude that:

x_1 + x_2 = 0   (25)

and hence:

x_1 = -x_2   (26)

Thereby, the components of the second eigenvector having the same sign represent vertices belonging to the same community.
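The sign-based partition argued above can be exercised end to end on a sampled (rather than mean) two-community random graph. The following sketch is our own illustration: it draws an Erdős–Rényi-style graph with intra-probability p and inter-probability q, computes the second eigenvector of the normalized Laplacian, and splits the vertices by sign:

```python
import numpy as np

rng = np.random.default_rng(1)
Nstar, p, q = 100, 0.7, 0.05
N = 2 * Nstar
labels = np.repeat([0, 1], Nstar)                 # ground-truth communities
P = np.where(labels[:, None] == labels[None, :], p, q)
A = np.triu((rng.random((N, N)) < P).astype(float), 1)
A = A + A.T                                       # simple undirected graph

d = A.sum(axis=1)
L_norm = np.eye(N) - A / np.sqrt(np.outer(d, d))  # normalized Laplacian
lam, U = np.linalg.eigh(L_norm)
guess = (U[:, 1] > 0).astype(int)                 # sign of the second eigenvector

# Error rate up to the arbitrary sign of the eigenvector.
err = min(np.mean(guess != labels), np.mean(guess == labels))
print(err < 0.1)  # True with high probability for p = 0.7, q = 0.05
```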
We then compare the performance of this method with two algorithms from the state of the art: the "Reichardt" [7] and "LFK" [3] algorithms. Recall that the "Reichardt" method for community detection uses a greedy optimization of a modularity function Q, the aim being to compare the original network to a randomized one. The weights of the links in the randomized network depend on the probability of linking two nodes in the original network, as follows [7]:

a_{i,j} = w_{i,j} - p_{i,j}, \qquad b_{i,j} = \gamma_R\, p_{i,j}   (27)

where w_{i,j} is the weight of the edge linking nodes i and j in the original network, p_{i,j} is the probability of linking two nodes in the original network, γ_R is an optimization parameter, and a_{i,j} and b_{i,j} are the weights of the links in the randomized network.

The second method for detecting communities, "LFK", consists in detecting a community for each node of a network. To achieve this, we consider that a community is a subgraph which maximizes the fitness function of its nodes. The fitness function F_{SG} of a subgraph SG is defined by [3]:

F_{SG} = \frac{K_{in}^{SG}}{(K_{in}^{SG} + K_{out}^{SG})^{\alpha}}   (28)

where SG is the community, K_{in}^{SG} is twice the number of internal links of SG, K_{out}^{SG} is the number of edges connecting SG to the rest of the graph, and α is a positive real-valued parameter controlling the size of the community.

We compare the performances of these two algorithms with the spectral clustering method in Figure 3. For that, we consider random graphs with a fixed probability of linking two nodes of the same community and we vary the probability of linking two nodes belonging to different communities. The size of these graphs is 200, and each graph is divided into two equal communities.

B. Handwritten Digit Recognition

Handwritten digit recognition is another application of graph signal processing and one of the most classic examples in the classification literature.
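The fitness function of Eq. (28) is simple enough to state directly. The sketch below (the adjacency-set representation and the toy graph are our own choices, not from [3]) evaluates F_SG for one candidate community:

```python
# LFK fitness F_SG = K_in / (K_in + K_out)^alpha  (Eq. 28), unweighted graph.
def fitness(adj, SG, alpha=1.0):
    """adj: dict vertex -> set of neighbours; SG: set of candidate vertices."""
    # K_in counts each internal edge twice, matching the paper's definition.
    k_in = sum(1 for u in SG for v in adj[u] if v in SG)
    k_out = sum(1 for u in SG for v in adj[u] if v not in SG)
    return k_in / (k_in + k_out) ** alpha

# Toy graph: a triangle {0, 1, 2} weakly linked to vertex 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(fitness(adj, {0, 1, 2}))  # K_in = 6, K_out = 1 -> 6/7 ≈ 0.857
```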
It is an important task in semi-supervised machine learning that consists in identifying a digit based on a training database of digits [11]. In our case, we used the MNIST database of handwritten digits, which has a training set of 60,000 examples and a test set of 10,000 examples. All images in the MNIST database are of size 28 × 28. The task is to classify an unlabeled digit into one of a fixed number of digit classes. Our first recognition test aims at identifying a digit i among two possible digits i_1 and i_2, with the conditions:

i, i_1, i_2 \in [0..9], \qquad i_1 \neq i_2, \qquad i \in \{i_1, i_2\}   (29)

Figure 3. Performance of three community detection algorithms (Reichardt, LFK, spectral clustering) on Erdős–Rényi random graphs of size 200, with a fixed probability of linking two individuals of the same community (first case p = 0.3, second case p = 0.9) and a variable probability q of linking two individuals of different communities. We notice that spectral clustering detects the two communities with a lower error rate than the two other methods, and that its error rate tends towards the maximum value more slowly than with the Reichardt and LFK methods.

Figure 4. Representation of the individuals belonging to two communities in the (first eigenvector, second eigenvector) basis [10]. In the first case, the probability of linking two individuals of the same community is p = 0.9 and the probability of linking two individuals from different communities is q = 0.1. In the second case, p = q = 0.4.
We denote by N_1 the number of images of i_1 and by N_2 the number of images of i_2, both taken from the MNIST training database. The graph is therefore composed of N_1 + N_2 + 1 vertices, corresponding to the training examples and to the digit i. There are several ways to construct the graph; one option is to build a complete undirected graph. To represent the link between two digits (two vertices of the graph), we need a metric; in our tests we used the standard Euclidean metric. The weight of an edge linking two images
I and I′ (matrices of size L × C) is 1/(1 + d²(I, I′)), with d(I, I′) defined as the Euclidean distance between I and I′:

d(I, I') = \frac{1}{L \times C} \sqrt{\sum_{k,l} (I_{kl} - I'_{kl})^2}   (30)

where I_{kl} denotes the value of the pixel located at the k-th row and l-th column of the picture I, considered as a matrix of size L × C. In our case, L = C = 28.

The graph carrying the digits i_1 and i_2 can be shown in the basis composed of the first and second normalized graph Laplacian eigenvectors [10]. The first two eigenvectors represent the low frequencies of the signal and therefore give a clear idea of the distance between the different vertices. For instance, in the case of two classes of digits (2 and 5), the associated graph is shown in Figure 5. As we can notice in Figure 5, it is possible to identify two communities, although some vertices belong to different classes while being very close to each other.

Figure 5. Representation of the digits 2 and 5 in the normalized graph Laplacian eigenvector basis (first eigenvector, second eigenvector) [10].

The second option to construct the graph carrying the digits i_1 and i_2 is to link each vertex only to its k nearest neighbors, with a weight equal to one. Here k is a user-defined constant, and the graph obtained is connected but not complete for k < N_1 + N_2. This method is known as the KNN algorithm; it is interesting when handling very large graphs, as it yields a sparse matrix and therefore a better complexity. Since there are two communities of digits, the problem is similar to community detection. Therefore, the first method we use to identify the digit i is an algorithm that produces the normalized graph Laplacian matrix, which is a good representation of the links between the digits in the graph.
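The edge weight of Eq. (30) can be sketched as follows; the two random 28 × 28 arrays stand in for MNIST digit images, which is an assumption made purely for illustration:

```python
import numpy as np

# Edge weight between two images: w = 1 / (1 + d^2),
# d = (1 / (L*C)) * sqrt(sum_{k,l} (I_kl - I'_kl)^2)   (Eq. 30)
L_, C = 28, 28
rng = np.random.default_rng(0)
I1 = rng.random((L_, C))   # toy stand-ins for MNIST images
I2 = rng.random((L_, C))

def weight(Ia, Ib):
    d = np.sqrt(((Ia - Ib) ** 2).sum()) / (Ia.shape[0] * Ia.shape[1])
    return 1.0 / (1.0 + d ** 2)

print(weight(I1, I1))          # identical images: d = 0, weight = 1.0
print(0 < weight(I1, I2) < 1)  # distinct images: weight strictly below 1: True
```

The 1/(1 + d²) form keeps all weights in (0, 1], with similar images linked by heavier edges.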
As in the case of two communities discussed in the previous section, we obtain two signs in the second eigenvector, and by comparing these signs with the sign of the component corresponding to the digit i, it is possible to identify whether i belongs to the community of i_1 or of i_2.

The second method to classify the digit i is to compute the smoothest signal associated with the graph of digits, using the Dirichlet form [6]. The method consists in forming a signal x of size N_1 + N_2 + 1 such that:

x[j] = \begin{cases} 1 & \text{for } j \in [1, N_1] \\ \alpha & \text{for } j = N_1 + 1 \\ -1 & \text{for } j \in [N_1 + 2, N_1 + N_2 + 1] \end{cases}   (31)

where α ∈ [−1, 1] is an unknown parameter corresponding to the signal's component for the digit i. We aim at finding the value of α that makes the signal x as smooth as possible, i.e. that minimizes S(x), the Dirichlet form. The sign of α allows us to identify i:

\text{if } \alpha > 0 \text{ then } i = i_1, \qquad \text{if } \alpha < 0 \text{ then } i = i_2   (32)

The results we obtain with this algorithm show that the smoothest signal is reached for the values {−1, 1} of α; we have a minimum of S(x) with:

\alpha = 1 \text{ if } i = i_1, \qquad \alpha = -1 \text{ otherwise}   (33)

We tested the two methods (the Laplacian and the "smoothest signal" methods) for the digits i_1 = 2 and i_2 = 5, with N_1 = N_2 = 100. Over 100 tests we obtain an error rate of 9.65% for the Laplacian method and of 6.36% for the smoothest signal method. Table I shows the error rates of the Laplacian method, the smoothest signal method and the k-means algorithm (k = 2), all tested on the same MNIST database.

Table I
ERROR RATE FOR DIFFERENT CLASSIFICATION ALGORITHMS

Algorithm      | k-means | Laplacian method | Smoothest signal
Error rate (%) | 11.96   | 9.65             | 6.36

The three algorithms are tested for the digits 2 and 5, where the cardinality of each digit group is 100; 100 tests are realized for each algorithm.

Our second recognition test performs the recognition of one digit i among l different possibilities {i_k}_{k∈[1,l]}, with 2 ≤ l ≤ 10.
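The "smoothest signal" step of Eqs. (31)–(32) can be sketched as a one-dimensional scan over α. The graph below is a small synthetic two-community graph of our own making (a stand-in for the digit graph), with the unlabeled vertex attached mostly to the first community, so the minimizing α should come out positive:

```python
import numpy as np

rng = np.random.default_rng(2)
N1 = N2 = 20
N = N1 + N2 + 1
labels = np.array([0] * N1 + [2] + [1] * N2)   # 2 marks the unknown vertex
p, q = 0.8, 0.1
P = np.where(labels[:, None] == labels[None, :], p, q)
P[N1, :N1] = P[:N1, N1] = p                    # unknown vertex behaves like community 1
A = np.triu((rng.random((N, N)) < P).astype(float), 1)
A = A + A.T
d = A.sum(axis=1)
L_norm = np.eye(N) - A / np.sqrt(np.outer(d, d))

# Signal of Eq. 31: +1 on community 1, alpha on the unknown vertex, -1 on community 2.
x = np.array([1.0] * N1 + [0.0] + [-1.0] * N2)

def S(alpha):
    x[N1] = alpha
    return x @ L_norm @ x / (x @ x)            # Dirichlet form (Eq. 14)

alphas = np.linspace(-1, 1, 201)
best = alphas[np.argmin([S(a) for a in alphas])]
print(best > 0)  # positive alpha -> classify the unknown vertex as i_1 (Eq. 32)
```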
For l = 3, we implemented an algorithm that first recognizes the digit closest to i between the two digits i_1 and i_2, which is the same as a classification with two classes. The digit chosen from {i_1, i_2} as the closest to i is then compared with i_3 using the same two-class algorithm, and the result allows us to identify the digit closest to i in {i_1, i_2, i_3}. Since i ∈ {i_1, i_2, i_3}, this method allows us to recognize the digit i. The same principle can be applied to the case l > 3.

The third and last test is to identify several digits simultaneously among l different possibilities {i_k}_{k∈[1,l]}, with 2 ≤ l ≤ 10. For this case, we started by building a complete graph whose vertices include both the training and the test sets of digits. Then, for each digit to be classified, we use the algorithm of the second recognition test, which performs the recognition of one digit among l different possibilities.

C. Results Analysis

In both applications presented, a random graph is generated. In the first case, we use an Erdős–Rényi graph
[9] with a probability p of linking two vertices of the same community and a probability q of linking two vertices of different communities. The condition p > q is needed in order to distinguish the two communities. The cardinality is the same in both communities, and the adjacency matrix is generated in block form, where the first block represents the first community and the fourth block the second community. In the second case, the digit images are chosen randomly from the MNIST handwritten database. To construct the random graph we use two methods: the first is based on generating a complete graph whose weights are computed using the Euclidean distance; the second involves the KNN algorithm, and the graph generated is binary and not complete.

In both applications, we generate the normalized Laplacian matrix and use the smoothness property of low-frequency signals. In the case of the Laplacian method, we use the second eigenvector, since it is related to the second lowest eigenvalue. In the first application, the detection of the two communities using the Laplacian (spectral clustering) is more accurate than some of the classical methods such as the Reichardt or LFK algorithms: the error rate of spectral clustering is lower and increases more slowly with q compared to the other two algorithms. On the other hand, this method is essentially designed to detect two communities; within this setting, it remains a very efficient algorithm. The two other algorithms tested (Reichardt and LFK) detect in some cases more than two communities. Moreover, the spectral clustering algorithm is faster than the two other algorithms (Reichardt and LFK).

Table II
EXECUTION TIME (IN SECONDS) FOR DIFFERENT ALGORITHMS AS A FUNCTION OF THE GRAPH SIZE

Graph size | Reichardt algorithm | LFK algorithm | Spectral clustering method
20         | 0.002               | 0.0869        | 9.98e-04
50         | 0.0026              | 0.1451        | 0.0021
100        | 0.0101              | 0.5483        | 0.0103
200        | 0.0582              | 2.5145        | 0.0540
Table II compares the execution times of the different algorithms tested.

IV. FUTURE WORK

Our future work consists in partitioning a set of data into more than two communities, in order to generalize the principle of spectral clustering. To that end, we are considering applying the method presented in this paper hierarchically on a data set: by applying our algorithm m times, we would be able to recover 2^m communities. The number of iterations of our algorithm would be chosen by optimizing a well-defined stability criterion.

V. CONCLUSION

This paper shows the importance of processing signals on graphs and the advantages of using the normalized graph Laplacian in this processing. The low frequencies of the Laplacian indeed carry interesting information about the structure of the graph it represents. The use of a metric to characterize the distance between the vertices gives us a better idea of the links between the different vertices of the graph. These properties of the Laplacian matrix are used in two classical applications of the machine learning literature: community detection and handwritten digit recognition. Spectral clustering allows us, in the first case, to detect two unlabeled communities based only on the structure of the graph and, in the second case, to classify one or many digits into labeled classes of training digits based on the similarities between the training set of digits and the digits to be classified. Spectral clustering achieves better efficiency than some classical algorithms, but remains limited by the fact that it can detect only two communities at once. Therefore, the study of the normalized graph Laplacian spectrum provides us with solutions to some frequent applications. There are many other use cases that can be treated using the graph Laplacian method and that deserve consideration in further studies.

VI.
ACKNOWLEDGEMENTS

The authors would like to thank Vincent Gripon, associate professor at Télécom Bretagne, for giving us the opportunity to work in the domain of graph signal processing and for helping us improve our work through his constructive comments.

REFERENCES

[1] Ameya Agaskar and Yue M. Lu. A spectral graph uncertainty principle. IEEE Transactions on Information Theory, 59(7):4338–4356, 2013.
[2] Paul Erdős and Alfréd Rényi. On the existence of a factor of degree one of a connected random graph. Acta Mathematica Hungarica, 17(3-4):359–368, 1966.
[3] Andrea Lancichinetti, Santo Fortunato, and János Kertész. Detecting the overlapping and hierarchical community structure in complex networks. New Journal of Physics, 11(3):033015, 2009.
[4] Erwan Le Martelot and Chris Hankin. Fast multi-scale community detection based on local criteria within a multi-threaded algorithm. arXiv preprint arXiv:1301.0955, 2013.
[5] Mohammad Norouzi and David J. Fleet. Cartesian k-means. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3017–3024. IEEE, 2013.
[6] Michael G. Rabbat and Vincent Gripon. Towards a spectral characterization of signals supported on small-world networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4793–4797. IEEE, 2014.
[7] Jörg Reichardt and Stefan Bornholdt. Statistical mechanics of community detection. Physical Review E, 74(1):016110, 2006.
[8] David I. Shuman, Sunil K. Narang, Pascal Frossard, Antonio Ortega, and Pierre Vandergheynst. The emerging field of signal processing on graphs: Extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Processing Magazine, 30(3):83–98, 2013.
[9] David I. Shuman, Pierre Vandergheynst, and Pascal Frossard. Distributed signal processing via Chebyshev polynomial approximation. arXiv preprint arXiv:1111.5239, 2011.
[10] Daniel A. Spielman. Spectral graph theory and its applications.
In IEEE Symposium on Foundations of Computer Science (FOCS), pages 29–38. IEEE, 2007.
[11] M. van Breukelen, Robert P. W. Duin, David M. J. Tax, and J. E. den Hartog. Handwritten digit recognition by combined classifiers. Kybernetika, 34(4):381–386, 1998.