
Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Systems

In recent years, the amount of data generated and collected has been growing at an almost exponential rate. At the same time, this data has been recognized as valuable for the insights it can provide. Extracting that value, however, requires powerful analysis tools, since valuable insights are buried deep in large amounts of noise. Unfortunately, analytic capacities have not scaled with the growing data volumes: many existing tools run only on a single computer, whose memory limits the size of the data they can handle. A promising way to deal with large-scale data is to scale systems out and exploit parallelism.
In this presentation, we propose Gilbert, a distributed sparse linear algebra system, to address the imminent lack of analytic capacity. Gilbert offers a MATLAB-like programming language for linear algebra programs, which are executed in parallel automatically. Transparent parallelization is achieved by first compiling the linear algebra operations into an intermediate representation. This language-independent form enables high-level algebraic optimizations. Different optimization strategies are evaluated, and a cost-based optimizer chooses the best one. The optimized result is then transformed into a format suitable for parallel execution. Gilbert generates execution plans for Apache Spark and Apache Flink, two massively parallel dataflow systems. Distributed matrices are represented as square blocks to guarantee a well-balanced trade-off between data parallelism and data granularity.
An exhaustive evaluation shows that Gilbert can process amounts of data exceeding the memory of a single computer on clusters of different sizes. Two well-known machine learning (ML) algorithms, namely PageRank and Gaussian non-negative matrix factorization (GNMF), are implemented with Gilbert, and their performance is compared to optimized implementations based on Spark and Flink. Even though Gilbert is not as fast as the optimized algorithms, it simplifies the development process significantly thanks to its high-level programming abstraction.



  1. Gilbert: Declarative Sparse Linear Algebra on Massively Parallel Dataflow Systems. Till Rohrmann (Apache Software Foundation); Sebastian Schelter, Tilmann Rabl, Volker Markl (Technische Universität Berlin). March 8, 2017. Outline: Motivation, Gilbert, Evaluation, Conclusion.
  2. Motivation
  3. Information Age: collected data grows exponentially; valuable information is stored in that data; need for scalable analytical methods.
  4. Distributed Computing and Data Analytics: writing parallel algorithms is tedious and error-prone; a huge existing code base exists in the form of libraries; need for a parallelization tool.
  5. Requirements: linear algebra is the lingua franca of analytics; parallelize programs automatically to simplify development; sparse operations to support sparse problems efficiently. Goal: development of a distributed sparse linear algebra system.
  6. Gilbert
  7. Gilbert in a Nutshell
  8. System Architecture
  9. Gilbert Language: subset of the MATLAB(R) language; support for basic linear algebra operations; a fixpoint operator serves as a side-effect-free loop abstraction; expressive enough to implement a wide variety of machine learning algorithms. Example:
     A = rand(10, 2);
     B = eye(10);
     A' * B;
     f = @(x) x .^ 2.0;
     eps = 0.1;
     c = @(p, c) norm(p - c, 2) < eps;
     fixpoint(1/2, f, 10, c);
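The fixpoint operator on this slide is easy to mimic outside Gilbert; a minimal Python sketch follows (the signature fixpoint(x0, f, max_iter, converged) mirrors the slide's example; the body is an assumption, not Gilbert's actual implementation):

```python
# Minimal sketch of a side-effect-free fixpoint loop. The signature
# mirrors the slide's fixpoint(x0, f, maxIter, c); the body is an
# assumption, not Gilbert's actual implementation.
def fixpoint(x0, f, max_iter, converged=None):
    x = x0
    for _ in range(max_iter):
        nxt = f(x)
        # Stop early once the convergence predicate holds.
        if converged is not None and converged(x, nxt):
            return nxt
        x = nxt
    return x

# Mirror of the MATLAB example: iterate x -> x^2 starting at 1/2,
# stopping once successive iterates differ by less than eps = 0.1.
f = lambda x: x ** 2.0
c = lambda p, n: abs(p - n) < 0.1
result = fixpoint(0.5, f, 10, c)
print(result)  # converges after three applications of f
```

Because the loop body is a pure function of the previous iterate, each iteration can be planned as one dataflow round, which is what makes the abstraction parallelizable.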
  10. Gilbert Typer: MATLAB is dynamically typed, but dataflow systems require type knowledge at compile time. Automatic type inference using the Hindley-Milner type inference algorithm; matrix dimensions are also inferred for optimizations. Example:
     A = rand(10, 2) : Matrix(Double, 10, 2)
     B = eye(10) : Matrix(Double, 10, 10)
     A' * B : Matrix(Double, 2, 10)
     f = @(x) x .^ 2.0 : N -> N
     eps = 0.1 : Double
     c = @(p, c) norm(p - c, 2) < eps : (N, N) -> Boolean
     fixpoint(1/2, f, 10, c) : Double
  11. Intermediate Representation & Gilbert Optimizer: language-independent representation of linear algebra programs; the abstraction layer facilitates easy extension with new programming languages (such as R); enables language-independent optimizations: transpose push-down and matrix multiplication re-ordering.
  12. Distributed Matrices: (a) row partitioning, (b) quadratic block partitioning. Which partitioning is better suited for matrix multiplications? io_cost_row = O(n^3), io_cost_block = O(n^2 · sqrt(n)).
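The two cost formulas can be compared numerically; a back-of-the-envelope Python sketch using the slide's asymptotic expressions with constant factors dropped (the concrete numbers are illustrative only, not measured I/O):

```python
import math

# The slide's asymptotic I/O costs for multiplying two n x n matrices,
# with constant factors dropped: row partitioning ships O(n^3) elements,
# quadratic block partitioning O(n^2 * sqrt(n)).
def io_cost_row(n):
    return n ** 3

def io_cost_block(n):
    return n ** 2 * math.sqrt(n)

# The advantage of block partitioning grows like sqrt(n):
for n in (100, 10_000, 1_000_000):
    ratio = io_cost_row(n) / io_cost_block(n)
    print(f"n = {n:>9}: row/block cost ratio = {ratio:,.0f}")
```

The ratio row/block equals sqrt(n), which is why the quadratic blocking pays off more the larger the matrices get.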
  13. Distributed Operations: Addition. Apache Flink and Apache Spark offer a MapReduce-like API with additional operators: join, coGroup, cross.
  14. Evaluation
  15. Gaussian Non-Negative Matrix Factorization: given V ∈ R^(d×w), find W ∈ R^(d×t) and H ∈ R^(t×w) such that V ≈ WH. Used in many fields: computer vision, document clustering, and topic modeling. An efficient distributed implementation exists for MapReduce systems. Algorithm:
     H ← randomMatrix(t, w)
     W ← randomMatrix(d, t)
     while ‖V − WH‖^2 > eps do
       H ← H · (W^T V / W^T W H)
       W ← W · (V H^T / W H H^T)
     end while
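The multiplicative update rules above translate almost line-for-line to NumPy; a minimal single-node sketch with tiny, assumed dimensions (Gilbert's distributed blocking and the actual convergence test are omitted; the divisions are elementwise, as in the algorithm):

```python
import numpy as np

# Single-node sketch of the GNMF multiplicative updates from the slide.
# Sizes are tiny, assumed values; the evaluation uses w = 100000.
rng = np.random.default_rng(0)
d, w, t = 20, 30, 5

V = rng.random((d, w))   # matrix to factorize, V ~ W @ H
W = rng.random((d, t))
H = rng.random((t, w))

initial_error = np.linalg.norm(V - W @ H)
for _ in range(50):
    # H <- H .* (W'V ./ W'WH);  W <- W .* (VH' ./ WHH')  (elementwise)
    H *= (W.T @ V) / (W.T @ W @ H)
    W *= (V @ H.T) / (W @ H @ H.T)
error = np.linalg.norm(V - W @ H)

print(f"reconstruction error: {initial_error:.3f} -> {error:.3f}")
```

Because the updates are purely multiplicative on non-negative data, W and H stay non-negative throughout, and the reconstruction error is non-increasing.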
  16. Testing Setup: t = 10 and w = 100000; V ∈ R^(d×100000) with sparsity 0.001; block size 500 × 500; number of cores: 64; Flink 1.1.2 & Spark 2.0.0. Gilbert implementation: 5 lines; distributed GNMF on Flink: 70 lines.
     V = rand($rows, 100000, 0, 1, 0.001);
     H = rand(10, 100000, 0, 1);
     W = rand($rows, 10, 0, 1);
     nH = H .* ((W' * V) ./ (W' * W * H))
     nW = W .* ((V * nH') ./ (W * nH * nH'))
  17. Gilbert Optimizations (plot: execution time t in s vs. rows d of V, for Optimized Spark, Optimized Flink, Non-optimized Spark, Non-optimized Flink).
  18. Optimizations Explained: the matrix updates H ← H · (W^T V / W^T W H) and W ← W · (V H^T / W H H^T) can be evaluated in different orders. Non-optimized, W^T (W H) and (W H) H^T both materialize the huge intermediate W H ∈ R^(d×100000). Optimized, the multiplications are re-ordered to (W^T W) H and W (H H^T), whose intermediates W^T W ∈ R^(10×10) and H H^T ∈ R^(10×10) are tiny.
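The benefit of the re-ordering can be made concrete by counting scalar multiplications; a small Python sketch using the slide's t and w and an assumed, illustrative d:

```python
# Counting scalar multiplications: the cost of (m x k) @ (k x n) is m*k*n.
def matmul_cost(m, k, n):
    return m * k * n

# t and w are the slide's sizes; d is an assumed, illustrative row count.
d, t, w = 10_000, 10, 100_000

# Non-optimized: (W @ H) @ H^T materializes a d x w intermediate.
cost_bad = matmul_cost(d, t, w) + matmul_cost(d, w, t)
# Optimized: W @ (H @ H^T) only needs a t x t intermediate.
cost_good = matmul_cost(t, w, t) + matmul_cost(d, t, t)

print(f"non-optimized: {cost_bad:.1e} mults, optimized: {cost_good:.1e} mults")
```

With these sizes the re-ordered plan needs over a thousand times fewer multiplications, on top of avoiding the d×100000 intermediate entirely.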
  19. GNMF Step: Scaling Problem Size (plot: execution time t in s vs. number of rows of matrix V, for Flink, SP Flink, Spark, SP Spark, local and distributed). Distributed Gilbert execution handles much larger problem sizes than local execution; the specialized implementation is slightly faster than Gilbert.
  20. GNMF Step: Weak Scaling (plot: execution time t in s vs. number of cores, for Flink and Spark). Both distributed backends show good weak scaling behaviour.
  21. PageRank: ranking between entities with reciprocal quotations and references.
     PR(p_i) = d · Σ_{p_j ∈ L(p_i)} PR(p_j) / D(p_j) + (1 − d) / N
     where N is the number of pages, d the damping factor, L(p_i) the set of pages linking to p_i, D(p_j) the number of outgoing links of p_j, and M the transition matrix derived from the adjacency matrix. In matrix form:
     R = d · M R + (1 − d) / N · 1
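The matrix form R = d · MR + (1 − d)/N · 1 is a plain power iteration; a minimal dense NumPy sketch on a tiny, assumed four-page graph (Gilbert would evaluate the same expression on sparse distributed matrices):

```python
import numpy as np

# Tiny assumed four-page link graph: A[i, j] = 1 if page i links to page j.
A = np.array([[0, 1, 1, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n = A.shape[0]
d = 0.85  # damping factor

# Transition matrix as on the slide: M = (diag(1 ./ deg) * A)'.
deg = A.sum(axis=1)            # out-degree of every page (no dangling pages)
M = (np.diag(1.0 / deg) @ A).T

r = np.ones(n) / n
for _ in range(10):
    # R = d * M R + (1 - d) / N * 1
    r = d * (M @ r) + (1 - d) / n

print(r)  # page 2, with the most in-links, ranks highest
```

Since M is column-stochastic here, the ranks keep summing to 1 across iterations, which is a handy sanity check for any PageRank implementation.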
  22. PageRank Implementation. MATLAB(R):
     it = 10;
     d = sum(A, 2);
     M = (diag(1 ./ d) * A)';
     r = ones(n, 1) / n;
     e = ones(n, 1) / n;
     for i = 1:it
       r = .85 * M * r + .15 * e
     end
     Gilbert:
     it = 10;
     d = sum(A, 2);
     M = (diag(1 ./ d) * A)';
     r_0 = ones(n, 1) / n;
     e = ones(n, 1) / n;
     fixpoint(r_0, @(r) .85 * M * r + .15 * e, it)
  23. PageRank: 10 Iterations (plot: execution time t in s vs. number of vertices n, for Spark, Flink, SP Flink, SP Spark). The Gilbert backends show similar performance; the specialized implementation is faster because it can fuse operations.
  24. Conclusion
  25. Conclusion: an easy-to-use sparse linear algebra environment for people familiar with MATLAB(R); scales to data sizes exceeding a single computer's memory; high-level linear algebra optimizations improve runtime; slower than specialized implementations due to abstraction overhead.
