Apache SystemML Architecture by Niketan Pansare

This deck presents the high-level Apache SystemML design and architecture, covering the language, compiler, and runtime modules. It describes how the compilation chain is generated and how variable analysis is performed, shows the HOP and runtime plans for a sample use case, and explains how to collect statistics and how some diagnostic tools can be used.


  1. SystemML Architecture. Niketan Pansare, Berthold Reinwald, July 25th, 2016
  2. Agenda • High-level Design & APIs • Architecture Overview • Tooling • Important links (from http://systemml.apache.org/)
  3. Agenda • High-level Design & APIs • Architecture Overview • Language • Compiler • Runtime • Two examples: • Simple DML expression with an example dataset • Linear Regression with varying data sizes • Tooling • Important links
  4. Agenda • High-level Design & APIs • Architecture Overview • Language • Compiler • Runtime • Two examples: • Simple DML expression with an example dataset • Linear Regression with varying data sizes • Tooling • Important links
  5. SystemML Design: DML (Declarative Machine Learning Language) scripts plus data (1. on disk/HDFS, 2. RDD/DataFrame, 3. double[][]) compile to hybrid execution plans (CP + b sb _mVar1 | SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE | CP * y _mVar2 _mVar3) that run on a Hadoop or Spark cluster (scale-out, since 2010 / since 2015) or on an in-memory single node (scale-up, since 2012)
  6. SystemML Design: command line API (also MLContext*) with -exec hadoop runs on the Hadoop or Spark cluster (scale-out, since 2010)
  7. SystemML Design: for the in-memory single node (scale-up, since 2012) there are two options: 1. -exec singlenode 2. use the standalone jar (preserves rewrites, but may spawn local MR jobs); command line API (also MLContext*)
  8. SystemML Design: Spark cluster (scale-out, since 2015) and in-memory single node (scale-up, since 2012); command line API (also MLContext*)
  9. SystemML Design: MLContext API for Java/Python/Scala, see https://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html
  10. SystemML Design: JMLC API for the in-memory single node (scale-up, since 2012), see https://apache.github.io/incubator-systemml/jmlc.html
  11. Agenda • High-level Design & APIs • Architecture Overview • Language • Compiler • Runtime • Two examples: • Simple DML expression with an example dataset • Linear Regression with varying data sizes • Tooling • Important links
  12. From DML to Execution Plan: DML scripts and data are compiled into hybrid execution plans (CP + b sb _mVar1 | SPARK mapmm X _mVar1 _mVar2 RIGHT false NONE | CP * y _mVar2 _mVar3) targeting a Hadoop or Spark cluster (scale-out) or an in-memory single node (scale-up)
  13. From DML to Execution Plan: the Language, Compiler, and Runtime layers produce these hybrid execution plans; assuming an example dataset X: 100M x 500, y: 100M x 1, b/sb: 500 x 1
  14. SystemML Compilation Chain
  15. SystemML Compilation Chain • Parsing • Parse input DML/PyDML using Antlr v4 (see Dml.g4 and Pydml.g4) • Perform syntactic validation • Construct a DMLProgram (=> list of statement and function blocks) • Live Variable Analysis • Classic dataflow analysis • A variable is “live” if it holds a value that may be needed in the future • Dead code elimination • Semantic Validation
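The live variable analysis and dead code elimination described on this slide can be sketched in a few lines of Python. This is a toy straight-line IR invented for illustration (pairs of defined variable and used variables), not SystemML's actual DMLProgram classes:

```python
# Minimal sketch of live variable analysis over a straight-line program,
# assuming a toy IR where each statement is (defined_var, used_vars).
# Illustrative only -- not SystemML's actual language-layer data structures.

def dead_code_elimination(stmts, outputs):
    """Drop assignments whose result is never needed later.

    stmts:   list of (defined_var, set_of_used_vars), in program order
    outputs: variables that must be live at the end (e.g. written results)
    """
    live = set(outputs)                     # live-out of the last statement
    keep = []
    for defined, used in reversed(stmts):   # classic backwards dataflow pass
        if defined in live:                 # value may be needed in the future
            live.discard(defined)
            live |= used                    # its inputs become live
            keep.append((defined, used))
        # else: dead assignment, eliminated
    return list(reversed(keep))

# X and y feed beta; 'tmp' is never read again, so it is removed.
program = [
    ("A",    {"X"}),        # A = t(X) %*% X
    ("tmp",  {"X"}),        # dead: never used afterwards
    ("b",    {"X", "y"}),   # b = t(X) %*% y
    ("beta", {"A", "b"}),   # beta = solve(A, b)
]
print(dead_code_elimination(program, outputs={"beta"}))
```

The backwards pass is the textbook formulation: a statement survives only if its target is in the live set flowing up from below.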
  16. SystemML Compilation Chain • Dataflow in DAGs of operations on matrices, frames, and scalars • Choosing from alternative execution plans based on memory and cost estimates • Operator ordering & selection; hybrid plans
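The memory-based plan choice can be illustrated with a hypothetical sketch: estimate operator memory from matrix dimensions and place the operator in the single-node driver (CP) only when it fits the budget, otherwise on Spark. The helper names and the 20 GB budget are made up for the example; real SystemML estimates also model sparsity and intermediates in more detail:

```python
# Hypothetical sketch of memory-based operator placement, in the spirit of
# the compiler slide above: run an operator in the single-node driver (CP)
# when its estimated memory fits the budget, otherwise push it to Spark.
# Function names and numbers are illustrative, not SystemML internals.

def mem_estimate_mb(rows, cols, sparsity=1.0, bytes_per_cell=8):
    # dense double-precision estimate
    return rows * cols * sparsity * bytes_per_cell / (1024 * 1024)

def choose_exec_type(input_mbs, output_mb, budget_mb):
    # operator memory = all inputs + the output intermediate
    op_mem = sum(input_mbs) + output_mb
    return "CP" if op_mem <= budget_mb else "SPARK"

# X: 100M x 500 does not fit a 20 GB driver, so X %*% b goes to Spark;
# a tiny vector-only operation stays in CP.
budget = 20 * 1024                                # 20 GB driver budget in MB
x_mb = mem_estimate_mb(100_000_000, 500)
b_mb = mem_estimate_mb(500, 1)
out_mb = mem_estimate_mb(100_000_000, 1)
print(choose_exec_type([x_mb, b_mb], out_mb, budget))   # SPARK
print(choose_exec_type([b_mb, b_mb], b_mb, budget))     # CP
```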
  17. SystemML Compilation Chain (* discussed later in Tooling): spark-submit --master yarn-client --driver-memory 20G --num-executors 4 --executor-memory 40G --executor-cores 24 SystemML.jar -f test.dml -explain hops
  18. SystemML Compilation Chain • Low-level physical execution plan (LOP DAGs) • Over key-value pairs for MR • Over RDDs for Spark • “Piggybacking” operations into a minimal number of Map-Reduce jobs
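The piggybacking idea can be sketched with a toy greedy packer: merge operations of the same job type into a shared job, as long as the job runs after every job that produces one of the operation's inputs. This is my simplified model of the concept, not SystemML's actual LOP piggybacking algorithm; the op names and job types are made up:

```python
# Rough sketch of "piggybacking": pack Map-Reduce operations into as few
# jobs as possible by merging ops of the same job type, while respecting
# data dependencies (an op may only join a job that runs after all of the
# jobs producing its inputs). Toy model, not SystemML's actual LOP logic.

def piggyback(ops):
    """ops: list of (name, job_type, deps), given in topological order."""
    jobs = []           # ordered list of (job_type, [op names])
    producer = {}       # op name -> index of the job that computes it
    for name, job_type, deps in ops:
        # earliest job slot strictly after every dependency's job
        earliest = max((producer[d] + 1 for d in deps), default=0)
        slot = next((i for i in range(earliest, len(jobs))
                     if jobs[i][0] == job_type), None)
        if slot is None:
            jobs.append((job_type, []))
            slot = len(jobs) - 1
        jobs[slot][1].append(name)
        producer[name] = slot
    return jobs

ops = [
    ("readX", "GMR",  set()),
    ("readY", "GMR",  set()),         # shares a job with readX
    ("mmult", "MMCJ", {"readX"}),     # needs its own MMCJ job
    ("add",   "GMR",  {"mmult"}),     # GMR again, but must run after mmult
]
print(piggyback(ops))
```

Here readX and readY piggyback into one GMR job, while add cannot join it because its input is produced later, so three jobs result instead of four.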
  19. SystemML Compilation Chain, generated instructions: CP + b sb _mVar1 | SPARK mapmm X.MATRIX.DOUBLE _mVar1.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE RIGHT false NONE | CP * y _mVar2 _mVar3
  20. SystemML Runtime • Hybrid runtime • CP: single-machine operations & job orchestration • MR: generic Map-Reduce jobs & operations • SP: Spark jobs • Numerically stable operators • Dense/sparse matrix representation • Multi-level buffer pool (caching) to evict in-memory objects • Dynamic recompilation for initial unknowns • Components: control program, runtime program, buffer pool, ParFor optimizer/runtime, CP/Spark/MR instructions, recompiler, mem/FS IO, DFS IO, generic MR jobs, MatrixBlock library (single/multi-threaded)
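The eviction behavior of a buffer pool can be sketched with a minimal LRU cache. SystemML's real buffer pool is multi-level and cost-aware; this made-up `BufferPool` class only illustrates the core idea of evicting in-memory matrix objects once a memory budget is exceeded:

```python
# Toy sketch of a buffer pool that evicts in-memory matrix objects when a
# memory budget is exceeded (LRU policy). Illustrative only -- SystemML's
# actual buffer pool is multi-level and considers more than recency.
from collections import OrderedDict

class BufferPool:
    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.in_mem = OrderedDict()   # name -> size; insertion order = LRU
        self.evicted = []             # names "written out" to local FS/DFS

    def pin(self, name, size):
        if name in self.in_mem:
            self.in_mem.move_to_end(name)        # mark as recently used
            return
        # evict least recently used objects until the new one fits
        while sum(self.in_mem.values()) + size > self.budget and self.in_mem:
            victim, _ = self.in_mem.popitem(last=False)
            self.evicted.append(victim)
        self.in_mem[name] = size

pool = BufferPool(budget_bytes=100)
pool.pin("X", 40)
pool.pin("y", 30)
pool.pin("X", 40)        # touch X, so y becomes least recently used
pool.pin("tmp", 50)      # exceeds the budget -> y is evicted
print(list(pool.in_mem), pool.evicted)
```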
  21. From DML to Execution Plan: the same Language/Compiler/Runtime pipeline applied to LinearRegression.dml with varying data sizes
  22. A Data Scientist – Linear Regression. Model: w = argmin_w ||Xw − y||² + λ||w||², where X holds the explanatory/independent variables and y is the predicted/dependent variable; A = XᵀX + λI. Conjugate Gradient method for the optimization problem: • Start off with the (negative) gradient • For each step: 1. Move to the optimal point along the chosen direction; 2. Recompute the gradient; 3. Project it onto the subspace conjugate* to all prior directions; 4. Use this as the next direction (* conjugate = orthogonal given A as the metric) • Iterate until convergence (initialize step size, initial direction, update w, accuracy measures)
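The conjugate gradient iteration outlined on this slide can be written out in plain Python on a tiny example. This is a sketch of the standard CG recurrence for the regularized normal equations, not SystemML's LinearRegCG.dml itself (which expresses the same iteration in DML over distributed matrices):

```python
# Plain-Python sketch of the conjugate gradient method for
#   w = argmin_w ||Xw - y||^2 + lam*||w||^2,
# i.e. solving (X^T X + lam*I) w = X^T y. Tiny and illustrative only.

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linreg_cg(X, y, lam, iters=50, tol=1e-12):
    n = len(X[0])
    Xt = [list(col) for col in zip(*X)]
    A = [[dot(Xt[i], Xt[j]) + (lam if i == j else 0.0) for j in range(n)]
         for i in range(n)]                       # A = X^T X + lam*I
    b = matvec(Xt, y)                             # b = X^T y
    w = [0.0] * n
    r = b[:]                                      # residual = -gradient at w=0
    p = r[:]                                      # initial direction
    for _ in range(iters):
        Ap = matvec(A, p)
        alpha = dot(r, r) / dot(p, Ap)            # optimal step along p
        w = [wi + alpha * pi for wi, pi in zip(w, p)]
        r_new = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r_new, r_new) < tol:               # converged
            break
        beta = dot(r_new, r_new) / dot(r, r)      # A-conjugate projection
        p = [rn + beta * pi for rn, pi in zip(r_new, p)]
        r = r_new
    return w

# y = 2*x1 + 3*x2 exactly, so with lam=0 CG recovers w close to [2, 3]
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
y = [2, 3, 5, 7]
w = linreg_cg(X, y, lam=0.0)
print([round(wi, 6) for wi in w])
```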
  23. SystemML – Run LinReg CG on Spark. Setup: 20 GB driver on 16 cores, 6 x 55 GB executors; y is always 100M x 1. • X: 100M x 10 (8 GB): multithreaded single node, driver cache • X: 100M x 100 (80 GB): hybrid plan with RDD caching and fused operator (x.persist(); X.mapValues(tMMv).reduce(); fused tMMv on executors, RDD cache of X) • X: 100M x 1,000 (800 GB): hybrid plan with RDD out-of-core (spilling) and fused operator • X: 100M x 10,000 (8 TB): hybrid plan with RDD out-of-core and different operators (2 MxV multiplications with broadcast, mapToPair, and reduceByKey)
  24. Agenda • Architecture Overview • Language & APIs • Compiler • Runtime • Two examples: • Simple DML expression with an example dataset • Linear Regression with varying data sizes • Tooling • Important links
  25. SystemML’s Compilation Chain / Overview of Tools: EXPLAIN hops, EXPLAIN runtime, EXPLAIN *_recompile, STATS, DEBUG; HOP = high-level operator, LOP = low-level operator. [Matthias Boehm et al.: SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs. IEEE Data Eng. Bull. 2014]
  26. Explain (Understanding Execution Plans) • Overview • Shows the generated execution plan (at different compilation steps) • Introduced 05/2014 for internal usage • Important tool for understanding/debugging optimizer choices! • Usage • hadoop jar SystemML.jar -f test.dml -explain [hops | runtime | hops_recompile | runtime_recompile] • hops: program w/ HOP DAGs after optimization • runtime (default): program w/ generated runtime instructions • hops_recompile: hops plus the HOP DAG after every recompile • runtime_recompile: runtime plus the generated runtime instructions after every recompile
  27. Explain: Understanding HOP DAGs (simple DML). Each HOP line shows: • HOP ID • HOP opcode • HOP input data dependencies (via HOP IDs) • HOP output matrix characteristics (rlen, clen, brlen, bclen, nnz) • HOP memory estimates (all inputs, intermediates, output -> operation mem) • HOP execution type (CP/SP/MR) • Optional: indicators of reblock/checkpointing (caching) of HOP outputs, broadcast memory budget. Flags: -explain hops, -explain recompile_hops. Invocation: spark-submit --master yarn-client --driver-memory 20G --num-executors 4 --executor-memory 40G --executor-cores 24 SystemML.jar -f test.dml -explain hops
  28. Explain: Understanding HOP DAGs (entire script) • Example DML script (simplified LinregDS): X = read($1); y = read($2); intercept = $3; lambda = $4; if( intercept == 1 ) { ones = matrix(1, nrow(X), 1); X = append(X, ones); } I = matrix(1, ncol(X), 1); A = t(X) %*% X + diag(I*lambda); b = t(X) %*% y; beta = solve(A, b); write(beta, $5); • Invocation: hadoop jar SystemML.jar -f linregds.dml -args X y 0 0 beta • Scenario: X: 100,000 x 1,000, sparsity 1.0; y: 100,000 x 1, sparsity 1.0 (800MB, 200+ GFlop)
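The simplified LinregDS script above solves the regularized normal equations directly rather than iteratively. A plain-Python equivalent makes the three DML lines concrete; Gaussian elimination stands in for DML's solve() here, purely for illustration:

```python
# Plain-Python equivalent of the core of the simplified LinregDS script:
#   A = t(X) %*% X + diag(I*lambda);  b = t(X) %*% y;  beta = solve(A, b)
# Gaussian elimination stands in for DML's solve(); illustrative only.

def solve(A, b):
    # Gaussian elimination with partial pivoting on a copy of [A | b]
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        piv = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[piv] = M[piv], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            M[i] = [mi - f * mk for mi, mk in zip(M[i], M[k])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def linreg_ds(X, y, lam):
    Xt = list(map(list, zip(*X)))
    A = [[sum(a * b for a, b in zip(Xt[i], Xt[j])) + (lam if i == j else 0.0)
          for j in range(len(Xt))] for i in range(len(Xt))]   # t(X)%*%X + lam*I
    b = [sum(a * yi for a, yi in zip(row, y)) for row in Xt]  # t(X)%*%y
    return solve(A, b)

# same data as the CG example: y = 2*x1 + 3*x2, so beta is close to [2, 3]
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
y = [2, 3, 5, 7]
beta = linreg_ds(X, y, lam=0.0)
print([round(bi, 6) for bi in beta])
```

Both the direct solve shown here and a CG iteration converge to the same coefficients on well-conditioned data; the direct method costs one dense factorization, which is why the script pairs it with modest feature counts.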
  29. Explain: Understanding HOP DAGs (2) • Explain Hops: 15/07/05 17:18:06 INFO api.DMLScript: EXPLAIN (HOPS): # Memory Budget local/remote = 57344MB/1434MB/1434MB # Degree of Parallelism (vcores) local/remote = 24/144/72 PROGRAM --MAIN PROGRAM ----GENERIC (lines 1-4) [recompile=false] ------(10) PRead X [100000,1000,1000,1000,100000000] [0,0,763 -> 763MB], CP ------(11) TWrite X (10) [100000,1000,1000,1000,100000000] [763,0,0 -> 763MB], CP ------(21) PRead y [100000,1,1000,1000,100000] [0,0,1 -> 1MB], CP ------(22) TWrite y (21) [100000,1,1000,1000,100000] [1,0,0 -> 1MB], CP ------(24) TWrite intercept [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP ------(26) TWrite lambda [0,0,-1,-1,-1] [0,0,0 -> 0MB], CP ----GENERIC (lines 11-16) [recompile=false] ------(42) TRead X [100000,1000,1000,1000,100000000] [0,0,763 -> 763MB], CP ------(52) r(t) (42) [1000,100000,1000,1000,100000000] [763,0,763 -> 1526MB] ------(53) ba(+*) (52,42) [1000,1000,1000,1000,-1] [1526,8,8 -> 1541MB], CP ------(43) TRead y [100000,1,1000,1000,100000] [0,0,1 -> 1MB], CP ------(59) ba(+*) (52,43) [1000,1,1000,1000,-1] [764,0,0 -> 764MB], CP ------(60) b(solve) (53,59) [1000,1,1000,1000,-1] [8,8,0 -> 15MB], CP ------(66) PWrite beta (60) [1000,1,-1,-1,-1] [0,0,0 -> 0MB], CP • Callouts: cluster characteristics, program structure (incl. recompile), unrolled HOP DAG. Notes: the if branch (lines 6-9) and the regularization term were removed by rewrites
  30. Explain: Understanding Runtime Plans (1) • Explain Runtime (simplified filenames, removed rmvar) (IBM Research): 15/07/05 17:18:53 INFO api.DMLScript: EXPLAIN (RUNTIME): # Memory Budget local/remote = 57344MB/1434MB/1434MB # Degree of Parallelism (vcores) local/remote = 24/144/72 PROGRAM ( size CP/MR = 25/0 ) --MAIN PROGRAM ----GENERIC (lines 1-4) [recompile=false] ------CP createvar pREADX X false binaryblock 100000 1000 1000 1000 100000000 ------CP createvar pREADy y false binaryblock 100000 1 1000 1000 100000 ------CP assignvar 0.SCALAR.INT.true intercept.SCALAR.INT ------CP assignvar 0.0.SCALAR.DOUBLE.true lambda.SCALAR.DOUBLE ------CP cpvar pREADX X ------CP cpvar pREADy y ----GENERIC (lines 11-16) [recompile=false] ------CP createvar _mVar2 .../_t0/temp1 true binaryblock 1000 1000 1000 1000 -1 ------CP tsmm X.MATRIX.DOUBLE _mVar2.MATRIX.DOUBLE LEFT 24 ------CP createvar _mVar3 .../_t0/temp2 true binaryblock 1 100000 1000 1000 100000 ------CP r' y.MATRIX.DOUBLE _mVar3.MATRIX.DOUBLE ------CP createvar _mVar4 .../_t0/temp3 true binaryblock 1 1000 1000 1000 -1 ------CP ba+* _mVar3.MATRIX.DOUBLE X.MATRIX.DOUBLE _mVar4.MATRIX.DOUBLE 24 ------CP createvar _mVar5 .../_t0/temp4 true binaryblock 1000 1 1000 1000 -1 ------CP r' _mVar4.MATRIX.DOUBLE _mVar5.MATRIX.DOUBLE ------CP createvar _mVar6 .../_t0/temp5 true binaryblock 1000 1 1000 1000 -1 ------CP solve _mVar2.MATRIX.DOUBLE _mVar5.MATRIX.DOUBLE _mVar6.MATRIX.DOUBLE ------CP write _mVar6.MATRIX.DOUBLE .../beta.SCALAR.STRING.true textcell.SCALAR.STRING.true • Literally a string representation of the runtime instructions
  31. Stats (Profiling Runtime Statistics) • Overview • Profiles and shows aggregated runtime statistics of potential bottlenecks • Introduced 01/2014 for internal usage, as an extension of the buffer pool stats from 01/2013 • Important tool for understanding runtime characteristics and for profiling/tuning system internals by developers • Usage • hadoop jar SystemML.jar -f test.dml -stats
  32. SystemML Statistics: total exec time, buffer pool stats, dynamic recompilation stats, JVM stats (JIT, GC), heavy hitter instructions (incl. buffer pool times); optional: parfor stats (if the program contains parfors)
  33. Debug (Script Debugging) • Overview • Script-level debugging by end users (and developers) • Introduced 09/2014 as the result of an intern project • gdb-inspired command-line debugger interface • Usage • hadoop jar SystemML.jar -f test.dml -debug
  34. Agenda • Architecture Overview • Language & APIs • Compiler • Runtime • Two examples: • Simple DML expression with an example dataset • Linear Regression with varying data sizes • Tooling • Important links
  35. Important Links • Website: http://systemml.apache.org/
  36. Important Links • Website: http://systemml.apache.org/ • Interested in SystemML? • Go to https://github.com/apache/incubator-systemml and “Star” it
  37. Important Links • Website: http://systemml.apache.org/ • Interested in SystemML? • Go to https://github.com/apache/incubator-systemml and “Star” it • Want to contribute to SystemML? • See http://apache.github.io/incubator-systemml/contributing-to-systemml.html • List of issues: https://issues.apache.org/jira/browse/SYSTEMML/ • Ask any of our PMC members for suggestions • Want to try out SystemML? • Laptop: http://apache.github.io/incubator-systemml/quick-start-guide.html (does not require a Hadoop/Spark installation) • Spark cluster: http://apache.github.io/incubator-systemml/spark-mlcontext-programming-guide.html (includes a Jupyter/Zeppelin demo)
  38. Thank You
