Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
SAIS/DWS2018 Report Meeting #saisdws2018
1. Copyright (C) 2018 Yahoo Japan Corporation. All Rights Reserved.
August 6, 2018
李 燮鳴
Spark AI Summit 2018 Report Meeting
Self-introduction
李 燮鳴 (リ ショウメイ)
Received a Ph.D. in Engineering from the Graduate School of the University of Tsukuba in March 2017
• Research on schedulers for parallel file systems
Joined Yahoo Japan in April 2017
• In charge of DevOps for Hadoop clusters since joining
Spark AI Summit 2018
Dates: June 4-6, 2018
Venue: Moscone Center West, San Francisco
Attendees: about 6,000
Sessions: about 9 to 11 running in parallel, 193 sessions in total
Venue (exterior)
Venue (interior)
Agenda
• Frameworks that make ML easier (2 talks)
• Spark SQL (2 talks)
• Cluster architecture (2 talks)
Frameworks that make ML easier
MEET UP: Horovod
• A framework that speeds up distributed training in TensorFlow
• Uses MPI allreduce to speed up the averaging of gradients
Alexander Sergeev, Uber
MEET UP: Horovod (1)
Alexander Sergeev, Uber
Uber. (2018, February 19). Meet Horovod: Uber's Open Source Distributed Deep Learning Framework for TensorFlow. Retrieved July 9, 2018, from
https://eng.uber.com/horovod/
Distributed training in TensorFlow uses parameter servers to average the gradients computed by each worker
Drawback: choosing the right parameter server configuration is difficult
MEET UP: Horovod (2)
Alexander Sergeev, Uber
Horovod does not use parameter servers. Instead, it exchanges and averages gradients with ring allreduce, using NCCL (NVIDIA Collective Communications Library) together with MPI
Uber. (2018, February 19). Meet Horovod: Uber's Open Source Distributed Deep Learning Framework for TensorFlow. Retrieved July 9, 2018, from
https://eng.uber.com/horovod/
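The ring allreduce idea on this slide can be illustrated with a small pure-Python simulation. This is only a sketch under stated assumptions: it does not use Horovod, NCCL, or MPI, the function name ring_allreduce_mean is hypothetical, and the workers are modeled as plain lists inside one process. Each worker passes one chunk per step to its right-hand neighbor, so no central parameter server is involved.

```python
from typing import List


def ring_allreduce_mean(grads: List[List[float]]) -> List[List[float]]:
    """Average equally sized gradient vectors with a simulated ring allreduce.

    grads[i] is the gradient held by worker i (hypothetical in-process model).
    Returns one averaged vector per worker; the input is not modified.
    """
    n = len(grads)
    size = len(grads[0])
    # Split every vector into n contiguous chunks (some may be empty).
    bounds = [(k * size // n, (k + 1) * size // n) for k in range(n)]
    bufs = [list(g) for g in grads]

    # Phase 1 (scatter-reduce): after n - 1 steps, worker i holds the
    # complete sum for chunk (i + 1) % n.
    for step in range(n - 1):
        # Collect all outgoing messages first so sends are simultaneous.
        msgs = []
        for i in range(n):
            chunk = (i - step) % n
            lo, hi = bounds[chunk]
            msgs.append((chunk, bufs[i][lo:hi]))
        for i in range(n):
            dst = (i + 1) % n
            chunk, payload = msgs[i]
            lo, _ = bounds[chunk]
            for k, value in enumerate(payload):
                bufs[dst][lo + k] += value

    # Phase 2 (allgather): circulate the completed chunks around the ring.
    for step in range(n - 1):
        msgs = []
        for i in range(n):
            chunk = (i + 1 - step) % n
            lo, hi = bounds[chunk]
            msgs.append((chunk, bufs[i][lo:hi]))
        for i in range(n):
            dst = (i + 1) % n
            chunk, payload = msgs[i]
            lo, hi = bounds[chunk]
            bufs[dst][lo:hi] = payload

    # Every worker now holds the full sum; divide to get the mean.
    return [[value / n for value in buf] for buf in bufs]
```

With three workers holding [1, 2, 3, 4], [5, 6, 7, 8], and [9, 10, 11, 12], every worker ends up with [5.0, 6.0, 7.0, 8.0], the same average a parameter server would compute centrally. Each worker sends and receives only one chunk per step, which is why the per-worker traffic stays roughly constant as workers are added.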
MEET UP: Horovod (3)
Alexander Sergeev, Uber
In an evaluation using the official TensorFlow benchmarks, a performance improvement of roughly 2x was observed
Uber. (2018, February 19). Meet Horovod: Uber's Open Source Distributed Deep Learning Framework for TensorFlow. Retrieved July 9, 2018, from
https://eng.uber.com/horovod/
MEET UP: Horovod (4)
Alexander Sergeev, Uber
Using RDMA (Remote Direct Memory Access) over InfiniBand improved performance further
Uber. (2018, February 19). Meet Horovod: Uber's Open Source Distributed Deep Learning Framework for TensorFlow. Retrieved July 9, 2018, from
https://eng.uber.com/horovod/
KEYNOTE: Hydrogen
Reynold Xin, Databricks
A proposal to let deep learning frameworks run efficiently on Spark
• SPIP = Spark Project Improvement Proposal
• The design sketch is complete so far (design 15% done, SPARK-24374)
KEYNOTE: Hydrogen (1)
Reynold Xin, Databricks
Databricks. Project Hydrogen: Unifying State-of-the-art AI and Big Data in Apache Spark. Retrieved July 9, 2018, from https://databricks.com/session/databricks-keynote-2
KEYNOTE: Hydrogen (2)
Reynold Xin, Databricks
Databricks. Project Hydrogen: Unifying State-of-the-art AI and Big Data in Apache Spark. Retrieved July 9, 2018, from https://databricks.com/session/databricks-keynote-2
Spark SQL
Deep Dive into Spark SQL with Advanced Performance Tuning
Xiao Li, Wenchen Fan, Databricks
Introduced parameter tuning techniques that can be applied at each stage of Spark SQL, from the query through to execution
Databricks Follow. (2018, June 20). Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao L... Retrieved July 10, 2018, from https://www.slideshare.net/databricks/deep-dive-into-spark-sql-with-advanced-performance-tuning-with-xiao-li-wenchen-fan
Deep Dive into Spark SQL with Advanced Performance Tuning
Xiao Li, Wenchen Fan, Databricks
Databricks Follow. (2018, June 20). Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao L... Retrieved July 10, 2018, from https://www.slideshare.net/databricks/deep-dive-into-spark-sql-with-advanced-performance-tuning-with-xiao-li-wenchen-fan
Deep Dive into Spark SQL with Advanced Performance Tuning
Xiao Li, Wenchen Fan, Databricks
Databricks Follow. (2018, June 20). Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao L... Retrieved July 10, 2018, from https://www.slideshare.net/databricks/deep-dive-into-spark-sql-with-advanced-performance-tuning-with-xiao-li-wenchen-fan
Deep Dive into Spark SQL with Advanced Performance Tuning
Xiao Li, Wenchen Fan, Databricks
Databricks Follow. (2018, June 20). Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao L... Retrieved July 10, 2018, from https://www.slideshare.net/databricks/deep-dive-into-spark-sql-with-advanced-performance-tuning-with-xiao-li-wenchen-fan
Deep Dive into Spark SQL with Advanced Performance Tuning
Xiao Li, Wenchen Fan, Databricks
Introduced parameter tuning techniques that can be applied at each stage of Spark SQL, from the query through to execution
Databricks Follow. (2018, June 20). Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao L... Retrieved July 10, 2018, from https://www.slideshare.net/databricks/deep-dive-into-spark-sql-with-advanced-performance-tuning-with-xiao-li-wenchen-fan
Spark SQL Adaptive Execution
Carson Wang, Intel, Yuanjian Li, Baidu
Makes Spark SQL more efficient by changing the execution plan at runtime
• Determines the optimal number of reducers at runtime
• Determines the appropriate join strategy at runtime
• Used in production at Baidu (SPARK-23128)
Spark SQL Adaptive Execution (1)
Carson Wang, Intel, Yuanjian Li, Baidu
Tuning the number of reducers
• Too few: spill, OOM
• Too many: scheduling overhead, more I/O requests, too many small output files
• It is hard to specify a single number that suits all stages
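One way to avoid hand-picking a reducer count is to start with many small shuffle partitions and merge neighbors at runtime until each reducer receives roughly a target amount of data. The sketch below shows that idea in plain Python. The function name coalesce_partitions, the sizes, and the target are illustrative assumptions, not Spark's actual implementation.

```python
from typing import List


def coalesce_partitions(sizes: List[int], target: int) -> List[List[int]]:
    """Group adjacent shuffle partitions so each group stays within `target`.

    sizes[i] is the observed size of shuffle partition i (any unit).
    Returns a list of groups of partition indices; one reducer per group.
    A single partition larger than `target` stays in its own group.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    current_size = 0
    for index, size in enumerate(sizes):
        # Close the current group if adding this partition would exceed the target.
        if current and current_size + size > target:
            groups.append(current)
            current, current_size = [], 0
        current.append(index)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Seven observed partition sizes (hypothetical, in MB) and a 100 MB target.
reducer_groups = coalesce_partitions([10, 20, 70, 120, 5, 5, 90], target=100)
```

Here the seven partitions collapse into three reducers, [[0, 1, 2], [3], [4, 5, 6]], so the reducer count follows from the data actually produced by each stage rather than from one global setting.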
Spark SQL Adaptive Execution (3)
Carson Wang, Intel, Yuanjian Li, Baidu
Diagram: SQL Query -> Logical Plan -> Optimized Logical Plan -> Multiple Physical Plans -> Selected Physical Plan (evaluated with a cost model)
Diagram: example plan with Join2, Join1, Exchange1 (T1), Exchange2 (T2), Exchange3 (T3)
A suboptimal join may be selected
The planner's estimates can differ greatly from the actual values
Spark SQL Adaptive Execution (4)
Carson Wang, Intel, Yuanjian Li, Baidu
Diagram: the plan (Join2, Join1, Exchange1 to Exchange3 over T1 to T3) is divided into query stages
• QueryStage1: Exchange1, T1
• QueryStage2: Exchange2, T2
• QueryStage3: Exchange3, T3
• QueryStage4: Join2, Join1, with QueryStageInput1, QueryStageInput2, QueryStageInput3
The actual values become known
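Once the stages feeding a join have run, their real output sizes are known and the join strategy can be revisited. The following is a minimal sketch of that decision, assuming a simple size threshold. The function names choose_join and replan and the example sizes are hypothetical; the 10 MB default mirrors the default value of spark.sql.autoBroadcastJoinThreshold.

```python
from typing import Tuple

MB = 1024 * 1024
GB = 1024 * MB

# Same value as Spark's default spark.sql.autoBroadcastJoinThreshold (10 MB).
DEFAULT_BROADCAST_THRESHOLD = 10 * MB


def choose_join(left_bytes: int, right_bytes: int,
                threshold: int = DEFAULT_BROADCAST_THRESHOLD) -> str:
    """Pick a join strategy from the sizes of the two join inputs."""
    # If the smaller side fits under the threshold, it can be broadcast.
    if min(left_bytes, right_bytes) <= threshold:
        return "BroadcastJoin"
    return "SortMergeJoin"


def replan(estimated: Tuple[int, int], actual: Tuple[int, int]) -> Tuple[str, str]:
    """Return (strategy from planner estimates, strategy from runtime sizes)."""
    return choose_join(*estimated), choose_join(*actual)


# Hypothetical case: the planner estimates 2 GB for the right side,
# but after filtering the stage actually produces only 5 MB.
before, after = replan(estimated=(50 * GB, 2 * GB), actual=(50 * GB, 5 * MB))
```

In this example the estimate leads to SortMergeJoin, while the measured sizes allow BroadcastJoin, which is the kind of switch the Baidu results describe.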
Spark SQL Adaptive Execution (5)
Carson Wang, Intel, Yuanjian Li, Baidu
Results of applying Adaptive Execution at Baidu
• SortMergeJoin was changed to BroadcastJoin, giving a 50% to 200% performance improvement
• For jobs that run for more than one hour, a suitable number of reducers was chosen, giving a 50% to 100% performance improvement
Cluster architecture
Taking Advantage of a Disaggregated Storage and Compute Architecture
Brian Cho, Facebook
• Introduces an architecture that separates data (storage) from compute
• Spark optimizations for an architecture that separates data from compute
1. Defining a file interface
2. Optimizing access to Spark temporary files
3. Optimizing Spark shuffle
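To make the first optimization concrete, the sketch below shows what a storage-agnostic file interface could look like, so that the same shuffle-writing code works against a local disk or a remote store. Everything here is a hypothetical illustration: the class and function names are invented, and the in-memory "remote" store merely stands in for a networked storage service. It is not the interface presented in the talk.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict


class FileStore(ABC):
    """Hypothetical minimal file interface shared by local and remote storage."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, path: str) -> bytes: ...


class LocalFileStore(FileStore):
    """Stores files under a root directory on the local file system."""

    def __init__(self, root: str) -> None:
        self.root = root

    def write(self, path: str, data: bytes) -> None:
        full = os.path.join(self.root, path)
        os.makedirs(os.path.dirname(full), exist_ok=True)
        with open(full, "wb") as f:
            f.write(data)

    def read(self, path: str) -> bytes:
        with open(os.path.join(self.root, path), "rb") as f:
            return f.read()


class InMemoryRemoteStore(FileStore):
    """Stand-in for remote storage; counts bytes that would cross the network."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}
        self.bytes_transferred = 0

    def write(self, path: str, data: bytes) -> None:
        self.blobs[path] = bytes(data)
        self.bytes_transferred += len(data)

    def read(self, path: str) -> bytes:
        data = self.blobs[path]
        self.bytes_transferred += len(data)
        return data


def write_shuffle_block(store: FileStore, shuffle_id: int, map_id: int,
                        payload: bytes) -> str:
    """Write one shuffle block through the interface and return its path."""
    path = f"shuffle_{shuffle_id}/map_{map_id}.data"
    store.write(path, payload)
    return path


def demo() -> bool:
    """Round-trip the same block through both stores."""
    with tempfile.TemporaryDirectory() as root:
        for store in (LocalFileStore(root), InMemoryRemoteStore()):
            path = write_shuffle_block(store, 0, 1, b"abc")
            if store.read(path) != b"abc":
                return False
    return True
```

Because the caller only sees FileStore, temporary files and shuffle data can be redirected to remote storage without changing the code that produces them, and the transfer counter hints at why the later slides focus on reducing network traffic.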
Taking Advantage of a Disaggregated Storage and Compute Architecture (1)
Brian Cho, Facebook
Databricks. Taking Advantage of a Disaggregated Storage and Compute Architecture. Retrieved July 10, 2018, from
https://databricks.com/session/taking-advantage-of-a-disaggregated-storage-and-compute-architecture
Taking Advantage of a Disaggregated Storage and Compute Architecture (2)
Brian Cho, Facebook
Databricks. Taking Advantage of a Disaggregated Storage and Compute Architecture. Retrieved July 10, 2018, from
https://databricks.com/session/taking-advantage-of-a-disaggregated-storage-and-compute-architecture
Taking Advantage of a Disaggregated Storage and Compute Architecture (3)
Brian Cho, Facebook
Benefits of an architecture that separates data from compute
• Servers suited to storage and servers suited to compute can be procured separately
• Capacity planning is simpler
• Storage and compute can each be maintained by their own team
Taking Advantage of a Disaggregated Storage and Compute Architecture (4)
Brian Cho, Facebook
Databricks. Taking Advantage of a Disaggregated Storage and Compute Architecture. Retrieved July 10, 2018, from
https://databricks.com/session/taking-advantage-of-a-disaggregated-storage-and-compute-architecture
Taking Advantage of a Disaggregated Storage and Compute Architecture (5)
Brian Cho, Facebook
Databricks. Taking Advantage of a Disaggregated Storage and Compute Architecture. Retrieved July 10, 2018, from
https://databricks.com/session/taking-advantage-of-a-disaggregated-storage-and-compute-architecture
Taking Advantage of a Disaggregated Storage and Compute Architecture (5)
Brian Cho, Facebook
Databricks. Taking Advantage of a Disaggregated Storage and Compute Architecture. Retrieved July 10, 2018, from
https://databricks.com/session/taking-advantage-of-a-disaggregated-storage-and-compute-architecture
Taking Advantage of a Disaggregated Storage and Compute Architecture (6)
Brian Cho, Facebook
Executor Executor Executor
ESS ESS ESS
Local FS Local FS Local FS
Local access here
Compute
Taking Advantage of a Disaggregated Storage and Compute Architecture (7)
Brian Cho, Facebook
Executor Executor Executor
ESS ESS ESS
Warm Storage
Remote access here
*Network Transfer
Compute
Storage
Taking Advantage of a Disaggregated Storage and Compute Architecture (8)
Brian Cho, Facebook
Databricks. Taking Advantage of a Disaggregated Storage and Compute Architecture. Retrieved July 10, 2018, from
https://databricks.com/session/taking-advantage-of-a-disaggregated-storage-and-compute-architecture
Figure labels: Index, shuffle
Apache Spark on Kubernetes Clusters
Sean Suchter, PepperData, Anirudh Ramanathan, Google
Introduced an overview of Kubernetes, the implementation of Spark on Kubernetes, and future plans
The Spark driver is implemented as a Kubernetes custom controller
Features to be added in the future (selected)
• PySpark: SPARK-23984
• Dynamic Allocation: SPARK-24432
• Driver HA
Apache Spark on Kubernetes Clusters (1)
Sean Suchter, PepperData, Anirudh Ramanathan, Google
Databricks. Apache Spark on Kubernetes Clusters. Retrieved July 11, 2018, from https://databricks.com/session/apache-spark-on-kubernetes-clusters
Apache Spark on Kubernetes Clusters (2)
Sean Suchter, PepperData, Anirudh Ramanathan, Google
Databricks. Apache Spark on Kubernetes Clusters. Retrieved July 11, 2018, from https://databricks.com/session/apache-spark-on-kubernetes-clusters
Apache Spark on Kubernetes Clusters (3)
Sean Suchter, PepperData, Anirudh Ramanathan, Google
bin/spark-submit \
--master k8s://<server:port> \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=5 \
--conf spark.kubernetes.container.image=<spark-image> \
local:///path/to/examples.jar
Users submit jobs in almost the same way as before
Apache Spark on Kubernetes Clusters (4)
Sean Suchter, PepperData, Anirudh Ramanathan, Google
Databricks. Apache Spark on Kubernetes Clusters. Retrieved July 11, 2018, from https://databricks.com/session/apache-spark-on-kubernetes-clusters
Spark on Kubernetes Roadmap
EOP
Backup slides
KEYNOTE: MLflow
Matei Zaharia, Databricks
A framework for managing the machine learning lifecycle on Spark
Named three points that make ML on Spark difficult and offered a solution for each
KEYNOTE: MLflow (1)
Matei Zaharia, Databricks
Databricks. Project Hydrogen: Unifying State-of-the-art AI and Big Data in Apache Spark. Retrieved July 9, 2018, from
https://databricks.com/session/unifying-data-and-ai-for-better-data-products
KEYNOTE: MLflow (2)
Matei Zaharia, Databricks
Databricks. Project Hydrogen: Unifying State-of-the-art AI and Big Data in Apache Spark. Retrieved July 9, 2018, from
https://databricks.com/session/unifying-data-and-ai-for-better-data-products