Tianqi holds a bachelor’s degree in Computer Science from Shanghai Jiao Tong University, where he was a member of the ACM Class, now part of Zhiyuan College at SJTU. He earned his master’s degree at Shanghai Jiao Tong University in the Apex Data and Knowledge Management Lab before joining the University of Washington as a PhD student. He has held several prestigious internships and visiting positions: at Google on the Brain Team, at GraphLab authoring the boosted tree and neural net toolkit, at Microsoft Research Asia in the Machine Learning Group, and at the Digital Enterprise Research Institute in Galway, Ireland. What really excites Tianqi is what processes and goals can be enabled when we bring advanced learning techniques and systems together. He pushes the envelope on deep learning, knowledge transfer, and lifelong learning. His PhD is supported by a Google PhD Fellowship.
Abstract
Building Scalable and Modular Learning Systems:
Machine learning and data-driven approaches are becoming very important in many areas. One key factor drives these successful applications: scalable learning systems that learn the model of interest from large datasets. More importantly, these systems need to be designed in a modular way, so that they work with the existing ecosystem and improve users’ productivity. In this talk, I will present XGBoost and MXNet, two scalable and portable learning systems that I built. I will discuss how we can apply distributed computing, asynchronous scheduling, and hardware acceleration to improve these systems, as well as how they fit into the bigger open-source ecosystem of machine learning.
4. A Method to Solve Half of the Problems
[Figure: an ensemble of two regression trees. Tree 1 splits on "age < 15" (Y/N), then "is male?" (Y/N), giving leaf scores +2, +0.1, and -1. Tree 2 splits on "use computer daily?" (Y/N), giving leaf scores +0.9 and -0.9. The prediction sums leaf scores across trees: f(boy) = 2 + 0.9 = 2.9, f(grandpa) = -1 - 0.9 = -1.9.]
Tree Boosting (Friedman, 1999)
Used by 17 out of 29 Kaggle winners last year, and more; the winning solutions for all the problems on the last slide use tree boosting.
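To make the additive-tree picture above concrete, here is a minimal XGBoost sketch; the synthetic data and parameter values are illustrative assumptions, not numbers from the talk.

# Minimal gradient tree boosting with XGBoost (illustrative settings).
import numpy as np
import xgboost as xgb

X = np.random.rand(1000, 10)               # synthetic features
y = (X[:, 0] + X[:, 1] > 1).astype(int)    # synthetic binary labels

dtrain = xgb.DMatrix(X, label=y)
params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.3}
booster = xgb.train(params, dtrain, num_boost_round=50)  # 50 additive trees
preds = booster.predict(dtrain)            # each prediction sums leaf scores across trees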
7. Fast Histogram-based Trees
• Bring techniques of recent improvements in histogram based
tree construction to XGBoost
• FastBDT (Thomas Keck), LightGBM (Ke et.al)
• Optimized for both categorical and continuous features.
Contributed by Hyunsu Cho,
University of Washington
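The histogram algorithm is selected through XGBoost's tree_method parameter; a hedged sketch, reusing the dtrain from the earlier example (the max_bin value is illustrative):

# Enable fast histogram-based tree construction.
params = {"objective": "binary:logistic",
          "tree_method": "hist",   # histogram-based split finding
          "max_bin": 256}          # number of histogram bins (illustrative)
booster = xgb.train(params, dtrain, num_boost_round=50)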
8. GPU-based Optimization
• Run each boosting iteration on the GPU
• Uses fast parallel prefix sum / radix sort operations
• Available now in XGBoost

Dataset     i7-6700K (s)   Titan X (s)   Speedup
Yahoo LTR   3738           507           7.37
Higgs       31352          4173          7.51
Bosch       9460           1009          9.38

Contributed by Rory Mitchell, University of Waikato
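The GPU implementation is likewise exposed via tree_method in the XGBoost versions of this era; a hedged sketch, again reusing dtrain (a CUDA-capable device is assumed):

# Run each boosting iteration on the GPU.
params = {"objective": "binary:logistic",
          "tree_method": "gpu_hist"}  # GPU histogram algorithm
booster = xgb.train(params, dtrain, num_boost_round=50)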
9. Modularity: Platform-Agnostic Engine
In any language, on any platform
• YARN, MPI, Flink, Spark, ...
• Easily extensible to other cloud dataflow engines
12. Declarative vs. Imperative Programs
• Declarative graphs are easy to store, port, and optimize
• Theano, TensorFlow
• Imperative programs are flexible but hard to optimize
• PyTorch, Chainer, NumPy
13. MXNet’s Approach: Mixed Programming

Imperative NDArray API:
>>> import mxnet as mx
>>> a = mx.nd.zeros((100, 50))
>>> a.shape
(100L, 50L)
>>> b = mx.nd.ones((100, 50))
>>> c = a + b
>>> b += c

Declarative API:
>>> import mxnet as mx
>>> net = mx.symbol.Variable('data')
>>> net = mx.symbol.FullyConnected(data=net, num_hidden=128)
>>> net = mx.symbol.SoftmaxOutput(data=net)
>>> type(net)
<class 'mxnet.symbol.Symbol'>
>>> texec = net.simple_bind(data=data_shape)
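A hedged sketch of how the two styles mix in practice: bind the declarative graph to concrete shapes, then drive it from an imperative loop (the context, batch shape, and loop below are illustrative assumptions):

# Mixed programming: a declarative graph driven by imperative code.
texec = net.simple_bind(ctx=mx.cpu(), data=(64, 100))  # compile for a fixed shape
for _ in range(10):
    texec.forward()               # run the optimized symbolic graph
    out = texec.outputs[0]        # outputs are ordinary NDArrays,
    print(out.asnumpy().shape)    #   usable by imperative code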
15. Need for Parallelism
• Speed is critical to deep learning
• Parallelism leads to higher performance:
• Parallelization across multiple GPUs
• Parallel execution of small kernels
• Overlapping memory/network transfers with computation
• …
17. Solution: Auto-Parallelization with a Dependency Engine
• Single-thread abstraction of a parallel environment
• Works for both symbolic and imperative programs (a toy sketch follows below)
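To make the abstraction concrete, here is a toy Python sketch of such an engine (my own illustration, not MXNet's actual C++ engine): each pushed operation declares the variables it reads and writes, and the engine orders only operations that touch the same variable, so independent operations run in parallel.

# Toy dependency engine (illustration only).
from concurrent.futures import ThreadPoolExecutor

class ToyDependencyEngine:
    def __init__(self, workers=4):
        self.pool = ThreadPoolExecutor(workers)
        self.last_write = {}  # var -> future of the last op that wrote it
        self.readers = {}     # var -> futures of reads since that write

    def push(self, fn, reads=(), writes=()):
        deps = []
        for v in reads:                        # read-after-write ordering
            if v in self.last_write:
                deps.append(self.last_write[v])
        for v in writes:                       # write-after-write and write-after-read
            if v in self.last_write:
                deps.append(self.last_write[v])
            deps.extend(self.readers.get(v, ()))
        def task():
            for d in deps:                     # block until all dependencies finish
                d.result()
            fn()
        fut = self.pool.submit(task)
        for v in reads:
            self.readers.setdefault(v, []).append(fut)
        for v in writes:
            self.last_write[v] = fut
            self.readers[v] = []
        return fut

eng = ToyDependencyEngine()
eng.push(lambda: print("load a"), writes=["a"])
eng.push(lambda: print("b = a + 1"), reads=["a"], writes=["b"])
eng.push(lambda: print("c = a * 2"), reads=["a"], writes=["c"])  # can run in parallel with b
eng.pool.shutdown(wait=True)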
18. Scaling Up to 256 AWS GPUs
• Weak scaling (fixed batch size per GPU)
• Different optimal hyperparameters must be tuned as GPUs are added:
• Larger learning rate
• Stronger noise augmentation
• Bias-variance trade-off
https://github.com/dmlc/mxnet/tree/master/example/image-classification#scalability-results
Adopted as AWS’s deep learning framework of choice
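A hedged sketch of the weak-scaling bookkeeping described above (the per-GPU batch size, base rate, and linear-scaling heuristic are my illustrative assumptions, not the exact recipe from the experiments):

# Weak scaling: per-GPU batch is fixed, so the global batch grows with
# the GPU count, and the learning rate is scaled up to compensate.
def scaled_hyperparams(num_gpus, per_gpu_batch=32, base_lr=0.1):
    global_batch = per_gpu_batch * num_gpus  # grows linearly with GPUs
    lr = base_lr * num_gpus                  # linear LR scaling heuristic
    return global_batch, lr

print(scaled_hyperparams(256))  # -> (8192, 25.6)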
19. Scaling Up Is Good; How About Big Models?
Many models are bounded by memory
21. Trade Computation for Memory
• Recompute activations instead of saving them
• Training Deep Nets with Sublinear Memory Cost, Chen et al., arXiv:1604.06174
• Memory-Efficient Backpropagation Through Time, Gruslys et al., arXiv:1606.03401
22. O(sqrt(N)) Memory Cost with 25% Overhead
ImageNet ResNet configurations
Train bigger models on a single GPU
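A minimal sketch of the recomputation idea under simplifying assumptions (my illustration, not MXNet's memory planner): split an N-layer network into about sqrt(N) segments, store only segment-boundary activations in the forward pass, and rebuild a segment's activations from its checkpoint when the backward pass reaches it.

# Sublinear-memory forward pass: keep only ~sqrt(N) checkpoint activations.
import math

def forward_with_checkpoints(layers, x):
    seg = max(1, int(math.sqrt(len(layers))))  # segment length ~ sqrt(N)
    checkpoints = [(0, x)]                     # (layer index, input activation)
    for i, layer in enumerate(layers):
        if i > 0 and i % seg == 0:
            checkpoints.append((i, x))         # boundary activations only
        x = layer(x)
    return x, checkpoints

def recompute_segment(layers, checkpoints, target):
    # Backward pass: rebuild activations up to layer `target` from the
    # nearest earlier checkpoint instead of having stored them all.
    start, x = max((c for c in checkpoints if c[0] <= target),
                   key=lambda c: c[0])
    acts = []
    for i in range(start, target + 1):
        x = layers[i](x)
        acts.append(x)
    return acts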
24. Deep Learning Systems Will Become More Heterogeneous
[Diagram: a common front-end and computation graph definition (with gradient and execution) shared across back-ends: a mobile System A with its own operators and a graph without gradients, a System B built from code generators with its own operators, and a System C with its own operators.]
• Systems are becoming more heterogeneous
• Different systems are needed for specific cases (with common modules)
25. Unix Philosophy vs. Monolithic System
• Monolithic: build one system that solves everything
• Unix philosophy: build modules that each solve one thing well and work with the other pieces
26. NNVM: High-Level Graph Optimization for Deep Learning
• Allows different front-ends and back-ends
• Allows extensive optimizations:
• Memory reuse
• Runtime kernel fusion
• Automatic tensor partitioning and placement
• …
Lightweight
27. The Challenge for IR of Deep Learning Systems
[Diagram: operators (Conv, ReLU, BatchNorm) expose a set of common attributes (FGradient, FInferShape, FCodeGen); optimization passes (symbolic differentiation, shape inference, code generation) consume only those attributes. The IR must support both the need for adding new operators and the need for adding new optimizations.]
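To illustrate the design that resolves this, here is a toy Python sketch of an operator-attribute registry (my illustration; NNVM's real implementation is C++): operators register named attributes such as FInferShape and FGradient, and each optimization pass consumes only the attributes it needs, so new operators and new passes can be added independently.

# Toy operator-attribute registry.
OP_ATTRS = {}  # operator name -> {attribute name -> implementation}

def register_op(name, **attrs):
    OP_ATTRS.setdefault(name, {}).update(attrs)

# An operator registers only the attributes it supports.
register_op("relu",
            FInferShape=lambda in_shapes: in_shapes,  # output shape = input shape
            FGradient=lambda out_grad: [out_grad])    # simplified gradient rule

# A pass depends on a single attribute, not on the operator set.
def infer_shapes(op_sequence, input_shape):
    shape = input_shape
    for op in op_sequence:
        shape = OP_ATTRS[op]["FInferShape"]([shape])[0]
    return shape

print(infer_shapes(["relu", "relu"], (64, 128)))  # -> (64, 128)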
28. The Challenge for IR of Deep Learning Systems

Comparison:
System                              Add New Operator             Add New Optimization Pass
Most DL systems (e.g. old MXNet)    Easy                         Fixed set of optimization passes
LLVM                                Fixed set of primitive ops   Easy
NNVM                                Easy                         Easy

• Ease of adding a new operator or optimization pass without changing the core interface
• A fixed interface is useful for decentralization:
• New optimizations are directly usable by other projects, without pushing back to a centralized repo
• Irrelevant passes are easy to remove
32. MXNet and XGBoost are developed by over 100 collaborators
Special thanks to
Tianqi Chen (UW), Mu Li (CMU/Amazon), Bing Xu (Turi), Chiyuan Zhang (MIT), Junyuan Xie (UW), Yizhi Liu (MediaV), Tianjun Xiao (Microsoft), Yutian Li (Stanford), Yuan Tang (Uptake), Qian Kou (Indiana University), Hu Shiwen (Shanghai), Chuntao Hong (Microsoft), Min Lin (Qihoo 360), Naiyan Wang (TuSimple), Tong He (Simon Fraser University), Minjie Wang (NYU), Valentin Churavy (OIST), Ali Farhadi (UW/AI2), Carlos Guestrin (UW/Turi), Alexander Smola (CMU/Amazon), Zheng Zhang (NYU Shanghai)