【14-C-7】 New Trends in Deep Learning Technologies Supporting Computer Vision

© 2019, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Masaki Samejima
Machine Learning Solutions Architect, Amazon Web Services Japan.
2019.2.14
Developers Summit 2019

Agenda
•
•
•
•

•
•
Demographic Data
Facial Landmarks
Sentiment Expressed
Image Quality
General Attributes

2012: SuperVision [1] (ILSVRC2012)
2014: R-CNN [2] (Pascal VOC), GAN [3]
2015: SegNet [4]
[1] A. Krizhevsky, et al., ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012.
[2] R. Girshick, et al., Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation, CVPR 2014.
[3] I.J. Goodfellow, et al., Generative Adversarial Nets, NIPS 2014.
[4] V. Badrinarayanan, et al., SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation, PAMI 2017.

https://gluon-cv.mxnet.io/model_zoo/classification.html
[Figure: Accuracy (0.70 to 0.80) vs. throughput (#sample/sec., 1000 to 4000) for senet_154, resnet_v1d, resnet_v1c, resnet_v1b, resnet_v1, densenet, darknet, VGG, resnet_v2, mobilenet, mobilenetv2]
• ImageNet 80%
• V100 GPU

https://gluon-cv.mxnet.io/model_zoo/detection.html
[Figure: mAP (30 to 40) vs. throughput (#sample/sec., 10 to 100) for yolo3, faster_rcnn, ssd]
• (IoU )
mAP 30-40%
•
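The detection results above are scored with IoU (intersection over union) between predicted and ground-truth boxes. As a minimal sketch, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max) tuples, IoU could be computed as follows. The function name `box_iou` and the box format are illustrative assumptions, not part of GluonCV.

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Each box is an (x_min, y_min, x_max, y_max) tuple (assumed format).
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a chosen threshold, and mAP aggregates precision over such decisions.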

https://gluon-cv.mxnet.io/model_zoo/segmentation.html
[Figure: bar chart of IoU (0 to 100) on COCO (fcn_resnet101, psp_resnet101, deeplab_resnet101) and VOC (fcn_resnet101, psp_resnet101, deeplab_resnet101, deeplab_resnet152)]
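For segmentation, IoU is measured over pixels rather than boxes. The sketch below assumes predictions and ground truth are flattened into equally long sequences of integer class labels, and averages the per-class IoU. The name `mean_iou` and the choice to skip classes absent from both inputs are assumptions for illustration, not the GluonCV evaluation code.

```python
def mean_iou(pred, target, num_classes):
    """Mean per-class IoU over two equally long sequences of class labels."""
    ious = []
    for cls in range(num_classes):
        # Pixels where both prediction and ground truth equal this class.
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        # Pixels where either prediction or ground truth equals this class.
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:  # skip classes absent from both inputs
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```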

3 [1]
[1] B. Tekin, et al., Real-Time Seamless Single Shot 6D Object Pose Prediction, CVPR 2018.
[2] R. Girdhar, et al., Detect-and-Track: Efficient Pose Estimation in Videos, CVPR 2018.
[3] L. Chen, et al., MaskLab: Instance Segmentation by Refining Object Detection with
Semantic and Direction Features, CVPR 2018.
[2]
[3]

GAN
Noise
Text-to-image [3]
(and Image-to-text)[1]
[2]
[1] P. Isola, et al., Image-to-Image Translation with Conditional Adversarial Nets, CVPR 2017.
[2] C. Ledig, et al., Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, CVPR 2017.
[3] S. Reed, et al., Generative Adversarial Text to Image Synthesis, ICML 2016.

Saliency [1]
[1] N. Liu, et al., PiCANet: Learning Pixel-wise Contextual Attention for Saliency Detection, CVPR 2018.
[2] Z. Li, et al., MegaDepth: Learning Single-View Depth Prediction from Internet Photos, CVPR 2018.
[2]

[Figure: bar chart; x-axis: ID (1 to 20), y-axis: 0 to 20]
[1] O. Vinyals, et al., Matching Networks for One Shot Learning, arXiv:1606.04080
•
• [1]
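The core idea of Matching Networks [1] is to classify a query as an attention-weighted vote over a small labeled support set. The following is a minimal sketch under simplifying assumptions: embeddings are given directly as plain lists (the paper learns them with neural networks), and attention is a softmax over cosine similarities. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def matching_predict(query, support, num_classes):
    """Return class probabilities for `query`.

    `support` is a list of (embedding, label) pairs, e.g. one per class
    in the one-shot setting.
    """
    sims = [cosine(query, emb) for emb, _ in support]
    # Softmax over similarities (shifted by the max for numerical stability).
    shift = max(sims)
    exps = [math.exp(s - shift) for s in sims]
    total = sum(exps)
    probs = [0.0] * num_classes
    for weight, (_, label) in zip(exps, support):
        probs[label] += weight / total
    return probs
```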

• Deep Learning
•
X. Yuan, et al., Adversarial Examples: Attacks and Defenses for Deep Learning, IEEE Trans Neural Netw Learn Syst. 2019.

•
•
•
•
ONNX
AutoML
Define-by-run

•
•
TensorFlow models
TF slim
GluonCV ChainerCV PyTorchCV

ResNet (Gluon vs MXNet)
MXNet
# Excerpt from a symbolic ResNet builder. units, num_stages, filter_list,
# image_shape, bottle_neck, bn_mom, workspace, dtype, memonger and
# residual_unit are defined outside this excerpt.
import mxnet as mx
import numpy as np

num_unit = len(units)
assert num_unit == num_stages
data = mx.sym.Variable(name='data')
if dtype == 'float32':
    data = mx.sym.identity(data=data, name='id')
elif dtype == 'float16':
    data = mx.sym.Cast(data=data, dtype=np.float16)
# Normalize the raw input with a fixed-scale BatchNorm.
data = mx.sym.BatchNorm(data=data, fix_gamma=True, eps=2e-5, momentum=bn_mom, name='bn_data')
(nchannel, height, width) = image_shape
if height <= 32:  # such as cifar10
    body = mx.sym.Convolution(data=data, num_filter=filter_list[0], kernel=(3, 3), stride=(1, 1), pad=(1, 1),
                              no_bias=True, name="conv0", workspace=workspace)
else:  # often expected to be 224 such as imagenet
    body = mx.sym.Convolution(data=data, num_filter=filter_list[0], kernel=(7, 7), stride=(2, 2), pad=(3, 3),
                              no_bias=True, name="conv0", workspace=workspace)
    body = mx.sym.BatchNorm(data=body, fix_gamma=False, eps=2e-5, momentum=bn_mom, name='bn0')
    body = mx.sym.Activation(data=body, act_type='relu', name='relu0')
    body = mx.sym.Pooling(data=body, kernel=(3, 3), stride=(2, 2), pad=(1, 1), pool_type='max')

# Stack the residual stages; every stage after the first starts with a stride-2 unit.
for i in range(num_stages):
    body = residual_unit(body, filter_list[i+1], (1 if i == 0 else 2, 1 if i == 0 else 2), False,
                         name='stage%d_unit%d' % (i + 1, 1), bottle_neck=bottle_neck, workspace=workspace,
                         memonger=memonger)
    for j in range(units[i] - 1):
        body = residual_unit(body, filter_list[i+1], (1, 1), True, name='stage%d_unit%d' % (i + 1, j + 2),
                             bottle_neck=bottle_neck, workspace=workspace, memonger=memonger)
bn1 = mx.sym.BatchNorm(data=body, fix_gamma=False, eps=2e-5, momentum=bn_mom, name='bn1')
relu1 = mx.sym.Activation(data=bn1, act_type='relu', name='relu1')

Gluon
from mxnet.gluon.model_zoo import vision
resnet18 = vision.resnet18_v1()

ONNX (Open Neural Network Exchange)
MXNet
Caffe2
PyTorch
TF
CNTK
CoreML
TensorRT
NGraph
SNPE
•
ONNX ONNX
•

ONNX
Protocol Buffers
•
•
• API
Protocol Buffers
Graph Operator Tensor, …
Operator Definitions
ONNX Python API
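To make the Graph / Operator / Tensor structure concrete, here is a small sketch of an exchange format built from those three pieces. It deliberately does not use the `onnx` package: the `Node` and `Graph` classes are hypothetical stand-ins, JSON stands in for Protocol Buffers, and only two operators (named after the ONNX `Add` and `Relu` operators) are implemented.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Node:
    """One operator call: an op type plus named input and output tensors."""
    op_type: str
    inputs: list
    outputs: list


@dataclass
class Graph:
    """A model: graph inputs/outputs, operator nodes, and constant tensors."""
    inputs: list
    outputs: list
    nodes: list = field(default_factory=list)
    initializers: dict = field(default_factory=dict)  # name -> list of floats


# Operator definitions: the shared meaning of each op type.
OPS = {
    "Add": lambda a, b: [x + y for x, y in zip(a, b)],
    "Relu": lambda a: [max(0.0, x) for x in a],
}


def dumps(graph):
    """Serialize the graph (JSON here, standing in for Protocol Buffers)."""
    return json.dumps(asdict(graph))


def loads(text):
    """Rebuild a Graph from its serialized form."""
    raw = json.loads(text)
    raw["nodes"] = [Node(**n) for n in raw["nodes"]]
    return Graph(**raw)


def run(graph, feeds):
    """Evaluate the graph; nodes are assumed to be in topological order."""
    values = dict(graph.initializers)
    values.update(feeds)
    for node in graph.nodes:
        args = [values[name] for name in node.inputs]
        values[node.outputs[0]] = OPS[node.op_type](*args)
    return [values[name] for name in graph.outputs]
```

Because the serialized graph only names operators, any runtime that implements the same operator definitions can execute a model exported by a different framework.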

Define-and-run vs. Define-by-run
• Define-and-run
•
• TensorFlow, MXNet
• Define-by-run
•
• Chainer, PyTorch, TensorFlow, MXNet

Define-by-run (Define, Run):
def our_function(A, B):
    C = A + B
    return C

A = Load_Data_A()
B = Load_Data_B()
result = our_function(A, B)

Define-and-run:
# Define
A = placeholder()
B = placeholder()
C = A + B
our_function = compile(inputs=[A, B], outputs=[C])

# Run
A = Load_Data_A()
B = Load_Data_B()
result = our_function(A, B)

https://gluon.mxnet.io/chapter07_distributed-learning/hybridize.html

Define-by-run
def our_function(A, B):
    C = A + B
    return C

A = Load_Data_A()
B = Load_Data_B()

A = placeholder()
B = placeholder()
C = A + B
our_function = compile(inputs=[A, B], outputs=[C])
A = Load_Data_A()
B = Load_Data_B()
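The pseudocode above can be turned into a runnable toy. In this sketch, the define-by-run version is an ordinary Python function, while the define-and-run version first builds a graph of symbolic nodes and only computes when the compiled function is called. `Symbol`, `Placeholder`, `Add`, and `compile_graph` are hypothetical names for illustration and do not correspond to any framework's API.

```python
class Symbol:
    """Base class for graph nodes; `+` builds a new node instead of computing."""

    def __add__(self, other):
        return Add(self, other)


class Placeholder(Symbol):
    """Named input that holds no data until run time."""

    def __init__(self, name):
        self.name = name


class Add(Symbol):
    """Deferred addition of two symbols."""

    def __init__(self, left, right):
        self.left = left
        self.right = right


def evaluate(node, env):
    """Recursively compute a node's value given placeholder values in `env`."""
    if isinstance(node, Add):
        return evaluate(node.left, env) + evaluate(node.right, env)
    return env[node.name]


def compile_graph(inputs, outputs):
    """Return a callable that binds arguments to placeholders and runs the graph."""
    def compiled(*args):
        env = {p.name: value for p, value in zip(inputs, args)}
        return [evaluate(out, env) for out in outputs]
    return compiled


# Define-by-run: the computation happens as the Python code executes.
def our_function(a, b):
    return a + b


# Define-and-run: build the graph first, run it later.
A = Placeholder("A")
B = Placeholder("B")
C = A + B  # no arithmetic yet, only a graph node
our_graph_function = compile_graph(inputs=[A, B], outputs=[C])
```

Having the whole graph available before execution is what allows define-and-run frameworks to optimize it, whereas define-by-run keeps ordinary Python control flow and debugging.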

AutoML
•
• , etc.
D. Baylor, et al., TFX: A TensorFlow-Based Production-Scale Machine Learning Platform, KDD 2017.

AutoML
• AutoML
• ICML 2014 AutoML *
•
•
• Meta-Learning, Learning to learn
* https://sites.google.com/site/automlwsicml14/

AutoML

AutoML Amazon Forecast
User
CSV file
1. S3
2. Forecast
3. Forecast
4.

•
•
Model Server
Interpretable ML

Model Server
•
•
Model Server
•
• REST/RPC
Model Server Mobile client
Deploy
REST/RPC
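As a rough sketch of the pattern above, a model server wraps a trained model behind a REST endpoint so that clients only need HTTP. The example uses Python's standard library only. The `/predict` path, the JSON payload shape, and the toy `model` function are assumptions for illustration and do not mirror the API of TensorFlow Serving or MXNet Model Server.

```python
import json
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


def model(features):
    """Stand-in for a trained model: returns the index of the largest input."""
    return max(range(len(features)), key=lambda i: features[i])


class PredictHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/predict":  # hypothetical endpoint name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"class": model(payload["features"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def start_server(port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), PredictHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def predict(port, features):
    """Client side: POST the features and return the decoded JSON response."""
    conn = HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        conn.request(
            "POST",
            "/predict",
            body=json.dumps({"features": features}).encode(),
            headers={"Content-Type": "application/json"},
        )
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()
```

A client would call `start_server()` once on the serving side and then `predict(port, [...])` from any process that can reach the port.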

TensorFlow Serving
[1] C. Olston, et al., TensorFlow-Serving: Flexible, High-Performance ML Serving, NIPS 2017.
• Controller, Synchronizer Serving job
• Router Serving job

MXNet Model Server
https://aws.amazon.com/jp/blogs/news/model-server-for-apache-mxnet-v1-0-released/
• REST API
• MMS 1.0
1,000
MMS 1.0
MMS 0.4

•
•
•
• AWS, SageMaker Neo
• Nvidia, TensorRT
Raspberry Pi
ResNet18 Mobilenet
11.5x
2.2x

SageMaker Neo / TVM
• Operator Fusion
• Data Layout Transformation
4x4 4x4
• Tensor Expression and Schedule Space
• Nested Parallelism with Cooperation
• etc…
T. Chen, et al., TVM: An Automated End-to-End Optimizing Compiler for Deep Learning, OSDI 2018.
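Operator fusion, the first item above, can be illustrated without a compiler. In this sketch, three element-wise operators (scale, bias add, ReLU) are first run separately, each producing an intermediate list, and then as one fused loop that touches each element once. This is a hand-written illustration of the idea under simplified assumptions, not TVM's implementation, and the function names are hypothetical.

```python
def scale(xs, s):
    return [x * s for x in xs]


def add_bias(xs, b):
    return [x + b for x in xs]


def relu(xs):
    return [max(0.0, x) for x in xs]


def unfused(xs, s, b):
    """Three passes over the data and two intermediate lists."""
    return relu(add_bias(scale(xs, s), b))


def fused(xs, s, b):
    """One pass over the data and no intermediate lists."""
    return [max(0.0, x * s + b) for x in xs]
```

Both versions compute the same result; the fused one saves memory traffic, which is where much of the speedup from fusion tends to come from.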

TensorRT
• Layer & Tensor Fusion
1
• FP16 and INT8 Precision Calibration
FP32 FP16 INT8
• Kernel Auto-Tuning
• Dynamic Tensor Memory
• Multi Stream Execution
https://devblogs.nvidia.com/tensorrt-3-faster-tensorflow-inference/
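The reduced-precision step can be sketched as follows. Calibration picks a scale from sample activations, after which FP32 values are mapped to INT8 and back. Max-abs calibration is swapped in here as a simple stand-in for TensorRT's own, more elaborate calibration procedure, and all function names are hypothetical.

```python
def calibrate_scale(samples, num_bits=8):
    """Max-abs calibration: map the largest observed magnitude to the int range."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    max_abs = max(abs(x) for x in samples)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize(xs, scale, num_bits=8):
    """Round to the nearest integer step and clamp to the symmetric int range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(x / scale))) for x in xs]


def dequantize(qs, scale):
    """Map integers back to approximate floating-point values."""
    return [q * scale for q in qs]
```

For values inside the calibrated range, the round trip error is at most half a quantization step, which is why a representative calibration set matters.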

Interpretable ML:
: SVM GBT
C. Molnar, Interpretable Machine Learning, https://christophm.github.io/interpretable-ml-book/
[Figure: decision tree example; branches: > 900 / < 900 and < 2000 km2 / > 2000 km2]

Interpretable ML for computer vision
•
•
M.T. Ribeiro, et al., Anchors: High-Precision Model-Agnostic Explanations, AAAI 2018.

•
•
•
• 1 1
•
• AWS Inferentia
• Intel Nervana

Machine Learning on FPGA
• FPGA
• AWS F1 instance, Amazon Machine Image
•
Loop tiling [1]
[1] C. Zhang, et al., Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks, FPGA 2015.
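Loop tiling splits a large loop nest into small blocks so that each block's working set fits in fast on-chip memory. The cited paper [1] applies this to convolution loops on FPGAs; as a simpler stand-in, the sketch below tiles a matrix multiplication. The tile size of 2 is an arbitrary assumption.

```python
def matmul_naive(a, b):
    """Plain triple loop: c[i][j] = sum over p of a[i][p] * b[p][j]."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_tiled(a, b, tile=2):
    """Same computation, iterated block by block."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    # Outer loops walk over tiles.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                # Inner loops stay inside one tile (clipped at the matrix edge).
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        for p in range(p0, min(p0 + tile, k)):
                            c[i][j] += a[i][p] * b[p][j]
    return c
```

On an accelerator, the inner loops would map to the compute array while the outer loops control which tiles are loaded into on-chip buffers.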

•
• GPU

•
•
AutoML AI
•

https://amzn.to/aws_dev
