Pr057 mask rcnn

Taeoh Kim

The 57th presentation of PR12, the Tensorflow Korea paper-reading group, covers Mask R-CNN, an instance segmentation framework.

Pr057 mask rcnn

1. Yonsei University MVP Lab.

3. RoI from Selective Search → RoI Pooling → Fixed-Size Representation → Classification + Bbox Regression

4. RPN (Region Proposal Network): Objectness + Bbox Regression → RoI Pooling → Fixed-Size Representation → Classification + Bbox Regression

5. 32x32x3 → Conv1, Pool1 → 16x16x64 → Conv2, Pool2 → 8x8x128 → Conv3, Pool3 → 4x4x256 → Conv4, Pool4 → 2x2x512 → Conv5, Pool5 → 1x1x512 → 1x1x512 Conv → 1x1 Heatmap → x32 Upsample → Softmax. Notes: Remove Pooling; 1x1 Conv for Heatmap Output.
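The size bookkeeping on this slide can be checked with a short sketch. It assumes each conv layer preserves spatial size and each pool halves it (2x2, stride 2), which matches the listed shapes; the function name `trace_shapes` is hypothetical and only for illustration.

```python
def trace_shapes(height, width, channels_per_stage):
    """Return the (H, W, C) shape after each conv+pool stage.

    Assumption: every conv keeps the spatial size and every pool halves it,
    as the sizes on the slide suggest.
    """
    shapes = []
    for channels in channels_per_stage:
        height, width = height // 2, width // 2
        shapes.append((height, width, channels))
    return shapes


stages = [64, 128, 256, 512, 512]      # output channels of Conv1..Conv5
shapes = trace_shapes(32, 32, stages)
total_stride = 32 // shapes[-1][0]     # input size divided by final size

for i, shape in enumerate(shapes, start=1):
    print(f"after Conv{i}+Pool{i}: {shape[0]}x{shape[1]}x{shape[2]}")
print("upsampling factor needed:", total_stride)
```

Five halvings reduce 32x32 to 1x1, which is why the heatmap has to be upsampled by a factor of 32 to return to the input resolution.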

7. Slide from Mask R-CNN Tutorial, K. He, ICCV 2017

8. Slide from Mask R-CNN Tutorial, K. He, ICCV 2017

9. Sheep Dog Human Sheep Sheep Sheep Sheep

10. Sheep Dog Human

11. Dog Human Sheep Sheep Sheep Sheep Sheep

12-17. BBox Classification (Faster R-CNN): Can Separate, Cannot Segment. Segmentation Classification (FCN): Cannot Separate, Can Segment. Faster R-CNN + FCN = Segmentation in BBox Classification (FCN on BBox!).

18. Slide from Mask R-CNN Tutorial, K. He, ICCV 2017

28. FCN • Pixel-level Classification • Per Pixel Softmax (Multinomial) • Multi Instance

29-30. FCN • Pixel-level Classification • Per Pixel Softmax (Multinomial) • Multi Instance. Faster R-CNN • Classification • Instance Level RoI

31-32. FCN • Pixel-level Classification • Per Pixel Sigmoid (Binary), replacing Softmax • Multi Instance. Faster R-CNN • Classification • Instance Level RoI

33. DB: BBox + Class + Mask. $L = L_{cls} + L_{box} + L_{mask}$, where $L_{cls}$: Softmax Cross Entropy, $L_{box}$: Regression, $L_{mask}$: Binary Cross Entropy
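A minimal sketch of how the three loss terms on this slide could be combined for a single RoI. The slide only says "Regression" for the box term; smooth L1 is used here as an assumption, and all function names and input values are hypothetical.

```python
import math


def softmax_cross_entropy(logits, target_index):
    """Cross entropy of a softmax over class logits (L_cls)."""
    peak = max(logits)  # subtract the max for numerical stability
    log_sum = peak + math.log(sum(math.exp(z - peak) for z in logits))
    return log_sum - logits[target_index]


def smooth_l1(pred, target):
    """Smooth L1 box regression loss (an assumed choice for L_box)."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff if diff < 1.0 else diff - 0.5
    return total


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean per-pixel binary cross entropy (L_mask) over a flat list."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


# Made-up predictions and targets for one RoI.
l_cls = softmax_cross_entropy([2.0, 0.5, -1.0], target_index=0)
l_box = smooth_l1([0.1, 0.2, 0.0, -0.3], [0.0, 0.0, 0.0, 0.0])
l_mask = binary_cross_entropy([0.9, 0.2, 0.8, 0.1], [1, 0, 1, 0])
loss = l_cls + l_box + l_mask
print(round(loss, 4))
```

The three terms are summed with equal weight, as in the formula on the slide.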

34-35. Training Phase: rather than summing over all classes, $L_{mask} = L_{c_1} + L_{c_2} + \cdots + L_{c_k}$, only the ground-truth class contributes: $L_{mask} = L_{c_3}$ if the GT class is 3. The mask branch only learns how to mask, independent of class.
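The training rule above (only the ground-truth class's mask output enters the loss) can be illustrated with a small sketch. The helper names `bce` and `mask_loss`, the 4-pixel masks, and the class indices are hypothetical.

```python
import math


def bce(probs, labels, eps=1e-7):
    """Mean per-pixel binary cross entropy over a flat list of pixels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


def mask_loss(mask_probs_per_class, gt_mask, gt_class):
    """L_mask uses only the sigmoid mask predicted for the ground-truth class."""
    return bce(mask_probs_per_class[gt_class], gt_mask)


gt_mask = [1, 1, 0, 0]
preds = {
    1: [0.5, 0.5, 0.5, 0.5],  # ignored: not the GT class
    2: [0.1, 0.9, 0.9, 0.1],  # ignored: not the GT class
    3: [0.9, 0.8, 0.2, 0.1],  # GT class is 3, so only this channel counts
}
loss = mask_loss(preds, gt_mask, gt_class=3)

# Changing the other channels leaves the loss untouched.
altered = dict(preds)
altered[1] = [0.99, 0.01, 0.99, 0.01]
assert mask_loss(altered, gt_mask, gt_class=3) == loss
print(round(loss, 4))
```

Because the other channels receive no gradient from this RoI, the mask outputs do not compete across classes; deciding the class is left to the classification branch.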

36-37. Test Phase: the mask branch predicts a Human Mask, a Car Mask, a Horse Mask, ... Winner Takes All: only the mask of the class chosen by the classification branch is kept.
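A sketch of the test-time "winner takes all" step, assuming the winner is the class with the highest classification score and that the soft mask is binarized at 0.5. The function name `select_mask`, the labels, and the numbers are hypothetical.

```python
def select_mask(class_scores, masks_per_class, threshold=0.5):
    """Keep only the mask of the top-scoring class and binarize it.

    Assumption: threshold=0.5 for turning sigmoid outputs into a binary mask.
    """
    predicted = max(range(len(class_scores)), key=lambda k: class_scores[k])
    binary = [1 if p >= threshold else 0 for p in masks_per_class[predicted]]
    return predicted, binary


labels = ["human", "car", "horse"]
scores = [0.1, 0.7, 0.2]                # output of the classification branch
masks = [
    [0.9, 0.9, 0.1, 0.1],               # human mask (discarded)
    [0.2, 0.8, 0.7, 0.1],               # car mask (kept)
    [0.5, 0.5, 0.5, 0.5],               # horse mask (discarded)
]
k, mask = select_mask(scores, masks)
print(labels[k], mask)                  # car [0, 1, 1, 0]
```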

38.

39. Slide from Mask R-CNN Tutorial, K. He, ICCV 2017

40. Slide from Mask R-CNN Tutorial, K. He, ICCV 2017

41. Slide from Mask R-CNN Tutorial, K. He, ICCV 2017; Faster R-CNN, S. Ren, NIPS 2015

42. Slide from Mask R-CNN Tutorial, K. He, ICCV 2017. Deconv 2x2, stride 2; Deconv 2x2, stride 2
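To make "Deconv 2x2, stride 2" concrete, here is a single-channel sketch of a transposed convolution with a 2x2 kernel and stride 2. The function name and the kernel values are made up for illustration; a real layer has learned weights and many channels.

```python
def deconv2x2_stride2(feature, kernel):
    """Transposed convolution, 2x2 kernel, stride 2, one channel.

    Each input value is scaled by the kernel and written to its own
    non-overlapping 2x2 block, so an H x W map becomes 2H x 2W.
    """
    rows, cols = len(feature), len(feature[0])
    out = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] += feature[i][j] * kernel[di][dj]
    return out


feature = [[1.0, 2.0], [3.0, 4.0]]
kernel = [[1.0, 0.5], [0.5, 0.25]]      # hypothetical weights
upsampled = deconv2x2_stride2(feature, kernel)
for row in upsampled:
    print(row)
```

With kernel size equal to stride, the output blocks do not overlap, so the layer exactly doubles the spatial resolution of the mask features.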

43. Slide from Mask R-CNN Tutorial, K. He, ICCV 2017. 3x3 Conv, 4 Layers

44. Slide from Mask R-CNN Tutorial, K. He, ICCV 2017. 1x1 Conv; 1x1 Conv

46. Slide from Mask R-CNN Tutorial, K. He, ICCV 2017

47. RoI Pooling → Fixed-Size Representation (Pooled Feature, 7x7) → Classification + Bbox Regression

48-49. RoI Pooling (Fast R-CNN) • Input: Each RoI • Output: 7x7 Pooled Feature. RoI Align (Mask R-CNN) • Input: Each RoI • Output: 7x7 Pooled Feature

50. Feature Map, RoI. Note: the RoI predicted by the Region Proposal Network is a floating-point representation, so its edges generally fall between feature-map cells.

51-54. Feature Map, RoI: Max Pooling
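A sketch of RoI Pooling as these slides depict it: the floating-point RoI is snapped to the feature-map grid, split into bins, and each bin is max-pooled. The floor/ceil rounding scheme and the name `roi_pool` are assumptions for illustration; implementations differ in how exactly they round. The RoI is assumed to lie inside the feature map.

```python
import math


def roi_pool(feature, roi, out_size):
    """RoI Pooling sketch: quantize the RoI, split into bins, max-pool each bin.

    roi = (x0, y0, x1, y1) in feature-map coordinates (floats).
    """
    # First quantization: snap the RoI itself to integer cells.
    x0, y0 = math.floor(roi[0]), math.floor(roi[1])
    x1, y1 = math.ceil(roi[2]), math.ceil(roi[3])
    width, height = max(x1 - x0, 1), max(y1 - y0, 1)
    pooled = []
    for by in range(out_size):
        # Second quantization: bin edges are rounded to integer cells too.
        ys = y0 + math.floor(by * height / out_size)
        ye = y0 + math.ceil((by + 1) * height / out_size)
        row = []
        for bx in range(out_size):
            xs = x0 + math.floor(bx * width / out_size)
            xe = x0 + math.ceil((bx + 1) * width / out_size)
            row.append(max(feature[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# Toy 6x6 feature map whose value encodes its position: 6*y + x.
feature = [[6 * y + x for x in range(6)] for y in range(6)]
pooled = roi_pool(feature, roi=(0.6, 1.2, 4.4, 4.9), out_size=2)
print(pooled)  # [[14, 16], [26, 28]]
```

Both rounding steps shift the pooled region away from the RoI that was actually predicted, which is the misalignment RoI Align is meant to remove.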

55-57. Feature Map, RoI: 2x2 Subcells for Precision

58. = 0.15 + 0.25 + 0.25 + 0.35 RoI

59. Feature Map RoI 2x2 Subcell Max Pooling
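A sketch of RoI Align following the slides: the RoI keeps its floating-point coordinates, every bin is divided into 2x2 subcells, the feature is bilinearly interpolated at each subcell center, and the samples are max-pooled. Treating integer coordinates as cell sample positions is an assumption, as are the names `bilinear`, `roi_align`, and the `reduce` parameter. The RoI is assumed to lie inside the feature map.

```python
import math


def bilinear(feature, x, y):
    """Bilinearly interpolate the feature map at a continuous point (x, y)."""
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1 = min(x0 + 1, len(feature[0]) - 1)
    y1 = min(y0 + 1, len(feature) - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * feature[y0][x0] + ax * feature[y0][x1]
    bottom = (1 - ax) * feature[y1][x0] + ax * feature[y1][x1]
    return (1 - ay) * top + ay * bottom


def roi_align(feature, roi, out_size, samples=2, reduce=max):
    """RoI Align sketch: no rounding of the RoI or of the bin edges."""
    x0, y0, x1, y1 = roi
    bin_w = (x1 - x0) / out_size
    bin_h = (y1 - y0) / out_size
    pooled = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            values = []
            for sy in range(samples):
                for sx in range(samples):
                    # Sample at the center of each subcell of this bin.
                    px = x0 + (bx + (sx + 0.5) / samples) * bin_w
                    py = y0 + (by + (sy + 0.5) / samples) * bin_h
                    values.append(bilinear(feature, px, py))
            row.append(reduce(values))
        pooled.append(row)
    return pooled


# Toy 6x6 feature map whose value encodes its position: 6*y + x.
feature = [[6 * y + x for x in range(6)] for y in range(6)]
aligned = roi_align(feature, roi=(0.6, 1.2, 4.4, 4.9), out_size=2)
print([[round(v, 2) for v in row] for row in aligned])
```

Passing an averaging function as `reduce` gives the average-pooled variant; max is the default here because the slide says "2x2 Subcell Max Pooling".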

60-61. RPN (Objectness + Bbox Regression) → RoI Align → Classification + Bbox Regression + Binary Mask → Paste Back
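The "Paste Back" step can be sketched as resizing the small per-RoI mask to the size of its predicted box and writing it into an image-sized canvas. Nearest-neighbor resizing and the 0.5 threshold are assumptions chosen for brevity, and `paste_mask` is a hypothetical helper; practical implementations typically interpolate more smoothly.

```python
def paste_mask(mask, box, image_height, image_width, threshold=0.5):
    """Resize an m x m soft mask to its box and paste it into a full-size canvas.

    box = (x0, y0, x1, y1) in integer pixel coordinates, x1/y1 exclusive.
    Assumption: nearest-neighbor resizing, then thresholding.
    """
    x0, y0, x1, y1 = box
    box_w, box_h = x1 - x0, y1 - y0
    m = len(mask)
    canvas = [[0] * image_width for _ in range(image_height)]
    for y in range(max(y0, 0), min(y1, image_height)):
        for x in range(max(x0, 0), min(x1, image_width)):
            # Map the center of this image pixel back to a mask cell.
            my = min(int((y - y0 + 0.5) * m / box_h), m - 1)
            mx = min(int((x - x0 + 0.5) * m / box_w), m - 1)
            canvas[y][x] = 1 if mask[my][mx] >= threshold else 0
    return canvas


mask = [[0.9, 0.2], [0.8, 0.7]]         # tiny 2x2 mask for illustration
canvas = paste_mask(mask, box=(1, 1, 5, 3), image_height=4, image_width=6)
for row in canvas:
    print(row)
```

Pixels outside the box stay 0, so each detected instance ends up with its own full-image binary mask.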

62. Slide from Mask R-CNN Tutorial, K. He, ICCV 2017

64. • Faster R-CNN + ResNet: Deep Residual Learning for Image Recognition, K. He, CVPR 2016 • Faster R-CNN + FPN: Feature Pyramid Networks for Object Detection, T.-Y. Lin, CVPR 2017

65. • Faster R-CNN + ResNet: Deep Residual Learning for Image Recognition, K. He, CVPR 2016

66. • Faster R-CNN + FPN: Feature Pyramid Networks for Object Detection, T.-Y. Lin, CVPR 2017

68-69. Faster R-CNN + Binary Mask Prediction + FCN + RoIAlign

70. Detection Performance Improvement

73. Q&A?
