Deep learning techniques have driven great progress in many computer vision tasks such as image classification, object detection, and segmentation. Almost every month a new method is published that achieves a state-of-the-art result on some common benchmark dataset. In addition, DL is being applied to new problems in CV.
In this talk we focus on applying DL to the image segmentation task. We want to show the practical importance of this task for the fashion industry by presenting our case study, with the results we achieved using various approaches and methods.
6. Instance Aware Segmentation
◦ Detect instances
◦ Annotate each pixel
◦ Simultaneous detection and segmentation
◦ Recent challenge in MS-COCO
7. Traditional methods
Kota Yamaguchi, M Hadi Kiapour, Tamara L Berg, "Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items", ICCV 2013
● Multi-stage pipeline with hand-engineered image features (HOG, MR8, etc.)
● Segmentation -> classification of every pixel with logistic regression
9. Convolutional neural networks
● First used successfully for the classification task
● Three basic operations: convolution, pooling, and a nonlinearity function
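The three operations can be sketched in a few lines. This is a toy illustration in plain Python lists, not how a real framework implements them; the function names, the vertical-edge kernel, and the choice of ReLU and 2 x 2 max pooling are assumptions made for the example.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution (cross-correlation, as most DL libraries compute it)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Nonlinearity: negative responses are clipped to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; leftover rows/columns are dropped."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[i * size + a][j * size + b]
                for a in range(size) for b in range(size))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [
    [1, 2, 0, 1, 3],
    [0, 1, 2, 3, 1],
    [1, 0, 1, 2, 2],
    [2, 1, 0, 1, 0],
    [0, 2, 1, 0, 1],
]
# Illustrative kernel that responds to left-to-right intensity increases.
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

features = max_pool2d(relu(conv2d(image, kernel)))
```

A CNN stacks many such convolution -> nonlinearity -> pooling stages, with the kernel weights learned from data rather than fixed by hand.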
10. Semantic segmentation with CNN
Input -> Extract patch -> CNN -> Classify center pixel (e.g. DRESS)
Repeat for each pixel
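The patch-wise scheme can be sketched as follows. The classifier here is a hypothetical stand-in for the CNN (it thresholds mean brightness), and the zero padding at the image border and the odd patch size are assumptions of the sketch.

```python
def extract_patch(image, row, col, size, pad_value=0):
    """Return the size x size patch centered on (row, col); size must be odd.

    Positions outside the image are filled with pad_value.
    """
    half = size // 2
    h, w = len(image), len(image[0])
    return [
        [
            image[r][c] if 0 <= r < h and 0 <= c < w else pad_value
            for c in range(col - half, col + half + 1)
        ]
        for r in range(row - half, row + half + 1)
    ]


def segment_by_patches(image, classify_patch, size=3):
    """Label every pixel by classifying the patch centered on it."""
    return [
        [classify_patch(extract_patch(image, r, c, size))
         for c in range(len(image[0]))]
        for r in range(len(image))
    ]


def toy_classifier(patch):
    """Hypothetical stand-in for the CNN: bright patches count as DRESS."""
    values = [v for row in patch for v in row]
    return "DRESS" if sum(values) / len(values) > 0.5 else "BACKGROUND"


image = [
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
    [0, 1, 1, 1],
]
labels = segment_by_patches(image, toy_classifier)
```

One classifier call per pixel is what makes this approach expensive: neighboring patches overlap almost entirely, so most of the computation is repeated.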
23. Learnable upsampling: deconvolution
● 3 x 3 “deconvolution”, stride 2, pad 1
● Input: 2 x 2, output: 4 x 4
● Each input value gives the weight for a copy of the filter
● Output values are summed where the copies overlap
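A minimal sketch of this operation (also called transposed convolution), written with plain Python lists. One detail is an assumption of the sketch: with a 3 x 3 filter, stride 2 and pad 1, a 2 x 2 input yields 3 x 3 unless one extra row and column is kept, so the code uses an `output_padding` of 1, following the convention of common frameworks, to reach the 4 x 4 output stated on the slide.

```python
def conv_transpose2d(x, kernel, stride=2, pad=1, output_padding=1):
    """Transposed convolution of a 2D input with a square kernel.

    Each input value scales a copy of the kernel, the copies are placed
    `stride` pixels apart, and overlapping contributions are summed.
    """
    k = len(kernel)
    in_h, in_w = len(x), len(x[0])
    # Canvas large enough for every kernel copy plus the extra output rows/cols.
    full_h = (in_h - 1) * stride + k + output_padding
    full_w = (in_w - 1) * stride + k + output_padding
    full = [[0.0] * full_w for _ in range(full_h)]
    for i in range(in_h):
        for j in range(in_w):
            for a in range(k):
                for b in range(k):
                    full[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]
    # Remove `pad` rows and columns from every side of the canvas.
    return [row[pad:full_w - pad] for row in full[pad:full_h - pad]]


x = [[1, 2],
     [3, 4]]
ones = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
y = conv_transpose2d(x, ones)  # 4 x 4 output
```

With an all-ones filter the overlap is easy to see: the output position covered by all four filter copies holds 1 + 2 + 3 + 4 = 10. In a network the filter weights are learned, which is what makes the upsampling "learnable".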
24. Deconvolution Network for Semantic Segmentation
Normal VGG followed by an “upside down” VGG
Noh, Hong and Han, “Learning Deconvolution Network for Semantic Segmentation”, arXiv 2015
28. DeepLab: Atrous Convolution and Fully Connected CRFs
Chen, Papandreou, Kokkinos, Murphy, Yuille, “Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs”, ICLR 2015
● Conditional random field used as a post-processing step
31. Atrous convolution
● Standard approach: perform convolution on a downsampled input, then upsample the result to the original resolution
● Atrous approach: perform convolution with holes on the original-size input
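Convolution with holes can be sketched in one dimension for brevity (the 2D case applies the same idea along both axes). The function names and the example filter are assumptions of the sketch; `rate` is the spacing between filter taps, and `rate=1` gives an ordinary convolution.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1D convolution whose filter taps are `rate` samples apart."""
    span = (len(kernel) - 1) * rate + 1  # input samples covered by the filter
    return [
        sum(kernel[k] * signal[i + k * rate] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


def dilate_kernel(kernel, rate):
    """Insert rate - 1 zeros ('holes') between neighboring filter weights."""
    dilated = [0] * ((len(kernel) - 1) * rate + 1)
    for k, weight in enumerate(kernel):
        dilated[k * rate] = weight
    return dilated


signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, 2, 1]

dense = atrous_conv1d(signal, kernel, rate=1)
atrous = atrous_conv1d(signal, kernel, rate=2)
# Same result as an ordinary convolution with the zero-filled filter.
same = atrous_conv1d(signal, dilate_kernel(kernel, 2), rate=1)
```

The filter still has three weights, but at rate 2 it covers five input samples, so the field of view grows without downsampling the input and without adding parameters.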
33. Clothing parsing
◦ Goal: detect and segment some basic clothing categories (dresses, bags, shoes, trousers, etc.) worn by people
◦ We need precise clothing masks for further processing (image search, color detection)
◦ The biggest publicly available dataset contains 7.7k images
36. Clothing parsing with general segmentation
◦ DeepLab model based on the VGG-16 architecture
◦ Both variants: with and without CRF post-processing
◦ Fine-tuned from VGG-16 pretrained on the ImageNet classification challenge
◦ Images resized to 513 x 513 resolution
◦ Training details
▫ Batch size: 8
▫ 20k iterations (10 epochs)
▫ Dataset split into train/test with a train ratio of 0.9
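A minimal sketch of the split and of the iteration/epoch arithmetic. The function names, the fixed seed, and the random shuffling are assumptions of the sketch; the slide does not say how the split was drawn.

```python
import random


def split_dataset(items, train_ratio=0.9, seed=0):
    """Shuffle reproducibly, then cut the list at the given train ratio."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_ratio)
    return items[:n_train], items[n_train:]


def samples_per_epoch(iterations, batch_size, epochs):
    """Number of training samples seen in one epoch."""
    return iterations * batch_size // epochs


# Hypothetical image ids standing in for the real dataset.
train_ids, test_ids = split_dataset(range(100))

# Settings from the slide: 20k iterations, batch size 8, 10 epochs.
per_epoch = samples_per_epoch(20_000, 8, 10)
```

Taken at face value, the stated settings correspond to 16,000 samples per epoch.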
37. Clothing parsing with general segmentation: results
[Figure panels: input, DeepLab, DeepLab + CRF, ground truth]
38. Clothing parsing with general segmentation: results
[Figure panels: input, DeepLab, DeepLab + CRF, ground truth]
40. Clothing parsing with detection and segmentation
● Detect the category with an object detector such as R-CNN, SSD, or YOLO
● Segment the object inside the bounding box with a model such as DeepLab or DeepCut
● Motivation: it is much faster to gather bounding-box-level annotations than pixel-wise annotations
● Hypothesis: given a correct bounding box, it is easier to segment a clothing item inside the box than in the whole image
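The two-stage idea can be sketched as below. `detect` and `segment_crop` are hypothetical callables standing in for the detector and the segmentation model; the toy versions only exist to make the example runnable. Boxes are assumed to be (top, left, bottom, right) with exclusive bottom/right.

```python
def parse_clothing(image, detect, segment_crop):
    """Detect items, segment each one inside its box, paste masks back.

    Returns a label map the size of the image; None marks background.
    """
    h, w = len(image), len(image[0])
    label_map = [[None] * w for _ in range(h)]
    for label, (top, left, bottom, right) in detect(image):
        crop = [row[left:right] for row in image[top:bottom]]
        mask = segment_crop(crop)  # boolean mask, same size as the crop
        for r, mask_row in enumerate(mask):
            for c, is_item in enumerate(mask_row):
                if is_item:
                    label_map[top + r][left + c] = label
    return label_map


def toy_detect(image):
    """Hypothetical detector: always reports one 'dress' box."""
    return [("dress", (1, 1, 3, 3))]


def toy_segment(crop):
    """Hypothetical segmenter: pixels brighter than 2 belong to the item."""
    return [[value > 2 for value in row] for row in crop]


image = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 1, 0],
    [0, 0, 0, 0],
]
label_map = parse_clothing(image, toy_detect, toy_segment)
```

Because the segmenter only ever sees the crop, it can be trained as a simpler foreground/background model, while the category label comes from the detector.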
41. Single Shot Multibox Detector (SSD)
Wei Liu et al., "SSD: Single Shot MultiBox Detector", 2016