Event: GDG Munich February Meetup: Machine Learning, 27.02.2018
Speaker: Stanislav Frolov, inovex
More tech talks: https://www.inovex.de/de/content-pool/vortraege/
More tech articles on the inovex blog: https://www.inovex.de/blog
2. ● Computer vision
● Human vision
● Traditional approaches and methods
● Artificial neural networks
● Summary
2
Outline of this talk
What we are going to talk about
3. ● Trained deep neural networks for object
detection during my master's thesis
● Still fascinated and interested
3
Stanislav Frolov
Big Data Engineer @inovex
4. ● Teach computers how to see
● Automatic extraction, analysis and understanding of
images
● Infer useful information, interpret and make decisions
● Automate tasks that the human visual system can do
● One of the most exciting fields in AI and ML
4
What is computer vision
General
5. 5
What is computer vision
Motivation
● Era of pixels
● Internet consists
mostly of images
● Explosion of visual
data
● Cannot be labeled
by humans
6. 6
What is computer vision
Drivers
● Two drivers for computer vision explosion
○ Compute (faster and cheaper)
○ Data (more data > algorithms)
7. 7
What is computer vision
Interdisciplinary field
● Disciplines: Computer Science, Mathematics, Engineering, Physics, Biology, Psychology
● Related areas: Information Retrieval, Machine Learning, Graphs, Algorithms, Systems, Architecture, Robotics, Speech, NLP, Image Processing, Optics, Solid-State Physics, Neuroscience, Cognitive Sciences, Biological vision
9. ● Imaging for statistical pattern recognition
● Image transformations such as pixel-by-pixel operations
○ Contrast enhancement
○ Edge extraction
○ Noise reduction
○ Geometrical and spatial operations (e.g. rotations)
9
What is computer vision
Related fields - image processing
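Two of the pixel-level operations listed on this slide can be sketched in a few lines. This is a minimal illustration on a tiny greyscale image stored as nested lists; the function names `stretch_contrast` and `rotate90` are hypothetical and not from the talk.

```python
def stretch_contrast(img, lo=0, hi=255):
    """Linearly rescale grey values so the darkest pixel maps to lo, the brightest to hi."""
    flat = [v for row in img for v in row]
    vmin, vmax = min(flat), max(flat)
    if vmax == vmin:
        return [row[:] for row in img]  # flat image: nothing to stretch
    scale = (hi - lo) / (vmax - vmin)
    return [[round(lo + (v - vmin) * scale) for v in row] for row in img]


def rotate90(img):
    """Rotate the image clockwise by 90 degrees (a simple geometrical operation)."""
    return [list(row) for row in zip(*img[::-1])]


low_contrast = [[100, 110], [120, 150]]
stretched = stretch_contrast(low_contrast)
rotated = rotate90([[1, 2], [3, 4]])
```

Both functions map an image to another image, which is what separates image processing from the interpretation tasks of computer vision.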
10. ● Creates new images from scene descriptions
● Produces image data from 3D models
● “Inverse” of computer vision
● AR as a combination of both
10
What is computer vision
Related fields - computer graphics
11. ● Mainly manufacturing applications
● Image-based automatic inspection, process control,
robot guidance
● Usually employs strong assumptions (colour, shape,
light, structure, orientation, ...) -> works very well
● Output often pass/fail or good/bad
● Additionally numerical/measurement data, counts
11
What is computer vision
Related fields - machine vision
12. ● Create “intelligent” systems
● Studying computational aspects of intelligence
● Make computers do things at which, at the moment,
people are better
● Many techniques play an important role (ML, ANNs)
● Currently does a few things better/faster at scale than
humans can
● Whether it can do anything “human” remains unanswered
12
What is computer vision
Related fields - AI
13. ● Related fields have a large intersection
● Basic techniques used, developed and studied are very
similar
13
What is computer vision
Related fields - summary
15. ● Two stage process
○ Eyes take in light reflected off the objects and retina
converts 3D objects into 2D images
○ Brain’s visual system interprets 2D images and “rebuilds”
a 3D model
15
What is human vision
General
16. ● A pair of 2D images with slightly different views makes it
possible to infer depth
● Position of nearby objects will vary more across the two
images than the position of more distant objects
16
What is human vision
Stereoscopic vision
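The relation between position shift (disparity) and distance can be made concrete. This sketch assumes an idealized, rectified pinhole stereo pair, for which depth is Z = f * B / d. The model, the function name and the numbers (focal length, a roughly eye-like 6.5 cm baseline) are illustrative assumptions, not taken from the talk.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth Z = f * B / d for a rectified pinhole stereo pair (assumed model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# A large shift between the two views means the object is close,
# a small shift means it is far away.
near = depth_from_disparity(focal_px=800.0, baseline_m=0.065, disparity_px=52.0)
far = depth_from_disparity(focal_px=800.0, baseline_m=0.065, disparity_px=5.2)
```

Under these assumptions the first object is about 1 m away and the second about 10 m, which matches the slide: nearby objects shift more between the two images.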
17. ● Prior knowledge of relative sizes and depths is often key
for understanding and interpretation
17
What is human vision
Prior knowledge
18. ● Texture and changes in texture help with depth
perception
18
What is human vision
Texture pattern
19. 19
What is human vision
Biases and illusions in human perception
● Shadows make all the difference in interpretation
● Gradual changes in light are ignored so as not to be misled by
shadows
20. 20
What is human vision
A few more illusions
● Two arrows with different orientations have the same
length
21. ● Assumptions and familiarity (distorted room)
● Face recognition bias
● Up-down orientation bias
21
What is human vision
Biases and illusions in human perception
22. 22
What is human vision
Summary
● Illusions are fun, but the puzzle of understanding
human vision is far from complete
34. 34
What is computer vision
Why it is difficult
● Occlusion
● Deformation
● Scale
● Clutter
● Illumination
● Viewpoint
● Object pose
● Tons of classes and
variants
● Often n:1 mapping
● Computationally
expensive
● Full understanding of
biological vision is
missing
36. ● Input: image(s) + labels
● Output: Semantic data, labels
● Digital image pixels usually have three channels [R,G,B]
each [0...255] + Location[x,y]
● Digital images are just vectors
36
What is computer vision
System overview
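The statement that digital images are just vectors can be shown directly. This is a minimal sketch with a hand-written 2x2 RGB image; the helper names `flatten` and `pixel_at` are hypothetical.

```python
# A 2x2 RGB image: rows -> pixels -> [R, G, B], each value in 0..255.
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]


def flatten(img):
    """Flatten height x width x channels into one vector (row-major order)."""
    return [value for row in img for pixel in row for value in pixel]


def pixel_at(vector, x, y, width, channels=3):
    """Recover the [R, G, B] values at location (x, y) from the flat vector."""
    start = (y * width + x) * channels
    return vector[start:start + channels]


vector = flatten(image)
green = pixel_at(vector, x=1, y=0, width=2)
```

The location [x, y] is not stored explicitly; it is implied by the position of the values inside the vector.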
37. 1. Image acquisition (camera, sensors)
2. Pre-processing (sampling, noise reduction,
augmentation)
3. Feature extraction (lines, edges, regions, points)
4. Detection and segmentation
5. Post-processing (verification, estimation, recognition)
6. Decision making
● -> Ability of a machine to step back and interpret the big
picture of those pixels
37
What is computer vision
System overview
39. 1950s
● 2D imaging for statistical pattern recognition
● Theory of optical flow based on a fixed point
towards which one moves
39
What is computer vision
History
40. Image processing
● Histograms
● Filtering
● Stitching
● Thresholding
● ...
40
What is computer vision
Traditional approaches
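Histograms and thresholding, two of the techniques named here, fit in a short sketch. It works on a small greyscale image as nested lists; the bin count and the threshold value of 128 are arbitrary choices for illustration.

```python
def histogram(img, bins=4, max_value=255):
    """Count how many pixels fall into each of `bins` equally wide intensity ranges."""
    counts = [0] * bins
    for row in img:
        for v in row:
            counts[min(v * bins // (max_value + 1), bins - 1)] += 1
    return counts


def threshold(img, t):
    """Binary mask: 1 where the pixel is at least t, else 0."""
    return [[1 if v >= t else 0 for v in row] for row in img]


grey = [[10, 20, 200], [30, 220, 210], [15, 25, 230]]
counts = histogram(grey)
mask = threshold(grey, 128)
```

The histogram of this toy image has two clearly separated groups, so any threshold between them splits dark background from bright foreground.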
41. 1960s
● Desire to extract 3D structure from 2D images for
scene understanding
● Began at pioneering AI universities to mimic human
visual system as stepping stone for intelligent robots
● Summer vision project at MIT: attach a camera to a
computer and have it “describe what it saw”
41
What is computer vision
History
42. ● Given to 10 undergraduate students
● … an attempt to use our summer workers effectively …
● … construction of a significant part of a visual system …
● … task can be segmented into sub-problems …
● … participate in the construction of a system complex
enough to be a real landmark in the development of
“pattern recognition” …
42
What is computer vision
History: summer vision project @MIT 1966
43. ● Goal: analyse scenes and identify objects
● Structure of system:
○ Region proposal
○ Property lists for regions
○ Boundary construction
○ Match with properties
○ Segment
● Basic foreground/background segmentation with simple
objects (cubes, cylinders, …)
43
What is computer vision
History: summer vision project @MIT 1966
44. ● Unlike general intelligence, computer vision seemed
tractable
● Amusing anecdote, but it never aimed to “solve”
computer vision
● Computer vision today differs from what it was thought
to be in 1966
44
What is computer vision
History: summer vision project @MIT 1966
45. 1970s
● Many algorithms that still exist today were formed
● Edges, lines and objects as interconnected
structures
45
What is computer vision
History
46. 46
What is computer vision
Traditional approaches
Edge detection based on
● Brightness
● Gradients
● Geometry
● Illumination
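Gradient-based edge detection can be sketched with central differences. This is a simplified illustration, not a specific detector from the talk: the edge strength at each pixel is the magnitude of the brightness gradient.

```python
import math


def gradient_magnitude(img):
    """Central-difference gradient magnitude; border pixels are left at 0."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2  # horizontal brightness change
            gy = (img[y + 1][x] - img[y - 1][x]) / 2  # vertical brightness change
            out[y][x] = math.hypot(gx, gy)
    return out


# Dark left half, bright right half: a vertical edge between columns 2 and 3.
step = [[0, 0, 0, 255, 255, 255] for _ in range(4)]
edges = gradient_magnitude(step)
```

The response is zero in the flat regions and large only in the two columns next to the brightness step.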
47. 47
What is computer vision
Traditional approaches - part based detector
● Objects composed of features of parts and their spatial
relationship
● Challenge: how to define and combine
48. 1980s
● More rigorous mathematical analysis and
quantitative aspects
● Optical character recognition
● Sliding window approaches
● Usage of artificial neural networks
48
What is computer vision
History
49. 49
What is computer vision
Traditional approaches - HOG detection (histogram of
oriented gradients)
● Concept from the 80s but only used since 2005
● Create HOG descriptors (object generalizations)
● One feature vector per object
● Train with SVM
● Sliding window @multiple scales
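The sliding-window search at multiple scales can be sketched as a generator of candidate boxes. The window size, stride and scales are illustrative assumptions. In a real detector, each box would be turned into a HOG descriptor and scored by the trained SVM.

```python
def sliding_windows(width, height, window=64, stride=32, scales=(1.0, 0.5)):
    """Yield (scale, x, y, w, h) boxes in original image coordinates.

    Shrinking the image by `scale` is equivalent to growing the window by 1/scale.
    """
    for scale in scales:
        win = int(window / scale)
        step = int(stride / scale)
        for y in range(0, height - win + 1, step):
            for x in range(0, width - win + 1, step):
                yield scale, x, y, win, win


boxes = list(sliding_windows(width=128, height=128))
```

Even this tiny 128x128 example produces ten candidates; on real images the number of windows grows quickly, which is one reason the approach is computationally expensive.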
50. 50
What is computer vision
Traditional approaches - HOG detection (histogram of
oriented gradients)
● Computation of HOG descriptors:
1. Compute gradients
2. Compute histograms on cells
3. Normalize histograms
4. Concatenate histograms
● Requires a lot of engineering
● Must build ensembles of feature descriptors
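The four steps above can be sketched as follows. This is a deliberately simplified version: it normalizes each cell on its own, whereas real HOG implementations normalize over overlapping blocks of cells, and the cell size and bin count are assumed values.

```python
import math


def hog_descriptor(img, cell=4, bins=9):
    """Simplified HOG: one normalized orientation histogram per cell, concatenated."""
    h, w = len(img), len(img[0])
    descriptor = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    # 1. Gradients by central differences (clamped at the border).
                    gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
                    gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
                    magnitude = math.hypot(gx, gy)
                    # Unsigned orientation in [0, 180) degrees.
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    # 2. Vote into the cell histogram, weighted by magnitude.
                    hist[int(angle / (180.0 / bins)) % bins] += magnitude
            # 3. Normalize (per cell here; real HOG normalizes over blocks).
            norm = math.sqrt(sum(v * v for v in hist)) or 1.0
            # 4. Concatenate.
            descriptor.extend(v / norm for v in hist)
    return descriptor


image = [[0, 0, 0, 0, 255, 255, 255, 255] for _ in range(8)]
features = hog_descriptor(image)
```

For this image with a single vertical edge, every cell puts all of its weight into the first orientation bin, so the descriptor encodes the edge direction and ignores absolute brightness.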
51. 1990s
● Significant interaction with computer graphics
(rendering, morphing, stitching)
● Approaches using statistical learning
● Eigenfaces (“ghost faces”) through principal component
analysis (PCA)
51
What is computer vision
History
52. 52
What is computer vision
Traditional approaches - deformable parts model (DPM)
● Objects are constructed from their parts
● First match whole object, then refine on the parts
● HOG + part-based + modern features
● Slow but good at difficult objects
● Involves many heuristics
53. 53
What is computer vision
Features
● Feature points
○ Small area of pixels with certain properties
● Feature detection
○ Use features for identification
○ Activate if “object” present
● Examples:
○ Lines, edges, colours, blobs, …
○ Animals, faces, cars, ...
54. 54
What is computer vision
Traditional approaches - classical recognition
● Init: extract features for objects in different scales,
colours, orientations, rotations, occlusion levels
● Inference: extract features from query image and find
closest match in database or train a classifier
● Computationally expensive (hundreds of features in
image, millions in database) and complex due to errors
and mismatches
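The inference step, finding the closest match in a database of features, can be sketched as a brute-force nearest-neighbour search. The three-dimensional descriptors and their labels are made-up toy data.

```python
import math


def nearest_match(query, database):
    """Return (label, distance) of the database descriptor closest to `query`."""
    best_label, best_dist = None, math.inf
    for label, descriptor in database:
        dist = math.dist(query, descriptor)  # Euclidean distance
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist


database = [
    ("cat", [0.9, 0.1, 0.0]),
    ("car", [0.1, 0.8, 0.3]),
    ("face", [0.0, 0.2, 0.9]),
]
label, distance = nearest_match([0.2, 0.7, 0.2], database)
```

The cost is one distance computation per query feature and database entry, which is why hundreds of query features against millions of stored ones becomes expensive.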
55. 55
What is computer vision
History
Before the new era
● Bags of features
● Handcrafted ensembles
Input Feat. 2
Feat. 1
Feat. n
Final
Decision
Feature Extraction
57. ● Elementary building
block
● Inspired by biological
neurons
● Mathematical function
y=f(wx+b)
● Learnable weights
57
Artificial neural networks
Fundamentals - artificial neuron
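The function y=f(wx+b) translates almost directly into code. This sketch assumes a sigmoid as the activation f; the slide does not name a specific activation.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b, f=sigmoid):
    """y = f(w . x + b): weighted sum of the inputs plus a bias, then an activation."""
    return f(sum(wi * xi for wi, xi in zip(w, x)) + b)


y = neuron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)
```

Here the weighted sum is exactly zero, so the sigmoid returns 0.5. Learning means adjusting `w` and `b` until the outputs match the wanted ones.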
58. ● Collection of neurons
organized in layers
● Universal
approximators
● Fully-connected
network here
58
Artificial neural networks
Fundamentals - artificial neural networks
59. 59
Artificial neural networks
Fundamentals - training
● Basically an optimization
problem
● Find minimum of a loss
function by an iterative
process (training)
● Designing the loss function
is sometimes tricky
60. 60
Artificial neural networks
Fundamentals - training
Simple optimizer algorithm:
1. Forward pass with a batch of data
2. Calculate error between actual and wanted output
3. Nudge weights in proportion to error into the right
direction (same data would result in smaller error)
4. Repeat until convergence
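The loop above can be sketched for the smallest possible model, a single linear unit y = w*x + b. The assumptions here are a mean-squared-error loss, plain gradient descent with the full dataset as one batch, and a fixed number of iterations in place of a convergence check.

```python
def train(data, lr=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # 1. Forward pass and 2. error between actual and wanted output.
        errors = [(w * x + b) - y for x, y in data]
        # 3. Nudge the weights against the gradient of the loss.
        grad_w = 2 * sum(e * x for e, (x, _) in zip(errors, data)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
        # 4. Repeat (fixed epoch count here instead of a convergence test).
    return w, b


data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # generated from y = 2x + 1
w, b = train(data)
```

Deep networks follow the same recipe; the difference is that the gradients for millions of weights are computed by backpropagation.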
61. 61
Artificial neural networks
Fundamentals - CNN
● Local neighborhood
contributes to activation
● Exploit spatial
information
● Hierarchical feature
extractors
● Fewer parameters
[Figure labels: input, activation, filters, receptive field]
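How a filter turns a local neighbourhood into an activation, and why this needs few parameters, can be sketched as follows. The 3x3 vertical-edge filter and the 32x32 layer sizes are illustrative assumptions, and biases are ignored.

```python
def conv2d(img, kernel):
    """Valid 2D cross-correlation (what deep learning libraries call convolution)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [
            sum(img[y + i][x + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for x in range(out_w)
        ]
        for y in range(out_h)
    ]


image = [[1, 2, 3, 0], [4, 5, 6, 0], [7, 8, 9, 0], [0, 0, 0, 0]]
vertical_edge = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
activation = conv2d(image, vertical_edge)

# Parameter comparison for a 32x32 single-channel input and a 32x32 output map:
fully_connected_params = (32 * 32) * (32 * 32)  # one weight per input-output pair
conv_params = 3 * 3                             # one shared 3x3 filter
```

Each output value depends only on its 3x3 receptive field, and the same nine weights are reused at every position.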
69. ● Challenges on a subset of ImageNet
○ 14M labeled images
○ 20k object categories
● ILSVRC* usually on 1k categories including 90 out of
120 dog breeds
69
Benchmark
Datasets - ImageNet
*ImageNet Large Scale Visual Recognition
Challenge
70. ● ILSVRC 2012 winner by a large margin, cutting the error from 25% to 16%
● Proved the effectiveness of CNNs and kicked off a new era
● 8 layers, 650k neurons, 60M parameters
70
Artificial neural networks
Roadmap - AlexNet
71. ● ILSVRC 2013 winner with a best top-5 error of 11.6%
● AlexNet but using smaller 7x7 kernels to keep more
information in deeper layers
71
Artificial neural networks
Roadmap - ZFNet
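The top-5 error quoted for these networks counts a prediction as correct if the true label is among the five highest-scoring classes. A minimal sketch with made-up scores over six classes:

```python
def top5_error(predictions, labels):
    """Fraction of images whose true label is not among the 5 highest-scoring classes."""
    misses = 0
    for scores, label in zip(predictions, labels):
        top5 = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:5]
        if label not in top5:
            misses += 1
    return misses / len(labels)


# Two toy images scored over 6 classes (scores are made up).
predictions = [
    [0.1, 0.3, 0.2, 0.05, 0.25, 0.1],
    [0.4, 0.3, 0.1, 0.1, 0.08, 0.02],
]
labels = [3, 0]  # true class index per image
error = top5_error(predictions, labels)
```

The first image is a miss because class 3 has the lowest score, the second is a hit, so the error is 0.5.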
72. ● ILSVRC 2013 localization winner
● Uses AlexNet on multi-scale input images with sliding
window approach
● Accumulates bounding boxes for final detection (instead
of non-max suppression)
72
Artificial neural networks
Roadmap - OverFeat
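For contrast, the conventional non-max suppression step that this slide says OverFeat replaces can be sketched as below. This is the generic greedy variant, not OverFeat's box accumulation; the box format (x1, y1, x2, y2) and the 0.5 overlap threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Greedy NMS over (score, box) pairs: keep the best box, drop its heavy overlaps."""
    kept = []
    for score, box in sorted(detections, key=lambda d: d[0], reverse=True):
        if all(iou(box, k[1]) <= iou_threshold for k in kept):
            kept.append((score, box))
    return kept


detections = [
    (0.9, (10, 10, 50, 50)),
    (0.8, (12, 12, 52, 52)),    # near-duplicate of the first box
    (0.7, (100, 100, 140, 140)),
]
kept = non_max_suppression(detections)
```

NMS discards the near-duplicate box outright, whereas an accumulation scheme merges overlapping boxes into one final detection.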
73. ● 2k proposals generated by selective search
● SVM trained for classification
● Multi-stage pipeline
73
Artificial neural networks
Roadmap - RCNN (region based CNN)
74. ● Not a winner but famous due to simplicity and
effectiveness
● Replace large-kernel convolutions by stacking several
small-kernel convolutions
74
Artificial neural networks
Roadmap - VGGNet
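The benefit of stacking small kernels can be checked with simple arithmetic. The sketch assumes stride-1 convolutions with the same number of input and output channels and ignores biases; the channel count of 256 is an illustrative value.

```python
def conv_params(kernel, channels):
    """Weights of one kernel x kernel convolution with `channels` inputs and outputs."""
    return kernel * kernel * channels * channels


def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions of equal kernel size."""
    return 1 + layers * (kernel - 1)


channels = 256  # illustrative value
one_7x7 = conv_params(7, channels)
three_3x3 = 3 * conv_params(3, channels)
field = stacked_receptive_field(kernel=3, layers=3)
```

Three stacked 3x3 layers see the same 7x7 region as one 7x7 layer but need 27 instead of 49 weights per channel pair, and they add two extra non-linearities.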
76. ● Jointly learns classification and bounding-box regression
(region proposals still come from a heuristic)
● Employs region of interest (RoI) pooling, which allows the
convolutional computations to be reused
76
Artificial neural networks
Roadmap - Fast RCNN
77. ● Directly predicts all objects and classes in one shot
● Very fast
● Processes images at ~40 FPS on a Titan X GPU
● First real-time state-of-the-art detector
● Divides input images into multiple grid cells which are
then classified
77
Artificial neural networks
Roadmap - YOLO (you only look once)
78. ● ILSVRC 2015 winner with a 3.6% error rate (human
performance is 5-10%)
● Employs residual blocks, which allow very deep
networks to be built (hundreds of layers)
● Additional identity mapping
78
Artificial neural networks
Roadmap - ResNet (Microsoft)
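The identity mapping of a residual block can be sketched with plain vectors. Small matrix-vector products stand in for the convolutions here, and the block computes relu(F(x) + x); the layer shapes and helper names are assumptions for illustration.

```python
def relu(v):
    """Element-wise max(0, x)."""
    return [max(0.0, x) for x in v]


def linear(v, weights):
    """Matrix-vector product; stands in for a convolution in this sketch."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(F(x) + x), where F is two small layers and `+ x` is the identity mapping."""
    fx = linear(relu(linear(x, w1)), w2)
    return relu([f + xi for f, xi in zip(fx, x)])


x = [1.0, 2.0]
zero = [[0.0, 0.0], [0.0, 0.0]]
y = residual_block(x, zero, zero)
```

With all weights at zero the block passes its input through unchanged, so adding more blocks cannot make it harder to represent the identity, which helps when stacking hundreds of layers.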
79. ● Not a recognition network
● A region proposal network
● Popularized prior/anchor boxes (found through
clustering) to predict offsets
● Much better strategy than starting the predictions with
random coordinates
● Since then, heuristic approaches have gradually been
fading out and being replaced
79
Artificial neural networks
Roadmap - MultiBox
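Predicting offsets relative to a prior box, as opposed to raw coordinates, can be sketched as below. The parametrization (centre shifts scaled by the prior's size, log-scale width and height) is the one commonly used in later detectors and is an assumption here; the slide does not specify MultiBox's exact encoding.

```python
import math


def decode(anchor, offsets):
    """Apply predicted offsets (dx, dy, dw, dh) to a prior box (cx, cy, w, h)."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = offsets
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


prior = (50.0, 50.0, 20.0, 40.0)
unchanged = decode(prior, (0.0, 0.0, 0.0, 0.0))
shifted = decode(prior, (0.5, -0.25, 0.0, 0.0))
```

A network that outputs all zeros already returns the prior box, which is a far better starting point than random coordinates.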
80. ● Fast RCNN with heuristic region proposal replaced by
region proposal network (RPN) inspired by MultiBox
● RPN shares full-image convolutional features with the
detection network (cost-free region proposal)
● RPN uses “attention” mechanism to tell where to look
● ~5 FPS on a Tesla K40 GPU
● End-to-end training
80
Artificial neural networks
Roadmap - Faster RCNN
81. ● SSD leverages the Faster RCNN’s RPN to directly
classify objects inside each prior box (similar to YOLO)
● Predicts category scores and box offsets for a fixed set
of default bounding boxes
● Fixes the predefined grid cells used in YOLO by using
multiple aspect ratios
● Produces predictions of different scales
● ~59 FPS
81
Artificial neural networks
Roadmap - SSD (single shot multibox detector)
82. ● Open-source software library for machine learning
applications
● TensorFlow Object Detection API
○ A collection of pretrained models
○ construct, train and deploy object detection models
82
Artificial neural networks
TensorFlow object detection API
84. ● Humans are good at understanding the big picture
● Neural networks are good at details
● But they can be fooled...
84
Summary
Human vs machine
85. ● Needs a large amount of data
● Lots of engineering
● Trial and error
● Long training time
● Still lots of hyperparameter tuning
● No general network (generalization not answered)
● Little mathematical foundation
85
Summary
Computer vision is still difficult
86. ● Despite all of these advances, the dream of having a
computer interpret an image at the same level as a
human remains unrealized
86
Summary
Computer vision is hard
87. Thank You
Stanislav Frolov
Big Data Engineer
sfrolov@inovex.de
0173 318 11 35
inovex GmbH
Lindberghstraße 3
80939 München