Computer Vision
From traditional approaches to deep neural
networks
Stanislav Frolov, München, 27.02.2018
● Computer vision
● Human vision
● Traditional approaches and methods
● Artificial neural networks
● Summary
2
Outline of this talk
What we are going to talk about
● Trained deep neural networks for object
detection during my master's thesis
● Still fascinated and interested
3
Stanislav Frolov
Big Data Engineer @inovex
● Teach computers how to see
● Automatic extraction, analysis and understanding of
images
● Infer useful information, interpret and make decisions
● Automate tasks that the human visual system can do
● One of the most exciting fields in AI and ML
4
What is computer vision
General
5
What is computer vision
Motivation
● Era of pixels
● Internet consists
mostly of images
● Explosion of visual
data
● Cannot be labeled
by humans
6
What is computer vision
Drivers
● Two drivers for computer vision explosion
○ Compute (faster and cheaper)
○ Data (more data > algorithms)
7
What is computer vision
Interdisciplinary field
Computer Science
Mathematics
Engineering
Physics
Biology
Psychology
Information
Retrieval
Machine
Learning
Graphs,
Algorithms
Systems
Architecture
Robotics
Speech,
NLP
Image
Processing
Optics
Solid-State
Physics
Neuroscience
Cognitive
Sciences
Biological vision
Synonyms?
8
● Imaging for statistical pattern recognition
● Image transformations such as pixel-by-pixel operations
○ Contrast enhancement
○ Edge extraction
○ Noise reduction
○ Geometrical and spatial operations (e.g. rotations)
9
What is computer vision
Related fields - image processing
● Creates new images from scene descriptions
● Produces image data from 3D models
● “Inverse” of computer vision
● AR as a combination of both
10
What is computer vision
Related fields - computer graphics
● Mainly manufacturing applications
● Image-based automatic inspection, process control,
robot guidance
● Usually employs strong assumptions (colour, shape,
light, structure, orientation, ...) -> works very well
● Output often pass/fail or good/bad
● Additionally numerical/measurement data, counts
11
What is computer vision
Related fields - machine vision
● Create “intelligent” systems
● Studying computational aspects of intelligence
● Make computers do things at which, at the moment,
people are better
● Many techniques play an important role (ML, ANNs)
● Currently does a few things better/faster at scale than
humans can
● Whether it can do anything “human” is still an open question
12
What is computer vision
Related fields - AI
● Related fields have a large intersection
● Basic techniques used, developed and studied are very
similar
13
What is computer vision
Related fields - summary
Short trip to human vision
14
● Two stage process
○ Eyes take in light reflected off the objects and retina
converts 3D objects into 2D images
○ Brain’s visual system interprets 2D images and “rebuilds”
a 3D model
15
What is human vision
General
● A pair of 2D images with slightly different views allows
depth to be inferred
● Position of nearby objects will vary more across the two
images than the position of more distant objects
16
What is human vision
Stereoscopic vision
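The depth cue described on this slide can be sketched with a few lines of arithmetic. The sketch assumes the textbook pinhole stereo model (depth = focal length x baseline / disparity), which the slides do not state; the function name and the numbers are hypothetical and only illustrate that nearby objects shift more between the two views.

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Depth of a point from its horizontal shift between two views.

    Assumes an idealised, rectified pinhole stereo pair (an assumption,
    not something taken from the slides).
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# Hypothetical values: 700 px focal length, 6.5 cm between the two views.
near = depth_from_disparity(700.0, 0.065, 35.0)  # large shift between views
far = depth_from_disparity(700.0, 0.065, 3.5)    # small shift between views
```

A ten times smaller disparity corresponds to a ten times larger depth under this model.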
● Prior knowledge of relative sizes and depths is often key
for understanding and interpretation
17
What is human vision
Prior knowledge
● Texture and texture changes help with depth
perception
18
What is human vision
Texture pattern
19
What is human vision
Biases and illusions in human perception
● Shadows make all the difference in interpretation
● Gradual changes in light are ignored so that we are not
misled by shadows
20
What is human vision
A few more illusions
● Two arrows with different orientations have the same
length
● Assumptions and familiarity (distorted room)
● Face recognition bias
● Up-down orientation bias
21
What is human vision
Biases and illusions in human perception
22
What is human vision
Summary
● Illusions are fun, but the puzzle of understanding
human vision is far from complete
Back to computer vision
23
● Recognition
● Localization
● Detection
● Segmentation
24
What is computer vision
Typical tasks
● Part-based detection
○ Deformable parts model
○ Pose estimation and poselets
25
What is computer vision
Typical tasks
● Image captioning
(actions, attributes)
26
What is computer vision
Typical tasks
● Motion analysis
○ Egomotion (camera)
○ Optical flow (pixels)
27
What is computer vision
Typical tasks
● Scene understanding and reconstruction
28
What is computer vision
Typical tasks
● Image restoration
● Colouring black & white photos
29
What is computer vision
Typical tasks
Solving this is useful for many applications
30
31
What is computer vision
Typical applications
● Assistance systems for cars and people
● Surveillance
● Navigation (obstacle avoidance, road following, path
planning)
● Photo interpretation
● Military (“smart” weapons)
● Manufacturing (inspection, identification)
● Robotics
● Autonomous vehicles (dangerous zones)
32
What is computer vision
Typical applications
● Recognition and tracking
● Event detection
● Interaction (man-machine interfaces)
● Modeling (medical, manufacturing, training, education)
● Organizing (database index, sorting/clustering)
● Fingerprint and biometrics
● …
Why so difficult?
33
34
What is computer vision
Why it is difficult
● Occlusion
● Deformation
● Scale
● Clutter
● Illumination
● Viewpoint
● Object pose
● Tons of classes and
variants
● Often n:1 mapping
● Computationally
expensive
● Full understanding of
biological vision is
missing
System overview
35
● Input: image(s) + labels
● Output: Semantic data, labels
● Digital image pixels usually have three channels [R,G,B]
each [0...255] + Location[x,y]
● Digital images are just vectors
36
What is computer vision
System overview
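As a rough illustration of the last two bullets, the sketch below builds a tiny synthetic RGB image as a NumPy array and flattens it into a vector. The use of NumPy and the (height, width, channel) layout are assumptions made for this example, not something the slides prescribe.

```python
import numpy as np

# A tiny synthetic RGB image: 2 rows (y), 3 columns (x), 3 channels [R, G, B].
# Each channel value is an 8-bit integer in the range 0..255.
image = np.zeros((2, 3, 3), dtype=np.uint8)
image[0, 1] = [255, 0, 0]  # pixel at y=0, x=1 is pure red
image[1, 2] = [0, 0, 255]  # pixel at y=1, x=2 is pure blue

height, width, channels = image.shape

# Flattening turns the image into one long vector of numbers.
vector = image.reshape(-1)
```

Here `vector` has height x width x channels = 18 entries, which is the form many classical classifiers expect as input.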
1. Image acquisition (camera, sensors)
2. Pre-processing (sampling, noise reduction,
augmentation)
3. Feature extraction (lines, edges, regions, points)
4. Detection and segmentation
5. Post-processing (verification, estimation, recognition)
6. Decision making
● -> Ability of a machine to step back and interpret the big
picture of those pixels
37
What is computer vision
System overview
Some history
38
1950s
● 2D imaging for statistical pattern recognition
● Theory of optical flow based on a fixed point
towards which one moves
39
What is computer vision
History
Image processing
● Histograms
● Filtering
● Stitching
● Thresholding
● ...
40
What is computer vision
Traditional approaches
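Three of the operations listed on this slide (histograms, thresholding, filtering) can be sketched in plain NumPy. The random test image, the threshold of 128 and the 3x3 mean filter are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
gray = rng.integers(0, 256, size=(8, 8)).astype(np.uint8)  # synthetic grayscale image

# Histogram: number of pixels at each of the 256 intensity levels.
hist = np.bincount(gray.ravel(), minlength=256)

# Thresholding: binary mask of pixels brighter than a fixed value.
threshold = 128  # arbitrary value chosen for illustration
mask = gray > threshold

def box_blur(img):
    """3x3 mean filter over the interior pixels (no padding)."""
    img = img.astype(np.float64)
    out = np.zeros((img.shape[0] - 2, img.shape[1] - 2))
    for dy in range(3):
        for dx in range(3):
            out += img[dy:dy + out.shape[0], dx:dx + out.shape[1]]
    return out / 9.0

blurred = box_blur(gray)
```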
1960s
● Desire to extract 3D structure from 2D images for
scene understanding
● Began at pioneering AI universities to mimic human
visual system as stepping stone for intelligent robots
● Summer vision project at MIT: attach a camera to a
computer and have it “describe what it saw”
41
What is computer vision
History
● Given to 10 undergraduate students
● … an attempt to use our summer workers effectively …
● … construction of a significant part of a visual system …
● … task can be segmented into sub-problems …
● … participate in the construction of a system complex
enough to be a real landmark in the development of
“pattern recognition” …
42
What is computer vision
History: summer vision project @MIT 1966
● Goal: analyse scenes and identify objects
● Structure of system:
○ Region proposal
○ Property lists for regions
○ Boundary construction
○ Match with properties
○ Segment
● Basic foreground/background segmentation with simple
objects (cubes, cylinders, ….)
43
What is computer vision
History: summer vision project @MIT 1966
● Unlike general intelligence, computer vision seemed
tractable
● Amusing anecdote, but it never aimed to “solve”
computer vision
● Computer vision today differs from what it was thought
to be in 1966
44
What is computer vision
History: summer vision project @MIT 1966
1970s
● Formed many algorithms that exist today
● Edges, lines and objects as interconnected
structures
45
What is computer vision
History
46
What is computer vision
Traditional approaches
Edge detection based on
● Brightness
● Gradients
● Geometry
● Illumination
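Gradient-based edge detection can be sketched as follows. The Sobel kernels are one common choice of gradient filter and are an assumption here, since the slide does not name a specific operator; the helper names are hypothetical.

```python
import numpy as np

def correlate_valid(img, kernel):
    """Slide a small kernel over the image (no padding) and sum the products."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            out[y, x] = np.sum(img[y:y + kh, x:x + kw] * kernel)
    return out

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def gradient_magnitude(img):
    """Edge strength as the length of the (horizontal, vertical) gradient."""
    gx = correlate_valid(img, SOBEL_X)
    gy = correlate_valid(img, SOBEL_Y)
    return np.hypot(gx, gy)

# Synthetic image: dark left half, bright right half, i.e. one vertical edge.
img = np.zeros((6, 6))
img[:, 3:] = 1.0
edges = gradient_magnitude(img)
```

The response is zero in the flat regions and non-zero only in the columns whose 3x3 window straddles the brightness jump.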
47
What is computer vision
Traditional approaches - part based detector
● Objects composed of features of parts and their spatial
relationship
● Challenge: how to define and combine
1980s
● More rigorous mathematical analysis and
quantitative aspects
● Optical character recognition
● Sliding window approaches
● Usage of artificial neural networks
48
What is computer vision
History
49
What is computer vision
Traditional approaches - HOG detection (histogram of
oriented gradients)
● Concept dates from the 1980s but was only widely used from 2005
● Create HOG descriptors (object generalizations)
● One feature vector per object
● Train with SVM
● Sliding window @multiple scales
50
What is computer vision
Traditional approaches - HOG detection (histogram of
oriented gradients)
● Computation of HOG descriptors:
1. Compute gradients
2. Compute histograms on cells
3. Normalize histograms
4. Concatenate histograms
● Requires a lot of engineering
● Must build ensembles of feature descriptors
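The four computation steps above can be sketched in NumPy. This is a deliberately simplified version: it normalises each cell on its own, whereas practical HOG implementations normalise over overlapping blocks of cells, and the cell size and bin count are assumed values.

```python
import numpy as np

def hog_descriptor(img, cell=4, bins=9):
    """Simplified HOG: per-cell orientation histograms, concatenated."""
    img = img.astype(np.float64)
    # 1. Compute gradients (np.gradient returns d/dy first, then d/dx).
    gy, gx = np.gradient(img)
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation

    histograms = []
    for y in range(0, img.shape[0] - cell + 1, cell):
        for x in range(0, img.shape[1] - cell + 1, cell):
            # 2. Magnitude-weighted orientation histogram for this cell.
            hist, _ = np.histogram(
                angle[y:y + cell, x:x + cell],
                bins=bins,
                range=(0.0, 180.0),
                weights=magnitude[y:y + cell, x:x + cell],
            )
            # 3. Normalise the histogram (per cell here, a simplification).
            hist = hist / (np.linalg.norm(hist) + 1e-6)
            histograms.append(hist)
    # 4. Concatenate all cell histograms into one feature vector.
    return np.concatenate(histograms)

rng = np.random.default_rng(0)
patch = rng.random((8, 8))           # synthetic 8x8 grayscale patch
descriptor = hog_descriptor(patch)   # 4 cells x 9 bins = 36 values
```

A vector like `descriptor` is what would then be passed to a classifier such as an SVM.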
1990s
● Significant interaction with computer graphics
(rendering, morphing, stitching)
● Approaches using statistical learning
● Eigenface (Ghostfaces) through principal component
analysis (PCA)
51
What is computer vision
History
52
What is computer vision
Traditional approaches - deformable parts model (DPM)
● Objects are constructed from their parts
● First match whole object, then refine on the parts
● HOG + part-based + modern features
● Slow but good at difficult objects
● Involves many heuristics
53
What is computer vision
Features
● Feature points
○ Small area of pixels with certain properties
● Feature detection
○ Use features for identification
○ Activate if “object” present
● Examples:
○ Lines, edges, colours, blobs, …
○ Animals, faces, cars, ...
54
What is computer vision
Traditional approaches - classical recognition
● Init: extract features for objects in different scales,
colours, orientations, rotations, occlusion levels
● Inference: extract features from query image and find
closest match in database or train a classifier
● Computationally expensive (hundreds of features in
image, millions in database) and complex due to errors
and mismatches
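The "find closest match in database" step can be sketched as a nearest-neighbour search over feature vectors. The three-dimensional vectors, the labels and the Euclidean distance are hypothetical choices for illustration; real systems use much longer descriptors and far larger databases.

```python
import numpy as np

def closest_match(query, database, labels):
    """Return the label and distance of the database vector nearest to the query."""
    distances = np.linalg.norm(database - query, axis=1)  # Euclidean distances
    best = int(np.argmin(distances))
    return labels[best], float(distances[best])

# Hypothetical feature vectors, one per known object.
database = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])
labels = ["car", "face", "animal"]

label, distance = closest_match(np.array([0.1, 0.9, 0.0]), database, labels)
```

Comparing every query against every database entry is what makes this approach expensive at scale.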
55
What is computer vision
History
Before the new era
● Bags of features
● Handcrafted ensembles
Input -> Feature extraction (Feat. 1, Feat. 2, ..., Feat. n) -> Final decision
The new era of computer vision
56
● Elementary building
block
● Inspired by biological
neurons
● Mathematical function
y=f(wx+b)
● Learnable weights
57
Artificial neural networks
Fundamentals - artificial neuron
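A minimal sketch of the neuron function y=f(wx+b) from this slide. The sigmoid is assumed as the activation f, since the slide leaves f unspecified, and the input and weight values are made up.

```python
import numpy as np

def sigmoid(z):
    """One possible activation function f."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, f=sigmoid):
    """Weighted sum of the inputs plus bias, passed through the activation."""
    return f(np.dot(w, x) + b)

x = np.array([1.0, 2.0])     # inputs
w = np.array([0.5, -0.25])   # learnable weights
b = 0.0                      # learnable bias
y = neuron(x, w, b)          # w.x + b = 0.5 - 0.5 + 0 = 0, and sigmoid(0) = 0.5
```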
● Collection of neurons
organized in layers
● Universal
approximators
● Fully-connected
network here
58
Artificial neural networks
Fundamentals - artificial neural networks
59
Artificial neural networks
Fundamentals - training
● Basically an optimization
problem
● Find minimum of a loss
function by an iterative
process (training)
● Designing the loss function
is sometimes tricky
60
Artificial neural networks
Fundamentals - training
Simple optimizer algorithm:
1. Forward pass with a batch of data
2. Calculate error between actual and wanted output
3. Nudge weights in proportion to error into the right
direction (same data would result in smaller error)
4. Repeat until convergence
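The four steps can be sketched with plain gradient descent on a one-weight linear model. The model, the mean squared error loss, the learning rate and the fixed number of steps (used here in place of a real convergence check) are all assumptions made to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=64)   # a batch of inputs
target = 3.0 * x + 1.0                # wanted output

w, b = 0.0, 0.0                       # learnable weights, arbitrary start
learning_rate = 0.1
losses = []

for step in range(500):               # 4. repeat (fixed step count here)
    prediction = w * x + b            # 1. forward pass with a batch of data
    error = prediction - target       # 2. error between actual and wanted output
    losses.append(float(np.mean(error ** 2)))
    grad_w = 2.0 * np.mean(error * x) # 3. gradient of the loss w.r.t. each weight
    grad_b = 2.0 * np.mean(error)
    w -= learning_rate * grad_w       #    nudge weights against the gradient
    b -= learning_rate * grad_b
```

After training, `w` and `b` end up close to the values 3 and 1 that generated the targets, and the recorded loss shrinks over the steps.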
61
Artificial neural networks
Fundamentals - CNN
● Local neighborhood
contributes to activation
● Exploit spatial
information
● Hierarchical feature
extractors
● Fewer parameters
Figure labels: input, activation, filters, receptive field
62
Artificial neural networks
Fundamentals - CNN
● Filter of size 3x3 applied to an input of 7x7
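The 3x3-on-7x7 case can be sketched directly. The example assumes stride 1 and no padding, which the slide does not state, and uses an averaging filter with made-up values; under those assumptions the activation map is 5x5.

```python
import numpy as np

def conv2d_valid(inp, kernel):
    """Apply a filter at every position where it fully fits (stride 1, no padding)."""
    kh, kw = kernel.shape
    oh = inp.shape[0] - kh + 1
    ow = inp.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for y in range(oh):
        for x in range(ow):
            # Each output value only sees a 3x3 local neighbourhood of the input.
            out[y, x] = np.sum(inp[y:y + kh, x:x + kw] * kernel)
    return out

inp = np.arange(49, dtype=float).reshape(7, 7)  # 7x7 input
kernel = np.ones((3, 3)) / 9.0                  # 3x3 averaging filter
activation = conv2d_valid(inp, kernel)          # 5x5 activation map
```

The same nine filter weights are reused at all 25 positions, which is where the parameter savings of convolutional layers come from.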
63
Artificial neural networks
Fundamentals - pooling
● Max-pooling
● Dimension reduction/adaption
● Existence is more important than location
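A small sketch of max-pooling, assuming non-overlapping 2x2 windows (the window size is not given on the slide). Each output keeps only the largest value in its window, so it records that a feature was present but not exactly where.

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max-pooling; odd trailing rows/columns are dropped."""
    h2, w2 = x.shape[0] // 2, x.shape[1] // 2
    blocks = x[:h2 * 2, :w2 * 2].reshape(h2, 2, w2, 2)
    return blocks.max(axis=(1, 3))

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 1],
              [0, 0, 5, 6],
              [1, 2, 7, 8]])
pooled = max_pool_2x2(x)  # 4x4 input becomes a 2x2 output
```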
64
Artificial neural networks
Fundamentals - pooling
● Zero-padding
● Controlling dimensions
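How zero-padding controls dimensions can be sketched with the usual output-size formula for a square input, (n + 2p - k) / s + 1 rounded down. The helper name is hypothetical and the 7x7 input with a 3x3 filter is just a worked example.

```python
import numpy as np

def conv_output_size(n, k, padding=0, stride=1):
    """Side length of the output for an n x n input and a k x k filter."""
    return (n + 2 * padding - k) // stride + 1

x = np.ones((7, 7))
# Surround the input with a one-pixel border of zeros.
padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)

without_padding = conv_output_size(7, 3)           # output shrinks to 5x5
with_padding = conv_output_size(7, 3, padding=1)   # output stays 7x7
```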
65
Artificial neural networks
Fundamentals - general network architecture
Input image -> convolutional layers -> ... -> Final decision
66
Artificial neural networks
Fundamentals - hierarchical feature extractors
Activations for:
● First layers: lines, edges, blobs, colours, ...
● Deeper layers: parts of abstract objects, abstract objects
Modern history of object recognition
67
● Classification and detection
○ 27k images
○ 20 classes
■ person, bird, cat, cow, dog, horse, sheep, aeroplane,
bicycle, boat, bus, car, motorbike, train, bottle,
chair, dining table, potted plant, sofa, tv/ monitor
68
Benchmark
Datasets - PASCAL VOC
● Challenges on a subset of ImageNet
○ 14M labeled images
○ 20k object categories
● ILSVRC* usually on 1k categories including 90 out of
120 dog breeds
69
Benchmark
Datasets - ImageNet
*ImageNet Large Scale Visual Recognition
Challenge
● ILSVRC 2012 winner by a large margin: top-5 error fell from about 25% to 16%
● Proved effectiveness of CNNs and kicked off a new era
● 8 layers, 650k neurons, 60M parameters
70
Artificial neural networks
Roadmap - AlexNet
● ILSVRC 2013 winner with a best top-5 error of 11.6%
● AlexNet but using smaller 7x7 kernels (instead of 11x11) in the
first layer to keep more information in deeper layers
71
Artificial neural networks
Roadmap - ZFNet
● ILSVRC 2013 localization winner
● Uses AlexNet on multi-scale input images with sliding
window approach
● Accumulates bounding boxes for final detection (instead
of non-max suppression)
72
Artificial neural networks
Roadmap - OverFeat
● 2k proposals generated by selective search
● SVM trained for classification
● Multi-stage pipeline
73
Artificial neural networks
Roadmap - RCNN (region based CNN)
● Not a winner but famous due to simplicity and
effectiveness
● Replace large-kernel convolutions by stacking several
small-kernel convolutions
74
Artificial neural networks
Roadmap - VGGNet
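The benefit of stacking small kernels can be checked with simple arithmetic. The sketch assumes stride-1 convolutions with the same number of input and output channels and ignores biases; the channel count of 64 is a made-up value.

```python
def stacked_receptive_field(kernel_size, num_layers):
    """Receptive field of num_layers stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)

def conv_weights(kernel_size, channels):
    """Weights of one conv layer with `channels` inputs and outputs (no bias)."""
    return kernel_size * kernel_size * channels * channels

channels = 64  # hypothetical channel count

rf_three_3x3 = stacked_receptive_field(3, 3)     # 7, same as one 7x7 kernel
stacked = 3 * conv_weights(3, channels)          # 3 * 9 * 64 * 64 = 110592
single = conv_weights(7, channels)               # 49 * 64 * 64 = 200704
```

Three stacked 3x3 layers see the same 7x7 region as one 7x7 layer, with roughly half the weights and two extra non-linearities in between.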
● ILSVRC 2014 winner
● Stacks up “inception” modules
● 22 layers, 5M parameters
75
Artificial neural networks
Roadmap - InceptionNet (GoogLeNet)
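● Jointly learns classification and bounding-box regression
(region proposals still come from a heuristic method)
● Employs region of interest (RoI) pooling, which allows the
convolutional computations to be reused across proposals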
76
Artificial neural networks
Roadmap - Fast RCNN
● Directly predicts all objects and classes in one shot
● Very fast
● Processes images at ~40 FPS on a Titan X GPU
● First real-time state-of-the-art detector
● Divides input images into multiple grid cells which are
then classified
77
Artificial neural networks
Roadmap - YOLO (you only look once)
● ILSVRC 2015 winner with a 3.6% error rate (human
performance is 5-10%)
● Employs residual blocks, which allow building deep
networks (hundreds of layers)
● Additional identity mapping
78
Artificial neural networks
Roadmap - ResNet (Microsoft)
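The identity mapping in a residual block can be sketched as follows. For brevity the sketch uses two fully-connected layers where ResNet uses convolutions, so it is an illustration of the shortcut idea under that assumption rather than the actual architecture.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, w1, w2):
    """Compute relu(F(x) + x), where F is two small layers and x is the shortcut."""
    out = relu(x @ w1)      # first layer of the residual branch F
    out = out @ w2          # second layer of the residual branch F
    return relu(out + x)    # add the identity shortcut, then activate

x = np.array([1.0, 2.0, 3.0])
zeros = np.zeros((3, 3))
# With all-zero weights the residual branch contributes nothing,
# so the block simply passes its (non-negative) input through.
passthrough = residual_block(x, zeros, zeros)
```

Because a block can fall back to the identity this easily, adding more blocks does not have to make the network worse, which is one intuition for why very deep stacks remain trainable.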
● Not a recognition network
● A region proposal network
● Popularized prior/anchor boxes (found through
clustering) to predict offsets
● Much better strategy than starting the predictions with
random coordinates
● Since then heuristic approaches have gradually been
fading out and being replaced
79
Artificial neural networks
Roadmap - MultiBox
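Predicting offsets relative to a prior/anchor box can be sketched with one common parameterisation: centre shifts scaled by the anchor size, and log-scale width and height factors. This is the form used by later Faster RCNN-style detectors and is an assumption here; it is not necessarily MultiBox's exact formulation, and the numbers are made up.

```python
import math

def decode_box(anchor, offsets):
    """Turn predicted offsets into a box, relative to a prior/anchor box.

    Boxes are (centre x, centre y, width, height).
    """
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = offsets
    return (
        xa + tx * wa,        # shift the centre by a fraction of the anchor width
        ya + ty * ha,        # shift the centre by a fraction of the anchor height
        wa * math.exp(tw),   # scale the width
        ha * math.exp(th),   # scale the height
    )

anchor = (50.0, 50.0, 20.0, 40.0)                          # hypothetical prior box
box = decode_box(anchor, (0.5, 0.0, 0.0, math.log(2.0)))   # move right, double height
```

With all offsets at zero the decoded box equals the anchor, so the network starts from a sensible guess instead of random coordinates.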
● Fast RCNN with heuristic region proposal replaced by
region proposal network (RPN) inspired by MultiBox
● RPN shares full-image convolutional features with the
detection network (cost-free region proposal)
● RPN uses “attention” mechanism to tell where to look
● ~5 FPS on a Tesla K40 GPU
● End-to-end training
80
Artificial neural networks
Roadmap - Faster RCNN
● SSD leverages the Faster RCNN’s RPN to directly
classify objects inside each prior box (similar to YOLO)
● Predicts category scores and box offsets for a fixed set
of default bounding boxes
● Fixes the predefined grid cells used in YOLO by using
multiple aspect ratios
● Produces predictions of different scales
● ~59 FPS
81
Artificial neural networks
Roadmap - SSD (single shot multibox detector)
● Open-source software library for machine learning
applications
● TensorFlow Object Detection API
○ A collection of pretrained models
○ Construct, train and deploy object detection models
82
Artificial neural networks
TensorFlow object detection API
Summary
83
● Humans are good at understanding the big picture
● Neural networks are good at details
● But they can be fooled...
84
Summary
Human vs machine
● Need a large amount of data
● Lots of engineering
● Trial and error
● Long training time
● Still lots of hyperparameter tuning
● No general network (generalization not answered)
● Little mathematical foundation
85
Summary
Computer vision is still difficult
● Despite all of these advances, the dream of having a
computer interpret an image at the same level as a
human remains unrealized
86
Summary
Computer vision is hard
Thank You
Stanislav Frolov
Big Data Engineer
sfrolov@inovex.de
0173 318 11 35
inovex GmbH
Lindberghstraße 3
80939 München

Mais conteúdo relacionado

Mais procurados

Graphics pipeline and rendering
Graphics pipeline and renderingGraphics pipeline and rendering
Graphics pipeline and renderingiain bruce
 
Computer vision - edge detection
Computer vision - edge detectionComputer vision - edge detection
Computer vision - edge detectionWael Badawy
 
Computer vision basics
Computer vision basicsComputer vision basics
Computer vision basicsShilpa Sharma
 
Diabetic Retinopathy Analysis using Fundus Image
Diabetic Retinopathy Analysis using Fundus ImageDiabetic Retinopathy Analysis using Fundus Image
Diabetic Retinopathy Analysis using Fundus ImageManjushree Mashal
 
Pose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningPose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningYu Huang
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detectionBrodmann17
 
Understanding neural radiance fields
Understanding neural radiance fieldsUnderstanding neural radiance fields
Understanding neural radiance fieldsVarun Bhaseen
 
Image processing on matlab presentation
Image processing on matlab presentationImage processing on matlab presentation
Image processing on matlab presentationNaatchammai Ramanathan
 
Computer animation
Computer animationComputer animation
Computer animationshusrusha
 
Introduction of augmented reality
Introduction of augmented realityIntroduction of augmented reality
Introduction of augmented realityTakashi Yoshinaga
 
A Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth EstimationA Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth EstimationRyo Takahashi
 
Computer animation Computer Graphics
Computer animation Computer Graphics Computer animation Computer Graphics
Computer animation Computer Graphics University of Potsdam
 
Introduction to AR with Unity3D
Introduction to AR with Unity3DIntroduction to AR with Unity3D
Introduction to AR with Unity3DAndreas Blick
 
Immersive technologies.pptx
Immersive technologies.pptxImmersive technologies.pptx
Immersive technologies.pptxAnandSri5
 

Mais procurados (20)

Graphics pipeline and rendering
Graphics pipeline and renderingGraphics pipeline and rendering
Graphics pipeline and rendering
 
Object Recognition
Object RecognitionObject Recognition
Object Recognition
 
Mixed reality
Mixed realityMixed reality
Mixed reality
 
Computer vision - edge detection
Computer vision - edge detectionComputer vision - edge detection
Computer vision - edge detection
 
Computer vision basics
Computer vision basicsComputer vision basics
Computer vision basics
 
Programming with OpenGL
Programming with OpenGLProgramming with OpenGL
Programming with OpenGL
 
=SLAM ppt.pdf
=SLAM ppt.pdf=SLAM ppt.pdf
=SLAM ppt.pdf
 
Diabetic Retinopathy Analysis using Fundus Image
Diabetic Retinopathy Analysis using Fundus ImageDiabetic Retinopathy Analysis using Fundus Image
Diabetic Retinopathy Analysis using Fundus Image
 
Pose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learningPose estimation from RGB images by deep learning
Pose estimation from RGB images by deep learning
 
Augmented Reality
Augmented RealityAugmented Reality
Augmented Reality
 
Introduction to object detection
Introduction to object detectionIntroduction to object detection
Introduction to object detection
 
Understanding neural radiance fields
Understanding neural radiance fieldsUnderstanding neural radiance fields
Understanding neural radiance fields
 
Image processing on matlab presentation
Image processing on matlab presentationImage processing on matlab presentation
Image processing on matlab presentation
 
Computer animation
Computer animationComputer animation
Computer animation
 
Introduction of augmented reality
Introduction of augmented realityIntroduction of augmented reality
Introduction of augmented reality
 
A Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth EstimationA Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth Estimation
 
Computer animation Computer Graphics
Computer animation Computer Graphics Computer animation Computer Graphics
Computer animation Computer Graphics
 
Introduction to AR with Unity3D
Introduction to AR with Unity3DIntroduction to AR with Unity3D
Introduction to AR with Unity3D
 
Ray tracing
 Ray tracing Ray tracing
Ray tracing
 
Immersive technologies.pptx
Immersive technologies.pptxImmersive technologies.pptx
Immersive technologies.pptx
 

Semelhante a Computer Vision – From traditional approaches to deep neural networks

Application of image processing.ppt
Application of image processing.pptApplication of image processing.ppt
Application of image processing.pptDevesh448679
 
Lecture 1 computer vision introduction
Lecture 1 computer vision introductionLecture 1 computer vision introduction
Lecture 1 computer vision introductioncairo university
 
Computer Vision(4).pptx
Computer Vision(4).pptxComputer Vision(4).pptx
Computer Vision(4).pptxGouthamMaliga
 
Saksham seminar report
Saksham seminar reportSaksham seminar report
Saksham seminar reportSakshamTurki
 
Machine Vision Concepts ,Application & Components.pptx
Machine Vision Concepts ,Application & Components.pptxMachine Vision Concepts ,Application & Components.pptx
Machine Vision Concepts ,Application & Components.pptxCiceer Ghimirey
 
Materi_01_VK_2223_3.pdf
Materi_01_VK_2223_3.pdfMateri_01_VK_2223_3.pdf
Materi_01_VK_2223_3.pdfichsan6
 
Synthetic Data and Graphics Techniques in Robotics
Synthetic Data and Graphics Techniques in RoboticsSynthetic Data and Graphics Techniques in Robotics
Synthetic Data and Graphics Techniques in RoboticsPrabindh Sundareson
 
Computer vision suprim regmi
Computer vision suprim regmi Computer vision suprim regmi
Computer vision suprim regmi Suprim Regmi
 
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...Dataconomy Media
 
Color based image processing , tracking and automation using matlab
Color based image processing , tracking and automation using matlabColor based image processing , tracking and automation using matlab
Color based image processing , tracking and automation using matlabKamal Pradhan
 
Realism in Computer Graphics
Realism in Computer GraphicsRealism in Computer Graphics
Realism in Computer GraphicsBarani Tharan
 
Problem Solving Methods
Problem Solving MethodsProblem Solving Methods
Problem Solving MethodsMaikel Mardjan
 
Burnaev and Notchenko. Skoltech. Bridging gap between 2D and 3D with Deep Lea...
Burnaev and Notchenko. Skoltech. Bridging gap between 2D and 3D with Deep Lea...Burnaev and Notchenko. Skoltech. Bridging gap between 2D and 3D with Deep Lea...
Burnaev and Notchenko. Skoltech. Bridging gap between 2D and 3D with Deep Lea...Skolkovo Robotics Center
 
A real-time big data architecture for glasses detection using computer vision...
A real-time big data architecture for glasses detection using computer vision...A real-time big data architecture for glasses detection using computer vision...
A real-time big data architecture for glasses detection using computer vision...Alberto Fernandez Villan
 

Semelhante a Computer Vision – From traditional approaches to deep neural networks (20)

Application of image processing.ppt
Application of image processing.pptApplication of image processing.ppt
Application of image processing.ppt
 
Computer vision
Computer visionComputer vision
Computer vision
 
Lecture 1 computer vision introduction
Lecture 1 computer vision introductionLecture 1 computer vision introduction
Lecture 1 computer vision introduction
 
Computer Vision(4).pptx
Computer Vision(4).pptxComputer Vision(4).pptx
Computer Vision(4).pptx
 
Saksham seminar report
Saksham seminar reportSaksham seminar report
Saksham seminar report
 
10833762.ppt
10833762.ppt10833762.ppt
10833762.ppt
 
Machine Vision Concepts ,Application & Components.pptx
Machine Vision Concepts ,Application & Components.pptxMachine Vision Concepts ,Application & Components.pptx
Machine Vision Concepts ,Application & Components.pptx
 
Computer vision
Computer visionComputer vision
Computer vision
 
Find your interest
Find your interestFind your interest
Find your interest
 
Materi_01_VK_2223_3.pdf
Materi_01_VK_2223_3.pdfMateri_01_VK_2223_3.pdf
Materi_01_VK_2223_3.pdf
 
Synthetic Data and Graphics Techniques in Robotics
Synthetic Data and Graphics Techniques in RoboticsSynthetic Data and Graphics Techniques in Robotics
Synthetic Data and Graphics Techniques in Robotics
 
Computer vision suprim regmi
Computer vision suprim regmi Computer vision suprim regmi
Computer vision suprim regmi
 
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
 
PPT s01-machine vision-s2
PPT s01-machine vision-s2PPT s01-machine vision-s2
PPT s01-machine vision-s2
 
Color based image processing , tracking and automation using matlab
Color based image processing , tracking and automation using matlabColor based image processing , tracking and automation using matlab
Color based image processing , tracking and automation using matlab
 
Realism in Computer Graphics
Realism in Computer GraphicsRealism in Computer Graphics
Realism in Computer Graphics
 
Problem Solving Methods
Problem Solving MethodsProblem Solving Methods
Problem Solving Methods
 
ICS1020 CV
ICS1020 CVICS1020 CV
ICS1020 CV
 
Burnaev and Notchenko. Skoltech. Bridging gap between 2D and 3D with Deep Lea...
Burnaev and Notchenko. Skoltech. Bridging gap between 2D and 3D with Deep Lea...Burnaev and Notchenko. Skoltech. Bridging gap between 2D and 3D with Deep Lea...
Burnaev and Notchenko. Skoltech. Bridging gap between 2D and 3D with Deep Lea...
 
A real-time big data architecture for glasses detection using computer vision...
A real-time big data architecture for glasses detection using computer vision...A real-time big data architecture for glasses detection using computer vision...
A real-time big data architecture for glasses detection using computer vision...
 

Mais de inovex GmbH

lldb – Debugger auf Abwegen
lldb – Debugger auf Abwegenlldb – Debugger auf Abwegen
lldb – Debugger auf Abwegeninovex GmbH
 
Are you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIAre you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIinovex GmbH
 
Why natural language is next step in the AI evolution
Why natural language is next step in the AI evolutionWhy natural language is next step in the AI evolution
Why natural language is next step in the AI evolutioninovex GmbH
 
Network Policies
Network PoliciesNetwork Policies
Network Policiesinovex GmbH
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learninginovex GmbH
 
Jenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen UmgebungenJenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen Umgebungeninovex GmbH
 
AI auf Edge-Geraeten
AI auf Edge-GeraetenAI auf Edge-Geraeten
AI auf Edge-Geraeteninovex GmbH
 
Prometheus on Kubernetes
Prometheus on KubernetesPrometheus on Kubernetes
Prometheus on Kubernetesinovex GmbH
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systemsinovex GmbH
 
Representation Learning von Zeitreihen
Representation Learning von ZeitreihenRepresentation Learning von Zeitreihen
Representation Learning von Zeitreiheninovex GmbH
 
Talk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale AssistentenTalk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale Assistenteninovex GmbH
 
Künstlich intelligent?
Künstlich intelligent?Künstlich intelligent?
Künstlich intelligent?inovex GmbH
 
Das Android Open Source Project
Das Android Open Source ProjectDas Android Open Source Project
Das Android Open Source Projectinovex GmbH
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning Interpretabilityinovex GmbH
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use caseinovex GmbH
 
People & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madnessPeople & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madnessinovex GmbH
 
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
Computer Vision – From traditional approaches to deep neural networks

  • 1. Computer Vision From traditional approaches to deep neural networks Stanislav Frolov München, 27.02.2018
  • 2. ● Computer vision ● Human vision ● Traditional approaches and methods ● Artificial neural networks ● Summary 2 Outline of this talk What we are going to talk about
  • 3. ● trained deep neural networks for object detection during master thesis ● still fascinated and interested 3 Stanislav Frolov Big Data Engineer @inovex
  • 4. ● Teach computers how to see ● Automatic extraction, analysis and understanding of images ● Infer useful information, interpret and make decisions ● Automate tasks that human visual system can do ● One of the most exciting fields in AI and ML 4 What is computer vision General
  • 5. 5 What is computer vision Motivation ● Era of pixels ● Internet consists mostly of images ● Explosion of visual data ● Cannot be labeled by humans
  • 6. 6 What is computer vision Drivers ● Two drivers for computer vision explosion ○ Compute (faster and cheaper) ○ Data (more data > algorithms)
  • 7. 7 What is computer vision Interdisciplinary field Fields: Computer Science, Mathematics, Engineering, Physics, Biology, Psychology. Topics: Information Retrieval; Machine Learning; Graphs, Algorithms; Systems Architecture; Robotics; Speech, NLP; Image Processing; Optics; Solid-State Physics; Neuroscience; Cognitive Sciences; Biological vision
  • 9. ● Imaging for statistical pattern recognition ● Image transformations such as pixel-by-pixel operations ○ Contrast enhancement ○ Edge extraction ○ Noise reduction ○ Geometrical and spatial operations (e.g. rotations) 9 What is computer vision Related fields - image processing
  • 10. ● Creates new images from scene descriptions ● Produces image data from 3D models ● “Inverse” of computer vision ● AR as a combination of both 10 What is computer vision Related fields - computer graphics
  • 11. ● Mainly manufacturing applications ● Image-based automatic inspection, process control, robot guidance ● Usually employs strong assumptions (colour, shape, light, structure, orientation, ...) -> works very well ● Output often pass/fail or good/bad ● Additionally numerical/measurement data, counts 11 What is computer vision Related fields - machine vision
  • 12. ● Create “intelligent” systems ● Studying computational aspects of intelligence ● Make computers do things at which, at the moment, people are better ● Many techniques play an important role (ML, ANNs) ● Currently does a few things better/faster at scale than humans can ● Whether it can do anything “human” remains an open question 12 What is computer vision Related fields - AI
  • 13. ● Related fields have a large intersection ● Basic techniques used, developed and studied are very similar 13 What is computer vision Related fields - summary
  • 14. Short trip to human vision 14
  • 15. ● Two stage process ○ Eyes take in light reflected off the objects and retina converts 3D objects into 2D images ○ Brain’s visual system interprets 2D images and “rebuilds” a 3D model 15 What is human vision General
  • 16. ● A pair of 2D images with slightly different views allows depth to be inferred ● The position of nearby objects varies more across the two images than the position of more distant objects 16 What is human vision Stereoscopic vision
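The same principle is used with camera pairs. As a minimal sketch, assuming an idealized, rectified stereo rig (an assumption not made on the slide), depth follows from the shift of a point between the two images, its disparity. The function name and the numbers below are illustrative only.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in metres for a rectified stereo pair: Z = f * B / d.

    focal_px:     focal length in pixels (assumed known from calibration)
    baseline_m:   distance between the two cameras in metres
    disparity_px: horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# A large shift between the two views means the object is close,
# a small shift means it is far away.
near = depth_from_disparity(700.0, 0.1, 35.0)
far = depth_from_disparity(700.0, 0.1, 7.0)
print(near < far)
```

The inverse relation between disparity and depth is why nearby objects move more between the two views than distant ones.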
  • 17. ● Prior knowledge of relative sizes and depths is often key for understanding and interpretation 17 What is human vision Prior knowledge
  • 18. ● Texture and changes in texture help with depth perception 18 What is human vision Texture pattern
  • 19. 19 What is human vision Biases and illusions in human perception ● Shadows make all the difference in interpretation ● Gradual changes in light are ignored so as not to be misled by shadows
  • 20. 20 What is human vision A few more illusions ● Two arrows with different orientations have the same length
  • 21. ● Assumptions and familiarity (distorted room) ● Face recognition bias ● Up-down orientation bias 21 What is human vision Biases and illusions in human perception
  • 22. 22 What is human vision Summary ● Illusions are fun, but the puzzle of how human vision works is far from solved
  • 23. Back to computer vision 23
  • 24. ● Recognition ● Localization ● Detection ● Segmentation 24 What is computer vision Typical tasks
  • 25. ● Part-based detection ○ Deformable parts model ○ Pose estimation and poselets 25 What is computer vision Typical tasks
  • 26. ● Image captioning (actions, attributes) 26 What is computer vision Typical tasks
  • 27. ● Motion analysis ○ Egomotion (camera) ○ Optical flow (pixels) 27 What is computer vision Typical tasks
  • 28. ● Scene understanding and reconstruction 28 What is computer vision Typical tasks
  • 29. ● Image restoration ● Colouring black & white photos 29 What is computer vision Typical tasks
  • 30. Solving this is useful for many applications 30
  • 31. 31 What is computer vision Typical applications ● Assistance systems for cars and people ● Surveillance ● Navigation (obstacle avoidance, road following, path planning) ● Photo interpretation ● Military (“smart” weapons) ● Manufacturing (inspection, identification) ● Robotics ● Autonomous vehicles (dangerous zones)
  • 32. 32 What is computer vision Typical applications ● Recognition and tracking ● Event detection ● Interaction (man-machine interfaces) ● Modeling (medical, manufacturing, training, education) ● Organizing (database index, sorting/clustering) ● Fingerprint and biometrics ● …
  • 34. 34 What is computer vision Why it is difficult ● Occlusion ● Deformation ● Scale ● Clutter ● Illumination ● Viewpoint ● Object pose ● Tons of classes and variants ● Often n:1 mapping ● Computationally expensive ● Full understanding of biological vision is missing
  • 36. ● Input: image(s) + labels ● Output: Semantic data, labels ● Digital image pixels usually have three channels [R,G,B] each [0...255] + Location[x,y] ● Digital images are just vectors 36 What is computer vision System overview
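To make "digital images are just vectors" concrete, here is a minimal sketch using plain Python lists (a real pipeline would more likely use an array library). The 2x2 image and the `flatten` helper are illustrative assumptions.

```python
# A tiny 2x2 RGB image: rows -> pixels -> [R, G, B], each value in 0..255.
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]


def flatten(img):
    """Flatten a height x width x channels image into one flat vector."""
    return [value for row in img for pixel in row for value in pixel]


vector = flatten(image)
print(len(vector))  # 2 * 2 * 3 = 12 values
```

The pixel location [x, y] is implicit in the position of each value inside the vector.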
  • 37. 1. Image acquisition (camera, sensors) 2. Pre-processing (sampling, noise reduction, augmentation) 3. Feature extraction (lines, edges, regions, points) 4. Detection and segmentation 5. Post-processing (verification, estimation, recognition) 6. Decision making ● -> Ability of a machine to step back and interpret the big picture of those pixels 37 What is computer vision System overview
  • 39. 1950s ● 2D imaging for statistical pattern recognition ● Theory of optical flow based on a fixed point towards which one moves 39 What is computer vision History
  • 40. Image processing ● Histograms ● Filtering ● Stitching ● Thresholding ● ... 40 What is computer vision Traditional approaches
  • 41. 1960s ● Desire to extract 3D structure from 2D images for scene understanding ● Began at pioneering AI universities to mimic the human visual system as a stepping stone for intelligent robots ● Summer vision project at MIT: attach a camera to a computer and have it “describe what it saw” 41 What is computer vision History
  • 42. ● Given to 10 undergraduate students ● … an attempt to use our summer workers effectively … ● … construction of a significant part of a visual system … ● … task can be segmented into sub-problems … ● … participate in the construction of a system complex enough to be a real landmark in the development of “pattern recognition” … 42 What is computer vision History: summer vision project @MIT 1966
  • 43. ● Goal: analyse scenes and identify objects ● Structure of system: ○ Region proposal ○ Property lists for regions ○ Boundary construction ○ Match with properties ○ Segment ● Basic foreground/background segmentation with simple objects (cubes, cylinders, ….) 43 What is computer vision History: summer vision project @MIT 1966
  • 44. ● Unlike general intelligence, computer vision seemed tractable ● Amusing anecdote, but it never aimed to “solve” computer vision ● Computer vision today differs from what it was thought to be in 1966 44 What is computer vision History: summer vision project @MIT 1966
  • 45. 1970s ● Formed many algorithms that exist today ● Edges, lines and objects as interconnected structures 45 What is computer vision History
  • 46. 46 What is computer vision Traditional approaches Edge detection based on ● Brightness ● Gradients ● Geometry ● Illumination
  • 47. 47 What is computer vision Traditional approaches - part based detector ● Objects composed of features of parts and their spatial relationship ● Challenge: how to define and combine
  • 48. 1980s ● More rigorous mathematical analysis and quantitative aspects ● Optical character recognition ● Sliding window approaches ● Usage of artificial neural networks 48 What is computer vision History
  • 49. 49 What is computer vision Traditional approaches - HOG detection (histogram of oriented gradients) ● Concept in 80s but used only in 2005 ● Create HOG descriptors (object generalizations) ● One feature vector per object ● Train with SVM ● Sliding window @multiple scales
  • 50. 50 What is computer vision Traditional approaches - HOG detection (histogram of oriented gradients) ● Computation of HOG descriptors: 1. Compute gradients 2. Compute histograms on cells 3. Normalize histograms 4. Concatenate histograms ● Requires a lot of engineering ● Must build ensembles of feature descriptors
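The four steps of the descriptor computation can be sketched in plain Python. This is a simplified illustration, not the original 2005 implementation: it normalizes each cell on its own instead of using overlapping blocks, and it votes into a single bin without interpolation. The function name, cell size and bin count are assumptions.

```python
import math


def hog_descriptor(img, cell=4, bins=9):
    """Simplified HOG for a grayscale image given as a list of rows."""
    h, w = len(img), len(img[0])
    descriptor = []
    for cy in range(0, h - cell + 1, cell):
        for cx in range(0, w - cell + 1, cell):
            hist = [0.0] * bins
            for y in range(cy, cy + cell):
                for x in range(cx, cx + cell):
                    # 1. Gradients via central differences (clamped at borders).
                    gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
                    gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
                    magnitude = math.hypot(gx, gy)
                    # Unsigned orientation in [0, 180) degrees.
                    angle = math.degrees(math.atan2(gy, gx)) % 180.0
                    # 2. Vote into the cell histogram, weighted by magnitude.
                    hist[int(angle // (180.0 / bins)) % bins] += magnitude
            # 3. L2-normalize the histogram (per cell here, for brevity).
            norm = math.sqrt(sum(v * v for v in hist)) or 1.0
            # 4. Concatenate into the final feature vector.
            descriptor.extend(v / norm for v in hist)
    return descriptor


# An 8x8 test image with a vertical edge: left half dark, right half bright.
edge = [[0] * 4 + [255] * 4 for _ in range(8)]
features = hog_descriptor(edge)
print(len(features))  # 4 cells x 9 bins = 36
```

For the vertical edge, all gradient energy lands in the first orientation bin of every cell. The resulting feature vector is what would then be handed to a classifier such as an SVM.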
  • 51. 1990s ● Significant interaction with computer graphics (rendering, morphing, stitching) ● Approaches using statistical learning ● Eigenface (Ghostfaces) through principal component analysis (PCA) 51 What is computer vision History
  • 52. 52 What is computer vision Traditional approaches - deformable parts model (DPM) ● Objects are constructed from their parts ● First match the whole object, then refine on the parts ● HOG + part-based + modern features ● Slow but good at difficult objects ● Involves many heuristics
  • 53. 53 What is computer vision Features ● Feature points ○ Small area of pixels with certain properties ● Feature detection ○ Use features for identification ○ Activate if “object” present ● Examples: ○ Lines, edges, colours, blobs, … ○ Animals, faces, cars, ...
  • 54. 54 What is computer vision Traditional approaches - classical recognition ● Init: extract features for objects in different scales, colours, orientations, rotations, occlusion levels ● Inference: extract features from query image and find closest match in database or train a classifier ● Computationally expensive (hundreds of features in image, millions in database) and complex due to errors and mismatches
  • 55. 55 What is computer vision History Before the new era ● Bags of features ● Handcrafted ensembles [Diagram: Input -> Feature Extraction (Feat. 1, Feat. 2, ..., Feat. n) -> Final Decision]
  • 56. The new era of computer vision 56
  • 57. ● Elementary building block ● Inspired by biological neurons ● Mathematical function y=f(wx+b) ● Learnable weights 57 Artificial neural networks Fundamentals - artificial neuron
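A minimal sketch of the function y = f(wx + b) for a neuron with several inputs. The sigmoid activation and the example weights are assumptions chosen for illustration; the slide does not fix a particular f.

```python
import math


def sigmoid(z):
    """A common choice for the activation function f."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b, f=sigmoid):
    """Compute y = f(w . x + b) for input vector x, weights w and bias b."""
    return f(sum(wi * xi for wi, xi in zip(w, x)) + b)


# The weighted sum is 0.5 * 1.0 + (-0.25) * 2.0 + 0.0 = 0.0, and sigmoid(0) = 0.5.
y = neuron(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)
print(y)
```

The weights `w` and the bias `b` are the learnable parameters; training adjusts them while `f` stays fixed.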
  • 58. ● Collection of neurons organized in layers ● Universal approximators ● Fully-connected network here 58 Artificial neural networks Fundamentals - artificial neural networks
  • 59. 59 Artificial neural networks Fundamentals - training ● Basically an optimization problem ● Find minimum of a loss function by an iterative process (training) ● Designing the loss function is sometimes tricky
  • 60. 60 Artificial neural networks Fundamentals - training Simple optimizer algorithm: 1. Forward pass with a batch of data 2. Calculate error between actual and wanted output 3. Nudge weights in proportion to error into the right direction (same data would result in smaller error) 4. Repeat until convergence
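The loop above can be sketched for the smallest possible case: a single linear neuron fitted with gradient descent on a mean squared error loss. The toy data, learning rate and fixed step count (instead of a real convergence check) are assumptions made for illustration.

```python
def train(data, lr=0.05, steps=2000):
    """Fit y = w * x + b to (x, y) pairs by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):  # 4. Repeat (fixed number of steps for simplicity)
        # 1. Forward pass with the whole batch.
        preds = [w * x + b for x, _ in data]
        # 2. Error between actual and wanted output.
        errors = [p - y for p, (_, y) in zip(preds, data)]
        # 3. Nudge the weights against the gradient of the mean squared error.
        grad_w = 2.0 / n * sum(e * x for e, (x, _) in zip(errors, data))
        grad_b = 2.0 / n * sum(errors)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1; training should recover w = 2 and b = 1.
data = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0]]
w, b = train(data)
print(round(w, 3), round(b, 3))
```

Deep networks follow the same loop, with backpropagation supplying the gradients for every layer.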
  • 61. 61 Artificial neural networks Fundamentals - CNN ● Local neighborhood contributes to activation ● Exploit spatial information ● Hierarchical feature extractors ● Fewer parameters [Figure labels: input, activation, filters, receptive field]
  • 62. 62 Artificial neural networks Fundamentals - CNN ● Filter of size 3x3 applied to an input of 7x7
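A minimal sketch of what applying a 3x3 filter to a 7x7 input means, assuming stride 1 and no padding, which gives a 5x5 output. As in most CNN libraries, the operation is technically a cross-correlation. The filter values and the ramp image are illustrative.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            # Each output value only sees a kh x kw receptive field.
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(kh) for dj in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# 7x7 input whose brightness increases from left to right.
image = [[j for j in range(7)] for _ in range(7)]
# 3x3 filter that responds to horizontal brightness changes.
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

feature_map = conv2d_valid(image, kernel)
print(len(feature_map), len(feature_map[0]))  # 5 5
```

The same nine filter weights are reused at every position, which is where the parameter savings over a fully-connected layer come from.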
  • 63. 63 Artificial neural networks Fundamentals - pooling ● Max-pooling ● Dimension reduction/adaption ● Existence is more important than location
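A minimal max-pooling sketch, assuming a 2x2 window with stride 2 (a common but not universal choice). The example feature map is made up.

```python
def max_pool(feature_map, size=2, stride=2):
    """Keep only the strongest activation in each size x size window."""
    h, w = len(feature_map), len(feature_map[0])
    return [
        [
            max(feature_map[i + di][j + dj]
                for di in range(size) for dj in range(size))
            for j in range(0, w - size + 1, stride)
        ]
        for i in range(0, h - size + 1, stride)
    ]


fm = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 0, 5, 6],
    [1, 2, 7, 8],
]
pooled = max_pool(fm)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4x4 map shrinks to 2x2, and the output records that a strong activation exists in each window but not exactly where.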
  • 64. 64 Artificial neural networks Fundamentals - pooling ● Zero-padding ● Controlling dimensions
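Zero-padding can be sketched as adding a border of zeros around the input so that a filter can also be centred on edge pixels. The helper names are illustrative; the size formula is the standard one for a square input with a square kernel.

```python
def zero_pad(image, pad=1):
    """Surround a 2D image with a border of `pad` zeros on every side."""
    width = len(image[0]) + 2 * pad
    border = [[0] * width for _ in range(pad)]
    rows = [[0] * pad + list(row) + [0] * pad for row in image]
    return border + rows + [row[:] for row in border]


def conv_output_size(n, kernel, pad=0, stride=1):
    """Output side length of a convolution over an n x n input."""
    return (n + 2 * pad - kernel) // stride + 1


padded = zero_pad([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(len(padded), len(padded[0]))         # 5 5
print(conv_output_size(7, 3))              # 5: the input shrinks without padding
print(conv_output_size(7, 3, pad=1))       # 7: padding preserves the size
```

With a padding of 1, a 3x3 filter keeps a 7x7 input at 7x7, which is how padding controls the dimensions from layer to layer.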
  • 65. 65 Artificial neural networks Fundamentals - general network architecture [Diagram: Input image -> convolutional layers -> ... -> Final decision]
  • 66. 66 Artificial neural networks Fundamentals - hierarchical feature extractors Activations for: ● First layers: lines, edges, blobs, colours, ... ● Deeper layers: parts of abstract objects, abstract objects
  • 67. Modern history of object recognition 67
  • 68. ● Classification and detection ○ 27k images ○ 20 classes ■ person, bird, cat, cow, dog, horse, sheep, aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, tv/ monitor 68 Benchmark Datasets - PASCAL VOC
  • 69. ● Challenges on a subset of ImageNet ○ 14M labeled images ○ 20k object categories ● ILSVRC* usually on 1k categories including 90 out of 120 dog breeds 69 Benchmark Datasets - ImageNet *ImageNet Large Scale Visual Recognition Challenge
  • 70. ● ILSVRC 2012 winner by a large margin, cutting the top-5 error from about 25% to 16% ● Proved the effectiveness of CNNs and kicked off a new era ● 8 layers, 650k neurons, 60M parameters 70 Artificial neural networks Roadmap - AlexNet
  • 71. ● ILSVRC 2013 winner with a best top-5 error of 11.6% ● AlexNet but using smaller 7x7 kernels (instead of 11x11) in the first layer to keep more information in deeper layers 71 Artificial neural networks Roadmap - ZFNet
  • 72. ● ILSVRC 2013 localization winner ● Uses AlexNet on multi-scale input images with sliding window approach ● Accumulates bounding boxes for final detection (instead of non-max suppression) 72 Artificial neural networks Roadmap - OverFeat
  • 73. ● 2k proposals generated by selective search ● SVM trained for classification ● Multi-stage pipeline 73 Artificial neural networks Roadmap - RCNN (region based CNN)
  • 74. ● Not a winner but famous due to simplicity and effectiveness ● Replace large-kernel convolutions by stacking several small-kernel convolutions 74 Artificial neural networks Roadmap - VGGNet
  • 75. ● ILSVRC 2014 winner ● Stacks up “inception” modules ● 22 layers, 5M parameters 75 Artificial neural networks Roadmap - InceptionNet (GoogLeNet)
  • 76. ● Jointly learns classification and bounding-box regression (region proposals still come from an external heuristic) ● Employs a region of interest (RoI) pooling layer that allows the convolutional computations to be reused across proposals 76 Artificial neural networks Roadmap - Fast RCNN
  • 77. ● Directly predicts all objects and classes in one shot ● Very fast ● Processes images at ~40 FPS on a Titan X GPU ● First real-time state-of-the-art detector ● Divides input images into multiple grid cells which are then classified 77 Artificial neural networks Roadmap - YOLO (you only look once)
  • 78. ● ILSVRC 2015 winner with a 3.6% error rate (human performance is 5-10%) ● Employs residual blocks, which allow very deep networks to be built (hundreds of layers) ● Additional identity mapping (skip connection) 78 Artificial neural networks Roadmap - ResNet (Microsoft)
  • 79. ● Not a recognition network ● A region proposal network ● Popularized prior/anchor boxes (found through clustering) to predict offsets ● Much better strategy than starting the predictions with random coordinates ● Since then heuristic approaches have gradually faded out and been replaced 79 Artificial neural networks Roadmap - MultiBox
  • 80. ● Fast RCNN with heuristic region proposal replaced by region proposal network (RPN) inspired by MultiBox ● RPN shares full-image convolutional features with the detection network (nearly cost-free region proposal) ● RPN uses “attention” mechanism to tell where to look ● ~5 FPS on a Tesla K40 GPU ● End-to-end training 80 Artificial neural networks Roadmap - Faster RCNN
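The idea of a residual block with an identity mapping can be sketched as y = relu(x + F(x)). For brevity this sketch uses small fully-connected layers for F instead of the convolutions used in ResNet, and all names and weights are illustrative assumptions.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(v, weights):
    """Multiply a vector by a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(x + F(x)) with F(x) = W2 * relu(W1 * x)."""
    fx = linear(relu(linear(x, w1)), w2)
    # The identity mapping: the input is added back onto the learned residual.
    return relu([a + b for a, b in zip(x, fx)])


# With all-zero weights F(x) is zero, so the block simply passes x through.
zeros = [[0.0, 0.0], [0.0, 0.0]]
y = residual_block([1.0, 2.0], zeros, zeros)
print(y)  # [1.0, 2.0]
```

Because a block can fall back to the identity this easily, stacking many of them does not have to degrade the signal, which is one intuition for why such deep networks remain trainable.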
  • 81. ● SSD leverages the Faster RCNN’s RPN to directly classify objects inside each prior box (similar to YOLO) ● Predicts category scores and box offsets for a fixed set of default bounding boxes ● Fixes the predefined grid cells used in YOLO by using multiple aspect ratios ● Produces predictions of different scales ● ~59 FPS 81 Artificial neural networks Roadmap - SSD (single shot multibox detector)
  • 82. ● Open-source software library for machine learning applications ● TensorFlow Object Detection API ○ A collection of pretrained models ○ Construct, train and deploy object detection models 82 Artificial neural networks TensorFlow object detection API
  • 84. ● Humans are good at understanding the big picture ● Neural networks are good at details ● But they can be fooled... 84 Summary Human vs machine
  • 85. ● Need a large amount of data ● Lots of engineering ● Trial and error ● Long training time ● Still lots of hyperparameter tuning ● No general network (generalization not answered) ● Little mathematical foundation 85 Summary Computer vision is still difficult
  • 86. ● Despite all of these advances, the dream of having a computer interpret an image at the same level as a human remains unrealized 86 Summary Computer vision is hard
  • 87. Thank You Stanislav Frolov Big Data Engineer sfrolov@inovex.de 0173 318 11 35 inovex GmbH Lindberghstraße 3 80939 München