Computer Vision Introduction

Computer Vision – Intro
Images are taken from: Computer Vision: Algorithms and Applications, by Richard Szeliski

Standard Computer Vision Tasks

OpenCV
OpenCV (Open Source Computer Vision Library: http://opencv.org) is an open-source,
BSD-licensed library that includes several hundred computer vision algorithms.

OpenCV – hard facts
• OpenCV is released under a BSD license
• Free for both academic and commercial use.
• C++, C, Python and Java interfaces.
• Supports Windows, Linux, Mac OS, iOS and Android.
• Written in optimized C/C++
• Can take advantage of multi-core processing.
• Downloads exceeding 6 million.
• Latest version 2.4.6

OpenCV – intro (1/2)
OpenCV has a modular structure, which means that the package includes
several shared or static libraries. The following modules are available:
core - a compact module defining basic data structures, including the dense
multi-dimensional array Mat and basic functions used by all other modules.
imgproc - an image processing module that includes linear and non-linear
image filtering, geometrical image transformations (resize, affine and
perspective warping, generic table-based remapping), color space conversion,
histograms, and so on.
video - a video analysis module that includes motion estimation, background
subtraction, and object tracking algorithms.
calib3d - basic multiple-view geometry algorithms, single and stereo camera
calibration, object pose estimation, stereo correspondence algorithms, and
elements of 3D reconstruction.

OpenCV – intro (2/2)
features2d - salient feature detectors, descriptors, and descriptor matchers.
objdetect - detection of objects and instances of the predefined classes (for
example, faces, eyes, mugs, people, cars, and so on).
highgui - an easy-to-use interface to video capturing, image and video codecs,
as well as simple UI capabilities.
gpu - GPU-accelerated algorithms from different OpenCV modules.
... some other helper modules, such as FLANN and Google test wrappers,
Python bindings, and others.
http://docs.opencv.org/doc/tutorials/tutorials.html

Android programming - steps
Minimum skills:
Java for Android / Objective-C for iOS
OpenCV
C++ for native code
Minimum installation for Android: (We will learn and apply later today)
Eclipse IDE
Android ADT
OpenCV
OpenCV C++/native
Simulator

Canny Edge Detector
• void Canny(InputArray image, OutputArray edges, double threshold1,
double threshold2, int apertureSize=3, bool L2gradient=false )
• Parameters:
image – single-channel 8-bit input image.
edges – output edge map; it has the same size and type as image.
threshold1 – first threshold for the hysteresis procedure.
threshold2 – second threshold for the hysteresis procedure.
apertureSize – aperture size for the Sobel() operator.
L2gradient – a flag indicating whether the more accurate L2 norm, sqrt((dI/dx)^2 + (dI/dy)^2),
should be used to calculate the image gradient magnitude (L2gradient=true), or whether the
default L1 norm, |dI/dx| + |dI/dy|, is enough (L2gradient=false).
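
To make the L2gradient flag and the two hysteresis thresholds more concrete, here is a minimal sketch in plain C++ (no OpenCV). The function names are hypothetical and are not part of the OpenCV API. The hysteresis step is simplified to a single row of pixels and uses >= comparisons as an assumption, so it only illustrates the idea. It is not how Canny() is implemented internally.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// L1 norm of the gradient: |dI/dx| + |dI/dy| (the L2gradient=false case).
double gradientMagnitudeL1(double dx, double dy) {
    return std::fabs(dx) + std::fabs(dy);
}

// L2 norm of the gradient: sqrt((dI/dx)^2 + (dI/dy)^2) (the L2gradient=true case).
double gradientMagnitudeL2(double dx, double dy) {
    return std::sqrt(dx * dx + dy * dy);
}

// Simplified 1-D hysteresis: a pixel is an edge if its magnitude reaches the
// high threshold, or if it reaches the low threshold and is connected to an
// edge pixel through a run of pixels that all reach the low threshold.
std::vector<bool> hysteresis1D(const std::vector<double>& magnitude,
                               double low, double high) {
    const std::size_t n = magnitude.size();
    std::vector<bool> edge(n, false);
    // Forward pass: strong pixels, plus weak pixels linked to the left.
    for (std::size_t i = 0; i < n; ++i) {
        const bool strong = magnitude[i] >= high;
        const bool weak = magnitude[i] >= low;
        edge[i] = strong || (weak && i > 0 && edge[i - 1]);
    }
    // Backward pass: weak pixels linked to the right.
    for (std::size_t i = n; i-- > 0;) {
        const bool weak = magnitude[i] >= low;
        if (!edge[i] && weak && i + 1 < n && edge[i + 1]) {
            edge[i] = true;
        }
    }
    return edge;
}
```

For a gradient of (3, -4) the L1 norm is 7 and the L2 norm is 5, so the same pair of thresholds keeps or rejects different pixels depending on the flag. In the hysteresis sketch, a weak pixel next to a strong one is kept, while an isolated weak pixel is dropped.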

Canny Edge Detector - code
Mat src, src_gray;   // assumed to be loaded already (src_gray is the grayscale version of src)
Mat dst, detected_edges;
int edgeThresh = 1;
int lowThreshold = 1;
int const max_lowThreshold = 100;
int ratio = 3;       // upper threshold = lowThreshold * ratio
int kernel_size = 3;
const char* window_name = "Edge Map";
/// Reduce noise with a 3x3 kernel
blur( src_gray, detected_edges, Size(3,3) );
/// Canny detector: the two thresholds must differ for hysteresis to have any effect
Canny( detected_edges, detected_edges, lowThreshold, lowThreshold*ratio, kernel_size );
/// Using Canny's output as a mask, we display our result
dst.create( src.size(), src.type() );
dst = Scalar::all(0);
src.copyTo( dst, detected_edges );
imshow( window_name, dst );

Hough Transform
• void HoughLines(InputArray image, OutputArray lines, double rho, double theta,
int threshold, double srn=0, double stn=0 )
• Parameters:
image – 8-bit, single-channel binary source image.
lines – Output vector of lines. Each line is represented by a two-element vector (rho, theta).
rho – Distance resolution of the accumulator in pixels.
theta – Angle resolution of the accumulator in radians.
threshold – Accumulator threshold parameter.
srn – For the multi-scale Hough transform, it is a divisor for the distance
resolution rho.
stn – For the multi-scale Hough transform, it is a divisor for the angle
resolution theta.
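
How rho, theta and threshold interact is easier to see in a stripped-down accumulator. The following is a minimal sketch in plain C++ of the standard (single-scale) Hough voting scheme. It is not OpenCV's implementation, and the names houghLinesSketch, Pixel and HoughLine are hypothetical. It assumes every edge pixel lies inside a width x height image and returns only cells with more than threshold votes.

```cpp
#include <cmath>
#include <vector>

const double kPi = 3.14159265358979323846;

struct Pixel { int x; int y; };
struct HoughLine { double rho; double theta; int votes; };

// Each edge pixel votes for every (rho, theta) cell of a line that could pass
// through it, where rho = x*cos(theta) + y*sin(theta).
std::vector<HoughLine> houghLinesSketch(const std::vector<Pixel>& edgePixels,
                                        int width, int height,
                                        double rho, double theta, int threshold) {
    // Number of angle bins covering [0, pi) at the requested angle resolution.
    const int numAngles = static_cast<int>(std::lround(kPi / theta));
    // |rho| can never exceed the image diagonal.
    const double diagonal = std::sqrt(static_cast<double>(width) * width +
                                      static_cast<double>(height) * height);
    const int maxRhoIndex = static_cast<int>(std::ceil(diagonal / rho));
    const int numRhos = 2 * maxRhoIndex + 1;  // bins for negative and positive rho

    std::vector<std::vector<int>> accumulator(numAngles,
                                              std::vector<int>(numRhos, 0));
    for (const Pixel& p : edgePixels) {
        for (int t = 0; t < numAngles; ++t) {
            const double angle = t * theta;
            const double r = p.x * std::cos(angle) + p.y * std::sin(angle);
            const int rIndex = static_cast<int>(std::lround(r / rho)) + maxRhoIndex;
            ++accumulator[t][rIndex];
        }
    }

    // Keep only the cells whose vote count exceeds the threshold.
    std::vector<HoughLine> lines;
    for (int t = 0; t < numAngles; ++t) {
        for (int r = 0; r < numRhos; ++r) {
            if (accumulator[t][r] > threshold) {
                lines.push_back({(r - maxRhoIndex) * rho, t * theta,
                                 accumulator[t][r]});
            }
        }
    }
    return lines;
}
```

With ten edge pixels on the vertical line x = 5, the cell (rho = 5, theta = 0) collects ten votes. A finer theta makes neighbouring angle bins collect nearly as many votes, which is why a single physical line can produce several near-duplicate detections.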

Hough Transform - code
// src is assumed to be an 8-bit, single-channel image that is already loaded
Mat dst, cdst;
Canny(src, dst, 50, 200, 3);
cvtColor(dst, cdst, CV_GRAY2BGR);
vector<Vec2f> lines;
HoughLines(dst, lines, 1, CV_PI/180, 100, 0, 0 );
// Draw the lines
for( size_t i = 0; i < lines.size(); i++ )
{
    float rho = lines[i][0], theta = lines[i][1];
    Point pt1, pt2;
    double a = cos(theta), b = sin(theta);
    // (x0, y0) is the point on the line closest to the origin
    double x0 = a*rho, y0 = b*rho;
    // Move 1000 pixels along the line direction (-b, a) in both directions
    pt1.x = cvRound(x0 + 1000*(-b));
    pt1.y = cvRound(y0 + 1000*(a));
    pt2.x = cvRound(x0 - 1000*(-b));
    pt2.y = cvRound(y0 - 1000*(a));
    line( cdst, pt1, pt2, Scalar(0,0,255), 3, CV_AA);
}

Cascade classifier
• void CascadeClassifier::detectMultiScale(const Mat& image, vector<Rect>& objects, double
scaleFactor=1.1, int minNeighbors=3, int flags=0, Size minSize=Size(), Size maxSize=Size())
• Parameters:
cascade – Haar classifier cascade (OpenCV 1.x API only; it is not a parameter of the
C++ method above). It can be loaded from an XML or YAML file using Load().
image – Matrix of the type CV_8U containing an image where objects are
detected.
objects – Vector of rectangles where each rectangle contains the detected
object.
scaleFactor – Parameter specifying how much the image size is reduced at each
image scale.
minNeighbors – Parameter specifying how many neighbors each candidate
rectangle should have to retain it.
flags – Parameter with the same meaning for an old cascade as in the function
cvHaarDetectObjects. It is not used for a new cascade.
minSize – Minimum possible object size. Objects smaller than that are ignored.
maxSize – Maximum possible object size. Objects larger than that are ignored.
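
The interplay of scaleFactor, minSize and maxSize can be illustrated by listing the window sizes a multi-scale search would visit. The following is a rough sketch in plain C++ under simplifying assumptions (square windows, a square image, rounding to the nearest pixel). The function windowSizes and its parameter baseWindow (the window size the cascade was trained on) are hypothetical names. OpenCV's actual scale loop differs in its details.

```cpp
#include <cmath>
#include <vector>

// Lists the square detection-window side lengths visited when the search
// scale grows by scaleFactor at each step. maxSize <= 0 means "no upper limit".
std::vector<int> windowSizes(int baseWindow, int imageSide, double scaleFactor,
                             int minSize, int maxSize) {
    std::vector<int> sizes;
    if (baseWindow <= 0 || scaleFactor <= 1.0) {
        return sizes;  // the scale would never grow, so there is nothing to list
    }
    for (double factor = 1.0; ; factor *= scaleFactor) {
        const int window = static_cast<int>(std::lround(baseWindow * factor));
        if (window > imageSide) break;                // window no longer fits
        if (maxSize > 0 && window > maxSize) break;   // larger objects are ignored
        if (window < minSize) continue;               // smaller objects are ignored
        sizes.push_back(window);
    }
    return sizes;
}
```

For example, windowSizes(20, 100, 2.0, 30, 0) visits windows of 40 and 80 pixels: the 20-pixel window is below minSize and the 160-pixel window no longer fits. A scaleFactor closer to 1 (such as the default 1.1) visits many more scales, which tends to find more objects at a higher computational cost.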

Cascade classifier - code
String face_cascade_name = "haarcascade_frontalface_alt.xml";
CascadeClassifier face_cascade;
std::vector<Rect> faces;
// Load the cascade
face_cascade.load( face_cascade_name );
// frame is assumed to be a BGR image that is already captured
Mat frame_gray;
cvtColor( frame, frame_gray, CV_BGR2GRAY );
equalizeHist( frame_gray, frame_gray );
// Detect faces
face_cascade.detectMultiScale( frame_gray, faces, 1.1, 2, 0|CV_HAAR_SCALE_IMAGE, Size(30, 30) );
// Draw an ellipse around each detected face
for( size_t i = 0; i < faces.size(); i++ )
{
    Point center( faces[i].x + faces[i].width*0.5, faces[i].y + faces[i].height*0.5 );
    ellipse( frame, center, Size( faces[i].width*0.5, faces[i].height*0.5), 0, 0, 360, Scalar( 255, 0, 255 ), 4, 8, 0 );
}
