Intro to computer vision in .net update

@slorello
Intro to Computer Vision in .NET
Steve Lorello
.NET Developer Advocate @Vonage
Twitter: @slorello

@slorello
What is Computer Vision?

@slorello
“The goal of computer vision is to write computer programs that can interpret images.”
Steve Seitz

@slorello
Agenda
1. What is a Digital Image?
2. Hello OpenCV in .NET
3. Convolution and Edge Detection
4. Convolutional Neural Networks
5. Facial Detection
6. Facial Detection with Vonage Video API

@slorello
What is a Digital Image?

@slorello
● An Image is a Function
● A Function of Intensity Values at Given Positions
● Those Intensity Values Fall Along an Arbitrary Range
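To make the "image is a function" idea concrete, here is a minimal sketch in plain C# with no Emgu CV dependency. The class name ImageFunction and its methods are made up for illustration: one looks up the intensity at a position, the other rescales an arbitrary intensity range to the usual 0-255.

```csharp
using System;

// Hypothetical helper (not part of Emgu CV): a grayscale image as a function I(x, y).
public static class ImageFunction
{
    // Returns the intensity stored at column x, row y.
    public static double Intensity(double[,] image, int x, int y)
    {
        return image[y, x]; // first index is the row (y), second is the column (x)
    }

    // Linearly rescales intensities from their own [min, max] range to [0, 255].
    public static byte[,] Normalize(double[,] image)
    {
        double min = double.MaxValue, max = double.MinValue;
        foreach (var v in image)
        {
            min = Math.Min(min, v);
            max = Math.Max(max, v);
        }

        int rows = image.GetLength(0), cols = image.GetLength(1);
        var result = new byte[rows, cols];
        double range = max - min;
        for (int y = 0; y < rows; y++)
            for (int x = 0; x < cols; x++)
                result[y, x] = range == 0
                    ? (byte)0 // a flat image has no range to stretch
                    : (byte)Math.Round((image[y, x] - min) / range * 255.0);
        return result;
    }
}
```

The input range here could be anything (say, -1 to 1); only the display step forces it into 0-255.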

@slorello Source: Aaron Bobick’s Intro to Computer Vision Udacity

@slorello
Using Computer Vision in .NET

@slorello
● OpenCV (Open Source Computer Vision Library): https://opencv.org/
● Emgu CV: http://www.emgu.com/

@slorello
● Create a Project in Visual Studio
● Install Emgu CV with the NuGet package manager: Emgu.CV.runtime.<platform>

@slorello https://github.com/slorello89/ShowImage
var zero = CvInvoke.Imread(Path.Join("resources","zero.jpg"));
CvInvoke.Imshow("zero", zero);
CvInvoke.WaitKey(0);

@slorello
Convolution and Edge Detection

@slorello https://carbon.now.sh/

@slorello http://homepages.inf.ed.ac.uk/rbf/HIPR2/sobel.htm
Sobel Operator

@slorello https://github.com/slorello89/BasicSobel
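// Assumes img is a BGR Mat already loaded with CvInvoke.Imread.
Mat gray = new Mat(), gradX = new Mat(), gradY = new Mat();
Mat absGradX = new Mat(), absGradY = new Mat(), sobelGrad = new Mat();
// Convert to grayscale, then blur to suppress noise before differentiating.
CvInvoke.CvtColor(img, gray, Emgu.CV.CvEnum.ColorConversion.Bgr2Gray);
CvInvoke.GaussianBlur(gray, gray, new System.Drawing.Size(3, 3), 0);
// Derivatives in X and Y; a 16-bit signed depth keeps negative gradients.
CvInvoke.Sobel(gray, gradX, Emgu.CV.CvEnum.DepthType.Cv16S, 1, 0, 3);
CvInvoke.Sobel(gray, gradY, Emgu.CV.CvEnum.DepthType.Cv16S, 0, 1, 3);
// Take absolute values (back to 8-bit) and blend both directions equally.
CvInvoke.ConvertScaleAbs(gradX, absGradX, 1, 0);
CvInvoke.ConvertScaleAbs(gradY, absGradY, 1, 0);
CvInvoke.AddWeighted(absGradX, .5, absGradY, .5, 0, sobelGrad);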
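For intuition about what a 3x3 Sobel filter computes, here is a rough sketch in plain C# (no Emgu CV). It slides the standard Sobel kernels over a grayscale array. The class name SobelSketch is made up, and border pixels are simply left at 0, which is a simplification and not how the library handles borders.

```csharp
using System;

// Hypothetical illustration of the 3x3 Sobel operator; not the Emgu CV implementation.
public static class SobelSketch
{
    // Standard 3x3 Sobel kernels for the horizontal (X) and vertical (Y) derivatives.
    static readonly int[,] KernelX = { { -1, 0, 1 }, { -2, 0, 2 }, { -1, 0, 1 } };
    static readonly int[,] KernelY = { { -1, -2, -1 }, { 0, 0, 0 }, { 1, 2, 1 } };

    // Slides a 3x3 kernel over the image; border pixels stay 0 for simplicity.
    public static int[,] Apply(byte[,] gray, int[,] kernel)
    {
        int rows = gray.GetLength(0), cols = gray.GetLength(1);
        var output = new int[rows, cols];
        for (int y = 1; y < rows - 1; y++)
            for (int x = 1; x < cols - 1; x++)
            {
                int sum = 0;
                for (int ky = -1; ky <= 1; ky++)
                    for (int kx = -1; kx <= 1; kx++)
                        sum += kernel[ky + 1, kx + 1] * gray[y + ky, x + kx];
                output[y, x] = sum; // can be negative, hence a signed output type
            }
        return output;
    }

    public static int[,] GradientX(byte[,] gray) => Apply(gray, KernelX);
    public static int[,] GradientY(byte[,] gray) => Apply(gray, KernelY);
}
```

A vertical edge (dark on the left, bright on the right) gives a large X gradient and a zero Y gradient, which is why the two results are combined afterwards.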

@slorello
Gradient in X Gradient in Y

@slorello Source: https://dsp.stackexchange.com/
Gaussian Kernel
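As a rough illustration of where the numbers in a Gaussian kernel come from, the sketch below samples a 2D Gaussian on a small grid and normalizes it so the weights sum to 1. The class name GaussianKernelSketch and the explicit sigma parameter are assumptions for this example; it is not how GaussianBlur picks its weights internally.

```csharp
using System;

// Hypothetical helper that builds a normalized Gaussian kernel.
public static class GaussianKernelSketch
{
    // Builds a size x size kernel (size must be odd) whose weights sum to 1.
    public static double[,] Build(int size, double sigma)
    {
        if (size <= 0 || size % 2 == 0)
            throw new ArgumentException("size must be a positive odd number", nameof(size));
        if (sigma <= 0)
            throw new ArgumentException("sigma must be positive", nameof(sigma));

        int half = size / 2;
        var kernel = new double[size, size];
        double sum = 0;
        for (int y = -half; y <= half; y++)
            for (int x = -half; x <= half; x++)
            {
                // Unnormalized Gaussian: weight falls off with squared distance from the centre.
                double value = Math.Exp(-(x * x + y * y) / (2 * sigma * sigma));
                kernel[y + half, x + half] = value;
                sum += value;
            }

        // Normalize so blurring does not change overall brightness.
        for (int y = 0; y < size; y++)
            for (int x = 0; x < size; x++)
                kernel[y, x] /= sum;
        return kernel;
    }
}
```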

@slorello
Here it is at 10X detail

@slorello https://www.researchgate.net

@slorello Source: Thad Starner

@slorello https://www.youtube.com/watch?v=Ilg3gGewQ5U

@slorello
Limitations of a Neural Net
● Not appropriate for every use case
● Layers with image classification are huge

@slorello
Convolutional Neural Networks

@slorello
Detector.cs
_pipeline = . . .;
IDataView trainingData = _mlContext.Data
    .LoadFromTextFile<ImageData>(path: _trainTagsTsv, hasHeader: false);
_model = _pipeline.Fit(trainingData);

@slorello
Detector.cs
var imageData = new ImageData()
{
    ImagePath = filename
};
var predictor = _mlContext.Model
    .CreatePredictionEngine<ImageData, ImagePrediction>(_model);
var prediction = predictor.Predict(imageData);

@slorello
WhatsAppController.cs
var appId = _config["APP_ID"];
var privateKey = _config["privateKeyPath"];
var creds = Credentials.FromAppIdAndPrivateKeyPath(appId, privateKey);
// The message body goes in a "text" field alongside the content type.
var content = new { type = "text", text = dogPrediction };
var message = new { content };
var to = new { type = "whatsapp", number = toNum };
var from = new { type = "whatsapp", number = fromNum };
var request = new { to, from, message };
var uri = new Uri("https://api.nexmo.com/v0.1/messages");
var response = ApiRequest.DoRequestWithJsonContent<JObject>(
    "POST", uri, request, ApiRequest.AuthType.Bearer, creds);

@slorello
Viola-Jones Technique
1. Use Haar-like features as masks
2. Use integral images to calculate relative shading per these masks
3. Use a Cascading Classifier to detect faces

@slorello https://www.quora.com/How-can-I-understand-Haar-like-feature-for-face-detection
Haar-like features

@slorello Source https://www.mathworks.com/help/images/integral-image.html
Integral Images or Summed Area table
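A small sketch of the idea in plain C#: build the summed-area table once, then get the sum of any rectangle with four lookups, which is what makes Haar-like features cheap to evaluate. The class name IntegralImageSketch and the simple left-minus-right feature are illustrative assumptions, not the code used inside OpenCV's cascade classifier.

```csharp
using System;

// Hypothetical illustration of an integral image (summed-area table).
public static class IntegralImageSketch
{
    // Builds a table with one extra zero row and column, so table[y, x] holds
    // the sum of all pixels in rows above y and columns left of x.
    public static long[,] Build(byte[,] image)
    {
        int rows = image.GetLength(0), cols = image.GetLength(1);
        var table = new long[rows + 1, cols + 1];
        for (int y = 1; y <= rows; y++)
            for (int x = 1; x <= cols; x++)
                table[y, x] = image[y - 1, x - 1]
                    + table[y - 1, x] + table[y, x - 1] - table[y - 1, x - 1];
        return table;
    }

    // Sum of the rectangle with top-left corner (x, y) and the given size: four lookups.
    public static long RectSum(long[,] table, int x, int y, int width, int height)
    {
        return table[y + height, x + width] - table[y, x + width]
             - table[y + height, x] + table[y, x];
    }

    // A two-rectangle Haar-like feature: left half minus right half of the window.
    public static long TwoRectFeature(long[,] table, int x, int y, int width, int height)
    {
        int half = width / 2;
        return RectSum(table, x, y, half, height)
             - RectSum(table, x + half, y, half, height);
    }
}
```

The cost of RectSum does not depend on the rectangle's size, so the same table serves every feature at every scale.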

@slorello
● Construct Cascading Classifier
● Run Classification
● Use Rectangles from classification to draw boxes around faces

@slorello https://github.com/slorello89/FacialDetection
var faceClassifier = new CascadeClassifier(Path.Join("resources",
    "haarcascade_frontalface_default.xml"));
var img = CvInvoke.Imread(Path.Join("resources", "imageWithFace.jpg"));
var faces = faceClassifier.DetectMultiScale(img,
    minSize: new System.Drawing.Size(300, 300));
foreach (var face in faces)
{
    CvInvoke.Rectangle(img, face,
        new Emgu.CV.Structure.MCvScalar(255, 0, 0), 10);
}

@slorello
Face Detection With the Vonage Video API
https://www.vonage.com/communications-apis/video/

@slorello
● Create a WPF app
● Add the OpenTok.Client SDK to it
● Add a new class called FaceDetectionVideoRenderer that implements IVideoRenderer and extends Control
● Add a Control to the main XAML file where we’ll put the publisher video; call it “PublisherVideo”
● Add a Detect Faces button and a Connect button

@slorello https://github.com/opentok-community/wpf-facial-detection
Publisher = new Publisher(Context.Instance,
    renderer: PublisherVideo);
Session = new Session(Context.Instance, API_KEY, SESSION_ID);

private void Connect_Click(object sender, RoutedEventArgs e)
{
    if (Disconnect)
    {
        Session.Unpublish(Publisher);
        Session.Disconnect();
    }
    else
    {
        Session.Connect(TOKEN);
    }
    Disconnect = !Disconnect;
    ConnectDisconnectButton.Content = Disconnect ? "Disconnect" : "Connect";
}

private void DetectFacesButton_Click(object sender, RoutedEventArgs e)
{
    PublisherVideo.ToggleFaceDetection(!PublisherVideo.DetectingFaces);
    foreach (var subscriber in SubscriberByStream.Values)
    {
        ((FaceDetectionVideoRenderer)subscriber.VideoRenderer)
            .ToggleFaceDetection(PublisherVideo.DetectingFaces);
    }
}

private void Session_StreamReceived(object sender, Session.StreamEventArgs e)
{
    FaceDetectionVideoRenderer renderer = new FaceDetectionVideoRenderer();
    renderer.ToggleFaceDetection(PublisherVideo.DetectingFaces);
    SubscriberGrid.Children.Add(renderer);
    UpdateGridSize(SubscriberGrid.Children.Count);
    Subscriber subscriber = new Subscriber(Context.Instance, e.Stream, renderer);
    SubscriberByStream.Add(e.Stream, subscriber);
    Session.Subscribe(subscriber);
}

@slorello
● Intercept each frame before it’s rendered
● Run face detection on each frame
● Draw a rectangle on each frame to show where the face is
● Render the frame

VideoBitmap = new WriteableBitmap(frame.Width,
    frame.Height, 96, 96, PixelFormats.Bgr32, null);
if (Background is ImageBrush)
{
    ImageBrush b = (ImageBrush)Background;
    b.ImageSource = VideoBitmap;
}

@slorello https://romannurik.github.io/SlidesCodeHighlighter/
if (VideoBitmap != null)
{
    VideoBitmap.Lock();
    IntPtr[] buffer = { VideoBitmap.BackBuffer };
    int[] stride = { VideoBitmap.BackBufferStride };
    frame.ConvertInPlace(OpenTok.PixelFormat.FormatArgb32, buffer, stride);
    if (DetectingFaces)
    {
        using (var image = new Image<Bgr, byte>(frame.Width, frame.Height, stride[0], buffer[0]))
        {
            if (_watch.ElapsedMilliseconds > INTERVAL)
            {
                var reduced = image.Resize(1.0 / SCALE_FACTOR, Emgu.CV.CvEnum.Inter.Linear);
                _watch.Restart();
                _images.Add(reduced);
            }
        }
        DrawRectanglesOnBitmap(VideoBitmap, _faces);
    }
    VideoBitmap.AddDirtyRect(new Int32Rect(0, 0, FrameWidth, FrameHeight));
    VideoBitmap.Unlock();
}

System.Threading.ThreadPool.QueueUserWorkItem(delegate
{
    try
    {
        while (true)
        {
            using (var image = _images.Take(token))
            {
                _faces = _profileClassifier.DetectMultiScale(image);
            }
        }
    }
    catch (OperationCanceledException)
    {
        // exit gracefully
    }
}, null);

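public static void DrawRectanglesOnBitmap(WriteableBitmap bitmap, Rectangle[] rectangles)
{
    foreach (var rect in rectangles)
    {
        // Scale the detection back up from the reduced image to bitmap coordinates.
        var x1 = (int)((rect.X * (int)SCALE_FACTOR) * PIXEL_POINT_CONVERSION);
        var x2 = (int)(x1 + (((int)SCALE_FACTOR * rect.Width) * PIXEL_POINT_CONVERSION));
        var y1 = rect.Y * (int)SCALE_FACTOR;
        var y2 = y1 + ((int)SCALE_FACTOR * rect.Height);
        // Draw all four edges: top, bottom, left, right.
        bitmap.DrawLineAa(x1, y1, x2, y1, strokeThickness: 5, color: Colors.Blue);
        bitmap.DrawLineAa(x1, y2, x2, y2, strokeThickness: 5, color: Colors.Blue);
        bitmap.DrawLineAa(x1, y1, x1, y2, strokeThickness: 5, color: Colors.Blue);
        bitmap.DrawLineAa(x2, y1, x2, y2, strokeThickness: 5, color: Colors.Blue);
    }
}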

@slorello
Feature Detection, Tracking, Image Projection
https://www.vonage.com/communications-apis/video/

@slorello
● What’s a good feature?
● Detect features with ORB
● Feature matching with a brute-force (BF) matcher
● Project an image

@slorello
● A good feature is a part of the image where there are multiple edges
● Thus we often think of them as corners
● We can use the ORB method (Oriented FAST and Rotated BRIEF)
https://www.slideshare.net/slksaad/multiimage-matching-using-multiscale-oriented-patches

@slorello https://github.com/slorello89/FeatureDetection
var orbDetector = new ORBDetector(10000);
var features1 = new VectorOfKeyPoint();
var descriptors1 = new Mat();
orbDetector.DetectAndCompute(img, null, features1, descriptors1, false);
Features2DToolbox.DrawKeypoints(img, features1, img, new Bgr(255, 0, 0));

@slorello
● Now that we have some features we can match them to features in other images!
● We’ll use K-nearest-neighbors matching on the brute-force matcher

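// Containers for the raw matches, the accepted matches, and the matched points.
var knnMatches = new VectorOfVectorOfDMatch();
var matchList = new List<MDMatch>();
var srcPts = new List<PointF>();
var dstPts = new List<PointF>();
var bfMatcher = new BFMatcher(DistanceType.L1);
bfMatcher.Add(descriptors1); // image 1 is the "train" set
bfMatcher.KnnMatch(descriptors2, knnMatches, k: 1, mask: null, compactResult: true);
foreach (var matchSet in knnMatches.ToArrayOfArray())
{
    // Keep only matches whose descriptor distance is below the threshold.
    if (matchSet.Length > 0 && matchSet[0].Distance < 400)
    {
        matchList.Add(matchSet[0]);
        var featureModel = features1[matchSet[0].TrainIdx];
        var featureTrain = features2[matchSet[0].QueryIdx];
        srcPts.Add(featureModel.Point);
        dstPts.Add(featureTrain.Point);
    }
}
var matches = new VectorOfDMatch(matchList.ToArray());
var imgOut = new Mat();
Features2DToolbox.DrawMatches(img, features1, img2, features2, matches,
    imgOut, new MCvScalar(255, 0, 0), new MCvScalar(0, 0, 255));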
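To show what a brute-force matcher does conceptually, here is a minimal sketch in plain C#: for each query descriptor it checks every train descriptor, keeps the nearest one by L1 distance, and drops it if it is too far away. The class name BruteForceMatchSketch, the byte[][] descriptor layout, and the maxDistance parameter are assumptions for illustration and are not Emgu CV's BFMatcher API.

```csharp
using System;

// Hypothetical illustration of nearest-neighbour brute-force matching (k = 1).
public static class BruteForceMatchSketch
{
    // L1 distance: sum of absolute differences between two equal-length descriptors.
    public static int L1Distance(byte[] a, byte[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("descriptors must have the same length");
        int distance = 0;
        for (int i = 0; i < a.Length; i++)
            distance += Math.Abs(a[i] - b[i]);
        return distance;
    }

    // For each query descriptor, returns the index of the closest train descriptor,
    // or -1 when even the closest one is farther than maxDistance.
    public static int[] Match(byte[][] query, byte[][] train, int maxDistance)
    {
        var result = new int[query.Length];
        for (int q = 0; q < query.Length; q++)
        {
            int bestIndex = -1;
            int bestDistance = int.MaxValue;
            for (int t = 0; t < train.Length; t++)
            {
                int d = L1Distance(query[q], train[t]);
                if (d < bestDistance)
                {
                    bestDistance = d;
                    bestIndex = t;
                }
            }
            result[q] = bestDistance <= maxDistance ? bestIndex : -1;
        }
        return result;
    }
}
```

Every query is compared against every train descriptor, so the cost grows with the product of the two feature counts. That is the trade-off of the brute-force approach.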

@slorello
Image Projection
● Image transformations
● 8 degrees of freedom
● Need at least 4 matches
● Homographies
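A homography is a 3x3 matrix applied in homogeneous coordinates. It has 9 entries, but multiplying all of them by the same non-zero number gives the same mapping, which leaves 8 degrees of freedom. Each point match contributes two equations (one for x, one for y), hence at least 4 matches. The sketch below, with the made-up class name HomographySketch, applies a given matrix to a single point in plain C#.

```csharp
using System;

// Hypothetical illustration of projecting a point through a 3x3 homography.
public static class HomographySketch
{
    // Maps (x, y) to (x', y') using homogeneous coordinates:
    // [x', y', w]^T = H * [x, y, 1]^T, followed by division by w.
    public static (double X, double Y) Project(double[,] h, double x, double y)
    {
        if (h.GetLength(0) != 3 || h.GetLength(1) != 3)
            throw new ArgumentException("homography must be 3x3", nameof(h));

        double w = h[2, 0] * x + h[2, 1] * y + h[2, 2];
        if (w == 0)
            throw new InvalidOperationException("point maps to infinity");

        double px = (h[0, 0] * x + h[0, 1] * y + h[0, 2]) / w;
        double py = (h[1, 0] * x + h[1, 1] * y + h[1, 2]) / w;
        return (px, py);
    }
}
```

Scaling every entry of the matrix by the same factor leaves the projected point unchanged, which is the reason one of the nine numbers is redundant.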

@slorello https://inst.eecs.berkeley.edu/~cs194-26/fa17/upload/files/proj6B/cs194-26-aap/h2.png

var srcPoints = InputImageToPointCorners(cat);
var dstPoints = FaceToCorners(face);
var homography = CvInvoke.FindHomography(srcPoints, dstPoints,
Emgu.CV.CvEnum.RobustEstimationAlgorithm.Ransac, 5.0);
CvInvoke.WarpPerspective(cat, projected, homography, img.Size);
img.Mat.CopyTo(projected, 1 - projected);

@slorello
A Little More About Me
● .NET Developer & Software Engineer
● .NET Developer Advocate @Vonage
● Computer Science Graduate Student @GeorgiaTech, specializing in Computer Perception
● Blog posts: https://dev.to/slorello or https://www.nexmo.com/blog/author/stevelorello
● Twitter: @slorello

@slorello
A Little More About Vonage
● Vonage provides a full suite of communications APIs
○ https://developer.nexmo.com
○ Coupon Code: 21GCSLH €10
● Vonage Video API
○ https://www.vonage.com/communications-apis/video/

@slorello
Resources
https://github.com/slorello89/ShowImage
https://github.com/opentok-community/wpf-facial-detection
https://github.com/slorello89/BasicSobel
https://github.com/slorello89/FacialDetection
https://github.com/slorello89/WhatsAppDogDetector
http://www.emgu.com/
https://opencv.org/
https://tokbox.com/developer/tutorials/
https://developer.nexmo.com/
https://www.nexmo.com/blog/2020/03/18/real-time-face-detection-in-net-with-opentok-and-opencv-dr
LinkedIn: https://www.linkedin.com/in/stephen-lorello-143086a9/
Twitter: @slorello

