Detect Known People in a Video
Yonatan Katz
My journey to
The Journey
● Deep Learning
● Face Detection
● Shot Boundaries Detection
● Face Recognition
● Object Tracking
● Computer Vision
The Problem: When does a specific person appear in a video?
D. Trump: [0:07-1:23, 1:52-2:03]
B. Obama (nickname: Obamush): [0:07-1:23]
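The output format above can be produced from per-frame recognition results. The following is a minimal sketch, assuming we already have the list of frame indices in which a person was recognized and the video's frame rate; the function names, `fps` and `max_gap` are hypothetical and not part of the original talk.

```python
def frames_to_intervals(frame_indices, fps, max_gap=1):
    """Merge frame indices into (start_sec, end_sec) intervals.

    Frames whose indices differ by at most `max_gap` are treated as one
    continuous appearance of the person.
    """
    intervals = []
    start = prev = None
    for idx in sorted(frame_indices):
        if start is None:
            start = prev = idx
        elif idx - prev <= max_gap:
            prev = idx
        else:
            intervals.append((start / fps, prev / fps))
            start = prev = idx
    if start is not None:
        intervals.append((start / fps, prev / fps))
    return intervals


def format_timestamp(seconds):
    """Render seconds as m:ss, the style used on the slide."""
    minutes, secs = divmod(int(seconds), 60)
    return "{}:{:02d}".format(minutes, secs)


def format_intervals(intervals):
    """Render a list of (start_sec, end_sec) pairs as one bracketed string."""
    parts = ["{}-{}".format(format_timestamp(s), format_timestamp(e))
             for s, e in intervals]
    return "[" + ", ".join(parts) + "]"


# Hypothetical input: a 25 fps video where the person is seen in two stretches.
seen_frames = list(range(175, 2076)) + list(range(2800, 3076))
print(format_intervals(frames_to_intervals(seen_frames, fps=25)))
```

Raising `max_gap` tolerates a few frames in which recognition failed, at the cost of possibly gluing together two separate appearances.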
Journey Outline
1. We will parse the video into frames
2. We will detect faces in the frame
3. We will try to recognize the faces
4. We will track the faces back and forth in the video
a. We will split the video into shots
Parsing the video
(or: choosing the technology)
● Why Python?
● OpenCV
● NumPy
● Code example:
import cv2

video = cv2.VideoCapture(video_path)
# Read the frame size (the cv2.cv.CV_CAP_PROP_* names exist only in OpenCV 2.x)
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
while True:
    ret, frame = video.read()
    if not ret:  # end of video or read error
        break
    cv2.imshow('video', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        break
video.release()
cv2.destroyAllWindows()
Detect Faces
(sounds complicated, it’s not)
Let’s examine the code first
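import dlib
from skimage import io  # assumed source of io.imread, as in dlib's examples

win = dlib.image_window()
image = io.imread(file_name)
face_detector = dlib.get_frontal_face_detector()
# Second argument: upsample the image once so that smaller faces are found
detected_faces = face_detector(image, 1)
win.set_image(image)
for i, face_rect in enumerate(detected_faces):
    win.add_overlay(face_rect)
dlib.hit_enter_to_continue()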
The MAGIC is here. You don’t need to invent anything.
But how does it really work??
Taken from this great Medium post
1. Convert to grayscale image
2. Look at every pixel, and the pixels surrounding it
3. Find the direction where pixels become darker
4. Convert the image to “darker vectors”
ONLY THE “DARKNESS RATIO” MATTERS - works on both dark and bright images!
5. Reduce the size of the vector
6. Compare patterns!
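Steps 2 to 4 resemble the gradient stage of a histogram of oriented gradients (HOG), which is the technique behind dlib's frontal face detector. The sketch below is a dependency-free illustration, not dlib's implementation: the function name and the tiny images are made up. It computes, for each interior pixel, the direction in which the image gets darker, and shows that the direction does not change when the whole image is darkened.

```python
import math


def darker_directions(gray):
    """Map each interior pixel (x, y) to the angle, in degrees, that points
    toward darker neighbours. `gray` is a 2D list of brightness values."""
    height, width = len(gray), len(gray[0])
    angles = {}
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Positive dx: the right neighbour is darker than the left one.
            dx = gray[y][x - 1] - gray[y][x + 1]
            # Positive dy: the neighbour below is darker than the one above.
            dy = gray[y - 1][x] - gray[y + 1][x]
            if dx == 0 and dy == 0:
                continue  # flat area, no direction
            angles[(x, y)] = math.degrees(math.atan2(dy, dx))
    return angles


# Bright on the left, dark on the right.
bright_image = [
    [200, 100, 0],
    [200, 100, 0],
    [200, 100, 0],
]
# The same scene at half the brightness.
dark_image = [[value // 2 for value in row] for row in bright_image]

print(darker_directions(bright_image))
print(darker_directions(dark_image))
```

Both calls report the same direction for the centre pixel, which is the point of the “darkness ratio” remark: the pattern of directions survives a change in overall brightness.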
Recognize Faces
(Deep learning. Not only a buzzword)
Intro to machine learning
1. Train:
a. Find the data that may affect the end result (“features”)
b. Train a model that takes as an input:
i. List of features
ii. The end result (“label”)
c. Get the weights for each feature
2. Test:
a. Apply the weights to your data
b. Compute the most relevant result
I’m a man, 32 years old. I watched 32 drama movies, 3 comedy movies (on average, I saw only 75% of these boring movies) and no action movies. What will YouTube recommend to me?
1. Borat
2. Hit
3. Titanic
Do you want to be a data scientist?
13 x Feature1 + 5 x Feature2 + … = score
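The weighted-sum formula can be sketched in a few lines. Everything below is hypothetical: the feature names and the weights are invented for illustration, whereas in practice the weights come out of the training step.

```python
def score(features, weights):
    """Linear model: sum of weight * value over the named features."""
    return sum(weights[name] * value for name, value in features.items())


# Features of the viewer described on the slide.
viewer = {"drama_watched": 32, "comedy_watched": 3, "action_watched": 0}

# Hypothetical weights per candidate genre (normally learned from data).
genre_weights = {
    "drama": {"drama_watched": 1.0, "comedy_watched": 0.1, "action_watched": 0.1},
    "comedy": {"drama_watched": 0.1, "comedy_watched": 1.0, "action_watched": 0.1},
    "action": {"drama_watched": 0.1, "comedy_watched": 0.1, "action_watched": 1.0},
}

scores = {genre: score(viewer, w) for genre, w in genre_weights.items()}
recommended = max(scores, key=scores.get)
print(scores, recommended)
```

With these made-up weights the drama genre wins simply because the viewer watched far more dramas than anything else.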
Intro to deep learning
● How does a child learn to ride a bicycle?
● A neural network tries to imitate the human learning process
● Invented by a psychologist - עושים היסטוריה (“Making History”)
Deep Learning in computer vision
● Classic problem: what is this number?
● Do these images represent the same number?
Back to our journey
● The problem: recognize people
Donald Trump of course! Like, duh!
I have no idea. But he looks pretty similar to this weirdo guy:
A moment before we jump into code...
● In order to compare faces, we need to center the face (“apples to apples”)
● In order to do so, we need to find landmarks
Alignment code example can be found here
From their website:
OpenFace is a Python and Torch implementation of face recognition with deep neural networks and is based on the CVPR 2015 paper
FaceNet: A Unified Embedding for Face Recognition and Clustering by Florian Schroff, Dmitry Kalenichenko, and James Philbin at Google.
Torch allows the network to be executed on a CPU or with CUDA.
Nightmare to install
:(
Finally - CODE !
import cv2
import numpy as np
import openface

# args is assumed to come from argparse, as in OpenFace's demo scripts
align = openface.AlignDlib(args.dlibFacePredictor)
net = openface.TorchNeuralNet(args.networkModel, args.imgDim)

def getRep(imgPath):
    # OpenCV loads images as BGR; convert to RGB before alignment
    bgrImg = cv2.imread(imgPath)
    rgbImg = cv2.cvtColor(bgrImg, cv2.COLOR_BGR2RGB)
    bb = align.getLargestFaceBoundingBox(rgbImg)
    alignedFace = align.align(args.imgDim, rgbImg, bb,
                              landmarkIndices=openface.AlignDlib.OUTER_EYES_AND_NOSE)
    rep = net.forward(alignedFace)
    return rep

# Squared L2 distance between the two face representations
d = getRep(img1) - getRep(img2)
print("distance between representations: {:0.3f}".format(np.dot(d, d)))
Full code can be found here
Summary
● Assuming we know who is going to be in the video, we download images of these people
● We run over the video frame by frame:
○ For each frame, search for faces
■ For each face:
● Align the face image with some image manipulation
● Get its representation from the neural network (OpenFace)
● Compare the representation with the representations of the pre-downloaded images
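The last comparison step can be sketched as a nearest-neighbour lookup over the known representations, using the same squared distance as the code above. This is an assumption about how the matching could be done, not the talk's actual code: `identify`, the toy 3-number vectors and the `threshold` value are all hypothetical, and a real threshold has to be tuned on your own data.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def identify(face_rep, known_reps, threshold=1.0):
    """Return the name of the closest known person, or None when even the
    closest representation is not nearer than `threshold`."""
    best_name, best_dist = None, float("inf")
    for name, reps in known_reps.items():
        for rep in reps:
            dist = squared_distance(face_rep, rep)
            if dist < best_dist:
                best_name, best_dist = name, dist
    return best_name if best_dist < threshold else None


# Toy representations; real ones would come from the neural network.
known = {
    "person_a": [[1.0, 0.0, 0.0]],
    "person_b": [[0.0, 1.0, 0.0]],
}
print(identify([0.9, 0.1, 0.0], known))   # close to person_a's vector
print(identify([-1.0, 0.0, 0.0], known))  # far from everyone
```

Keeping several representations per person (one per downloaded image) lets a single good match decide, which helps when some reference photos are poor.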
Object Tracking
(or: why recognition over video is different from looping an image recognition algorithm over the frames)
Problem Definition
● We are good at finding frontal faces, but not profile faces
○ There are some models that support profile pictures as well
● It is problematic to compare profile pictures
○ We need to train a model (is there a data scientist in the room?)
○ We would need a great many profile pictures…
● What if our dear president-elect decides to turn around?
Object Tracking
● Dlib has an API for tracking objects
● We need to run forward and backward once we find a face
● Problem: if there is a camera cut in the middle, the tracker doesn’t know about it.
import cv2
import dlib

video = cv2.VideoCapture(video_path)
# Read the frame size (the cv2.cv.CV_CAP_PROP_* names exist only in OpenCV 2.x)
width = int(video.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(video.get(cv2.CAP_PROP_FRAME_HEIGHT))
tracker = dlib.correlation_tracker()
ret, frame = video.read()
# face_rectangle: a dlib.rectangle for the face, e.g. from the face detector
tracker.start_track(frame, face_rectangle)
while True:
    ret, frame = video.read()
    if not ret:  # end of video or read error
        break
    tracker.update(frame)
    pos = tracker.get_position()
    bl = (int(pos.left()), int(pos.bottom()))
    tr = (int(pos.right()), int(pos.top()))
    cv2.rectangle(frame, bl, tr, color=(153, 255, 204), thickness=3)
    cv2.imshow('video', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press 'q' to quit
        break
video.release()
cv2.destroyAllWindows()
From dlib website:
Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems. It is used in both industry and academia in a wide range of domains including robotics, embedded devices, mobile phones, and large high performance computing environments.
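Returning to the forward-and-backward tracking idea and the camera-cut problem: one possible approach, assuming the shot start frames are already known from a separate detection step, is to limit the tracker to the shot that contains the recognized face. The helper names below are hypothetical, and the sketch only plans which frames to visit; the tracking itself would still use the dlib tracker.

```python
import bisect


def shot_range(hit_frame, shot_starts, total_frames):
    """Return (first, last) frame index of the shot containing hit_frame.

    shot_starts: sorted frame indices where a new shot begins, starting with 0.
    """
    i = bisect.bisect_right(shot_starts, hit_frame) - 1
    first = shot_starts[i]
    if i + 1 < len(shot_starts):
        last = shot_starts[i + 1] - 1
    else:
        last = total_frames - 1
    return first, last


def tracking_plan(hit_frame, shot_starts, total_frames):
    """Frames to visit backward and forward from the frame where the face
    was recognized, without crossing a shot boundary."""
    first, last = shot_range(hit_frame, shot_starts, total_frames)
    backward = list(range(hit_frame - 1, first - 1, -1))
    forward = list(range(hit_frame + 1, last + 1))
    return backward, forward


# Hypothetical video: 400 frames, with cuts at frames 100 and 250.
backward, forward = tracking_plan(102, [0, 100, 250], 400)
print(backward, forward[:3], forward[-1])
```

Each direction needs its own tracker, started from the same face rectangle on the recognized frame.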
Shot Boundaries Detection
(last known stop in our journey)
Movie Shots
● We need shot boundaries in order to cut the object trackers
● Shot transition types:
○ Camera cut
○ Dissolve
○ Wipe
○ Fade-in / Fade-out
● Tools that do shot detection:
○ FFmpeg
○ Scene Segmentation
● Not good enough...
Comparison Metrics
● Color histogram
● Edge Change Ratio (ECR) - compare the in-pixels and out-pixels between frame N-1 and frame N
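The histogram metric can be sketched without any dependencies. For brevity this illustration uses grayscale values instead of color channels, and the function names, bin count and threshold are assumptions; with OpenCV the same idea would normally use `cv2.calcHist` and `cv2.compareHist` on each channel.

```python
def gray_histogram(frame, bins=8):
    """Normalised histogram of the 0-255 values in a 2D list."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for value in row:
            counts[value * bins // 256] += 1
            total += 1
    return [count / total for count in counts]


def histogram_distance(h1, h2):
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for
    histograms with no overlap."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def find_cuts(frames, threshold=0.5):
    """Indices of frames whose histogram differs from the previous frame's
    by more than `threshold`."""
    hists = [gray_histogram(frame) for frame in frames]
    return [i for i in range(1, len(hists))
            if histogram_distance(hists[i - 1], hists[i]) > threshold]


# Toy 2x2 "frames": two dark frames followed by two bright ones.
dark_1 = [[10, 20], [30, 15]]
dark_2 = [[12, 22], [28, 18]]
bright_1 = [[240, 250], [230, 245]]
bright_2 = [[238, 251], [232, 244]]
print(find_cuts([dark_1, dark_2, bright_1, bright_2]))
```

A histogram ignores where pixels are, so it tolerates motion inside a shot but can miss a cut between two scenes with similar overall colors.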
Considerations (ok ok, and some code…)
● Thresholds for shot change
● Compare every two consecutive frames, or distant frames
● Do we prefer more shots (maybe wrong ones), or fewer shots (and miss some)
● Check the complete frame, or the tracked object square
● Crop the image before comparison (to avoid noise from subtitles, logos, etc.)
● What will happen if a cat is sitting on a table, and then jumps?
● ECR doesn’t have much effect. But it’s cool!
● ECR code here
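Since the linked ECR code is not reproduced here, the following is only a sketch of the usual edge change ratio definition: the larger of the share of edge pixels that entered and the share that exited between two frames, where an edge pixel within `radius` of an edge in the other frame counts as unchanged. It assumes the edge pixels have already been extracted (for example with `cv2.Canny`) and are given as sets of (x, y) coordinates; the function names are hypothetical.

```python
def dilate(edges, radius):
    """Grow a set of (x, y) edge pixels by `radius` in every direction."""
    return {(x + dx, y + dy)
            for (x, y) in edges
            for dx in range(-radius, radius + 1)
            for dy in range(-radius, radius + 1)}


def edge_change_ratio(prev_edges, curr_edges, radius=1):
    """ECR between frame N-1 (prev_edges) and frame N (curr_edges)."""
    if not prev_edges and not curr_edges:
        return 0.0  # no edges in either frame: nothing changed
    if not prev_edges or not curr_edges:
        return 1.0  # edges appeared from nothing or vanished entirely
    entering = curr_edges - dilate(prev_edges, radius)  # in-pixels
    exiting = prev_edges - dilate(curr_edges, radius)   # out-pixels
    return max(len(entering) / len(curr_edges),
               len(exiting) / len(prev_edges))


line = {(0, 0), (1, 0), (2, 0)}
shifted_line = {(0, 1), (1, 1), (2, 1)}   # same line moved by one pixel
other_scene = {(10, 10), (11, 10)}        # unrelated edges
print(edge_change_ratio(line, shifted_line))
print(edge_change_ratio(line, other_scene))
```

The dilation radius is what makes the metric tolerant of small motion: the shifted line scores as unchanged, while unrelated edges score as a full change.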
So Where Do We Stand?
● Problems with the model (= neural network)
○ Grayscale images
○ People of color
● We need validation by a 3rd party
○ But not on all frames
● We want to build an image database
● Hardware requirements are very high
○ Maybe we will process only ‘important videos’
Q&A
