Breaking Through the Challenges of
Scalable Deep Learning for
Video Analytics
Steven Flores, sflores@compthree.com
Luke Hosking, lhosking@compthree.com
Use cases
A customer is somebody with a lot of unannotated video whose content they
want annotated and indexed into a searchable database. For example,
● Media: video library going back decades.
● Research institutions: video from a lecture series.
● Management and HR: conference and meeting notes.
What info do we want from video?
● What and who is in the video?
● What happens in the video?
● What is the video about?
(Example here: https://www.youtube.com/watch?v=X3a-ZX6ObJU)
Information from audio
● Topic modeling speech transcripts.
● Sentiment analysis of speech transcripts.
● Hot language and/or loud sounds heat map.
● Keywords (named entities) from transcripts.
Example transcript excerpt: “The Federal Reserve is widely expected to increase interest rates again Wednesday...”
Example topics: Politics and policy; Sports; Science and Technology.
Using keywords to extract info
Within transcripts, keywords such as people, locations, organizations, and
geo-political entities carry much of the latent information we seek from a video.
For example, a video transcript containing the excerpt
...probably confirm the North Korean side in its willingness…
should appear if we search for the term “North Korea.” Also, the presence of
this term, along with other keywords, may support a topic assignment.
Keyword extraction
Keyword extraction can be a difficult problem. Free extractors always come
with their own rigid taxonomy and may not be production quality:
For example, with the Python Natural Language Toolkit (NLTK)...
...probably confirm the North Korean side in its willingness…
(NLTK tags shown on the slide: Geo-socio-political group, Geo-political entity.)
Using a human-curated whitelist
We maintain a “whitelist” of extracted keywords. This solves two problems:
● Quality control supervision of proposed keywords.
● Better custom keyword taxonomies are assigned to keywords on the list.
NLTK finds “North Korean” in the text, and we find it in the whitelist with its tag:
...probably confirm the North Korean side in its willingness…
(Whitelist tag: Ethnicity.)
But we have two more problems:
● Human supervision is time-consuming (prohibitively so with a large list).
● This doesn’t solve the case of a keyword phrase incorrectly split by NLTK.
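The whitelist lookup described above can be sketched as a longest-match phrase search over the transcript tokens. This is only an illustrative sketch: the `WHITELIST` entries, the "Country" and "Organization" tags, and the `annotate` function are hypothetical, and a plain dictionary stands in for whatever data structure a production annotator would use.

```python
# Hypothetical whitelist: lowercase keyword phrase -> human-curated tag.
WHITELIST = {
    "north korean": "Ethnicity",
    "north korea": "Country",
    "federal reserve": "Organization",
}
MAX_PHRASE_LEN = max(len(k.split()) for k in WHITELIST)

def annotate(text):
    """Return (phrase, tag) pairs, preferring the longest whitelisted phrase."""
    tokens = text.lower().replace("...", " ").replace("…", " ").split()
    found, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in WHITELIST:
                found.append((phrase, WHITELIST[phrase]))
                i += n
                break
        else:
            i += 1  # no whitelisted phrase starts at this token
    return found

print(annotate("...probably confirm the North Korean side in its willingness…"))
```

Matching whole phrases against the list, rather than trusting the extractor's phrase boundaries, is one possible way to reduce the impact of incorrectly split keywords.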
Building a custom keyword extractor
The article Natural Language Processing (almost) from Scratch (R. Collobert et
al. 2011) introduces the “senna” named entity (keyword) extractor:
● A two-layer fully connected neural network.
● For each word, the input is its surrounding “context” words in the text.
● Input context words are mapped to 50-dim vectors in a word2vec model.
[Figure: the example words “cat sat on the mat” and the output tags I, O, E, B, S.]
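The three bullets above can be sketched in a few lines of NumPy. This is a minimal, untrained illustration, not the authors' model: the window size, hidden size, random embeddings, and the use of `tanh` as the nonlinearity are all assumptions; only the 50-dimensional word vectors and the tag set come from the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
TAGS = ["I", "O", "E", "B", "S"]           # tag set shown on the slide
EMBED_DIM, WINDOW, HIDDEN = 50, 5, 300     # 50-dim vectors per the slide; other sizes assumed

sentence = ["cat", "sat", "on", "the", "mat"]
vocab = {w: i for i, w in enumerate(["<pad>"] + sentence)}
embeddings = rng.normal(size=(len(vocab), EMBED_DIM))  # random stand-in for trained vectors

# Randomly initialised weights of the two fully connected layers (untrained).
W1 = rng.normal(scale=0.1, size=(WINDOW * EMBED_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(scale=0.1, size=(HIDDEN, len(TAGS)))
b2 = np.zeros(len(TAGS))

def context_windows(words, window=WINDOW):
    """Yield, for each word, the window of words centred on it (padded at the edges)."""
    half = window // 2
    padded = ["<pad>"] * half + words + ["<pad>"] * half
    for i in range(len(words)):
        yield padded[i:i + window]

def tag_scores(words):
    """Return one row of tag scores per word."""
    rows = []
    for ctx in context_windows(words):
        x = np.concatenate([embeddings[vocab[w]] for w in ctx])  # window -> one input vector
        hidden = np.tanh(x @ W1 + b1)                            # first layer
        rows.append(hidden @ W2 + b2)                            # second layer: one score per tag
    return np.array(rows)

scores = tag_scores(sentence)
print(scores.shape)  # one row per word, one column per tag
```

Because the weights are random, the scores are meaningless until the network is trained; the sketch only shows how context words become the network input.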
The senna architecture
Natural Language Processing (almost) from Scratch (R. Collobert et al. 2011)
Senna architecture advantages
● Results are often better than NLTK, thus requiring less human supervision.
● Minimal text preprocessing (for example, no chunking) is required.
● Because input is context-based, it may be possible to train a senna
network with automatically generated partially-annotated training data.
● With greater ease of generating training data, we can train keyword
extractors that are tailored to customer needs (taxonomy, jargon, etc.).
Sentiment heat maps
Sentiment heat maps indicate areas of potentially high interest in the video.
● Based on word sentiment and heated language.
● This may not be sufficient. We can also incorporate information from the
audio stream, such as loudness, to indicate areas of interest.
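One way to combine the two signals is a weighted sum per transcript segment. The sketch below is hypothetical throughout: the `HEAT_LEXICON` values, the `loudness_weight` parameter, and the assumption that loudness is already normalised to [0, 1] are illustrative choices, not the method used in the talk.

```python
# Hypothetical word-level heat scores (0 = neutral, 1 = very heated).
HEAT_LEXICON = {"outrageous": 0.9, "terrible": 0.8, "great": 0.6}

def heat_map(segments, loudness_weight=0.5):
    """Score each (words, loudness) segment; loudness is assumed normalised to [0, 1]."""
    scores = []
    for words, loudness in segments:
        # Use the most heated word in the segment as the text signal.
        word_heat = max((HEAT_LEXICON.get(w.lower(), 0.0) for w in words), default=0.0)
        scores.append((1 - loudness_weight) * word_heat + loudness_weight * loudness)
    return scores

segments = [
    (["the", "weather", "today"], 0.2),   # calm words, quiet audio
    (["this", "is", "outrageous"], 0.8),  # heated word, loud audio
]
print(heat_map(segments))
```

Segments with the highest scores would be the candidate areas of interest on the heat map.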
Challenges and future work
Keyword extraction:
● Adapting the senna model for in-house custom keyword extractors.
● Improving keyword extraction for “messy” spoken-language transcripts.
● How to quickly create training data for customer-dependent taxonomies?
Topic modeling:
● Supervised for customer-dependent topics?
● Unsupervised if the user wants to discover unknown information?
● How to do good topic modeling for “messy” spoken-language transcripts?
Information from video
● Object detection
● Face recognition
● Scene recognition
Object detection
Performing object detection on frames tells you what objects appear in a video:
We use various pre-trained models from the TensorFlow detection model zoo.
Challenges with object detection
Freely available object detection models based on ResNet and Inception
architectures are production quality. Nonetheless, there are some challenges:
● What objects do we want to detect? Is this customer dependent?
● How do we create enough training data to build custom models quickly?
Scene recognition
We train a wide-ResNet model (S. Zagoruyko et al. 2016) to recognize scenes:
We train the network using the Places365 dataset with consolidated scene
categories (for example, not distinguishing stores based on their interiors).
Face recognition
A face recognition model requires millions of faces for training and comprises
many steps: face detection, cropping and re-scaling, and classification.
Training such a model from scratch is very time-consuming. However, near
state-of-the-art models are freely available. We are using dlib face recognition.
Face embeddings
Rather than simply recognize faces from a small list of people, most face
recognition models are trained to give good face-to-vector embeddings.
The model user then provides a list of images of faces to recognize, the model
maps the faces to vectors, and query faces are identified via k-NN search.
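The k-NN identification step can be sketched with NumPy on synthetic vectors. Everything here is an assumption for illustration: the vectors are random stand-ins for real face embeddings, the names `person_a` and `person_b` are placeholders, and the 128-dimensional size and 0.6 distance threshold follow the convention described in dlib's face recognition example and would need tuning in practice.

```python
import numpy as np

def identify(query, gallery_vectors, gallery_names, k=3, max_distance=0.6):
    """Return the majority name among the k nearest gallery vectors, or None."""
    distances = np.linalg.norm(gallery_vectors - query, axis=1)
    nearest = np.argsort(distances)[:k]
    votes = {}
    for idx in nearest:
        if distances[idx] <= max_distance:  # ignore neighbours that are too far away
            votes[gallery_names[idx]] = votes.get(gallery_names[idx], 0) + 1
    return max(votes, key=votes.get) if votes else None

rng = np.random.default_rng(0)

def unit(v):
    return v / np.linalg.norm(v)

# Synthetic "identities": two random unit vectors, each with two noisy gallery images.
person_a, person_b = unit(rng.normal(size=128)), unit(rng.normal(size=128))
gallery_names = ["person_a", "person_a", "person_b", "person_b"]
gallery_vectors = np.stack([
    person_a + 0.01 * rng.normal(size=128),
    person_a + 0.01 * rng.normal(size=128),
    person_b + 0.01 * rng.normal(size=128),
    person_b + 0.01 * rng.normal(size=128),
])

query = person_a + 0.01 * rng.normal(size=128)  # a new image of person_a
print(identify(query, gallery_vectors, gallery_names))
```

The distance cutoff lets the search return `None` for faces that are not in the gallery at all, instead of forcing a match.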
Who should we recognize?
What faces should we recognize? The answer may be customer dependent:
In generic situations, we should recognize people who are “famous enough”
(well-known politicians, celebrities, artists, scientists, thought-leaders, etc.)
What constitutes famous enough? How do we make a list of their names?
Given the list of names, how do we get enough pictures of their faces?
Steven Flores
(Engineer, Comp Three)
Luke Hosking
(Engineer, Comp Three)
Famous enough?
Our criterion for “famous enough” is partly set by our need to get a list of names
of such famous people: famous = has a Wikipedia biography with a birthday.
We can easily pull this list of famous people from the Wikidata API. We record
each person’s name, birthday, occupation(s), and Wikipedia page address.
Brad Pitt is in... Rich Skrenta is out (no birthday on Wikipedia)
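As a rough sketch of how such a list might be pulled, the Wikidata SPARQL query service is one option (the talk does not say which Wikidata interface was used). The function names `famous_people_query` and `parse_bindings` are hypothetical; the property and item identifiers used (P31 instance of, Q5 human, P569 date of birth, P106 occupation) are standard Wikidata identifiers.

```python
def famous_people_query(limit=100):
    """Build a SPARQL query for humans with a birth date and an English Wikipedia page."""
    return f"""
SELECT ?person ?personLabel ?birthday ?occupationLabel ?article WHERE {{
  ?person wdt:P31 wd:Q5 ;          # instance of: human
          wdt:P569 ?birthday .     # date of birth
  OPTIONAL {{ ?person wdt:P106 ?occupation . }}   # occupation
  ?article schema:about ?person ;
           schema:isPartOf <https://en.wikipedia.org/> .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
LIMIT {limit}
"""

def parse_bindings(response_json):
    """Flatten the standard SPARQL JSON result format into plain dictionaries."""
    rows = []
    for binding in response_json["results"]["bindings"]:
        rows.append({name: cell["value"] for name, cell in binding.items()})
    return rows

# Made-up response in the SPARQL JSON results layout, used only to show the parsing.
sample = {"results": {"bindings": [
    {"personLabel": {"type": "literal", "value": "Example Person"},
     "birthday": {"type": "literal", "value": "1970-01-01T00:00:00Z"}},
]}}
print(parse_bindings(sample))
```

The query text could be sent to the public endpoint at https://query.wikidata.org/sparql with JSON output requested; a full dump of all matching people would likely need paging or a smaller `limit` per request.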
The gallery problem
Many state-of-the-art facial recognition systems are still not good at picking the
correct face from a large gallery of faces. They generate many false positives.
The rank-1 accuracy decreases as the gallery “distractor” face count increases. (The MegaFace
Benchmark: 1 Million Faces for Recognition at Scale, I. Kemelmacher-Shlizerman et al. 2015)
A potential solution...
Given some faces each with a list of candidate names, use other information
(topic modeling, co-occurrence frequency) to find optimal name assignments:
On the left, Idina Menzel is correctly tagged. On the right, Amy Grant is wrongly
tagged “Fanny Cadeo”; her correct name is only the second choice based on the image.
We can use the fact that both are musicians to correct the second tag to “Amy Grant.”
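The idea can be sketched as a small joint optimisation: choose one name per face so that the image scores plus a bonus for shared occupations are maximised. The names, scores, occupations, and the `bonus` weight below are all made-up placeholders, and brute-force search is used only because the example is tiny.

```python
from itertools import combinations, product

def best_assignment(candidates, occupations, bonus=0.2):
    """Pick one name per face, maximising image scores plus shared-occupation bonuses."""
    best, best_score = None, float("-inf")
    for names in product(*[[name for name, _ in c] for c in candidates]):
        # Sum of the per-face image scores for this combination of names.
        score = sum(dict(c)[name] for c, name in zip(candidates, names))
        # Add a bonus for every pair of distinct people sharing an occupation.
        score += bonus * sum(
            1 for a, b in combinations(names, 2)
            if a != b and occupations[a] & occupations[b]
        )
        if score > best_score:
            best, best_score = names, score
    return best

# Hypothetical candidates per face: (name, image-based score).
candidates = [
    [("name_a", 0.9)],                     # face 1: confident match
    [("name_b", 0.55), ("name_c", 0.5)],   # face 2: image alone prefers name_b
]
occupations = {"name_a": {"musician"}, "name_b": {"actor"}, "name_c": {"musician"}}

print(best_assignment(candidates, occupations))  # ('name_a', 'name_c')
```

Here the shared occupation outweighs the small gap in image scores, so the second face is reassigned to its second-choice name.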
Processing time considerations
● Estimated size of a “large” video cache: 40,000 videos
● Number of frames in a typical 30 second video: 750
● Average video frame processing time (GTX 1080 GPU): about 1 second
→ Estimated time to process the entire video cache: almost one year...
That is far too long to process this hypothetical video cache!
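The estimate follows directly from the three figures above, assuming every frame of every video is processed sequentially on a single GPU:

```python
videos = 40_000            # size of the "large" video cache
frames_per_video = 750     # a 30 second video at 25 frames per second
seconds_per_frame = 1.0    # approximate processing time per frame

total_seconds = videos * frames_per_video * seconds_per_frame
days = total_seconds / 86_400  # seconds in a day

print(f"{total_seconds:,.0f} s = {days:.0f} days")  # 30,000,000 s = 347 days
```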
Solution: only sample video keyframes (frames at shot changes or high-action
moments). These may contain most of the relevant information. For example,
● https://www.youtube.com/watch?v=_7WZ74F3j_I: 2650 frames
● Number of “irregularly spaced” keyframes processed: 10 keyframes
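A very simple way to find shot changes is to compare consecutive frames and keep those that differ strongly from their predecessor. The sketch below is an assumption for illustration, not the keyframe method used in the talk: it runs on synthetic grayscale frames, and the `threshold` value is arbitrary.

```python
import numpy as np

def keyframe_indices(frames, threshold=30.0):
    """Return indices of frames whose mean absolute change from the previous frame exceeds threshold."""
    keys = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        diff = np.abs(frames[i].astype(np.float32) - frames[i - 1].astype(np.float32)).mean()
        if diff > threshold:
            keys.append(i)
    return keys

# Synthetic 8x8 grayscale "video": three static shots with abrupt changes at frames 4 and 7.
frames = [np.full((8, 8), v, dtype=np.uint8) for v in [10] * 4 + [200] * 3 + [90] * 3]
print(keyframe_indices(frames))  # [0, 4, 7]
```

Only the selected frames would then be passed to the expensive object, scene, and face models.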
Challenges and future work
Object detection and scene recognition:
● What do we want to detect? (Customer-dependent?)
● How do we generate enough training data quickly and efficiently?
● What benchmarks do we need to hit for production quality?
Face recognition:
● Who can we / do we want to detect? (Customer-dependent?)
● How can we use other information to improve face-to-name assignments?
● What benchmarks do we need to hit for production quality?
Scalability:
● How can we reduce the wait time for image evaluation?
● What tradeoffs must we make to minimize video processing time?
● What can we trim without compromising performance benchmarks?
Augi Demo
[Architecture diagram. Labels, in reading order: Digital Ocean Instance; Docker Host; Augi Real-time Components; Port 5000; Augi Backend; Port 5001; Text Annotator; index.html; bundle.js; Port 80; Nginx; Elasticsearch; File System Video Object Store; Port 9200/videos/; Port 5002; Image Service.]
Real-time Technologies
Frontend
● React
● Apollo
● ChartJS
Backend
● Flask
● Graphene
● Elasticsearch Client
Microservices
Augi Preprocessing Pipeline
[Pipeline diagram. Labels: Python Code; Video Frame Sampling; Transcript Extractor; Audio Extractor; Elasticsearch; Text Annotation; Video Store on File System; Classify Image; DataConsolidation; ESDocumentInsert; LoopOverVideos.]
Preprocessing Technologies
● Core pipeline
○ ffmpeg
○ Google Cloud Speech
○ Amazon S3
○ Elasticsearch
● Image classification
○ TensorFlow
○ dlib
○ Flask
● Text annotation
○ pygtrie
○ Flask
Where the magic happens
Augi Preprocessing Workflow
Python scripts
● download videos and video metadata (YouTube, proprietary APIs)
● manage overall process for list of videos to be enriched
Docker
● text annotator
● image classifier
Modular architecture
● file system based cache
● orchestration with override flags
Challenges
Iterative development over tens to hundreds of thousands of videos
A file-system-based cache of the data produced by each preprocessing step,
along with granular overrides for each preprocessing method, allows for targeted
testing and implementation.
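The caching idea can be sketched as a wrapper that stores each step's output on disk and recomputes it only when it is missing or explicitly overridden. The function `cached_step`, the directory layout, the JSON format, and the `fake_transcript` step are hypothetical; the actual pipeline's cache format is not described in the talk.

```python
import json
import tempfile
from pathlib import Path

def cached_step(cache_dir, video_id, step_name, compute, override=False):
    """Return the cached result of a step, recomputing only if missing or overridden."""
    path = Path(cache_dir) / video_id / f"{step_name}.json"
    if path.exists() and not override:
        return json.loads(path.read_text())
    result = compute(video_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result

calls = []

def fake_transcript(video_id):
    """Stand-in for an expensive preprocessing step."""
    calls.append(video_id)
    return {"video": video_id, "transcript": "placeholder text"}

with tempfile.TemporaryDirectory() as cache:
    cached_step(cache, "vid001", "transcript", fake_transcript)                 # computed
    cached_step(cache, "vid001", "transcript", fake_transcript)                 # read from cache
    cached_step(cache, "vid001", "transcript", fake_transcript, override=True)  # forced recompute

print(len(calls))  # 2
```

With one override flag per step, a change to a single preprocessing method can be re-run over the whole video list without repeating the other steps.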
On-prem challenge: no internet access
We needed the architecture to be usable on-prem for clients that require data
security (confidential/healthcare sectors). The external services currently used
are Google Cloud Speech and AWS S3. Disk storage and products like Nuance
Dragon could be run on-prem instead.
Questions?

Breaking Through The Challenges of Scalable Deep Learning for Video Analytics

  • 1. Breaking Through the Challenges of Scalable Deep Learning for Video Analytics Steven Flores, sflores@compthree.com Luke Hosking, lhosking@compthree.com
  • 2. Use cases A customer is somebody with a lot of unannotated video whose content they want annotated and indexed into a searchable database. For example, ● Media: video library going back decades. ● Research institutions: video from a lecture series. ● Management and HR: conference/meetings notes.
  • 3. What info do we want from video? ● What and who is in the video? ● What happens in the video? ● What is the video about? (Example here: https://www.youtube.com/watch?v=X3a-ZX6ObJU)
  • 4. Information from audio ● Topic modeling speech transcripts. ● Sentiment analysis of speech transcripts. ● Hot language and/or loud sounds heat map. ● Keywords (named entities) from transcripts. The Federal Reserve is widely expected to increase interest rates again Wednesday... Politics and policy Sports Science and Technology
  • 5. Using keywords to extract info Within transcripts, keywords such as people, locations, organizations, and geo-political entities carry much of the latent information we seek from a video. For example, a video transcript containing the excerpt ...probably confirm the North Korean side in its willingness… should appear if we search for the term “North Korea.” Also, the presence of this term, along with other keywords, may support a topic assignment.
  • 6. Keyword extraction Keyword extraction can be a difficult problem. Free extractors always come with their own ridgid taxonomy and may not be production quality: For example, with the python natural language toolkit (NLTK)... ...probably confirm the North Korean side in its willingness… Geo-socio-political group Geo-political entity
  • 7. Using a human-curated whitelist We maintain a “whitelist” of extracted keywords. This solves two problems: ● Quality control supervision of proposed keywords. ● Better custom keyword taxonomies are assigned to keywords on the list. NLTK finds “North Korean” in the text, and we find it in the whitelist with its tag ...probably confirm the North Korean side in its willingness… Ethnicity But we have two more problems: ● Human supervision is time-consuming (prohibitively so with a large list). ● This doesn’t solve the case of a keyword phrase incorrectly split by NLTK.
  • 8. Building a custom keyword extractor The article Natural Language Processing (almost) from Scratch (R. Collobert et al. 2011) introduces the “senna” named entity (keyword) extractor: ● A two-layer fully connected neural network. ● For each word, the input is its surrounding “context” words in the text. ● Input context words are mapped to 50-dim vectors in a word2vec model. cat sat on the mat I O E B S
  • 9. The senna architecture Natural Language Processing (almost) from Scratch (R. Collobert et al. 2011)
  • 10. Senna architecture advantages ● Results are often better than NLTK, thus requiring less human supervision. ● Minimal text preprocessing (for example, no chunking) is required. ● Because input is context-based, it may be possible to train a senna network with automatically generated partially-annotated training data. ● With greater ease of generating training data, we can train keyword extractors that are tailored to customer needs (taxonomy, jargon, etc.).
  • 11. Sentiment heat maps Sentiment heat maps indicate areas of potentially high interest in the video. ● Based on word sentiment and heated language. ● This may not be sufficient. We can also incorporate information from the audio stream, such as loudness, to indicate areas of interest.
  • 12. Challenges and future work Keyword extraction: ● Adapting the senna model for in-house custom keyword extractors. ● Improving keyword extraction for “messy” spoken-language transcripts. ● How to quickly create training data for customer-dependent taxonomies? Topic modeling: ● Supervised for customer-dependent topics? ● Unsupervised if the user wants to discover unknown information? ● How to do good topic modeling for “messy” spoken-language transcripts?
  • 13. Information from video ● Object detection ● Face recognition ● Scene recognition
  • 14. Object detection Performing object detection on frames tells you what objects appear in a video. We use various pre-trained models from the TensorFlow detection model zoo.
  • 15. Challenges with object detection Freely available object detection models based on ResNet and Inception architectures are production quality. Nonetheless, there are some challenges: ● What objects do we want to detect? Is this customer dependent? ● How do we create enough training data to build custom models quickly?
  • 16. Scene recognition We train a wide-ResNet model (S. Zagoruyko et al. 2016) to recognize scenes. We train the network using the Places365 dataset with consolidated scene categories (for example, not distinguishing stores based on their interiors).
  • 17. Face recognition A face recognition model requires millions of faces for training and comprises many steps: face detection, cropping and re-scaling, and classification. Training such a model from scratch is very time-consuming. However, near-state-of-the-art models are freely available. We are using dlib face recognition.
  • 18. Face embeddings Rather than simply recognizing faces from a small list of people, most face recognition models are trained to give good face-to-vector embeddings. The model user then provides a list of images of faces to recognize, the model maps the faces to vectors, and query faces are identified via k-nearest-neighbor (k-NN) search.
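The identification step can be sketched as a small k-NN vote over a gallery of embedding vectors. This is an illustrative sketch only: the function name, the Euclidean distance, the majority vote, and the tiny 3-dimensional vectors in the usage note are assumptions, and real face embeddings would come from the recognition model.

```python
import math
from collections import Counter


def knn_identify(query, gallery, k=3):
    """Return the majority name among the k gallery vectors nearest to query.

    gallery is a list of (name, vector) pairs; distances are Euclidean.
    """
    nearest = sorted(gallery, key=lambda item: math.dist(query, item[1]))[:k]
    votes = Counter(name for name, _ in nearest)
    return votes.most_common(1)[0][0]
```

With a toy gallery such as `[("A", [0, 0, 0]), ("A", [0.1, 0, 0]), ("B", [1, 1, 1]), ("B", [0.9, 1, 1])]`, the query `[0.05, 0, 0]` is identified as `"A"`. A production system would likely also reject matches whose distance exceeds some threshold, which this sketch omits.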
  • 19. Who should we recognize? What faces should we recognize? The answer may be customer dependent. In generic situations, we should recognize people who are “famous enough” (well-known politicians, celebrities, artists, scientists, thought-leaders, etc.). What constitutes famous enough? How do we make a list of their names? Given the list of names, how do we get enough pictures of their faces? (Example faces shown: Steven Flores (Engineer, Comp Three); Luke Hosking (Engineer, Comp Three).)
  • 20. Famous enough? Our criterion for “famous enough” is partly set by our need to get a list of names of such famous people: famous = has a Wikipedia biography with birthday. We can easily pull this list of famous people from the Wikidata API. We record each person’s name, birthday, occupation(s), and Wikipedia page address. For example, Brad Pitt is in; Rich Skrenta is out (no birthday on Wikipedia).
  • 21. The gallery problem Many state-of-the-art facial recognition systems are still not good at picking the correct face from a large gallery of faces. They generate many false positives. The rank-1 accuracy decreases as the gallery “distractor” face count increases. (The MegaFace Benchmark: 1 Million Faces for Recognition at Scale, I. Kemelmacher-Shlizerman et al. 2015)
  • 22. A potential solution... Given some faces, each with a list of candidate names, use other information (topic modeling, co-occurrence frequency) to find optimal name assignments. On the left, Idina Menzel is correctly tagged. On the right, Amy Grant is wrongly tagged “Fanny Cadeo”; her correct name is only the second choice based on the image. Use the fact that both are musicians to correct the second tag to “Amy Grant.”
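One way this idea could be sketched is a brute-force search over joint name assignments, where each pair of chosen people who share an occupation earns a bonus. Everything here is an assumption for illustration: the function name, the additive scoring, the `bonus` value, and the image scores in the usage note are hypothetical, and only the “both are musicians” fact comes from the slide.

```python
from itertools import product


def best_assignment(candidates, occupations, bonus=0.2):
    """Pick one name per face, maximizing image scores plus shared-occupation bonuses.

    candidates: one list of (name, image_score) pairs per face.
    occupations: dict mapping a name to a set of occupations (may be incomplete).
    """
    def total(choice):
        score = sum(image_score for _, image_score in choice)
        names = [name for name, _ in choice]
        # Reward every pair of chosen people with an occupation in common.
        for i in range(len(names)):
            for j in range(i + 1, len(names)):
                shared = occupations.get(names[i], set()) & occupations.get(names[j], set())
                if shared:
                    score += bonus
        return score

    # Exhaustive search; fine for a handful of faces with short candidate lists.
    best = max(product(*candidates), key=total)
    return [name for name, _ in best]
```

For example, with candidates `[[("Idina Menzel", 0.9)], [("Fanny Cadeo", 0.55), ("Amy Grant", 0.5)]]` and `occupations = {"Idina Menzel": {"musician"}, "Amy Grant": {"musician"}}`, the shared occupation outweighs the small image-score gap and the second face is assigned “Amy Grant”. The exhaustive search grows exponentially with the number of faces, so a real system would need something cheaper.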
  • 23. Processing time considerations ● Estimated size of a “large” video cache: 40,000 videos ● Number of frames in a typical 30-second video: 750 ● Average video frame processing time (GTX 1080 GPU): about 1 second → Estimated time to process the entire video cache: 40,000 × 750 × 1 second = 30 million seconds, or almost one year. That is far too long! Solution: only sample video keyframes (frames at shot changes or high-action moments). These may contain most of the relevant information. For example, ● https://www.youtube.com/watch?v=_7WZ74F3j_I: 2650 frames ● Number of “irregularly spaced” keyframes processed: 10 keyframes
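The estimate above can be reproduced with a few lines of arithmetic, followed by a toy keyframe selector. Two assumptions are worth flagging: the second estimate supposes about 10 keyframes per video, a figure the slide gives only for one example video, and `select_keyframes` uses a simple mean frame-difference threshold as a stand-in, since the slides do not say how keyframes are actually chosen.

```python
SECONDS_PER_DAY = 86_400


def processing_days(num_videos, frames_per_video, seconds_per_frame=1.0):
    """Total processing time in days for a cache of videos."""
    return num_videos * frames_per_video * seconds_per_frame / SECONDS_PER_DAY


print(round(processing_days(40_000, 750), 1))  # 347.2 days: every frame
print(round(processing_days(40_000, 10), 1))   # 4.6 days: ~10 keyframes each


def select_keyframes(frames, threshold):
    """Return indices of frames that differ strongly from their predecessor.

    frames: list of equally sized flat lists of pixel intensities.
    The first frame is always kept.
    """
    keyframes = []
    previous = None
    for index, frame in enumerate(frames):
        if previous is None:
            keyframes.append(index)
        else:
            # Mean absolute pixel difference from the previous frame.
            diff = sum(abs(a - b) for a, b in zip(frame, previous)) / len(frame)
            if diff > threshold:
                keyframes.append(index)
        previous = frame
    return keyframes
```

Under these assumptions, sampling keyframes cuts the estimate from roughly a year to under a week on the same hardware.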
  • 24. Challenges and future work Object detection and scene recognition: ● What do we want to detect? (Customer-dependent?) ● How do we generate enough training data quickly and efficiently? ● What benchmarks do we need to hit for production quality? Face recognition: ● Who can we / do we want to detect? (Customer-dependent?) ● How can we use other information to improve face-to-name assignments? ● What benchmarks do we need to hit for production quality? Scalability: ● How can we reduce the wait time for image evaluation? ● What tradeoffs must we make to minimize video processing time? ● What can we trim without compromising performance benchmarks?
  • 26. (Architecture diagram) Digital Ocean instance running a Docker host with the Augi real-time components: Nginx (port 80; index.html, bundle.js), Augi Backend (port 5000), Text Annotator (port 5001), Image Service (port 5002), Elasticsearch (port 9200), and a file-system video object store (/videos/).
  • 27. Real-time Technologies Frontend: ● React ● Apollo ● ChartJS Backend: ● Flask ● Graphene ● Elasticsearch Client
  • 28. Microservices (Diagram labels: Augi Preprocessing Pipeline; Python Code; Video Frame Sampling; Transcript Extractor; Audio Extractor; Elasticsearch; Text Annotation; Video Store on File System; Classify Image; DataConsolidation; ESDocumentInsert; LoopOverVideos.)
  • 29. Preprocessing Technologies ● Core pipeline ○ ffmpeg ○ Google Cloud Speech ○ Amazon S3 ○ Elasticsearch ● Image classification ○ TensorFlow ○ dlib ○ Flask ● Text annotation ○ pygtrie ○ Flask
  • 30. Where the magic happens
  • 31. Augi Preprocessing Workflow Python scripts: ● Download videos and video metadata (YouTube, proprietary APIs) ● Manage the overall process for the list of videos to be enriched Docker: ● Text Annotator ● Image Classifier Modular architecture: ● File-system-based cache ● Orchestration with override flags
  • 32. Challenges Iterative development over tens to hundreds of thousands of videos: A file-system-based cache of the data produced by each preprocessing step, along with granular overrides for each preprocessing method, allows for targeted testing and implementation. On-prem challenge: no internet access. We needed the architecture to be usable on-prem for clients that require data security (confidential/healthcare sectors). The external services currently used are Google Cloud Speech and AWS S3; disk storage and products like Nuance Dragon could be run on-prem instead.
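The caching idea can be sketched as a small helper that stores each step's output per video and recomputes it only when asked to. This is a guess at the general shape, not the Augi code: the function name, the directory layout, the JSON format, and the `override` flag name are all assumptions.

```python
import json
from pathlib import Path


def cached_step(cache_dir, video_id, step_name, compute, override=False):
    """Return a step's cached result, recomputing only if missing or overridden.

    Results are stored as <cache_dir>/<video_id>/<step_name>.json, so they
    must be JSON-serializable.
    """
    path = Path(cache_dir) / video_id / f"{step_name}.json"
    if path.exists() and not override:
        return json.loads(path.read_text())
    result = compute()  # Run the (possibly expensive) preprocessing step.
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result
```

A call such as `cached_step("cache", "video123", "transcript", extract_fn)` would run the hypothetical `extract_fn` once and reuse its output afterwards; passing `override=True` for a single step reruns just that step, which is the kind of targeted testing the slide describes.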