SlideShare uma empresa Scribd logo
1 de 62
Deep Representation: Building
a Semantic Image Search
Engine
Emmanuel Ameisen
InfoQ.com: News & Community Site
• Over 1,000,000 software developers, architects and CTOs read the site world-
wide every month
• 250,000 senior developers subscribe to our weekly newsletter
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• 2 dedicated podcast channels: The InfoQ Podcast, with a focus on
Architecture and The Engineering Culture Podcast, with a focus on building
• 96 deep dives on innovative topics packed as downloadable emags and
minibooks
• Over 40 new content items per week
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
semantic-search-engine
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Presented at QCon San Francisco
www.qconsf.com
PINTEREST SEARCH
IMAGE SEARCH ENGINE
IMAGE TAGGING
thenextweb.com
BACKGROUND
▰ Why am I speaking about this?
ABOUT INSIGHT
DATA SCIENCE
DATA ENGINEERING
HEALTH DATA
ARTIFICIAL INTELLIGENCE
7-Week Fellowship in
+ REMOTE
SILICON VALLEY & SAN
FRANCISCO
NEW YORK
BOSTON
SEATTLE
PRODUCT MANAGEMENT
TORONTO
DEVOPS
www.insightdata.ai
INSIGHT DATA – FELLOW PROJECTS
FASHION CLASSIFIER AUTOMATIC REVIEW GENERATION READING TEXT IN VIDEOS
HEART SEGMENTATION SUPPORT REQUEST CLASSIFICATION SPEECH UNSAMPLING
1,600+
INSIGHT ALUMNI
INSIGHT FELLOWS ARE DATA SCIENTISTS AND DATA ENGINEERS
EVERYWHERE
400+
COMPANIES
ON THE MENU
▰ A quick overview of Computer Vision (CV) tasks and challenges
▰ Natural Language Processing (NLP) tasks and challenges
▰ Challenges in combining both
▰ Representations learning in CV
▰ Representation learning in NLP
▰ Combining both
ON THE MENU
▰ A quick overview of Computer Vision (CV) tasks and challenges
▰ Natural Language Processing (NLP) tasks and challenges
▰ Challenges in combining both
▰ Representations learning in CV
▰ Representation learning in NLP
▰ Combining both
▰ Massive models
▻ Dataset of 1M+images
▻ For multiple days
▰ Automates feature engineering
▰ Use cases
▻ Fashion
▻ Security
▻ Medicine
▻ …
CONVOLUTIONAL NEURAL NETWORKS
(CNN)
▰ Incorporates local and global information
▰ Use cases
▻ Medical
▻ Security
▻ Autonomous Vehicles
EXTRACTING INFORMATION
@arthur_ouaknine
▰ Pose Estimation
▰ Scene Parsing
▰ 3D Point cloud estimation
ADVANCED APPLICATIONS
Insight Fellow Project with Piccolo
Felipe Mejia
▰ A quick overview of Computer Vision (CV) tasks and challenges
▰ Natural Language Processing (NLP) tasks and challenges
▰ Challenges in combining both
▰ Representations learning in CV
▰ Representation learning in NLP
▰ Combining both
ON THE MENU
▰ Traditional NLP tasks
▻ Classification (sentiment analysis, spam detection, code classification)
▰ Extracting Information
▻ Named Entity Recognition, Information extraction
▰ Advanced applications
▻ Translation, sequence to sequence learning
NLP
▰ Sequence to sequence models are still often too
rough to be deployed, even with sizable datasets
▻ Recognized Tosh as a swear word
▰ They can be used efficiently for data augmentation
▻ Paired with other latent approaches
SENTENCE PARAPHRASING
Victor Suthichai
▰ A quick overview of Computer Vision (CV) tasks and challenges
▰ Natural Language Processing (NLP) tasks and challenges
▰ Challenges in combining both
▰ Representations learning in CV
▰ Representation learning in NLP
▰ Combining both
ON THE MENU
▰ Prime language model with features
extracted from CNN
▰ Feed to an NLP language model
▰ End-to-end
▻ Elegant
▻ Hard to debug and validate
▻ Hard to productionize
IMAGE CAPTIONING
A horse is standing in a field with a fence in the background.
CODE GENERATION
Ashwin Kumar
§ Harder problem for humans
- Anyone can describe an image
- Coding takes specific training
§ We can solve it using a similar model
§ The trick is in getting the data!
▰ These methods mix and match different architectures
▰ The combined representation is often learned implicitly
▻ Hard to cache and optimize to re-use across services
▻ Hard to validate and do QA on
▰ The models are entangled
▻ What if we want to learn a simple joint representation?
BUT DOES IT SCALE?
Image Search
Goals
§ Searching for similar images to an input image
- Computer Vision: (Image → Image)
§ Searching for images using text & generating tags for images
- Computer Vision + Natural Language Processing: (Image ↔ Text)
§ Bonus: finding similar words to an input word
- Natural Language Processing: (Text → Text)
▰ A quick overview of Computer Vision (CV) tasks and challenges
▰ Natural Language Processing (NLP) tasks and challenges
▰ Challenges in combining both
▰ Representations learning in CV
▰ Representation learning in NLP
▰ Combining both
ON THE MENU
Image Based Search
Let’s build this!
Dataset
Credit to Cyrus Rashtchian, PeterYoung, Micah Hodosh, and Julia Hockenmaier for the dataset.
§ 1000 images
- 20 classes, 50 images per class
§ 3 orders of magnitude smaller than usual deep
learning datasets
§ Noisy
WHICH CLASS?
DATA PROBLEMS
Bottle L
A FEW APPROACHES
§ Ways to think about searching for similar images
IF WE HAD INFINITE DATA
§ Train on all images
§ Pros:
- One Forward Pass (fast inference)
§ Cons:
- Hard too optimize
- Poor scaling
- Frequent Retraining
SIMILARITY MODEL
§ Train on each image pair
§ Pros:
- Scales to large datasets
§ Cons:
- Slow
- Does not work for text
- Needs good examples
EMBEDDING MODEL
§ Find embedding for each image
§ Calculate ahead of time
§ Pros:
- Scalable
- Fast
§ Cons:
- Simple representations
Mikolov et Al. 2013
WORD EMBEDDINGS
LEVERAGING A PRE-TRAINED MODEL
HOW AN EMBEDDING LOOKS
PROXIMITY SEARCH IS FAST
How do you find the 5 most similar images to a given one when you
have over a million users?
▰ Fast index search
▰ Spotify uses annoy (we will as well)
▰ Flickr uses LOPQ
▰ Nmslib is also very fast
▰ Some rely on making the queries approximate in order to make them
fast
PRETTY IMPRESSIVE!
IN OUT
FOCUSING OUR SEARCH
§ Sometimes we are only interested in part of the image.
§ For example, given an image of a cat and a bottle, we might be only interested in similar cats, not similar bottles.
§ How do we incorporate this information
IMPROVING RESULTS: STILL NO TRAINING
§ Computationally expensive approach:
- Object detection model first
- (We don’t do this)
- Image search on a cropped image
- (We don’t do this)
§ Semi-Supervised approach:
- Hacky, but efficient!
- re-weighing the activations
- Only use the class of interest to re-
weigh embeddings
EVEN BETTER
IN OUT
▰ A quick overview of Computer Vision (CV) tasks and challenges
▰ Natural Language Processing (NLP) tasks and challenges
▰ Challenges in combining both
▰ Representations learning in CV
▰ Representation learning in NLP
▰ Combining both
ON THE MENU
GENERALIZING
§ We have added some ability to guide the search, but it is limited to classes our model was initially trained on
§ We would like to be able to use any word
§ How do we combine words and images?
Mikolov et Al. 2013
WORD EMBEDDINGS
SEMANTIC TEXT!
§ Load a set of pre-trained vectors (GloVe)
- Wikipedia data
- Semantic relationships
§ One big issue:
- The embeddings for images are of size 4096
- While those for words are of size 300 
- And both models trained in a different fashion
§ What we need: Joint model!
▰ A quick overview of Computer Vision (CV) tasks and challenges
▰ Natural Language Processing (NLP) tasks and challenges
▰ Challenges in combining both
▰ Representations learning in CV
▰ Representation learning in NLP
▰ Combining both
ON THE MENU
Inspiration
TIME TO TRAIN
Image à Image Image à Text
IMAGE à TEXT
§ Re-train model to predict the word vector
- i.e. 300-length vector associated with cat
§ Training
- Takes more time per example than image à class
- But much faster than on Imagenet (7 hours, no GPU)
§ Important to note
- Training data can be very small: ~1000 images
- Miniscule compared to Imagenet (1+ Million images)
§ Once model is trained
- Build a new fast index of images
- Save to disk
How do you
think this
model will
perform?
IMAGE à TEXT
GENERALIZED IMAGE SEARCH WITH MINIMAL
DATA
IN:“DOG” OUT
SEARCH FOR WORD NOT IN DATASET
IN:“OCEAN” OUT
SEARCH FOR WORD NOT IN DATASET
IN:“STREET” OUT
MULTIPLE WORDS!
MULTIPLE WORDS!
IN:“CAT SOFA” OUT
Learn More:
Find the repo on Github!
Next steps
§ Incorporating user feedback
- Most real world image search systems use user clicks as a signal
§ Capturing domain specific aspects
- Often times, users have different meanings for similarity
§ Keep the conversation going
- Reach me on Twitter @EmmanuelAmeisen
www.insightdata.ai/apply
EMMANUEL AMEISEN
Head of AI, ML Engineer
@emmanuelameisen
emmanuel@insightdata.ai
bit.ly/imagefromscratch
CV Approaches
White-box Algorithms Black-Box Algorithms
@Andrey Nikishaev
▰ NLP Classification is generally more shallow
▻ Logistic Regression/Naïve Bayes
▻ Two layer CNN
▰ This is starting to change
▻ The triumph of pre-training and transfer learning
CLASSIFICATION
Watch the video with slide
synchronization on InfoQ.com!
https://www.infoq.com/presentations/
semantic-search-engine

Mais conteúdo relacionado

Semelhante a Building a Joint Image-Text Semantic Search Engine in Under 10 Lines of Code

Declarative programming v5
Declarative programming v5Declarative programming v5
Declarative programming v5jcarpioc
 
OOPs concepts.ppt
OOPs concepts.pptOOPs concepts.ppt
OOPs concepts.pptkrutika95
 
OOPs concepts.ppt
OOPs concepts.pptOOPs concepts.ppt
OOPs concepts.pptAkash553872
 
Nailing Distributed Development With Effective Collaboration - Matt Ryall
Nailing Distributed Development With Effective Collaboration - Matt RyallNailing Distributed Development With Effective Collaboration - Matt Ryall
Nailing Distributed Development With Effective Collaboration - Matt RyallAtlassian
 
Machine Learning for Designers
Machine Learning for DesignersMachine Learning for Designers
Machine Learning for DesignersMemi Beltrame
 
Agile Mumbai 2022 - Vikesh Morye | Transfer Learning for Business Agility
Agile Mumbai 2022 - Vikesh Morye | Transfer Learning for Business AgilityAgile Mumbai 2022 - Vikesh Morye | Transfer Learning for Business Agility
Agile Mumbai 2022 - Vikesh Morye | Transfer Learning for Business AgilityAgileNetwork
 
The Frontier of Deep Learning in 2020 and Beyond
The Frontier of Deep Learning in 2020 and BeyondThe Frontier of Deep Learning in 2020 and Beyond
The Frontier of Deep Learning in 2020 and BeyondNUS-ISS
 
Empirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an OverviewEmpirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an Overviewalessio_ferrari
 
Spreadmart To Data Mart BISIG Presentation
Spreadmart To Data Mart BISIG PresentationSpreadmart To Data Mart BISIG Presentation
Spreadmart To Data Mart BISIG PresentationDan English
 
Introducción a NLP (Natural Language Processing) en Azure
Introducción a NLP (Natural Language Processing) en AzureIntroducción a NLP (Natural Language Processing) en Azure
Introducción a NLP (Natural Language Processing) en AzurePlain Concepts
 
OWF14 - Big Data : The State of Machine Learning in 2014
OWF14 - Big Data : The State of Machine  Learning in 2014OWF14 - Big Data : The State of Machine  Learning in 2014
OWF14 - Big Data : The State of Machine Learning in 2014Paris Open Source Summit
 
How to Internationalize Products by fmr Condé Nast Int. PM
How to Internationalize Products by fmr Condé Nast Int. PMHow to Internationalize Products by fmr Condé Nast Int. PM
How to Internationalize Products by fmr Condé Nast Int. PMProduct School
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchRachel Berryman
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPMachine Learning Prague
 
Week3 final
Week3 finalWeek3 final
Week3 finaleducw200
 

Semelhante a Building a Joint Image-Text Semantic Search Engine in Under 10 Lines of Code (20)

Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?
 
Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?Video + Language: Where Does Domain Knowledge Fit in?
Video + Language: Where Does Domain Knowledge Fit in?
 
Declarative programming v5
Declarative programming v5Declarative programming v5
Declarative programming v5
 
OOPs concepts.ppt
OOPs concepts.pptOOPs concepts.ppt
OOPs concepts.ppt
 
OOPs concepts.ppt
OOPs concepts.pptOOPs concepts.ppt
OOPs concepts.ppt
 
OOPs concepts.ppt
OOPs concepts.pptOOPs concepts.ppt
OOPs concepts.ppt
 
OOPs concepts.ppt
OOPs concepts.pptOOPs concepts.ppt
OOPs concepts.ppt
 
OOPs concepts.ppt
OOPs concepts.pptOOPs concepts.ppt
OOPs concepts.ppt
 
Nailing Distributed Development With Effective Collaboration - Matt Ryall
Nailing Distributed Development With Effective Collaboration - Matt RyallNailing Distributed Development With Effective Collaboration - Matt Ryall
Nailing Distributed Development With Effective Collaboration - Matt Ryall
 
Machine Learning for Designers
Machine Learning for DesignersMachine Learning for Designers
Machine Learning for Designers
 
Agile Mumbai 2022 - Vikesh Morye | Transfer Learning for Business Agility
Agile Mumbai 2022 - Vikesh Morye | Transfer Learning for Business AgilityAgile Mumbai 2022 - Vikesh Morye | Transfer Learning for Business Agility
Agile Mumbai 2022 - Vikesh Morye | Transfer Learning for Business Agility
 
The Frontier of Deep Learning in 2020 and Beyond
The Frontier of Deep Learning in 2020 and BeyondThe Frontier of Deep Learning in 2020 and Beyond
The Frontier of Deep Learning in 2020 and Beyond
 
Empirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an OverviewEmpirical Methods in Software Engineering - an Overview
Empirical Methods in Software Engineering - an Overview
 
Spreadmart To Data Mart BISIG Presentation
Spreadmart To Data Mart BISIG PresentationSpreadmart To Data Mart BISIG Presentation
Spreadmart To Data Mart BISIG Presentation
 
Introducción a NLP (Natural Language Processing) en Azure
Introducción a NLP (Natural Language Processing) en AzureIntroducción a NLP (Natural Language Processing) en Azure
Introducción a NLP (Natural Language Processing) en Azure
 
OWF14 - Big Data : The State of Machine Learning in 2014
OWF14 - Big Data : The State of Machine  Learning in 2014OWF14 - Big Data : The State of Machine  Learning in 2014
OWF14 - Big Data : The State of Machine Learning in 2014
 
How to Internationalize Products by fmr Condé Nast Int. PM
How to Internationalize Products by fmr Condé Nast Int. PMHow to Internationalize Products by fmr Condé Nast Int. PM
How to Internationalize Products by fmr Condé Nast Int. PM
 
From SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the SwitchFrom SQL to Python - A Beginner's Guide to Making the Switch
From SQL to Python - A Beginner's Guide to Making the Switch
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
Week3 final
Week3 finalWeek3 final
Week3 final
 

Mais de C4Media

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoC4Media
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileC4Media
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020C4Media
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsC4Media
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No KeeperC4Media
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like OwnersC4Media
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaC4Media
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideC4Media
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDC4Media
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine LearningC4Media
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at SpeedC4Media
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsC4Media
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsC4Media
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerC4Media
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleC4Media
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeC4Media
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereC4Media
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing ForC4Media
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data EngineeringC4Media
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreC4Media
 

Mais de C4Media (20)

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live Video
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy Mobile
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 

Último

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 

Último (20)

Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 

Building a Joint Image-Text Semantic Search Engine in Under 10 Lines of Code

  • 1. Deep Representation: Building a Semantic Image Search Engine Emmanuel Ameisen
  • 2. InfoQ.com: News & Community Site • Over 1,000,000 software developers, architects and CTOs read the site world- wide every month • 250,000 senior developers subscribe to our weekly newsletter • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • 2 dedicated podcast channels: The InfoQ Podcast, with a focus on Architecture and The Engineering Culture Podcast, with a focus on building • 96 deep dives on innovative topics packed as downloadable emags and minibooks • Over 40 new content items per week Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ semantic-search-engine
  • 3. Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide Presented at QCon San Francisco www.qconsf.com
  • 7. BACKGROUND ▰ Why am I speaking about this?
  • 8. ABOUT INSIGHT DATA SCIENCE DATA ENGINEERING HEALTH DATA ARTIFICIAL INTELLIGENCE 7-Week Fellowship in + REMOTE SILICON VALLEY & SAN FRANCISCO NEW YORK BOSTON SEATTLE PRODUCT MANAGEMENT TORONTO DEVOPS www.insightdata.ai
  • 9. INSIGHT DATA – FELLOW PROJECTS FASHION CLASSIFIER AUTOMATIC REVIEW GENERATION READING TEXT IN VIDEOS HEART SEGMENTATION SUPPORT REQUEST CLASSIFICATION SPEECH UNSAMPLING
  • 11. INSIGHT FELLOWS ARE DATA SCIENTISTS AND DATA ENGINEERS EVERYWHERE 400+ COMPANIES
  • 12. ON THE MENU ▰ A quick overview of Computer Vision (CV) tasks and challenges ▰ Natural Language Processing (NLP) tasks and challenges ▰ Challenges in combining both ▰ Representations learning in CV ▰ Representation learning in NLP ▰ Combining both
  • 13. ON THE MENU ▰ A quick overview of Computer Vision (CV) tasks and challenges ▰ Natural Language Processing (NLP) tasks and challenges ▰ Challenges in combining both ▰ Representations learning in CV ▰ Representation learning in NLP ▰ Combining both
  • 14. ▰ Massive models ▻ Dataset of 1M+images ▻ For multiple days ▰ Automates feature engineering ▰ Use cases ▻ Fashion ▻ Security ▻ Medicine ▻ … CONVOLUTIONAL NEURAL NETWORKS (CNN)
  • 15. ▰ Incorporates local and global information ▰ Use cases ▻ Medical ▻ Security ▻ Autonomous Vehicles EXTRACTING INFORMATION @arthur_ouaknine
  • 16. ▰ Pose Estimation ▰ Scene Parsing ▰ 3D Point cloud estimation ADVANCED APPLICATIONS Insight Fellow Project with Piccolo Felipe Mejia
  • 17. ▰ A quick overview of Computer Vision (CV) tasks and challenges ▰ Natural Language Processing (NLP) tasks and challenges ▰ Challenges in combining both ▰ Representations learning in CV ▰ Representation learning in NLP ▰ Combining both ON THE MENU
  • 18. ▰ Traditional NLP tasks ▻ Classification (sentiment analysis, spam detection, code classification) ▰ Extracting Information ▻ Named Entity Recognition, Information extraction ▰ Advanced applications ▻ Translation, sequence to sequence learning NLP
  • 19. ▰ Sequence to sequence models are still often too rough to be deployed, even with sizable datasets ▻ Recognized Tosh as a swear word ▰ They can be used efficiently for data augmentation ▻ Paired with other latent approaches SENTENCE PARAPHRASING Victor Suthichai
  • 20. ▰ A quick overview of Computer Vision (CV) tasks and challenges ▰ Natural Language Processing (NLP) tasks and challenges ▰ Challenges in combining both ▰ Representations learning in CV ▰ Representation learning in NLP ▰ Combining both ON THE MENU
  • 21. ▰ Prime language model with features extracted from CNN ▰ Feed to an NLP language model ▰ End-to-end ▻ Elegant ▻ Hard to debug and validate ▻ Hard to productionize IMAGE CAPTIONING A horse is standing in a field with a fence in the background.
  • 22. CODE GENERATION Ashwin Kumar § Harder problem for humans - Anyone can describe an image - Coding takes specific training § We can solve it using a similar model § The trick is in getting the data!
  • 23. ▰ These methods mix and match different architectures ▰ The combined representation is often learned implicitly ▻ Hard to cache and optimize to re-use across services ▻ Hard to validate and do QA on ▰ The models are entangled ▻ What if we want to learn a simple joint representation? BUT DOES IT SCALE?
  • 25. Goals § Searching for similar images to an input image - Computer Vision: (Image → Image) § Searching for images using text & generating tags for images - Computer Vision + Natural Language Processing: (Image ↔ Text) § Bonus: finding similar words to an input word - Natural Language Processing: (Text → Text)
  • 26. ▰ A quick overview of Computer Vision (CV) tasks and challenges ▰ Natural Language Processing (NLP) tasks and challenges ▰ Challenges in combining both ▰ Representations learning in CV ▰ Representation learning in NLP ▰ Combining both ON THE MENU
  • 28. Dataset Credit to Cyrus Rashtchian, PeterYoung, Micah Hodosh, and Julia Hockenmaier for the dataset. § 1000 images - 20 classes, 50 images per class § 3 orders of magnitude smaller than usual deep learning datasets § Noisy
  • 31. A FEW APPROACHES § Ways to think about searching for similar images
  • 32. IF WE HAD INFINITE DATA § Train on all images § Pros: - One Forward Pass (fast inference) § Cons: - Hard too optimize - Poor scaling - Frequent Retraining
  • 33. SIMILARITY MODEL § Train on each image pair § Pros: - Scales to large datasets § Cons: - Slow - Does not work for text - Needs good examples
  • 34. EMBEDDING MODEL § Find embedding for each image § Calculate ahead of time § Pros: - Scalable - Fast § Cons: - Simple representations
  • 35. Mikolov et Al. 2013 WORD EMBEDDINGS
  • 38. PROXIMITY SEARCH IS FAST How do you find the 5 most similar images to a given one when you have over a million users? ▰ Fast index search ▰ Spotify uses annoy (we will as well) ▰ Flickr uses LOPQ ▰ Nmslib is also very fast ▰ Some rely on making the queries approximate in order to make them fast
  • 40. FOCUSING OUR SEARCH § Sometimes we are only interested in part of the image. § For example, given an image of a cat and a bottle, we might be only interested in similar cats, not similar bottles. § How do we incorporate this information
  • 41. IMPROVING RESULTS: STILL NO TRAINING § Computationally expensive approach: - Object detection model first - (We don’t do this) - Image search on a cropped image - (We don’t do this) § Semi-Supervised approach: - Hacky, but efficient! - re-weighing the activations - Only use the class of interest to re- weigh embeddings
  • 43. ▰ A quick overview of Computer Vision (CV) tasks and challenges ▰ Natural Language Processing (NLP) tasks and challenges ▰ Challenges in combining both ▰ Representations learning in CV ▰ Representation learning in NLP ▰ Combining both ON THE MENU
  • 44. GENERALIZING § We have added some ability to guide the search, but it is limited to classes our model was initially trained on § We would like to be able to use any word § How do we combine words and images?
  • 45. Mikolov et Al. 2013 WORD EMBEDDINGS
  • 46. SEMANTIC TEXT! § Load a set of pre-trained vectors (GloVe) - Wikipedia data - Semantic relationships § One big issue: - The embeddings for images are of size 4096 - While those for words are of size 300  - And both models trained in a different fashion § What we need: Joint model!
  • 47. ▰ A quick overview of Computer Vision (CV) tasks and challenges ▰ Natural Language Processing (NLP) tasks and challenges ▰ Challenges in combining both ▰ Representations learning in CV ▰ Representation learning in NLP ▰ Combining both ON THE MENU
  • 49. TIME TO TRAIN Image à Image Image à Text
  • 50. IMAGE à TEXT § Re-train model to predict the word vector - i.e. 300-length vector associated with cat § Training - Takes more time per example than image à class - But much faster than on Imagenet (7 hours, no GPU) § Important to note - Training data can be very small: ~1000 images - Miniscule compared to Imagenet (1+ Million images) § Once model is trained - Build a new fast index of images - Save to disk How do you think this model will perform?
  • 52. GENERALIZED IMAGE SEARCH WITH MINIMAL DATA IN:“DOG” OUT
  • 53. SEARCH FOR WORD NOT IN DATASET IN:“OCEAN” OUT
  • 54. SEARCH FOR WORD NOT IN DATASET IN:“STREET” OUT
  • 57. Learn More: Find the repo on Github!
  • 58. Next steps § Incorporating user feedback - Most real world image search systems use user clicks as a signal § Capturing domain specific aspects - Often times, users have different meanings for similarity § Keep the conversation going - Reach me on Twitter @EmmanuelAmeisen
  • 59. www.insightdata.ai/apply EMMANUEL AMEISEN Head of AI, ML Engineer @emmanuelameisen emmanuel@insightdata.ai bit.ly/imagefromscratch
  • 60. CV Approaches White-box Algorithms Black-Box Algorithms @Andrey Nikishaev
  • 61. ▰ NLP Classification is generally more shallow ▻ Logistic Regression/Naïve Bayes ▻ Two layer CNN ▰ This is starting to change ▻ The triumph of pre-training and transfer learning CLASSIFICATION
  • 62. Watch the video with slide synchronization on InfoQ.com! https://www.infoq.com/presentations/ semantic-search-engine