SlideShare a Scribd company logo
1 of 22
Download to read offline
How Clarifai uses NATS and
Kubernetes for Machine Learning
Jack Li
10 May 2017
2
www.yourdomain.co
m
Who am I?
01• Senior Infrastructure Engineer @ Clarifai
• Duke/GaTech Grad
• Role:
• Backend systems implementation
• Core infrastructure - machines, cluster, all databases
• DevOps
3
www.yourdomain.co
m
Overview
01• Introduction to Clarifai
• Kubernetes & NATS @ Clarifai
• Experience with NATS and Lessons Learned
• Q&A
Clarifai is a
market leader in
visual recognition
technology
Proven, award-winning technology
Leading computer vision expert
Top VC investors
Matt Zeiler, CEO & Founder
Machine Learning PhD
Neural Network History
01
Geoffrey
Hinton
Yann
LeCun
1980s 2009
30x	Speedup
• more	data
• bigger	models
• faster	iterations
ImageNet Challenge
Convolutional Nets
Extracting Features from Layers
Zeiler et. al. Visualizing and Understanding Convolutional Networks.
ECCV 2014
Easy to use API
Here’s how you train AI with Custom Training
Here’s how you train AI with Custom Training
Oreo cookie 0.500 probabilityOreo cookie 0.588 probabilityOreo cookie 0.645 probabilityOreo cookie 0.766 probabilityOreo cookie 0.897 probabilityOreo cookie 0.912 probabilityOreo cookie 0.941 probabilityOreo cookie 0.956 probabilityOreo cookie 0.995 probability
With a few more examples, the AI gets more accurate.
Microservices at Clarifai
01• Transition from v1 to v2 microservice architecture
• Benefits:
Decouple functionality
Scale individual services
Parallel development and testing
IPC/intranode communication replaced with the cost of extra network latency
and serialization
Kubernetes at Clarifai
01• v1 architecture used a collection of Ansible scripts with AWS AMIs for deployment
• Transition off of virtual machine AMIs to Docker image
• Facilitated by our move to microservices
Many advantages to Kubernetes:
1. Speed up operations – CI/CD
2. Improved automation
3. Simplify management tools (Helm)
4. Improve security
5. Increase productivity
Microservices and Messaging
01• Communication among microservices required a messaging middleware
• Already using GRPCs to define services
• Problems: not easily extensible and requires client-side compilation of protobufs
Requirements:
1. Lightweight
2. Easy to configure and deploy on Kubernetes
3. Message persistence
4. Message queueing
5. Active community
Deciding on the Messaging System
01• Considered many options Amazon SQS, Kafka, RabbitMQ, NATS
• NATS:
1. Lightweight – Docker image size only a few MB, memory footprint in production less than 50 MB
2. Great documentation and easy to configure and deploy to Kubernetes cluster
3. Message persistence with file store (Kubernetes volumes)
4. Message queueing with exact source ordering (and at-least-once-delivery with NATS streaming)
5. Active Slack community
6. High messaging throughput and minimal latency
7. Written in Go
8. Message replay
Architecture @ Clarifai with NATS
01
Use Case of NATS @ Clarifai
01• Job queue for microservice workers
• Both fast and slow subscribers
• Trigger certain actions in services
• Message persistence during rolling continuous deployments
• Controlling the flow of messages to certain services (“pausing” subscriptions)
Monitoring NATS Streaming Subscriptions
01
Results
01• Implemented NATS into our backend in three weeks with one service
• In production for 5 months, currently used by five different services and growing
• 100% uptime with NATS
• 100k+ messages sent through NATS per day
Lessons Learned
01• Problems with NATS mainly stemmed from how we were using it
• Manual acking caused lots of issues and harder to get right, reverted to automatic acking for many
subscriptions
• Hard to monitor if there are many subscriptions (inject metrics and monitor the queue using separate
service)
Feature Requests
• Dead-letter queue
• Flushing/invalidating messages from specific subscriptions
• Clustering
E-mail: jack.li@clarifai.com
Questions?

More Related Content

What's hot

NATS in action - A Real time Microservices Architecture handled by NATS
NATS in action - A Real time Microservices Architecture handled by NATSNATS in action - A Real time Microservices Architecture handled by NATS
NATS in action - A Real time Microservices Architecture handled by NATS
Raül Pérez
 

What's hot (17)

GopherCon 2017 - Writing Networking Clients in Go: The Design & Implementati...
GopherCon 2017 -  Writing Networking Clients in Go: The Design & Implementati...GopherCon 2017 -  Writing Networking Clients in Go: The Design & Implementati...
GopherCon 2017 - Writing Networking Clients in Go: The Design & Implementati...
 
KubeCon NA 2019 Keynote | NATS - Past, Present, and the Future
KubeCon NA 2019 Keynote | NATS - Past, Present, and the FutureKubeCon NA 2019 Keynote | NATS - Past, Present, and the Future
KubeCon NA 2019 Keynote | NATS - Past, Present, and the Future
 
Simple Solutions for Complex Problems - Boulder Meetup
Simple Solutions for Complex Problems - Boulder MeetupSimple Solutions for Complex Problems - Boulder Meetup
Simple Solutions for Complex Problems - Boulder Meetup
 
KubeCon + CloudNative Con NA 2021 | A New Generation of NATS
KubeCon + CloudNative Con NA 2021 | A New Generation of NATSKubeCon + CloudNative Con NA 2021 | A New Generation of NATS
KubeCon + CloudNative Con NA 2021 | A New Generation of NATS
 
Easy, Secure, and Fast: Using NATS.io for Streams and Services
Easy, Secure, and Fast: Using NATS.io for Streams and ServicesEasy, Secure, and Fast: Using NATS.io for Streams and Services
Easy, Secure, and Fast: Using NATS.io for Streams and Services
 
Implementing Microservices with NATS
Implementing Microservices with NATSImplementing Microservices with NATS
Implementing Microservices with NATS
 
Deep Dive into Building a Secure & Multi-tenant SaaS Solution with NATS
Deep Dive into Building a Secure & Multi-tenant SaaS Solution with NATSDeep Dive into Building a Secure & Multi-tenant SaaS Solution with NATS
Deep Dive into Building a Secure & Multi-tenant SaaS Solution with NATS
 
NATS Connect Live!
NATS Connect Live!NATS Connect Live!
NATS Connect Live!
 
NATS Connect Live | Serverless on Kubernetes with OpenFaaS & NATS
NATS Connect Live | Serverless on Kubernetes with OpenFaaS & NATSNATS Connect Live | Serverless on Kubernetes with OpenFaaS & NATS
NATS Connect Live | Serverless on Kubernetes with OpenFaaS & NATS
 
OSCON: Building Cloud Native Apps with NATS
OSCON:  Building Cloud Native Apps with NATSOSCON:  Building Cloud Native Apps with NATS
OSCON: Building Cloud Native Apps with NATS
 
MRA AMA Part 7: The Circuit Breaker Pattern
MRA AMA Part 7: The Circuit Breaker PatternMRA AMA Part 7: The Circuit Breaker Pattern
MRA AMA Part 7: The Circuit Breaker Pattern
 
Botconf ppt
Botconf   pptBotconf   ppt
Botconf ppt
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
NATS in action - A Real time Microservices Architecture handled by NATS
NATS in action - A Real time Microservices Architecture handled by NATSNATS in action - A Real time Microservices Architecture handled by NATS
NATS in action - A Real time Microservices Architecture handled by NATS
 
Running a Robust DNS Infrastructure with CloudFlare Virtual DNS
Running a Robust DNS Infrastructure with CloudFlare Virtual DNSRunning a Robust DNS Infrastructure with CloudFlare Virtual DNS
Running a Robust DNS Infrastructure with CloudFlare Virtual DNS
 
Scaling MQTT With Apache Kafka
Scaling MQTT With Apache KafkaScaling MQTT With Apache Kafka
Scaling MQTT With Apache Kafka
 
NGINX, Istio, and the Move to Microservices and Service Mesh
NGINX, Istio, and the Move to Microservices and Service MeshNGINX, Istio, and the Move to Microservices and Service Mesh
NGINX, Istio, and the Move to Microservices and Service Mesh
 

Similar to How Clarifai uses NATS and Kubernetes for Machine Learning

OpenStack Block Storage 101
OpenStack Block Storage 101OpenStack Block Storage 101
OpenStack Block Storage 101
NetApp
 

Similar to How Clarifai uses NATS and Kubernetes for Machine Learning (20)

Simplify Your Way To Expert Kubernetes Management
Simplify Your Way To Expert Kubernetes ManagementSimplify Your Way To Expert Kubernetes Management
Simplify Your Way To Expert Kubernetes Management
 
DCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at NetflixDCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at Netflix
 
170215 msa intro
170215 msa intro170215 msa intro
170215 msa intro
 
Openstack.pptx.pdf
Openstack.pptx.pdfOpenstack.pptx.pdf
Openstack.pptx.pdf
 
Monitoring MySQL at scale
Monitoring MySQL at scaleMonitoring MySQL at scale
Monitoring MySQL at scale
 
Azure meetup cloud native concepts - may 28th 2018
Azure meetup   cloud native concepts - may 28th 2018Azure meetup   cloud native concepts - may 28th 2018
Azure meetup cloud native concepts - may 28th 2018
 
Liberty release: Preliminary marketing materials & messages
Liberty release: Preliminary marketing materials & messagesLiberty release: Preliminary marketing materials & messages
Liberty release: Preliminary marketing materials & messages
 
Do You Need A Service Mesh?
Do You Need A Service Mesh?Do You Need A Service Mesh?
Do You Need A Service Mesh?
 
MicroService Architecture
MicroService ArchitectureMicroService Architecture
MicroService Architecture
 
Room 2 - 6 - Đinh Tuấn Phong - Migrate opensource database to Kubernetes easi...
Room 2 - 6 - Đinh Tuấn Phong - Migrate opensource database to Kubernetes easi...Room 2 - 6 - Đinh Tuấn Phong - Migrate opensource database to Kubernetes easi...
Room 2 - 6 - Đinh Tuấn Phong - Migrate opensource database to Kubernetes easi...
 
Bol.com Tech lab September 2017 - Microservices in action at the Dutch Nation...
Bol.com Tech lab September 2017 - Microservices in action at the Dutch Nation...Bol.com Tech lab September 2017 - Microservices in action at the Dutch Nation...
Bol.com Tech lab September 2017 - Microservices in action at the Dutch Nation...
 
PyCon India 2012: Celery Talk
PyCon India 2012: Celery TalkPyCon India 2012: Celery Talk
PyCon India 2012: Celery Talk
 
QCon 2015 - Microservices Track Notes
QCon 2015 - Microservices Track Notes QCon 2015 - Microservices Track Notes
QCon 2015 - Microservices Track Notes
 
Highlights of OpenStack Mitaka and the OpenStack Summit
Highlights of OpenStack Mitaka and the OpenStack SummitHighlights of OpenStack Mitaka and the OpenStack Summit
Highlights of OpenStack Mitaka and the OpenStack Summit
 
Do I Need A Service Mesh.pptx
Do I Need A Service Mesh.pptxDo I Need A Service Mesh.pptx
Do I Need A Service Mesh.pptx
 
Building an Enterprise CaaS with Kubernetes and Rancher 2.0
Building an Enterprise CaaS with Kubernetes and Rancher 2.0Building an Enterprise CaaS with Kubernetes and Rancher 2.0
Building an Enterprise CaaS with Kubernetes and Rancher 2.0
 
GeeCon Prague 2023 - Unleash the power of your applications with Micronaut®, ...
GeeCon Prague 2023 - Unleash the power of your applications with Micronaut®, ...GeeCon Prague 2023 - Unleash the power of your applications with Micronaut®, ...
GeeCon Prague 2023 - Unleash the power of your applications with Micronaut®, ...
 
SevillaJUG - Unleash the power of your applications with Micronaut® ,GraalVM...
SevillaJUG - Unleash the power of your applications with Micronaut®  ,GraalVM...SevillaJUG - Unleash the power of your applications with Micronaut®  ,GraalVM...
SevillaJUG - Unleash the power of your applications with Micronaut® ,GraalVM...
 
Bakyaraj_Resume
Bakyaraj_ResumeBakyaraj_Resume
Bakyaraj_Resume
 
OpenStack Block Storage 101
OpenStack Block Storage 101OpenStack Block Storage 101
OpenStack Block Storage 101
 

More from Apcera

More from Apcera (20)

Gopher fest 2017: Adding Context To NATS
Gopher fest 2017: Adding Context To NATSGopher fest 2017: Adding Context To NATS
Gopher fest 2017: Adding Context To NATS
 
Modernizing IT in the Platform Era
Modernizing IT in the Platform EraModernizing IT in the Platform Era
Modernizing IT in the Platform Era
 
Debugging Network Issues
Debugging Network IssuesDebugging Network Issues
Debugging Network Issues
 
IT Modernization Doesn’t Mean You Leave Your Legacy Apps Behind
IT Modernization Doesn’t Mean You Leave Your Legacy Apps BehindIT Modernization Doesn’t Mean You Leave Your Legacy Apps Behind
IT Modernization Doesn’t Mean You Leave Your Legacy Apps Behind
 
How Greta uses NATS to revolutionize data distribution on the Internet
How Greta uses NATS to revolutionize data distribution on the InternetHow Greta uses NATS to revolutionize data distribution on the Internet
How Greta uses NATS to revolutionize data distribution on the Internet
 
Simple and Scalable Microservices: Using NATS with Docker Compose and Swarm
Simple and Scalable Microservices: Using NATS with Docker Compose and SwarmSimple and Scalable Microservices: Using NATS with Docker Compose and Swarm
Simple and Scalable Microservices: Using NATS with Docker Compose and Swarm
 
The Zen of High Performance Messaging with NATS
The Zen of High Performance Messaging with NATSThe Zen of High Performance Messaging with NATS
The Zen of High Performance Messaging with NATS
 
Actor Patterns and NATS - Boulder Meetup
Actor Patterns and NATS - Boulder MeetupActor Patterns and NATS - Boulder Meetup
Actor Patterns and NATS - Boulder Meetup
 
NATS Connector Framework - Boulder Meetup
NATS Connector Framework - Boulder MeetupNATS Connector Framework - Boulder Meetup
NATS Connector Framework - Boulder Meetup
 
Patterns for Asynchronous Microservices with NATS
Patterns for Asynchronous Microservices with NATSPatterns for Asynchronous Microservices with NATS
Patterns for Asynchronous Microservices with NATS
 
NATS vs HTTP
NATS vs HTTPNATS vs HTTP
NATS vs HTTP
 
NATS: A Central Nervous System for IoT Messaging - Larry McQueary
NATS: A Central Nervous System for IoT Messaging - Larry McQuearyNATS: A Central Nervous System for IoT Messaging - Larry McQueary
NATS: A Central Nervous System for IoT Messaging - Larry McQueary
 
Securing the Cloud Native Stack
Securing the Cloud Native StackSecuring the Cloud Native Stack
Securing the Cloud Native Stack
 
How to Migrate to Cloud with Complete Confidence and Trust
How to Migrate to Cloud with Complete Confidence and TrustHow to Migrate to Cloud with Complete Confidence and Trust
How to Migrate to Cloud with Complete Confidence and Trust
 
KURMA - A Containerized Container Platform - KubeCon 2016
KURMA - A Containerized Container Platform - KubeCon 2016KURMA - A Containerized Container Platform - KubeCon 2016
KURMA - A Containerized Container Platform - KubeCon 2016
 
Integration Patterns and Anti-Patterns for Microservices Architectures
Integration Patterns and Anti-Patterns for Microservices ArchitecturesIntegration Patterns and Anti-Patterns for Microservices Architectures
Integration Patterns and Anti-Patterns for Microservices Architectures
 
Kubernetes, The Day After
Kubernetes, The Day AfterKubernetes, The Day After
Kubernetes, The Day After
 
Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World
Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud WorldPolicy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World
Policy-based Cloud Storage: Persisting Data in a Multi-Site, Multi-Cloud World
 
Integration Patterns for Microservices Architectures
Integration Patterns for Microservices ArchitecturesIntegration Patterns for Microservices Architectures
Integration Patterns for Microservices Architectures
 
Nats meetup sf 20150826
Nats meetup sf   20150826Nats meetup sf   20150826
Nats meetup sf 20150826
 

Recently uploaded

CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 

Recently uploaded (20)

AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Vancouver Psychic Readings, Attraction spells,Br...
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 

How Clarifai uses NATS and Kubernetes for Machine Learning

  • 1. How Clarifai uses NATS and Kubernetes for Machine Learning Jack Li 10 May 2017
  • 2. 2 www.yourdomain.co m Who am I? 01• Senior Infrastructure Engineer @ Clarifai • Duke/GaTech Grad • Role: • Backend systems implementation • Core infrastructure - machines, cluster, all databases • DevOps
  • 3. 3 www.yourdomain.co m Overview 01• Introduction to Clarifai • Kubernetes & NATS @ Clarifai • Experience with NATS and Lessons Learned • Q&A
  • 4. Clarifai is a market leader in visual recognition technology Proven, award-winning technology Leading computer vision expert Top VC investors Matt Zeiler, CEO & Founder Machine Learning PhD
  • 5. Neural Network History 01 Geoffrey Hinton Yann LeCun 1980s 2009 30x Speedup • more data • bigger models • faster iterations
  • 8. Extracting Features from Layers Zeiler et. al. Visualizing and Understanding Convolutional Networks. ECCV 2014
  • 10. Here’s how you train AI with Custom Training
  • 11. Here’s how you train AI with Custom Training
  • 12. Oreo cookie 0.500 probabilityOreo cookie 0.588 probabilityOreo cookie 0.645 probabilityOreo cookie 0.766 probabilityOreo cookie 0.897 probabilityOreo cookie 0.912 probabilityOreo cookie 0.941 probabilityOreo cookie 0.956 probabilityOreo cookie 0.995 probability With a few more examples, the AI gets more accurate.
  • 13. Microservices at Clarifai 01• Transition from v1 to v2 microservice architecture • Benefits: Decouple functionality Scale individual services Parallel development and testing IPC/intranode communication replaced with the cost of extra network latency and serialization
  • 14. Kubernetes at Clarifai 01• v1 architecture used a collection of Ansible scripts with AWS AMIs for deployment • Transition off of virtual machine AMIs to Docker image • Facilitated by our move to microservices Many advantages to Kubernetes: 1. Speed up operations – CI/CD 2. Improved automation 3. Simplify management tools (Helm) 4. Improve security 5. Increase productivity
  • 15. Microservices and Messaging 01• Communication among microservices required a messaging middleware • Already using GRPCs to define services • Problems: not easily extensible and requires client-side compilation of protobufs Requirements: 1. Lightweight 2. Easy to configure and deploy on Kubernetes 3. Message persistence 4. Message queueing 5. Active community
  • 16. Deciding on the Messaging System 01• Considered many options Amazon SQS, Kafka, RabbitMQ, NATS • NATS: 1. Lightweight – Docker image size only a few MB, memory footprint in production less than 50 MB 2. Great documentation and easy to configure and deploy to Kubernetes cluster 3. Message persistence with file store (Kubernetes volumes) 4. Message queueing with exact source ordering (and at-least-once-delivery with NATS streaming) 5. Active Slack community 6. High messaging throughput and minimal latency 7. Written in Go 8. Message replay
  • 17. Architecture @ Clarifai with NATS 01
  • 18. Use Case of NATS @ Clarifai 01• Job queue for microservice workers • Both fast and slow subscribers • Trigger certain actions in services • Message persistence during rolling continuous deployments • Controlling the flow of messages to certain services (“pausing” subscriptions)
  • 19. Monitoring NATS Streaming Subscriptions 01
  • 20. Results 01• Implemented NATS into our backend in three weeks with one service • In production for 5 months, currently used by five different services and growing • 100% uptime with NATS • 100k+ messages sent through NATS per day
  • 21. Lessons Learned 01• Problems with NATS mainly stemmed from how we were using it • Manual acking caused lots of issues and harder to get right, reverted to automatic acking for many subscriptions • Hard to monitor if there are many subscriptions (inject metrics and monitor the queue using separate service) Feature Requests • Dead-letter queue • Flushing/invalidating messages from specific subscriptions • Clustering