OS for AI
Jon Peck
Making state-of-the-art algorithms
discoverable and accessible to everyone
Full-Spectrum Developer & Advocate
jpeck@algorithmia.com
@peckjon
bit.ly/nordic-ai
2
The Problem: ML is in a huge growth phase,
difficult/expensive for DevOps to keep up
Initially:
● A few models, a couple frameworks, 1-2 languages
● Dedicated hardware or VM Hosting
● IT Team for DevOps
● High time-to-deploy, manual discoverability
● Few end-users, heterogeneous APIs (if any)
Pretty soon...
● > 5,000 algorithms (50k versions) on many runtimes / frameworks
● > 60k algorithm developers: heterogeneous, largely unpredictable
● Each algorithm: 1 to 1,000 calls/second, a lot of variance
● Need auto-deploy, discoverability, low (15ms) latency
● Common API, composability, fine-grained security
3
The Need: an “Operating System for AI”
AI/ML scalable infrastructure on demand + marketplace
● Function-as-a-service for Machine & Deep Learning
● Discoverable, live inventory of AI via APIs
● Anyone can contribute & use
● Composable, Monetizable
● Every developer on earth can make their app intelligent
An Operating System for AI
What did the evolution of OS look like?
● Punch Cards: 1970s
● Unix: Multi-tenancy, Composability
● DOS: Hardware Abstraction
● GUI (Win/Mac): Accessibility
● iOS/Android: Built-in App Store (Discoverability)
4
General-purpose computing had a long evolution, as we learned what the common
problems were / what abstractions to build. AI is in the earlier stages of that evolution.
An Operating System:
• Provides common functionality needed by many programs
• Standardizes conventions to make systems easier to work with
• Presents a higher level abstraction of the underlying hardware
Use Case
Jian Yang made an app to recognize food “SeeFood”
© HBO All Rights Reserved 5
Use Case
He deployed his trained model to a GPU-enabled server
6
Use Case
The app is a hit!
SeeFood
Productivity
7
Use Case
… and now his server is overloaded.
GPU-enabled Server xN
8
Characteristics of AI
• Two distinct phases: training and inference
• Lots of processing power
• Heterogeneous hardware (CPU, GPU, FPGA, TPU, etc.)
• Limited by compute rather than bandwidth
• “Tensorflow is open source, scaling it is not.”
9
16
TRAINING
OWNER: Data Scientists
● Long compute cycle
● Fixed load (Inelastic)
● Stateful
● Single user
● Metal or VM
Analogous to dev tool chain. Building and iterating over a model is similar to building an app.
INFERENCE
OWNER: DevOps
● Short compute bursts
● Stateless
● Elastic
● Multiple users
● Containers, Kubernetes
Analogous to an OS. Running concurrent models requires task scheduling.
Microservices & Serverless Computing => ML Hosting
MICROSERVICES: the design of a system as independently deployable, loosely coupled services.
ADVANTAGES
• Maintainable, Scalable
• Software & Hardware Agnostic
• Rolling deployments
SERVERLESS: the encapsulation, starting, and stopping of singular functions per request, with a just-in-time-compute model.
ADVANTAGES
• Elasticity, Cost Efficiency
• Concurrency
• Improved Latency
17
Why Serverless - Cost Efficiency
[Chart: calls per second (max and average) and GPU server instances by time of day, 12 AM to 10 PM]
Jian Yang’s “SeeFood” is most active during lunchtime.
18
Traditional Architecture - Design for Maximum
[Chart: calls per second (max and average) and GPU server instances by time of day, 12 AM to 10 PM]
40 machines, 24 hours. $648 * 40 = $25,920 per month
19
Autoscale Architecture - Design for Local Maximum
[Chart: calls per second (max and average) and GPU server instances by time of day, 12 AM to 10 PM]
19 machines, 24 hours. $648 * 19 = $12,312 per month
20
Serverless Architecture - Design for Minimum
[Chart: calls per second (max and average) and GPU server instances by time of day, 12 AM to 10 PM]
Avg. of 21 calls / sec, or equivalent of 6 machines. $648 * 6 = $3,888 per month
21
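The three monthly figures on the cost slides are each a machine count multiplied by a per-instance price. A minimal sketch of that arithmetic, assuming the $648 per GPU instance per month used on the slides; the function and dictionary names are illustrative, not part of any platform:

```python
# Monthly price of one GPU server instance, as quoted on the slides.
PRICE_PER_INSTANCE_USD = 648


def monthly_cost(machines: int, price: int = PRICE_PER_INSTANCE_USD) -> int:
    """Cost of keeping `machines` instances running for a full month."""
    return machines * price


# Machine counts taken from the slides for each provisioning strategy.
strategies = {
    "traditional (design for maximum)": 40,
    "autoscale (design for local maximum)": 19,
    "serverless (design for minimum)": 6,
}

baseline = monthly_cost(strategies["traditional (design for maximum)"])
for name, machines in strategies.items():
    cost = monthly_cost(machines)
    saving = baseline - cost
    print(f"{name}: {machines} machines -> ${cost:,}/month (${saving:,} below baseline)")
```

The serverless figure is an equivalent machine count derived from average load, so it is a lower bound on spend rather than a fixed fleet size.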
Why Serverless - Concurrency
Load Balancer
GPU-enabled Servers
22
Why Serverless - Improved Latency
Portability = Low Latency
23
24
Almost there! We also need:
GPU Memory Management, Job Scheduling, Cloud Abstraction,
Discoverability, Authentication, Logging, etc.
25
Elastic Scale
User
Web Load Balancer
API Load Balancer
Web Servers
API Servers
Cloud Region #1
Worker xN
Docker(algorithm#1)
..
Docker(algorithm#n)
Cloud Region #2
Worker xN
Docker(algorithm#1)
..
Docker(algorithm#n)
26
Elastic Scaling with
Intelligent Orchestration
Knowing that:
● Algorithm A always calls Algorithm B
● Algorithm A consumes X CPU, X Memory, etc
● Algorithm B consumes X CPU, X Memory, etc
Therefore we can build dependency graphs and slot them in a way that:
● Reduces network latency
● Increases cluster utilization
FoodClassifier
FruitClassifier VeggieClassifier
Runtime Abstraction
27
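One way to picture the slotting idea: a greedy placer that prefers a worker already hosting a caller or callee, as long as it has spare CPU and memory. This is a sketch only, not Algorithmia's scheduler. The `Worker` class, the `schedule` function, the resource numbers, and the assumption that FoodClassifier calls the other two classifiers are all illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    cpu_free: float
    mem_free: float
    algorithms: list = field(default_factory=list)

    def fits(self, need) -> bool:
        cpu, mem = need
        return cpu <= self.cpu_free and mem <= self.mem_free

    def place(self, algo: str, need) -> None:
        cpu, mem = need
        self.cpu_free -= cpu
        self.mem_free -= mem
        self.algorithms.append(algo)


def schedule(needs: dict, calls: dict, workers: list) -> dict:
    """Greedy placement that co-locates algorithms linked in the call graph."""
    placement = {}
    for algo, need in needs.items():
        # Neighbours in the dependency graph, in either direction.
        callees = set(calls.get(algo, []))
        callers = {a for a, targets in calls.items() if algo in targets}
        neighbours = callees | callers
        # Stable sort: workers already hosting a neighbour come first.
        ordered = sorted(
            workers,
            key=lambda w: not any(n in w.algorithms for n in neighbours),
        )
        for worker in ordered:
            if worker.fits(need):
                worker.place(algo, need)
                placement[algo] = worker.name
                break
        else:
            raise RuntimeError(f"no worker has room for {algo}")
    return placement


# Hypothetical (CPU cores, GB memory) requirements per algorithm.
needs = {
    "FoodClassifier": (2, 4),
    "FruitClassifier": (1, 2),
    "VeggieClassifier": (1, 2),
}
calls = {"FoodClassifier": ["FruitClassifier", "VeggieClassifier"]}
workers = [Worker("worker-1", 4, 8), Worker("worker-2", 4, 8)]

placement = schedule(needs, calls, workers)
print(placement)
```

Keeping caller and callee on the same worker avoids a network hop per call, which is the latency reduction the slide refers to.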
Composability
Composability is critical for AI workflows because of data
processing pipelines and ensembles.
Fruit or Veggie Classifier => Fruit Classifier / Veggie Classifier
Unix analogy: cat file.csv | grep foo | wc -l
28
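The pipe analogy can be sketched in a few lines: each stage takes the previous stage's output, and a first-pass classifier routes to a specialist. The toy rule-based "models" and every name below are hypothetical stand-ins for hosted algorithms.

```python
from functools import reduce


def compose(*stages):
    """Chain stages left to right, like a shell pipeline."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


# Toy vocabulary standing in for a trained first-pass model.
FRUITS = {"apple", "banana"}


def fruit_or_veggie(name: str) -> dict:
    kind = "fruit" if name in FRUITS else "veggie"
    return {"name": name, "kind": kind}


def fruit_classifier(record: dict) -> dict:
    return {**record, "label": f"fruit/{record['name']}"}


def veggie_classifier(record: dict) -> dict:
    return {**record, "label": f"veggie/{record['name']}"}


def route(record: dict) -> dict:
    """Hand the record to the specialist chosen by the first-pass classifier."""
    specialist = fruit_classifier if record["kind"] == "fruit" else veggie_classifier
    return specialist(record)


# Preprocessing, coarse classification, then routing, composed as one callable.
food_classifier = compose(str.strip, str.lower, fruit_or_veggie, route)

print(food_classifier("  Apple "))
print(food_classifier("Carrot"))
```

Because every stage shares one calling convention, stages can be swapped or reordered without touching their neighbours, which is the property the slide asks of AI workflows.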
Cloud Abstraction - Storage
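# No storage abstraction: code is tied to one provider's SDK
import boto3
s3 = boto3.client("s3")
obj = s3.get_object(Bucket="bucket-name", Key="records.csv")
data = obj["Body"].read()
# With storage abstraction: the URI scheme selects the backend
# (`client` is assumed to be the platform's API client, created elsewhere)
data = client.file("blob://records.csv").get()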
s3://foo/bar
blob://foo/bar
hdfs://foo/bar
dropbox://foo/bar
etc.
29
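A rough sketch of how such a storage abstraction could dispatch on the URI scheme. It is not the platform's implementation: `DataClient`, `DataFile`, and `InMemoryBackend` are hypothetical names, and the in-memory backend stands in for real S3, HDFS, or Dropbox connectors.

```python
class InMemoryBackend:
    """Stand-in for a real store such as S3 or HDFS (illustration only)."""

    def __init__(self):
        self._objects = {}

    def read(self, path: str) -> bytes:
        return self._objects[path]

    def write(self, path: str, data: bytes) -> None:
        self._objects[path] = data


class DataFile:
    """Handle to one object; callers never see which backend serves it."""

    def __init__(self, backend, path: str):
        self._backend = backend
        self._path = path

    def get(self) -> bytes:
        return self._backend.read(self._path)

    def put(self, data: bytes) -> None:
        self._backend.write(self._path, data)


class DataClient:
    """Routes scheme://path URIs to the backend registered for that scheme."""

    def __init__(self):
        self._backends = {}

    def register(self, scheme: str, backend) -> None:
        self._backends[scheme] = backend

    def file(self, uri: str) -> DataFile:
        scheme, sep, path = uri.partition("://")
        if not sep or scheme not in self._backends:
            raise ValueError(f"unsupported storage URI: {uri}")
        return DataFile(self._backends[scheme], path)


client = DataClient()
for scheme in ("s3", "blob", "hdfs", "dropbox"):
    client.register(scheme, InMemoryBackend())

client.file("blob://records.csv").put(b"id,name\n1,apple\n")
data = client.file("blob://records.csv").get()
print(data.decode())
```

Moving an algorithm between clouds then means registering a different backend, not rewriting the algorithm's I/O code.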
Cloud Abstraction
Service          AWS                    Google Cloud     Azure          OpenStack
Compute          EC2                    CE               VM             Nova
Autoscaling      Autoscaling Group      Autoscaler       Scale Set      Heat Scaling Policy
Load Balancing   Elastic Load Balancer  Load Balancer    Load Balancer  LBaaS
Remote Storage   Elastic Block Store    Persistent Disk  File Storage   Block Storage
Partial Source: Sam Ghods, KubeConf 2016
30
An Operating System for AI: the “AI Layer”
Shell & Services: Discoverability, Authentication, Instrumentation, etc.
Kernel:
● Runtime Abstraction: Support any programming language or framework, including interoperability between mixed stacks.
● Elastic Scale: Prioritize and automatically optimize execution of concurrent short-lived jobs.
● Cloud Abstraction: Provide portability to algorithms, including public clouds or private clouds.
31
Discoverability: an App Store for AI
32
Algorithmia’s OS for AI: discover a model
1. Discover a model
● AppStore-like interface
● Categorized, tagged, rated
● Well-described (purpose, source, API)
33
Algorithmia’s OS for AI: execute a model
2. Execute from any language
● Raw JSON, or lang stubs
● Common syntax
● Autoscaled elastic cloud-exec
● Secure, isolated
● Concurrent, orchestrated
● 15ms overhead
● Hardware agnostic
34
Algorithmia’s OS for AI: add a model
3. Add new models
● Many languages, frameworks
● Instant JSON API
● Call other models seamlessly (regardless of lang)
● Granular permissions
● GPU environments
● Namespaces & versioning
Jon Peck Developer Advocate
Thank you!
FREE STUFF
$50 free at Algorithmia.com
signup code: NORDIC18
jpeck@algorithmia.com
@peckjon
bit.ly/nordic-ai
WE ARE HIRING
algorithmia.com/jobs
● Seattle or Remote
● Bright, collaborative env
● Unlimited PTO
● Dog-friendly

OS for AI: Elastic Microservices & the Next Gen of ML

  • 1. OS for AI Jon Peck Making state-of-the-art algorithms discoverable and accessible to everyone Full-Spectrum Developer & Advocate jpeck@algorithmia.com @peckjon bit.ly/nordic-ai
  • 2. 2 The Problem: ML is in a huge growth phase, difficult/expensive for DevOps to keep up Initially: ● A few models, a couple frameworks, 1-2 languages ● Dedicated hardware or VM Hosting ● IT Team for DevOps ● High time-to-deploy, manual discoverability ● Few end-users, heterogenous APIs (if any) Pretty soon... ● > 5,000 algorithms (50k versions) on many runtimes / frameworks ● > 60k algorithm developers: heterogenous, largely unpredictable ● Each algorithm: 1 to 1,000 calls/second, a lot of variance ● Need auto-deploy, discoverability, low (15ms) latency ● Common API, composability, fine-grained security
  • 3. 3 The Need: an “Operating System for AI” AI/ML scalable infrastructure on demand + marketplace ● Function-as-a-service for Machine & Deep Learning ● Discoverable, live inventory of AI via APIs ● Anyone can contribute & use ● Composable, Monetizable ● Every developer on earth can make their app intelligent
  • 4. An Operating System for AI. What did the evolution of OS look like?
      ● Punch Cards (1970s)
      ● Unix: Multi-tenancy, Composability
      ● DOS: Hardware Abstraction
      ● GUI (Win/Mac): Accessibility
      ● iOS/Android: Built-in App Store (Discoverability)
      General-purpose computing had a long evolution, as we learned what the common problems were and what abstractions to build. AI is in the earlier stages of that evolution.
      An Operating System:
      ● Provides common functionality needed by many programs
      ● Standardizes conventions to make systems easier to work with
      ● Presents a higher-level abstraction of the underlying hardware
  • 5. Use Case: Jian Yang made an app to recognize food, “SeeFood”. (Image © HBO, All Rights Reserved)
  • 6. Use Case: He deployed his trained model to a GPU-enabled server.
  • 7. Use Case: The app is a hit! [Figure: SeeFood, Productivity]
  • 8. Use Case: … and now his server is overloaded. [Figure: GPU-enabled Server, xN]
  • 9. Characteristics of AI
      ● Two distinct phases: training and inference
      ● Lots of processing power
      ● Heterogeneous hardware (CPU, GPU, FPGA, TPU, etc.)
      ● Limited by compute rather than bandwidth
      ● “Tensorflow is open source, scaling it is not.”
  • 10. TRAINING (owner: Data Scientists): long compute cycle; fixed load (inelastic); stateful; single user.
  • 11. TRAINING (owner: Data Scientists): long compute cycle; fixed load (inelastic); stateful; single user; metal or VM. Analogous to a dev tool chain: building and iterating over a model is similar to building an app.
  • 12. TRAINING (owner: Data Scientists): long compute cycle; fixed load (inelastic); stateful; single user; metal or VM. Analogous to a dev tool chain: building and iterating over a model is similar to building an app. INFERENCE (owner: DevOps): short compute bursts; elastic; stateless; multiple users.
  • 13. TRAINING (owner: Data Scientists): long compute cycle; fixed load (inelastic); stateful; single user; metal or VM. Analogous to a dev tool chain: building and iterating over a model is similar to building an app. INFERENCE (owner: DevOps): short compute bursts; elastic; stateless; multiple users. Analogous to an OS: running concurrent models requires task scheduling.
  • 14. TRAINING (owner: Data Scientists): long compute cycle; fixed load (inelastic); stateful; single user; metal or VM. Analogous to a dev tool chain: building and iterating over a model is similar to building an app. INFERENCE (owner: DevOps): short compute bursts; elastic; stateless; multiple users; containers. Analogous to an OS: running concurrent models requires task scheduling.
  • 15. TRAINING (owner: Data Scientists): long compute cycle; fixed load (inelastic); stateful; single user; metal or VM. Analogous to a dev tool chain: building and iterating over a model is similar to building an app. INFERENCE (owner: DevOps): short compute bursts; elastic; stateless; multiple users; containers; Kubernetes. Analogous to an OS: running concurrent models requires task scheduling.
  • 16. TRAINING (owner: Data Scientists): long compute cycle; fixed load (inelastic); stateful; single user; metal or VM. Analogous to a dev tool chain: building and iterating over a model is similar to building an app. INFERENCE (owner: DevOps): short compute bursts; elastic; stateless; multiple users; containers; Kubernetes. Analogous to an OS: running concurrent models requires task scheduling.
  • 17. Microservices & Serverless Computing => ML Hosting
      MICROSERVICES: the design of a system as independently deployable, loosely coupled services.
      Advantages:
      ● Maintainable, Scalable
      ● Software & Hardware Agnostic
      ● Rolling deployments
      SERVERLESS: the encapsulation, starting, and stopping of singular functions per request, with a just-in-time-compute model.
      Advantages:
      ● Elasticity, Cost Efficiency
      ● Concurrency
      ● Improved Latency
  • 18. Why Serverless - Cost Efficiency. [Chart: calls per second (max and avg) and GPU server instances by time of day, 12 AM to 10 PM] Jian Yang’s “SeeFood” is most active during lunchtime.
  • 19. Traditional Architecture - Design for Maximum. [Chart: calls per second (max and avg) and GPU server instances by time of day, 12 AM to 10 PM] 40 machines, 24 hours. $648 * 40 = $25,920 per month.
  • 20. Autoscale Architecture - Design for Local Maximum. [Chart: calls per second (max and avg) and GPU server instances by time of day, 12 AM to 10 PM] 19 machines, 24 hours. $648 * 19 = $12,312 per month.
  • 21. Serverless Architecture - Design for Minimum. [Chart: calls per second (max and avg) and GPU server instances by time of day, 12 AM to 10 PM] Avg. of 21 calls / sec, or the equivalent of 6 machines. $648 * 6 = $3,888 per month.
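The cost comparison on slides 19-21 is plain multiplication, and a few lines of Python make it easy to check. This is only a sketch of that arithmetic: the $648 monthly price per GPU server and the machine counts (40, 19, 6) come from the slides, while the dictionary keys and variable names are made up for illustration.

```python
# Monthly cost of each architecture, using the figures quoted on slides 19-21.
MONTHLY_COST_PER_GPU_SERVER = 648  # USD per machine per month, from the slides

# Number of machines each design pays for over a full month (from the slides).
machines = {
    "traditional": 40,  # design for the daily maximum
    "autoscale": 19,    # design for the local maximum
    "serverless": 6,    # design for the minimum (avg. of 21 calls/sec)
}

costs = {name: count * MONTHLY_COST_PER_GPU_SERVER for name, count in machines.items()}

for name, cost in costs.items():
    saving = 1 - cost / costs["traditional"]
    print(f"{name}: ${cost:,} per month ({saving:.0%} less than traditional)")
```

The totals match the slides: 19 machines at $648 each gives the $12,312 shown for the autoscale design, and 6 machines gives $3,888 for serverless.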
  • 22. Why Serverless - Concurrency. [Figure: Load Balancer, GPU-enabled Servers]
  • 23. Why Serverless - Improved Latency. Portability = Low Latency.
  • 24. Almost there! We also need: GPU Memory Management, Job Scheduling, Cloud Abstraction, Discoverability, Authentication, Logging, etc.
  • 25. Elastic Scale. [Diagram: User; Web Load Balancer and API Load Balancer; Web Servers and API Servers; Cloud Region #1 and Cloud Region #2, each with Worker xN running Docker(algorithm#1) .. Docker(algorithm#n)]
  • 26. Elastic Scaling with Intelligent Orchestration
      Knowing that:
      ● Algorithm A always calls Algorithm B
      ● Algorithm A consumes X CPU, X Memory, etc.
      ● Algorithm B consumes X CPU, X Memory, etc.
      Therefore we can slot them in a way that:
      ● Reduces network latency
      ● Increases cluster utilization
      ● Builds dependency graphs
      [Diagram: FoodClassifier, FruitClassifier, VeggieClassifier; Runtime Abstraction]
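One way to picture the "slotting" on slide 26 is a toy placement routine: group algorithms that call each other, then put each whole group on one worker with enough free CPU and memory, so calls between them stay off the network. The sketch below is an assumption-heavy illustration, not the orchestrator described in the talk. The call graph reuses the classifier names from the slide's diagram, and the `Unrelated` algorithm, the resource figures, the worker names, and the first-fit strategy are all hypothetical.

```python
from collections import defaultdict

# Hypothetical call graph: caller -> callees (classifier names from the slide's diagram).
CALLS = {
    "FoodClassifier": ["FruitClassifier", "VeggieClassifier"],
    "FruitClassifier": [],
    "VeggieClassifier": [],
    "Unrelated": [],
}

# Hypothetical per-algorithm needs: (CPU cores, memory in GB).
NEEDS = {
    "FoodClassifier": (2, 4),
    "FruitClassifier": (1, 2),
    "VeggieClassifier": (1, 2),
    "Unrelated": (2, 4),
}


def dependency_groups(calls):
    """Group algorithms connected by calls (undirected connected components)."""
    neighbours = defaultdict(set)
    for caller, callees in calls.items():
        neighbours[caller]  # make sure isolated algorithms get an entry
        for callee in callees:
            neighbours[caller].add(callee)
            neighbours[callee].add(caller)
    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, group = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.append(node)
            stack.extend(neighbours[node] - seen)
        groups.append(sorted(group))
    return groups


def place(groups, needs, workers):
    """First-fit: put each whole group on the first worker with enough room."""
    free = dict(workers)
    placement = {}
    for group in groups:
        cpu = sum(needs[algo][0] for algo in group)
        mem = sum(needs[algo][1] for algo in group)
        for worker, (free_cpu, free_mem) in free.items():
            if cpu <= free_cpu and mem <= free_mem:
                free[worker] = (free_cpu - cpu, free_mem - mem)
                for algo in group:
                    placement[algo] = worker
                break
        else:
            raise RuntimeError(f"no worker can host {group}")
    return placement


# Hypothetical workers: name -> (free CPU cores, free memory in GB).
WORKERS = {"worker-1": (4, 8), "worker-2": (4, 8)}
placement = place(dependency_groups(CALLS), NEEDS, WORKERS)
print(placement)
```

With these made-up numbers the three classifiers share one worker and the unrelated algorithm goes to the other. A real scheduler would also weigh GPU memory, call frequency, and cases where a group is too large for any single worker.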
  • 27. Composability. Composability is critical for AI workflows because of data processing pipelines and ensembles. [Diagram: Fruit or Veggie Classifier, Fruit Classifier, Veggie Classifier] cat file.csv | grep foo | wc -l
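The shell pipeline on slide 27 and the classifier diagram share one idea: small units with a common interface can be chained. A minimal sketch in Python, assuming each model is just a function from input to output. The `pipeline` helper, the toy classifiers, and the `FRUITS` set are stand-ins invented for this example, not real hosted models.

```python
from functools import reduce


def pipeline(*stages):
    """Compose stages left to right, like a Unix pipe."""
    def run(value):
        return reduce(lambda acc, stage: stage(acc), stages, value)
    return run


def grep(pattern):
    """Return a stage that keeps only lines containing the pattern."""
    return lambda lines: [line for line in lines if pattern in line]


# Rough analogue of: cat file.csv | grep foo | wc -l
count_foo = pipeline(grep("foo"), len)

# Toy stand-ins for the three classifiers in the diagram.
FRUITS = {"apple", "banana"}


def fruit_or_veggie(item):
    """First stage: tag the item so the next stage can pick a specialist."""
    return ("fruit" if item in FRUITS else "veggie", item)


def fruit_classifier(item):
    return f"fruit:{item}"


def veggie_classifier(item):
    return f"veggie:{item}"


def route(tagged):
    """Second stage: hand the item to the matching specialist model."""
    kind, item = tagged
    specialist = fruit_classifier if kind == "fruit" else veggie_classifier
    return specialist(item)


classify = pipeline(fruit_or_veggie, route)

print(count_foo(["foo,1", "bar,2", "xfoo,3"]))
print(classify("apple"), classify("kale"))
```

Because every stage takes one value and returns one value, a new model can be added to the chain without changing the others, which is the property the slide asks of a hosting platform.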
  • 28. Cloud Abstraction - Storage
        # No storage abstraction
        import boto3
        s3 = boto3.client("s3")
        obj = s3.get_object(Bucket="bucket-name", Key="records.csv")
        data = obj["Body"].read()
        # With storage abstraction ("client" is the platform client, created elsewhere)
        data = client.file("blob://records.csv").get()
      Example URIs: s3://foo/bar, blob://foo/bar, hdfs://foo/bar, dropbox://foo/bar, etc.
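The `client.file("blob://records.csv").get()` call on slide 28 suggests dispatching on the URI scheme. The sketch below shows one way such a layer could be built. It is a guess at the shape, not the platform's actual implementation: `StorageClient`, `InMemoryBackend`, and `register` are hypothetical names, and the backends hold bytes in memory so the example runs without cloud credentials.

```python
from urllib.parse import urlparse


class InMemoryBackend:
    """Hypothetical backend that serves objects from a dict instead of a real store."""

    def __init__(self, objects):
        self._objects = dict(objects)

    def read(self, key):
        return self._objects[key]


class FileHandle:
    """Lazy reference to one object; nothing is fetched until get() is called."""

    def __init__(self, backend, key):
        self._backend = backend
        self._key = key

    def get(self):
        return self._backend.read(self._key)


class StorageClient:
    """Routes a URI to whichever backend is registered for its scheme."""

    def __init__(self):
        self._backends = {}

    def register(self, scheme, backend):
        self._backends[scheme] = backend

    def file(self, uri):
        parsed = urlparse(uri)
        if parsed.scheme not in self._backends:
            raise ValueError(f"no backend registered for scheme {parsed.scheme!r}")
        # "s3://foo/bar" parses to netloc "foo" and path "/bar"; join them into one key.
        return FileHandle(self._backends[parsed.scheme], parsed.netloc + parsed.path)


client = StorageClient()
client.register("blob", InMemoryBackend({"records.csv": b"id,name\n1,apple\n"}))
client.register("s3", InMemoryBackend({"foo/bar": b"from s3"}))

data = client.file("blob://records.csv").get()
print(data)
```

Calling code only ever sees `client.file(uri).get()`. Moving from one store to another then means registering a different backend, with no change to the algorithm itself.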
  • 29. Cloud Abstraction (Partial Source: Sam Ghods, KubeConf 2016)
        Service         | AWS                   | Google Cloud    | Azure         | OpenStack
        Compute         | EC2                   | CE              | VM            | Nova
        Autoscaling     | Autoscaling Group     | Autoscaler      | Scale Set     | Heat Scaling Policy
        Load Balancing  | Elastic Load Balancer | Load Balancer   | Load Balancer | LBaaS
        Remote Storage  | Elastic Block Store   | Persistent Disk | File Storage  | Block Storage
  • 30. An Operating System for AI: the “AI Layer”
      Shell & Services: Discoverability, Authentication, Instrumentation, etc.
      Kernel:
      ● Runtime Abstraction: Support any programming language or framework, including interoperability between mixed stacks.
      ● Elastic Scale: Prioritize and automatically optimize execution of concurrent short-lived jobs.
      ● Cloud Abstraction: Provide portability to algorithms, including public clouds or private clouds.
  • 32. Algorithmia’s OS for AI: 1. Discover a model
      ● AppStore-like interface
      ● Categorized, tagged, rated
      ● Well-described (purpose, source, API)
  • 33. Algorithmia’s OS for AI: 2. Execute from any language
      ● Raw JSON, or lang stubs
      ● Common syntax
      ● Autoscaled elastic cloud-exec
      ● Secure, isolated
      ● Concurrent, orchestrated
      ● 15ms overhead
      ● Hardware agnostic
  • 34. Algorithmia’s OS for AI: 3. Add new models
      ● Many languages, frameworks
      ● Instant JSON API
      ● Call other models seamlessly (regardless of lang)
      ● Granular permissions
      ● GPU environments
      ● Namespaces & versioning
  • 35. Thank you! Jon Peck, Developer Advocate. jpeck@algorithmia.com, @peckjon, bit.ly/nordic-ai
      FREE STUFF: $50 free at Algorithmia.com, signup code: NORDIC18
      WE ARE HIRING: algorithmia.com/jobs
      ● Seattle or Remote
      ● Bright, collaborative env
      ● Unlimited PTO
      ● Dog-friendly