Emad Elwany - CTO, Lexion
Evolution of ML Infrastructure at an AI-First Startup
Rsqrd AI Meetup - May 2020
Agenda
● Lexion Overview
● Document Understanding Pipeline
● Evolution of ML Infrastructure at Lexion
● Deep Dive - Model Versioning
Lexion: Applying NLP to legal agreements
Creating this simple report could take weeks without automation.
It’s a complex NLP problem
● Messy PDFs make OCR non-trivial
● Long, multi-agreement documents
● Domain-specific language
● Complex schemas/ontologies
● Mix of non/semi/fully structured data
Sample: Identify Contract Term
Contract term is AUTO RENEW if, e.g.:
“will automatically renew for three year terms”
“shall continue on a month to month basis until terminated”
Contract term is FIXED if, e.g.:
“terminate effective April 1, 2007.”
“will continue until the 1 year anniversary”
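To make the two labels concrete, here is a minimal rule-based sketch in Python. It is not how Lexion does it (the talk describes a pipeline of models); the patterns, the `classify_term` name, and the `UNKNOWN` fallback are illustrative assumptions built only from the four example phrases above.

```python
import re

# Hypothetical patterns derived only from the example phrases on this slide.
AUTO_RENEW_PATTERNS = [
    re.compile(r"automatically\s+renew", re.IGNORECASE),
    re.compile(r"month[\s-]to[\s-]month", re.IGNORECASE),
]
FIXED_PATTERNS = [
    re.compile(r"terminate\s+effective", re.IGNORECASE),
    re.compile(r"continue\s+until\s+the\b.*\banniversary", re.IGNORECASE),
]


def classify_term(clause: str) -> str:
    """Return "AUTO_RENEW", "FIXED", or "UNKNOWN" for a contract clause."""
    if any(p.search(clause) for p in AUTO_RENEW_PATTERNS):
        return "AUTO_RENEW"
    if any(p.search(clause) for p in FIXED_PATTERNS):
        return "FIXED"
    return "UNKNOWN"
```

Rules like these cover the sample phrases but say nothing about wording they have not seen, which hints at why the talk frames this as a complex NLP problem.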
Document Understanding Pipeline
[Pipeline diagram. Node labels: Input, OCR, Text, Layout, Entities, Classes, Relations, ..., BL, Structured Data, Output. Many, many models!]
Key Takeaway: Every node in this graph is a “model” (of hundreds), and the remainder of this talk applies to each and every one of them.
Initial Goals (Pre-MVP)
● Evaluate technical feasibility: Can we build it?
● Evaluate business viability: Will they find it useful?
● Move very quickly: Can we ship it before we run out of money?
Use tools that are easy to
● Understand
● Setup
● Deploy
Steady state Goals (Post-MVP)
● Scale model development
● Scale model deployment
● Keep users happy at all times
Use tools that are easy to
● Integrate
● Configure
● Scale
Typical model lifecycle
Experience with ML in research, applications, and platforms:
Data
EARLY
● Finding the data: Scrapers/FOIA
● Cleaning the data: Scripting + Rules
● Annotating the data: Simple annotation tools
LATER
● Managing the data: Data Stores and Caches
● Protecting the data: Encryption and Access control
● Scaling annotation: Weakly/Unsupervised
Training
EARLY
Optimize for Speed of Results
Jupyter, Scripts
Goal: does it work?
LATER
Optimize for speed of Experimentation
Frameworks and metrics
Goal: make it the best!
Packaging
EARLY
Optimize for shipping the models
REST endpoint (online)
Batch script (offline)
LATER
Optimize for operationalizing the model
Versioning of artefacts
Dependency management
Cost management
More on this a bit later...
Validate Model
EARLY
● Does it work well enough?
Simple high level metrics (F1, P, R etc.)
LATER
● Is it better?
● Why is it better?
● How is it better?
Much more rigor:
● Validation sets
● E2E tests
● More detailed metrics
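The step from "does it work well enough?" to "is it better, and how?" can be sketched as follows. This is a minimal illustration, not the speaker's tooling: `precision_recall_f1` computes the high-level metrics for one label, and `regressions` (a hypothetical helper) lists validation examples the old model got right and the new one gets wrong.

```python
def precision_recall_f1(gold, pred, positive):
    """Precision, recall, and F1 for one label over parallel label lists."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def regressions(gold, old_pred, new_pred):
    """Indices where the old model was right and the new model is wrong."""
    return [
        i
        for i, (g, old, new) in enumerate(zip(gold, old_pred, new_pred))
        if old == g and new != g
    ]
```

A new model can have a higher F1 and still break documents that used to work; the per-example list is what answers "how is it better?".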
Deployment
EARLY
Optimize for Speed of deployment
LATER
Optimize for Scale of deployment
● Inference time
● Priority vs. starvation
● Rapid update deployment
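The slide only names the priority vs. starvation tension. One common way to balance it is priority aging, sketched below; this is a swapped-in textbook technique, not something the talk describes, and `Job`, `next_job`, and `aging_rate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    priority: int     # lower value = more urgent
    enqueued_at: int  # logical tick at which the job arrived


def next_job(queue, now, aging_rate=1.0):
    """Remove and return the job with the best effective priority.

    Waiting time is subtracted from the base priority, so a low-priority
    job that has waited long enough eventually beats fresh urgent work.
    """
    def effective(job):
        return job.priority - aging_rate * (now - job.enqueued_at)

    best = min(queue, key=effective)
    queue.remove(best)
    return best
```

The linear scan keeps the sketch short; a real scheduler would likely use a heap and re-score periodically.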
Monitor
EARLY
Bare minimum to ensure things are working:
● High level E2E alert
LATER
Invest in monitoring all aspects of the models:
● Detailed KPIs
● Model Drift
● User DSAT
Logging, Dashboards, Alerts
Deep Dive: Model Versioning
Real life problems
● “We used to predict the right X on this document - when/why did it break?”
○ Usually accompanied by an alert or even worse: a user complaint.
● “The model we trained 2 months ago was so much better at Y - we can’t seem
to get the same performance. How do we roll back?”
○ Usually accompanied by a frustrated product manager / quality engineering.
● “I swear I got better results over the weekend for the same experiment, I don’t
know what changed!”
○ Usually accompanied by a confused data scientist.
But first: can you reproduce your model results to the 10th decimal place? If not, STOP!
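A minimal sketch of that check, assuming a toy training loop in pure Python: every random choice flows from one explicit seed, and the run is repeated and compared to 10 decimal places. The `train` and `reproducible` names and the toy model are illustrative assumptions.

```python
import random


def train(seed: int, steps: int = 200) -> float:
    """Toy stochastic training loop; all randomness comes from `seed`."""
    rng = random.Random(seed)  # local RNG, no hidden global state
    xs = [rng.uniform(-1.0, 1.0) for _ in range(50)]
    data = [(x, 3.0 * x + rng.gauss(0.0, 0.1)) for x in xs]
    w = 0.0
    for _ in range(steps):
        x, y = rng.choice(data)
        w -= 0.1 * 2.0 * (w * x - y) * x  # SGD step on squared error
    return w


def reproducible(seed: int, places: int = 10) -> bool:
    """Train twice with the same seed and compare to `places` decimals."""
    return round(train(seed), places) == round(train(seed), places)
```

Real training stacks have more sources of nondeterminism than one RNG, which is consistent with library dependencies and hardware appearing among the things the talk says must be versioned.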
Wait… didn’t we solve this problem a long time ago?
Source control has been used for decades. How is this different?
Versioning ML models shares a lot with code versioning, but it also includes a lot more. Together, the things to version include:
● Code (*)
● Config
● Library dependencies
● Topology
● Training Data
● Training Parameters
● Model State (weights, hyperparameters)
● Hardware
(*) Code is a lot of things in the context of ML models: data prep, libraries, models, featurizers, etc.
What exactly is Versioning for ML models?
L1: Production/Staging slots.
Allows very short-term rollback/rollforward.
L2: Reproducing Inference.
Once you have a trained model, this kind of versioning allows you to deterministically reconstruct it for inference. Allows pinning models for a long time as well as long-term rollback/rollforward.
L3: Reproducing Training.
At any point in time, you can re-train and get the exact same model you had previously trained. This is a much stronger kind of versioning: it enables reproducibility as well as dealing with issues such as training data corruption.
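The first level (production/staging slots) can be sketched in a few lines. This is an assumption-laden illustration, not Lexion's implementation; `ModelSlots` and the version ids are hypothetical.

```python
class ModelSlots:
    """Two named slots, each pointing at a model version id."""

    def __init__(self, production: str, staging: str):
        self.production = production
        self.staging = staging

    def swap(self) -> None:
        """Promote staging to production.

        The previous production version lands in staging, so calling
        swap() a second time rolls back.
        """
        self.production, self.staging = self.staging, self.production
```

Only two versions are ever retained here, which is why this level gives just short-term rollback/rollforward; pinning or restoring older models needs the stronger levels.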
Artefacts that need to be versioned
Artefact: simple example
● Model Hyper Parameters: Size of Layer N
● Featurizer Code: Input feature vector size
● Featurizer Data: Vocab
● Model Code: NN Architecture
● Model Config: Remove Stop Words?
● Model State: Model Weights
● Library Dependencies: PyTorch Version
● Hardware: V100
● Training Config: Early Stopping Criteria
● Training Data: Data + Labels
[The slide also has Inference and Training columns; their per-row marks are not recoverable here.]
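One way to pin these artefacts together is a manifest with a single fingerprint, sketched below. The field names, the `InferenceManifest` class, and the choice of SHA-256 over canonical JSON are all assumptions for illustration; the talk does not say how Lexion stores this.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_bytes(blob: bytes) -> str:
    """Content hash used to identify large artefacts such as weights."""
    return hashlib.sha256(blob).hexdigest()


@dataclass(frozen=True)
class InferenceManifest:
    model_code_version: str
    featurizer_code_version: str
    featurizer_data_sha256: str  # e.g. hash of the vocab file
    model_config: dict           # e.g. {"remove_stop_words": True}
    model_state_sha256: str      # hash of the serialized weights
    library_versions: dict       # e.g. {"torch": "<pinned version>"}
    hardware: str                # e.g. "V100"

    def fingerprint(self) -> str:
        """Stable id: SHA-256 of the manifest as key-sorted JSON."""
        canonical = json.dumps(
            asdict(self), sort_keys=True, separators=(",", ":")
        )
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two manifests with the same contents get the same fingerprint regardless of dict ordering, and changing any artefact changes it. Reproducing training would additionally need the training config and training data in the manifest.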
Remember this pipeline?
[Pipeline diagram. Node labels: Input, OCR, Text, Layout, Entities, Classes, Relations, ..., BL, Structured Data, Output. Many, many models!]
You need to version the aforementioned artefacts for every single node in this graph. That’s a lot of things to version!
Some solutions (that don’t work)
● Let’s snapshot everything in a Docker image and store it forever
> How do you hotfix the model?
● Let’s mark a “stable” production model and not deploy any future “staging” versions till they have been tested enough.
> How do you make “breaking” changes to the code?
● Let’s always support only “latest” version and never commit a new version until we’re sure it’s good.
> How do you iterate quickly?
We evaluated some existing solutions
It’s always better to not reinvent the wheel
It’s a lot of work to move infrastructure
The question is when, not if. Early-stage startups need to ship and sell their product, so it is hard to justify infrastructure plumbing till the flywheel turns.
Instead of a full solution, these investments have paid off:
1. Versioning all model state during packaging
2. Versioning all data artefacts in our data store and making them immutable
3. Versioning all code explicitly by keeping stable interfaces and supporting minor/major version upgrades to model/featurizer code.
4. Pinning major versions of stable dependencies
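Two of these investments lend themselves to a short sketch: immutable, versioned data artefacts and minor/major version checks between code and artefacts. Everything here is an assumption for illustration, including the in-memory `ImmutableArtefactStore`, using the content hash as the version id, and the compatibility rule in `is_compatible`.

```python
import hashlib


class ImmutableArtefactStore:
    """In-memory stand-in for a data store whose entries never change."""

    def __init__(self):
        self._blobs = {}

    def put(self, blob: bytes) -> str:
        key = hashlib.sha256(blob).hexdigest()  # version id = content hash
        self._blobs.setdefault(key, blob)       # identical bytes: no-op
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def is_compatible(code_version: str, artefact_version: str) -> bool:
    """Assumed rule: same major version, and code minor >= artefact minor."""
    c_major, c_minor = (int(p) for p in code_version.split(".")[:2])
    a_major, a_minor = (int(p) for p in artefact_version.split(".")[:2])
    return c_major == a_major and c_minor >= a_minor
```

Because keys are derived from content, an existing version can never be overwritten with different bytes; changed data always gets a new id.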
Remember: we are building a whole user-facing application on top of this, so prioritizing when to invest here is critical.
BTW, all this ML is in addition to…
● Permissions
● Email alerts
● SSO
● End-user annotations
● Custom reporting
● Full text search
● Task management
● Custom fields
● Doc schemas
● APIs
● Integrations
● Bulk export
● Dashboards
● Pretty charts
● Bulk ingestion
● Security
● Audit trail
… building a complete user facing application!
A note on ML technical debt
● Identify when the cost of carrying the debt > the cost of addressing it
● Incorporate cost of ML infrastructure in your business model
● Pick the right kind of technical debt, with a plan to get out
● Model versioning is one of the areas you might want to invest in early
● Getting a great model is just the first step of a long journey. You have to build a product customers love!
Questions?
Learn more at https://lexion.ai (we’re hiring!)
