SlideShare a Scribd company logo
1 of 45
Download to read offline
QuTrack: A Blockchain-based approach to Model
Lifecycle Management
2019 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy
s.Krishnamurthy@neu.edu
• Northeastern University
• QuantUniversity : A NEU IDEAS startup
2
• Model Life-cycle Management challenges
• Approaches
• QuTrack: A Blockchain-based approach to Model Lifecycle
Management
• Demo
Agenda
3
Machine Learning Workflow
Data Scraping/
Ingestion
Data
Exploration
Data Cleansing
and Processing
Feature
Engineering
Model
Evaluation
& Tuning
Model
Selection
Model
Deployment/
Inference
Supervised
Unsupervised
Modeling
Data Engineer, Dev Ops Engineer
Data Scientist/QuantsSoftware/Web Engineer
• AutoML
• Model Validation
• Interpretability
Robotic Process Automation (RPA) (Microservices, Pipelines )
• SW: Web/ Rest API
• HW: GPU, Cloud
• Monitoring
• Regression
• KNN
• Decision Trees
• Naive Bayes
• Neural Networks
• Ensembles
• Clustering
• PCA
• Autoencoder
• RMS
• MAPS
• MAE
• Confusion Matrix
• Precision/Recall
• ROC
• Hyper-parameter
tuning
• Parameter Grids
Risk Management/ Compliance(All stages)
Analysts&
DecisionMakers
5
• Developing ML applications involves:
▫ Heuristics
▫ Best practices (templates)
▫ Lots of experimentation
▫ Many moving pieces
▫ No “right” answer
▫ Error creep: Even small untracked errors can through off results
AI/ML application development => Design of Experiments
6
Source: Sculley et al., 2015 "Hidden Technical Debt in Machine Learning Systems"
Challenges
7
The reproducibility challenge
https://www.nature.com/news/1-500-scientists-lift-the-
lid-on-reproducibility-1.19970
8
• Repeatability (Same team, same experimental setup)
▫ The measurement can be obtained with stated precision by the same team using the
same measurement procedure, the same measuring system, under the same operating
conditions, in the same location on multiple trials. For computational experiments, this
means that a researcher can reliably repeat her own computation.
• Replicability (Different team, same experimental setup)
▫ The measurement can be obtained with stated precision by a different team using the
same measurement procedure, the same measuring system, under the same operating
conditions, in the same or a different location on multiple trials. For computational
experiments, this means that an independent group can obtain the same result using the
author’s own artifacts.
• Reproducibility (Different team, different experimental setup)
▫ The measurement can be obtained with stated precision by a different team, a different
measuring system, in a different location on multiple trials. For computational
experiments, this means that an independent group can obtain the same result using
artifacts which they develop completely independently.
Repeatable or Reproducible or Replicable
https://www.acm.org/publications/policies/artifact-review-badging
9
Many choices
Languages
Frameworks
Platforms
10
Elements of Model Risk Management
11
AI Governance is gaining focus
https://legalinstruments.oecd.org/
en/instruments/OECD-LEGAL-0449
12
AI Governance is gaining focus
https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449
13
NLP pipeline
Data Ingestion
from Edgar
Pre-Processing
Invoking APIs to
label data
Compare APIs
Build a new
model for
sentiment
Analysis
Stage 1 Stage 2 Stage 3 Stage 4 Stage 5
• Amazon Comprehend API
• Google API
• Watson API
• Azure API
14
• Programming environment
• Execution environment
• Hardware specs
• Cloud
• GPU
• Dependencies
• Lineage/Provenance of
individual components
• Model params
• Hyper parameters
• Pipeline specifications
• Model specific
• Tests
• Data versions
Data Model
EnvironmentProcess
Components that needs to be tracked
15
Source: T. van derWeide, O. Smirnov, M. Zielinski, D. Papadopoulos, and T. van Kasteren. Versioned machine learning pipelines for batch experimentation. In ML
Systems, Workshop NIPS 2016, 2016.
Provenance and Lineage of pipelines
16
Schemas proposed
Sebastian Schelter, Joos-Hendrik Boese, Johannes Kirschnick, Thoralf Klein, and Stephan Seufert. Automatically
Tracking Metadata and Provenance of Machine Learning Experiments. NIPS Workshop on Machine Learning
Systems, 2017.
17
Schemas proposed
G. C. Publio, D. Esteves, and H. Zafar, “ML-Schema : Exposing the Semantics of Machine Learning with Schemas
and Ontologies,” in Reproducibility in ML Workshop, ICML’18, 2018.
18
MLFlow
19
DVC
20
GoCD
21
22
I. Altintas, O. Barney, and E. Jaeger-Frank. Provenance collection support in the Kepler scientific workflow
system. In Provenance and annotation of data, pages 118–132.
Current approaches
23
Miao, Hui & Chavan, Amit & Deshpande, Amol. (2016). ProvDB: A System for Lifecycle Management of
Collaborative Analysis Workflows.
Current approaches
24
Related work
Xueping Liang, Sachin Shetty, Deepak Tosh, Charles Kamhoua, Kevin Kwiat, and Laurent Njilla. 2017. ProvChain: A Blockchain-based Data Provenance Architecture in Cloud
Environment with Enhanced Privacy and Availability. In Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid '17). IEEE Press,
Piscataway, NJ, USA, 468-477. DOI: https://doi.org/10.1109/CCGRID.2017.8
Focus on Cloud data
provenance using
Blockchain
25
Related work
Ramachandran, Aravind & Kantarcioglu, Dr. (2017). Using Blockchain and smart contracts for secure data provenance management.
DataProv: Built on top of
Ethereum, the platform
utilizes smart contracts
and open provenance
model (OPM) to record
immutable data trails.
26
Related work
Sarpatwar, Kanthi & Vaculín, Roman & Min, Hong & Su, Gong & Heath, Terry & Ganapavarapu, Giridhar & Dillenberger, Donna. (2019). Towards Enabling Trusted Artificial
Intelligence via Blockchain. 10.1007/978-3-030-17277-0_8.
Trusted AI and
provenance of AI models
27
28
QuSandbox research suite
Model Analytics
Studio
QuSandboxQuTrack
QuResearchHub
Prototype, Iterate and tune
Standardize workflowsProductionize and share
Track experiments
29
The four components that need to be encapsulated for
reproducible pipelines
Code Data
Environment Process
30
QuSandbox
31
32
Model Management Studio
33
• JDF: Job Definition File; A DSL for representing Model Pipelines
• Stage
• Entity
▫ Model
▫ Data
▫ Environment
• Version format
▫ M:m:p -> Major Version: Minor Version: Patch
Terms
34
JDF- DSL
35
QuResearchHub
36
• Ability to track the evolution of experiments
• Snapshot and store the lineage of pipelines
• Ensure auditability and secure access to archived pipelines
• Track experiments and their associated parameters
• Track all aspects of experiments to ensure reproducibility
• For high-impact (regulated, critical) applications, ensure experiment
traces are immutable and verifiable
QuTrack Design requirements
37
Metadata
• Data about the information to be tracked
• Includes version number, timestamps, user information, MD5 of the artifacts
and high-level notes
Data
• Pipelines, custom DSL, standard formats for representing models
• Events (Updates, rollbacks
• JSON, Amazon ION, YAML,
Artifacts
• Model Pickle files, ONYX, COREML, Model params
• Data, blobs etc.
Architecture : What’s tracked ?
38
Blockchain-based:
• QLDB
• Ethereum
Non-Blockchain-based:
• MongoDB
Architectures supported
39
Amazon Quantum Ledger Database (QLDB)
• Fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log.
• SQL-like API
• Cost-effective
Amazon QLDB
40
Amazon QLDB
41
Amazon QLDB
https://aws.amazon.com/qldb/faqs/
42
Demo: Reference App
43
44
• Support for ONYX, CoreML
• Integration with:
▫ MLFlow, DVC, GoCD
• Integration with SCM systems
▫ Github, SVM
• Tracking Back tests
• Push Architecture -> Event-Driven Architecture
• Enriched Analytics
• Roles and Authorization
Future work
Thank You!
If you are interested in trying
out QuSandbox,
Please sign up for updates at:
www.qusandbox.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
45

More Related Content

What's hot

Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
Ian Foster
 
Integrative information management for systems biology
Integrative information management for systems biologyIntegrative information management for systems biology
Integrative information management for systems biology
Neil Swainston
 

What's hot (20)

Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel Visualizing and Clustering Life Science Applications in Parallel 
Visualizing and Clustering Life Science Applications in Parallel 
 
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
Classifying Simulation and Data Intensive Applications and the HPC-Big Data C...
 
51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack51 Use Cases and implications for HPC & Apache Big Data Stack
51 Use Cases and implications for HPC & Apache Big Data Stack
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Flexible Design for Simple Digital Library Tools and Services
Flexible Design for Simple Digital Library Tools and ServicesFlexible Design for Simple Digital Library Tools and Services
Flexible Design for Simple Digital Library Tools and Services
 
2015 presentation
2015 presentation2015 presentation
2015 presentation
 
Data Automation at Light Sources
Data Automation at Light SourcesData Automation at Light Sources
Data Automation at Light Sources
 
NIH Data Commons Architecture Ideas
NIH Data Commons Architecture IdeasNIH Data Commons Architecture Ideas
NIH Data Commons Architecture Ideas
 
Bionimbus - Northwestern CGI Workshop 4-21-2011
Bionimbus - Northwestern CGI Workshop 4-21-2011Bionimbus - Northwestern CGI Workshop 4-21-2011
Bionimbus - Northwestern CGI Workshop 4-21-2011
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
 
Workshop: Introduction to Cytoscape at UT-KBRIN Bioinformatics Summit 2014 (4...
Workshop: Introduction to Cytoscape at UT-KBRIN Bioinformatics Summit 2014 (4...Workshop: Introduction to Cytoscape at UT-KBRIN Bioinformatics Summit 2014 (4...
Workshop: Introduction to Cytoscape at UT-KBRIN Bioinformatics Summit 2014 (4...
 
Hattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesHattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop Slides
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff University
 
Grid optical network service architecture for data intensive applications
Grid optical network service architecture for data intensive applicationsGrid optical network service architecture for data intensive applications
Grid optical network service architecture for data intensive applications
 
Poster iscram 2008
Poster   iscram 2008Poster   iscram 2008
Poster iscram 2008
 
Data mining weka
Data mining wekaData mining weka
Data mining weka
 
Integrative information management for systems biology
Integrative information management for systems biologyIntegrative information management for systems biology
Integrative information management for systems biology
 

Similar to 2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain approach - Srikanth Krishnamurthy, October 7, 2019

FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
Carole Goble
 
Automating the process of continuously prioritising data, updating and deploy...
Automating the process of continuously prioritising data, updating and deploy...Automating the process of continuously prioritising data, updating and deploy...
Automating the process of continuously prioritising data, updating and deploy...
Ola Spjuth
 

Similar to 2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain approach - Srikanth Krishnamurthy, October 7, 2019 (20)

QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
 
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
AnalyticOps: Lessons Learned Moving Machine-Learning Algorithms to Production...
 
Cloud e-Genome: NGS Workflows on the Cloud Using e-Science Central
Cloud e-Genome: NGS Workflows on the Cloud Using e-Science CentralCloud e-Genome: NGS Workflows on the Cloud Using e-Science Central
Cloud e-Genome: NGS Workflows on the Cloud Using e-Science Central
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 Keynote
 
Machine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and ApplicationsMachine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and Applications
 
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習 Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
Azure 機器學習 - 使用Python, R, Spark, CNTK 深度學習
 
Scientific
Scientific Scientific
Scientific
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
C4Bio paper talk
C4Bio paper talkC4Bio paper talk
C4Bio paper talk
 
GenePattern Integration with Globus
GenePattern Integration with GlobusGenePattern Integration with Globus
GenePattern Integration with Globus
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
 
Computing Outside The Box June 2009
Computing Outside The Box June 2009Computing Outside The Box June 2009
Computing Outside The Box June 2009
 
Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data Scientists
 
FAIR Computational Workflows
FAIR Computational WorkflowsFAIR Computational Workflows
FAIR Computational Workflows
 
Automating the process of continuously prioritising data, updating and deploy...
Automating the process of continuously prioritising data, updating and deploy...Automating the process of continuously prioritising data, updating and deploy...
Automating the process of continuously prioritising data, updating and deploy...
 
ADDO Open Source Observability Tools
ADDO Open Source Observability Tools ADDO Open Source Observability Tools
ADDO Open Source Observability Tools
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Guiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineGuiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning Pipeline
 
Advances in Scientific Workflow Environments
Advances in Scientific Workflow EnvironmentsAdvances in Scientific Workflow Environments
Advances in Scientific Workflow Environments
 

More from The Statistical and Applied Mathematical Sciences Institute

Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
The Statistical and Applied Mathematical Sciences Institute
 
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
The Statistical and Applied Mathematical Sciences Institute
 

More from The Statistical and Applied Mathematical Sciences Institute (20)

Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
Causal Inference Opening Workshop - Latent Variable Models, Causal Inference,...
 
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
2019 Fall Series: Special Guest Lecture - 0-1 Phase Transitions in High Dimen...
 
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
Causal Inference Opening Workshop - Causal Discovery in Neuroimaging Data - F...
 
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
Causal Inference Opening Workshop - Smooth Extensions to BART for Heterogeneo...
 
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
Causal Inference Opening Workshop - A Bracketing Relationship between Differe...
 
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
Causal Inference Opening Workshop - Testing Weak Nulls in Matched Observation...
 
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...Causal Inference Opening Workshop - Difference-in-differences: more than meet...
Causal Inference Opening Workshop - Difference-in-differences: more than meet...
 
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
Causal Inference Opening Workshop - New Statistical Learning Methods for Esti...
 
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
Causal Inference Opening Workshop - Bipartite Causal Inference with Interfere...
 
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
Causal Inference Opening Workshop - Bridging the Gap Between Causal Literatur...
 
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
Causal Inference Opening Workshop - Some Applications of Reinforcement Learni...
 
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
Causal Inference Opening Workshop - Bracketing Bounds for Differences-in-Diff...
 
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
Causal Inference Opening Workshop - Assisting the Impact of State Polcies: Br...
 
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
Causal Inference Opening Workshop - Experimenting in Equilibrium - Stefan Wag...
 
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
Causal Inference Opening Workshop - Targeted Learning for Causal Inference Ba...
 
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
Causal Inference Opening Workshop - Bayesian Nonparametric Models for Treatme...
 
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
2019 Fall Series: Special Guest Lecture - Adversarial Risk Analysis of the Ge...
 
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
2019 Fall Series: Professional Development, Writing Academic Papers…What Work...
 
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
2019 GDRR: Blockchain Data Analytics - Machine Learning in/for Blockchain: Fu...
 
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
2019 GDRR: Blockchain Data Analytics - Modeling Cryptocurrency Markets with T...
 

Recently uploaded

Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Sérgio Sacani
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 

Recently uploaded (20)

Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Creating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening DesignsCreating and Analyzing Definitive Screening Designs
Creating and Analyzing Definitive Screening Designs
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 

2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain approach - Srikanth Krishnamurthy, October 7, 2019

  • 1. QuTrack: A Blockchain-based approach to Model Lifecycle Management 2019 Copyright QuantUniversity LLC. Presented By: Sri Krishnamurthy s.Krishnamurthy@neu.edu • Northeastern University • QuantUniversity : A NEU IDEAS startup
  • 2. 2 • Model Life-cycle Management challenges • Approaches • QuTrack: A Blockchain-based approach to Model Lifecycle Management • Demo Agenda
  • 3. 3
  • 4. Machine Learning Workflow Data Scraping/ Ingestion Data Exploration Data Cleansing and Processing Feature Engineering Model Evaluation & Tuning Model Selection Model Deployment/ Inference Supervised Unsupervised Modeling Data Engineer, Dev Ops Engineer Data Scientist/QuantsSoftware/Web Engineer • AutoML • Model Validation • Interpretability Robotic Process Automation (RPA) (Microservices, Pipelines ) • SW: Web/ Rest API • HW: GPU, Cloud • Monitoring • Regression • KNN • Decision Trees • Naive Bayes • Neural Networks • Ensembles • Clustering • PCA • Autoencoder • RMS • MAPS • MAE • Confusion Matrix • Precision/Recall • ROC • Hyper-parameter tuning • Parameter Grids Risk Management/ Compliance(All stages) Analysts& DecisionMakers
  • 5. 5 • Developing ML applications involves: ▫ Heuristics ▫ Best practices (templates) ▫ Lots of experimentation ▫ Many moving pieces ▫ No “right” answer ▫ Error creep: Even small untracked errors can through off results AI/ML application development => Design of Experiments
  • 6. 6 Source: Sculley et al., 2015 "Hidden Technical Debt in Machine Learning Systems" Challenges
  • 8. 8 • Repeatability (Same team, same experimental setup) ▫ The measurement can be obtained with stated precision by the same team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same location on multiple trials. For computational experiments, this means that a researcher can reliably repeat her own computation. • Replicability (Different team, same experimental setup) ▫ The measurement can be obtained with stated precision by a different team using the same measurement procedure, the same measuring system, under the same operating conditions, in the same or a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using the author’s own artifacts. • Reproducibility (Different team, different experimental setup) ▫ The measurement can be obtained with stated precision by a different team, a different measuring system, in a different location on multiple trials. For computational experiments, this means that an independent group can obtain the same result using artifacts which they develop completely independently. Repeatable or Reproducible or Replicable https://www.acm.org/publications/policies/artifact-review-badging
  • 10. 10 Elements of Model Risk Management
  • 11. 11 AI Governance is gaining focus https://legalinstruments.oecd.org/ en/instruments/OECD-LEGAL-0449
  • 12. 12 AI Governance is gaining focus https://legalinstruments.oecd.org/en/instruments/OECD-LEGAL-0449
  • 13. 13 NLP pipeline Data Ingestion from Edgar Pre-Processing Invoking APIs to label data Compare APIs Build a new model for sentiment Analysis Stage 1 Stage 2 Stage 3 Stage 4 Stage 5 • Amazon Comprehend API • Google API • Watson API • Azure API
  • 14. 14 • Programming environment • Execution environment • Hardware specs • Cloud • GPU • Dependencies • Lineage/Provenance of individual components • Model params • Hyper parameters • Pipeline specifications • Model specific • Tests • Data versions Data Model EnvironmentProcess Components that needs to be tracked
  • 15. 15 Source: T. van derWeide, O. Smirnov, M. Zielinski, D. Papadopoulos, and T. van Kasteren. Versioned machine learning pipelines for batch experimentation. In ML Systems, Workshop NIPS 2016, 2016. Provenance and Lineage of pipelines
  • 16. 16 Schemas proposed Sebastian Schelter, Joos-Hendrik Boese, Johannes Kirschnick, Thoralf Klein, and Stephan Seufert. Automatically Tracking Metadata and Provenance of Machine Learning Experiments. NIPS Workshop on Machine Learning Systems, 2017.
  • 17. 17 Schemas proposed G. C. Publio, D. Esteves, and H. Zafar, “ML-Schema : Exposing the Semantics of Machine Learning with Schemas and Ontologies,” in Reproducibility in ML Workshop, ICML’18, 2018.
  • 21. 21
  • 22. 22 I. Altintas, O. Barney, and E. Jaeger-Frank. Provenance collection support in the Kepler scientific workflow system. In Provenance and annotation of data, pages 118–132. Current approaches
  • 23. 23 Miao, Hui & Chavan, Amit & Deshpande, Amol. (2016). ProvDB: A System for Lifecycle Management of Collaborative Analysis Workflows. Current approaches
  • 24. 24 Related work Xueping Liang, Sachin Shetty, Deepak Tosh, Charles Kamhoua, Kevin Kwiat, and Laurent Njilla. 2017. ProvChain: A Blockchain-based Data Provenance Architecture in Cloud Environment with Enhanced Privacy and Availability. In Proceedings of the 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid '17). IEEE Press, Piscataway, NJ, USA, 468-477. DOI: https://doi.org/10.1109/CCGRID.2017.8 Focus on Cloud data provenance using Blockchain
  • 25. 25 Related work Ramachandran, Aravind & Kantarcioglu, Dr. (2017). Using Blockchain and smart contracts for secure data provenance management. DataProv: Built on top of Ethereum, the platform utilizes smart contracts and open provenance model (OPM) to record immutable data trails.
  • 26. 26 Related work Sarpatwar, Kanthi & Vaculín, Roman & Min, Hong & Su, Gong & Heath, Terry & Ganapavarapu, Giridhar & Dillenberger, Donna. (2019). Towards Enabling Trusted Artificial Intelligence via Blockchain. 10.1007/978-3-030-17277-0_8. Trusted AI and provenance of AI models
  • 27. 27
  • 28. 28 QuSandbox research suite Model Analytics Studio QuSandboxQuTrack QuResearchHub Prototype, Iterate and tune Standardize workflowsProductionize and share Track experiments
  • 29. 29 The four components that need to be encapsulated for reproducible pipelines Code Data Environment Process
  • 31. 31
  • 33. 33 • JDF: Job Definition File; A DSL for representing Model Pipelines • Stage • Entity ▫ Model ▫ Data ▫ Environment • Version format ▫ M:m:p -> Major Version: Minor Version: Patch Terms
  • 36. 36 • Ability to track the evolution of experiments • Snapshot and store the lineage of pipelines • Ensure auditability and secure access to archived pipelines • Track experiments and their associated parameters • Track all aspects of experiments to ensure reproducibility • For high-impact (regulated, critical) applications, ensure experiment traces are immutable and verifiable QuTrack Design requirements
  • 37. 37 Metadata • Data about the information to be tracked • Includes version number, timestamps, user information, MD5 of the artifacts and high-level notes Data • Pipelines, custom DSL, standard formats for representing models • Events (Updates, rollbacks • JSON, Amazon ION, YAML, Artifacts • Model Pickle files, ONYX, COREML, Model params • Data, blobs etc. Architecture : What’s tracked ?
  • 39. 39 Amazon Quantum Ledger Database (QLDB) • Fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log. • SQL-like API • Cost-effective Amazon QLDB
  • 43. 43
  • 44. 44 • Support for ONYX, CoreML • Integration with: ▫ MLFlow, DVC, GoCD • Integration with SCM systems ▫ Github, SVM • Tracking Back tests • Push Architecture -> Event-Driven Architecture • Enriched Analytics • Roles and Authorization Future work
  • 45. Thank You! If you are interested in trying out QuSandbox, Please sign up for updates at: www.qusandbox.com Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC. 45