A FAIR Approach to Publishing and Sharing Machine Learning Models

Funding: 2018 Argonne Advanced Computing LDRD
Collaborators: Ryan Chard, Logan Ward, Marcus Schwarting, Kyle Chard, Zhuozhao Li, Anna Woodard, Yadu Babuji, Steve Tuecke, Mike Franklin, Ian Foster
(Names shown in blue on the original slide are also presenting at this workshop)
Data and Learning Hub for Science
https://www.dlhub.org
Ben Blaiszik (blaiszik@uchicago.edu)

Quick Polls
• How many of you have trained a machine learning model?
• How many of you have published papers using machine learning?
• How many of you have tried to reuse models from others?

State of Machine Learning in Science
Highs
• Rapid increase in number of journal publications
• Advances across the scientific domains
• Achievements on par with experts or best-in-class methods in many domains
• Funding agencies are coalescing around ML (AI Initiative etc.)
Chart Source and Method:
https://github.com/blaiszik/ml_publication_charts

State of Machine Learning in Science
Lows
For a given model:
• Where is the code?
• Where are the trained models?
• Where is the training data?
• How can I reproduce these results?
Without all of these pieces, progress is drastically slowed
[Image: location of many ML models after a paper is finished]
GitHub is another location…

FAIR Data Principles
• Findable
• Accessible
• Interoperable
• Reusable
https://www.force11.org/group/fairgroup/fairprinciples
A set of principles to help make data as useful as possible to the community

Findable
• Data have an identifier
• Data are registered in a searchable resource
Accessible
• Data accessible via identifier
• Data retrievable by open protocols

Interoperable
• Data leverage formalized shared vocabularies
• Vocabularies themselves follow FAIR principles
Reusable
• Clear licensing
• Descriptive metadata is sufficient to promote reuse

What Would FAIR Look Like in ML?
(1) Find Interesting Science Paper
• Links to code repository (GitHub/DOI)
• Links to data repository (DOI)
• Publication describes the model and its uses and limitations

(2) Find Code
• Has unique identifier (DOI)
• Links back to publication (DOI)
• Has well-documented code
• Tagged with metadata to aid discovery
• Registered in a search index
• Open license

(3) Find and Run Model
• Model has identifier (DOI)
• Model has links to data (DOI)
• Model has links to the code (DOI/GitHub)
• Model has links to publication (DOI)
• Data are accessible
• Inference runs in the cloud; no installation necessary!
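
As a rough illustration of the checklist above, the links a FAIR model entry should carry can be written down as a small record and checked for gaps. This is only a sketch: the `ModelRecord` class, its field names, the `missing_links` helper, and the example DOI and URL are all hypothetical and are not part of DLHub or any metadata standard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelRecord:
    """Hypothetical record of the links a FAIR model entry would carry."""
    model_doi: Optional[str] = None        # identifier for the model itself
    data_doi: Optional[str] = None         # link to the training data
    code_link: Optional[str] = None        # DOI or GitHub URL for the code
    publication_doi: Optional[str] = None  # link to the paper
    license: Optional[str] = None          # name of an open license


def missing_links(record: ModelRecord) -> List[str]:
    """Return the names of the fields that are still empty, in declaration order."""
    return [name for name, value in vars(record).items() if not value]


# Made-up identifier and URL, for illustration only
record = ModelRecord(model_doi="10.0000/example-model",
                     code_link="https://github.com/example/repo")
print(missing_links(record))
```

A record with every field filled in would return an empty list, which is one simple way to state "all of the pieces are present" for a given model.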

Data and Learning Hub for Science (DLHub)
• Collect, publish, and categorize models and pre/post-processing code
• Operate models as a service to simplify sharing, consumption, and access
• Identify models with unique and persistent identifiers (e.g., DOI)
• Implement versioning, search, access controls, etc.
Goal: Deliver FAIR for ML

DLHub: Key Concepts
• Servables are containers with defined inputs and outputs
• Servables may represent machine learning models or other data transformations
• Outputs can be cached so that repeated inputs are not recomputed
[Diagram: a servable invoked through Run()]
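
The servable idea can be sketched in a few lines: a callable with declared input and output types whose results are cached per input. This is a toy under stated assumptions, not DLHub's implementation: the `Servable` class here, its attributes, and the JSON-based cache key are invented for illustration, and real servables are containers rather than in-process functions.

```python
import json
from typing import Any, Callable, Dict


class Servable:
    """Toy stand-in for a servable: a function with named input/output types."""

    def __init__(self, func: Callable[[Any], Any], input_type: str, output_type: str):
        self.func = func
        self.input_type = input_type
        self.output_type = output_type
        self._cache: Dict[str, Any] = {}
        self.calls = 0  # number of times the wrapped function actually ran

    def run(self, inputs: Any) -> Any:
        # Cache key derived from the input (assumes JSON-serializable inputs)
        key = json.dumps(inputs, sort_keys=True)
        if key not in self._cache:
            self.calls += 1
            self._cache[key] = self.func(inputs)
        return self._cache[key]


double = Servable(lambda xs: [2 * x for x in xs], "list of float", "list of float")
first = double.run([1.0, 2.5])
second = double.run([1.0, 2.5])  # same input: answered from the cache
```

After the two calls, `double.calls` is 1, because the second request never reaches the wrapped function.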

[Diagram: servables chained as Preprocess 1 → Preprocess 2 → Model predict, each invoked through Run()]
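
Chaining servables in the order Preprocess 1, Preprocess 2, Model predict amounts to function composition: each step's output becomes the next step's input. The sketch below assumes that reading; the `run_pipeline` helper, the three step functions, and the made-up "model" weights are hypothetical placeholders, not code from DLHub.

```python
from functools import reduce
from typing import Any, Callable, List, Sequence

Step = Callable[[Any], Any]


def run_pipeline(steps: Sequence[Step], inputs: Any) -> Any:
    """Feed the output of each step into the next one."""
    return reduce(lambda data, step: step(data), steps, inputs)


def preprocess_1(formula: str) -> str:
    # Hypothetical cleanup step: strip surrounding whitespace
    return formula.strip()


def preprocess_2(formula: str) -> List[int]:
    # Toy featurization: string length and number of uppercase letters
    return [len(formula), sum(c.isupper() for c in formula)]


def model_predict(features: List[int]) -> float:
    # Placeholder linear model with made-up weights
    return 0.5 * features[0] - 1.0 * features[1]


result = run_pipeline([preprocess_1, preprocess_2, model_predict], "  NaCl ")
```

Because each step only sees the previous step's output, any one of them can be swapped for a different servable as long as the input and output types still line up.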

Example: Predicting Formation Enthalpy
This is what a user has
This is what a user wants

PUBLISHING A MACHINE LEARNING MODEL

Marking up a Model – Python SDK
Existing Model → User Mark Up with SDK → Send to DLHub (via Globus or HTTPS) → DLHub Containerization → Populate Search Index / Mint Identifiers
• SDK extracts metadata for known model types

Python SDK – Automated Metadata Generation
Citation Metadata (following DataCite)
DLHub Metadata
Servable Metadata
Access Control
• Public
• Globus users
• Globus groups
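
To make the three metadata blocks and the access-control options concrete, here is a minimal sketch of what such a record and a visibility check might look like. The top-level keys mirror the blocks named above, but every field name inside them, the `can_access` function, and the user and group names are assumptions made for illustration; the actual DLHub schema may differ.

```python
from typing import Dict, Iterable

# Hypothetical metadata layout: three blocks, with illustrative field names
metadata: Dict[str, dict] = {
    "datacite": {"title": "Example model", "creators": ["A. Author"]},
    "dlhub": {
        "public": False,
        "visible_to": {"users": {"alice"}, "groups": {"materials-team"}},
    },
    "servable": {"type": "example", "input": "list of float", "output": "float"},
}


def can_access(meta: Dict[str, dict], user: str, user_groups: Iterable[str]) -> bool:
    """Allow access if the servable is public, or the user or one of their groups is listed."""
    dlhub = meta["dlhub"]
    if dlhub["public"]:
        return True
    visible = dlhub["visible_to"]
    return user in visible["users"] or bool(visible["groups"] & set(user_groups))


print(can_access(metadata, "alice", []))
print(can_access(metadata, "bob", ["materials-team"]))
print(can_access(metadata, "bob", []))
```

Keeping citation, hub, and servable information in separate blocks means the citation block can follow an external schema while the other two evolve independently.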

Using DLHub is Easy!
Python SDK
$ pip install dlhub_sdk
(1) Describe
• Specify the model files
• Mark up the model with information to make it discoverable and usable
(2) Publish
• Publish to DLHub
• DLHub service creates containers
• DLHub service creates unique endpoint for servable
Using DLHub is Easy!
(3) Discover
• Discover servables with advanced search capabilities through Python SDK or web UI (under construction)
(4) Run
• Make predictions by sending data to DLHub and specifying the servable to use
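
The four steps (describe, publish, discover, run) can be walked through end to end with a small in-memory stand-in. To be clear, this is not the `dlhub_sdk` API: the `ToyHub` class, its `publish`, `search`, and `run` methods, and the servable name are hypothetical, and the sketch only shows how the steps relate to one another.

```python
from typing import Any, Callable, Dict, List


class ToyHub:
    """In-memory stand-in for a model hub; not the DLHub service or its SDK."""

    def __init__(self) -> None:
        self._servables: Dict[str, Dict[str, Any]] = {}

    def publish(self, name: str, func: Callable[[Any], Any],
                description: Dict[str, Any]) -> str:
        """Store the function with its description and return its name as an identifier."""
        self._servables[name] = {"func": func, "description": description}
        return name

    def search(self, keyword: str) -> List[str]:
        """Return names of servables whose title or tags mention the keyword."""
        keyword = keyword.lower()
        hits = []
        for name, entry in self._servables.items():
            desc = entry["description"]
            text = " ".join([desc.get("title", "")] + desc.get("tags", []))
            if keyword in text.lower():
                hits.append(name)
        return hits

    def run(self, name: str, inputs: Any) -> Any:
        """Invoke the named servable on the given inputs."""
        return self._servables[name]["func"](inputs)


hub = ToyHub()
# (1) Describe: attach information that makes the model discoverable
description = {"title": "Toy bandgap estimator", "tags": ["materials", "bandgap"]}
# (2) Publish: register the model under a unique name
servable_id = hub.publish("example_user/toy_bandgap",
                          lambda xs: [2 * x for x in xs], description)
# (3) Discover: search by keyword
found = hub.search("bandgap")
# (4) Run: send data and name the servable to use
predictions = hub.run(found[0], [10, 25])
```

The point of the split is that the person who runs the model in step (4) needs only the servable's name and some input data, not the model files from step (1).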

Combining DLHub with Data Repositories
Get Data
Run Model
• Using high-throughput optical imaging to predict material bandgap


Model-in-the-Loop Science
Select DLHub Use Cases
• Crystal structure • NIST PFHub
• Models linked to dynamic data sources
Community Model Benchmarking
Automated Model Retraining with New
Data
• Metallic glass discovery [active learning]
• XRD applications
XRD image tagging
(Yager, BNL)
(Ward, ANL/UC)
(Ward, ANL/UC) (Wheeler, Warren, Heinonen
NIST/UC/Argonne/NU)
(Center for Hierarchical Materials
Design NIST/UC/Argonne/NU)
CHiMaD
XRD intensity → structure/phase
(Cherukara Argonne)

More Examples Available In Our Repositories
Cherukara et al.
Energy Storage Tomography X-Ray Science
Ward et al.
TomoGAN
Liu et al.

DLHub Architecture and Performance
• Task Managers (TM) to support execution on various compute resources
• Executors chosen by TM to invoke a given servable
• Caching at TM
• Data staging with Globus
• Batch submissions
• Scalability through deployment of model replicas
https://arxiv.org/abs/1811.11213
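
Three of the ideas listed above (caching at the Task Manager, batch submissions, and scaling through model replicas) can be illustrated with a toy dispatcher. This is a sketch under heavy simplification and not how DLHub is built: the `ToyTaskManager` class, its round-robin policy, and the `repr`-based cache key are assumptions, and the real system places executors and a messaging layer between the Task Manager and the servables.

```python
from itertools import cycle
from typing import Any, Callable, Dict, List


class ToyTaskManager:
    """Routes requests to model replicas in round-robin order and caches results."""

    def __init__(self, model: Callable[[Any], Any], n_replicas: int = 2) -> None:
        self._model = model
        # Each "replica" is the same function; we only count requests per replica
        self.handled: List[int] = [0] * n_replicas
        self._next_replica = cycle(range(n_replicas))
        self._cache: Dict[str, Any] = {}

    def run(self, inputs: Any) -> Any:
        key = repr(inputs)
        if key in self._cache:  # cache hit: no replica is invoked
            return self._cache[key]
        replica = next(self._next_replica)
        self.handled[replica] += 1
        result = self._model(inputs)
        self._cache[key] = result
        return result

    def run_batch(self, batch: List[Any]) -> List[Any]:
        """Batch submission: one call carrying many inputs."""
        return [self.run(item) for item in batch]


tm = ToyTaskManager(lambda x: x * x, n_replicas=2)
outputs = tm.run_batch([1, 2, 3, 2])
```

In this run the repeated input `2` is served from the cache, so only three requests reach the replicas, split two and one between them.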
[Architecture diagram: REST, CLI, and SDK clients; DLHub Management Service; Model Repository; Task Managers connected via zmq; executors (Parsl, TF Serving, SageMaker); servable nodes for model serving]
Ryan Chard Zhuozhao Li

Open Source Opportunities
https://www.dlhub.org
https://github.com/DLHub-Argonne
• Deposit models from the community
• Help build client functionality
• Build examples using existing servables
• Be you!
Contact: Ben Blaiszik (blaiszik@uchicago.edu)

Thanks to our sponsors!
U.S. Department of Energy
ALCF DF
Parsl Globus IMaD
DLHub Argonne
LDRD
