Funding: 2018 Argonne Advanced Computing LDRD
Collaborators: Ryan Chard, Logan Ward, Marcus Schwarting, Kyle Chard, Zhuozhao Li, Anna
Woodard, Yadu Babuji, Steve Tuecke, Mike Franklin, Ian Foster
Blue – also presenting at this workshop
Data and Learning Hub for Science
https://www.dlhub.org
A FAIR Approach to Publishing and
Sharing Machine Learning Models
Ben Blaiszik (blaiszik@uchicago.edu)
Quick Polls
• How many of you have trained a machine learning model?
• How many of you have published papers using machine learning?
• How many of you have tried to reuse models from others?
State of Machine Learning in Science
Highs
• Rapid increase in number of
journal publications
• Advances across the scientific
domains
• Achievements on par with experts
or best-in-class methods in many
domains
• Funding agencies are coalescing
around ML (AI Initiative etc.)
Chart Source and Method:
https://github.com/blaiszik/ml_publication_charts
State of Machine Learning in Science
Lows
For a given model:
• Where is the code?
• Where are the trained models?
• Where is the training data?
• How can I reproduce these
results?
Without all of these pieces,
progress is drastically slowed
Location of many ML models after a
paper is finished
GitHub is another location…
FAIR Data Principles
• Findable
• Accessible
• Interoperable
• Reusable
https://www.force11.org/group/fairgroup/fairprinciples
Set of principles to help make data as
useful as possible to the community
FAIR Data Principles
Findable
• Data have an identifier
• Data are registered in a searchable resource
Accessible
• Data accessible via identifier
• Data retrievable by open protocols
FAIR Data Principles
Interoperable
• Data leverage formalized shared vocabularies
• Vocabularies themselves follow FAIR principles
Reusable
• Clear licensing
• Descriptive metadata is sufficient to promote
reuse
What Would FAIR Look Like in ML?
(1) Find Interesting Science Paper
• Links to code repository
(GitHub/DOI)
• Links to data repository (DOI)
• Publication describes the model
and its uses and limitations
What Would FAIR Look Like in ML?
(2) Find Code
• Has unique identifier (DOI)
• Links back to publication
(DOI)
• Has well-documented code
• Tagged with metadata to aid
discovery
• Registered in a search index
• Open license
What Would FAIR Look Like in ML?
(3) Find and Run Model
• Model has identifier (DOI)
• Model has links to data (DOI)
• Model has links to the code
(DOI/GitHub)
• Model has links to publication
(DOI)
• Data are accessible
• Inference run from the cloud - no
installation necessary!
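The checklist above can be sketched as a small record that tracks which links a published model carries. This is only an illustration: the `ModelRecord` class, its field names, and the placeholder DOI and URL are assumptions made for this example, not part of DLHub or any FAIR specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelRecord:
    """Hypothetical record of the links a FAIR model would carry."""
    model_doi: Optional[str] = None
    data_doi: Optional[str] = None
    code_url: Optional[str] = None
    publication_doi: Optional[str] = None
    license: Optional[str] = None


def missing_fair_links(record: ModelRecord) -> List[str]:
    """Return the names of checklist fields that are still empty."""
    return [name for name, value in vars(record).items() if not value]


record = ModelRecord(
    model_doi="10.0000/example-model",    # placeholder, not a real DOI
    code_url="https://example.org/code",  # placeholder URL
)
print(missing_fair_links(record))
```

A record with every field filled returns an empty list, which is one simple way to flag a model that is ready to be registered.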
DATA AND LEARNING HUB FOR
SCIENCE (DLHUB)
Goal: Deliver FAIR for ML
• Collect, publish, categorize models and pre/post processing code
• Operate models as a service to simplify sharing, consumption, and
access
• Identify models with unique and persistent identifiers (e.g., DOI)
• Implement versioning, search, access controls etc.
DLHub: Key Concepts
Run()
• Servables are containers with defined
inputs and outputs
• Servables may represent machine
learning models or other data
transformations
• Outputs can be cached for inputs
Preprocess 1: Run() → Preprocess 2: Run() → Model predict: Run()
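The servable idea (defined inputs and outputs, chaining of transformations, cached outputs) can be sketched in plain Python. The `Servable` class and `run_pipeline` helper below are hypothetical names for this illustration. Real DLHub servables are containers, and this sketch stands in for them with ordinary functions.

```python
from typing import Any, Callable, Dict, Hashable, List


class Servable:
    """Illustrative stand-in for a servable: a named function with a cache."""

    def __init__(self, name: str, func: Callable[[Any], Any]) -> None:
        self.name = name
        self.func = func
        self.cache: Dict[Hashable, Any] = {}
        self.calls = 0  # number of real (non-cached) invocations

    def run(self, value: Hashable) -> Any:
        # Compute only when this input has not been seen before.
        if value not in self.cache:
            self.calls += 1
            self.cache[value] = self.func(value)
        return self.cache[value]


def run_pipeline(steps: List[Servable], value: Hashable) -> Any:
    """Feed the output of each step into the next one."""
    for step in steps:
        value = step.run(value)
    return value


# Toy steps: two preprocessing transformations and a stand-in "model".
strip = Servable("preprocess_1", lambda s: s.strip())
lower = Servable("preprocess_2", lambda s: s.lower())
length = Servable("model_predict", lambda s: len(s))

pipeline = [strip, lower, length]
first = run_pipeline(pipeline, "  NaCl ")
second = run_pipeline(pipeline, "  NaCl ")  # answered from the caches
print(first, second, strip.calls, lower.calls, length.calls)
```

Keying the cache on the input value only works for hashable inputs. A real service would more likely key on a hash of the serialized input.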
Example: Predicting Formation Enthalpy
This is what a user has
This is what a user wants
PUBLISHING A MACHINE LEARNING MODEL
Marking up a Model – Python SDK
Existing Model → User Mark Up with SDK → Send to DLHub (via Globus or HTTPS) → DLHub: Containerization; Populate Search Index / Mint Identifiers
SDK Extracts Metadata for Known Model Types
Python SDK – Automated Metadata Generation
• Citation Metadata (following DataCite)
• DLHub Metadata
• Servable Metadata
Access Control
• Public
• Globus users
• Globus groups
Using DLHub is Easy!
Python SDK
$ pip install dlhub_sdk
1 Describe
• Specify the model files
• Mark up the model with
information to make it
discoverable and usable
2 Publish
• Publish to DLHub
• DLHub service creates
containers
• DLHub service creates unique
endpoint for servable
3 Discover
• Discover servables with
advanced search capabilities
through Python SDK or web
UI (under construction)
4 Run
• Make predictions by sending
data to DLHub and
specifying the servable to
use
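The four steps (describe, publish, discover, run) can be walked through with a toy in-memory hub. Note that this does not use the dlhub_sdk API: `ToyHub`, its methods, and the metadata keys are hypothetical names chosen for this sketch, and the "model" is a trivial function.

```python
from typing import Any, Callable, Dict, List


class ToyHub:
    """In-memory stand-in for a model hub; not the DLHub service or its SDK."""

    def __init__(self) -> None:
        self._servables: Dict[str, Dict[str, Any]] = {}

    def publish(self, owner: str, metadata: Dict[str, Any],
                func: Callable[[Any], Any]) -> str:
        """Store a described model and return its endpoint-like name."""
        name = f"{owner}/{metadata['name']}"
        self._servables[name] = {"metadata": metadata, "func": func}
        return name

    def search(self, keyword: str) -> List[str]:
        """Return names of servables whose title or domain mentions keyword."""
        keyword = keyword.lower()
        hits = []
        for name, entry in self._servables.items():
            meta = entry["metadata"]
            text = f"{meta.get('title', '')} {meta.get('domain', '')}".lower()
            if keyword in text:
                hits.append(name)
        return sorted(hits)

    def run(self, name: str, inputs: Any) -> Any:
        """Invoke the named servable on the given inputs."""
        return self._servables[name]["func"](inputs)


# 1. Describe: metadata that makes the model discoverable and usable.
metadata = {
    "name": "atom_counter",
    "title": "Count atoms in a list of symbols",
    "domain": "materials science",
}
# 2. Publish: register the model and get back a unique name.
hub = ToyHub()
name = hub.publish("example_user", metadata, lambda atoms: len(atoms))
# 3. Discover: search by keyword.
found = hub.search("materials")
# 4. Run: send data and specify the servable to use.
result = hub.run(found[0], ["Na", "Cl"])
print(name, found, result)
```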
NEXT STEPS
Combining DLHub with Data Repositories
Get Data
Run Model
• Using high-throughput optical
imaging to predict material
bandgap
Model-in-the-Loop Science
Select DLHub Use Cases
• Crystal structure • NIST PFHub
• Models linked to dynamic data sources
Community Model Benchmarking
Automated Model Retraining with New
Data
• Metallic glass discovery [active learning]
• XRD applications
XRD image tagging
(Yager, BNL)
(Ward, ANL/UC)
(Ward, ANL/UC) (Wheeler, Warren, Heinonen
NIST/UC/Argonne/NU)
(Center for Hierarchical Materials
Design NIST/UC/Argonne/NU)
CHiMaD
XRD intensity → structure/phase
(Cherukara Argonne)
More Examples Available In Our Repositories
Cherukara et al.
Energy Storage Tomography X-Ray Science
Ward et al.
TomoGAN
Liu et al.
DLHub Architecture and Performance
• Task Managers (TM) to support
execution on various compute
resources
• Executors chosen by TM to invoke a
given servable
• Caching at TM
• Data staging with Globus
• Batch submissions
• Scalability through deployment of
model replicas
https://arxiv.org/abs/1811.11213
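Two of the ideas listed above, batch submissions and scaling through model replicas, can be illustrated with the Python standard library. This is only an analogy under stated assumptions: worker threads stand in for model replicas, `toy_model` is a placeholder function, and `make_batches` and `run_batched` are hypothetical helpers, not DLHub components.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def make_batches(items: Sequence[float], batch_size: int) -> List[List[float]]:
    """Split the inputs into consecutive batches of at most batch_size."""
    return [list(items[i:i + batch_size])
            for i in range(0, len(items), batch_size)]


def run_batched(model: Callable[[List[float]], List[float]],
                items: Sequence[float],
                batch_size: int,
                replicas: int) -> List[float]:
    """Send each batch to one of `replicas` workers and flatten the results."""
    batches = make_batches(items, batch_size)
    with ThreadPoolExecutor(max_workers=replicas) as pool:
        # pool.map returns results in the order the batches were submitted.
        results = list(pool.map(model, batches))
    return [y for batch in results for y in batch]


def toy_model(batch: List[float]) -> List[float]:
    """Placeholder for a model invocation on one batch."""
    return [2.0 * x for x in batch]


outputs = run_batched(toy_model, [1.0, 2.0, 3.0, 4.0, 5.0],
                      batch_size=2, replicas=3)
print(outputs)
```

Batching amortizes per-request overhead across several inputs, while additional replicas let independent batches be processed at the same time.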
[Architecture diagram. Labels: CLI, SDK, REST, DLHub Management Service, Model Repository, zmq, Task Manager, Executor, TF Serving, SageMaker, Parsl, Model Serving, Servable, Node, Key]
Ryan Chard Zhuozhao Li
Open Source Opportunities
https://www.dlhub.org
https://github.com/DLHub-Argonne
• Deposit models from the community
• Help build client functionality
• Build examples using existing servables
• Be you!
Contact: Ben Blaiszik (blaiszik@uchicago.edu)
Thanks to our sponsors!
U.S. DEPARTMENT OF
ENERGY
ALCF DF
Parsl Globus IMaD
DLHub Argonne
LDRD

A FAIR Approach to Publishing and Sharing Machine Learning Models

  • 1. Funding: 2018 Argonne Advanced Computing LDRD Collaborators: Ryan Chard, Logan Ward, Marcus Schwarting, Kyle Chard, Zhuozhao Li, Anna Woodard, Yadu Babuji, Steve Tuecke, Mike Franklin, Ian Foster Blue – also presenting at this workshop Data and Learning Hub for Science https://www.dlhub.org A FAIR Approach to Publishing and Sharing Machine Learning Models Ben Blaiszik (blaiszik@uchicago.edu)
  • 2. Quick Polls • How many of you have trained a machine learning model? • How many of you have published papers using machine learning? • How many of you have tried to reuse models from others?
  • 3. State of Machine Learning in Science Highs • Rapid increase in number of journal publications • Advances across the scientific domains • Achievements on par with experts or best-in-class methods in many domains • Funding agencies are coalescing around ML (AI Initiative etc.) Chart Source and Method: https://github.com/blaiszik/ml_publication_charts
  • 4. State of Machine Learning in Science Lows For a given model: • Where is the code? • Where are the trained models? • Where is the training data? • How can I reproduce these results? Without all of these pieces, progress is drastically slowed Location of many ML models after a paper is finished GitHub is another location…
  • 5. FAIR Data Principles • Findable • Accessible • Interoperable • Reusable https://www.force11.org/group/fairgroup/fairprinciples Set of principles to help make data as useful as possible to the community
  • 6. FAIR Data Principles Findable • Data have an identifier • Data are registered in a searchable resource Accessible • Data accessible via identifier • Data retrievable by open protocols
  • 7. FAIR Data Principles Interoperable • Data leverage formalized shared vocabularies • Vocabularies themselves follow FAIR principles Reusable • Clear licensing • Descriptive metadata is sufficient to promote reuse
  • 8. What Would FAIR Look Like in ML? (1) Find Interesting Science Paper • Links to code repository (GitHub/DOI) • Links to data repository (DOI) • Publication describes the model and its uses and limitations
  • 9. What Would FAIR Look Like in ML? (2) Find Code • Has unique identifier (DOI) • Links back to publication (DOI) • Has well-documented code • Tagged with metadata to aid discovery • Registered in a search index • Open license
  • 10. What Would FAIR Look Like in ML? (3) Find and Run Model • Model has identifier (DOI) • Model has links to data (DOI) • Model has links to the code (DOI/GitHub) • Model has links to publication (DOI) • Data are accessible • Inference run from the cloud - no installation necessary!
  • 11. Data and Learning Hub for Science (DLHub) Goal: Deliver FAIR for ML • Collect, publish, and categorize models and pre/post-processing code • Operate models as a service to simplify sharing, consumption, and access • Identify models with unique and persistent identifiers (e.g., DOI) • Implement versioning, search, access controls, etc.
  • 12. DLHub: Key Concepts • Servables are containers with defined inputs and outputs • Servables may represent machine learning models or other data transformations • Outputs can be cached for given inputs Diagram: Run()
  • 13. DLHub: Key Concepts • Servables are containers with defined inputs and outputs • Servables may represent machine learning models or other data transformations • Outputs can be cached for given inputs Diagram: Preprocess 1 Run() → Preprocess 2 Run() → Model predict Run()
  • 14. Example: Predicting Formation Enthalpy This is what a user has This is what a user wants
  • 15. Example: Predicting Formation Enthalpy This is what a user has This is what a user wants
  • 16. PUBLISHING A MACHINE LEARNING MODEL
  • 17. Marking up a Model – Python SDK Existing Model → User Mark Up with SDK → Send to DLHub (via Globus or HTTPS) → DLHub Containerization → Populate Search Index / Mint Identifiers. SDK Extracts Metadata for Known Model Types
  • 18. Python SDK – Automated Metadata Generation Citation Metadata Following DataCite DLHub Metadata Servable Metadata Access Control • Public • Globus users • Globus groups
  • 19. Using DLHub is Easy! Python SDK: $ pip install dlhub_sdk (1) Describe • Specify the model files • Mark up the model with information to make it discoverable and usable (2) Publish • Publish to DLHub • DLHub service creates containers • DLHub service creates a unique endpoint for the servable
  • 20. Using DLHub is Easy! (3) Discover • Discover servables with advanced search capabilities through the Python SDK or web UI (under construction) (4) Run • Make predictions by sending data to DLHub and specifying the servable to use
  • 22. Combining DLHub with Data Repositories Get Data Run Model • Using high-throughput optical imaging to predict material bandgap
  • 23. Combining DLHub with Data Repositories Get Data Run Model
  • 24. Model-in-the-Loop Science Select DLHub Use Cases • Crystal structure • NIST PFHub • Models linked to dynamic data sources Community Model Benchmarking Automated Model Retraining with New Data • Metallic glass discovery [active learning] • XRD applications XRD image tagging (Yager, BNL) (Ward, ANL/UC) (Ward, ANL/UC) (Wheeler, Warren, Heinonen NIST/UC/Argonne/NU) (Center for Hierarchical Materials Design NIST/UC/Argonne/NU) CHiMaD XRD intensity → structure/phase (Cherukara, Argonne)
  • 25. More Examples Available In Our Repositories Cherukara et al. Energy Storage Tomography X-Ray Science Ward et al. TomoGAN Liu et al.
  • 26. DLHub Architecture and Performance • Task Managers (TM) support execution on various compute resources • Executors are chosen by the TM to invoke a given servable • Caching at the TM • Data staging with Globus • Batch submissions • Scalability through deployment of model replicas https://arxiv.org/abs/1811.11213 Diagram labels: REST, CLI, SDK, DLHub Management Service, Model Repository, zmq, Task Manager, Executor, Parsl, TF Serving, SageMaker, Servable Node, Model Serving, Key. Ryan Chard, Zhuozhao Li
  • 27. Open Source Opportunities https://www.dlhub.org https://github.com/DLHub-Argonne • Deposit models from the community • Help build client functionality • Build examples using existing servables • Be you! Contact: Ben Blaiszik (blaiszik@uchicago.edu)
  • 28. Thanks to our sponsors! U.S. DEPARTMENT OF ENERGY ALCF DF Parsl Globus IMaD DLHub Argonne LDRD
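
The servable idea from slides 12 and 13 (containers with defined inputs and outputs, chained as Preprocess 1, Preprocess 2, Model predict, with outputs cached for inputs) can be illustrated with a small plain-Python sketch. This is not the DLHub SDK or its API: the `Servable` class, `run_pipeline`, and the toy featurizer and "model" below are all hypothetical names invented for illustration, and a real servable would run inside a container rather than as a local function.

```python
from typing import Any, Callable, Dict, Hashable, List


class Servable:
    """Hypothetical stand-in for a servable: a named function whose outputs are cached."""

    def __init__(self, name: str, func: Callable[[Any], Any]) -> None:
        self.name = name
        self._func = func
        self._cache: Dict[Hashable, Any] = {}
        self.calls = 0  # how many times the wrapped function actually executed

    def run(self, inputs: Hashable) -> Any:
        # Execute only on a cache miss; repeated inputs reuse the stored output.
        if inputs not in self._cache:
            self._cache[inputs] = self._func(inputs)
            self.calls += 1
        return self._cache[inputs]


def run_pipeline(steps: List[Servable], inputs: Hashable) -> Any:
    """Feed the output of each servable into the next one."""
    value = inputs
    for step in steps:
        value = step.run(value)
    return value


# Toy stand-ins for "Preprocess 1", "Preprocess 2", and "Model predict".
# The arithmetic is arbitrary and has no physical meaning.
parse = Servable("parse_composition", lambda s: tuple(sorted(s.split("-"))))
featurize = Servable("featurize", lambda elems: (len(elems), sum(len(e) for e in elems)))
predict = Servable("predict", lambda feats: 0.5 * feats[0] - 0.1 * feats[1])

pipeline = [parse, featurize, predict]
first = run_pipeline(pipeline, "Na-Cl")
second = run_pipeline(pipeline, "Na-Cl")  # every step is answered from its cache
```

Keeping each step as its own cached unit means a preprocessing servable can be shared by several models, and a repeated request never re-runs any step it has already seen.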