SlideShare uma empresa Scribd logo
1 de 37
Baixar para ler offline
Open Source Tools for Materials Informatics
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
MRS Fall Meeting 2019
Slides (already) posted to hackingmaterials.lbl.gov
Staffing interdisciplinary research
Machine learningMaterials Science
I find a recurring dilemma and asymmetry in
staffing materials informatics research
Materials Informatics
3
Who has a tougher job to get started?
MS&E major CS major
• Already has background in the
material science aspects of the
project
• But needs to learn the
machine learning and
software engineering aspects
• Already has background in
software engineering and
appropriate machine learning
• But needs to learn the
materials science aspects
4
MS&E major CS major
My experience is that the
CS major typically has the
tougher road ahead of
them
Who has a tougher job to get started?
5
MS&E major CS major
My experience is that the
CS major typically has the
tougher road ahead of
them
Who has a tougher job to get started?
easier to pick up / self-learn
random forests & neural networks
than
phase diagrams & crystal structures
6
There is an asymmetry in resources available
MS&E major CS major
• Hands-on code and examples to
run and modify
• Hundreds of Youtube videos
and online courses
• Code reviews from collaborators
• And the standard books, etc.
• Books and research articles
• Conversations with colleagues,
impromptu lectures
• Practice problems? Worked
examples? Interactive code?
Outline
7
①Matminer: data and descriptors for
producing ML structure-property
relationships
② Matscholar – applying natural language
processing to materials science information
retrieval
8
How can we make it easy to develop and test ML models for
composition-structure-property relationships?
How can we quickly
represent chemistry and
structure as vectors?
How do we get
labeled training
/test data?
How do we know
if our ML model is
extraordinary?
9
How can we make it easy to develop and test ML models for
composition-structure-property relationships?
How can we quickly
represent chemistry and
structure as vectors?
>60 featurizer classes can
generate thousands of potential
descriptors that are described in
the literature
10
Matminer contains a library of descriptors for various
materials science entities
feat = EwaldEnergy([options])
y = feat.featurize([input_data])
• compatible with scikit-
learn pipelining
• automatically deploy
multiprocessing to
parallelize over data
• include citations to
methodology papers
11
How can we make it easy to develop and test ML models for
composition-structure-property relationships?
How do we get
labeled training
/test data?
• Typically, a lot of attention is given to advanced
algorithms for machine learning
– e.g., deep neural networks versus standard ML
• But perhaps there is not enough emphasis on
developing the appropriate data sets
– with enough information to train ML algorithms
– with sufficient data quality
– easy enough for anyone to at least get started without
specialized knowledge
12
What about data?
The importance of data
13
https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-
research-and-possibly-the-world/
14
What is ImageNet?
The ImageNet data
set collected and
hand-labeled (e.g.,
via Amazon
Mechanical Turk).
The latest version
has over 14 million
hand-annotated
images, organized
into ~20,000
categories
How data stimulates new algorithms
15
How data stimulates new algorithms
16
How can we create an
ImageNet for materials
science?
• We want a test set that contains a diverse array
of problems
– Smaller data versus larger data
– Different applications (electronic, mechanical, etc.)
– Composition-only or structure information available
– Classification or regression
• We also want a cross-validation metric that gives
reliable error estimates
– i.e., less dependent on specific choice of splits
17
An “ImageNet” for materials science
18
Overview of Matbench test set
Target Property Data Source Samples Method
Bulk Modulus Materials Project 10,987 DFT-GGA
Shear Modulus Materials Project 10,987 DFT-GGA
Band Gap Materials Project 106,113 DFT-GGA
Metallicity Materials Project 106,113 DFT-GGA
Band Gap Zhuo et al. [1] 6,354 Experiment
Metallicity Zhuo et al. [1] 6,354 Experiment
Bulk Metallic Glass formation Landolt -Bornstein 7,190 Experiment
Refractive index Materials Project 4,764 DFPT-GGA
Formation Energy Materials Project 132,752 DFT-GGA
Perovskite Formation Energy Castelli et al [2] 18,928 DFT-GGA
Freq. at Last Phonon PhDOS Peak Materials Project 1,296 DFPT-GGA
Exfoliation Energy JARVIS-2D 636 DFT-vDW-DF
Steel yield strength Citrine Informatics 312 Experiment
1. doi.org/10.1021/acs.jpclett.8b00124 2. doi.org/10.1039/C2EE22341D
<1K
1K-10K10K-100K
>100K
19
Diversity of benchmark suite
mechanical
electronic
stability
optical
thermal
classification
regression
experiment
(composition
only)
DFT
(structure)
application data size
problem
type
data type
20
How can we make it easy to develop and test ML models for
composition-structure-property relationships?
How do we know
if our ML model is
extraordinary?
21
How about a benchmark algorithm?
Automatminer is a ”black box” machine learning model
Give it any data set with either composition or structure inputs, and
automatminer will train an ML model (no researcher intervention)
22
Automatminer develops an ML model automatically given
raw data (structures or compositions plus output properties)
Featurizer
MagPie
SOAP
Sine Coulomb Matrix
+ many, many more
• Dropping
features with
many errors
• Missing value
imputation
• One-hot
encoding
• PCA-based
• Correlation
• Model-
based (tree)
Uses genetic
algorithms to find
the best machine
learning model +
hyperparameters
23
Can actually do apple—to-apples competition between
algorithms
24
If we can get a well-established “benchmark”, perhaps
interdisciplinary teams can start hammering on accuracy
Today
5years
10years
A lower barrier to entry
in the field means more
ideas can be tested from
more researchers
Matbenchtestset
averageerror
25
Matminer, matbench, and automatminer can all be
accessed, used, and modified by anyone
Code / Examples all on Github
• github.com/hackingmaterials/matminer
• github.com/hackingmaterials/matminer_examples
• github.com/hackingmaterials/automatminer
Matbench data on Figshare
• (coming soon, still finalizing)
Free support via Discourse
• https://discuss.matsci.org
Outline
26
① Matminer: data and descriptors for producing
ML structure-property relationships
②Matscholar – applying natural language
processing to materials science information
retrieval
We have extracted ~2
million abstracts of
relevant scientific
articles
We use natural
language processing
algorithms to try to
extract knowledge from
all this data
27
Goal: collect and organize knowledge embedded in the
materials science literature
28
We’ve developed algorithms to automatically tag keywords
in the abstracts
29
Application: a revised materials search engine
Auto-generated summaries of materials based on text mining
30
Application: materials compositions of interest …
A search for thermoelectrics that do not have Pb or Bi
• How do we get more people
benefitting from this work
and involved in improving it?
• One solution - expose an
easy-to-use web frontend,
with links to all the backend
codes in case people want to
dive further
– New tools like Plotly Dash
make this easier than ever
31
Using a web site as a “gateway” into the algorithms
frontend
backend
32
https://www.matscholar.com – demo 1
33
https://www.matscholar.com – demo 2
34
Matscholar MRS!
https://matscholar-mrs.herokuapp.com
35
Hopefully these frontend demos get you interested enough
to check the “About page”
• We need more resources to help computer
scientists learn about materials science topics
through hands-on examples and interactive demos
• Some things that can help:
– Open-source implementations of materials science
methods
– Interactive examples (e.g., Jupyter)
– Documentation and support(!)
– Labeled data sets
– Front-ends for easy exploration
36
Concluding thoughts
37
Funding acknowledgements
Slides (already) posted to hackingmaterials.lbl.gov
• Matminer
– U.S. Department of Energy, Materials Science Division
• Matscholar
– Toyota Research Institutes

Mais conteúdo relacionado

Mais procurados

Machine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methodsMachine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methodsAnubhav Jain
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Anubhav Jain
 
Inventory management and its effects on working capital”
Inventory management and its effects on working capital”Inventory management and its effects on working capital”
Inventory management and its effects on working capital”Babasab Patil
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...BrianDeCost
 
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...aimsnist
 
Materials Informatics Overview
Materials Informatics OverviewMaterials Informatics Overview
Materials Informatics OverviewTony Fast
 
Computational methods applied to materials modeling
Computational methods applied to materials modelingComputational methods applied to materials modeling
Computational methods applied to materials modelingcippo1987Ita
 
Graphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials ScienceGraphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials Scienceaimsnist
 
Global mlcc ceramic powder market report 2017
Global mlcc ceramic powder market report 2017Global mlcc ceramic powder market report 2017
Global mlcc ceramic powder market report 2017威 李
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryIan Foster
 
High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...Anubhav Jain
 
Graphene and its applications in electrical devices(batteries ,
Graphene  and its applications in electrical devices(batteries ,Graphene  and its applications in electrical devices(batteries ,
Graphene and its applications in electrical devices(batteries ,mohdabuzar5
 
Cornell Computational Chemistry Seminar
Cornell Computational Chemistry SeminarCornell Computational Chemistry Seminar
Cornell Computational Chemistry SeminarGeorge Fitzgerald
 
Computational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methodsComputational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methodsAnubhav Jain
 
Topological flat bands without magic angles in massive twisted bilayer graphe...
Topological flat bands without magic angles in massive twisted bilayer graphe...Topological flat bands without magic angles in massive twisted bilayer graphe...
Topological flat bands without magic angles in massive twisted bilayer graphe...JAVVAJI SRIVANI
 
HYBRID COMPOSITES- A CONCEPT OF ECOLOGICAL, BIO INSPIRED AND SYNERGISTIC STRE...
HYBRID COMPOSITES- A CONCEPT OF ECOLOGICAL, BIO INSPIRED AND SYNERGISTIC STRE...HYBRID COMPOSITES- A CONCEPT OF ECOLOGICAL, BIO INSPIRED AND SYNERGISTIC STRE...
HYBRID COMPOSITES- A CONCEPT OF ECOLOGICAL, BIO INSPIRED AND SYNERGISTIC STRE...IAEME Publication
 

Mais procurados (20)

Machine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methodsMachine learning for materials design: opportunities, challenges, and methods
Machine learning for materials design: opportunities, challenges, and methods
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
Inventory management and its effects on working capital”
Inventory management and its effects on working capital”Inventory management and its effects on working capital”
Inventory management and its effects on working capital”
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...
 
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
“Materials Informatics and Big Data: Realization of 4th Paradigm of Science i...
 
Materials Informatics Overview
Materials Informatics OverviewMaterials Informatics Overview
Materials Informatics Overview
 
Computational methods applied to materials modeling
Computational methods applied to materials modelingComputational methods applied to materials modeling
Computational methods applied to materials modeling
 
Graphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials ScienceGraphs, Environments, and Machine Learning for Materials Science
Graphs, Environments, and Machine Learning for Materials Science
 
Global mlcc ceramic powder market report 2017
Global mlcc ceramic powder market report 2017Global mlcc ceramic powder market report 2017
Global mlcc ceramic powder market report 2017
 
AI at Scale for Materials and Chemistry
AI at Scale for Materials and ChemistryAI at Scale for Materials and Chemistry
AI at Scale for Materials and Chemistry
 
High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...
 
A REVIEW ON FERRITES – ITS CLASSIFICATION AND APPLICATIONS
A REVIEW ON FERRITES – ITS CLASSIFICATION AND APPLICATIONSA REVIEW ON FERRITES – ITS CLASSIFICATION AND APPLICATIONS
A REVIEW ON FERRITES – ITS CLASSIFICATION AND APPLICATIONS
 
Graphene and its applications in electrical devices(batteries ,
Graphene  and its applications in electrical devices(batteries ,Graphene  and its applications in electrical devices(batteries ,
Graphene and its applications in electrical devices(batteries ,
 
M.Sc. Thesis Presentation
M.Sc. Thesis PresentationM.Sc. Thesis Presentation
M.Sc. Thesis Presentation
 
Cornell Computational Chemistry Seminar
Cornell Computational Chemistry SeminarCornell Computational Chemistry Seminar
Cornell Computational Chemistry Seminar
 
Abhishek presentation
Abhishek presentationAbhishek presentation
Abhishek presentation
 
Bhushan steel limited
Bhushan steel limitedBhushan steel limited
Bhushan steel limited
 
Computational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methodsComputational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methods
 
Topological flat bands without magic angles in massive twisted bilayer graphe...
Topological flat bands without magic angles in massive twisted bilayer graphe...Topological flat bands without magic angles in massive twisted bilayer graphe...
Topological flat bands without magic angles in massive twisted bilayer graphe...
 
HYBRID COMPOSITES- A CONCEPT OF ECOLOGICAL, BIO INSPIRED AND SYNERGISTIC STRE...
HYBRID COMPOSITES- A CONCEPT OF ECOLOGICAL, BIO INSPIRED AND SYNERGISTIC STRE...HYBRID COMPOSITES- A CONCEPT OF ECOLOGICAL, BIO INSPIRED AND SYNERGISTIC STRE...
HYBRID COMPOSITES- A CONCEPT OF ECOLOGICAL, BIO INSPIRED AND SYNERGISTIC STRE...
 

Semelhante a Open Source Tools for Materials Informatics

Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Anubhav Jain
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningAnubhav Jain
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAnubhav Jain
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Anubhav Jain
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 
Hattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesHattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesJason Hattrick-Simpers
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Anubhav Jain
 
Physics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learningPhysics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learningKAMAL CHOUDHARY
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...IEEEFINALYEARSTUDENTPROJECT
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...IEEEMEMTECHSTUDENTSPROJECTS
 
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...IEEEFINALYEARSTUDENTPROJECTS
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructureAnubhav Jain
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Anubhav Jain
 
2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML model2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML modelaimsnist
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...Anubhav Jain
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsAnubhav Jain
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science researchAnubhav Jain
 
Machine Learning for Data Extraction
Machine Learning for Data ExtractionMachine Learning for Data Extraction
Machine Learning for Data ExtractionDasha Herrmannova
 
Hattrick-Simpers MRS Webinar on AI in Materials
Hattrick-Simpers MRS Webinar on AI in MaterialsHattrick-Simpers MRS Webinar on AI in Materials
Hattrick-Simpers MRS Webinar on AI in MaterialsJason Hattrick-Simpers
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applicationsaimsnist
 

Semelhante a Open Source Tools for Materials Informatics (20)

Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...Evaluating Machine Learning Algorithms for Materials Science using the Matben...
Evaluating Machine Learning Algorithms for Materials Science using the Matben...
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design Problems
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Hattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop SlidesHattrick Simpers TMS Machine Learning Workshop Slides
Hattrick Simpers TMS Machine Learning Workshop Slides
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...
 
Physics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learningPhysics inspired artificial intelligence/machine learning
Physics inspired artificial intelligence/machine learning
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
 
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
2014 IEEE JAVA DATA MINING PROJECT Mining weakly labeled web facial images fo...
 
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
IEEE 2014 JAVA DATA MINING PROJECTS Mining weakly labeled web facial images f...
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructure
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
 
2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML model2D/3D Materials screening and genetic algorithm with ML model
2D/3D Materials screening and genetic algorithm with ML model
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
 
Machine Learning for Data Extraction
Machine Learning for Data ExtractionMachine Learning for Data Extraction
Machine Learning for Data Extraction
 
Hattrick-Simpers MRS Webinar on AI in Materials
Hattrick-Simpers MRS Webinar on AI in MaterialsHattrick-Simpers MRS Webinar on AI in Materials
Hattrick-Simpers MRS Webinar on AI in Materials
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
 

Mais de Anubhav Jain

Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...Anubhav Jain
 
Applications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignAnubhav Jain
 
An AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesisAn AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesisAnubhav Jain
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software disseminationAnubhav Jain
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software disseminationAnubhav Jain
 
Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...Anubhav Jain
 
Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Anubhav Jain
 
Machine Learning for Catalyst Design
Machine Learning for Catalyst DesignMachine Learning for Catalyst Design
Machine Learning for Catalyst DesignAnubhav Jain
 
Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...Anubhav Jain
 
Accelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine LearningAccelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine LearningAnubhav Jain
 
DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …Anubhav Jain
 
The Materials Project
The Materials ProjectThe Materials Project
The Materials ProjectAnubhav Jain
 
Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Anubhav Jain
 
Discovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials ProjectDiscovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials ProjectAnubhav Jain
 
The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...Anubhav Jain
 
Machine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignAnubhav Jain
 
Applications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignApplications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignAnubhav Jain
 
Assessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data AnalysisAssessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data AnalysisAnubhav Jain
 
Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...Anubhav Jain
 
Conducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials ProjectConducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials ProjectAnubhav Jain
 

Mais de Anubhav Jain (20)

Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...
 
Applications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and Design
 
An AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesisAn AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesis
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
 
Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...
 
Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...
 
Machine Learning for Catalyst Design
Machine Learning for Catalyst DesignMachine Learning for Catalyst Design
Machine Learning for Catalyst Design
 
Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...
 
Accelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine LearningAccelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine Learning
 
DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …
 
The Materials Project
The Materials ProjectThe Materials Project
The Materials Project
 
Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...
 
Discovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials ProjectDiscovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials Project
 
The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...
 
Machine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst Design
 
Applications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignApplications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials Design
 
Assessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data AnalysisAssessing Factors Underpinning PV Degradation through Data Analysis
Assessing Factors Underpinning PV Degradation through Data Analysis
 
Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...
 
Conducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials ProjectConducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials Project
 

Último

Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfSumit Kumar yadav
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PPRINCE C P
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyDrAnita Sharma
 

Último (20)

Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
fundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomologyfundamental of entomology all in one topics of entomology
fundamental of entomology all in one topics of entomology
 

Open Source Tools for Materials Informatics

  • 1. Open Source Tools for Materials Informatics Anubhav Jain Energy Technologies Area Lawrence Berkeley National Laboratory Berkeley, CA MRS Fall Meeting 2019 Slides (already) posted to hackingmaterials.lbl.gov
  • 2. Staffing interdisciplinary research Machine learningMaterials Science I find a recurring dilemma and asymmetry in staffing materials informatics research Materials Informatics
  • 3. 3 Who has a tougher job to get started? MS&E major CS major • Already has background in the material science aspects of the project • But needs to learn the machine learning and software engineering aspects • Already has background in software engineering and appropriate machine learning • But needs to learn the materials science aspects
  • 4. 4 MS&E major CS major My experience is that the CS major typically has the tougher road ahead of them Who has a tougher job to get started?
  • 5. 5 MS&E major CS major My experience is that the CS major typically has the tougher road ahead of them Who has a tougher job to get started? easier to pick up / self-learn random forests & neural networks than phase diagrams & crystal structures
  • 6. 6 There is an asymmetry in resources available MS&E major CS major • Hands-on code and examples to run and modify • Hundreds of Youtube videos and online courses • Code reviews from collaborators • And the standard books, etc. • Books and research articles • Conversations with colleagues, impromptu lectures • Practice problems? Worked examples? Interactive code?
  • 7. Outline 7 ①Matminer: data and descriptors for producing ML structure-property relationships ② Matscholar – applying natural language processing to materials science information retrieval
  • 8. 8 How can we make it easy to develop and test ML models for composition-structure-property relationships? How can we quickly represent chemistry and structure as vectors? How do we get labeled training /test data? How do we know if our ML model is extraordinary?
  • 9. 9 How can we make it easy to develop and test ML models for composition-structure-property relationships? How can we quickly represent chemistry and structure as vectors?
  • 10. >60 featurizer classes can generate thousands of potential descriptors that are described in the literature 10 Matminer contains a library of descriptors for various materials science entities feat = EwaldEnergy([options]) y = feat.featurize([input_data]) • compatible with scikit- learn pipelining • automatically deploy multiprocessing to parallelize over data • include citations to methodology papers
  • 11. 11 How can we make it easy to develop and test ML models for composition-structure-property relationships? How do we get labeled training /test data?
  • 12. • Typically, a lot of attention is given to advanced algorithms for machine learning – e.g., deep neural networks versus standard ML • But perhaps there is not enough emphasis on developing the appropriate data sets – with enough information to train ML algorithms – with sufficient data quality – easy enough for anyone to at least get started without specialized knowledge 12 What about data?
  • 13. The importance of data 13 https://qz.com/1034972/the-data-that-changed-the-direction-of-ai- research-and-possibly-the-world/
  • 14. 14 What is ImageNet? The ImageNet data set collected and hand-labeled (e.g., via Amazon Mechanical Turk). The latest version has over 14 million hand-annotated images, organized into ~20,000 categories
  • 15. How data stimulates new algorithms 15
  • 16. How data stimulates new algorithms 16 How can we create an ImageNet for materials science?
  • 17. • We want a test set that contains a diverse array of problems – Smaller data versus larger data – Different applications (electronic, mechanical, etc.) – Composition-only or structure information available – Classification or regression • We also want a cross-validation metric that gives reliable error estimates – i.e., less dependent on specific choice of splits 17 An “ImageNet” for materials science
  • 18. 18 Overview of Matbench test set Target Property Data Source Samples Method Bulk Modulus Materials Project 10,987 DFT-GGA Shear Modulus Materials Project 10,987 DFT-GGA Band Gap Materials Project 106,113 DFT-GGA Metallicity Materials Project 106,113 DFT-GGA Band Gap Zhuo et al. [1] 6,354 Experiment Metallicity Zhuo et al. [1] 6,354 Experiment Bulk Metallic Glass formation Landolt -Bornstein 7,190 Experiment Refractive index Materials Project 4,764 DFPT-GGA Formation Energy Materials Project 132,752 DFT-GGA Perovskite Formation Energy Castelli et al [2] 18,928 DFT-GGA Freq. at Last Phonon PhDOS Peak Materials Project 1,296 DFPT-GGA Exfoliation Energy JARVIS-2D 636 DFT-vDW-DF Steel yield strength Citrine Informatics 312 Experiment 1. doi.org/10.1021/acs.jpclett.8b00124 2. doi.org/10.1039/C2EE22341D
  • 19. <1K 1K-10K10K-100K >100K 19 Diversity of benchmark suite mechanical electronic stability optical thermal classification regression experiment (composition only) DFT (structure) application data size problem type data type
  • 20. 20 How can we make it easy to develop and test ML models for composition-structure-property relationships? How do we know if our ML model is extraordinary?
  • 21. 21 How about a benchmark algorithm? Automatminer is a ”black box” machine learning model Give it any data set with either composition or structure inputs, and automatminer will train an ML model (no researcher intervention)
  • 22. 22 Automatminer develops an ML model automatically given raw data (structures or compositions plus output properties) Featurizer MagPie SOAP Sine Coulomb Matrix + many, many more • Dropping features with many errors • Missing value imputation • One-hot encoding • PCA-based • Correlation • Model- based (tree) Uses genetic algorithms to find the best machine learning model + hyperparameters
  • 23. 23 Can actually do apple—to-apples competition between algorithms
  • 24. 24 If we can get a well-established “benchmark”, perhaps interdisciplinary teams can start hammering on accuracy Today 5years 10years A lower barrier to entry in the field means more ideas can be tested from more researchers Matbenchtestset averageerror
  • 25. 25 Matminer, matbench, and automatminer can all be accessed, used, and modified by anyone Code / Examples all on Github • github.com/hackingmaterials/matminer • github.com/hackingmaterials/matminer_examples • github.com/hackingmaterials/automatminer Matbench data on Figshare • (coming soon, still finalizing) Free support via Discourse • https://discuss.matsci.org
  • 26. Outline 26 ① Matminer: data and descriptors for producing ML structure-property relationships ②Matscholar – applying natural language processing to materials science information retrieval
  • 27. We have extracted ~2 million abstracts of relevant scientific articles We use natural language processing algorithms to try to extract knowledge from all this data 27 Goal: collect and organize knowledge embedded in the materials science literature
  • 28. 28 We’ve developed algorithms to automatically tag keywords in the abstracts
  • 29. 29 Application: a revised materials search engine Auto-generated summaries of materials based on text mining
  • 30. 30 Application: materials compositions of interest … A search for thermoelectrics that do not have Pb or Bi
  • 31. • How do we get more people benefitting from this work and involved in improving it? • One solution - expose an easy-to-use web frontend, with links to all the backend codes in case people want to dive further – New tools like Plotly Dash make this easier than ever 31 Using a web site as a “gateway” into the algorithms frontend backend
  • 35. 35 Hopefully these frontend demos get you interested enough to check the “About page”
  • 36. • We need more resources to help computer scientists learn about materials science topics through hands-on examples and interactive demos • Some things that can help: – Open-source implementations of materials science methods – Interactive examples (e.g., Jupyter) – Documentation and support(!) – Labeled data sets – Front-ends for easy exploration 36 Concluding thoughts
  • 37. 37 Funding acknowledgements Slides (already) posted to hackingmaterials.lbl.gov • Matminer – U.S. Department of Energy, Materials Science Division • Matscholar – Toyota Research Institutes