Software tools, crystal descriptors, and machine learning applied to materials design

Presentation given at TMS 2019

  1. Software tools, crystal descriptors, and machine learning applied to materials design. Anubhav Jain, Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA. TMS 2019. Slides (already) posted to hackingmaterials.lbl.gov
  2. This talk is centered around open-source software that you can use to accelerate your own materials design efforts: high-throughput computing and simulations; machine learning; interpretable crystal structure representations.
  3. Outline: high-throughput computing and simulations; machine learning; interpretable crystal structure representations.
  4. We know that high-throughput DFT is useful for generating large data sets, e.g., for materials screening. >10,000 elastic tensors: M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009. >48,000 Seebeck coefficients + cRTA transport: Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier, Sci. Data, 2017, 4, 170085.
  5. We know that high-throughput DFT is useful for generating large data sets, e.g., for materials screening. >10,000 elastic tensors: M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009. >48,000 Seebeck coefficients + cRTA transport: Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier, Sci. Data, 2017, 4, 170085. Atomate’s goal: make high-throughput easy and scalable for everyone.
  6. A “black-box” view of performing a calculation. Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → “something” → Results!
  7. Unfortunately, the inside of the “black box” is usually tedious and “low-level”. Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → lots of tedious, low-level work… (input file flags; SLURM format; how to fix ZPOTRF?) → Results! The low-level work: set up the structure coordinates; write input files, double-check all the flags; copy to supercomputer; submit job to queue; deal with supercomputer headaches; monitor job; fix error jobs, resubmit to queue, wait again; repeat process for subsequent calculations in workflow; parse output files to obtain results; copy and organize results, e.g., into Excel.
  8. What would be a better way? Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → “something” → Results!
  9. What would be a better way? Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → Results! Workflows to run: [ ] band structure; [ ] surface energies; [x] elastic tensor; [ ] Raman spectrum; [ ] QH thermal expansion.
  10. Ideally the method should scale to millions of calculations. Researcher: “Start with all binary oxides, replace O->S, run several different properties” → Results! Workflows to run: [x] band structure; [x] surface energies; [x] elastic tensor; [ ] Raman spectrum; [ ] QH thermal expansion; [ ] spin-orbit coupling.
  11. Atomate tries to make it easy, automatic, and flexible to generate data with existing simulation packages. Researcher: “Run many different properties of many different materials!” → Results!
  12. Each simulation procedure translates high-level instructions into a series of low-level tasks: quickly and automatically translate high-level (minimal) specifications into well-defined FireWorks workflows. Example request: “What is the GGA-PBE elastic tensor of GaAs?” M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, et al., Charting the complete elastic properties of inorganic crystalline compounds, Sci. Data 2 (2015).
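
The translation described on slide 12 can be pictured with a small sketch. This is not atomate's API: the `Task` class, the `expand_request` function, the task names, and the choice of six deformations are all hypothetical and serve only to show how one minimal request might expand into an ordered list of low-level steps.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """One low-level step. Names are illustrative, not atomate's."""
    name: str
    params: dict = field(default_factory=dict)


def expand_request(formula, prop, functional="GGA-PBE"):
    """Translate a minimal, high-level request into an ordered task list."""
    if prop != "elastic tensor":
        raise ValueError(f"no procedure registered for {prop!r}")
    common = {"formula": formula, "functional": functional}
    tasks = [Task("relax_structure", dict(common))]
    # Assumption for illustration: six deformed cells, one calculation each.
    for i in range(6):
        tasks.append(Task("deformed_calculation", {**common, "deformation": i}))
    tasks.append(Task("fit_elastic_tensor", dict(common)))
    tasks.append(Task("store_results", dict(common)))
    return tasks


tasks = expand_request("GaAs", "elastic tensor")
for task in tasks:
    print(task.name, task.params)
```

The researcher supplies only the formula and the property; everything else (ordering, repeated calculations, result storage) is filled in by the procedure.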
  13. Atomate contains a library of simulation procedures. VASP-based: band structure; spin-orbit coupling; hybrid functional calcs; elastic tensor; piezoelectric tensor; Raman spectra; NEB; GIBBS method; QH thermal expansion; AIMD; ferroelectric; surface adsorption; work functions; NMR spectra*; Bader charges*; magnetic orderings*; SCAN functionals*. Other: BoltzTraP; FEFF method; Q-Chem*. (* = added / major updates in past year.) Mathew, K. et al. Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
  14. Full operation diagram: structure → workflow (job 1, job 2, job 3, job 4) → database of all workflows → automatically submit + execute → output files + database.
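
The operation diagram on slide 14 amounts to executing a graph of jobs in dependency order and recording the outcome. The sketch below uses only the Python standard library (`graphlib`, Python 3.9+), not FireWorks. The dependency layout (jobs 3 and 4 both waiting on job 2) and the in-memory `database` dictionary are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish first (assumed layout).
dependencies = {
    "job 1": set(),
    "job 2": {"job 1"},
    "job 3": {"job 2"},
    "job 4": {"job 2"},
}


def run_workflow(deps, execute):
    """Run every job once all of its parents have completed."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    finished = []
    while sorter.is_active():
        # get_ready() returns the jobs whose parents are all done.
        for job in sorted(sorter.get_ready()):
            execute(job)
            finished.append(job)
            sorter.done(job)
    return finished


database = {}  # stand-in for the "database of all workflows"
order = run_workflow(dependencies, lambda job: database.update({job: "COMPLETED"}))
print(order)
```

In a real deployment, the jobs returned together by `get_ready()` could be submitted to the queue in parallel rather than run one after another.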
  15. A web-based interface is in progress to give atomate users a “personal Materials Project” of their own calculations.
  16. Atomate now powers the Materials Project: an online resource of density functional theory simulation data for ~85,000 inorganic materials; includes band structures, elastic tensors, piezoelectric tensors, battery properties, and more; >75,000 registered users; free; www.materialsproject.org. Jain et al. Commentary: The Materials Project: A materials genome approach to accelerating materials innovation. APL Mater. 1, 011002 (2013).
  17. Getting started with atomate. Paper: Mathew, K. et al. Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows. Comput. Mater. Sci. 139, 140–152 (2017). Docs: hackingmaterials.github.io/atomate. Support: https://groups.google.com/forum/#!forum/atomate
  18. Outline: high-throughput computing and simulations; machine learning; interpretable crystal structure representations.
  19. Rocketsled uses adaptive design to suggest the best computations to optimize some metric. With atomate/FireWorks, the user must decide which calculations to perform (e.g., which materials to calculate). Rocketsled is an extension to FireWorks that lets the computer decide what the next best calculation is, based on the results of previous calculations. It works for materials design or any other “inverse computational problem”.
  20. Given a search domain, Rocketsled uses an optimization engine to select calculations and submit them to supercomputers. The optimization engine includes 4 built-in regressors (e.g., RandomForest, Gaussian Process) and 5 acquisition functions (e.g., Expected Improvement). It can bootstrap uncertainty estimates. Or use your own!
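
Slide 20 names Expected Improvement as one of the acquisition functions. As a rough illustration of how such a function ranks candidate calculations, the sketch below implements the standard closed-form Expected Improvement for maximization under a Gaussian prediction. It is not Rocketsled's code; the function names and the three candidate predictions are made up for the example.

```python
import math


def expected_improvement(mu, sigma, best):
    """EI for maximization, given a Gaussian prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No uncertainty: the improvement is known exactly.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf


def pick_next(candidates, best):
    """candidates: {name: (mu, sigma)}. Return the name with the highest EI."""
    return max(candidates, key=lambda k: expected_improvement(*candidates[k], best))


# Hypothetical surrogate predictions (mean, uncertainty) for three candidates.
predictions = {"A": (95.0, 1.0), "B": (90.0, 20.0), "C": (101.0, 0.5)}
choice = pick_next(predictions, best=100.0)
print(choice)
```

With these numbers the highly uncertain candidate B is selected over C, even though C has the higher predicted mean, which shows how the acquisition function trades exploitation against exploration.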
  21. Results of using optimization can be dramatic! In the problem of finding materials with high K and high G for superhard materials (7394 possibilities), Rocketsled finds solutions ~30-60X faster than randomly computing the space. Can use pure ML approaches or use matminer featurizations for materials science (the latter helps give such good performance).
  22. Results of using optimization can be dramatic! In the problem of finding materials with high K and high G for superhard materials (7394 possibilities), Rocketsled finds solutions ~30-60X faster than randomly computing the space. Even after just 200 calculations of the 7394 possibilities, all solutions are almost certain to be found with Rocketsled. Can use pure ML approaches or use matminer featurizations for materials science (the latter helps give such good performance).
  23. Getting started with rocketsled. Paper: Dunn, A.R., et al. Rocketsled: a software library for optimizing high-throughput computational searches. J. Phys. Mater. https://doi.org/10.1088/2515-7639/ab0c3d. Docs: hackingmaterials.github.io/rocketsled. Support: https://groups.google.com/forum/#!forum/fireworkflows
  24. Outline: high-throughput computing and simulations; machine learning; interpretable crystal structure representations.
  25. What is needed to do machine learning on materials? How can we represent chemistry and structure as vectors? How do we get enough output data for training?
  26. Matminer connects materials data with data mining algorithms and data visualization libraries. Ward, L. et al. Matminer: An open source toolkit for materials data mining. Comput. Mater. Sci. 152, 60–69 (2018).
  27. Matminer contains a library of descriptors for various materials science entities. >60 featurizer classes can generate thousands of potential descriptors that are described in the literature. Usage pattern: feat = EwaldEnergy([options]); y = feat.featurize([input_data]). Featurizers are compatible with scikit-learn pipelining; automatically deploy multiprocessing to parallelize over data; include citations to methodology papers.
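
The two-line usage pattern on slide 27 (construct a featurizer, then call `featurize` on an input) can be mimicked with a self-contained toy class. This is a minimal sketch and does not import matminer: `MeanAtomicNumber`, its two descriptors, and the small `ATOMIC_NUMBER` table are hypothetical and exist only to show the shape of the interface.

```python
import re

# Small lookup table for the elements used in this example only.
ATOMIC_NUMBER = {"H": 1, "O": 8, "Ti": 22, "Cu": 29,
                 "Ga": 31, "As": 33, "Se": 34, "Bi": 83}

# An element symbol followed by an optional integer or decimal amount.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


class MeanAtomicNumber:
    """Toy featurizer: turns a formula string into a fixed-length vector."""

    def feature_labels(self):
        return ["mean atomic number", "max atomic number"]

    def featurize(self, formula):
        counts = {}
        for symbol, amount in _TOKEN.findall(formula):
            counts[symbol] = counts.get(symbol, 0.0) + (float(amount) if amount else 1.0)
        total = sum(counts.values())
        mean_z = sum(ATOMIC_NUMBER[s] * n for s, n in counts.items()) / total
        max_z = float(max(ATOMIC_NUMBER[s] for s in counts))
        return [mean_z, max_z]


feat = MeanAtomicNumber()
y = feat.featurize("GaAs")
print(dict(zip(feat.feature_labels(), y)))
```

Keeping every descriptor behind the same two methods is what lets many featurizers be chained or swapped without changing the surrounding pipeline code.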
  28. Interactive Jupyter notebooks demonstrate use cases: https://github.com/hackingmaterials/matminer_examples. Many examples available: retrieving data from various databases; predicting bulk / shear modulus; predicting formation energies (from composition alone; with Voronoi-based structure features included; with Coulomb matrix and Orbital Field matrix descriptors, reproducing previous studies in the literature); making interactive visualizations; creating an ML pipeline.
  29. Getting started with matminer. Paper: Ward et al. Matminer: An open source toolkit for materials data mining. Computational Materials Science, 152, 60–69 (2018). Docs: hackingmaterials.github.io/matminer. Support: https://groups.google.com/forum/#!forum/matminer
  30. Outline: high-throughput computing and simulations; machine learning; interpretable crystal structure representations.
  31. Typically several steps of machine learning are performed by a human researcher. Can these be automated? Descriptors are developed and chosen by a researcher; the ML model is developed and chosen by a researcher. Why can’t we just give the computer some raw input data (compositions, crystal structures) and output properties and get back an ML model?
  32. Automatminer develops an ML model automatically given raw data (structures or compositions plus output properties). Featurizers: MagPie, SOAP, Sine Coulomb Matrix, + many, many more. Data cleaning: missing value imputation, scaling, one-hot encoding. Feature reduction: PCA-based, correlation, Relief-based (MultiSURF). Model selection: uses genetic algorithms to find the best machine learning model + hyperparameters.
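
Three of the steps listed on slide 32 (missing value imputation, scaling, and correlation-based feature reduction) are simple enough to sketch in plain Python. The code below is an illustration only and does not reproduce automatminer: the function names, the 0.95 correlation threshold, and the toy feature table are assumptions, and the one-hot encoding, PCA, MultiSURF, and genetic-algorithm stages are left out.

```python
import math


def impute_mean(column):
    """Replace missing entries (None) with the mean of the known values."""
    known = [v for v in column if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in column]


def standardize(column):
    """Scale to zero mean and unit (population) standard deviation."""
    mean = sum(column) / len(column)
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    if std == 0:
        return [0.0 for _ in column]
    return [(v - mean) / std for v in column]


def pearson(a, b):
    # Inputs are already standardized, so the correlation is the mean product.
    return sum(x * y for x, y in zip(a, b)) / len(a)


def preprocess(features, threshold=0.95):
    """features: {name: [value or None]}. Returns scaled, de-correlated columns."""
    scaled = {k: standardize(impute_mean(v)) for k, v in features.items()}
    kept = {}
    for name, col in scaled.items():
        # Keep a feature only if it is not nearly collinear with a kept one.
        if all(abs(pearson(col, other)) < threshold for other in kept.values()):
            kept[name] = col
    return kept


raw = {
    "a": [1.0, 2.0, None, 4.0],
    "b": [2.0, 4.0, None, 8.0],  # exactly 2 * a, so it carries no new information
    "c": [4.0, 1.0, 3.0, 2.0],
}
kept = preprocess(raw)
print(list(kept))
```

The cleaned columns would then be handed to whatever model search follows; here the redundant feature `b` is dropped before that stage.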
  33. We are benchmarking automatminer vs. the current state of the art against 11 problems intended to be a standard test set:
      Dataset | Target(s) | Samples
      Elastic Tensor | KVRH (GPa), GVRH (GPa) | 10,987
      Dielectric Tensor | Refractive index | 4,765
      JARVIS 2D | Exfoliation energy (meV/atom) | 636
      Materials Project phonons | Highest LO phonon frequency (last PhDOS peak) | 1,265
      Materials Project (stable) | Band gap (eV), Is metallic? (classification) | 106,113
      Perovskites | Formation energy (eV/atom) | 18,928
      Experimental Band Gaps | Is metallic? (classification) | 6,354
      Experimental Metallic Glasses | Glass forms? (classification) | 7,190
      Materials Project (all) | Formation energy (eV/atom) | 132,752
  34. Usually, automatminer does very well. Usually, automatminer outperforms both state-of-the-art graph-based models AND human-generated models! But …
  35. Graph-based approaches work better in some problems. Hypothesis: automatminer approaches are better for smaller data sets, graph-based approaches are better for larger data sets. Unfortunately, it can be difficult to train some of the graph models on large data sets, particularly without GPUs, so the results are not in yet!
  36. Getting started with automatminer. Paper: in preparation. Docs: hackingmaterials.github.io/automatminer. Support: https://groups.google.com/forum/#!forum/matminer
  37. Outline: high-throughput computing and simulations; machine learning; interpretable crystal structure representations.
  38. Robocrystallographer is a computer program that can analyze crystal structures and describe them as text.
  39. Example of fully automated robocrystallographer output.
  40. Example of fully automated robocrystallographer output: “GaAs is zincblende structured and crystallizes in the cubic F-43m space group. Ga3+ is bonded to four equivalent As3– atoms to form corner-sharing GaAs4 tetrahedra. All Ga–As bond lengths are 2.49 Å. As3– is bonded in a tetrahedral geometry to four equivalent Ga3+ atoms.”
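
The GaAs description on slide 40 reads like structured data poured into sentence templates. The sketch below shows that idea in its simplest form. It is not robocrystallographer's implementation: the `describe` function, the input dictionary layout, and the templates are hypothetical, and the hard part (deriving the prototype, geometry, and neighbor counts from a crystal structure) is assumed to have been done already.

```python
NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five", 6: "six"}


def describe(info):
    """Render pre-analyzed structure data as a short text description."""
    sentences = [
        f"{info['formula']} is {info['prototype']} structured and crystallizes "
        f"in the {info['crystal_system']} {info['space_group']} space group."
    ]
    for site in info["sites"]:
        count = NUMBER_WORDS[site["neighbors"]]
        sentences.append(
            f"{site['ion']} is bonded in a {site['geometry']} geometry to "
            f"{count} equivalent {site['neighbor_ion']} atoms."
        )
    return " ".join(sentences)


# Values copied from the slide's GaAs example (charges written in ASCII).
gaas = {
    "formula": "GaAs",
    "prototype": "zincblende",
    "crystal_system": "cubic",
    "space_group": "F-43m",
    "sites": [
        {"ion": "As3-", "geometry": "tetrahedral", "neighbors": 4, "neighbor_ion": "Ga3+"},
    ],
}
text = describe(gaas)
print(text)
```

Because the output is assembled from named fields, the same data could also feed a machine-readable record alongside the human-readable text.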
  41. Example of fully automated robocrystallographer output.
  42. Example of fully automated robocrystallographer output: “BiOCuSe is parent of FeAs superconductors structured and crystallizes in the tetragonal P4/nmm space group. The structure is two-dimensional and consists of one BiO sheet oriented in the (0, 0, 1) direction and one CuSe sheet oriented in the (0, 0, 1) direction. In the BiO sheet, Bi3+ is bonded in a 4-coordinate geometry to four equivalent O2– atoms. All Bi–O bond lengths are 2.35 Å. O2– is bonded in a tetrahedral geometry to four equivalent Bi3+ atoms. In the CuSe sheet, Cu1+ is bonded to four equivalent Se2– atoms to form a mixture of edge and corner-sharing CuSe4 tetrahedra. All Cu–Se bond lengths are 2.52 Å. Se2– is bonded in a 4-coordinate geometry to four equivalent Cu1+ atoms.”
  43. Robocrystallographer is integrated into the Materials Project. Click the robot icon for Robocrys. Click the speaker icon to have it talk to you. Example output: “TiO2 is Rutile structured and crystallizes in the tetragonal P4_2/mnm space group. The structure is three-dimensional. Ti4+ is bonded to six equivalent O2- atoms to form a mixture of corner and edge-sharing TiO6 octahedra. The corner-sharing octahedral tilt angles are 49°. There is four shorter (1.96 Å) and two longer (2.00 Å) Ti–O bond length. O2- is bonded in a distorted trigonal planar geometry to three equivalent Ti4+ atoms.”
  44. Getting started with robocrystallographer. Paper: submitted, waiting for referee report. Docs: hackingmaterials.github.io/robocrystallographer. Support: Alex Ganose, aganose@lbl.gov
  45. Conclusion: hopefully you’ve found something interesting or useful for your own work! High-throughput computing and simulations; machine learning; interpretable crystal structure representations.
  46. Acknowledgements. Lead developers: Atomate: Kiran Mathew; Rocketsled: Alex Dunn; Matminer: Logan Ward; Automatminer: Alex Dunn; Robocrystallographer: Alex Ganose. And the dozens of other developers who have contributed to these packages or reported issues! Funding: U.S. Department of Energy, Basic Energy Sciences, Early Career Award. Additional funding from the DOE-funded Materials Project. Slides (already) posted to hackingmaterials.lbl.gov
