
High-throughput computation and machine learning methods applied to materials design




  1. 1. High-throughput computation and machine learning methods applied to materials design Anubhav Jain Energy Technologies Area Lawrence Berkeley National Laboratory Berkeley, CA Citrine Informatics Talk July 3, 2018 Slides (already) posted to hackingmaterials.lbl.gov
  2. 2. New materials discovery for devices is difficult •  Novel materials with enhanced performance characteristics could make a big dent in sustainability, scalability, and cost •  In practice, we tend to re-use the same fundamental materials for decades –  solar power with Si since the 1950s –  graphite/LiCoO2 (basis of today’s Li battery electrodes) since 1990 –  Bi2Te3 and PbTe thermoelectrics first studied ~1910 •  Although there are many improvements to manufacturing, microstructure, etc., there are not many new basic compositions •  Why is discovering better materials such a challenge?
  3. 3. 3 A material is defined at multiple length scales – stick to the fundamental scale for now
  4. 4. 4 A material is defined at multiple length scales – stick to the fundamental scale for now
  5. 5. Atoms in a box – the materials universe is huge! •  Bag of 30 atoms •  Each atom is one of 50 elements •  Arrange on 10x10x10 lattice •  Over 10^108 possibilities! –  more than grains of sand on all beaches (10^21) –  more than number of atoms in universe (10^80)
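The count on this slide can be reproduced with a short calculation. The sketch below assumes one reading of the setup that the slide does not spell out: the 30 atoms occupy 30 distinct sites chosen from the 1000 lattice points, each occupied site independently takes one of 50 elements, and symmetry-equivalent arrangements are not removed.

```python
import math

n_sites = 10 * 10 * 10      # 10x10x10 lattice
n_atoms = 30                # atoms placed on the lattice
n_elements = 50             # element choices per atom

# Ways to choose which sites are occupied, times ways to assign elements.
site_choices = math.comb(n_sites, n_atoms)
element_choices = n_elements ** n_atoms
n_configurations = site_choices * element_choices

# Python integers are exact, so the logarithm is taken only at the end.
log10_configurations = math.log10(n_configurations)
print(f"about 10^{log10_configurations:.1f} configurations")
```

Under these assumptions the exponent lands just above 108, consistent with the slide's "over 10^108".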
  6. 6. 6 Finding the right material is like “finding a needle in a haystack”
  7. 7. What constrains traditional experimentation? 7 “[The Chevrel] discovery resulted from a lot of unsuccessful experiments of Mg ions insertion into well-known hosts for Li+ ions insertion, as well as from the thorough literature analysis concerning the possibility of divalent ions intercalation into inorganic materials.” -Aurbach group, on discovery of Chevrel cathode for multivalent (e.g., Mg2+) batteries Levi, Levi, Chasid, Aurbach J. Electroceramics (2009)
  8. 8. Outline 8 ①  From quantum mechanics to density functional theory (DFT) ②  “High-throughput” DFT ③  Data mining approaches to materials design
  9. 9. The basis of density functional theory is quantum mechanics. The Schrödinger equation describes all the properties of a system through the wavefunction: $-\frac{\hbar^2}{2m}\nabla^2\Psi(\mathbf{r}) + V(\mathbf{r})\Psi(\mathbf{r}) = E\Psi(\mathbf{r})$ (time-independent, non-relativistic Schrödinger equation)
  10. 10. The wave function is formidable •  There aren’t too many real situations where we can get a closed-form solution to the Schrödinger equation •  Let’s pretend we want to approach things numerically for 1000 electrons –  There are ~500,000 electron-electron interactions to worry about. –  Even storing the wavefunction would take far more than 10^1000 GB! •  Discretize the x, y, z position of each electron into a 1000-element grid per axis = 1 billion positions per electron •  Need the wavefunction output (real + imaginary part) for each combination of all electron positions, i.e. (1E9)^1000 * 2, or 2E9000 values •  Even at 1 byte per wavefunction value (low resolution), that is 2E9000 bytes, or about 2E8991 GB, to store the wavefunction!
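The storage estimate on this slide can be checked with exact integer arithmetic. The sketch below uses only the slide's stated assumptions (1000 electrons, a 1000-point grid per axis, two numbers per amplitude, 1 byte per number); the variable names are illustrative.

```python
import math

n_electrons = 1000
grid_points_per_axis = 1000
values_per_amplitude = 2     # real and imaginary parts
bytes_per_value = 1          # deliberately low resolution

# Distinct electron pairs: n(n-1)/2.
n_pairs = n_electrons * (n_electrons - 1) // 2

# Positions available to one electron, then to all electrons jointly.
positions_per_electron = grid_points_per_axis ** 3
joint_configurations = positions_per_electron ** n_electrons

n_bytes = joint_configurations * values_per_amplitude * bytes_per_value
log10_gigabytes = math.log10(n_bytes) - 9   # 1 GB = 1e9 bytes

print(f"{n_pairs} electron pairs")
print(f"storage is about 10^{log10_gigabytes:.0f} GB")
```

The exact exponent matters less than its scale: no refinement of the grid or the byte count brings the number anywhere near physically realizable storage, which is the motivation for replacing the wavefunction with the charge density.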
  11. 11. Dirac summarized it best … 11 “The underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be soluble.” “It therefore becomes desirable that approximate practical methods of applying quantum mechanics should be developed, which can lead to an explanation of the main features of complex atomic systems without too much computation.”
  12. 12. What is density functional theory (DFT)? DFT theory: •  replaces many-body interactions with a mean field interaction that reproduces the same charge density as the original formulation •  proves that, given the correct charge density, it is in principle possible to compute all ground state properties of quantum mechanics exactly So, (for the ground state properties) we went from many-body wavefunctions to mean field charge density! This was worthy of the 1998 Nobel Prize. DFT practice: •  accuracy depends on the choice of (some) parameters, the type of material, the property to be studied, and whether the simulated system (crystal) is a good approximation of reality.
  13. 13. How does one use DFT to design new materials? 13 A. Jain, Y. Shin, and K. A. Persson, Nat. Rev. Mater. 1, 15004 (2016).
  14. 14. How accurate is DFT in practice? Shown are typical DFT results for (i) Li battery voltages, (ii) electronic band gaps, and (iii) bulk modulus. (i) V. L. Chevrier, S. P. Ong, R. Armiento, M. K. Y. Chan, and G. Ceder, Phys. Rev. B 82, 075122 (2010). (ii) M. Chan and G. Ceder, Phys. Rev. Lett. 105, 196403 (2010). (iii) M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K.A. Persson, and M. Asta, Sci. Data 2, 150009 (2015).
  15. 15. Limitations of density functional theory •  System size is essentially limited to ~1000 atoms –  many important materials phenomena simply do not occur at this length scale •  Certain materials, such as those with strong electron correlation, remain difficult to model accurately •  Certain properties, including excited state properties such as band gap, remain difficult to model accurately •  These are all active areas of research and improvement to the theory, and the situation is improving on all fronts
  16. 16. Outline 16 ①  From quantum mechanics to density functional theory (DFT) ②  “High-throughput” DFT ③  Data mining approaches to materials design
  17. 17. 17 From a needle in a haystack to …
  18. 18. 18 … hiring an army to search through the haystack
  19. 19. High-throughput DFT: a key idea •  Automate the DFT procedure: FireWorks, software for programming general computational workflows that can be scaled across large supercomputers •  Supercomputing power: NERSC, a supercomputing center whose processor count is equivalent to ~100,000 desktop machines; other centers are also viable •  Result: high-throughput materials screening. G. Ceder & K.A. Persson, Scientific American (2015)
  20. 20. How much computer time is needed for high-throughput DFT? •  The answer is “it really varies a lot” –  how big / complicated are the materials you are modeling? –  how complex / expensive are the properties you are modeling? •  Ballpark numbers: –  Low range: optimize structure of ~3-atom compounds •  time to do a million materials ~ 10 million CPU-hours –  Medium range: bulk modulus of ~50-atom compounds •  time to do a million materials ~ 2 billion CPU-hours •  The largest CPU allocations from the DOE are typically on the order of ~100 million CPU-hours
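To put the ballpark numbers side by side, the sketch below converts them into a cost per material and into the number of materials a single large allocation would cover. All figures come from the slide; the function and dictionary names are illustrative.

```python
# Ballpark costs quoted on the slide, in CPU-hours per one million materials.
COST_PER_MILLION = {
    "low (structure optimization, ~3 atoms)": 10e6,
    "medium (bulk modulus, ~50 atoms)": 2e9,
}
ALLOCATION_CPU_HOURS = 100e6  # a large DOE allocation, per the slide


def materials_per_allocation(cost_per_million, allocation=ALLOCATION_CPU_HOURS):
    """Number of materials one allocation covers at the given cost."""
    cpu_hours_per_material = cost_per_million / 1e6
    return allocation / cpu_hours_per_material


for label, cost in COST_PER_MILLION.items():
    per_material = cost / 1e6
    n_materials = materials_per_allocation(cost)
    print(f"{label}: {per_material:,.0f} CPU-h each, "
          f"~{n_materials:,.0f} materials per allocation")
```

The two regimes differ by a factor of 200 in cost per material, so the same allocation that covers millions of small structure optimizations covers only tens of thousands of medium-range property calculations.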
  21. 21. Examples of (early) high-throughput studies
      Application | Researcher | Search space | Candidates | Hit rate
      Scintillators | Klintenberg et al. | 22,000 | 136 | 1/160
      Scintillators | Curtarolo et al. | 11,893 | ? | ?
      Topological insulators | Klintenberg et al. | 60,000 | 17 | 1/3500
      Topological insulators | Curtarolo et al. | 15,000 | 28 | 1/535
      High TC superconductors | Klintenberg et al. | 60,000 | 139 | 1/430
      Thermoelectrics – ICSD | Curtarolo et al. | 2,500 | 20 | 1/125
      Thermoelectrics – Half Heusler systems | Curtarolo et al. | 80,000 | 75 | 1/1055
      Thermoelectrics – Half Heusler best ZT | Curtarolo et al. | 80,000 | 18 | 1/4400
      1-photon water splitting | Jacobsen et al. | 19,000 | 20 | 1/950
      2-photon water splitting | Jacobsen et al. | 19,000 | 12 | 1/1585
      Transparent shields | Jacobsen et al. | 19,000 | 8 | 1/2375
      Hg adsorbers | Bligaard et al. | 5,581 | 14 | 1/400
      HER catalysts | Greeley et al. | 756 | 1 | 1/756*
      Li ion battery cathodes | Ceder et al. | 20,000 | 4 | 1/5000*
      Entries marked with * have experimentally verified the candidates. See also: Curtarolo et al., Nature Materials 12 (2013) 191–201.
  22. 22. Computations predict, experiments confirm 22 Sidorenkite-based Li-ion battery cathodes LED phosphors YCuTe2 thermoelectrics Wang, Z., Ha, J., Kim, Y. H., Im, W. Bin, McKittrick, J. & Ong, S. P. Mining Unexplored Chemistries for Phosphors for High-Color-Quality White-Light-Emitting Diodes. Joule 2, 914–926 (2018). Chen, H.; Hao, Q.; Zivkovic, O.; Hautier, G.; Du, L.-S.; Tang, Y.; Hu, Y.-Y.; Ma, X.; Grey, C. P.; Ceder, G. Sidorenkite (Na3MnPO4CO3): A New Intercalation Cathode Material for Na-Ion Batteries, Chem. Mater., 2013 Aydemir, U; Pohls, J-H; Zhu, H; Hautier, G; Bajaj, S; Gibbs, ZM; Chen, W; Li, G; Broberg, D; White, MA; Asta, M; Persson, K; Ceder, G; Jain, A; Snyder, GJ. Thermoelectric Properties of Intrinsically Doped YCuTe2 with CuTe4-based Layered Structure. J. Mat. Chem C, 2016 More examples here: A. Jain, Y. Shin, and K. A. Persson, Nat. Rev. Mater. 1, 15004 (2016).
  23. 23. With HT-DFT, we can generate data rapidly – what to do next? >4500 elastic tensors: M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. Van Der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009. >900 piezoelectric tensors: M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A. Persson, Sci. Data, 2015, 2, 150053. >48000 Seebeck coefficients + cRTA transport: Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier (in submission)
  24. 24. With HT-DFT, we can generate data rapidly – what to do next? (Same three data sets and references as the previous slide.) Goal: make it easy to generate comparable data sets on your own
  25. 25. A “black-box” view of performing a calculation. Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → “something” → Results!
  26. 26. Unfortunately, the inside of the “black box” is usually tedious and “low-level”. Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → lots of tedious, low-level work (input file flags, SLURM format, how to fix ZPOTRF?) → Results! •  set up the structure coordinates •  write input files, double-check all the flags •  copy to supercomputer •  submit job to queue •  deal with supercomputer headaches •  monitor job •  fix error jobs, resubmit to queue, wait again •  repeat process for subsequent calculations in workflow •  parse output files to obtain results •  copy and organize results, e.g., into Excel
  27. 27. What would be a better way? Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → “something” → Results!
  28. 28. What would be a better way? Researcher: “What is the GGA-PBE elastic tensor of GaAs?” → Results! Workflows to run: [ ] band structure [ ] surface energies [x] elastic tensor [ ] Raman spectrum [ ] QH thermal expansion
  29. 29. Ideally the method should scale to millions of calculations. Researcher: “Start with all binary oxides, replace O->S, run several different properties” → Results! Workflows to run: [x] band structure [x] surface energies [x] elastic tensor [ ] Raman spectrum [ ] QH thermal expansion [ ] spin-orbit coupling
  30. 30. Atomate tries to make it easy, automatic, and flexible to generate data with existing simulation packages. Researcher: “Run many different properties of many different materials!” → Results!
  31. 31. Each simulation procedure translates high-level instructions into a series of low-level tasks: it quickly and automatically translates PI-style (minimal) specifications, e.g. “What is the GGA-PBE elastic tensor of GaAs?”, into well-defined FireWorks workflows. M. De Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, et al., Charting the complete elastic properties of inorganic crystalline compounds, Sci. Data. 2 (2015).
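The idea of expanding a one-line request into an ordered set of low-level tasks can be sketched with the Python standard library alone. This is a toy stand-in and not the atomate or FireWorks API: the function `build_elastic_workflow`, the task names, and the choice of six deformations are all hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def build_elastic_workflow(formula, functional, n_deformations=6):
    """Expand a one-line request into tasks and their dependencies.

    All task names are hypothetical; this is not the atomate API.
    Returns a dict mapping each task to the set of tasks it waits on.
    """
    prefix = f"{formula}/{functional}"
    optimize = f"{prefix}/optimize_structure"
    fit = f"{prefix}/fit_elastic_tensor"

    workflow = {optimize: set(), fit: set()}
    for i in range(n_deformations):
        # Each strained cell needs the relaxed structure first.
        deformation = f"{prefix}/stress_for_deformation_{i}"
        workflow[deformation] = {optimize}
        # The final fit needs every stress calculation.
        workflow[fit].add(deformation)
    return workflow


workflow = build_elastic_workflow("GaAs", "GGA-PBE")
# static_order() yields each task only after all of its dependencies.
execution_order = list(TopologicalSorter(workflow).static_order())
for task in execution_order:
    print(task)
```

Representing the workflow as a dependency graph rather than a fixed script is what lets a workflow engine run the independent deformation calculations in parallel and rerun only the failed ones.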
  32. 32. Atomate contains a library of simulation procedures. VASP-based: •  band structure •  spin-orbit coupling •  hybrid functional calcs •  elastic tensor •  piezoelectric tensor •  Raman spectra •  NEB •  GIBBS method •  QH thermal expansion •  AIMD •  ferroelectric •  surface adsorption •  work functions Other: •  BoltzTraP •  FEFF method •  LAMMPS MD. Mathew, K. et al., Atomate: A high-level interface to generate, execute, and analyze computational materials science workflows, Comput. Mater. Sci. 139 (2017) 140–152.
  33. 33. Full operation diagram: structure → workflow (job 1, job 2, job 3, job 4) → database of all workflows → automatically submit + execute → output files + database
  34. 34. Atomate thus encodes and standardizes knowledge about running various kinds of simulations from domain experts: all past and present knowledge, from everyone in the group, everyone previously in the group, and our collaborators, about how to run calculations. Contributors shown: K. Mathew, J. Montoya, S. Dwaraknath, A. Faghaninia, M. Aykol, S.P. Ong, B. Bocklund, T. Smidt, H. Tang, I.H. Chu, M. Horton, J. Dagdelen, B. Wood, Z.K. Liu, J. Neaton, K. Persson, A. Jain, and others
  35. 35. Outline 35 ①  From quantum mechanics to density functional theory (DFT) ②  “High-throughput” DFT ③  Data mining approaches to materials design
  36. 36. 36 From a needle in a haystack to …
  37. 37. 37 … hiring an army to search through the haystack to …
  38. 38. 38 Armies with metal detectors
  39. 39. Can we build a general optimizer? 39 Generalizable forward solver Supercomputing Power Statistical optimization FireWorks NERSC Various optimization libraries (Figure: J. Mueller)
  40. 40. Rocketsled: automatic materials screening that selects materials to compute AND submits them to the supercomputer. Test case: a screening space of ~20,000 potential ABX3 perovskite combinations as water splitting materials, precomputed in DFT by a different group. If a machine learning algorithm were in charge of picking the next compound based on past data, how efficient would it be?
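The retrospective question posed here (how quickly would a data-driven picker have found the best candidate in an already computed space?) can be illustrated on a toy problem. The sketch below is not Rocketsled's API and does not use the perovskite data: the 20x20 search space, the `hidden_score` function, and the greedy "evaluate the untested candidate nearest to the best one so far" rule are all assumptions chosen for brevity, with the greedy rule standing in for a real machine-learned surrogate.

```python
import random


def hidden_score(candidate):
    """Toy stand-in for an expensive precomputed property (higher is better)."""
    x, y = candidate
    return -((x - 14) ** 2 + (y - 5) ** 2)


def random_search(space, score, seed=0):
    """Evaluations needed to hit the best candidate when picking at random."""
    rng = random.Random(seed)
    best_possible = max(score(c) for c in space)
    order = list(space)
    rng.shuffle(order)
    for n_evaluated, candidate in enumerate(order, start=1):
        if score(candidate) == best_possible:
            return n_evaluated
    return len(order)


def guided_search(space, score, n_initial=5, seed=0):
    """Evaluations needed when past results steer the next pick."""
    rng = random.Random(seed)
    best_possible = max(score(c) for c in space)
    untested = list(space)
    rng.shuffle(untested)
    tested = {}
    while untested:
        if len(tested) < n_initial:
            pick = untested.pop()              # random warm-up picks
        else:
            leader = max(tested, key=tested.get)
            pick = min(untested, key=lambda c: (c[0] - leader[0]) ** 2
                       + (c[1] - leader[1]) ** 2)
            untested.remove(pick)
        tested[pick] = score(pick)
        if tested[pick] == best_possible:
            break
    return len(tested)


space = [(x, y) for x in range(20) for y in range(20)]
n_random = random_search(space, hidden_score)
n_guided = guided_search(space, hidden_score)
print(f"random: {n_random} evaluations, guided: {n_guided} evaluations")
```

Because the full space is known in advance, the number of evaluations each strategy needs can be compared directly, which is the same kind of benchmark the slide describes for the precomputed perovskite set.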
  41. 41. Machine learning: the big problem in my view is connecting data to ML algorithms through features 41 Lots of data on complex objects that you want to interrelate Clustering, Regression, Feature extraction, Model-building, etc. Well developed data-mining routines that work only on numbers (ideally ones with high relevance to your problem) Need to transform materials science objects into a set of physically relevant numerical data (“features” or “descriptors”)
  42. 42. Goal of matminer: connect materials data with data mining algorithms and data visualization libraries 42 Ward, L. et al. Matminer: An open source toolkit for materials data mining. Comput. Mater. Sci. 152, 60–69 (2018).
  43. 43. >40 featurizer classes can generate thousands of potential descriptors. Matminer contains a library of descriptors for various materials science entities. Usage pattern: feat = EwaldEnergy([options]); y = feat.featurize([input_data]) •  compatible with scikit-learn pipelining •  automatically deploy multiprocessing to parallelize over data •  include citations to methodology papers
  44. 44. Interactive Jupyter notebooks demonstrate use cases: https://github.com/hackingmaterials/matminer_examples Many examples available: •  Retrieving data from various databases •  Predicting bulk / shear modulus •  Predicting formation energies: •  from composition alone •  with Voronoi-based structure features included •  with Coulomb matrix and Orbital Field matrix descriptors (reproducing previous studies in the literature) •  Making interactive visualizations •  Creating an ML pipeline
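The featurizer pattern on this slide (construct an object, then call `featurize` on each input to get a row of numbers) can be shown without installing anything. The class below is hypothetical and is not part of matminer; it only mirrors the `featurize` / `feature_labels` shape, and it takes a plain dict of element counts rather than a real composition object.

```python
# Atomic numbers for the few elements used below.
ATOMIC_NUMBER = {"Li": 3, "O": 8, "Co": 27, "Ga": 31, "As": 33}


class AtomicNumberStats:
    """Hypothetical featurizer: atomic-number statistics of a composition."""

    def feature_labels(self):
        return ["mean Z", "min Z", "max Z"]

    def featurize(self, composition):
        """composition: dict mapping element symbol -> atom count."""
        total = sum(composition.values())
        numbers = [ATOMIC_NUMBER[el] for el in composition]
        mean_z = sum(ATOMIC_NUMBER[el] * n
                     for el, n in composition.items()) / total
        return [mean_z, min(numbers), max(numbers)]


feat = AtomicNumberStats()
materials = {
    "GaAs": {"Ga": 1, "As": 1},
    "LiCoO2": {"Li": 1, "Co": 1, "O": 2},
}
# One row of numbers per material: the table a learning algorithm consumes.
X = [feat.featurize(comp) for comp in materials.values()]
for name, row in zip(materials, X):
    print(name, dict(zip(feat.feature_labels(), row)))
```

Keeping labels and values behind one small interface is what makes such objects easy to chain and to swap when trying different descriptors for the same data set.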
  45. 45. Defining local order parameters for various environments. Tetrahedral order parameter, q_tet [1]. Use a given local order parameter with a threshold for motif recognition: if q_tet > q_thresh, then the motif is a tetrahedron; else it is not (too much) a tetrahedron. [1] Zimmermann et al., J. Am. Chem. Soc., 2017, 10.1021/jacs.5b08098
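The thresholding rule can be made concrete with a well-known tetrahedral order parameter. The formula for q_tet used in [1] is not reproduced in this transcript, so the sketch below swaps in the Errington-Debenedetti form, q = 1 - (3/8) Σ (cos ψ_jk + 1/3)^2 summed over the six neighbor pairs, which equals 1 for a perfect tetrahedron. The threshold of 0.8 and the function names are assumptions for illustration.

```python
import itertools
import math


def tetrahedral_order_parameter(center, neighbors):
    """Errington-Debenedetti q for a site with exactly four neighbors."""
    if len(neighbors) != 4:
        raise ValueError("expected exactly four neighbors")
    # Bond vectors from the central site to each neighbor.
    bonds = [[n - c for n, c in zip(nb, center)] for nb in neighbors]
    total = 0.0
    for a, b in itertools.combinations(bonds, 2):
        dot = sum(x * y for x, y in zip(a, b))
        cos_angle = dot / (math.hypot(*a) * math.hypot(*b))
        # Ideal tetrahedral angle has cos = -1/3, so this term vanishes.
        total += (cos_angle + 1.0 / 3.0) ** 2
    return 1.0 - 0.375 * total


def is_tetrahedron(center, neighbors, q_thresh=0.8):
    """Motif recognition by thresholding (q_thresh is an assumed value)."""
    return tetrahedral_order_parameter(center, neighbors) > q_thresh


origin = (0.0, 0.0, 0.0)
tetrahedral = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
square_planar = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]

q_tetrahedral = tetrahedral_order_parameter(origin, tetrahedral)
q_square = tetrahedral_order_parameter(origin, square_planar)
print(q_tetrahedral, is_tetrahedron(origin, tetrahedral))
print(q_square, is_tetrahedron(origin, square_planar))
```

A continuous score with a cutoff, rather than an exact geometric match, is what allows the same test to tolerate thermally distorted environments.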
  46. 46. We have now developed mathematical order parameters for various types of local environments 46
  47. 47. How well do these work? 1. Order parameters clearly distinguish different environments even after thermal distortion 2. They work well in applications (defect site finding, diffusion characterization) [1] Zimmermann et al., Frontiers in Materials, 2017, doi: 10.3389/fmats.2017.00034
  48. 48. 48 Can cluster crystal structures by “local environment similarity”
  49. 49. Results on MP web site, e.g. for BCC-like structures: https://www.materialsproject.org/materials/mp-91/ Target: W. Similar structures (distance near 0): Cs3Sb, TiGaFeCo, CeMg2Cu
  50. 50. Text mining: learning from scientific abstracts. “Learning” what a scientific study is about from >2 million materials science abstracts. Diagram labels: Matstract corpus (unlabeled data, data labels); text processing (text cleaning, tokenization); feature engineering (POS tag labels, word embeddings (word2vec), hand-crafted features); supervised learning (neural network (LSTM), logistic regression, train/test sets); output: named entities
  51. 51. 51 Application: a revised materials search engine Auto-generated summaries of materials based on text mining
  52. 52. 52 Application: materials compositions of interest … A search for “unconventional” (i.e., non-perovskite) ferroelectrics
  53. 53. 53 •  The Materials Project today mostly compiles fundamental simulation output data •  Many users don’t really know what to “do” with this data –  i.e., they would be interested in lattice thermal conductivity, but don’t know they can get there using MP data + other models •  How can we decorate MP database with additional properties and clearly show how we got there? How to connect different materials properties together? “Constitutive relations”
  54. 54. A materials map / network / atlas •  What we need is a “connected property network” •  This is a graph in which nodes are materials properties and edges are relationships between those properties •  Given a set of known properties, e.g. simulation data, one can easily figure out all the derived engineering properties one can get. Figure caption: starting with the three properties in blue, one can derive many additional properties using one (orange), two (purple), or three (green) physical models. The value of computations is not only in the direct simulation outputs!
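A connected property network of this kind can be sketched as a list of models, each declaring its inputs and output, applied repeatedly until nothing new can be derived. This is a minimal illustration and not the Propnet API: the four models are standard isotropic-elasticity relations picked as examples, the property names are hypothetical, and the input values are made-up round numbers rather than data for a real material.

```python
# Each model: (output property, required inputs, function).
MODELS = [
    ("youngs_modulus", ("bulk_modulus", "shear_modulus"),
     lambda K, G: 9 * K * G / (3 * K + G)),
    ("poisson_ratio", ("bulk_modulus", "shear_modulus"),
     lambda K, G: (3 * K - 2 * G) / (2 * (3 * K + G))),
    ("shear_wave_speed", ("shear_modulus", "density"),
     lambda G, rho: (G / rho) ** 0.5),
    ("lame_first_parameter", ("youngs_modulus", "poisson_ratio"),
     lambda E, nu: E * nu / ((1 + nu) * (1 - 2 * nu))),
]


def derive_all(known):
    """Apply models until no new property appears.

    Returns the property values and, for each property, how many models
    deep it sits from the starting data (0 for the inputs themselves).
    """
    values = dict(known)
    depth = {name: 0 for name in known}
    changed = True
    while changed:
        changed = False
        for output, inputs, model in MODELS:
            if output not in values and all(i in values for i in inputs):
                values[output] = model(*(values[i] for i in inputs))
                depth[output] = 1 + max(depth[i] for i in inputs)
                changed = True
    return values, depth


# Made-up inputs in SI units (Pa, Pa, kg/m^3), not a real material.
known = {"bulk_modulus": 160e9, "shear_modulus": 80e9, "density": 8000.0}
values, depth = derive_all(known)
for name in sorted(values, key=depth.get):
    print(f"depth {depth[name]}: {name} = {values[name]:.4g}")
```

Here the first Lamé parameter is reachable only through two chained models, mirroring the one-, two-, and three-model rings in the figure; with several competing models per property, a real implementation would also need to track which route was used.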
  55. 55. 55 Materials Atlas
  56. 56. 56 We are now feeding the “atlas” into the Materials Project to derive new data
  57. 57. Conclusions •  High-throughput density functional theory, materials databases, and machine learning are a new set of tools for doing materials science •  We are developing many methods and software implementations to try to advance the field •  If you are interested, give the software a try! Diagram: quantum mechanics → density functional theory → high-throughput DFT → materials databases → machine learning
  58. 58. Thank you! •  Atomate –  K. Mathew (project lead) & team •  Structure order parameters –  N. Zimmermann (project lead) & team •  Rocketsled –  A. Dunn, J. Brenneck •  Matminer –  L. Ward (project lead, U. Chicago) & team •  Text mining –  V. Tshitoyan, J. Dagdelen, L. Weston •  Propnet –  M. K. Horton, D. Mrdjenovich •  All who provided feedback & contributed code to open-source software efforts! •  Funding: DOE-BES (Early Career + Materials Project Center) •  Computing: NERSC
