
The Status of ML Algorithms for Structure-property Relationships Using Matbench as a Test Protocol


  1. The Status of ML Algorithms for Structure-property Relationships Using Matbench as a Test Protocol. Anubhav Jain, Lawrence Berkeley National Laboratory. TMS Spring 2022, March 2022. Slides (already) posted to hackingmaterials.lbl.gov
  2. ML is quickly becoming a standard tool for materials screening. [Figure: screening funnel from millions of candidates, through machine learning and high-throughput DFT, to expensive calculation and experiment]
  3. There are many new algorithms being published for ML in materials – new ones are constantly reported!
  4. There are many new algorithms being published for ML in materials – new ones are constantly reported! Q: Which one is the “best” based on the literature?
  5. There are many new algorithms being published for ML in materials – new ones are constantly reported! Q: Which one is the “best” based on the literature? A: Can’t tell! They’re nearly all evaluated on different data.
  6. Difficulty of comparing ML algorithms (each study uses its own data set): • Different data sets – source (e.g., OQMD vs. MP), quantity (e.g., MP 2018 vs. MP 2019), subset / data filtering (e.g., ehull < X) • Different evaluation metrics – test set vs. cross-validation? different test set fraction? • Often no runnable version of a published algorithm. Example: one study reports MAE (5-fold CV) = 0.102 eV while another reports RMSE (test set) = 0.098 eV – the two numbers cannot be compared directly.
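The mismatched-metrics problem on the slide above can be made concrete: MAE and RMSE computed on the very same predictions give different numbers, so a paper quoting one cannot be ranked against a paper quoting the other. A minimal stdlib sketch (all values below are invented for illustration):

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large residuals more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Same hypothetical predictions, two different scores
y_true = [0.0, 0.5, 1.0, 2.0]
y_pred = [0.1, 0.4, 1.3, 1.8]

print(mae(y_true, y_pred))   # 0.175
print(rmse(y_true, y_pred))  # ~0.194 -- larger, because of the 0.3 outlier
```

Because RMSE ≥ MAE on any data set (with equality only when all residuals are equal), a "RMSE = 0.098 eV" result is not automatically worse than an "MAE = 0.102 eV" result from a different study – which is exactly why a shared protocol is needed.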
  7. What’s needed – an “ImageNet” for materials science. https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
  8. What does a standard data set do for a field? One reason computer science / machine learning advances so quickly is that the field decouples data generation from algorithm development. This lets groups focus on algorithm development without the data generation and data cleaning that often make up the majority of an end-to-end data science project.
  9. The ingredients of the Matbench benchmark: ☐ Standard data sets ☐ Standard test splits according to a nested cross-validation procedure ☐ An online leaderboard that encourages reproducible results
  10. How to design good data sets for materials science? • There is no single type of problem that materials scientists are trying to solve • For now, focus on materials property prediction (from structure or composition) • We want a test set that contains a diverse array of problems: smaller data versus larger data; different applications (electronic, mechanical, etc.); composition-only or structure information available; experimental vs. ab initio; classification or regression
  11. Matbench includes 13 different ML tasks. Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3.
  12. The tasks encompass a variety of problems. Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Comput Mater 2020, 6 (1), 138. https://doi.org/10.1038/s41524-020-00406-3.
  13. The ingredients of the Matbench benchmark: ✓ Standard data sets ☐ Standard test splits according to a nested cross-validation procedure ☐ An online leaderboard that encourages reproducible results
  14. The most common method: a single hold-out test set. • Training/validation is used for model selection • Test/hold-out is used only for error estimation (i.e., the final score)
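The hold-out protocol described above can be sketched in a few lines of stdlib Python (the 20% test fraction and the toy data are illustrative, not part of Matbench):

```python
import random

def holdout_split(samples, test_fraction=0.2, seed=0):
    """Shuffle and split into a train/validation pool and a hold-out test set."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train_val, test)

data = list(range(100))
train_val, test = holdout_split(data)
# Model selection (hyperparameters, features, architecture) uses only
# train_val; the test set is touched exactly once, for the final score.
print(len(train_val), len(test))  # 80 20
```

The key discipline is in the comment: any decision informed by the test set leaks information into the final score, which is what the single "touch it once" rule prevents.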
  15. Nested CV as a standard scoring metric. Nested CV is like hold-out, but varies the hold-out set. Think of it as k different “universes” – we have a different training + validation of the model in each universe, and a different hold-out set.
  16. Nested CV as a standard scoring metric. Nested CV is like hold-out, but varies the hold-out set. Think of it as k different “universes” – we have a different training + validation of the model in each universe, and a different hold-out set. “A nested CV procedure provides an almost unbiased estimate of the true error.” – Varma and Simon, Bias in error estimation when using cross-validation for model selection (2006)
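The k-universes idea above reduces to an outer loop in which every sample is held out exactly once; model selection then happens independently inside each universe's training portion (the inner loop is elided in this stdlib sketch, and the fold-assignment scheme is just one simple choice):

```python
def outer_folds(samples, k=5):
    """Yield (train, test) pairs; each sample appears in exactly one test fold."""
    for i in range(k):
        test = samples[i::k]  # every k-th sample, offset i
        train = [s for j, s in enumerate(samples) if j % k != i]
        yield train, test

data = list(range(10))
scores = []
for train, test in outer_folds(data, k=5):
    # ... an inner cross-validation on `train` would select the model here ...
    scores.append(len(test))  # stand-in for a per-fold error estimate

print(scores)  # one score per "universe"; report mean +/- std across folds
```

Because each hold-out set is only ever used for scoring (never for selection), averaging the per-fold scores gives the nearly unbiased error estimate the Varma and Simon quote refers to.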
  17. The ingredients of the Matbench benchmark: ✓ Standard data sets ✓ Standard test splits according to a nested cross-validation procedure ☐ An online leaderboard that encourages reproducible results
  18. Matbench website – now complete! https://matbench.materialsproject.org
  19. Matbench compares ML algorithms. [Chart annotation: bigger datasets → better relative performance]
  20. Access to datasets/ML tasks: • Interactively, via Materials Project (ml.materialsproject.org) • Programmatically, via matbench in Python (2 lines; loads all 13 tasks) – preferred/easiest method! • Programmatically, via matminer in Python (2 lines) • Direct download, via matbench.materialsproject.org. https://github.com/hackingmaterials/matminer
  21. Programmatic access and analysis of submissions: • Run a benchmark on your own algorithm in ~10 lines of code • Run on any combination of the 13 existing tasks, or all of them • If your entry outperforms an existing entry, submit your algorithm in a pull request! • Existing notebooks/code and software requirements for reproducing any benchmark, e.g. {'python': [['crabnet==1.2.1', 'scikit_learn==1.0.2', 'matbench==0.5']]} • Comprehensive raw data on all benchmarks (accessible via the matbench Python package or any JSON-capable language), publicly available to anyone • In-depth performance metrics for individual ML tasks for all submissions, both visually on the website and programmatically
  22. The ingredients of the Matbench benchmark: ✓ Standard data sets ✓ Standard test splits according to a nested cross-validation procedure ✓ An online leaderboard that encourages reproducible results
  23. What algorithms have been tested on the Matbench data set so far? • Magpie + sine Coulomb matrix random forest (feature-based random forests) • Ward, L., Agrawal, A., Choudhary, A. et al. A general-purpose machine learning framework for predicting properties of inorganic materials. npj Comput Mater 2, 16028 (2016). https://doi.org/10.1038/npjcompumats.2016.28 • Faber, Felix, et al. "Crystal structure representations for machine learning models of formation energies." International Journal of Quantum Chemistry 115.16 (2015): 1094-1101. • Automatminer (feature-based AutoML) • Dunn, A.; Wang, Q.; Ganose, A.; Dopp, D.; Jain, A. Benchmarking Materials Property Prediction Methods: The Matbench Test Set and Automatminer Reference Algorithm. npj Comput Mater 2020, 6 (1), 138. • CGCNN (graph neural network) • Xie, T.; Grossman, J. C. Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. Phys. Rev. Lett. 2018, 120 (14), 145301. • MEGNet (graph neural network) • Chen, C.; Ye, W.; Zuo, Y.; Zheng, C.; Ong, S. P. Graph Networks as a Universal Machine Learning Framework for Molecules and Crystals. Chemistry of Materials 2019, 31 (9), 3564–3572. • MODNet (feature-based neural network) • De Breuck, P.-P.; Evans, M. L.; Rignanese, G.-M. Robust Model Benchmarking and Bias-Imbalance in Data-Driven Materials Science: A Case Study on MODNet. arXiv:2102.02263 [cond-mat] 2021. • CrabNet (attention-based composition neural network) • Wang, A.; Kauwe, S.; Murdock, R.; Sparks, T. Compositionally-Restricted Attention-Based Network for Materials Property Prediction; ChemRxiv, 2020. https://doi.org/10.26434/chemrxiv.11869026.v1. • ALIGNN (graph neural network with bond angles) • Choudhary, Kamal, and Brian DeCost. "Atomistic Line Graph Neural Network for improved materials property predictions." npj Computational Materials 7.1 (2021): 1-8.
  24. Insights from standardized comparisons: • Originally, we found that traditional “hand-crafted” feature models generally performed best on smaller datasets • So it seemed materials-science data – typically small datasets, especially experimental ones – was best modeled by traditional ML/feature methods, e.g. random forests • Clever developments in neural networks have since improved GNN models on smaller datasets, in part powered by competition on the Matbench leaderboard • The standardized platform has made it easier to identify techniques that work well for certain problems, and those that do not
  25. Insights from standardized comparisons: errors predicting final phonon DOS peak frequencies (mean absolute error, mean ± std across folds).
     Algorithm | Mean MAE (cm⁻¹) | Mean RMSE (cm⁻¹) | Max error (cm⁻¹)
     ALIGNN (2022, structural GNN) | 29.5385 | 53.501 | 615.3466
     MODNet v0.1.10 (2021) | 38.7524 | 78.222 | 1031.8168
     CrabNet (2021, composition GNN) | 55.1114 | 138.3775 | 1452.7562
     AMMExpress (2020, SoTA early 2020) | 56.1706 | 109.7048 | 1151.557
     CGCNN (2019) | 57.7635 | 141.7018 | 2504.8743
     Same data, same test – so why are some algorithms best? • ALIGNN incorporates bond angles into the crystal graph • Bond angles / local environments matter for vibrational properties? • Matbench enables these sorts of “instant” ablation studies
  26. Insights from standardized comparisons: errors predicting the experimental band gap (mean absolute error, mean ± std across folds).
     Algorithm | Mean MAE (eV) | Std. MAE (eV) | Mean RMSE (eV)
     CrabNet (composition GNN) | 0.3463 | 0.0088 | 0.8504
     MODNet v0.1.10 (traditional features + encoding/selection) | 0.347 | 0.0222 | 0.7437
     CrabNet v1.2.1 | 0.3757 | 0.0207 | 0.8805
     AMMExpress v2020 (SoTA early 2020) | 0.4161 | 0.0194 | 0.9918
     Same data, same test – so why are some algorithms best? • CrabNet: importance of the attention mechanism for compositional properties; low variability across folds • MODNet: normalized mutual information feature selection yields high performance, at the risk of higher variability across folds
  27. Improvements to materials ML benchmarks: standardized uncertainty quantification, and more datasets + better tasks. • ML materials design is improved by UQ of each prediction, which enables adaptive design • Practical: modern models (e.g., MODNet) produce UQ estimates naturally • Useful: we can analyze UQ to tell us how often samples’ true values actually fall outside the UQ range • In progress: coming soon to the matbench package! • It is impossible to represent the full field of materials design in a single set of benchmarks – but can we come close? Aim to include a wider variety of properties and sources: expt. load-dependent Vickers hardness; expt. superconductor Tc; expt. ΔHf from crystal structure; expt. UV-Vis measurements of metal oxides • Unique, domain-specific procedures for each task, for example segregation of CV samples into clusters based on structure/composition (LOCO-CV) • Evaluation procedures that most closely resemble real-world usage of these algorithms, in the most computationally feasible fashion
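The UQ calibration check mentioned above ("how often do true values fall outside the UQ range?") is simple to state as code. A stdlib sketch, assuming a model that emits a point prediction and a standard deviation per sample; all numbers are invented for illustration:

```python
def coverage(y_true, y_pred, y_std, width=2.0):
    """Fraction of true values that land inside y_pred +/- width * y_std."""
    inside = sum(
        1 for t, p, s in zip(y_true, y_pred, y_std)
        if abs(t - p) <= width * s
    )
    return inside / len(y_true)

# Hypothetical predictions with (badly calibrated) uncertainty estimates
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 2.5, 2.9, 3.0]
y_std  = [0.2, 0.2, 0.2, 0.2]

print(coverage(y_true, y_pred, y_std))  # 0.5
```

For well-calibrated Gaussian uncertainties, roughly 95% of true values should fall inside the 2-sigma interval; a coverage far below that (as in this toy example) flags overconfident error bars, which is exactly what a standardized UQ benchmark would measure.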
  28. Conclusions and future: • As the community increasingly develops new algorithms for machine learning of materials properties, a standard way to test these algorithms is needed • Matbench represents such a standard and allows you to test your algorithms against others • Matbench also allows us to measure overall progress in the field • We hope to see you on the leaderboard!
  29. Acknowledgements: Alex Dunn (lead developer), Qi Wang, Alex Ganose, Daniel Dopp. Slides (already) posted to hackingmaterials.lbl.gov
