NANO281 is the University of California San Diego NanoEngineering Department's first course on the application of data science in materials science. It is taught by Professor Shyue Ping Ong of the Materials Virtual Lab (http://www.materialsvirtuallab.org).
2. What is Data Science?
Data science is a multi-disciplinary field that uses
scientific methods, processes, algorithms and
systems to extract knowledge and insights from
structured and unstructured data.
NANO281
5. Materials data is growing … (stats as of Jan 1
2020)
NANO281
~ 200,000 crystals
~ 400,000 crystals
Cambridge structural
database (small-molecule
organic and metal-
organic crystal structures)
since 1972…
Source: https://www.ccdc.cam.ac.uk/solutions/csd-system/components/csd/
http://cdn.rcsb.org/rcsb-pdb/v2/about-us/rcsb-pdb-impact.pdf
Protein Data Bank (PDB)
6. But quantity and quality lags many other fields….
NANO281
https://supercon.nims.go.jp/
~1000+ superconductors
(many minor composition
modifications)
One of the most comprehensive handbooks on
materials data:
• Density, thermal and electrical conductivity,
melting and boiling points, etc.
• But O(100) binaries and limited ternaries
7. “First Principles” Materials Design
Eψ(r) = −
h 2
2m
∇2
ψ(r)+V(r)ψ(r)
Schrodinger Equation
0 0.2 0.4 0.6 0.8 1
0
50
100
150
200
250
Diffusion coordinate
Energy(meV)
LCO
NCO
Material Properties
Phase stability1
Diffusion barriers2
Charge
densities6
Surface energies
and Wulff shape3
Density functional theory
(DFT) approximation
Generally applicable to
any chemistry
1 Ong et al., Chem. Mater., 2008, 20, 1798–1807.
2 Ong et al., Energy Environ. Sci., 2011, 4, 3680–3688.
3 Tran et al., Sci. Data, 2016, 3, 160080.
4 Deng et al., J. Electrochem. Soc., 2016, 163, A67–A74.
5 Wang et al., Chem. Mater., 2016, 28, 4024–4031.
6 Ong et al., Phys. Rev. B, 2012, 85, 2–5.
Mechanical
properties4
Electronic
structure5
Inherently
scalable
NANO281
8. Electronic structure calculations are today
reliable and reasonably accurate.
tials in Quantum ESPRESSO). In this case, too,
the small D values indicate a good agreement
between codes. This agreementmoreoverencom-
passes varying degrees of numerical convergence,
differences in the numerical implementation of
the particular potentials, and computational dif-
ferences beyond the pseudization scheme, most
of which are expected to be of the same order of
magnitude or smaller than the differences among
all-electron codes (1 meV per atom at most).
Conclusions and outlook
Solid-state DFT codes have evolved considerably.
The change from small and personalized codes to
widespread general-purpose packages has pushed
developers to aim for the best possible precision.
Whereas past DFT-PBE literature on the lattice
parameter of silicon indicated a spread of 0.05 Å,
the most recent versions of the implementations
discussed here agree on this value within 0.01 Å
(Fig. 1 and tables S3 to S42). By comparing codes
on a more detailed level using the D gauge, we
have found the most recent methods to yield
nearly indistinguishable EOS, with the associ-
ated error bar comparable to that between dif-
ferent high-precision experiments. This underpins
thevalidityof recentDFTEOSresults andconfirms
that correctly converged calculations yield reliable
predictions. The implications are moreover rele-
vant throughout the multidisciplinary set of fields
that build upon DFT results, ranging from the
physical to the biological sciences.
In spite of the absence of one absolute refer-
ence code, we were able to improve and demon-
strate the reproducibility of DFT results by means
of a pairwise comparison of a wide range of codes
and methods. It is now possible to verify whether
any newly developed methodology can reach the
same precision described here, and new DFT
applications can be shown to have used a meth-
od and/or potentials that were screened in this
way. The data generated in this study serve as a
crucial enabler for such a reproducibility-driven
paradigm shift, and future updates of available
D values will be presented at http://molmod.
ugent.be/deltacodesdft. The reproducibility of
reported results also provides a sound basis for
further improvement to the accuracy of DFT,
particularly in the investigation of new DFT func-
tionals, or for the development of new computa-
tional approaches. This work might therefore
Fig. 4. D values for comparisons between the most important DFT methods considered (in
millielectron volts per atom). Shown are comparisons of all-electron (AE), PAW, ultrasoft (USPP), and
norm-conserving pseudopotential (NCPP) results with all-electron results (methods are listed in alpha-
betical order in each category). The labels for each method stand for code, code/specification (AE), or
potential set/code (PAW, USPP, and NCPP) and are explained in full in tables S3 to S42.The color coding
RESEARCH | RESEARCH ARTICLE
onFebruary19,2017http://science.sciencemag.org/Downloadedfrom
Lejaeghere et al. Science, 2016, 351 (6280), aad3000.
Nitrides are an important class of optoel
ported synthesizability of highly metasta
nitrogen precursors (36, 37) suggests th
spectrum of promising and technologica
trides awaiting discovery.
Although our study focuses on the m
crystals, polymorphism and metastability
is of great technological relevance to pha
tronics, and protein folding (7). Our obs
energy to metastability could address a d
in organic molecular solids: Why do man
numerous polymorphs within a small (~
whereas inorganic solids often see >100°C
morph transition temperatures? The wea
molecular solids yield cohesive energies o
or −1 eV per molecule, about a third of t
class of inorganic solids (iodides; Fig. 2B).
yields a correspondingly small energy scal
(38). When this small energy scale of orga
is coupled with the rich structural diversity a
tional degrees of freedom during molecular
leads to a wide range of accessible polymorp
modynamic conditions.
Influence of composition
The space of metastable compounds hov
scape of equilibrium phases. As chemica
thermodynamic system, the complexity
grows. Figure 2A shows an example ca
for the ternary Fe-Al-O system, plotted a
tion energies referenced to the elemental
S1.2 for discussion). We anticipate the th
of a phase to be different when it is compe
S C I E N C E A D V A N C E S | R E S E A R C H A R T I C L E
HAUTIER, ONG, JAIN, MOORE, AND CEDER PHYSICAL REVIEW B 85, 155208 (2012)
or meV/atom); 10 meV/atom corresponds to about 1 kJ/mol-
atom.
III. RESULTS
Figure 2 plots the experimental reaction energies as a
function of the computed reaction energies. All reactions
involve binary oxides to ternary oxides and have been chosen
as presented in Sec. II. The error bars indicate the experimental
error on the reaction energy. The data points follow roughly
the diagonal and no computed reaction energy deviates from
the experimental data by more than 150 meV/atom. Figure 2
does not show any systematic increase in the DFT error with
larger reaction energies. This justifies our focus in this study
on absolute and not relative errors.
In Fig. 3, we plot a histogram of the difference between
the DFT and experimental reaction energies. GGA + U un-
derestimates and overestimates the energy of reaction with the
same frequency, and the mean difference between computed
and experimental energies is 9.6 meV/atom. The root-mean-
square (rms) deviation of the computed energies with respect
to experiments is 34.4 meV/atom. Both the mean and rms are
very different from the results obtained by Lany on reaction
energies from the elements.52
Using pure GGA, Lany found
that elemental formation energies are underestimated by GGA
with a much larger rms of 240 meV/atom. Our results are
closer to experiments because of the greater accuracy of DFT
when comparing chemically similar compounds such as binary
and ternary oxides due to errors cancellation.40
We should note
that even using elemental energies that are fitted to minimize
the error versus experiment in a large set of reactions, Lany
reports that the error is still 70 meV/atom and much larger
than what we find for the relevant reaction energies. The
rms we found is consistent with the error of 3 kJ/mol-atom
600
800
l
V/at)
FIG. 3. (Color online) Histogram of the difference between
computed ( E
comp
0 K ) and experimental ( E
expt
0 K ) energies of reaction
(in meV/atom).
(30 meV/atom) for reaction energies from the binaries in the
limited set of perovskites reported by Martinez et al.29
Very often, instead of the exact reaction energy, one is
interested in knowing if a ternary compound is stable enough
to form with respect to the binaries. This is typically the case
when a new ternary oxide phase is proposed and tested for
stability versus the competing binary phases.18
From the 131
compounds for which reaction energies are negative according
to experiments, all but two (Al2SiO5 and CeAlO3) are also
negative according to computations. This success in predicting
stability versus binary oxides of known ternary oxides can
be related to the very large magnitude of reaction energies
from binary to ternary oxides compared to the typical errors
observed (rms of 34 meV/atom). Indeed, for the vast majority
of the reactions (109 among 131), the experimental reaction en-
ergies are larger than 50 meV/atom. It is unlikely then that the
DFT error would be large enough to offset this large reaction
energy and make a stable compound unstable versus the binary
oxides.
The histogram in Fig. 3 shows several reaction energies
with significant errors. Failures and successes of DFT are often
JSON document in the format of a Crystallographic Information File (cif), which can also be downloaded
via the Materials Project website and Crystalium web application. In addition, the weighted surface
energy (equation (2)), shape factor (equation (3)), and surface anisotropy (equation (4)) are given.
Table 2 provides a full description of all properties available in each entry as well as their corresponding
JSON key.
Technical Validation
The data was validated through an extensive comparison with surface energies from experiments and
other DFT studies in the literature. Due to limitations in the available literature, only the data on ground
state phases were compared.
Comparison to experimental measurements
Experimental determination of surface energy typically involves measuring the liquid surface tension and
solid-liquid interfacial energy of the material20
to estimate the solid surface energy at the melting
temperature, which is then extrapolated to 0 K under isotropic approximations. Surface energies for
individual crystal facets are rarely available experimentally. Figure 5 compares the weighted surface
energies of all crystals (equation (2)) to experimental values in the literature20,23,26–28
. It should be noted
that we have adopted the latest experimental values available for comparison, i.e., values were obtained
from the 2016 review by Mills et al.27
, followed by Keene28
, and finally Niessen et al.26
and Miller and
Tyson20
. A one-factor linear regression line γDFT
¼ γEXP
þ c was fitted for the data points. The choice of
the one factor fit is motivated by the fact that standard broken bond models show that there is a direct
relationship between surface energies and cohesive energies, and previous studies have found no evidence
that DFT errors in the cohesive energy scale with the magnitude of the cohesive energy itself61
.
We find that the DFT weighted surface energies are in excellent agreement with experimental values,
with an average underestimation of only 0.01 J m− 2
and a standard error of the estimate (SEE) of
0.27 J m− 2
. The Pearson correlation coefficient r is 0.966. Crystals with surfaces that are well-known to
undergo significant reconstruction tend to have errors in weighted surface energies that are larger than
the SEE.
The differences between the calculated and experimental surface energies can be attributed to three
main factors. First, there are uncertainties in the experimental surface energies. The experimental values
derived by Miller and Tyson20
are extrapolations from extreme temperatures beyond the melting point.
The surface energy of Ge, Si62
, Te63
, and Se64
were determined at 77, 77, 432 and 313 K respectively while
Figure 5. Comparison to experimental surface energies. Plot of experimental versus calculated weighted
surface energies for ground-state elemental crystals. Structures known to reconstruct have blue data points
while square data points correspond to non-metals. Points that are within the standard error of the estimate
− 2
Phase stability Formation energies
Tran, et al. Sci. Data 2016, 3, 160080.
Sun, et al. Sci. Adv. 2016, 2 (11),
e1600225.
Figure 2. Distribution of calculated volume per atom, Poisson ratio, bulk modulus and shear modulus. Vector
field-plot showing the distribution of the bulk and shear modulus, Poisson ratio and atomic volume for 1,181
metals, compounds and non-metals. Arrows pointing at 12 o’clock correspond to minimum volume-per-atom
and move anti-clockwise in the direction of maximum volume-per-atom, which is located at 6 o’clock. Bar
plots indicate the distribution of materials in terms of their shear and bulk moduli.
www.nature.com/sdata/
Surface energies Elastic constants
de Jong et al. Sci. Data 2015, 2,
150009.
Hautier et al. Phys. Rev. B 2012, 85,
155208.
NANO281
Modern
electronic
structure
codes give
relatively
consistent
equations of
state.
11. The Materials Project is an open science
project to make the computed properties of all
known inorganic materials publicly available to
all researchers to accelerate materials
innovation.
June 2011: Materials Genome Initiative
which aims to “fund computational tools,
software, new methods for material
characterization, and the development of
open standards and databases that will make
the process of discovery and development
of advanced materials faster, less
expensive, and more predictable”
https://www.materialsproject.org
NANO281
13. Materials
Project DB
How do I
access MP
data?
MaterialsAPI
Pros
• Intuitive and user-friendly
• Secure
WebApps
RESTfulAPI
• Programmatic access for
developers and researchers
NANO281
14. The Materials API
An open platform for accessing Materials
Project data based on REpresentational
State Transfer (REST) principles.
Flexible and scalable to cater to large
number of users, with different access
privileges.
Simple to use and code agnostic.
NANO281
15. A REST API maps a URL to a
resource.
Example:
GET https://api.dropbox.com/1/account/info
Returns information about a user’s account.
Methods: GET, POST, PUT, DELETE, etc.
Response: Usually JSON or XML or both
NANO281
19. Secure access
An individual API key provides secure
access with defined privileges.
All https requests must supply API key
as either a “x-api-key” header or a
GET/POST “API_KEY” parameter.
API key available at
https://www.materialsproject.org/dashbo
ard
NANO281
20. Sample output (JSON)
Intuitive response
format
Machine-readable
(JSON parsers
available for most
programming
languages)
Metadata provides
provenance for
tracking
crea ed_a : "2014-07-18T11:23:25.415382",
alid_response: r e,
ersion:
,
-
p ma gen: "2.9.9",
db: "2014.04.18",
res : "1.0"
response: [
],
-
,
-
energ : -67.16532048,
ma erial_id: "mp-24972"
,
-
energ : -132.33035197,
ma erial_id: "mp-542309"
,+
,+
,+
,+
,+
,+
,+
+
cop righ : "Ma erial Projec , 2012"
NANO281
21. Demo of Materials Data Sources
NANO281
https://docs.google.com/spreadsheets/d/18MPVaixzX7hQN6lT0n9-
FdmTnTYjQDkvRhI9T5Ym1jQ/edit?usp=sharing
22. Types of Materials Data
Qualitative
data
Nominal
measurement
(categories)
E.g., Metal/Insulator,
Stable/Unstable
No rank or order
Ranked
data
Ordinal
measurement
(ordered)
E.g., Insulator/
semiconductor/
conductor
Does not indicate
distance between
ranks
Quantitative
Data
Interval/ratio
measurement (equal
intervals and true 0)
E.g., melting point,
elastic constant,
electrical/ionic
conductivity
Considerable
information and
permits meaningful
arithmetic operations
NANO281
23. Machine learning (ML) is nothing more than
(highly) sophisticated curve fitting….
NANO281
Image: https://www.slideshare.net/awahid/big-data-and-machine-learning-for-businesses and Google
Images
24. Typical Materials Data ScienceWorkflow
Identify Purpose
and Target
Data Collection Featurization Training Application
Active learning
Domain knowledge
- Is target learnable?
- Is target ambiguous?
Data Sources
Existing DIY
Elemental Features Structural Features
Classification
Decision tree
Logistic regression
...
Regression
GPR
KRR
Multi-linear
Random forest
SVR
Neural networks
Graph models
...
Supervised
- Cross-validation
- Hyper-parameter optimization
Tools
ænet
Automatminer
CGCNN
DeepChem
MEGNet
PROPhet
SchnetPack
TensorMol
...
NANO281
25. Where is ML valuable in Materials Science?
NANO281
Things that are too slow/difficult
to compute
Relationships that are beyond our
understanding (at the moment)
(AA’)0.5(BB’)0.5O3 perovskite
10 A and 10 B species
= (10C2 x 8C4)2 ≈107
Element-wise classification model
Prediction
PredictedInput
CN4 - Motif 1
CN5 - Mo
CN6
CN4
1: single bond 2: L-shaped 2: water-like
2: bent 120
degrees
2: bent 150
degrees
2: linear 3: T-shaped 3: trigonal planar
3: trigonal non-
coplanar
4: square co-
planar
4: tetrahedral
4: rectangular
see-saw
4: see-saw like
4: trigonal
pyramidal
5: pentagonal
planar
5: square
pyramidal
5: trigonal
bipyramidal
6: hexagonal
planar
6: octahedral
6: pentagonal
pyramidal
7: hexagonal
pyramidal
7: pentagonal
bipyramidal
8: body-centered
cubic
8: hexagonal
bipyramidal
12: cuboctahedra
??
26. Data History of the Materials Project
Reasonable ML
Deep learning
(AA’)0.5(BB’)0.5O3
perovskite
2 x 2 x 2 supercell,
10 A and 10 B species
= (10C2 x 8C4)2 ≈107
NANO281
ratio of (634 + 34)/485 ≈ 1.38 (Supplementary Table S-II) with b5%
difference in the experimental and theoretical values. This again
agree well with those calculated from the rule of mixture (Supplemen-
tary Table-III). The experimental XRD patterns also agree well with
Fig. 2. Atomic-resolution STEM ABF and HAADF images of a representative high-entropy perovskite oxide, Sr(Zr0.2Sn0.2Ti0.2Hf0.2Mn0.2)O3. (a, c) ABF and (b, d) HAADF images at (a, b) low
and (c, d) high magnifications showing nanoscale compositional homogeneity and atomic structure. The [001] zone axis and two perpendicular atomic planes (110) and (110) are marked.
Insets are averaged STEM images.
Jiang et al. A New Class
of High-Entropy
Perovskite Oxides.
Scripta Materialia 2018,
142, 116–120.
Materials design
is combinatorial
27. Solution:Surrogate models for“instant” property
predictions
NANO281
“descriptors/features”“target"
Property
• Energies (formation,
Ehull, reaction, binding,
etc.)
• Band gaps
• Mechanical properties
• Functional properties
(e.g., ionic conductivity)
• ….
Composition
• Stoichiometric
attributes, e.g., number
and ratio of elements,
etc.
• Elemental property,
e.g., mean, range, min,
max, etc. of elemental
properties such as
atomic number,
electronegativity, row,
group, atomic radii, etc.
• Electronic structure,
e.g., number of valence
electrons, shells, etc.
• …
Structure
• Crystal/molecular
symmetry
• Lattice parameters
• Atomic coordinates
• Connectivity / bonding
between atoms
• …
= f( ),
28. Composition-based models
NANO281
Zheng, X., et al (2018). Chem. Sci., 9(44), 8426-8432.
Jha et al. (2018) Sci. Rep., 8(1), 17593.
Meredig et al. (2014) Phys. Rev. B 89, 094104
Feature engineering Deep Learning
29. Structure-based models
NANO281
Property-labelled materials fragments + gradient boosting decision tree
Isayev et al. (2017) Nature Comm., 8, 15679 Xie et al. (2018) Phys. Rev. Lett. 120, 145301
Crystal graph + graph convolutional neural networks
Smooth overlap of atom positions (SOAP)
Rosenbrock et al. npj Comput. Mater. (2017), 3, 29
30. State of the art:Graph-based representations
Figure 4: Pearson correlations between elemental embedding vectors. Elements are arranged
in order of increasing Mendeleev number49
for easier visualization of trends.
Nissan Motor Co.NANO281
31. Performance on 130,462 QM9 molecules
NANO281
80%-10%-10%
train-validation-test split
Only Z as atomic feature,
i.e., feature selection helps
model learn, but is not
critical!
MEGNET1 MEGNET-
Simple1
SchNet2 “Chemical
Accuracy”
U0 (meV) 9 12 14 43
G (meV) 10 12 14 43
εHOMO (eV) 0.038 0.043 0.041 0.043
εLUMO (eV) 0.031 0.044 0.034 0.043
Cv (cal/molK) 0.030 0.029 0.033 0.05
1 Chen et al. Chem. Mater. 2019, 31 (9), 3564–3572. doi: 10.1021/acs.chemmater.9b01294.
2 Schutt et al. J. Chem. Phys. 148, 241722 (2018)
State-of-the-art performance
surpassing chemical accuracy
in 11 of 13 properties!
32. Performance on Materials Project Crystals
NANO281
Property MEGNet SchNet1 CGCNN2
Formation energy Ef
(meV/atom)
28
(60,000)
35 39
(28,046)
Band gap Eg (eV) 0.330
(36,720)
- 0.388
(16,485)
log10 KVRH (GPa) 0.050
(4,664)
- 0.054
(2,041)
log10 GVRH (GPa) 0.079
(4,664)
- 0.087
(2,041)
Metal classifier 78.9%
(55,391)
- 80%
(28,046)
Non-metal classifier 90.6%
(55,391)
- 95%
(28,046)
1 Schutt et al. J. Chem. Phys. 148, 241722 (2018)
2 Xie et al. PRL. 120.14 (2018): 145301.
33. The Scale Challenge in Computational Materials
Science
Many real-world materials problems are not
related to bulk crystals.
Huang et al. ACS Energy Lett. 2018, 3 (12), 2983–
2988.
Tang et al. Chem. Mater. 2018, 30 (1), 163–173.
Electrode-electrolyte interfaces Catalysis Microstructure and segregation
Need linear-scaling with ab initio accuracy.
NANO281
34. Machine Learning: A solution to the Scale Challenge
in Computational Materials Science?
Length Scale
AccuracyTransferability
Finite element /
continuum
models
Empirical
potentials
First principles methods
Critical challenge
Bridging the 10-10 → 10-6
m or 10-12 → 10-6 sec
scales in a manner that
retains transferability
and accuracy, and is
scalable.
Time Scale
Atomic vibrations<ps
ns
µs
Ion dynamics
Reaction dynamics
ms
NANO281
35. Machine learning the potential energy surface
NANO281
symmetry functions (ACSF)39
to represent the atomic local environments and fully con-
nected neural networks to describe the PES with respect to symmetry functions.11,12
A separate neural network is used for each atom. The neural network is defined by
the number of hidden layers and the nodes in each layer, while the descriptor space is
given by the following symmetry functions:
Gatom,rad
i =
NatomX
j6=i
e ⌘(Rij Rs)2
· fc(Rij),
Gatom,ang
i = 21 ⇣
NatomX
j,k6=i
(1 + cos ✓ijk)⇣
· e ⌘0(R2
ij+R2
ik+R2
jk)
· fc(Rij) · fc(Rik) · fc(Rjk),
where Rij is the distance between atom i and neighbor atom j, ⌘ is the width of the
Gaussian and Rs is the position shift over all neighboring atoms within the cuto↵
radius Rc, ⌘0
is the width of the Gaussian basis and ⇣ controls the angular resolution.
fc(Rij) is a cuto↵ function, defined as follows:
fc(Rij) =
8
>><
>>:
0.5 · [cos (
⇡Rij
Rc
) + 1], for Rij Rc
0.0, for Rij > Rc.
These hyperparameters were optimized to minimize the mean absolute errors of en-
ergies and forces for each chemistry. The NNP model has shown great performance
for Si,11
TiO2,40
water41
and solid-liquid interfaces,42
metal-organic frameworks,43
and
has been extended to incorporate long-range electrostatics for ionic systems such as
4
nO44
and Li3PO4.45
aussian Approximation Potential (GAP). The GAP calculates the similar-
y between atomic configurations based on a smooth-overlap of atomic positions
OAP)10,46
kernel, which is then used in a Gaussian process model. In SOAP, the
aussian-smeared atomic neighbor densities ⇢i(R) are expanded in spherical harmonics
follows:
⇢i(R) =
X
j
fc(Rij) · exp(
|R Rij|2
2 2
atom
) =
X
nlm
cnlm gn(R)Ylm( ˆR),
he spherical power spectrum vector, which is in turn the square of expansion coe -
ents,
pn1n2l(Ri) =
lX
m= l
c⇤
n1lmcn2lm,
n be used to construct the SOAP kernel while raised to a positive integer power ⇣
hich is 4 in present case) to accentuate the sensitivity of the kernel,10
K(R, R0
) =
X
n1n2l
(pn1n2l(R)pn1n2l(R0
))⇣
,
the above equations, atom is a smoothness controlling the Gaussian smearing, and
Distances and angles
Neighbor density
Linear regression
Kernel regression
Neural networks
Neural Network Potential (NNP)1
Moment Tensor Potential (MTP)2
Gaussian Approximation Potential (GAP)3
Spectral Neighbor Analysis Potential (SNAP)4
ML models
Descriptors
1 Behler et al. PRL. 98.14 (2007): 146401.
2 Shapeev MultiScale Modeling and Simulation 14, (2016).
3 Bart ́ok et al. PRL. 104.13 (2010): 136403.
4 Thompson et al. J. Chem. Phys. 285, 316330 (2015)
36. Standardized workflow for ML-IAP construction
and evaluation
Pymatgen
Fireworks + VASP
DFT static
Dataset
Elastic deformation Distorted
structures
Surface generation Surface
structures
Vacancy + AIMD Trajectory
snapshots
(low T, high T) AIMD Trajectory
snapshots
Crystal
structure
property fittingE
e
e.g. elastic, phonon
···
energy weights
degrees of freedom
···
cutoff radius
expansion width
S1
S2
Sn
· · ·
rc
atomic descriptors
local
environment
sites
· · · · · ·
X1(r1j … r1n)
X2(r2k … r2m)
Xn(rnj … rnm)
machine learning
Y =f(X; !)
Y (energy, force, stress)
DFT properties
grid search
evolutionary algorithm
NANO281
Available open source on Github: https://github.com/materialsvirtuallab/mlearn
Zuo, Y.; Chen, C.; Li, X.; Deng, Z.; Chen, Y.; Behler, J.; Csányi, G.; Shapeev, A. V.; Thompson, A. P.; Wood, M. A.; et al. A Performance and Cost
Assessment of Machine Learning Interatomic Potentials. arXiv:1906.08888 2019.
37. Ni-Mo SNAP performance
NANO281
q SNAP significantly outperforms in binary and
bcc Mo for energy and elastic constants.
Energy
Forces
Elasticconstants
39. Application:Investigating Hall-Petch strengthening
in Ni-Mo
NANO281
q ~20,000 to ~455,000 atoms
q Uniaxially strained with a strain rate of 5×108 s-
1
q SNAP reproduces the Hall-Petch relationship,
consistent with experiment[1].[1] Hu et al. Nature, 2017, 355, 1292
40. ML-IAP:Accuracy vs Cost
NANO281
Testerror(meV/atom)
Computational cost s/(MD step atom)
a
b
Jmax = 3
Jmax = 3
2000 kernels20 polynomial powers
hidden layers [16, 16]
Mo dataset
Zuo et al. A Performance and Cost Assessment of Machine Learning Interatomic Potentials. arXiv:1906.08888 2019.
41. Modeling relationships that are too complex to
understand right now….
Oct 10 2019
What’s the equivalent of
this problem in materials
characterization?
Identify
Absorption
Species
Learner N
Peak Shifting /
Alignment
Spectra Norm.
(Optional)
Feature Trans.
Intensity Norm.
Similarity
Measure
…
Learner 1
Peak Shifting /
Alignment
Spectra Norm.
(Optional)
Feature Trans.
Intensity Norm.
Similarity
Measure
Rank 1 Rank N
Combined
Rank
Prob. Each
Spectrum
Database
Zheng et al. Automated generation and
ensemble-learned matching of X-ray
absorption spectra. npj Comput. Mater.
2018, 4 (1), 12
500,000 computed
K-edge XANES of >
50,000 crystals (In
progress: L edges
and EXAFS)
~84% accuracy in
identifying correct
oxidation state and
coordination
environment!
42. Random Forest Coordination Environment
Classification
Oct 10 2019
Alkali
TM
Post-TM
Metalloid
Carbon
Alkaline
43. Other examples
NANO281
Oviedo et al. (2019) npj Comput. Mater. 5, 60.
Classification of crystal structures from XRD
Schütt et al. (2019) Nature Comm. 10, 5024
Prediction of wavefunctions