Prof Ong gave a webinar talk on the AI Revolution in Materials Science for Singapore's Agency for Science, Technology and Research (A*STAR). In this talk, he discussed the major challenges in materials science that AI can help address, as well as the outstanding challenges and opportunities in bringing the AI revolution to the materials domain.
3. Jul 2 2020
Science
•Thermodynamics
•Kinetics
•Mechanics
Data
•High-throughput automation
•Data UI
Learning
•Data mining
•AI and machine learning
•Quantified chemical intuition
Application
•Energy storage
•Energy efficiency
•Materials for Extreme Environments
Community
•Open-source software
•Open data
•Open APIs
•Outreach
[Figure: MEGNet update steps. Inputs: atom attributes, bond attributes, state attributes. 1. Update bond; 2. Update atom; 3. Update state. Outputs: new atom attributes, new bond attributes, new state attributes.]
[Figure: AIMD workflow. Start → Setup simulation box → Initial relaxation → dynamically add initial AIMD jobs running at different temperatures → AIMD simulation → Converged? If N, dynamically add a continuation AIMD job starting from the previous one; if Y → End.]
4. Machine learning (ML) is the scientific study of
algorithms and statistical models that computer
systems use to perform a specific task without
using explicit instructions, relying on patterns
and inference instead.
[Figure labels: Artificial Intelligence; Machine Learning; Supervised]
5. ML in Materials Science
Identify target/problem → Data Collection → Model Choice / Featurization → Training → Application
Domain knowledge
• Is the problem worth solving?
• Can the problem be solved by other (simpler) means?
• Can the target be learned?
Data Sources: Existing, DIY
Features: Elemental Features, Structural Features
Classification
Decision tree
Logistic regression
...
Regression
GPR
KRR
Multi-linear
Random forest
SVR
Neural networks
Graph models
...
Materials Science Specific
ænet
Automatminer
CGCNN
DeepChem
MEGNet
PROPhet
SchnetPack
TensorMol
...
6. Where is ML valuable in Materials Science?
Jul 2 2020 IMRE Webinar
Things that are too slow/difficult
to compute
Relationships that are beyond our
understanding (at the moment)
> 10^6 atoms
[Figure: element-wise classification model mapping an input to a predicted coordination environment; panel labels include "CN4 - Motif 1", "CN5 - Mo" (truncated), "CN6" and "CN4".]
Coordination motifs, by coordination number:
1: single bond
2: L-shaped; water-like; bent 120 degrees; bent 150 degrees; linear
3: T-shaped; trigonal planar; trigonal non-coplanar
4: square co-planar; tetrahedral; rectangular see-saw; see-saw like; trigonal pyramidal
5: pentagonal planar; square pyramidal; trigonal bipyramidal
6: hexagonal planar; octahedral; pentagonal pyramidal
7: hexagonal pyramidal; pentagonal bipyramidal
8: body-centered cubic; hexagonal bipyramidal
12: cuboctahedra
??
7. AI Use Case 1: Predicting Materials Properties Efficiently and Accurately
8. Tesla-ian approach to materials discovery
~20 years + $ billions of development
Reduced time and cost of development
If he [Edison] had a needle to find in a haystack,
he would not stop to reason where it was most
likely to be, but would proceed at once with the
feverish diligence of a bee, to examine straw
after straw until he found the object of his
search. … Just a little theory and calculation
would have saved him 90% of his labor.
a lot of calculation
9. “First Principles” Property Prediction
Schrödinger equation:
$E\psi(r) = -\frac{\hbar^2}{2m}\nabla^2\psi(r) + V(r)\psi(r)$
Density functional theory (DFT) approximation
Generally applicable to any chemistry
Inherently scalable
Material Properties
• Phase stability¹
• Diffusion barriers²
• Surface energies and Wulff shape³
• Mechanical properties⁴
• Electronic structure⁵
• Charge densities⁶
[Figure: energy (meV) versus diffusion coordinate, for LCO and NCO]
1 Ong et al., Chem. Mater., 2008, 20, 1798–1807.
2 Ong et al., Energy Environ. Sci., 2011, 4, 3680–3688.
3 Tran et al., Sci. Data, 2016, 3, 160080.
4 Deng et al., J. Electrochem. Soc., 2016, 163, A67–A74.
5 Wang et al., Chem. Mater., 2016, 28, 4024–4031.
6 Ong et al., Phys. Rev. B, 2012, 85, 2–5.
10. The Materials Project is an open science project to
make the computed properties of all known
inorganic materials publicly available to all
researchers to accelerate materials innovation.
June 2011: Materials Genome Initiative which
aims to “fund computational tools, software,
new methods for material characterization, and the
development of open standards and databases that
will make the process of discovery and
development of advanced materials faster, less
expensive, and more predictable”
https://www.materialsproject.org
11. “Google” of Materials
[Figure labels: Structure; Electronic Structure; Elastic properties; XRD; Energetic properties]
1 Jain et al. APL Mater. 2013, 1 (1), 011002.
12. First principles calculations are not enough!
Reasonable ML
Deep learning
(AA’)0.5(BB’)0.5O3 perovskite, 2 x 2 x 2 supercell, 10 A and 10 B species
$= \left(\binom{10}{2} \times \binom{8}{4}\right)^2 \approx 10^7$
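The count on this slide can be checked with a few lines of arithmetic. The sketch below assumes the reading that a 2 x 2 x 2 perovskite supercell has 8 A sites and 8 B sites, that two species are picked from 10 candidates on each sublattice, and that each pair is split 4/4 over the 8 sites. Symmetry-equivalent arrangements are not removed, so this is an upper-bound style estimate rather than a count of distinct structures.

```python
from math import comb

# Numbers taken from the slide: 10 candidate species per sublattice,
# 8 sites per sublattice in the 2 x 2 x 2 supercell (assumed reading).
n_species = 10
n_sites = 8

# Pick the two species (e.g. A and A'), then pick which 4 of the 8 sites hold the first one.
per_sublattice = comb(n_species, 2) * comb(n_sites, n_sites // 2)

# The A and B sublattices are populated independently, hence the square.
total = per_sublattice ** 2

print(per_sublattice)  # 3150
print(total)           # 9922500, i.e. about 10^7
```

Even this single structural prototype yields roughly ten million candidates, which is the scale argument the slide makes against relying on first-principles calculations alone.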
[Figure: Fig. 2 of Jiang et al. Atomic-resolution STEM ABF and HAADF images of a representative high-entropy perovskite oxide, Sr(Zr0.2Sn0.2Ti0.2Hf0.2Mn0.2)O3. (a, c) ABF and (b, d) HAADF images at (a, b) low and (c, d) high magnifications showing nanoscale compositional homogeneity and atomic structure.]
Jiang et al. A New Class of High-Entropy Perovskite Oxides. Scripta Materialia 2018, 142, 116–120.
Materials design is combinatorial
Machine learn from here… to predict everything else
13. Graphs as a natural representation for materials, i.e., molecules and crystals
[Figure: a graph with atom attribute vectors vsk and vrk (e.g., atomic numbers Zs, Zr, …), a bond attribute vector ek1, and a global state (u) vector (e.g., T, p, S, …)]
Chen et al. Chem. Mater. 2019, 31 (9), 3564–3572. doi: 10.1021/acs.chemmater.9b01294.
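14. Information flow between elements in a graph network
[Figure: graph with atoms vsk and vrk, bond ek1 and global state (u), e.g., T, p, S, …]
Bond update
$\mathbf{e}_{k1}' = \phi_e(\mathbf{v}_{rk} \oplus \mathbf{v}_{sk} \oplus \mathbf{e}_{k1} \oplus \mathbf{u})$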
Chen et al. Chem. Mater. 2019, 31 (9), 3564–3572. doi: 10.1021/acs.chemmater.9b01294.
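15. Information flow between elements in a graph network
[Figure: graph with atoms vsk and vrk, bonds ek1, ek2, ek3 and global state (u), e.g., T, p, S, …]
Bond update
$\mathbf{e}_{k1}' = \phi_e(\mathbf{v}_{rk} \oplus \mathbf{v}_{sk} \oplus \mathbf{e}_{k1} \oplus \mathbf{u})$
Atom update
$\mathbf{v}_{rk}' = \phi_v(\mathbf{v}_{rk} \oplus \mathbf{e}_{kr} \oplus \mathbf{u})$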
Chen et al. Chem. Mater. 2019, 31 (9), 3564–3572. doi: 10.1021/acs.chemmater.9b01294.
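16. Information flow between elements in a graph network
[Figure: graph with atoms vsk and vrk, bonds ek1, ek2, ek3 and global state (u), e.g., T, p, S, …]
Bond update
$\mathbf{e}_{k1}' = \phi_e(\mathbf{v}_{rk} \oplus \mathbf{v}_{sk} \oplus \mathbf{e}_{k1} \oplus \mathbf{u})$
Atom update
$\mathbf{v}_{rk}' = \phi_v(\mathbf{v}_{rk} \oplus \mathbf{e}_{kr} \oplus \mathbf{u})$
State update
$\mathbf{u}' = \phi_u(\mathbf{V} \oplus \mathbf{E} \oplus \mathbf{u})$
The functions $\phi$ are approximated using neural networks (universal approximation theorem).
Cybenko. Math. Control Signal Systems 1989, 2 (4), 303–314.
Chen et al. Chem. Mater. 2019, 31 (9), 3564–3572. doi: 10.1021/acs.chemmater.9b01294.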
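The bond, atom and state updates can be sketched in plain Python to show how information flows through the graph. This is an illustration only, not the MEGNet implementation: `make_phi` is a hypothetical stand-in that replaces each learned function with one fixed random tanh layer, list concatenation plays the role of ⊕, and averaging the bonds around an atom (and the atoms and bonds for the state) is an assumed pooling choice.

```python
import math
import random


def make_phi(n_in, n_out, seed):
    """Hypothetical stand-in for a learned update function: one dense tanh layer."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

    def phi(x):
        assert len(x) == n_in, "concatenated input has the wrong length"
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

    return phi


def mean(vectors, dim):
    """Element-wise average of a list of vectors (zeros if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def graph_network_step(atoms, bonds, pairs, state, phi_e, phi_v, phi_u):
    """One bond -> atom -> state update; list concatenation stands for the ⊕ operator."""
    atom_dim, bond_dim = len(atoms[0]), len(bonds[0])
    # 1. Bond update: e' = phi_e(v_r ⊕ v_s ⊕ e ⊕ u)
    new_bonds = [phi_e(atoms[r] + atoms[s] + e + state) for e, (s, r) in zip(bonds, pairs)]
    # 2. Atom update: v' = phi_v(v ⊕ mean of updated bonds touching the atom ⊕ u)
    new_atoms = []
    for i, v in enumerate(atoms):
        attached = [e for e, (s, r) in zip(new_bonds, pairs) if i in (s, r)]
        new_atoms.append(phi_v(v + mean(attached, bond_dim) + state))
    # 3. State update: u' = phi_u(mean of atoms ⊕ mean of bonds ⊕ u)
    new_state = phi_u(mean(new_atoms, atom_dim) + mean(new_bonds, bond_dim) + state)
    return new_atoms, new_bonds, new_state


# Toy graph: 3 atoms in a chain, 2 bonds, arbitrary attribute values.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
bonds = [[0.9, 0.1], [0.4, 0.6]]
pairs = [(0, 1), (1, 2)]  # (sender, receiver) atom indices of each bond
state = [0.0, 0.0]

phi_e = make_phi(3 + 3 + 2 + 2, 2, seed=1)
phi_v = make_phi(3 + 2 + 2, 3, seed=2)
phi_u = make_phi(3 + 2 + 2, 2, seed=3)

new_atoms, new_bonds, new_state = graph_network_step(
    atoms, bonds, pairs, state, phi_e, phi_v, phi_u
)
print(len(new_atoms), len(new_bonds), len(new_state))
```

Because each update returns vectors of the same sizes it received, the output of one step can be fed straight into another, which is what makes such blocks stackable.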
17. MatErials Graph Networks (MEGNet)
Implementation is open source at https://github.com/materialsvirtuallab/megnet.
Modular blocks can be stacked
to generate models of arbitrary
complexity and “locality” of
interactions
18. Performance on Materials Project Crystals
Property                          MEGNet            SchNet¹   CGCNN²
Formation energy Ef (meV/atom)    28 (60,000)       35        39 (28,046)
Band gap Eg (eV)                  0.330 (36,720)    -         0.388 (16,485)
log10 KVRH (GPa)                  0.050 (4,664)     -         0.054 (2,041)
log10 GVRH (GPa)                  0.079 (4,664)     -         0.087 (2,041)
Metal classifier                  78.9% (55,391)    -         80% (28,046)
Non-metal classifier              90.6% (55,391)    -         95% (28,046)
1 Schutt et al. J. Chem. Phys. 148, 241722 (2018)
2 Xie et al. PRL. 120.14 (2018): 145301.
Chen et al. Chem. Mater. 2019, 31 (9), 3564–3572. doi: 10.1021/acs.chemmater.9b01294.
19. THE Data Problem in Materials Science
High value data (e.g., experimental, accurate
calculations) tend to be scarce
[Chart: band gaps of crystals, number of data points by source]
PBE, 52348
HSE, 6030
GLLB, 2290
SCAN, 472
Expt, 2700
Chart annotations: “Most inaccurate, least valuable” and “Most valuable”
20. Multi-fidelity graph networks
Fidelity: PBE, GLLB-SC, SCAN, HSE, Exp, …
Encoding: PBE: 0, GLLB-SC: 1, SCAN: 2, HSE: 3, Exp: 4, …
Embedding: fidelity-to-state embedding
[Figure: multi-fidelity graph network architecture. Labels: atomic number (Z) embedding, atom features, Gaussian expansion, bond features, fidelity-to-state embedding, state features, structure graph, graph convolution (x2), readout, latent feature vector, neural networks, target.]
[Figure: MAE (eV) on PBE, GLLB-SC, SCAN, HSE and Exp band gaps for the 1-fi, 1-fi-stacked, 2-fi, 4-fi and 5-fi models]
Using large amounts of low-fidelity data, we can improve the underlying structural representations in graph deep learning models and enhance performance on high-fidelity predictions!
[Figure: t-SNE plots of latent structural features, without PBE and with PBE]
Chen et al. arXiv:2005.04338 [cond-mat] 2020.
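The fidelity encoding and embedding steps on this slide can be sketched as an integer lookup into a table of vectors. The integer codes come from the slide; everything else is an assumption for illustration: the function names are hypothetical, the embedding dimension of 4 is arbitrary, and the table is filled with random numbers where a real model would learn the values during training.

```python
import random

# Integer encoding of the data fidelity, as listed on the slide.
FIDELITY_INDEX = {"PBE": 0, "GLLB-SC": 1, "SCAN": 2, "HSE": 3, "Exp": 4}


def make_embedding_table(n_levels, dim, seed=0):
    """Hypothetical embedding table: one vector per fidelity (random here, learned in practice)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_levels)]


def fidelity_to_state(fidelity, table):
    """Look up the state feature vector that tags a data point with its fidelity."""
    return table[FIDELITY_INDEX[fidelity]]


table = make_embedding_table(len(FIDELITY_INDEX), dim=4)

# The same structure can appear at several fidelities; only the state vector differs.
records = [("structure-1", "PBE"), ("structure-1", "Exp")]
states = [fidelity_to_state(fidelity, table) for _, fidelity in records]
print(states)
```

Feeding the fidelity in through the state vector lets one set of atom and bond weights be trained on all data sources at once, which is how abundant low-fidelity data can shape the shared structural representation.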
21. Application: Discovery of Novel Phosphor for LEDs
Solid-state Lighting
[Figure: periodic-table map of the # of unique crystals with “phosphor” in title in ICSD. Ternaries (M-X-O) are well-explored; quaternaries (M-L-X-O) are unexplored chemistries! M: Sr/Ca/Ba; L: Li/Mg/Y; X: B/Al/Si/P.]
[Figure: Sr-Li-Al-O phase diagram with SrO, Sr3Al2O6, SrAl2O4, SrAl4O7, LiAl5O8, LiAlO2, Li5AlO4, Li2O, Al2O3 and Sr2LiAlO4.]
[Figure: Sr1 and Sr2 sites with bond lengths between 2.54 and 2.99.]
Wang, Z.; Ha, J.; Kim, Y. H.; Im, W. Bin; McKittrick, J.; Ong, S. P. Joule 2018, DOI: 10.1016/j.joule.2018.01.015.
23. Prediction to LED device
[Figure: Eu2+/Ce3+ energy levels (4f, 5d) for the free ion and for Eu@Sr1 and Eu@Sr2, relative to the CBM; axis E (eV)]
Eu2+-activated
High color quality (CRI > 90) LED prototypes utilizing SLAO phosphor
Wang, Z.; Ha, J.; Kim, Y. H.; Im, W. Bin; McKittrick, J.; Ong, S. P. Joule 2018, DOI: 10.1016/j.joule.2018.01.015.
25. AI Use Case 2: Accessing scale and complexity
26. Real-world materials are seldom isolated
molecules or bulk crystals
We are really good at calculating these:
But real-world materials look like these:
27. Machine learning the potential energy surface
1 Behler et al. PRL. 98.14 (2007): 146401.
2 Shapeev MultiScale Modeling and Simulation 14, (2016).
3 Bartók et al. PRL. 104.13 (2010): 136403.
4 Thompson et al. J. Comput. Phys. 285, 316–330 (2015)
[Excerpt shown on the slide]
... the descriptors for these local environments and the ML approach/functional expression used to map the descriptors to the potential energy. The detailed formalism of all four ML-IAPs are provided in the Supplementary Information. Here, only a concise summary of the key concepts and model parameters behind the ML-IAPs in chronological order of development, is provided to aid the reader in following the remainder of this paper.
1. High-dimensional neural network potential (NNP). The NNP uses atom-centered symmetry functions (ACSF) [39] to represent the atomic local environments and fully connected neural networks to describe the PES with respect to symmetry functions [11, 12]. A separate neural network is used for each atom. The neural network is defined by the number of hidden layers and the nodes in each layer, while the descriptor space is given by the following symmetry functions:
$G_i^{\mathrm{atom,rad}} = \sum_{j \neq i}^{N_{\mathrm{atom}}} e^{-\eta (R_{ij} - R_s)^2} \cdot f_c(R_{ij}),$
$G_i^{\mathrm{atom,ang}} = 2^{1-\zeta} \sum_{j,k \neq i}^{N_{\mathrm{atom}}} (1 + \cos\theta_{ijk})^{\zeta} \cdot e^{-\eta' (R_{ij}^2 + R_{ik}^2 + R_{jk}^2)} \cdot f_c(R_{ij}) \cdot f_c(R_{ik}) \cdot f_c(R_{jk}),$
where $R_{ij}$ is the distance between atom $i$ and neighbor atom $j$, $\eta$ is the width of the Gaussian and $R_s$ is the position shift over all neighboring atoms within the cutoff radius $R_c$, $\eta'$ is the width of the Gaussian basis and $\zeta$ controls the angular resolution. $f_c(R_{ij})$ is a cutoff function, defined as follows:
$f_c(R_{ij}) = 0.5 \cdot [\cos(\pi R_{ij} / R_c) + 1], \quad \text{for } R_{ij} \le R_c$
[...] ZnO [44] and Li3PO4 [45].
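As a concrete reading of the radial symmetry function and cosine cutoff in the excerpt above, the following sketch evaluates the radial term for one atom in a small, non-periodic cluster. The function names and toy coordinates are illustrative, and the zero value beyond the cutoff radius is an assumption (the usual convention), since the excerpt is cut off before that case.

```python
import math


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def cutoff(r_ij, r_c):
    """Cosine cutoff; returning 0 beyond r_c is an assumption (truncated in the excerpt)."""
    if r_ij > r_c:
        return 0.0
    return 0.5 * (math.cos(math.pi * r_ij / r_c) + 1.0)


def radial_symmetry_function(positions, i, eta, r_s, r_c):
    """Sum of shifted Gaussians of the neighbor distances of atom i, damped by the cutoff."""
    total = 0.0
    for j, position_j in enumerate(positions):
        if j == i:
            continue
        r_ij = distance(positions[i], position_j)
        total += math.exp(-eta * (r_ij - r_s) ** 2) * cutoff(r_ij, r_c)
    return total


# Toy cluster (arbitrary units, no periodic images): two near neighbors and one far atom.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (10.0, 0.0, 0.0)]
g0 = radial_symmetry_function(positions, i=0, eta=1.0, r_s=0.0, r_c=4.0)
print(g0)
```

The value depends only on interatomic distances, so it is unchanged when the whole cluster is translated or rotated, which is the property that makes such descriptors suitable inputs for an interatomic potential.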
Gaussian Approximation Potential (GAP). The GAP calculates the similarity between atomic configurations based on a smooth-overlap of atomic positions (SOAP) [10, 46] kernel, which is then used in a Gaussian process model. In SOAP, the Gaussian-smeared atomic neighbor densities $\rho_i(R)$ are expanded in spherical harmonics as follows:
$\rho_i(R) = \sum_j f_c(R_{ij}) \cdot \exp\left(-\frac{|R - R_{ij}|^2}{2\sigma_{\mathrm{atom}}^2}\right) = \sum_{nlm} c_{nlm}\, g_n(R)\, Y_{lm}(\hat{R}),$
The spherical power spectrum vector, which is in turn the square of expansion coefficients,
$p_{n_1 n_2 l}(R_i) = \sum_{m=-l}^{l} c^{*}_{n_1 l m}\, c_{n_2 l m},$
Distances and angles
Neighbor density
Energy = f(atom positions)
Force = f′(atom positions)
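The relation between the two lines above can be made concrete: once an energy function of the atomic positions exists, forces follow from its derivative, conventionally with a minus sign (F = −∂E/∂r). The sketch below is a toy illustration under stated assumptions: `energy` is a made-up harmonic pair model standing in for a fitted potential, and the derivative is taken by central finite differences rather than analytically.

```python
import math


def distance(p, q):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def energy(positions, k=1.0, r0=1.0):
    """Toy stand-in for a learned E = f(positions): harmonic springs between all pairs."""
    total = 0.0
    for a in range(len(positions)):
        for b in range(a + 1, len(positions)):
            total += 0.5 * k * (distance(positions[a], positions[b]) - r0) ** 2
    return total


def forces(positions, h=1e-5):
    """Forces as the negative gradient of the energy, by central finite differences."""
    result = []
    for a in range(len(positions)):
        force_on_a = []
        for axis in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[a][axis] += h
            minus[a][axis] -= h
            force_on_a.append(-(energy(plus) - energy(minus)) / (2.0 * h))
        result.append(force_on_a)
    return result


# Two atoms stretched beyond the rest length r0 = 1.0: the spring pulls them together.
positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
f = forces(positions)
print(f)
```

For this stretched pair the force on the first atom points along +x and the force on the second along −x, with equal magnitude, so the total force vanishes as expected for an energy that depends only on interatomic distances.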
[Figure: Automated workflow for ML-IAP development. Crystal structures are processed with Pymatgen into elastic deformations (distorted structures), surface generation (surface structures), vacancy + AIMD (trajectory snapshots) and low T / high T AIMD (trajectory snapshots). Fireworks + VASP DFT static calculations give the dataset of DFT properties Y (energy, force, stress). Local environment sites S1, S2, …, Sn within the cutoff radius rc are converted to atomic descriptors X1(r1j … r1n), X2(r2k … r2m), …, Xn(rnj … rnm), and machine learning fits Y = f(X; model parameters). Degrees of freedom such as energy weights, cutoff radius and expansion width are tuned by grid search or an evolutionary algorithm against property fitting, e.g. elastic, phonon.]
28. Do ML interatomic potentials work?
[Figure: comparison of ML interatomic potentials and traditional interatomic potentials]
Zuo et al. J. Phys. Chem. A 2020, 124 (4), 731–745.
29. Application: Multi-Principal Element Alloys (MPEAs), aka “High-Entropy” Alloys
[Excerpt shown on the slide] Ideal configurational entropy is based on populating identical lattice sites with chemically different but equal-sized atoms. Atoms with different sizes can bring some uncertainty in atom location, thus giving an excess configurational entropy term. This effect may be small in dilute solutions, where the location of the minority atom is constrained by the surrounding majority atoms (Fig. 5a). The uncertainty in atom location increases with increasing size differences and concentrations.
Fig. 5. The effect of atom size difference on atom positions in (a) a dilute solution, where solute atoms are constrained to occupy lattice sites by surrounding solvent atoms and (b) a complex, concentrated solution, where there is no dominant atom species and atom positions usually deviate from mean lattice positions. The variability in atom positions in (b) contributes to an excess configurational entropy.
[Figure labels: Traditional Alloy; MPE Alloy]
Miracle et al., Acta Materialia 2017, 122, 448–511.
30. NbTaMoW MPEA – Random is not really random
Nb segregates
to GBs
Increased short-range order
Von Mises Strain Distribution
Li et al. npj Comput Mater 2020, 6 (1), 70.
31. Application: Diffusion studies of Li3N superionic conductor
                        eSNAP    Exp.
Ea (eV)        ⟂c       0.255    0.290
               ∥c       0.327    0.490
               total    0.269
𝜎RT (mS/cm)    ⟂c       29.6     1.2
               ∥c       2.32     0.01
               total    17.3
HR             ⟂c       0.42     ~ 0.3
               ∥c       0.54     ~ 0.5
               total    0.44
Ea and 𝜎RT : Appl. Phys. Lett. 1977, 30 (12), 621–623.
HR: J. Phys. C Solid State Phys. 1981, 14, 2731–2746.
Deng et al. submitted
Simulate ion diffusion
in grain boundaries
5000-10000 atom simulations
32. AI Use Case 3: Interpreting complex relationships
33. ML in image recognition
Source: https://qz.com/1034972/the-data-that-changed-the-direction-of-ai-research-and-possibly-the-world/
34. Equivalent problem in Materials Characterization
??
Coordination environment
Oxidation state
….
38. Challenge #1: Data, data, data
Note: All identifying information hidden for diplomatic purposes….
Too little data
Data “prejudice”
No good methods (theoretical or experimental) ….
[Excerpt shown on the slide] We compiled a training set of 40 crystal structures and reported experimentally measured ionic conductivity values available in the literature. These structures and ionic conductivity values are listed in Table 1. [...] This is a relatively small set of available data from which to build a predictive model [...]
[Excerpt shown on the slide: M. K. Y. Chan and G. Ceder, Efficient Band Gap Prediction for Solids (Received 27 March 2010; published 5 November 2010). FIG. 1: predicted gap (eV) versus experimental gap (eV) for Kohn−Sham (PBE) and ∆−sol (PBE).]
Perovskite studies dominate (small sample of publications from just the last 2 years or so….)
- Accelerated Development of Perovskite-Inspired Materials via High-Throughput Synthesis and Machine-Learning Diagnosis. Joule 2019, 3 (6), 1437–1451.
- Machine Learning Bandgaps of Double Perovskites. Scientific Reports 2016, 6, 19375.
- Howard, J. M.; Tennyson, E. M.; Neves, B. R. A.; Leite, M. S. Machine Learning for Perovskites’ Reap-Rest-Recovery Cycle. Joule 2019, 3 (2), 325–337.
- Multi-Fidelity Machine Learning Models for Accurate Bandgap Predictions of Solids. Computational Materials Science 2017, 129, 156–163.
- Rapid Discovery of Ferroelectric Photovoltaic Perovskites and Material Descriptors via Machine Learning. Small Methods 0 (0), 1900360.
- Rationalizing Perovskite Data for Machine Learning and Materials Design. The Journal of Physical Chemistry Letters 2018, 9 (24), 6948–6954.
- Thermodynamic Stability Landscape of Halide Double Perovskites via High-Throughput Computing and Machine Learning. Adv. Funct. Mater. 2019, 29 (9), 1807280.
- Thermodynamic Stability Trend of Cubic Perovskites. J. Am. Chem. Soc. 2017, 139 (42), 14905–14908.
39. Opportunities
Ideal, but challenging:
• Combinatorial experiments
• Improved theory
Practical:
• Multi-fidelity ML
• Transfer learning
• Text mining
Pilania et al. Computational Materials Science 2017, 129, 156–163.
[Figure: Fig. 2 of Tshitoyan et al., prediction of new thermoelectric materials. A ranking of thermoelectric materials can be produced using cosine similarities of material embeddings with the embedding of the word ‘thermoelectric’. In the accompanying graph, the width of the edges between ‘thermoelectric’ and the context words is proportional to the cosine similarity between the word embeddings of the nodes.]
Tshitoyan et al. Nature 2019, 571 (7763), 95.
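The ranking idea in the figure caption, ordering materials by the cosine similarity between their embeddings and the embedding of a target word, can be sketched in a few lines. The vectors and material names below are made up purely for illustration; real word embeddings are learned from a text corpus and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 3D vectors for illustration only (hypothetical names, not real embeddings).
target = [1.0, 0.2, 0.0]  # stands in for the embedding of the word 'thermoelectric'
materials = {
    "material_A": [0.9, 0.3, 0.1],
    "material_B": [0.0, 0.1, 1.0],
    "material_C": [0.5, 0.5, 0.5],
}

# Rank candidate materials from most to least similar to the target word.
ranking = sorted(
    materials,
    key=lambda name: cosine_similarity(materials[name], target),
    reverse=True,
)
print(ranking)  # ['material_A', 'material_C', 'material_B']
```

Cosine similarity ignores vector length and compares direction only, so a material is ranked by how its textual context aligns with the target word rather than by how often it is mentioned.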
40. Challenge #2: Models
1. Need models to learn proper physics
and chemistry.
2. Avoid descriptor “soup” (LASSO and
feature importance are not magic
wands…).
3. Need to account for data uncertainty.
Very bad for
interatomic potential
or generalization!
TABLE S2: Properties used in atom feature vector vi
Property                        Unit       Range           # of categories
Group number                    –          1,2, ..., 18    18
Period number                   –          1,2, ..., 9 (a) 9
Electronegativity [7, 8]        –          0.5–4.0         10
Covalent radius [3]             pm         25–250          10
Valence electrons               –          1, 2, ..., 12   12
First ionization energy [9] (b) eV         1.3–3.3         10
Electron affinity [10]          eV         -3–3.7          10
Block                           –          s, p, d, f      4
Atomic volume (b)               cm3/mol    1.5–4.3         10
(a) The lanthanide and actinide elements are considered as period 8 and 9 respectively.
(b) Log scale is used for these properties.
TABLE S3: Properties used in bond feature vector u(i,j)k
MEGNet uses just Z as the atomic feature and achieves similar performance, for both molecules AND crystals...
Another graph-based model….
41. Opportunities
Marrying physics/chemistry with ML models
Develop better model architectures, e.g., invertible
representations for true “inverse design”
$\mathrm{Structure} = f^{-1}(\mathrm{properties})$
43. Final Remarks
ML is changing the way we do
materials science, but many
challenges remain.
• A lot of hype
• A few real successes
• Many opportunities for improvement.
Machines are not a substitute for
scientists
• Identify the right problem
• Determine the main physics of the problem.
• Choose the right ML tool!
Identify Purpose and Target → Data Generation or Collection → Featurization → Training → Application
Active learning
Domain knowledge
- Is target learnable?
- Is target ambiguous?
Data Sources: Existing, DIY
Features: Elemental Features, Structural Features
Classification
Decision tree
Logistic regression
...
Regression
GPR
KRR
Multi-linear
Random forest
SVR
Neural networks
Graph models
...
Supervised
- Cross-validation
- Hyper-parameter optimization
Materials Science Specific
ænet
Automatminer
CGCNN
DeepChem
MEGNet
PROPhet
SchnetPack
TensorMol
...