Software Tools, Methods and Applications of Machine Learning in Functional Materials Design
Anubhav Jain, Energy Storage & Distributed Resources Division, Berkeley Lab
Generate large computational data sets with pymatgen, FireWorks, and atomate
[Figure: a crystal structure is converted into a workflow (jobs 1-4) and stored in a database of all workflows; workflows are automatically submitted and executed, producing output files and database entries.]
Create machine-learning
models with matminer
Together with collaborators, we
have developed several software
packages for high-throughput data
generation. These packages have been
used to run millions of density functional
theory calculations and power the
Materials Project database. The
software is open source, with
comprehensive documentation and support.
Left: the Materials Project database
(Jain et al., APL Materials 2013) is now
powered by the software infrastructure
described here.
Right: Calculating the electronic
transport properties of >40,000
materials (Ricci et al., Sci Data 2017),
resulting in the experimental discovery
of the YCuTe2 thermoelectric (Aydemir
et al., JMCA 2016).
[Figure labels: experiment, computation]
Atomate is a library of standardized workflows
for VASP, Q-Chem, and FEFF codes. Given as
little information as a crystal structure or
molecule, atomate can perform >15 types of
calculation procedures, including band
structure, elastic tensor, thermal expansion,
and work function. Users can customize
settings or use defaults tuned by our team.
When calculations complete, the output files
are automatically parsed via pymatgen and the
information is organized into a database. A
series of database “builders” in atomate collect
data from individual calculations to generate
further database collections, including
searchable summary reports of materials, data
for constructing plots, and higher-level analyses
like phase diagram generation.
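The builder idea described above can be sketched in plain Python. This is only an illustration of the aggregation step, not atomate's builder classes: the document fields, the toy formulas "AB2" and "XY", and the function name build_materials are all hypothetical, and the numbers are made-up placeholders.

```python
from collections import defaultdict

# Hypothetical per-calculation documents; field names and values are
# illustrative placeholders, not atomate's actual database schema.
tasks = [
    {"task_id": 1, "formula": "AB2", "task_type": "optimization", "energy_per_atom": -1.0},
    {"task_id": 2, "formula": "AB2", "task_type": "band structure", "band_gap": 0.5},
    {"task_id": 3, "formula": "XY", "task_type": "optimization", "energy_per_atom": -2.0},
]


def build_materials(task_docs):
    """Merge per-calculation documents into one summary document per formula."""
    grouped = defaultdict(list)
    for doc in task_docs:
        grouped[doc["formula"]].append(doc)

    materials = []
    for formula, docs in sorted(grouped.items()):
        summary = {"formula": formula, "task_ids": sorted(d["task_id"] for d in docs)}
        for doc in docs:
            # Copy every computed property, skipping bookkeeping fields.
            for key, value in doc.items():
                if key not in ("task_id", "formula", "task_type"):
                    summary[key] = value
        materials.append(summary)
    return materials


materials = build_materials(tasks)
```

Each summary document then holds every property computed for one material, which is the kind of collection a searchable report or a plot could be generated from.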
www.pymatgen.org https://atomate.org
https://materialsproject.github.io/fireworks
Example: order-disorder resolves
partial or mixed occupancies into
a fully ordered crystal structure
(e.g., a mixed oxide-fluoride site
into separate oxygen and fluorine sites).
The pymatgen software
reads crystal structures from
a variety of file formats or
the Materials Project API. It
can perform many structure
operations such as:
• surface / slab generation
• order-disorder
• interstitial finding
• chemical substitution
and also create inputs for
many common DFT codes.
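As a rough illustration of the order-disorder operation listed above, the sketch below enumerates every way to place oxygen and fluorine on a set of mixed sites. It is a toy in plain Python, not pymatgen's implementation, which is more involved (for example, it can rank candidate orderings by electrostatic energy). The function name and the four-site example are assumptions made for illustration.

```python
from itertools import combinations


def enumerate_orderings(n_sites, species_a, species_b, n_a):
    """Return every way to place n_a atoms of species_a on n_sites sites,
    filling the remaining sites with species_b."""
    orderings = []
    for chosen in combinations(range(n_sites), n_a):
        chosen = set(chosen)
        orderings.append(
            tuple(species_a if i in chosen else species_b for i in range(n_sites))
        )
    return orderings


# Four sites, each with occupancy O 0.5 / F 0.5, so a fully ordered
# cell contains exactly 2 O and 2 F.
orderings = enumerate_orderings(4, "O", "F", 2)
```

A real workflow would then discard symmetry-equivalent orderings and keep only the most favorable candidates for calculation.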
FireWorks is workflow software that can
manage, monitor, and execute millions of
computational workflows across multiple
supercomputing centers. FireWorks
supports many features needed for the
materials science domain, including dynamic
(self-modifying) workflows and automatic
failure detection and rerun.
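The two features named above, dynamic workflows and automatic rerun after failure, can be sketched with a simple job queue. This is a minimal plain-Python toy and does not use the FireWorks API; run_workflow, the job functions, and the retry limit are all hypothetical.

```python
from collections import deque


def run_workflow(initial_jobs, max_retries=2):
    """Run jobs from a queue. A job is a (name, function) pair; the function
    receives the attempt number and returns a list of follow-up jobs."""
    queue = deque((name, func, 0) for name, func in initial_jobs)
    log = []
    while queue:
        name, func, attempt = queue.popleft()
        try:
            new_jobs = func(attempt)
        except RuntimeError:
            log.append((name, "failed"))
            if attempt < max_retries:
                # Failure detected: queue the job again for another attempt.
                queue.append((name, func, attempt + 1))
            continue
        log.append((name, "completed"))
        # Dynamic step: a finished job may add new jobs to the workflow.
        queue.extend((n, f, 0) for n, f in new_jobs)
    return log


def relax(attempt):
    # Hypothetical job that crashes on its first attempt only,
    # then spawns a follow-up job once it succeeds.
    if attempt == 0:
        raise RuntimeError("simulated crash")
    return [("band_structure", lambda attempt: [])]


def static(attempt):
    # Hypothetical job that succeeds and adds nothing to the workflow.
    return []


log = run_workflow([("relax", relax), ("static", static)])
```

Here the "relax" job fails once, is rerun, and then extends the workflow with a "band_structure" job that was not part of the original submission.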
A recent plug-in for FireWorks called
rocketsled assists users in performing
machine learning-based adaptive design of
a search space, minimizing the number of
calculations needed to find a solution.
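The adaptive design loop described above can be illustrated with a toy example: fit a surrogate model to the points evaluated so far, evaluate the candidate the model predicts to be best, and repeat. This sketch does not use rocketsled's API. The quadratic surrogate, the function names, and the objective are assumptions chosen for illustration, and the objective is deliberately quadratic so that the surrogate is exact; real search spaces need more flexible models.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 via the normal equations."""
    rows = [[1.0, x, x * x] for x in xs]
    # Augmented 3x4 system (A^T A | A^T y).
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]


def adaptive_search(candidates, objective, seeds, budget):
    """Evaluate the seeds, then repeatedly evaluate the untested candidate
    that the surrogate predicts to be best (maximization)."""
    scale = max(candidates)  # rescale inputs to [0, 1] for a stable fit
    evaluated = {x: objective(x) for x in seeds}
    while len(evaluated) < budget:
        xs = [x / scale for x in evaluated]
        c0, c1, c2 = fit_quadratic(xs, list(evaluated.values()))
        untested = [x for x in candidates if x not in evaluated]
        guess = max(
            untested,
            key=lambda x: c0 + c1 * (x / scale) + c2 * (x / scale) ** 2,
        )
        evaluated[guess] = objective(guess)
    best = max(evaluated, key=evaluated.get)
    return best, len(evaluated)


def expensive_calculation(x):
    # Stand-in for a costly simulation; the optimum is at x = 70.
    return -(x - 70) ** 2


best, n_evaluations = adaptive_search(
    range(100), expensive_calculation, seeds=[0, 50, 99], budget=6
)
```

With a budget of 6 evaluations the loop locates the optimum of a 100-candidate space, whereas an exhaustive scan would need all 100 calculations.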
The matminer package lets one load data from
atomate databases, external web databases, or one
of 24 built-in large materials data sets. It can perform
feature extraction using >40 state-of-the-art methods,
and perform visualization or data mining using
common machine learning libraries. Matminer is
available open-source, and comprehensive machine
learning examples are provided as interactive
Jupyter notebooks.
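To make the feature extraction step concrete, the sketch below turns a chemical formula into a few numerical descriptors that a machine learning library could consume. It is a plain-Python toy, not matminer's API: the function names and the three-element lookup table (Pauling electronegativities and standard atomic masses) are assumptions for illustration, and matminer's own featurizers cover far more properties and elements.

```python
import re

# Tiny lookup table: Pauling electronegativity and atomic mass (u).
# Only the elements needed for this example are included.
ELEMENT_DATA = {
    "Y": {"electronegativity": 1.22, "mass": 88.906},
    "Cu": {"electronegativity": 1.90, "mass": 63.546},
    "Te": {"electronegativity": 2.10, "mass": 127.60},
}


def parse_formula(formula):
    """Turn a simple formula such as 'YCuTe2' into element amounts."""
    amounts = {}
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        amounts[symbol] = amounts.get(symbol, 0.0) + float(count or 1)
    return amounts


def featurize(formula):
    """Return the stoichiometry-weighted mean and the range of each property."""
    amounts = parse_formula(formula)
    total = sum(amounts.values())
    features = {}
    for prop in ("electronegativity", "mass"):
        values = [ELEMENT_DATA[el][prop] for el in amounts]
        weights = [amounts[el] / total for el in amounts]
        features[f"mean_{prop}"] = sum(w * v for w, v in zip(weights, values))
        features[f"range_{prop}"] = max(values) - min(values)
    return features


features = featurize("YCuTe2")
```

The resulting dictionary is one row of a feature table; repeating this over a data set gives the matrix that a regression or classification model is trained on.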
https://hackingmaterials.github.io/matminer
Funding for this research was
provided by the U.S. Department
of Energy, Basic Energy Sciences,
Materials Science Division through
an Early Career Grant. Computing
resources were provided by the
National Energy Research Scientific
Computing Center.
https://hackingmaterials.lbl.gov
@jainpapers
Over 40 feature
extraction
routines are
implemented.
[Figure: atomate output database(s) supply data for phase diagrams, Pourbaix diagrams, diffusivity via MD, and band structure analysis.]