This presentation was part of the workshop on Materials Project software infrastructure conducted for the Materials Virtual Lab on November 10, 2014. It introduces the Python Materials Genomics (pymatgen) materials analysis library. Pymatgen is a robust, open-source Python library for materials analysis. It currently powers the public Materials Project (http://www.materialsproject.org), an initiative to make calculated properties of all known inorganic materials available to materials researchers. These are some of the main features:
1. Highly flexible classes for the representation of Element, Site, Molecule, and Structure objects.
2. Extensive I/O capabilities to manipulate many VASP (http://cms.mpi.univie.ac.at/vasp/) and ABINIT (http://www.abinit.org/) input and output files, as well as the crystallographic information file (CIF) format. This includes generating Structure objects from VASP input and output. There is also support for Gaussian input files and XYZ files for molecules.
3. Comprehensive tools to generate and view compositional and grand canonical phase diagrams.
4. Electronic structure analyses (DOS and Bandstructure).
5. Integration with the Materials Project REST API.
1. Materials Virtual Lab
Python Materials
Genomics
(pymatgen)
Shyue Ping Ong
November 10, 2014
MAVRL Workshop 2014
2. Python Materials Genomics (pymatgen)
Core materials analysis powering the Materials
Project
• Defines core extensible Python objects for materials data
representation.
• Provides a robust and well-documented set of structure
and thermodynamic analysis tools relevant to many
applications.
• Establishes an open platform for researchers to
collaboratively develop sophisticated analyses of materials
data.
3. Vision for pymatgen
To be the leading open-source software platform for
robust materials analysis.
4. pymatgen is now global.
6. from pymatgen import dao
1. Great code enables great materials science.
2. Comprehensive tests ensure robustness.
3. Clear documentation leads to more usage.
4. More usage improves code quality (and increases citations).
5. Even complex scientific ideas can be broken down into simple
interfaces.
6. Though deep (Hulk-level) understanding is often necessary to
develop the right interface design.
7. Slow and accurate is better than fast and wrong.
8. But efficiency matters for core classes.
9. The laws of thermodynamics apply: code entropy always increases
in a closed system.
10. Constant refactoring is the hallmark of an open platform.
7. Most frequently used packages
Package name: Purpose
• core (start here): Defines classes and methods that are common to many analyses, e.g., Element, Site, PeriodicSite, Lattice, Structure, Molecule, etc.
• electronic_structure: Bandstructure and DOS classes. Plotting and analysis tools.
• entries: ComputedEntry, the basic unit of most thermodynamic and other analyses (e.g., constructing phase diagrams or calculating reaction enthalpies). Compatibility, which defines schemes to “correct” entries for compatibility between different computational methods and/or certain analyses.
• io: Input and output between pymatgen’s objects and various file formats, e.g., reading CIF files, writing and reading VASP input and output, ABINIT, Gaussian, Qchem, ….
• symmetry: Symmetry analysis. Spacegroup, point group, etc.
8. Analysis packages
Package name: Purpose
• phasediagram: Constructs compositional and grand canonical phase diagrams. Analyzes stability.
• analysis: Master package containing many different materials analyses. A few key ones are:
  • .structure_matcher (Will Richards, Steve Dacek and Shyue Ping): Powerful in-house structure matching algorithm. Tells you whether two structures are the same, have the same framework, etc. Use this to avoid duplicate calculations.
  • .reaction_calculator: Calculates enthalpies of reactions. Balances reactions.
  • .diffusion_analyzer: Analyzes MD runs to determine diffusivity, conductivity, Arrhenius plots, etc.
  • .pourbaix.*: Constructs Pourbaix diagrams. Similar to phase diagrams, except that it studies aqueous stability.
  • .defects (Bharat Medasani): Analysis of defects (interstitials, vacancies, etc.). Highly experimental at this stage.
9. Other packages
Structure manipulations and generation
Package name: Purpose
• transformations: Defines ways of making changes to structures. Examples: substitute a species for another, remove certain species, ordering of disordered structures, etc.
• structure_prediction: Predict completely novel structures! Based on algorithms developed by Geoffroy Hautier and fine-tuned by Will Richards.
• alchemy: High-throughput tools to make lots of changes to lots of structures in a manner that preserves provenance / history.
Interfacing with the Materials Project
Package name: Purpose
• matproj: High-level interface to the Materials Project RESTful API. Allows one to download computed data (energies, DOS, bandstructures) and relaxed structures.
10. v2.7.0 → v3.0.7
www.pymatgen.org stats
• Steady increase over the past year
• 1000 views per month on average
500 commits over the last year.
Pymatgen coders work Mon-Wed.
Major new features /
functionality
• Support for ABINIT 7.6.1
(ABINIT group/UCL)
• Defects (Haranczyk/LBNL)
• Qchem (JCESR)
• Robust units handling
(UCSD/UCL)
• XRD pattern simulation
(UCSD)
# of active contributors has more than doubled!
Major new users / fans
11. Getting Started
http://www.pymatgen.org is your friend
• Usage guide: http://pymatgen.org/usage.html
• Simple examples: http://pymatgen.org/examples
• API docs: http://pymatgen.org/modules.html
Source code
• Openly available on Github:
https://github.com/materialsproject/pymatgen
• Very comprehensive unit tests
12. Practical example of typical usage
You have an experimental collaborator who thinks that
substituting Sn for Ge in Li4GeS4, a fast but
expensive Li-ion conductor, might improve its
properties and make it cheaper. But before he
attempts a potentially difficult synthesis, he wants to
know whether you can use first-principles calculations to
estimate if a potential Li4SnS4 phase would be stable.
What would you do?
13. Broad steps
1. Get the known Li4GeS4 phase.
2. Substitute Sn for Ge and generate the input files.
3. Do calculations with your favorite DFT code.
4. Construct the phase diagram for the Li-Sn-S system to understand if Li4SnS4 is stable.
14. Hands-on Tutorial
For this tutorial, we will be using the excellent
IPython Notebook. Basically, the notebook is like a
superpowered scratch space for you to write quick analyses
and scripts. You can install and run this software on your own
computers, but for the purposes of this workshop, we are
running a notebook server on Amazon EC2, which has all the
necessary packages (pymatgen, etc.) already installed.
1. Go to http://bit.ly/mavrlwksp2014 (bypass any security
warnings). When asked for a password, type in
“MVLworkshop”.
2. Create a new notebook. Rename it as
first_name_last_name_pmg.
15. Step 1: Getting Li4GeS4
Option 1: The traditional, slow and bad option
• Do a search and download the CIF for Li4GeS4 from an existing database like
the ICSD (this is already done for you; the filename is ICSD_95649.cif).
Hint: If you ever want to see the documentation of any method, use IPython’s “?”
syntax. For example, “Structure.from_file?” will show you what the method
does and its arguments. Pymatgen is extremely well-documented.
Option 2: Use the MPRester interface to the Materials API
• Register at www.materialsproject.org.
• Go to www.materialsproject.org/dashboard.
• Generate your API key and copy it.
• Use pymatgen’s Materials API interface to get the structure.
Advantages
1. You get pre-relaxed structures.
2. You can get a lot of structures at once.
16. Step 2: Doing the substitution and generating the
input files
Simple method:
• Depending on whether you got the structure from the ICSD or Materials Project, you
need to replace either Ge4+ or Ge with Sn.
• Pymatgen has support for all VASP input files, but generating them manually is a bit of
work. We will use what are known as “InputSets” to generate input files. Input sets are
basically well-defined rules for generating inputs from structures. They define things like
the appropriate INCAR parameters (e.g., the U value for each element), an algorithm
for generating a KPOINTS grid, and the pseudopotentials (PSP) to use. We will use the MPVaspInputSet, which is
the well-tested set of parameters that is currently being used in the Materials Project.
17. Step 2: Doing the substitution and generating the
input files, contd.
“Advanced” method:
• The method described in the previous slide works perfectly fine and is the fastest way. But a
major problem is that all provenance is lost, i.e., if you revisit your calculations many
months down the road, you will have forgotten how you generated the structure and input
files in the first place.
• Pymatgen’s alchemy + transformations packages are designed to deal with such issues. They are a
bit more complex to use, but if you are doing a lot of calculations on many different
structures, it is important to keep a record of where each structure came from.
• An example is given below. We will not go through this exercise, but just mention that in
every directory of VASP input files, there will be a “transformations.json” file that records
everything that has been done, e.g., the source of the structure, the transformations
performed, etc. This file will be parsed by pymatgen-db to be recorded in the database.
18. Step 3: Do your DFT calculations
We will not actually run DFT calculations in this
tutorial. We will just note that the Materials Project
infrastructure has many tools (Custodian, FireWorks) that help
with this (covered in later parts).
For this tutorial, a vasprun.xml from a completed calculation is
already present for you to parse. The Vasprun object is
pymatgen’s highly efficient parser for the vasprun.xml. From
that, we can get a ComputedEntry for analysis.
Hint: IPython has an excellent tab-completion system. For example, if you
type the filename as “vasprun” and hit tab, IPython will autocomplete it for
you, similar to most Unix command lines.
19. Step 4: Construct the Li-Sn-S phase diagram
To construct the phase diagram of Li-Sn-S, you need the energies of all
structures in the Li-Sn-S system, i.e., all Li, Sn, S, LixSny, LixSy, SnxSy and
LixSnySz phases.
• This rapidly becomes a lot of calculations for more components.
• The good news is that we can use the MPRester to get pre-calculated data
from the Materials Project!
21. Summary
Pymatgen is an extremely powerful tool for materials
analysis and for facilitating first principles calculations.
Tight integration with the Materials Project is a key
feature – enables analyses that would otherwise be very
time-consuming to perform.
Very well-documented and robustly tested.
Supported by a large and growing community of materials
developer-scientists.