Open-source tools for generating and
analyzing large materials data sets
Anubhav Jain
Energy Technologies Area
Lawrence Berkeley National Laboratory
Berkeley, CA
ACS Spring, April 2017
Slides (already) posted to http://www.slideshare.net/anubhavster
Link is also listed at end of talk
“Civilization advances by extending the number of
important operations which we can perform
without thinking about them.”
- Alfred North Whitehead
We don’t work on catalysis, but we do write software
•  We don’t do research into heterogeneous
catalysis
•  We do build software to:
–  execute millions of calculations on supercomputing
centers
–  make it more straightforward to run density
functional theory calculations (mostly VASP, some
Gaussian/Q-Chem)
–  perform structural manipulations
–  analyze the results of calculations
Software technologies that we contribute to
Base packages:
•  pymatgen (materials analysis framework)
•  FireWorks (workflow definition & execution)
•  custodian (calculation error recovery)
Derived packages:
•  atomate (automatic materials science workflows)
•  matminer (materials data mining)
These are all open-source:
•  FireWorks, atomate, and matminer are led by our group
•  pymatgen and custodian are led by Prof. Ong's group (UC San Diego)
•  All developed in coordination with the Persson group (UC Berkeley)
Applications: The Materials Project database
The Materials Project (http://www.materialsproject.org)
•  free and open
•  ~30,000 registered users around the world
•  >65,000 compounds calculated
•  >75 million CPU-hours invested = massive scale!
Data includes
•  thermodynamic props.
•  electronic band structure
•  aqueous stability (E-pH)
•  elasticity tensors
•  piezoelectric tensors
Jain*, Ong*, Hautier, Chen, Richards, Dacek, Cholia, Gunter, Skinner, Ceder, and Persson, APL Mater., 2013, 1, 011002. (*equal contributions)
Applications: The Electrolyte Genome
•  data on ~22,000 molecules (mainly geometry + IP/EA via full adiabatic calcs)
•  also deployed on the Materials Project web site
L. Cheng, R.S. Assary, X. Qu, A. Jain, S.P. Ong, N.N. Rajput, et al., J. Phys. Chem. Lett. 6 (2015) 283–291.
X. Qu, A. Jain, N.N. Rajput, L. Cheng, Y. Zhang, S.P. Ong, et al., Comput. Mater. Sci. 103 (2015) 56–67.
Applications: Crystalium (Ong / Persson)
http://crystalium.materialsvirtuallab.org
surface energies for 142 polymorphs of
72 elements + rotatable Wulff shapes
certainly applicable to catalysis
computed & maintained by the Ong
group (UC San Diego) with support
from Persson Group (UC Berkeley)
R. Tran, Z. Xu, B. Radhakrishnan, D. Winston, W. Sun,
K. A. Persson, and S. P. Ong, Sci. Data, 2016, 3, 160080.
Applications: Rapid data generation
•  >4500 elastic tensors
   M. de Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. van der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009.
•  >900 piezoelectric tensors
   M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A. Persson, Sci. Data, 2015, 2, 150053.
•  >48000 Seebeck coefficients + cRTA transport
   Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier (in submission)
Let’s revisit the libraries
pymatgen – object-oriented materials analysis
www.pymatgen.org
Ong, S. P., Richards, W. D., Jain, A., Hautier, G., Kocher, M., Cholia, S., Gunter, D., Chevrier, V. L., Persson, K. A. & Ceder, G. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013).
pymatgen – examples of analyses
•  phase diagrams
•  Pourbaix diagrams
•  diffusivity from MD
•  band structure analysis
pymatgen - many useful tools made accessible
•  Structure Matcher: analyzes if two periodic structures are equivalent, even if they are in different settings or have minor distortions
•  Order-disorder: resolves partial or mixed occupancies into a fully ordered crystal structure (e.g., a mixed oxide-fluoride site into separate oxygen/fluorine sites)
Many other tools, such as:
•  Automatic surface slab generator
•  Bond-valence sums to determine valence
•  Voronoi coordination as well as 3D coordination polyhedron analysis
•  Automatically find and insert interstitial sites
•  Powder diffraction pattern generation
•  Simple cost and materials availability estimators
custodian – fixing job errors
•  Custodian can wrap
around an executable
(e.g., VASP)
–  i.e., run custodian instead of
directly running VASP
•  During execution,
custodian will monitor
output files and detect
errors / problems
–  If it finds one, it can change input files
and rerun the job
–  e.g., if ZPOTRF error
detected, rerun with ISYM=0
–  ever-expanding library of
fixes
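The monitor, fix, and rerun loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not custodian's actual API: `run_with_recovery`, `fake_vasp`, the `FIXES` table, and the `ENCUT` value are hypothetical, and the only rule included is the ZPOTRF → ISYM=0 example from the slide.

```python
from typing import Callable, Dict

# Hypothetical rule table: error text found in the output -> input changes to apply.
# The single entry mirrors the slide's example (ZPOTRF error -> rerun with ISYM=0).
FIXES: Dict[str, Dict[str, object]] = {
    "ZPOTRF": {"ISYM": 0},
}


def run_with_recovery(job: Callable[[dict], str], params: dict, max_errors: int = 3):
    """Run `job`, scan its output for known errors, patch the inputs, and rerun."""
    params = dict(params)  # work on a copy so the caller's settings are untouched
    applied = []
    for _ in range(max_errors + 1):
        output = job(params)
        error = next((key for key in FIXES if key in output), None)
        if error is None:
            return params, applied  # clean run: report final inputs and fixes used
        params.update(FIXES[error])
        applied.append(error)
    raise RuntimeError(f"job still failing after {max_errors} fixes: {applied}")


def fake_vasp(params: dict) -> str:
    """Stand-in for the real executable: fails until symmetry is switched off."""
    if params.get("ISYM", 2) != 0:
        return "LAPACK: Routine ZPOTRF failed!"
    return "reached required accuracy"


final_params, fixes_applied = run_with_recovery(fake_vasp, {"ENCUT": 520})
print(final_params, fixes_applied)
```

Keeping the fixes in a table rather than in the loop is what lets the "library of fixes" grow without touching the control flow.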
FireWorks – scientific workflow software
•  FireWorks is open-source scientific
workflow software
•  Materials Project, JCESR, and other
projects manage their runs with
FireWorks
–  >1 million jobs
–  >100 million CPU-hours
–  multiple computing clusters
•  You can write any kind of workflow
–  e.g., FireWorks is used for graphics
processing, machine learning, document
processing, and protein folding
–  #1 Google hit for “Python workflow
software”, top 5 for general scientific
workflow software
•  Detailed tutorials are available
Jain, A., Ong, S. P., Chen, W., Medasani, B., Qu, X., Kocher, M., Brafman, M., Petretto, G., Rignanese, G.-M., Hautier, G., Gunter, D. & Persson, K. A. FireWorks: a dynamic workflow system designed for high-throughput applications. Concurr. Comput. Pract. Exp. 27, 5037–5059 (2015).
www.pythonhosted.org/FireWorks
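To make "workflow" concrete: at its core it is a set of steps plus the dependencies between them, executed in an order that respects those dependencies. The sketch below uses only the Python standard library (`graphlib`, Python 3.9+) and is not the FireWorks API; the step names and `run_workflow` are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical three-step workflow: each key lists the steps it depends on.
dependencies = {
    "relax_structure": set(),
    "static_energy": {"relax_structure"},
    "band_structure": {"static_energy"},
}


def run_workflow(deps, run_step):
    """Execute every step once, never before the steps it depends on."""
    completed = []
    for step in TopologicalSorter(deps).static_order():
        run_step(step)
        completed.append(step)
    return completed


log = []
order = run_workflow(dependencies, log.append)
print(order)
```

A production workflow system adds what this sketch leaves out, such as persistent job state, submission to multiple computing clusters, and tracking of job status.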
FireWorks – screenshot of jobs status
Live version at http://fireworks.dash.materialsproject.org
atomate – our newest code (currently in beta)
Redesigns an older, clunkier code (MPWorks)
Translates minimal specifications into well-defined
FireWorks workflows (FireWorks handles all the
execution and job management details)
“What is the GGA-PBE elastic tensor of GaAs?”
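The idea of turning a minimal specification into a fully defined workflow can be sketched as a lookup from a requested property to an ordered list of steps. This is an illustration only and does not use atomate's real functions: `Step`, `RECIPES`, `build_workflow`, and the step names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Step:
    name: str
    settings: Dict[str, object] = field(default_factory=dict)


# Hypothetical recipe table: property name -> ordered calculation steps.
RECIPES: Dict[str, List[str]] = {
    "elastic_tensor": [
        "structure_optimization",
        "deformed_structure_runs",
        "fit_elastic_tensor",
    ],
}


def build_workflow(formula: str, prop: str, functional: str = "GGA-PBE") -> List[Step]:
    """Expand a minimal request into an explicit, fully specified list of steps."""
    if prop not in RECIPES:
        raise ValueError(f"no recipe for property {prop!r}")
    shared = {"formula": formula, "functional": functional}
    # Each step gets its own copy of the shared settings.
    return [Step(name, dict(shared)) for name in RECIPES[prop]]


workflow = build_workflow("GaAs", "elastic_tensor")
for step in workflow:
    print(step.name, step.settings)
```

The user supplies only the material and the property; everything else comes from the recipe, which is where group knowledge about how to run the calculation is stored.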
Advantages – reduce specialization
Because of the steep learning curve for
computational methods, there is often a single
group member assigned to each technique
“Alice knows how to do charged defect calculations.”
“Bob is the one who can properly converge GW runs.”
“Olga has all the scripts for phonon calculations.”
Advantages – reduce errors
Let’s take a look at two alternate universes:
•  Universe 1: researcher has coffee → copies files from previous simulation → edits 5 lines → runs simulation, creates report
•  Universe 2: researcher forgets coffee → copies files from previous simulation → edits 4 lines (forgets LHFCALC=F) → creates report, looks fine at first → in a month discovers it used the wrong functional
Automation reduces your chances of being caught in universe #2!
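One way automation avoids the forgotten-line failure is to build inputs from a complete named preset instead of copying and hand-editing old files. The sketch below assumes hypothetical helpers (`PRESETS`, `make_incar`, `format_incar`); the tag values are illustrative and are not a vetted input set.

```python
# Hypothetical named presets; the tag values are illustrative, not a vetted input set.
PRESETS = {
    "pbe": {"GGA": "PE", "LHFCALC": False},
    "hse06": {"GGA": "PE", "LHFCALC": True, "HFSCREEN": 0.2},
}


def make_incar(preset: str, **overrides) -> dict:
    """Start from a complete named preset, then apply explicit overrides."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset {preset!r}; choose from {sorted(PRESETS)}")
    settings = dict(PRESETS[preset])
    settings.update(overrides)
    return settings


def format_incar(settings: dict) -> str:
    """Render settings as INCAR-style 'TAG = value' lines, sorted by tag."""
    def render(value):
        if isinstance(value, bool):
            return ".TRUE." if value else ".FALSE."
        return str(value)
    return "\n".join(f"{tag} = {render(val)}" for tag, val in sorted(settings.items()))


incar = make_incar("hse06", ENCUT=520)
print(format_incar(incar))
```

Because the functional-defining tags always travel together in the preset, switching functionals is a one-word change rather than several manual edits, any one of which can be missed.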
atomate – what’s available?
•  band structure
•  spin-orbit coupling
•  hybrid functional calcs
•  elastic tensor
•  piezoelectric tensor
•  Raman spectra
•  NEB
•  GIBBS method
•  QH thermal expansion
•  AIMD
•  FEFF method
•  LAMMPS MD
All past and present knowledge, from
everyone in the group, everyone previously
in the group, and outside collaborators,
about how to run calculations
Contributors: K. Mathew, J. Montoya, S. Dwaraknath, A. Faghaninia, M. Aykol, S.P. Ong, B. Bocklund, T. Smidt, H. Tang
matminer (still in alpha)
matminer’s goal: help enable data mining studies
in materials science
matminer usage
•  Examples of usage on the GitHub page:
–  https://github.com/hackingmaterials/matminer
•  Coming next: new types of crystal structure descriptors based on local environment
Some lessons learned (1)
•  In the beginning, strong central coordination from
authority was needed to develop these
–  require that people contribute to common code, e.g.
pymatgen, and not write their own detached scripts
•  Once a code was “established”, less authority was
needed
–  people voluntarily contributed improvements rather than
writing their own code because this benefited them
•  Today the process is almost completely
decentralized
–  culture has changed
–  even for new codes, people rally around it rather than
build independent things
Some lessons learned (2)
•  It is helpful to have a strong BDFL (benevolent
dictator for life) for each codebase
•  Requirements for the BDFL:
–  very detail-oriented
–  cares about the code itself, not just the application
–  cares more about the code quality than about offending
teammates, i.e., will not accept poor quality contributions
–  at the same time, able to rally support from people and
convince them to contribute or clean up code
–  willing to work overtime to do things like write detailed
docs, answer questions from users, advocate for the code,
review commits, etc.
–  derives joy from building and deploying things!
Some lessons learned (3)
•  Computer scientists are useful for staying up to date in
the fast-moving world of software
–  2006: I took a graduate class in databases at a top CS
university; all SQL, not a single mention of “NoSQL”
–  2007: we use SQL to build a precursor to Materials Project
–  2011: We are designing the framework for Materials Project; I
have lots of experience with SQL and am confident this is the way
to go; a computer scientist casually mentions NoSQL, its
growing prominence, and its potential applicability to our
problem
–  2017: We do almost everything in NoSQL
•  Lesson: software moves fast! Much faster than materials
science knowledge or methods. Don’t use “up to date”
data from 5 years ago to inform your decision.
Further resources
•  The GitHub web sites
–  www.github.com/materialsproject
–  www.github.com/hackingmaterials
•  Software Carpentry
–  https://software-carpentry.org
Acknowledgements
•  Research group of Prof. Shyue Ping Ong
•  Research group of Prof. Kristin Persson
•  Funding: US Dept of Energy, Materials Science
Division
…and all extended collaborators for these various
projects!
Slides (already) posted to http://www.slideshare.net/anubhavster

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Discovering advanced materials for energy applications by mining the scientif...
Discovering advanced materials for energy applications by mining the scientif...Discovering advanced materials for energy applications by mining the scientif...
Discovering advanced materials for energy applications by mining the scientif...
 
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
 
The Materials Project: overview and infrastructure
The Materials Project: overview and infrastructureThe Materials Project: overview and infrastructure
The Materials Project: overview and infrastructure
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials Informatics
 
Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learning
 
Software tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data miningSoftware tools for high-throughput materials data generation and data mining
Software tools for high-throughput materials data generation and data mining
 
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
 
The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...
 
Data dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNLData dissemination and materials informatics at LBNL
Data dissemination and materials informatics at LBNL
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
Materials Project computation and database infrastructure
Materials Project computation and database infrastructureMaterials Project computation and database infrastructure
Materials Project computation and database infrastructure
 
Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...Extracting and Making Use of Materials Data from Millions of Journal Articles...
Extracting and Making Use of Materials Data from Millions of Journal Articles...
 
Automating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomateAutomating materials science workflows with pymatgen, FireWorks, and atomate
Automating materials science workflows with pymatgen, FireWorks, and atomate
 
Conducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials ProjectConducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials Project
 
Computational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methodsComputational materials design with high-throughput and machine learning methods
Computational materials design with high-throughput and machine learning methods
 
The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...The Status of ML Algorithms for Structure-property Relationships Using Matb...
The Status of ML Algorithms for Structure-property Relationships Using Matb...
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...
 
Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...Computational Materials Design and Data Dissemination through the Materials P...
Computational Materials Design and Data Dissemination through the Materials P...
 
Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...
 

Semelhante a Open-source tools for generating and analyzing large materials data sets

ExaLearn Overview - ECP Co-Design Center for Machine Learning
ExaLearn Overview - ECP Co-Design Center for Machine LearningExaLearn Overview - ECP Co-Design Center for Machine Learning
ExaLearn Overview - ECP Co-Design Center for Machine Learning
inside-BigData.com
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...
Anubhav Jain
 
Belak_ICME_June02015
Belak_ICME_June02015Belak_ICME_June02015
Belak_ICME_June02015
Jim Belak
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
Anubhav Jain
 

Semelhante a Open-source tools for generating and analyzing large materials data sets (20)

2014 11-13-sbsm032-reproducible research
2014 11-13-sbsm032-reproducible research2014 11-13-sbsm032-reproducible research
2014 11-13-sbsm032-reproducible research
 
Resources for Teaching Undergraduate Computational Physics
Resources for Teaching Undergraduate Computational PhysicsResources for Teaching Undergraduate Computational Physics
Resources for Teaching Undergraduate Computational Physics
 
From Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science TalesFrom Workflows to Transparent Research Objects and Reproducible Science Tales
From Workflows to Transparent Research Objects and Reproducible Science Tales
 
ExaLearn Overview - ECP Co-Design Center for Machine Learning
ExaLearn Overview - ECP Co-Design Center for Machine LearningExaLearn Overview - ECP Co-Design Center for Machine Learning
ExaLearn Overview - ECP Co-Design Center for Machine Learning
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...
 
Manual petsc
Manual petscManual petsc
Manual petsc
 
Summary of 3DPAS
Summary of 3DPASSummary of 3DPAS
Summary of 3DPAS
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibility
 
Atomate: a tool for rapid high-throughput computing and materials discovery
Atomate: a tool for rapid high-throughput computing and materials discoveryAtomate: a tool for rapid high-throughput computing and materials discovery
Atomate: a tool for rapid high-throughput computing and materials discovery
 
Open PHACTS April 2017 Science webinar Workflow tools
Open PHACTS April 2017 Science webinar Workflow toolsOpen PHACTS April 2017 Science webinar Workflow tools
Open PHACTS April 2017 Science webinar Workflow tools
 
Reproducibility: 10 Simple Rules
Reproducibility: 10 Simple RulesReproducibility: 10 Simple Rules
Reproducibility: 10 Simple Rules
 
2011 03-provenance-workshop-edingurgh
2011 03-provenance-workshop-edingurgh2011 03-provenance-workshop-edingurgh
2011 03-provenance-workshop-edingurgh
 
Belak_ICME_June02015
Belak_ICME_June02015Belak_ICME_June02015
Belak_ICME_June02015
 
Accelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy ScienceAccelerating Data-driven Discovery in Energy Science
Accelerating Data-driven Discovery in Energy Science
 
Data-intensive applications on cloud computing resources: Applications in lif...
Data-intensive applications on cloud computing resources: Applications in lif...Data-intensive applications on cloud computing resources: Applications in lif...
Data-intensive applications on cloud computing resources: Applications in lif...
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...
 
2015 10-7-11am-reproducible research
2015 10-7-11am-reproducible research2015 10-7-11am-reproducible research
2015 10-7-11am-reproducible research
 
Research software susainability
Research software susainabilityResearch software susainability
Research software susainability
 
Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...Software tools, crystal descriptors, and machine learning applied to material...
Software tools, crystal descriptors, and machine learning applied to material...
 
Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...Continuous modeling - automating model building on high-performance e-Infrast...
Continuous modeling - automating model building on high-performance e-Infrast...
 

Mais de Anubhav Jain

Mais de Anubhav Jain (20)

Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...Discovering advanced materials for energy applications: theory, high-throughp...
Discovering advanced materials for energy applications: theory, high-throughp...
 
Applications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and DesignApplications of Large Language Models in Materials Discovery and Design
Applications of Large Language Models in Materials Discovery and Design
 
An AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesisAn AI-driven closed-loop facility for materials synthesis
An AI-driven closed-loop facility for materials synthesis
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
 
Best practices for DuraMat software dissemination
Best practices for DuraMat software disseminationBest practices for DuraMat software dissemination
Best practices for DuraMat software dissemination
 
Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...
 
Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...Efficient methods for accurately calculating thermoelectric properties – elec...
Efficient methods for accurately calculating thermoelectric properties – elec...
 
Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...Natural Language Processing for Data Extraction and Synthesizability Predicti...
Natural Language Processing for Data Extraction and Synthesizability Predicti...
 
Machine Learning for Catalyst Design
Machine Learning for Catalyst DesignMachine Learning for Catalyst Design
Machine Learning for Catalyst Design
 
Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...Natural language processing for extracting synthesis recipes and applications...
Natural language processing for extracting synthesis recipes and applications...
 
Accelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine LearningAccelerating New Materials Design with Supercomputing and Machine Learning
Accelerating New Materials Design with Supercomputing and Machine Learning
 
DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …DuraMat CO1 Central Data Resource: How it started, how it’s going …
DuraMat CO1 Central Data Resource: How it started, how it’s going …
 
The Materials Project
The Materials ProjectThe Materials Project
The Materials Project
 
Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...Evaluating Chemical Composition and Crystal Structure Representations using t...
Evaluating Chemical Composition and Crystal Structure Representations using t...
 
Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...Perspectives on chemical composition and crystal structure representations fr...
Perspectives on chemical composition and crystal structure representations fr...
 
Discovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials ProjectDiscovering and Exploring New Materials through the Materials Project
Discovering and Exploring New Materials through the Materials Project
 
The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...The Materials Project: Applications to energy storage and functional materia...
The Materials Project: Applications to energy storage and functional materia...
 
The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...
 
Machine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst Design
 
Applications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignApplications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials Design
 

Último

Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Sérgio Sacani
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
ssuser79fe74
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 

Último (20)

Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...

Open-source tools for generating and analyzing large materials data sets

  • 1. Open-source tools for generating and analyzing large materials data sets
      Anubhav Jain, Energy Technologies Area, Lawrence Berkeley National Laboratory, Berkeley, CA
      ACS Spring, April 2017
      Slides (already) posted to http://www.slideshare.net/anubhavster (link is also listed at end of talk)
  • 2. “Civilization advances by extending the number of important operations which we can perform without thinking about them.” - Alfred North Whitehead
  • 3. We don’t work on catalysis, but we do write software
      •  We don’t do research into heterogeneous catalysis
      •  We do build software to:
          –  execute millions of calculations on supercomputing centers
          –  make it more straightforward to run density functional theory calculations (mostly VASP, some Gaussian/Q-Chem)
          –  perform structural manipulations
          –  analyze the results of calculations
  • 4. Software technologies that we contribute to
      Base packages:
      •  pymatgen (materials analysis framework)
      •  FireWorks (workflow definition & execution)
      •  custodian (calculation error recovery)
      Derived packages:
      •  atomate (automatic materials science workflows)
      •  matminer (materials data mining)
      These are all open-source:
      •  FireWorks, atomate, and matminer are led by our group
      •  pymatgen and custodian are led by Prof. Ong’s group (UC San Diego)
      •  All developed in coordination with the Persson group (UC Berkeley)
  • 5. Applications: The Materials Project database
      The Materials Project (http://www.materialsproject.org)
      •  free and open
      •  ~30,000 registered users around the world
      •  >65,000 compounds calculated
      •  Data includes:
          –  thermodynamic properties
          –  electronic band structure
          –  aqueous stability (E-pH)
          –  elasticity tensors
          –  piezoelectric tensors
      •  >75 million CPU-hours invested = massive scale!
      Jain*, Ong*, Hautier, Chen, Richards, Dacek, Cholia, Gunter, Skinner, Ceder, and Persson, APL Mater., 2013, 1, 011002. (*equal contributions)
  • 6. Applications: The Electrolyte Genome
      •  data on ~22,000 molecules (mainly geometry + IP/EA via full adiabatic calcs)
      •  Also deployed on the Materials Project web site
      L. Cheng, R.S. Assary, X. Qu, A. Jain, S.P. Ong, N.N. Rajput, et al., J. Phys. Chem. Lett. 6 (2015) 283–291.
      X. Qu, A. Jain, N.N. Rajput, L. Cheng, Y. Zhang, S.P. Ong, et al., Comput. Mater. Sci. 103 (2015) 56–67.
  • 7. Applications: Crystalium (Ong / Persson)
      http://crystalium.materialsvirtuallab.org
      •  surface energies for 142 polymorphs of 72 elements + rotatable Wulff shapes
      •  certainly applicable to catalysis
      •  computed & maintained by the Ong group (UC San Diego) with support from the Persson group (UC Berkeley)
      R. Tran, Z. Xu, B. Radhakrishnan, D. Winston, W. Sun, K. A. Persson, and S. P. Ong, Sci. Data, 2016, 3, 160080.
  • 8. Applications: Rapid data generation
      •  >4500 elastic tensors
      •  >900 piezoelectric tensors
      •  >48000 Seebeck coefficients + cRTA transport
      M. de Jong, W. Chen, H. Geerlings, M. Asta, and K. A. Persson, Sci. Data, 2015, 2, 150053.
      M. de Jong, W. Chen, T. Angsten, A. Jain, R. Notestine, A. Gamst, M. Sluiter, C. K. Ande, S. van der Zwaag, J. J. Plata, C. Toher, S. Curtarolo, G. Ceder, K. A. Persson, and M. Asta, Sci. Data, 2015, 2, 150009.
      Ricci, Chen, Aydemir, Snyder, Rignanese, Jain, & Hautier (in submission)
  • 9. Let’s revisit the libraries
      Base packages:
      •  pymatgen (materials analysis framework)
      •  FireWorks (workflow definition & execution)
      •  custodian (calculation error recovery)
      Derived packages:
      •  atomate (automatic materials science workflows)
      •  matminer (materials data mining)
      These are all open-source:
      •  FireWorks, atomate, and matminer are led by our group
      •  pymatgen and custodian are led by Prof. Ong’s group (UC San Diego)
      •  All developed in coordination with the Persson group (UC Berkeley)
  • 10. pymatgen – object-oriented materials analysis
      www.pymatgen.org
      Ong, S. P., Richards, W. D., Jain, A., Hautier, G., Kocher, M., Cholia, S., Gunter, D., Chevrier, V. L., Persson, K. A. & Ceder, G. Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis. Comput. Mater. Sci. 68, 314–319 (2013).
  • 11. pymatgen – examples of analyses
      •  phase diagrams
      •  Pourbaix diagrams
      •  diffusivity from MD
      •  band structure analysis
  • 12. pymatgen - many useful tools made accessible
      •  Structure Matcher: analyzes if two periodic structures are equivalent, even if they are in different settings or have minor distortions
      •  Order-disorder: resolves partial or mixed occupancies into a fully ordered crystal structure (e.g., mixed oxide-fluoride site into separate oxygen/fluorine)
      •  Many other tools, such as:
          –  Automatic surface slab generator
          –  Bond-valence sums to determine valence
          –  Voronoi coordination as well as 3D coordination polyhedron analysis
          –  Automatically find and insert interstitial sites
          –  Powder diffraction pattern generation
          –  Simple cost and materials availability estimators
  • 13. custodian – fixing job errors
      •  Custodian can wrap around an executable (e.g., VASP)
          –  i.e., run custodian instead of directly running VASP
      •  During execution, custodian will monitor output files and detect errors / problems
          –  If so, it can change input files and rerun the job
          –  e.g., if ZPOTRF error detected, rerun with ISYM=0
          –  ever-expanding library of fixes
  • 14. FireWorks – scientific workflow software
      •  FireWorks is open-source scientific workflow software
      •  Materials Project, JCESR, and other projects manage their runs with FireWorks
          –  >1 million jobs
          –  >100 million CPU-hours
          –  multiple computing clusters
      •  You can write any kind of workflow
          –  e.g., FireWorks is used for graphics processing, machine learning, document processing, and protein folding
          –  #1 Google hit for “Python workflow software”, top 5 for general scientific workflow software
      •  Detailed tutorials are available
      www.pythonhosted.org/FireWorks
      Jain, A., Ong, S. P., Chen, W., Medasani, B., Qu, X., Kocher, M., Brafman, M., Petretto, G., Rignanese, G.-M., Hautier, G., Gunter, D. & Persson, K. A. FireWorks: a dynamic workflow system designed for high-throughput applications. Concurr. Comput. Pract. Exp. 27, 5037–5059 (2015).
  • 15. FireWorks – screenshot of jobs status
      Live version at http://fireworks.dash.materialsproject.org
  • 16. atomate – our newest code (currently in beta)
      •  Redesigns an older, clunkier code (MPWorks)
      •  Translates minimal specifications into well-defined FireWorks workflows (FireWorks handles all the execution and job management details)
      •  Example of a minimal specification: “What is the GGA-PBE elastic tensor of GaAs?”
  • 17. Advantages – reduce specialization
      Because of the steep learning curve to computational methods, there is often a single group member assigned to a technique:
      •  “Alice knows how to do charged defect calculations.”
      •  “Bob is the one who can properly converge GW runs.”
      •  “Olga has all the scripts for phonon calculations.”
  • 18. Advantages – reduce errors
      Let’s take a look at two alternate universes:
      •  Universe 1: researcher has coffee → copies files from previous simulation → edits 5 lines → runs simulation, creates report
      •  Universe 2: researcher forgets coffee → copies files from previous simulation → edits 4 lines, forgets LHFCALC=F → creates report, which looks fine at first → in a month, discovers it used the wrong functional
      Automation reduces your chances of being caught in universe #2!
  • 19. atomate – what’s available?
      •  band structure
      •  spin-orbit coupling
      •  hybrid functional calcs
      •  elastic tensor
      •  piezoelectric tensor
      •  Raman spectra
      •  NEB
      •  GIBBS method
      •  QH thermal expansion
      •  AIMD
      •  FEFF method
      •  LAMMPS MD
      All past and present knowledge, from everyone in the group, everyone previously in the group, and outside collaborators, about how to run calculations
      Names shown on the slide: K. Mathew, J. Montoya, S. Dwaraknath, A. Faghaninia, M. Aykol, S.P. Ong, B. Bocklund, T. Smidt, H. Tang
  • 20. matminer (still in alpha)
      matminer’s goal: help enable data mining studies in materials science
  • 21. matminer usage
      •  Examples of usage on the github page:
          –  https://github.com/hackingmaterials/matminer
      •  Coming next: new types of crystal structure descriptors based on local environment
  • 22. Some lessons learned (1)
      •  In the beginning, strong central coordination from authority was needed to develop these
          –  require that people contribute to common code, e.g. pymatgen, and not write their own detached scripts
      •  Once a code was “established”, less authority was needed
          –  people voluntarily contributed improvements rather than writing their own code because this benefited them
      •  Today the process is almost completely decentralized
          –  culture has changed
          –  even for new codes, people rally around them rather than build independent things
  • 23. Some lessons learned (2)
      •  It is helpful to have a strong BDFL (benevolent dictator for life) for each codebase
      •  Requirements for the BDFL:
          –  very detail-oriented
          –  cares about the code itself, not just the application
          –  cares more about the code quality than about offending teammates, i.e., will not accept poor quality contributions
          –  at the same time, able to rally support from people and convince them to contribute or clean up code
          –  willing to work overtime to do things like write detailed docs, answer questions from users, advocate for the code, review commits, etc.
          –  derives joy from building and deploying things!
  • 24. Some lessons learned (3)
      •  Computer scientists are useful for staying up to date in the fast-moving world of software
          –  2006: I took a graduate class in databases at a top CS university; all SQL, not a single mention of “NoSQL”
          –  2007: We use SQL to build a precursor to Materials Project
          –  2011: We are designing the framework for Materials Project; I have lots of experience with SQL and am confident this is the way to go; a computer scientist casually mentions NoSQL, its growing prominence, and its potential applicability to our problem
          –  2017: We do almost everything in NoSQL
      •  Lesson: software moves fast! Much faster than materials science knowledge or methods. Don’t use “up to date” data from 5 years ago to inform your decision.
  • 25. Further resources
      •  The Github web sites
          –  www.github.com/materialsproject
          –  www.github.com/hackingmaterials
      •  Software carpentry
          –  https://software-carpentry.org
  • 26. Acknowledgements
      •  Research group of Prof. Shyue Ping Ong
      •  Research group of Prof. Kristin Persson
      •  Funding: US Dept of Energy, Materials Science Division
      …and all extended collaborators for these various projects!
      Slides (already) posted to http://www.slideshare.net/anubhavster
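
The error-recovery idea described on slide 13 (wrap the executable, watch its output for known error signatures, patch the inputs, and rerun) can be illustrated with a small standard-library sketch. This is not custodian's API: `ErrorRule`, `run_with_recovery`, `fake_vasp`, and the error message text are hypothetical names invented for illustration, and the "job" here is a plain Python function standing in for a real executable. Only the ZPOTRF / ISYM=0 pairing comes from the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class ErrorRule:
    """Hypothetical rule: an error signature plus the input change that fixes it."""
    name: str
    signature: str                  # substring to look for in the job output
    correction: Dict[str, object]   # input settings to overwrite before rerunning


def run_with_recovery(
    job: Callable[[Dict[str, object]], str],
    settings: Dict[str, object],
    rules: List[ErrorRule],
    max_reruns: int = 3,
) -> Tuple[Dict[str, object], List[str]]:
    """Run job(settings); if a known error appears in its output, patch and rerun."""
    settings = dict(settings)       # work on a copy, leave the caller's dict alone
    applied: List[str] = []
    for attempt in range(max_reruns + 1):
        output = job(settings)
        rule = next((r for r in rules if r.signature in output), None)
        if rule is None:
            return settings, applied            # no known error detected
        if attempt == max_reruns:
            break                               # out of reruns, give up
        settings.update(rule.correction)        # "change input files"
        applied.append(rule.name)               # record the fix, then rerun
    raise RuntimeError(f"job still failing after {max_reruns} reruns")


def fake_vasp(settings: Dict[str, object]) -> str:
    """Stand-in for a real executable: reports a ZPOTRF failure unless ISYM is 0."""
    if settings.get("ISYM") != 0:
        return "LAPACK: Routine ZPOTRF failed!"
    return "run finished"


# The one rule taken from the slide: on a ZPOTRF error, rerun with ISYM=0.
rules = [ErrorRule(name="zpotrf", signature="ZPOTRF", correction={"ISYM": 0})]
final_settings, fixes = run_with_recovery(fake_vasp, {"ISYM": 2, "ENCUT": 520}, rules)
print(final_settings, fixes)
```

In this sketch the first run fails, the ZPOTRF rule rewrites `ISYM`, and the second run succeeds. Keeping the rules as data separate from the loop is one way to picture the "ever-expanding library of fixes": adding a new fix means adding a new rule without changing the rerun logic.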