1. The MGI & Data-driven High-
Throughput Synthesis and
Characterization
Brian DeCost, Zachary Trautt, Martin Green, Gilad Kusne,
Jason Hattrick-Simpers
NIST Gaithersburg
Jason.Hattrick-Simpers@nist.gov
@jae3goals
Any mention of commercial products within this talk is for information only; it does not imply recommendation or
endorsement by NIST.
2. Outline
• The Materials Genome Initiative (MGI) and NIST’s Role
• The High-Throughput Experimental Materials Collaboratory (HTE-MC)
• Accelerated Discovery of (High – Hardness & Corrosion Resistant)
Metallic Glasses
• Iterative HTE and AI
• Vision for the Future
• Look Ma No Hands (Experimentation)!!
• Conclusions
3. Decrease time-to-market by 50% while <<$$
• Develop a Materials Innovation
Infrastructure
• Achieve National goals in energy,
security, and human welfare with
advanced materials
• Equip the next generation of
materials workforce
Materials Genome Initiative for
Global Competitiveness
9. Examples of Cultural Implementation and
Successes of the MGI
• Argonne Collaboration – phase identification at aluminum interfaces
• Lund Boats – MGI on the plant floor
• Casting Simulation (MAGMA) – MGI in R&D, tool shop, & plant floor
• Timken Steel – Premium Air Melt Practice, putting premium quality,
cost conscious steel into the hands of our customers
• BASF – Foaming simulations based on first principles
• ERCo – Laser Induced Breakdown Spectroscopy for real-time melt
composition (ARPA-E)
10.
11. Standards Are Important
• The NIST MGI Program is taking a very careful approach to consensus
standards for data representation
• There is a long track record of failure for most of the space
• Exception for highly structured data (e.g. ICSD)
• This should be done top-down not bottom-up
12. MGI Directions to Date
Materials by Design
projects:
DOE EFRCs, EMNs
NSF DMREFs
HT computational
databases:
Need: High-throughput
experimental data
13.
14. Workshop: “Fulfilling the Promise of the Materials
Genome Initiative via High-Throughput
Experimentation” – 2014
15. Workshop Conclusions
A large portion of the MGI program thus far has been devoted to modeling
and simulation. Prodigious amounts of experimental data will be required to
inform and validate modeling and simulation, to “power the MGI
computational engine.”
HTE can rapidly establish relationships between composition, structure,
and properties for a wide variety of materials classes, and therefore is:
a) uniquely suited to rapidly generate high quality, consistent data
sets
b) the key enabling counterpart to modeling and simulation for
bringing the MGI to fruition
“Enable broad access to HTE methodologies and data”
16.
17. High Throughput Experimental Materials
Collaboratory (HTE-MC)
• Necessary because even one “brick and mortar” HTE facility would be
very costly, and multiple facilities dedicated to different materials
classes (e.g. catalysts, photovoltaics, lightweight structural materials,
etc.) are needed
• Enable researchers at national laboratories, universities, and industry
to have access to HTE facilities
• The HTE-MC would facilitate MGI-driven research while leveraging
investment
• Complement new science investments (EMNs, NNMI, MURI, etc.)
18. How?
• Collaboratory: a 1989 neologism (William A. Wulf, Computer Scientist
at University of Virginia):
“defined by… a center without walls, ‘in which the nation’s
researchers can perform their research without regard to physical
locations, interacting with colleagues, accessing instrumentation,
sharing data and computational resources, … accessing information in
digital libraries’”
• A HTE-MC would consist of:
• An integrated, delocalized network of high-throughput synthesis and
characterization tools
• A best-in-class materials data management platform, consisting of NIST (and
other) software
19. HTE-MC 1st Steps: NIST – NREL Round Robin
Sample synthesis and measurements:
• Synthesize: Zn-Sn-Ti-O composition spread
sample libraries using combinatorial PLD
(@NIST) or sputtering (@NREL)
• Measure: Chemical composition, Crystal
structure, Electrical conductivity, Optical
transmittance, Band gap
• Exchange: Sample libraries and associated
data, repeat measurements
Zn-Sn-Ti-O:
• Chemical composition
• Crystal structure
• Electrical conductivity
• Optical transmittance
• Work function
Goal: test and improve the standards for exchange of data and samples among participant labs
NREL Samples NIST Sample
20. Addressing FAIR Principles
To be Findable:
• (meta)data are assigned a globally unique and
persistent identifier
• data are described with rich metadata
• metadata clearly and explicitly include the identifier
of the data it describes
• (meta)data are registered or indexed in a searchable
resource
To be Accessible:
• (meta)data are retrievable by their identifier using a
standardized communications protocol
– the protocol is open, free, and universally
implementable
– the protocol allows for an authentication and
authorization procedure, where necessary
• metadata are accessible, even when the data are no
longer available
To be Interoperable:
• (meta)data use a formal, accessible, shared, and
broadly applicable language for knowledge
representation.
• (meta)data use vocabularies that follow FAIR
principles
• (meta)data include qualified references to other
(meta)data
To be Reusable:
• (meta)data are richly described with a plurality of
accurate and relevant attributes
– (meta)data are released with a clear and accessible data
usage license
– (meta)data are associated with detailed provenance
– (meta)data meet domain-relevant community standards
Wilkinson, Mark D., et al. "The FAIR Guiding Principles for scientific data
management and stewardship." Scientific data 3 (2016). DOI:
10.1038/sdata.2016.18
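As a rough illustration of the checklist above, the sketch below tests a metadata record for a few of the FAIR items (identifier, reference to the data, access protocol, vocabulary, license, provenance). The field names and the example record are hypothetical placeholders chosen for this sketch; they are not an HTE-MC or NIST schema.

```python
# Hypothetical field names chosen for illustration; not a NIST or HTE-MC schema.
REQUIRED_FIELDS = {
    "identifier": "Findable: globally unique, persistent identifier",
    "data_identifier": "Findable: metadata names the data it describes",
    "access_protocol": "Accessible: standardized retrieval protocol",
    "vocabulary": "Interoperable: shared vocabulary",
    "license": "Reusable: clear data usage license",
    "provenance": "Reusable: detailed provenance",
}


def missing_fair_fields(record):
    """Return the FAIR-related fields that are absent or empty in a record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


# Placeholder record for one measurement on a composition-spread library.
record = {
    "identifier": "example-id-0001",            # placeholder, not a real identifier
    "data_identifier": "example-id-0001/data",  # placeholder
    "access_protocol": "https",
    "vocabulary": "example-vocabulary",         # placeholder
    "license": "",                              # left empty on purpose
    "provenance": {"instrument": "XRD", "lab": "example-lab"},
}

for name in missing_fair_fields(record):
    print(f"missing {name}: {REQUIRED_FIELDS[name]}")
```

A check like this only confirms that fields are present; whether an identifier is truly persistent or a license is appropriate still needs human review.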
21. HTE-MC
GOVERNMENT
AGENCIES
MEMBERS
• Academia
• National Labs
• Industry
• Small Business
Provide
Students/Staff
Receive
Funding $
Provide Structural
Funding
Provide Science
Infrastructure
USERS
• Industry
• Small Business
• Academia
• National Labs
• Manufacturing
USA Institutes
• Energy Materials
Networks
Pay Tiered
Access Fees
$
$
Generate
New Data
CONTRIBUTORS
• Academia
• National Labs
• HTE-MC Users
(after embargo period)
Receive
Benefits
Publish Open-
Access Data
VISITORS / PUBLIC
• Industry
• Small Business
• Academia
• Educators
• National Labs
• Manufacturing
USA Institutes
• Energy Materials
Networks
Access AI-ready
Public Data
Next Generation
Workforce
New
Knowledge
Materials
Solutions
Provide Data
Infrastructure
22. HTE Materials Collaboratory
Problems
• Experimental databases
are not keeping pace with
computational databases
• HTE is out of reach to most
due to high startup and
operating costs
• Materials are diverse; no
single institution can have
all the necessary
equipment
Solution
• Integrate HTE laboratories
with materials
cyberinfrastructure
• HTE as a shared resource;
operate on demand by
access fees and core
funding
• HTE as a federated
resource; enable
connectivity via
cyberinfrastructure
23. Technical Stakeholder Types and Population
• Member
– Provides infrastructure
• User
– Utilizes infrastructure
– Creates new data
– May choose to publish data
• Contributor
– Publishes data
• Visitor
– Consumes public data
Visitors
Contributors
Users
Members
(defines action, not access)
25. HTE-MC
Member Institute
Laboratory Information
Management System
Data Transfer Grid
Instruments/Computing
Database / Structured
Data / Metadata
File/Collection Repository
Member Institute
Laboratory Information
Management System
Instruments/Computing
File/Collection Repository
Data Dissemination
Data Transfer Grid
Database / Structured
Data / Metadata
File/Collection Repository
Registries
Materials
Resource Registry
High-Throughput
Experiment
Resource Registry
Member Institute
User Institute
Data Transfer Grid
Laboratory Information
Management System
Data Transfer Grid
Instruments/Computing
Database / Structured
Data / Metadata
File/Collection Repository
Data Transfer Grid
Database / Structured
Data / Metadata
26. High-Throughput Experimental Materials
Collaboratory (HTE-MC) Workshop
• Held: February 2018
• Workshop Goals:
• Socialize the HTE-MC concept among government, academic and industry stakeholders
• Expand HTE-MC membership
• Define technical, operational and business models for the HTE-MC
• Facilitated Breakout Sessions:
• Define the Vision of HTE-MC
• Define the value proposition for participation
• Identify major barriers to successful participation
• Identify and prioritize pilot use cases
• Identify and describe modes of interaction of users
• Define governance and business models for HTE-MC
• Workshop Report: In preparation
27. A Multi-Agency, Multi-Year Program Plan in
Advanced Energy Materials Discovery,
Development, and Process Design
• Held July 2018
• Workshop Goals
• Determine how best to coordinate next steps within the Federal Government
• Efficiently leverage the ongoing research in advanced materials conducted in
academia, industry, and government research laboratories
• Facilitated Breakout Sessions:
• Priorities in Energy Materials R&D: Barriers, Timeline, and Metrics
• Database infrastructure needs in AI and Energy Materials R&D: Moving Materials
Discovery through Materials Processes
• Expansion of the Collaboratory Network for Energy Materials Discovery and Process
Design
• Integration of AI, ML, and Experimentation for Energy Materials Design and
Processing
• Workshop Report: In preparation
28. Iterative Machine Learning – High
Throughput Experimental Approach to
Discovering Novel Amorphous Alloys
Fang Ren1, Logan Ward2, Travis Williams3, Kevin J. Laws4,
Christopher M. Wolverton2, Jason Hattrick-Simpers5, Apurva Mehta1
1SLAC National Accelerator Laboratory, 2Northwestern University, 3University of South Carolina,
4UNSW Australia, 5National Institute of Standards and Technology, 6 University of Chicago
Science Advances, Vol 4 No. 4 (2018)
30. Metallic Glasses Are Interesting
http://vitreloy.caltech.edu/development.htm
West US 7998286 B2
E Ma. Nature Materials. 14, 2015.
Metallic glass (MG) is a solid
metallic material, usually an
alloy, with a disordered atomic-
scale structure (amorphous).
31. The Palette of Potential Metallic Glasses
Usually Contain 3 or more elements
30 non-toxic, earth friendly elements > 4000 ternaries, > 4 Million compositions
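The counts on this slide can be checked with a short combinatorial sketch. The number of ternaries follows directly from choosing 3 of 30 elements. The slide does not state the composition grid behind "> 4 Million compositions", so the 2 at.% step below is an assumption that happens to land in the same range.

```python
from math import comb


def count_ternary_systems(n_elements):
    """Number of distinct three-element systems from a palette of n elements."""
    return comb(n_elements, 3)


def compositions_per_ternary(step_percent):
    """Grid points in one ternary with all three elements present.

    Counts positive integer triples (a, b, c) with a + b + c = 100 / step.
    """
    if 100 % step_percent != 0:
        raise ValueError("step_percent must divide 100")
    n = 100 // step_percent
    return comb(n - 1, 2)


systems = count_ternary_systems(30)
per_system = compositions_per_ternary(2)  # 2 at.% step is an assumption
print(systems, per_system, systems * per_system)
```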
32.
33. Building the Machine Learning Model
Ref: Ward et al. npj Comp. Mater. (2016), 28.
Experimental
Data
Machine Learning
Algorithm
Composition-based
Representation
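Decision tree node: $\sigma_r < 1.1$ Å → MG / Not MG
Example attributes: $\mu_Z$, $\Delta\chi$, $\sigma_{T_m}$, $\max r_{cov}$, $\lVert (x_H, x_{He}, \ldots) \rVert_2$
$GFA = f(x_H, x_{He}, \ldots)$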
24 Million Ternary Alloys
74520 potential MGs
5739 measurements
145 Attributes
Random Forest
34. Select Experiments that Involve Contradiction
Selection Criteria
1.) None of the models 100% disagree
2.) Some experimental data existed
3.) Inexpensive, low vapor pressure materials
Yang Model
Efficient
Packing Model ML Predictions
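One way to picture the selection step is to rank candidate systems by how much the three models disagree, after filtering on the practical criteria. This is a minimal sketch under stated assumptions: the system names and numbers are made up, "predicted glass-forming fraction" is a hypothetical summary value per model, and reading criterion 1 as "exclude systems where the models disagree completely" is an interpretation, not something the slide spells out.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    system: str            # placeholder system label
    yang: float            # hypothetical glass-forming fraction, Yang model
    packing: float         # same quantity, efficient packing model
    ml: float              # same quantity, ML model
    has_prior_data: bool   # criterion 2: some experimental data existed
    inexpensive: bool      # criterion 3: inexpensive, low vapor pressure

    def spread(self):
        """Disagreement between models: largest minus smallest prediction."""
        preds = (self.yang, self.packing, self.ml)
        return max(preds) - min(preds)


def rank_by_contradiction(candidates, max_spread=0.99):
    """Keep eligible systems and sort them by model disagreement, largest first."""
    eligible = [
        c for c in candidates
        if c.has_prior_data and c.inexpensive and c.spread() <= max_spread
    ]
    return sorted(eligible, key=lambda c: c.spread(), reverse=True)


# Made-up values for illustration only.
candidates = [
    Candidate("A-B-C", 0.10, 0.60, 0.45, True, True),
    Candidate("D-E-F", 0.30, 0.35, 0.32, True, True),
    Candidate("G-H-I", 0.05, 0.90, 0.50, False, True),  # no prior data
    Candidate("J-K-L", 0.00, 1.00, 0.50, True, True),   # total disagreement
]

for c in rank_by_contradiction(candidates):
    print(c.system, round(c.spread(), 2))
```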
41. Case Example X-Y-Al: Breaking from
Convention AND Property Prediction
No “deep” eutectics necessary!
Massalski “Binary Alloy Phase Diagrams” (1990)
42. But How to Create Property Models?
• There is no L-B-type data set for
properties of MG
• NLP/data extraction from
figures is in its infancy
• Manually scrape the literature
• 2000+ entries
• Errant measurements
• Many different groups
• Inconsistent definition of
“amorphous”
Feature Importance
Average Ground State
Volume
0.37
Minimum Ground State
Volume
0.24
Minimum Covalent Radius
0.12
Mean Melting
Temperature
0.036
Highest Melting
Temperature
0.017
46. “In the next 5 years, AI-driven, autonomous
materials research is going to fundamentally
change how we do materials science.”
-Jim Warren, Technical Program Director for
Materials Genomics, NIST
50. Active clustering for autonomous XRD phase
mapping
Think carefully about modeling to remove researcher degrees of freedom
DeCost et al., to be submitted
51. Conclusions
• AI & ML are already prevalent in the design of new materials, materials
synthesis, data capture/cleaning and knowledge extraction
• Neither AI nor ML is a panacea that will replace human intuition and
creativity; they are enablers
• In some cases an order of magnitude increase in materials
exploration/discovery is possible
• Maybe a fairer metric of AI’s influence will be on the rate of hypothesis
generation and (in)validation
• AI needs FAIR data including negative results to be effective
• Not part of the solution = consigned to obscurity
• Full materials research autonomy (for specific problems) has already been
demonstrated
52. Acknowledgements
USC
Travis Williams
SLAC
Dr. Apurva Mehta
Dr. Fang Ren
Dr. Suchismita
Northwestern
Prof. Wolverton
Dr. Logan Ward
UNSW
Prof. Kevin Laws
NIST
Dr. James Warren
Dr. Martin Green
Dr. Zachary Trautt
Dr. Gilad Kusne
Dr. Brian DeCost
Mr. Ryan Smith
NREL
Dr. Andriy
Zakutayev
CSM
Prof. Packard
Dr. Schoeppner
53. Demonstrations and Talks by (confirmed speakers):
• Theory
• Computational Approaches
• Experimental Approaches
Andrew Millis (Columbia)
Antoine Georges (CCQ)
Karin Rabe (Rutgers)
Bootcamp: Machine Learning for Materials Research &
Workshop: Machine Learning Quantum Materials
• Dates: July 30 – Aug 3, 2018
• Location: IBBR (Gaithersburg, Maryland)
MLMR introduces researchers from industry, national labs, and academia to machine learning theory and tools for rapid data analysis.
https://nanocenter.umd.edu/events/mlmr/
Bootcamp
Three days of lectures and hands-on exercises covering a range of
data analysis topics from data pre-processing through advanced
machine learning analysis techniques. Example topics include:
• Identifying important features in complex/high dimensional
data
• Visualizing high dimensional data to facilitate user analysis.
• Identifying the fabrication ‘descriptors’ that best predict
variance in functional properties.
• Quantifying similarities between materials using complex/high
dimensional data
The hands-on exercises will demonstrate practical use of machine
learning tools on real materials data (scalar values, spectra,
micrographs, etc.).
Sasha Balatsky (LANL)
Roger Melko (Waterloo)
Shoucheng Zhang (Stanford)
Stefano Curtarolo (Duke)
Gus Hart (BYU)
Ichiro Takeuchi (UMD)
Sergei Kalinin (ORNL)
Benji Maruyama (AFRL)
Jiun-Haw Chu (Univ. Washington)
Giuseppe Carleo (Flatiron)
Miles Stoudenmire (Flatiron)
Editor's Notes
I think we have a great opportunity for you to give attendees an overview of your work in data-driven HT synthesis and characterization. You should also feel free to provide forward-looking vision, e.g., if you'd like to highlight the emerging HT collaboratory concept led by NIST. Finally, the audience may also find it interesting to hear a bit of introductory content about NIST's role in MGI and materials data broadly.
Old story – how do we combine experiment, computation, and digital data to develop the materials that fit critical needs but do so cheaper and faster than ever before?
Gist:
We can’t forget that this is about the full discovery-to-deployment cycle. It doesn’t serve our purposes to compartmentalize and focus only on independent material discovery; we also have to consider how a material will eventually move into application.
MGI ideas aren’t necessarily new, but are following a natural progression that began in 1988 with COTA.
The idea is that through computation-guided experimentation we can achieve our goals more quickly than through only experimentation.
The emphasis is that this was started as a multi-agency initiative with coordination through the agencies but with each agency taking their own approach to implementation.
Materials are complex, and multiple length scales are important. We use simulations to look up length scales and experiments to look down. Both generate and consume data that is used to inform models, which in turn generate data and inform the exp-sim loop. Out of this loop we would like to arrive at new and outstanding materials. The data and the models can live anywhere, ideally somewhere FAIR, but in reality they are scattered between notebooks, hard drives, and someone’s memory.
When this method of producing materials works, it can be powerful.
Alloys designed by Apple using QuesTek IP – centered around ICME/MGI technique for materials design and deployment.
So let’s take the idea from the previous slide, abstract it a bit, and ask where does NIST fit into this MGI equation?
DOC’s smiling face towards industry. If we think that there are hundreds (thousands) of such MGI loops in the country all producing data and models, then NIST’s fit is clear.
First of all, we have to help industry, academia, and government labs exchange data. We can help set up repositories but first we have to ask what are meaningful (standard) ways of interchanging materials data from disparate sources?
Secondly, NIST is measurement technology driven and UNCERTAINTY and QUALITY assessment and improvement are key directives in this space.
But we are talking about materials data and models in repositories and a key question remains, “where are the curated, homogeneous and high-quality materials data for model development and validation coming from?”
This is a big problem within the MGI, because a great deal of effort has gone to the top half of the Venn diagram. Our (and a number of others’) contention is that HT experimentation is the potential driving force for the MGI engine.
This started with a review paper by Marty, Ichiro and myself talking about how HTE has really revolutionized the way people search for and optimize new materials. This caught the attention of OSTP White House and we were asked to organize a workshop bringing together some of the best in the field.
This slide is just about one of the outcomes from the workshop (held in San Fran in May 2014)
Can we turn me, the high-throughput experimentalist, into the rate limiting step in an intelligent search for new amorphous alloys?
Emphasize this is a moonshot, but that my off ramp is in the field of coatings.
Meshing is important but a reasonably dense sampling would take
~1000 years bulk alloy (5/day)
~10 years via HTE alone
~2 years
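The rounded figures in this note can be turned into a back-of-envelope calculation. The sketch below derives the sample count implied by "~1000 years at 5/day" and the throughput each shorter timescale would then require. The 365-day year and the derived rates are assumptions for illustration; the note itself gives only the rounded durations.

```python
DAYS_PER_YEAR = 365  # assumption: samples are made every day of the year


def implied_samples(years, per_day):
    """Total samples produced at a fixed daily rate over a number of years."""
    return years * DAYS_PER_YEAR * per_day


def implied_rate(n_samples, years):
    """Daily throughput needed to finish n_samples in the given number of years."""
    return n_samples / (years * DAYS_PER_YEAR)


n = implied_samples(1000, 5)   # bulk alloying figure from the note
print(n)                       # samples implied by the bulk estimate
print(implied_rate(n, 10))     # samples/day implied by the ~10 year figure
print(implied_rate(n, 2))      # samples/day-equivalent implied by the ~2 year figure
```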
Start at the bottom of this image and work my way clockwise.
Stoichiometric attributes capture the fraction but not type of elements present.
Elemental property attributes: atomic row, Mendeleev number, atomic weight, total # of unfilled states, etc., each summarized by weighted average, max, min, range, average deviation, and mode
Valence orbital occupation attributes
Ionic compound attributes…
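A minimal sketch of the first two attribute families, assuming a composition given as element amounts and a small lookup table of elemental properties. Only atomic number is used here, the composition is illustrative, and the mode statistic is omitted for brevity; this is not the attribute code from Ward et al.

```python
# Atomic numbers for three elements; a real table would cover many properties.
ATOMIC_NUMBER = {"Al": 13, "Co": 27, "Zr": 40}


def normalize(composition):
    """Convert element amounts into atomic fractions that sum to 1."""
    total = sum(composition.values())
    return {el: amt / total for el, amt in composition.items()}


def stoichiometric_norm(fractions, p=2):
    """p-norm of the fraction vector; depends on fractions, not element identity."""
    return sum(x ** p for x in fractions.values()) ** (1.0 / p)


def property_attributes(fractions, table):
    """Fraction-weighted statistics of one elemental property."""
    values = [table[el] for el in fractions]
    mean = sum(x * table[el] for el, x in fractions.items())
    avg_dev = sum(x * abs(table[el] - mean) for el, x in fractions.items())
    return {
        "mean": mean,
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "avg_dev": avg_dev,
    }


fractions = normalize({"Co": 2, "Zr": 1, "Al": 1})  # illustrative composition
print(stoichiometric_norm(fractions))
print(property_attributes(fractions, ATOMIC_NUMBER))
```

Swapping the element labels while keeping the same fractions leaves `stoichiometric_norm` unchanged but changes `property_attributes`, which is the distinction the note draws between the two families.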
Ask before getting into this, if anyone isn’t familiar with roughly how Random Forest works.
Left melt-spun model, right stacked model
Main data set is unbalanced
The relationship between the liquidus, FWHM, and GFR, shown in fig. S7, suggests a strong correlation between glass formation and the C15 (MgCu2 prototype) Laves and B2 liquidus phase fields common to all four systems. These results indicate that these particular ordered phases are difficult to crystallize quickly, resulting in glass formation. For instance, for the Co-Zr–containing ternaries, despite the ZrCo2 C15 Laves phase having a high melting point relative to surrounding phases, the exceptional correlation between the GFR and the ZrCo2 liquidus phase field as it extends into the ternary composition space suggests a high kinetic barrier to crystallization. These correlations further suggest that a large mismatch in ionic sizes and the presence of larger atoms in these structures, such as Zr, hinder crystallization even more.
What do the circle and/or the box mean?
MAE 9.2 MPa
MRE 10%
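For reference, the two error metrics can be computed as below. The note does not define MRE, so reading it as mean relative error (absolute error divided by the measured value, averaged) is an assumption, and the numbers are made up to show the calculation.

```python
def mean_absolute_error(measured, predicted):
    """Average of |measured - predicted|, in the units of the property."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - p) for m, p in zip(measured, predicted)) / len(measured)


def mean_relative_error(measured, predicted):
    """Average of |measured - predicted| / |measured|, as a fraction."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - p) / abs(m) for m, p in zip(measured, predicted)) / len(measured)


# Made-up values for illustration only.
measured = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

mae = mean_absolute_error(measured, predicted)
mre = mean_relative_error(measured, predicted)
print(mae, mre)
```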
Here are some thoughts that I have:
1.) how much of the scatter is due to repeats for a given entry?
2.)
If you’re interested in machine learning, we have an annual bootcamp at University of Maryland that teaches a wide variety of these techniques. You’ll learn things like how to identify important features in your data, how to visualize complex or high dimensional data, and how to identify descriptors.
Each morning there are lectures and the afternoons are hands on activities applying machine learning to real materials data.
Ichiro Takeuchi, some collaborators, and I have also organized an annual bootcamp.
At this bootcamp we teach an introduction to machine learning, the most common techniques. Half of each day is also hands on training where you learn how to write code to analyze real data. Some examples of stuff we teach. How to …