SlideShare uma empresa Scribd logo
1 de 17
The Materials Project
Validation, Provenance and Sandboxes
Goals
• Validation
– constantly guard against bugs in core data
and imported data
• Provenance
– know how data came to be
• Sandboxes
– Combine public and non-public data; "good
fences make good neighbors"
Validation
(Internal)
Database ID
External ID What we expected
What we got
Validation runs all the time
• Rules with "constraints" for every database (and sandbox)
• Test constraints against entire DB every night  email reports
• Validation engine, etc. all open-source software in pymatgen-db
Remote
server
Validation
engine
Rules
MP Databases
Reports
(email, web pages, ..)
Rules have a simple syntax
_aliases:
- snl_id = mps_id
- energy = analysis.e_above_hull
materials:
-
filter:
constraints:
- final_energy_per_atom <= 0
- initial_structure.lattice.volume > 0
- initial_structure.lattice.a > 0
- initial_structure.lattice.b > 0
- initial_structure.lattice.c > 0
- initial_structure.lattice.matrix size 3
- formation_energy_per_atom <= 5
- formation_energy_per_atom > -5
- cpu_time > 5
- e_above_hull > -0.000001
- final_energy < 0
- reduced_cell_formula size$
nelements
# Check num. ICSD sources for
selected compounds
-
filter:
- task_id = "mp-540081"
constraints:
- icsd_id size> 10
-
filter:
- task_id = "mp-20379"
constraints:
- icsd_id size 1
-
filter:
- task_id = "mp-13634"
constraints:
- icsd_id size> 0
-
filter:
- task_id = "mp-600022"
constraints:
- icsd_id size 0
# NiO2 phases should never become
stable
-
filter:
- e_above_hull = 0
constraints:
- pretty_formula != 'NiO2'
tasks:
-
filter:
- state = "successful"
constraints:
- output.final_energy_per_atom <= 0
Validation summary
Easy-to-use, integrated, efficient tools to
report errors
Next steps
– Record all check results in DB
– More sophisticated checks (Map/Reduce)
– Make it easier to add new checks internally
– Make it easier to add new check for anyone
• per-sandbox or even per-user ("MP Alerts")
Provenance: How do I know that
the data is correct?
Types of provenance in the system
1) Calculation workflows
– FireWorks records calculation inputs, .. results in great detail
2) External datasets
– Structure Notation Language standardizes the naming of data
sources and publications
3) Post-calculation data transformations
– New "builders" provides framework for tracking creation of final
database products
(1) (2)
(3)
Provenance is available
for every material
Provenance in DB
Structure Notation Language
"snl_final": {
"about": {
"created_at": {
"string": "2014-02-22
19:07:00.383869",
"@class": "datetime",
"@module": "datetime"
},
"_materialsproject": {
"submission_id":
52621,
"snl_id": 398676,
"spacegroup": {
"lattice_type":
"tetragonal",
"symbol":
"P4_2/mmc",
"number": 131,
"point_group":
"4/mmm",
"crystal_system":
"tetragonal",
"hall": "-P 4c 2"
}
},
"_cedergroup": {
"BURP_sids": [
409544,
409545,
409546
],
"icsd_ids": [
],
"e_above_hull":
0.075125350000000423734
},
"references": "",
"authors": [
{
"name": "Geoffroy
Hautier",
"email":
"geoffroy.hautier@uclouvain
.be"
},
{
"name": "Bo Xu",
"email":
"boxu14@mit.edu"
}
],
"remarks": [
"supplementary
compounds from MIT
matgen database"
],
"projects": [
"MIT matgen"
],
"history": [
{
"url": "http://www.fiz-
karlsruhe.de/icsd_home.htm
l",
"name": "Inorganic
Crystal Structure Database",
"description": {
"Collection code":
24692
}
},
{
"url": "",
"name": "",
"description": {
"source": null,
"orig_name": "Basic
substitution code.",
"formula": "O1 Pd1"
}
},
{
"url":
"http://ceder.mit.edu/",
"name": "MIT Ceder
group research database",
"description": {
"source": 105986,
"orig_name": "",
"formula": "FeO"
}
},
{
"url":
"http://www.materialsproject.
org",
"name": "Materials
Project structure
optimization",
"description": {
"fw_id": 820305,
"task_type": "GGA
optimize structure (2x)",
"task_id": "mp-
753682"
}
},
{
"url":
"http://www.materialsproject.
org",
"name": "Materials
Project structure
optimization",
"description": {
"fw_id": 820308,
"task_type":
"GGA+U optimize structure
(2x)",
"task_id": "mp-
776678"
}
}
]
},
Metadata
Crystal
DB sources
References History of
structure
optimizations
Future work: unified view of
provenance
VASP
result
ICSD
VASP
result
VASP
result
Post-
processing
Material
properties
Computation
Data import
processing
e.g., Defects
Sandbox example: Multivalent
JCESR
users
Non-
JCESR
users
Multivalent app
Sandboxes = Database + Apps
Core data Core data
+
multivalent
materials
Non-
JCESR
users
JCESR
users
Technical challenges
• Pre-process data for real-time search
• Interfaces for per-user access control
– https://materialsproject.org/materials/1234?san
dbox=jcesr
– Web UI elements
and
Future: dynamic sandbox creation
Current:
– Large & significant
additional data / apps
• e.g., JCESR
– Longer-term
connections to MP data
• e.g. porous materials
– Companies
• e.g. VW/Stanford
Future
small collab.
per-user?
CoD?
Summary
• Validation
– guard against bugs by checking all data daily
and at data import/creation time
• Provenance
– universal standard for annotating data
provenance
• Sandboxes
– unified view of distinct databases
– onramp for new collaborations and data

Mais conteúdo relacionado

Mais procurados

Automated In-memory Malware/Rootkit Detection via Binary Analysis and Machin...
Automated In-memory Malware/Rootkit  Detection via Binary Analysis and Machin...Automated In-memory Malware/Rootkit  Detection via Binary Analysis and Machin...
Automated In-memory Malware/Rootkit Detection via Binary Analysis and Machin...Malachi Jones
 
Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2Damir Delija
 
H@dfex 2015 malware analysis
H@dfex 2015   malware analysisH@dfex 2015   malware analysis
H@dfex 2015 malware analysisCharles Lim
 
Windows Threat Hunting
Windows Threat HuntingWindows Threat Hunting
Windows Threat HuntingGIBIN JOHN
 
Lateral Movement - Phreaknik 2016
Lateral Movement - Phreaknik 2016Lateral Movement - Phreaknik 2016
Lateral Movement - Phreaknik 2016Xavier Ashe
 
Android Malware Analysis
Android Malware AnalysisAndroid Malware Analysis
Android Malware AnalysisJongWon Kim
 
Purpose Driven Hunt (DerbyCon 2017)
Purpose Driven Hunt (DerbyCon 2017)Purpose Driven Hunt (DerbyCon 2017)
Purpose Driven Hunt (DerbyCon 2017)Jared Atkinson
 
Anomalies Detection: Windows OS - Part 1
Anomalies Detection: Windows OS - Part 1Anomalies Detection: Windows OS - Part 1
Anomalies Detection: Windows OS - Part 1Rhydham Joshi
 
DC612 Day - Hands on Penetration Testing 101
DC612 Day - Hands on Penetration Testing 101DC612 Day - Hands on Penetration Testing 101
DC612 Day - Hands on Penetration Testing 101dc612
 
Understand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day ThreatsUnderstand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day ThreatsRahul Mohandas
 
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...Priyanka Aash
 
Usage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics toolsUsage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics toolsDamir Delija
 
Lateral Movement: How attackers quietly traverse your Network
Lateral Movement: How attackers quietly traverse your NetworkLateral Movement: How attackers quietly traverse your Network
Lateral Movement: How attackers quietly traverse your NetworkEC-Council
 
Analisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario MaliciosoAnalisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario MaliciosoConferencias FIST
 
Malware Classification Using Structured Control Flow
Malware Classification Using Structured Control FlowMalware Classification Using Structured Control Flow
Malware Classification Using Structured Control FlowSilvio Cesare
 
Cyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on ExamplesCyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on ExamplesSandeep Kumar Seeram
 
Volatile IOCs for Fast Incident Response
Volatile IOCs for Fast Incident ResponseVolatile IOCs for Fast Incident Response
Volatile IOCs for Fast Incident ResponseTakahiro Haruyama
 

Mais procurados (20)

A Threat Hunter Himself
A Threat Hunter HimselfA Threat Hunter Himself
A Threat Hunter Himself
 
Automated In-memory Malware/Rootkit Detection via Binary Analysis and Machin...
Automated In-memory Malware/Rootkit  Detection via Binary Analysis and Machin...Automated In-memory Malware/Rootkit  Detection via Binary Analysis and Machin...
Automated In-memory Malware/Rootkit Detection via Binary Analysis and Machin...
 
Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2Datafoucs 2014 on line digital forensic investigations damir delija 2
Datafoucs 2014 on line digital forensic investigations damir delija 2
 
H@dfex 2015 malware analysis
H@dfex 2015   malware analysisH@dfex 2015   malware analysis
H@dfex 2015 malware analysis
 
Windows Threat Hunting
Windows Threat HuntingWindows Threat Hunting
Windows Threat Hunting
 
Tcpdump hunter
Tcpdump hunterTcpdump hunter
Tcpdump hunter
 
Lateral Movement - Phreaknik 2016
Lateral Movement - Phreaknik 2016Lateral Movement - Phreaknik 2016
Lateral Movement - Phreaknik 2016
 
Malware forensics
Malware forensicsMalware forensics
Malware forensics
 
Android Malware Analysis
Android Malware AnalysisAndroid Malware Analysis
Android Malware Analysis
 
Purpose Driven Hunt (DerbyCon 2017)
Purpose Driven Hunt (DerbyCon 2017)Purpose Driven Hunt (DerbyCon 2017)
Purpose Driven Hunt (DerbyCon 2017)
 
Anomalies Detection: Windows OS - Part 1
Anomalies Detection: Windows OS - Part 1Anomalies Detection: Windows OS - Part 1
Anomalies Detection: Windows OS - Part 1
 
DC612 Day - Hands on Penetration Testing 101
DC612 Day - Hands on Penetration Testing 101DC612 Day - Hands on Penetration Testing 101
DC612 Day - Hands on Penetration Testing 101
 
Understand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day ThreatsUnderstand How Machine Learning Defends Against Zero-Day Threats
Understand How Machine Learning Defends Against Zero-Day Threats
 
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
From Thousands of Hours to a Couple of Minutes: Automating Exploit Generation...
 
Usage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics toolsUsage aspects techniques for enterprise forensics data analytics tools
Usage aspects techniques for enterprise forensics data analytics tools
 
Lateral Movement: How attackers quietly traverse your Network
Lateral Movement: How attackers quietly traverse your NetworkLateral Movement: How attackers quietly traverse your Network
Lateral Movement: How attackers quietly traverse your Network
 
Analisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario MaliciosoAnalisis Estatico y de Comportamiento de un Binario Malicioso
Analisis Estatico y de Comportamiento de un Binario Malicioso
 
Malware Classification Using Structured Control Flow
Malware Classification Using Structured Control FlowMalware Classification Using Structured Control Flow
Malware Classification Using Structured Control Flow
 
Cyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on ExamplesCyber Defense Forensic Analyst - Real World Hands-on Examples
Cyber Defense Forensic Analyst - Real World Hands-on Examples
 
Volatile IOCs for Fast Incident Response
Volatile IOCs for Fast Incident ResponseVolatile IOCs for Fast Incident Response
Volatile IOCs for Fast Incident Response
 

Semelhante a Materials Project Validation, Provenance, and Sandboxes by Dan Gunter

2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptxAndrew Lamb
 
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016DataStax
 
High Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudHigh Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudMongoDB
 
Boundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary
 
Tutorial On Database Management System
Tutorial On Database Management SystemTutorial On Database Management System
Tutorial On Database Management Systempsathishcs
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandraPatrick McFadin
 
Black friday logs - Scaling Elasticsearch
Black friday logs - Scaling ElasticsearchBlack friday logs - Scaling Elasticsearch
Black friday logs - Scaling ElasticsearchSylvain Wallez
 
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...IEEEMEMTECHSTUDENTSPROJECTS
 
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...IEEEMEMTECHSTUDENTPROJECTS
 
Scalability20140226
Scalability20140226Scalability20140226
Scalability20140226Nick Kypreos
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesMongoDB
 
Public private hybrid - cmdb challenge
Public private hybrid - cmdb challengePublic private hybrid - cmdb challenge
Public private hybrid - cmdb challengeryszardsshare
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBMongoDB
 
MongoDB Best Practices
MongoDB Best PracticesMongoDB Best Practices
MongoDB Best PracticesLewis Lin 🦊
 
Codemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of ThingsCodemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of ThingsMassimo Brignoli
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecJosh Patterson
 
MS Word file resumes16869r.doc.doc
MS Word file resumes16869r.doc.docMS Word file resumes16869r.doc.doc
MS Word file resumes16869r.doc.docbutest
 
Machine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossMachine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossAndrew Flatters
 

Semelhante a Materials Project Validation, Provenance, and Sandboxes by Dan Gunter (20)

2021 04-20 apache arrow and its impact on the database industry.pptx
2021 04-20  apache arrow and its impact on the database industry.pptx2021 04-20  apache arrow and its impact on the database industry.pptx
2021 04-20 apache arrow and its impact on the database industry.pptx
 
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
C* for Deep Learning (Andrew Jefferson, Tracktable) | Cassandra Summit 2016
 
High Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal CloudHigh Performance, Scalable MongoDB in a Bare Metal Cloud
High Performance, Scalable MongoDB in a Bare Metal Cloud
 
Boundary Front end tech talk: how it works
Boundary Front end tech talk: how it worksBoundary Front end tech talk: how it works
Boundary Front end tech talk: how it works
 
Tutorial On Database Management System
Tutorial On Database Management SystemTutorial On Database Management System
Tutorial On Database Management System
 
Advanced data modeling with apache cassandra
Advanced data modeling with apache cassandraAdvanced data modeling with apache cassandra
Advanced data modeling with apache cassandra
 
Black friday logs - Scaling Elasticsearch
Black friday logs - Scaling ElasticsearchBlack friday logs - Scaling Elasticsearch
Black friday logs - Scaling Elasticsearch
 
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
CLIM Program: Remote Sensing Workshop, An Introduction to Systems and Softwar...
 
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
2014 IEEE DOTNET DATA MINING PROJECT Trusteddb a-trusted-hardware-based-datab...
 
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
IEEE 2014 DOTNET DATA MINING PROJECTS Trusted db a-trusted-hardware-based-dat...
 
Scalability20140226
Scalability20140226Scalability20140226
Scalability20140226
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
Public private hybrid - cmdb challenge
Public private hybrid - cmdb challengePublic private hybrid - cmdb challenge
Public private hybrid - cmdb challenge
 
Webinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDBWebinar: Best Practices for Getting Started with MongoDB
Webinar: Best Practices for Getting Started with MongoDB
 
MongoDB Best Practices
MongoDB Best PracticesMongoDB Best Practices
MongoDB Best Practices
 
Codemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of ThingsCodemotion Milano 2014 - MongoDB and the Internet of Things
Codemotion Milano 2014 - MongoDB and the Internet of Things
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
Smart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVecSmart Data Conference: DL4J and DataVec
Smart Data Conference: DL4J and DataVec
 
MS Word file resumes16869r.doc.doc
MS Word file resumes16869r.doc.docMS Word file resumes16869r.doc.doc
MS Word file resumes16869r.doc.doc
 
Machine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy CrossMachine Learning with ML.NET and Azure - Andy Cross
Machine Learning with ML.NET and Azure - Andy Cross
 

Último

GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Monika Rani
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxabhishekdhamu51
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLkantirani197
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)Areesha Ahmad
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .Poonam Aher Patil
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.Nitya salvi
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Silpa
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxRizalinePalanog2
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Joonhun Lee
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...chandars293
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 

Último (20)

GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptx
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
Site Acceptance Test .
Site Acceptance Test                    .Site Acceptance Test                    .
Site Acceptance Test .
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 

Materials Project Validation, Provenance, and Sandboxes by Dan Gunter

  • 1. The Materials Project Validation, Provenance and Sandboxes
  • 2. Goals • Validation – constantly guard against bugs in core data and imported data • Provenance – know how data came to be • Sandboxes – Combine public and non-public data; "good fences make good neighbors"
  • 3. Validation (Internal) Database ID External ID What we expected What we got
  • 4. Validation runs all the time • Rules with "constraints" for every database (and sandbox) • Test constraints against entire DB every night  email reports • Validation engine, etc. all open-source software in pymatgen-db Remote server Validation engine Rules MP Databases Reports (email, web pages, ..)
  • 5. Rules have a simple syntax _aliases: - snl_id = mps_id - energy = analysis.e_above_hull materials: - filter: constraints: - final_energy_per_atom <= 0 - initial_structure.lattice.volume > 0 - initial_structure.lattice.a > 0 - initial_structure.lattice.b > 0 - initial_structure.lattice.c > 0 - initial_structure.lattice.matrix size 3 - formation_energy_per_atom <= 5 - formation_energy_per_atom > -5 - cpu_time > 5 - e_above_hull > -0.000001 - final_energy < 0 - reduced_cell_formula size$ nelements # Check num. ICSD sources for selected compounds - filter: - task_id = "mp-540081" constraints: - icsd_id size> 10 - filter: - task_id = "mp-20379" constraints: - icsd_id size 1 - filter: - task_id = "mp-13634" constraints: - icsd_id size> 0 - filter: - task_id = "mp-600022" constraints: - icsd_id size 0 # NiO2 phases should never become stable - filter: - e_above_hull = 0 constraints: - pretty_formula != 'NiO2' tasks: - filter: - state = "successful" constraints: - output.final_energy_per_atom <= 0
  • 6. Validation summary Easy-to-use, integrated, efficient tools to report errors Next steps – Record all check results in DB – More sophisticated checks (Map/Reduce) – Make it easier to add new checks internally – Make it easier to add new check for anyone • per-sandbox or even per-user ("MP Alerts")
  • 7. Provenance: How do I know that the data is correct?
  • 8. Types of provenance in the system 1) Calculation workflows – FireWorks records calculation inputs, .. results in great detail 2) External datasets – Structure Notation Language standardizes the naming of data sources and publications 3) Post-calculation data transformations – New "builders" provides framework for tracking creation of final database products (1) (2) (3)
  • 10. Provenance in DB Structure Notation Language "snl_final": { "about": { "created_at": { "string": "2014-02-22 19:07:00.383869", "@class": "datetime", "@module": "datetime" }, "_materialsproject": { "submission_id": 52621, "snl_id": 398676, "spacegroup": { "lattice_type": "tetragonal", "symbol": "P4_2/mmc", "number": 131, "point_group": "4/mmm", "crystal_system": "tetragonal", "hall": "-P 4c 2" } }, "_cedergroup": { "BURP_sids": [ 409544, 409545, 409546 ], "icsd_ids": [ ], "e_above_hull": 0.075125350000000423734 }, "references": "", "authors": [ { "name": "Geoffroy Hautier", "email": "geoffroy.hautier@uclouvain .be" }, { "name": "Bo Xu", "email": "boxu14@mit.edu" } ], "remarks": [ "supplementary compounds from MIT matgen database" ], "projects": [ "MIT matgen" ], "history": [ { "url": "http://www.fiz- karlsruhe.de/icsd_home.htm l", "name": "Inorganic Crystal Structure Database", "description": { "Collection code": 24692 } }, { "url": "", "name": "", "description": { "source": null, "orig_name": "Basic substitution code.", "formula": "O1 Pd1" } }, { "url": "http://ceder.mit.edu/", "name": "MIT Ceder group research database", "description": { "source": 105986, "orig_name": "", "formula": "FeO" } }, { "url": "http://www.materialsproject. org", "name": "Materials Project structure optimization", "description": { "fw_id": 820305, "task_type": "GGA optimize structure (2x)", "task_id": "mp- 753682" } }, { "url": "http://www.materialsproject. org", "name": "Materials Project structure optimization", "description": { "fw_id": 820308, "task_type": "GGA+U optimize structure (2x)", "task_id": "mp- 776678" } } ] }, Metadata Crystal DB sources References History of structure optimizations
  • 11. Future work: unified view of provenance VASP result ICSD VASP result VASP result Post- processing Material properties Computation Data import processing e.g., Defects
  • 14. Sandboxes = Database + Apps Core data Core data + multivalent materials Non- JCESR users JCESR users
  • 15. Technical challenges • Pre-process data for real-time search • Interfaces for per-user access control – https://materialsproject.org/materials/1234?san dbox=jcesr – Web UI elements and
  • 16. Future: dynamic sandbox creation Current: – Large & significant additional data / apps • e.g., JCESR – Longer-term connections to MP data • e.g. porous materials – Companies • e.g. VW/Stanford Future small collab. per-user? CoD?
  • 17. Summary • Validation – guard against bugs by checking all data daily and at data import/creation time • Provenance – universal standard for annotating data provenance • Sandboxes – unified view of distinct databases – onramp for new collaborations and data

Notas do Editor

  1. Picture of 1915 Heinrich Campendonk painting, "Landscape with horses". Steve Martin paid $850K for a forged version of the painting, from a reputable art house in Paris, in 2004. He sold it at a loss of $250K before discovering it was a forgery. The forgery was performed by Wolfgang Beltracchi.
  2. Sandboxes are a way to share preliminary data in the context of MP data and tools.