Using MongoDB for
Materials Discovery
   Michael Kocher and Dan Gunter
   Lawrence Berkeley National Lab
Energy Mission at LBNL
•   Li-ion Batteries

•   Photovoltaic (Solar Cells)

•   Thermoelectrics

•   Biofuels

•   New Computational Tools

•   Cutting-edge Spectroscopic Tools (Advanced Light Source)

               http://carboncycle2.lbl.gov/
The Current Materials Design
    Model Is Slow


18 years, on average, from
new-materials discovery to
commercialization


  Bringing New Materials to the Market: Eagar, T.W.
        Technology Review Feb 1995, 98, 42.
Materials Genome Initiative:
  A Renaissance of American Manufacturing

      “To help businesses discover, develop, and deploy new
     materials twice as fast, we're launching what we call the
       Materials Genome Initiative. The invention of silicon
   circuits and lithium-ion batteries made computers and iPods
        and iPads possible -- but it took years to get those
     technologies from the drawing board to the marketplace.

          We can do it faster.”
     - President Obama at Carnegie Mellon
              University 6/24/2011
What is a Material?
[Figure: example crystal structures for NaCl, silicon, and LiCoO2, with Li, O, and Co atoms labeled]
What Can We Compute Using
  Quantum Mechanics?
•   Volume
•   Density
•   Total energy
•   Formation energy
•   Metallic or not?
•   etc.

     No empirical parameters!
MaterialsProject.org
 "The Google of Materials Science Data"

  MIT and LBNL collaboration
Inverting the Problem
Detailed Properties
Machine Learning
[Diagram: materials.bson feeds a learning algorithm, which outputs candidate structures for new materials (Structure 1 through Structure 6)]
•   How often can you substitute Mg for Ca?
•   What about Na, V, P, O?

         Prof. Gerbrand Ceder (DOI: 10.1103/PhysRevLett.91.135503)
Materials Project:
 A Play in Three Acts

I. Data generation using HTC
II. Data storage
III. Data analysis/logging
Act I: Managing
       Calculations
• Centralized hub with distributed workers is the only
  way to go
• Hub is at LBNL
• Store the state in the db
• Overview of running many MPI jobs at
  many different HPC centers
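
The "store the state in the db" idea can be sketched roughly as follows. This is a minimal in-memory illustration, not the project's actual code: the field names (`state`, `claimed_by`), the state values, and the `claim_next_job` helper are all assumptions made for the example. Against a real MongoDB collection, the same "find a READY job and mark it RUNNING" step could be done atomically with pymongo's `find_one_and_update`.

```python
# In-memory stand-in for a job-queue collection.
# Every field name and value here is hypothetical.
queue = [
    {"_id": 1, "formula": "LiCoO2", "state": "READY", "claimed_by": None},
    {"_id": 2, "formula": "NaCl", "state": "READY", "claimed_by": None},
]


def claim_next_job(jobs, worker):
    """Mark the first READY job as RUNNING for `worker` and return it (or None)."""
    for job in jobs:
        if job["state"] == "READY":
            job["state"] = "RUNNING"
            job["claimed_by"] = worker
            return job
    return None


# Three workers ask the hub for work; only two jobs exist.
first = claim_next_job(queue, "hopper")
second = claim_next_job(queue, "carver")
third = claim_next_job(queue, "lr1")
print(first["formula"], second["formula"], third)
```

Because the job state lives in one place, any worker at any center can ask the hub for work without knowing about the others.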
MasterQueue
[Diagram: builder.x ("The Brain") and master_queue.bson at the hub, annotated "create a new engine, add to queue" and "pull crystal"; five manager.x processes below, one per HPC system]

HPC systems:
•   NERSC (Oakland): Franklin, Hopper, Carver
•   Lawrencium (Berkeley): lr1, lr2
Centralized Logging
and Management
[Diagram: eight manager.x processes, one per machine, connected to a central MongoDB]
•   MIT: O1, Cathode
•   NERSC (Oakland): Hopper, Franklin, Carver
•   LBNL: lr1, lr2
•   Kentucky: DLX

Example:
 query = {'elements': {'$all': ['Li', 'O']}, 'nelectrons': {'$lte': 200}}
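
To make the meaning of the example query above concrete, here is a small sketch that evaluates it against plain Python dicts. The `matches` helper is hypothetical and supports only the two operators used here (`$all`, `$lte`); the sample documents and their `nelectrons` values are invented for illustration. With a live database, the same `query` dict would simply be passed to pymongo's `collection.find(query)`.

```python
query = {"elements": {"$all": ["Li", "O"]}, "nelectrons": {"$lte": 200}}


def matches(doc, query):
    """Evaluate a tiny subset of MongoDB query semantics ($all, $lte) on a dict."""
    for field, condition in query.items():
        value = doc.get(field)
        for op, arg in condition.items():
            if op == "$all":
                # The array field must contain every listed value.
                if not isinstance(value, list) or not all(x in value for x in arg):
                    return False
            elif op == "$lte":
                if value is None or not value <= arg:
                    return False
            else:
                raise ValueError(f"unsupported operator: {op}")
    return True


# Made-up sample documents.
docs = [
    {"formula": "LiCoO2", "elements": ["Li", "Co", "O"], "nelectrons": 46},
    {"formula": "NaCl", "elements": ["Na", "Cl"], "nelectrons": 28},
    {"formula": "Li2O supercell", "elements": ["Li", "O"], "nelectrons": 448},
]

hits = [d["formula"] for d in docs if matches(d, query)]
print(hits)
```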
Act II:
Core Data storage
Very Complex Documents
Powerful Querying
Every crystal that has (Li or Na or K), (Mn), (O or S or F or Si)
plus one other element except (Zn or Ni or Fe or Cu or Co)

{
        "lattice.volume" : { "$lt" : 500 },
        "elements" : { "$all" : ["Mn"], "$size" : 4, "$nin" : ["Zn", "Ni", "Fe", "Cu", "Co"] },
        "atoms" : { "$elemMatch" : { "oxidation_state" : 3, "symbol" : "Mn" } },
        "$where" : "match_all(this.element_names, ['Li', 'Na', 'K'], ['Mn'], ['O', 'S', 'F', 'Si'])"
}
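
The `$where` clause calls `match_all`, which the slide does not define. One plausible reading, given the prose description of the search, is "the crystal contains at least one element from every listed group". The Python sketch below implements that reading only. It is an assumption, and the real helper would be a JavaScript function available to the server.

```python
def match_all(element_names, *groups):
    """Return True if element_names contains at least one member of every group."""
    present = set(element_names)
    return all(present & set(group) for group in groups)


# Groups taken from the query: (Li or Na or K), (Mn), (O or S or F or Si).
groups = (["Li", "Na", "K"], ["Mn"], ["O", "S", "F", "Si"])

has_alkali = match_all(["Li", "Mn", "O", "P"], *groups)
no_alkali = match_all(["Mg", "Mn", "O", "P"], *groups)
print(has_alkali, no_alkali)
```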
pre-MongoDB :(
((SELECT structure.structureid FROM structure NATURAL INNER JOIN
database NATURAL INNER JOIN databaseentry WHERE structureid IN
((select structure.structureid from structure NATURAL INNER JOIN
elemententry where elemententry.symbol='Li' INTERSECT select
structure.structureid from structure NATURAL INNER JOIN elemententry
where elemententry.symbol='O') INTERSECT select structure.structureid
from structure NATURAL INNER JOIN database NATURAL INNER JOIN
databaseentry where database.title='ICSD')) EXCEPT (SELECT
structure.structureid FROM structure where structure.entryid IN
(select duplicateentry.entryid from duplicateentry))) EXCEPT (SELECT
structure.structureid FROM structure where structure.entryid IN
(select entryid from removals))




Search for materials with Li and O,
       excluding duplicates
Map/Reduce
[Diagram: Calculations 12 to 15 in tasks.bson pass through map/reduce (MR) into materials.bson; Calculation 13 is marked with a ✓]
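
The tasks-to-materials step can be illustrated with a plain-Python map/reduce. This is only a sketch under stated assumptions: the field names, the energy values, and the "keep the lowest-energy calculation per formula" rule are invented for the example. The slide shows one calculation being selected but does not say by what criterion.

```python
from collections import defaultdict

# Made-up task documents standing in for tasks.bson.
tasks = [
    {"task_id": 12, "formula": "LiCoO2", "energy_per_atom": -5.61},
    {"task_id": 13, "formula": "LiCoO2", "energy_per_atom": -5.70},
    {"task_id": 14, "formula": "NaCl", "energy_per_atom": -3.40},
    {"task_id": 15, "formula": "LiCoO2", "energy_per_atom": -5.66},
]


def map_task(task):
    """Map step: emit (key, value) with the formula as the grouping key."""
    return task["formula"], task


def reduce_tasks(candidates):
    """Reduce step: keep the calculation with the lowest energy per atom."""
    return min(candidates, key=lambda t: t["energy_per_atom"])


grouped = defaultdict(list)
for task in tasks:
    key, value = map_task(task)
    grouped[key].append(value)

# One document per material, standing in for materials.bson.
materials = {key: reduce_tasks(values) for key, values in grouped.items()}
print(materials["LiCoO2"]["task_id"], materials["NaCl"]["task_id"])
```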
Every App uses MongoDB


                 structure_predictors.bson
                 candidate_materials.bson
                 diffraction_patterns.bson




 by G. Hautier
Structure Predictor
Diffraction Pattern
Act III:
Analytics and Logging
Rich Error Analysis
[Figure labels: Experimental, Calculated]
Integrated logging just
     makes sense
• Semi-structured data easily stored
• Can correlate with all other data
• Automation Layer: Failed tasks
• Web/App Layer
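
As a rough illustration of storing semi-structured log records and correlating them with task data, the sketch below uses Python's standard `logging` module with a handler that appends dict-shaped records to a list. The list stands in for a MongoDB collection. The `ListHandler` class, the `task_id` field, and the messages are assumptions made for the example.

```python
import logging


class ListHandler(logging.Handler):
    """Append each log record to `sink` as a dict (a stand-in for a collection insert)."""

    def __init__(self, sink):
        super().__init__()
        self.sink = sink

    def emit(self, record):
        self.sink.append({
            "level": record.levelname,
            "message": record.getMessage(),
            # `task_id` arrives through the `extra` argument of the logging call.
            "task_id": getattr(record, "task_id", None),
        })


log_docs = []
logger = logging.getLogger("manager_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(ListHandler(log_docs))

logger.info("job started", extra={"task_id": 12})
logger.error("convergence failed", extra={"task_id": 12})
logger.info("job started", extra={"task_id": 13})

# Correlate logs with tasks: which task ids have at least one error?
failed_ids = sorted({d["task_id"] for d in log_docs if d["level"] == "ERROR"})
print(failed_ids)
```

Because each record carries the same key as the task documents, an automation layer could look up failed tasks and resubmit them.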
Conclusions
• MongoDB is a very versatile tool
• Used in several different cases
• Elegant query syntax
• Very useful for scientific data storage
• A lot of exciting future ideas
Acknowledgements
Thanks!

MaterialsProject.org

Mais conteúdo relacionado

Mais procurados

Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...Anubhav Jain
 
NANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials designNANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials designUniversity of California, San Diego
 
Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Anubhav Jain
 
Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...Anubhav Jain
 
The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...Anubhav Jain
 
High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...Anubhav Jain
 
DuraMat Data Analytics
DuraMat Data AnalyticsDuraMat Data Analytics
DuraMat Data AnalyticsAnubhav Jain
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAnubhav Jain
 
Conducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials ProjectConducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials ProjectAnubhav Jain
 
Overview of accelerated materials design efforts in the Hacking Materials res...
Overview of accelerated materials design efforts in the Hacking Materials res...Overview of accelerated materials design efforts in the Hacking Materials res...
Overview of accelerated materials design efforts in the Hacking Materials res...Anubhav Jain
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Anubhav Jain
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Anubhav Jain
 
How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?Anubhav Jain
 
Machine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignAnubhav Jain
 
The Materials Project: overview and infrastructure
The Materials Project: overview and infrastructureThe Materials Project: overview and infrastructure
The Materials Project: overview and infrastructureAnubhav Jain
 
Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Anubhav Jain
 
Accelerated Materials Discovery & Characterization with Classical, Quantum an...
Accelerated Materials Discovery & Characterization with Classical, Quantum an...Accelerated Materials Discovery & Characterization with Classical, Quantum an...
Accelerated Materials Discovery & Characterization with Classical, Quantum an...KAMAL CHOUDHARY
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science researchAnubhav Jain
 
Introduction (Part I): High-throughput computation and machine learning appli...
Introduction (Part I): High-throughput computation and machine learning appli...Introduction (Part I): High-throughput computation and machine learning appli...
Introduction (Part I): High-throughput computation and machine learning appli...Anubhav Jain
 
The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...Anubhav Jain
 

Mais procurados (20)

Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...Density functional theory calculations and data mining for new thermoelectric...
Density functional theory calculations and data mining for new thermoelectric...
 
NANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials designNANO266 - Lecture 12 - High-throughput computational materials design
NANO266 - Lecture 12 - High-throughput computational materials design
 
Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...Discovering advanced materials for energy applications (with high-throughput ...
Discovering advanced materials for energy applications (with high-throughput ...
 
Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...Software tools for calculating materials properties in high-throughput (pymat...
Software tools for calculating materials properties in high-throughput (pymat...
 
The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...The Materials Project: Experiences from running a million computational scien...
The Materials Project: Experiences from running a million computational scien...
 
High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...High-throughput computation and machine learning methods applied to materials...
High-throughput computation and machine learning methods applied to materials...
 
DuraMat Data Analytics
DuraMat Data AnalyticsDuraMat Data Analytics
DuraMat Data Analytics
 
Automated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design ProblemsAutomated Machine Learning Applied to Diverse Materials Design Problems
Automated Machine Learning Applied to Diverse Materials Design Problems
 
Conducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials ProjectConducting and Enabling Data-Driven Research Through the Materials Project
Conducting and Enabling Data-Driven Research Through the Materials Project
 
Overview of accelerated materials design efforts in the Hacking Materials res...
Overview of accelerated materials design efforts in the Hacking Materials res...Overview of accelerated materials design efforts in the Hacking Materials res...
Overview of accelerated materials design efforts in the Hacking Materials res...
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...Combining density functional theory calculations, supercomputing, and data-dr...
Combining density functional theory calculations, supercomputing, and data-dr...
 
How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?How might machine learning help advance solar PV research?
How might machine learning help advance solar PV research?
 
Machine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst DesignMachine Learning Platform for Catalyst Design
Machine Learning Platform for Catalyst Design
 
The Materials Project: overview and infrastructure
The Materials Project: overview and infrastructureThe Materials Project: overview and infrastructure
The Materials Project: overview and infrastructure
 
Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...Software tools for data-driven research and their application to thermoelectr...
Software tools for data-driven research and their application to thermoelectr...
 
Accelerated Materials Discovery & Characterization with Classical, Quantum an...
Accelerated Materials Discovery & Characterization with Classical, Quantum an...Accelerated Materials Discovery & Characterization with Classical, Quantum an...
Accelerated Materials Discovery & Characterization with Classical, Quantum an...
 
Software tools to facilitate materials science research
Software tools to facilitate materials science researchSoftware tools to facilitate materials science research
Software tools to facilitate materials science research
 
Introduction (Part I): High-throughput computation and machine learning appli...
Introduction (Part I): High-throughput computation and machine learning appli...Introduction (Part I): High-throughput computation and machine learning appli...
Introduction (Part I): High-throughput computation and machine learning appli...
 
The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...The Materials Project: An Electronic Structure Database for Community-Based M...
The Materials Project: An Electronic Structure Database for Community-Based M...
 

Destaque

酷米移動傳媒簡報(1209城邦)
酷米移動傳媒簡報(1209城邦)酷米移動傳媒簡報(1209城邦)
酷米移動傳媒簡報(1209城邦)idfamily chen
 
An overview of Mentor NW 200715 v2
An overview of Mentor NW 200715 v2An overview of Mentor NW 200715 v2
An overview of Mentor NW 200715 v2Mike Gerighty
 
D'Amato Shop | Olio extravergine, vino e conserve dal Salento
D'Amato Shop | Olio extravergine, vino e conserve dal SalentoD'Amato Shop | Olio extravergine, vino e conserve dal Salento
D'Amato Shop | Olio extravergine, vino e conserve dal SalentoD'Amato Shop
 
Internet i els drets fonamentals
Internet i els drets fonamentalsInternet i els drets fonamentals
Internet i els drets fonamentalsGrup8
 
Next Exit Tokyo: Mobile in Japan
Next Exit Tokyo: Mobile in JapanNext Exit Tokyo: Mobile in Japan
Next Exit Tokyo: Mobile in JapanChristopher Billich
 
First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition R2 ...
First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition  R2 ...First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition  R2 ...
First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition R2 ...JerryDorn
 
e-learning worldwide
e-learning worldwidee-learning worldwide
e-learning worldwidelaurenball
 
Food service supervisor perfomance appraisal 2
Food service supervisor perfomance appraisal 2Food service supervisor perfomance appraisal 2
Food service supervisor perfomance appraisal 2tonychoper4104
 

Destaque (15)

酷米移動傳媒簡報(1209城邦)
酷米移動傳媒簡報(1209城邦)酷米移動傳媒簡報(1209城邦)
酷米移動傳媒簡報(1209城邦)
 
ICE 2009 H1N1
ICE 2009 H1N1ICE 2009 H1N1
ICE 2009 H1N1
 
Wordbench nagoya
Wordbench nagoyaWordbench nagoya
Wordbench nagoya
 
An overview of Mentor NW 200715 v2
An overview of Mentor NW 200715 v2An overview of Mentor NW 200715 v2
An overview of Mentor NW 200715 v2
 
Spinning Top
Spinning TopSpinning Top
Spinning Top
 
Sales Presentation
Sales PresentationSales Presentation
Sales Presentation
 
D'Amato Shop | Olio extravergine, vino e conserve dal Salento
D'Amato Shop | Olio extravergine, vino e conserve dal SalentoD'Amato Shop | Olio extravergine, vino e conserve dal Salento
D'Amato Shop | Olio extravergine, vino e conserve dal Salento
 
Internet i els drets fonamentals
Internet i els drets fonamentalsInternet i els drets fonamentals
Internet i els drets fonamentals
 
35419
3541935419
35419
 
Next Exit Tokyo: Mobile in Japan
Next Exit Tokyo: Mobile in JapanNext Exit Tokyo: Mobile in Japan
Next Exit Tokyo: Mobile in Japan
 
Hustopeče
HustopečeHustopeče
Hustopeče
 
First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition R2 ...
First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition  R2 ...First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition  R2 ...
First Niagara Targets May 18th For Completion Of Hsbc Branch Acquisition R2 ...
 
e-learning worldwide
e-learning worldwidee-learning worldwide
e-learning worldwide
 
eshgh
eshgheshgh
eshgh
 
Food service supervisor perfomance appraisal 2
Food service supervisor perfomance appraisal 2Food service supervisor perfomance appraisal 2
Food service supervisor perfomance appraisal 2
 

Semelhante a Using MongoDB for Materials Discovery

Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applicationsaimsnist
 
MongoDB San Francisco 2013: MongoDB for Collaborative Science presented by D...
MongoDB San Francisco 2013:  MongoDB for Collaborative Science presented by D...MongoDB San Francisco 2013:  MongoDB for Collaborative Science presented by D...
MongoDB San Francisco 2013: MongoDB for Collaborative Science presented by D...MongoDB
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Anubhav Jain
 
Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningAnubhav Jain
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for ScienceIan Foster
 
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Anubhav Jain
 
大強子計算網格與OSS
大強子計算網格與OSS大強子計算網格與OSS
大強子計算網格與OSSYuan CHAO
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and KnowledgeIan Foster
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsAnubhav Jain
 
And Then There Are Algorithms
And Then There Are AlgorithmsAnd Then There Are Algorithms
And Then There Are AlgorithmsInfluxData
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Anubhav Jain
 
Applications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignApplications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignAnubhav Jain
 
Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Anubhav Jain
 
Implementing a neural network potential for exascale molecular dynamics
Implementing a neural network potential for exascale molecular dynamicsImplementing a neural network potential for exascale molecular dynamics
Implementing a neural network potential for exascale molecular dynamicsPFHub PFHub
 
Introduction to active learning
Introduction to active learningIntroduction to active learning
Introduction to active learningAlexey Voropaev
 
Overview of the Exascale Additive Manufacturing Project
Overview of the Exascale Additive Manufacturing ProjectOverview of the Exascale Additive Manufacturing Project
Overview of the Exascale Additive Manufacturing Projectinside-BigData.com
 
Belak_ICME_June02015
Belak_ICME_June02015Belak_ICME_June02015
Belak_ICME_June02015Jim Belak
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...BrianDeCost
 
The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...Anubhav Jain
 

Semelhante a Using MongoDB for Materials Discovery (20)

Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and ApplicationsData Mining to Discovery for Inorganic Solids: Software Tools and Applications
Data Mining to Discovery for Inorganic Solids: Software Tools and Applications
 
MongoDB San Francisco 2013: MongoDB for Collaborative Science presented by D...
MongoDB San Francisco 2013:  MongoDB for Collaborative Science presented by D...MongoDB San Francisco 2013:  MongoDB for Collaborative Science presented by D...
MongoDB San Francisco 2013: MongoDB for Collaborative Science presented by D...
 
Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...Discovering new functional materials for clean energy and beyond using high-t...
Discovering new functional materials for clean energy and beyond using high-t...
 
Materials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learningMaterials discovery through theory, computation, and machine learning
Materials discovery through theory, computation, and machine learning
 
Learning Systems for Science
Learning Systems for ScienceLearning Systems for Science
Learning Systems for Science
 
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
Accelerated Materials Discovery Using Theory, Optimization, and Natural Langu...
 
大強子計算網格與OSS
大強子計算網格與OSS大強子計算網格與OSS
大強子計算網格與OSS
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
Open Source Tools for Materials Informatics
Open Source Tools for Materials InformaticsOpen Source Tools for Materials Informatics
Open Source Tools for Materials Informatics
 
ICME Workshop Jul 2014 - The Materials Project
ICME Workshop Jul 2014 - The Materials ProjectICME Workshop Jul 2014 - The Materials Project
ICME Workshop Jul 2014 - The Materials Project
 
And Then There Are Algorithms
And Then There Are AlgorithmsAnd Then There Are Algorithms
And Then There Are Algorithms
 
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...Progress Towards Leveraging Natural Language Processing for Collecting Experi...
Progress Towards Leveraging Natural Language Processing for Collecting Experi...
 
Applications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials DesignApplications of Natural Language Processing to Materials Design
Applications of Natural Language Processing to Materials Design
 
Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...Available methods for predicting materials synthesizability using computation...
Available methods for predicting materials synthesizability using computation...
 
Implementing a neural network potential for exascale molecular dynamics
Implementing a neural network potential for exascale molecular dynamicsImplementing a neural network potential for exascale molecular dynamics
Implementing a neural network potential for exascale molecular dynamics
 
Introduction to active learning
Introduction to active learningIntroduction to active learning
Introduction to active learning
 
Overview of the Exascale Additive Manufacturing Project
Overview of the Exascale Additive Manufacturing ProjectOverview of the Exascale Additive Manufacturing Project
Overview of the Exascale Additive Manufacturing Project
 
Belak_ICME_June02015
Belak_ICME_June02015Belak_ICME_June02015
Belak_ICME_June02015
 
TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...TMS workshop on machine learning in materials science: Intro to deep learning...
TMS workshop on machine learning in materials science: Intro to deep learning...
 
The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...The Materials Project: A Community Data Resource for Accelerating New Materia...
The Materials Project: A Community Data Resource for Accelerating New Materia...
 

Using MongoDB for Materials Discovery

  • 1. Using MongoDB for Materials Discovery Michael Kocher and Dan Gunter Lawrence Berkeley National Lab
  • 2. Energy Mission at LBNL • Li-ion Batteries • Photovoltaic (Solar Cells) • Thermoelectrics • Biofuels • New Computational Tools • Cutting edge Spectroscopic Tools (Advanced Light Source) http://carboncycle2.lbl.gov/
  • 3. Current Material Design Model is Slow: on average, 18 years from new materials discovery to commercialization. Source: Eagar, T.W. “Bringing New Materials to the Market.” Technology Review, Feb 1995, 98, 42.
  • 4. Materials Genome Initiative: A Renaissance of American Manufacturing “To help businesses discover, develop, and deploy new materials twice as fast, we're launching what we call the Materials Genome Initiative. The invention of silicon circuits and lithium-ion batteries made computers and iPods and iPads possible -- but it took years to get those technologies from the drawing board to the marketplace. We can do it faster.” - President Obama at Carnegie Mellon University 6/24/2011
  • 5. What is a Material?
  • 6. NaCl Silicon
  • 7. LiCoO2 Li O Co
  • 8. What can we compute using quantum mechanics? Volume, density, total energy, formation energy, metallic?, etc. No empirical parameters!
  • 9. MaterialsProject.org: “The Google of Material Science Data”. MIT and LBNL collaboration
  • 12. Machine Learning: How often can you substitute Mg for Ca? What about Na, V, P, O? Diagram labels: Structure 1 through Structure 6, materials.bson, Learning Algorithm, (new materials). Prof. Gerbrand Ceder (DOI: 10.1103/PhysRevLett.91.135503)
  • 13. Materials Project: A Play in Three Acts I. Data generation using HTC II. Data storage III. Data analysis/logging
  • 14. Act I: Managing Calculations • Centralized distributed model is the only way to go • Hub is at LBNL • Store the state in the database • Overview of running many MPI jobs at many different HPC centers
  • 15. MasterQueue. Diagram labels: crystal; builder.x (create a new engine, add to queue); master_queue.bson, ‘The Brain’; manager.x (pull), one per HPC system: Franklin, Hopper, Carver at NERSC (Oakland); lr1, lr2 on Lawrencium (Berkeley)
  • 16. Centralized Logging Example: MongoDB and Management. Diagram labels: manager.x instances; O1 Cathode, Hopper, Franklin, Carver, lr1, lr2, DLX; MIT, NERSC (Oakland), LBNL, Kentucky. query = {'elements': {'$all': ['Li', 'O']}, 'nelectrons': {'$lte': 200}}
  • 17. Act II : Core Data storage
  • 19. Powerful Querying: Every crystal that has (Li or Na or K), (Mn), (O or S or F or Si) plus one other element except (Zn or Ni or Fe or Cu or Co): { "lattice.volume" : { "$lt" : 500 }, "elements" : { "$all" : ["Mn"], "$size" : 4, "$nin" : ["Zn", "Ni", "Fe", "Cu", "Co"] }, "atoms" : { "$elemMatch" : { "oxidation_state" : 3, "symbol" : "Mn" } }, "$where" : "match_all(this.element_names, ['Li', 'Na', 'K'], ['Mn'], ['O', 'S', 'F', 'Si'])" }
  • 20. pre-MongoDB :( ((SELECT structure.structureid FROM structure NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry WHERE structureid IN ((select structure.structureid from structure NATURAL INNER JOIN elemententry where elemententry.symbol='Li' INTERSECT select structure.structureid from structure NATURAL INNER JOIN elemententry where elemententry.symbol='O') INTERSECT select structure.structureid from structure NATURAL INNER JOIN database NATURAL INNER JOIN databaseentry where database.title='ICSD')) EXCEPT (SELECT structure.structureid FROM structure where structure.entryid IN (select duplicateentry.entryid from duplicateentry))) EXCEPT (SELECT structure.structureid FROM structure where structure.entryid IN (select entryid from removals)) Search for materials with Li and O, excluding duplicates
  • 21. Map/Reduce. Diagram labels: Calculation 12, Calculation 13, ✓, Calculation 14, Calculation 15; tasks.bson, MR, materials.bson
  • 22. Every App uses MongoDB: structure_predictors.bson, candidate_materials.bson, diffraction_patterns.bson (by G. Hautier)
  • 26. Rich Error Analysis (plot labels: Experimental, Calculated)
  • 27. Integrated logging just makes sense • Semi-structured data easily stored • Can correlate with all other data • Automation Layer: Failed tasks • Web/App Layer
  • 28. Conclusions • MongoDB is a very versatile tool • Used in several different cases • Elegant query syntax • Very useful for scientific data storage • A lot of exciting future ideas
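
The queries on slides 16 and 19 rely on MongoDB operators such as `$all`, `$size`, `$nin`, `$lt`, and `$lte`. As a rough illustration of what those operators select, the sketch below evaluates a small subset of them against plain Python dictionaries. The `matches` helper, the Python `match_all` function (standing in for the JavaScript helper named in the slide's `$where` clause, whose definition the slides do not show), and the sample documents are all assumptions made for this example; the `$elemMatch` condition on `atoms` is left out for brevity.

```python
# Hypothetical, simplified evaluator for a few MongoDB query operators.
# It illustrates the matching semantics only; it is not MongoDB's implementation.

def _get(doc, dotted_key):
    """Resolve a dotted key such as 'lattice.volume'; None if missing."""
    for part in dotted_key.split("."):
        if not isinstance(doc, dict) or part not in doc:
            return None
        doc = doc[part]
    return doc


def _check(value, condition):
    """Test one field value against a literal or an operator dictionary."""
    if not isinstance(condition, dict):
        return value == condition
    for op, arg in condition.items():
        if op == "$all":
            if not isinstance(value, list) or not all(x in value for x in arg):
                return False
        elif op == "$size":
            if not isinstance(value, list) or len(value) != arg:
                return False
        elif op == "$nin":
            items = value if isinstance(value, list) else [value]
            if any(x in arg for x in items):
                return False
        elif op == "$lte":
            if value is None or not value <= arg:
                return False
        elif op == "$lt":
            if value is None or not value < arg:
                return False
        else:
            raise ValueError(f"unsupported operator: {op}")
    return True


def matches(doc, query):
    """True if the document satisfies every top-level condition."""
    return all(_check(_get(doc, key), cond) for key, cond in query.items())


def match_all(names, *groups):
    """Assumed meaning: every group has at least one member in names."""
    return all(any(symbol in names for symbol in group) for group in groups)


# Illustrative documents; the numbers are placeholders, not computed results.
docs = [
    {"name": "a", "elements": ["Li", "Co", "O"], "nelectrons": 50,
     "lattice": {"volume": 100.0}},
    {"name": "b", "elements": ["Na", "Cl"], "nelectrons": 30,
     "lattice": {"volume": 180.0}},
    {"name": "c", "elements": ["Li", "Mn", "O", "P"], "nelectrons": 250,
     "lattice": {"volume": 300.0}},
    {"name": "d", "elements": ["Li", "Mn", "O", "Fe"], "nelectrons": 120,
     "lattice": {"volume": 300.0}},
]

slide16_query = {"elements": {"$all": ["Li", "O"]}, "nelectrons": {"$lte": 200}}
slide19_query = {
    "lattice.volume": {"$lt": 500},
    "elements": {"$all": ["Mn"], "$size": 4,
                 "$nin": ["Zn", "Ni", "Fe", "Cu", "Co"]},
}

slide16_hits = [d["name"] for d in docs if matches(d, slide16_query)]
slide19_hits = [
    d["name"] for d in docs
    if matches(d, slide19_query)
    and match_all(d["elements"], ["Li", "Na", "K"], ["Mn"], ["O", "S", "F", "Si"])
]
```

With a real database, the same query dictionaries would be handed to a MongoDB driver rather than evaluated in Python; the point here is only that one nested dictionary expresses what the pre-MongoDB SQL on slide 20 needed several joins and set operations to say.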
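
Slide 21 shows a Map/Reduce step that turns individual calculations in `tasks.bson` into entries in `materials.bson`. The slides do not say how a calculation is chosen, so the following is only a sketch under stated assumptions: each task carries a hypothetical `material_id`, `state`, and `energy` field, failed tasks are dropped in the map phase, and the reduce phase keeps the lowest-energy calculation per material. It runs in plain Python to show the map, group, and reduce stages rather than MongoDB's own map/reduce facility.

```python
from collections import defaultdict

# Hypothetical task documents; field names and values are illustrative only.
tasks = [
    {"task_id": 1, "material_id": "m1", "state": "successful", "energy": -10.2},
    {"task_id": 2, "material_id": "m1", "state": "successful", "energy": -10.5},
    {"task_id": 3, "material_id": "m1", "state": "failed", "energy": None},
    {"task_id": 4, "material_id": "m2", "state": "successful", "energy": -3.1},
]


def map_task(task):
    """Map phase: emit (material_id, summary) for tasks that finished."""
    if task["state"] == "successful":
        yield task["material_id"], {
            "task_id": task["task_id"],
            "energy": task["energy"],
        }


def reduce_tasks(key, values):
    """Reduce phase: keep the lowest-energy calculation (an assumed criterion)."""
    return min(values, key=lambda v: v["energy"])


def map_reduce(documents, mapper, reducer):
    """Group mapped values by key, then reduce each group to one value."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return {key: reducer(key, values) for key, values in grouped.items()}


materials = map_reduce(tasks, map_task, reduce_tasks)
```

Here `materials` ends up with one entry per material, which mirrors the slide's picture of many calculations collapsing into a single materials collection.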