SlideShare uma empresa Scribd logo
1 de 34
Baixar para ler offline
The Open Chemistry Project
       February 2, 2013
          FOSDEM

     Dr. Marcus D. Hanwell
     Technical Leader, Kitware, Inc.
     marcus.hanwell@kitware.com
     http://openchemistry.org/         1	
  
Outline
•  Kitware
•  History
•  Open Chemistry
  –  Avogadro
  –  MoleQueue
  –  MongoChem
•  Final thoughts


                    2	
  
Kitware
•  Founded in 1998 by 5 former GE Research employees
•  111 employees: 38 with PhDs
•  Privately held, profitable from creation, no debt
•  Rapidly Growing: >30% in 2011, 7M web-visitors/quarter
•  Offices                                 •  2011 Small Business
                                              Administration’s
   –  Albany, NY                              Tibbetts Award
   –  Carrboro, NC
                                           •  HPCWire Readers
   –  Santa Fe, NM                            and Editor’s Choice

   –  Lyon, France                         •  Inc’s 5000 List: 2008
                                              to 2011
Kitware: Core Technologies




                CMake




                CDash




                             4	
  
Beginnings of Open Chemistry
•  The Avogadro project began in 2006
•  One of very few open source 3D chemical editors
     •  Draw/edit structure
     •  Generate input for codes
     •  Analyze output of codes
•    Open source, GPLv2 GUI
•    Google Summer of Code in 2007 (KDE)
•    Used by Kalzium in KDE educational tool
•    Over 300,000 downloads, 20+ translations


                                                     5	
  
Avogadro Paper Published 8/13/12




         http://www.jcheminf.com/content/4/1/17
                                                  6	
  
Introduction to Open Chemistry
•  User-friendly integration with
  –  Computational codes
  –  HPC/cloud resources
  –  Database/informatics resources




                                      7	
  
Vision
•  Advancing the state-of-the-art
•  Tight integration is needed
   •    Computational codes
   •    Clusters/supercomputers
   •    Data repositories
   •    Reduce, reuse, recycle!
•  Facilitate sharing and
   searching of data
•  Embracing data-centric workflows

                                      8	
  
Overview
•  Desktop chemistry application suite
  –  3D structure editor, pre- and post-processing
  –  HPC integration – easily run codes
  –  Cheminformatics to store, index, and analyze
•  Each can work independently
  –  Enhanced functionality when used together
  –  One-click HPC job submission
  –  Easily open structure found in database
  –  Coordination of job submission
                                                     9	
  
Open Chemistry Project Approach
•  Open approach to chemistry software
  –  Open source frameworks
  –  Developed openly
  –  Cross platform
  –  Tested, verified
  –  Contribution model
  –  Supported by Kitware experts
•  BSD licensed to facilitate research/reuse

                                               10	
  
Open Chemistry Development Team

•  Assembled an inter-disciplinary team
•  Domain specialists: quantum chemistry,
   biology, solid-state materials
•  Computer scientists: build systems,
   queuing, graphics, process
•  Marcus, Kyle, David, and Chris.



                                            11	
  
OpenChemistry.org
•    Central site to promote Open Chemistry
•    Hosting of project-specific pages
•    Providing an identity for related projects
•    Promoting shared ownership of projects
     –  Website
     –  Code submission/review
     –  Testing infrastructure
     –  Wiki, mailing lists, news, galleries

                                                  12	
  
http://openchemistry.org/   13	
  
Applications Being Developed
•  Three independent applications
•  Communication handled with local sockets
•  Avogadro 2 – structure editing, input
   generation, output viewing and analysis
•  MoleQueue – running local and remote
   jobs in standalone programs, management
•  MongoChem – Storage of data, searching,
   entry, annotation
                                          14	
  
Open Frameworks
•  AvogadroLibs – core data structures and
   algorithms shared across codes
  •  Split into dedicated libraries, e.g. core, io,
     rendering, qtgui, qtopengl, qtplugins, quantum
  •  Core maintains a minimal dependency set
  •  Intended for use on server, command line,
     and in a full-blown desktop application
•  VTK – specialized chemistry visualization/
   data structures, use of above
                                                  15	
  
Project Diagram: Libraries/Apps




                                  16	
  
Workflow in Open Chemistry

       Log File                         Input File
                    Avogadro2	
  



  Results	
       MongoChem	
           Job	
  Submission	
  



                  MoleQueue	
        Local

                                     Remote
                    Calcula>on	
  

                                                            17	
  
Avogadro2
•  Rewrite of Avogadro
•  Split into libraries and
application (plugin based)
•  Still one of very few open source editors
•  Still using Qt, C++, Eigen, OpenGL, CMake
•  Use AvogadroLibs for core data
•  Introduce client-server dataflow/patterns
•  New, efficient rendering code
•  More liberally licensed – from GPL to BSD

                                               18	
  
Avogadro: Visualization
•    GPU accelerated rendering
•    VTK for advanced visualization
•    Support for 2D and 3D plots of data
•    Optimized data structures
     –  Large data
     –  Streaming
•  Reworked interface
     –  Tighter database/workflow integration

                                                19	
  
Advanced Impostor Rendering
•  Using a scene, vertex buffer objects, and
   OpenGL shading language
•  Impostor techniques
  –  Sphere goes from 100s of triangles to 2!
  –  No artifacts from triangulation
  –  Scales to millions of spheres on modest GPU




                                                   20	
  
Electronic Structure Visualization
•  Read quantum output files
  –  Calculate cubes for molecular orbitals
  –  Show isosurface or volume rendering
  –  Multithreaded C++ code to perform
     calculations – scales very well




                                              21	
  
Scriptable Simulation Input Generator

•  Previous input generators were C++
•  Execute a simple Python script
  –  Script can output JSON with parameters
  –  Input is parameters specified by user
  –  Chemical JSON with full structure
•  New input generator is as simple as
   adding a new Python script
  –  Implement 2-3 entry points and done

                                              22	
  
MoleQueue: Job Management
•  Tighter integration with remote queues
•  Integration with databases
  –  Retain full log of computational jobs
  –  Trigger actions on completion
•  Plugin based system
  –  Easy addition of new codes
  –  Easy addition of new queue systems
•  Provide client API for applications

                                             23	
  
MoleQueue
•  Support configuration for a variety of
   remote clusters and queuing software
MoleQueue: Queue Types
•  Several transports implemented
  –  Command line SSH/plink (Windows)
  –  libssh2 (experimental)
  –  HTTPS (SOAP)
•  Several queue types
  –  Sun Grid Engine
  –  PBS
  –  UIT (ezHPC with largely PBS dialect)

                                            25	
  
Using JSON
•  MongoDB stores data as BSON
    –  JSON: JavaScript Object Notation
    –  BSON: Binary form, type safe
•  JSON is very compact, standardized
{
    “name”: “water”,
    “atoms”: {
      “elementType”: [“H”, “H”, “O”],
    }
    “properties”: { “molecular weight”: 18.0153 }
}

                                                    26	
  
JSON-RPC interface
  Applications can submit jobs via a local
   socket or ZeroMQ connection:
Client request:
{ "jsonrpc": "2.0",
  "method": "submitJob",
  "params": {
     "queue": "Remote cluster PBS",
     "program": "MOPAC",
     "description": "PM6 H2 optimization",
     "inputAsString": "PM6nnH 0.0 0.0 0.0nH 1.0 0.0 0.0n"
  },
  "id": "XXX” }       Server reply:
                      { "jsonrpc": "2.0",
                         "result": {
                            "moleQueueId": 17,
                            "queueId": 123456,
                            "workingDirectory": "/tmp/MoleQueue/17/"
                         },
                         "id": "XXX” }
Chemical JSON
•  Stores molecular structure,
   geometry, identifiers, and
   descriptors as a JSON object
•  Benefits:
  –  More compact than XML/CML
  –  Native language of MongoDB
     and JSON-RPC
  –  Easily converted to a binary
     representation (BSON)



                                    28	
  
MongoChem Overview
•  A desktop cheminformatics tool
  –  Chemical data exploration and analysis
  –  Interactive, editable, and searchable database
•  Leverages several open-source projects
  –  Qt, VTK, MongoDB, Chemkit, Open Babel
•  Designed to look at many molecules
•  Spot patterns, outliers, run many jobs
•  Scaling studies with ~3 million structures
Architecture Overview
•  Native, cross-platform C++ application built with Qt
•  Stores chemical data in a NoSQL MongoDB database
•  Uses VTK for 2D and 3D dataset visualization




                                                          30	
  
ParaViewWeb and Open Chemistry




                             31	
  
Software Process
•  Source code publicly hosted using Git
•  Gerrit for code review
•  CTest/CDash for testing/summary
  –  Gerrit can use CDash@Home
     •  Test proposed changes before merge
•  CDash can now provide binaries
  –  Built nightly, available for direct download
•  Wiki, mailing list, bug tracker

                                                    32	
  
Software Process




                   33	
  
Final Thoughts
•    Real opportunity to make an impact
•    Bringing best practices to chemistry
•    Improve research, industry and teaching
•    Semantic data at the center of our work
     –  Storage
     –  Search
     –  Interaction with computational codes
     –  Comparison with experimental data
•  Liberal, BSD licensed, cross platform codes

                                                 34	
  

Mais conteúdo relacionado

Mais procurados

Open Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistryOpen Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistry
Marcus Hanwell
 
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
University of California, San Diego
 
PTU: Using Provenance for Repeatability
PTU: Using Provenance for RepeatabilityPTU: Using Provenance for Repeatability
PTU: Using Provenance for Repeatability
Tanu Malik
 

Mais procurados (20)

Open Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistryOpen Chemistry, JupyterLab and data: Reproducible quantum chemistry
Open Chemistry, JupyterLab and data: Reproducible quantum chemistry
 
Evolution of database access technologies in Java-based software projects
Evolution of database access technologies in Java-based software projectsEvolution of database access technologies in Java-based software projects
Evolution of database access technologies in Java-based software projects
 
LDV: Light-weight Database Virtualization
LDV: Light-weight Database VirtualizationLDV: Light-weight Database Virtualization
LDV: Light-weight Database Virtualization
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
 
Bioinformatics on Azure
Bioinformatics on AzureBioinformatics on Azure
Bioinformatics on Azure
 
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...The Materials Project Ecosystem - A Complete Software and Data Platform for M...
The Materials Project Ecosystem - A Complete Software and Data Platform for M...
 
PTU: Using Provenance for Repeatability
PTU: Using Provenance for RepeatabilityPTU: Using Provenance for Repeatability
PTU: Using Provenance for Repeatability
 
GEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsGEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC Programs
 
Ipaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, IanIpaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, Ian
 
GlobusWorld 2015
GlobusWorld 2015GlobusWorld 2015
GlobusWorld 2015
 
Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014
 
Reproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter NotebookReproducible Workflow with Cytoscape and Jupyter Notebook
Reproducible Workflow with Cytoscape and Jupyter Notebook
 
io-Chem-BD, una solució per gestionar el Big Data en Química Computacional
io-Chem-BD, una solució per gestionar el Big Data en Química Computacionalio-Chem-BD, una solució per gestionar el Big Data en Química Computacional
io-Chem-BD, una solució per gestionar el Big Data en Química Computacional
 
Open-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKitOpen-source from/in the enterprise: the RDKit
Open-source from/in the enterprise: the RDKit
 
Best pratices at BGI for the Challenges in the Era of Big Genomics Data
Best pratices at BGI for the Challenges in the Era of Big Genomics DataBest pratices at BGI for the Challenges in the Era of Big Genomics Data
Best pratices at BGI for the Challenges in the Era of Big Genomics Data
 
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBenchWBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
 
GeoDataspace: Simplifying Data Management Tasks with Globus
GeoDataspace: Simplifying Data Management Tasks with GlobusGeoDataspace: Simplifying Data Management Tasks with Globus
GeoDataspace: Simplifying Data Management Tasks with Globus
 
07 data structures_and_representations
07 data structures_and_representations07 data structures_and_representations
07 data structures_and_representations
 
Some "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data frontSome "challenges" on the open-source/open-data front
Some "challenges" on the open-source/open-data front
 
RDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival dataRDF-Gen: Generating RDF from streaming and archival data
RDF-Gen: Generating RDF from streaming and archival data
 

Destaque

Atom Economy - Comenius Project
Atom Economy - Comenius ProjectAtom Economy - Comenius Project
Atom Economy - Comenius Project
classe4ach
 
Copy Of Determination Of The Contents Of Cold Drinks
Copy Of Determination Of The Contents Of Cold DrinksCopy Of Determination Of The Contents Of Cold Drinks
Copy Of Determination Of The Contents Of Cold Drinks
Himanshu Sagar
 
Atom economy - "Green Chemistry Project"
Atom economy - "Green Chemistry Project"Atom economy - "Green Chemistry Project"
Atom economy - "Green Chemistry Project"
classe4ach
 

Destaque (20)

Chemistry project on casein in mik
Chemistry project on casein in mikChemistry project on casein in mik
Chemistry project on casein in mik
 
chemistry Project class 12
chemistry Project class 12chemistry Project class 12
chemistry Project class 12
 
Chemistry project on chemistry in everyday life
Chemistry project on chemistry in everyday lifeChemistry project on chemistry in everyday life
Chemistry project on chemistry in everyday life
 
C7 lesson part four
C7 lesson part fourC7 lesson part four
C7 lesson part four
 
When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...When to Use MongoDB...and When You Should Not...
When to Use MongoDB...and When You Should Not...
 
Atom Economy - Comenius Project
Atom Economy - Comenius ProjectAtom Economy - Comenius Project
Atom Economy - Comenius Project
 
Chemistry Investigatory Project Class 12
Chemistry Investigatory Project Class 12Chemistry Investigatory Project Class 12
Chemistry Investigatory Project Class 12
 
Copy Of Determination Of The Contents Of Cold Drinks
Copy Of Determination Of The Contents Of Cold DrinksCopy Of Determination Of The Contents Of Cold Drinks
Copy Of Determination Of The Contents Of Cold Drinks
 
Atom economy - "Green Chemistry Project"
Atom economy - "Green Chemistry Project"Atom economy - "Green Chemistry Project"
Atom economy - "Green Chemistry Project"
 
Chemistry project
Chemistry projectChemistry project
Chemistry project
 
Peter Ward: The True Power of SharePoint Designer Workflows
Peter Ward: The True Power of SharePoint Designer WorkflowsPeter Ward: The True Power of SharePoint Designer Workflows
Peter Ward: The True Power of SharePoint Designer Workflows
 
Winds of change: The shifting face of leadership in business
Winds of change: The shifting face of leadership in businessWinds of change: The shifting face of leadership in business
Winds of change: The shifting face of leadership in business
 
SharePoint Branding Guidance @ SharePoint Saturday Redmond
SharePoint Branding Guidance @ SharePoint Saturday RedmondSharePoint Branding Guidance @ SharePoint Saturday Redmond
SharePoint Branding Guidance @ SharePoint Saturday Redmond
 
Universidad nacional chimborazo
Universidad nacional chimborazoUniversidad nacional chimborazo
Universidad nacional chimborazo
 
Notice flexible retractable retraflex
Notice flexible retractable retraflexNotice flexible retractable retraflex
Notice flexible retractable retraflex
 
Derecho informatico
Derecho informaticoDerecho informatico
Derecho informatico
 
Monografía EXPOSICION EN EVALUACION
Monografía EXPOSICION EN EVALUACION Monografía EXPOSICION EN EVALUACION
Monografía EXPOSICION EN EVALUACION
 
Word debate
Word debateWord debate
Word debate
 
HAPPYWEEK 197 - 2016.12.05. happyweek 197
HAPPYWEEK 197 - 2016.12.05. happyweek 197HAPPYWEEK 197 - 2016.12.05. happyweek 197
HAPPYWEEK 197 - 2016.12.05. happyweek 197
 
Presentación de socializacion aamtic
Presentación de socializacion aamticPresentación de socializacion aamtic
Presentación de socializacion aamtic
 

Semelhante a The Open Chemistry Project

IMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens NeudeckerIMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens Neudecker
IMPACT Centre of Competence
 
Adam bosc-071114
Adam bosc-071114Adam bosc-071114
Adam bosc-071114
fnothaft
 
HPC Web overview - Mobyle Workshop - September 28, 2012
HPC Web overview - Mobyle Workshop - September 28, 2012HPC Web overview - Mobyle Workshop - September 28, 2012
HPC Web overview - Mobyle Workshop - September 28, 2012
Hervé Ménager
 
The Containers Ecosystem, the OpenStack Magnum Project, the Open Container In...
The Containers Ecosystem, the OpenStack Magnum Project, the Open Container In...The Containers Ecosystem, the OpenStack Magnum Project, the Open Container In...
The Containers Ecosystem, the OpenStack Magnum Project, the Open Container In...
Daniel Krook
 

Semelhante a The Open Chemistry Project (20)

IMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens NeudeckerIMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens Neudecker
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
 
Kubernetes meetup bangalore december 2017 - v02
Kubernetes meetup bangalore   december 2017 - v02Kubernetes meetup bangalore   december 2017 - v02
Kubernetes meetup bangalore december 2017 - v02
 
Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures Bitfusion Nimbix Dev Summit Heterogeneous Architectures
Bitfusion Nimbix Dev Summit Heterogeneous Architectures
 
How static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ codeHow static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ code
 
KubeCon USA 2017 brief Overview - from Kubernetes meetup Bangalore
KubeCon USA 2017 brief Overview - from Kubernetes meetup BangaloreKubeCon USA 2017 brief Overview - from Kubernetes meetup Bangalore
KubeCon USA 2017 brief Overview - from Kubernetes meetup Bangalore
 
Adam bosc-071114
Adam bosc-071114Adam bosc-071114
Adam bosc-071114
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
 
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
 
Adopting OpenTelemetry
Adopting OpenTelemetryAdopting OpenTelemetry
Adopting OpenTelemetry
 
Kitware: Qt and Scientific Computing
Kitware: Qt and Scientific ComputingKitware: Qt and Scientific Computing
Kitware: Qt and Scientific Computing
 
Enterprise Trends for MongoDB as a Service
Enterprise Trends for MongoDB as a ServiceEnterprise Trends for MongoDB as a Service
Enterprise Trends for MongoDB as a Service
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
 
Prometheus - basics
Prometheus - basicsPrometheus - basics
Prometheus - basics
 
GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020
 
HPC Web overview - Mobyle Workshop - September 28, 2012
HPC Web overview - Mobyle Workshop - September 28, 2012HPC Web overview - Mobyle Workshop - September 28, 2012
HPC Web overview - Mobyle Workshop - September 28, 2012
 
NoSQL on the move
NoSQL on the moveNoSQL on the move
NoSQL on the move
 
The Containers Ecosystem, the OpenStack Magnum Project, the Open Container In...
The Containers Ecosystem, the OpenStack Magnum Project, the Open Container In...The Containers Ecosystem, the OpenStack Magnum Project, the Open Container In...
The Containers Ecosystem, the OpenStack Magnum Project, the Open Container In...
 

Último

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Último (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 

The Open Chemistry Project

  • 1. The Open Chemistry Project February 2, 2013 FOSDEM Dr. Marcus D. Hanwell Technical Leader, Kitware, Inc. marcus.hanwell@kitware.com http://openchemistry.org/ 1  
  • 2. Outline •  Kitware •  History •  Open Chemistry –  Avogadro –  MoleQueue –  MongoChem •  Final thoughts 2  
  • 3. Kitware •  Founded in 1998 by 5 former GE Research employees •  111 employees: 38 with PhDs •  Privately held, profitable from creation, no debt •  Rapidly Growing: >30% in 2011, 7M web-visitors/quarter •  Offices •  2011 Small Business Administration’s –  Albany, NY Tibbetts Award –  Carrboro, NC •  HPCWire Readers –  Santa Fe, NM and Editor’s Choice –  Lyon, France •  Inc’s 5000 List: 2008 to 2011
  • 4. Kitware: Core Technologies CMake CDash 4  
  • 5. Beginnings of Open Chemistry •  The Avogadro project began in 2006 •  One of very few open source 3D chemical editors •  Draw/edit structure •  Generate input for codes •  Analyze output of codes •  Open source, GPLv2 GUI •  Google Summer of Code in 2007 (KDE) •  Used by Kalzium in KDE educational tool •  Over 300,000 downloads, 20+ translations 5  
  • 6. Avogadro Paper Published 8/13/12 http://www.jcheminf.com/content/4/1/17 6  
  • 7. Introduction to Open Chemistry •  User-friendly integration with –  Computational codes –  HPC/cloud resources –  Database/informatics resources 7  
  • 8. Vision •  Advancing the state-of-the-art •  Tight integration is needed •  Computational codes •  Clusters/supercomputers •  Data repositories •  Reduce, reuse, recycle! •  Facilitate sharing and searching of data •  Embracing data-centric workflows 8  
  • 9. Overview •  Desktop chemistry application suite –  3D structure editor, pre- and post-processing –  HPC integration – easily run codes –  Cheminformatics to store, index, and analyze •  Each can work independently –  Enhanced functionality when used together –  One-click HPC job submission –  Easily open structure found in database –  Coordination of job submission 9  
  • 10. Open Chemistry Project Approach •  Open approach to chemistry software –  Open source frameworks –  Developed openly –  Cross platform –  Tested, verified –  Contribution model –  Supported by Kitware experts •  BSD licensed to facilitate research/reuse 10  
  • 11. Open Chemistry Development Team •  Assembled an inter-disciplinary team •  Domain specialists: quantum chemistry, biology, solid-state materials •  Computer scientists: build systems, queuing, graphics, process •  Marcus, Kyle, David, and Chris. 11  
  • 12. OpenChemistry.org •  Central site to promote Open Chemistry •  Hosting of project-specific pages •  Providing an identity for related projects •  Promoting shared ownership of projects –  Website –  Code submission/review –  Testing infrastructure –  Wiki, mailing lists, news, galleries 12  
  • 14. Applications Being Developed •  Three independent applications •  Communication handled with local sockets •  Avogadro 2 – structure editing, input generation, output viewing and analysis •  MoleQueue – running local and remote jobs in standalone programs, management •  MongoChem – Storage of data, searching, entry, annotation 14  
  • 15. Open Frameworks •  AvogadroLibs – core data structures and algorithms shared across codes •  Split into dedicated libraries, e.g. core, io, rendering, qtgui, qtopengl, qtplugins, quantum •  Core maintains a minimal dependency set •  Intended for use on server, command line, and in a full-blown desktop application •  VTK – specialized chemistry visualization/ data structures, use of above 15  
  • 17. Workflow in Open Chemistry Log File Input File Avogadro2   Results   MongoChem   Job  Submission   MoleQueue   Local Remote Calcula>on   17  
  • 18. Avogadro2 •  Rewrite of Avogadro •  Split into libraries and application (plugin based) •  Still one of very few open source editors •  Still using Qt, C++, Eigen, OpenGL, CMake •  Use AvogadroLibs for core data •  Introduce client-server dataflow/patterns •  New, efficient rendering code •  More liberally licensed – from GPL to BSD 18  
  • 19. Avogadro: Visualization •  GPU accelerated rendering •  VTK for advanced visualization •  Support for 2D and 3D plots of data •  Optimized data structures –  Large data –  Streaming •  Reworked interface –  Tighter database/workflow integration 19  
  • 20. Advanced Impostor Rendering •  Using a scene, vertex buffer objects, and OpenGL shading language •  Impostor techniques –  Sphere goes from 100s of triangles to 2! –  No artifacts from triangulation –  Scales to millions of spheres on modest GPU 20  
  • 21. Electronic Structure Visualization •  Read quantum output files –  Calculate cubes for molecular orbitals –  Show isosurface or volume rendering –  Multithreaded C++ code to perform calculations – scales very well 21  
  • 22. Scriptable Simulation Input Generator •  Previous input generators were C++ •  Execute a simple Python script –  Script can output JSON with parameters –  Input is parameters specified by user –  Chemical JSON with full structure •  New input generator is as simple as adding a new Python script –  Implement 2-3 entry points and done 22  
  • 23. MoleQueue: Job Management •  Tighter integration with remote queues •  Integration with databases –  Retain full log of computational jobs –  Trigger actions on completion •  Plugin based system –  Easy addition of new codes –  Easy addition of new queue systems •  Provide client API for applications 23  
  • 24. MoleQueue •  Support configuration for a variety of remote clusters and queuing software
  • 25. MoleQueue: Queue Types •  Several transports implemented –  Command line SSH/plink (Windows) –  libssh2 (experimental) –  HTTPS (SOAP) •  Several queue types –  Sun Grid Engine –  PBS –  UIT (ezHPC with largely PBS dialect) 25  
  • 26. Using JSON •  MongoDB stores data as BSON –  JSON: JavaScript Object Notation –  BSON: Binary form, type safe •  JSON is very compact, standardized { “name”: “water”, “atoms”: { “elementType”: [“H”, “H”, “O”], } “properties”: { “molecular weight”: 18.0153 } } 26  
  • 27. JSON-RPC interface Applications can submit jobs via a local socket or ZeroMQ connection: Client request: { "jsonrpc": "2.0", "method": "submitJob", "params": { "queue": "Remote cluster PBS", "program": "MOPAC", "description": "PM6 H2 optimization", "inputAsString": "PM6nnH 0.0 0.0 0.0nH 1.0 0.0 0.0n" }, "id": "XXX” } Server reply: { "jsonrpc": "2.0", "result": { "moleQueueId": 17, "queueId": 123456, "workingDirectory": "/tmp/MoleQueue/17/" }, "id": "XXX” }
  • 28. Chemical JSON •  Stores molecular structure, geometry, identifiers, and descriptors as a JSON object •  Benefits: –  More compact than XML/CML –  Native language of MongoDB and JSON-RPC –  Easily converted to a binary representation (BSON) 28  
  • 29. MongoChem Overview •  A desktop cheminformatics tool –  Chemical data exploration and analysis –  Interactive, editable, and searchable database •  Leverages several open-source projects –  Qt, VTK, MongoDB, Chemkit, Open Babel •  Designed to look at many molecules •  Spot patterns, outliers, run many jobs •  Scaling studies with ~3 million structures
  • 30. Architecture Overview •  Native, cross-platform C++ application built with Qt •  Stores chemical data in a NoSQL MongoDB database •  Uses VTK for 2D and 3D dataset visualization 30  
  • 31. ParaViewWeb and Open Chemistry 31  
  • 32. Software Process •  Source code publicly hosted using Git •  Gerrit for code review •  CTest/CDash for testing/summary –  Gerrit can use CDash@Home •  Test proposed changes before merge •  CDash can now provide binaries –  Built nightly, available for direct download •  Wiki, mailing list, bug tracker 32  
  • 34. Final Thoughts •  Real opportunity to make an impact •  Bringing best practices to chemistry •  Improve research, industry and teaching •  Semantic data at the center of our work –  Storage –  Search –  Interaction with computational codes –  Comparison with experimental data •  Liberal, BSD licensed, cross platform codes 34