SlideShare uma empresa Scribd logo
1 de 55
Baixar para ler offline
Chemical Databases and Open
   Chemistry on the Desktop
5th Meeting on US Government Chemical Databases & Open Chemistry
                       August 25, 2011
                   Dr. Marcus D. Hanwell
                   marcus.hanwell@kitware.com




                                                               1	
  
Outline
•    Background
•    Opening up chemistry
•    Workflows in computational chemistry
•    Avogadro – chemical editor
•    Databases on the desktop
•    Quixote
•    HPC resource integration
•    Advanced visualization


                                            2	
  
My Background
•  Ph.D. (Physics) – University of Sheffield
•  Google Summer of Code – Avogadro
•  Postdoc (Chemistry) – University of Pittsburgh
•  R&D engineer – Kitware, Inc
•  Passionate about physics, chemistry, and the
   growing need to improve computational tools
•  See the need for powerful open source, cross
   platform frameworks and applications
•  Develop(ed): Gentoo, KDE, Kalzium, Avogadro,
   Open Babel, VTK, ParaView, Titan, CMake

                                                    3	
  
Kitware
•  Founded in 1998: 5 former GE Research employees
•  95 employees: 42% PhD
•  Privately held, profitable from creation, no debt
•  Rapidly Growing: >30% in 2010, 7M web-visitors/quarter
•  Offices                                  •  2011 Small Business
   –  Albany, NY                               Administration’s
                                               Tibbetts Award
   –  Carrboro, NC
                                            •  HPCWire Readers
   –  Lyon, France                             and Editor’s Choice

   –  Bangalore, India                      •  Inc’s 5000 List: 2008
                                               to 2010
Kitware: Core Technologies




                CMake




                CDash




                             5	
  
Opening Up Chemistry
•  Computational chemistry is currently one
   of the more closed sciences
•  Lots of black box proprietary codes
  –  Only a few have access to the code
  –  Publishing results from black box codes
  –  Many file formats in use, little agreement
•  More papers should be including data
•  Growing need for open standards

                                                  6	
  
Movements for Open Chemistry
•  Formed an “unorganization” – Blue Obelisk
  –  Published first article in 2005
  –  Open data, open standards and open source
  –  Meet at ACS and other conferences when possible
  –  Follow-up article currently in press



•  Quixote collaboration more recently
  –  Provide meaningful data storage and exchange
  –  Principally targeting computational chemistry

                                                       7	
  
Typical Chemistry Workflow

        Log File                      Input File
                   Edit/Analyze	
  




   Results	
           Data	
         Job	
  Submission	
  




                                        Local
                   Calcula>on	
        Remote
                                                              8	
  
Problem: Pretty Complex/Manual
•    Most steps require user intervention
•    Obtain starting structure (previous work, databases)
•    Edit structure
•    Write input file
•    Move input file to cluster
•    Submit to queue
•    Wait for completion
•    Retrieve input file
•    Analyze output file
•    Extract the relevant data, change formats
•    Store results
•    Repeat

                                                            9	
  
Improved Chemistry Workflow

       Log File                      Input File
                  Edit/Analyze	
  



  Results	
           Data	
         Job	
  Submission	
  




                                       Local
                  Calcula>on	
        Remote

                                                         10	
  
Avogadro
•  Project began 2006
•  Split into library and
application (plugin based)
•  One of very few open source editors
•  Designed to be extensible from the start
•  Generate input & read output from many codes
•  An active and growing community
•  Chemistry needs a free, open framework


                                                  11	
  
Avogadro’s Roots
•  Avogadro projected started in 2006
•  First funded work in 2007 by Marcus Hanwell
  –  Google Summer of Code student
  –  Final year of Ph.D. spent the summer coding
  –  Funded as part of KDE project – Kalzium editor
•  Built on several other open source projects
  –  Qt, Eigen, Open Babel, Blue Obelisk Data Repository
•  Also uses open standards, e.g. OpenGL
•  Cross platform, open source stack

                                                       12	
  
Avogadro Vital Statistics
•    Supports Linux, Windows and Mac OS X
•    Contributions from over 20 developers
•    Over 180,000 downloads over 4 years
•    Translated into 19 languages
•    Used by Kalzium for molecular editor
•    Featured by Trolltech/Nokia,
     –  Qt in use
     –  Qt ambassador program
                                             13	
  
14	
  
Desktop Database
•  Use of “document store” NoSQL
  •  Doesn’t force too much structure
     •  Some entries have experimental data available
     •  Some have computational jobs
  •  Employ a “pile of stuff” approach
     •  Can store both source and derived data
     •  Calculate identifiers, QSAR properties, etc
•  MongoDB is a scalable, open solution
  •  Proven scaling with large web applications

                                                        15	
  
Chemistry Data Explorer
•    Qt application
•    Connects to local or remote database
•    Uses VTK for visual data exploration
•    Can ingest new data
     –  Uses Open Babel to generate descriptors
     –  Standard InChi, SMILES, molecular weight
     –  More could be added
        •  All derived from files stored in the database

                                                           16	
  
Chemistry Data Explorer




                          17	
  
Database Interaction on the Web
•  Avogadro directly accesses some (read-
   only) public databases:
  •  PDB, NIH “fetch by name”
  •  Resolve structure to common name using CIR
  •  More could be added
•  ChemData also uses NIH CIR for data
•  Quixote aims to support both public and
   private sharing models – open framework

                                              18	
  
Quixote Architecture




                       19	
  
Avogadro




           20	
  
OpenQube – Quantum Data
•  Reads in key quantum data
  –  Basis set used in calculation
  –  Eigenvectors for molecular orbitals
  –  Density matrix for electron density
  –  Standard geometry
•  Multithreaded calculation
  –  Produce regular grids of scalar data
  –  Molecular orbitals, electron density…

                                             21	
  
Molecular Orbitals and Electron Density

•  Quantum files store basis sets and
   matrices
                −αr 2
  GTO = ce
  φ i = ∑ c µiφ µ
        µ

  ρ(r) = ∑ ∑ Pµν φ µ φν
            µ   ν
•  Using these equations, and the supplied
   matrices – calculate cubes
                                             22	
  
Calling Stand Alone Programs
•  Many already supported:
  •  GAMESS, GAMESS-UK, Molpro, Q-Chem,
     MOPAC, NWChem, Gaussian, Dalton
  •  Easy to add more
•  Some codes writing Avogadro based
   custom applications,
  •  Q-Chem, Molpro…
•  DLPOLY author approached me:
  •  Open sourced DLPOLY2, want a GUI

                                          23	
  
Job Submission & Management
•  Take input file, submit to queue, monitor,
   retrieve, repeat
•  System tray resident Qt application
  •  Manage both local and remote jobs
•  Interest from developers
  •  Use in other applications
  •  Share development/maintenance burden


                                                24	
  
Open in Avogadro When Complete




                                 25	
  
Advanced Visualization: VTK
•  New Avogadro plugin:
     •  Takes volumetric data from Avogadro
     •  Uses GPU accelerated rendering in VTK
•    Excitement from many in the community
•    Several groups interested in collaborating
•    Google Summer of Code project
•    Leverage significant capabilities in VTK


                                                  26	
  
Volume Rendered With Contours




                                27	
  
Electron Density Volume Render




                                 28	
  
Electron Density Ray Tracing




                               29	
  
Conclusions
•  There is still a lot of work to do
•  Open databases are of critical importance
•  Need tools to make retrieving and
   depositing data easier
•  Improved data exchange is essential to
   improve reproducibility in chemistry
•  Create shared collaboration platforms
  –  Deliver improved workflows, enable research

                                                   30	
  
Extra Background Slides
•  Additional visualization and background
   slides




                                             31	
  
Standard Representations




                           32	
  
Standard Representations




                           33	
  
Biomolecules




               34	
  
Nanomaterials




                35	
  
Simplified Views




                   36	
  
Volumetric Data: Molecular Orbitals




                                  37	
  
Periodic Systems




                   38	
  
Hybrid Views: CPK + MO + Ball & Stick




                                    39	
  
Linked Views of Live Data




                            40	
  
2D: Graphs and Charts




                        41	
  
Informatics




              42	
  
3D Interaction Widgets




                         43	
  
VTK: The Toolkit
•  Collection of C++ libraries
  –  Leveraged by many applications
  –  Divided into logical areas, e.g.
     •  Filtering – data processing in visualization pipeline
     •  InfoVis – informatics visualization
     •  Widgets – 3D interaction widgets
     •  VolumeRendering – 3D volume rendering
•  Cross platform, using OpenGL
•  Wrapped in Python, Tcl and Java

                                                            44	
  
VTK Development Team
 • From Ohloh: Very large, active development team: Over the past twelve
  months, 100 developers contributed new code to VTK. This is one of the largest open-source
  teams in the world, and is in the top 2% of all project teams on Ohloh.




                                                      and many others...
                                                                                               45	
  
ParaView
•    Parallel visualization application
•    Open source, BSD licensed
•    Turn-key application wrapper around VTK
•    Parallel data processing and rendering




                                               46	
  
Large Data Visualization
•  BlueGene/L at LLNL
  –  65,536 compute nodes (32 bit PPC)
  –  1,024 I/O nodes (32 bit PPC)
  –  512 MB of RAM per node
•  Sandia Red Storm
  –  12,960 compute nodes (AMD Opteron dual)
  –  640 service and I/O nodes
  –  40 TB of DDR RAM per node

                                               47	
  
1 Billion Cell Asteroid Simulation




                                     48	
  
Tiled Displays




                 49	
  
Parallel Processing/Rendering




                                50	
  
3D Chemistry Visualization
•  Some existing features specific to chemistry
   –  Gaussian cube, PDB, and a few others
•  Excellent handling of volumetric data:
   –  Marching cubes
   –  Volume rendering
   –  Contouring
•  Advanced rendering:
   –  Point sprites
   –  Manta – real time ray tracing

                                                  51	
  
Titan: VTK and Informatics
•  Led by Sandia National Laboratories
•  Substantial expansion of VTK:
   –  Informatics & analysis
•  Actively developed, growing feature set
•  Improved 2D rendering and API
•  Database connectivity, client-server, pipeline
   based approach
•  Uses web technologies such as ProtoViz
•  Scalable, interactive infoviz

                                                    52	
  
Manta: Real Time Ray Tracing




                               53	
  
New Frontiers
•  New work porting VTK
  –  Use C++ as the common core
    •  iOS port in the early stages
    •  Android port
  –  Use OpenGL ES 2.0 – new rendering code
•  Also ParaViewWeb – delivering over web
  –  Use image delivery and rendering on server
  –  Also using WebGL for rendering (optionally)


                                                   54	
  
Future Directions
•  VTK modularization (in progress)
   –  Developing more agile build systems
   –  Automating more with CMake
•  Using Git more fully to improve stability
   –  Use of master and next
   –  Topic branches - merge when ready
•  Code review using Gerrit
   –  Integration with continuous integration
   –  Test before merge


                                                55	
  

Mais conteúdo relacionado

Mais procurados

Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonJo-fai Chow
 
Introduction to H2O and Model Stacking Use Cases
Introduction to H2O and Model Stacking Use CasesIntroduction to H2O and Model Stacking Use Cases
Introduction to H2O and Model Stacking Use CasesJo-fai Chow
 
Evolution of database access technologies in Java-based software projects
Evolution of database access technologies in Java-based software projectsEvolution of database access technologies in Java-based software projects
Evolution of database access technologies in Java-based software projectsTom Mens
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution vty
 
Survival analysis of database technologies in open source Java projects
Survival analysis of database technologies in open source Java projectsSurvival analysis of database technologies in open source Java projects
Survival analysis of database technologies in open source Java projectsTom Mens
 
Building COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyBuilding COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyvty
 
H2O at Poznan R Meetup
H2O at Poznan R MeetupH2O at Poznan R Meetup
H2O at Poznan R MeetupJo-fai Chow
 
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data GeekFrom Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data GeekJo-fai Chow
 
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...vty
 
OpenVis Conference Report Part 1 (and Introduction to D3.js)
OpenVis Conference Report Part 1 (and Introduction to D3.js)OpenVis Conference Report Part 1 (and Introduction to D3.js)
OpenVis Conference Report Part 1 (and Introduction to D3.js)Keiichiro Ono
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and FutureKeiichiro Ono
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneJo-fai Chow
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneJo-fai Chow
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...Vyacheslav Tykhonov
 
Crafting bigdatabenchmarks
Crafting bigdatabenchmarksCrafting bigdatabenchmarks
Crafting bigdatabenchmarksTilmann Rabl
 
The world of Docker and Kubernetes
The world of Docker and Kubernetes The world of Docker and Kubernetes
The world of Docker and Kubernetes vty
 
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in DataverseClariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataversevty
 
TPC-DI - The First Industry Benchmark for Data Integration
TPC-DI - The First Industry Benchmark for Data IntegrationTPC-DI - The First Industry Benchmark for Data Integration
TPC-DI - The First Industry Benchmark for Data IntegrationTilmann Rabl
 

Mais procurados (20)

Introduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and PythonIntroduction to Machine Learning with H2O and Python
Introduction to Machine Learning with H2O and Python
 
Introduction to H2O and Model Stacking Use Cases
Introduction to H2O and Model Stacking Use CasesIntroduction to H2O and Model Stacking Use Cases
Introduction to H2O and Model Stacking Use Cases
 
Evolution of database access technologies in Java-based software projects
Evolution of database access technologies in Java-based software projectsEvolution of database access technologies in Java-based software projects
Evolution of database access technologies in Java-based software projects
 
5 years of Dataverse evolution
5 years of Dataverse evolution 5 years of Dataverse evolution
5 years of Dataverse evolution
 
Survival analysis of database technologies in open source Java projects
Survival analysis of database technologies in open source Java projectsSurvival analysis of database technologies in open source Java projects
Survival analysis of database technologies in open source Java projects
 
Building COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhyBuilding COVID-19 Knowledge Graph at CoronaWhy
Building COVID-19 Knowledge Graph at CoronaWhy
 
H2O at Poznan R Meetup
H2O at Poznan R MeetupH2O at Poznan R Meetup
H2O at Poznan R Meetup
 
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data GeekFrom Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
From Kaggle to H2O - The True Story of a Civil Engineer Turned Data Geek
 
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
Automated CI/CD testing, installation and deployment of Dataverse infrastruct...
 
OpenVis Conference Report Part 1 (and Introduction to D3.js)
OpenVis Conference Report Part 1 (and Introduction to D3.js)OpenVis Conference Report Part 1 (and Introduction to D3.js)
OpenVis Conference Report Part 1 (and Introduction to D3.js)
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and Future
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
 
Bioinformatics on Azure
Bioinformatics on AzureBioinformatics on Azure
Bioinformatics on Azure
 
H2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to EveryoneH2O Deep Water - Making Deep Learning Accessible to Everyone
H2O Deep Water - Making Deep Learning Accessible to Everyone
 
Flexible metadata schemes for research data repositories - Clarin Conference...
Flexible metadata schemes for research data repositories  - Clarin Conference...Flexible metadata schemes for research data repositories  - Clarin Conference...
Flexible metadata schemes for research data repositories - Clarin Conference...
 
ResourceSync tutorial OAI8
ResourceSync tutorial OAI8ResourceSync tutorial OAI8
ResourceSync tutorial OAI8
 
Crafting bigdatabenchmarks
Crafting bigdatabenchmarksCrafting bigdatabenchmarks
Crafting bigdatabenchmarks
 
The world of Docker and Kubernetes
The world of Docker and Kubernetes The world of Docker and Kubernetes
The world of Docker and Kubernetes
 
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in DataverseClariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
Clariah Tech Day: Controlled Vocabularies and Ontologies in Dataverse
 
TPC-DI - The First Industry Benchmark for Data Integration
TPC-DI - The First Industry Benchmark for Data IntegrationTPC-DI - The First Industry Benchmark for Data Integration
TPC-DI - The First Industry Benchmark for Data Integration
 

Semelhante a Chemical Databases and Open Chemistry on the Desktop

Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 
Reproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsReproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsSimon Cockell
 
Hadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant StoreHadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant StoreUri Laserson
 
E Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutesE Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutesJan Aerts
 
Efficient & effective data management for research projects : ILRI's Data Ma...
Efficient & effective  data management for research projects : ILRI's Data Ma...Efficient & effective  data management for research projects : ILRI's Data Ma...
Efficient & effective data management for research projects : ILRI's Data Ma...CIARD Movement
 
GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020GEO Analytics Canada
 
Scalable Preservation Workflows
Scalable Preservation WorkflowsScalable Preservation Workflows
Scalable Preservation WorkflowsSCAPE Project
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Christopher Curtin
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobus
 
Data Science Day New York: Data Science: A Personal History
Data Science Day New York: Data Science: A Personal HistoryData Science Day New York: Data Science: A Personal History
Data Science Day New York: Data Science: A Personal HistoryCloudera, Inc.
 
How static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ codeHow static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ codecppfrug
 
IMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens NeudeckerIMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens NeudeckerIMPACT Centre of Competence
 
Webinar: How We Evaluated MongoDB as a Relational Database Replacement
Webinar: How We Evaluated MongoDB as a Relational Database ReplacementWebinar: How We Evaluated MongoDB as a Relational Database Replacement
Webinar: How We Evaluated MongoDB as a Relational Database ReplacementMongoDB
 
Clouds, Clusters, and Containers: Tools for responsible, collaborative computing
Clouds, Clusters, and Containers: Tools for responsible, collaborative computingClouds, Clusters, and Containers: Tools for responsible, collaborative computing
Clouds, Clusters, and Containers: Tools for responsible, collaborative computingMatthew Vaughn
 
QuantCell Research - The Big Data Spreadsheet
QuantCell Research - The Big Data SpreadsheetQuantCell Research - The Big Data Spreadsheet
QuantCell Research - The Big Data Spreadsheetinside-BigData.com
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4Michael Kehoe
 
Reproducibility of computational workflows is automated using continuous anal...
Reproducibility of computational workflows is automated using continuous anal...Reproducibility of computational workflows is automated using continuous anal...
Reproducibility of computational workflows is automated using continuous anal...Kento Aoyama
 
COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015Comsode - FP7 project
 

Semelhante a Chemical Databases and Open Chemistry on the Desktop (20)

Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
Reproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsReproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformatics
 
Hadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant StoreHadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant Store
 
E Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutesE Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutes
 
Efficient & effective data management for research projects : ILRI's Data Ma...
Efficient & effective  data management for research projects : ILRI's Data Ma...Efficient & effective  data management for research projects : ILRI's Data Ma...
Efficient & effective data management for research projects : ILRI's Data Ma...
 
Virtualization for HPC at NCI
Virtualization for HPC at NCIVirtualization for HPC at NCI
Virtualization for HPC at NCI
 
GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020
 
Scalable Preservation Workflows
Scalable Preservation WorkflowsScalable Preservation Workflows
Scalable Preservation Workflows
 
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
Kafka 0.8.0 Presentation to Atlanta Java User's Group March 2013
 
GlobusWorld 2020 Keynote
GlobusWorld 2020 KeynoteGlobusWorld 2020 Keynote
GlobusWorld 2020 Keynote
 
Data Science Day New York: Data Science: A Personal History
Data Science Day New York: Data Science: A Personal HistoryData Science Day New York: Data Science: A Personal History
Data Science Day New York: Data Science: A Personal History
 
How static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ codeHow static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ code
 
IMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens NeudeckerIMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens Neudecker
 
COPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob DaveyCOPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob Davey
 
Webinar: How We Evaluated MongoDB as a Relational Database Replacement
Webinar: How We Evaluated MongoDB as a Relational Database ReplacementWebinar: How We Evaluated MongoDB as a Relational Database Replacement
Webinar: How We Evaluated MongoDB as a Relational Database Replacement
 
Clouds, Clusters, and Containers: Tools for responsible, collaborative computing
Clouds, Clusters, and Containers: Tools for responsible, collaborative computingClouds, Clusters, and Containers: Tools for responsible, collaborative computing
Clouds, Clusters, and Containers: Tools for responsible, collaborative computing
 
QuantCell Research - The Big Data Spreadsheet
QuantCell Research - The Big Data SpreadsheetQuantCell Research - The Big Data Spreadsheet
QuantCell Research - The Big Data Spreadsheet
 
CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4CouchbasetoHadoop_Matt_Michael_Justin v4
CouchbasetoHadoop_Matt_Michael_Justin v4
 
Reproducibility of computational workflows is automated using continuous anal...
Reproducibility of computational workflows is automated using continuous anal...Reproducibility of computational workflows is automated using continuous anal...
Reproducibility of computational workflows is automated using continuous anal...
 
COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015COMSODE networking session at ICT Lisbon 2015
COMSODE networking session at ICT Lisbon 2015
 

Último

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Chemical Databases and Open Chemistry on the Desktop

  • 1. Chemical Databases and Open Chemistry on the Desktop 5th Meeting on US Government Chemical Databases & Open Chemistry August 25, 2011 Dr. Marcus D. Hanwell marcus.hanwell@kitware.com 1  
  • 2. Outline •  Background •  Opening up chemistry •  Workflows in computational chemistry •  Avogadro – chemical editor •  Databases on the desktop •  Quixote •  HPC resource integration •  Advanced visualization 2  
  • 3. My Background •  Ph.D. (Physics) – University of Sheffield •  Google Summer of Code – Avogadro •  Postdoc (Chemistry) – University of Pittsburgh •  R&D engineer – Kitware, Inc •  Passionate about physics, chemistry, and the growing need to improve computational tools •  See the need for powerful open source, cross platform frameworks and applications •  Develop(ed): Gentoo, KDE, Kalzium, Avogadro, Open Babel, VTK, ParaView, Titan, CMake 3  
  • 4. Kitware •  Founded in 1998: 5 former GE Research employees •  95 employees: 42% PhD •  Privately held, profitable from creation, no debt •  Rapidly Growing: >30% in 2010, 7M web-visitors/quarter •  Offices •  2011 Small Business –  Albany, NY Administration’s Tibbetts Award –  Carrboro, NC •  HPCWire Readers –  Lyon, France and Editor’s Choice –  Bangalore, India •  Inc’s 5000 List: 2008 to 2010
  • 5. Kitware: Core Technologies CMake CDash 5  
  • 6. Opening Up Chemistry •  Computational chemistry is currently one of the more closed sciences •  Lots of black box proprietary codes –  Only a few have access to the code –  Publishing results from black box codes –  Many file formats in use, little agreement •  More papers should be including data •  Growing need for open standards 6  
  • 7. Movements for Open Chemistry •  Formed an “unorganization” – Blue Obelisk –  Published first article in 2005 –  Open data, open standards and open source –  Meet at ACS and other conferences when possible –  Follow-up article currently in press •  Quixote collaboration more recently –  Provide meaningful data storage and exchange –  Principally targeting computational chemistry 7  
  • 8. Typical Chemistry Workflow Log File Input File Edit/Analyze   Results   Data   Job  Submission   Local Calcula>on   Remote 8  
  • 9. Problem: Pretty Complex/Manual •  Most steps require user intervention •  Obtain starting structure (previous work, databases) •  Edit structure •  Write input file •  Move input file to cluster •  Submit to queue •  Wait for completion •  Retrieve input file •  Analyze output file •  Extract the relevant data, change formats •  Store results •  Repeat 9  
  • 10. Improved Chemistry Workflow Log File Input File Edit/Analyze   Results   Data   Job  Submission   Local Calcula>on   Remote 10  
  • 11. Avogadro •  Project began 2006 •  Split into library and application (plugin based) •  One of very few open source editors •  Designed to be extensible from the start •  Generate input & read output from many codes •  An active and growing community •  Chemistry needs a free, open framework 11  
  • 12. Avogadro’s Roots •  Avogadro projected started in 2006 •  First funded work in 2007 by Marcus Hanwell –  Google Summer of Code student –  Final year of Ph.D. spent the summer coding –  Funded as part of KDE project – Kalzium editor •  Built on several other open source projects –  Qt, Eigen, Open Babel, Blue Obelisk Data Repository •  Also uses open standards, e.g. OpenGL •  Cross platform, open source stack 12  
  • 13. Avogadro Vital Statistics •  Supports Linux, Windows and Mac OS X •  Contributions from over 20 developers •  Over 180,000 downloads over 4 years •  Translated into 19 languages •  Used by Kalzium for molecular editor •  Featured by Trolltech/Nokia, –  Qt in use –  Qt ambassador program 13  
  • 14. 14  
  • 15. Desktop Database •  Use of “document store” NoSQL •  Doesn’t force too much structure •  Some entries have experimental data available •  Some have computational jobs •  Employ a “pile of stuff” approach •  Can store both source and derived data •  Calculate identifiers, QSAR properties, etc •  MongoDB is a scalable, open solution •  Proven scaling with large web applications 15  
  • 16. Chemistry Data Explorer •  Qt application •  Connects to local or remote database •  Uses VTK for visual data exploration •  Can ingest new data –  Uses Open Babel to generate descriptors –  Standard InChi, SMILES, molecular weight –  More could be added •  All derived from files stored in the database 16  
  • 18. Database Interaction on the Web •  Avogadro directly accesses some (read- only) public databases: •  PDB, NIH “fetch by name” •  Resolve structure to common name using CIR •  More could be added •  ChemData also uses NIH CIR for data •  Quixote aims to support both public and private sharing models – open framework 18  
  • 20. Avogadro 20  
  • 21. OpenQube – Quantum Data •  Reads in key quantum data –  Basis set used in calculation –  Eigenvectors for molecular orbitals –  Density matrix for electron density –  Standard geometry •  Multithreaded calculation –  Produce regular grids of scalar data –  Molecular orbitals, electron density… 21  
  • 22. Molecular Orbitals and Electron Density •  Quantum files store basis sets and matrices −αr 2 GTO = ce φ i = ∑ c µiφ µ µ ρ(r) = ∑ ∑ Pµν φ µ φν µ ν •  Using these equations, and the supplied matrices – calculate cubes 22  
  • 23. Calling Stand Alone Programs •  Many already supported: •  GAMESS, GAMESS-UK, Molpro, Q-Chem, MOPAC, NWChem, Gaussian, Dalton •  Easy to add more •  Some codes writing Avogadro based custom applications, •  Q-Chem, Molpro… •  DLPOLY author approached me: •  Open sourced DLPOLY2, want a GUI 23  
  • 24. Job Submission & Management •  Take input file, submit to queue, monitor, retrieve, repeat •  System tray resident Qt application •  Manage both local and remote jobs •  Interest from developers •  Use in other applications •  Share development/maintenance burden 24  
  • 25. Open in Avogadro When Complete 25  
  • 26. Advanced Visualization: VTK •  New Avogadro plugin: •  Takes volumetric data from Avogadro •  Uses GPU accelerated rendering in VTK •  Excitement from many in the community •  Several groups interested in collaborating •  Google Summer of Code project •  Leverage significant capabilities in VTK 26  
  • 27. Volume Rendered With Contours 27  
  • 28. Electron Density Volume Render 28  
  • 29. Electron Density Ray Tracing 29  
  • 30. Conclusions •  There is still a lot of work to do •  Open databases are of critical importance •  Need tools to make retrieving and depositing data easier •  Improved data exchange is essential to improve reproducibility in chemistry •  Create shared collaboration platforms –  Deliver improved workflows, enable research 30  
  • 31. Extra Background Slides •  Additional visualization and background slides 31  
  • 34. Biomolecules 34  
  • 35. Nanomaterials 35  
  • 37. Volumetric Data: Molecular Orbitals 37  
  • 39. Hybrid Views: CPK + MO + Ball & Stick 39  
  • 40. Linked Views of Live Data 40  
  • 41. 2D: Graphs and Charts 41  
  • 42. Informatics 42  
  • 44. VTK: The Toolkit •  Collection of C++ libraries –  Leveraged by many applications –  Divided into logical areas, e.g. •  Filtering – data processing in visualization pipeline •  InfoVis – informatics visualization •  Widgets – 3D interaction widgets •  VolumeRendering – 3D volume rendering •  Cross platform, using OpenGL •  Wrapped in Python, Tcl and Java 44  
  • 45. VTK Development Team • From Ohloh: Very large, active development team: Over the past twelve months, 100 developers contributed new code to VTK. This is one of the largest open-source teams in the world, and is in the top 2% of all project teams on Ohloh. and many others... 45  
  • 46. ParaView •  Parallel visualization application •  Open source, BSD licensed •  Turn-key application wrapper around VTK •  Parallel data processing and rendering 46  
  • 47. Large Data Visualization •  BlueGene/L at LLNL –  65,536 compute nodes (32 bit PPC) –  1,024 I/O nodes (32 bit PPC) –  512 MB of RAM per node •  Sandia Red Storm –  12,960 compute nodes (AMD Opteron dual) –  640 service and I/O nodes –  40 TB of DDR RAM per node 47  
  • 48. 1 Billion Cell Asteroid Simulation 48  
  • 49. Tiled Displays 49  
  • 51. 3D Chemistry Visualization •  Some existing features specific to chemistry –  Gaussian cube, PDB, and a few others •  Excellent handling of volumetric data: –  Marching cubes –  Volume rendering –  Contouring •  Advanced rendering: –  Point sprites –  Manta – real time ray tracing 51  
  • 52. Titan: VTK and Informatics •  Led by Sandia National Laboratories •  Substantial expansion of VTK: –  Informatics & analysis •  Actively developed, growing feature set •  Improved 2D rendering and API •  Database connectivity, client-server, pipeline based approach •  Uses web technologies such as ProtoViz •  Scalable, interactive infoviz 52  
  • 53. Manta: Real Time Ray Tracing 53  
  • 54. New Frontiers •  New work porting VTK –  Use C++ as the common core •  iOS port in the early stages •  Android port –  Use OpenGL ES 2.0 – new rendering code •  Also ParaViewWeb – delivering over web –  Use image delivery and rendering on server –  Also using WebGL for rendering (optionally) 54  
  • 55. Future Directions •  VTK modularization (in progress) –  Developing more agile build systems –  Automating more with CMake •  Using Git more fully to improve stability –  Use of master and next –  Topic branches - merge when ready •  Code review using Gerrit –  Integration with continuous integration –  Test before merge 55