SlideShare uma empresa Scribd logo
1 de 24
Baixar para ler offline
Pushing	
  Chemical	
  Biology	
  Data	
  
Through	
  the	
  Pipes	
  
Architec(ng	
  and	
  Extending	
  the	
  BARD	
  API	
  
Rajarshi	
  Guha,	
  John	
  Braisted,	
  Ajit	
  
Jadhav,	
  Dac-­‐Trung	
  Nguyen,	
  Tyler	
  
Peryea,	
  Noel	
  Southall	
  
	
  
September	
  9,	
  2014	
  
Indianapolis,	
  IN	
  
Outline	
  
MoIvaIons	
  
for	
  the	
  BARD	
  
InteracIng	
  
with	
  the	
  BARD	
  
Extensibility	
  &	
  
Community	
  
The	
  BioAssay	
  Research	
  Database	
  
•  Originated	
  from	
  the	
  NIH	
  Molecular	
  Libraries	
  
Program	
  
•  MoIvated	
  to	
  make	
  the	
  bioassay	
  data	
  generated	
  
by	
  the	
  MLP	
  more	
  accessible	
  and	
  amenable	
  to	
  
exploraIon	
  and	
  hypothesis	
  generaIon	
  
•  Joint	
  effort	
  between	
  NCGC,	
  Broad,	
  UNM,	
  
UMiami,	
  Vanderbilt,	
  Scripps,	
  Burnham	
  
Goals	
  of	
  the	
  BARD	
  
BARD’s	
  mission	
  is	
  to	
  enable	
  novice	
  and	
  expert	
  
scien7sts	
  to	
  effec7vely	
  u7lize	
  MLP	
  data	
  to	
  
generate	
  new	
  hypotheses	
  
	
  
•  Developed	
  as	
  an	
  open-­‐source,	
  	
  industrial-­‐strength	
  
plaWorm	
  to	
  support	
  public	
  translaIonal	
  research	
  
•  Foster	
  new	
  methods	
  to	
  interpret	
  &	
  analyze	
  chemical	
  
biology	
  data	
  
•  Develop	
  and	
  adopt	
  an	
  Assay	
  Data	
  Standard	
  
•  Enable	
  co-­‐loca7on	
  of	
  data	
  and	
  methods	
  
Components	
  of	
  the	
  BARD	
  Pla=orm	
  
CAP, Data Dictionary,
and Results
Deposition Data
model created &
populated
CAP UI with View and
basic editing
Dictionary defined as
OWL using Protégé
Warehouse loaded
with all PubChem
AIDs and results
Warehouse loaded
with GO terms, KEGG
terms, and DrugBank
annotations
Interac?ng	
  with	
  the	
  BARD	
  
Searching	
  the	
  BARD	
  
•  Full	
  text	
  search	
  via	
  Lucene/Solr	
  
– Key	
  entry	
  point	
  for	
  new	
  users	
  
– Search	
  code	
  runs	
  queries	
  in	
  parallel	
  
– Fast	
  faceIng,	
  auto	
  suggest,	
  filters	
  
•  EnIIes	
  are	
  linked	
  manually	
  via	
  Solr	
  schema	
  
– Allows	
  us	
  to	
  pick	
  up	
  enIIes	
  related	
  to	
  the	
  query	
  
– Supply	
  matching	
  context	
  
•  Custom	
  NCATS	
  code	
  for	
  fast	
  structure	
  searches	
  
The	
  BARD	
  API	
  
•  A	
  RESTful	
  applicaIon	
  programming	
  interface	
  
•  Access	
  to	
  individual	
  &	
  collecIons	
  of	
  enIIes	
  
•  Documented	
  via	
  the	
  wiki	
  
•  Versioned	
  
•  Hit	
  a	
  URL,	
  get	
  a	
  JSON	
  response	
  
– Every	
  language	
  supports	
  parsing	
  JSON	
  documents	
  
– Easy	
  to	
  inspect	
  via	
  the	
  browser	
  or	
  REST	
  clients	
  
hp://bard.nih.gov/api/v18/	
  
API	
  Architecture	
  
•  Java,	
  read-­‐only,	
  deployed	
  on	
  Glassfish	
  cluster	
  
•  Different	
  funcIonality	
  
hosted	
  in	
  different	
  
containers	
  
– Maintenance,	
  security	
  
– Stability	
  
– Performance	
  
	
  
API
Text
Search
Struct
Search
Data Warehouse
Plugins
ETL Database Text Search Engine Structure Search Engine
Caching Layer
http://bard.nih.gov/api
Jersey Webapps
deployed on HA
Application
Server Cluster
Open	
  Source	
  as	
  Far	
  as	
  Possible	
  
API	
  Resources	
  
•  Covers	
  many	
  data	
  types	
  
•  Each	
  resource	
  supports	
  a	
  	
  
variety	
  of	
  sub-­‐resources	
  
– Usually	
  linked	
  to	
  other	
  	
  
resources	
  
•  Use	
  /_info	
  to	
  see	
  what	
  	
  
sub-­‐resources	
  are	
  available	
  
/assays/_info!
API	
  Resources	
  
En7ty	
   Count	
  	
  
/assays! 990	
  
/biology! 1080	
  
/cap! 1942	
  
/compounds! 42,572,799	
  
/documents! 10,499	
  
/experiments! 1314	
  
/exptdata! 32,754,242	
  
/projects! 144	
  
/substances! 113,751,456	
  
Data	
  Warning	
  
We’re	
  s7ll	
  in	
  the	
  process	
  of	
  data	
  
cura7on	
  and	
  QC	
  so	
  while	
  you	
  can	
  
access	
  all	
  BARD	
  data,	
  keep	
  in	
  mind	
  that	
  
much	
  of	
  it	
  will	
  be	
  undergoing	
  review	
  
and	
  cura7on	
  and	
  so	
  may	
  change	
  
Summary	
  Resources	
  
•  A	
  number	
  of	
  enIIes	
  have	
  a	
  /summary sub-­‐
resource	
  (Compounds,	
  Projects)	
  
•  Aggregates	
  informaIon,	
  suitable	
  for	
  dashboards	
  
JSON	
  Responses	
  
•  All	
  responses	
  are	
  	
  
currently	
  JSON	
  
•  EnIIes	
  can	
  include	
  	
  
other	
  enIIes	
  (recursively)	
  
•  JSON	
  Schema	
  is	
  available	
  	
  
via	
  	
  /_schema 

	
  
/assays/_schema!
Extending	
  the	
  API	
  
•  Concept	
  of	
  plugins	
  
– Expands	
  the	
  resource	
  hierarchy	
  
– Has	
  to	
  be	
  wrien	
  in	
  Java	
  
•  What	
  can	
  a	
  plugin	
  accept?	
  
– Anything	
  
•  What	
  can	
  a	
  plugin	
  provide?	
  
– Anything	
  –	
  plain	
  text,	
  XML,	
  JSON,	
  HTML,	
  Flash	
  
•  Plugin	
  manifest	
  –	
  describe	
  available	
  resources,	
  
argument	
  types	
  
	
  
Plugins	
  have	
  to	
  	
  
be	
  deployable	
  
	
  on	
  the	
  JVM	
  
Plugin	
  Development	
  Workflow	
  
What	
  Can	
  a	
  Plugin	
  Expect?	
  
•  Direct	
  access	
  to	
  the	
  database	
  via	
  JDBC	
  
•  Faster	
  access	
  to	
  REST	
  API	
  via	
  co-­‐localizaIon	
  
•  No	
  local	
  storage	
  in	
  the	
  BARD	
  warehouse	
  
– But	
  the	
  plugin	
  can	
  use	
  its	
  own	
  storage	
  (such	
  as	
  an	
  
embedded	
  database)	
  
•  Plugins	
  have	
  access	
  to	
  system	
  JARs	
  (e.g.,	
  XOM,	
  
JChem)	
  but	
  should	
  bundle	
  their	
  own	
  required	
  
dependencies	
  
Plugin	
  Valida?on	
  
•  Run	
  a	
  series	
  of	
  checks	
  on	
  a	
  plugin	
  before	
  
deployment	
  
– Catches	
  manifest/resource	
  errors	
  
– Doesn’t	
  check	
  for	
  correctness	
  (not	
  our	
  job)	
  
– Examines	
  plugin	
  Java	
  class	
  
– Examines	
  final	
  plugin	
  package	
  
•  Run	
  as	
  a	
  command	
  line	
  tool	
  or	
  from	
  your	
  code	
  
•  Could	
  be	
  made	
  into	
  an	
  Ant/Maven	
  plugin	
  
What	
  Plugins	
  are	
  Available?	
  
•  The	
  plugin	
  registry	
  provides	
  
a	
  list	
  of	
  deployed	
  plugins	
  	
  
– Path	
  to	
  the	
  plugin	
  
– Version	
  
– Availability	
  
•  Long	
  term	
  goal	
  is	
  to	
  	
  
have	
  a	
  plugin	
  store	
  
/plugins/registry/list!
Exemplar	
  Plugins	
  
•  Look	
  at	
  the	
  plugin	
  repository	
  on	
  Github	
  
•  Ranges	
  from	
  	
  
– trivial	
  calls	
  to	
  external	
  service	
  
– Incorporate	
  external	
  tools/programs	
  
•  Current	
  set	
  of	
  plugins	
  highlight	
  plugin	
  
development	
  
•  Plugins	
  let	
  us	
  move	
  non-­‐essenIal	
  funcIonality	
  
out	
  of	
  the	
  core	
  API	
  
	
  
P.	
  Rydberg	
  et	
  al,	
  hp://www.farma.ku.dk/smartcyp/	
  
BARD-­‐	
  SMARTCyp	
  
More	
  Than	
  Just	
  Data	
  
BARD is not just a data store – it’s a platform
•  Seamlessly interact with users’ preferred tools
•  Allows the community to tailor it to their needs
•  Serve as a meeting ground for experimental
and computational methods
•  Enhance collaboration opportunities
Links	
  
@AskTheBARD	
  
hNp://bard.nih.gov	
  
API	
  Source	
  Code	
  Repository	
  
Plugin	
  Source	
  Code	
  Repository	
  
Development	
  Wiki	
  

Mais conteúdo relacionado

Mais procurados

Fire kit ios (r-baldwin)
Fire kit ios (r-baldwin)Fire kit ios (r-baldwin)
Fire kit ios (r-baldwin)DevDays
 
Best Practices for RESTful Web Services
Best Practices for RESTful Web ServicesBest Practices for RESTful Web Services
Best Practices for RESTful Web ServicesSalesforce Developers
 
Profiling with clin fhir
Profiling with clin fhirProfiling with clin fhir
Profiling with clin fhirDevDays
 
Security event logging and monitoring techniques
Security event logging and monitoring techniquesSecurity event logging and monitoring techniques
Security event logging and monitoring techniquesDataWorks Summit
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastDataWorks Summit
 
Open Source SQL Databases
Open Source SQL DatabasesOpen Source SQL Databases
Open Source SQL DatabasesEmanuel Calvo
 
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Abhiraj Butala
 
Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...
Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...
Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...Milan Dojchinovski
 
Stream processing on mobile networks
Stream processing on mobile networksStream processing on mobile networks
Stream processing on mobile networkspbelko82
 
Managing Open Source Software in the GitHub Era
Managing Open Source Software in the GitHub EraManaging Open Source Software in the GitHub Era
Managing Open Source Software in the GitHub EranexB Inc.
 
Supporting End Users In The Creation Of Dependable Web Clips
Supporting End Users In The Creation Of Dependable Web ClipsSupporting End Users In The Creation Of Dependable Web Clips
Supporting End Users In The Creation Of Dependable Web Clipstomelf2007
 

Mais procurados (12)

Fire kit ios (r-baldwin)
Fire kit ios (r-baldwin)Fire kit ios (r-baldwin)
Fire kit ios (r-baldwin)
 
Best Practices for RESTful Web Services
Best Practices for RESTful Web ServicesBest Practices for RESTful Web Services
Best Practices for RESTful Web Services
 
Profiling with clin fhir
Profiling with clin fhirProfiling with clin fhir
Profiling with clin fhir
 
Security event logging and monitoring techniques
Security event logging and monitoring techniquesSecurity event logging and monitoring techniques
Security event logging and monitoring techniques
 
Troubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the BeastTroubleshooting Kerberos in Hadoop: Taming the Beast
Troubleshooting Kerberos in Hadoop: Taming the Beast
 
Open Source SQL Databases
Open Source SQL DatabasesOpen Source SQL Databases
Open Source SQL Databases
 
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
Hadoop Security in Big-Data-as-a-Service Deployments - Presented at Hadoop Su...
 
Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...
Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...
Datasets and GATE Evaluation Framework for Benchmarking Wikipedia Based NER S...
 
Stream processing on mobile networks
Stream processing on mobile networksStream processing on mobile networks
Stream processing on mobile networks
 
Managing Open Source Software in the GitHub Era
Managing Open Source Software in the GitHub EraManaging Open Source Software in the GitHub Era
Managing Open Source Software in the GitHub Era
 
Data platform evolution
Data platform evolutionData platform evolution
Data platform evolution
 
Supporting End Users In The Creation Of Dependable Web Clips
Supporting End Users In The Creation Of Dependable Web ClipsSupporting End Users In The Creation Of Dependable Web Clips
Supporting End Users In The Creation Of Dependable Web Clips
 

Semelhante a Pushing Chemical Biology Through the Pipes

Creating a RESTful api without losing too much sleep
Creating a RESTful api without losing too much sleepCreating a RESTful api without losing too much sleep
Creating a RESTful api without losing too much sleepMike Anderson
 
Drupal and Apache Stanbol
Drupal and Apache StanbolDrupal and Apache Stanbol
Drupal and Apache StanbolAlkuvoima
 
The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research DatabaseRajarshi Guha
 
GCE11 Apache Rave Presentation
GCE11 Apache Rave PresentationGCE11 Apache Rave Presentation
GCE11 Apache Rave Presentationmarpierc
 
RESTful Web Service using Swagger
RESTful Web Service using SwaggerRESTful Web Service using Swagger
RESTful Web Service using SwaggerHong-Jhih Lin
 
OWASP Dependency-Track Introduction
OWASP Dependency-Track IntroductionOWASP Dependency-Track Introduction
OWASP Dependency-Track IntroductionSergey Sotnikov
 
Real world RESTful service development problems and solutions
Real world RESTful service development problems and solutionsReal world RESTful service development problems and solutions
Real world RESTful service development problems and solutionsMasoud Kalali
 
Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...mestato
 
Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08Jian Qin
 
Module development
Module development Module development
Module development Araport
 
Data munging and analysis
Data munging and analysisData munging and analysis
Data munging and analysisRaminder Singh
 
Linked APIs for Life Sciences Tutorial at SWAT4LS 3011
Linked APIs for Life Sciences Tutorial at SWAT4LS 3011Linked APIs for Life Sciences Tutorial at SWAT4LS 3011
Linked APIs for Life Sciences Tutorial at SWAT4LS 3011sspeiser
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologiesneeraj rathore
 
SC11 Science Gateway Group Overview
SC11 Science Gateway Group OverviewSC11 Science Gateway Group Overview
SC11 Science Gateway Group Overviewmarpierc
 
Linked Data Platform as a novel approach for Enterprise Application Integra...
Linked Data Platform as a novel approach for Enterprise Application Integra...Linked Data Platform as a novel approach for Enterprise Application Integra...
Linked Data Platform as a novel approach for Enterprise Application Integra...Nandana Mihindukulasooriya
 
A Tool for Optimizing Java 8 Stream Software via Automated Refactoring
A Tool for Optimizing Java 8 Stream Software via Automated RefactoringA Tool for Optimizing Java 8 Stream Software via Automated Refactoring
A Tool for Optimizing Java 8 Stream Software via Automated RefactoringRaffi Khatchadourian
 
Darwin Core Archive (DwC-A) validation: A New Collaborative Effort
Darwin Core Archive (DwC-A) validation: A New Collaborative EffortDarwin Core Archive (DwC-A) validation: A New Collaborative Effort
Darwin Core Archive (DwC-A) validation: A New Collaborative Effortkristgen
 
New Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionNew Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionSri Ambati
 

Semelhante a Pushing Chemical Biology Through the Pipes (20)

Creating a RESTful api without losing too much sleep
Creating a RESTful api without losing too much sleepCreating a RESTful api without losing too much sleep
Creating a RESTful api without losing too much sleep
 
Drupal and Apache Stanbol
Drupal and Apache StanbolDrupal and Apache Stanbol
Drupal and Apache Stanbol
 
The BioAssay Research Database
The BioAssay Research DatabaseThe BioAssay Research Database
The BioAssay Research Database
 
GCE11 Apache Rave Presentation
GCE11 Apache Rave PresentationGCE11 Apache Rave Presentation
GCE11 Apache Rave Presentation
 
RESTful Web Service using Swagger
RESTful Web Service using SwaggerRESTful Web Service using Swagger
RESTful Web Service using Swagger
 
OWASP Dependency-Track Introduction
OWASP Dependency-Track IntroductionOWASP Dependency-Track Introduction
OWASP Dependency-Track Introduction
 
Real world RESTful service development problems and solutions
Real world RESTful service development problems and solutionsReal world RESTful service development problems and solutions
Real world RESTful service development problems and solutions
 
Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...
 
Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08Data repositories -- Xiamen University 2012 06-08
Data repositories -- Xiamen University 2012 06-08
 
Swt
SwtSwt
Swt
 
Module development
Module development Module development
Module development
 
Data munging and analysis
Data munging and analysisData munging and analysis
Data munging and analysis
 
Linked APIs for Life Sciences Tutorial at SWAT4LS 3011
Linked APIs for Life Sciences Tutorial at SWAT4LS 3011Linked APIs for Life Sciences Tutorial at SWAT4LS 3011
Linked APIs for Life Sciences Tutorial at SWAT4LS 3011
 
Big Data Open Source Technologies
Big Data Open Source TechnologiesBig Data Open Source Technologies
Big Data Open Source Technologies
 
SC11 Science Gateway Group Overview
SC11 Science Gateway Group OverviewSC11 Science Gateway Group Overview
SC11 Science Gateway Group Overview
 
Linked Data Platform as a novel approach for Enterprise Application Integra...
Linked Data Platform as a novel approach for Enterprise Application Integra...Linked Data Platform as a novel approach for Enterprise Application Integra...
Linked Data Platform as a novel approach for Enterprise Application Integra...
 
A Tool for Optimizing Java 8 Stream Software via Automated Refactoring
A Tool for Optimizing Java 8 Stream Software via Automated RefactoringA Tool for Optimizing Java 8 Stream Software via Automated Refactoring
A Tool for Optimizing Java 8 Stream Software via Automated Refactoring
 
Darwin Core Archive (DwC-A) validation: A New Collaborative Effort
Darwin Core Archive (DwC-A) validation: A New Collaborative EffortDarwin Core Archive (DwC-A) validation: A New Collaborative Effort
Darwin Core Archive (DwC-A) validation: A New Collaborative Effort
 
SEppt
SEpptSEppt
SEppt
 
New Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 EditionNew Developments in H2O: April 2017 Edition
New Developments in H2O: April 2017 Edition
 

Mais de Rajarshi Guha

Pharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomePharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomeRajarshi Guha
 
Pharos: Putting targets in context
Pharos: Putting targets in contextPharos: Putting targets in context
Pharos: Putting targets in contextRajarshi Guha
 
Pharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark GenomePharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark GenomeRajarshi Guha
 
Pharos - Face of the KMC
Pharos - Face of the KMCPharos - Face of the KMC
Pharos - Face of the KMCRajarshi Guha
 
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformEnhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformRajarshi Guha
 
What can your library do for you?
What can your library do for you?What can your library do for you?
What can your library do for you?Rajarshi Guha
 
So I have an SD File … What do I do next?
So I have an SD File … What do I do next?So I have an SD File … What do I do next?
So I have an SD File … What do I do next?Rajarshi Guha
 
Characterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsCharacterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsRajarshi Guha
 
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action: Bridging Chemistry and Biology with Informatics at NCATSFrom Data to Action: Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATSRajarshi Guha
 
Robots, Small Molecules & R
Robots, Small Molecules & RRobots, Small Molecules & R
Robots, Small Molecules & RRajarshi Guha
 
Fingerprinting Chemical Structures
Fingerprinting Chemical StructuresFingerprinting Chemical Structures
Fingerprinting Chemical StructuresRajarshi Guha
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...Rajarshi Guha
 
When the whole is better than the parts
When the whole is better than the partsWhen the whole is better than the parts
When the whole is better than the partsRajarshi Guha
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...Rajarshi Guha
 
Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Rajarshi Guha
 
Cloudy with a Touch of Cheminformatics
Cloudy with a Touch of CheminformaticsCloudy with a Touch of Cheminformatics
Cloudy with a Touch of CheminformaticsRajarshi Guha
 
Chemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & ReproducibleChemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & ReproducibleRajarshi Guha
 
Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Rajarshi Guha
 
Quantifying Text Sentiment in R
Quantifying Text Sentiment in RQuantifying Text Sentiment in R
Quantifying Text Sentiment in RRajarshi Guha
 
PMML for QSAR Model Exchange
PMML for QSAR Model Exchange PMML for QSAR Model Exchange
PMML for QSAR Model Exchange Rajarshi Guha
 

Mais de Rajarshi Guha (20)

Pharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark GenomePharos: A Torch to Use in Your Journey in the Dark Genome
Pharos: A Torch to Use in Your Journey in the Dark Genome
 
Pharos: Putting targets in context
Pharos: Putting targets in contextPharos: Putting targets in context
Pharos: Putting targets in context
 
Pharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark GenomePharos – A Torch to Use in Your Journey In the Dark Genome
Pharos – A Torch to Use in Your Journey In the Dark Genome
 
Pharos - Face of the KMC
Pharos - Face of the KMCPharos - Face of the KMC
Pharos - Face of the KMC
 
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS PlatformEnhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
Enhancing Prioritization & Discovery of Novel Combinations using an HTS Platform
 
What can your library do for you?
What can your library do for you?What can your library do for you?
What can your library do for you?
 
So I have an SD File … What do I do next?
So I have an SD File … What do I do next?So I have an SD File … What do I do next?
So I have an SD File … What do I do next?
 
Characterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network ModelsCharacterization of Chemical Libraries Using Scaffolds and Network Models
Characterization of Chemical Libraries Using Scaffolds and Network Models
 
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action: Bridging Chemistry and Biology with Informatics at NCATSFrom Data to Action: Bridging Chemistry and Biology with Informatics at NCATS
From Data to Action : Bridging Chemistry and Biology with Informatics at NCATS
 
Robots, Small Molecules & R
Robots, Small Molecules & RRobots, Small Molecules & R
Robots, Small Molecules & R
 
Fingerprinting Chemical Structures
Fingerprinting Chemical StructuresFingerprinting Chemical Structures
Fingerprinting Chemical Structures
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D...
 
When the whole is better than the parts
When the whole is better than the partsWhen the whole is better than the parts
When the whole is better than the parts
 
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
Exploring Compound Combinations in High Throughput Settings: Going Beyond 1D ...
 
Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...Characterization and visualization of compound combination responses in a hig...
Characterization and visualization of compound combination responses in a hig...
 
Cloudy with a Touch of Cheminformatics
Cloudy with a Touch of CheminformaticsCloudy with a Touch of Cheminformatics
Cloudy with a Touch of Cheminformatics
 
Chemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & ReproducibleChemical Data Mining: Open Source & Reproducible
Chemical Data Mining: Open Source & Reproducible
 
Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?Chemogenomics in the cloud: Is the sky the limit?
Chemogenomics in the cloud: Is the sky the limit?
 
Quantifying Text Sentiment in R
Quantifying Text Sentiment in RQuantifying Text Sentiment in R
Quantifying Text Sentiment in R
 
PMML for QSAR Model Exchange
PMML for QSAR Model Exchange PMML for QSAR Model Exchange
PMML for QSAR Model Exchange
 

Último

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Último (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Pushing Chemical Biology Through the Pipes

  • 1. Pushing  Chemical  Biology  Data   Through  the  Pipes   Architec(ng  and  Extending  the  BARD  API   Rajarshi  Guha,  John  Braisted,  Ajit   Jadhav,  Dac-­‐Trung  Nguyen,  Tyler   Peryea,  Noel  Southall     September  9,  2014   Indianapolis,  IN  
  • 2. Outline   MoIvaIons   for  the  BARD   InteracIng   with  the  BARD   Extensibility  &   Community  
  • 3. The  BioAssay  Research  Database   •  Originated  from  the  NIH  Molecular  Libraries   Program   •  MoIvated  to  make  the  bioassay  data  generated   by  the  MLP  more  accessible  and  amenable  to   exploraIon  and  hypothesis  generaIon   •  Joint  effort  between  NCGC,  Broad,  UNM,   UMiami,  Vanderbilt,  Scripps,  Burnham  
  • 4. Goals  of  the  BARD   BARD’s  mission  is  to  enable  novice  and  expert   scien7sts  to  effec7vely  u7lize  MLP  data  to   generate  new  hypotheses     •  Developed  as  an  open-­‐source,    industrial-­‐strength   plaWorm  to  support  public  translaIonal  research   •  Foster  new  methods  to  interpret  &  analyze  chemical   biology  data   •  Develop  and  adopt  an  Assay  Data  Standard   •  Enable  co-­‐loca7on  of  data  and  methods  
  • 5. Components  of  the  BARD  Pla=orm   CAP, Data Dictionary, and Results Deposition Data model created & populated CAP UI with View and basic editing Dictionary defined as OWL using Protégé Warehouse loaded with all PubChem AIDs and results Warehouse loaded with GO terms, KEGG terms, and DrugBank annotations
  • 7. Searching  the  BARD   •  Full  text  search  via  Lucene/Solr   – Key  entry  point  for  new  users   – Search  code  runs  queries  in  parallel   – Fast  faceIng,  auto  suggest,  filters   •  EnIIes  are  linked  manually  via  Solr  schema   – Allows  us  to  pick  up  enIIes  related  to  the  query   – Supply  matching  context   •  Custom  NCATS  code  for  fast  structure  searches  
  • 8. The  BARD  API   •  A  RESTful  applicaIon  programming  interface   •  Access  to  individual  &  collecIons  of  enIIes   •  Documented  via  the  wiki   •  Versioned   •  Hit  a  URL,  get  a  JSON  response   – Every  language  supports  parsing  JSON  documents   – Easy  to  inspect  via  the  browser  or  REST  clients   hp://bard.nih.gov/api/v18/  
  • 9. API  Architecture   •  Java,  read-­‐only,  deployed  on  Glassfish  cluster   •  Different  funcIonality   hosted  in  different   containers   – Maintenance,  security   – Stability   – Performance     API Text Search Struct Search Data Warehouse Plugins
  • 10. ETL Database Text Search Engine Structure Search Engine Caching Layer http://bard.nih.gov/api Jersey Webapps deployed on HA Application Server Cluster Open  Source  as  Far  as  Possible  
  • 11. API  Resources   •  Covers  many  data  types   •  Each  resource  supports  a     variety  of  sub-­‐resources   – Usually  linked  to  other     resources   •  Use  /_info  to  see  what     sub-­‐resources  are  available   /assays/_info!
  • 12. API  Resources   En7ty   Count     /assays! 990   /biology! 1080   /cap! 1942   /compounds! 42,572,799   /documents! 10,499   /experiments! 1314   /exptdata! 32,754,242   /projects! 144   /substances! 113,751,456  
  • 13. Data  Warning   We’re  s7ll  in  the  process  of  data   cura7on  and  QC  so  while  you  can   access  all  BARD  data,  keep  in  mind  that   much  of  it  will  be  undergoing  review   and  cura7on  and  so  may  change  
  • 14. Summary  Resources   •  A  number  of  enIIes  have  a  /summary sub-­‐ resource  (Compounds,  Projects)   •  Aggregates  informaIon,  suitable  for  dashboards  
  • 15. JSON  Responses   •  All  responses  are     currently  JSON   •  EnIIes  can  include     other  enIIes  (recursively)   •  JSON  Schema  is  available     via    /_schema 
   /assays/_schema!
  • 16. Extending  the  API   •  Concept  of  plugins   – Expands  the  resource  hierarchy   – Has  to  be  wrien  in  Java   •  What  can  a  plugin  accept?   – Anything   •  What  can  a  plugin  provide?   – Anything  –  plain  text,  XML,  JSON,  HTML,  Flash   •  Plugin  manifest  –  describe  available  resources,   argument  types    
  • 17. Plugins  have  to     be  deployable    on  the  JVM   Plugin  Development  Workflow  
  • 18. What  Can  a  Plugin  Expect?   •  Direct  access  to  the  database  via  JDBC   •  Faster  access  to  REST  API  via  co-­‐localizaIon   •  No  local  storage  in  the  BARD  warehouse   – But  the  plugin  can  use  its  own  storage  (such  as  an   embedded  database)   •  Plugins  have  access  to  system  JARs  (e.g.,  XOM,   JChem)  but  should  bundle  their  own  required   dependencies  
  • 19. Plugin  Valida?on   •  Run  a  series  of  checks  on  a  plugin  before   deployment   – Catches  manifest/resource  errors   – Doesn’t  check  for  correctness  (not  our  job)   – Examines  plugin  Java  class   – Examines  final  plugin  package   •  Run  as  a  command  line  tool  or  from  your  code   •  Could  be  made  into  an  Ant/Maven  plugin  
  • 20. What  Plugins  are  Available?   •  The  plugin  registry  provides   a  list  of  deployed  plugins     – Path  to  the  plugin   – Version   – Availability   •  Long  term  goal  is  to     have  a  plugin  store   /plugins/registry/list!
  • 21. Exemplar  Plugins   •  Look  at  the  plugin  repository  on  Github   •  Ranges  from     – trivial  calls  to  external  service   – Incorporate  external  tools/programs   •  Current  set  of  plugins  highlight  plugin   development   •  Plugins  let  us  move  non-­‐essenIal  funcIonality   out  of  the  core  API    
  • 22. P.  Rydberg  et  al,  hp://www.farma.ku.dk/smartcyp/   BARD-­‐  SMARTCyp  
  • 23. More  Than  Just  Data   BARD is not just a data store – it’s a platform •  Seamlessly interact with users’ preferred tools •  Allows the community to tailor it to their needs •  Serve as a meeting ground for experimental and computational methods •  Enhance collaboration opportunities
  • 24. Links   @AskTheBARD   hNp://bard.nih.gov   API  Source  Code  Repository   Plugin  Source  Code  Repository   Development  Wiki