Andrew Lang
Professor of Mathematics
Oral Roberts University
February 17, 2014
OSU Research Week
-Cameron Neylon
Eight committees investigated the allegations and
published reports, finding no evidence of fraud or scientific
misconduct.
However, the reports* called on the scientists to avoid any
such allegations in the future by taking steps to regain
public confidence in their work, for example by opening up
access to their supporting data, processing methods and
software, and by promptly honouring freedom of
information requests.
* Archana Venkatraman, "Data Without the Doubts". Information World Review
Andrew Wakefield’s study
linked the measles, mumps
and rubella vaccine to autism.
Vaccination rates in the
developed world plummeted
after the study’s publication
and a heated anti-vaccination
movement persists today.
http://www.cfr.org/interactives/GH_Vaccine_Map/#map
?
Science has lost its way, at a big cost to humanity
Researchers are rewarded for splashy findings, not for double-checking
accuracy. So many scientists looking for cures to diseases have been
building on ideas that aren't even true.
A few years ago, scientists at the Thousand Oaks biotech
firm Amgen set out to double-check the results of 53 landmark papers in
their fields of cancer research and blood biology.
The idea was to make sure that research on which Amgen was spending
millions of development dollars still held up. They figured that a few of
the studies would fail the test — that the original results couldn't be
reproduced because the findings were especially novel or described
fresh therapeutic approaches.
But what they found was startling: Of the 53 landmark papers, only six
could be proved valid.
http://www.latimes.com/business/la-fi-hiltzik-20131027,0,1228881.column#axzz2ix1w9zGf
A special challenge for science writers covering research
today arises from science’s growing credibility problem. It
stems from the cumulative effect of errors and exaggerations
that has fueled a recent rise in retractions, misconduct, and
fraud among peer-reviewed researchers.
For reporters covering major scientific developments – from
the search for alien life and genomics, to particle physics,
climate change and cancer — it can be difficult to distinguish
error from fraud, sloppiness from deception, eagerness from
greed or, increasingly, scientific conviction from partisan
passion. Findings in fields from climate change to vaccines
can also be deceptively cherry-picked in service of a political
cause.
trust
evidence
trust
documentation
trust
confidence
trust
reproducibility
Anything produced is released under a CC0 license:
Open Data, Open Access, Open Source.
Faster Science
failed experiments
discoverable
unexpected collaborations
real-time data and results
no insider information
reusability
reproducibility
transparency
Open Drug Discovery for Neglected Diseases
malaria
schistosomiasis
gram positive bacteria
breast cancer
Drugs for neglected diseases
need to be…
cheap and…
easy to make.
docking
combinatorial library synthesis
solvent selection
recrystallization
biological assay
solubility models
solubility data
melting point models
melting point data
The big picture
Let’s focus
Early models, before 2005, were…
…specialized
1979 Martin – disubstituted benzenes
1987 Hanson – normal alkanes
1988 Needham – normal and branched alkanes
1990 Abramowitz – non-hydrogen bonded benzenes
1991 Dearden – anilines
1993 Katritzky – aldehydes, amines, and ketones
1994 Simamora – rigid aromatic
1996 Charlton – alkanes
1996 Katritzky – pyridines
1999 Zhao – aliphatic
2001 Chickos – homologous series
2003 Bergstrom – druglike (N = 277, r² = 0.54)
In 2005…
…everything changed
MDPI - cheminformatics.org
Karthikeyan 2005 N = 4173, r² = 0.65
PHYSPROP
Clark 2005 N = 6257, r² = 0.61
Recent melting point models
use these datasets…
…never reproducing r² = 0.65 (0.47 – 0.56)
Even though [a] melting point
can be measured accurately, its
prediction has been a
notoriously difficult problem.
We began measuring, collecting, and
curating melting points in the Fall of 2010
Jean-Claude Bradley’s
Chemical Information Retrieval
Course at Drexel
567 curated and referenced measurements from
Fall 2010 Chemical Information Retrieval course
Most popular data sources…
…chemical vendors
Alfa Aesar donates ~13,000
melting points to the public domain
collection
curation
modeling
validation
measurement
ONS melting point workflow
Collection: Open Data
Source            Data points  Curated values  Source year  Data type
Bell                     2483            1631         1995  donated-CC0
Bergstrom                 277             277         2003  open
MDPI-Karthikeyan         4450            4084         2005  open
Hughes                    287             262         2008  open
Oxford-MSDS              3217            1481         2010  open
Drugbank                  875             875         2011  open
Griffiths                3757             278         2011  donated-CC0
Alfa Aesar              12986            8739         2011  donated-CC0
PHYSPROP                11645            9694         2011  donated-CC0
ONS                       471             471         2012  open
27792 curated measurements
for 19515 compounds
Curation is…
…lots of hard, tedious work
(Jean-Claude Bradley and Antony Williams)
Antony Williams – RSC ChemSpider
Inconsistencies and SMILES problems
within the “high trust level” MDPI dataset
PHYSPROP Structure Errors (Incorrect Valence)
2315 out of 43543 contained pentavalent nitrogens
PHYSPROP Errors: Structure displayed is for the neutral
compound dopamine but the associated CAS Number and
chemical name in the file are for the hydrobromide salt.
unit errors: Kelvin/Celsius, Fahrenheit/Celsius
bad SMILES (non-rendering, hypervalency)
salts associated with SMILES for free base
using boiling point for melting point
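The first kind of error in this list can be screened for mechanically. The sketch below is only an illustration: the function name `unit_confusion`, the 1-degree tolerance, and the idea of comparing two reported values for the same compound are assumptions, not the curation procedure described in these slides.

```python
def unit_confusion(celsius_value, other_value, tol=1.0):
    """Guess whether other_value is celsius_value reported in another unit.

    Hypothetical helper: returns "kelvin", "fahrenheit", or None.
    tol is an assumed tolerance in degrees.
    """
    # A Celsius value reported in Kelvin is offset by 273.15.
    if abs(other_value - (celsius_value + 273.15)) <= tol:
        return "kelvin"
    # A Celsius value reported in Fahrenheit follows F = C * 9/5 + 32.
    if abs(other_value - (celsius_value * 9.0 / 5.0 + 32.0)) <= tol:
        return "fahrenheit"
    return None


# Example: two sources disagree on the same compound.
print(unit_confusion(100.0, 373.15))
print(unit_confusion(100.0, 212.0))
print(unit_confusion(100.0, 101.0))
```

A flag from a check like this would only mark a record for manual review; it cannot decide which of the two sources is correct.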
Some melting points can’t be resolved
with literature alone: 4-benzyltoluene
Open lab notebook page
measuring the melting point of 4-benzyltoluene
Melting Point Model
CDK descriptor calculator
R statistical computing
melting point data
use this model
compounds
doubleplusgood
single
CDK descriptor calculator
R statistical computing
Melting Point Model
Straight chain carboxylic acids from 1 to 10 carbons
Straight chain alcohols from 1 to 10 carbons
Comparison of model with
double+ validated measurements
Cyclic primary amines from 3 to 6 carbons
cyclobutylamine flagged for measurement
only single source available
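The distinction between "double+" and "single" compounds can be sketched in code. The slides do not define the criterion, so this sketch assumes that a compound is "double+" when at least two distinct sources agree within a tolerance. The 5 °C tolerance, the `classify` function, the "conflict" label, and the sample values are all hypothetical.

```python
from collections import defaultdict


def classify(measurements, tolerance_c=5.0):
    """Label each compound from (compound, source, mp_celsius) records.

    Assumed rule (not taken from the slides):
      "single"   - values come from only one source
      "double+"  - two or more sources, all values within tolerance_c
      "conflict" - two or more sources that disagree
    """
    by_compound = defaultdict(lambda: defaultdict(list))
    for compound, source, mp in measurements:
        by_compound[compound][source].append(mp)

    labels = {}
    for compound, by_source in by_compound.items():
        if len(by_source) < 2:
            labels[compound] = "single"
            continue
        values = [mp for mps in by_source.values() for mp in mps]
        spread = max(values) - min(values)
        labels[compound] = "double+" if spread <= tolerance_c else "conflict"
    return labels


# Made-up records, for illustration only.
records = [
    ("compound A", "vendor 1", 120.0),
    ("compound A", "vendor 2", 122.5),
    ("compound B", "vendor 1", 45.0),
    ("compound C", "vendor 1", 80.0),
    ("compound C", "vendor 2", 95.0),
]
print(classify(records))
```

Under this rule, a compound labelled "single" or "conflict" would be a natural candidate for a new measurement.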
Publication of double+ validated
melting point dataset
…as a preprint
Publication of double+ validated
melting point dataset
…as a book
Data and model deployed…
…on the web
web service
…in Google spreadsheets
…as an app
• Can the solvents used to recrystallize compounds in organic teaching labs be improved?
• Trans-dibenzalacetone
• Aldol condensation between two molecules of benzaldehyde and one molecule of acetone
[Matthew McBride: Undergraduate Research Assistant - Drexel]
• First recrystallized in ethyl acetate in 1906: Straus and Ecker, Ber. 39, 2988 (1906)
• Recrystallized in ethyl acetate in Organic Syntheses
• Recommended recrystallization solvent: ethyl acetate.
(http://classes.kvcc.edu/chm230/mixed%20aldol%20condensation.pdf)
(http://www.xula.edu/chemistry/documents/orgleclab/Aldol_notes.pdf)
Enter compound identification and desired parameters
How does it work?
1. Look up the solvent boiling point
2. Look up the room temperature solubility or predict it via measured or
predicted Abraham descriptors
3. Look up the solute melting point or predict it via a model
4. Use the melting point and the solubility at room temperature to predict
the solubility at boiling
5. Calculate the predicted recrystallization yield
Lists solvents and their predicted recrystallization yield.
Prediction is generated by the temperature dependent
solubility curves.
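Steps 4 and 5 can be sketched as follows. The slides do not give the formula behind the temperature-dependent solubility curves, so this sketch assumes that the logarithm of the mole-fraction solubility is linear in 1/T and reaches a mole fraction of 1 at the solute's melting point. The function names and the numerical inputs are hypothetical.

```python
import math


def solubility_at(temp_k, room_temp_k, x_room, melting_point_k):
    """Mole-fraction solubility at temp_k.

    Assumption of this sketch: ln(x) is linear in 1/T, passing through
    (room_temp_k, x_room) and reaching x = 1 at melting_point_k.
    """
    if not 0.0 < x_room < 1.0:
        raise ValueError("x_room must be a mole fraction between 0 and 1")
    if not room_temp_k < melting_point_k:
        raise ValueError("solute must be solid at room temperature")
    if temp_k >= melting_point_k:
        raise ValueError("temperature is at or above the melting point")
    slope = math.log(x_room) / (1.0 / room_temp_k - 1.0 / melting_point_k)
    return math.exp(slope * (1.0 / temp_k - 1.0 / melting_point_k))


def recrystallization_yield(x_room, x_boil):
    """Fraction of solute recovered on cooling a solution saturated at boiling.

    Uses moles of solute per mole of solvent, x / (1 - x), at each temperature.
    """
    ratio_room = x_room / (1.0 - x_room)
    ratio_boil = x_boil / (1.0 - x_boil)
    return 1.0 - ratio_room / ratio_boil


# Hypothetical inputs: room temperature 298.15 K, solvent boiling at 350 K,
# solute melting at 400 K, room-temperature mole-fraction solubility 0.01.
x_boil = solubility_at(350.0, 298.15, 0.01, 400.0)
print(f"predicted yield: {recrystallization_yield(0.01, x_boil):.0%}")
```

Running this for each candidate solvent and sorting by predicted yield would give a ranked list of the kind the tool displays, though the tool's actual model may differ from the assumption made here.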
• ethyl acetate (predicted yield of 72%) vs ethanol (predicted yield of 93%)
• ethyl acetate
• ethanol
0.09M
1.1M
0.62M
2.06M
Dibenzalacetone derivatives docking against tubulin
(paclitaxel site)
• Derivatives of dibenzalacetone may be synthesized by altering the aldehyde used
• From a library of derivatives, the following compound was the top hit for the docking site of Taxol
• Uses phenanthrene-9-carboxaldehyde
• Perform a Reaxys search to determine availability of synthesis procedures
• No results
[Matthew McBride: Undergraduate Research Assistant - Drexel]
• Used methanol and benzene
• Melting Point: 264-265°C
(http://usefulchem.wikispaces.com/EXP286)
[Matthew McBride: Undergraduate Research Assistant - Drexel]
trust
reproducibility
open notebook science
Acknowledgements
Jean-Claude Bradley (Drexel)
Cameron Neylon (Advocacy Director at PLOS)
Antony Williams (RSC ChemSpider)
Drexel research assistants: Evan Curtin and Matthew
McBride
ORU research assistants: David Bulger, Daryl Charron,
Lizzie Clark, Lacey Condron, Samantha Gaines, Alejandro
Hernandez, Maria Hernandez, Jesse Patsolic, and
Matthew Wilson

Open Notebooks Science

Editor's Notes

  1. http://usefulchem.wikispaces.com/D-EXP022 From a library of derivatives, it was the top hit for the docking site of Taxol