SlideShare uma empresa Scribd logo
1 de 22
Baixar para ler offline
***********************************************************************************
Note: The final approved version of this paper can be found at www.springerlink.com,
http://dx.doi.org/10.1007/s13361-011-0265-y. This is a prepress version of the article.
***********************************************************************************


Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider

James L. Little,1 Antony J. Williams,2 Alexey Pshenichnov,2 Valery Tkachenko2
1
Eastman Chemical Company, Kingsport, TN 37662, USA
2
ChemSpider, Royal Society of Chemistry, Cambridge, UK

Correspondence to: J. L. Little; e-mail:      jameslittle@eastman.com, Antony J. Williams; e-mail:
williamsa@rsc.org



       Abstract

       In many cases, an unknown to an investigator is actually known in the chemical literature,
       a reference database, or an internet resource. We refer to these types of compounds as
       “known unknowns.” ChemSpider is a very valuable internet database of known
       compounds useful in the identification of these types of compounds in commercial,
       environmental, forensic, and natural product samples. The database contains over 26
       million entries from hundreds of data sources and is provided as a free resource to the
       community. Accurate mass mass spectrometry data is used to query the database by
       either elemental composition or a monoisotopic mass. Searching by elemental
       composition is the preferred approach. However, it is often difficult to determine a
       unique elemental composition for compounds with molecular weights greater than 600
       Da. In these cases, searching by the monoisotopic mass is advantageous. In either case,
       the search results are refined by sorting the number of references associated with each
       compound in descending order. This raises the most useful candidates to the top of the
       list for further evaluation. These approaches were shown to be successful in identifying
       “known unknowns” noted in our laboratory and for compounds of interest to others.




W           e have previously demonstrated [1]
            that searching the Chemical
            Abstracts Service (CAS) Registry
employing accurate mass mass spectrometry
data is a very useful approach for the
                                                        There is a particular need for additional
                                                        approaches for the identification of “known
                                                        unknowns” found in liquid
                                                        chromatography/mass spectrometry (LC/MS)
                                                        analyses because the availability of computer
identification of “known unknowns.” We define           searchable collision-induced dissociation (CID)
a “known unknown” as a compound which is                mass spectral databases is limited for LC/MS [1]
unknown to the investigator but is known in the         as compared to that of electron ionization (EI)
chemical literature, a reference database, or an        mass spectral databases. We have found that
internet resource.                                      searching “spectraless” databases such as the


                                                    1
CAS Registry [1] by elemental compositions or           m/z value directly for a charged species and
molecular weights then sorting the hit list in          then select the type of ionic species from a pull-
descending order by the number of associated            down (drop-down) menu. Examples of typical
references to be very effective in the                  types of ionic species in the menu include [M +
identification of “known unknowns.” The most            Na]+, [M + NH4]+, [M - H]-, etc. The m/z value
likely candidates are normally brought to the           entered is then automatically adjusted by the
top of the list where they can be further               program before searching the monoisotopic
scrutinized by employing additional data.               mass of the neutral species as it would appear
                                                        in the ChemSpider database.
ChemSpider is another very large “spectraless”
database that can be searched by elemental              The second change was the ability for the
composition, molecular weight, or                       entered m/z value of the charged species to be
monoisotopic mass and it is provided as a free          corrected for the mass of an electron. Some
resource to the community. ChemSpider                   manufacturers’ data systems do not properly
contains >26 million entries as compared to the         calculate the m/z values of ions for the mass of
CAS Registry that contains >62 million                  an electron [3, 4]. The errors normally cancel
substances. In our current work, the                    within the manufacturers’ elemental
ChemSpider interface was modified [2] such              composition programs because the reference
that the initial search results can be sorted by        calibration tables are also not corrected.
the number of references associated with an             However, all data exported into other
entry. The approach was then evaluated with a           applications for further data processing should
wide variety of compounds and compared to               be corrected.
previous results obtained with the CAS Registry
[1].                                                    Acquisition of Accurate Mass Data

Experimental                                            The experimental detail including calibrations,
                                                        typical chromatographic separations, and
ChemSpider Software Modifications                       sample preparations were previously described
                                                        in detail [1]. Briefly, the accurate mass
Several changes were made in the ChemSpider             electrospray LC/MS/UV-Vis (ultraviolet-visible)
software interface to facilitate our studies [2].       data were obtained on a LCT time-of-flight mass
The most important change was the ability to            spectrometer (Waters Corporation, Milford,
sort the initial search results in descending           MA) equipped with a LockSpray secondary ESI
order by the number of data sources or                  probe. An 1100 Series liquid chromatograph
associated references. The references in                with autosampler, degasser, and UV-Vis diode
ChemSpider originate from the SureChem                  array spectrophotometer (Agilent Technologies,
patent database (>20 million), PubMed (>20              Santa Clara, CA) was employed for the
million articles), and the content of the Royal         separations in the reversed phase mode. The
Society of Chemistry publishing database. In            UV-Vis spectral data is very useful in locating
our work, sorting the “# of References” column          particular classes of compounds (UV absorbers,
was found to be the most useful. Many screen            dyes, etc.) and in confirming the identity of
displays of the software annotated with                 compounds by comparison to reference spectra
examples from our laboratory are shown in the           from standards, literature references, or even
Electronic Supplementary Material.                      internet sources.

Two other changes allowed more convenient               Elemental compositions were generated from
data entry for searching by the user. The first         monoisotopic masses utilizing the Waters
change was the ability for the user to enter the        Elemental Composition Program within

                                                    2
MassLynx V4.1 software which included i-FIT             web-based version of SciFinder. The CAS
software for numerically ranking the observed           Registry can only be searched by elemental
isotopic pattern to the theoretical ones. Our           composition, molecular weight, or nominal
LCT accurate mass measurements typically                molecular weight, but not monoisotopic mass.
yielded a standard deviation of 5 ppm. Thus,            The ability to search by the monoisotopic mass
windows of +/-18 ppm (slightly greater than 3           is much more useful because the standard
standard deviations) were employed in                   deviation for its determination is much lower
examples from our laboratory to insure                  than that for the molecular weight. In addition,
inclusions of all reasonable candidates for             the monoisotopic mass is calculated by all
determining elemental compositions or for               manufacturers’ data systems whereas the
searching ChemSpider by monoisotopic masses.            molecular weight must be manually calculated
In literature studies, windows of approximately         by the user.
+/-5 ppm were employed because many
currently available accurate mass time-of-flight        One significant advantage of searching the CAS
instruments yield standard deviations                   Registry versus the ChemSpider database is the
approaching 1 ppm for mass measurements.                ability of the former to search its > 34 million
                                                        document records associated with an elemental
Results and Discussion                                  composition by key words [1]. Only very
                                                        minimal sample history is required to quickly
Descriptions of Two Approaches                          obtain useful candidates for tentative
                                                        identifications by this approach. This capability
The ability to search by either elemental               can be especially useful in identifying more
composition or monoisotopic mass is very                obscure “known unknowns” with fewer
valuable for the identification of “known               associated references. This capability is not
unknowns” using accurate mass mass                      currently available with ChemSpider.
spectrometry data. In the work described
within this article, the ChemSpider user                Evaluation of the Two Approaches with
interface was modified to permit the sorting of         Literature Examples
either the elemental composition or
monoisotopic mass search results by the                 A group of 90 compounds was assembled from
number of associated references. The results,           literature sources [5-8], internet sites, and
sorted in descending order, normally bring the          American Society of Mass Spectrometry
most useful entries to the top of the list. These       Conference presentations to evaluate the two
candidate structures are then scrutinized [1] by        approaches in ChemSpider. The results are
other available data such as in-source CID              summarized in Tables 1 and 2. Searching the
spectra, UV-Vis spectra, GC/MS data, number of          ChemSpider database by elemental
exchangeable protons, NMR data, etc. to                 compositions then sorting in descending order
ultimately obtain the identification of the             by the number of overall references yielded
compound of interest. For critical                      more target compounds highly ranked as
identifications, a standard of the material is          compared to the same approach using
normally obtained and its LC retention time,            monoisotopic masses. This is to be expected
UV-Vis spectrum, and in-source CID spectra are          because the number of overall candidates was
compared to those of the unknown.                       less when searching by elemental composition
                                                        (mean = 513, median = 310) compared to
A very similar approach [1] was demonstrated            monoisotopic mass (mean = 800, median =
to be very useful utilizing the CAS Registry of         740). However, the overall number with
>62 million substances, a fee-based service,            rankings less than or equal to 5 was acceptable
which is searched by either STN Express or the          in both cases. Thus, searching by elemental

                                                    3
composition is the preferred approach, but             reasonable approach when a unique elemental
searching by a monoisotopic mass is also a             composition cannot be readily determined.

Table 1. Searching ChemSpider by elemental composition then sorting by number of associated
references.

                                           Number        Position of Compound Sorted in Descending
         Class of Compounds               Compounds            Order by Number of References
                                            in Class     #1      #2     #3     #4      #5      >#5
Drugs                                          45        43      1       1
Pesticides                                      8         7      1
Toxins                                          2         2
Polymer antioxidants                           15        15
Polymer UV Stabilizers                         10         8       1      1
Polymer Clarifying agent (Irgaclear DM)         1                                             1(14)
Polyurethane additives                          4         2      1                      1
Natural products                                3         2              1
Herbicide (clofibric acid)                     1          1
Artificial sweetener (Sucralose)                1         1
        Total Compounds ChemSpider             90        81      4       3              1       1
       Total Compounds CAS Registry [1]        90        84      4       1              1

Table 2. Searching ChemSpider by monoisotopic mass with +/- 5 ppm window.

                                           Number        Position of Compound Sorted in Descending
         Class of Compounds               Compounds            Order by Number of References
                                            in Class     #1      #2     #3     #4      #5     >#5
Drugs                                          45        43      1       1
Pesticides                                      8         7      1
Toxins                                          2         2
Polymer antioxidants                           15        13              1                     1
Polymer UV Stabilizers                         10         6      1       1              1     1(8)
Polymer Clarifying agent (Irgaclear DM)         1                                              1
Polyurethane additives                          4         2      1                      1
Natural products                                3         2              1
Herbicide (clofibric acid)                     1          1
Artificial sweetener (Sucralose)                1         1
        Total Compounds ChemSpider             90        77      4       4              2      3

The same 90 compounds were previously                  comparison can be made in Table 2 for the
evaluated with the CAS Registry searching by           results obtained with ChemSpider.
elemental compositions [1] using the web-
based version of SciFinder. Similar results (see
bottom of Table 1) were obtained using either
ChemSpider or the CAS registry as databases for
these limited number of test compounds. The
CAS Registry currently cannot be searched by
monoisotopic mass [1], therefore, no direct

                                                   4
Example from Our Laboratory for UV Stabilizer           found which were sorted in descending order
Identification in a Commercial Polymer                  by the number of overall associated references.
                                                        The top candidate was Tinuvin 328. The
An unknown additive, noted in a commercial              proposed structure was consistent with the
polymer sample, was characterized by accurate           accurate mass in-source CID spectrum (Scheme
mass LC/MS. The accurate mass data was used             1) which showed two very significant losses of
in conjunction with isotopic abundance                  C5H10 from the protonated molecule, [M + H]+,
information to obtain an elemental composition          and the presence of a C5H11+ ion at a m/z value
of C22H29N3O. ChemSpider was searched by the            of 71.085.
elemental composition and 1135 hits were




Scheme 1. In-source CID fragmentation for Tinuvin 328

The confidence afforded by the initial data
allowed the identity to be reported to the              Advantage of Searching Higher MW Compounds
customer. At a later date, the identification of        by Monoisotopic Mass Data
the additive was confirmed by comparison of its
LC retention time, UV-Vis spectrum, and in-             It is often difficult to determine a unique
source CID mass spectra to those of a                   elemental composition for “known unknowns”
purchased reference sample.                             with molecular weights greater than 600 Da [9,
                                                        10]. Either there are two or more possible
ChemSpider was also searched for protonated             elemental compositions for the unknown or
molecules with m/z 352.239 using a m/z                  one is inadvertently excluded if the somewhat
window of +/-18 ppm. There were 1459 hits               subjective user settings are set too narrow for
and Tinuvin 328 was still the top candidate. The        elements present, range of elements, double
mass precision of our older instrumentation is          bond equivalents, etc. In theory, the number of
relatively large, but newer time-of-flight              elemental compositions increases dramatically
instrumentation afford much better mass                 as the molecular weight increases. However, in
precision, which would return a smaller number          practice, the number of elemental compositions
of candidates from the search. Screen displays          at molecular weights greater than 600 Da
from the ChemSpider interface for the                   decreases dramatically in both ChemSpider (see
identification of Tinuvin 328 by both                   Figure 1) and CAS Registry databases [1].
approaches are shown in the Electronic
Supplementary Material.




                                                   5
8,000,000
       No. of Compounds   7,000,000
                          6,000,000
                          5,000,000
                          4,000,000
                          3,000,000
                          2,000,000
                          1,000,000
                                 0




                                                    Molecular Weight Ranges


Figure 1. Number of ChemSpider entries versus molecular weight ranges

Therefore, it is much more reasonable to search           of examples. Of course, ex post facto, it is still
for “known unknowns” in these databases using             extremely important to compare the isotopic
monoisotopic mass instead of struggling to                abundances of the candidate structures from
determine a unique elemental composition for              ChemSpider to those of the unknown. The
searching. The rankings noted by number of                ability to calculate, rank, and compare
references for several higher molecular weight            theoretical isotopic abundances for elemental
compounds noted in the literature [9,10] are              compositions to those of observed ones is
compared to elemental composition searches                normally a standard option in most mass
versus monoisotopic mass searches in Table 3.             spectrometry manufacturers’ software. The
There is essentially no significant penalty noted         CAS and ChemSpider identification numbers for
for searching by monoisotopic mass instead of             compounds in Table 3 are included in the
elemental composition for this limited number             Electronic Supplementary Material.




                                                      6
Table 3: Comparison of Results Searching Compounds MW>600 Da by Elemental Composition and
Monoisotopic Mass

                                                                         Rank        Rank Monoisotopic
                             Elemental              Monoisotopic
      Compound                                                        Elemental      Mass using +/-5 ppm
                            Composition                Mass
                                                                     Composition          window
      Moxidectin             C37H53NO8                639.3771          1 of 5             1 of 39
     Erythromycin            C37H67NO13               733.4612         1 of 42             1 of 53
        Digoxin               C41H64O14               780.4296         1 of 47             1 of 65
       Rifampicin            C43H58N4O12              822.4051         1 of 29             1 of 96
      Rapamycin             C51H79NO13                913.5551         1 of 43             1 of 51
    Amphotericin B          C47H73NO17                923.4878         1 of 33             1 of 42
     Gramicidin S          C60H92N12O10              1140.7059          1 of 5             1 of 13
       Cereulide            C57H96N6O18              1152.6781          1 of 3              2 of 8
     Cyclosporin A         C62H111N11O12             1201.8414         1 of 36             1 of 38
      Vancomycin          C66H75Cl2N9O24             1447.4302         1 of 24             1 of 26
   Perfluorotriazine      C30H18N3O6P3F48            1520.9642          1 of 1              1 of 1
     Thiostrepton         C72H85N19O18S5             1663.4924          1 of 5              1 of 5

Example from Our Laboratory for the                       laboratory for in-source CID [1] and a typical
Identification of a Higher Molecular Weight               example is shown in the Electronic
Antioxidant in a Commercial Polymer                       Supplementary Material.

An unknown additive, noted in a commercial                ChemSpider was searched by the monoisotopic
polymer sample, was characterized by accurate             mass for the ammonium adduct with a mass
mass LC/MS. The ammonium adduct, [M +                     window of 18 ppm (see Electronic
NH4]+, of the component was observed at a m/z             Supplementary Material) and 23 candidates
value of 801.558. The observed ion was                    were obtained. The top candidate was
confirmed to be an ammonium adduct because                Goodrite 3114. Only two of the 23 candidates
at higher in-source CID energies the ammonium             had isotopic abundances consistent with that of
adduct intensity was reduced and the intensity            the unknown. Of these two, only the in-source
of the sodium adduct was noted to increase.               CID mass spectrum (Scheme 2) of Goodrite
This increase in absolute intensity of the sodium         3114 was consistent with that for the unknown.
adduct and corresponding decrease in that of
the ammonium adduct is routinely noted in our




                                                      7
Scheme 2. In-source CID fragmentation for Goodrite 3114

At a later date, the LC retention time, UV-Vis          Some changes in both the current API and the
spectrum, and in-source CID mass spectra of a           vendors’ programs would be required to utilize
commercial sample were shown to be identical            associated references and comparison of
to those of the unknown.                                isotopic abundances to facilitate increases in
                                                        data processing speeds. When searching by
Future Enhancements to Improve Productivity             monoisotopic mass, the isotopic abundances of
                                                        the candidates should then be calculated in the
ChemSpider currently supports a web                     vendor’s data system and compared to the
application program interface (API) via web             observed isotopic abundance of the unknown
services                                                and sorted in descending order by either the
(http://www.chemspider.com/MassSpecAPI.as               number of references or the fit of the isotopic
mx) for batch queries by external vendors’              abundances to those of the unknown. This
programs. Several companies, including Bruker           would be particularly useful for “known
Daltonics, Thermo Scientific, Waters, and               unknowns” with molecular weights greater than
Agilent Technologies, integrate their data              600 Da, but would be also beneficial for lower
processing programs with ChemSpider. Users              molecular weight compounds. Alternatively,
who integrate using the web services, including         the ChemSpider software could be enhanced to
these vendors, use a token supplied by                  perform the calculations of the isotopic
ChemSpider to authenticate their identity to the        abundances of the candidates for comparison
web service. Currently there are no similar             to the unknowns.
capabilities for such queries of the CAS Registry
using either SciFinder or STN Express.                  Even with the above changes, excessive time
                                                        would still be needed to manually compare the
                                                        observed CID spectrum of an unknown to

                                                    8
fragments expected by the user from the                    elemental composition and monoisotopic mass
candidates’ structures. If in silico CID spectra,          are listed as C12H19NO2 and 209.1216,
i.e. theoretical fragmentation with predicted              respectively, in ChemSpider. It would be more
abundances, could be calculated [11,12] and                beneficial to parse this type of organic cation as
ranked for the candidate structures from either            C10H16N.C2H3O2 then list the elemental
ChemSpider or SciFinder SDF (Structure Data                composition and monoisotopic mass for
Format) files [13], further data processing                searching, respectively, as C10H16N and
speeds would be realized. The results of the               150.1277.
numeric structure ranking, isotopic abundance,
and number of references could then be more                Amphoteric (inner salt, zwitterionic) species are
quickly examined by the user to yield tentative            listed in the database with no modifications.
identifications.                                           For example, (CH3)3N+CH2CO2-, trimethylglycine,
                                                           is listed as C5H11NO2 with monoisotopic mass of
Both ChemSpider and the web-based SciFinder                117.079. This type of compound would yield [M
are able to export structures for the candidate            + H]+ and [M + acetate]- ions, respectively, in
compounds in SDF. The web-based version                    positive and negative ion electrospray analyses
SciFinder permits the export of up to 500                  using acetate in the LC eluent. Thus the
structures in an SDF file. There is no limit to the        elemental composition for the species would
number of structures that can be exported in               need to be corrected before searching.
the ChemSpider API and a limit of 10,000 in the
web interface.                                             Conclusions

Charged Species in ChemSpider                              Modifications in the ChemSpider interface to
                                                           sort elemental composition and monoisotopic
More development work needs to be                          mass search results by the number of
performed to facilitate the searching of charged           associated references in descending order
species by either elemental composition or                 offered significantly improved capabilities for
monoisotopic mass. Sodium benzoate                         the identification of “known unknowns” using
illustrates the current state of affairs for organic       accurate mass mass spectrometry data. Other
anions. Its elemental composition and                      changes were made to allow easier input of
monoisotopic mass are listed in ChemSpider as              data by users to specify the type of ion adduct
C7H5NaO2 and 144.018724, respectively. It                  and to correct the monoisotopic mass for the
would be more useful to parse the elemental                mass of an electron. Further enhancements are
composition as C7H6O2.Na (neutral benzoic acid             still needed to improve the overall productivity
to left of period, associated cation to the right)         of the process and to enable the searching of
then list the elemental composition and                    charged species.
monoisotopic mass for searching as C7H6O2 and
122.037, respectively. This is the format                  The elemental composition search is the
employed in the CAS Registry [1]. In reverse               preferred one in the lower molecular weight
phase chromatography electrospray mass                     range (200-600 Da), but even the monoisotopic
spectrometry, the organic anions elute as their            mass search with reasonable error windows is
free acid form and are normally detected as [M             viable in this range. Monoisotopic mass
- H]- ions in the negative ion mode. Thus, their           searching for compounds with a molecular
identity is independent of their associated                weight >600 Da is preferred when it is difficult
cation.                                                    to determine a unique elemental composition.
                                                           The resulting candidates from the monoisotopic
A simple example of an organic cation is N,N,N-            mass search are then ranked by comparing their
trimethyl-N-benzylammonium acetate whose                   calculated isotopic abundances by the

                                                       9
manufacturers’ isotope abundance programs to             2. Little, J.L.; Williams, A. J.; Tkachenko, V.;
that of the unknown.                                     Pshenichnov, A: Accurate mass measurements:
                                                         identifying “known unknowns” using
The ability to search monoisotopic mass in               ChemSpider. Proceedings of the ASMS, Denver,
ChemSpider is an important function which is             CO, 2010.
absent in search engines for the CAS Registry.           3. Mamer, O. A.; Lesimple, A.: Common
However, CAS Registry searches can be further            shortcomings in computer programs calculating
refined by key words which can be particularly           molecular weights. J.Am.Soc.Mass Spectrom.
useful for more obscure “known unknowns”                 14, 626 (2004).
with few associated references. This option is           4. Ferrer, I; Thurman, E. M.: Importance of the
not currently available in ChemSpider. The               electron mass in the calculations of exact mass
ability to search by both databases is very              by time-of-flight mass spectrometry. Rapid
desirable depending on the problem at hand               Commun. Mass Spectrom. 21, 2538-2539
when costs are not a limitation. ChemSpider is           (2007).
provided at no cost to the community, but                5. Hogenboom, A. C.; van Leerdam, J. A.; de
there is a fee associated with the utilization of        Voogt, P.: Accurate mass screening and
the CAS Registry.                                        identification of emerging contaminants in
                                                         environmental samples by liquid
In all cases, other data such as sample history,         chromatography-hybrid linear ion trap orbitrap
UV-Vis data, types of ions observed,                     mass spectrometry. J. Chromatogr. A, 1216,
exchangeable protons, CID spectra, etc. are              510-519 (2009).
needed to scrutinize the candidate structures            6. Liao, W.; Draper, W. M.; Perera, S. K.:
from the elemental composition and                       Identification of unknowns in atmospheric
monoisotopic mass search results. For ultimate           pressure ionization mass spectrometry using a
confirmation of structure, analysis of a standard        mass to structure search engine. Anal. Chem.
material under identical conditions is always            80, 7765-7777 (2008).
desirable.                                               7. Garcia-Reyes, J. F.; Ferrer, I.; Thurman, E. M.;
                                                         Molina-Diaz, A.; Fernandez-Alba, A. R.:
Acknowledgements                                         Searching for non-target chlorinated pesticides
                                                         in food by liquid chromatography/time-of-flight
The authors gratefully recognize Curt Cleven             mass spectrometry. Rapid Commun. Mass
from Eastman Chemical Company, Mike Scott                Spectrom. 19, 2780-2788 (2005).
from Agilent Technologies, Inc., and Jim                 8. Ojanperä, S.; Pelander, A.; Peizing, M.; Krebs,
Lekander from Waters Corporation for their               I.; Vuori, E.; Ojanperä, I.: Isotopic pattern and
help and advice. We also thank Bill Tindall and          accurate mass determination in urine drug
Kent Morrill (retirees from Eastman Chemical             screening by liquid chromatography/time-of-
Company) for their initial work on “spectraless”         flight Mass Spectrometry. Rapid Commun. Mass
databases created from the TSCA (Toxic                   Spectrom. 20, 1161-1167 (2006).
Substances Control Act) and the Eastman                  9. Erve, J.C.; Gu, M.; Wang, Y.; DeMaio, W.;
Corporate Plant Material databases.                      Talaat, R.E.: Spectral accuracy of molecular ions
                                                         in an LTQ/Orbitrap mass spectrometer and
References                                               implications for elemental composition
                                                         determination. J. Am. Soc. Mass Spectrom. 20,
1. Little, J. L.; Cleven, C. D.; Brown, S. D.:
                                                         2058-2069 (2009).
Identification of “known unknowns” utilizing
                                                         10. Lohmann, W.; Decker, P; Barsch, A:
accurate mass data and Chemical Abstracts
                                                         Accurate elemental composition determination
Service databases. J. Am. Soc. Mass Spectrom.
                                                         and identification of molecules with > 1100 m/z
22, 348-359 (2011).

                                                    10
with UHR-TOF. Bruker Application Note # ET-23           systematic bond disconnection approach. Rapid
(2011).                                                 Comm. in Mass Spectrom., 19, 3111-3118
11. Sweeney, D. L.: Small molecules as                  (2005).
mathematical partitions. Anal. Chem. 75, 5362-          13. Dalby, A.; Nourse, J. G.; Hounshel, W. D.;
5373 (2003).                                            Gushurst, A.K.I.; Grier, D. L.; Leland, B. A.;
12. Hill, A. W.; Mortishire-Smith, J.: Automated        Laufer, J.: Description of several chemical
assignment of high-resolution collisionally             structure file formats used by computer
activated dissociation mass spectra using a             programs developed at Molecular Design
                                                        Limited. J. Chem. Inf. Comput. Sci., 32, 244-255
                                                        (1992).




                                                   11
Electronic Supplementary Material for
                              pp        y
                    DOI: 10.1007/s13361-011-0265-y

 Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider

 Journal of the American Society for Mass Spectrometry, online version

 James L. Little,a Antony J. Williams,b Alexey Pshenichnov,b Valery Tkachenkob
aEastman Chemical Company, Kingsport Tennessee, USA
                         p y       gp
bChemSpider, Royal Society of Chemistry, Cambridge, UK



 Correspondence to jameslittle@eastman.com or williamsa@rsc.org




                                                                                  1
Initial Window Selected at www.chemspider.com on Internet

                  Either Select “More Searches”

                     or Select “Advanced Search”




                                                            2
Second Window Selected in ChemSpider Interface on Internet


                          Select “Advanced”




                                                             3
UV Stabilizer Example: Three Steps for Searching by Elemental Composition

                                          1) Select “Search by Properties”
                                              2) Enter elemental composition




                         3) Select “Search”
                                                                               4
UV Stabilizer Example: Sorting “Initial” Elemental Composition Results by
                    Selecting Desired Column to Sort
                Sort descending by selecting
                “# of References” column, down arrow will appear after sorted




                                                                                5
UV Stabilizer Example: Five Steps for Searching by Monoisotopic Mass

                                         1) Enter observed m/z value
                                             2) Enter appropriate precision window
                                             for your instrument
                                               3) S l t t
                                                  Select type of i
                                                               f ion
                                                 4) Correct for mass of electron if
                                                 necessary for your instrument




                    5) Select “Search”
                                                                                      6
UV Stabilizer Example: Sorting “Initial” Monoisotopic Mass Results
                            by Desired Column
Sort descending by selecting
“# of References” column, down arrow will appear after sorted




                                                                          7
Antioxidant Example: Identification by Monoisotopic MW
                        1) Enter observed m/z value
                              2) Enter appropriate precision window
                              for your instrument

                                       3) Select type of ion
                                             4) Correct for mass of electron if
                                             necessary for your instrument




                 5) Select “Search”
                                                                                  8
Antioxidant Example: Sorting “Initial” Monoisotopic Mass Results
                    by “# of References Column”

Sort descending by selecting
“# of References” column, down arrow will appear after sorted




                                                                      9
CAS and ChemSpider Numbers for Compounds Shown in Table 3 in Text

                                Elemental        Monoisotopic
           Compound                                                  CAS No.                   ChemSpider No.
                               Composition          Mass

          M id ti c
          Moxidectin            C37H53NO8          639.3771
                                                   639 3771        113507-06-5
                                                                   113507 06 5          16736424, 7845502
                                                                                        16736424 7845502, 7878580
         Erythromycinc         C37H67NO13          733.4612         114-07-8                        12041
            Digoxinc            C41H64O14          780.4296        20830-75-5                2006532, 23089581
           Rifampicinc         C43H58N4O12         822.4051        13292-46-1               10468813, 21112299,
           Rapamycinc
           R      i            C51H79NO13          913.5551
                                                   913 5551        53123-88-9
                                                                   53123 88 9                     10482078
         Amphotericin Bc       C47H73NO17          923.4878         1397-89-3          10237579, 21111691, 23123815

          Gramicidin Sc        C60H92N12O10       1140.7059         113-73-5                        66085
           Cereulided          C57H96N6O18        1152.6781        157232-64-9           8030708, 8117283, 8232646
         Cyclosporin Ac       C62H111N11O12       1201.8414        59865-13-3                 4444325, 4447449
          Vancomycinc         C66H75Cl2N9O24      1447.4302         1404-90-6                       14253
     Perfluorotriazinec      C30H18N3O6P3F48      1520.9642        16059-16-8                     2055443
          Thiostreptonc       C72H85N19O18S5      1663.4924         1393 48 2
                                                                    1393-48-2               10469505, 24603973

cErve,J.C.; Gu, M.; Wang, Y.; DeMaio, W.; Talaat, R.E.: Spectral accuracy of molecular ions in an LTQ/Orbitrap mass
spectrometer and implications for elemental composition determination. J. Am. Soc. Mass Spectrom. 20, 2058-2069 (2009).
dLohmann, W.; Decker, P; Barsch, A: Accurate molecular formula determination and identification of molecules with > 1100

m/z with UHR-TOF. Bruker Application Note # ET-23 (2011).
  /                           pp                     (      )




                                                                                                                           10
Increase in Absolute Intensity of Sodium Adduct at Higher
                  In Source
                  In-Source CID Voltages



                                                   [M+H]+
                                                   [   ]
                                                   [M+NH4]+
                                                   [M+Na]+




                                  (volts)

                      m/z 329




                 Chemical Formula: C30H58O4S
                         MW 514 4
                              514.4




                 m/z 143                m/z 89




                                                              11

Mais conteúdo relacionado

Mais procurados

Virtual screening of chemicals for endocrine disrupting activity through CER...
Virtual screening of chemicals for endocrine disrupting activity through  CER...Virtual screening of chemicals for endocrine disrupting activity through  CER...
Virtual screening of chemicals for endocrine disrupting activity through CER...Kamel Mansouri
 
Connecting chemistry-to-biology
Connecting chemistry-to-biology Connecting chemistry-to-biology
Connecting chemistry-to-biology Chris Southan
 
Data drivenapproach to medicinalchemistry
Data drivenapproach to medicinalchemistryData drivenapproach to medicinalchemistry
Data drivenapproach to medicinalchemistryAnn-Marie Roche
 
IRJET- Modelling BOD and COD using Artificial Neural Network with Factor Anal...
IRJET- Modelling BOD and COD using Artificial Neural Network with Factor Anal...IRJET- Modelling BOD and COD using Artificial Neural Network with Factor Anal...
IRJET- Modelling BOD and COD using Artificial Neural Network with Factor Anal...IRJET Journal
 
Environmental Cheminformatics for Unknown ID UC Davis Nov 2018
Environmental Cheminformatics for Unknown ID UC Davis Nov 2018Environmental Cheminformatics for Unknown ID UC Davis Nov 2018
Environmental Cheminformatics for Unknown ID UC Davis Nov 2018Emma Schymanski
 
Closing the gap between chemistry and biology: Joining between text tombs and...
Closing the gap between chemistry and biology: Joining between text tombs and...Closing the gap between chemistry and biology: Joining between text tombs and...
Closing the gap between chemistry and biology: Joining between text tombs and...Chris Southan
 
Small Molecules in Big Data - Analytica Munich
Small Molecules in Big Data - Analytica MunichSmall Molecules in Big Data - Analytica Munich
Small Molecules in Big Data - Analytica MunichEmma Schymanski
 
Small Molecules in Big Data fTALES Ghent
Small Molecules in Big Data fTALES GhentSmall Molecules in Big Data fTALES Ghent
Small Molecules in Big Data fTALES GhentEmma Schymanski
 
Collaboraive sharing of molecules and data in the mobile age
Collaboraive sharing of molecules and data in the mobile ageCollaboraive sharing of molecules and data in the mobile age
Collaboraive sharing of molecules and data in the mobile ageSean Ekins
 
The IUPHAR/BPS Guide to PHARAMCOLOGY in 2018: new features and updates
The IUPHAR/BPS Guide to PHARAMCOLOGY in 2018: new features and updatesThe IUPHAR/BPS Guide to PHARAMCOLOGY in 2018: new features and updates
The IUPHAR/BPS Guide to PHARAMCOLOGY in 2018: new features and updatesGuide to PHARMACOLOGY
 
Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...Maulik Kamdar
 

Mais procurados (20)

Web-based access to data for >600 disinfection by-products via the EPA CompTo...
Web-based access to data for >600 disinfection by-products via the EPA CompTo...Web-based access to data for >600 disinfection by-products via the EPA CompTo...
Web-based access to data for >600 disinfection by-products via the EPA CompTo...
 
Virtual screening of chemicals for endocrine disrupting activity through CER...
Virtual screening of chemicals for endocrine disrupting activity through  CER...Virtual screening of chemicals for endocrine disrupting activity through  CER...
Virtual screening of chemicals for endocrine disrupting activity through CER...
 
Connecting chemistry-to-biology
Connecting chemistry-to-biology Connecting chemistry-to-biology
Connecting chemistry-to-biology
 
Data drivenapproach to medicinalchemistry
Data drivenapproach to medicinalchemistryData drivenapproach to medicinalchemistry
Data drivenapproach to medicinalchemistry
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
 
IRJET- Modelling BOD and COD using Artificial Neural Network with Factor Anal...
IRJET- Modelling BOD and COD using Artificial Neural Network with Factor Anal...IRJET- Modelling BOD and COD using Artificial Neural Network with Factor Anal...
IRJET- Modelling BOD and COD using Artificial Neural Network with Factor Anal...
 
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental scienceUS-EPA Chemicals Dashboard – an integrated data hub for environmental science
US-EPA Chemicals Dashboard – an integrated data hub for environmental science
 
Environmental Chemistry Compound Identification Using High Resolution Mass Sp...
Environmental Chemistry Compound Identification Using High Resolution Mass Sp...Environmental Chemistry Compound Identification Using High Resolution Mass Sp...
Environmental Chemistry Compound Identification Using High Resolution Mass Sp...
 
Environmental Cheminformatics for Unknown ID UC Davis Nov 2018
Environmental Cheminformatics for Unknown ID UC Davis Nov 2018Environmental Cheminformatics for Unknown ID UC Davis Nov 2018
Environmental Cheminformatics for Unknown ID UC Davis Nov 2018
 
Closing the gap between chemistry and biology: Joining between text tombs and...
Closing the gap between chemistry and biology: Joining between text tombs and...Closing the gap between chemistry and biology: Joining between text tombs and...
Closing the gap between chemistry and biology: Joining between text tombs and...
 
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox Chemicals Dash...
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox  Chemicals Dash...PFAS Chemistry: Range, Complexity, Groupings, and the CompTox  Chemicals Dash...
PFAS Chemistry: Range, Complexity, Groupings, and the CompTox Chemicals Dash...
 
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...What chemicals constitute the Exposome? Accessing data via the US EPA’s  Comp...
What chemicals constitute the Exposome? Accessing data via the US EPA’s Comp...
 
Small Molecules in Big Data - Analytica Munich
Small Molecules in Big Data - Analytica MunichSmall Molecules in Big Data - Analytica Munich
Small Molecules in Big Data - Analytica Munich
 
Small Molecules in Big Data fTALES Ghent
Small Molecules in Big Data fTALES GhentSmall Molecules in Big Data fTALES Ghent
Small Molecules in Big Data fTALES Ghent
 
Collaboraive sharing of molecules and data in the mobile age
Collaboraive sharing of molecules and data in the mobile ageCollaboraive sharing of molecules and data in the mobile age
Collaboraive sharing of molecules and data in the mobile age
 
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
The EPA Online Prediction Physicochemical Prediction Platform to Support Envi...
 
ChemSpider as a hub for online chemical information resources
ChemSpider as a hub for online chemical information resources   ChemSpider as a hub for online chemical information resources
ChemSpider as a hub for online chemical information resources
 
How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...
 
The IUPHAR/BPS Guide to PHARAMCOLOGY in 2018: new features and updates
The IUPHAR/BPS Guide to PHARAMCOLOGY in 2018: new features and updatesThe IUPHAR/BPS Guide to PHARAMCOLOGY in 2018: new features and updates
The IUPHAR/BPS Guide to PHARAMCOLOGY in 2018: new features and updates
 
Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...Current advances to bridge the usability-expressivity gap in biomedical seman...
Current advances to bridge the usability-expressivity gap in biomedical seman...
 

Destaque

Destaque (20)

December
DecemberDecember
December
 
Alison jewelry designer
Alison   jewelry designerAlison   jewelry designer
Alison jewelry designer
 
El mundo esta_loco-6993-6993
El mundo esta_loco-6993-6993El mundo esta_loco-6993-6993
El mundo esta_loco-6993-6993
 
Mood board
Mood boardMood board
Mood board
 
Copia practica de diapositivas
Copia practica de diapositivasCopia practica de diapositivas
Copia practica de diapositivas
 
De Evenement Assistent - Peter van Eick
De Evenement Assistent - Peter van EickDe Evenement Assistent - Peter van Eick
De Evenement Assistent - Peter van Eick
 
Christopher Wellbelove, BT
Christopher Wellbelove, BTChristopher Wellbelove, BT
Christopher Wellbelove, BT
 
Unit I sayings grammar practice
Unit I sayings grammar practiceUnit I sayings grammar practice
Unit I sayings grammar practice
 
C# дээр инстал хийх арга
C# дээр инстал хийх аргаC# дээр инстал хийх арга
C# дээр инстал хийх арга
 
Why five?
Why five?Why five?
Why five?
 
Mugatzailea (1)
Mugatzailea (1)Mugatzailea (1)
Mugatzailea (1)
 
Power point
Power pointPower point
Power point
 
แสงกระจาย
แสงกระจายแสงกระจาย
แสงกระจาย
 
Social Media: Insight ou Procrastinação?
Social Media: Insight ou Procrastinação?Social Media: Insight ou Procrastinação?
Social Media: Insight ou Procrastinação?
 
Manualul profesorului
Manualul profesoruluiManualul profesorului
Manualul profesorului
 
Trifold
TrifoldTrifold
Trifold
 
המהפכה האמריקאית
המהפכה האמריקאיתהמהפכה האמריקאית
המהפכה האמריקאית
 
Lucagbo.... lab2 2 tree pests
Lucagbo.... lab2 2 tree pestsLucagbo.... lab2 2 tree pests
Lucagbo.... lab2 2 tree pests
 
Validation Class
Validation ClassValidation Class
Validation Class
 
Housing complex empyrean group
Housing complex empyrean groupHousing complex empyrean group
Housing complex empyrean group
 

Semelhante a Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider

Cadd and molecular modeling for M.Pharm
Cadd and molecular modeling for M.PharmCadd and molecular modeling for M.Pharm
Cadd and molecular modeling for M.PharmShikha Popali
 
Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...Valery Tkachenko
 
article_wjpps_14600076161.pdf
article_wjpps_14600076161.pdfarticle_wjpps_14600076161.pdf
article_wjpps_14600076161.pdfhamidpasha6
 
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MININGANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MININGijbbjournal
 
Clasificacion biofarmaceutica y benchmarketing
Clasificacion biofarmaceutica y benchmarketingClasificacion biofarmaceutica y benchmarketing
Clasificacion biofarmaceutica y benchmarketingHector Gonzalez
 
Mass Spectrometry Applications and spectral interpretation: Basics
Mass Spectrometry Applications and spectral interpretation: BasicsMass Spectrometry Applications and spectral interpretation: Basics
Mass Spectrometry Applications and spectral interpretation: BasicsShreekant Deshpande
 
Proteomics 2009 V9p1696
Proteomics 2009 V9p1696Proteomics 2009 V9p1696
Proteomics 2009 V9p1696jcruzsilva
 
Target oriented generic fingerprint-based molecular representation
Target oriented generic fingerprint-based molecular representationTarget oriented generic fingerprint-based molecular representation
Target oriented generic fingerprint-based molecular representationcsandit
 

Semelhante a Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider (20)

ChemSpider: Connecting Chemistry & Mass Spectrometry on the Internet
ChemSpider: Connecting Chemistry & Mass Spectrometry on the Internet ChemSpider: Connecting Chemistry & Mass Spectrometry on the Internet
ChemSpider: Connecting Chemistry & Mass Spectrometry on the Internet
 
Metabolomics seminarslides 013111final 110201
Metabolomics seminarslides 013111final 110201Metabolomics seminarslides 013111final 110201
Metabolomics seminarslides 013111final 110201
 
Structure verification and elucidation using the ChemSpider database
Structure verification and elucidation using the ChemSpider databaseStructure verification and elucidation using the ChemSpider database
Structure verification and elucidation using the ChemSpider database
 
Cadd and molecular modeling for M.Pharm
Cadd and molecular modeling for M.PharmCadd and molecular modeling for M.Pharm
Cadd and molecular modeling for M.Pharm
 
Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...Using publicly available resources to build a comprehensive knowledgebase of ...
Using publicly available resources to build a comprehensive knowledgebase of ...
 
Using online chemistry databases to facilitate structure identification in ma...
Using online chemistry databases to facilitate structure identification in ma...Using online chemistry databases to facilitate structure identification in ma...
Using online chemistry databases to facilitate structure identification in ma...
 
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...
The EPA Comptox Chemistry Dashboard: A Web-Based Data Integration Hub for Tox...
 
article_wjpps_14600076161.pdf
article_wjpps_14600076161.pdfarticle_wjpps_14600076161.pdf
article_wjpps_14600076161.pdf
 
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MININGANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
 
How ACDLabs Software Tools are used by the Royal Society of Chemistry
How ACDLabs Software Tools are used by the Royal Society of ChemistryHow ACDLabs Software Tools are used by the Royal Society of Chemistry
How ACDLabs Software Tools are used by the Royal Society of Chemistry
 
C044041723
C044041723C044041723
C044041723
 
Clasificacion biofarmaceutica y benchmarketing
Clasificacion biofarmaceutica y benchmarketingClasificacion biofarmaceutica y benchmarketing
Clasificacion biofarmaceutica y benchmarketing
 
Mass Spectrometry Applications and spectral interpretation: Basics
Mass Spectrometry Applications and spectral interpretation: BasicsMass Spectrometry Applications and spectral interpretation: Basics
Mass Spectrometry Applications and spectral interpretation: Basics
 
How an Online Resource for Chemistry Can Change Our World
How an Online Resource for Chemistry Can Change Our WorldHow an Online Resource for Chemistry Can Change Our World
How an Online Resource for Chemistry Can Change Our World
 
Data handling metabolomics
Data handling metabolomicsData handling metabolomics
Data handling metabolomics
 
Proteomics 2009 V9p1696
Proteomics 2009 V9p1696Proteomics 2009 V9p1696
Proteomics 2009 V9p1696
 
US-EPA CompTox Chemicals Dashboard as a web-based data resource to help ident...
US-EPA CompTox Chemicals Dashboard as a web-based data resource to help ident...US-EPA CompTox Chemicals Dashboard as a web-based data resource to help ident...
US-EPA CompTox Chemicals Dashboard as a web-based data resource to help ident...
 
Target oriented generic fingerprint-based molecular representation
Target oriented generic fingerprint-based molecular representationTarget oriented generic fingerprint-based molecular representation
Target oriented generic fingerprint-based molecular representation
 
Automatic vs manual curation of a multisource chemical dictionary
Automatic vs manual curation of a multisource chemical dictionaryAutomatic vs manual curation of a multisource chemical dictionary
Automatic vs manual curation of a multisource chemical dictionary
 
Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...
Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...
Applications of the US EPA’s CompTox Chemistry Dashboard to support structure...
 

Último

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 

Último (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 

Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider

  • 1. *********************************************************************************** Note: The final approved version of this paper can be found at www.springerlink.com, http://dx.doi.org/10.1007/s13361-011-0265-y. This is a prepress version of the article. *********************************************************************************** Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider James L. Little,1 Antony J. Williams,2 Alexey Pshenichnov,2 Valery Tkachenko2 1 Eastman Chemical Company, Kingsport, TN 37662, USA 2 ChemSpider, Royal Society of Chemistry, Cambridge, UK Correspondence to: J. L. Little; e-mail: jameslittle@eastman.com, Antony J. Williams; e-mail: williamsa@rsc.org Abstract In many cases, an unknown to an investigator is actually known in the chemical literature, a reference database, or an internet resource. We refer to these types of compounds as “known unknowns.” ChemSpider is a very valuable internet database of known compounds useful in the identification of these types of compounds in commercial, environmental, forensic, and natural product samples. The database contains over 26 million entries from hundreds of data sources and is provided as a free resource to the community. Accurate mass mass spectrometry data is used to query the database by either elemental composition or a monoisotopic mass. Searching by elemental composition is the preferred approach. However, it is often difficult to determine a unique elemental composition for compounds with molecular weights greater than 600 Da. In these cases, searching by the monoisotopic mass is advantageous. In either case, the search results are refined by sorting the number of references associated with each compound in descending order. This raises the most useful candidates to the top of the list for further evaluation. These approaches were shown to be successful in identifying “known unknowns” noted in our laboratory and for compounds of interest to others. W e have previously demonstrated [1] that searching the Chemical Abstracts Service (CAS) Registry employing accurate mass mass spectrometry data is a very useful approach for the There is a particular need for additional approaches for the identification of “known unknowns” found in liquid chromatography/mass spectrometry (LC/MS) analyses because the availability of computer identification of “known unknowns.” We define searchable collision-induced dissociation (CID) a “known unknown” as a compound which is mass spectral databases is limited for LC/MS [1] unknown to the investigator but is known in the as compared to that of electron ionization (EI) chemical literature, a reference database, or an mass spectral databases. We have found that internet resource. searching “spectraless” databases such as the 1
  • 2. CAS Registry [1] by elemental compositions or m/z value directly for a charged species and molecular weights then sorting the hit list in then select the type of ionic species from a pull- descending order by the number of associated down (drop-down) menu. Examples of typical references to be very effective in the types of ionic species in the menu include [M + identification of “known unknowns.” The most Na]+, [M + NH4]+, [M - H]-, etc. The m/z value likely candidates are normally brought to the entered is then automatically adjusted by the top of the list where they can be further program before searching the monoisotopic scrutinized by employing additional data. mass of the neutral species as it would appear in the ChemSpider database. ChemSpider is another very large “spectraless” database that can be searched by elemental The second change was the ability for the composition, molecular weight, or entered m/z value of the charged species to be monoisotopic mass and it is provided as a free corrected for the mass of an electron. Some resource to the community. ChemSpider manufacturers’ data systems do not properly contains >26 million entries as compared to the calculate the m/z values of ions for the mass of CAS Registry that contains >62 million an electron [3, 4]. The errors normally cancel substances. In our current work, the within the manufacturers’ elemental ChemSpider interface was modified [2] such composition programs because the reference that the initial search results can be sorted by calibration tables are also not corrected. the number of references associated with an However, all data exported into other entry. The approach was then evaluated with a applications for further data processing should wide variety of compounds and compared to be corrected. previous results obtained with the CAS Registry [1]. Acquisition of Accurate Mass Data Experimental The experimental detail including calibrations, typical chromatographic separations, and ChemSpider Software Modifications sample preparations were previously described in detail [1]. Briefly, the accurate mass Several changes were made in the ChemSpider electrospray LC/MS/UV-Vis (ultraviolet-visible) software interface to facilitate our studies [2]. data were obtained on a LCT time-of-flight mass The most important change was the ability to spectrometer (Waters Corporation, Milford, sort the initial search results in descending MA) equipped with a LockSpray secondary ESI order by the number of data sources or probe. An 1100 Series liquid chromatograph associated references. The references in with autosampler, degasser, and UV-Vis diode ChemSpider originate from the SureChem array spectrophotometer (Agilent Technologies, patent database (>20 million), PubMed (>20 Santa Clara, CA) was employed for the million articles), and the content of the Royal separations in the reversed phase mode. The Society of Chemistry publishing database. In UV-Vis spectral data is very useful in locating our work, sorting the “# of References” column particular classes of compounds (UV absorbers, was found to be the most useful. Many screen dyes, etc.) and in confirming the identity of displays of the software annotated with compounds by comparison to reference spectra examples from our laboratory are shown in the from standards, literature references, or even Electronic Supplementary Material. internet sources. Two other changes allowed more convenient Elemental compositions were generated from data entry for searching by the user. The first monoisotopic masses utilizing the Waters change was the ability for the user to enter the Elemental Composition Program within 2
  • 3. MassLynx V4.1 software which included i-FIT web-based version of SciFinder. The CAS software for numerically ranking the observed Registry can only be searched by elemental isotopic pattern to the theoretical ones. Our composition, molecular weight, or nominal LCT accurate mass measurements typically molecular weight, but not monoisotopic mass. yielded a standard deviation of 5 ppm. Thus, The ability to search by the monoisotopic mass windows of +/-18 ppm (slightly greater than 3 is much more useful because the standard standard deviations) were employed in deviation for its determination is much lower examples from our laboratory to insure than that for the molecular weight. In addition, inclusions of all reasonable candidates for the monoisotopic mass is calculated by all determining elemental compositions or for manufacturers’ data systems whereas the searching ChemSpider by monoisotopic masses. molecular weight must be manually calculated In literature studies, windows of approximately by the user. +/-5 ppm were employed because many currently available accurate mass time-of-flight One significant advantage of searching the CAS instruments yield standard deviations Registry versus the ChemSpider database is the approaching 1 ppm for mass measurements. ability of the former to search its > 34 million document records associated with an elemental Results and Discussion composition by key words [1]. Only very minimal sample history is required to quickly Descriptions of Two Approaches obtain useful candidates for tentative identifications by this approach. This capability The ability to search by either elemental can be especially useful in identifying more composition or monoisotopic mass is very obscure “known unknowns” with fewer valuable for the identification of “known associated references. This capability is not unknowns” using accurate mass mass currently available with ChemSpider. spectrometry data. In the work described within this article, the ChemSpider user Evaluation of the Two Approaches with interface was modified to permit the sorting of Literature Examples either the elemental composition or monoisotopic mass search results by the A group of 90 compounds was assembled from number of associated references. The results, literature sources [5-8], internet sites, and sorted in descending order, normally bring the American Society of Mass Spectrometry most useful entries to the top of the list. These Conference presentations to evaluate the two candidate structures are then scrutinized [1] by approaches in ChemSpider. The results are other available data such as in-source CID summarized in Tables 1 and 2. Searching the spectra, UV-Vis spectra, GC/MS data, number of ChemSpider database by elemental exchangeable protons, NMR data, etc. to compositions then sorting in descending order ultimately obtain the identification of the by the number of overall references yielded compound of interest. For critical more target compounds highly ranked as identifications, a standard of the material is compared to the same approach using normally obtained and its LC retention time, monoisotopic masses. This is to be expected UV-Vis spectrum, and in-source CID spectra are because the number of overall candidates was compared to those of the unknown. less when searching by elemental composition (mean = 513, median = 310) compared to A very similar approach [1] was demonstrated monoisotopic mass (mean = 800, median = to be very useful utilizing the CAS Registry of 740). However, the overall number with >62 million substances, a fee-based service, rankings less than or equal to 5 was acceptable which is searched by either STN Express or the in both cases. Thus, searching by elemental 3
  • 4. composition is the preferred approach, but reasonable approach when a unique elemental searching by a monoisotopic mass is also a composition cannot be readily determined. Table 1. Searching ChemSpider by elemental composition then sorting by number of associated references. Number Position of Compound Sorted in Descending Class of Compounds Compounds Order by Number of References in Class #1 #2 #3 #4 #5 >#5 Drugs 45 43 1 1 Pesticides 8 7 1 Toxins 2 2 Polymer antioxidants 15 15 Polymer UV Stabilizers 10 8 1 1 Polymer Clarifying agent (Irgaclear DM) 1 1(14) Polyurethane additives 4 2 1 1 Natural products 3 2 1 Herbicide (clofibric acid) 1 1 Artificial sweetener (Sucralose) 1 1 Total Compounds ChemSpider 90 81 4 3 1 1 Total Compounds CAS Registry [1] 90 84 4 1 1 Table 2. Searching ChemSpider by monoisotopic mass with +/- 5 ppm window. Number Position of Compound Sorted in Descending Class of Compounds Compounds Order by Number of References in Class #1 #2 #3 #4 #5 >#5 Drugs 45 43 1 1 Pesticides 8 7 1 Toxins 2 2 Polymer antioxidants 15 13 1 1 Polymer UV Stabilizers 10 6 1 1 1 1(8) Polymer Clarifying agent (Irgaclear DM) 1 1 Polyurethane additives 4 2 1 1 Natural products 3 2 1 Herbicide (clofibric acid) 1 1 Artificial sweetener (Sucralose) 1 1 Total Compounds ChemSpider 90 77 4 4 2 3 The same 90 compounds were previously comparison can be made in Table 2 for the evaluated with the CAS Registry searching by results obtained with ChemSpider. elemental compositions [1] using the web- based version of SciFinder. Similar results (see bottom of Table 1) were obtained using either ChemSpider or the CAS registry as databases for these limited number of test compounds. The CAS Registry currently cannot be searched by monoisotopic mass [1], therefore, no direct 4
  • 5. Example from Our Laboratory for UV Stabilizer found which were sorted in descending order Identification in a Commercial Polymer by the number of overall associated references. The top candidate was Tinuvin 328. The An unknown additive, noted in a commercial proposed structure was consistent with the polymer sample, was characterized by accurate accurate mass in-source CID spectrum (Scheme mass LC/MS. The accurate mass data was used 1) which showed two very significant losses of in conjunction with isotopic abundance C5H10 from the protonated molecule, [M + H]+, information to obtain an elemental composition and the presence of a C5H11+ ion at a m/z value of C22H29N3O. ChemSpider was searched by the of 71.085. elemental composition and 1135 hits were Scheme 1. In-source CID fragmentation for Tinuvin 328 The confidence afforded by the initial data allowed the identity to be reported to the Advantage of Searching Higher MW Compounds customer. At a later date, the identification of by Monoisotopic Mass Data the additive was confirmed by comparison of its LC retention time, UV-Vis spectrum, and in- It is often difficult to determine a unique source CID mass spectra to those of a elemental composition for “known unknowns” purchased reference sample. with molecular weights greater than 600 Da [9, 10]. Either there are two or more possible ChemSpider was also searched for protonated elemental compositions for the unknown or molecules with m/z 352.239 using a m/z one is inadvertently excluded if the somewhat window of +/-18 ppm. There were 1459 hits subjective user settings are set too narrow for and Tinuvin 328 was still the top candidate. The elements present, range of elements, double mass precision of our older instrumentation is bond equivalents, etc. In theory, the number of relatively large, but newer time-of-flight elemental compositions increases dramatically instrumentation afford much better mass as the molecular weight increases. However, in precision, which would return a smaller number practice, the number of elemental compositions of candidates from the search. Screen displays at molecular weights greater than 600 Da from the ChemSpider interface for the decreases dramatically in both ChemSpider (see identification of Tinuvin 328 by both Figure 1) and CAS Registry databases [1]. approaches are shown in the Electronic Supplementary Material. 5
  • 6. 8,000,000 No. of Compounds 7,000,000 6,000,000 5,000,000 4,000,000 3,000,000 2,000,000 1,000,000 0 Molecular Weight Ranges Figure 1. Number of ChemSpider entries versus molecular weight ranges Therefore, it is much more reasonable to search of examples. Of course, ex post facto, it is still for “known unknowns” in these databases using extremely important to compare the isotopic monoisotopic mass instead of struggling to abundances of the candidate structures from determine a unique elemental composition for ChemSpider to those of the unknown. The searching. The rankings noted by number of ability to calculate, rank, and compare references for several higher molecular weight theoretical isotopic abundances for elemental compounds noted in the literature [9,10] are compositions to those of observed ones is compared to elemental composition searches normally a standard option in most mass versus monoisotopic mass searches in Table 3. spectrometry manufacturers’ software. The There is essentially no significant penalty noted CAS and ChemSpider identification numbers for for searching by monoisotopic mass instead of compounds in Table 3 are included in the elemental composition for this limited number Electronic Supplementary Material. 6
  • 7. Table 3: Comparison of Results Searching Compounds MW>600 Da by Elemental Composition and Monoisotopic Mass Rank Rank Monoisotopic Elemental Monoisotopic Compound Elemental Mass using +/-5 ppm Composition Mass Composition window Moxidectin C37H53NO8 639.3771 1 of 5 1 of 39 Erythromycin C37H67NO13 733.4612 1 of 42 1 of 53 Digoxin C41H64O14 780.4296 1 of 47 1 of 65 Rifampicin C43H58N4O12 822.4051 1 of 29 1 of 96 Rapamycin C51H79NO13 913.5551 1 of 43 1 of 51 Amphotericin B C47H73NO17 923.4878 1 of 33 1 of 42 Gramicidin S C60H92N12O10 1140.7059 1 of 5 1 of 13 Cereulide C57H96N6O18 1152.6781 1 of 3 2 of 8 Cyclosporin A C62H111N11O12 1201.8414 1 of 36 1 of 38 Vancomycin C66H75Cl2N9O24 1447.4302 1 of 24 1 of 26 Perfluorotriazine C30H18N3O6P3F48 1520.9642 1 of 1 1 of 1 Thiostrepton C72H85N19O18S5 1663.4924 1 of 5 1 of 5 Example from Our Laboratory for the laboratory for in-source CID [1] and a typical Identification of a Higher Molecular Weight example is shown in the Electronic Antioxidant in a Commercial Polymer Supplementary Material. An unknown additive, noted in a commercial ChemSpider was searched by the monoisotopic polymer sample, was characterized by accurate mass for the ammonium adduct with a mass mass LC/MS. The ammonium adduct, [M + window of 18 ppm (see Electronic NH4]+, of the component was observed at a m/z Supplementary Material) and 23 candidates value of 801.558. The observed ion was were obtained. The top candidate was confirmed to be an ammonium adduct because Goodrite 3114. Only two of the 23 candidates at higher in-source CID energies the ammonium had isotopic abundances consistent with that of adduct intensity was reduced and the intensity the unknown. Of these two, only the in-source of the sodium adduct was noted to increase. CID mass spectrum (Scheme 2) of Goodrite This increase in absolute intensity of the sodium 3114 was consistent with that for the unknown. adduct and corresponding decrease in that of the ammonium adduct is routinely noted in our 7
  • 8. Scheme 2. In-source CID fragmentation for Goodrite 3114 At a later date, the LC retention time, UV-Vis Some changes in both the current API and the spectrum, and in-source CID mass spectra of a vendors’ programs would be required to utilize commercial sample were shown to be identical associated references and comparison of to those of the unknown. isotopic abundances to facilitate increases in data processing speeds. When searching by Future Enhancements to Improve Productivity monoisotopic mass, the isotopic abundances of the candidates should then be calculated in the ChemSpider currently supports a web vendor’s data system and compared to the application program interface (API) via web observed isotopic abundance of the unknown services and sorted in descending order by either the (http://www.chemspider.com/MassSpecAPI.as number of references or the fit of the isotopic mx) for batch queries by external vendors’ abundances to those of the unknown. This programs. Several companies, including Bruker would be particularly useful for “known Daltonics, Thermo Scientific, Waters, and unknowns” with molecular weights greater than Agilent Technologies, integrate their data 600 Da, but would be also beneficial for lower processing programs with ChemSpider. Users molecular weight compounds. Alternatively, who integrate using the web services, including the ChemSpider software could be enhanced to these vendors, use a token supplied by perform the calculations of the isotopic ChemSpider to authenticate their identity to the abundances of the candidates for comparison web service. Currently there are no similar to the unknowns. capabilities for such queries of the CAS Registry using either SciFinder or STN Express. Even with the above changes, excessive time would still be needed to manually compare the observed CID spectrum of an unknown to 8
  • 9. fragments expected by the user from the elemental composition and monoisotopic mass candidates’ structures. If in silico CID spectra, are listed as C12H19NO2 and 209.1216, i.e. theoretical fragmentation with predicted respectively, in ChemSpider. It would be more abundances, could be calculated [11,12] and beneficial to parse this type of organic cation as ranked for the candidate structures from either C10H16N.C2H3O2 then list the elemental ChemSpider or SciFinder SDF (Structure Data composition and monoisotopic mass for Format) files [13], further data processing searching, respectively, as C10H16N and speeds would be realized. The results of the 150.1277. numeric structure ranking, isotopic abundance, and number of references could then be more Amphoteric (inner salt, zwitterionic) species are quickly examined by the user to yield tentative listed in the database with no modifications. identifications. For example, (CH3)3N+CH2CO2-, trimethylglycine, is listed as C5H11NO2 with monoisotopic mass of Both ChemSpider and the web-based SciFinder 117.079. This type of compound would yield [M are able to export structures for the candidate + H]+ and [M + acetate]- ions, respectively, in compounds in SDF. The web-based version positive and negative ion electrospray analyses SciFinder permits the export of up to 500 using acetate in the LC eluent. Thus the structures in an SDF file. There is no limit to the elemental composition for the species would number of structures that can be exported in need to be corrected before searching. the ChemSpider API and a limit of 10,000 in the web interface. Conclusions Charged Species in ChemSpider Modifications in the ChemSpider interface to sort elemental composition and monoisotopic More development work needs to be mass search results by the number of performed to facilitate the searching of charged associated references in descending order species by either elemental composition or offered significantly improved capabilities for monoisotopic mass. Sodium benzoate the identification of “known unknowns” using illustrates the current state of affairs for organic accurate mass mass spectrometry data. Other anions. Its elemental composition and changes were made to allow easier input of monoisotopic mass are listed in ChemSpider as data by users to specify the type of ion adduct C7H5NaO2 and 144.018724, respectively. It and to correct the monoisotopic mass for the would be more useful to parse the elemental mass of an electron. Further enhancements are composition as C7H6O2.Na (neutral benzoic acid still needed to improve the overall productivity to left of period, associated cation to the right) of the process and to enable the searching of then list the elemental composition and charged species. monoisotopic mass for searching as C7H6O2 and 122.037, respectively. This is the format The elemental composition search is the employed in the CAS Registry [1]. In reverse preferred one in the lower molecular weight phase chromatography electrospray mass range (200-600 Da), but even the monoisotopic spectrometry, the organic anions elute as their mass search with reasonable error windows is free acid form and are normally detected as [M viable in this range. Monoisotopic mass - H]- ions in the negative ion mode. Thus, their searching for compounds with a molecular identity is independent of their associated weight >600 Da is preferred when it is difficult cation. to determine a unique elemental composition. The resulting candidates from the monoisotopic A simple example of an organic cation is N,N,N- mass search are then ranked by comparing their trimethyl-N-benzylammonium acetate whose calculated isotopic abundances by the 9
  • 10. manufacturers’ isotope abundance programs to 2. Little, J.L.; Williams, A. J.; Tkachenko, V.; that of the unknown. Pshenichnov, A: Accurate mass measurements: identifying “known unknowns” using The ability to search monoisotopic mass in ChemSpider. Proceedings of the ASMS, Denver, ChemSpider is an important function which is CO, 2010. absent in search engines for the CAS Registry. 3. Mamer, O. A.; Lesimple, A.: Common However, CAS Registry searches can be further shortcomings in computer programs calculating refined by key words which can be particularly molecular weights. J.Am.Soc.Mass Spectrom. useful for more obscure “known unknowns” 14, 626 (2004). with few associated references. This option is 4. Ferrer, I; Thurman, E. M.: Importance of the not currently available in ChemSpider. The electron mass in the calculations of exact mass ability to search by both databases is very by time-of-flight mass spectrometry. Rapid desirable depending on the problem at hand Commun. Mass Spectrom. 21, 2538-2539 when costs are not a limitation. ChemSpider is (2007). provided at no cost to the community, but 5. Hogenboom, A. C.; van Leerdam, J. A.; de there is a fee associated with the utilization of Voogt, P.: Accurate mass screening and the CAS Registry. identification of emerging contaminants in environmental samples by liquid In all cases, other data such as sample history, chromatography-hybrid linear ion trap orbitrap UV-Vis data, types of ions observed, mass spectrometry. J. Chromatogr. A, 1216, exchangeable protons, CID spectra, etc. are 510-519 (2009). needed to scrutinize the candidate structures 6. Liao, W.; Draper, W. M.; Perera, S. K.: from the elemental composition and Identification of unknowns in atmospheric monoisotopic mass search results. For ultimate pressure ionization mass spectrometry using a confirmation of structure, analysis of a standard mass to structure search engine. Anal. Chem. material under identical conditions is always 80, 7765-7777 (2008). desirable. 7. Garcia-Reyes, J. F.; Ferrer, I.; Thurman, E. M.; Molina-Diaz, A.; Fernandez-Alba, A. R.: Acknowledgements Searching for non-target chlorinated pesticides in food by liquid chromatography/time-of-flight The authors gratefully recognize Curt Cleven mass spectrometry. Rapid Commun. Mass from Eastman Chemical Company, Mike Scott Spectrom. 19, 2780-2788 (2005). from Agilent Technologies, Inc., and Jim 8. Ojanperä, S.; Pelander, A.; Peizing, M.; Krebs, Lekander from Waters Corporation for their I.; Vuori, E.; Ojanperä, I.: Isotopic pattern and help and advice. We also thank Bill Tindall and accurate mass determination in urine drug Kent Morrill (retirees from Eastman Chemical screening by liquid chromatography/time-of- Company) for their initial work on “spectraless” flight Mass Spectrometry. Rapid Commun. Mass databases created from the TSCA (Toxic Spectrom. 20, 1161-1167 (2006). Substances Control Act) and the Eastman 9. Erve, J.C.; Gu, M.; Wang, Y.; DeMaio, W.; Corporate Plant Material databases. Talaat, R.E.: Spectral accuracy of molecular ions in an LTQ/Orbitrap mass spectrometer and References implications for elemental composition determination. J. Am. Soc. Mass Spectrom. 20, 1. Little, J. L.; Cleven, C. D.; Brown, S. D.: 2058-2069 (2009). Identification of “known unknowns” utilizing 10. Lohmann, W.; Decker, P; Barsch, A: accurate mass data and Chemical Abstracts Accurate elemental composition determination Service databases. J. Am. Soc. Mass Spectrom. and identification of molecules with > 1100 m/z 22, 348-359 (2011). 10
  • 11. with UHR-TOF. Bruker Application Note # ET-23 systematic bond disconnection approach. Rapid (2011). Comm. in Mass Spectrom., 19, 3111-3118 11. Sweeney, D. L.: Small molecules as (2005). mathematical partitions. Anal. Chem. 75, 5362- 13. Dalby, A.; Nourse, J. G.; Hounshel, W. D.; 5373 (2003). Gushurst, A.K.I.; Grier, D. L.; Leland, B. A.; 12. Hill, A. W.; Mortishire-Smith, J.: Automated Laufer, J.: Description of several chemical assignment of high-resolution collisionally structure file formats used by computer activated dissociation mass spectra using a programs developed at Molecular Design Limited. J. Chem. Inf. Comput. Sci., 32, 244-255 (1992). 11
  • 12. Electronic Supplementary Material for pp y DOI: 10.1007/s13361-011-0265-y Identification of “Known Unknowns” Utilizing Accurate Mass Data and ChemSpider Journal of the American Society for Mass Spectrometry, online version James L. Little,a Antony J. Williams,b Alexey Pshenichnov,b Valery Tkachenkob aEastman Chemical Company, Kingsport Tennessee, USA p y gp bChemSpider, Royal Society of Chemistry, Cambridge, UK Correspondence to jameslittle@eastman.com or williamsa@rsc.org 1
  • 13. Initial Window Selected at www.chemspider.com on Internet Either Select “More Searches” or Select “Advanced Search” 2
  • 14. Second Window Selected in ChemSpider Interface on Internet Select “Advanced” 3
  • 15. UV Stabilizer Example: Three Steps for Searching by Elemental Composition 1) Select “Search by Properties” 2) Enter elemental composition 3) Select “Search” 4
  • 16. UV Stabilizer Example: Sorting “Initial” Elemental Composition Results by Selecting Desired Column to Sort Sort descending by selecting “# of References” column, down arrow will appear after sorted 5
  • 17. UV Stabilizer Example: Five Steps for Searching by Monoisotopic Mass 1) Enter observed m/z value 2) Enter appropriate precision window for your instrument 3) S l t t Select type of i f ion 4) Correct for mass of electron if necessary for your instrument 5) Select “Search” 6
  • 18. UV Stabilizer Example: Sorting “Initial” Monoisotopic Mass Results by Desired Column Sort descending by selecting “# of References” column, down arrow will appear after sorted 7
  • 19. Antioxidant Example: Identification by Monoisotopic MW 1) Enter observed m/z value 2) Enter appropriate precision window for your instrument 3) Select type of ion 4) Correct for mass of electron if necessary for your instrument 5) Select “Search” 8
  • 20. Antioxidant Example: Sorting “Initial” Monoisotopic Mass Results by “# of References Column” Sort descending by selecting “# of References” column, down arrow will appear after sorted 9
  • 21. CAS and ChemSpider Numbers for Compounds Shown in Table 3 in Text Elemental Monoisotopic Compound CAS No. ChemSpider No. Composition Mass M id ti c Moxidectin C37H53NO8 639.3771 639 3771 113507-06-5 113507 06 5 16736424, 7845502 16736424 7845502, 7878580 Erythromycinc C37H67NO13 733.4612 114-07-8 12041 Digoxinc C41H64O14 780.4296 20830-75-5 2006532, 23089581 Rifampicinc C43H58N4O12 822.4051 13292-46-1 10468813, 21112299, Rapamycinc R i C51H79NO13 913.5551 913 5551 53123-88-9 53123 88 9 10482078 Amphotericin Bc C47H73NO17 923.4878 1397-89-3 10237579, 21111691, 23123815 Gramicidin Sc C60H92N12O10 1140.7059 113-73-5 66085 Cereulided C57H96N6O18 1152.6781 157232-64-9 8030708, 8117283, 8232646 Cyclosporin Ac C62H111N11O12 1201.8414 59865-13-3 4444325, 4447449 Vancomycinc C66H75Cl2N9O24 1447.4302 1404-90-6 14253 Perfluorotriazinec C30H18N3O6P3F48 1520.9642 16059-16-8 2055443 Thiostreptonc C72H85N19O18S5 1663.4924 1393 48 2 1393-48-2 10469505, 24603973 cErve,J.C.; Gu, M.; Wang, Y.; DeMaio, W.; Talaat, R.E.: Spectral accuracy of molecular ions in an LTQ/Orbitrap mass spectrometer and implications for elemental composition determination. J. Am. Soc. Mass Spectrom. 20, 2058-2069 (2009). dLohmann, W.; Decker, P; Barsch, A: Accurate molecular formula determination and identification of molecules with > 1100 m/z with UHR-TOF. Bruker Application Note # ET-23 (2011). / pp ( ) 10
  • 22. Increase in Absolute Intensity of Sodium Adduct at Higher In Source In-Source CID Voltages [M+H]+ [ ] [M+NH4]+ [M+Na]+ (volts) m/z 329 Chemical Formula: C30H58O4S MW 514 4 514.4 m/z 143 m/z 89 11