Applications of Computer Software for the Interpretation
and Management of Mass Spectrometry Data in
Pharmaceutical Science
Mark Bayliss and Antony Williams, Advanced Chemistry Development,

90 Adelaide Street West, Suite 702, Toronto, ON, M5H 3V9, Canada
Abstract

       Within the last decade there has been a rapid growth in the adoption of

Mass Spectrometry (MS) as a routine and facile technique not just by a group of

expert level mass spectrometrists, but by a much more diverse group of non-MS

related disciplines. This shift continues to be fueled by a number of factors,

which can be broadly segregated into instrumental technologies, the derived

high value of the technique, the cost per sample, the derived information

content, ease of use and software.

       Advances in sensitivity, ruggedness, reliability, ease of integration with

High Performance Liquid Chromatography (HPLC), Gas Chromatography (GC)

and other separation techniques and the general ease of operation of MS

instrumentation can all be considered as enabling. Ultimately, the strongest

driver for the wide adoption of MS has been the clear value that the

technique brings to so many different businesses in terms of both sample

throughput and information content per sample. This expansion in the ability to

create data both in terms of volume and in data density per dataset can be

correlated directly with a backlog in the ability to extract, process, store and

report, and thereby create the resulting high information and knowledge content

which is sought. Data that are generated by the instruments in their various

guises are simply binary bits and bytes and information has to be extracted via a

process of conversion of data to information and knowledge. Software therefore
becomes an integral, critical and enabling part of the cycle of information

creation in support of compound development and chemical analysis.



       Additional business drivers include the need to reduce development

timelines, a greater understanding of the chemical significance of a particular

development compound and return on investment. All these factors result in a

tremendous business effort focused around streamlined approaches that provide

scientists, managers, and executives the capability to readily obtain, or even

request, the necessary information.

       Due to the heterogeneous instrumentation environment and resulting

distribution of data formats, it is challenging to bring together a single universally

applied interface for the data. Data in this sense refers to the different

spectroscopies and other analytical techniques that are commonly used in

support of chemical analysis. The ability to read in raw vendor formats and allow

integrated data-handling has been severely lacking. Efforts have been made to

define common exchange data formats such as JCAMP and NetCDF and current

efforts using XML which are being driven by the ASTM E13 committee. Third

party vendors [1] have also assumed the task of becoming the neutral party to

unify data handling and management. Such third party offerings have become a

crucial component in the effort to build a single corporate spectroscopic

database supporting all instrumentation, not limited to MS but inclusive of NMR,

IR, UV-Vis, Raman and HPLC as shown in Figure 1.
In this chapter we intend to present, review and discuss some of the non-

instrument related software systems that exist for qualitative data extraction and

structural elucidation. During this discussion we will examine the representation

of molecular structures associated with analytical data and the support systems

that are able to store, retrieve and report this information. It is not our intention to

review the archival systems that exist for the long term storage of the physical

datafiles and other associated electronic records. As part of this review we will

include a survey of the creation of commercial and laboratory specific reference

databases and associated searching algorithms. We will also discuss recent

efforts to introduce advanced processing and analysis algorithms to the hands of

the masses, specifically as an aid to data extraction and structure elucidation.

Broadly speaking, we can separate the points of discussion into tools for data

extraction, elucidation, storage, retrieval, reporting and information distribution.

Nine strategies consistently appear in MS-based methods for accelerated

development and have been discussed in detail by Lee [2]. The strategies are

standard methods, template structure identification, databases, screening,

integration, miniaturization, parallel processing, visualization and automation.

These strategies serve to define the attributes of the analytical methods being

applied. High-throughput sample-generating technologies such as biomolecular

screening and combinatorial chemistry can create many thousands of samples,

each requiring the application of one or more forms of analytical chemistry.

Nowadays, the ability to devise, construct, and refine sample-analysis methods,

either chromatographic or spectroscopic, has become as important as
the hardware itself. Today, the need to integrate appropriate method

development strategies with MS processing capabilities is a critical factor in the

modern industrial laboratory.

       In chemical and pharmaceutical companies around the world, the

necessity to acquire and analyze analytical data for the abundance of samples is

a critical business requirement. As a result, open-access

laboratories containing highly roboticized instrumentation such as OpenLynx

from Waters Corporation – formerly Micromass Ltd [3], the 1100 Series High

Throughput LC/MS System from Agilent [4] and others are now commonplace.

The careers of professional spectrometrists are now largely focused on the

implementation of optimal techniques to support the users of these laboratories

rather than the standard sample analysis of yesteryear. Decreasing costs and

reduced footprints for the instrumentation, as well as more intuitive software

interfaces for non-specialists and globalization of software platforms such as

Waters Micromass OpenLynx Global Server™ [5], allow the use of

spectroscopic and chromatographic techniques in an open-access laboratory

environment across organizations. Commonly, these laboratories are also likely

to provide NMR, MS, IR, UV-Vis and chromatographic instrumentation. As a

result of these laboratories both standard and hyphenated MS-based techniques

have entered the hands of the masses. It is clear that distinct differences still

exist between the applications of mass spectrometry made available to non-

specialists and those performed by the specialist.
In general, non-specialists are adopting MS instrumentation that

predominantly generates molecular-ion-only mass spectra with little or no fragmentation.

This is clearly revealed during visits to any of the number of laboratories that

now offer Open Access technologies that enable a chemist with no prior MS

knowledge or experience to submit samples for analysis in a totally automated

manner. In a small number of cases, this has been extended to the inclusion of

MS/MS fragmentation though this appears not to be the norm at this time.

Another example of this appears in applications that deal with combinatorial

plate analysis, where the data generated include a full high performance

liquid chromatography-MS (LC-MS) run. The ionizing technique is “soft” and

produces for each well in a plate both the parent ion and one or more

chromatographic traces [Total Ion Current (TIC), Extracted Ion Current (XIC),

Diode Array, Chemiluminescent Nitrogen Detector (CLND), Evaporative Light

Scattering Detection (ELSD) and others] to aid in the assay of materials in the

sample. Meanwhile, the traditional spectrometrist is generally more focused on

non-routine analyses which require greater levels of custom method

development, structural elucidation, and studies requiring the usage of accurate

mass LC/MS and LC/MS/MS.

       Whether the application provides data for synthetic chemists or expert

spectrometrists, computer software is an essential factor in a successful

analysis. Whether it is the application of advanced chemometric algorithms for

noise-reduction, the association of structural fragments with mass spectral

features, or the management and databasing of the derived information,
computer software applications additional to those required for operation of the

instrument are a necessary and integral part of the analytical information

repertoire that exists for scientists industry wide.




Extraction of data

       Prior to any structural elucidation, the need for data extraction is of

paramount importance. The simplest form of extraction may merely be a case of

selecting a peak of interest in the LC/MS or GC/MS TIC and obtaining a

spectrum for that peak. The inclusion of background subtraction further improves

the spectral quality with the removal of solvent and contributions from any

background ions – thus making the identification of the molecular weight or

spectrally related ions clearer. The automation of background subtraction and

generation of a “cleaned” spectrum is very much the mainstay of all data

processing systems that exist in the marketplace. Of course this method

presupposes either that the elution times of the peaks are known or that the peaks

in the TIC are clearly visible. In the case of Open Access or combinatorial

studies, it can be common practice to use additional analog detectors (UV,

ELSD or CLND) to define the retention time of the eluting peaks, which

can then be used to obtain a combined and background subtracted MS

spectrum. This technique certainly adds value when the analysis is not sample

limited, and a strong peak exists in the analog detector(s) which can be used to

direct the extraction of the MS spectrum. Detector response does vary from one

detector to another, however: a compound lacking a chromophore, for example,

gives little or no UV response. The use of more than one analog detector helps to

minimize this impact.
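
The background subtraction described above can be illustrated with a minimal Python sketch. It assumes spectra held as dictionaries of nominal m/z to intensity; the function name and the example values are hypothetical and do not correspond to any vendor's implementation.

```python
def subtract_background(peak_spectrum, background_spectrum, floor=0.0):
    """Subtract a background spectrum from a peak spectrum, bin by bin.

    Both spectra are dicts mapping a nominal m/z (int) to an intensity.
    Ions whose intensity does not exceed the background are dropped.
    """
    cleaned = {}
    for mz, intensity in peak_spectrum.items():
        residual = intensity - background_spectrum.get(mz, 0.0)
        if residual > floor:
            cleaned[mz] = residual
    return cleaned


# Hypothetical values: m/z 41 and 83 stand in for solvent ions, m/z 301 for the analyte.
peak = {41: 900.0, 83: 450.0, 301: 1200.0}
background = {41: 880.0, 83: 460.0}
cleaned = subtract_background(peak, background)
```

In practice the background spectrum would be averaged over scans adjacent to the chromatographic peak; how those scans are chosen is exactly the step that the analog detectors help to direct.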

In many cases, where the focus of the MS is in the extraction of low intensity

components such as impurity analysis and metabolite determinations, the

presence of the chromatographic peak of interest may be obscured by the

presence of high background levels resulting from solvents, buffers and other

non-sample-related background contamination ions. In addition, in these cases

the concentration of the unknown peak(s) of interest in the sample may be so

low that there may be no response on the UV or other analog detector that can

define the position of the chromatographic peak. It is often the case that the

intensity of the contamination ions or those from the solvents and buffers far

exceeds those arising from the sample related ions and thus extraction by

retention time alone becomes less appealing. In the case of natural product

analysis and metabolism studies, the chromatographic peaks of interest may be

present with a multitude of other peaks that are related to the sample matrix and

thus unwanted. This of course further increases the complexity of the extraction

process. Differentiating sample related peaks from those resulting from the

matrix often requires extensive knowledge of both the samples of interest and

the matrix and thus these tasks are often performed by highly trained mass

spectrometrists with a detailed understanding of the sample and its chemistry. Of

course, if, for a particular sample, a significant knowledge base already exists, it

is possible to use this knowledge as a template for data extraction. This is done
by searching for masses within the dataset that differ by some delta mass (∆M)

from the compound of interest, such as a parent drug compound or synthesis

material. For example, it would be possible to extract mass chromatograms for

the mono-, di-, tri-, … hydroxylated forms of a starting drug structure by extracting

mass chromatograms for (Parent Mass + n × 16 Da, n = 1, 2, 3, …) and then identifying the

presence of chromatographic peaks within these extracted mass

chromatograms. This method is the route of choice in many of the software

packages that vendors provide for metabolite data extraction,

and offers significant value in being able to extract only sample related

events that exist within the datasets of interest.
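
As a hedged illustration of the delta mass approach, the sketch below builds target masses for successive hydroxylations and extracts a mass chromatogram for each. The scan layout, the function names and the example values are assumptions made for this example only.

```python
OXYGEN_MASS = 15.9949  # monoisotopic mass of oxygen in Da (hydroxylation adds one O)


def hydroxylation_targets(parent_mz, max_additions=3):
    """Return {n: target m/z} for the parent ion plus n oxygen atoms."""
    return {n: parent_mz + n * OXYGEN_MASS for n in range(1, max_additions + 1)}


def extracted_ion_chromatogram(scans, target_mz, tolerance=0.5):
    """Sum the intensity within +/- tolerance of target_mz in every scan.

    `scans` is a list of (retention_time, [(mz, intensity), ...]) tuples.
    """
    trace = []
    for retention_time, peaks in scans:
        total = sum(i for mz, i in peaks if abs(mz - target_mz) <= tolerance)
        trace.append((retention_time, total))
    return trace


# Hypothetical data set with a parent ion at m/z 300.0.
scans = [
    (1.0, [(300.0, 50.0), (150.0, 10.0)]),
    (2.0, [(316.0, 80.0), (300.1, 5.0)]),
    (3.0, [(332.0, 30.0)]),
]
targets = hydroxylation_targets(300.0)
xic_mono = extracted_ion_chromatogram(scans, targets[1])
```

A peak-detection step applied to each extracted trace would then flag the retention times at which candidate metabolites elute.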

       As an aid to data extraction, a number of chemometric algorithms have

been developed over the years to assist in the extraction of sample related

spectra and remove the interference of the background and matrix-based

effects. These algorithms by their very nature do not use any knowledge base

for the extraction process and can be beneficial in cases where there has been

significant rearrangement in the integrity of the structure relative to the original

parent structure. Examples of such algorithms include Biller-Biemann [6] and

more recently CODA (COmponent Detection Algorithm) reported by Windig [7],

both of which have been integrated into various MS processing software

platforms over the years. Both of these algorithms are effective in removing the

noise resulting from chemical background and electronic noise that exists within

the data. This can be seen in Figure 2 where the trace at the top represents the

original TIC and the trace at the bottom represents the TIC following the
application of CODA. The output from the CODA approach can also be

visualized in the form of individual mass chromatograms. As a generic technique

CODA is most appropriate for the extraction of all peaks contained within the

sample data file, for example an impurity analysis. In other cases, it may be

desirable to extract only the unique chromatographic peaks present in two or

more data sets. Windig et al. [8] also report on the application of CODA to two

or more datasets and the subsequent comparison of the output to determine only

those components that are unique, an approach referred to as COMPARELCMS. Figure 3

represents such a comparison using the COMPARELCMS process, where the

top trace represents a metabolized trace and the bottom one a control against

which the metabolized sample is compared. As is clearly visible, the top trace

contains a number of peaks that are unique and thus can be investigated further

as potential metabolite or impurity candidates. As in the case of the visualization

of the CODA output, COMPARELCMS can also be visualized as individually

selectable mass chromatograms. Once extracted, the difference in mass from

the starting parent compound can then be rationalized to either a simple

modification of the original structure, or some other more complex structural

rearrangement.
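
The general idea behind knowledge-free noise removal can be sketched as follows. This is not the published CODA algorithm: it is a simplified stand-in that scores each mass chromatogram by its correlation with a smoothed copy of itself, on the assumption that genuine chromatographic peaks survive smoothing whereas spiky noise does not. All names, the window size and the threshold are illustrative.

```python
def moving_average(trace, window=3):
    """Centred moving average; the window is truncated at the trace edges."""
    half = window // 2
    smoothed = []
    for i in range(len(trace)):
        segment = trace[max(0, i - half): i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def quality_score(trace, window=3):
    """Similarity between a mass chromatogram and its smoothed version."""
    return pearson(trace, moving_average(trace, window))


def select_traces(chromatograms, threshold=0.8):
    """Keep the m/z channels whose mass chromatogram scores at or above threshold."""
    return {mz: t for mz, t in chromatograms.items()
            if quality_score(t) >= threshold}


# Hypothetical channels: m/z 301 carries a peak, m/z 83 carries alternating noise.
chromatograms = {
    301: [0, 0, 1, 5, 10, 5, 1, 0, 0],
    83: [3, 0, 3, 0, 3, 0, 3, 0, 3],
}
kept = select_traces(chromatograms)
```

Summing only the retained channels gives a reduced TIC; running the same selection on a sample and a control and comparing the retained channels is one simple way to approximate the comparison step described above.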

      The isolation of the MS chromatographic peak and its associated mass

can be used in a number of ways: simply as an indicator of molecular weight, as

a means of calculating the empirical formula or as a driver used in the

generation of tandem MS/MS or MS(n) data either in an instrument driven MS to

MS/MS or MS(n) switching protocol or via an MS1 targeted method.



Structural Elucidation Using MS Data

       The elucidation of chemical structure(s) covers an extremely wide arena

of processes. At its simplest level this may be the calculation of an empirical

formula from a high mass accuracy measurement of an isotopically pure spectral peak. Whilst

calculation of an empirical formula does not in itself require high resolution MS,

high resolution remains a critical requirement in the determination of spectral peak purity. The

necessity for high mass accuracy and high mass resolution may not be apparent

at first glance. High mass accuracy is the ability to determine the value of the

ionized mass to a significant number of decimal places as discussed below. High

mass resolution is the ability of an MS instrument to separate two or more

masses that have the same nominal value. It is also important to note that a high

mass accuracy instrument is unable to separate isomeric forms of the same

compound as the mass of each component is exactly the same.

       A spectrally pure peak is an absolute requirement to ensure the correct

calculation of the center of gravity for the mass spectral peak under investigation

and that it is not biased by the presence of some spectral peak with similar

nominal mass. Such determinations of empirical formula thus require the

calculation of molecular weight to 3 decimal places or better such that

the number of permutations of carbon, hydrogen, nitrogen, oxygen and so on

can be minimized (Figure 4). The usage of accurate mass determinations need

not be confined to just MS1 or molecular ion peaks. Rather it has a much wider
applicability when used in conjunction with tandem MS spectral peaks [9]. This

has been found to assist greatly in the determination of structural fragments and

is being widely applied in the study of metabolites, degradants, natural products

and impurity elucidations.
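
The effect of mass accuracy on the number of candidate formulae can be shown with a brute-force sketch. It assumes a neutral monoisotopic mass, considers only C, H, N and O, and applies no chemical plausibility rules; the function name, the element limits and the glucose example are illustrative choices, not taken from any particular software package.

```python
# Monoisotopic masses in Da of the most abundant isotopes.
MASSES = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def formula_candidates(target_mass, tolerance_da, max_counts):
    """Enumerate C/H/N/O compositions whose mass lies within tolerance_da.

    `max_counts` maps each element symbol to the largest count to try.
    No chemical plausibility rules (valence, ring/double-bond count) are applied.
    """
    hits = []
    for c in range(max_counts["C"] + 1):
        for h in range(max_counts["H"] + 1):
            for n in range(max_counts["N"] + 1):
                for o in range(max_counts["O"] + 1):
                    mass = (c * MASSES["C"] + h * MASSES["H"]
                            + n * MASSES["N"] + o * MASSES["O"])
                    if abs(mass - target_mass) <= tolerance_da:
                        hits.append((c, h, n, o))
    return hits


limits = {"C": 15, "H": 30, "N": 5, "O": 10}
target = 180.06339  # neutral monoisotopic mass of C6H12O6
narrow = formula_candidates(target, 0.001, limits)  # 1 mDa window
wide = formula_candidates(target, 0.05, limits)     # 50 mDa window
```

Comparing `len(narrow)` with `len(wide)` shows how quickly the list of permutations grows as the mass tolerance is relaxed.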

       In the example of the fragmentation of the tri-ethyl pirimiphos, it is

determined that two potential fragment routes give rise to the nominal mass

m/z 152 (Figure 5). In the first suggested fragmentation route, cleavage occurs at

the oxygen in position number 11 attached to the phosphorus-sulfur moiety. The

charge is retained on this portion of the molecule to result in a fragment ion with

calculated accurate mass of m/z 152.006 (Figure 6a), resulting in a delta

mass of 80 mDa from the experimentally recorded mass of m/z 152.086. When

this is contrasted with the other fragmentation possibility, Figure 6b, a mass

delta of 3 mDa is observed between the calculated fragment mass and the

experimentally determined mass. In the example presented above, adjusting

the mass accuracy of the fragmentation assignment process to

match that of the instrumentation being used to acquire the MS data

reduces the number of false positive fragment possibilities that have to be

reviewed.
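
The filtering step can be expressed in a few lines of Python. In this sketch the measured mass and the first calculated mass are the values quoted in the text; the second calculated mass is not given in the text and is an assumed value chosen only so that the delta equals the 3 mDa quoted. The function names are hypothetical.

```python
def delta_mda(calculated_mz, measured_mz):
    """Absolute difference between two m/z values, in millidaltons."""
    return abs(calculated_mz - measured_mz) * 1000.0


def plausible_routes(candidates, measured_mz, tolerance_mda):
    """Keep the fragmentation routes whose calculated m/z agrees within tolerance."""
    return [name for name, mz in candidates.items()
            if delta_mda(mz, measured_mz) <= tolerance_mda]


measured = 152.086
candidates = {
    "route_a": 152.006,  # calculated mass quoted in the text (Figure 6a)
    "route_b": 152.083,  # ASSUMED value, chosen only to give the quoted 3 mDa delta
}
kept_accurate = plausible_routes(candidates, measured, tolerance_mda=5.0)
kept_nominal = plausible_routes(candidates, measured, tolerance_mda=500.0)
```

With a nominal-mass tolerance both routes survive, whereas a tolerance of a few mDa leaves only the second route for review.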

       In the pharmaceutical industry, much of the MS-based elucidation

strategy is based on the premise that much of the parent drug structure will be

retained in the metabolites, impurities, or degradants [10]. The resulting

fragment ions associated with unique substructures of the parent compound are

thus also retained. Thus, the unique fragment ions contained in either full scan
or product ion mass spectra of the parent compound serve as the template for

identification. The template structure identification strategy has been recently

illustrated for the profiling of paclitaxel degradants [11].

       MS vendors are adept at providing tools for data extraction, quantitation

and compound suggestions, but these often do not include proposed chemical

structures or fragments. The conversion of spectrum to structure in a de-novo

sense, for example natural products, where no prior sample information exists,

remains an extremely difficult process when MS is used in isolation. In the

majority of cases the conversion of a spectrum to a structure even with all the

advances that have been made in the technology, still requires some starting

information about the sample that has to be used in conjunction with the mass

spectral information. Confirmation of structure by the verification of key mass

spectral ions present in the spectrum forms an extremely powerful technique for

structural analysis around a scaffold of prior information of the sample. The

complement of MS, NMR, other spectroscopies and anecdotal information has

been proven to be necessary for de-novo structural elucidation [12,13]. In these

cases MS provides accurate mass information and thus empirical formulae for

the complete structure and key fragments which can be used during the

elucidation process. Neutral loss analysis of the tandem MS and other

fragmentation techniques indicates the presence of particular structural

features, for example hydroxylation and phosphate moieties.

Additionally, isotopic information, especially for structures that are

chlorinated or brominated, or that contain sulfur or some transition metal

cations, is highly characteristic and thus diagnostic. The incorporation of

NMR data [1H NMR, 13C NMR, 2D NMR data and other relevant techniques]

allows complete atom-to-atom connectivity maps to be built and thus provides a route to complete

structural identification. These structural elucidations are still typically

undertaken by expert level spectrometrists throughout the industry; however,

such expert software systems as ACD/Structure Elucidator from Advanced

Chemistry Development Inc., are now serving to dramatically reduce the time

and complexity of this process.
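
The diagnostic value of isotopic information mentioned above can be illustrated for chlorine. The sketch below computes the relative intensities of the M, M+2, M+4, … peaks from a binomial distribution over the two stable chlorine isotopes. The abundances are approximate, only the chlorine contribution is modelled, and the function name is illustrative.

```python
from math import comb

# Approximate natural abundances of the two stable chlorine isotopes.
ABUNDANCE_35CL = 0.7578
ABUNDANCE_37CL = 0.2422


def chlorine_pattern(n_chlorine):
    """Relative intensities of the M, M+2, M+4, ... peaks for n chlorine atoms.

    Only the chlorine contribution is modelled; intensities are scaled so that
    the largest peak is 100.
    """
    raw = [comb(n_chlorine, k)
           * ABUNDANCE_35CL ** (n_chlorine - k)
           * ABUNDANCE_37CL ** k
           for k in range(n_chlorine + 1)]
    largest = max(raw)
    return [100.0 * value / largest for value in raw]


one_chlorine = chlorine_pattern(1)
two_chlorines = chlorine_pattern(2)
```

A single chlorine gives the familiar M to M+2 ratio of roughly 3:1, and each additional chlorine adds a further peak two mass units higher, which is why such patterns are readily recognized in a spectrum.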

       Where a significant body of knowledge exists for the structure being

elucidated, for example in impurity analysis and metabolism studies, the

difference in mass between the starting compound and the unknown significantly

reduces the number of possibilities that have to be evaluated. In most cases

significant structural information is retained in the spectral information of the

unknown and thus techniques such as spectral correlation, discussed later, offer

advantages. In those cases where significant rearrangement or oxidative

cleavage may have occurred, the remaining part of the structure may be

significantly different from the parent drug. In these situations the fragment ions

are often significantly different from those of the parent compound and thus

spectral correlation approaches may not be as useful in the determination of

structural changes. In practice these types of structural analysis challenges

require evaluation by a spectrometrist and potentially other scientists with a

detailed understanding of the chemistries and possible enzymatic pathways that

are involved.
The method by which a spectrum is obtained can have a significant effect

on the way in which the structure can be elucidated. High energy ionization

techniques such as EI typically result in spectra containing extensive

fragmentation usually with little or no remaining molecular ion spectral

information. Fortunately, standardized instrumental ionization acquisition

conditions ensure that spectra are usually reproducible from instrument to

instrument. These standardized methods of acquisition thus ensure that spectra

can be easily stored in a spectral library and distributed to all groups who

require search access. Spectral databases are discussed later in this chapter.

Low energy ionization techniques such as electrospray and atmospheric

pressure chemical ionization on the other hand typically generate protonated or

deprotonated molecular ions with little or no fragmentation. Fragmentation can

be induced in a number of ways including source induced fragmentation,

fragmentation in a gas filled collision cell or via resonant fragmentation in ion

traps. These low energy spectra, unlike EI spectra, are not acquired under fixed

fragmentation conditions and as such the spectra can be very different. These

differences are further exacerbated when instrument-to-instrument, vendor-to-

vendor and MS instrument types are included in the variation matrix [14].

Whether the spectrum has been obtained as a MS1 full scan experiment or via a

tandem MS/MS acquisition, structural assignment of the spectrum can still be

possible. In the case of the assignment of a full scan MS1 trace such as EI

GC/MS spectra it is important to note that the assignment of the spectrum will be

dependent upon the isotope that is selected for the fragment assignment
procedure. This is clearly identified in the fragmentation of Temazepam,

Figure 7, in which the 37Cl contributes a significant amount to the ion intensity of
the fragment ions. Note that the spectrum in this case is assigned using the 35Cl

isotope. It is usual however in the case of the majority of structural elucidations

to isolate an individual isotope using the first stage mass filtering capabilities of

the MS instrumentation before collisionally induced dissociation (CID) in a

collision cell or ion trap. In this way the tandem MS spectrum is isotopically pure

and thus the fragments in the spectrum can be assigned on the basis of the

selected isotope. The use of high resolution, at the stage of isolation of the MS1

mass of interest, can provide an additional level of confidence ensuring that the

tandem MS spectrum is isotopically pure. In those cases where low resolution

MS1 ion isolation is coupled with high resolution ion detection, the presence of

isobaric masses in the isolation MS1 spectrum can be detected and their

presence taken into account and minimized during the elucidation phases.

       Detailed information is also obtained by the observation of sequential

neutral losses to determine the sequence of substructures or “molecular

connectivity” within the analyte [15]. This procedure is analogous to two-

dimensional NMR techniques used to sequentially connect substructures. This

approach has major benefits for those structural modifications whereby the

majority of the structural integrity is maintained. Of course, a familiar example of

molecular connectivity is the determination of the amino acid sequence of a

peptide. Specific neutral losses are indicative of certain amino acids, and the

sequence of these losses can be used to identify the peptide [16].
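
Reading a sequence from successive losses can be sketched as below. The example assumes a single, singly charged fragment ion series and a deliberately short table of residue masses; the ladder values are hypothetical and the function name is illustrative.

```python
# Monoisotopic residue masses in Da for a few amino acids (Leu and Ile are isobaric).
RESIDUE_MASSES = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203,
    "P": 97.05276, "V": 99.06841, "L/I": 113.08406,
}


def read_ladder(fragment_mzs, tolerance=0.01):
    """Translate successive mass differences in a fragment ladder into residues.

    `fragment_mzs` must belong to one singly charged ion series. Differences
    that match no residue in the table are reported as "?".
    """
    ordered = sorted(fragment_mzs, reverse=True)
    residues = []
    for heavier, lighter in zip(ordered, ordered[1:]):
        loss = heavier - lighter
        match = "?"
        for name, mass in RESIDUE_MASSES.items():
            if abs(loss - mass) <= tolerance:
                match = name
                break
        residues.append(match)
    return residues


# Hypothetical ladder built from losses of Ala, Gly and Val in turn.
ladder = [400.0,
          400.0 - 71.03711,
          400.0 - 71.03711 - 57.02146,
          400.0 - 71.03711 - 57.02146 - 99.06841]
sequence = read_ladder(ladder)
```

The same differencing logic applies to small molecules, with the residue table replaced by a table of characteristic neutral losses.
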
Owens [17] reports a software based technique of spectral correlation or pattern

matching of MS/MS spectra and the determination of a similarity index as a

means of filtering out those tandem MS spectra which have low correlations with

respect to the parent drug MS/MS spectrum and are thus defined as

endogenous background peaks. Where a high similarity exists, this is indicative

that there are spectral elements that show a high degree of correlation to the

parent drug compound [18]. The subsequent auto-correlation between the

assigned parent drug spectrum and the unknown spectrum can then influence

the identification of the changes in the original parent drug structure and thus

the determination of potential structural modifications. When linked with high

mass accuracy data this technique may offer significant value in expediting the

generation of metabolite or impurity structures.
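
A similarity index of this general kind can be sketched with a cosine measure on binned spectra. This is a generic stand-in and not the specific index reported in [17]; the bin width, the function names and the example spectra are assumptions.

```python
from math import sqrt


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities into m/z bins; `peaks` is a list of (mz, intensity)."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Cosine of the angle between two binned spectra (0 = no shared ions)."""
    a, b = bin_spectrum(peaks_a, bin_width), bin_spectrum(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical MS/MS spectra: a parent drug, a related compound and a background peak.
parent = [(105.1, 100.0), (152.1, 40.0), (210.2, 60.0)]
related = [(105.1, 90.0), (152.1, 35.0), (226.2, 55.0)]
unrelated = [(91.0, 100.0), (119.1, 70.0)]

score_related = cosine_similarity(parent, related)
score_unrelated = cosine_similarity(parent, unrelated)
```

Spectra scoring below a chosen threshold against the parent would be set aside as probable endogenous background, leaving the higher scoring spectra for closer examination.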

       In the determination of chemical structure using either a manual approach

or via some software driven method or a combination of the two techniques, the

assignment of the spectral fragments remains a key part of the process. To date

the spectral analysis software systems that exist in the industry allow assignment

of the spectrum to a particular proposed structure using a rules based approach,

as the autoassignment example, Figure 7, shows. As with all rules based

approaches, it may not be possible to identify all spectral ions and thus the

intervention of a spectrometrist with a detailed knowledge of the chemistries

being investigated can result in a complete assignment of the fragments to a

proposed structure. Where the software assignment algorithms can provide

major benefit is in the assignment of the majority of spectral peaks when
predicted using the coded rule sets, thus significantly reducing the amount of

time that it takes to perform a series of spectral assignments. Often the

suggestion of a potential fragmentation process using the rules based approach

can act as a source of inspiration when trying to assign compounds that

fragment through more esoteric and undefined routes.

       Where structural elucidation uses an underlying knowledge of the

samples and chemistries, fragmentation analysis of the parent drug substance

provides clear indications for structural modification within the structure as

discussed earlier. In cases where a number of potential changes have to be

considered, it may be necessary to validate a series of possible structures

against the spectrum. This may be achieved in a couple of ways using

third party tools, where rules based fragmentation is coupled

with a manual review of the results and, where appropriate, unpredicted

fragmentation routes may be added manually (Figure 8). This capability is

presently delivered by third party software tools [19]. In this example, following

the import of a mass spectrum, a chemical structure is attached using the

molecular structure editor integrated into the program. The lasso tool is used to

encircle a particular fragment, and if a spectral ion corresponding to the mass of

the selected structural fragment exists in the spectrum, the fragment is

highlighted and the assignment is added to the fragment assignment table. In

this way, an entire mass spectrum can be assigned and examined for

consistency with the hypothetical structure. If there is a mixture of components in
a single spectrum due to co-elution, then each component can be individually

assigned.
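
The bookkeeping behind such a fragment assignment table can be sketched as follows. The sketch assumes that the mass of each selected structural fragment has already been calculated; it is not the logic of any particular commercial tool, and the labels, masses and tolerance are hypothetical.

```python
def assign_fragments(fragment_masses, spectrum, tolerance=0.3):
    """Match proposed structural fragments to observed spectral ions.

    `fragment_masses` maps a fragment label to its calculated m/z.
    `spectrum` is a list of (mz, intensity) pairs.
    Returns (assignment_table, unassigned_peaks).
    """
    table = []
    assigned = set()
    for label, calc_mz in fragment_masses.items():
        matches = [p for p in spectrum if abs(p[0] - calc_mz) <= tolerance]
        if matches:
            # Take the observed ion closest in mass to the calculated value.
            best = min(matches, key=lambda p: abs(p[0] - calc_mz))
            table.append((label, calc_mz, best[0]))
            assigned.add(best)
    unassigned = [p for p in spectrum if p not in assigned]
    return table, unassigned


# Hypothetical fragments selected on a proposed structure, and an observed spectrum.
fragments = {"fragment_1": 105.0, "fragment_2": 77.0, "fragment_3": 180.1}
spectrum = [(77.1, 40.0), (105.1, 100.0), (122.0, 15.0)]
table, unassigned = assign_fragments(fragments, spectrum)
```

Ions left in the unassigned list are the ones that call for manual review, either as unpredicted fragmentation routes or as evidence of a co-eluting component.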

Structure as a Means of Communication

       As a universal language of chemists, structure represents a clear and

concise way to communicate chemistries that form the nucleus of research

efforts. Whilst the need to elucidate a final and complete structure is the

objective for any spectrometrist, in mass spectrometry it is commonly the case

that we are unable to arrive at a finalized structure. In addition, during the

process of structural elucidation, there may be a number of iterative versions of

what the structure may be before arriving at a finalized version. In these cases

the ability to represent structure in some incomplete format, such as a Markush

representation, provides a way of creating and storing a “work-in-progress”

structure, Figure 9. In this example the position of the chloro group can be

intuitively defined as 2,3,4,5 and 6 on the phenyl ring. Whilst this representation

has significant benefits for those cases where all remaining positions in a phenyl

ring are possible points of attachment, in the case where the structure is

represented with the chloro group in the meta and ortho positions the above

shorthand notation clearly has limitations. There have been extensions to the

notation of “generic” chemical structures over the years, including but not limited

to the usage of graphical overlay elements such as boxes (Figure 10) [20,21]

and polymer-like brackets (Figure 11). Whilst these representations of structure

do have value as a means of visualization within reports they do not convey any
chemical knowledge that can be transformed into extractable programmatic

elements that can be used in software platforms. Whilst the needs of FDA

regulation 21 CFR Part 11 [22] are not generally applied in the drug discovery

phase of drug development, for example in metabolism identification, these

regulations have in reality set a precedent for the storage of electronic records

where feasible, especially in the latest modification to the FDA 21 CFR Part 11

regulations [23]. Whilst the implementation of 21 CFR Part 11 in the early phases

of drug discovery and development of metabolites, impurities and degradants

can be highly contentious, the need to communicate information in a variety of

electronic formats is very much becoming a requirement across all of drug

discovery and development groups within the pharmaceutical industry.

Moreover, the reporting, storage, searching and distribution of electronic

information including structures throughout all industries are becoming more

commonplace. Therefore, any representations of incomplete structure that are

ambiguous create opportunities for miscommunication, resulting in time and thus

financial implications for the industry.

       Metabolism groups in a number of the major pharmaceutical companies

have been highly instrumental in encouraging the development of more

advanced Markush structure representations which are

designed to show more clearly the sites of attachment of a particular

substituent or substituents. One such representation, in the form of a shaded Markush from

Advanced Chemistry Development Inc., denotes the point(s) of attachment

using user definable color shading as shown in Figure 12. This approach has
been extended to more complex structures where the positions of attachment

are discontinuous as depicted in Figure 13. These visual representations are

also understood programmatically as atom-to-atom mappings allowing the

structures to be searched electronically and thus enabled as part of a larger

structurally enabled analytical data management system.
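       As a rough illustration of what it means for an incomplete structure to be understood programmatically, the sketch below stores the chloroaniline of Figure 9 as a core, a substituent and an explicit set of allowed attachment positions. The class name, field names and position numbering are hypothetical choices made for this example; they are not the atom-to-atom mapping format used by any particular vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkushRecord:
    """A core structure plus one substituent whose position is not fully known."""
    core: str                     # label for the fixed part of the structure
    substituent: str              # group whose point of attachment is uncertain
    allowed_positions: frozenset  # ring positions where the group may be attached

    def expand(self):
        """Return every fully defined structure covered by this record."""
        return [(self.core, self.substituent, pos)
                for pos in sorted(self.allowed_positions)]


# Chloroaniline with the chloro group anywhere on the ring (Figure 9).
any_position = MarkushRecord("aniline", "Cl", frozenset({2, 3, 4, 5, 6}))

# The case the simple shorthand cannot express: ortho or meta only, never para.
ortho_or_meta = MarkushRecord("aniline", "Cl", frozenset({2, 3, 5, 6}))

for structure in ortho_or_meta.expand():
    print(structure)
```

Because the allowed positions are stored as data rather than as a drawing, software can enumerate, compare and search them. Ring symmetry (positions 2 and 6, or 3 and 5, being equivalent) is deliberately ignored in this sketch.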




Linking structures with analytical data

      Attaching a structure or a number of structures to an elucidated spectrum

or chromatogram represents a concise way of reporting our findings as analysts.

It is often the case that we cut and paste structures onto our data either in some

document editing system or potentially in a package designed for spectral

processing and reporting. Moreover, the attachment of structure to the analytical

data, with subsequent database storage or archival of the elucidated data can

act as an important knowledge system. Searches of meta-data, structures or

data related features, when coupled together, allow the extraction of compounds

and data that are able to provide key insights for current development needs. In

this case an analytical data archive does not describe an archive of raw data

files; instead it represents a repository of knowledge extracted from and

associated with the data. A data archive generally describes a repository of raw

data which are originally captured at the instrument, collected and deposited into

the archive without further analysis. Vendors of such file based archive systems
include, for example, NuGenesis Technologies [24]. Databasing is discussed

further later within this Chapter.

      It is generally easier for a chemist or spectrometrist to remember and

draw structures from memory than it is to remember a series of spectral masses

or analytically determined parameters. Thus, the physical attachment of a

structure or series of structures to the data and their subsequent storage in an

appropriate database, as shown in Figure 14, represents a primary link between

what a chemist is able to remember and an ability to extract that information

quickly and easily from the database using, for example, a chemical structure

search. This allows data to be located quickly and effectively from amongst huge

volumes of data which are created annually within our organizations.




Structure Based Searching

      The representation of chemical structures and their attachment to

analytical data in an electronic format is only made useful when linked with

appropriate search engine capabilities. As discussed earlier in the chapter,

structures range from complete structures to Markush representations and to

fragmental structure information. Additionally, stereochemistry and tautomerism

can all affect the performance of structure searching [25]. To date structure

search engines are usually able to search using full structure, similar structures

and substructure components. In the case of the Markush structural

representation discussed earlier the search engine has to be able to allow for
Markush structure searching in a variety of ways over and above the standard

structure search capabilities. For example, if a Markush structure is the starting

point for a search then the search engine should be able to return hits containing

completely defined structures that contain modifications that are within the

region incorporated by the Markush inclusion positions, Figure 15. A similar

search performed using a substructural search of the same database returns as

expected a much greater number of structures as indicated in Figure 16. In

addition, if a search is made with a completely defined structure, then it should

be possible to return structures which are represented as Markush

representations containing the functional modifications contained within the

search structure. These capabilities are a part of the ACD/Labs analytical data

management system, ADMS, software suite which includes as a component the

support of MS data processing and database management.
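       The two search directions described above can be sketched with a toy in-memory database. The record layout, the compound entries and the function names are assumptions made for illustration only; a real search engine works on full connection tables rather than on position labels.

```python
# Each record: (name, substituent, positions). A fully defined structure has one
# position; a Markush record has several. All entries are illustrative.
DATABASE = [
    ("2-chloroaniline", "Cl", {2}),
    ("4-chloroaniline", "Cl", {4}),
    ("3-bromoaniline", "Br", {3}),
    ("chloroaniline, ortho or meta", "Cl", {2, 3, 5, 6}),
]


def is_defined(positions):
    """A record is fully defined when exactly one position is given."""
    return len(positions) == 1


def search_with_markush(query_sub, query_positions, database):
    """Markush query: return fully defined hits inside the allowed positions."""
    return [name for name, sub, pos in database
            if sub == query_sub and is_defined(pos) and pos <= query_positions]


def search_with_defined(query_sub, query_position, database):
    """Fully defined query: return Markush records that could contain it."""
    return [name for name, sub, pos in database
            if sub == query_sub and not is_defined(pos) and query_position in pos]


print(search_with_markush("Cl", {2, 3, 5, 6}, DATABASE))
print(search_with_defined("Cl", 3, DATABASE))
```

The first call returns only the defined structure whose chloro position falls inside the Markush region; the second returns the Markush record that is consistent with a meta-chloro query.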




Databasing and Analytical Data Management

       An alternative approach to aid in the identification of an unknown is to

perform a spectrum or subspectrum search against a database of known

structures and associated spectra. A simple search based on just a few peaks

from the mass spectrum is possible. McLafferty developed two search

techniques: one based on the probability of certain ions (probability based

matching, PBM), and one based on a collection of chemical fragments associated

with certain fragmentation patterns [26].
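       The simple search based on just a few peaks can be sketched as follows. This is not an implementation of PBM, which weights peaks by their probability of occurrence; it only counts how many query peaks appear in each library spectrum. The library contents, the function names and the 0.5 m/z tolerance are made-up values for illustration.

```python
def peak_hits(query_mz, library_spectrum, tolerance=0.5):
    """Count query peaks found in a library spectrum within +/- tolerance (m/z)."""
    return sum(
        any(abs(q - mz) <= tolerance for mz in library_spectrum)
        for q in query_mz
    )


def few_peak_search(query_mz, library, tolerance=0.5):
    """Rank library entries by the number of matched query peaks."""
    scored = [(peak_hits(query_mz, mzs, tolerance), name)
              for name, mzs in library.items()]
    return [(name, hits) for hits, name in sorted(scored, reverse=True) if hits > 0]


# Hypothetical nominal mass peak lists.
LIBRARY = {
    "compound A": [51, 77, 105, 122],
    "compound B": [43, 58, 71, 86],
    "compound C": [77, 105, 182],
}

print(few_peak_search([77, 105, 122], LIBRARY))
```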
       Over the years, collections of mass spectra have been assembled by

different groups. The National Institutes of Health (NIH) and Environmental

Protection Agency (EPA) standardized the data collection and analysis of the

data to ensure a high quality aggregation of tens of thousands of spectra. In

addition, Stenhagen, Abrahamsson, and McLafferty collected thousands of mass

spectra to form one of the standard MS electron ionization (EI) reference

databases available today. The standard computer readable collections are

those of the US Government, distributed by NIST and the McLafferty collection

[27].

        The categorization of processed information into databases is a powerful

approach for leveraging the advantages of high throughput analysis schemes.

The implementation of an electronic database storage system represents a

significant change from the way in which many organizations have historically

approached analytical data management. The transition to an electronic storage

system typically requires changes to business practices, requiring some level of

change management for the most effective conversion and implementation

strategies.

        Additionally, the consistency of storage for data, structures and associated

alpha-numeric meta-data is an important aspect that should be

considered during the implementation process. In structural terms, isomers, salt

structures, and tautomers all represent different structural forms that can have

an effect on the route that is adopted for searching. Textual based information

including naming conventions for compounds, for example metabolite labels, can
be entered in many different ways. If the values are not entered into the

database tables in the same data fields then this will limit the effectiveness of the

implementation. In operation simple business rules and practices are able to

alleviate this potential shortcoming.

       A database represents an easily accessible knowledge management

system containing all structural elucidations that have occurred during the

elucidation process, and serves as a storage container for the wide array of textual and

numeric information that supports our analytical studies. Records

within the database(s) can be reached flexibly through structure similarity, structure and

substructure searches, user field searches, and spectrum and subspectrum

searches. For example, the identification of

a metabolite structure may require only a retention time and molecular weight

information via LC-MS analysis when compared to the metabolite structure

database compiled from previous studies [28]. A further benefit of databases is

the efficient extraction of information. Databases may be “mined” to detect

trends that may not otherwise be noticed. For example, this approach can be used to

reveal trends such as the metabolically active sites of a molecule and/or

substructures labile to degradative conditions. The extension of databases to

include a much wider array of data and information over and above the spectrum

allows searches to be done using a wider array of parameters. This method

provides an efficient mechanism to reduce the number of false positives. The

increasing adoption of high mass accuracy instrumentation represents an
exciting addition to the information content that can be stored within proprietary

and commercial databases.
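       The retention time and molecular weight lookup mentioned above, and the way an extra parameter reduces false positives, can be sketched as below. The record fields, tolerances, retention times and metabolite labels are hypothetical and are not taken from the cited study.

```python
def match_metabolites(rt_min, mol_weight, database, rt_tol=0.2, mw_tol=0.5):
    """Return labels of stored metabolites matching both retention time and MW."""
    return [rec["label"] for rec in database
            if abs(rec["rt_min"] - rt_min) <= rt_tol
            and abs(rec["mol_weight"] - mol_weight) <= mw_tol]


# Hypothetical metabolite database compiled from earlier studies.
METABOLITES = [
    {"label": "M1 (hydroxylated)", "rt_min": 12.4, "mol_weight": 401.5},
    {"label": "M2 (N-oxide)", "rt_min": 14.1, "mol_weight": 401.5},
    {"label": "parent", "rt_min": 16.8, "mol_weight": 385.5},
]

# Molecular weight alone cannot separate the two isobaric metabolites ...
print(match_metabolites(14.0, 401.4, METABOLITES, rt_tol=float("inf")))
# ... but adding the retention time constraint leaves a single candidate.
print(match_metabolites(14.0, 401.4, METABOLITES))
```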

       Once created, a database may be transferred to other laboratories and

facilities that are participating in a particular research activity. The resulting

databases can be distributed via standard server technologies or “web-enabled”

and made accessible via corporate intranets or public internets. Information is

coordinated within the database, and different scientists are able to effectively

pool and merge their information. When implemented early within the product

development cycle, valuable information for later stages in drug development

can be made available [29]. Therefore, this approach provides a comprehensive

method for information gathering whereby future projects are planned,

coordinated, and efficiently supported. In most cases, the information gathering

process is targeted towards the creation of either a single compound report,

some larger series of cross study reports or, in the case of a regulatory

submission, the creation of a compound dossier.

       It is worth noting that database creation, modification, and use benefit

greatly from a standard, systematic method. This approach produces reliable

datasets that lend themselves to a highly consistent database format throughout

a project lifetime. While spectral databases can be purchased, these are

generally limited to nominal mass EI data. Since library searching techniques

are limited by the size and nature of the library relative to the particular problem

of the chemist, the creation of user databases is of high value to any

corporation. With today’s technologies providing low energy
ionization techniques and accurate mass data, proprietary databases can

certainly be of significantly higher value than commercial databases as they

represent a focused repository of chemistries appropriate to the organization.

The content contained within proprietary databases typically exceeds that

contained within commercial databases, which dramatically increases their value

to an organization. The searches of such databases can be defined according to

a series of options and multiple databases can be searched simultaneously. In

the case of the spectrum of Ovex, when this was searched against the NIST

replicates database, a similar spectrum for Ovex was returned with a similarity

index of 87.9% as shown in Figure 17.
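       To give a feel for how a percentage similarity between two spectra can arise, the sketch below computes a plain cosine similarity between two stick spectra. This is an assumption chosen for illustration; the similarity index reported by the search software is not necessarily calculated this way, and commercial match factors usually apply additional mass and intensity weighting.

```python
import math


def cosine_similarity(spec_a, spec_b):
    """Cosine similarity (0-100) between two spectra given as {m/z: intensity}."""
    shared = set(spec_a) & set(spec_b)
    dot = sum(spec_a[mz] * spec_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(i * i for i in spec_a.values()))
    norm_b = math.sqrt(sum(i * i for i in spec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum matches nothing
    return 100.0 * dot / (norm_a * norm_b)


# Made-up spectra: same peaks, different relative intensities.
unknown = {100: 3.0, 200: 4.0}
library_entry = {100: 4.0, 200: 3.0}

print(round(cosine_similarity(unknown, library_entry), 1))
```

Identical spectra score 100, spectra with no peaks in common score 0, and differences in relative intensity lower the score in between.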

       Search efficiency is increased by imposing additional constraints. As an

example of a multi-step constrained search approach, a search of the NIST

database for a para-substituted benzene sulfonic acid fragment, as a starting

point, gives a total of almost 300 such spectra in the database. This subset of

spectra can then be searched according to variables such as molecular formula,

elemental composition based on elemental analysis, and substructural

components based on identified fragmentations (loss of Ph, CCl3, C(CH3)3 and

so on).
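       The multi-step constrained search can be sketched as a chain of filters, each applied to the hits left by the previous step. The records, field names and constraint values below are illustrative placeholders and do not reproduce the NIST database or its search interface.

```python
def constrained_search(records, constraints):
    """Apply constraints in order; return final hits and the hit count per step."""
    hits = list(records)
    counts = [len(hits)]
    for keep in constraints:
        hits = [r for r in hits if keep(r)]
        counts.append(len(hits))
    return hits, counts


SULFONIC = "p-substituted benzenesulfonic acid"

# Hypothetical library records.
RECORDS = [
    {"name": "entry 1", "substructures": {SULFONIC}, "formula": "C10H14O3S",
     "losses": {"C(CH3)3"}},
    {"name": "entry 2", "substructures": {SULFONIC}, "formula": "C7H8O3S",
     "losses": {"CH3"}},
    {"name": "entry 3", "substructures": {"phenol"}, "formula": "C6H6O",
     "losses": {"CO"}},
    {"name": "entry 4", "substructures": {SULFONIC}, "formula": "C10H14O3S",
     "losses": {"CH3"}},
]

steps = [
    lambda r: SULFONIC in r["substructures"],  # starting substructure
    lambda r: r["formula"] == "C10H14O3S",     # molecular formula
    lambda r: "C(CH3)3" in r["losses"],        # identified fragmentation
]

hits, counts = constrained_search(RECORDS, steps)
print(counts, [r["name"] for r in hits])
```

Recording the hit count after every step makes it easy to see which constraint contributes most to narrowing the candidate list.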

       Often, when work is initiated on new project compounds, the use of a

complete spectral database is not possible (e.g. in drug discovery). When

information is stored within a comparative database, compounds of interest can

be effectively searched and identified for use in early to late stages of

development [30]. Database capabilities also permit the use of substructure-
based searches to identify compounds within a specific dataset or library that

contain a distinct substructural entity [31].




Distribution of Spectrometry Data to Chemists

       New technology is delivered at almost every new analytical

instrumentation conference. As with standard computer platforms, the cost and

size of MS instrumentation with the same capability continue to drop, resulting

in the proliferation of open-access MS labs supporting chemists in both single

and multiple synthesis environments. Typically, the resulting data is pre-

processed by the generating instrument and is provided to the chemist in a

hardcopy format or as a spectral image requiring a vendor-specific viewer.

       Both of these scenarios prohibit or limit direct interaction with the spectral

data. While in some cases this is preferable since the data is locked from further

manipulation, as is necessary in a regulated environment, in a research

environment such barriers may limit further analysis. The expense of installing a

copy of vendor software on the desktop of every non-specialist accessing the

MS instrument often renders this level of distribution and flexibility

uneconomical. Additionally, the overhead in training and support needs for such

large distributions, especially in instrumentally heterogeneous environments,

may act as additional limiting factors. In general, such an approach may be

overkill as most chemists simply want access to the final spectrum and/or
confirmation that the correct product was synthesized. In most cases a simple

determination of molecular weight may be sufficient for such needs.

       Traditionally, vendor software provides sophisticated data reduction tools

but limited chemical structure association and reporting capability. An alternative

resolution to this problem is the installation of a third-party structure enabled

desktop processing solution for accessing the data directly over a computer

network, allowing the chemist to further manipulate the data and store the

resulting spectra in a database for further reference. Such an approach offers

additional capability since it is common for a facility to utilize a heterogeneous

mix of hardware platforms whereby spectra are generated. With the capability to

read multiple file formats in their raw binary format, the costs of operation and

the efforts to generate data portability may be significantly reduced.




Integrated Spectroscopic and Chemical Structure

Databasing

       Integration strategies often encompass separate events involving

instrumentation, methodology, and process. Conventional methods of analysis

involve multiple steps. For example, the identification of natural products

traditionally involves the scale-up of fermentation broths, solvent extraction,

liquid/liquid or column fractionation, chromatographic fraction collection, and

spectroscopic analysis of the individual components. The integration of these

bench-scale steps into dedicated systems provides unique and powerful
advantages for on-line, and perhaps, real-time analysis [31]. Arguably the most

significant bottleneck that exists in industry today is the ability to integrate these

traditional analysis steps with MS processing and analysis.

       Discovery chemists and the research and development environments

focus a lot of effort into the resolution of components with direct attention paid to

the actual chemical structures. As a result, for spectroscopic techniques such as

NMR, MS and IR, it is not uncommon to find filing cabinets full of spectra,

relevant scientific literature, and associated information, generally linked to the

chemical structures that gave rise to the spectra. Even though electronic

libraries of chemical structures and MS spectra exist, these libraries are usually

limited to EI data as discussed previously. It is possible to search experimental

MS data against these libraries with the intention to aid in the identification of

possible unknowns. These libraries are, however, not structure or substructure

searchable. The need for the electronic management of experimental

spectra with associated chemical structures is obvious. There

are two general forms to such databases. For spectral-centric solutions the

primary focus of the software is the desktop processing of spectroscopic data,

followed by the concomitant association with chemical structures. Commonly, a

particular facility has access to a structure databasing system from one of the

multiple vendors providing this type of solution. These structure databasing

systems provide a structure-centric solution whereby spectral records are

attached to the structure records in the database for viewing and further

processing.
Spectrometrists and chromatographers utilize a variety of technologies to

both separate and identify chemical structures. It is common in today’s analytical

environment to find teams assembled with skillsets to generate both optimal

separation and analysis solutions. Spectrometrists assign their spectra in

relation to chemical structures using parent ion mass or fragment ion mass

analysis in MS, nucleus-to-peak assignments in NMR and vibrational band

association with IR peaks, for example. Spectrometrists have used the standard

filing system of drawers full of spectra with an association of the file number with

some textual identifier in order to locate the detailed knowledge extracted from

the spectra at a later date. The general level of spectral management has been

limited to hand written notes in notebooks or sometimes text-searchable

databases pointing to associated spectra.

       As explained earlier, tools are now available to allow spectra to be

databased in electronic format with associated chemical structures [1a]. In this

manner, the mass spectrometrist now has the opportunity to search the

database for related structures or substructures, or spectral features when

performing fresh analyses. When integrated with other spectral data the result is

a legacy database of multiple spectroscopy data, thereby building a foundation

for future analyses. The value residing in such tools is the time savings that

result for the analysis of related chemicals and the exchange of information

between different analytical laboratories within the same company. In theory,

such an approach should not be isolated to spectrometry; for chromatography,
tools now exist to allow the similar integration of chromatographic peaks and

chemical structures.

         Resulting spectra with associated chemical structure(s) carry valuable

information for future analyses. Such resulting files can be stored on a

centralized server and thus become a powerful means for dissemination of the

mass spectrum-structure connectivity and fragment assignment information. This

general approach can be expanded to a World Wide Web intranet approach

whereby the spectra are posted as individual HTML pages with hyperlinked MS

files.

         Software solutions available today allow each spectrum to be databased

with associated chemical structures, thereby offering significantly enhanced

capabilities over the common file systems used today in many laboratories. Due

to recent advances in database technology there is enhanced searching

capability over the standard filing cabinet system or a text-based databasing

system. It is possible to search the resulting databases by structure,

substructure, formula, molecular weight, chromatographic and spectroscopic

parameters or user data. User data includes the creation of user-definable

database fields with particular field labels including, for example, submitter,

project name and type of analysis, all of which become searchable fields.

Multiple databases can be searched at one time, thereby allowing different

databases to be constructed according to analysis type, project name, individual

user and so on. These multiple databases can also be distributed across

different departments, divisions or even an entire corporation, simply by using
the ability to point to databases located on mapped network drives. Corporate-

wide database capability engenders concern about the integrity of the

databases. This can be addressed by standard database security features.

Other than the spectrum parameters, the association of individual searchable

user data fields is invaluable, thereby allowing each spectrum in the database to

be associated with a project, a customer, an analyst or any other appropriate

information.
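       A minimal sketch of user-definable searchable fields and of one query run across several databases is given below, using Python's built-in sqlite3 module with in-memory databases. The table layout, field names, database names and records are assumptions for illustration; they do not describe the schema of any commercial system, and databases on mapped network drives are not modeled.

```python
import sqlite3

FIELDS = ("formula", "mol_weight", "submitter", "project", "analysis")


def make_database(rows):
    """Create an in-memory spectrum database with a few user data fields."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE spectra (id INTEGER PRIMARY KEY, formula TEXT, "
        "mol_weight REAL, submitter TEXT, project TEXT, analysis TEXT)"
    )
    conn.executemany(
        "INSERT INTO spectra (formula, mol_weight, submitter, project, analysis) "
        "VALUES (?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def search_all(databases, field, value):
    """Run the same field search against every database and pool the hits."""
    if field not in FIELDS:  # only known field names are placed in the SQL text
        raise ValueError(f"unknown field: {field}")
    hits = []
    for db_name, conn in databases.items():
        query = f"SELECT id, formula FROM spectra WHERE {field} = ?"
        for row in conn.execute(query, (value,)):
            hits.append((db_name, *row))
    return hits


# Two hypothetical databases organized by analysis group.
databases = {
    "metabolism": make_database(
        [("C21H31N5O3", 401.5, "analyst A", "project X", "LC-MS")]),
    "impurities": make_database(
        [("C21H31N5O2", 385.5, "analyst B", "project X", "LC-MS"),
         ("C6H6ClN", 127.6, "analyst B", "project Y", "GC-MS")]),
}

print(search_all(databases, "project", "project X"))
```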

       The value of the approach outlined here should be obvious as the ability

to integrate structural information with spectra into a database offers exciting

benefits to the spectrometrist and is an ideal solution for an environment where

multiple spectrometrists need to quickly determine assignments and identify

specific chemical structure classes. The additional benefit of this tool is that it

may also be fully integrated with similar toolsets allowing similar structure-

spectrum management for NMR, MS, UV-Vis, IR and Raman.




Conclusions and Future Prospects

Computer software technologies for the processing and analysis of MS data and

the management of the resulting knowledge are quickly emerging. While it is

almost impossible to define the long term future of MS data processing and

analysis, it is certain that MS systems will continue to become smaller and easier to

use, offering greater levels of automation and on-the-fly decision making. It is

certainly likely that an increasing amount of data will be acquired with even
higher mass resolution and higher mass accuracies and hence the tools

necessary to manage this data will need to be further developed. The synergistic

coupling of high mass accuracy MS data, MS fragment analysis, integrations

with other forms of spectroscopy, for example LC-NMR-MS, will provide still

further levels of structural detail. The tools which will be delivered in the future

will have to include additional developments in the area of highly automated

processing of thousands of datasets, advances in MS fragmentation and tools

for the creation and searching of accurate mass spectral databases. Such an

approach, when further integrated with spectral processing and databasing for

other techniques (NMR, IR, UV-Vis etc.) will provide a unifying tool for

spectroscopy management.

       With further research into statistical and chemometric methods it is hoped

that further techniques will be developed for mass spectral identification.

However, MS, in any of the separate ionization techniques (EI, CI, Electrospray

(ESI), Atmospheric Pressure Chemical Ionization (APCI) and so forth), has

inherent limitations. Only in the presence of additional techniques, such as IR

and NMR, will structure elucidation and verification be more rigorous when

identifying the structure of unknown chemicals.
Figure 1: A multi-spectroscopic display of Alizarin. UV (top), IR (middle) and MS

(bottom) contained within a single display window. This ability allows unified

desktop viewing of data.
Figure 2: Example showing the reduction in chemical and electronic noise using

chemometric algorithms (CODA used for this example). Notice the high levels of

noise and background in the total ion chromatogram and the low relative

intensity of the chromatographically relevant peaks m/z 739 and 1460 mass

regions (Upper panel). The mass spectrum in the top window is for the scan at a

retention time of 17.8 minutes. Notice the low molecular weight components

around m/z 214. Notice the removal of the gradient background after application

of the CODA chemometrics algorithm and the significant decrease in noise level

in the Total Ion Chromatogram (Lower panel). The interface shown is for

ACD/MS Manager.
Figure 3: Following the process of COMPARELCMS, mass chromatograms that

are determined to be unique when a control sample is compared with a

metabolized sample are retained for further review.
Figure 4: An Isotope Pattern Calculator showing Nominal, Average and

Monoisotopic masses
Figure 5: The EI MS spectrum of ethyl pirimiphos [O-[2-(diethylamino)-6-

methylpyrimidin-4-yl] O,O-diethyl thiophosphate] showing two potential fragment

assignments for m/z 152.086.


[Figure panels: Route 1, Route 2]
Figure 6: A series of proposed fragment structures with nominal mass m/z 152

for the fragmentation of ethyl pirimiphos. (a) The structure on the left

corresponds to an accurate mass of m/z 152.006 which has a delta mass of 80

mDa from the experimental data (Figure 5, Route 1) and (b) the right hand

structure with mass m/z 152.082 corresponds to a delta mass of 3 mDa from

the experimental data (Figure 5, Route 2). (Display extracted from MS Fragmenter,

Advanced Chemistry Development Inc.)
Figure 7: The nominal electron ionization (EI) mass spectrum and structure for

temazepam. Note the significant contribution of the 37Cl isotope ion to the

spectrum especially at m/z 273, the primary fragment ion which can have a

significant impact on the fragment assignment of the spectrum.
Figure 8: The assignment of the N-Oxide buspirone MS/MS spectrum using the

“lasso tool” (left inset box, “Stage 1 – Lasso structure”). The fragment table lists

assigned fragments. Moving the mouse cursor over the table highlights the

assigned molecular fragment on both the spectrum and the structure.




[Figure panels: Stage 1 – Lasso structure; Result = fragment selected]
Figure 9: Chloroaniline without the position of the chloro group being specified




Figure 10: Incomplete structure representation using graphical elements such as

shaded boxes




Figure 11: Incomplete structure representation using Polymer brackets




Figure 12: Suggested hydroxylation of a buspirone metabolite represented using

the shaded Markush structural representation.




Figure 13: Representation of a Markush structure for a discontinuous series of

attachment positions

Figure 14: Structure attachment of Theophylline to its associated EI Spectrum.

Note that the structure is understood at a programmatical level and thus can be

utilized directly in structurally enabled search engines.
Figure 15: The results of a complete structure search of the NIST98 [27]

database of a Markush structure where the position of the hydroxylation and

chlorination are defined within any of the possible ring positions.




Figure 16: The results of a substructure search of the NIST98 [27] database of a

Markush structure where the position of the hydroxylation and chlorination are

defined within any of the possible ring positions. Note that 945 possible

structural combinations are returned.




Figure 17: The most similar match (87.9% match factor – see bottom middle) for

the spectral search displays the spectrum of Ovex, from the catalogue of mass

spectra of pesticides from within the NIST replicates database. The structure of

Ovex is consistent with the suggested structure.
REFERENCES
1 Third party vendors providing software solutions for integrated spectroscopy

processing include a) Advanced Chemistry Development Inc., www.acdlabs.com

and b) Thermo,

http://www.thermo.com/eThermo/CDA/Products/Product_Detail/1,1075,22304-134-X-1-1,00.html



2 Lee, M.S., Kerns, E.H. LC/MS Applications in Drug Development. Mass

Spectrom. Rev. 1999, 18, 187-279



3 Waters Corporation, MS Technologies Centre (Micromass UK Ltd.), Atlas Park

Simonsway, Manchester, M22 5PP, United Kingdom.

4 Agilent Technologies, 5301 Stephens Creek Boulevard, Santa Clara, CA,

95051, USA

5 OpenLynx Global Server™ is a registered trademark of Waters Corporation,

MS Technologies Centre (Micromass UK Ltd.), Atlas Park Simonsway,

Manchester, M22 5PP, United Kingdom.

6 J. E. Biller and K. Biemann, "Reconstructed Mass Spectra, A Novel Approach

For The Utilization Of Gas Chromatograph - Mass Spectrometer Data", Anal.

Letters, 1974, 7 (7), 515-528.
7 Windig, W., Payne, A., Nichols, W., A Noise and Background Reduction

Method for Component Detection in Liquid Chromatography/Mass Spectrometry,

Anal. Chem., 1996, 68, 3602-3606.

8 Comparelcms ref



9 Harland, G, Castro Perez, J., Pugh, J., Leandersson, C., Thompson, R., High

Mass Accuracy Measurements in W-optics using an Orthogonal Hybrid

Quadrupole Time Of Flight Mass Spectrometer for In-Vitro Metabolism Studies.

51st ASMS, Montreal, 2003, TPO 274.

10 Lee, M.S., Yost, R.A., Perchalski, R.J. Tandem Mass Spectrometry and Drug

Metabolism. Annu Rep Med Chem 1986, 21, 313-321.

11 Volk, K.J., Hill, S.E., Kerns, E.H., Lee, M.S. Profiling Degradants of Paclitaxel

Using Liquid Chromatography-Mass Spectrometry and Liquid Chromatography-

Tandem Mass Spectrometry Substructural Techniques. J. Chromatogr. B

Biomed. Sci. 1997, 696, 99-115.

12 Blinov K. A., Carlson D., Elyashberg M.E., Martin G.E.,

Martirosian E.R., Molodtsov, S., Williams, A.J. Computer-assisted structure

elucidation of natural products with limited 2D NMR data: application of

the StrucEluc system., Magn. Reson. Chem. 2003, 41, 359–372

13 Blinov K., Elyashberg M., Martirosian, E.R., Molodtsov, S.G., Williams A.J.,

Tackie, A.N., Maged, M., Sharaf, M.H., Schiff P.L., Crouch, R.C. Jr., Martin G.E.,

Hadden C.E., Guido, J.E., Mills, K.A., Quindolinocryptotackieine: The
Elucidation of a Novel Indoloquinoline Alkaloid Structure through the use of

Computer-Assisted Structure Elucidation and 2D-NMR, In Press.

14 Bristow, A.W.T., Nichols, W.F., Webb, K.S., Conway, B, "The evaluation of

the utility of electrospray in-source collisionally induced dissociation (in-source-

CID) spectral libraries", Rapid Communications in Mass Spectrometry, (2002),

16, 2374 - 2386

15 Lee, M.S., Klohr, S.E., Kerns, E.H., Volk, K.J., Leet, J.E., Schroeder, D.R.,

Rosenberg, I.E. The Coordinated Use of Tandem Mass Spectrometry and High

Resolution Mass Spectrometry for the Structure Elucidation of the Kedarcidin

Chromophore. J. Mass Spectrom. 1996, 31, 1253-1260.

16 Roepstorff, P., Fohlman, J. Proposal for a Common Nomenclature for

Sequence Ions in Mass Spectra of Peptides. Biomed. Mass Spectrom. 1984, 11,

601-602.

17 Owens K.G. Application of correlation analytical techniques to mass spectral

data. Applied Spectroscopy Reviews, 1992, 27, 1-49.

18 Gundersdorf, R.W., Fernandez-Metzler, C.L., King, R.C. Overcoming SRM

Blindness with the Linear Ion Trap. 51st ASMS, Montreal, 2003, WPH 146.

19 Advanced Chemistry Development Inc., Suite 600, 90 Adelaide Street West,

Toronto, ON, M5H 3V9, Canada.

20 Lee, M.S. LC/MS Applications in Drug Development. Wiley, 2002, ISBN

0-471-40520-5.

21 Lam W., Ramanathan R. In Electrospray Ionization Source

Hydrogen/Deuterium Exchange LC-MS and LC-MS-MS for Characterization of

Metabolites. J. Am. Soc. Mass Spectrom. 2002, 13, 345–353.

22 21 CFR Part 11 Regulations,

www.fda.gov/ora/compliance_ref/part11/frs/background/11cfr-fr.htm

23 Guidance for Industry Part 11, Electronic Records; Electronic Signatures –

Scope and Application (Draft), February 2003,

www.fda.gov/cder/guidance/index.htm

24 NuGenesis Technologies Corporation, 1900 West Park Drive, Westborough,

MA, 01581, United States

25 Trepalin, S. V., Skorenko, A. V., Balakin K. V., Nasonov, A.F., Lang, S.A.,

Ivashchenko, A. A., Savchuk, N. P. Advanced Exact Structure Searching in

Large Databases of Chemical Compounds. J. Chem. Inf. Comput. Sci. 2003,

43, 852 – 860.

26 Pesyna G.M., Venkataraghavan R., Dayringer H.E. & McLafferty F.W.

Probability Based Matching System Using a Large Collection of Reference Mass

Spectra. Anal Chem., 1976, 48(9), 1362-1368.

27 The US Government MS database is available from NIST, Office of Standard

Reference Data, Washington DC, 20234. The McLafferty database is available

from John Wiley & Sons, Electronic Publishing Division, 605 Third Avenue, New

York, New York 10158.

28 Kerns, E.H., Rourick, R.A., Volk, K.J., Lee, M.S. Buspirone Metabolite

Structure Profile Using a Standard Liquid Chromatographic-Mass Spectrometric

Protocol. J. Chromatogr. B 1997, 698, 133-145.

29 Kerns, E.H., Volk, K.J., Hill, S.E., Lee, M.S. Profiling Taxanes in Taxus

Extracts Using LC/MS and LC/MS/MS Techniques. J. Nat. Prod. 1994, 57, 1391-1403.

30 Kerns, E.H., Volk, K.J., Hill, S.E., Lee, M.S. Profiling New Taxanes Using

LC/MS and LC/MS/MS Substructural Analysis Techniques. Rapid Commun.

Mass Spectrom. 1995, 9, 1539-1545.

31 Lee, M.S., Kerns, E.H., Hail, M.E., Liu, J., Volk, K.J. Recent Applications of

LC-MS Techniques for the Structure Identification of Drug Metabolites and

Related Compounds. LC-GC, 1997, 15, 542-558.

Applications of Computer Software for the Interpretation and Management of Mass Spectrometry Data in Pharmaceutical Science

  • 1. Applications of Computer Software for the Interpretation and Management of Mass Spectrometry Data in Pharmaceutical Science Mark Bayliss and Antony Williams, Advanced Chemistry Development, 90 Adelaide Street West, Suite 702, Toronto, ON, M5H 3V9, Canada
  • 2. Abstract Within the last decade there has been a rapid growth in the adoption of Mass Spectrometry (MS) as a routine and facile technique not just by a group of expert level mass spectrometrists, but by a much more diverse group of non-MS related disciplines. This shift continues to be fueled by a number of factors, which can be broadly segregated into, instrumental technologies, the derived high value of the technique, the cost per sample, the derived information content, ease of use and software. Advances in sensitivity, ruggedness, reliability, ease of integration with High Performance Liquid Chromatography (HPLC), Gas Chromatography (GC) and other separation techniques and the general ease of operation of MS instrumentation can all be considered as enabling. Ultimately, the strongest driver for the wide adoption of MS has been driven by the clear value that the technique brings to so many different businesses in terms of both sample throughput and information content per sample. This expansion in the ability to create data both in terms of volume and in data density per dataset can be correlated directly with a backlog in the ability to extract, process, store and report, and thereby create the resulting high information and knowledge content which is sought. Data that are generated by the instruments in their various guises are simply binary bits and bytes and information has to be extracted via a process of conversion of data to information and knowledge. Software therefore
  • 3. becomes an integral, critical and enabling part of the cycle of information creation in support of compound development and chemical analysis. Additional business drivers include the need to reduce development timelines, a greater understanding of the chemical significance of a particular development compound and return on investment. All these factors result in a tremendous business effort focused around streamlined approaches that provide scientists, managers, and executives the capability to readily obtain, or even request, the necessary information. Due to the heterogeneous instrumentation environment and resulting distribution of data formats, it is challenging to bring together a single universally applied interface for the data. Data in this sense refers to the different spectroscopies and other analytical techniques that are commonly used in support of chemical analysis. The ability to read in raw vendor formats and allow integrated data-handling has been severely lacking. Efforts have been made to define common exchange data formats such as JCAMP and NetCDF and current efforts using XML which are being driven by the ASTM E13 committee. Third party vendors [1] have also assumed the task of becoming the neutral party to unify data handling and management. Such third party offerings have become a crucial component in the effort to build a single corporate spectroscopic database supporting all instrumentation, not limited to MS but inclusive of NMR, IR, UV-Vis, Raman and HPLC as shown in Figure 1.
  • 4. In this chapter we intend to present, review and discuss some of the non- instrument related software systems that exist for qualitative data extraction and structural elucidation. During this discussion we will examine the representation of molecular structures associated with analytical data and the support systems that are able to store, retrieve and report this information. It is not our intention to review the archival systems that exist for the long term storage of the physical datafiles and other associated electronic records. As part of this review we will include a survey of the creation of commercial and laboratory specific reference databases and associated searching algorithms. We will also discuss recent efforts to introduce advanced processing and analysis algorithms to the hands of the masses, specifically as an aid to data extraction and structure elucidation. Broadly speaking, we can separate the points of discussion into tools for data extraction, elucidation, storage, retrieval, reporting and information distribution. Nine strategies consistently appear in MS-based methods for accelerated development and have been discussed in detail by Lee [2]. The strategies are standard methods, template structure identification, databases, screening, integration, miniaturization, parallel processing, visualization and automation. These strategies serve to define the attributes of the analytical methods being applied. High-throughput sample-generating technologies such as biomolecular screening and combinatorial chemistry can create many thousands of samples, each requiring the application of one or more forms of analytical chemistry. Nowadays, the ability to devise, construct, and refine sample-analysis methods, either chromatographic or spectroscopic, has become as equally important as
  • 5. the hardware itself. Today, the need to integrate appropriate method development strategies with MS processing capabilities is a critical factor in the modern industrial laboratory. In chemical and pharmaceutical companies around the world, the necessity to acquire and analyze analytical data for the abundance of samples is a critical business requirement. As a result the availability of open-access laboratories containing highly roboticized instrumentation such as OpenLynx from Waters Corporation – formerly Micromass Ltd [3], the 1100 Series High Throughput LC/MS System from Agilent [4] and others are now commonplace. The careers of professional spectrometrists are now largely focused on the implementation of optimal techniques to support the users of these laboratories rather than the standard sample analysis of yesteryear. Decreasing costs and reduced footprints for the instrumentation, as well as more intuitive software interfaces for non-specialists and globalization of software platforms such as Waters Micromass OpenLynx Global Server™[5], allows the use of spectroscopic and chromatographic techniques in an open-access laboratory environment across organizations. Commonly, these laboratories are also likely to provide NMR, MS, IR, UV-Vis and chromatographic instrumentation. As a result of these laboratories both standard and hyphenated MS-based techniques have entered the hands of the masses. It is clear that distinct differences still exist between the applications of mass spectrometry made available to non- specialists and those performed by the specialist.
  • 6. In general, non-specialists are adopting MS instrumentation that predominantly generates molecular ion only MS with little or no fragmentation. This is clearly revealed during visits to any of the number of laboratories that now offer Open Access technologies that enable a chemist with no prior MS knowledge or experience to submit samples for analysis in a totally automated manner. In a small number of cases, this has been extended to the inclusion of MS/MS fragmentation though this appears not to be the norm at this time. Another example of this appears in applications that deal with combinatorial plate analysis, for example, the data generated includes a full high performance liquid chromatography-MS (LC-MS) run. The ionizing technique is “soft” and produces for each well in a plate both the parent ion and one or more chromatographic traces [Total Ion Current (TIC), Extracted Ion Current (XIC), Diode Array, Chemiluminesence Nitrogen Detector (CLND), Evaporative Light Scattering Detection (ELSD) and others] to aid in the assay of materials in the sample. Meanwhile, the traditional spectrometrist is generally more focused on non-routine analyses which require greater levels of custom method development, structural elucidation, and studies requiring the usage of accurate mass LC/MS and LC/MS/MS. Whether the application provides data for synthetic chemists or expert spectrometrists, computer software is an essential factor in a successful analysis. Whether it is the application of advanced chemometric algorithms for noise-reduction, the association of structural fragments with mass spectral features, or the management and databasing of the derived information,
  • 7. computer software applications additional to those required for operation of the instrument are a necessary and integral part of the analytical information repertoire that exists for scientists industry wide. Extraction of data Prior to any structural elucidation, the need for data extraction is of paramount importance. The simplest form of extraction may merely be a case of selecting a peak of interest in the LC/MS or GC/MS TIC and obtaining a spectrum for that peak. The inclusion of background subtraction further improves the spectral quality with the removal of solvent and contributions from any background ions – thus making the identification of the molecular weight or spectrally related ions clearer. The automation of background subtraction and generation of a “cleaned” spectrum is very much the mainstay of all data processing systems that exist in the marketplace. Of course this method precludes that the elution times of the peaks are either known or that the peaks in the TIC are clearly visible. In the case of Open Access or combinatorial studies, it can be common practice to use the additional analog detectors, UV, ELSD or CLND detectors to define the retention time of the eluting peaks which can then be used to obtain a combined and background subtracted MS spectrum. This technique certainly adds value when the analysis is not sample limited, and a strong peak exists in the analog detector(s) which can be used to direct the extraction of the MS spectrum. Variability in detection between one or
  • 8. more detectors, as in a lack of chromophore for example does lead to a lack of detector response. The use of more than one analog detector does help to minimize this impact. In many cases, where the focus of the MS is in the extraction of low intensity components such as impurity analysis and metabolite determinations, the presence of the chromatographic peak of interest may be obscured by the presence of high background levels resulting from solvents, buffers and other none sample related background contamination ions. In addition, in these cases the concentration of the unknown peak(s) of interest in the sample may be so low that there may be no response on the UV or other analog detector that can define the position of the chromatographic peak. It is often the case that the intensity of the contamination ions or those from the solvents and buffers far exceeds those arising from the sample related ions and thus extraction by retention time alone becomes less appealing. In the case of natural product analysis and metabolism studies, the chromatographic peaks of interest may be present with a multitude of other peaks that are related to the sample matrix and thus unwanted. This of course further increases the complexity of the extraction process. Differentiating sample related peaks from those resulting from the matrix often requires extensive knowledge of both the samples of interest and the matrix and thus these tasks are often performed by highly trained mass spectrometrists with a detailed understanding of the sample and its chemistry. Of course, if, for a particular sample, a significant knowledge base already exists, it is possible to use this knowledge as a template for data extraction. This is done
  • 9. by searching for masses within the dataset that differ by some delta mass (∆M) from the compound of interest, such as a parent drug compound or synthesis material. For example, it would be possible to extract mass chromatograms for the mono, di, tri… hydroxylated forms of a starting drug structure by extracting mass chromatograms for (Parent Mass + n[+16]) and then identifying the presence of chromatographic peaks within these extracted mass chromatograms. This method represents the route of choice for many of the software packages that exist for metabolite data extraction by many of the vendors and offers significant value in being able to extract only sample related events that exist within the datasets of interest. As an aid to data extraction, a number of chemometric algorithms have been developed over the years to assist in the extraction of sample related spectra and remove the interference of the background and matrix-based effects. These algorithms by their very nature do not use any knowledge base for the extraction process and can be beneficial in cases where there has been significant rearrangement in the integrity of the structure relative to the original parent structure. Examples of such algorithms include Biller-Biemann [6] and more recently CODA (COmponent Detection Algorithm) reported by Windig [7], both of which have been integrated into various MS processing software platforms over the years. Both of these algorithms are effective in removing the noise resulting from chemical background and electronic noise that exists within the data. This can be seen in Figure 2 where the trace at the top represents the original TIC and the trace at the bottom represents the TIC following the
  • 10. application of CODA. The output from the CODA approach can also be visualized in the form of individual mass chromatograms. As a generic technique, CODA is most appropriate for the extraction of all peaks contained within the sample data file, for example in an impurity analysis. In other cases, it may be desirable to extract only the chromatographic peaks that are unique to one of two or more data sets. Windig et al. [8] also report on the application of CODA to two or more datasets and the subsequent comparison of the output to determine only those components that are unique, an approach referred to as COMPARELCMS. Figure 3 represents such a comparison using the COMPARELCMS process, where the top trace represents a metabolized sample and the bottom trace a control against which the metabolized sample is compared. As is clearly visible, the top trace contains a number of peaks that are unique and thus can be investigated further as potential metabolite or impurity candidates. As with the CODA output, the COMPARELCMS output can also be visualized as individually selectable mass chromatograms. Once a peak has been extracted, its difference in mass from the starting parent compound can be rationalized as either a simple modification of the original structure or some other more complex structural rearrangement. The isolated MS chromatographic peak and its associated mass can be used in a number of ways: simply as an indicator of molecular weight, as a means of calculating the empirical formula, or as a driver for the generation of tandem MS/MS or MS(n) data, either in an instrument driven MS to MS/MS or MS(n) switching protocol or via an MS1 targeted method.
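The knowledge-based extraction described earlier, in which mass chromatograms are extracted for (Parent Mass + n[+16]), can be sketched in a few lines. This is only a minimal illustration under stated assumptions, not any vendor's implementation: the scan layout (a list of retention time and peak list pairs), the function names and the 0.5 Da window are all hypothetical choices made for the example.

```python
# Monoisotopic mass of one added oxygen atom (hydroxylation), in Da.
HYDROXYLATION = 15.9949


def extract_delta_mass_chromatograms(scans, parent_mz, max_n=3, tol=0.5):
    """Build one extracted mass chromatogram per hydroxylation count n.

    scans: list of (retention_time, [(mz, intensity), ...]) tuples
           (an assumed, simplified data layout).
    Returns {n: [(retention_time, summed_intensity), ...]}.
    """
    targets = {n: parent_mz + n * HYDROXYLATION for n in range(1, max_n + 1)}
    chromatograms = {n: [] for n in targets}
    for rt, peaks in scans:
        for n, target in targets.items():
            # Sum all ions falling inside the mass window around the target.
            total = sum(i for mz, i in peaks if abs(mz - target) <= tol)
            chromatograms[n].append((rt, total))
    return chromatograms


def apex(chromatogram):
    """Return the (retention_time, intensity) point of highest intensity."""
    return max(chromatogram, key=lambda point: point[1])
```

A chromatographic peak in the n = 1 trace would then flag a candidate mono-hydroxylated component; a real workflow would add peak detection and background handling, which are omitted here.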
  • 11. Structural Elucidation Using MS Data

    The elucidation of chemical structure(s) covers an extremely wide arena of processes. At its simplest level this may be the calculation of an empirical formula from the accurately measured mass of an isotopically pure spectral peak. Whilst calculation of empirical formula does not preclude the use of high resolution MS, it remains a critical requirement in the determination of spectral peak purity. The necessity for high mass accuracy and high mass resolution may not be apparent at first glance. High mass accuracy is the ability to determine the value of the ionized mass to a significant number of decimal places, as discussed below. High mass resolution is the ability of an MS instrument to separate two or more masses that have the same nominal value. It is also important to note that a high mass accuracy instrument is unable to separate isomeric forms of the same compound, as the mass of each component is exactly the same. A spectrally pure peak is an absolute requirement to ensure that the center of gravity of the mass spectral peak under investigation is calculated correctly and is not biased by the presence of another spectral peak with similar nominal mass. Such determinations of empirical formula thus require the calculation of molecular weight to at least 3 decimal places, such that the number of permutations of carbon, hydrogen, nitrogen, oxygen and so on can be minimized, Figure 4. The usage of accurate mass determinations need not be confined to just MS1 or molecular ion peaks. Rather it has a much wider
  • 12. applicability when used in conjunction with tandem MS spectral peaks [9]. This has been found to assist greatly in the determination of structural fragments and is being widely applied in the study of metabolites, degradants, natural products and impurity elucidations. In the example of the fragmentation of the tri-ethyl pirimiphos, two potential fragmentation routes give rise to the nominal mass m/z 152, Figure 5. In the first suggested fragmentation route, cleavage occurs at the oxygen in position number 11 attached to the phosphorus-sulfur moiety. The charge is retained on this portion of the molecule, resulting in a fragment ion with a calculated accurate mass of m/z 152.006, Figure 6a, a delta mass of 80 mDa from the experimentally recorded mass of m/z 152.086. When this is contrasted with the other fragmentation possibility, Figure 6b, a mass delta of only 3 mDa is observed between the calculated fragment mass and the experimentally determined mass. In the example presented above, adjusting the mass accuracy tolerance of the fragmentation assignment process to match that of the instrumentation used to acquire the MS data reduces the number of false positive fragment possibilities that have to be reviewed. In the pharmaceutical industry, much of the MS-based elucidation strategy is based on the premise that much of the parent drug structure will be retained in the metabolites, impurities, or degradants [10]. The resulting fragment ions associated with unique substructures of the parent compound are thus also retained. Thus, the unique fragment ions contained in either full scan
  • 13. or product ion mass spectra of the parent compound serve as the template for identification. The template structure identification strategy has been recently illustrated for the profiling of paclitaxel degradants [11]. MS vendors are adept at providing tools for data extraction, quantitation and compound suggestions, but these often do not include proposed chemical structures or fragments. The conversion of spectrum to structure in a de-novo sense, for example for natural products where no prior sample information exists, remains an extremely difficult process when MS is used in isolation. In the majority of cases the conversion of a spectrum to a structure, even with all the advances that have been made in the technology, still requires some starting information about the sample that has to be used in conjunction with the mass spectral information. Confirmation of structure by the verification of key mass spectral ions present in the spectrum is an extremely powerful technique for structural analysis around a scaffold of prior information about the sample. The combination of MS, NMR, other spectroscopies and anecdotal information has been proven to be necessary for de-novo structural elucidation [12,13]. In these cases MS provides accurate mass information, and thus empirical formulae for the complete structure and key fragments, which can be used during the elucidation process. Neutral loss analysis of tandem MS data and other fragmentation techniques indicates the presence of particular structural features, for example hydroxylation and phosphate moieties. Additionally, isotopic patterns, especially for structures which are chlorinated or brominated, or which contain sulfur or some transition metal
  • 14. cations, are highly characteristic and are thus diagnostic. The incorporation of NMR data [1H NMR, 13C NMR, 2D NMR data and other relevant techniques] allows complete atom-to-atom connectivity maps and thus a route to complete structural identification. These structural elucidations are still typically undertaken by expert level spectrometrists throughout the industry; however, expert software systems such as ACD/Structure Elucidator from Advanced Chemistry Development Inc. are now serving to dramatically reduce the time and complexity of this process. Where a significant body of knowledge exists for the structure being elucidated, for example in impurity analysis and metabolism studies, the difference in mass between the starting compound and the unknown significantly reduces the number of possibilities that have to be evaluated. In most cases significant structural information is retained in the spectral information of the unknown, and thus techniques such as spectral correlation, discussed later, offer advantages. In those cases where significant rearrangement or oxidative cleavage may have occurred, the remaining part of the structure may be significantly different from the parent drug. In these situations the fragment ions are often significantly different from those of the parent compound, and thus spectral correlation approaches may not be as useful in the determination of structural changes. In practice these types of structural analysis challenges require evaluation by a spectrometrist and potentially other scientists with a detailed understanding of the chemistries and possible enzymatic pathways that are involved.
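The accurate-mass filtering of candidate fragments discussed earlier for the m/z 152 ion can be illustrated with a short sketch. The function names are hypothetical, and the calculated mass used for the second route is an assumed placeholder: the text states only that it lies 3 mDa from the measured value, so 152.083 is used purely for illustration.

```python
def mass_error_mda(calculated_mz, measured_mz):
    """Absolute difference between calculated and measured m/z, in mDa."""
    return abs(calculated_mz - measured_mz) * 1000.0


def filter_candidates(candidates, measured_mz, tolerance_mda):
    """Keep only candidate fragments within the instrument's mass tolerance.

    candidates: {label: calculated_mz}
    Returns a list of (label, error_in_mDa) for the surviving candidates.
    """
    kept = []
    for label, calculated_mz in candidates.items():
        error = mass_error_mda(calculated_mz, measured_mz)
        if error <= tolerance_mda:
            kept.append((label, round(error, 1)))
    return kept


# Route A uses the calculated mass quoted in the text (152.006).
# Route B is an ASSUMED value placed 3 mDa from the measured mass.
candidates = {"route_a": 152.006, "route_b": 152.083}
measured = 152.086

# A 5 mDa window (an arbitrary example tolerance) rejects route A,
# whose error is about 80 mDa, and keeps route B.
survivors = filter_candidates(candidates, measured, tolerance_mda=5.0)
```

Tightening the tolerance to match the accuracy of the instrument is what removes the false positive; with a window of 100 mDa both routes would survive and both would need manual review.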
  • 15. The method by which a spectrum is obtained can have a significant effect on the way in which the structure can be elucidated. High energy ionization techniques such as EI typically result in spectra containing extensive fragmentation, usually with little or no remaining molecular ion spectral information. Fortunately, standardized instrumental ionization acquisition conditions ensure that spectra are usually reproducible from instrument to instrument. These standardized methods of acquisition thus ensure that spectra can be easily stored in a spectral library and distributed to all groups who require search access. Spectral databases are discussed later in this chapter. Low energy ionization techniques such as electrospray and atmospheric pressure chemical ionization, on the other hand, typically generate protonated or deprotonated molecular ions with little or no fragmentation. Fragmentation can be induced in a number of ways, including source induced fragmentation, fragmentation in a gas filled collision cell or resonant fragmentation in ion traps. These low energy spectra, unlike EI spectra, are not acquired under fixed fragmentation conditions, and as such the spectra can be very different. These differences are further exacerbated when instrument-to-instrument, vendor-to-vendor and MS instrument type variations are included in the variation matrix [14]. Whether the spectrum has been obtained as an MS1 full scan experiment or via a tandem MS/MS acquisition, structural assignment of the spectrum can still be possible. In the case of the assignment of a full scan MS1 trace such as an EI GC/MS spectrum, it is important to note that the assignment of the spectrum will depend upon the isotope that is selected for the fragment assignment
  • 16. procedure. This is clearly identified in the fragmentation of Temazepam, Figure 7, in which the 37Cl isotope contributes a significant amount to the ion intensity of the fragment ions. Note that the spectrum in this case is assigned using the 35Cl isotope. It is usual, however, in the majority of structural elucidations to isolate an individual isotope using the first stage mass filtering capabilities of the MS instrumentation before collisionally induced dissociation (CID) in a collision cell or ion trap. In this way the tandem MS spectrum is isotopically pure, and thus the fragments in the spectrum can be assigned on the basis of the selected isotope alone. The use of high resolution at the stage of isolation of the MS1 mass of interest can provide an additional level of confidence that the tandem MS spectrum is isotopically pure. In those cases where low resolution MS1 ion isolation is coupled with high resolution ion detection, the presence of isobaric masses in the isolated MS1 window can be detected and taken into account during the elucidation phases. Detailed information is also obtained by observing sequential neutral losses to determine the sequence of substructures, or “molecular connectivity”, within the analyte [15]. This procedure is analogous to the two-dimensional NMR techniques used to sequentially connect substructures. This approach has major benefits for those structural modifications in which the majority of the structural integrity is maintained. Of course, a familiar example of molecular connectivity is the determination of the amino acid sequence of a peptide. Specific neutral losses are indicative of certain amino acids, and the sequence of these losses can be used to identify the peptide [16].
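The peptide example above can be sketched as a simple ladder-reading routine: the mass differences between consecutive fragment ions are matched against amino acid residue masses. This is a simplified illustration only. It assumes a single, singly charged fragment ion series with no gaps, uses just a subset of the monoisotopic residue masses, and the function name and tolerance are hypothetical.

```python
# Monoisotopic residue masses in Da (subset only; leucine and isoleucine
# are isobaric and cannot be distinguished by mass).
RESIDUE_MASSES = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "T": 101.04768,
    "L/I": 113.08406,
    "F": 147.06841,
}


def read_sequence(ladder_mz, tol=0.01):
    """Assign a residue to each gap between consecutive fragment ions.

    ladder_mz: m/z values of one fragment ion series (assumed singly charged).
    Returns a list of residue labels; '?' marks an unmatched gap.
    """
    ions = sorted(ladder_mz)
    residues = []
    for lower, upper in zip(ions, ions[1:]):
        delta = upper - lower
        match = next(
            (aa for aa, mass in RESIDUE_MASSES.items() if abs(delta - mass) <= tol),
            "?",
        )
        residues.append(match)
    return residues
```

An unmatched gap (returned as `?`) would indicate either a residue outside the table, a modification, or a missing ion in the series, each of which needs review by a spectrometrist.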
  • 17. Owens [17] reports a software based technique of spectral correlation or pattern matching of MS/MS spectra and the determination of a similarity index as a means of filtering out those tandem MS spectra which have low correlations with respect to the parent drug MS/MS spectrum and are thus defined as endogenous background peaks. Where a high similarity exists, this is indicative that there are spectral elements that show a high degree of correlation to the parent drug compound [18]. The subsequent auto-correlation between the assigned parent drug spectrum and the unknown spectrum can then influence the identification of the changes in the original parent drug structure and thus the determination of potential structural modifications. When linked with high mass accuracy data this technique may offer significant value in expediting the generation of metabolite or impurity structures. In the determination of chemical structure using either a manual approach or via some software driven method or a combination of the two techniques, the assignment of the spectral fragments remains a key part of the process. To date the spectral analysis software systems that exist in the industry allow assignment of the spectrum to a particular proposed structure using a rules based approach, as the autoassignment example, Figure 7, shows. As with all rules based approaches, it may not be possible to identify all spectral ions and thus the intervention of a spectrometrist with a detailed knowledge of the chemistries being investigated can result in a complete assignment of the fragments to a proposed structure. Where the software assignment algorithms can provide major benefit is in the assignment of the majority of spectral peaks when
  • 18. predicted using the coded rule sets, thus significantly reducing the amount of time that it takes to perform a series of spectral assignments. Often the suggestion of a potential fragmentation process using the rules based approach can act as a source of inspiration when trying to assign compounds that fragment through more esoteric and undefined routes. Where structural elucidation uses an underlying knowledge of the samples and chemistries, fragmentation analysis of the parent drug substance provides clear indications of modification within the structure, as discussed earlier. In cases where a number of potential changes have to be considered, it may be necessary to validate a series of possible structures against the spectrum. This may be achieved using third party tools [19], where rules based fragmentation is coupled with a manual review of the results and, where appropriate, unpredicted fragmentation routes may be added manually, Figure 8. In this example, following the import of a mass spectrum, a chemical structure is attached using the molecular structure editor integrated into the program. The lasso tool is used to encircle a particular fragment, and if a spectral ion corresponding to the mass of the selected structural fragment exists in the spectrum, the fragment is highlighted and the assignment is added to the fragment assignment table. In this way, an entire mass spectrum can be assigned and examined for consistency with the hypothetical structure. If there is a mixture of components in
  • 19. a single spectrum due to co-elution, then each component can be individually assigned.

    Structure as a Means of Communication

    As a universal language of chemists, structure represents a clear and concise way to communicate the chemistries that form the nucleus of research efforts. Whilst elucidating a final and complete structure is the objective for any spectrometrist, in mass spectrometry it is commonly the case that we are unable to arrive at a finalized structure. In addition, during the process of structural elucidation, there may be a number of iterative versions of the structure before a finalized version is reached. In these cases the ability to represent structure in some incomplete format, such as a Markush representation, provides a way of creating and storing a “work-in-progress” structure, Figure 9. In this example the position of the chloro group can be intuitively defined as 2, 3, 4, 5 or 6 on the phenyl ring. Whilst this representation has significant benefits for those cases where all remaining positions in a phenyl ring are possible points of attachment, it clearly has limitations where the chloro group can only be in the meta and ortho positions. There have been extensions to the notation of “generic” chemical structures over the years, including but not limited to the usage of graphical overlay elements such as boxes, Figure 10 [20,21], and polymer like brackets, Figure 11. Whilst these representations of structure do have value as a means of visualization within reports, they do not convey any
  • 20. chemical knowledge that can be transformed into extractable programmatic elements for use in software platforms. Whilst FDA regulation 21 CFR Part 11 [22] is not generally applied in the drug discovery phase of drug development, for example in metabolite identification, these regulations have in reality set a precedent for the storage of electronic records where feasible, especially in the latest modification to the FDA 21 CFR Part 11 regulations [23]. Whilst the implementation of 21 CFR Part 11 in the early phases of drug discovery and in the development of metabolites, impurities and degradants can be highly contentious, the need to communicate information in a variety of electronic formats is very much becoming a requirement across all drug discovery and development groups within the pharmaceutical industry. Moreover, the reporting, storage, searching and distribution of electronic information, including structures, are becoming more commonplace throughout all industries. Therefore, any ambiguous representations of incomplete structure create opportunities for miscommunication, with time and thus financial implications for the industry. Metabolism groups in a number of the major pharmaceutical companies have been highly instrumental in encouraging the development of more advanced Markush structure representations which are designed to show more clearly the sites of attachment of a particular substituent(s). One such representation, in the form of a shaded Markush from Advanced Chemistry Development Inc., denotes the point(s) of attachment using user definable color shading, as shown in Figure 12. This approach has
  • 21. been extended to more complex structures where the positions of attachment are discontinuous, as depicted in Figure 13. These visual representations are also understood programmatically as atom-to-atom mappings, allowing the structures to be searched electronically and thus used as part of a larger structurally enabled analytical data management system.

    Linking structures with analytical data

    Attaching a structure or a number of structures to an elucidated spectrum or chromatogram represents a concise way of reporting our findings as analysts. It is often the case that we cut and paste structures onto our data, either in some document editing system or potentially in a package designed for spectral processing and reporting. Moreover, the attachment of structure to the analytical data, with subsequent database storage or archival of the elucidated data, can act as an important knowledge system. Searches of meta-data, structures or data related features, when coupled together, allow the extraction of compounds and data that are able to provide key insights for current development needs. In this context an analytical data archive is not an archive of raw data files; instead it represents a repository of knowledge extracted from and associated with the data. A data archive, by contrast, generally describes a repository of raw data which are originally captured at the instrument, collected and deposited into the archive without further analysis. Vendors of such file based archive systems
  • 22. include, for example, NuGenesis Technologies [24]. A further discussion of databasing is covered later within this Chapter. It is generally easier for a chemist or spectrometrist to remember and draw structures from memory than it is to remember a series of spectral masses or analytically determined parameters. Thus, the physical attachment of a structure or series of structures to the data and their subsequent storage in an appropriate database, as shown in Figure 14, represents a primary link between what a chemist is able to remember and an ability to extract that information quickly and easily from the database using, for example, a chemical structure search. This allows data to be located quickly and effectively from amongst huge volumes of data which are created annually within our organizations.

    Structure Based Searching

    The representation of chemical structures and their attachment to analytical data in an electronic format is only made useful when linked with appropriate search engine capabilities. As discussed earlier in the chapter, structures range from complete structures to Markush representations and to fragmental structure information. Additionally, stereochemistry and tautomerism can all affect the performance of structure searching [25]. To date structure search engines are usually able to search using full structure, similar structures and substructure components. In the case of the Markush structural representation discussed earlier the search engine has to be able to allow for
  • 23. Markush structure searching in a variety of ways over and above the standard structure search capabilities. For example, if a Markush structure is the starting point for a search then the search engine should be able to return hits containing completely defined structures that contain modifications that are within the region incorporated by the Markush inclusion positions, Figure 15. A similar search performed using a substructural search of the same database returns, as expected, a much greater number of structures, as indicated in Figure 16. In addition, if a search is made with a completely defined structure, then it should be possible to return structures which are represented as Markush representations containing the functional modifications contained within the search structure. These capabilities are a part of the ACD/Labs analytical data management system, ADMS, software suite which includes as a component the support of MS data processing and database management.

    Databasing and Analytical Data Management

    An alternative approach to aid in the identification of an unknown is to perform a spectrum or subspectrum search against a database of known structures and associated spectra. A simple search based on just a few peaks from the mass spectrum is possible. McLafferty developed two search techniques, based on the probability of certain ions (PBM), as well as a technique based on a collection of chemical fragments associated with certain fragmentation patterns [26].
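To make the idea of a spectrum search concrete, the sketch below ranks library entries against a query spectrum with a plain cosine (dot product) similarity on nominal-mass bins. This is a generic substitute chosen for brevity; it is not the PBM algorithm described above, nor the scoring used by any commercial library, and all names are hypothetical.

```python
import math


def bin_spectrum(peaks):
    """Collapse (mz, intensity) pairs onto nominal (integer) mass bins."""
    binned = {}
    for mz, intensity in peaks:
        key = round(mz)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(spectrum_a, spectrum_b):
    """Cosine of the angle between two binned spectra (0.0 to 1.0)."""
    a, b = bin_spectrum(spectrum_a), bin_spectrum(spectrum_b)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search_library(query, library, top_n=3):
    """Return the top_n (name, score) hits, best match first.

    library: {name: [(mz, intensity), ...]}
    """
    scored = [(name, cosine_similarity(query, spec)) for name, spec in library.items()]
    scored.sort(key=lambda hit: hit[1], reverse=True)
    return scored[:top_n]
```

A subspectrum search could be approximated in the same framework by scoring only the bins present in the query, which is one reason a few well-chosen peaks can be enough to retrieve a candidate.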
  • 24. Over the years collections of mass spectra have been assembled by different groups. The National Institutes of Health (NIH) and Environmental Protection Agency (EPA) standardized the collection and analysis of the data to ensure a high quality aggregation of tens of thousands of spectra. In addition, Stenhagen, Abrahamsson, and McLafferty collected thousands of mass spectra to form one of the standard MS electron ionization (EI) reference databases available today. The standard computer readable collections are those of the US Government, distributed by NIST, and the McLafferty collection [27]. The categorization of processed information into databases is a powerful approach for leveraging the advantages of high throughput analysis schemes. The implementation of an electronic database storage system represents a significant change from the way in which many organizations have historically approached analytical data management. The transition to an electronic storage system typically requires changes to business practices, and therefore some level of change management, for the most effective conversion and implementation strategies. Additionally, the consistency of storage for data, structures and associated alpha-numeric meta-data are all important aspects that should be considered during the implementation process. In structural terms, isomers, salt structures, and tautomers all represent different structural forms that can have an effect on the route that is adopted for searching. Textual information, including naming conventions for compounds such as metabolite labels, can
  • 25. be entered in many different ways. If the values are not entered into the same data fields of the database tables, this will limit the effectiveness of the implementation. In operation, simple business rules and practices are able to alleviate this potential shortcoming. A database represents an easily accessible knowledge management system containing all structural elucidations that have occurred during the elucidation process, and serves as a storage container for the wide array of textual and numeric information that supports our analytical studies. Access to records within the database(s) through structure similarity, structure and substructure searches, user field searches, and spectrum and subspectrum searches allows flexible access to the stored information. For example, the identification of a metabolite structure may require only a retention time and molecular weight from LC-MS analysis when these are compared to the metabolite structure database compiled from previous studies [28]. A further benefit of databases is the efficient extraction of information. Databases may be “mined” to detect trends that may not otherwise be noticed. For example, this approach can be used to reveal trends such as the metabolically active sites of a molecule and/or substructures labile to degradative conditions. The extension of databases to include a much wider array of data and information over and above the spectrum allows searches to be done using a wider array of parameters. This method provides an efficient mechanism to reduce the number of false positives. The increasing adoption of high mass accuracy instrumentation represents an
  • 26. exciting addition to the information content that can be stored within proprietary and commercial databases. Once created, a database may be transferred to other laboratories and facilities that are participating in a particular research activity. The resulting databases can be distributed via standard server technologies or “web-enabled” and made accessible via corporate intranets or the public internet. Information is coordinated within the database, and different scientists are able to effectively pool and merge their information. When implemented early within the product development cycle, valuable information for later stages in drug development can be made available [29]. Therefore, this approach provides a comprehensive method for information gathering whereby future projects are planned, coordinated, and efficiently supported. In most cases, the information gathering process is targeted towards the creation of either a single compound report, some larger series of cross study reports or, in the case of a regulatory submission, a compound dossier. It is worth noting that database creation, modification, and use benefit greatly from a standard, systematic method. This approach produces reliable datasets that lend themselves to a highly consistent database format throughout a project lifetime. While spectral databases can be purchased, these are generally limited to nominal mass EI data. Since library searching techniques are limited by the size and nature of the library relative to the particular problem of the chemist, the creation of user databases is of high value to any corporation. With today’s technologies allowing the generation of low energy
  • 27. ionization techniques and accurate mass data, proprietary databases can certainly be of significantly higher value than commercial databases as they represent a focused repository of chemistries appropriate to the organization. The content contained within proprietary databases typically exceeds that contained within commercial databases, which dramatically increases their value to an organization. The searches of such databases can be defined according to a series of options and multiple databases can be searched simultaneously. In the case of the spectrum of Ovex, when this was searched against the NIST replicates database, a similar spectrum for Ovex was returned with a similarity index of 87.9% as shown in Figure 17 Search efficiency is increased by imposing additional constraints. As an example of a multi-step constrained search approach, a search of the NIST database for a para-substituted benzene sulfonic acid fragment, as a starting point, gives a total of almost 300 such spectra in the database. This subset of spectra can then be searched according to variables such as molecular formula, elemental composition based on elemental analysis, and substructural components based on identified fragmentations (loss of Ph, CCl 3, C(CH3)3 and so on). Often, when work is initiated on new project compounds, the use of a complete spectral database is not possible (i.e. drug discovery). When information is stored within a comparative database, compounds of interest can be effectively searched and identified for use in early to late stages of development [30]. Database capabilities also permit the use of substructure-
  • 28. based searches to identify compounds within a specific dataset or library that contains a distinct substructural entity [31]. Distribution of Spectrometry Data to Chemists New technology is delivered at almost every new analytical instrumentation conference. Similar to standard computer platforms, the cost and size of MS instrumentation with the same capability continues to drop resulting in the proliferation of open-access MS labs supporting chemists in both single and multiple synthesis environments. Typically, the resulting data is pre- processed by the generating instrument and is provided to the chemist in a hardcopy format or as a spectral image requiring a vendor-specific viewer. Both of these scenarios prohibit direct or limit interaction with the spectral data. While in some cases this is preferable since the data is locked from further manipulation, as is necessary in a regulated environment, in a research environment such barriers may limit further analysis. The expense of installing a copy of vendor software on the desktop of every non-specialist accessing the MS instrument often renders this level of distribution and flexibility as uneconomical. Additionally the overhead in training and support needs for such large distributions, especially in instrumentally heterogeneous environments, may act as additional limiting factors. In general, such an approach may be overkill as most chemists simply want access to the final spectrum and or
  • 29. confirmation that the correct product was synthesized. In most cases a simple determination of molecular weight may be sufficient for such needs. Traditionally, vendor software provides sophisticated data reduction tools but limited chemical structure association and reporting capability. An alternative solution to this problem is the installation of a third-party, structure-enabled desktop processing package that accesses the data directly over a computer network, allowing the chemist to further manipulate the data and store the resulting spectra in a database for further reference. Such an approach offers additional capability since it is common for a facility to generate spectra on a heterogeneous mix of hardware platforms. With the capability to read multiple file formats in their raw binary format, the costs of operation and the efforts to generate data portability may be significantly reduced. Integrated Spectroscopic and Chemical Structure Databasing Integration strategies often encompass separate events involving instrumentation, methodology, and process. Conventional methods of analysis involve multiple steps. For example, the identification of natural products traditionally involves the scale-up of fermentation broths, solvent extraction, liquid/liquid or column fractionation, chromatographic fraction collection, and spectroscopic analysis of the individual components. The integration of these bench-scale steps into dedicated systems provides unique and powerful
  • 30. advantages for on-line, and perhaps, real-time analysis [31]. Arguably the most significant bottleneck that exists in industry today is the ability to integrate these traditional analysis steps with MS processing and analysis. Discovery chemists and the research and development environments devote considerable effort to the resolution of components, with direct attention paid to the actual chemical structures. As a result, for spectroscopic techniques such as NMR, MS and IR, it is not uncommon to find filing cabinets full of spectra, relevant scientific literature, and associated information, generally linked to the chemical structures that gave rise to the spectra. Even though electronic libraries of chemical structures and MS spectra exist, these libraries are usually limited to EI data as discussed previously. It is possible to search experimental MS data against these libraries to aid in the identification of possible unknowns. These libraries are, however, not structure or substructure searchable. The need for the electronic management of experimental spectra with associated chemical structures is obvious. There are two general forms of such databases. For spectral-centric solutions the primary focus of the software is the desktop processing of spectroscopic data, followed by the concomitant association with chemical structures. Commonly, a particular facility has access to a structure databasing system from one of the multiple vendors providing this type of solution. These structure databasing systems provide a structure-centric solution whereby spectral records are attached to the structure records in the database for viewing and further processing.
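Searching an experimental spectrum against a library, as described above, returns a ranked list of candidates with a similarity score. The sketch below illustrates the idea with a plain cosine similarity over nominal-mass peaks. This is an assumed stand-in metric: commercial search engines apply their own peak weighting, which is not reproduced here, and the function names and the example spectra are hypothetical.

```python
import math

def cosine_match(query, reference):
    """Similarity (0-100) between two spectra given as {nominal m/z: intensity}."""
    shared = set(query) & set(reference)
    dot = sum(query[mz] * reference[mz] for mz in shared)
    norm_q = math.sqrt(sum(v * v for v in query.values()))
    norm_r = math.sqrt(sum(v * v for v in reference.values()))
    if norm_q == 0 or norm_r == 0:
        return 0.0
    return 100.0 * dot / (norm_q * norm_r)

def rank_library(query, library):
    """Return (score, name) pairs for every library entry, best match first."""
    scored = [(cosine_match(query, spectrum), name)
              for name, spectrum in library.items()]
    return sorted(scored, reverse=True)

# Hypothetical peak lists, for illustration only.
library = {
    "compound A": {75: 30.0, 111: 100.0, 175: 60.0},
    "compound B": {43: 100.0, 58: 80.0},
}
unknown = {75: 25.0, 111: 100.0, 175: 70.0}
for score, name in rank_library(unknown, library):
    print(f"{name}: {score:.1f}")
```

A ranked list of this kind is what a constrained search would then narrow further, for example by molecular formula or substructure.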
  • 31. Spectrometrists and chromatographers utilize a variety of technologies to both separate and identify chemical structures. It is common in today’s analytical environment to find teams assembled with skillsets to generate both optimal separation and analysis solutions. Spectrometrists assign their spectra in relation to chemical structures using parent ion mass or fragment ion mass analysis in MS, nucleus-to-peak assignments in NMR and vibrational band association with IR peaks, for example. Spectrometrists have traditionally used a filing system of drawers full of spectra, associating each file number with some textual identifier in order to locate the detailed knowledge extracted from the spectra at a later date. The general level of spectral management has been limited to handwritten notes in notebooks or sometimes text-searchable databases pointing to associated spectra. As explained earlier, tools are now available to allow spectra to be databased in electronic format with associated chemical structures [1a]. In this manner, the mass spectrometrist now has the opportunity to search the database for related structures or substructures, or spectral features when performing fresh analyses. When integrated with other spectral data the result is a legacy database of multiple spectroscopy data, thereby building a foundation for future analyses. The value residing in such tools is the time savings that result for the analysis of related chemicals and the exchange of information between different analytical laboratories within the same company. In theory, such an approach should not be isolated to spectrometry; for chromatography,
  • 32. tools now exist to allow the similar integration of chromatographic peaks and chemical structures. Resulting spectra with associated chemical structure(s) carry valuable information for future analyses. Such resulting files can be stored on a centralized server and thus become a powerful means for dissemination of the mass spectrum-structure connectivity and fragment assignment information. This general approach can be expanded to a World Wide Web intranet approach whereby the spectra are posted as individual HTML pages with hyperlinked MS files. Software solutions available today allow each spectrum to be databased with associated chemical structures, thereby offering significantly enhanced capabilities over the common file systems used today in many laboratories. Due to recent advances in database technology there is enhanced searching capability over the standard filing cabinet system or a text-based databasing system. It is possible to search the resulting databases by structure, substructure, formula, molecular weight, chromatographic and spectroscopic parameters or user data. User data includes the creation of user-definable database fields with particular field labels including, for example, submitter, project name and type of analysis, all of which become searchable fields. Multiple databases can be searched at one time, thereby allowing different databases to be constructed according to analysis type, project name, individual user and so on. These multiple databases can also be distributed across different departments, divisions or even an entire corporation, simply by using
  • 33. the ability to point to databases located on mapped network drives. Corporate-wide database capability engenders concern about the integrity of the databases. This can be addressed by standard database security features. In addition to the spectrum parameters, the association of individual searchable user data fields is invaluable, allowing each spectrum in the database to be associated with a project, a customer, an analyst or any other appropriate information. The value of the approach outlined here should be obvious: the ability to integrate structural information with spectra into a database offers exciting benefits to the spectrometrist and is an ideal solution for an environment where multiple spectrometrists need to quickly determine assignments and identify specific chemical structure classes. The additional benefit of this tool is that it may also be fully integrated with similar toolsets allowing similar structure-spectrum management for NMR, MS, UV-Vis, IR and Raman. Conclusions and Future Prospects Computer software technologies for the processing and analysis of MS data and the management of the resulting knowledge are quickly emerging. While it is almost impossible to define the long term future of MS data processing and analysis, it is certain that MS systems will continue to become smaller and easier to use, offering greater levels of automation and on-the-fly decision making. It is certainly likely that an increasing amount of data will be acquired with even
  • 34. higher mass resolution and higher mass accuracies and hence the tools necessary to manage this data will need to be further developed. The synergistic coupling of high mass accuracy MS data, MS fragment analysis, and integration with other forms of spectroscopy, for example LC-NMR-MS, will provide still further levels of structural detail. The tools which will be delivered in the future will have to include additional developments in the area of highly automated processing of thousands of datasets, advances in MS fragmentation and tools for the creation and searching of accurate mass spectral databases. Such an approach, when further integrated with spectral processing and databasing for other techniques (NMR, IR, UV-Vis etc.), will provide a unifying tool for spectroscopy management. With further research into statistical and chemometric methods it is hoped that further techniques will be developed for mass spectral identification. However, MS, in any of the separate ionization techniques (EI, CI, Electrospray (ESI), Atmospheric Pressure Chemical Ionization (APCI) and so forth), has inherent limitations. Only in the presence of additional techniques, such as IR and NMR, will structure elucidation and verification be more rigorous when identifying the structure of unknown chemicals.
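The databasing section describes searching several databases at once by molecular weight and by user-definable fields such as submitter, project name and type of analysis. A minimal sketch of that idea follows. The record layout, the function name and the example data are hypothetical and are not taken from any particular product.

```python
from dataclasses import dataclass, field

@dataclass
class SpectrumRecord:
    """One databased spectrum with its structure-derived and user-defined fields."""
    formula: str
    molecular_weight: float
    user_data: dict = field(default_factory=dict)

def search(databases, max_mw=None, **user_fields):
    """Search every database; return (database name, record) for each hit.

    A record matches when its molecular weight is within the optional limit
    and every requested user field has exactly the requested value.
    """
    hits = []
    for db_name, records in databases.items():
        for record in records:
            if max_mw is not None and record.molecular_weight > max_mw:
                continue
            if all(record.user_data.get(k) == v for k, v in user_fields.items()):
                hits.append((db_name, record))
    return hits

# Hypothetical databases organised by analysis type.
databases = {
    "lcms": [SpectrumRecord("C7H8N4O2", 180.17,
                            {"submitter": "analyst1", "project": "P-001"})],
    "gcms": [SpectrumRecord("C6H6ClN", 127.57,
                            {"submitter": "analyst2", "project": "P-001"})],
}
for db_name, record in search(databases, project="P-001", max_mw=150.0):
    print(db_name, record.formula)
```

Keeping the user fields in a free-form dictionary mirrors the idea of user-definable, searchable field labels; a production system would more likely rely on the indexing and security features of a real database engine.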
  • 35. Figure 1: A multi-spectroscopic display of Alizarin. UV (top), IR (middle) and MS (bottom) contained within a single display window. This ability allows unified desktop viewing of data.
  • 36. Figure 2: Example showing the reduction in chemical and electronic noise using chemometric algorithms (CODA used for this example). Notice the high levels of noise and background in the total ion chromatogram and the low relative intensity of the chromatographically relevant peaks m/z 739 and 1460 mass regions (Upper panel). The mass spectrum in the top window is for the scan at a retention time of 17.8 minutes. Notice the low molecular weight components around m/z 214. Notice the removal of the gradient background after application of the CODA chemometrics algorithm and the significant decrease in noise level in the Total Ion Chromatogram (Lower panel). The interface shown is for ACD/MS Manager.
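The noise reduction illustrated in Figure 2 rests on scoring each mass chromatogram and keeping only the channels that look like real chromatographic peaks. The sketch below is a simplified reading of the mass chromatogram quality (MCQ) index described in reference 7: the raw trace is scaled to unit length and compared with a smoothed, mean-centred, variance-scaled copy of itself. The window width, the threshold and the function names are assumptions made for illustration, and commercial implementations may differ in detail.

```python
import math

def moving_average(values, window):
    """Rectangular smoothing; returns len(values) - window + 1 points."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def mcq(chromatogram, window=3):
    """Quality score for one mass chromatogram (higher means cleaner)."""
    norm = math.sqrt(sum(v * v for v in chromatogram))
    smoothed = moving_average(chromatogram, window)
    m = len(smoothed)
    if norm == 0 or m < 2:
        return 0.0
    mean = sum(smoothed) / m
    sd = math.sqrt(sum((v - mean) ** 2 for v in smoothed) / (m - 1))
    if sd == 0:
        return 0.0  # flat trace: pure background, no peak
    offset = window // 2  # align each raw point with its window centre
    total = sum((chromatogram[i + offset] / norm) * ((smoothed[i] - mean) / sd)
                for i in range(m))
    return total / math.sqrt(m - 1)

def select_channels(channels, threshold=0.7, window=3):
    """Return the m/z channels whose score meets the (assumed) threshold."""
    return sorted(mz for mz, trace in channels.items()
                  if mcq(trace, window) >= threshold)
```

Spiky traces lose their similarity to the smoothed copy, and a constant or slowly varying background is penalised by the mean-centring step, so both score poorly. Summing only the selected channels then gives a reduced total ion chromatogram.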
  • 37. Figure 3: Following the process of COMPARELCMS, mass chromatograms that are determined to be unique when a control sample is compared with a metabolized sample are retained for further review.
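The comparison in Figure 3 retains only the mass chromatograms that are unique to the metabolized sample. The sketch below shows the general idea with a simple intensity-ratio filter. It is not the COMPARELCMS algorithm itself, whose details are not given here; the thresholds, the function name and the example traces are hypothetical.

```python
def unique_channels(control, sample, min_ratio=5.0, min_intensity=1000.0):
    """Return m/z channels that are prominent in the sample but not the control.

    control, sample: {m/z: [intensity per scan]}.
    A channel is kept when its maximum sample intensity is at least
    min_intensity and at least min_ratio times the control maximum.
    """
    kept = []
    for mz, trace in sample.items():
        peak = max(trace)
        background = max(control.get(mz, [0.0]))
        # Floor the background at 1.0 so an empty control channel
        # does not make every sample signal pass the ratio test.
        if peak >= min_intensity and peak >= min_ratio * max(background, 1.0):
            kept.append(mz)
    return sorted(kept)

# Hypothetical traces: m/z 402 appears only after metabolism.
control = {386: [100.0, 9000.0, 200.0], 402: [50.0, 60.0, 40.0]}
sample = {386: [120.0, 8500.0, 150.0], 402: [80.0, 4000.0, 90.0]}
print(unique_channels(control, sample))
```

A real comparison would also need to allow for retention time shifts between the two runs, which this sketch ignores.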
  • 38. Figure 4: An Isotope Pattern Calculator showing Nominal, Average and Monoisotopic masses
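The three masses reported by such a calculator differ only in which atomic mass is used for each element: the integer mass of the most abundant isotope (nominal), its exact mass (monoisotopic), or the abundance-weighted atomic weight (average). A minimal sketch follows, using a small element table entered by hand and theophylline (C7H8N4O2, the compound of Figure 14) as the example. The parser handles only simple formulae without brackets, and the function names are hypothetical.

```python
import re

# symbol: (nominal mass, monoisotopic mass, average atomic weight)
ELEMENTS = {
    "H": (1, 1.00782503, 1.008),
    "C": (12, 12.0, 12.011),
    "N": (14, 14.0030740, 14.007),
    "O": (16, 15.9949146, 15.999),
    "S": (32, 31.9720707, 32.06),
    "Cl": (35, 34.9688527, 35.45),
}

def parse_formula(formula):
    """Turn a simple formula such as 'C7H8N4O2' into {'C': 7, 'H': 8, ...}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts

def masses(formula):
    """Return (nominal, monoisotopic, average) masses for a formula."""
    counts = parse_formula(formula)
    nominal = sum(n * ELEMENTS[s][0] for s, n in counts.items())
    monoisotopic = sum(n * ELEMENTS[s][1] for s, n in counts.items())
    average = sum(n * ELEMENTS[s][2] for s, n in counts.items())
    return nominal, monoisotopic, average

nominal, mono, avg = masses("C7H8N4O2")
print(nominal, round(mono, 4), round(avg, 2))
```

The gap between the nominal and monoisotopic values is the mass defect that accurate mass measurements exploit when distinguishing candidate formulae.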
  • 39. Figure 5: The EI MS spectrum of ethyl pirimiphos [O-[2-(diethylamino)-6-methylpyrimidin-4-yl] O,O-diethyl thiophosphate] showing two potential fragment assignments for m/z 152.086. (Panel labels: Route 1, Route 2.)
  • 40. Figure 6: A series of proposed fragment structures with nominal mass m/z 152 for the fragmentation of ethyl pirimiphos. (a) The structure on the left corresponds to an accurate mass of m/z 152.006, which has a delta mass of 80 mDa from the experimental data (Figure 5, Route 1), and (b) the right-hand structure, with mass m/z 152.082, corresponds to a delta mass of 3 mDa from the experimental data (Figure 5, Route 2). (Display extracted from MS Fragmenter, Advanced Chemistry Development Inc.)
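The delta mass comparison used to choose between the two proposed fragments can be sketched as follows, using the rounded values quoted in Figures 5 and 6. Because the quoted masses are rounded to three decimals, the computed differences may differ by about 1 mDa from those stated in the caption. The function names are hypothetical.

```python
def mass_error(measured, candidate):
    """Return the mass difference as (mDa, ppm) for a candidate assignment."""
    delta = measured - candidate
    return delta * 1000.0, delta / candidate * 1e6

def best_candidate(measured, candidates):
    """Pick the candidate mass closest to the measured m/z."""
    return min(candidates, key=lambda mass: abs(measured - mass))

measured = 152.086                  # experimental m/z from Figure 5
candidates = [152.006, 152.082]     # proposed fragment masses from Figure 6
for mass in candidates:
    mda, ppm = mass_error(measured, mass)
    print(f"{mass:.3f}: {mda:.0f} mDa ({ppm:.0f} ppm)")
print("closest:", best_candidate(measured, candidates))
```

With accurate mass data, an error of tens of mDa is enough to rule out an assignment that would be indistinguishable at nominal mass.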
  • 41. Figure 7: The nominal electron ionization (EI) mass spectrum and structure for temazepam. Note the significant contribution of the Cl isotope ion to the spectrum, especially at m/z 273, the primary fragment ion, which can have a significant impact on the fragment assignment of the spectrum.
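The chlorine contribution noted in Figure 7 follows from the natural abundances of 35Cl and 37Cl. The sketch below computes the relative intensities of the chlorine isotope peaks for a fragment containing a given number of Cl atoms, using a binomial distribution. It deliberately ignores the contributions of carbon and other elements, and the abundance values are rounded representative figures.

```python
from math import comb

P35, P37 = 0.7576, 0.2424   # approximate natural abundances of 35Cl and 37Cl

def chlorine_pattern(n_cl):
    """Return [(mass offset, relative intensity %)] for n_cl chlorine atoms.

    Offsets are relative to the all-35Cl peak; the most intense peak is 100.
    """
    raw = [comb(n_cl, k) * P35 ** (n_cl - k) * P37 ** k
           for k in range(n_cl + 1)]
    base = max(raw)
    return [(2 * k, 100.0 * value / base) for k, value in enumerate(raw)]

# One chlorine, as in temazepam: an M+2 peak at roughly a third of M.
for offset, intensity in chlorine_pattern(1):
    print(f"M+{offset}: {intensity:.1f}%")
```

A fragment that retains the chlorine shows this M and M+2 pair, while a fragment that has lost it does not, which is why the isotope peaks help when assigning fragments.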
  • 42. Figure 8: The assignment of the N-Oxide buspirone MS/MS spectrum using the “lasso tool” (left inset box, “Stage 1 – Lasso”). The fragment table lists assigned fragments. Moving the mouse cursor over the table highlights the assigned molecular fragment on both the spectrum and the structure. (Panel labels: Stage 1 – Lasso; Result = fragment structure selected.)
  • 43. Figure 9: Chloroaniline without the position of the chloro group being specified.
  • 44. Figure 10: Incomplete structure representation using graphical elements such as shaded boxes.
  • 45. Figure 11: Incomplete structure representation using polymer brackets.
  • 46. Figure 12: Suggested hydroxylation of a buspirone metabolite represented using the shaded Markush structural representation.
  • 47. Figure 13: Representation of a Markush structure for a discontinuous series of attachment positions.
  • 48. Figure 14: Structure attachment of Theophylline to its associated EI spectrum. Note that the structure is understood at a programmatic level and thus can be utilized directly in structurally enabled search engines.
  • 49. Figure 15: The results of a complete structure search of the NIST98 [27] database of a Markush structure where the position of the hydroxylation and chlorination are defined within any of the possible ring positions.
  • 50. Figure 16: The results of a substructure search of the NIST98 [27] database of a Markush structure where the position of the hydroxylation and chlorination are defined within any of the possible ring positions. Note that 945 possible structural combinations are returned.
  • 51. Figure 17: The most similar match (87.9% match factor – see bottom middle) for the spectral search displays the spectrum of Ovex, from the catalogue of mass spectra of pesticides from within the NIST replicates database. The structure of Ovex is consistent with the suggested structure.
  • 53. REFERENCES 1 Third party vendors providing software solutions for integrated spectroscopy processing include a) Advanced Chemistry Development Inc., www.acdlabs.com and b) Thermo, http://www.thermo.com/eThermo/CDA/Products/Product_Detail/1,1075,22304-134-X-1-1,00.html 2 Lee, M.S., Kerns, E.H. LC/MS Applications in Drug Development. Mass Spectrom. Rev. 1999, 18, 187-279. 3 Waters Corporation, MS Technologies Centre (Micromass UK Ltd.), Atlas Park Simonsway, Manchester, M22 5PP, United Kingdom. 4 Agilent Technologies, 5301 Stephens Creek Boulevard, Santa Clara, CA, 95051, USA. 5 OpenLynx Global Server™ is a registered trademark of Waters Corporation, MS Technologies Centre (Micromass UK Ltd.), Atlas Park Simonsway, Manchester, M22 5PP, United Kingdom. 6 J. E. Biller and K. Biemann, "Reconstructed Mass Spectra, A Novel Approach For The Utilization Of Gas Chromatograph - Mass Spectrometer Data", Anal. Letters, 1974, 7 (7), 515-528.
  • 54. 7 Windig, W., Payne, A., Nichols, W., A Noise and Background Reduction Method for Component Detection in Liquid Chromatography/Mass Spectrometry, Anal. Chem., 1996, 68, 3602-3606. 8 Comparelcms ref 9 Harland, G, Castro Perez, J., Pugh, J., Leandersson, C., Thompson, R., High Mass Accuracy Measurements in W-optics using an Orthogonal Hybrid Quadrupole Time Of Flight Mass Spectrometer for In-Vitro Metabolism Studies. 51st ASMS, Montreal, 2003, TPO 274. 10 Lee, M.S., Yost, R.A., Perchalski, R.J. Tandem Mass Spectrometry and Drug Metabolism. Annu Rep Med Chem 1986, 21, 313-321. 11 Volk, K.J., Hill, S.E., Kerns, E.H., Lee, M.S. Profiling Degradants of Paclitaxel Using Liquid Chromatography-Mass Spectrometry and Liquid Chromatography- Tandem Mass Spectrometry Substructural Techniques. J. Chromatogr. B Biomed. Sci. 1997, 696, 99-115. 12 Blinov K. A., Carlson D., Elyashberg M.E., Martin G.E., Martirosian E.R., Molodtsov, S., Williams, A.J. Computer-assisted structure elucidation of natural products with limited 2D NMR data: application of the StrucEluc system., Magn. Reson. Chem. 2003, 41, 359–372 13 Blinov K., Elyashberg M., Martirosian, E.R., Molodtsov, S.G., Williams A.J., Tackie, A.N., Maged, M., Sharaf, M.H., Schiff P.L., Crouch, R.C. Jr., Martin G.E., Hadden C.E., Guido, J.E., Mills, K.A., Quindolinocryptotackieine: The
  • 55. Elucidation of a Novel Indoloquinoline Alkaloid Structure through the use of Computer-Assisted Structure Elucidation and 2D-NMR, In Press. 14 Bristow, A.W.T., Nichols, W.F., Webb, K.S., Conway, B., "The evaluation of the utility of electrospray in-source collisionally induced dissociation (in-source-CID) spectral libraries", Rapid Communications in Mass Spectrometry, 2002, 16, 2374-2386. 15 Lee, M.S., Klohr, S.E., Kerns, E.H., Volk, K.J., Leet, J.E., Schroeder, D.R., Rosenberg, I.E. The Coordinated Use of Tandem Mass Spectrometry and High Resolution Mass Spectrometry for the Structure Elucidation of the Kedarcidin Chromophore. J. Mass Spectrom. 1996, 31, 1253-1260. 16 Roepstorff, P., Fohlman, J. Proposal for a Common Nomenclature for Sequence Ions in Mass Spectra of Peptides. Biomed. Mass Spectrom. 1984, 11, 601-602. 17 Owens K.G. Application of correlation analytical techniques to mass spectral data. Applied Spectroscopy Reviews, 1992, 27, 1-49. 18 Gundersdorf, R.W., Fernandez-Metzler, C.L., King, R. C., Overcoming SRM Blindness with the Linear Ion Trap., 51st ASMS, Montreal, 2003, WPH 146. 19 Advanced Chemistry Development Inc., Suite 600, 90 Adelaide Street West, Toronto, ON, M5H 3V9, Canada. 20 Mike S. Lee, Wiley, 2002, LC/MS Applications in Drug Development, ISBN 0-471-40520-5.
  • 56. 21 Lam W., Ramanathan R., In Electrospray Ionization Source Hydrogen/Deuterium Exchange LC-MS and LC-MS-MS for Characterization of Metabolites., J. Am. Soc. Mass Spectrom., 13, 345 – 353, 2002. 22 21 CFR Part 11 Regulations, www.fda.gov/ora/compliance_ref/part11/frs/background/11cfr-fr.htm 23 Guidance for Industry Part 11, Electronic Records; Electronic Signatures – Scope and Application (Draft), February 2003, www.fda.gov/cder/guidance/index.htm 24 NuGenesis Technologies Corporation, 1900 West Park Drive, Westborough, MA, 01581, United States 25 Trepalin, S. V., Skorenko, A. V., Balakin K. V., Nasonov, A.F., Lang, S.A., Ivashchenko, A. A., Savchuk, N. P., Advanced Exact Structure Searching in Large Databases of Chemical Compounds., J. Chem. Inf. Comput. Sci., 2003, 43, 852 – 860. 26 Pesyna G.M., Venkataraghavan R., Dayringer H.E. & McLafferty F.W. Probability Based Matching System Using a Large Collection of Reference Mass Spectra. Anal Chem., 1976, 48(9), 1362-1368. 27 The US Government MS database is available from NIST, Office of Standard Reference Data, Washington DC, 20234. The McLafferty database is available from John Wiley & Sons, Electronic Publishing Division, 605 Third Avenue, New York, New York 10158.
  • 57. 28 Kerns, E.H., Rourick, R.A., Volk, K.J., Lee, M.S. Buspirone Metabolite Structure Profile Using a Standard Liquid Chromatographic-Mass Spectrometric Protocol. J. Chromatogr. B 1997, 698, 133-145. 29 Kerns, E.H., Volk, K.J., Hill, S.E., Lee, M.S. Profiling Taxanes in Taxus Extracts Using LC/MS and LC/MS/MS Techniques. J. Nat. Prod. 1994, 57, 1391-1403. 30 Kerns, E.H., Volk, K.J., Hill, S.E., Lee, M.S. Profiling New Taxanes Using LC/MS and LC/MS/MS Substructural Analysis Techniques. Rapid Commun. Mass Spectrom. 1995, 9, 1539-1545. 31 Lee, M.S., Kerns, E.H., Hail, M.E., Liu, J., Volk, K.J. Recent Applications of LC-MS Techniques for the Structure Identification of Drug Metabolites and Related Compounds. LC-GC, 1997, 15, 542-558.