Four projects (compound risk dossier, text mining, screening data management, and support for cloud collaboration) were outlined during a breakout discussion led by Paul Bradley and Barry Hardy at the Pistoia Alliance Information Ecosystem Workshop in October 2011.
Low Hanging Fruit Breakout Discussion #2
1. Compound Risk Dossier
Objectives
Improved toxicological prediction demands the best integrated view of current and historic
data, both proprietary and public domain. The objective of the compound risk dossier (CRD)
would be to create a service that is able to gather and integrate risk/safety-related
information for a compound (including consideration of similar structures, key moieties,
metabolites, toxicology MoA, etc.). The harvested information would then be integrated and presented to
the user in the form of a “safety profile”.
Business Case
It is envisaged that the CRD could bring the following business benefits:
The system would enable an efficient “background check” for NCEs based on
structural or biological similarity, or possibly shared pharmacology, toxicology MoAs
or adverse event effects, i.e. what is known about molecules similar to my candidate?
Creation of a safety profile, in which safety categories are normalised and can be
grouped according to public ontologies, provides a powerful method of aligning data
and enables intelligent analysis.
Pharma companies duplicate effort in aligning internal, vendor and public data; this
work is currently costly, time-consuming, tedious and error-prone. A shared CRD
service could reduce the time each organisation spends on these common activities
to almost zero.
Open Standards
Open vocabularies, ontologies, e.g. PubChem, ChemIDplus, WHOINN, OBO,
OpenTox, ChEBI,…
Safety data sources: AERS, drug labels, regulatory documents, etc.
Open source methods (QSAR, CDK, Weka, R, OpenTox,..)
Open APIs (e.g., extend and test OpenTox API 1.2
http://www.opentox.org/dev/apis/api-1.2 for data integration into common rdf
resource)
Implementation
It is suggested that a limited set of public domain data sources is selected in the first
instance, to allow a proof of concept within 12 months.
Identify vocabulary and ontology sources for compounds, pathologies, etc. (See
Toxicology Ontology Roadmap, Hardy, B. et al., from the OpenTox-EBI Industry Forum
workshop, in press)
Identify data sources from which to harvest risk related information. Opt for a handful
of structured sources rather than free text (NDAs, etc.) in the first instance?
Compound safety data sources, both public and private, are mined for risk-related
content which is harmonised and organised using public domain ontologies (and held
as an RDF triple store?)
Text mining and other semantic technologies will be necessary at this stage.
This data store can be queried via APIs or provide information that can be
consumed by analysis tools, ELNs, etc.
Decide on quality metrics – on-the-fly profiles vs. curated, pre-canned data, accuracy
vs. recall
Other things to consider include provenance, governance, security, legal, etc.
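To make the harmonisation and safety-profile steps concrete, the sketch below maps source-specific pathology terms onto a shared vocabulary and emits simple triples. All compound identifiers, term mappings and predicate names are invented for illustration; a real implementation would use an RDF library (e.g. rdflib) with public ontology IRIs rather than plain tuples.

```python
# Sketch only: plain tuples stand in for RDF triples, and the vocabulary
# is a toy mapping; a production CRD would use public ontologies.

def harmonise(raw_findings, vocab):
    """Map source-specific pathology terms to a shared vocabulary
    and emit (compound, predicate, value) triples."""
    triples = set()
    for compound, source, term in raw_findings:
        concept = vocab.get(term.lower())   # normalise via ontology lookup
        if concept is None:
            continue                        # unmapped terms need curation
        triples.add((compound, "hasFinding", concept))
        triples.add((compound, "findingSource", source))
    return triples

def safety_profile(triples, compound):
    """Collect a compound's normalised findings: the 'safety profile' view."""
    return sorted(o for s, p, o in triples
                  if s == compound and p == "hasFinding")

# Toy vocabulary: source terms -> normalised ontology concept.
vocab = {"liver injury": "hepatotoxicity",
         "hepatic necrosis": "hepatotoxicity",
         "qt prolongation": "cardiotoxicity"}

# Invented findings mined from different sources for the same compound.
raw = [("CPD-001", "AERS", "Liver injury"),
       ("CPD-001", "drug label", "QT prolongation"),
       ("CPD-002", "AERS", "Hepatic necrosis")]

triples = harmonise(raw, vocab)
print(safety_profile(triples, "CPD-001"))  # ['cardiotoxicity', 'hepatotoxicity']
```

The point of the normalisation step is visible here: two different source terms ("Liver injury", "Hepatic necrosis") collapse to one ontology concept, so findings align across sources.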
Pistoia Alliance Role
Definition of Use Case
Guidance on best safety-related data sources
Guidance on open standards to use, and their extensions needed
Provide partners willing to integrate public, vendor and proprietary data
Funding of early phase POCs
2. Text Mining/Metadata Mark up of Unstructured Text
Objectives
Unstructured text sources, both public and proprietary, are rich in information but several
features limit their use in analysis, such as:
No mark-up of key concepts – important terms such as drug and target names are
buried within free text with no simple mechanism to surface this information
Linguistic diversity – widespread use of synonyms and ad hoc identifiers make it
difficult to carry out semantic searching of free text sources.
The objective is to carry out text mining and concept tagging of unstructured text to
provide a meta-data layer over documents. By linking the metadata to public ontologies, a
semantically consistent set of tags will be produced, allowing document sources to be queried
and clustered according to recognised standards. This resource could then be made available
using a cloud model to deliver value and standard search capabilities to Pharma and
Academics alike with appropriate consumption models.
Business Case
The mark-up and mapping of key terms from unstructured text would bring the following
benefits:
Enhanced search and document retrieval over free text sources
Linking of in-house structured data sources to unstructured information, in-house and
in the public domain
Repurpose unstructured text to produce actionable intelligence, for example by
creating assertional metadata
Drive towards a common standard for searching or at least a common “honest
broker” for search across different resources.
Open Standards
It is suggested that, in order to achieve a working implementation within a 12-month
time frame, a limited set of open standards is applied in the first instance. This could
be discussed more widely within the Pistoia Alliance, but the following areas are
worthy of consideration:
Limiting by domain, e.g. protein targets, drug terms, gene names, pathology
Limit to a single standard that covers multiple domains, e.g. SNOMED-CT, ICD-9-CM
Implementation
Select public domain free text source, e.g. Medline
Identify public ontologies and vocabulary sources
Use text mining/concept recognition tools to identify key concepts and map to
standards: Autonomy, Metawise (BioWisdom), Helium (Ceiba), etc.
Platform for search/display – Lucene, other open source
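A minimal illustration of the concept-recognition step is dictionary-based tagging, in which synonyms map to one shared concept identifier and the matches form a metadata layer over the untouched text. The vocabulary entries and identifiers below are invented; production systems would use the tools named above and full public ontologies.

```python
import re

# Toy vocabulary: surface form -> (entity type, concept id).
# Identifiers are illustrative, not verified ontology entries.
VOCAB = {
    "imatinib": ("DRUG", "CHEBI:45783"),
    "gleevec": ("DRUG", "CHEBI:45783"),   # synonym -> same concept
    "bcr-abl": ("TARGET", "TGT:0001"),
}

def tag(text):
    """Return (surface form, offset, type, concept id) for each match,
    so the metadata sits over the document without altering it."""
    pattern = re.compile(
        "|".join(re.escape(t) for t in sorted(VOCAB, key=len, reverse=True)),
        re.IGNORECASE)
    return [(m.group(0), m.start(), *VOCAB[m.group(0).lower()])
            for m in pattern.finditer(text)]

doc = "Gleevec (imatinib) inhibits BCR-ABL."
for surface, pos, kind, concept in tag(doc):
    print(f"{surface}@{pos}: {kind} -> {concept}")
```

The synonym handling addresses the linguistic-diversity problem directly: "Gleevec" and "imatinib" both resolve to the same concept identifier, so a semantic search for either term retrieves the document.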
Pistoia Alliance Role
Collaborate to define Use Case
Agree on document sources
Agree on open standards to use, extensions needed
Advise on best practice on document mark-up, search, analysis, governance,
security, etc.
Funding of early phase POCs to aid the development of the tools and a drive towards
standards.
Support for a free/reduced cost academic access mechanism to encourage common
methods of tagging and naming in the academic environment.
3. Improved Collaboration: Management of Screening Data
Objectives
To integrate screening data from multiple sources
To create a standard for expression of screening data, to allow easier integration
Business Case
Definition of a standard for reporting compound screening data allows easier
integration, with cost and time savings
Facilitates easier sharing of data and collaboration
Open Standards
MIABE, MIAME
ISA-TAB
Define a standard for dose response for HTS, HCS; include vocabulary, units; support
multiple plate formats and standardised statistical analysis
Define how to deal with incomplete data sets, null values, etc.
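To make the points above concrete, the sketch below shows one possible shape for a minimal dose-response record with explicit handling of null values (a missing response is recorded as None rather than zero or an empty string). The field names, units vocabulary and plate-format encoding are assumptions for illustration, not an agreed standard.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative record shapes only; a real standard would fix the
# vocabulary, unit strings and plate-format values by agreement.

@dataclass
class Well:
    row: str
    col: int
    concentration: Optional[float]  # None = missing/masked measurement
    response: Optional[float]       # explicit null, never 0 or ""

@dataclass
class DoseResponseResult:
    compound_id: str
    assay_id: str
    conc_unit: str                  # from a controlled vocabulary, e.g. "uM"
    plate_format: int               # 96, 384, 1536, ...
    wells: list = field(default_factory=list)

    def complete_wells(self):
        """Wells usable for curve fitting: nulls excluded, not imputed."""
        return [w for w in self.wells
                if w.concentration is not None and w.response is not None]

r = DoseResponseResult("CPD-001", "ASSAY-42", "uM", 384)
r.wells = [Well("A", 1, 0.1, 12.0),
           Well("A", 2, 1.0, None),   # failed well: explicit null
           Well("A", 3, 10.0, 88.5)]
print(len(r.complete_wells()))  # 2
```

Separating "null" from "zero" at the format level is the key design choice: downstream analysis tools can then decide how to treat incomplete data sets instead of silently absorbing placeholder values.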
Implementation
Create the standard, learning from existing standards such as MIAME
Apply the standard in a working project
Iterate and refine
Pistoia Alliance Role
Guidance on definition of the standard
Survey what has already been done in the area
4. Enabling better collaboration in the cloud, applied to
monitoring of NGS data
Objectives
To provide scientific, business and legal processes outlining best practices for
organisations collaborating in the cloud.
Application of these best practices in a system for monitoring the progress of NGS
projects.
Business Case
Time and cost savings in deciding whether a collaborative project should be carried
out in the cloud.
Streamline implementation of cloud-based collaborations by providing clear
guidelines.
Reduces delays in handovers.
Greater visibility of distributed project statuses across different organisations.
Early visibility, alerting of important events, allowing timely interventions.
Open Standards
Clear APIs and communication standards.
Define web services and service discovery mechanisms.
UDDI (Universal Description, Discovery and Integration).
MIAME?
Implementation
Outline best practice rules for working on the cloud
What is the use case? E.g. an alternative to an internally-hosted system, a method of
distributing large queries, etc.
What are the requirements for flexibility, such as how long is the service required for
and will capacity requirements change over time? What is the tie-in period?
Need clear APIs and communication standards.
Location – does data need to be held within certain boundaries, e.g. within the EU?
What level of encryption is required?
Create standard format for NGS data, consumable by analysis software, e.g. Spotfire.
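One way the monitoring piece could look: a small self-describing status message that a sequencing provider exposes through an API and that any collaborating organisation can consume and alert on. The stage vocabulary and field names below are illustrative assumptions, not a defined standard.

```python
import json
from datetime import datetime, timezone

# Illustrative stage vocabulary for an NGS project lifecycle; a real
# standard would agree these values across organisations.
STAGES = ["sample_received", "library_prep", "sequencing",
          "primary_analysis", "data_delivered"]

def status_event(project_id, stage, detail=""):
    """Build one progress event; rejects stages outside the shared
    vocabulary so all partners emit comparable messages."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return {
        "project_id": project_id,
        "stage": stage,
        "detail": detail,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

event = status_event("NGS-2011-007", "sequencing", "run 2 of 3 started")
print(json.dumps(event, indent=2))
```

Because the message is plain JSON with a controlled stage vocabulary, a monitoring dashboard or alerting rule (e.g. "notify when data_delivered") works identically across every provider, which is what gives the early visibility described above.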
Pistoia Alliance Role
Signposting best practice in the cloud.
Advise on standard representation of NGS data.