Towards a brokering framework for knowledge-based services:
Learning from the Pistoia Alliance SESL pilot

Ian Harrow, PhD
Co-Leader of Pistoia Alliance SESL pilot (ex-Pfizer)
Founder, Director & Principal Consultant at Ian Harrow Consulting Ltd



Bio IT World, Hanover, October 2011


http://pistoiaalliance.org
Outline

• Industry Drivers
• Mission and Strategy of Pistoia
• Vision for the SESL pilot
• Minimal configuration to test a
  brokering service
• Public demonstrator and standards
• Deliverables achieved by SESL pilot
• Learning and future direction

What is Core to your Business?
What is Critical?

[Figure: 2x2 matrix with axes "Core?" and "Critical?", annotated with the years 1990 and 2012]
• Focus Staff on Innovation
• Externalize for Best Practices
• Reduce Non-Value Added Work
• Externalize for Cost Reduction


Why the Pistoia Alliance?

• Industry was at a crossroads
  – Change in business models required
  (Henry Chesbrough, UC Berkeley, 2011)
• We are all in this (mess) together (Life Science,
  technology vendors, service IT, academia, etc.)
• Need industry-applicable services and
  standards
• Collect all the stakeholders together
  – Agree on commonly shared, pre-competitive use
    cases
• Focus on delivery of proofs of concept to
  stimulate and foster new business models

The Mission of the Pistoia Alliance



Lowering the barriers to innovation
by improving the interoperability of
R&D business processes
via pre-competitive collaborations


Pistoia Alliance Membership, Sept 2011




A Reality Check: Setting Expectations

Signpost clearly

Pistoia Strategy

Domains of Action


• Biology & Translational Medicine
• Chemistry
• Scientific Collaboration

The Focus of Each Domain

• Biology: Big Data, Analytics, Semantics
• Chemistry: Supply Chain, Tech Transfer
• Scientific Collaboration: Vocabularies, Use Cases, Best Practices

Try this at your desk…


Which diseases are correlated with the gene TCF7L2?

Sources to search:
• Gene/Protein
• Literature – Abstracts
• Literature – Full Text
• Inherited diseases
• Gene expression

Try it again with Pistoia’s SESL…

• Gene naming/synonyms
• Gene Function
• Literature statistics
• Disease co-occurrences
• Gene/protein interactions

…all in one report from one search

HOW? A standard vocabulary, data model, query language, report structure, etc.

SESL Pilot project description
• Deliverables:
   – Publication of standards and recommendations for brokering service
     implementation
   – Public demonstrator service for a single disease area
   – Dialogue and assessment of potential business impact with key content
     suppliers

• Scope:
   – Development of an assertion database in combination with a user
     interface and associated web services for one
     disease/indication/phenotype of broad interest: Type II Diabetes
   – Assertional content derived from 3 structured data sources and limited
     Journal content (co-occurrence and statistical derivation from full text)
   – Assertional evidence for filtering and drill down to primary data.
   – Limited vocabulary development for area of focus: Type II Diabetes

• Participants and Cost:
   – AZ, Pfizer, GSK, Roche, Unilever, EMBL-EBI, NPG, OUP, Elsevier & RSC
   – Single contract between Pistoia Alliance & EMBL-EBI
   – £200K cost (= 2 x FTEs), shared by industry
   – 12 month project, January 2010 start

The Knowledge Service Framework

[Figure: layered architecture, top to bottom]
• Multiple Consumers
  – Knowledge Applications: Disease Dossier
• ‘Consumer’ Firewall
• Common Service Broker (Open Standards)
  – Service Layer
  – Assertion & Meta Data Management
  – Transform / Translate (RDF triples)
  – Integrator / Aggregator (Triple store)
  – Alongside these layers: Std Public Vocabularies, Business Rules
• Supplier Firewall
• Content Suppliers
  – Corpus 1, Db 2, Db 3, Db 4, Corpus 5
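The two lower broker layers (Transform / Translate into RDF triples, then Integrator / Aggregator in a triple store) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record fields, the `gene:` identifier scheme, and the `transform` and `TripleStore` names are hypothetical and are not taken from the SESL implementation, which used real triple stores.

```python
from typing import Iterable, Optional

Triple = tuple[str, str, str]


def transform(source: str, records: Iterable[dict]) -> list[Triple]:
    """Translate flat supplier records into (subject, predicate, object) triples."""
    triples: list[Triple] = []
    for record in records:
        subject = f"gene:{record['gene']}"  # hypothetical identifier scheme
        for key, value in record.items():
            if key != "gene":
                triples.append((subject, f"{source}:{key}", str(value)))
    return triples


class TripleStore:
    """In-memory stand-in for the integrator/aggregator layer."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def load(self, triples: Iterable[Triple]) -> None:
        self._triples.update(triples)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Return triples matching the pattern; None acts as a wildcard."""
        return sorted(
            t for t in self._triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


# Hypothetical supplier content, not taken from the pilot.
db2 = [{"gene": "TCF7L2", "disease": "type 2 diabetes"}]
corpus1 = [{"gene": "TCF7L2", "cooccurs_with": "type 2 diabetes"}]

store = TripleStore()
store.load(transform("db2", db2))
store.load(transform("corpus1", corpus1))

# One query against the aggregated store returns evidence from both suppliers.
report = store.match(s="gene:TCF7L2")
```

Keeping every supplier's content in the same three-part shape is what lets a single pattern query span databases and literature alike.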
Minimal configuration to test the technical
 feasibility of a Knowledge Broker Service


Condition: identical structure, different content which can overlap.

[Figure: two brokers behind one user interface]
• Interface Layer
  – User Interface
• Brokering service Layer: Broker #1 and Broker #2, each with
  – Service Layer
  – Assertion & Meta Data Mgmt
  – Transform / Translate
  – Triple store (Triple store 1 for Broker #1, Triple store 2 for Broker #2)
  – Std Public Vocabularies and Query templates
• Primary source Layer (left to right on the slide)
  – EBI Uniprot database
  – NCBI OMIM database
  – UK-Pubmed Central corpus
  – Elsevier corpus
  – RSC corpus
  – EBI Array Express database
  – EBI Uniprot database
  – NPG corpus
  – OUP corpus
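The condition on this slide (identical structure, different content which can overlap) is what makes it possible to send one query to both brokers and merge the answers. The sketch below is a hypothetical illustration in Python: the `Broker` class, the `federated_query` helper, and the sample assertions are assumptions made for the example, not the pilot's code or data.

```python
from dataclasses import dataclass, field

# An assertion is (gene, relation, source); the layout is hypothetical.
Assertion = tuple[str, str, str]


@dataclass
class Broker:
    name: str
    assertions: set[Assertion] = field(default_factory=set)

    def query(self, gene: str) -> set[Assertion]:
        """Both brokers expose the same query interface."""
        return {a for a in self.assertions if a[0] == gene}


def federated_query(brokers: list[Broker], gene: str) -> dict[Assertion, list[str]]:
    """Merge results, recording which brokers returned each assertion."""
    merged: dict[Assertion, list[str]] = {}
    for broker in brokers:
        for assertion in broker.query(gene):
            merged.setdefault(assertion, []).append(broker.name)
    return merged


# Hypothetical content: the UniProt assertion overlaps between the brokers.
broker1 = Broker("Broker #1", {
    ("TCF7L2", "named_in", "UniProt"),
    ("TCF7L2", "disease_mention", "OMIM"),
})
broker2 = Broker("Broker #2", {
    ("TCF7L2", "named_in", "UniProt"),
    ("TCF7L2", "expressed_in", "Array Express"),
})

result = federated_query([broker1, broker2], "TCF7L2")
```

Overlapping content collapses into a single entry, and the list of broker names attached to it shows where the overlap came from.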
Simple Graphical User Interface to the
    SESL public demonstrator

1. Single point of query through a simple GUI
   Show: A. Gene Query and/or B. Disease Query
   Filtered by:
   1) Everything
   2) Consensus
   3) Co-occurrence
   4) OMIM
   5) Array Express

2. Aggregated Results on a single web page
   A. Gene query results summary
   1) Co-occurrence Documents
   2) Uniprot names and annotation
   3) OMIM disease names
   4) Array express disease and/or pancreas expression
   5) Uniprot GO terms
   6) Uniprot Binary interactions
   B. Disease query results summary
   1) Co-occurrence Documents
   2) OMIM disease names
   3) Array express disease expression
   Full text detail (for both summaries): Title, Authors, Citation;
   co-occurrence of gene and disease mentions in text extracts
   The results include links out to the primary sources

SESL public demonstrator: http://www.pistoia-sesl.org
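The "Filtered by" options of the GUI can be read as predicates over the evidence sources attached to each hit. The Python sketch below is a guess at that logic, not the demonstrator's implementation: in particular, "Consensus" is assumed here to mean "supported by every source", which the slide does not define, and the gene names and evidence sets are placeholders.

```python
# Evidence sources named in the slide's filter list.
SOURCES = ("co-occurrence", "omim", "array express")


def apply_filter(hits: dict[str, set[str]], mode: str) -> list[str]:
    """Return the genes that pass the chosen filter, sorted by name."""
    if mode == "everything":
        return sorted(hits)
    if mode == "consensus":
        # ASSUMPTION: consensus = evidence from all sources.
        return sorted(g for g, found in hits.items() if set(SOURCES) <= found)
    if mode in SOURCES:
        return sorted(g for g, found in hits.items() if mode in found)
    raise ValueError(f"unknown filter: {mode}")


# Placeholder hits: gene -> sources that returned evidence for it.
hits = {
    "GENE_A": {"co-occurrence", "omim", "array express"},
    "GENE_B": {"omim"},
    "GENE_C": {"co-occurrence", "array express"},
}

everything = apply_filter(hits, "everything")
consensus = apply_filter(hits, "consensus")
omim_only = apply_filter(hits, "omim")
```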
Type 2 diabetes genes in SESL demonstrator

Columns:
 (1)  Source: UniProt diabetes mention
 (2)  SESL: UniProt diabetes mention
 (3)  Google Scholar: type 2 diabetes, 2006 to June 2011
 (4)  Pubmed: type 2 diabetes, June 2011
 (5)  SESL: gene and type 2 diabetes co-occurrence in Full Text
 (6)  Source: OMIM diabetes mention
 (7)  SESL: OMIM diabetes mention
 (8)  Source: Array Express Atlas pancreas
 (9)  SESL: Array Express pancreas
 (10) Source: Uniprot GO terms
 (11) SESL: GO terms
 (12) Source: Uniprot Intact binary interactions
 (13) SESL: Binary interactions

Gene       (1) (2)     (3)    (4) (5) (6) (7) (8) (9) (10) (11) (12) (13)  Human protein name
ABCC8        1   1     753     37   6   6   6   5   7    7    9    0    0  ATP-binding cassette sub-family C member 8
CAPN10       1   1     810    168  21   1   1   1   1   12   12    0    0  Calpain-10
GCK          1   1   3,950    708  12   7   7   0   0   19   19    2    2  Glucokinase
HHEX         0   0     626     91  24   1   0   2   2   21   23    3    0  Hematopoietically-expressed homeobox protein
HNF1A        1   1     633    340  23   3   4   2   2   12   12    6    6  Hepatocyte nuclear factor 1-alpha
HNF1B        1   1     408    269  20   1   1   2   2    9    8    1    0  Hepatocyte nuclear factor 1-beta
HNF4A        1   1     811    173  34   2   2   3   3   22   20    5    5  Hepatocyte nuclear factor 4-alpha
INS          2   1 166,000 37,670   5   9   0   7   0   59   59    0    0  Insulin
IRS1         1   1   7,970    616   9   1   0   2   2   24   24    3    0  Insulin receptor substrate 1
INSR         1   1   14,00  4,830  16   2   4   6   6   41   43    9    9  Insulin receptor
KCNJ11       1   1   1,260     45  35   3   1   0   0   12   12    1    0  ATP-sensitive inward rectifier potassium channel 11
LIPC         1   0   2,090     89   1   1   1   1   1   17   17    0    0  Hepatic triacylglycerol lipase
MAPK8IP1     1   1     248      4   1   1   1   1   1    6    6    4    4  C-Jun-amino-terminal kinase-interacting protein 1
NEUROD1      1   1     549     50   7   2   2   2   4   13   14    0    0  Neurogenic differentiation factor 1
PDX1         1   1   2,270    154   9   2   0   1   1    9    9    0    0  Pancreas/duodenum homeobox protein 1
PPARG        1   1   9,540  1,556  48   1   1   2   2   40   42    7    7  Peroxisome proliferator-activated receptor gamma
PPP1R3A      1   1     141     23   3   1   0   1   1    2    2    0    0  Protein phosphatase 1 regulatory subunit 3A
SLC30A8      1   0     724    117   0   2   1   3   4   13   13    0    0  Zinc transporter 8
TCF7L2       1   1   2,000    284  65   1   1   3   3   33   31    5    5  Transcription factor 7-like 2
UCP1         1   0   1,760     50   3   0   0   0   0    6    6    0    0  Mitochondrial brown fat uncoupling protein 1

Gene discovery in SESL demonstrator



[Figure: four-set Venn diagram of gene count intersections from the data sources in the demonstrator]
Sets:
• Pancreas expression in Array Express db
• T2D disease gene mention in OMIM db
• T2D disease genes in Full Text documents
• T2D disease gene mention in Uniprot db
Counts as extracted from the figure, in reading order: 1; 3, 1; 20, 10, 0; 3; 4; 1
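Intersections of this kind can be computed directly from per-source counts. The Python sketch below uses the four SESL columns of the table of type 2 diabetes genes (UniProt diabetes mention, Full Text co-occurrence, OMIM diabetes mention, Array Express pancreas), treating any count above zero as membership. The column assignment is a reading of the flattened table header and should be taken as an assumption; the result is not claimed to reproduce the regions of the figure.

```python
from collections import Counter

SOURCES = ("UniProt", "Full Text", "OMIM", "Array Express")

# SESL counts per gene, in the order of SOURCES (assumed column mapping).
SESL_COUNTS = {
    "ABCC8": (1, 6, 6, 7),
    "CAPN10": (1, 21, 1, 1),
    "GCK": (1, 12, 7, 0),
    "HHEX": (0, 24, 0, 2),
    "HNF1A": (1, 23, 4, 2),
    "HNF1B": (1, 20, 1, 2),
    "HNF4A": (1, 34, 2, 3),
    "INS": (1, 5, 0, 0),
    "IRS1": (1, 9, 0, 2),
    "INSR": (1, 16, 4, 6),
    "KCNJ11": (1, 35, 1, 0),
    "LIPC": (0, 1, 1, 1),
    "MAPK8IP1": (1, 1, 1, 1),
    "NEUROD1": (1, 7, 2, 4),
    "PDX1": (1, 9, 0, 1),
    "PPARG": (1, 48, 1, 2),
    "PPP1R3A": (1, 3, 0, 1),
    "SLC30A8": (0, 0, 1, 4),
    "TCF7L2": (1, 65, 1, 3),
    "UCP1": (0, 3, 0, 0),
}


def membership(counts: tuple[int, ...]) -> frozenset[str]:
    """Sources in which a gene has at least one hit."""
    return frozenset(src for src, n in zip(SOURCES, counts) if n > 0)


# Number of genes in each exclusive region of the four-set diagram.
regions = Counter(membership(c) for c in SESL_COUNTS.values())

# Genes supported by all four sources.
in_all_four = sorted(g for g, c in SESL_COUNTS.items() if all(n > 0 for n in c))
```

Under these assumptions, `regions[frozenset(SOURCES)]` gives the size of the central intersection and `in_all_four` lists its members.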



Selected content loaded as RDF triples

 Source Description                                   # triples        %
 Expression data Array Express                          182,840     0.5%
 Experimental Factor Ontology from Array Express         49,026     0.1%
 Disease vocabulary from UMLS                         6,906,735    18.8%
 Vocabulary from Disease Ontology                     1,863,664     5.1%
 Terms from Gene Ontology                               495,595     1.3%
 Human genes from Uniprot                            12,552,239    34.1%
 Meta data from Full Text documents                   3,485,212     9.5%
 Gene annotations from Full Text documents            2,373,584     6.5%
 Disease annotations from Full Text documents         4,983,788    13.6%
 GO annotations from Full Text documents              3,870,834    10.5%
 Totals                                              36,763,517     100%
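The total and the percentage column can be re-derived from the triple counts. The short Python check below copies the counts from the table; only the variable names are introduced for the example.

```python
# Triple counts per source, copied from the table.
TRIPLE_COUNTS = {
    "Expression data Array Express": 182_840,
    "Experimental Factor Ontology from Array Express": 49_026,
    "Disease vocabulary from UMLS": 6_906_735,
    "Vocabulary from Disease Ontology": 1_863_664,
    "Terms from Gene Ontology": 495_595,
    "Human genes from Uniprot": 12_552_239,
    "Meta data from Full Text documents": 3_485_212,
    "Gene annotations from Full Text documents": 2_373_584,
    "Disease annotations from Full Text documents": 4_983_788,
    "GO annotations from Full Text documents": 3_870_834,
}

total = sum(TRIPLE_COUNTS.values())

# Share of each source, as a percentage rounded to one decimal place.
shares = {name: round(100 * n / total, 1) for name, n in TRIPLE_COUNTS.items()}

# Share of the store derived from full text (the four "Full Text" rows).
full_text_total = sum(n for name, n in TRIPLE_COUNTS.items() if "Full Text" in name)
```

The counts sum to the stated total of 36,763,517, and the rounded shares agree with the percentage column.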

Signposting: Standards used in SESL

Category       Name                             Community
Triple Store   RDF                              W3C
               SPARQL                           W3C
               Jena, Sesame, Virtuoso           Open Source
Text Mining    IeXML                            EBI & CALBC
               LexEBI/BioLexicon                EBI, NaCTeM, U of Pisa
               CALBC                            EBI & CALBC
URIs           UniProt                          EBI, PIR, SIB, etc.
               Disease Ontology and UMLS        OBO, NIH/NLM
               ArrayExpress                     EBI
               NCBI Taxonomy                    NCBI
RDF Schema     Dublin Core                      W3C
               N3 notation                      W3C
               Co-occurrence of gene-disease    EBI
               PMC doc standard                 NCBI
Ontology       Relation ontology                OBO
               URI server                       W3C

Blending of existing standards

The Deliverables of the SESL pilot

• A proof-of-concept to demonstrate feasibility and
  clarify requirements
  – http://www.pistoia-sesl.org
• A functional specification for query brokering,
  result filtering, report generation
  – Expect publication by end 2011
  – http://www.pistoiaalliance.com/workinggroups/sesl.html

• Academia, Life Science Industry and Publishers
  – Attained a better understanding of each other’s needs
  – Demonstration of potential for a new business model
  – Explore follow-on via Open Innovation consortia

Learning and Future Direction

• Framework to maximise re-use of existing standards
  – Minimise use of bespoke, hard-coded implementations
• Crucial features of a knowledge brokering service:
  – RDF triples for a scalable, meta index to broker across
    primary sources (both databases and literature)
  – Important to define business rules for query & extraction
  – Recommend a registry of suitable data sources
     • similar to web services registry
• What is next?
  – Example follow-on to the SESL pilot:
    Open PHACTS consortium => www.openphacts.org
  – 3 year IMI pre-competitive project (started early 2011)
  – Data providers and Life Science industry working together
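The recommendation of a registry of suitable data sources, similar to a web services registry, could take a shape like the following Python sketch. Everything here is hypothetical: the `DataSource` fields, the `SourceRegistry` class, and the `example.org` endpoints are placeholders chosen for illustration, and only the source names come from the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSource:
    name: str
    kind: str       # "database" or "corpus"
    endpoint: str   # placeholder URL, not a real service address


class SourceRegistry:
    """Minimal lookup table a broker could consult before querying."""

    def __init__(self) -> None:
        self._sources: dict[str, DataSource] = {}

    def register(self, source: DataSource) -> None:
        if source.name in self._sources:
            raise ValueError(f"{source.name} is already registered")
        self._sources[source.name] = source

    def find(self, kind: str) -> list[DataSource]:
        """Return all registered sources of one kind, sorted by name."""
        matches = (s for s in self._sources.values() if s.kind == kind)
        return sorted(matches, key=lambda s: s.name)


registry = SourceRegistry()
registry.register(DataSource("UniProt", "database", "https://example.org/uniprot"))
registry.register(DataSource("OMIM", "database", "https://example.org/omim"))
registry.register(DataSource("UK-PMC", "corpus", "https://example.org/ukpmc"))

database_names = [s.name for s in registry.find("database")]
```

A broker that discovers its sources through such a registry would not need source locations hard-coded, which is in line with the slide's aim of minimising bespoke implementations.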
Acknowledgements

Industry
• Wendy Filsell – Unilever (SESL co-leader)
• Ian Stott – Unilever
• Nigel Wilkinson – PFE
• Catherine Marshall – PFE
• Peter Woollard – GSK
• Ashley George – GSK
• Mike Westaway – AZ
• Nick Lynch – AZ
• Ian Dix – AZ
• Michael Braxenthaler – Roche
• John Wise – Pistoia Alliance

EMBL-EBI
• Dietrich Rebholz-Schuhmann (Technical Team Leader)
• Christoph Grabmueller
• Silvestras Kavaliauskas
• Dominic Clark
• Rodrigo Lopez
• Jo McEntyre – UK-PMC
• Janet Thornton

Publishers
• Claire Bird – OUP
• Richard O’Beirne – OUP
• Colin Batchelor – RSC
• Richard Kidd – RSC
• David Hoole – NPG
• Alf Eaton – NPG
• Jabe Wilson – Elsevier
• Bradley Allen – Elsevier

 

Último (20)

Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

Towards a brokering framework for knowledge-based services: Learning from the Pistoia Alliance SESL pilot

  • 1. Towards a brokering framework for knowledge-based services: Learning from the Pistoia Alliance SESL pilot Ian Harrow, PhD Co-Leader of Pistoia Alliance SESL pilot (ex-Pfizer) Founder, Director & Principal Consultant at Ian Harrow Consulting Ltd Bio IT World, Hanover, October 2011 http://pistoiaalliance.org
  • 2. Outline • Industry Drivers • Mission and Strategy of Pistoia • Vision for the SESL pilot • Minimal configuration to test a brokering service • Public demonstrator and standards • Deliverables achieved by SESL pilot • Learning and future direction
  • 3. What is Core to your Business? What is Critical? (2x2 matrix with axes "Core?" and "Critical?") Quadrants: Focus Staff on Innovation; Externalize for Best Practices; Reduce Non-Value Added Work; Externalize for Cost Reduction. Year labels: 1990, 2012
  • 4. Why the Pistoia Alliance? • Industry was at a crossroads (Henry Chesbrough, UC Berkeley, 2011) – Change in business models required • We are all in this (mess) together (Life Science, technology vendors, service IT, academia, etc.) • Need industry-applicable services and standards • Collect all the stakeholders together – Agree on commonly shared, pre-competitive use cases • Focus on delivery of proofs of concept to stimulate and foster new business models
  • 5. The Mission of the Pistoia Alliance: Lowering the barriers to innovation by improving the interoperability of R&D business processes via pre-competitive collaborations
  • 8. A Reality Check: Setting Expectations
  • 11. Domains of Action: Biology & Translational Medicine; Chemistry; Scientific Collaboration
  • 12. The Focus of Each Domain. Biology: Big Data, Analytics, Semantics. Chemistry: Supply Chain, Tech Transfer. Scientific Collaboration: Vocabularies, Use Cases, Best Practices
  • 13. Try this at your desk… Which diseases are correlated to the gene TCF7L2? Sources to search: Gene/Protein; Literature (Abstracts); Literature (Full Text); Inherited diseases; Gene expression
  • 14. Try it again with Pistoia’s SESL… Gene naming/synonyms; Gene Function; Literature statistics; Disease co-occurrences; Gene/protein interactions …all in one report from one search. HOW? A standard vocabulary, data model, query language, report structure, etc.
  • 15. SESL Pilot project description • Deliverables: – Publication of standards and recommendations for brokering service implementation – Public demonstrator service for a single disease area – Dialogue and assessment of potential business impact with key content suppliers • Scope: – Development of an assertion database in combination with a user interface and associated web services for one disease/indication/phenotype of broad interest: Type II Diabetes – Assertional content derived from 3 structured data sources and limited journal content (co-occurrence and statistical derivation from full text) – Assertional evidence for filtering and drill-down to primary data – Limited vocabulary development for area of focus: Type II Diabetes • Participants and Cost: – AZ, Pfizer, GSK, Roche, Unilever, EMBL-EBI, NPG, OUP, Elsevier & RSC – Single contract between Pistoia Alliance & EMBL-EBI – £200K cost (= 2 x FTEs), shared by industry – 12 month project, January 2010 start
  • 16. The Knowledge Service Framework (layered diagram, top to bottom): Multiple Consumers (Disease Dossier, Knowledge Applications); ‘Consumer’ Firewall; Service Layer; Knowledge Service Broker, comprising Assertion & Meta Data Management, Transform/Translate (RDF triples) and Integrator/Aggregator (Triple store), supported by Std Public Vocabularies, Common Open Standards and Business Rules; Supplier Firewall; Content Suppliers (Corpus 1, Db 2, Db 3, Db 4, Corpus 5)
  • 17. Minimal configuration to test the technical feasibility of a Knowledge Broker Service (diagram). Condition: the two brokers have identical structure but different content, which can overlap. User Interface Layer: brokering service interface. Broker Layer: Broker #1 and Broker #2, each with a Service Layer, Std Public Vocabularies, Assertion & Meta Data Mgmt, Transform/Translate, Query templates and its own triple store (Triple store 1, Triple store 2). Primary source Layer: RSC corpus, NPG corpus, OUP corpus, Elsevier corpus, UK-Pubmed Central corpus, EBI Uniprot database, EBI Array Express database, NCBI OMIM database
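The two-broker configuration on slide 17 can be illustrated with a minimal sketch. It assumes each broker holds a store with the same triple structure and that one query template is run against both, with overlapping assertions merged and their supporting brokers recorded. The store contents, the predicate name `associated_with`, and the helper names `match` and `broker_query` are hypothetical; plain Python tuples stand in for RDF triples and a wildcard pattern stands in for a SPARQL query template.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical assertions: same structure in both stores, overlapping content.
BROKER_1: Set[Triple] = {
    ("TCF7L2", "associated_with", "type 2 diabetes"),
    ("GCK", "associated_with", "type 2 diabetes"),
}
BROKER_2: Set[Triple] = {
    ("TCF7L2", "associated_with", "type 2 diabetes"),
    ("PPARG", "associated_with", "type 2 diabetes"),
}


def match(store: Set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> Set[Triple]:
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return {
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def broker_query(stores: Dict[str, Set[Triple]], s: Optional[str] = None,
                 p: Optional[str] = None,
                 o: Optional[str] = None) -> Dict[Triple, List[str]]:
    """Run one pattern against every store and note which stores support each triple."""
    merged: Dict[Triple, List[str]] = {}
    for name, store in stores.items():
        for triple in match(store, s, p, o):
            merged.setdefault(triple, []).append(name)
    return merged


result = broker_query({"broker1": BROKER_1, "broker2": BROKER_2},
                      o="type 2 diabetes")
genes = sorted(triple[0] for triple in result)
print(genes)
for triple, supporters in sorted(result.items()):
    print(triple, supporters)
```

Because both stores share one structure, the same pattern works unchanged against each, and an assertion present in both brokers appears once in the merged result with two supporters.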
  • 18. Simple Graphical User Interface to the SESL public demonstrator. 1. Single point of query through a simple GUI (A. Gene Query and/or B. Disease Query). 2. Aggregated results on a single web page. A. Gene query results summary: 1) Co-occurrence Documents 2) Uniprot names and annotation 3) OMIM disease names 4) Array Express disease and/or pancreas expression 5) Uniprot GO terms 6) Uniprot Binary interactions. B. Disease query results summary: 1) Co-occurrence Documents 2) OMIM disease names 3) Array Express disease expression. Filtered by: 1) Everything 2) Consensus 3) Co-occurrence 4) OMIM 5) Array Express. Full text detail (both query types): Title, Authors, Citation, and co-occurrence of gene and disease mentions in text extracts. The results include links out to the primary sources. SESL public demonstrator: http://www.pistoia-sesl.org
  • 19. Type 2 diabetes genes in SESL demonstrator

    | Human protein name | Human gene name | Source: UniProt type 2 diabetes mention | SESL: UniProt type 2 diabetes mention | Google Scholar: type 2 diabetes, June 2006 to June 2011 | Pubmed: type 2 diabetes | SESL: gene and type 2 diabetes co-occurrence in Full Text | Source: OMIM diabetes mention | SESL: OMIM diabetes mention | Source: Array Express Atlas pancreas | SESL: Array Express pancreas | Source: Uniprot GO terms | SESL: GO terms | Source: Intact binary interactions | SESL: Uniprot binary interactions |
    |---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
    | ATP-binding cassette sub-family C member 8 | ABCC8 | 1 | 1 | 753 | 37 | 6 | 6 | 6 | 5 | 7 | 7 | 9 | 0 | 0 |
    | Calpain-10 | CAPN10 | 1 | 1 | 810 | 168 | 21 | 1 | 1 | 1 | 1 | 12 | 12 | 0 | 0 |
    | Glucokinase | GCK | 1 | 1 | 3,950 | 708 | 12 | 7 | 7 | 0 | 0 | 19 | 19 | 2 | 2 |
    | Hematopoietically-expressed homeobox protein | HHEX | 0 | 0 | 626 | 91 | 24 | 1 | 0 | 2 | 2 | 21 | 23 | 3 | 0 |
    | Hepatocyte nuclear factor 1-alpha | HNF1A | 1 | 1 | 633 | 340 | 23 | 3 | 4 | 2 | 2 | 12 | 12 | 6 | 6 |
    | Hepatocyte nuclear factor 1-beta | HNF1B | 1 | 1 | 408 | 269 | 20 | 1 | 1 | 2 | 2 | 9 | 8 | 1 | 0 |
    | Hepatocyte nuclear factor 4-alpha | HNF4A | 1 | 1 | 811 | 173 | 34 | 2 | 2 | 3 | 3 | 22 | 20 | 5 | 5 |
    | Insulin | INS | 2 | 1 | 166,000 | 37,670 | 5 | 9 | 0 | 7 | 0 | 59 | 59 | 0 | 0 |
    | Insulin receptor substrate 1 | IRS1 | 1 | 1 | 7,970 | 616 | 9 | 1 | 0 | 2 | 2 | 24 | 24 | 3 | 0 |
    | Insulin receptor | INSR | 1 | 1 | 14,00 | 4,830 | 16 | 2 | 4 | 6 | 6 | 41 | 43 | 9 | 9 |
    | ATP-sensitive inward rectifier potassium channel 11 | KCNJ11 | 1 | 1 | 1,260 | 45 | 35 | 3 | 1 | 0 | 0 | 12 | 12 | 1 | 0 |
    | Hepatic triacylglycerol lipase | LIPC | 1 | 0 | 2,090 | 89 | 1 | 1 | 1 | 1 | 1 | 17 | 17 | 0 | 0 |
    | C-Jun-amino-terminal kinase-interacting protein 1 | MAPK8IP1 | 1 | 1 | 248 | 4 | 1 | 1 | 1 | 1 | 1 | 6 | 6 | 4 | 4 |
    | Neurogenic differentiation factor 1 | NEUROD1 | 1 | 1 | 549 | 50 | 7 | 2 | 2 | 2 | 4 | 13 | 14 | 0 | 0 |
    | Pancreas/duodenum homeobox protein 1 | PDX1 | 1 | 1 | 2,270 | 154 | 9 | 2 | 0 | 1 | 1 | 9 | 9 | 0 | 0 |
    | Peroxisome proliferator-activated receptor gamma | PPARG | 1 | 1 | 9,540 | 1,556 | 48 | 1 | 1 | 2 | 2 | 40 | 42 | 7 | 7 |
    | Protein phosphatase 1 regulatory subunit 3A | PPP1R3A | 1 | 1 | 141 | 23 | 3 | 1 | 0 | 1 | 1 | 2 | 2 | 0 | 0 |
    | Zinc transporter 8 | SLC30A8 | 1 | 0 | 724 | 117 | 0 | 2 | 1 | 3 | 4 | 13 | 13 | 0 | 0 |
    | Transcription factor 7-like 2 | TCF7L2 | 1 | 1 | 2,000 | 284 | 65 | 1 | 1 | 3 | 3 | 33 | 31 | 5 | 5 |
    | Mitochondrial brown fat uncoupling protein 1 | UCP1 | 1 | 0 | 1,760 | 50 | 3 | 0 | 0 | 0 | 0 | 6 | 6 | 0 | 0 |
  • 20. Gene discovery in SESL demonstrator (Venn diagram): gene count intersections from the data sources in the demonstrator. Sets: pancreas gene expression in Array Express db; T2D disease mention in OMIM db; T2D disease gene mention in Uniprot db; T2D disease genes in Full Text documents
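The intersection counting behind slide 20 can be sketched with ordinary set operations. The four rows below are read from the SESL columns of slide 19, assuming those columns appear in the order UniProt mention, full-text co-occurrence, OMIM mention, Array Express pancreas expression; the variable names are hypothetical and the four-gene subset is for illustration only, so the result does not reproduce the counts in the diagram.

```python
# SESL counts per gene, in the assumed column order:
# (UniProt mention, full-text co-occurrence, OMIM mention, Array Express pancreas)
SESL_COUNTS = {
    "TCF7L2": (1, 65, 1, 3),
    "GCK": (1, 12, 7, 0),
    "INS": (1, 5, 0, 0),
    "UCP1": (0, 3, 0, 0),
}
SOURCES = ("uniprot", "full_text", "omim", "array_express")

# One gene set per data source: a gene belongs to a source if its count is non-zero.
genes_by_source = {
    source: {gene for gene, counts in SESL_COUNTS.items() if counts[i] > 0}
    for i, source in enumerate(SOURCES)
}

# Genes supported by every source (the central region of a four-set Venn diagram).
in_all_sources = set.intersection(*genes_by_source.values())

# Number of sources supporting each gene.
support = {gene: sum(c > 0 for c in counts) for gene, counts in SESL_COUNTS.items()}

print(sorted(in_all_sources))
print(support)
```

Counting supporting sources per gene gives a simple ranking: genes found in more independent sources carry stronger combined evidence.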
  • 21. Selected content loaded as RDF triples

    | Source Description | # triples | % |
    |---|---|---|
    | Expression data from Array Express | 182,840 | 0.5% |
    | Experimental Factor Ontology from Array Express | 49,026 | 0.1% |
    | Disease vocabulary from UMLS | 6,906,735 | 18.8% |
    | Vocabulary from Disease Ontology | 1,863,664 | 5.1% |
    | Terms from Gene Ontology | 495,595 | 1.3% |
    | Human genes from Uniprot | 12,552,239 | 34.1% |
    | Meta data from Full Text documents | 3,485,212 | 9.5% |
    | Gene annotations from Full Text documents | 2,373,584 | 6.5% |
    | Disease annotations from Full Text documents | 4,983,788 | 13.6% |
    | GO annotations from Full Text documents | 3,870,834 | 10.5% |
    | Totals | 36,763,517 | 100% |
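As a quick check on the triple counts listed on slide 21, the percentage column can be recomputed from the raw counts. This is a small sketch using only the figures from the slide; the dictionary and variable names are hypothetical.

```python
# Triple counts per source, as listed on slide 21.
TRIPLE_COUNTS = {
    "Expression data from Array Express": 182_840,
    "Experimental Factor Ontology from Array Express": 49_026,
    "Disease vocabulary from UMLS": 6_906_735,
    "Vocabulary from Disease Ontology": 1_863_664,
    "Terms from Gene Ontology": 495_595,
    "Human genes from Uniprot": 12_552_239,
    "Meta data from Full Text documents": 3_485_212,
    "Gene annotations from Full Text documents": 2_373_584,
    "Disease annotations from Full Text documents": 4_983_788,
    "GO annotations from Full Text documents": 3_870_834,
}

total = sum(TRIPLE_COUNTS.values())

# Share of each source, rounded to one decimal place as on the slide.
shares = {name: round(100 * n / total, 1) for name, n in TRIPLE_COUNTS.items()}

# Triples derived from full-text documents (the four "Full Text" rows).
full_text_triples = sum(n for name, n in TRIPLE_COUNTS.items() if "Full Text" in name)

print(f"total triples: {total:,}")
for name, share in shares.items():
    print(f"{name}: {share}%")
print(f"full-text triples: {full_text_triples:,}")
```

The recomputed total and rounded shares agree with the slide, and the four full-text rows together contribute 14,713,418 triples.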
  • 22. Signposting: Standards used in SESL (table columns: Category, Name, Community). Triple Store: RDF (W3C); SPARQL (W3C); Jena, Sesame, Virtuoso (Open Source). Text Mining: IeXML (EBI & CALBC); LexEBI/BioLexicon (EBI, NaCTeM, U of Pisa); CALBC (EBI & CALBC). Blending of existing standards, URIs and RDF Schema: UniProt (EBI, PIR, SIB, etc); Disease Ontology and UMLS (OBO, NIH/NLM); ArrayExpress (EBI); NCBI Taxonomy (NCBI); Dublin Core (W3C); N3 notation (W3C); Co-occurrence of gene-disease (EBI); PMC doc standard (NCBI); Relation ontology (OBO); Ontology URI server (W3C)
  • 23. The Deliverables of the SESL pilot • A proof-of-concept to demonstrate feasibility and clarify requirements – http://www.pistoia-sesl.org • A functional specification for query brokering, result filtering, report generation – Expect publication by end 2011 – http://www.pistoiaalliance.com/workinggroups/sesl.html • Academia, Life Science Industry and Publishers – Attained a better understanding of each other’s needs – Demonstration of potential for a new business model – Explore follow-on via Open Innovation consortia
  • 24. Learning and Future Direction • Framework to maximise re-use of existing standards – Minimise use of bespoke, hard-coded implementations • Crucial features of a knowledge brokering service: – RDF triples for a scalable meta index to broker across primary sources (both databases and literature) – Important to define business rules for query & extraction – Recommend a registry of suitable data sources, similar to a web services registry • What is next? – Example follow-on to the SESL pilot: Open PHACTS consortium (www.openphacts.org) – 3 year IMI pre-competitive project (started early 2011) – Data providers and Life Science industry working together
  • 25. Acknowledgements. Industry: Wendy Filsell – Unilever (SESL co-leader); Ian Stott – Unilever; Nigel Wilkinson – PFE; Catherine Marshall – PFE; Peter Woollard – GSK; Ashley George – GSK; Mike Westaway – AZ; Nick Lynch – AZ; Ian Dix – AZ; Michael Braxenthaler – Roche; John Wise – Pistoia Alliance. EMBL-EBI: Dietrich Rebholz Schuhmann (Technical Team Leader); Christoph Grabmueller; Silvestras Kavaliauskas; Dominic Clark; Roderigo Lopez; Jo McEntyre – UK-PMC; Janet Thornton. Publishers: Claire Bird – OUP; Richard O’Bierne – OUP; Colin Batchelor – RSC; Richard Kidd – RSC; David Hoole – NPG; Alf Eaton – NPG; Jabe Wilson – Elsevier; Bradley Allen – Elsevier