SciBite is an award-winning provider of semantic solutions for the life sciences industry. Our fast, scalable, easy-to-use semantic technologies understand the complexity and variability of content within life sciences. We can quickly identify and extract scientific terminology from unstructured text and transform it into valuable machine-readable data for your downstream applications. Our hand-curated ontologies ensure accurate, reliable, high-quality results. Headquartered in the UK, we support our customers with additional sites in the US and Japan.
More information: www.scibite.com

Published in: Software

  1. The #cleandata company
  2. Science isn’t simple
     • Majority of scientific information is unstructured and underused
     • Information overload (Volume, Variety, Velocity, Quality)
     • Highly synonymous and ambiguous terminology
     • Complex hierarchical relationships
  3. The downstream impact
     • Poor results when applying computational/AI approaches
     • Up to 80% of time taken to prepare data
     • Inaccessible and underused data
     • Duplication of research
     • Not building on existing knowledge
  4. Our Purpose
     To enable scientists to use insights locked in unstructured data to power their decisions and speed up innovation by:
     • Using world-class ontologies to revolutionise access to, and utilisation of, scientific information
     • Transforming unstructured text into contextualised, machine-readable data suitable for computational analysis
  5. The SciBite Platform
     #MANAGE
     • Harmonise terminology
     • Scientific ontologies
     • Adhering to public standards
     • Manage / augment / curate your own
     #CLEAN
     • Automated cleansing of semi-structured data
     • Text to data
     • Standardise data formats
     • Indexing data at point of entry
     #DISCOVER
     • Semantic search
     • Regular expressions
     • Knowledge networks
     • Visualise results
     • Platform enrichment
  6. TERMite
     TEXT IN: Any format of biomedical text-based document can be processed by TERMite
     STRUCTURED DATA OUT: Contextualized, machine-readable data ready for analysis
     VOCab creation: Source Ontology, Expert Curation, Synonym Expansion, Disambiguation settings, Iterative testing
     Variation engine: e.g. breast cancer is auto-expanded to include the synonyms breast neoplasm, cancer of the breast and mammary tumour
     Augment VOCab: VOCabs can be updated by users with simple 3-column augment files. The following (saved as drug.dictionary.aug) would add the extra synonym, extrasyn, to the DRUG entity aspirin:
         # ID name syns
         CHEMBL25 extrasyn
     • Java based, RESTful service
     • RDBMS, NoSQL, Solr/Elastic, Hadoop, RDF, AWS & Docker compatible
     • Scalable & fast. Runs on a server, cloud, laptop
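The synonym expansion described on this slide can be illustrated with a minimal Python sketch. It is not TERMite's implementation or API: the entity ID `INDICATION:0001`, the `VOCAB` structure and the `tag` function are all hypothetical, and the matching is a plain case-insensitive dictionary lookup with none of the disambiguation rules a curated vocabulary would carry.

```python
import re

# Hypothetical mini-vocabulary: one invented entity ID mapped to the
# preferred name and synonyms listed on the slide.
VOCAB = {
    "INDICATION:0001": [
        "breast cancer",
        "breast neoplasm",
        "cancer of the breast",
        "mammary tumour",
    ],
}


def build_pattern(vocab):
    """Return a compiled regex over all names plus a name -> ID lookup."""
    lookup = {}
    for entity_id, names in vocab.items():
        for name in names:
            lookup[name.lower()] = entity_id
    # Try longer names first so the longest synonym wins at a given position.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(a) for a in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return pattern, lookup


def tag(text, vocab=VOCAB):
    """Turn free text into structured hits: entity ID, matched text, offsets."""
    pattern, lookup = build_pattern(vocab)
    return [
        {
            "id": lookup[m.group(1).lower()],
            "match": m.group(1),
            "start": m.start(1),
            "end": m.end(1),
        }
        for m in pattern.finditer(text)
    ]
```

Calling `tag("Mammary tumour and breast neoplasm were studied.")` yields two hits that share the same entity ID, which is the point of the slide: different surface forms collapse onto one concept.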
  7. VOCabs: Ontologies are at the heart of everything
     • Hand-curated and maintained by our expert team
     • Comprehensive coverage
     • Aligned to industry standards to maintain interoperability
     • Enriched with synonyms and rules to manage the complexity of scientific language
     • Customise or augment our existing vocabularies, or deploy your own
     VOCab categories: CORE, CLINICAL, AGRO, BIO-PHARM, BUS INT, GEN PHEN
  8. Modular Microservices Architecture
     • Compile / test vocabularies
     • Manage / distribute ontologies
     • VOCabulary curation
     • Data cleaning platform
     • Smart forms (HTML/JS)
     • Automated data ingestion
     • Semantic search UI
     • Pattern matching
     • Browser-based enrichment
     • Workflow automation (PLP/KNIME)
     • AI-based classification
  9. Partnership Ecosystem
  10. The SciBite Platform Principles
     • Proven track record in semantic analytics
     • Specialists in life sciences
     • Micro-services architecture. Built for integration. Scalable
     • End-to-end solution for processing, mining and query
     • Combined benefits of machine learning & ontologies
     • Supports IT, Data Science, Info Management & Comp. Biology
     • Great support connecting directly to our SciTech team
     • Best-in-class vocabularies covering >100 concepts with tooling to create your own
     [Comparison grid of ticks and crosses against "Others"; cell values not recoverable]
     One platform supports the most diverse set of use-cases: ELN enrichment, Target Identification, Pharmacovigilance, Enterprise Search, Literature Analysis, Opportunity/C.I. Analysis, Data Integration, Drug Repurposing, Machine Learning…
  11. What’s your use case?
     • Document Search
     • ELN Enrichment
     • Vocabulary building & Mapping
     • Intelligent Forms
     • Pharmacovigilance
     • Patient Forums
     • CI / Horizon Scanning
     • Clinical Phenotype Mining
     • Disease Networks
     • Connecting Silos
     • Document Classification
  12. Use Case Library
  13. Enterprise Search
     The Problem
     • Poor keyword search results
     • Inability to search across a specific concept, e.g. [GENE]
     • Unable to manage synonymy/ambiguity
     The Solution
     • SciBite Vocabularies cover >80 different scientific concepts
     • Rule-based system to translate the language of science
     • Flexible architecture to integrate seamlessly with partner systems
     The Outcome
     • Powerful enterprise search transformed into a scientifically aware system
  14. DOCstore – A Biomedically-Aware Search Engine (© 2018 SciBite Limited)
  15. Articles identified that don’t use the word “Gilenya” but do use a synonym. Articles must mention an indication; which one does not matter at this stage.
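The query on this slide (a specific drug concept plus any indication, regardless of the wording used) can be sketched in Python as a filter over annotated documents. This is an illustrative assumption and not DOCstore's query syntax: the `DOCS` records, the entity IDs `DRUG:0001` and `INDICATION:0001`, and the `find_articles` function are all hypothetical. Fingolimod is the generic name of Gilenya.

```python
# Hypothetical annotated documents: each carries the (entity type, entity ID)
# pairs that a semantic tagger would have attached to it.
DOCS = [
    {
        "title": "doc-1",
        "text": "Fingolimod was evaluated in relapsing multiple sclerosis.",
        "entities": [("DRUG", "DRUG:0001"), ("INDICATION", "INDICATION:0001")],
    },
    {
        "title": "doc-2",
        "text": "Gilenya was evaluated in multiple sclerosis.",
        "entities": [("DRUG", "DRUG:0001"), ("INDICATION", "INDICATION:0001")],
    },
    {
        "title": "doc-3",
        "text": "Fingolimod pharmacokinetics in healthy volunteers.",
        "entities": [("DRUG", "DRUG:0001")],
    },
]


def find_articles(docs, drug_id, excluded_word):
    """Titles of docs tagged with the drug concept and any indication,
    where the literal brand name does not appear in the text."""
    results = []
    for doc in docs:
        entity_types = {etype for etype, _ in doc["entities"]}
        has_drug = ("DRUG", drug_id) in doc["entities"]
        has_any_indication = "INDICATION" in entity_types
        uses_synonym_only = excluded_word.lower() not in doc["text"].lower()
        if has_drug and has_any_indication and uses_synonym_only:
            results.append(doc["title"])
    return results
```

With this toy data, `find_articles(DOCS, "DRUG:0001", "Gilenya")` keeps only the document that names the drug by a synonym and also mentions an indication. A keyword search for "Gilenya" would have missed it.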
  16. Improving searchability in an ELN
     The Problem
     • Search functionality often limited to keyword with no synonym support
     • Difficult to gain an aggregate view of the innovation within the business
     • Data not structured/tagged to facilitate linking with other stores
     The Solution
     • Extraction and semantic enrichment of ELN records transforms the knowledge into richly annotated, machine-readable data
     • Interoperable output able to be delivered into various downstream environments
     The Outcome
     • Greatly improved ability to search and analyse internal R&D information
     • Understand who is researching what and how people/groups are interacting
  17. Ontology & Curation Services
     • Highly experienced curation team
     • Active engagement with initiatives such as Pistoia, OBO, ICBO…
     • Supported by custom-developed curation software to rapidly develop and maintain new or existing ontologies
     Services:
     • Bespoke Ontology Curation: for new domains or when using internal data sources
     • Enrich & Manage Public Ontologies: for example in the areas of bioassays, technologies and devices
     • Augment SciBite Vocabs: expand/customise our hand-curated vocabs (gene, indication, etc…)
     • Engage Our Experts: thought leadership on standards, ontologies and metadata
     • Bespoke Semantic Queries: for new domains or to find novel entity relationships
  18. BioAssay Data Repositories
     The Problem
     • Legacy systems missing metadata
     • Limited ability to search results in high duplication
     The Solution
     • Retrospective generation of metadata using semantic entity recognition to find relevant terms
     • Prospective auto-metadata curation using intelligent forms incorporating semantic autocomplete
     • Flexible architecture to integrate seamlessly into existing systems
     The Outcome
     • Greater ability to find relevant information improves re-use of legacy data and reduces duplication
     • Semantically annotated, interoperable assay data
  19. Monitoring Patient Forums
     The Problem
     • Data from patient/social media forums is of increasing interest and value to researchers but is challenging to monitor
     • Multiple formats, locations and structures of data make integration difficult
     • Consumer language used in these forums doesn’t map to standardised ontologies
     The Solution
     • Index data irrespective of source and store it in a central repository for analysis
     • Customisable vocabularies can accommodate consumer language and map to existing public standards
     • DOCstore provides customisable and extensible search capabilities
     • Alerting function allows monitoring of relevant threads
     The Outcome
     • Ability to transform, integrate and analyse patient forum data alongside existing workflows
     • Powerful multi-source search through a simple, easy-to-use interface
     • Tailored vocabularies provide a unique search environment
  20. CI / Horizon Scanning
     The Problem
     • Many sources of unstructured external data are difficult to monitor and search consistently
     • Data aggregation and review is a time-consuming process
     • Persisting legacy data – not all information is relevant right now
     The Solution
     • Index data irrespective of source and store it in a central repository for analysis
     • Customisable vocabularies allow for unique / proprietary search methodologies
     • DOCstore provides customisable and extensible search capabilities
     • Alerting function allows monitoring of many pre-defined search strategies
     The Outcome
     • Powerful multi-source search through a simple, easy-to-use interface
     • Tailored vocabularies provide a unique search environment
     • Reduce data review times by up to 80% (https://www.scibite.com/artificial-intelligence-platform/)
     [Diagram: News, Grants, Publications – any Data Source → DOCstore: semantic enrichment + text analytics using customised vocabularies]
  21. Phenotypic Triangulation
     The Problem
     • Many diseases are understudied and lack clear molecular mechanisms
     • Some entities (e.g. Phenotypes) are highly synonymous and difficult to standardise
     • Scraping, standardising, and analysing research is time-consuming
     The Solution
     • Standardise terminology using SciBite VOCabularies
     • Transform unstructured text into interoperable machine-readable data compatible with downstream applications
     • Build network views of disease-phenotype mappings to identify common mechanistic pathways and shared knowledge
     The Outcome
     • Uncovering novel relationships in disease biology not previously evident in the source data
     • Scalable, structured analysis mappable to public ontologies with the flexibility to integrate additional sources over time
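The "network views of disease-phenotype mappings" step can be sketched in Python as follows. This is a hedged illustration, not the method used in the deck: the `shared_phenotype_edges` function and the placeholder disease and phenotype labels are hypothetical, and the input is assumed to be phenotype terms already standardised to common identifiers.

```python
from itertools import combinations


def shared_phenotype_edges(disease_to_phenotypes, min_shared=1):
    """Link every pair of diseases that share at least `min_shared`
    standardised phenotypes; each edge lists the phenotypes in common."""
    edges = []
    for a, b in combinations(sorted(disease_to_phenotypes), 2):
        shared = disease_to_phenotypes[a] & disease_to_phenotypes[b]
        if len(shared) >= min_shared:
            edges.append((a, b, sorted(shared)))
    return edges


# Placeholder data only; real input would be tagger output mapped to
# ontology identifiers.
MAPPINGS = {
    "disease_A": {"phenotype_1", "phenotype_2", "phenotype_3"},
    "disease_B": {"phenotype_2", "phenotype_3"},
    "disease_C": {"phenotype_4"},
}
```

Here `shared_phenotype_edges(MAPPINGS, min_shared=2)` links `disease_A` and `disease_B` through their two shared phenotypes and leaves `disease_C` unconnected. Raising `min_shared` is a simple way to keep only the stronger overlaps.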
  22. Data Preparation / Cleansing
     The Problem
     • Many sources of internal data are ‘messy’; even structured data is not always consistently tagged
     • Messy data in = Messy data out
     • Cleaning/curating data is a time-consuming manual process
     The Solution
     • FactBio + SciBite integration = automated cleaning/annotation using highly curated vocabularies spanning life science research
     • User-friendly blend of automated tagging augmented with manual review where necessary
     • Flexible architecture to integrate seamlessly into existing systems
     The Outcome
     • Greatly reduced effort required to cleanse / prepare data for downstream utility
     • Semantically annotated, interoperable assay data
  23. Julien Debeauvais – Head of Sales
     Email: julien@scibite.com
     Tel: +44 (0) 7825 732 364