Published on Feb 07, 2016 by PMR
Use of ContentMine tools on the Open Access subset of EuropePubMedCentral to discover new knowledge about the Zika virus. Includes clips of the software in action.
ContentMine + EPMC: Finding Zika!
1. ContentMine + EuropePubMedCentral
Peter Murray-Rust,
ContentMine.org and University of Cambridge
Wellcome Trust, London, UK 2016-02-08
getpapers[0] and AMI[1] download and analyze papers from EuropePubMedCentral.
[0][1] F/OSS tools from contentmine.org
2. Automated Semantic Fulltext
• EuropePMC provides coherent Open Access fulltext.
• getpapers: wrapper for repos and search engines.
• AMI filters, checks[1] and transforms facts in papers. Here:
– Sequences in text
– Species and genera
– Genes
– User dictionaries
– (RRIDs, chemistry, places, phylo)
[0] All operations shown run in under 3 minutes in total.
[1] Dictionaries and lookup.
[2] Usable from home by anyone
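The dictionary and sequence filters listed above can be illustrated with a minimal sketch. It is not AMI's implementation: the dictionary entries, the `tag_dictionary` and `find_sequences` names, and the rule that 15 or more consecutive nucleotide letters count as a "sequence" are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical user dictionary (term -> category). AMI's real dictionaries
# carry more information (synonyms, identifiers); this is illustration only.
USER_DICTIONARY = {
    "Zika virus": "virus",
    "Aedes aegypti": "species",
    "Aedes albopictus": "species",
    "microcephaly": "disease",
}

# Assumption: a run of 15+ A/C/G/T letters in running text is a candidate DNA sequence.
DNA_PATTERN = re.compile(r"\b[ACGTacgt]{15,}\b")


def tag_dictionary(text, dictionary):
    """Count case-insensitive, whole-word matches of each dictionary term."""
    hits = Counter()
    for term in dictionary:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        count = len(pattern.findall(text))
        if count:
            hits[term] = count
    return hits


def find_sequences(text):
    """Return candidate DNA sequences found in text, normalized to upper case."""
    return [m.group(0).upper() for m in DNA_PATTERN.finditer(text)]
```

For example, `tag_dictionary(text, USER_DICTIONARY)` on a methods paragraph returns a `Counter` of matched terms, and `find_sequences(text)` lists primer-like strings. A real tool would also need to handle abbreviated genera ("A. aegypti") and line-wrapped sequences, which this sketch ignores.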
3. Catalogue
[Diagram] CONTENTMINE: Complete OPEN Platform for Mining Scientific Literature
• getpapers: queries search engines and repositories (EPMC, arXiv, CORE, HAL, university repos).
• quickscrape: scrapes publisher sites (PloSONE, BMC, peerJ… Nature, IEEE, Elsevier…) from URLs and DOIs using per-site scrapers; a daily crawl covers ToC services.
• Input formats: PDF, HTML, DOC, ePUB, TeX, XML, PNG, EPS, CSV, XLS.
• norma (Normalizer, Structurer, Semantic Tagger): turns text, data and figures into Semantic ScholarlyHTML (abstract, methods, references, captioned figures, HTML tables).
• ami: content-mining plugins (Chem, Phylo, Trials, Crystal, Plants, community plugins) with queries, taggers and lookup, producing facts for visualization and analysis.
• Throughput: 30,000 pages/day.
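The normalization step in this pipeline (norma turning heterogeneous input into structured sections) can be sketched for one input type. The sketch assumes JATS-style XML, the format EuropePMC uses for fulltext, and returns a plain dictionary rather than ScholarlyHTML. The function names are hypothetical, and this is not norma's code.

```python
import xml.etree.ElementTree as ET


def text_of(element):
    """Concatenate all text inside an element and collapse whitespace."""
    return " ".join("".join(element.itertext()).split())


def normalize_jats(xml_text):
    """Split a JATS-style article into named sections (a sketch, not norma)."""
    root = ET.fromstring(xml_text)
    sections = {}

    # Abstract, wherever it sits in the front matter.
    abstract = root.find(".//abstract")
    if abstract is not None:
        sections["abstract"] = text_of(abstract)

    # Each <sec> is keyed by its lower-cased title, e.g. "methods".
    for sec in root.iter("sec"):
        title = sec.findtext("title", default="untitled").strip().lower()
        sections[title] = " ".join(text_of(p) for p in sec.findall("p"))

    # Reference count and figure captions, mirroring the diagram's outputs.
    sections["references"] = len(root.findall(".//ref-list/ref"))
    sections["figure_captions"] = [text_of(c) for c in root.iter("caption")]
    return sections
```

Keying sections by title keeps later plugins simple: a tagger can run over `sections["methods"]` only. The trade-off is that two sections with the same title overwrite each other, which a real normalizer would have to handle.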
4. Download all Open Access “Zika” papers from EuropePMC in 10 seconds
5. Downloaded all Open Access “Zika” papers from EuropePMC in 10 seconds
Final download screen
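The download shown in these slides can be approximated directly against the Europe PMC REST web service. This is a minimal sketch, not what getpapers does internally. It assumes the `search` endpoint with JSON output, the `OPEN_ACCESS:y` query filter, cursor-based paging via `cursorMark`/`nextCursorMark`, and the per-article `fullTextXML` endpoint. The helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://www.ebi.ac.uk/europepmc/webservices/rest"


def build_search_url(term, page_size=100, cursor="*"):
    """Build a search URL restricted to the Open Access subset."""
    params = {
        "query": f"{term} AND OPEN_ACCESS:y",
        "format": "json",
        "resultType": "lite",
        "pageSize": page_size,
        "cursorMark": cursor,  # "*" requests the first page
    }
    return f"{BASE}/search?" + urlencode(params)


def extract_pmcids(response):
    """Pull PMCIDs out of one parsed JSON page of search results."""
    results = response.get("resultList", {}).get("result", [])
    return [r["pmcid"] for r in results if "pmcid" in r]


def fulltext_xml_url(pmcid):
    """URL of the fulltext XML for one Open Access article."""
    return f"{BASE}/{pmcid}/fullTextXML"


def search_all(term, max_pages=5):
    """Page through search results (requires network access)."""
    cursor, ids = "*", []
    for _ in range(max_pages):
        with urlopen(build_search_url(term, cursor=cursor)) as resp:
            data = json.load(resp)
        page_ids = extract_pmcids(data)
        ids.extend(page_ids)
        next_cursor = data.get("nextCursorMark")
        # Stop when the service returns no new cursor or an empty page.
        if not page_ids or not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
    return ids
```

Usage: `search_all("zika")` returns a list of PMCIDs, and each can then be fetched with `fulltext_xml_url(pmcid)`. The `max_pages` cap is an arbitrary safety limit for the sketch.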
13. Further directions
• Working with Hypothes.is to use ContentMine results to annotate literature.
• Working with Cambridge University Library to extract daily scientific facts from open and closed literature.
• Working with EBI.
• Running workshops and hackdays in bioscience and anything else you want.
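Annotating literature with mined facts needs a way to point at the exact passage where a fact was found. The sketch below builds a W3C Web Annotation with a `TextQuoteSelector` (exact text plus surrounding prefix and suffix) from one mining hit. It does not call the Hypothes.is API. The helper names and the 32-character context window are assumptions.

```python
def to_web_annotation(source_url, exact, prefix, suffix, body_text):
    """Build a W3C Web Annotation anchored by a text-quote selector."""
    return {
        "@context": "http://www.w3.org/ns/anno.jsonld",
        "type": "Annotation",
        "body": {"type": "TextualBody", "value": body_text, "format": "text/plain"},
        "target": {
            "source": source_url,
            "selector": {
                "type": "TextQuoteSelector",
                "exact": exact,
                "prefix": prefix,
                "suffix": suffix,
            },
        },
    }


def hit_to_annotation(source_url, text, start, end, label, context=32):
    """Turn a hit at text[start:end] into an annotation with surrounding context."""
    exact = text[start:end]
    prefix = text[max(0, start - context):start]
    suffix = text[end:end + context]
    return to_web_annotation(source_url, exact, prefix, suffix, label)
```

The prefix and suffix let a client re-find the quote even if the same words occur elsewhere in the paper. A longer context window is more robust but more likely to break if the page text changes.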
Editor's Notes
Hi, I’m here to talk about AMI, a data extraction framework and tool. First, I want to highlight some of the key contributors to the project: Andy, for his work on the ChemistryVisitor, and Peter, for the overall architecture.
In this talk, I’m going to stress the importance of data in a specific format and its value for automated machine processing. Then I’ll demonstrate AMI’s architecture and how data is transformed as it flows through the process. I’ll dwell a little on one core format, Scalable Vector Graphics (SVG), before introducing the concept of visitors: pluggable, context-specific data extractors. Next, I’ll introduce Andy’s ChemVisitor for extracting semantic chemistry data, along with a few other visitors that process non-chemistry data. Finally, I will demonstrate some uses of the ChemVisitor in validation and metabolism.
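The idea of visitors as pluggable, context-specific extractors over SVG can be shown with a small visitor-pattern sketch. It is not AMI's code or API. `Visitor`, `TextVisitor` and `LineVisitor` are hypothetical names. Each visitor walks the same SVG tree and reacts only to the element types it knows about.

```python
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


class Visitor:
    """Base visitor: dispatches each element to visit_<tag> if defined."""

    def visit(self, element):
        tag = element.tag.replace(SVG_NS, "")
        handler = getattr(self, "visit_" + tag, None)
        if handler is not None:
            handler(element)

    def walk(self, root):
        # Visit every element in document order.
        for element in root.iter():
            self.visit(element)
        return self


class TextVisitor(Visitor):
    """Collects the string content of <text> elements (e.g. atom labels)."""

    def __init__(self):
        self.strings = []

    def visit_text(self, element):
        self.strings.append("".join(element.itertext()).strip())


class LineVisitor(Visitor):
    """Collects <line> endpoints (e.g. candidate bonds or axes)."""

    def __init__(self):
        self.lines = []

    def visit_line(self, element):
        self.lines.append(tuple(float(element.get(k, "0")) for k in ("x1", "y1", "x2", "y2")))
```

Adding a new extractor then means writing a new subclass with the relevant `visit_*` methods, without touching the traversal code. That separation is what makes visitors pluggable.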