SlideShare uma empresa Scribd logo
1 de 15
Content Mine + Europe PubMedCentral
Peter Murray-Rust,
ContentMine.org and UniversityOfCambridge
Wellcome Trust, London, UK 2016-02-08
Getpapers[0] and AMI[1]download and analyze papers from
EuropePubMedCentral
[0][1] F/OSS tools from contentmine.org
Automated Semantic Fulltext
• EuropePMC provides coherent OpenAccess
• getpapers: wrapper for repos and search engines.
• AMI filters, checks[1], transforms facts in papers. Here:
– Sequences in text
– Species and genera
– Genes
– User dictionaries
– (RRIDs, chemistry, places, phylo)
[0] All operations shown run in total of <3 minutes.
[1] Dictionaries and lookup.
[2] Usable from home by anyone
catalogue
getpapers
query
Daily
Crawl
EPMC, arXiv
CORE , HAL,
(UNIV repos)
ToC
services
PDF HTML
DOC ePUB
TeX XML
PNG
EPS CSV
XLSURLs
DOIs
crawl
quickscrape
norma
Normalizer
Structurer
Semantic
Tagger
Text
Data
Figures
ami
UNIV
Repos
search
Lookup
CONTENT
MINING
Chem
Phylo
Trials
Crystal
Plants
COMMUNITY
plugins
Visualization
and Analysis
PloSONE, BMC,
peerJ… Nature, IEEE,
Elsevier…
Publisher Sites
scrapers
queries
taggers
abstract
methods
references
Captioned
Figures
Fig. 1
HTML tables
30, 000 pages/day
Semantic ScholarlyHTML
Facts
CONTENTMINE Complete OPEN Platform for Mining Scientific Literature
Download all Open Access “Zika” from
EuropePMC in 10 seconds (click below for movie)
Downloaded all Open Access “Zika” from
EuropePMC in 10 seconds
Final download screen
Eyeballing 20/120 Zika papers,
click below for movie
3011 virus
1939 Ae./Aedes
1212 dengue
901 mosquito/es
894 species
791 ZIKV
721 using
716 DENV
567 detection
513 aegypti
484 infection
442 RNA
428 protein
401 albopictus
360 viral
Commonest words in 120 Zika papers
Filtering local files for sequence and viruses
AMI (part of ContentMine software)
DNA Primers found in running text
Commonest species in 120 Zika papers
335 Ae. aegypti
205 Ae. albopictus
128 Aedes albopictus
88 Aedes aegypti
63 Ae. bromeliae
58 Ae. lilii
46 Ae. hensilli
42 Glossina pallidipes
40 Plasmodium vivax
35 Ae. luteocephalus
28 Ae. vittatus
25 Ae. furcifer
22 Plasmodium falciparum
21 Drosophila melanogaster
183 Wolbachia
70 Aedes
49 Flavivirus
30 Glossina
29 Crataegus
20 Flaviviridae
17 Culex
16 Drosophila
11 Ochlerotatus
10 Alphavirus
9 Musca
9 Phlebovirus
6 Plasmodium
Commonest genera in Zika papers
Example EMPC/ContentMine
• microcephaly 400/2400 papers; 2 mins;
commonest genes:
203 MCPH1
86 MECP2
54 SOX2
49 E2F1
47 SNAP29
40 IKBKG
40 NDE1
Further directions
• With Hypothes.is to use ContentMine results
to annotate literature.
• With Cambridge Univ Library extracting daily
scientific facts from open and closed
literature.
• Working with EBI.
• Running workshops, hackdays, in bio-science
and anything else you want.
ContentMine + EPMC: Finding Zika!
ContentMine + EPMC: Finding Zika!

Mais conteúdo relacionado

Mais procurados

Mais procurados (20)

Automatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literatureAutomatic Extraction of Knowledge from Biomedical literature
Automatic Extraction of Knowledge from Biomedical literature
 
Cochrane workshop2016
Cochrane workshop2016Cochrane workshop2016
Cochrane workshop2016
 
Text and Data Mining explained at FTDM
Text and Data Mining explained at FTDMText and Data Mining explained at FTDM
Text and Data Mining explained at FTDM
 
The culture of researchData
The culture of researchData The culture of researchData
The culture of researchData
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literature
 
Automatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the LiteratureAutomatic Extraction of Knowledge from the Literature
Automatic Extraction of Knowledge from the Literature
 
Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape? Digital Scholarship: Enlightenment or Devastated Landscape?
Digital Scholarship: Enlightenment or Devastated Landscape?
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literatureAmanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature
 
Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)Can Computers understand the scientific literature (includes compscie material)
Can Computers understand the scientific literature (includes compscie material)
 
Open software and knowledge for MIOSS
Open software and knowledge for MIOSS Open software and knowledge for MIOSS
Open software and knowledge for MIOSS
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literatureHigh throughput mining of the scholarly literature
High throughput mining of the scholarly literature
 
Automatic Extraction of Science and Medicine from the scholarly literature
Automatic Extraction of Science and  Medicine from the scholarly literatureAutomatic Extraction of Science and  Medicine from the scholarly literature
Automatic Extraction of Science and Medicine from the scholarly literature
 
Open software and knowledge for MIOSS
Open software and knowledge for MIOSSOpen software and knowledge for MIOSS
Open software and knowledge for MIOSS
 
Mining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistryMining the scientific literature for plants and chemistry
Mining the scientific literature for plants and chemistry
 
Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature Amanuens.is HUmans and machines annotating scholarly literature
Amanuens.is HUmans and machines annotating scholarly literature
 
High throughput mining of the scholarly literature
High throughput mining of the scholarly literature High throughput mining of the scholarly literature
High throughput mining of the scholarly literature
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trust
 
High throughput mining of the plant-science literature
High throughput mining of the plant-science literatureHigh throughput mining of the plant-science literature
High throughput mining of the plant-science literature
 
Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScienceScott Edmunds: Revolutionizing Data Dissemination: GigaScience
Scott Edmunds: Revolutionizing Data Dissemination: GigaScience
 
Museum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on themMuseum impact: linking-up specimens with research published on them
Museum impact: linking-up specimens with research published on them
 

Destaque

зно з іноземних мов 2011
зно з іноземних мов 2011зно з іноземних мов 2011
зно з іноземних мов 2011
Ruslana Shamanska
 

Destaque (14)

Making Theses USEFUL
Making Theses USEFULMaking Theses USEFUL
Making Theses USEFUL
 
OpenNotebookScience NOW!
OpenNotebookScience NOW!OpenNotebookScience NOW!
OpenNotebookScience NOW!
 
Mining Scientific Diagrams for facts
Mining Scientific Diagrams for facts Mining Scientific Diagrams for facts
Mining Scientific Diagrams for facts
 
ContentMine and WikiData
ContentMine and WikiDataContentMine and WikiData
ContentMine and WikiData
 
Open Data and Open Science
Open Data and Open ScienceOpen Data and Open Science
Open Data and Open Science
 
Mining Scientific Images
Mining Scientific ImagesMining Scientific Images
Mining Scientific Images
 
зно з іноземних мов 2011
зно з іноземних мов 2011зно з іноземних мов 2011
зно з іноземних мов 2011
 
Spiritual Meditative PPT Template
Spiritual Meditative PPT TemplateSpiritual Meditative PPT Template
Spiritual Meditative PPT Template
 
Pocket size quran english translation
Pocket size quran english translationPocket size quran english translation
Pocket size quran english translation
 
manual cobranza fiscalizacion municpalidad
manual cobranza fiscalizacion municpalidadmanual cobranza fiscalizacion municpalidad
manual cobranza fiscalizacion municpalidad
 
20120817155046521908
2012081715504652190820120817155046521908
20120817155046521908
 
Colegio nacional nicolas esguerra
Colegio nacional nicolas esguerraColegio nacional nicolas esguerra
Colegio nacional nicolas esguerra
 
Konsep Akal
Konsep AkalKonsep Akal
Konsep Akal
 
Mohamed MD Arabic
Mohamed MD ArabicMohamed MD Arabic
Mohamed MD Arabic
 

Semelhante a ContentMine + EPMC: Finding Zika!

Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Surya Saha
 
E-Utilities
E-UtilitiesE-Utilities
E-Utilities
mkim8
 

Semelhante a ContentMine + EPMC: Finding Zika! (20)

Standarization in Proteomics: From raw data to metadata files
Standarization in Proteomics: From raw data to metadata filesStandarization in Proteomics: From raw data to metadata files
Standarization in Proteomics: From raw data to metadata files
 
Mining facts from the plant science iterature
Mining facts from the plant science iteratureMining facts from the plant science iterature
Mining facts from the plant science iterature
 
ngs.pptx
ngs.pptxngs.pptx
ngs.pptx
 
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
Saha UC Davis Plant Pathology seminar Infrastructure for battling the Citrus ...
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biology
 
ContentMining for Synthetic Biology
ContentMining for Synthetic BiologyContentMining for Synthetic Biology
ContentMining for Synthetic Biology
 
BioSB meeting 2015
BioSB meeting 2015BioSB meeting 2015
BioSB meeting 2015
 
Yeast genome project
Yeast genome projectYeast genome project
Yeast genome project
 
Next Generation Sequencing - An Overview
Next Generation Sequencing - An OverviewNext Generation Sequencing - An Overview
Next Generation Sequencing - An Overview
 
Yeast Genome
Yeast Genome Yeast Genome
Yeast Genome
 
Genome sequencing and the development of our current information library
Genome sequencing and the development of our current information libraryGenome sequencing and the development of our current information library
Genome sequencing and the development of our current information library
 
Bio-banking and metagenomics platforms for pathogen discovery
Bio-banking and metagenomics platforms for pathogen discoveryBio-banking and metagenomics platforms for pathogen discovery
Bio-banking and metagenomics platforms for pathogen discovery
 
Microbial Phylogenomics (EVE161) Class 10-11: Genome Sequencing
Microbial Phylogenomics (EVE161) Class 10-11: Genome SequencingMicrobial Phylogenomics (EVE161) Class 10-11: Genome Sequencing
Microbial Phylogenomics (EVE161) Class 10-11: Genome Sequencing
 
Variant analysis and whole exome sequencing
Variant analysis and whole exome sequencingVariant analysis and whole exome sequencing
Variant analysis and whole exome sequencing
 
E-Utilities
E-UtilitiesE-Utilities
E-Utilities
 
NCBI API - Integration into analysis code
NCBI API - Integration into analysis codeNCBI API - Integration into analysis code
NCBI API - Integration into analysis code
 
Genome sequencingprojects
Genome sequencingprojectsGenome sequencingprojects
Genome sequencingprojects
 
2022-04-11 Opening X-omics festival 2022, Nijmegen, Alain van Gool
2022-04-11 Opening X-omics festival 2022, Nijmegen, Alain van Gool2022-04-11 Opening X-omics festival 2022, Nijmegen, Alain van Gool
2022-04-11 Opening X-omics festival 2022, Nijmegen, Alain van Gool
 
Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016Liberating facts from the scientific literature - Jisc Digifest 2016
Liberating facts from the scientific literature - Jisc Digifest 2016
 
BM405 Lecture Slides 21/11/2014 University of Strathclyde
BM405 Lecture Slides 21/11/2014 University of StrathclydeBM405 Lecture Slides 21/11/2014 University of Strathclyde
BM405 Lecture Slides 21/11/2014 University of Strathclyde
 

Mais de TheContentMine

Mais de TheContentMine (11)

Open Knowledge and University of Cambridge European Bioinformatics Institute
Open Knowledge and University of Cambridge European Bioinformatics InstituteOpen Knowledge and University of Cambridge European Bioinformatics Institute
Open Knowledge and University of Cambridge European Bioinformatics Institute
 
ContentMine: Open Data and Social Machines
ContentMine: Open Data and Social MachinesContentMine: Open Data and Social Machines
ContentMine: Open Data and Social Machines
 
Disruptive Communities and Technology
Disruptive Communities and TechnologyDisruptive Communities and Technology
Disruptive Communities and Technology
 
Embrace the Open Revolution
Embrace the Open RevolutionEmbrace the Open Revolution
Embrace the Open Revolution
 
Content Mining for Machines and Humans
Content Mining for Machines and HumansContent Mining for Machines and Humans
Content Mining for Machines and Humans
 
TheContentMine: Mining for Everyone
TheContentMine: Mining for EveryoneTheContentMine: Mining for Everyone
TheContentMine: Mining for Everyone
 
Overview of Practical Content Mining
Overview of Practical Content Mining Overview of Practical Content Mining
Overview of Practical Content Mining
 
Copyright Reform and Open Data
Copyright Reform and Open DataCopyright Reform and Open Data
Copyright Reform and Open Data
 
ContentMining and Clinical Trials
ContentMining and Clinical TrialsContentMining and Clinical Trials
ContentMining and Clinical Trials
 
Content Mining at Wellcome Trust
Content Mining at Wellcome TrustContent Mining at Wellcome Trust
Content Mining at Wellcome Trust
 
ContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and thesesContentMine: Liberating scholarship from Open publications and theses
ContentMine: Liberating scholarship from Open publications and theses
 

Último

Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...
Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...
Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...
Sheetaleventcompany
 
Call Girls in Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service Avai...
Call Girls in Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service Avai...Call Girls in Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service Avai...
Call Girls in Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service Avai...
adilkhan87451
 
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
chetankumar9855
 
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Último (20)

Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...
Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...
Dehradun Call Girls Service {8854095900} ❤️VVIP ROCKY Call Girl in Dehradun U...
 
Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...
Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...
Andheri East ^ (Genuine) Escort Service Mumbai ₹7.5k Pick Up & Drop With Cash...
 
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
8980367676 Call Girls In Ahmedabad Escort Service Available 24×7 In Ahmedabad
 
9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service
9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service
9630942363 Genuine Call Girls In Ahmedabad Gujarat Call Girls Service
 
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
💕SONAM KUMAR💕Premium Call Girls Jaipur ↘️9257276172 ↙️One Night Stand With Lo...
 
Call Girls in Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service Avai...
Call Girls in Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service Avai...Call Girls in Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service Avai...
Call Girls in Lucknow Just Call 👉👉7877925207 Top Class Call Girl Service Avai...
 
Call Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Coimbatore Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls Jaipur Just Call 9521753030 Top Class Call Girl Service Available
Call Girls Jaipur Just Call 9521753030 Top Class Call Girl Service AvailableCall Girls Jaipur Just Call 9521753030 Top Class Call Girl Service Available
Call Girls Jaipur Just Call 9521753030 Top Class Call Girl Service Available
 
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
Call Girl In Pune 👉 Just CALL ME: 9352988975 💋 Call Out Call Both With High p...
 
Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...
Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...
Coimbatore Call Girls in Coimbatore 7427069034 genuine Escort Service Girl 10...
 
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
VIP Hyderabad Call Girls Bahadurpally 7877925207 ₹5000 To 25K With AC Room 💚😋
 
Models Call Girls In Hyderabad 9630942363 Hyderabad Call Girl & Hyderabad Esc...
Models Call Girls In Hyderabad 9630942363 Hyderabad Call Girl & Hyderabad Esc...Models Call Girls In Hyderabad 9630942363 Hyderabad Call Girl & Hyderabad Esc...
Models Call Girls In Hyderabad 9630942363 Hyderabad Call Girl & Hyderabad Esc...
 
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...Russian Call Girls Service  Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
Russian Call Girls Service Jaipur {8445551418} ❤️PALLAVI VIP Jaipur Call Gir...
 
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
Mumbai ] (Call Girls) in Mumbai 10k @ I'm VIP Independent Escorts Girls 98333...
 
Kollam call girls Mallu aunty service 7877702510
Kollam call girls Mallu aunty service 7877702510Kollam call girls Mallu aunty service 7877702510
Kollam call girls Mallu aunty service 7877702510
 
Top Rated Hyderabad Call Girls Chintal ⟟ 9332606886 ⟟ Call Me For Genuine Se...
Top Rated  Hyderabad Call Girls Chintal ⟟ 9332606886 ⟟ Call Me For Genuine Se...Top Rated  Hyderabad Call Girls Chintal ⟟ 9332606886 ⟟ Call Me For Genuine Se...
Top Rated Hyderabad Call Girls Chintal ⟟ 9332606886 ⟟ Call Me For Genuine Se...
 
Call Girls Varanasi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Varanasi Just Call 8250077686 Top Class Call Girl Service AvailableCall Girls Varanasi Just Call 8250077686 Top Class Call Girl Service Available
Call Girls Varanasi Just Call 8250077686 Top Class Call Girl Service Available
 
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7Call Girls in Gagan Vihar (delhi) call me [🔝  9953056974 🔝] escort service 24X7
Call Girls in Gagan Vihar (delhi) call me [🔝 9953056974 🔝] escort service 24X7
 
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
Coimbatore Call Girls in Thudiyalur : 7427069034 High Profile Model Escorts |...
 
Call Girls Kolkata Kalikapur 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...
Call Girls Kolkata Kalikapur 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...Call Girls Kolkata Kalikapur 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...
Call Girls Kolkata Kalikapur 💯Call Us 🔝 8005736733 🔝 💃 Top Class Call Girl Se...
 

ContentMine + EPMC: Finding Zika!

  • 1. Content Mine + Europe PubMedCentral Peter Murray-Rust, ContentMine.org and UniversityOfCambridge Wellcome Trust, London, UK 2016-02-08 Getpapers[0] and AMI[1]download and analyze papers from EuropePubMedCentral [0][1] F/OSS tools from contentmine.org
  • 2. Automated Semantic Fulltext • EuropePMC provides coherent OpenAccess • getpapers: wrapper for repos and search engines. • AMI filters, checks[1], transforms facts in papers. Here: – Sequences in text – Species and genera – Genes – User dictionaries – (RRIDs, chemistry, places, phylo) [0] All operations shown run in total of <3 minutes. [1] Dictionaries and lookup. [2] Usable from home by anyone
  • 3. catalogue getpapers query Daily Crawl EPMC, arXiv CORE , HAL, (UNIV repos) ToC services PDF HTML DOC ePUB TeX XML PNG EPS CSV XLSURLs DOIs crawl quickscrape norma Normalizer Structurer Semantic Tagger Text Data Figures ami UNIV Repos search Lookup CONTENT MINING Chem Phylo Trials Crystal Plants COMMUNITY plugins Visualization and Analysis PloSONE, BMC, peerJ… Nature, IEEE, Elsevier… Publisher Sites scrapers queries taggers abstract methods references Captioned Figures Fig. 1 HTML tables 30, 000 pages/day Semantic ScholarlyHTML Facts CONTENTMINE Complete OPEN Platform for Mining Scientific Literature
  • 4. Download all Open Access “Zika” from EuropePMC in 10 seconds (click below for movie)
  • 5. Downloaded all Open Access “Zika” from EuropePMC in 10 seconds Final download screen
  • 6. Eyeballing 20/120 Zika papers, click below for movie
  • 7. 3011 virus 1939 Ae./Aedes 1212 dengue 901 mosquito/es 894 species 791 ZIKV 721 using 716 DENV 567 detection 513 aegypti 484 infection 442 RNA 428 protein 401 albopictus 360 viral Commonest words in 120 Zika papers
  • 8. Filtering local files for sequence and viruses AMI (part of ContentMine software)
  • 9. DNA Primers found in running text
  • 10. Commonest species in 120 Zika papers 335 Ae. aegypti 205 Ae. albopictus 128 Aedes albopictus 88 Aedes aegypti 63 Ae. bromeliae 58 Ae. lilii 46 Ae. hensilli 42 Glossina pallidipes 40 Plasmodium vivax 35 Ae. luteocephalus 28 Ae. vittatus 25 Ae. furcifer 22 Plasmodium falciparum 21 Drosophila melanogaster
  • 11. 183 Wolbachia 70 Aedes 49 Flavivirus 30 Glossina 29 Crataegus 20 Flaviviridae 17 Culex 16 Drosophila 11 Ochlerotatus 10 Alphavirus 9 Musca 9 Phlebovirus 6 Plasmodium Commonest genera in Zika papers
  • 12. Example EMPC/ContentMine • microcephaly 400/2400 papers; 2 mins; commonest genes: 203 MCPH1 86 MECP2 54 SOX2 49 E2F1 47 SNAP29 40 IKBKG 40 NDE1
  • 13. Further directions • With Hypothes.is to use ContentMine results to annotate literature. • With Cambridge Univ Library extracting daily scientific facts from open and closed literature. • Working with EBI. • Running workshops, hackdays, in bio-science and anything else you want.

Notas do Editor

  1. Hi, I’m here to talk about AMI; a data extraction framework and tool. First, I just want highlight some of key contributors to the projects; Andy for his work on the ChemistryVisitor and Peter for the overall architecture. In this talk, I’m going to impress the importance of data in a specific format and its utility to automated machine processing. Then I’m going to demonstrate AMI’s architecture and the transformation of data as it flows through the process. I’m going to dwell a little on a core format used, Scalable Vector Graphics (SVG) before introducing the concept of visitors, which are pluggable context specific data extractors. Next, I’m going to introduce Andy’s ChemVisitor, for extracting semantic chemistry data, along with a few other visitors that can process non-chemistry specific data. Finally, I will demonstrate some uses of the ChemVisitor, within the realm of validation and metabolism.