Literature-Data Integration in the Life Sciences
Lisbon, Oct 2nd 2012
Publications and Data Sources
Europe PubMed Central


26 million abstracts
2.3 million full text articles
Citation networks
Database links
Text-mining
Timeline: 2006, 2011, 2012, 2016?
How many open access articles in UKPMC?
[Figure: article counts by publication date. Series: PubMed (995K); UKPMC (18%, 182K); OA (9.6%, 96K).]

Total: 489,000 OA articles
 • Big data
 • Thematic data
 • Public data
 • Archived data
 • Two petabytes of data
 • Scales to 7 PB raw disk
 • Majority is DNA

Figure 2. Growth of key resources, 2001 to 2010. Panels: European Nucleotide Archive (nucleotides, millions); Ensembl and Ensembl Genomes (genomes); UniProt (entries); InterPro (entries); ArrayExpress (hybridisations); PDBe (structures).
Literature citation from data
              vs
Data referral from literature
PMC336623   Extended to several other biological data types
Literature citation from data
•   Proteins
•   Nucleotides
•   OMIM
•   Chemicals
•   Structure
•   Clinical reviews
•   Protein families
•   Protein-protein interactions
•   Gene expression experiments

[Chart values shown: 800 K, 370 K, 110 K]
Data referral from literature: text mining

Semantic Type   Unique Terms             Articles   Annotations
Accession No.         233,017             66,356        387,787
Chemical                76,712      1,694,385        83,923,066
Disease               171,692       1,768,214        57,821,871
Gene/Protein          227,318       1,310,382        77,189,022
GO Terms                32,664      1,832,294        65,061,579
Organism              180,637       1,713,280        70,832,222


                  2.3 million articles
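
The table can be read more easily as densities. The following is a minimal sketch that derives annotations per article and the share of the 2.3 million articles touched by each semantic type. It assumes that the "Articles" column counts articles with at least one annotation of that type, which the slide does not state; the names `TABLE`, `density` and `summarize` are illustrative.

```python
# Figures copied from the text-mining table:
# semantic type -> (unique terms, articles, annotations)
TABLE = {
    "Accession No.": (233_017, 66_356, 387_787),
    "Chemical": (76_712, 1_694_385, 83_923_066),
    "Disease": (171_692, 1_768_214, 57_821_871),
    "Gene/Protein": (227_318, 1_310_382, 77_189_022),
    "GO Terms": (32_664, 1_832_294, 65_061_579),
    "Organism": (180_637, 1_713_280, 70_832_222),
}

# "2.3 million articles" is a rounded figure on the slide.
TOTAL_ARTICLES = 2_300_000


def density(stats):
    """Return (annotations per annotated article, share of all articles).

    Assumes the 'Articles' column counts articles carrying at least one
    annotation of the given type.
    """
    _terms, articles, annotations = stats
    return annotations / articles, articles / TOTAL_ARTICLES


def summarize(table=TABLE):
    """Compute the two derived figures for every semantic type."""
    return {name: density(stats) for name, stats in table.items()}


if __name__ == "__main__":
    for name, (per_article, share) in summarize().items():
        print(f"{name:<14} {per_article:8.1f} annotations/article "
              f"{share:6.1%} of articles")
```

Under that assumption, accession numbers stand out as the rare type: they appear in a small share of articles and only a handful of times per article, whereas the other types occur in most articles.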
Annotation of accession numbers (OA)
[Figure: two panels, each on a 0 to 100 scale. Publisher-annotated: ~10,000 articles. Text-mined: >25,000 articles.]

      BMC Genomics:   1,484 TM tagged,   4,337 articles (1135 tagged)
      PLoS One:       4,226 TM tagged,   42,888 articles


                                                             Senay Kafkas and Jee-Hyub Kim
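
To make the idea of text-mined accession numbers concrete, here is a minimal sketch of pattern-based tagging. It is not the pipeline used for these slides: the regular expression covers only a simplified GenBank-style nucleotide format (one letter plus five digits, or two letters plus six digits, such as AY387398), and `find_accessions` and `tagged_fraction` are hypothetical helper names.

```python
import re

# Simplified GenBank-style nucleotide accession: one letter + five digits,
# or two letters + six digits (e.g. AY387398). Illustrative only; real
# resources use many more formats.
ACCESSION_RE = re.compile(r"\b(?:[A-Z]\d{5}|[A-Z]{2}\d{6})\b")


def find_accessions(text):
    """Return candidate accession numbers in order of first appearance."""
    seen = []
    for match in ACCESSION_RE.finditer(text):
        if match.group() not in seen:
            seen.append(match.group())
    return seen


def tagged_fraction(articles):
    """Share of articles (id -> full text) with at least one candidate."""
    if not articles:
        return 0.0
    hits = sum(1 for text in articles.values() if find_accessions(text))
    return hits / len(articles)
```

A pattern alone will also match strings that are not accessions, such as grant or catalogue numbers, so a real system would need context checks or validation against the source database.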
Why is this important? Implications
Scientific:
    Linking articles that cite the same data
Citation:
    Data Citation as measure of impact (Thomson: Data citation index)
    Context of data citation: submission, reuse, analysis
Operational:
    Services for publishers to improve Accession number tagging
    Editorial policies and adherence
    Extension of NLM DTD
    Lessons learned for considering unstructured data

 That we can perform this analysis at all highlights a benefit of Open Access
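
The first scientific implication, linking articles that cite the same data, amounts to inverting an article-to-accession mapping. The sketch below shows one way to do that; the function name and the article IDs in the usage note are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def link_by_shared_accession(article_accessions):
    """Map each pair of article IDs to the accessions both articles cite.

    `article_accessions` maps an article ID to the set of accession
    numbers found in that article.
    """
    # Invert the mapping: accession -> articles that cite it.
    by_accession = defaultdict(set)
    for article_id, accessions in article_accessions.items():
        for accession in accessions:
            by_accession[accession].add(article_id)

    # Every pair of articles under the same accession is linked.
    links = defaultdict(set)
    for accession, article_ids in by_accession.items():
        for pair in combinations(sorted(article_ids), 2):
            links[pair].add(accession)
    return dict(links)
```

For example, with the hypothetical input `{"PMC1": {"AY387398", "U12345"}, "PMC2": {"AY387398"}, "PMC3": {"X00001"}}`, only the pair `("PMC1", "PMC2")` is linked, through `AY387398`.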
Case Study of an FP7-funded article (1)
Case Study of an FP7-funded article (2)
Europe PubMed Central content map


[Diagram nodes: Abstract; Full text; Citing articles; Unstructured datasets; Databases; Extracted terms]
AY387398: needle in a haystack
Europe PubMed Central and Institutional Repositories: content matching
[Figure: number of article IDs. Label: OpenAIRE plus]



      **Coming soon: RESTful interface for data linked to articles
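
The slides do not describe the announced interface, so the following is only a sketch of how a client might build a request for articles mentioning an accession number. It assumes the present-day Europe PMC REST search endpoint and its `query`, `format` and `pageSize` parameters, which are not taken from this presentation; `build_search_url` is a hypothetical helper, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumption: the present-day Europe PMC search endpoint, not something
# specified in the slides.
BASE_URL = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_search_url(query, page_size=25):
    """Build a search URL that asks for JSON results for `query`."""
    params = {"query": query, "format": "json", "pageSize": page_size}
    return f"{BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Free-text search for the accession number used as an example
    # elsewhere in the deck.
    print(build_search_url("AY387398", page_size=5))
```

Keeping URL construction separate from the network call makes the request easy to inspect and test before pointing it at a live service.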
People
•   Paula Buttery
•   Andrew Caines
•   Norman Cobley
•   Yuci Gou
•   Senay Kafkas
•   Jyothi Katuri
•   Oliver Kilian
•   Jee-Hyub Kim
•   Nikos Marinos
•   Jo McEntyre
•   Xingjun Pi
•   Philip Rossiter

•   Rebholz Group
•   Peter Stoehr
•   University of Manchester
•   British Library
•   OpenAIRE/OpenAIRE Plus
•   NCBI, NLM

Manulife - Insurer Transformation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Access to open data through open access articles in the life sciences

  • 1. Literature-Data Integration in the Life Sciences Lisbon, Oct 2nd 2012
  • 3. Europe PubMed Central: 26 million abstracts; 2.3 million full text articles; citation networks, database links, text-mining. Timeline: 2006, 2011, 2012, 2016?
  • 4. How many open access articles in UKPMC? PubMed (995K); UKPMC (18%, 182K); OA (9.6%, 96K). [Chart by publication date.] Total: 489,000 OA articles
  • 5. Big data; thematic data; public data; archived data. Two petabytes of data; scales to 7 PB of raw disk; majority is DNA. Figure 2. Growth of key resources, by year, 2001 to 2010. Panels: European Nucleotide Archive (nucleotides, millions); Ensembl and Ensembl Genomes (genomes); UniProt (entries); InterPro (entries); ArrayExpress (hybridisations); PDBe (structures).
  • 6. Literature citation from data vs. data referral from literature
  • 7. PMC336623 Extended to several other biological data types
  • 8. Literature citation from data 800 K • Proteins • Nucleotides • OMIM • Chemicals • Structure • Clinical reviews 370 K • Protein families • Protein-protein interactions • Gene expression experiments 110 K
  • 9. Data referral from literature: text mining (2.3 million articles)
      Semantic Type | Unique Terms | Articles  | Annotations
      Accession No. | 233,017      | 66,356    | 387,787
      Chemical      | 76,712       | 1,694,385 | 83,923,066
      Disease       | 171,692      | 1,768,214 | 57,821,871
      Gene/Protein  | 227,318      | 1,310,382 | 77,189,022
      GO Terms      | 32,664       | 1,832,294 | 65,061,579
      Organism      | 180,637      | 1,713,280 | 70,832,222
  • 10. Annotation of accession numbers (OA). [Bar charts, scale 0 to 100, comparing publisher-annotated and text-mined accession numbers; panels labelled ~10,000 articles and >25,000 articles.] BMC Genomics: 1,484 TM tagged, 4,337 articles (1,135 tagged). PLoS One: 4,226 TM tagged, 42,888 articles. Senay Kafkas and Jee-Hyub Kim
  • 11. Why is this important? Implications. Scientific: linking articles that cite the same data. Citation: data citation as a measure of impact (Thomson: Data Citation Index). Context of data citation: submission, reuse, analysis. Operational: services for publishers to improve accession number tagging; editorial policies and adherence; extension of NLM DTD; lessons learned for considering unstructured data. That we can perform this analysis at all highlights a benefit of Open Access.
  • 12. Case Study of an FP7-funded article (1)
  • 13. Case Study of an FP7-funded article (2)
  • 14. Europe PubMed Central content map: Abstract, Full text, Citing articles, Unstructured Datasets, Databases, Extracted terms, Citing articles
  • 15. AY387398: needle in a haystack
  • 16. Europe PubMed Central and Institutional Repositories: content matching. Number of article IDs. OpenAIRE plus. Coming soon: RESTful interface for data linked to articles
  • 17. People • Paula Buttery • Rebholz Group • Andrew Caines • Peter Stoehr • Norman Cobley • Yuci Gou • University of Manchester • Senay Kafkas • British Library • Jyothi Katuri • Oliver Kilian • OpenAIRE/OpenAIRE Plus • Jee-Hyub Kim • Nikos Marinos • NCBI, NLM • Jo McEntyre • Xingjun Pi • Philip Rossiter
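
The text-mined accession numbers on slides 9, 10 and 15 (for example AY387398) can be illustrated with a small pattern-matching sketch. This is not the pipeline used for Europe PubMed Central, which the slides do not describe. It is a minimal illustration that assumes only the classic nucleotide accession shapes (one letter plus five digits, or two letters plus six digits). The names `ACCESSION_RE` and `find_accessions` are hypothetical.

```python
import re

# Assumption: only the classic nucleotide accession shapes are covered
# (1 letter + 5 digits, or 2 letters + 6 digits). Real pipelines handle
# many more identifier types and use context to reject false positives.
ACCESSION_RE = re.compile(r"\b(?:[A-Z]\d{5}|[A-Z]{2}\d{6})\b")


def find_accessions(text):
    """Return candidate accession numbers in order of first appearance."""
    found = []
    for match in ACCESSION_RE.finditer(text):
        accession = match.group(0)
        if accession not in found:  # drop repeats, keep first-seen order
            found.append(accession)
    return found


if __name__ == "__main__":
    sample = (
        "The sequence was deposited under AY387398 (version AY387398.1); "
        "see also U12345. The article itself is PMC336623."
    )
    print(find_accessions(sample))
```

A bare pattern like this is the "needle in a haystack" problem in miniature: it ignores article identifiers such as PMC336623 because of their shape, but it cannot tell a genuine accession number from a catalogue or grant number of the same shape without surrounding context.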