CORE: Aggregating and Enriching
Content to Support Open Access
            Petr Knoth
        The Open University




              1/52
Outline
1. Aggregating Open Access (OA) publications – why, how, what
   for?
2. The CORE system
3. Supporting research in mining databases of scientific
   publications (DiggiCORE)




                              2/52
Growth of items in Open Access repositories




                         4/52
Growth of Open Access repositories




                         5/52
Growth of articles in OA journals




                           6/52
Growth of OA journals




                        7/52
Green Open Access - statistics




                       8/52
Why do we need aggregations?
“Each individual repository is of limited value for research: the real
power of Open Access lies in the possibility of connecting and tying
together repositories, which is why we need interoperability. In
order to create a seamless layer of content through connected
repositories from around the world, Open Access relies on
interoperability, the ability for systems to communicate with each
other and pass information back and forth in a usable format.
Interoperability allows us to exploit today's computational power so
that we can aggregate, data mine, create new tools and
services, and generate new knowledge from repository content.”
                                                   [COAR manifesto]


                                9/52
Access to information according to the level of abstraction

[Figure: repositories feed a Metadata Transfer Interoperability layer, followed by Metadata and Content, Semantic Enrichment (Aggregation), OLTP and OLAP, and Interfaces; the interfaces serve raw data access, transaction information access and analytical information access.]


                                                                       10/52
Who should be supported by aggregations?

The following user groups (divided according to the level of
abstraction of information they need):
   •   Raw data access.
   •   Transaction information access.
   •   Analytical information access.




                                    11/52
Who should be supported by aggregations?

• The following user groups (divided according to the level of
  abstraction of information they need):
   •   Raw data access. Developers, digital libraries (DLs), DL researchers, companies …
   •   Transaction information access. Researchers, students, life-long learners …
   •   Analytical information access. Funders, government, business intelligence …




                                     12/52
Layers of an aggregation system

Layers, from top to bottom:
• Interfaces
• OLTP and OLAP
• Enrichment
• Metadata and Content
• Metadata Transfer Interoperability




                                        13/52
Layers of an aggregation system

Layers, from top to bottom, with examples:
• Interfaces: APIs (REST, SOAP, XML-RPC), UIs, dashboards
• OLTP and OLAP
• Enrichment
• Metadata (Dublin Core, XML, RDF …) and Content (PDF, Word …)
• Metadata Transfer Interoperability: OAI-PMH, OAI-ORE …

Further labels on the slide: Statistics, Catalog records, Annotations


                                        14/52
Access to information according to the level of abstraction

[Figure: aggregation layer diagram: repositories; Metadata Transfer Interoperability; Metadata and Content; Enrichment; OLTP and OLAP; Interfaces; raw data, transaction and analytical information access.]


                                                          15/52
Related systems




     16/52
Aggregation projects – BASE

[Figure: aggregation layer diagram: repositories; Metadata Transfer Interoperability; Metadata and Content; Enrichment; OLTP and OLAP; Interfaces; raw data, transaction and analytical information access.]


                                                       17/52
Aggregation projects – OAIster/WorldCat

[Figure: aggregation layer diagram: repositories; Metadata Transfer Interoperability; Metadata and Content; Enrichment; OLTP and OLAP; Interfaces; raw data, transaction and analytical information access.]


                                                       18/52
Aggregation projects – RepUK

[Figure: aggregation layer diagram: repositories; Metadata Transfer Interoperability; Metadata and Content; Enrichment; OLTP and OLAP; Interfaces; raw data, transaction and analytical information access.]


                                                       19/52
Aggregations need access to content, not just metadata!

• Certain metadata types can be created only at the level of the
  aggregation
• Certain metadata can change over time
• Ensuring content:
   • accessibility
   • availability
   • validity
   • quality
   • …



                               20/52
Aggregation projects – CiteSeerX

[Figure: aggregation layer diagram: repositories; Metadata Transfer Interoperability; Metadata and Content; Enrichment; OLTP and OLAP; Interfaces; raw data, transaction and analytical information access.]


                                                       21/52
Should an aggregation system support all three user types?

            Can be realised by more than one system,
               provided that the dataset is the same!




                             22/52
Outline
1. Aggregating Open Access (OA) publications – why, how, what
   for?
2. The CORE system
3. Supporting research in mining databases of scientific
   publications (DiggiCORE)




                              23/52
CORE objectives
• CORE aims to provide a comprehensive technical infrastructure
  for Open Access scholarly publications that will support access
  and reuse of scholarly materials at different levels of abstraction.
• A nation-wide aggregation system that will improve the discovery
  of publications stored in British Open Access Repositories (OARs).




                                24/52
What does CORE provide at different aggregation levels?

[Figure: aggregation layer diagram: repositories; Metadata Transfer Interoperability; Metadata and Content; Enrichment; OLTP and OLAP; Interfaces; raw data, transaction and analytical information access.]


                                                         25/52
CORE functionality




                     26/52
CORE functionality
Step 1: Metadata and full-text harvesting



                       Content harvesting, processing
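
A minimal sketch of the metadata side of this step, assuming repositories expose Dublin Core records over OAI-PMH (the protocol and format named on the layers slide). The sample response, the identifiers and the `parse_records` helper are hypothetical, and treating a `dc:identifier` ending in `.pdf` as a full-text link is a simplifying assumption, not a description of how CORE locates full texts.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical ListRecords response; identifiers and URLs are made up.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repo.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:identifier>http://repo.example.org/1/paper.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <record>
      <header><identifier>oai:repo.example.org:2</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Metadata-only record</dc:title>
          <dc:identifier>http://repo.example.org/2</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Turn an OAI-PMH ListRecords response into a list of plain dicts."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        oai_id = rec.findtext("oai:header/oai:identifier", namespaces=NS)
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:  # e.g. deleted records carry no metadata
            continue
        links = [e.text for e in dc.findall("dc:identifier", NS) if e.text]
        records.append({
            "id": oai_id,
            "title": dc.findtext("dc:title", default="", namespaces=NS),
            "creators": [e.text for e in dc.findall("dc:creator", NS)],
            # Assumption: a link ending in .pdf points at the full text.
            "fulltext_urls": [u for u in links if u.lower().endswith(".pdf")],
        })
    return records


records = parse_records(SAMPLE_RESPONSE)
for r in records:
    print(r["id"], r["title"], r["fulltext_urls"])
```

Records without a full-text link stay in the result, so metadata-only items can be told apart from those whose content can be harvested.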




                                    27/52
What does CORE provide at different aggregation levels?

[Figure: aggregation layer diagram: repositories; Metadata Transfer Interoperability; Metadata and Content; Enrichment; OLTP and OLAP; Interfaces; raw data, transaction and analytical information access. Annotation on the slide: Semantic similarity, citation extraction, classification, …]


                                                         28/52
CORE functionality
Step 2: Semantic enrichment




                                      Semantic enrichment
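
One way semantic similarity between papers could be computed is TF-IDF weighting with cosine similarity. The sketch below is an illustration under that assumption only; the slides do not say which method CORE uses, and the tokenizer, the smoothed IDF formula and the toy documents are all hypothetical choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        # Smoothed IDF keeps every weight strictly positive.
        vectors.append({
            t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy "full texts"; real input would be text extracted from harvested PDFs.
docs = [
    "Harvesting metadata from open access repositories",
    "Open access repositories and metadata harvesting",
    "Citation analysis of physics journals",
]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]))  # high: the texts share most terms
print(cosine(vectors[0], vectors[2]))  # zero: the texts share no terms
```

Ranking all other documents by this score for a given paper would give a simple list of related items.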




                              29/52
What does CORE provide at different aggregation levels?

[Figure: aggregation layer diagram: repositories; Metadata Transfer Interoperability; Metadata and Content; Enrichment; OLTP and OLAP; Interfaces; raw data, transaction and analytical information access.]


                                                         30/52
CORE functionality
Step 3: Providing a set of services on top of the aggregation




                        Providing services




                                    31/52
CORE applications

 •   CORE Portal
 •   CORE Mobile
 •   CORE Plugin
 •   CORE API
 •   Repository Analytics




                            32/52
What does CORE provide at different aggregation levels?

[Figure: aggregation layer diagram: repositories; Metadata Transfer Interoperability; Metadata and Content; Enrichment; OLTP and OLAP; Interfaces; raw data, transaction and analytical information access.]


                                                         33/52
CORE Applications
CORE Portal – Allows searching and navigating scientific publications
aggregated from Open Access repositories




                                   34/52
CORE Applications

CORE Mobile – Allows searching and
navigating scientific publications
aggregated from Open Access
repositories




                                35/52
CORE Applications
CORE Plugin – A plugin for external systems that provides
recommendations for related items.




                                 36/52
What does CORE provide at different aggregation levels?

[Figure: aggregation layer diagram: repositories; Metadata Transfer Interoperability; Metadata and Content; Enrichment; OLTP and OLAP; Interfaces; raw data, transaction and analytical information access.]


                                                         37/52
CORE Applications
CORE API – Enables external systems and services to interact with the
CORE repository.




                                  38/52
What does CORE provide at different aggregation levels?

[Figure: aggregation layer diagram: repositories; Metadata Transfer Interoperability; Metadata and Content; Enrichment; OLTP and OLAP; Interfaces; raw data, transaction and analytical information access.]


                                                         39/52
CORE Applications
Repository Analytics – An analytical tool supporting providers of
Open Access content (in particular repository managers).




                                   40/52
What does CORE provide at different aggregation levels?

[Figure: aggregation layer diagram: repositories; Metadata Transfer Interoperability; Metadata and Content; Enrichment; OLTP and OLAP; Interfaces. CORE applications are placed next to the access levels: Repository Analytics at analytical information access; CORE Portal, CORE Mobile and CORE Plugin at transaction information access; CORE API at raw data access.]


                                                         41/52
CORE statistics
• Content
   • 5.4M records
   • 192 repositories
   • 402k full-texts
• Started: February 2011
• Budget: £140k




                           42/52
Outline
1. Aggregating Open Access (OA) publications – why, how, what
   for?
2. The CORE system
3. Supporting research in mining databases of scientific
   publications (DiggiCORE)




                              43/52
Partners




Advisory Board



                 44/52
Objective


Software for exploration and analysis of very large and
fast-growing amounts of research publications stored
across Open Access Repositories (OAR).




                           45/52
DiggiCORE networks




Three networks: (a) semantically related papers,
(b) citation network, (c) author citation network
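
To illustrate how network (c) relates to network (b), here is a small sketch that derives an author citation network from a paper citation network. The toy papers, the author names and the `author_citation_network` helper are hypothetical; the slides do not describe how DiggiCORE builds its networks.

```python
from collections import defaultdict

# Hypothetical toy data: authors per paper, and (citing, cited) paper pairs.
paper_authors = {
    "p1": ["Alice", "Bob"],
    "p2": ["Carol"],
    "p3": ["Alice"],
}
paper_citations = [("p2", "p1"), ("p3", "p1"), ("p3", "p2")]


def author_citation_network(paper_authors, paper_citations):
    """Count, for each (citing author, cited author) pair, the citing papers.

    Every author of the citing paper gets an edge to every author of the
    cited paper. Self-citations are kept as edges from an author to
    themselves.
    """
    weights = defaultdict(int)
    for citing, cited in paper_citations:
        for a in paper_authors.get(citing, []):
            for b in paper_authors.get(cited, []):
                weights[(a, b)] += 1
    return dict(weights)


network = author_citation_network(paper_authors, paper_citations)
for (source, target), weight in sorted(network.items()):
    print(f"{source} -> {target}: {weight}")
```

Edge weights count citing papers, so an author who cites the same person in several papers gets a heavier edge.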


                          46/52
DiggiCORE objectives

Allow researchers to use this platform to analyse
publications.
Why?
•   To identify patterns in the behaviour of research
    communities
•   To detect trends in research disciplines
•   To gain new insights into the citation behaviour of researchers
•   To discover features that distinguish papers with high impact



                               47/52
Questions the system can help answer
•   What are the attributes of high-impact publications?
•   Do these attributes differ in the humanities, social sciences and
    computer sciences?
•   What are the features of research groups within disciplines and
    how do these features relate to contributions generated by the
    group?
•   What are the attributes of high-impact authors and what is their
    role within the group?
•   What are the dynamics of successful research groups?



                                48/52
Questions the system can help answer
•   What is the mechanism of cross-fertilisation within
    disciplines, especially between the humanities and the
    sciences?
•   Who are the authors whose work is worth monitoring because
    they contribute to the achievements of their own discipline and
    also inspire other disciplines?
•   How should the novice in the discipline get acquainted with key
    achievements in the discipline?
•    How should he/she search for the most important publications?



                               49/52
Summary
•   The rapid growth of OA content is both an opportunity and a
    challenge.
•   Aggregations should serve the needs of different user groups.
•   Aggregations need to aggregate content, not just metadata.
•   We can have many services that are part of the
    infrastructure, but they should all work with the same data.




                               50/52
Thank you!




Yes we can!
   51/52
52/52

Semantometrics: Towards Fulltext-based Research Evaluationpetrknoth
 
Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...Aggregating Research papers from Publishers' Systems to Support Text and Data...
Aggregating Research papers from Publishers' Systems to Support Text and Data...petrknoth
 
My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?My repository is being aggregated: a blessing or a curse?
My repository is being aggregated: a blessing or a curse?petrknoth
 
FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)FOSTER - Content Delivery (WP3)
FOSTER - Content Delivery (WP3)petrknoth
 

CORE: Aggregating and Enriching Content to Support Open Access

  • 1. CORE: Aggregating and Enriching Content to Support Open Access Petr Knoth The Open University 1/52
  • 2. Outline 1. Aggregating Open Access (OA) publications – why, how, what for? 2. The CORE system 3. Supporting research in mining databases of scientific publications (DiggiCORE) 2/52
  • 3. Outline 1. Aggregating Open Access (OA) publications – why, how, what for? 2. The CORE system 3. Supporting research in mining databases of scientific publications (DiggiCORE) 3/52
  • 4. Growth of items in Open Access repositories 4/52
  • 5. Growth of Open Access repositories 5/52
  • 6. Growth of articles in OA journals 6/52
  • 7. Growth of OA journals 7/52
  • 8. Green Open Access - statistics 8/52
  • 9. Why do we need aggregations? “Each individual repository is of limited value for research: the real power of Open Access lies in the possibility of connecting and tying together repositories, which is why we need interoperability. In order to create a seamless layer of content through connected repositories from around the world, Open Access relies on interoperability, the ability for systems to communicate with each other and pass information back and forth in a usable format. Interoperability allows us to exploit today's computational power so that we can aggregate, data mine, create new tools and services, and generate new knowledge from repository content.” [COAR manifesto] 9/52
  • 10. Access to information according to the level of abstraction [Diagram: Repository; Aggregation (Metadata Transfer Interoperability; Metadata, Content; Semantic Enrichment; OLTP, OLAP; Interfaces); three access levels: raw data access, transaction information access, analytical information access.] 10/52
  • 11. Who should be supported by aggregations? The following user groups (divided according to the level of abstraction of the information they need): • Raw data access. • Transaction information access. • Analytical information access. 11/52
  • 12. Who should be supported by aggregations? • The following user groups (divided according to the level of abstraction of the information they need): • Raw data access. Developers, DLs, DL researchers, companies … • Transaction information access. Researchers, students, life-long learners … • Analytical information access. Funders, government, business intelligence … 12/52
  • 13. Layers of an aggregation system: Interfaces; OLTP, OLAP; Enrichment; Metadata, Content; Metadata Transfer Interoperability 13/52
  • 14. Layers of an aggregation system, with examples: Interfaces (APIs (REST, SOAP, XML-RPC), UIs, dashboards); OLTP, OLAP (catalog records, statistics); Enrichment (annotations); Metadata (Dublin Core, XML, RDF …); Content (PDF, Word …); Metadata Transfer Interoperability (OAI-PMH, OAI-ORE …) 14/52
  • 15. Access to information according to the level of abstraction [Diagram: Repository; Metadata Transfer Interoperability; Metadata, Content; Enrichment; OLTP, OLAP; Interfaces; three access levels: raw data access, transaction information access, analytical information access.] 15/52
  • 17. Aggregation projects – BASE [Diagram: aggregation layers and information access levels.] 17/52
  • 18. Aggregation projects – OAIster/WorldCat [Diagram: aggregation layers and information access levels.] 18/52
  • 19. Aggregation projects – RepUK [Diagram: aggregation layers and information access levels.] 19/52
  • 20. Aggregations need access to content, not just metadata! • Certain metadata types can be created only at the level of the aggregation • Certain metadata can change over time • Ensuring content: • accessibility • availability • validity • quality • … 20/52
  • 21. Aggregation projects – CiteSeerX [Diagram: aggregation layers and information access levels.] 21/52
  • 22. Should an aggregation system support all three user types? This can be realised by more than one system, provided that the dataset is the same! 22/52
  • 23. Outline 1. Aggregating Open Access (OA) publications – why, how, what for? 2. The CORE system 3. Supporting research in mining databases of scientific publications (DiggiCORE) 23/52
  • 24. CORE objectives • CORE aims to provide a comprehensive technical infrastructure for Open Access scholarly publications that will support access and reuse of scholarly materials at different levels of abstraction. • A nation-wide aggregation system that will improve the discovery of publications stored in British Open Access Repositories (OARs). 24/52
  • 25. What does CORE provide at different aggregation levels? [Diagram: aggregation layers and information access levels.] 25/52
  • 27. CORE functionality Step 1: Metadata and full-text harvesting Content harvesting, processing 27/52
  • 28. What does CORE provide at different aggregation levels? Semantic similarity, citation extraction, classification, … [Diagram: aggregation layers and information access levels.] 28/52
  • 29. CORE functionality Step 2: Semantic enrichment Semantic enrichment 29/52
  • 30. What does CORE provide at different aggregation levels? [Diagram: aggregation layers and information access levels.] 30/52
  • 31. CORE functionality Step 3: Providing a set of services on top of the aggregation Providing services 31/52
  • 32. CORE applications • CORE Portal • CORE Mobile • CORE Plugin • CORE API • Repository Analytics 32/52
  • 33. What does CORE provide at different aggregation levels? [Diagram: aggregation layers and information access levels.] 33/52
  • 34. CORE Applications CORE Portal – Allows searching and navigating scientific publications aggregated from Open Access repositories 34/52
  • 35. CORE Applications CORE Mobile – Allows searching and navigating scientific publications aggregated from Open Access repositories 35/52
  • 36. CORE Applications CORE Plugin – A plugin for a system that provides recommendations for related items. 36/52
  • 37. What does CORE provide at different aggregation levels? [Diagram: aggregation layers and information access levels.] 37/52
  • 38. CORE Applications CORE API – Enables external systems and services to interact with the CORE repository. 38/52
  • 39. What does CORE provide at different aggregation levels? [Diagram: aggregation layers and information access levels.] 39/52
  • 40. CORE Applications Repository Analytics – An analytical tool supporting providers of open access content (in particular repository managers). 40/52
  • 41. What does CORE provide at different aggregation levels? • Analytical information access: Repository Analytics • Transaction information access: CORE Portal, CORE Mobile, CORE Plugin • Raw data access: CORE API 41/52
  • 42. CORE statistics • Content: 5.4M records, 192 repositories, 402k full-texts • Started: February 2011 • Budget: £140k 42/52
  • 43. Outline 1. Aggregating Open Access (OA) publications – why, how, what for? 2. The CORE system 3. Supporting research in mining databases of scientific publications (DiggiCORE) 43/52
  • 45. Objective Software for exploration and analysis of very large and fast-growing amounts of research publications stored across Open Access Repositories (OAR). 45/52
  • 46. DiggiCORE networks Three networks: (a) semantically related papers, (b) citation network, (c) author citation network 46/52
  • 47. DiggiCORE objectives Allow researchers to use this platform to analyse publications. Why? • To identify patterns in the behaviour of research communities • To detect trends in research disciplines • To gain new insights into the citation behaviour of researchers • To discover features that distinguish papers with high impact 47/52
  • 48. Questions the system can help answer • What are the attributes of high-impact publications? • Do these attributes differ in the humanities, social sciences and computer sciences? • What are the features of research groups within disciplines and how do these features relate to contributions generated by the group? • What are the attributes of high-impact authors and what is their role within the group? • What are the dynamics of successful research groups? 48/52
  • 49. Questions the system can help answer • What is the mechanism of cross-fertilisation within disciplines, especially between the humanities and the sciences? • Who are the authors whose work is worth monitoring because they contribute to the achievements of their own discipline and also inspire other disciplines? • How should a novice in a discipline get acquainted with its key achievements? • How should he/she search for the most important publications? 49/52
  • 50. Summary • The rapid growth of OA content provides both an opportunity and a challenge. • Aggregations should serve the needs of different user groups. • Aggregations need to aggregate content, not just metadata. • We can have many services that are part of the infrastructure, but they should work with the same data. 50/52
  • 51. Thank you! Yes we can! 51/52
  • 52. 52/52
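
The slides name OAI-PMH and Dublin Core as examples for the metadata transfer and metadata layers, and describe the first CORE step as metadata and full-text harvesting. As a rough illustration only, the sketch below parses a hand-written OAI-PMH ListRecords response and pulls out the Dublin Core fields an aggregator might store. The sample record, the example.org URLs and the function name parse_list_records are assumptions made for this example; they are not taken from CORE, and the slides do not describe how CORE's harvester is implemented.

```python
import xml.etree.ElementTree as ET

# Standard namespaces used by OAI-PMH 2.0 responses carrying oai_dc metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, hand-written response; a real harvester would fetch this
# from a repository's OAI-PMH endpoint (verb=ListRecords).
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
          <dc:creator>Roe, Richard</dc:creator>
          <dc:identifier>http://example.org/papers/1.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_list_records(xml_text):
    """Return one dict per record with its OAI identifier and Dublin Core fields."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        oai_id = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [e.text for e in rec.iterfind(".//dc:title", NS)]
        creators = [e.text for e in rec.iterfind(".//dc:creator", NS)]
        # dc:identifier values that look like URLs are only *candidate*
        # locations of the full text; they may point to a landing page instead.
        links = [
            e.text
            for e in rec.iterfind(".//dc:identifier", NS)
            if e.text and e.text.startswith("http")
        ]
        records.append(
            {"oai_id": oai_id, "titles": titles, "creators": creators, "links": links}
        )
    return records


records = parse_list_records(SAMPLE_RESPONSE)
for r in records:
    print(r["oai_id"], r["titles"], r["creators"], r["links"])
```

The metadata alone gives only a catalogue record. Following a candidate link and downloading the document is the step that would give an aggregation access to content, which is the point the summary slide makes.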