The BlogForever Project
http://blogforever.eu
Vangelis Banos,
BlogForever Project Manager

MTSR 2013, 22 Nov 2013, Thessaloniki

Contents
The Disappearing Web
Web Archiving
The BlogForever Project
BlogForever Applications

Web content disappears

Web Archiving

The Internet Archive comes to the rescue!

Web Archiving
The process of collecting portions of the
World Wide Web to ensure the information is
preserved in an archive for future researchers,
historians, and the public.

The challenge of web archiving

[Figure: Generic file archiving operation (File(s), Software, Hardware, RECORD)]

The challenge of web archiving
[Figure: Web archiving operation (a Website made of multiple File(s) and Software components, Hardware, Record(s), with open questions marked ???)]
We are focusing on blogs
• Blogs have become well established as a tool for online communication and web publishing.
• Hundreds of millions of blogs are published on every conceivable subject.
Examples (figures as of 12 September 2013):
• 70+ million sites in the world
• 369 million people viewing more than 11.8 billion pages each month
• 38 million new posts and 62.3 million new comments each month

• 136.5 million blogs
• 61 billion posts
• 83.7 million daily posts
Blog Archiving: Objectives & Concerns
• Blog characteristics:
  • Database-driven, dynamic websites
  • High frequency of updates
  • Special structure, metadata, semantics and communication protocols
  • Highly interconnected
  • Quantity and range of resources
  • Ownership and DRM

• Our aims:
  • Harvest, preserve, manage and reuse blogs and their resources.
The BlogForever Project
• Collaborative EC-funded project
• Duration: 1 March 2011 – 31 August 2013
• Aims: theoretical and applied research on blog archiving
• Coordinated by AUTH
• Partners:

BlogForever project achievements
BlogForever has created a novel blog archiving approach.
It is not only about archiving pages. It is about archiving information
entities (posts, comments, authors, metadata, dates, pingbacks, etc.).

• Blog modelling and semantics
• Preservation strategies
• Case studies and validation
• Implementation of the BlogForever platform
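The idea of archiving information entities rather than whole pages can be illustrated with a minimal Python sketch. It assumes a deliberately simplified model containing only authors, comments, posts and pingbacks; the class, field and function names (`Author`, `Comment`, `Post`, `comments_by`) are hypothetical and do not represent the actual BlogForever data model. Because each comment is stored as its own record instead of being buried in a page's HTML, a question such as "all comments written by one person" reduces to a simple filter.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Author:
    name: str


@dataclass
class Comment:
    author: Author
    text: str
    published: date


@dataclass
class Post:
    title: str
    url: str
    author: Author
    published: date
    comments: list[Comment] = field(default_factory=list)
    # URLs of external pages that sent a pingback to this post
    pingbacks: list[str] = field(default_factory=list)


def comments_by(posts: list[Post], author_name: str) -> list[Comment]:
    """Return every archived comment written by the named author, across all posts."""
    return [c for post in posts for c in post.comments if c.author.name == author_name]


if __name__ == "__main__":
    alice, bob = Author("Alice"), Author("Bob")
    archive = [
        Post("First post", "http://blog.example.org/1", alice, date(2013, 1, 10),
             comments=[Comment(bob, "Interesting!", date(2013, 1, 11))]),
        Post("Second post", "http://blog.example.org/2", alice, date(2013, 2, 5),
             comments=[Comment(bob, "I agree.", date(2013, 2, 6)),
                       Comment(alice, "Thanks!", date(2013, 2, 7))],
             pingbacks=["http://other.example.com/reply"]),
    ]
    print([c.text for c in comments_by(archive, "Bob")])  # ['Interesting!', 'I agree.']
```

A page-level archive would need to re-parse stored HTML to answer the same query; keeping entities separate is what makes this kind of reuse cheap.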

BlogForever project achievements
Harvesting
• Unstructured information, web services and blog APIs
• Blog crawlers:
  • Real-time monitoring
  • HTML data extraction engine
  • Spam filtering
  • Web services extraction engine
• Original data and XML metadata

Preserving, managing and reusing
• Web services and web interface
• Blog digital repository:
  • Digital preservation
  • Quality assurance
  • Collections curation
  • Public access APIs
  • Personalised services
  • Information retrieval
  • Public web interface: browse, search, export
BlogForever Added Value
• BlogForever structures the archived blog content. It is not only about archiving HTML pages; it is about archiving information entities (posts, comments, authors, metadata, dates, pingbacks, etc.) based on a specific data model.
• BlogForever is based on Invenio, an open-source, state-of-the-art digital library management system developed by CERN.

• Better metadata and higher information granularity.
• Open standards and interoperability (MARCXML, web services).
• Better management of archived information, increasing the utility of the web archive.
• Easier development of added-value services, e.g. analytics.
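MARCXML is named above as one of the open standards for archived records. As a rough illustration only, the sketch below serializes one blog post into a MARCXML record using Python's standard `xml.etree.ElementTree`. The tag mapping (100 $a for author, 245 $a for title, 260 $c for date, 856 $u for the URL) follows common MARC 21 usage but is an assumption, not the schema BlogForever actually uses; `post_to_marcxml` and `FIELD_MAP` are hypothetical names, and the leader and control fields are omitted for brevity.

```python
import xml.etree.ElementTree as ET

# MARCXML (MARC 21 slim) namespace.
MARC_NS = "http://www.loc.gov/MARC21/slim"
# Serialize the MARC namespace as the default namespace (this is a module-wide setting).
ET.register_namespace("", MARC_NS)

# Hypothetical mapping of blog-post fields to MARC 21 (tag, subfield code) pairs.
# Illustrative only; not BlogForever's actual data model.
FIELD_MAP = {
    "author": ("100", "a"),  # main entry, personal name
    "title": ("245", "a"),   # title statement
    "date": ("260", "c"),    # date of publication
    "url": ("856", "u"),     # electronic location (URI)
}


def post_to_marcxml(post: dict) -> str:
    """Serialize a blog post (a plain dict) into a minimal MARCXML <record>."""
    record = ET.Element(f"{{{MARC_NS}}}record")
    for key, (tag, code) in FIELD_MAP.items():
        value = post.get(key)
        if not value:
            continue  # skip fields the harvested post does not have
        # Attributes are passed as a dict: a keyword named "tag" would clash
        # with SubElement's own positional "tag" parameter.
        datafield = ET.SubElement(
            record, f"{{{MARC_NS}}}datafield", {"tag": tag, "ind1": " ", "ind2": " "}
        )
        subfield = ET.SubElement(datafield, f"{{{MARC_NS}}}subfield", {"code": code})
        subfield.text = value
    return ET.tostring(record, encoding="unicode")


if __name__ == "__main__":
    print(post_to_marcxml({
        "title": "Notes from MTSR 2013",
        "author": "Doe, Jane",
        "date": "2013-11-22",
        "url": "http://blog.example.org/2013/11/mtsr",
    }))
```

Fields missing from a harvested post are simply skipped, so a post without a known author still yields a valid, if smaller, record.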
BlogForever Impact
• Blog archiving methods and policies that are reusable and generic.
• A blog archiving solution that any institution could use to preserve its blog collections, ensuring authenticity, integrity, completeness, usability and long-term accessibility.
• A blog archiving solution that any researcher could use to gather, analyse and reuse blog data.
BlogForever Applications
• CERN is currently implementing a high-energy physics blog repository.
• AUTH is designing an academic blog repository.
• The Linguistics Department of the University of Hannover is conducting a diachronic analysis of selected linguistic and textual phenomena and features in German blogs.
• The University of Warwick Computer Science Department is performing social web analytics on blog data.
Thank you!
Visit http://blogforever.eu
• Access all BlogForever deliverables (open access).
• Download the open-source BlogForever platform.

Contact us:
• Project Manager: Vangelis Banos, vbanos@gmail.com
• Exploitation Manager: Efstratios Arampatzis, sa@tero.gr


Editor's Notes

  1. The key BlogForever project goals were fully achieved within the project's time span, through a series of theoretical and applied research tasks. Initially, BlogForever focused on studying weblog structure and semantics and began developing preservation strategies for weblogs. Later, the focus gradually moved to implementing the BlogForever platform, as well as to interoperability prospects and digital rights management strategies. An important aspect of the project was also the design and implementation of extensive case studies of varying complexity and size, to validate and test the BlogForever platform. BlogForever created an exciting new system to harvest, preserve and manage blog content, developing new insights through its restructuring and reuse. To this end, it stepped into as yet uncharted territory in the theoretical and practical aspects of blog preservation: it first researched blog structure and semantics; it then defined solid blog preservation policies and developed a robust blog preservation software platform; finally, it validated the platform through specific case studies using real-world data.
  2. After working on what to preserve and how to preserve it, we present how we implemented blog preservation.