Webarchiv
Czech web archive of the National Library of the Czech Republic
Curatorial approaches, topic collections and cooperation with the research
communities
Acquisition of web resources – curatorial approaches
● What to archive?
● What will be important in the future?
● User community?
● How to build archive content?
word cloud of all domains in Webarchiv
History
• 2000 – project of the National Library of the Czech Republic, the Moravian Library
and Masaryk University
• 2001 – first archived website
• 2005 – regular harvesting of content
• 2007 – joining the IIPC – International Internet Preservation Consortium
one of our oldest archival copies – website of Charles University
archived copy of the National Library of the Czech Republic website
Today
• 385 TB of archived data
• 9.5 billion digital objects (text, images, audio and video objects, software,
scripts, etc.)
• 3.5 people in the department + 1 IT guy
Legal Issues
• Copyright Act – the library licence allows a library to make a reproduction of a
work for its own archiving and conservation purposes
• Legal Deposit Act – does not cover born-digital documents
• Online access – based on a contract with the publisher or on a Creative Commons
licence
Software
• crawler: Heritrix 3.4
• access: OpenWayback 2.3.1
• curators: Seeder (developed in-house, available on GitHub: https://github.com/webarchivcz/)
Seeder – software for managing electronic resources, websites and harvests
Collection policy
• Comprehensive harvests
• Selective harvests
• Topic collections
Comprehensive harvests
• contract with the Czech domain provider CZ.NIC
• crawl of the whole .cz domain once or twice a year
• accessible only in the library
• 1.4 million second-level domains (domain.cz)
• maximum of 5000 harvested files per site
“Archived versions of this page are only available from the Reference Centre of the National Library
of the Czech Republic.”
Selective harvests
• selective approach
• Bohemical character (territory, language, authorship, topic/content), not only on the Czech domain
• resources with historical, scientific or cultural value
• curated resources
• online access – contract or Creative Commons
• more than 5000 archived websites with online access
• crawled periodically
• maximum of 15,000 harvested files per site
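The two harvest types above differ mainly in their per-site file cap and in how the result can be accessed. A minimal sketch of how such a policy could be modelled follows. The caps (5000 and 15 000 files) and the access rules come from the slides. The class, field and function names are hypothetical and do not reflect how Seeder or Heritrix are configured. Treating every comprehensive harvest as on-site only is a simplification of the slide's "accessible only in the library".

```python
from dataclasses import dataclass
from enum import Enum


class Access(Enum):
    ON_SITE_ONLY = "on-site only"
    ONLINE = "online"


@dataclass(frozen=True)
class HarvestPolicy:
    name: str
    max_files_per_site: int  # cap quoted on the slides
    default_access: Access


# Values taken from the slides; the names are illustrative only.
COMPREHENSIVE = HarvestPolicy("comprehensive", 5_000, Access.ON_SITE_ONLY)
SELECTIVE = HarvestPolicy("selective", 15_000, Access.ONLINE)


def within_budget(policy: HarvestPolicy, files_harvested: int) -> bool:
    """Return True while a site is still under the policy's file cap."""
    return files_harvested < policy.max_files_per_site


def access_for(policy: HarvestPolicy, has_contract: bool, is_cc_licensed: bool) -> Access:
    """Online access needs a publisher contract or a Creative Commons licence."""
    if policy.default_access is Access.ONLINE and (has_contract or is_cc_licensed):
        return Access.ONLINE
    return Access.ON_SITE_ONLY
```

For example, `within_budget(COMPREHENSIVE, 5000)` is false while `within_budget(SELECTIVE, 5000)` is still true, which mirrors the deeper capture allowed for curated resources.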
Curators – workflow
• selecting and evaluating resources
• contracting with publishers
• cataloging (RDA rules, conspectus method)
• access and quality assurance
Topic collections
collections of resources related to a certain event or topic
deeper capture of the topic in electronic resources
current events – harvesting usually in several stages: before, during and after the event
• planned: elections, anniversaries
• unexpected: floods, terrorist attacks
long-term collections – continuous harvesting
• Creative Commons, Periodical publications, Charles University
collaboration with IIPC (Olympics and Paralympics, Climate Change, Artificial Intelligence)
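The staged harvesting of current events described above (before, during and after the event) can be illustrated with a small helper that maps a crawl date to a stage. This is only a sketch. The 30-day lead and tail windows are assumptions, not values from the slides, and the function name is hypothetical.

```python
from datetime import date, timedelta
from typing import Optional


def harvest_stage(
    day: date,
    event_start: date,
    event_end: date,
    lead: timedelta = timedelta(days=30),  # assumed window before the event
    tail: timedelta = timedelta(days=30),  # assumed window after the event
) -> Optional[str]:
    """Return 'before', 'during' or 'after', or None outside all windows."""
    if event_start - lead <= day < event_start:
        return "before"
    if event_start <= day <= event_end:
        return "during"
    if event_end < day <= event_end + tail:
        return "after"
    return None
```

For unexpected events such as floods, the "before" stage is of course unavailable, so a curator would start directly in the "during" stage.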
IIPC - collaborative collection 2018 Winter Olympics and Paralympics
Challenges
● make the archive as accessible to the public as possible (legislative
restrictions)
● collection policy – collection profiling
● full-text search
● development of tools for working with archived data, big data and metadata
● cooperation with the research communities
Current cooperation with researchers / institutions
● methodological support for building their own archives
The National Archives, Czech Academy of Sciences, Office for supervision of economic affairs
of political parties and political movements
● topic collections
The National Archives – Public Authorities
● archiving specific resources
Czech Language Institute of the Czech Academy of Sciences (periodical publications)
Development of a centralized interface for extracting big data
from web archives – research project
● Webarchiv – National Library of the Czech Republic
● University of West Bohemia – Faculty of Applied Sciences, Department of Cybernetics
● Institute of Sociology of the Czech Academy of Sciences
First steps:
● legislative analysis
● index analysis
● analysis of provenance, authenticity and technical parameters of archived data
● workshop for researchers
analysis of file formats in Webarchiv
project accepted into the Ministry of Culture programme that supports applied research and experimental development of national
and cultural identity (NAKI)
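The index analysis and file-format analysis listed among the first steps could start with a tally like the one below. It is a sketch under stated assumptions. It assumes the index is available as space-separated CDX lines in the common 11-field layout, where the fourth field is the MIME type. The slides do not say which index format Webarchiv uses. The sample records are invented for illustration and are not real Webarchiv data.

```python
from collections import Counter
from typing import Iterable


def mime_counts(cdx_lines: Iterable[str]) -> Counter:
    """Tally the MIME type column of space-separated CDX index lines."""
    counts: Counter = Counter()
    for raw in cdx_lines:
        line = raw.strip()
        # Skip blank lines and the format header (" CDX N b a m s k r M S V g").
        if not line or line.startswith("CDX "):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # malformed line, ignore
        counts[fields[3].lower()] += 1
    return counts


# Hypothetical sample records, not real Webarchiv data.
SAMPLE = [
    " CDX N b a m s k r M S V g",
    "cz,example)/ 20180101000000 http://example.cz/ text/html 200 DIGEST1 - - 1234 0 sample.warc.gz",
    "cz,example)/logo.png 20180101000001 http://example.cz/logo.png image/png 200 DIGEST2 - - 567 1234 sample.warc.gz",
    "cz,example)/about 20180101000002 http://example.cz/about text/html 200 DIGEST3 - - 890 1801 sample.warc.gz",
]

if __name__ == "__main__":
    for mime, n in mime_counts(SAMPLE).most_common():
        print(mime, n)
```

Counting the declared MIME type is only a first approximation of file formats, since servers can mislabel content. A fuller analysis would also inspect the payloads themselves.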
Thank you for your attention
Marie Haškovcová
www.webarchiv.cz
www.facebook.com/webarchivcz
marie.haskovcova@nkp.cz
