Lecture at the three-week Summer School "Data Curation" for archaeologists from Sudan, Yemen, Libya, Palestine and Tunisia at the German Archaeological Institute in Berlin from 16 July to 5 August 2017.
The workshop was planned together with the Arab League Educational, Cultural and Scientific Organization (ALECSO) and the Sudanese antiquities service NCAM (National Corporation for Antiquities and Museums).
2. AGENDA
1. Research Data Center – IANUS
2. Digital Research Data in Ancient Studies
3. Data Formats
4. Problems & Challenges
5. Data Management
6. Save – Back Up – Archive
7. Best Practices
3. 3
1. WHAT IANUS IS
»» financed by the DFG
»» coordination for the community
»» 2011–2014: requirements analysis, inspections, conception
»» 2015–2017: implementation, test operations, start of archiving
»» from 2018: regular operations
»» 9 employees (4 FTE, 5 HTE)
›› project coordinators & public relations
›› data curators
›› software developers
4. 4
1. WHO IANUS IS
Verband der Landesarchäologen in der Bundesrepublik Deutschland
5. 5
1. WHOM IANUS ADDRESSES
exemplary disciplines in ancient studies in Germany
6. 6
1. WHOM IANUS ADDRESSES
ancient studies institutions in Germany
7. 7
›› create an infrastructure to archive
existing data for the future
›› raise awareness of the reusability
of (research) data
›› support the sciences by providing
easy access to the data
›› enable researchers & projects to
manage their data in a sustainable
& sensible way
›› become a national address for IT-related questions in ancient
studies
1. AIMS OF IANUS
8. 8
future core tasks
›› long-term preservation
›› giving access
›› registry for archaeological
resources („German ArchSearch“)
›› education & training
›› project support
›› IT recommendations
1. CORE TASKS OF IANUS
14. 14
variety of disciplines
›› archaeology
›› philology
›› ancient history
›› anthropology
›› archaeometry
›› construction history
›› material sciences
›› ...
Variety of disciplines in the virtual specialist library Propylaeum
https://www.propylaeum.de/altertumswissenschaften/presse/
2. DIGITAL DATA IN ANCIENT STUDIES
15. 15
variety of methods and questions
›› documentation
›› excavation
›› survey & prospection
›› architecture documentation
›› sampling
›› conservation & restoration
›› mapping
›› ...
CT scan of a mummy
https://news.usc.edu/files/2013/03/Mummy-CT-Scan.jpg
Napoleon in Egypt (1798-1801)
http://www.ingolfo.de/800px-bonaparte-aux-pyramides_680_508.jpg
2. DIGITAL DATA IN ANCIENT STUDIES
17. 17
variety of data and documents
›› mark-up text
›› photogrammetry
›› raster images
›› 3D / virtual reality
›› tables
›› statistics
›› ...
Reconstructions of the Satet temples on Elephantine
http://proceedings.caaconference.org/paper/42_ferschin_et_al_caa2007/
Prehistoric stone axe with and without texture
https://www.culturartis.de/home/portfolio/3d-scan-und-druck/
2. DIGITAL DATA IN ANCIENT STUDIES
19. 19
3. DATA FORMATS
What are digital (research) data?
›› digitized analog data sets
›› born-digital data
Where are the digital (research) data generated?
›› research & projects
›› management / administration
›› other work processes
What kinds are there?
›› unprocessed / primary (raw) data
›› processed / secondary data
›› published & unpublished finalized data (results)
20. 20
test data survey from 19 data collections
»» live-data, i.e. not prepared for archiving
›› no systematic data selection, format validation,
labelling of files / folders
›› no complete documentation, metadata, licences, etc.
›› often only parts of larger data collections
Project no. | Project name | Institution | Date | Data transfer | Metadata | Size (MB) | Files | Formats
2013-001_TEST | Taganrog | DAI Zentrale, Berlin | 23 May 2013 | copied from DAI Cloud after consultation | yes | 84,130 | 21,566 | 56
2013-002_TEST | Milet, Faustina-Thermen | DAI Zentrale, Berlin | 16 May 2013 | copied from DAI Cloud after consultation | no | 97,885 | 27,401 | 97
2013-003_TEST | Pergamon | DAI Istanbul | 14 Jun 2013 | copied from DAI Cloud after consultation | yes | 89,472 | 30,139 | 229
2013-004_TEST | Tell Zira'a | DAI NatWiss-Referat, Berlin | 14 Feb 2013 | file server (internal DAI server) | yes | 99 | 42 | 5
2013-005_TEST | Wendel | Neanderthal-Museum / NESPOS, Mettmann | 6 Feb 2013 | web portal (Dropbox) | yes | 2,008 | 2,192 | 4
2013-006_TEST | Troja | Universität Tübingen | 27 Jun 2013 | hard disk by post | no | 302,060 | 134,228 | 82
2013-007_TEST | Altägyptisches Wörterbuch | BBAW Berlin | 16 May 2013 | web portal (mydrive.ch) | no | 273 | 11 | 2
2013-008_TEST | Aleppo, Virtual Archaeology | HTW Berlin | 15 Jul 2013 | hard disk by post | yes | 126,362 | 3,278 | 6
2013-009_TEST | Archäometriedatenbank München | Prähistorische Sammlung München | 5 Mar 2013 | DVD by post | no | 1,100 | 8,571 | 107
2013-010_TEST | Burgen im Rheinland | LVR Rheinland, | 10 May 2013 | email | yes | 3 | 14 | 5
3. DATA FORMATS
21. 21
quantities in total
»» 684.9 GByte disk space
»» 237,403 files in 7,537 folders
»» max. directory depth: 12 levels
»» 462 file formats
average of an archaeological project
»» 38 GByte disk space
»» 12,425 files in 380 folders
»» max. directory depth: 4 levels
»» 40 file formats
3. DATA FORMATS
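The figures on this slide (disk space, files, folders, directory depth, number of formats) can be collected for any project folder with a few lines of Python. The following is a minimal sketch, not the tool used for the IANUS survey: the function name `survey` and the use of the file extension as a stand-in for the file format are assumptions made for illustration.

```python
import os
from collections import Counter
from pathlib import Path


def survey(root):
    """Return size, file/folder counts, max depth and extensions below root."""
    root = Path(root)
    total_bytes = 0
    n_files = 0
    n_folders = 0
    max_depth = 0
    formats = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Depth 0 is the project folder itself, 1 its direct subfolders, etc.
        depth = len(Path(dirpath).relative_to(root).parts)
        max_depth = max(max_depth, depth)
        n_folders += len(dirnames)
        for name in filenames:
            path = Path(dirpath) / name
            total_bytes += path.stat().st_size
            n_files += 1
            # Assumption: the extension is only a rough proxy for the format.
            formats[path.suffix.lower() or "(none)"] += 1
    return {
        "bytes": total_bytes,
        "files": n_files,
        "folders": n_folders,
        "max_depth": max_depth,
        "formats": formats,
    }
```

Calling `survey("my_project")` on a hypothetical folder returns a dictionary; `len(result["formats"])` gives the number of distinct extensions. A real format survey would identify formats from file content rather than from names.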
23. 23
Reduce
»» diversity and complexity in preferred & accepted file formats
»» definition of significant properties with regard to content
and technical characteristics
»» non-proprietary, software independent, open formats
»» formats relevant to the community
→ development of requirements / guidelines for producers / data
providers so that data are submitted in a suitable form
3. DATA FORMATS
24. 24
3. DATA FORMATS
DATA FORMATS & DATA MIGRATION – May 2017 –
Format | SIP – Delivery Format (extension) | Status | AIP – Archive Format | DIP – Presentation Format
PDF DOCUMENTS
PDF/A-1 | pdf | preferred | pdf/A-2 | pdf/A
PDF/A-2 | pdf | preferred | pdf/A-2 | pdf/A
PDF/A-3 | pdf | accepted | pdf/A-2 + additional files | pdf/A
Other PDF-Variants | pdf | accepted | pdf/A-2 | pdf/A
Portable Document Format (PDF/A) | pdf | preferred | pdf/A | pdf/A
TEXTS / DOCUMENTS
OpenDocument Format | odt | preferred | odt + pdf/A | odt, pdf/A
Microsoft Office XML | docx | preferred | docx + pdf/A | docx, pdf/A
Microsoft Word | doc | accepted | docx + pdf/A | docx, pdf/A
Rich Text Format | rtf | accepted | docx + pdf/A | docx, pdf/A
Open Office XML | sxw | accepted | odt + pdf/A | odt, pdf/A
Plain Text | txt | preferred | txt | txt
Structured Text, Markup | xml, sgml, html, etc. + dtd, xsd, etc. | preferred | xml, sgml, html, etc. + dtd, xsd, etc. | xml, sgml, html, etc. + dtd, xsd, etc.
RASTER GRAPHICS
Baseline TIFF v. 6, uncompressed | tiff, tif | preferred | tiff (uncompressed v.6) | jpeg
Adobe Digital Negative | dng | preferred | dng | dng, jpeg
Portable Network Graphic | png | accepted | tiff (uncompressed v.6) | png
Joint Photographic Expert Group | jpeg, jpg | accepted | tiff (uncompressed v.6) | jpeg
Graphics Interchange Format | gif | accepted | tiff (uncompressed v.6) | png
Windows Bitmap | bmp | accepted | tiff (uncompressed v.6) | png
Photoshop (Adobe) | psd | accepted | tiff (uncompressed v.6) | png, jpeg
CorelPaint | cpt | accepted | tiff (uncompressed v.6) | png, jpeg
JPEG2000 | jp2, jpx | accepted | tiff (uncompressed v.6) | jp2, jpx, jpeg
RAW image format | nef, crw, etc. | accepted | dng | jpeg
GRAPHICS
Scalable Vector Graphics 1.1, uncompressed | svg | preferred | svg | svg
Computer Graphics Metafile | cgm | accepted | svg | svg
WebCGM | cgm | accepted | svg | svg
Drawing Interchange Format (Autodesk) | dxf | accepted | dxf (2010 AC1024) | dxf
Drawing (Autodesk) | dwg | accepted | dxf (2010 AC1024) | dxf
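A format list like the one on this slide can be turned into a small lookup that tells a data provider, before delivery, whether a file is preferred, accepted or not listed, and which archive format it would be migrated to. The sketch below copies a subset of the rows from the slide; the names `FORMAT_POLICY` and `classify` are hypothetical, and PDF is left out because its status depends on the PDF variant rather than on the extension.

```python
from pathlib import PurePath

# extension -> (status on delivery, archive format); subset of the slide's rows
FORMAT_POLICY = {
    "odt": ("preferred", "odt + pdf/A"),
    "docx": ("preferred", "docx + pdf/A"),
    "doc": ("accepted", "docx + pdf/A"),
    "rtf": ("accepted", "docx + pdf/A"),
    "txt": ("preferred", "txt"),
    "tiff": ("preferred", "tiff (uncompressed v.6)"),
    "tif": ("preferred", "tiff (uncompressed v.6)"),
    "dng": ("preferred", "dng"),
    "png": ("accepted", "tiff (uncompressed v.6)"),
    "jpeg": ("accepted", "tiff (uncompressed v.6)"),
    "jpg": ("accepted", "tiff (uncompressed v.6)"),
    "gif": ("accepted", "tiff (uncompressed v.6)"),
    "bmp": ("accepted", "tiff (uncompressed v.6)"),
    "svg": ("preferred", "svg"),
    "dxf": ("accepted", "dxf (2010 AC1024)"),
    "dwg": ("accepted", "dxf (2010 AC1024)"),
}


def classify(filename):
    """Return (status, archive format) for a file name, judged by extension."""
    extension = PurePath(filename).suffix.lower().lstrip(".")
    return FORMAT_POLICY.get(extension, ("not listed", None))
```

For example, `classify("plan_01.DWG")` returns `("accepted", "dxf (2010 AC1024)")`. Judging by extension alone is an assumption of this sketch; it cannot tell an uncompressed baseline TIFF from a compressed one.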
26. 26
4. PROBLEMS & CHALLENGES
„Digital information lasts forever —
or for five years, whichever comes first.“
Jeff Rothenberg, RAND Corp. 1997
27. 27
4. PROBLEMS & CHALLENGES
Collection of different storage media compiled by the Archaeology Data Service in York, UK
technical readability
»» aging of storage media
29. 29
4. PROBLEMS & CHALLENGES
comprehensibility of content
»» answers to questions like: who, what, when, how and why?
»» incomplete documentation
»» missing or unstructured metadata
»» implicit & explicit information / meanings
„Implicit semantics - tagging - structured metadata“, using the example of "Schlüssel" (key); http://dokmagazin.de/ueber-die-bedeutung-semantischer-metadaten-und-warum-ihre-generierung-nicht-einfach-maschinen-und-algorithmen-ueberlassen-werden-sollte/
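One way to make the answers to "who, what, when, how and why" explicit is to store them as structured metadata next to the data file. The following is a minimal sketch under stated assumptions: the class `FileDescription`, its five fields and the JSON sidecar idea are illustrative and do not represent a metadata schema prescribed by IANUS.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FileDescription:
    """Hypothetical minimal record; one field per question on the slide."""
    who: str = ""
    what: str = ""
    when: str = ""
    how: str = ""
    why: str = ""


def missing_answers(description):
    """Return the questions that are still unanswered, in field order."""
    return [name for name, value in asdict(description).items()
            if not value.strip()]


def to_sidecar_json(description):
    """Serialize the record so it can be saved next to the data file."""
    return json.dumps(asdict(description), indent=2, ensure_ascii=False)
```

A record such as `FileDescription(who="A. Example", what="trench photo", when="2017-07-20")` would report `["how", "why"]` as missing, which is exactly the kind of incomplete documentation the slide warns about.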
30. 30
readability of content
»» different structure & naming
4. PROBLEMS & CHALLENGES
31. 31
4. PROBLEMS & CHALLENGES
Conclusions
scientific data in ancient studies are highly
»» unique, because they describe individual, non-reproducible
objects and contexts
»» durable, because they have high scientific relevance beyond
the limits of projects
»» distributed and disparate, as the players and uses in administration,
tourism, science and education are very different
»» heterogeneous in content and form (different disciplines)
»» at risk, because specialized concepts and infrastructures for the
sustainable management of digital data are missing
»» sustainably reusable, if they are structured, described
(metadata) and documented in a standardized manner
33. 33
5. DATA MANAGEMENT
What is Data Management?
»» Data management is the development, execution and
supervision of plans, policies, programs and practices that
control, protect, deliver and enhance the value of data and
information assets over time.
Why should you take care?
»» In order to ensure that stored / archived digital data can be used,
understood, and applied not only today, but also tomorrow.
34. 34
5. DATA MANAGEMENT
Aims of (Research) Data Management
»» development and implementation of methods, procedures,
guidelines and best practices
»» clear and appropriate responsibilities, sustainable data
documentation
»» uniform, non-personal organization of the data
»» efficient handling of one's own and third-party data
»» minimize the risk of data loss
»» cross-institutional data usage
35. 35
5. DATA MANAGEMENT
Benefits and Value
»» Transfer of knowledge to others irrespective of individuals,
projects and institutions
»» Preservation of primary and secondary data for the future,
not only by publications
»» Allow reuse of data for new tasks, questions and methods
»» Cost reduction in the generation of new data and avoidance of
redundant data collections
»» More efficient work due to better interoperability and exchange
»» Compliance with legal requirements, such as the obligation to
keep information
»» Increase the relevance of own work through increased visibility
36. 36
5. DATA MANAGEMENT
Checklist for a Data Management Plan, v4.0
Please cite as: DCC. (2013). Checklist for a Data Management Plan. v.4.0. Edinburgh: Digital Curation
Centre. Available online: http://www.dcc.ac.uk/resources/data-management-plans
DCC Checklist DCC Guidance and questions to consider
Administrative Data
ID A pertinent ID as determined by the funder and/or institution.
Funder State research funder if relevant
Grant Reference Number Enter grant reference number if applicable [POST-AWARD DMPs ONLY]
Project Name If applying for funding, state the name exactly as in the grant proposal.
Project Description Questions to consider:
- What is the nature of your research project?
- What research questions are you addressing?
- For what purpose are the data being collected or created?
Guidance:
Briefly summarise the type of study (or studies) to help others understand the purposes
for which the data are being collected or created.
PI / Researcher Name of Principal Investigator(s) or main researcher(s) on the project.
PI / Researcher ID E.g ORCID http://orcid.org/
Project Data Contact Name (if different to above), telephone and email contact details
Date of First Version Date the first version of the DMP was completed
Date of Last Update Date the DMP was last changed
Related Policies Questions to consider:
- Are there any existing procedures that you will base your approach on?
- Does your department/group have data management guidelines?
- Does your institution have a data protection or security policy that you will follow?
- Does your institution have a Research Data Management (RDM) policy?
- Does your funder have a Research Data Management policy?
- Are there any formal standards that you will adopt?
Guidance:
List any other relevant funder, institutional, departmental or group policies on data
management, data sharing and data security. Some of the information you give in the
37. 37
5. DATA MANAGEMENT
Categories of (Research) Data Management Plans
»» frameworks and administrative information
›› conditions, objectives, project promoters, etc.
»» responsibilities
›› assure conditions, backups, permission, integrity of data, etc.
»» legal aspects
›› data covered by copyright / protection, how documented,
requirements for publishing the data, which license for third
parties, etc.
»» methods
›› methods used, guidelines / requirements, which documentation
method, does the method affect the amount of data, etc.
38. 38
5. DATA MANAGEMENT
»» specifications, guidelines and standards
›› check for laws, regulations, infrastructure, standards, etc.,
quality of the data, etc.
»» costs
›› kind of personnel / storage / infrastructure / tools / electricity,
for reproducible data: storage vs. recovery, etc.
»» external partners or service providers
›› cooperation with whom, implications, exchange, rights of data, etc.
»» hardware and software
›› what is available, special needs, fulfillment of requirements,
check replacement of paid software by open source, etc.
39. 39
5. DATA MANAGEMENT
»» data types & data formats
›› methods – types – formats, requirements of data (archive,
reuse) open / proprietary, implications for hard- / software, etc.
»» reuse of existing data
›› existing data by own / third parties, access / reuse options
»» creation of new data
›› decision on unique / reproducible, sensitive / protected data, ...
»» amount of data
›› expectation, versioning, consequences for storage / backup /
archive
40. 40
5. DATA MANAGEMENT
»» file storage / file backup
›› necessary actions, where (hard disk, server), determination of
number of redundant copies, current anti-virus software
›› backup intervals, by whom / how / how often, responsibility,
overwrite protection (read only), check data integrity / completeness
›› disaster management, has recovery management been rehearsed
»» file management
›› how files are ordered / named / versioned, naming rules, handling
of different file versions, is the repository structure documented, etc.
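The backup points on this slide, a redundant copy plus a check of integrity and completeness, can be sketched in a few lines. This is an illustration only, assuming a single project folder and an external target; the function names `sha256_of` and `backup_tree` are hypothetical, and intervals, responsibilities and overwrite protection still have to be organized outside the script.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_tree(source, target):
    """Copy every file below source to target; return files whose copy differs."""
    source, target = Path(source), Path(target)
    failed = []
    for path in sorted(source.rglob("*")):
        if not path.is_file():
            continue
        copy = target / path.relative_to(source)
        copy.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, copy)  # copy2 also keeps timestamps where possible
        # Integrity check: original and copy must have the same checksum.
        if sha256_of(path) != sha256_of(copy):
            failed.append(path.relative_to(source).as_posix())
    return failed
```

An empty return value means every file was copied and verified. The target is assumed to lie outside the source folder, ideally on a different physical device.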
41. 41
5. DATA MANAGEMENT
»» documentation
›› understandable description of data for the short / long term, kind
of information, time, requirements, changes & updates, how to
store / save / archive metadata, exceptions, support tools,
provenance etc.
»» quality assurance
›› criteria for existing standards, data are accurate / consistent /
authentic / complete, clearly documented (who did what for
what purpose), checklists, activities against accidental
deletion / manipulation of data, etc.
»» data exchange
›› between whom and how, requirements: rights / restrictions /
technical infrastructure, access policy, rights of use, exchange
formats, etc.
42. 42
5. DATA MANAGEMENT
»» medium term data storage
›› reasons for keeping data, requirements time / locations, how,
selection (must / should be kept / deleted), access rights, how long,
where, responsibility for keeping the data, costs, etc.
»» longterm data storage (archiving)
›› selection, criteria for selection, suitable archive solution,
contact to an existing archive, who is doing what, etc.
»» accessibility & reuse
›› how should the data be accessible, what additional information
is needed to understand the data, who can use them, which licence,
are there restrictions, etc.
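The categories listed on the preceding slides can serve as a simple completeness check for a draft data management plan. The sketch below assumes, purely for illustration, that a plan is kept as a dictionary mapping a category name to free text; the names `DMP_CATEGORIES` and `open_sections` are hypothetical.

```python
# Category names as they appear on the slides, in slide order.
DMP_CATEGORIES = [
    "frameworks and administrative information",
    "responsibilities",
    "legal aspects",
    "methods",
    "specifications, guidelines and standards",
    "costs",
    "external partners or service providers",
    "hardware and software",
    "data types & data formats",
    "reuse of existing data",
    "creation of new data",
    "amount of data",
    "file storage / file backup",
    "file management",
    "documentation",
    "quality assurance",
    "data exchange",
    "medium term data storage",
    "longterm data storage (archiving)",
    "accessibility & reuse",
]


def open_sections(plan):
    """Return the categories the plan does not answer yet, in slide order."""
    return [category for category in DMP_CATEGORIES
            if not plan.get(category, "").strip()]
```

A plan that so far only describes its costs, for example `{"costs": "two external hard disks per season"}`, would get back the other 19 categories as open.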
43. 43
5. DATA MANAGEMENT
Conclusion
»» document your
›› methods, terms, systems and questions
»» use common standards and define working rules
»» make your data explicit, not implicit
»» implement (research) data management plans
»» structure your data in a comprehensible way
»» involve all relevant actors and describe workflows
→ the higher the data quality, the more easily it can be archived for the
future and the better it can be reused by anyone
46. 46
differentiation – terms / concepts
»» different storage concepts
›› save —
transfer data from the working memory of a program or the RAM
of a computer to a disk drive (mainly inside the computer)
6. SAVE – BACKUP – ARCHIVE
47. 47
differentiation – terms / concepts
»» different storage concepts
›› backup —
copy of saved data (sync to second instance of redundant data)
for disaster-recovery reasons (mainly on external drive / network)
6. SAVE – BACKUP – ARCHIVE
48. 48
differentiation – terms / concepts
»» different storage concepts
›› (longterm) archiving —
preservation of digital information to enable / guarantee the
long-term accessibility of data for reuse,
incl. bitstream preservation, i.e. physical conservation of a
given bit sequence
6. SAVE – BACKUP – ARCHIVE
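Bitstream preservation means that a given bit sequence stays unchanged, and a common way to check this is to record a checksum per file and compare it again later. The following is a minimal sketch of such a fixity check, not a description of how IANUS implements it; `file_digest`, `build_manifest` and `check_fixity` are hypothetical names.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 checksum of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path relative to root to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def check_fixity(root, manifest):
    """Compare the files below root with a manifest recorded earlier."""
    current = build_manifest(root)
    missing = sorted(set(manifest) - set(current))
    changed = sorted(name for name in manifest
                     if name in current and current[name] != manifest[name])
    return {"missing": missing, "changed": changed}
```

The manifest would be stored together with the archived data and checked at regular intervals. A matching checksum only shows that the bits are intact; whether the format can still be opened and understood is a separate question.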
51. 51
Guides to Good Practice
»» published by
›› Archaeology Data Service (ADS), United Kingdom
›› The Digital Archaeological Record (tDAR), USA
»» central web portal with information about
›› the application of IT in archaeology
›› addressing all phases of a data lifecycle
›› collecting, curating and promoting existing standards, including
practical help to apply them (e.g. tutorials, templates, tools,
best practice examples)
›› wiki to enable collaborative development on the standards
and guides
7. BEST PRACTICES
53. 53
7. BEST PRACTICES
Data Management Plans
»» published by
›› DMPOnline (DCC), United Kingdom
›› DMPTool, University of California, USA
›› Data Management Plans
54. 54
7. BEST PRACTICES
Digital Preservation
»» published by Digital Preservation Coalition (DPC), UK
›› information about
›› tools
›› preservation strategies
›› technical solutions
›› ...
55. 55
FURTHER INFORMATION
IT Recommendations (only in German)
»» https://www.ianus-fdz.de/it-empfehlungen
Guides to Good Practice
»» http://guides.archaeologydataservice.ac.uk/
Data Management Plans
»» DMP → http://www.dcc.ac.uk/resources/data-management-plans
»» Data Management Planning Tool → https://dmptool.org/
»» Data Management Plan Online → https://dmponline.dcc.ac.uk/
Digital Preservation Coalition
»» http://www.dpconline.org/knowledge-base
56. https://www.ianus-fdz.de
THANK YOU !
Forschungsdatenzentrum
Archäologie &
Altertumswissenschaften
Exchange, Digital Data, Research, Reuse, Archiving, Planning, Data Preservation, Metadata, Documentation, IT Recommendations
IANUS
c/o Deutsches Archäologisches Institut
Podbielskiallee 69-71
D-14195 Berlin
Tel.: +49-(0)30-187711-359
Project Leaders
Prof. Dr. Friederike Fless
Prof. Dr. Ortwin Dally
Project Coordinators
Maurice Heinrich
Dr. Felix F. Schäfer
Further Information
homepage: https://www.ianus-fdz.de
twitter: @Ianus_fdz
facebook: IANUS-Forschungsdatenzentrum
youtube: IANUS-Forschungsdatenzentrum