Preservation Capability Miscellany
By Ross Spencer
Twitter: @beet_keeper
A brief ‘provenance’ note

2014-06-20: Play It Again Conference Report:
http://bit.ly/2d8Bnw0
(playitagain.org)
2014-11-25: The Reality of Digital Transfer:
http://bit.ly/2ctxocQ
(slideshare.net)
We (Archives NZ) have got quite far... but there's still a lot more to do

So let's remind ourselves: What is the point?
● Work in concert with agencies and their consultants.
● Generate better information and records management
● Cleaner transfers...
● Create a more open and transparent government where the digital record is
concerned...
● DIA’s line... Support New Zealanders to build strong communities by providing
access to trusted information and knowledge.
And! Digital Preservation
● At this point in time, idiomatic methods of preservation are still forming...
● Whatever the future of archival custodianship...
● Or the future of digital preservation...
● Techniques need to be developed to support agencies with information and records
management, and memory institutes with long-term custodianship.
● Don't fall into the processing trap...
What can we identify as important?
● Infrastructure/team, supported by the organisation
● Some things work, some don’t; some change... be flexible.
● Work iteratively...
● Look at what you can do...
● Continue to develop... evidence, real use-cases
Is it all there for us..?
No, but we have a good foundation

Policy...
● Has been a constant in my time here.
● Was a draw to me starting in NZ.
● Sets the rules by which we can play.
● Literally, play: bend, don’t break.
● Achieved through careful stakeholder consultation and consideration of impact.
● Sign-off process at director level.
● Two favourite policies: checksum and pre-conditioning.
Team...
● We could always do with more people
● But we recognise that we've been allowed more folk dedicated to this than some places.
● The team is supported in their decision making and their skills.
● Breakdown: curious; driven; up-to-date; drive to ‘solve’ born-digital transfer; different but complementary skills; *passion*!
● (And opinionated! ;-) )
● It doesn’t always look that way, but there is a certain amount of leeway from IT support too...
Technology...?
Rosetta by Ex Libris is the long-term preservation system. It allows us to manage some quite complex bits 'n' pieces, but:
● Does not yet enable transfer from Agency-to-Archives (it supports)
● Is not a clearing house for records
● Spot preservation risks up-front
● Doesn't 'do' sentencing
● Does not build ingest packages
● Does not 'do' archival description...
● Does not contain every tool under the sun to handle all the file formats
Machine Learning: http://nautil.us/blog/the-fundamental-limits-of-machine-learning
The processes we need are biased toward transfer
and ingest

Rosetta can only help so much

[Timeline: Creation → Transfer (life of a record: ~25 years) → Life of an archive: ~∞]
The other processes we will still need will be
about (active) long term custodianship

Rosetta is still only beginning that journey...
The miscellany in this presentation...
A story about the tools that can help us...
● Technical Registries (of practice)
● DROID/Siegfried Analysis Report
● Fuzzy Hashes
With everything we need to do

We cannot action it all at the same time...
Knowledge needs to remain alive and accessible, record it:
Source: https://commons.wikimedia.org/wiki/Category:Kanban#/media/File:Simple_Task_Kanban.jpg
Trello is one option...
Features...
● Kanban
● Teams
● Ownership
● Visibility
● Accessibility
● Reduce transitory records
● Create temporality
● Centralize knowledge
● Invite external colleagues
DROID/Siegfried Analysis Report
● Example of changing needs and capability
● Initially a plain-text reporting tool
● Evolved into a 'team' tool

● Evolving into an organisation’s tool

● Hopefully a community tool

● Our first port of call for any transfer...
* Marriage of DROID and Siegfried: http://bit.ly/2ddS0IP
* A little bit more about the tool: http://bit.ly/2dii3jP
DROID/Siegfried Analysis Report
● Available to all the community (December 2013): http://bit.ly/2cB8gFY
● Maps DROID and Siegfried output to an SQLite database for querying power and speed.
● Aside from Python, ZERO-dependencies – user needs to be able to download it and go...
● Complete flexibility over output.
● TXT, HTML, Rogues, Heroes... write your own!
● Normalization via database layer – abstracted for multiple ID tools
● The tools each do what they're supposed to well, the dissection of output can be left to others.
* Marriage of DROID and Siegfried (OPF Blog): http://bit.ly/2ddS0IP
* A little bit more about the tool (OPF Blog): http://bit.ly/2dii3jP
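As a rough illustration of the "map the output to SQLite, then query" idea, here is a minimal sketch. It is not the Analysis Report tool's own code: the sample rows, the column names (NAME, SIZE, PUID, FORMAT_NAME) and the helper functions are all assumptions made for the example, and only the Python standard library is used.

```python
import csv
import io
import sqlite3

# Hypothetical sample of an identification-tool export. The column names
# and rows are assumptions for illustration, not a real tool's schema.
SAMPLE_CSV = """NAME,SIZE,PUID,FORMAT_NAME
report.doc,20480,fmt/40,Microsoft Word Document
minutes.doc,18432,fmt/40,Microsoft Word Document
scan.pdf,90112,fmt/18,Acrobat PDF 1.4
unknown.bin,512,,
"""


def load_export(conn, csv_text):
    """Load the CSV rows into a single 'files' table."""
    conn.execute(
        "CREATE TABLE files (name TEXT, size INTEGER, puid TEXT, format_name TEXT)"
    )
    rows = csv.DictReader(io.StringIO(csv_text))
    conn.executemany(
        "INSERT INTO files VALUES (?, ?, ?, ?)",
        [
            # Empty identification fields are stored as NULL.
            (r["NAME"], int(r["SIZE"]), r["PUID"] or None, r["FORMAT_NAME"] or None)
            for r in rows
        ],
    )


def format_counts(conn):
    """Count identified files per format identifier, most frequent first."""
    return conn.execute(
        "SELECT puid, COUNT(*) FROM files WHERE puid IS NOT NULL "
        "GROUP BY puid ORDER BY COUNT(*) DESC, puid"
    ).fetchall()


def unidentified(conn):
    """List files that the identification tool could not identify."""
    query = "SELECT name FROM files WHERE puid IS NULL ORDER BY name"
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
load_export(conn, SAMPLE_CSV)
print(format_counts(conn))
print(unidentified(conn))
```

Once the output sits in a database, each report (plain text, HTML, a list of problem files) becomes one more query over the same table.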
● Plain-text example...
● HTML Example

Let’s have a look

http://bit.ly/2dircst
Benefits...
● Sets a baseline for a lingua franca for beginners and experts alike...
● Definitions contributed by our archivists!
● Easier on the eye
● Re-factored to be more flexible
● Give it a try! Let us know how it goes!
Checksums
● Look like:
– MD5: d41d8cd98f00b204e9800998ecf8427e
– SHA1: da39a3ee5e6b4b0d3255bfef95601890afd80709
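The two example values above are the well-known digests of empty input (zero bytes). A small sketch with Python's standard hashlib module reproduces them:

```python
import hashlib

# Zero bytes of input: the "empty file" case.
EMPTY = b""

md5_of_empty = hashlib.md5(EMPTY).hexdigest()
sha1_of_empty = hashlib.sha1(EMPTY).hexdigest()

print("MD5: ", md5_of_empty)
print("SHA1:", sha1_of_empty)
```

Seeing either value in a transfer manifest is therefore a quick hint that a zero-byte file is present.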
Checksums
● Looking to be unique
– De-duplication
– Fixity
● No connection between
– Security function
– Cannot reverse
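The points above can be sketched in a few lines. This is an illustration only: the file names and contents are hypothetical, and SHA-1 is used simply because it appears on the earlier slide.

```python
import hashlib
from collections import defaultdict


def sha1(data: bytes) -> str:
    """Return the SHA-1 checksum of some bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical in-memory "files" for illustration.
files = {
    "minutes-v1.txt": b"Minutes of the meeting held on Monday.",
    "minutes-copy.txt": b"Minutes of the meeting held on Monday.",
    "minutes-v2.txt": b"Minutes of the meeting held on Tuesday.",
}


def find_duplicates(named):
    """De-duplication: group names whose content has the same checksum."""
    groups = defaultdict(list)
    for name, data in named.items():
        groups[sha1(data)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def verify_fixity(data: bytes, recorded: str) -> bool:
    """Fixity: does the content still match the checksum recorded earlier?"""
    return sha1(data) == recorded


def matching_positions(a: str, b: str) -> int:
    """Count hex positions where two checksums happen to agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


print(find_duplicates(files))

recorded = sha1(files["minutes-v1.txt"])
print(verify_fixity(files["minutes-v1.txt"], recorded))
print(verify_fixity(files["minutes-v2.txt"], recorded))

# Two near-identical documents, yet the checksums show no useful relationship.
v1, v2 = sha1(files["minutes-v1.txt"]), sha1(files["minutes-v2.txt"])
print(v1)
print(v2)
print(matching_positions(v1, v2), "of", len(v1), "positions agree by chance")
```

The last three lines show the "no connection" property: a one-word edit gives a checksum that says nothing about how close the two files are.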
But every file has a connection...
● Binary
● File Format
● Textual Content
● Embedded Content
● Template
● Author
● Like DNA, with many different strands to dissect...
● Fuzzy Hashing!
Fuzzy Hashing: SSDEEP
Source: https://github.com/KLDavies/ssdeep/
Fuzzy Hashing: tlsh
Source: https://github.com/trendmicro/tlsh
And they look like...
● aad371039d588b43e02887f87e570f6d2b1a7f1da89667ef11227d9b3e706610d8e12d
● 0dc36013dd088b43e02983f87e534e6d2b1a7f1da88627ef11267d8b3e716610d9e16d
● Not that different from regular checksums!
● But help us to demonstrate a closer relationship between files

● “The sum of the parts is greater than the whole.”
~ Arist!otle
Which we're about to find out!
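To make the "closer relationship" visible, here is a naive sketch that counts the positions at which the two fuzzy hashes shown above agree. This positional count is an assumption made for illustration; it is not the comparison that ssdeep or tlsh actually perform, as each tool has its own scoring function.

```python
# The two fuzzy hashes from the slide, each joined onto one line.
HASH_A = "aad371039d588b43e02887f87e570f6d2b1a7f1da89667ef11227d9b3e706610d8e12d"
HASH_B = "0dc36013dd088b43e02983f87e534e6d2b1a7f1da88627ef11267d8b3e716610d9e16d"


def shared_positions(a: str, b: str) -> int:
    """Count positions where both strings hold the same character."""
    return sum(1 for x, y in zip(a, b) if x == y)


matches = shared_positions(HASH_A, HASH_B)
print(f"{matches} of {len(HASH_A)} positions match")
```

Two unrelated MD5 or SHA-1 values would agree only at a handful of positions by chance; here most of the string is shared, which is the property fuzzy hashing is designed to expose.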
Workshop!
Results!
How can we use this?
● Sentencing... while still teaching our machines, we can close the net while looking at records manually
● Discovery: Amazon-like results: “You might also like this record!”
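Both uses above come down to ranking records by similarity. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for a fuzzy-hash comparison. The record identifiers, the text, the threshold and the function names are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical catalogue: record identifier -> extracted text.
records = {
    "R1": "Minutes of the records management steering group, March meeting",
    "R2": "Minutes of the records management steering group, April meeting",
    "R3": "Invoice for replacement server hardware",
}


def similarity(a: str, b: str) -> float:
    """Similarity between 0.0 and 1.0 (stand-in for a fuzzy-hash score)."""
    return SequenceMatcher(None, a, b).ratio()


def you_might_also_like(record_id, catalogue, limit=2):
    """Discovery: the records most similar to the one being viewed."""
    query = catalogue[record_id]
    scored = [
        (similarity(query, text), other)
        for other, text in catalogue.items()
        if other != record_id
    ]
    scored.sort(reverse=True)
    return [other for _score, other in scored[:limit]]


def close_the_net(record_id, catalogue, threshold=0.8):
    """Sentencing aid: records similar enough to review together by hand."""
    query = catalogue[record_id]
    return sorted(
        other
        for other, text in catalogue.items()
        if other != record_id and similarity(query, text) >= threshold
    )


print(you_might_also_like("R1", records))
print(close_the_net("R1", records))
```

The threshold is the knob that "closes the net": raise it and fewer records reach the manual review pile.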
The experiment continues...
● Matches are relative to themselves...
● Algorithms make a difference...
● And perhaps, like genetics... some traits are more dominant than
others...
● Consider working with content in different ways...
– Utilize format bias... normalize
– Separate content from structure and analyse?
● Keep trying things, but at minimum cost... (another agile concept: minimum viable product)
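One way to picture "separate content from structure" is to normalize two renditions of the same text before hashing them. In the sketch below the normalization rules (strip anything that looks like a tag, lower-case, collapse whitespace) are deliberately crude assumptions, and both sample documents are hypothetical.

```python
import hashlib
import re


def normalise(text: str) -> str:
    """Strip tag-like markup, lower-case, and collapse all whitespace."""
    without_tags = re.sub(r"<[^>]+>", " ", text)
    return " ".join(without_tags.lower().split())


def raw_checksum(text: str) -> str:
    """Checksum of the text exactly as stored."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def content_checksum(text: str) -> str:
    """Checksum of the normalized content only."""
    return hashlib.sha1(normalise(text).encode("utf-8")).hexdigest()


# Hypothetical renditions of the same content in two formats.
html_version = (
    "<html><body><h1>Transfer Report</h1>"
    "<p>All files   identified.</p></body></html>"
)
text_version = "Transfer report\nAll files identified.\n"

print(raw_checksum(html_version) == raw_checksum(text_version))
print(content_checksum(html_version) == content_checksum(text_version))
```

The same idea applies before fuzzy hashing: comparing normalized content removes the format's own bias from the match.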
Conclusion: A bit more miscellany
● Keyword: Interim
● Our needs change constantly, and there's a lot to do
● Don't suffer paralysis by analysis.
● Do a requirements analysis
● Look at what you can do (minimum viable product) and iterate...
Conclusion: A bit more miscellany
● Lots of hints to bits 'n' pieces I haven't been able to talk about:
● Role of the community (They/We're here to help! Same problems!)
● Communication and sharing (Do it!)
● Software development skills (There are other ways to be involved)
What's the point? (OPF Blog): http://bit.ly/2ddXnaY
● Maybe also a seed for discussion.
Thank you!

ASA Trial Workshop Slides for Archives NZ [2016-09-28]
