SlideShare a Scribd company logo
1 of 29
Download to read offline
Digital Forensics for
   Digital Archives
              Mark A. Matienzo
Manuscripts and Archives, Yale University Library
           Digital Preservation 2012
                 Arlington, VA
                 July 25, 2012
Digital Archives at Yale
http://www.flickr.com/photos/johnjack/3449280414
Digital Forensics
Digital Forensics
        +
Digital Archives
Design Principles
•Use digital forensics software and methodology to
  support accessioning, arrangement, and description of
  born-digital archival records

•Mitigate risk of media deterioration and obsolescence
•Prefer open source solutions whenever possible
•Integrate into a larger, but yet-to-be-defined workflow
Applied Methodology
•Use Carrier’s (2005) model of the digital investigation
  process: Preservation   Searching    Reconstruction


•Volume and file system as main areas for analysis
•Assume much of the state is already lost
•Methods should approach or intend forensic soundness
Start
accessioning                              Write-protect media               Verify image
  process            Media




                                          Record identifying              Extract filesystem-     Disk     Meta-
                 Retrieve media            characteristics of               and file-level      images     data

                                          media as metadata                   metadata         Transfer package




                                                                           Package images
               Assign identifiers to                                                             Ingest transfer
                                            Create image                  and metadata for
                     media                                                                         package
                                                                               ingest




                                  Media                         FS/File                           Document
                                                Disk
                                   MD                            MD                              accessioning
                                               image
                                                                                                   process




                                                                                                     End
                                                                                                 accessioning
                                                                                                   process
Documenting Media
•SharePoint-based list
•Unique identifiers for each piece
•Allows basic documentation of imaging process
Imaging Media
• Requires a combination of hardware and software
• In some cases, software depends on particular hardware
• No single universal solution for our workflow
Imaging Hardware
• Drives (eg floppy drives, flash cardreaders)
• Interface cards (Catweasel, Kryoflux, FC5025)
• Writeblockers
Imaging Software
• FTK Imager (proprietary; gratis)
• Hardware-specific imaging software for floppy interface
  cards

• Other software tested: dd, Guymager, etc.
Metadata Extraction
• Desire to repurpose existing information as
   archival description and reports to other staff

• Ideal output is XML; can be packaged with disk images
   going into medium- or long-term storage

• Tools: Fiwalk/The Sleuth Kit; FTK Imager; testing others
• Have integrated file-format identification (using OPF’s
   FIDO) and virus/malware recognition (using ClamAV)
   using Fiwalk’s plugin architecture
Sample DFXML Output
<?xml version='1.0' encoding='UTF-8'?>
<dfxml version='1.0'>
  <metadata
  xmlns='http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML'
  xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
  xmlns:dc='http://purl.org/dc/elements/1.1/'>
    <dc:type>Disk Image</dc:type>
  </metadata>
  <creator version='1.0'>
    <!-- provenance information re: extraction - software used; operating system -->
  </creator>
  <source>
    <image_filename>2004-M-088.0018.dd</image_filename>
  </source>
  <volume offset='0'><!-- partitions within each disk image -->
      <fileobject><!-- files within each partition --></fileobject>
  </volume>
  <runstats><!-- performance and other statistics --></runstats>
</dfxml>
Sample DFXML Output
<fileobject>
  <filename>_ublist1.wpd</filename>
  <partition>1</partition>
  <id>1</id>
  <name_type>r</name_type>
  <filesize>202152</filesize>
  <unalloc>1</unalloc>
  <used>1</used>
  <inode>3</inode>
  <meta_type>1</meta_type>
  <mode>511</mode>
  <nlink>0</nlink>
  <uid>0</uid>
  <gid>0</gid>
  <mtime>2001-02-22T22:30:52Z</mtime>
  <atime>2001-02-22T05:00:00Z</atime>
  <crtime>2001-02-22T22:31:54Z</crtime>
  <libmagic>(Corel/WP)</libmagic>
  <byte_runs>
   <byte_run file_offset='0' fs_offset='16896' img_offset='16896' len='512'/>
  </byte_runs>
  <hashdigest type='md5'>d7bc22242c0a88fd8b68712980d5ab28</hashdigest>
  <hashdigest type='sha1'>64bf2bdf82e33fcda50158804483ac611e753db5</hashdigest>
</fileobject>
Analysis/Processing
• Once acquired, we can perform additional analysis or
   reporting to captured assets or records

• Few tools are easily useable by archivists (BitCurator
   toolset under development will help)

• Additional forensic tools can be used for archival
   arrangement and description of this information
Forensic Toolkit
• Proprietary application to analyze files, filesystems, etc.
• Provides full-text indexing, tagging, bookmarking,
   file presentation/viewing, and reporting

• Used at Yale, Stanford, and other institutions for archival
   processing of born-digital records

• Still a challenge to use given the complexity of the
   application
Gumshoe
• Prototype based on Blacklight (Ruby on Rails + Solr)
• Indexing code works with fiwalk output or directly from a
   disk image

• Populates Solr index with all file-level metadata from
   fiwalk and, optionally, text strings extracted from files

• Provides searching, sorting and faceting based on
   metadata extracted from filesystems and files

• Code at http://github.com/anarchivist/gumshoe
Advantages
•Faster (and more forensically sound) to extract metadata
  once rather than having to keep processing an image

•Develop better assessments during accessioning process
  (directory structure significant? timestamps accurate?)

•Integrating additional extraction processes and building
  supplemental tools takes less time
Limitations
• Use of tools limited to specific types of filesystems
• Additional software requires additional integration and
   data normalization

• DFXML is not (currently) a metadata format common
   within domains of archives/libraries

• Extracted metadata maybe harder to repurpose for
   descriptive purposes based on level of granularity
Work in Progress
• BitCurator project under development; early release
   available for testing: http://wiki.bitcurator.net

• The Sleuth Kit and related tools under development
   (Autopsy, fiwalk, etc.): http://sleuthkit.org

• Additional testing and integration under work at Yale,
   using DFXML as common schema whenever possible

• Possible development of a new media log to record
   media/imaging metadata and workflow status
Thanks!
   Mark A. Matienzo
mark.matienzo@yale.edu
  http://matienzo.org
      @anarchivist

More Related Content

What's hot

NCompass Live: Digital Preservation, Part 2: Storage and Protection
NCompass Live: Digital Preservation, Part 2: Storage and ProtectionNCompass Live: Digital Preservation, Part 2: Storage and Protection
NCompass Live: Digital Preservation, Part 2: Storage and ProtectionNebraska Library Commission
 
Needs for Data Management & Citation Throughout the Information Lifecycle
Needs for Data Management & Citation Throughout  the Information LifecycleNeeds for Data Management & Citation Throughout  the Information Lifecycle
Needs for Data Management & Citation Throughout the Information LifecycleMicah Altman
 
PRESERVATION Web archiving
PRESERVATION  Web archivingPRESERVATION  Web archiving
PRESERVATION Web archivingEssam Obaid
 
Xenit alfresco webinar on fred 1dot2 for end users
Xenit alfresco webinar on fred 1dot2 for end usersXenit alfresco webinar on fred 1dot2 for end users
Xenit alfresco webinar on fred 1dot2 for end usersAlfresco Software
 
e-Science, Research Data and Libaries
e-Science, Research Data and Libariese-Science, Research Data and Libaries
e-Science, Research Data and LibariesRob Grim
 
KeepIt Course 4: Putting storage, format management and preservation planning...
KeepIt Course 4: Putting storage, format management and preservation planning...KeepIt Course 4: Putting storage, format management and preservation planning...
KeepIt Course 4: Putting storage, format management and preservation planning...JISC KeepIt project
 
Eudat user forum-london-11march2013-biovel-v3
Eudat user forum-london-11march2013-biovel-v3Eudat user forum-london-11march2013-biovel-v3
Eudat user forum-london-11march2013-biovel-v3Alex Hardisty
 

What's hot (11)

NCompass Live: Digital Preservation, Part 2: Storage and Protection
NCompass Live: Digital Preservation, Part 2: Storage and ProtectionNCompass Live: Digital Preservation, Part 2: Storage and Protection
NCompass Live: Digital Preservation, Part 2: Storage and Protection
 
Metadata Workshop
Metadata WorkshopMetadata Workshop
Metadata Workshop
 
Needs for Data Management & Citation Throughout the Information Lifecycle
Needs for Data Management & Citation Throughout  the Information LifecycleNeeds for Data Management & Citation Throughout  the Information Lifecycle
Needs for Data Management & Citation Throughout the Information Lifecycle
 
PRESERVATION Web archiving
PRESERVATION  Web archivingPRESERVATION  Web archiving
PRESERVATION Web archiving
 
Xenit alfresco webinar on fred 1dot2 for end users
Xenit alfresco webinar on fred 1dot2 for end usersXenit alfresco webinar on fred 1dot2 for end users
Xenit alfresco webinar on fred 1dot2 for end users
 
Databases
DatabasesDatabases
Databases
 
JeromeDL Tutorial
JeromeDL TutorialJeromeDL Tutorial
JeromeDL Tutorial
 
e-Science, Research Data and Libaries
e-Science, Research Data and Libariese-Science, Research Data and Libaries
e-Science, Research Data and Libaries
 
KeepIt Course 4: Putting storage, format management and preservation planning...
KeepIt Course 4: Putting storage, format management and preservation planning...KeepIt Course 4: Putting storage, format management and preservation planning...
KeepIt Course 4: Putting storage, format management and preservation planning...
 
NISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data Services
NISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data ServicesNISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data Services
NISO Forum, Denver, Sept. 24, 2012: DataCite and Campus Data Services
 
Eudat user forum-london-11march2013-biovel-v3
Eudat user forum-london-11march2013-biovel-v3Eudat user forum-london-11march2013-biovel-v3
Eudat user forum-london-11march2013-biovel-v3
 

Viewers also liked

Digital forensics lessons
Digital forensics lessons   Digital forensics lessons
Digital forensics lessons Amr Nasr
 
CNZ2013 Keynote | Trust in Digital Preservation | Natalie Harrower
CNZ2013 Keynote | Trust in Digital Preservation | Natalie HarrowerCNZ2013 Keynote | Trust in Digital Preservation | Natalie Harrower
CNZ2013 Keynote | Trust in Digital Preservation | Natalie Harrowerdri_ireland
 
HNI leeromgeving 02
HNI leeromgeving 02HNI leeromgeving 02
HNI leeromgeving 02fneggers
 
Character profiles
Character profilesCharacter profiles
Character profilesEllenorHandy
 
Progress with FITS for analyzing video
Progress with FITS for analyzing videoProgress with FITS for analyzing video
Progress with FITS for analyzing videoprwheatley
 
Mapping the Digital Preservation Wilderness: What you need to know
Mapping the Digital Preservation Wilderness:  What you need to knowMapping the Digital Preservation Wilderness:  What you need to know
Mapping the Digital Preservation Wilderness: What you need to knowJody DeRidder
 
The lifecycle of a short story
The lifecycle of a short storyThe lifecycle of a short story
The lifecycle of a short storyLynne Thomas
 
HNI leeromgeving 04
HNI leeromgeving 04HNI leeromgeving 04
HNI leeromgeving 04fneggers
 
Cultural heritage collections in a web 2
Cultural heritage collections in a web 2Cultural heritage collections in a web 2
Cultural heritage collections in a web 2Lynne Thomas
 
HNI leeromgeving 05
HNI leeromgeving 05HNI leeromgeving 05
HNI leeromgeving 05fneggers
 
Digital Preservation
Digital PreservationDigital Preservation
Digital PreservationMichael Day
 
Challenges & opportunities in the preservation of (digital) information: the ...
Challenges & opportunities in the preservation of (digital) information: the ...Challenges & opportunities in the preservation of (digital) information: the ...
Challenges & opportunities in the preservation of (digital) information: the ...LIBER Europe
 
Rebecca Grant - Collection creation, management and ingest
Rebecca Grant - Collection creation, management and ingestRebecca Grant - Collection creation, management and ingest
Rebecca Grant - Collection creation, management and ingestdri_ireland
 

Viewers also liked (20)

Digital forensics lessons
Digital forensics lessons   Digital forensics lessons
Digital forensics lessons
 
CNZ2013 Keynote | Trust in Digital Preservation | Natalie Harrower
CNZ2013 Keynote | Trust in Digital Preservation | Natalie HarrowerCNZ2013 Keynote | Trust in Digital Preservation | Natalie Harrower
CNZ2013 Keynote | Trust in Digital Preservation | Natalie Harrower
 
HNI leeromgeving 02
HNI leeromgeving 02HNI leeromgeving 02
HNI leeromgeving 02
 
cv (2)
cv (2)cv (2)
cv (2)
 
Character profiles
Character profilesCharacter profiles
Character profiles
 
Progress with FITS for analyzing video
Progress with FITS for analyzing videoProgress with FITS for analyzing video
Progress with FITS for analyzing video
 
SHAREmodule2
SHAREmodule2SHAREmodule2
SHAREmodule2
 
Characterization of CDROMs for Emulation-based Access. Klaus Rechert, Thomas ...
Characterization of CDROMs for Emulation-based Access. Klaus Rechert, Thomas ...Characterization of CDROMs for Emulation-based Access. Klaus Rechert, Thomas ...
Characterization of CDROMs for Emulation-based Access. Klaus Rechert, Thomas ...
 
Mapping the Digital Preservation Wilderness: What you need to know
Mapping the Digital Preservation Wilderness:  What you need to knowMapping the Digital Preservation Wilderness:  What you need to know
Mapping the Digital Preservation Wilderness: What you need to know
 
The lifecycle of a short story
The lifecycle of a short storyThe lifecycle of a short story
The lifecycle of a short story
 
HNI leeromgeving 04
HNI leeromgeving 04HNI leeromgeving 04
HNI leeromgeving 04
 
SHAREmodule3
SHAREmodule3SHAREmodule3
SHAREmodule3
 
Cultural heritage collections in a web 2
Cultural heritage collections in a web 2Cultural heritage collections in a web 2
Cultural heritage collections in a web 2
 
ENArC- international cooperation, current and past project activities - statu...
ENArC- international cooperation, current and past project activities - statu...ENArC- international cooperation, current and past project activities - statu...
ENArC- international cooperation, current and past project activities - statu...
 
HNI leeromgeving 05
HNI leeromgeving 05HNI leeromgeving 05
HNI leeromgeving 05
 
Digital Preservation
Digital PreservationDigital Preservation
Digital Preservation
 
Electronic textbooks for studying of the archival sciences
Electronic textbooks for studying of the archival sciencesElectronic textbooks for studying of the archival sciences
Electronic textbooks for studying of the archival sciences
 
Challenges & opportunities in the preservation of (digital) information: the ...
Challenges & opportunities in the preservation of (digital) information: the ...Challenges & opportunities in the preservation of (digital) information: the ...
Challenges & opportunities in the preservation of (digital) information: the ...
 
Adding MediaConch to Archivematica for mkv/ffv1 checking
Adding MediaConch to Archivematica for mkv/ffv1 checkingAdding MediaConch to Archivematica for mkv/ffv1 checking
Adding MediaConch to Archivematica for mkv/ffv1 checking
 
Rebecca Grant - Collection creation, management and ingest
Rebecca Grant - Collection creation, management and ingestRebecca Grant - Collection creation, management and ingest
Rebecca Grant - Collection creation, management and ingest
 

Similar to Digital Forensics for Digital Archives

Systems, processes & how we stop the wheels falling off
Systems, processes & how we stop the wheels falling offSystems, processes & how we stop the wheels falling off
Systems, processes & how we stop the wheels falling offWellcome Library
 
Workshop 1 revised
Workshop 1 revisedWorkshop 1 revised
Workshop 1 revisedpeterchanws
 
Digital Forensics in the Archive
Digital Forensics in the ArchiveDigital Forensics in the Archive
Digital Forensics in the ArchiveGarethKnight
 
Analytics with unified file and object
Analytics with unified file and object Analytics with unified file and object
Analytics with unified file and object Sandeep Patil
 
Profile based Video segmentation system to support E-learning
Profile based Video segmentation system to support E-learningProfile based Video segmentation system to support E-learning
Profile based Video segmentation system to support E-learningGihan Wikramanayake
 
Lessons learned from the Digital Trenches: the experiences of two archivists ...
Lessons learned from the Digital Trenches: the experiences of two archivists ...Lessons learned from the Digital Trenches: the experiences of two archivists ...
Lessons learned from the Digital Trenches: the experiences of two archivists ...samalanmeister
 
SCA Accessioning Born-Digital Materials Workshop, Nov. 8, 2012
SCA Accessioning Born-Digital Materials Workshop, Nov. 8, 2012SCA Accessioning Born-Digital Materials Workshop, Nov. 8, 2012
SCA Accessioning Born-Digital Materials Workshop, Nov. 8, 2012peterchanws
 
Accessioning Born-Digital Materials
Accessioning Born-Digital MaterialsAccessioning Born-Digital Materials
Accessioning Born-Digital Materialspeterchanws
 
A little simple explanation abut Digital imaging
A little simple explanation abut Digital imagingA little simple explanation abut Digital imaging
A little simple explanation abut Digital imagingaechaa93
 
Cochrane von Suchodoletz File Creation, Rendering and Formats
Cochrane von Suchodoletz File Creation, Rendering and FormatsCochrane von Suchodoletz File Creation, Rendering and Formats
Cochrane von Suchodoletz File Creation, Rendering and FormatsFuture Perfect 2012
 
Bit Level Preservation
Bit Level PreservationBit Level Preservation
Bit Level PreservationMicah Altman
 
January 2006 Document Scanning Considerations Presentation
January 2006 Document Scanning Considerations PresentationJanuary 2006 Document Scanning Considerations Presentation
January 2006 Document Scanning Considerations PresentationJohn Wang
 
Getting Bits off Disks: Using open source tools to stabilize and prepare born...
Getting Bits off Disks: Using open source tools to stabilize and prepare born...Getting Bits off Disks: Using open source tools to stabilize and prepare born...
Getting Bits off Disks: Using open source tools to stabilize and prepare born...samalanmeister
 
Understanding EDP (Electronic Data Processing) Environment
Understanding EDP (Electronic Data Processing) EnvironmentUnderstanding EDP (Electronic Data Processing) Environment
Understanding EDP (Electronic Data Processing) EnvironmentAdetula Bunmi
 
Tackling File Characterization and Analysis in Archivematica
Tackling File Characterization and Analysis in ArchivematicaTackling File Characterization and Analysis in Archivematica
Tackling File Characterization and Analysis in ArchivematicaCourtney Mumma
 
Lect 21 components_of_database_management_system
Lect 21 components_of_database_management_systemLect 21 components_of_database_management_system
Lect 21 components_of_database_management_systemnadine016
 

Similar to Digital Forensics for Digital Archives (20)

Systems, processes & how we stop the wheels falling off
Systems, processes & how we stop the wheels falling offSystems, processes & how we stop the wheels falling off
Systems, processes & how we stop the wheels falling off
 
Workshop 1 revised
Workshop 1 revisedWorkshop 1 revised
Workshop 1 revised
 
Digital Forensics in the Archive
Digital Forensics in the ArchiveDigital Forensics in the Archive
Digital Forensics in the Archive
 
Nov 2010 HUG: Fuzzy Table - B.A.H
Nov 2010 HUG: Fuzzy Table - B.A.HNov 2010 HUG: Fuzzy Table - B.A.H
Nov 2010 HUG: Fuzzy Table - B.A.H
 
Analytics with unified file and object
Analytics with unified file and object Analytics with unified file and object
Analytics with unified file and object
 
Profile based Video segmentation system to support E-learning
Profile based Video segmentation system to support E-learningProfile based Video segmentation system to support E-learning
Profile based Video segmentation system to support E-learning
 
Lessons learned from the Digital Trenches: the experiences of two archivists ...
Lessons learned from the Digital Trenches: the experiences of two archivists ...Lessons learned from the Digital Trenches: the experiences of two archivists ...
Lessons learned from the Digital Trenches: the experiences of two archivists ...
 
SCA Accessioning Born-Digital Materials Workshop, Nov. 8, 2012
SCA Accessioning Born-Digital Materials Workshop, Nov. 8, 2012SCA Accessioning Born-Digital Materials Workshop, Nov. 8, 2012
SCA Accessioning Born-Digital Materials Workshop, Nov. 8, 2012
 
Accessioning Born-Digital Materials
Accessioning Born-Digital MaterialsAccessioning Born-Digital Materials
Accessioning Born-Digital Materials
 
A little simple explanation abut Digital imaging
A little simple explanation abut Digital imagingA little simple explanation abut Digital imaging
A little simple explanation abut Digital imaging
 
Cochrane von Suchodoletz File Creation, Rendering and Formats
Cochrane von Suchodoletz File Creation, Rendering and FormatsCochrane von Suchodoletz File Creation, Rendering and Formats
Cochrane von Suchodoletz File Creation, Rendering and Formats
 
Bit Level Preservation
Bit Level PreservationBit Level Preservation
Bit Level Preservation
 
Electronic Records
Electronic RecordsElectronic Records
Electronic Records
 
January 2006 Document Scanning Considerations Presentation
January 2006 Document Scanning Considerations PresentationJanuary 2006 Document Scanning Considerations Presentation
January 2006 Document Scanning Considerations Presentation
 
Getting Bits off Disks: Using open source tools to stabilize and prepare born...
Getting Bits off Disks: Using open source tools to stabilize and prepare born...Getting Bits off Disks: Using open source tools to stabilize and prepare born...
Getting Bits off Disks: Using open source tools to stabilize and prepare born...
 
Module 02 ftk imager
Module 02 ftk imagerModule 02 ftk imager
Module 02 ftk imager
 
Understanding EDP (Electronic Data Processing) Environment
Understanding EDP (Electronic Data Processing) EnvironmentUnderstanding EDP (Electronic Data Processing) Environment
Understanding EDP (Electronic Data Processing) Environment
 
Tackling File Characterization and Analysis in Archivematica
Tackling File Characterization and Analysis in ArchivematicaTackling File Characterization and Analysis in Archivematica
Tackling File Characterization and Analysis in Archivematica
 
Lect 21 components_of_database_management_system
Lect 21 components_of_database_management_systemLect 21 components_of_database_management_system
Lect 21 components_of_database_management_system
 
Windows Forensic 101
Windows Forensic 101Windows Forensic 101
Windows Forensic 101
 

More from Mark Matienzo

To Hell With Good Intentions: Linked Data and the Power to Name
To Hell With Good Intentions: Linked Data and the Power to NameTo Hell With Good Intentions: Linked Data and the Power to Name
To Hell With Good Intentions: Linked Data and the Power to NameMark Matienzo
 
Linked Data and the Semantic Web in the Archival Context
Linked Data and the Semantic Web in the Archival ContextLinked Data and the Semantic Web in the Archival Context
Linked Data and the Semantic Web in the Archival ContextMark Matienzo
 
Archival Sensemaking: Personal Digital Archiving as an Iteration
Archival Sensemaking: Personal Digital Archiving as an IterationArchival Sensemaking: Personal Digital Archiving as an Iteration
Archival Sensemaking: Personal Digital Archiving as an IterationMark Matienzo
 
Findability in the Flow: Discovery through Linking
Findability in the Flow: Discovery through LinkingFindability in the Flow: Discovery through Linking
Findability in the Flow: Discovery through LinkingMark Matienzo
 
Learning to Take, Learning to Give: Linking as Repurposing Metadata
Learning to Take, Learning to Give: Linking as Repurposing MetadataLearning to Take, Learning to Give: Linking as Repurposing Metadata
Learning to Take, Learning to Give: Linking as Repurposing MetadataMark Matienzo
 
EAD and MARC sitting in a tree: D-R-U-P-A-L
EAD and MARC sitting in a tree: D-R-U-P-A-LEAD and MARC sitting in a tree: D-R-U-P-A-L
EAD and MARC sitting in a tree: D-R-U-P-A-LMark Matienzo
 
Online Presence and Participation
Online Presence and ParticipationOnline Presence and Participation
Online Presence and ParticipationMark Matienzo
 
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and ConflictsLinked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and ConflictsMark Matienzo
 
Cheeseburgers With Everything: Context, Content, and Connections in Archival ...
Cheeseburgers With Everything: Context, Content, and Connections in Archival ...Cheeseburgers With Everything: Context, Content, and Connections in Archival ...
Cheeseburgers With Everything: Context, Content, and Connections in Archival ...Mark Matienzo
 
Archives & the Semantic Web
Archives & the Semantic WebArchives & the Semantic Web
Archives & the Semantic WebMark Matienzo
 
How I failed to present on using DVCS to control archival metadata
How I failed to present on using DVCS to control archival metadataHow I failed to present on using DVCS to control archival metadata
How I failed to present on using DVCS to control archival metadataMark Matienzo
 

More from Mark Matienzo (11)

To Hell With Good Intentions: Linked Data and the Power to Name
To Hell With Good Intentions: Linked Data and the Power to NameTo Hell With Good Intentions: Linked Data and the Power to Name
To Hell With Good Intentions: Linked Data and the Power to Name
 
Linked Data and the Semantic Web in the Archival Context
Linked Data and the Semantic Web in the Archival ContextLinked Data and the Semantic Web in the Archival Context
Linked Data and the Semantic Web in the Archival Context
 
Archival Sensemaking: Personal Digital Archiving as an Iteration
Archival Sensemaking: Personal Digital Archiving as an IterationArchival Sensemaking: Personal Digital Archiving as an Iteration
Archival Sensemaking: Personal Digital Archiving as an Iteration
 
Findability in the Flow: Discovery through Linking
Findability in the Flow: Discovery through LinkingFindability in the Flow: Discovery through Linking
Findability in the Flow: Discovery through Linking
 
Learning to Take, Learning to Give: Linking as Repurposing Metadata
Learning to Take, Learning to Give: Linking as Repurposing MetadataLearning to Take, Learning to Give: Linking as Repurposing Metadata
Learning to Take, Learning to Give: Linking as Repurposing Metadata
 
EAD and MARC sitting in a tree: D-R-U-P-A-L
EAD and MARC sitting in a tree: D-R-U-P-A-LEAD and MARC sitting in a tree: D-R-U-P-A-L
EAD and MARC sitting in a tree: D-R-U-P-A-L
 
Online Presence and Participation
Online Presence and ParticipationOnline Presence and Participation
Online Presence and Participation
 
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and ConflictsLinked Data and Archival Description: Confluences, Contingencies, and Conflicts
Linked Data and Archival Description: Confluences, Contingencies, and Conflicts
 
Cheeseburgers With Everything: Context, Content, and Connections in Archival ...
Cheeseburgers With Everything: Context, Content, and Connections in Archival ...Cheeseburgers With Everything: Context, Content, and Connections in Archival ...
Cheeseburgers With Everything: Context, Content, and Connections in Archival ...
 
Archives & the Semantic Web
Archives & the Semantic WebArchives & the Semantic Web
Archives & the Semantic Web
 
How I failed to present on using DVCS to control archival metadata
How I failed to present on using DVCS to control archival metadataHow I failed to present on using DVCS to control archival metadata
How I failed to present on using DVCS to control archival metadata
 

Recently uploaded

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 

Recently uploaded (20)

A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Digital Forensics for Digital Archives

  • 1. Digital Forensics for Digital Archives Mark A. Matienzo Manuscripts and Archives, Yale University Library Digital Preservation 2012 Arlington, VA July 25, 2012
  • 5. Digital Forensics + Digital Archives
  • 6. Design Principles •Use digital forensics software and methodology to support accessioning, arrangement, and description of born-digital archival records •Mitigate risk of media deterioration and obsolescence •Prefer open source solutions whenever possible •Integrate into a larger, but yet-to-be-defined workflow
  • 7. Applied Methodology •Use Carrier’s (2005) model of the digital investigation process: Preservation Searching Reconstruction •Volume and file system as main areas for analysis •Assume much of the state is already lost •Methods should approach or intend forensic soundness
  • 8. Start accessioning Write-protect media Verify image process Media Record identifying Extract filesystem- Disk Meta- Retrieve media characteristics of and file-level images data media as metadata metadata Transfer package Package images Assign identifiers to Ingest transfer Create image and metadata for media package ingest Media FS/File Document Disk MD MD accessioning image process End accessioning process
  • 9. Documenting Media •SharePoint-based list •Unique identifiers for each piece •Allows basic documentation of imaging process
  • 10.
  • 11.
  • 12. Imaging Media • Requires a combination of hardware and software • In some cases, software depends on particular hardware • No single universal solution for our workflow
  • 13. Imaging Hardware • Drives (eg floppy drives, flash cardreaders) • Interface cards (Catweasel, Kryoflux, FC5025) • Writeblockers
  • 14.
  • 15. Imaging Software • FTK Imager (proprietary; gratis) • Hardware-specific imaging software for floppy interface cards • Other software tested: dd, Guymager, etc.
  • 16.
  • 17.
  • 18. Metadata Extraction • Desire to repurpose existing information as archival description and reports to other staff • Ideal output is XML; can be packaged with disk images going into medium- or long-term storage • Tools: Fiwalk/The Sleuth Kit; FTK Imager; testing others • Have integrated file-format identification (using OPF’s FIDO) and virus/malware recognition (using ClamAV) using Fiwalk’s plugin architecture
  • 19. Sample DFXML Output <?xml version='1.0' encoding='UTF-8'?> <dfxml version='1.0'> <metadata xmlns='http://www.forensicswiki.org/wiki/Category:Digital_Forensics_XML' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns:dc='http://purl.org/dc/elements/1.1/'> <dc:type>Disk Image</dc:type> </metadata> <creator version='1.0'> <!-- provenance information re: extraction - software used; operating system --> </creator> <source> <image_filename>2004-M-088.0018.dd</image_filename> </source> <volume offset='0'><!-- partitions within each disk image --> <fileobject><!-- files within each partition --></fileobject> </volume> <runstats><!-- performance and other statistics --></runstats> </dfxml>
  • 20. Sample DFXML Output <fileobject> <filename>_ublist1.wpd</filename> <partition>1</partition> <id>1</id> <name_type>r</name_type> <filesize>202152</filesize> <unalloc>1</unalloc> <used>1</used> <inode>3</inode> <meta_type>1</meta_type> <mode>511</mode> <nlink>0</nlink> <uid>0</uid> <gid>0</gid> <mtime>2001-02-22T22:30:52Z</mtime> <atime>2001-02-22T05:00:00Z</atime> <crtime>2001-02-22T22:31:54Z</crtime> <libmagic>(Corel/WP)</libmagic> <byte_runs> <byte_run file_offset='0' fs_offset='16896' img_offset='16896' len='512'/> </byte_runs> <hashdigest type='md5'>d7bc22242c0a88fd8b68712980d5ab28</hashdigest> <hashdigest type='sha1'>64bf2bdf82e33fcda50158804483ac611e753db5</hashdigest> </fileobject>
  • 21. Analysis/Processing • Once acquired, we can perform additional analysis or reporting to captured assets or records • Few tools are easily useable by archivists (BitCurator toolset under development will help) • Additional forensic tools can be used for archival arrangement and description of this information
  • 22. Forensic Toolkit • Proprietary application to analyze files, filesystems, etc. • Provides full-text indexing, tagging, bookmarking, file presentation/viewing, and reporting • Used at Yale, Stanford, and other institutions for archival processing of born-digital records • Still a challenge to use given the complexity of the application
  • 23.
  • 24. Gumshoe • Prototype based on Blacklight (Ruby on Rails + Solr) • Indexing code works with fiwalk output or directly from a disk image • Populates Solr index with all file-level metadata from fiwalk and, optionally, text strings extracted from files • Provides searching, sorting and faceting based on metadata extracted from filesystems and files • Code at http://github.com/anarchivist/gumshoe
  • 25.
  • 26. Advantages •Faster (and more forensically sound) to extract metadata once rather than having to keep processing an image •Develop better assessments during accessioning process (directory structure significant? timestamps accurate?) •Integrating additional extraction processes and building supplemental tools takes less time
  • 27. Limitations • Use of tools limited to specific types of filesystems • Additional software requires additional integration and data normalization • DFXML is not (currently) a metadata format common within domains of archives/libraries • Extracted metadata maybe harder to repurpose for descriptive purposes based on level of granularity
  • 28. Work in Progress • BitCurator project under development; early release available for testing: http://wiki.bitcurator.net • The Sleuth Kit and related tools under development (Autopsy, fiwalk, etc.): http://sleuthkit.org • Additional testing and integration under work at Yale, using DFXML as common schema whenever possible • Possible development of a new media log to record media/imaging metadata and workflow status
  • 29. Thanks! Mark A. Matienzo mark.matienzo@yale.edu http://matienzo.org @anarchivist