SCAPE
Johan van der Knijff
Koninklijke Bibliotheek – National Library of the Netherlands
DPC, PDF/A-3 Briefing, Leeds, 13.3.2013
PDF/A-3 for preservation
Notes on embedded files and JPEG 2000
Part 1: Embedded files
PDF/A-3: embedding of any file (type)
Key point:
Use of “embedded files” really means “embedded file streams” = a specific data structure in PDF!
File specification dictionary
31 0 obj
<</Type /Filespec /F (mysvg.svg) /EF <</F 32 0 R>> >>
endobj
The EF key points to the embedded file stream.
Embedded file stream
32 0 obj
<</Type /EmbeddedFile /Subtype /image#2Fsvg+xml /Length 72>>
stream
…SVG Data…
endstream
endobj
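The link between the two objects above can be followed programmatically. The following is a minimal Python sketch, not a real PDF parser: it assumes an uncompressed PDF in which the objects appear as plain text (no object streams, no nested dictionaries before the keys of interest), and the function names `decode_pdf_name` and `embedded_files` are made up for this illustration. It also shows that the `#2F` in the Subtype name is simply a hex-escaped slash, so the value reads as the MIME type `image/svg+xml`.

```python
import re

# The two objects from the slides, as they might appear in an uncompressed PDF.
PDF_FRAGMENT = b"""31 0 obj
<</Type /Filespec /F (mysvg.svg) /EF <</F 32 0 R>> >>
endobj
32 0 obj
<</Type /EmbeddedFile /Subtype /image#2Fsvg+xml /Length 72>>
stream
...SVG Data...
endstream
endobj
"""


def decode_pdf_name(name: bytes) -> str:
    """Resolve #xx hex escapes in a PDF name, e.g. image#2Fsvg+xml."""
    decoded = re.sub(rb"#([0-9A-Fa-f]{2})",
                     lambda m: bytes([int(m.group(1), 16)]), name)
    return decoded.decode("latin-1")


def embedded_files(pdf: bytes):
    """Yield (file name, subtype) for each file specification dictionary found."""
    filespec = re.compile(
        rb"/Type\s*/Filespec.*?/F\s*\((.*?)\).*?/EF\s*<<\s*/F\s+(\d+)\s+(\d+)\s+R",
        re.S)
    for m in filespec.finditer(pdf):
        fname, objnum, gen = m.group(1), m.group(2), m.group(3)
        # Follow the indirect reference (e.g. "32 0 R") to the stream object.
        obj = re.search(rb"(?<!\d)" + objnum + rb"\s+" + gen + rb"\s+obj(.*?)stream",
                        pdf, re.S)
        subtype = None
        if obj:
            st = re.search(rb"/Subtype\s*/([^\s/<>\[\]()]+)", obj.group(1))
            if st:
                subtype = decode_pdf_name(st.group(1))
        yield fname.decode("latin-1"), subtype


for name, subtype in embedded_files(PDF_FRAGMENT):
    print(name, subtype)
```

For real-world files a proper PDF library is the safer choice, since most PDFs compress or reorganise their objects in ways a regular expression cannot see.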
Uses of embedded file streams
File attachments not meant to be rendered by viewer
File attachment annotation
EmbeddedFiles entry in name dictionary
PDF/A-3
Rendered in/by PDF viewer
Rendition actions
Screen annotations
PDF/A-3
What about inline images?
Not based on “embedded file stream”, but on the “Image XObject” data structure (allows a limited set of pre-defined formats)
Embedded files wrap-up:
No impact on content that is meant to be rendered by the PDF viewer
But a PDF/A-3 file may contain files of any format as attachments
Part 2: JPEG 2000
Supported since PDF/A-2
Image XObject
1614 0 obj
<</Subtype/Image/Width 615/Height 978/ColorSpace/DeviceRGB
/BitsPerComponent 8/Interpolate true/Length 5278
/Filter/JPXDecode>>
stream
… Image data …
::
::
endstream
endobj
The /Filter /JPXDecode entry identifies the object as a JPEG 2000 image.
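A quick way to see whether a PDF contains JPEG 2000 images at all is to look for image XObjects whose filter is JPXDecode. The Python sketch below is only a rough heuristic under stated assumptions: it scans raw bytes, so it only sees objects that are stored uncompressed and whose dictionary has no nested `<< >>`, and it only recognises JPXDecode when it is the sole or first filter. The function name `jpx_image_objects` is hypothetical.

```python
import re

# The image XObject header from the slides (stream data omitted).
IMAGE_XOBJECT = b"""1614 0 obj
<</Subtype/Image/Width 615/Height 978/ColorSpace/DeviceRGB
/BitsPerComponent 8/Interpolate true/Length 5278
/Filter/JPXDecode>>
stream
"""


def jpx_image_objects(pdf: bytes):
    """Return object numbers of image XObjects whose filter is JPXDecode."""
    hits = []
    pattern = re.compile(rb"(\d+)\s+(\d+)\s+obj\s*<<(.*?)>>\s*stream", re.S)
    for m in pattern.finditer(pdf):
        dictionary = m.group(3)
        is_image = re.search(rb"/Subtype\s*/Image\b", dictionary)
        # Accept both "/Filter /JPXDecode" and "/Filter [/JPXDecode ...]".
        is_jpx = re.search(rb"/Filter\s*(\[\s*)?/JPXDecode\b", dictionary)
        if is_image and is_jpx:
            hits.append(int(m.group(1)))
    return hits


print(jpx_image_objects(IMAGE_XOBJECT))
```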
ISO 19005-2 (PDF/A-2):
JPEG 2000 support is based on a subset of JPEG 2000 Part 2 (JPX baseline)
Only Part 1 of the standard (JP2) is commonly used for archival applications!
JP2 vs JPX
JP2: JPEG 2000 Part 1, the basic still image format
JPX: JPEG 2000 Part 2 = JP2 + assorted advanced stuff …
Fragmented codestreams
Allowed in JPX Baseline!
Open-source PDF viewers and their JPEG 2000 libraries
Ghostscript: OpenJPEG or JasPer
Evince: OpenJPEG
MuPDF: OpenJPEG
Firefox PDF viewer: built-in decoder
None of these libraries support fragmented codestreams!
Is it really a problem?
Fragmented codestreams are extremely rare
But why is this feature even allowed in a long-term archival format?
Open-source support of JPEG 2000 in general remains problematic
Funding
This work was partially supported by the SCAPE Project. The SCAPE project is co-funded by the European Union under FP7 ICT-2009.4.1 (Grant Agreement number 270137).
#SCAPEProject
http://www.scape-project.eu
