Sally Kleinfeldt and Aaron VanDerlip describe ore.bigfile, a minimalist solution to the problem of uploading, downloading, and versioning very large files in Plone.
Large Files without the Trials
1. Large Files Without the Trials
Aaron VanDerlip and Sally Kleinfeldt
Plone Symposium East 2010
Thursday, June 3, 2010
2. Acknowledgments
• Bioneers provides environmental education
and social connectivity through
conferences, radio and TV, books, and online
materials
• Engaged Jazkarta to build a file asset server
based on Plone to help them organize,
capture, and store multimedia and textual
content with files as large as 5 GB.
8. Uploading Big Files
• Browser encodes file in multipart MIME
format
• Zope must undo this encoding
• CPU and memory intensive, and SLOW
• Zope thread is blocked during this process
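To make the cost concrete, here is a minimal sketch of the decoding step using Python's standard email package. It is not the code Zope runs; build_upload, parse_upload, and the boundary value are hypothetical names for illustration. The whole request body is held in memory and scanned for boundaries, which is the work the slide calls CPU and memory intensive.

```python
from email import message_from_bytes

BOUNDARY = b"sketch-boundary"  # hypothetical; browsers generate their own


def build_upload(field, filename, payload):
    """Encode one file the way a browser form post would (multipart/form-data)."""
    content_type = "multipart/form-data; boundary=" + BOUNDARY.decode("ascii")
    disposition = 'Content-Disposition: form-data; name="%s"; filename="%s"' % (
        field, filename)
    body = b"\r\n".join([
        b"--" + BOUNDARY,
        disposition.encode("ascii"),
        b"Content-Type: application/octet-stream",
        b"",
        payload,
        b"--" + BOUNDARY + b"--",
        b"",
    ])
    return content_type, body


def parse_upload(content_type, body):
    """Undo the encoding: the entire body is in memory while it is parsed."""
    raw = b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body
    message = message_from_bytes(raw)
    files = {}
    for part in message.get_payload():
        field = part.get_param("name", header="content-disposition")
        files[field] = (part.get_filename(), part.get_payload(decode=True))
    return files
```

With a 5 GB upload, both the raw body and the decoded payload would pass through the application process, and the handling thread stays busy for the duration.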
10. Learning from Rails
• Get file encoding/decoding and read/write
operations out of Plone
• Web servers are really good at this:
Apache, Nginx, and Lighttpd
• Our implementation uses Apache
• Apache file streaming is fast and threads
are cheap
11. Learning from Rails
• Uploads: Apache plus mod_porter
http://therailsway.com/tags/porter
• Downloads: Apache plus mod_xsendfile
http://john.guen.in/past/2007/4/17/send_files_faster_with_xsendfile/
• ...and of course ZODB Blob storage
12. Mod Porter
• Parses the multipart MIME data
• Writes the file to disk
• Changes the Request to contain a pointer
to the temp file on disk
• All done efficiently in C code inside your
Apache process
14. Apache Config for Mod Porter
LoadModule apreq_module /usr/lib/Apache2/modules/mod_apreq2.so
LoadModule porter_module /usr/lib/Apache2/modules/mod_porter.so
# Apache has a default read limit of 64MB, set it higher
APREQ2_ReadLimit 2G
...
Porter On
# Files below this size will not be handled by mod_porter
PorterMinSize 14M
# Where the uploaded files are stored
PorterDir /mnt/uploads-Apache
15. X-Sendfile
• HTTP header
• Set an X-Sendfile header and the path of a
file on your response
• Apache does the rest
16. Apache Config for X-Sendfile
LoadModule xsendfile_module /usr/lib/Apache2/modules/mod_xsendfile.so
...
EnableSendfile On
XSendFile on
# Config to send file resources directly from blob storage
XSendFilePath /mnt/bioneers/var/blobstorage
17. Using X-Sendfile from Python
def download(self, response, file_path):
    # Apache (mod_xsendfile) sees this header and streams the file itself
    response.setHeader("X-Sendfile", file_path)
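The same idea can be sketched outside Zope as a plain WSGI application. This is an illustration under assumptions, not the ore.bigfile code: make_download_app and the lookup callable are hypothetical, and the path check is an extra precaution added for the sketch. The application returns an empty body and only names the file; the web server is expected to replace the response with the file contents.

```python
import os


def make_download_app(blob_root, lookup):
    """Build a WSGI app that delegates file delivery to the web server.

    lookup (hypothetical) maps a request path to a path relative to
    blob_root, or returns None when there is no such file.
    """
    root = os.path.realpath(blob_root)

    def app(environ, start_response):
        relative = lookup(environ.get("PATH_INFO", ""))
        if relative is None:
            start_response("404 Not Found", [("Content-Type", "text/plain")])
            return [b"not found"]
        full_path = os.path.realpath(os.path.join(root, relative))
        # Refuse anything that resolves outside the blob directory.
        if os.path.commonpath([root, full_path]) != root:
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"forbidden"]
        start_response("200 OK", [
            ("Content-Type", "application/octet-stream"),
            ("X-Sendfile", full_path),
        ])
        # No file bytes pass through Python.
        return [b""]

    return app
```

Apache only honors the header for paths allowed by its own configuration (XSendFilePath in the config shown earlier), so the check in the sketch is a second line of defense rather than the only one.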
18. Blob Storage
• Uploads
  • Blob.consumeFile moves file from
    Apache’s temp area to blob storage
    (ZODB/blob.py)
  • Uses os.rename, file never enters Plone
• Downloads
  • Served directly from blob storage
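The upload side relies on a rename rather than a copy. The sketch below shows that step with the standard library only; consume_upload and its arguments are hypothetical names, and it stands in for what Blob.consumeFile does rather than reproducing it. It assumes the temp area and the storage directory are on the same filesystem, since os.rename fails across filesystems.

```python
import os
import tempfile


def consume_upload(temp_path, storage_dir, blob_name):
    """Move an already-written upload into storage without reading its bytes."""
    os.makedirs(storage_dir, exist_ok=True)
    target = os.path.join(storage_dir, blob_name)
    # A rename only updates directory entries, so cost does not grow
    # with file size. Raises OSError if the two paths are on different
    # filesystems.
    os.rename(temp_path, target)
    return target


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    upload = os.path.join(work, "porter-upload.tmp")
    with open(upload, "wb") as handle:
        handle.write(b"pretend this is 5 GB")
    stored = consume_upload(upload, os.path.join(work, "blobstorage"), "0001.blob")
    print(stored, os.path.exists(upload))
```

This is why the PorterDir and blob storage locations in the Apache config matter: keeping them on one volume keeps the move cheap.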
20. What About Really Really Big Files?
• Use FTP
• Supports continuation and batching
• Handles files too large for browser limits
• Content editors use FTP to transfer files to
an upload directory
23. ore.bigfile
• Minimally intrusive, works with the grain of
Plone
• Provides Big File content type
• IFrontendFileServer interface defines two
methods that provide web server support
for upload and download
• Apache and Nginx implementations
provided
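A two-method interface with one implementation per web server could look roughly like the sketch below. The slides do not give the real method names or signatures of IFrontendFileServer, so uploaded_path, download_headers, the "upload_path" form key, and the "/protected/" prefix are all hypothetical. The X-Sendfile and X-Accel-Redirect header names are the ones Apache's mod_xsendfile and Nginx actually use.

```python
import os
from abc import ABC, abstractmethod


class FrontendFileServer(ABC):
    """Hypothetical stand-in for an interface with upload and download hooks."""

    @abstractmethod
    def uploaded_path(self, form):
        """Return the on-disk path of a file the web server already saved."""

    @abstractmethod
    def download_headers(self, file_path):
        """Return headers asking the web server to send file_path."""


class ApacheFileServer(FrontendFileServer):
    def uploaded_path(self, form):
        # Hypothetical key; the real request field is not shown in the slides.
        return form.get("upload_path")

    def download_headers(self, file_path):
        # mod_xsendfile takes a filesystem path.
        return {"X-Sendfile": file_path}


class NginxFileServer(FrontendFileServer):
    def __init__(self, blob_root, internal_prefix="/protected/"):
        self.blob_root = blob_root
        self.internal_prefix = internal_prefix

    def uploaded_path(self, form):
        return form.get("upload_path")

    def download_headers(self, file_path):
        # Nginx takes an internal URI, so map the path under a prefix
        # that an "internal" location would serve from blob_root.
        relative = os.path.relpath(file_path, self.blob_root)
        uri = self.internal_prefix + relative.replace(os.sep, "/")
        return {"X-Accel-Redirect": uri}
```

Keeping the server-specific details behind two methods means the content type itself does not need to know which front end is deployed.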
24. ore.bigfile Limitations
• Upload directory is hardcoded
• Possible errors on very large image
uploads, which Mod Porter also intercepts
26. Solution
• Bypass CMFEditions - no file size limitation
• Create a new version only when file
changes (not metadata)
• Allow old versions to be purged
• Version information stored on Big File
object using annotations
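The versioning rules above can be sketched as a small class. This is an illustration, not the ore.bigfile implementation: VersionedBigFile, VERSIONS_KEY, and the method names are hypothetical, a plain dict stands in for Zope annotations, and comparing a caller-supplied digest is an assumed way of deciding that the file changed.

```python
VERSIONS_KEY = "bigfile.versions"  # hypothetical annotation key


class VersionedBigFile:
    """Keeps version records on the object itself, bypassing CMFEditions."""

    def __init__(self):
        self.annotations = {}  # stand-in for the object's annotations
        self.title = ""        # metadata; editing it never adds a version

    def versions(self):
        return self.annotations.setdefault(VERSIONS_KEY, [])

    def store_file(self, blob_path, digest):
        """Record a new version only if the file content changed."""
        versions = self.versions()
        if versions and versions[-1]["digest"] == digest:
            return None
        number = versions[-1]["number"] + 1 if versions else 1
        versions.append({"number": number, "path": blob_path, "digest": digest})
        return number

    def purge(self, number):
        """Drop an old version; the current one cannot be purged."""
        versions = self.versions()
        if versions and versions[-1]["number"] == number:
            raise ValueError("cannot purge the current version")
        self.annotations[VERSIONS_KEY] = [
            v for v in versions if v["number"] != number
        ]
```

Each record holds only a path and a digest, so the version history stays small regardless of file size; purging a record would also be the point at which the old blob can be deleted from disk.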
28. Conclusion
• ore.bigfile solves the Big File problem for a
particular use case, not feature complete
• It does so by taking advantage of mature
web server technology
• The code is minimally intrusive
• It provides a strategy for implementation
we can learn from as we improve Plone’s
Big File story