2. Motivation
• Get a quick overview of file attributes for multiple files
• Compare attribute values between files
• Identify groups of files
• Reuse overview results
• Avoid “opening” files with applications
20. PDF-Plugin (1)
class PDFInvestigator(BaseInvestigator):
    "A class for determining attributes of PDF files."

    attrMap = {
        "title": "getTitle",
        "author": "getAuthor",
        "producer": "getProducer",
        "creationdate": "getCreationDate",
        "npages": "getNumPdfPages",
        "nimgs": "getNumImages",
    }

    totals = ("npages", "nimgs")

    def activate(self):
        "Try activating self, setting 'active' variable."
        # calculate self.active...
        return self.active
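The slide does not show how attrMap and totals are consumed. One plausible reading is that the base class looks up the method name for a requested attribute and calls it, and that attributes listed in totals are summed over all files. The sketch below illustrates that idea only; this BaseInvestigator body, getAttrValue, DemoInvestigator and overview are hypothetical names and are not taken from fileinfo.

```python
class BaseInvestigator:
    """Hypothetical base class; the real fileinfo base class may differ."""

    attrMap = {}   # attribute name -> name of the method computing it
    totals = ()    # attribute names assumed to be summable over files

    def __init__(self, path):
        self.path = path

    def getAttrValue(self, name):
        """Look up the method registered for 'name' and call it."""
        method = getattr(self, self.attrMap[name])
        return method()


class DemoInvestigator(BaseInvestigator):
    """Toy plug-in with two attributes (illustration only)."""

    attrMap = {"name": "getName", "nchars": "getNumChars"}
    totals = ("nchars",)

    def getName(self):
        return self.path

    def getNumChars(self):
        return len(self.path)


def overview(investigators, attrs):
    """Return one row of attribute values per file, plus a totals row."""
    rows = [[inv.getAttrValue(a) for a in attrs] for inv in investigators]
    cls = type(investigators[0])
    total_row = [
        sum(row[i] for row in rows) if a in cls.totals else ""
        for i, a in enumerate(attrs)
    ]
    return rows, total_row
```

With this layout a new plug-in only has to supply the mapping and the getter methods; the dispatch and the totals logic stay in one place.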
21. PDF-Plugin (2)
    def getNumPdfPages(self):
        "Return the number of pages in a PDF document."
        try:
            # uses PyPdf
            res = self.input.getNumPages()
        except Exception:
            res = "n/a"
        return res
22. PDF-Plugin (3)
    def getNumImages(self):
        "Return the number of images in a PDF document."
        # Match each indirect object ("<num> <gen> obj ... endobj").
        expr = r"\d+ +\d+ +obj.*?endobj\s+(?:%.*?[\r\n])?"
        objPat = re.compile(expr, re.M | re.S)
        items = re.findall(objPat, self.content)
        # Keep only objects tagged as both /Type /XObject and /Subtype /Image.
        for p in [re.compile(r"/%s\s*/%s" % (k, v), re.M | re.S)
                  for (k, v) in [("Type", "XObject"), ("Subtype", "Image")]]:
            items = [i for i in items if re.search(p, i) is not None]
        return len(items)
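To see the regular expressions at work outside the class, the same filtering can be tried on a plain string. This is a minimal standalone sketch: count_images and SAMPLE are hypothetical names, and SAMPLE is a hand-written fragment that only imitates PDF object syntax, not a valid PDF file.

```python
import re

# One indirect object, "<num> <gen> obj ... endobj", plus an optional
# trailing comment line.
OBJ_PAT = re.compile(r"\d+ +\d+ +obj.*?endobj\s+(?:%.*?[\r\n])?", re.M | re.S)


def count_images(content):
    """Count objects tagged both /Type /XObject and /Subtype /Image."""
    items = OBJ_PAT.findall(content)
    for key, value in [("Type", "XObject"), ("Subtype", "Image")]:
        pat = re.compile(r"/%s\s*/%s" % (key, value))
        items = [i for i in items if pat.search(i)]
    return len(items)


# Hand-written sample: one catalog, two image XObjects, one form XObject.
SAMPLE = (
    "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    "2 0 obj\n<< /Type /XObject /Subtype /Image /Width 10 >>\nendobj\n"
    "3 0 obj\n<< /Type /XObject /Subtype /Form >>\nendobj\n"
    "4 0 obj\n<< /Type/XObject /Subtype/Image >>\nendobj\n"
)

print(count_images(SAMPLE))
```

A text scan like this only sees objects stored in plain form; objects packed into compressed object streams would not be counted.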
24. Spotlight
• Desktop file search
• Mac OS X 10.4 and 10.5
• Deeply integrated in Mac OS X
• Index-based, with attributes
• Results based on relevance and recency
• Plug-ins/API for custom file formats
• GUI & command-line
28. Spotlight – Pro
• Great index/search technology
• Very fast, useful and easy to use
• ~125 search attributes in Mac OS X 10.5
(e.g. Aperture, Composer, …)
• Extensible (Python plug-in available)
29. Spotlight – Con
• Result on command-line not in table form
• Result in GUI is always a list of file names plus the attributes that the Finder (!) knows
• Weak on providing overview
• Mac OS X only
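The missing table form is easy to picture with a small sketch. It assumes attribute listings made of simple "name = value" lines, roughly the shape of command-line metadata output, and lines them up in columns. parse_attr_lines and format_table are hypothetical helpers, the attribute values used with them are made up, and multi-line values are not handled.

```python
def parse_attr_lines(text):
    """Parse 'name = value' lines into a dict (simple values only)."""
    attrs = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            attrs[name.strip()] = value.strip().strip('"')
    return attrs


def format_table(rows, columns):
    """Render a list of dicts as an aligned plain-text table."""
    cells = [list(columns)]
    cells += [[str(r.get(c, "n/a")) for c in columns] for r in rows]
    widths = [max(len(row[i]) for row in cells) for i in range(len(columns))]
    lines = [
        "  ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in cells
    ]
    return "\n".join(lines)
```

Calling format_table with one dict per file and a list of attribute names gives one row per file and one column per attribute, with "n/a" where a file lacks a value.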
32. More Features?
• Output format plug-ins?
• Pylint plug-in for fileinfo?
• Fileinfo Python plug-in pyinfo.py?
• Plug-ins for functions like total()
• Access intra-file dataset attributes?
• Multi-line attribute values?
• “Abbreviations” for attribute lists?
• Derived attributes (ncomments/loc)?
33. Summary
• Useful as a general-purpose attribute “browser”
• Access to Spotlight meta-data (Mac OS X)
• Easy to write plug-ins
• Fileinfo not like Spotlight (no index/search)
• More like iTunes (on the command-line ;-)