Accessing File-Specific
Attributes on Steroids

      Dinu C. Gherman
      gherman@python.net

     EuroPython Conference
       2008-07-07, Vilnius
Motivation
• Get quick overview of file attributes for
  multiple files
• Compare attribute values between files
• Identify groups of files
• Reuse overview results
• Avoid “opening” files with applications
Background
~1971(?)
                wc
$ cd mercurial/hgweb
$ wc -lwc *.py

  16      66     502   __init__.py
 118     438    3988   common.py
 993    2876   36064   hgweb_mod.py
 305     910   12420   hgwebdir_mod.py
 228     683    7258   protocol.py
 101     320    3577   request.py
 298     863   10698   server.py
 127     414    3907   webcommands.py
  65     190    2090   wsgicgi.py
2251    6760   80504   total
~2002
                pycount
$ cd mercurial/hgweb
$ pycount2.py *.py

lines   code   doc   comment   blank   file
   16      5     0         7       4   __init__.py
  118     77    22         8      11   common.py
  993    809     3        31     150   hgweb_mod.py
  305    249     0        17      39   hgwebdir_mod.py
  228    174     0        19      35   protocol.py
  101     76     2         7      16   request.py
  298    244     5        10      39   server.py
  127     93     0        10      24   webcommands.py
   65     43     0        11      11   wsgicgi.py
 2251   1770    32       120     329   total
~2005
                     ttfinfo
$ cd fonts/truetype/
$ ttfinfo.py -a maxp.numGlyphs \
  -a kern.nPairs -a head.unitsPerEm A*.ttf

 249      0   1000   AmericanTypewriter.ttf
1320   3072   2048   Arial.ttf
 245   1536   2048   ArialBlack.ttf
1320   3072   2048   ArialBold.ttf
 956   3072   2048   ArialBoldItalic.ttf
 956   3072   2048   ArialItalic.ttf
 244    384   2048   ArialNarrow.ttf
 245    384   2048   ArialNarrowBold.ttf
 244    384   2048   ArialNarrowBoldItalic.ttf
 244    384   2048   ArialNarrowItalic.ttf
 243   1536   2048   ArialRoundedMTBold.ttf
2007…
                   pyinfo
$ cd mercurial/hgweb
$ pyinfo.py -a nclass:ndef:ncalls:ndiffkw *.py

nclass   ndef   ncalls   ndiffkw   file
     0      2        2         3   __init__.py
     1      9       31        18   common.py
     1     60      492        24   hgweb_mod.py
     1     15      133        23   hgwebdir_mod.py
     0     11      121        21   protocol.py
     1     12       30        16   request.py
     6     24      104        18   server.py
     0     14       50        15   webcommands.py
     0      3       15        13   wsgicgi.py
    10    150      978             total
2007…
                   pdfinfo
$ cd brandeins/200805_bildung
$ pdfinfo.py -a npages:nimgs:author *.pdf

npages   nimgs   author    file
     1       1   Kathrin   802053_008b10508m.pdf
     1       0   Kathrin   802055_010b10508w.pdf
     2       0   Kathrin   802056_012b10508m.pdf
     2       1   Kathrin   802057_018b10508d.pdf
     1       1   Kathrin   802060_020b10508m.pdf
     9       8   n/a       802064_022b10508d.pdf
     8       8   Kathrin   802067_036b10508w.pdf
     2       0   Kathrin   803048_136b10508w.pdf
    26      19             total
Fileinfo
Big Picture
• Describe input files & attributes
• Locate input files
• Investigate file attributes
• Process file attributes
• Present tabular output
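
The steps above could be sketched roughly as follows. This is not fileinfo's actual code: the function names (`locate`, `investigate`, `present`), the helper `count_lines`, and the two attributes (`size`, `lc`) are illustrative assumptions.

```python
import glob
import os


def locate(patterns):
    "Expand shell-style patterns into a sorted list of existing files."
    paths = []
    for pattern in patterns:
        paths.extend(glob.glob(pattern))
    return sorted(p for p in paths if os.path.isfile(p))


def count_lines(path):
    "Count lines without loading the whole file into memory."
    with open(path, "rb") as f:
        return sum(1 for _ in f)


# Hypothetical attribute table; a real tool would fill this from plug-ins.
GETTERS = {"size": os.path.getsize, "lc": count_lines}


def investigate(path, attrs):
    "Return a record (dict) with one value per requested attribute."
    rec = {"path": path}
    for name in attrs:
        try:
            rec[name] = GETTERS[name](path)
        except (KeyError, OSError):
            rec[name] = "n/a"  # unknown attribute or unreadable file
    return rec


def present(records, attrs):
    "Format records as a plain-text table, one row per file."
    lines = ["   ".join(list(attrs) + ["file"])]
    for rec in records:
        cells = [str(rec[a]).rjust(len(a)) for a in attrs]
        lines.append("   ".join(cells + [os.path.basename(rec["path"])]))
    return "\n".join(lines)
```

Calling `present([investigate(p, ["size", "lc"]) for p in locate(["*.py"])], ["size", "lc"])` would then print one row per Python file in the current directory.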
Input Files Examples
• fileinfo [opts] /mypath/*.pdf
• fileinfo [opts] $(find /mypath -name "*.py")
• fileinfo [opts] $(mdfind -onlyin /mypath
  -name "*.py")
Attributes Examples
• --attrs nclasses:ndefs
• --sort size:ndefs
• --filter "rec.ndefs > 1000"
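
One way options like these could be applied to a list of records is sketched below. The slides do not show how fileinfo evaluates `--filter` or `--sort`; this sketch assumes the filter is a Python expression with the record bound to the name `rec`, and the helper names `apply_filter` and `apply_sort` are made up for illustration.

```python
from types import SimpleNamespace


def apply_filter(records, expr):
    "Keep the records for which the Python expression `expr` is true."
    # Assumption: `rec` is the only name visible to the expression.
    # eval() can run arbitrary code, so use it on trusted input only.
    code = compile(expr, "<filter>", "eval")
    return [r for r in records
            if eval(code, {"__builtins__": {}}, {"rec": r})]


def apply_sort(records, spec):
    "Sort by a colon-separated attribute list such as 'size:ndefs'."
    names = spec.split(":")
    return sorted(records, key=lambda r: tuple(getattr(r, n) for n in names))


records = [
    SimpleNamespace(path="a.py", size=300, ndefs=12),
    SimpleNamespace(path="b.py", size=100, ndefs=1500),
    SimpleNamespace(path="c.py", size=100, ndefs=2000),
]
big = apply_filter(records, "rec.ndefs > 1000")
ordered = apply_sort(records, "size:ndefs")
print([r.path for r in big])      # ['b.py', 'c.py']
print([r.path for r in ordered])  # ['b.py', 'c.py', 'a.py']
```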
Output Formats
• Text, HTML, CSV, ReST (simple)
• Cocoa, WxPython
• Django
Selected Plug-ins
General    XML               PDF            Python      Quicktime
counter    nattrs            title          ndefs       duration
wc         ndattrs           author         nclasses    box
lc         ntags             producer       ncalls      datasize
md5        ndtags            creationdate   nstrs       ntracks
           depth                            ndocstrs
OS                           npages         nkws        OS X bundles
uid        TTF               nimgs          ndkws       bundlename
username   kern.nPairs                      nimpstmts   bundleversion
mtime      maxp.numGlyphs    MP3            nops
size       maxp.version      album          mlw         Spotlight
level      head.unitsPerEm   artist         mil, …      kMDItem*
Examples
$ cd /Data/brandeins/200712_design
$ fileinfo --format rest-simple -a npages:nimgs \
  -f "rec.nimgs > 2" *.pdf
====== ===== =====================
npages nimgs path
====== ===== =====================
    11     3 540237_058b11207s.pdf
     8    18 540238_070b11207r.pdf
     7    11 540240_082b11207a.pdf
     9     9 540242_096b11207f.pdf
     3     5 540243_106b11207r.pdf
    11    15 540244_110b11207s.pdf
     7     8 540245_122b11207s.pdf
     2     3 540246_136b11207s.pdf
     2     3 540248_148b11207d.pdf
     6     6 540252_138b11207b.pdf
     8     6 540260_026b11207h.pdf
     6     5 540261_038b11207o.pdf
     8    10 540262_048b11207m.pdf
     6     6 540263_156b11207d.pdf
     7     6 540265_170b11207h.pdf
   101   114 total
====== ===== =====================
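
A table like this one could be produced by a small formatter along the following lines. This is a sketch, not fileinfo's own output code: the function name `rest_simple` and its `totals` parameter are assumptions, and it expects the last column to hold the file name.

```python
def rest_simple(header, rows, totals=()):
    "Render rows as a reStructuredText simple table with a totals row."
    rows = [list(r) for r in rows]
    if totals:
        # Sum only the columns named in `totals`; label the row "total".
        summed = [sum(r[i] for r in rows) if name in totals else ""
                  for i, name in enumerate(header[:-1])]
        rows.append(summed + ["total"])
    cells = [[str(c) for c in r] for r in rows]
    # Each column is as wide as its longest cell or its header.
    widths = [max([len(name)] + [len(r[i]) for r in cells])
              for i, name in enumerate(header)]
    rule = " ".join("=" * w for w in widths)
    head = " ".join(n.ljust(w) for n, w in zip(header, widths)).rstrip()
    lines = [rule, head, rule]
    for r in cells:
        # Right-align the value columns, leave the file name as it is.
        values = [c.rjust(w) for c, w in zip(r[:-1], widths)]
        lines.append(" ".join(values + [r[-1]]))
    lines.append(rule)
    return "\n".join(lines)


print(rest_simple(["npages", "nimgs", "path"],
                  [(11, 3, "a.pdf"), (8, 18, "b.pdf")],
                  totals=("npages", "nimgs")))
```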
Implementation
PDF-Plugin (1)
class PDFInvestigator(BaseInvestigator):
    "A class for determining attributes of PDF files."

    attrMap = {
        "title": "getTitle",
        "author": "getAuthor",
        "producer": "getProducer",
        "creationdate": "getCreationDate",
        "npages": "getNumPdfPages",
        "nimgs": "getNumImages",
    }

    totals = ("npages", "nimgs")

    def activate(self):
        "Try activating self, setting 'active' variable."

        # calculate self.active...

        return self.active
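
The slides do not show BaseInvestigator itself. The sketch below guesses at how an `attrMap` could be used to dispatch attribute names to getter methods; the base class body, the method `getAttrValue` and the toy `TextInvestigator` plug-in are all hypothetical.

```python
class BaseInvestigator(object):
    "Hypothetical base class: maps attribute names to getter methods."

    attrMap = {}
    totals = ()

    def __init__(self, path):
        self.path = path
        self.active = False

    def activate(self):
        "Subclasses decide whether they can handle self.path."
        self.active = True
        return self.active

    def getAttrValue(self, name):
        "Call the getter registered for `name`, or return 'n/a'."
        methodName = self.attrMap.get(name)
        if methodName is None:
            return "n/a"
        if not self.active and not self.activate():
            return "n/a"
        return getattr(self, methodName)()


class TextInvestigator(BaseInvestigator):
    "Toy plug-in counting lines and words, in the style of the PDF plug-in."

    attrMap = {"lc": "getLineCount", "wc": "getWordCount"}
    totals = ("lc", "wc")

    def activate(self):
        "Become active only if the file can be read as text."
        try:
            with open(self.path) as f:
                self.content = f.read()
            self.active = True
        except (OSError, UnicodeDecodeError):
            self.active = False
        return self.active

    def getLineCount(self):
        return len(self.content.splitlines())

    def getWordCount(self):
        return len(self.content.split())
```

With this layout a new file type only needs a subclass with its own `attrMap`, `totals` and getter methods.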
PDF-Plugin (2)
def getNumPdfPages(self):
    "Return the number of pages in a PDF document."

    try:
        # uses PyPdf
        res = self.input.getNumPages()
    except Exception:
        # unreadable or damaged PDF: report a placeholder value
        res = "n/a"

    return res
PDF-Plugin (3)
import re

def getNumImages(self):
    "Return the number of images in a PDF document."

    # match each "N G obj ... endobj" block plus an optional trailing comment
    expr = r"\d+ +\d+ +obj.*?endobj\s+(?:%.*?[\r\n])?"
    objPat = re.compile(expr, re.M | re.S)
    items = re.findall(objPat, self.content)
    # keep only objects tagged both /Type /XObject and /Subtype /Image
    for (k, v) in [("Type", "XObject"), ("Subtype", "Image")]:
        p = re.compile(r"/%s\s*/%s" % (k, v), re.M | re.S)
        items = [i for i in items if re.search(p, i) is not None]

    return len(items)
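
To see what this pattern matching does, the same idea can be run on a small hand-written fragment. The stand-alone function `count_images` and the `sample` string are made up for illustration; the sketch assumes the PDF content is available as text and that the objects are not packed into compressed object streams, where a plain regular expression would not see them.

```python
import re

# One PDF object: "<number> <generation> obj ... endobj", optional comment.
OBJ_PAT = re.compile(r"\d+ +\d+ +obj.*?endobj\s+(?:%.*?[\r\n])?", re.M | re.S)


def count_images(content):
    "Count objects marked /Type /XObject and /Subtype /Image."
    items = OBJ_PAT.findall(content)
    for key, value in [("Type", "XObject"), ("Subtype", "Image")]:
        pat = re.compile(r"/%s\s*/%s" % (key, value))
        items = [i for i in items if pat.search(i)]
    return len(items)


sample = (
    "1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    "2 0 obj\n<< /Type /XObject /Subtype /Image /Width 10 >>\nendobj\n"
    "3 0 obj\n<< /Type /XObject /Subtype /Form >>\nendobj\n"
    "4 0 obj\n<< /Type/XObject /Subtype/Image >>\nendobj\n"
)
print(count_images(sample))  # 2 (objects 2 and 4)
```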
An Aside: Spotlight
Spotlight
• Desktop file search
• Mac OS X 10.4 and 10.5
• Deeply integrated in Mac OS X
• Index-based, with attributes
• Results based on relevance and recency
• Plug-ins/API for custom file formats
• GUI & command-line
Spotlight Menu
Spotlight Window
Spotlight
$ mdfind europython | egrep "\.pdf$"

/Users/dinu/Desktop/EuroPython2008Timetable.pdf
/Users/dinu/Developer/Python/fileinfo/presentation/fileinfo-slides.pdf
/Data/Perso/CV/cv-dg.pdf
/Users/dinu/Library/Mail Downloads/cv-dg.pdf
/Users/dinu/Developer/Python/epc2008/badge_data.pdf
/Data/Docs/dev/The Python Papers/ThePythonPapersVolume2Issue4.pdf
/Data/Docs/dev/The Python Papers/ThePythonPapersVolume3Issue1.pdf
/Data/Docs/dev/The Python Papers/ThePythonPapersVolume2Issue3.pdf
/Data/Docs/dev/The Python Papers/The Python Papers Volume 2, Issue 2.pdf
/Data/Docs/dev/The Python Papers/The Python Papers Volume 2, Issue 1.pdf
/Users/dinu/Developer/Python/epc2008/badge_data-hpda.pdf
/Users/dinu/Developer/Python/epc2008/badge_data-hpda-sliced.pdf
/Data/Perso/Travel/Vilnius2008/EuroPython 2008 Invoice.pdf
/Users/dinu/Developer/Python/hipsterpda/output/badges.pdf
...
Spotlight – Pro
• Great index/search technology
• Very fast, useful and easy to use
• ~125 search attributes in Mac OS X 10.5
  (e.g. Aperture, Composer, …)
• Extensible (Python plug-in available)
Spotlight – Con
• Result on command-line not in table form
• Result in GUI is always a list of file names +
  the attributes that the Finder (!) knows
• Weak on providing overview
• Mac OS X only
Future
Issues
• Testing, debugging & refactoring, …
• Better folder handling (e.g. OS X bundles)
• Attribute namespaces (pdf.npages)?
• Attribute parameters (nattr#h2)?
• Attribute Null values (“n/a”)?
• Better dependency handling
More Features?
• Output format plug-ins?
• Pylint plug-in for fileinfo?
• Fileinfo Python plug-in pyinfo.py?
• Plug-ins for functions like total()
• Access intra-file dataset attributes?
• Multi-line attribute values?
• “Abbreviations” for attribute lists?
• Derived attributes (ncomments/loc)?
Summary
• Useful as general purpose attribute “browser”
• Access to Spotlight meta-data (Mac OS X)
• Easy to write plug-ins
• Fileinfo not like Spotlight (no index/search)
• More like iTunes (on the command-line ;-)
Links
•   http://www.dinu-gherman.net/tmp/fileinfo-0.3.2.tar.gz

•   http://developer.apple.com/macosx/spotlight.html

•   http://www.apple.com/downloads/macosx/spotlight/

•   http://toxicsoftware.com/python_metadata_importer_106_released/
Questions?

Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Accessing File-Specific Attributes on Steroids - EuroPython 2008

  • 1. Accessing File-Specific Attributes on Steroids Dinu C. Gherman gherman@python.net EuroPython Conference 2008-07-07, Vilnius
  • 2. Motivation • Get quick overview of file attributes for multiple files • Compare attribute values between files • Identify groups of files • Reuse overview results • Avoid “opening” files with applications
  • 4. ~1971(?) wc
      $ cd mercurial/hgweb
      $ wc -lwc *.py
        16      66     502   __init__.py
       118     438    3988   common.py
       993    2876   36064   hgweb_mod.py
       305     910   12420   hgwebdir_mod.py
       228     683    7258   protocol.py
       101     320    3577   request.py
       298     863   10698   server.py
       127     414    3907   webcommands.py
        65     190    2090   wsgicgi.py
      2251    6760   80504   total
  • 5. ~2002 pycount
      $ cd mercurial/hgweb
      $ pycount2.py *.py
      lines  code  doc  comment  blank  file
         16     5    0        7      4  __init__.py
        118    77   22        8     11  common.py
        993   809    3       31    150  hgweb_mod.py
        305   249    0       17     39  hgwebdir_mod.py
        228   174    0       19     35  protocol.py
        101    76    2        7     16  request.py
        298   244    5       10     39  server.py
        127    93    0       10     24  webcommands.py
         65    43    0       11     11  wsgicgi.py
       2251  1770   32      120    329  total
  • 6. ~2005 ttfinfo
      $ cd fonts/truetype/
      $ ttfinfo.py -a maxp.numGlyphs -a kern.nPairs -a head.unitsPerEm A*.ttf
       249      0   1000   AmericanTypewriter.ttf
      1320   3072   2048   Arial.ttf
       245   1536   2048   ArialBlack.ttf
      1320   3072   2048   ArialBold.ttf
       956   3072   2048   ArialBoldItalic.ttf
       956   3072   2048   ArialItalic.ttf
       244    384   2048   ArialNarrow.ttf
       245    384   2048   ArialNarrowBold.ttf
       244    384   2048   ArialNarrowBoldItalic.ttf
       244    384   2048   ArialNarrowItalic.ttf
       243   1536   2048   ArialRoundedMTBold.ttf
  • 7. 2007… pyinfo
      $ cd mercurial/hgweb
      $ pyinfo.py -a nclass:ndef:ncalls:ndiffkw *.py
      nclass  ndef  ncalls  ndiffkw  file
           0     2       2        3  __init__.py
           1     9      31       18  common.py
           1    60     492       24  hgweb_mod.py
           1    15     133       23  hgwebdir_mod.py
           0    11     121       21  protocol.py
           1    12      30       16  request.py
           6    24     104       18  server.py
           0    14      50       15  webcommands.py
           0     3      15       13  wsgicgi.py
          10   150     978           total
  • 8. 2007… pdfinfo
      $ cd brandeins/200805_bildung
      $ pdfinfo.py -a npages:nimgs:author *.pdf
      npages  nimgs  author   file
           1      1  Kathrin  802053_008b10508m.pdf
           1      0  Kathrin  802055_010b10508w.pdf
           2      0  Kathrin  802056_012b10508m.pdf
           2      1  Kathrin  802057_018b10508d.pdf
           1      1  Kathrin  802060_020b10508m.pdf
           9      8  n/a      802064_022b10508d.pdf
           8      8  Kathrin  802067_036b10508w.pdf
           2      0  Kathrin  803048_136b10508w.pdf
          26     19           total
  • 10. Big Picture • Describe input files & attributes • Locate input files • Investigate file attributes • Process file attributes • Present tabular output
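The five steps on the Big Picture slide can be sketched roughly as follows. This is a minimal sketch and not fileinfo's actual code: the names `ATTRIBUTES`, `investigate`, `add_totals`, `present` and `fileinfo` are hypothetical, and only two trivial attributes (`size` and a line count `lc`) are included to keep it self-contained.

```python
import glob
import os


def count_lines(path):
    """Hypothetical attribute: number of lines in the file."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


# Hypothetical attribute registry: attribute name -> function taking a path.
ATTRIBUTES = {
    "size": os.path.getsize,
    "lc": count_lines,
}


def investigate(paths, attr_names):
    """Return one record (dict) per file holding the requested attributes."""
    records = []
    for path in paths:
        rec = {name: ATTRIBUTES[name](path) for name in attr_names}
        rec["file"] = os.path.basename(path)
        records.append(rec)
    return records


def add_totals(records, attr_names):
    """Append a 'total' row that sums every requested attribute."""
    total = {name: sum(r[name] for r in records) for name in attr_names}
    total["file"] = "total"
    return records + [total]


def present(records, attr_names):
    """Format the records as a right-aligned plain-text table."""
    lines = [" ".join(f"{name:>8}" for name in attr_names) + " file"]
    for rec in records:
        cells = " ".join(f"{rec[name]:>8}" for name in attr_names)
        lines.append(f"{cells} {rec['file']}")
    return "\n".join(lines)


def fileinfo(pattern, attr_names):
    paths = sorted(glob.glob(pattern))          # locate input files
    records = investigate(paths, attr_names)    # investigate attributes
    records = add_totals(records, attr_names)   # process attributes
    return present(records, attr_names)         # present tabular output
```

Calling `print(fileinfo("*.py", ["lc", "size"]))` in a source directory would print one row per file plus a total row, in the spirit of the `wc` and `pycount` listings on the earlier slides.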
  • 11. Input Files Examples • fileinfo [opts] /mypath/*.pdf • fileinfo [opts] $(find /mypath -name "*.py") • fileinfo [opts] $(mdfind -onlyin /mypath -name "*.py")
  • 12. Attributes Examples • --attrs nclasses:ndefs • --sort size:ndefs • --filter "rec.ndefs > 1000"
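One way such options could be interpreted is sketched below. It assumes that attribute lists are colon-separated and that a filter expression sees the current record under the name `rec`, as the examples on the slide suggest. The helper names are hypothetical, and the use of `eval()` is a shortcut for illustration, not necessarily what fileinfo does.

```python
from types import SimpleNamespace


def parse_attrs(spec):
    """Split a colon-separated attribute list such as 'size:ndefs'."""
    return [name for name in spec.split(":") if name]


def apply_filter(records, expression):
    """Keep the records for which the expression evaluates to true.

    The expression sees each record as 'rec', e.g. "rec.ndefs > 1000".
    eval() is used for brevity only; it is unsafe for untrusted input.
    """
    code = compile(expression, "<filter>", "eval")
    kept = []
    for record in records:
        rec = SimpleNamespace(**record)
        if eval(code, {"__builtins__": {}}, {"rec": rec}):
            kept.append(record)
    return kept


def apply_sort(records, spec):
    """Sort by the attributes in spec; the first one is most significant."""
    names = parse_attrs(spec)
    return sorted(records, key=lambda r: tuple(r[n] for n in names))
```

With records represented as plain dictionaries, `apply_sort(apply_filter(records, "rec.ndefs > 1000"), "size:ndefs")` would first drop the small files and then order the remainder by size, breaking ties by `ndefs`.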
  • 13. Output Formats • Text, HTML, CSV, ReST (simple) • Cocoa, wxPython • Django
  • 14. Selected Plug-ins
      General:       counter, wc, lc, md5, depht
      OS:            uid, username, mtime, size, level
      XML:           nattrs, ndattrs, ntags, ndtags
      TTF:           kern.nPairs, maxp.numGlyphs, maxp.version, head.unitsPerEm
      PDF:           title, author, producer, creation date, npages, nimgs
      MP3:           album, artist
      Python:        ndefs, nclasses, ncalls, nstrs, ndocstrs, nkws, ndkws, nimpstmts, nops, mlw, mil, …
      Quicktime:     duration, box, datasize, ntracks
      OS X bundles:  bundlename, bundleversion
      Spotlight:     kMDItem*
  • 16.
      $ cd /Data/brandeins/200712_design
      $ fileinfo --format rest-simple -a npages:nimgs -f "rec.nimgs > 2" *.pdf
      ====== ===== =====================
      npages nimgs path
      ====== ===== =====================
          11     3 540237_058b11207s.pdf
           8    18 540238_070b11207r.pdf
           7    11 540240_082b11207a.pdf
           9     9 540242_096b11207f.pdf
           3     5 540243_106b11207r.pdf
          11    15 540244_110b11207s.pdf
           7     8 540245_122b11207s.pdf
           2     3 540246_136b11207s.pdf
           2     3 540248_148b11207d.pdf
           6     6 540252_138b11207b.pdf
           8     6 540260_026b11207h.pdf
           6     5 540261_038b11207o.pdf
           8    10 540262_048b11207m.pdf
           6     6 540263_156b11207d.pdf
           7     6 540265_170b11207h.pdf
         101   114 total
      ====== ===== =====================
  • 20. PDF-Plugin (1)
      class PDFInvestigator(BaseInvestigator):
          "A class for determining attributes of PDF files."

          attrMap = {
              "title": "getTitle",
              "author": "getAuthor",
              "producer": "getProducer",
              "creationdate": "getCreationDate",
              "npages": "getNumPdfPages",
              "nimgs": "getNumImages",
          }

          totals = ("npages", "nimgs")

          def activate(self):
              "Try activating self, setting 'active' variable."
              # calculate self.active...
              return self.active
  • 21. PDF-Plugin (2)
          def getNumPdfPages(self):
              "Return the number of pages in a PDF document."
              try:
                  # uses PyPdf
                  res = self.input.getNumPages()
              except Exception:
                  # any failure to read the document yields "n/a"
                  res = "n/a"
              return res
  • 22. PDF-Plugin (3)
          def getNumImages(self):
              "Return the number of images in a PDF document."
              # match each indirect object: "<num> <gen> obj ... endobj"
              expr = r"\d+ +\d+ +obj.*?endobj\s+(?:%.*?[\r\n])?"
              objPat = re.compile(expr, re.M | re.S)
              items = re.findall(objPat, self.content)
              # keep only objects tagged /Type /XObject and /Subtype /Image
              for p in [
                  re.compile(r"/%s\s*/%s" % (k, v), re.M | re.S)
                  for (k, v) in [("Type", "XObject"), ("Subtype", "Image")]]:
                  items = [i for i in items if re.search(p, i) is not None]
              return len(items)
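The plug-in pattern on the three slides above can be tried out in isolation with the sketch below. `BaseInvestigator` here is a hypothetical stand-in, since the real base class is not shown in the slides; `getAttr` and the class name `PDFImageCounter` are likewise invented for illustration. The image count works on text already held in memory, and, like the original regular-expression approach, it would miss images stored inside compressed object streams.

```python
import re


class BaseInvestigator:
    """Hypothetical stand-in for fileinfo's base class (not shown in the slides)."""

    attrMap = {}

    def __init__(self, content):
        self.content = content

    def getAttr(self, name):
        """Look up the method registered for an attribute and call it."""
        try:
            method = getattr(self, self.attrMap[name])
        except (KeyError, AttributeError):
            return "n/a"
        return method()


class PDFImageCounter(BaseInvestigator):
    """Counts image XObjects in the raw text of a PDF file."""

    attrMap = {"nimgs": "getNumImages"}

    # One PDF indirect object: "<num> <gen> obj ... endobj".
    objPat = re.compile(r"\d+ +\d+ +obj.*?endobj", re.S)

    def getNumImages(self):
        objects = self.objPat.findall(self.content)
        # Keep only objects carrying both /Type /XObject and /Subtype /Image.
        for key, value in [("Type", "XObject"), ("Subtype", "Image")]:
            pat = re.compile(r"/%s\s*/%s" % (key, value))
            objects = [obj for obj in objects if pat.search(obj)]
        return len(objects)


SAMPLE = (
    "1 0 obj\n<< /Type /XObject /Subtype /Image /Width 1 >>\nendobj\n"
    "2 0 obj\n<< /Type /Page >>\nendobj\n"
    "3 0 obj\n<< /Type/XObject /Subtype/Image >>\nendobj\n"
    "4 0 obj\n<< /Type /XObject /Subtype /Form >>\nendobj\n"
)

print(PDFImageCounter(SAMPLE).getAttr("nimgs"))   # prints 2
print(PDFImageCounter(SAMPLE).getAttr("npages"))  # prints n/a (not registered)
```

Mapping attribute names to method names, as `attrMap` does, keeps the command-line vocabulary independent of the method naming inside each plug-in.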
  • 24. Spotlight • Desktop file search • Mac OS X 10.4 and 10.5 • Deeply integrated in Mac OS X • Index-based, with attributes • Results based on relevance and recency • Plug-ins/API for custom file formats • GUI & command-line
  • 27. Spotlight
      $ mdfind europython | egrep ".pdf$"
      /Users/dinu/Desktop/EuroPython2008Timetable.pdf
      /Users/dinu/Developer/Python/fileinfo/presentation/fileinfo-slides.pdf
      /Data/Perso/CV/cv-dg.pdf
      /Users/dinu/Library/Mail Downloads/cv-dg.pdf
      /Users/dinu/Developer/Python/epc2008/badge_data.pdf
      /Data/Docs/dev/The Python Papers/ThePythonPapersVolume2Issue4.pdf
      /Data/Docs/dev/The Python Papers/ThePythonPapersVolume3Issue1.pdf
      /Data/Docs/dev/The Python Papers/ThePythonPapersVolume2Issue3.pdf
      /Data/Docs/dev/The Python Papers/The Python Papers Volume 2, Issue 2.pdf
      /Data/Docs/dev/The Python Papers/The Python Papers Volume 2, Issue 1.pdf
      /Users/dinu/Developer/Python/epc2008/badge_data-hpda.pdf
      /Users/dinu/Developer/Python/epc2008/badge_data-hpda-sliced.pdf
      /Data/Perso/Travel/Vilnius2008/EuroPython 2008 Invoice.pdf
      /Users/dinu/Developer/Python/hipsterpda/output/badges.pdf
      ...
  • 28. Spotlight – Pro • Great index/search technology • Very fast, useful and easy to use • ~125 search attributes in Mac OS X 10.5 (e.g. Aperture, Composer, …) • Extensible (Python plug-in available)
  • 29. Spotlight – Con • Results on the command line are not in table form • Results in the GUI are always a list of file names plus the attributes that the Finder (!) knows • Weak on providing an overview • Mac OS X only
  • 31. Issues • Testing, debugging & refactoring, … • Better folder handling (e.g. OS X bundles) • Attribute namespaces (pdf.npages)? • Attribute parameters (nattr#h2)? • Attribute null values (“n/a”)? • Better dependency handling
  • 32. More Features? • Output format plug-ins? • Pylint plug-in for fileinfo? • Fileinfo Python plug-in pyinfo.py? • Plug-ins for functions like total() • Access intra-file dataset attributes? • Multi-line attribute values? • “Abbreviations” for attribute lists? • Derived attributes (ncomments/loc)?
  • 33. Summary • Useful as general purpose attribute “browser” • Access to Spotlight meta-data (Mac OS X) • Easy to write plug-ins • Fileinfo not like Spotlight (no index/search) • More like iTunes (on the command-line ;-)
  • 34. Links • http://www.dinu-gherman.net/tmp/fileinfo-0.3.2.tar.gz • http://developer.apple.com/macosx/spotlight.html • http://www.apple.com/downloads/macosx/spotlight/ • http://toxicsoftware.com/python_metadata_importer_106_released/