Hacker 102 - regexes w/Javascript, Python

hacker 102
code4lib 2010 preconference
Asheville, NC, USA 2010-02-21

iv. regular expressions

JavaScript

if all language
looked like
“aabaaaabbbabaababa”
it’d be
easy to parse

parsing
“aabaaaabbbabaababa”
• there are two
elements, “a” and “b”
• either may occur in
any order
• /([ab]+)/

• [] denotes “elements” or “class”
• // demarcates regex
• + denotes “one or more of previous thing”
• () denotes “remember this matched group”
• /[ab]/ # an ‘a’ or a ‘b’
• /[ab]+/ # one or more ‘a’s or ‘b’s
• /([ab]+)/ # a group of one or more ‘a’s or ‘b’s
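
The three patterns above can be tried outside the browser as well. A minimal sketch using Python's standard `re` module; the variable names and the "xx"/"yy" padding are made up for illustration:

```python
import re

text = "aabaaaabbbabaababa"

# [ab] matches a single 'a' or 'b'
single = re.search(r"[ab]", text)
# [ab]+ matches a run of one or more 'a's or 'b's
run = re.search(r"[ab]+", text)
# ([ab]+) also remembers the matched run as group 1;
# the "xx"/"yy" padding shows that only the a/b run is captured
grouped = re.search(r"([ab]+)", "xx" + text + "yy")

print(single.group(0))   # a
print(run.group(0))      # aabaaaabbbabaababa
print(grouped.group(1))  # aabaaaabbbabaababa
```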

• [a-z] is any lowercase character from a to z
• [0-9] is any digit
• + is one or more of previous thing
• ? is zero or one of previous thing
• | is or, e.g. (a|b) is ‘a’ or ‘b’ (inside [], | is just a literal ‘|’)
• * is zero to many of previous thing
• . matches any character (except a newline, by default)
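
A short sketch of these operators in Python's `re` module. The test strings ("color colour", "cats and dog", and so on) are invented examples, not from the talk:

```python
import re

# ? makes the previous thing optional, so both spellings match
spellings = re.findall(r"colou?r", "color colour")
# * allows zero or more: 'a' followed by any number of 'b's
stars = re.findall(r"ab*", "a ab abbb")
# | chooses between whole alternatives
pets = re.findall(r"cat|dog", "cats and dog")
# . matches any single character, but not a newline by default
dots = re.findall(r"b.t", "bat bit b\nt")
# [a-z]+ and [0-9]+ match runs of lowercase letters and of digits
parts = re.findall(r"[a-z]+|[0-9]+", "vol70 no12")

print(spellings)  # ['color', 'colour']
print(stars)      # ['a', 'ab', 'abbb']
print(pets)       # ['cat', 'dog']
print(dots)       # ['bat', 'bit']
print(parts)      # ['vol', '70', 'no', '12']
```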

• [^a-z] is anything *but* [a-z]
• [a-zA-Z0-9] is any of a-z, A-Z, 0-9
• {5} matches exactly 5 of the preceding thing
• {2,} matches at least 2 of the preceding thing
• {2,6} matches from 2 to 6 of preceding thing
• \d is like [0-9] (any digit)
• \S is any non-whitespace character
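
Counted repetition, negated classes, and the backslash shorthands `\d` (digit) and `\S` (non-whitespace) can be sketched in Python as follows. The input line is borrowed from the journal listing used later in the deck; the variable names are illustrative only:

```python
import re

line = "BD. HOLDINGS: Vol. 70 (1984) - Vol. 94 (2008)"

# \d{4} matches exactly four digits in a row (here, the years)
years = re.findall(r"\d{4}", line)
# \d{2,} matches runs of at least two digits (volumes and years)
numbers = re.findall(r"\d{2,}", line)
# \S+ matches runs of non-whitespace characters
words = re.findall(r"\S+", line)
# [^a-zA-Z0-9] matches anything that is not a letter or digit;
# replacing those with "" strips punctuation and spaces
cleaned = re.sub(r"[^a-zA-Z0-9]", "", "Vol. 70 (1984)")

print(years)     # ['1984', '2008']
print(numbers)   # ['70', '1984', '94', '2008']
print(words[:3]) # ['BD.', 'HOLDINGS:', 'Vol.']
print(cleaned)   # Vol701984
```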

try this

• visit any web page
• open the Firebug console
• title = window.document.title
• try regexes to match parts of
the title
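
The same exercise can be approximated without a browser. A rough Python sketch, where the `title` string is a hypothetical stand-in for whatever `window.document.title` returns on a real page:

```python
import re

# Hypothetical page title, standing in for window.document.title
title = "code4lib 2010 preconference - Asheville, NC"

# first run of exactly four digits found anywhere in the title
year = re.search(r"[0-9]{4}", title)
# leading run of lowercase letters and digits (match() anchors at the start)
first_word = re.match(r"[a-z0-9]+", title)
# two capital letters following a comma and a space, remembered as group 1
state = re.search(r", ([A-Z]{2})", title)

print(year.group(0))        # 2010
print(first_word.group(0))  # code4lib
print(state.group(1))       # NC
```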

most every language
has regex support

v. glue it together

Python

TITLE: ABA journal.
BD. HOLDINGS: Vol. 70 (1984) - Vol. 94 (2008)
CURRENT VOL.: Vol. 95 (2009) -
OTHER LIBRARIES:
Miami:v. 68 (1982) -
USDC: v. 88 (2002) -
Birm.:v. 89 (2003) -
(Formerly: American Bar Association Journal)
(Bound and on Hein)

TITLE: Administrative law review.
BD. HOLDINGS: Vol. 22 (1969/1970) - Vol. 60
(2008)
CURRENT VOL.: Vol. 61 (2009) -
(Bound and on Hein)

#!/usr/bin/env python
import re

# a tag is a run of capitals, spaces, and periods followed by a colon
re_tag = re.compile(r'([A-Z .]+):')
# a title line captures everything after "TITLE: "
re_title = re.compile(r'TITLE: (.*)')

for line in open('journals-carol-bean.txt'):
    line = line.strip()
    m1 = re_tag.match(line)
    m2 = re_title.match(line)
    if line == "":
        continue
    print("\n->", line, "<-")
    if m1 or m2:
        print("MATCH")
    if m1:
        print('tag:', m1.groups())
    if m2:
        print('title:', m2.groups())
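
One possible next step, not shown in the slides, is to collect the tagged lines into one record per journal. The sketch below is an assumption about where the script could go: `SAMPLE`, `re_field`, and `records` are hypothetical names, the data is a few lines copied from the listing above, and untagged lines are simply skipped.

```python
import re

# A few lines copied from the sample listing (inline, so no input file is needed)
SAMPLE = """\
TITLE: ABA journal.
BD. HOLDINGS: Vol. 70 (1984) - Vol. 94 (2008)
CURRENT VOL.: Vol. 95 (2009) -

TITLE: Administrative law review.
CURRENT VOL.: Vol. 61 (2009) -
"""

# group 1 is the tag, group 2 is whatever follows the colon
re_field = re.compile(r"([A-Z .]+): *(.*)")

records = []
for line in SAMPLE.splitlines():
    line = line.strip()
    if not line:
        continue
    m = re_field.match(line)
    if not m:
        continue  # untagged line (note or wrapped text); ignored in this sketch
    tag, value = m.groups()
    if tag == "TITLE":
        records.append({"TITLE": value})  # a TITLE line starts a new record
    elif records:
        records[-1][tag] = value          # other tags attach to the current record

for record in records:
    print(record)
```

Keeping each journal as a dictionary keyed by tag makes it straightforward to write the data out later as CSV or JSON.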
