5. RE overview
            match “foo”        replace with “bar”
Perl        /foo/ (on $_)      s/foo/bar/ (on $_)
JavaScript  /foo/              "foolish".replace(/foo/, "bar")
Vi          /foo/              :s/foo/bar/
TextMate    ⌘-F, Find: foo     ⌘-F, Find: foo, Replace: bar
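For comparison, the same two operations can be sketched in Python, which is not one of the tools on the slide. The variable names are illustrative only; `re.search` and `re.sub` are the standard-library calls.

```python
import re

text = "foolish"

# Match: re.search returns a match object for the first occurrence, or None.
match = re.search(r"foo", text)

# Replace: count=1 replaces only the first occurrence,
# which mirrors s/foo/bar/ without a global option.
replaced = re.sub(r"foo", "bar", text, count=1)

print(match.group(0))  # foo
print(replaced)        # barlish
```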
15. Example
This reveals that plain text is in fact the
technical user's way to regard a file or a
sequence of bytes. In this sense, there is no
plain text.
/reveal(.*)plain/
/reveal(.*?)plain/
/t.{2,3}t/
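A minimal sketch of how the three patterns behave, written in Python as an assumption (the slide does not name an engine). It also assumes the wrapped slide text is one line joined by single spaces, because `.` does not cross newlines by default.

```python
import re

# Assumption: the slide's wrapped lines form a single line of text.
text = (
    "This reveals that plain text is in fact the "
    "technical user's way to regard a file or a "
    "sequence of bytes. In this sense, there is no "
    "plain text."
)

# Greedy: .* grabs as much as possible, so the match runs to the LAST "plain".
greedy = re.search(r"reveal(.*)plain", text)

# Lazy: .*? grabs as little as possible, so the match stops at the FIRST "plain".
lazy = re.search(r"reveal(.*?)plain", text)

# Bounded repetition: a "t", then two or three arbitrary characters, then a "t".
bounded = re.findall(r"t.{2,3}t", text)

print(repr(lazy.group(1)))                   # 's that '
print(len(greedy.group(1)) > len(lazy.group(1)))  # True
print(bounded)                               # ['that', 'text', 'the t', 'text']
```

The `the t` hit shows that `.` also matches the space, so a quantified `.` can bridge two words.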
32. Character Types
• . == anything (apart from newline)
• \s == space == [\t\n\v\f\r ]
  • more in Unicode
• \w == word char == approx. [0-9a-zA-Z_]
  • is complicated in Unicode
• \d == digit == [0-9]
• \h == hexadecimal digit == [0-9a-fA-F] (Oniguruma/TextMate; in Perl \h means horizontal whitespace)
• \S \W \D == [^\s] [^\w] [^\d]
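The classes above can be tried out with a short Python sketch. Python is an assumption here, not the slide's engine: its `re` module has no `\h`, so the hexadecimal class is spelled out, and `re.ASCII` is used to get the narrow ASCII definitions listed on the slide.

```python
import re

sample = "id_42\t0xBEEF\n"

# re.ASCII restricts \s, \w and \d to the ASCII sets shown on the slide.
spaces = re.findall(r"\s", sample, flags=re.ASCII)    # ['\t', '\n']
words = re.findall(r"\w+", sample, flags=re.ASCII)    # ['id_42', '0xBEEF']
digits = re.findall(r"\d+", sample, flags=re.ASCII)   # ['42', '0']

# No \h in Python's re: write the hexadecimal class explicitly.
is_hex = re.fullmatch(r"[0-9a-fA-F]+", "BEEF") is not None

# Upper-case escapes negate: \S+ is a run of non-whitespace characters.
non_space = re.findall(r"\S+", sample)                # ['id_42', '0xBEEF']

# "Complicated in Unicode": without re.ASCII, \w also accepts accented letters.
unicode_word = re.fullmatch(r"\w+", "žluťoučký") is not None
ascii_word = re.fullmatch(r"\w+", "žluťoučký", flags=re.ASCII) is not None

print(spaces, words, digits, is_hex, non_space, unicode_word, ascii_word)
```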
33. Example
This reveals that plain text is in fact the
technical user's way to regard a file or a
sequence of bytes. In this sense, there is no
plain text.
/\b[\w&&[^aA]]+\b/
/\W{2,}\w+\b/
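The `&&` class intersection is an Oniguruma (TextMate) and Java feature; Python's `re` module does not support it. The sketch below therefore swaps in an equivalent negated class, `[^\WaA]` ("not a non-word character and not a/A"), which is a substitution and not the slide's syntax. It again assumes the sample text is one line joined by single spaces.

```python
import re

# Assumption: the slide's wrapped lines form a single line of text.
text = (
    "This reveals that plain text is in fact the "
    "technical user's way to regard a file or a "
    "sequence of bytes. In this sense, there is no "
    "plain text."
)

# Stand-in for \b[\w&&[^aA]]+\b : whole words that contain no "a" or "A".
no_a_words = re.findall(r"\b[^\WaA]+\b", text, flags=re.ASCII)

# \W{2,}\w+\b : two or more non-word characters followed by a word,
# which here means punctuation plus a space plus the next word.
after_punct = re.findall(r"\W{2,}\w+\b", text, flags=re.ASCII)

print(no_a_words[:5])  # ['This', 'text', 'is', 'in', 'the']
print(after_punct)     # ['. In', ', there']
```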
43. Options
• /foo/imsx
• i - case insensitive
• m - multiline (^ and $ match at the start and end of every line, not only of the whole string/file)
• s - single line (. matches newlines)
• x - extended!
• g - global
• can be written inline
  • (?imsx-imsx)
  • (?imsx-imsx:...)
(?x-i)
#this is cool
(
foo #my important value
| #don't forget the alternative
bar
) # result equals to (foo|bar)
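The options can be sketched in Python, chosen here as an assumption because the slide uses Perl-style notation. Python passes options as flags (`re.I`, `re.M`, `re.S`, `re.X`) and has no `g` option; `re.findall` and `re.sub` are global by default.

```python
import re

text = "Foo\nbar"

# i: case insensitive
ci = re.search(r"foo", text, re.I)

# m: without it, ^ and $ anchor the whole string; with it, every line.
whole_string = re.findall(r"^\w+$", text)        # []
per_line = re.findall(r"^\w+$", text, re.M)      # ['Foo', 'bar']

# s: lets . match the newline as well.
dot_default = re.search(r"Foo.bar", text)        # None
dot_all = re.search(r"Foo.bar", text, re.S)

# x: whitespace and # comments inside the pattern are ignored.
verbose = re.compile(
    r"""
    (?:
        foo   # my important value
      |       # don't forget the alternative
        bar
    )         # result equals (?:foo|bar)
    """,
    re.X,
)

# Inline, scoped option: only "foo" is matched case-insensitively.
scoped_ok = re.fullmatch(r"(?i:foo)bar", "FOObar")
scoped_fail = re.fullmatch(r"(?i:foo)bar", "FOOBAR")  # None

print(ci.group(0), whole_string, per_line, bool(dot_all), bool(verbose.fullmatch("bar")))
```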
55. Recursive RE
• very important!
• quote & bracket matching
• technically beyond what a regular grammar can describe
• two styles
  • \g<name> or \g<n> - TextMate
  • (?R) - Perl
57. Example
(?x:
  \(                # match the initial opening parenthesis
                    # Now make a named group 'balanced' which
                    # matches a balanced substring.
  (?<balanced>
    [^()]           # A balanced substring is either something
                    # that is not a parenthesis:
    |               # …or a parenthesised string:
    \(              # A parenthesised string begins with an opening parenthesis
    \g<balanced>*   # …followed by a sequence of balanced substrings
    \)              # …and ends with a closing parenthesis
  )*                # Look for a sequence of balanced substrings
  \)                # Finally, the outer closing parenthesis
)
or: \(([^()]|(?R))*\)
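Python's standard `re` module supports neither `\g<name>` recursion nor `(?R)`, so the sketch below is not a regular expression. It is a small hand-written recursive function that follows the same grammar as the pattern above (a balanced substring is either a non-parenthesis character or a parenthesised run of balanced substrings). The name `match_balanced` is hypothetical.

```python
def match_balanced(text: str, start: int = 0) -> int:
    """Return the index just past the balanced group opening at `start`, or -1."""
    # The group must begin with an opening parenthesis.
    if start >= len(text) or text[start] != "(":
        return -1
    pos = start + 1
    while pos < len(text):
        ch = text[pos]
        if ch == ")":
            # The closing parenthesis that pairs with text[start].
            return pos + 1
        if ch == "(":
            # Nested group: recurse, as \g<balanced> or (?R) would.
            pos = match_balanced(text, pos)
            if pos == -1:
                return -1
        else:
            # Any non-parenthesis character, the [^()] branch.
            pos += 1
    # Ran out of input before the group was closed.
    return -1


print(match_balanced("(a(b)c)d"))  # 7
print(match_balanced("(a(b"))      # -1
```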