A wise hacker said: Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Regular expressions are a powerful tool and a first-class citizen in Ruby, so it is tempting to overuse them. But knowing them and using them properly is a fundamental asset for every developer. We'll see hands-on examples of proper regexp usage in Ruby code, look at bad and ugly cases, and learn how to approach writing and debugging them.
5. @lmea
Regexp syntax
literals: /cat/ matches any ‘cat’ substring
the dot: /./ matches any character
character classes: /[aeiou]/ /[a-z]/ /[01]/
negated character classes: /[^abc]/
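A minimal sketch of the four building blocks above. The input strings and constant names are illustrative and do not come from the slides.

```ruby
# Illustrative inputs only; none of these strings come from the slides.
LITERAL_HIT = "concatenate"[/cat/]                 # literal: found inside a longer word
DOT_HIT     = "x9"[/./]                            # dot: the first character, whatever it is
DOT_NEWLINE = "\n"[/./]                            # dot does not match a newline by default
VOWELS      = "rhythm and blues".scan(/[aeiou]/)   # character class: every vowel, in order
NOT_ABC     = "abcd"[/[^abc]/]                     # negated class: first char outside a, b, c
```

`String#[]` with a regexp returns the matched text (or nil), which keeps one-line experiments like these easy to read.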
6. @lmea
Regexp syntax
case insensitive: /./i
only interpolate #{} blocks once: /./o
multiline mode - '.' will match newline: /./m
extended mode - whitespace is ignored: /./x
Modifiers
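A short sketch of each modifier in action. The helper `cached_pattern`, the constant names, and the sample strings are assumptions made for illustration.

```ruby
# /i: case-insensitive matching
CASE_INSENSITIVE = "My CAT" =~ /cat/i       # index of "CAT"

# /m: the dot also matches a newline
DOT_DEFAULT   = "a\nb" =~ /a.b/             # nil, the dot stops at "\n"
DOT_MULTILINE = "a\nb" =~ /a.b/m            # matches from index 0

# /x: whitespace and comments inside the pattern are ignored
DATE_PATTERN = /
  \d{4}   # year
  -
  \d{2}   # month
/x

# /o: the #{} block is interpolated only the first time the literal is evaluated
def cached_pattern(word)
  /#{word}/o
end
FIRST_CALL  = cached_pattern("cat").source
SECOND_CALL = cached_pattern("dog").source  # still built from "cat"
```

The `/o` case is the one most likely to surprise: later arguments are silently ignored, so it is best reserved for values that never change.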
7. @lmea
Regexp syntax
/\d/ digit /\D/ non digit
/\s/ whitespace /\S/ non whitespace
/\w/ word character /\W/ non word character
/\h/ hexdigit /\H/ non hexdigit
Shorthand classes
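The shorthand classes can be tried out like this. The sample strings and constant names are made up for the example.

```ruby
# Sample strings are illustrative, not from the slides.
SAMPLE = "call 555 now"

DIGIT_RUNS  = SAMPLE.scan(/\d+/)            # runs of digits
WORDS       = SAMPLE.scan(/\w+/)            # runs of word characters
SPACE_COUNT = SAMPLE.scan(/\s/).length      # number of whitespace characters
ONLY_DIGITS = "a1b2".gsub(/\D/, "")         # drop everything that is not a digit
HEX_COLOR   = "color: ff00aa"[/\h{6}/]      # six consecutive hex digits
```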
8. @lmea
Regexp syntax
/^/ beginning of line /$/ end of line
/\b/ word boundary /\B/ non word boundary
/\A/ beginning of string /\z/ end of string
/\Z/ end of string. If the string ends with a newline, it matches just before the newline
Anchors
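A small sketch contrasting line anchors, string anchors, and word boundaries. The two-line string and the constant names are assumptions for the example.

```ruby
# A two-line string with a trailing newline (illustrative input).
LINES = "first\nsecond\n"

LINE_STARTS          = LINES.scan(/^\w+/)     # ^ matches at the start of every line
STRING_START         = LINES =~ /\Asecond/    # \A only matches at the very start
BEFORE_FINAL_NEWLINE = LINES =~ /second\Z/    # \Z tolerates one trailing newline
VERY_END             = LINES =~ /second\z/    # \z does not

WHOLE_WORDS = "cat catfood".scan(/\bcat\b/)   # "cat" as a whole word only
INSIDE_WORD = "cat catfood" =~ /cat\B/        # "cat" followed by more word characters
```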
9. @lmea
Regexp syntax
alternation: /cat|dog/ matches ‘cats and dogs’
0-or-more: /ab*/ matches ‘a’ ‘ab’ ‘abb’...
1-or-more: /ab+/ matches ‘ab’ ‘abb’ ...
given-number: /ab{2}/ matches ‘abb’ but not
‘ab’ or the whole ‘abbb’ string
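The quantifier claims above can be checked directly. Constant names are illustrative.

```ruby
ALTERNATION  = "cats and dogs".scan(/cat|dog/)  # both alternatives are found
ZERO_OR_MORE = ["a", "ab", "abb"].map { |s| s[/ab*/] }
ONE_OR_MORE  = ["a", "ab", "abb"].map { |s| s[/ab+/] }  # "a" alone does not match
EXACTLY_TWO  = ["ab", "abb", "abbb"].map { |s| s[/ab{2}/] }
# For "abbb" the match is only the leading "abb", not the whole string.
```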
10. @lmea
Regexp syntax
greedy matches: /.+cat/ matches ‘the cat is cat’ in ‘the cat is catching a mouse’
lazy matches: /.+?\scat/ matches only ‘the cat’ in ‘the cat is catching a mouse’
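A sketch of greedy versus lazy matching on the sentence from the slide, plus a tag-like string that is an assumption added for illustration.

```ruby
SENTENCE = "the cat is catching a mouse"

GREEDY = SENTENCE[/.+cat/]       # .+ runs to the end, then backs up to the last "cat"
LAZY   = SENTENCE[/.+?\scat/]    # .+? grows one character at a time, stops at the first " cat"

# Hypothetical markup example: the same difference with delimiters.
TAGS        = "<b>one</b> <b>two</b>"
GREEDY_TAGS = TAGS[/<b>.+<\/b>/]   # spans both elements
LAZY_TAGS   = TAGS[/<b>.+?<\/b>/]  # stops at the first closing tag
```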
11. @lmea
Regexp syntax
grouping: /(\d{3}\.){3}\d{3}/ matches IP-like strings
capturing: /a (cat|dog)/ the match is
captured in $1 to be used later
non capturing: /a (?:cat|dog)/ no content
captured
atomic grouping: /(?>a+)/ doesn’t backtrack
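The four kinds of group can be compared side by side. The input strings and constant names are assumptions for the example, and `MatchData` is used instead of `$1` to keep the sketch free of globals.

```ruby
# Grouping: the group is repeated as a unit.
IP_LIKE  = /(\d{3}\.){3}\d{3}/
IP_MATCH = "host 192.168.100.200 up"[IP_LIKE]

# Capturing vs non-capturing.
CAPTURED    = "I have a dog".match(/a (cat|dog)/)[1]
NO_CAPTURES = "I have a dog".match(/a (?:cat|dog)/).captures

# Atomic grouping: once (?>a+) has consumed the a's it never gives one back,
# so the trailing "a" in the pattern can no longer be satisfied.
BACKTRACKING = "aaab" =~ /a+ab/
ATOMIC       = "aaab" =~ /(?>a+)ab/
```

Note that "IP-like" is meant loosely: the pattern also accepts strings such as 999.999.999.999.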
12. @lmea
String substitution
"My cat eats catfood".sub(/cat/, "dog")
# => My dog eats catfood
"My cat eats catfood".gsub(/cat/, "dog")
# => My dog eats dogfood
"My cat eats catfood".gsub(/\bcat(\w+)/, "dog\\1")
# => My cat eats dogfood
"My cat eats catfood".gsub(/\bcat(\w+)/){|m| $1.reverse}
# => My cat eats doof
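A backreference in a replacement string needs a literal backslash, and quoting decides whether it survives. A small sketch of the three spellings, with illustrative constant names:

```ruby
SOURCE = "My cat eats catfood"

SINGLE_QUOTED = SOURCE.gsub(/\bcat(\w+)/, 'dog\1')    # backslash kept, \1 is the first group
ESCAPED       = SOURCE.gsub(/\bcat(\w+)/, "dog\\1")   # same thing in double quotes
OCTAL_ESCAPE  = SOURCE.gsub(/\bcat(\w+)/, "dog\1")    # "\1" is the control character U+0001
```

The last form raises no error; it just inserts an invisible character instead of the captured text, which makes it easy to miss.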
13. @lmea
String parsing
"Codemotion Rome: Mar 20 to Mar 23".scan(/\w{3} \d{1,2}/)
# => ["Mar 20", "Mar 23"]
"Codemotion Rome: Mar 20 to Mar 23".scan(/(\w{3}) (\d{1,2})/)
# => [["Mar", "20"], ["Mar", "23"]]
"Codemotion Rome: Mar 20 to Mar 23".scan(/(\w{3}) (\d{1,2})/) {|a,b| puts b+"/"+a}
# 20/Mar
# 23/Mar
# => "Codemotion Rome: Mar 20 to Mar 23"
14. @lmea
Regexp methods
if "what a wonderful world" =~ /(world)/
  puts "hello #{$1.upcase}"
end
# hello WORLD
if /(world)/.match("The world")
  puts "hello #{$1.upcase}"
end
# hello WORLD
match_data = /(world)/.match("The world")
puts "hello #{match_data[1].upcase}"
# hello WORLD
16. @lmea
Rails examples
# in ActiveModel::Validations::NumericalityValidator
def parse_raw_value_as_an_integer(raw_value)
  raw_value.to_i if raw_value.to_s =~ /\A[+-]?\d+\Z/
end
# in ActionDispatch::RemoteIp
# IP addresses that are "trusted proxies" that can be stripped from
# the comma-delimited list in the X-Forwarded-For header. See also:
# http://en.wikipedia.org/wiki/Private_network#Private_IPv4_address_spaces
TRUSTED_PROXIES = %r{
  ^127\.0\.0\.1$                | # localhost
  ^(10                          | # private IP 10.x.x.x
    172\.(1[6-9]|2[0-9]|3[0-1]) | # private IP in the range 172.16.0.0 .. 172.31.255.255
    192\.168                      # private IP 192.168.x.x
   )\.
}x
WILDCARD_PATH = %r{\*([^/\)]+)\)?$}
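A sketch for poking at two of these patterns outside Rails. `INTEGER_FORMAT` and `PRIVATE_OR_LOCAL` are stand-in names, and the patterns are copies reconstructed from the slide, so treat them as an approximation of the Rails source rather than the source itself.

```ruby
# Stand-in copies of the slide's patterns (names are assumptions).
INTEGER_FORMAT = /\A[+-]?\d+\Z/

PRIVATE_OR_LOCAL = %r{
  ^127\.0\.0\.1$                | # localhost
  ^(10                          | # private IP 10.x.x.x
    172\.(1[6-9]|2[0-9]|3[0-1]) | # private IP 172.16.0.0 .. 172.31.255.255
    192\.168                      # private IP 192.168.x.x
   )\.
}x

INTEGER_CHECKS = ["42", "-7", "4.2", "42\n"].map { |s| !(s =~ INTEGER_FORMAT).nil? }
PROXY_CHECKS   = ["127.0.0.1", "10.1.2.3", "172.16.0.1", "172.32.0.1", "8.8.8.8"].map do |ip|
  !(ip =~ PRIVATE_OR_LOCAL).nil?
end
```

Because the integer pattern ends in `\Z` rather than `\z`, a value with one trailing newline such as "42\n" is still accepted.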
17. @lmea
Regexps are
dangerous
"If I was going to place a bet on something
about Rails security, it'd be that there are more
regex vulnerabilities in the tree. I am
uncomfortable with how much Rails leans on
regex for policy decisions."
Thomas H. Ptacek (Founder @ Matasano, Feb 2013)
18. @lmea
Tip #1
Beware of nested quantifiers
/(x+x+)+y/ =~ 'xxxxxxxxxy'
/(xx+)+y/ =~ 'xxxxxxxxxx'
/(?>x+x+)+y/ =~ 'xxxxxxxxx'
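A sketch that runs the patterns from this tip on deliberately short inputs. `FLAT` is a hypothetical rewrite that is not on the slide: since `(x+x+)+` can only ever mean "two or more x", the nesting can be removed entirely. How slow the nested version gets on long non-matching input depends on the Ruby version, so no timings are claimed here.

```ruby
NESTED = /(x+x+)+y/     # nested quantifiers, as on the slide
ATOMIC = /(?>x+x+)+y/   # atomic group, as on the slide
FLAT   = /x{2,}y/       # hypothetical rewrite: two or more x, then y

MATCHING     = "xxxxxxxxxy"   # short inputs keep the demo instant
NOT_MATCHING = "xxxxxxxxx"

# Each entry is [result on matching input, result on non-matching input].
RESULTS = [NESTED, ATOMIC, FLAT].map do |re|
  [MATCHING =~ re, NOT_MATCHING =~ re]
end
```

All three agree on what matches; they differ only in how much work the engine may do before giving up on input that has no trailing y.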
19. @lmea
Tip #2
Don’t make everything optional
/[-+]?[0-9]*\.?[0-9]*/ =~ '.'
/[-+]?([0-9]*\.?[0-9]+|[0-9]+)/
/[-+]?[0-9]*\.?[0-9]+/
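A sketch of why the all-optional pattern is a problem: it matches the empty string, so it "succeeds" on any input. The constant names and sample inputs are assumptions for the example.

```ruby
ALL_OPTIONAL = /[-+]?[0-9]*\.?[0-9]*/   # every element may be absent
AT_LEAST_ONE = /[-+]?[0-9]*\.?[0-9]+/   # at least one digit is required

INPUTS = [".", "abc", "3.14", ".5", "-42"]

# The text each pattern actually matched (nil when there is no match).
LOOSE_MATCHES  = INPUTS.map { |s| s[ALL_OPTIONAL] }
STRICT_MATCHES = INPUTS.map { |s| s[AT_LEAST_ONE] }
```

With the loose pattern, "abc" yields an empty match rather than nil, which a plain `if str =~ pattern` check would treat as success.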
21. @lmea
Tip #4
Capture repeated groups and don’t
repeat a captured group
/!(abc|123)+!/ =~ '!abc123!'
# $1 == '123'
/!((abc|123)+)!/ =~ '!abc123!'
# $1 == 'abc123'
22. @lmea
Tip #5
Use interpolation with care
str = "cat"
/#{str}/ =~ "My cat eats catfood"
/#{Regexp.quote(str)}/ =~ "My cat eats catfood"
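With "cat" both forms behave the same, so the difference only shows when the interpolated string contains metacharacters. A sketch with a made-up input, "1.5", standing in for user-supplied text:

```ruby
USER_INPUT = "1.5"   # hypothetical user-supplied text containing a metacharacter

UNQUOTED = /#{USER_INPUT}/                  # the dot matches any character
QUOTED   = /#{Regexp.quote(USER_INPUT)}/    # the dot is escaped to a literal "."

FALSE_POSITIVE = "version 105" =~ UNQUOTED  # matches "105"
REJECTED       = "version 105" =~ QUOTED
ACCEPTED       = "version 1.5" =~ QUOTED
```

`Regexp.quote` is an alias of `Regexp.escape`; either name works.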
23. @lmea
Tip #6
Don’t use ^ and $ to match the
string’s beginning and end
validates :url, :format => /^https?/
"http://example.com" =~ /^https?/
"javascript:alert('hello!');%0Ahttp://example.com"
"javascript:alert('hello!');\nhttp://example.com" =~ /^https?/
"javascript:alert('hello!');\nhttp://example.com" =~ /\Ahttps?/