Regular expressions are a powerful tool for text and data processing. What support do browsers provide for them? What are the little misconceptions that prevent people from using regular expressions effectively?
The talk gives an overview of the regular expression syntax and typical usage examples.
4. Types of regular expressions
• POSIX (BRE, ERE)
• PCRE = Perl-Compatible Regular Expressions
From the JavaScript language specification:
"The form and functionality of regular expressions
is modelled after the regular expression facility in
the Perl 5 programming language".
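One quick way to see which family JavaScript belongs to is to try a Perl-style shorthand class and a POSIX bracket expression. This is a minimal sketch that assumes a modern engine with the regexp used without the u flag. The variable names are illustrative only.

```javascript
// Perl-style shorthand classes are available in JavaScript.
const perlStyle = /^\d+$/;

// POSIX bracket expressions are not. Outside unicode mode, [[:digit:]] is parsed as
// the class {'[', ':', 'd', 'i', 'g', 't'} followed by a literal ']'.
const posixStyle = /[[:digit:]]/;

console.log(perlStyle.test('2012'));  // true
console.log(posixStyle.test('7'));    // false
console.log(posixStyle.test('d]'));   // true
```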
21. Whitespace
/\s/ (inverted version: /\S/)
FF:
\t \n \v \f \r \u0020 \u00a0 \u1680 \u180e
\u2000 \u2001 \u2002 \u2003 \u2004 \u2005 \u2006 \u2007 \u2008 \u2009 \u200a
\u2028 \u2029 \u202f \u205f \u3000
Chrome, IE 9:
as in FF, plus \ufeff
IE 7, 8 :-(
only: \t \n \v \f \r \u0020
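The table above can be checked in whatever engine is at hand. This sketch tests a sample of the listed code points against /\s/, adds U+200B (zero width space) as a non-whitespace contrast, and prints the result for each one. U+180E is left out on purpose. Newer Unicode versions no longer classify it as a space, so current engines may disagree with the slide on that character. The helper name `whitespaceReport` is hypothetical.

```javascript
// Code points from the slide, plus U+FEFF (byte order mark) and,
// for contrast, U+200B (zero width space), which is not whitespace.
const codePoints = [0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x20, 0xa0, 0x1680,
  0x2000, 0x200a, 0x2028, 0x2029, 0x202f, 0x205f, 0x3000, 0xfeff, 0x200b];

// Map each code point to "U+XXXX" and record whether /\s/ matches it in this engine.
function whitespaceReport(points) {
  return points.map(function (cp) {
    const ch = String.fromCharCode(cp);
    const hex = 'U+' + cp.toString(16).toUpperCase().padStart(4, '0');
    return { hex: hex, isSpace: /\s/.test(ch) };
  });
}

const report = whitespaceReport(codePoints);
report.forEach(function (r) {
  console.log(r.hex, r.isSpace);
});
```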
22. Alphanumeric characters
/\d/ ~ digits from 0 to 9
/\w/ ~ Latin letters, digits, underscore
Does not work for Cyrillic, Greek etc.
Inverted forms:
/\D/ ~ anything but digits
/\W/ ~ anything but Latin letters, digits and underscore
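These snippets show the ASCII-only behaviour of \w and \d. The last part relies on Unicode property escapes (\p{...}), which exist only in ES2018 and later engines and require the u flag. Treat that part as one possible modern workaround, not as something the older browsers above support.

```javascript
const latinWord = /^\w+$/;
console.log(latinWord.test('hello_42'));   // true
console.log(latinWord.test('привет'));     // false: \w covers only [A-Za-z0-9_]
console.log(/^\W+$/.test('привет'));       // true: every Cyrillic letter counts as a "non-word" character

// \d is ASCII-only as well: ARABIC-INDIC DIGIT THREE (U+0663) is not a digit for \d.
console.log(/^\d$/.test('\u0663'));        // false

// ES2018+ only: Unicode property escapes (they require the u flag).
const anyScriptWord = /^[\p{L}\p{Nd}_]+$/u;
console.log(anyScriptWord.test('привет')); // true
console.log(anyScriptWord.test('αβγ_1'));  // true
```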
77. Representing a character
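\x09 === \t (two hex digits, so only U+0000 to U+00FF; these are Latin-1 code points, not a Windows "ANSI" code page)
\u20AC === € (four hex digits: any character from U+0000 to U+FFFF)
A backslash takes away a character's special meaning:
/\(\)/.test('()') // true
/\\n/.test('\\n') // true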
...or vice versa!
/\f/.test('f') // false! (\f means form feed)
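Each escape rule above can be checked directly in a console. This sketch also shows that \x80 is U+0080 and not the euro sign at 0x80 in Windows-1252. It assumes the source file is saved as UTF-8 so the literal '€' is read correctly.

```javascript
console.log('\x09' === '\t');        // true
console.log('\u20AC' === '€');       // true
console.log('\x80' === '€');         // false: \x80 is U+0080, not the Windows-1252 euro sign
console.log(/\(\)/.test('()'));      // true: escaped parentheses are literal
console.log(/\\n/.test('\\n'));      // true: "\\" in the pattern is one literal backslash
console.log(/\f/.test('f'));         // false: \f means form feed
console.log(/\f/.test('\f'));        // true
```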
84. Regular expression flags
g i m s x y
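g ~ global match
i ~ ignore case
m ~ multiline matching for ^ and $
JavaScript (as of ES5) does NOT provide support for:
s ~ string as single line, i.e. . also matches line breaks (added later, in ES2018, as the dotAll flag)
x ~ extend pattern, i.e. whitespace and comments in the pattern are ignored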
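A small sketch of the supported flags on a two-line string. [^] is a JavaScript-specific idiom for "any character, including line breaks", which was the usual workaround before s existed. The /s line parses only in ES2018+ engines.

```javascript
const text = 'Foo\nfoo';

console.log(text.match(/foo/g));      // → ['foo']
console.log(text.match(/foo/gi));     // → ['Foo', 'foo']
console.log(/^foo$/.test(text));      // false: ^ and $ refer to the whole string
console.log(/^foo$/m.test(text));     // true: ^ and $ also match around line breaks
console.log(/Foo.foo/.test(text));    // false: . does not match \n
console.log(/Foo[^]foo/.test(text));  // true: [^] matches any character
console.log(/Foo.foo/s.test(text));   // true, ES2018+ only (dotAll)
```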
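Mozilla-only and non-standard at the time (standardized in ES2015):
y ~ sticky
Match only at the position given by .lastIndex (a regexp instance property), without searching further ahead. The match is thus anchored at a predefined position; ^ itself still matches only at the start of the input.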
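This sketch shows the standard (ES2015+) sticky behaviour. A match is attempted only at .lastIndex, and a failure resets .lastIndex to 0. Older Firefox versions are known to have treated ^ differently, so this assumes a current engine.

```javascript
const sticky = /foo/y;
sticky.lastIndex = 3;
console.log(sticky.test('barfoo'));   // true: the match starts exactly at index 3
console.log(sticky.lastIndex);        // 6: advanced past the match
sticky.lastIndex = 1;
console.log(sticky.test('barfoo'));   // false: no searching ahead from index 1
console.log(sticky.lastIndex);        // 0: reset after a failed match

const anchored = /^foo/y;
anchored.lastIndex = 3;
console.log(anchored.test('barfoo')); // false: ^ still means the start of the input
```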
86. Alternative syntax for flags
/(?i)foo/
/(?i-m)bar$/
/(?i-sm).x$/
/(?i)foo(?-i)bar/
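Some implementations do NOT support switching flags in the middle of a pattern.
In JS, flags are set for the whole regexp instance and can't be changed; an inline (?i) is a syntax error. (ES2025 adds scoped modifier groups such as (?i:foo), but not the unscoped forms shown above.)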
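This sketch checks that an inline (?i) is rejected and shows the usual workarounds. You can pass the flag for the whole regexp, or spell out the case variants by hand for the part that should be case-insensitive. The helper `compileError` is hypothetical and only wraps the RegExp constructor.

```javascript
// Hypothetical helper: returns the SyntaxError message, or null if the pattern compiles.
function compileError(source, flags) {
  try {
    new RegExp(source, flags);
    return null;
  } catch (e) {
    if (e instanceof SyntaxError) return e.message;
    throw e;
  }
}

console.log(compileError('(?i)foo') !== null);  // true: inline (?i) is rejected

const caseless = new RegExp('foo', 'i');        // the flag applies to the whole regexp
console.log(caseless.test('FOO'), caseless.flags); // true i

// "(?i)foo(?-i)bar" spelled out by hand: case-insensitive foo, case-sensitive bar.
const mixed = /[fF][oO][oO]bar/;
console.log(mixed.test('FoObar'));  // true
console.log(mixed.test('fooBAR'));  // false
```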
88. Methods
RegExp instances:
/regexp/.exec('string')
null or array ['whole match', $1, $2, ...]
/regexp/.test('string')
false or true
String instances:
'str'.match(/regexp/)
'str'.match('\\w{1,3}')
(a string argument is converted with new RegExp)
- same as /regexp/.exec if no 'g' flag is used;
- array of all matches, or null, if the 'g' flag is used (capturing groups ignored)
'str'.search(/regexp/)
'str'.search('\\w{1,3}')
index of the first match, or -1
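The return values listed above, side by side. Note that match with 'g' returns null rather than an empty array when nothing matches, and that a string passed to search is compiled as a regexp. The sample strings are arbitrary.

```javascript
const range = /(\d+)-(\d+)/;
const m = range.exec('pages 12-34');
console.log(m[0], m[1], m[2], m.index);   // 12-34 12 34 6
console.log(range.exec('no numbers'));     // null
console.log(range.test('no numbers'));     // false

const s = 'a1 b22 c333';
console.log(s.match(/([a-z])(\d+)/));      // like exec: ['a1', 'a', '1'] plus index and input
console.log(s.match(/([a-z])(\d+)/g));     // → ['a1', 'b22', 'c333']: groups dropped
console.log(s.match(/x/g));                // null, not an empty array

console.log('hello world'.search('\\s'));  // 5: the string is compiled as a regexp
console.log('hello'.search(/z/));          // -1
```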
89. Methods
String instances:
'str'.replace(/old/, 'new');
WARNING: special magic supported in the replacement string:
$$ ~ inserts a dollar sign "$"
$& ~ the substring that matches the regexp
$` ~ the substring before $&
$' ~ the substring after $&
$1, $2, $3 etc. ~ the substring that matches the n-th capturing group
'str'.replace(/(r)(e)gexp/g,
function(matched, $1, $2, offset, sourceString) {
// what should replace the matched part on this iteration?
return 'replacement';
});
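The replacement patterns in action, including the function form. The last two lines show the trap behind the WARNING. A replacement *string* such as one taken from user input has its $ sequences expanded, while a string returned from a function is inserted as-is. The sample strings are arbitrary.

```javascript
const price = 'price: 10 USD';
console.log(price.replace(/\d+/, '$$$&'));  // price: $10 USD
console.log(price.replace(/\d+/, '[$`]'));  // price: [price: ] USD
console.log(price.replace(/\d+/, "[$']"));  // price: [ USD] USD
console.log('2012-05-17'.replace(/(\d+)-(\d+)-(\d+)/, '$3.$2.$1')); // 17.05.2012

// Function replacement: called once per match with the match, the groups,
// the offset and the whole string; its return value is inserted as-is.
const doubled = 'a1 b2'.replace(/([a-z])(\d)/g, function (matched, letter, digit, offset, source) {
  return letter + String(Number(digit) * 2);
});
console.log(doubled);  // a2 b4

// Pitfall: "$&" in a replacement string is expanded; returned from a function it is not.
console.log('x'.replace(/x/, 'cost: $&'));                          // cost: x
console.log('x'.replace(/x/, function () { return 'cost: $&'; })); // cost: $&
```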
90. RegExp injection
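// BAD CODE
var userInput = '[abc]'; // oops! brackets are regexp syntax
var re = new RegExp('^' + userInput + '$');
re.test('a'); // true, although the user meant the literal text "[abc]"
// GOOD, DO IT AT HOME
// Newer engines (ES2025) have a built-in RegExp.escape; only define one if it is missing.
if (typeof RegExp.escape !== 'function') {
  RegExp.escape = function (text) {
    return text.replace(/[-[\]{}()*+?.,\\^$|#\s]/g, '\\$&');
  };
}
var re = new RegExp('^' + RegExp.escape(userInput) + '$');
re.test('[abc]'); // true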