Don’t Fear the Regex
Sandy Smith - CapitalCamp and GovDays 2014
Regex Basics
Demystifying Regular Expressions
So what are Regular Expressions?
“...a means for matching strings of text, such as
particular characters, words, or patterns of
characters.”

Source: http://en.wikipedia.org/wiki/Regular_expression
A Common Joke
Some people, when confronted with a problem, think: 	

'I know, I'll use regular expressions.' 	

Now they have two problems.	

But really, it’s not that bad, and Regular
Expressions (Regex) are a powerful tool.
So what are they good at?
Regex is good at one thing, and that is to match
patterns in strings. You might use this to:	

• Scrape information off of a webpage	

• Pull data out of text files	

• Process/Validate data sent to you by a user	

- Such as phone or credit card numbers	

- Usernames or even addresses	

• Evaluate URLs to process what code to execute	

- Via a framework bootstrap router, or mod_rewrite
So what do they not do?
You can only match, filter, or replace patterns of
characters you expect to see. If you get
unexpected (or non-standard) input, you won’t be
able to pull the patterns out.
Can you do this in PHP?
Yes! PHP contains a great regular expression library, the
Perl-Compatible Regular Expression (PCRE) engine.	

• Perl was (is?) the gold-standard language for doing
text manipulation	

• Lots of programmers knew its regex syntax.	

• The first PHP regex engine was slow and used a
slightly different syntax	

• PCRE was created to speed things up and let people
use their Perl skillz.	

• Regexes are also useful in text editors!
Pattern Matching
Delimiters
All regex patterns are between delimiters.	

By custom, because Perl did it, this is /	

• e.g. '/regex-goes-here/'	

However, PCRE allows anything other than letters,
numbers, whitespace, or a backslash (\) to be delimiters.

• e.g. '#regex-goes-here#'

Why have delimiters?

• All will become clear later.	
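
A minimal sketch of the two delimiter styles above. The sample path is made up for illustration. With / as the delimiter, every literal slash in the pattern has to be escaped; with # the slashes can stand as they are.

```php
<?php
// Hypothetical sample input, not from the slides.
$path = 'docs/regex/intro';

// Same pattern, two delimiter choices.
$withSlashes = preg_match('/docs\/regex/', $path); // literal slash must be escaped
$withHashes  = preg_match('#docs/regex#', $path);  // no escaping needed

var_dump($withSlashes, $withHashes); // int(1) for both
```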

Straight Text Syntax
The most basic regexes match text like strpos() does:	

• Take the string "PHP is my language of choice"	

• /PHP/ will match, but /Ruby/ won't.	

They start getting powerful when you add the ability to
only match text at the beginning or end of a line:	

• Use ^ to match text at the beginning:	

- /^PHP/ will match, but /^my/ won't.	

• Use $ to match text at the end:	

- /choice$/ will match, but /PHP$/ won't.
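
The six patterns from this slide can be run as-is. This is just a sketch that collects the return value of preg_match() (introduced a few slides later) for each one; the $results array name is an assumption for illustration.

```php
<?php
$subject = 'PHP is my language of choice';

$results = [
    '/PHP/'     => preg_match('/PHP/', $subject),     // 1: found anywhere
    '/Ruby/'    => preg_match('/Ruby/', $subject),    // 0: not present
    '/^PHP/'    => preg_match('/^PHP/', $subject),    // 1: at the start
    '/^my/'     => preg_match('/^my/', $subject),     // 0: "my" is not at the start
    '/choice$/' => preg_match('/choice$/', $subject), // 1: at the end
    '/PHP$/'    => preg_match('/PHP$/', $subject),    // 0: "PHP" is not at the end
];

foreach ($results as $pattern => $found) {
    echo $pattern, ' => ', $found, PHP_EOL;
}
```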
Basic Pattern Matching
Regular expressions are often referred to as "pattern
matching," because that's what makes them powerful.	

Use special characters to match patterns of text:	

. matches any single character:	

/P.P/ matches PHP or PIP	

+ matches one or more of the previous character:	

/PH+P/ matches PHP or PHHHP	

* matches zero or more of the previous character:	

/PH*P/ matches PHP or PP or PHHHHP
Basic Pattern Matching
? matches zero or one of the previous character:	

/PH?P/ matches PHP or PP but not PHHP	

{<min>,<max>} matches from min to max
occurrences of the previous character:	

/PH{1,2}P/ matches PHP or PHHP but not PP or PHHHP
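
To check the claims on the last two slides, here is a small sketch that tries each pattern against the strings mentioned. The $tests and $outcome names are assumptions for illustration.

```php
<?php
// Each pattern is tried against a few candidate strings from the slides.
$tests = [
    '/P.P/'      => ['PHP', 'PIP', 'PP'],
    '/PH+P/'     => ['PHP', 'PHHHP', 'PP'],
    '/PH*P/'     => ['PHP', 'PP', 'PHHHHP'],
    '/PH?P/'     => ['PHP', 'PP', 'PHHP'],
    '/PH{1,2}P/' => ['PHP', 'PHHP', 'PP', 'PHHHP'],
];

$outcome = [];
foreach ($tests as $pattern => $candidates) {
    foreach ($candidates as $candidate) {
        $matched = preg_match($pattern, $candidate) === 1;
        $outcome[$pattern][$candidate] = $matched;
        printf("%-11s %-7s %s\n", $pattern, $candidate, $matched ? 'match' : 'no match');
    }
}
```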
Powerful Basic Patterns
You can use combinations of these patterns to find
lots of things. Here are the most common:	

.? Find zero or one characters of any type:	

/P.?P/ gets you PP, PHP, PIP, but not PHHP.	

.+ Find one or more characters of any type:	

/P.+P/ gets you PHP, PIP, PHPPHP, PIIIP, but not PP.	

.* Find zero or more characters of any type:	

/P.*P/ gets PP, PHP, PHPHP, PIIIP, but not PHX.
Beware of Greed
.* and .+ are "greedy" by default, meaning they match as
much as they can while still fulfilling the pattern.	

/P.+P/ will match not only "PHP" but "PHP PHP"	

Greedy patterns don't care.	

What if you want to only match "PHP", "PHHP", or "PIP",
but not "PHP PHP"?	

? kills greed.	

/P.*?P/ will match PHP, PP, or PIIIP but only the first
PHP in "PHP PHP"	

Great for matching tags in HTML, e.g. /<.+?>/
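
A quick sketch of greedy versus lazy matching. It uses the optional third argument of preg_match() (covered later in the deck) to see exactly what was matched; the HTML snippet is a made-up sample.

```php
<?php
$subject = 'PHP PHP';

preg_match('/P.+P/', $subject, $greedy); // grabs as much as it can
preg_match('/P.+?P/', $subject, $lazy);  // stops at the first chance

echo $greedy[0], PHP_EOL; // PHP PHP
echo $lazy[0], PHP_EOL;   // PHP

// Hypothetical HTML sample.
$html = '<b>bold</b>';
preg_match('/<.+>/', $html, $greedyTag);
preg_match('/<.+?>/', $html, $lazyTag);

echo $greedyTag[0], PHP_EOL; // <b>bold</b>
echo $lazyTag[0], PHP_EOL;   // <b>
```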
Matching literal symbols
If you need to match a character used as a symbol,
such as $, +, ., ^, or *, escape it by preceding it with
a backslash (\).

/\./ matches a literal period (.).

/^\$/ matches a dollar sign at the beginning of a
string.
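
A short sketch of escaping in practice. The sample strings are made up. preg_quote() is not mentioned in the slides, but it is a standard PHP function that adds the backslashes for you when a literal string has to go inside a pattern.

```php
<?php
// Unescaped, . is a wildcard; escaped, it is a literal period.
$wildcard  = preg_match('/3.14/', '3x14');  // 1: . matches the x
$literal   = preg_match('/3\.14/', '3x14'); // 0: x is not a period
$literalOk = preg_match('/3\.14/', '3.14'); // 1

// \$ is a literal dollar sign; ^ anchors it to the start.
$dollar = preg_match('/^\$/', '$5.00');     // 1

// preg_quote() escapes the special characters in a literal string.
$quoted = preg_quote('$5.00 (approx.)');
echo $quoted, PHP_EOL; // \$5\.00 \(approx\.\)

$found = preg_match('/' . $quoted . '/', 'cost: $5.00 (approx.)'); // 1
```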
Calling this in PHP
To match regular expressions in PHP, use
preg_match().	

It returns 1 if a pattern is matched and 0 if not. It
returns false if you blew your regex syntax.	

Simplest Example:
$subject = "PHP regex gives PHP PEP!";
$found = preg_match("/P.P/", $subject);
echo $found; // prints 1
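
Since 0 and false are both falsy in PHP, it helps to compare the result with ===. A sketch of all three return values follows; the broken pattern is deliberately invalid, and the @ is only there to hide the warning PHP prints for it.

```php
<?php
$subject = 'PHP regex gives PHP PEP!';

$hit  = preg_match('/P.P/', $subject);  // 1: pattern found
$miss = preg_match('/Ruby/', $subject); // 0: pattern not found

// Deliberately broken pattern (unclosed parenthesis).
$broken = @preg_match('/(/', $subject); // false: the regex could not be compiled

// Use === so that 0 (no match) is not confused with false (error).
if ($broken === false) {
    echo 'The pattern could not be compiled', PHP_EOL;
} elseif ($broken === 0) {
    echo 'No match', PHP_EOL;
}

var_dump($hit, $miss, $broken); // int(1) int(0) bool(false)
```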
Character Classes and Subpatterns
Character Classes
Matching any character can be powerful, but lots of
times you'll want to only match specific characters.
Enter character classes.	

• Character classes are enclosed by [ and ] (square
brackets)	

• Character classes can be individual characters	

• They can also be ranges	

• They can be any combination of the above	

• No "glue" character is needed between members: every character inside the brackets is part of the class
Character Class Examples
Single character:	

[aqT,] matches a, q, T (note the case), or a comma (,)	

Range:	

[a-c] matches either a, b, or c (but not A or d or ...)	

[4-6] matches 4, 5, or 6	

Combination:	

[a-c4z6-8] matches a, b, c, 4, z, 6, 7, or 8
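
A sketch that shows which characters each class picks out of a made-up string. preg_match_all() is not covered in the slides; it is used here only because it collects every match instead of stopping at the first.

```php
<?php
// Hypothetical sample input.
$subject = 'a1b2c3d4z5-6T7,8';

preg_match_all('/[a-c]/', $subject, $letters);
preg_match_all('/[4-6]/', $subject, $digits);
preg_match_all('/[a-c4z6-8]/', $subject, $combo);

echo implode('', $letters[0]), PHP_EOL; // abc
echo implode('', $digits[0]), PHP_EOL;  // 456
echo implode('', $combo[0]), PHP_EOL;   // abc4z678
```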
Negative classes
Even more powerful is the ability to match anything
except characters in a character class.	

• Negative classes are denoted by ^ at the beginning
of the class	

• [^a] matches any character except a	

• [^a-c] matches anything except a, b, or c	

• [^,0-9] matches anything except commas or digits
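
The negative classes above, tried against made-up strings. Each call captures the first run of characters that is not in the class.

```php
<?php
preg_match('/[^a]/', 'aab', $notA);            // first character that is not a
preg_match('/[^a-c]+/', 'abc,123', $notAtoC);  // first run outside a-c
preg_match('/[^,0-9]+/', 'abc,123', $notNums); // first run without commas or digits

echo $notA[0], PHP_EOL;    // b
echo $notAtoC[0], PHP_EOL; // ,123
echo $notNums[0], PHP_EOL; // abc
```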
Using Character Classes
Just using the elements you've learned so far, you can
write the majority of patterns commonly used in
regular expressions.

/<[^>]+>/ matches a complete HTML tag, from the opening < to the closing >

/^[0-9]+/ matches the same leading digits PHP will use when
casting a string to an integer.

/^[a-zA-Z0-9]+$/ matches a username that must be
only alphanumeric characters

/^\$[a-zA-Z_][a-zA-Z0-9_]*$/ matches a valid
variable name in PHP.
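
Two of these patterns wrapped as validation helpers. This is only a sketch: the function names and the sample inputs are assumptions for illustration, not part of the original talk.

```php
<?php
// Hypothetical helper names, for illustration only.
function isAlnumUsername(string $name): bool
{
    return preg_match('/^[a-zA-Z0-9]+$/', $name) === 1;
}

function looksLikePhpVariable(string $text): bool
{
    return preg_match('/^\$[a-zA-Z_][a-zA-Z0-9_]*$/', $text) === 1;
}

var_dump(isAlnumUsername('sandy42'));      // bool(true)
var_dump(isAlnumUsername('sandy smith'));  // bool(false): contains a space
var_dump(looksLikePhpVariable('$found'));  // bool(true)
var_dump(looksLikePhpVariable('$9lives')); // bool(false): starts with a digit
```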
Subpatterns
What if you want to look for a pattern within a pattern? Or
a specific sequence of characters? It's pattern inception with
subpatterns.	

• Subpatterns are enclosed by ( and ) (parentheses)	

• They can contain a string of characters to match as a
group, such as (cat)	

• Combined with other symbols, this means you can look
for catcatcat with (cat)+	

• You can look for alternate strings, such as (cat|dog)
matching cat or dog	

• They can also contain character classes and expressions
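
A sketch of grouping and alternation. The patterns are anchored with ^ and $ so the whole string has to fit; the sample strings are made up.

```php
<?php
$repeated = preg_match('/^(cat)+$/', 'catcatcat');     // 1: cat, three times
$broken   = preg_match('/^(cat)+$/', 'catdog');        // 0: dog is not cat
$either   = preg_match('/^(cat|dog)$/', 'dog');        // 1: one of the alternatives
$mixed    = preg_match('/^(cat|dog)+$/', 'catdogcat'); // 1: any mix of the two
$neither  = preg_match('/^(cat|dog)$/', 'cow');        // 0

var_dump($repeated, $broken, $either, $mixed, $neither);
```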
Revisiting preg_match()
What if you want to extract the text that's been matched? 	

• preg_match() has an optional third argument for an array
that it will fill with the matched results.	

• Why an array? Because it assumes you'll be using
subpatterns.	

• The first element of the array is the text matched by your
entire pattern.	

• The second element is the text matched by the first
subpattern (from left to right), the third element by the
second subpattern, and so on.	

• The array is passed by reference for extra confusion.
Matching with Subpatterns
<?php
$variable = '$variable';
$pattern = '/^\$([a-zA-Z_][a-zA-Z_0-9]*)$/';
$matches = array();
$result = preg_match($pattern, $variable, $matches);
var_dump($matches); // passed by reference
/*
array(2) {
[0]=>
string(9) "$variable"
[1]=>
string(8) "variable"
}
*/
Alternatives
Subpatterns can do more than simply group patterns
for back references. They can also let you identify
strings of alternatives that you can match, using the
pipe character (|) to separate them.

For example, /(cat|dog)/ will match cat or dog.
When combined with other patterns, it becomes
powerful:

/^((http|https|ftp|gopher|file):\/\/)?([^.]+)/
would let you match the first domain or subdomain
of a URL.
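
A sketch of the URL pattern in use. It assumes # as the delimiter so the slashes in :// need no escaping, and the URLs are made-up samples. Element 2 of the result holds the scheme and element 3 the first part of the host name.

```php
<?php
// # is used as the delimiter so the slashes in :// need no escaping.
$pattern = '#^((http|https|ftp|gopher|file)://)?([^.]+)#';

// Hypothetical sample URLs.
preg_match($pattern, 'https://www.example.com/index.php', $withScheme);
preg_match($pattern, 'blog.example.com', $withoutScheme);

echo $withScheme[2], PHP_EOL;    // https
echo $withScheme[3], PHP_EOL;    // www
echo $withoutScheme[3], PHP_EOL; // blog
```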
Escape Sequences
Now that we've made you write [0-9] a whole
bunch of times, let's show you a shortcut for that
plus a bunch of others. (Ain't we a pill?)	

• \d gets you any digit. \D gets you anything that
isn't a digit.

• \s gets you any whitespace character. Careful,
this usually includes newlines (\n) and carriage
returns (\r). \S gets you anything not
whitespace.
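
A sketch of \d and \S on a made-up string. The subject is in double quotes so PHP turns \t and \n into a real tab and newline; preg_match_all() (not covered in the slides) is used to collect every match.

```php
<?php
// Hypothetical sample input containing a tab and a trailing newline.
$subject = "Order 66 shipped\ton 2014-07-26\n";

// \d+ picks out each run of digits.
preg_match_all('/\d+/', $subject, $numbers);
print_r($numbers[0]); // 66, 2014, 07, 26

// \S+ picks out each run of non-whitespace, so the tab and the
// newline act as separators just like the spaces do.
preg_match_all('/\S+/', $subject, $chunks);
print_r($chunks[0]); // Order, 66, shipped, on, 2014-07-26
```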
Escape Sequences (cont'd)
You've already seen how to escape special characters used
in regular expressions as well as replacements for character
classes. What about specific whitespace characters?

• \t is a tab character.

• \n is the Unix newline (line feed); also the default line
ending in PHP.

• \r is the carriage return. Formerly used on Macs. \r\n is
the Windows line ending; \R gets you all three.

• \h gets you any non-line ending whitespace character
(horizontal whitespace).
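
A sketch of \R and \h. It uses preg_split(), which is not covered in the slides but is a standard PHP function that cuts a string wherever the pattern matches. The sample text is made up and mixes all three line endings.

```php
<?php
// Double quotes so PHP turns \n, \r\n and \r into real control characters.
$text = "unix\nwindows\r\nold mac\rlast";

// \R treats \r\n as one line ending, so no empty entries appear.
$lines = preg_split('/\R/', $text);
print_r($lines); // unix, windows, old mac, last

// \h matches spaces and tabs but not line endings.
$tabbed = "a \t b\nc";
$sameLine   = preg_match('/a\h+b/', $tabbed); // 1: space, tab, space
$acrossLine = preg_match('/b\h+c/', $tabbed); // 0: a newline is not horizontal
var_dump($sameLine, $acrossLine);
```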
Special Escape Sequences
There are some oddities that are holdovers from the
way Perl thinks about regular expressions.	

• \w gets you a "word" character, which means
[a-zA-Z0-9_] (just like a variable!), but is locale-aware
(captures accents in other languages). \W is everything
else. I'm sure Larry Wall has a long-winded
explanation. Note it doesn't include a hyphen (-) or
apostrophe (').

• \b is even weirder. It's a "word boundary," so not a
character, per se, but marking the transition between
"word" characters as defined by \w and anything else
(including the start or end of the string).
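
A sketch of \w and \b on a made-up sentence. It shows \w+ stopping at the apostrophe and the hyphen, and \b keeping "cat" from matching inside a longer word. preg_match_all() (not in the slides) returns the number of matches.

```php
<?php
// Hypothetical sample input.
$subject = "PHP's well-known cat concatenates";

// \w+ stops at the apostrophe and the hyphen.
preg_match_all('/\w+/', $subject, $words);
print_r($words[0]); // PHP, s, well, known, cat, concatenates

// Without \b, "cat" is also found inside "concatenates".
$loose  = preg_match_all('/cat/', $subject);     // 2
$strict = preg_match_all('/\bcat\b/', $subject); // 1
var_dump($loose, $strict);
```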
Back References
Rather than repeat complicated subpatterns, you can
use a back reference.	

Each back reference is denoted by a backslash and the
ordinal number of the subpattern (e.g., \1, \2, \3, etc.).

As in preg_match(), subpatterns count left
parentheses from left to right.

• In /(outersub(innersub))\1\2/, \1 matches
outersub(innersub), and \2 matches innersub.

• Similarly, in /(sub1)(sub2)\1\2/, \1 matches
sub1, and \2 matches sub2.
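
Two small sketches of back references. The first is a doubled-word check, where \1 must repeat whatever the first subpattern captured. The second shows how nested parentheses are numbered. The sample strings are made up.

```php
<?php
// Doubled-word check: \1 must repeat whatever (\w+) captured.
$pattern = '/\b(\w+) \1\b/';

preg_match($pattern, 'Paris in the the spring', $doubled);
echo $doubled[0], PHP_EOL; // the the
echo $doubled[1], PHP_EOL; // the

$clean = preg_match($pattern, 'Paris in the spring'); // 0: nothing is doubled

// Nested subpatterns: count the left parentheses from left to right.
preg_match('/(a(b))\1\2/', 'ababb', $nested);
echo $nested[1], PHP_EOL; // ab (outer subpattern)
echo $nested[2], PHP_EOL; // b  (inner subpattern)
```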
Back Reference Example
The easiest real-world example of a back reference
is matching closing to opening quotes, whether they
are single or double.
$subject = 'Get me "stuff in quotes."';
$pattern = '/([\'"])(.*?)\1/';
$matches = array();
$result = preg_match($pattern, $subject, $matches);
var_dump($matches);
/*
array(3) {
[0]=>
string(18) ""stuff in quotes.""
[1]=>
string(1) """
[2]=>
string(16) "stuff in quotes."
}
*/
Replacing
Matching is great, but what about manipulating
strings?	

Enter preg_replace().	

Instead of having a results array passed by
reference, preg_replace returns the altered
string.	

It can also work with arrays. If you supply an array,
it performs the replace on each element of the
array and returns an altered array.
preg_replace()
<?php
$pattern = '/([\'"]).*?\1/';
$subject = 'Look at the "stuff in quotes!"';
$replacement = '$1quoted stuff!$1';
$result = preg_replace($pattern, $replacement, $subject);
echo $result; // Look at the "quoted stuff!"

// Array form: each pattern is paired with the replacement at the same index.
$pattern = array('/quick/', '/brown/', '/fox/');
$subject = array('overweight programmer', 'quick brown fox', 'spry red fox');
$replacement = array('slow', 'black', 'bear');
$result = preg_replace($pattern, $replacement, $subject);
var_dump($result);
/* array(3) {
[0]=>
string(21) "overweight programmer"
[1]=>
string(15) "slow black bear"
[2]=>
string(13) "spry red bear"
} */
Case-insensitive modifier
Remember when we said we’d explain why regular
expressions use delimiters? By now, some of you
may have asked about case-sensitivity, too, and we
said we’d get to it later. Now is the time for both.	

Regular expressions can have options that modify
the behavior of the whole expression. These are
placed after the expression, outside the delimiters.

Simplest example: i means the expression is
case-insensitive. /asdf/i matches ASDF, aSDf, and asdf.
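
A sketch of the same pattern with and without the i modifier, using the strings from this slide.

```php
<?php
$sensitive   = preg_match('/asdf/', 'ASDF');  // 0: case matters by default
$insensitive = preg_match('/asdf/i', 'ASDF'); // 1
$mixed       = preg_match('/asdf/i', 'aSDf'); // 1
$lower       = preg_match('/asdf/i', 'asdf'); // 1

var_dump($sensitive, $insensitive, $mixed, $lower);
```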
When not to use Regex?
One more important topic. Regular expressions are
powerful, but when abused, they can lead to
harder-to-maintain code, security vulnerabilities, and other bad things.

In particular, don’t reinvent the wheel. PHP already has
great, tested libraries for filtering and validating input
(filter_var) and parsing URLs (parse_url). Use them.

The rules for valid email addresses are surprisingly vague,
so best practice is to simply look for an @ or use
filter_var’s FILTER_VALIDATE_EMAIL and try to send
an email to the supplied address with a confirmation link.
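
A sketch of the two built-ins named above, with no regex involved. The sample values are assumptions for illustration; the query string on the URL is made up.

```php
<?php
// Hypothetical sample values.
$email = 'training@phparch.com';
$url   = 'http://phparch.com/training?ref=slides';

// filter_var() returns the value on success and false on failure.
$validEmail = filter_var($email, FILTER_VALIDATE_EMAIL);
$badEmail   = filter_var('not an email', FILTER_VALIDATE_EMAIL);
var_dump($validEmail, $badEmail);

// parse_url() splits a URL into its parts.
$parts = parse_url($url);
echo $parts['scheme'], PHP_EOL; // http
echo $parts['host'], PHP_EOL;   // phparch.com
echo $parts['path'], PHP_EOL;   // /training
echo $parts['query'], PHP_EOL;  // ref=slides
```

Passing the filter only says the address is well-formed; the confirmation email is still what proves it works.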
Thank you!
There’s much more to learn!	

phparch.com/training	

Follow us on Twitter:	

@phparch	

@SandyS1 - Me	

Feedback is always welcome:
training@phparch.com
Maximizing Efficiency and Profitability with OnePlan’s Professional Service A...
 
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company OdishaBalasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
Balasore Best It Company|| Top 10 IT Company || Balasore Software company Odisha
 
Sending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdfSending Calendar Invites on SES and Calendarsnack.pdf
Sending Calendar Invites on SES and Calendarsnack.pdf
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
Odoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 EnterpriseOdoo 14 - eLearning Module In Odoo 14 Enterprise
Odoo 14 - eLearning Module In Odoo 14 Enterprise
 
Odoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting ServiceOdoo Development Company in India | Devintelle Consulting Service
Odoo Development Company in India | Devintelle Consulting Service
 
cpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.pptcpct NetworkING BASICS AND NETWORK TOOL.ppt
cpct NetworkING BASICS AND NETWORK TOOL.ppt
 
Unveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New FeaturesUnveiling the Future: Sylius 2.0 New Features
Unveiling the Future: Sylius 2.0 New Features
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
What is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need ItWhat is Fashion PLM and Why Do You Need It
What is Fashion PLM and Why Do You Need It
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
CRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. SalesforceCRM Contender Series: HubSpot vs. Salesforce
CRM Contender Series: HubSpot vs. Salesforce
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...What is Advanced Excel and what are some best practices for designing and cre...
What is Advanced Excel and what are some best practices for designing and cre...
 

Don't Fear the Regex - CapitalCamp/GovDays 2014

  • 1. Don’t Fear the Regex Sandy Smith - CapitalCamp and GovDays 2014
  • 3. Demystifying Regular Expressions: So what are Regular Expressions?
    “...a means for matching strings of text, such as particular characters, words, or patterns of characters.”
    Source: http://en.wikipedia.org/wiki/Regular_expression
  • 4. A Common Joke
    Some people, when confronted with a problem, think: 'I know, I'll use regular expressions.' Now they have two problems.
    But really, it’s not that bad, and Regular Expressions (Regex) are a powerful tool.
  • 5. So what are they good at?
    Regex is good at one thing: matching patterns in strings. You might use this to:
    • Scrape information off of a webpage
    • Pull data out of text files
    • Process/validate data sent to you by a user
      - Such as phone or credit card numbers
      - Usernames or even addresses
    • Evaluate URLs to decide what code to execute
      - Via a framework bootstrap router, or mod_rewrite
  • 6. So what do they not do?
    You can only match, filter, or replace patterns of characters you expect to see. If you get unexpected (or non-standard) input, you won’t be able to pull out the patterns.
  • 7. Can you do this in PHP?
    Yes! PHP contains a great regular expression library, the Perl-Compatible Regular Expression (PCRE) engine.
    • Perl was (is?) the gold-standard language for doing text manipulation.
    • Lots of programmers knew its regex syntax.
    • The first PHP regex engine was slow and used a slightly different syntax.
    • PCRE was created to speed things up and let people use their Perl skillz.
    • Regexes are also useful in text editors!
  • 9. Delimiters
    All regex patterns are between delimiters. By custom, because Perl did it, this is /
    • e.g. '/regex-goes-here/'
    However, PCRE allows anything other than letters, numbers, whitespace, or a backslash (\) to be delimiters.
    • '#regex-goes-here#'
    Why have delimiters?
    • All will become clear later.
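One practical reason to pick a different delimiter is a pattern that itself contains slashes. The sketch below assumes PHP's preg_match() (introduced later in the deck) and uses a made-up path as the subject; both patterns match the same text, but the second needs no escaping.

```php
<?php
$subject = '/usr/local/bin';

// With / as the delimiter, every literal slash must be escaped.
$withSlashes = preg_match('/\/usr\/local/', $subject);

// With # as the delimiter, the slashes are ordinary characters.
$withHashes = preg_match('#/usr/local#', $subject);

var_dump($withSlashes, $withHashes); // int(1) int(1)
```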
  • 10. Straight Text Syntax
    The most basic regexes match text like strpos() does:
    • Take the string "PHP is my language of choice"
    • /PHP/ will match, but /Ruby/ won't.
    They start getting powerful when you add the ability to only match text at the beginning or end of a line:
    • Use ^ to match text at the beginning:
      - /^PHP/ will match, but /^my/ won't.
    • Use $ to match text at the end:
      - /choice$/ will match, but /PHP$/ won't.
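The claims on this slide can be checked directly. This is a minimal sketch that assumes PHP's preg_match(), which the deck introduces a few slides later; the variable names are only for illustration.

```php
<?php
$subject = 'PHP is my language of choice';

// Plain text is found anywhere in the string.
$plain   = preg_match('/PHP/', $subject);   // 1
$missing = preg_match('/Ruby/', $subject);  // 0

// ^ only matches at the beginning.
$startsWithPhp = preg_match('/^PHP/', $subject); // 1
$startsWithMy  = preg_match('/^my/', $subject);  // 0

// $ only matches at the end.
$endsWithChoice = preg_match('/choice$/', $subject); // 1
$endsWithPhp    = preg_match('/PHP$/', $subject);    // 0
```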
  • 11. Basic Pattern Matching
    Regular expressions are often referred to as "pattern matching," because that's what makes them powerful. Use special characters to match patterns of text:
    . matches any single character: /P.P/ matches PHP or PIP
    + matches one or more of the previous character: /PH+P/ matches PHP or PHHHP
    * matches zero or more of the previous character: /PH*P/ matches PHP or PP or PHHHHP
  • 12. Basic Pattern Matching
    ? matches zero or one of the previous character: /PH?P/ matches PHP or PP but not PHHP
    {<min>,<max>} matches from min to max occurrences of the previous character: /PH{1,2}P/ matches PHP or PHHP but not PP or PHHHP
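The quantifier examples from the last two slides can be tabulated in a short script. One assumption to note: the patterns here are wrapped in ^ and $ so that each one has to match the whole test string, which the slides leave implicit. The $tests array and its expected values are taken from the slide text.

```php
<?php
// pattern => [subject => expected preg_match() result]
$tests = [
    '/^PH+P$/'     => ['PHP' => 1, 'PHHHP' => 1, 'PP' => 0],
    '/^PH*P$/'     => ['PHP' => 1, 'PP' => 1, 'PHHHHP' => 1],
    '/^PH?P$/'     => ['PHP' => 1, 'PP' => 1, 'PHHP' => 0],
    '/^PH{1,2}P$/' => ['PHP' => 1, 'PHHP' => 1, 'PP' => 0, 'PHHHP' => 0],
];

$results = [];
foreach ($tests as $pattern => $cases) {
    foreach ($cases as $subject => $expected) {
        // Record what the engine actually reports for each pair.
        $results[$pattern][$subject] = preg_match($pattern, $subject);
    }
}
```

If the slide text is right, $results ends up identical to $tests.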
  • 13. Powerful Basic Patterns
    You can use combinations of these patterns to find lots of things. Here are the most common:
    .? Find zero or one characters of any type: /P.?P/ gets you PP, PHP, PIP, but not PHHP.
    .+ Find one or more characters of any type: /P.+P/ gets you PHP, PIP, PHPPHP, PIIIP, but not PP.
    .* Find zero or more characters of any type: /P.*P/ gets PP, PHP, PHPHP, PIIIP, but not PHX.
  • 14. Beware of Greed
    .* and .+ are "greedy" by default, meaning they match as much as they can while still fulfilling the pattern.
    /P.+P/ will match not only "PHP" but "PHP PHP". Greedy patterns don't care.
    What if you want to only match "PHP", "PHHP", or "PIP", but not "PHP PHP"?
    ? kills greed. /P.*?P/ will match PHP, PP, or PIIIP but only the first PHP in "PHP PHP".
    Great for matching tags in HTML, e.g. /<.+?>/
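To see the difference between greedy and lazy matching, compare what each pattern captures from the same subject. This sketch assumes preg_match()'s third argument (covered later in the deck) and preg_match_all(), which the slides do not cover but which PHP provides for collecting every match; the HTML snippet is made up.

```php
<?php
$subject = 'PHP PHP';

// Greedy: .+ grabs as much as possible, so the match spans both words.
preg_match('/P.+P/', $subject, $greedy);

// Lazy: .+? stops at the first P that completes the pattern.
preg_match('/P.+?P/', $subject, $lazy);

// The lazy form picks out each HTML tag separately.
$html = '<p>Hello <b>world</b></p>';
preg_match_all('/<.+?>/', $html, $tags);

var_dump($greedy[0]); // string(7) "PHP PHP"
var_dump($lazy[0]);   // string(3) "PHP"
```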
  • 15. Matching literal symbols
    If you need to match a character used as a symbol, such as $, +, ., ^, or *, escape it by preceding it with a backslash (\).
    /\./ matches a literal period (.).
    /^\$/ matches a dollar sign at the beginning of a string.
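A short sketch of why escaping matters, using a made-up price string. It also shows preg_quote(), a PHP function the slides do not mention, which adds the backslashes for you when the text to match comes from a variable.

```php
<?php
// Unescaped, the dot matches any character, so "3x50" slips through.
$loose  = preg_match('/3.50/', '3x50');   // 1
$strict = preg_match('/3\.50/', '3x50');  // 0

// Escaping both $ and . matches the literal price.
$literal = preg_match('/\$3\.50/', 'Total: $3.50'); // 1

// preg_quote() escapes regex symbols (and the given delimiter).
$quoted = preg_quote('$3.50', '/');
```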
  • 16. Calling this in PHP
    To match regular expressions in PHP, use preg_match(). It returns 1 if a pattern is matched and 0 if not. It returns false if you blew your regex syntax.
    Simplest Example:
      $subject = "PHP regex gives PHP PEP!";
      $found = preg_match("/P.P/", $subject);
      echo $found; // prints 1
  • 18. Character Classes
    Matching any character can be powerful, but lots of times you'll want to only match specific characters. Enter character classes.
    • Character classes are enclosed by [ and ] (square brackets)
    • Character classes can be individual characters
    • They can also be ranges
    • They can be any combination of the above
    • No "glue" character: any character is a valid pattern
  • 19. Character Class Examples
    Single character: [aqT,] matches a, q, T (note the case), or a comma (,)
    Range: [a-c] matches either a, b, or c (but not A or d or ...)
    [4-6] matches 4, 5, or 6
    Combination: [a-c4z6-8] matches a, b, c, 4, z, 6, 7, or 8
  • 20. Negative classes
    Even more powerful is the ability to match anything except characters in a character class.
    • Negative classes are denoted by ^ at the beginning of the class
    • [^a] matches any character except a
    • [^a-c] matches anything except a, b, or c
    • [^,0-9] matches anything except commas or digits
  • 21. Using Character Classes
    Just using the elements you've learned so far, you can write the majority of patterns commonly used in regular expressions.
    /<[^>]+?\/>/ matches a self-closing HTML tag, such as <br />, along with all the text inside it
    /^[0-9]+/ matches the same digits PHP will when casting a string to an integer.
    /^[a-zA-Z0-9]+$/ matches a username that must be only alphanumeric characters
    /^\$[a-zA-Z_][a-zA-Z0-9_]*$/ matches a valid variable name in PHP.
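Two of these patterns can be wrapped in small helpers. The function names isAlphanumericUsername() and leadingDigits() are hypothetical, chosen only for this sketch; the patterns are the ones from the slide.

```php
<?php
// Hypothetical helper: true when $name is only letters and digits.
function isAlphanumericUsername(string $name): bool
{
    return preg_match('/^[a-zA-Z0-9]+$/', $name) === 1;
}

// Hypothetical helper: the run of digits at the start of $value, or null.
function leadingDigits(string $value): ?string
{
    return preg_match('/^[0-9]+/', $value, $m) === 1 ? $m[0] : null;
}

var_dump(isAlphanumericUsername('sandy2014'));   // bool(true)
var_dump(isAlphanumericUsername('sandy smith')); // bool(false)
var_dump(leadingDigits('42 apples'));            // string(2) "42"
```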
  • 22. Subpatterns
    What if you want to look for a pattern within a pattern? Or a specific sequence of characters? It's pattern inception with subpatterns.
    • Subpatterns are enclosed by ( and ) (parentheses)
    • They can contain a string of characters to match as a group, such as (cat)
    • Combined with other symbols, this means you can look for catcatcat with (cat)+
    • You can look for alternate strings, such as (cat|dog) matching cat or dog
    • They can also contain character classes and expressions
  • 23. Revisiting preg_match()
    What if you want to extract the text that's been matched?
    • preg_match() has an optional third argument for an array that it will fill with the matched results.
    • Why an array? Because it assumes you'll be using subpatterns.
    • The first element of the array is the text matched by your entire pattern.
    • The second element is the text matched by the first subpattern (from left to right), the third by the second subpattern, and so on.
    • The array is passed by reference for extra confusion.
  • 24. Matching with Subpatterns
      <?php
      $variable = '$variable';
      $pattern = '/^\$([a-zA-Z_][a-zA-Z_0-9]*)$/';
      $matches = array();
      $result = preg_match($pattern, $variable, $matches);
      var_dump($matches); // passed by reference
      /*
      array(2) {
        [0]=>
        string(9) "$variable"
        [1]=>
        string(8) "variable"
      }
      */
  • 25. Alternatives
    Subpatterns can do more than simply group patterns for back references. They can also let you identify strings of alternatives that you can match, using the pipe character (|) to separate them.
    For example, /(cat|dog)/ will match cat or dog.
    When combined with other patterns, it becomes powerful:
    /^((http|https|ftp|gopher|file):\/\/)?([^.]+)/ would let you match the first domain or subdomain of a URL. (The final quantifier is greedy here: a lazy +? at the very end of a pattern would stop after a single character.)
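Here is a sketch of the URL pattern in use. Two choices in it are assumptions of this example rather than the slide's exact text: # is used as the delimiter so the slashes in :// need no escaping, and the last group uses a greedy [^.]+ so it captures the whole first label. The example.com and example.org URLs are placeholders.

```php
<?php
$pattern = '#^((http|https|ftp|gopher|file)://)?([^.]+)#';

// With a scheme: group 2 is the scheme, group 3 the first label.
preg_match($pattern, 'https://www.example.com/path', $m);

// Without a scheme: the optional group is skipped entirely.
preg_match($pattern, 'example.org', $n);

var_dump($m[2]); // string(5) "https"
var_dump($m[3]); // string(3) "www"
var_dump($n[3]); // string(7) "example"
```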
  • 26. Escape Sequences
    Now that we've made you write [0-9] a whole bunch of times, let's show you a shortcut for that plus a bunch of others. (Ain't we a pill?)
    • \d gets you any digit. \D gets you anything that isn't a digit.
    • \s gets you any whitespace character. Careful, this usually includes newlines (\n) and carriage returns (\r). \S gets you anything not whitespace.
  • 27. Escape Sequences (cont'd)
    You've already seen how to escape special characters used in regular expressions as well as replacements for character classes. What about specific whitespace characters?
    • \t is a tab character.
    • \n is the Unix newline (line feed); also the default line ending in PHP.
    • \r is the carriage return. Formerly used on Macs. \r\n is Windows's end-of-line sequence; \R gets you all three.
    • \h gets you any non-line ending whitespace character (horizontal whitespace).
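These shortcuts are easiest to appreciate in use. The sketch below assumes preg_split(), which the slides do not cover, and preg_replace(), which they cover later; the sample strings and the seven-digit phone format are invented for illustration.

```php
<?php
// \R treats \r\n, \n, and \r each as one line ending.
$text  = "line one\r\nline two\nline three\rline four";
$lines = preg_split('/\R/', $text);

// \d stands in for [0-9].
$phone = preg_match('/^\d{3}-\d{4}$/', '555-1234');

// \s+ collapses any run of spaces, tabs, and newlines.
$collapsed = preg_replace('/\s+/', ' ', "a \t\n b");

var_dump(count($lines)); // int(4)
var_dump($collapsed);    // string(3) "a b"
```

The patterns are written in single quotes so PHP passes the backslashes through to the regex engine untouched.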
  • 28. Special Escape Sequences
    There are some oddities that are holdovers from the way Perl thinks about regular expressions.
    • \w gets you a "word" character, which means [a-zA-Z0-9_] (just like a variable!), but is locale-aware (captures accents in other languages). \W is everything else. I'm sure Larry Wall has a long-winded explanation. Note it doesn't include a hyphen (-) or apostrophe (').
    • \b is even weirder. It's a "word boundary," so not a character, per se, but marking the transition between "word" characters as defined by \w and anything else, such as whitespace, punctuation, or the start or end of the string.
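A brief sketch of both sequences at work, again assuming preg_match_all() (not covered in the slides) to collect every match. Note how the apostrophe splits "Don't" into two "words", as the slide warns.

```php
<?php
$subject = "Don't fear the regex";

// \b keeps "fear" from matching inside a longer word.
$whole   = preg_match('/\bfear\b/', $subject);    // 1
$partial = preg_match('/\bfear\b/', 'fearless');  // 0

// \w+ grabs each run of word characters.
preg_match_all('/\w+/', $subject, $words);
// $words[0] is ['Don', 't', 'fear', 'the', 'regex']
```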
  • 29. Back References
    Rather than repeat complicated subpatterns, you can use a back reference. Each back reference is denoted by a backslash and the ordinal number of the subpattern (e.g., \1, \2, \3, etc.). As in preg_match(), subpatterns count left parentheses from left to right.
    • In /(outersub(innersub))\1\2/, \1 refers to the text matched by (outersub(innersub)), and \2 to the text matched by (innersub).
    • Similarly, in /(sub1)(sub2)\1\2/, \1 refers to the text matched by (sub1), and \2 to the text matched by (sub2).
  • 30. Back Reference Example
    The easiest real-world example of a back reference is matching closing to opening quotes, whether they are single or double.
      $subject = 'Get me "stuff in quotes."';
      $pattern = '/([\'"])(.*?)\1/';
      $matches = array();
      $result = preg_match($pattern, $subject, $matches);
      var_dump($matches);
      /*
      array(3) {
        [0]=>
        string(18) ""stuff in quotes.""
        [1]=>
        string(1) """
        [2]=>
        string(16) "stuff in quotes."
      }
      */
  • 31. Replacing
    Matching is great, but what about manipulating strings? Enter preg_replace().
    Instead of having a results array passed by reference, preg_replace() returns the altered string.
    It can also work with arrays. If you supply an array, it performs the replace on each element of the array and returns an altered array.
  • 32. preg_replace()
      <?php
      $pattern = '/([\'"]).*?\1/';
      $subject = 'Look at the "stuff in quotes!"';
      $replacement = '$1quoted stuff!$1';
      $result = preg_replace($pattern, $replacement, $subject);
      echo $result; // Look at the "quoted stuff!"

      $pattern = array('/quick/', '/brown/', '/fox/');
      $subject = array('overweight programmer', 'quick brown fox', 'spry red fox');
      $replacement = array('slow', 'black', 'bear');
      $result = preg_replace($pattern, $replacement, $subject);
      var_dump($result);
      /*
      array(3) {
        [0]=>
        string(21) "overweight programmer"
        [1]=>
        string(15) "slow black bear"
        [2]=>
        string(13) "spry red bear"
      }
      */
  • 33. Case-insensitive modifier
    Remember when we said we’d explain why regular expressions use delimiters? By now, some of you may have asked about case-sensitivity, too, and we said we’d get to it later. Now is the time for both.
    Regular expressions can have options that modify the behavior of the whole expression. These are placed after the expression, outside the delimiters.
    Simplest example: i means the expression is case-insensitive. /asdf/i matches ASDF, aSDf, and asdf.
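The effect of the modifier is easy to confirm with a minimal sketch; the variable names are only for illustration.

```php
<?php
// Without a modifier, matching is case-sensitive.
$sensitive = preg_match('/asdf/', 'ASDF');    // 0

// The i after the closing delimiter turns case-sensitivity off.
$insensitive = preg_match('/asdf/i', 'ASDF'); // 1
$mixed       = preg_match('/asdf/i', 'aSDf'); // 1
```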
  • 34. When not to use Regex?
    One more important topic. Regular expressions are powerful, but when abused, they can lead to harder-to-maintain code, security vulnerabilities, and other bad things.
    In particular, don’t reinvent the wheel. PHP already has great, tested libraries for filtering and validating input (filter_var) and parsing URLs (parse_url). Use them.
    The rules for valid email addresses are surprisingly vague, so best practice is to simply look for an @ or use filter_var’s FILTER_VALIDATE_EMAIL and try to send an email to the supplied address with a confirmation link.
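As a rough illustration of the built-in alternatives, the sketch below calls filter_var() and parse_url() on placeholder values (the example.com address and URL are made up). filter_var() returns the filtered value on success and false on failure.

```php
<?php
// Validate an email address without writing a regex.
$email = filter_var('sandy@example.com', FILTER_VALIDATE_EMAIL);
$bad   = filter_var('not-an-email', FILTER_VALIDATE_EMAIL);

// Pull one component out of a URL without writing a regex.
$host = parse_url('https://www.example.com/path?x=1', PHP_URL_HOST);

var_dump($email); // string(17) "sandy@example.com"
var_dump($bad);   // bool(false)
var_dump($host);  // string(15) "www.example.com"
```

Passing validation only says the address is well formed; sending the confirmation link is still what proves it works.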
  • 35. Thank you!
    There’s much more to learn! phparch.com/training
    Follow us on Twitter: @phparch, @SandyS1 (me)
    Feedback is always welcome: training@phparch.com