2. Outline
1 Introduction
What does grep offer?
When should I use grep?
2 Understanding Regular Expressions
Class Basics
Quantifiers & Grouping
Online Tools
Examples
3 Using Regular Expressions With grep
Colloquium - grep, v1.0
A. Magee
5. Introduction What?
What does grep offer?
grep matches regular expressions.
Your first question should be “What is a regular expression?”
A regular expression is a pattern that describes a set of strings.
grep and REs allow us to find complex things in text.
Complex is relative and can vary from a single character to an IP
address.
Single character complex: [ajk+0-]
IP complex: ((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}
(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)
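To see both patterns at work, here is a minimal sketch that uses Python's re module as a stand-in for grep. The helper names is_ip and find_char are made up for this illustration.

```python
import re

# One octet: 250-255, 200-249, or 0-199 (optionally with a leading 0 or 1).
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"

# Three octets, each followed by a literal dot, then a final octet.
IP_RE = re.compile(r"(" + OCTET + r"\.){3}" + OCTET)

def is_ip(text):
    """Return True if the whole string is a dotted-quad IPv4 address."""
    return IP_RE.fullmatch(text) is not None

def find_char(text):
    """Return every character of text that is in the set [ajk+0-]."""
    return re.findall(r"[ajk+0-]", text)

print(is_ip("192.168.0.1"))   # True
print(is_ip("256.1.1.1"))     # False
print(find_char("jack-o+0"))  # ['j', 'a', 'k', '-', '+', '0']
```

The trailing dash in [ajk+0-] is a literal dash because it sits at the end of the set.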
8. Introduction When?
When should I use grep?
Always!
Unless you find some better tool.
P.S. - grep stands for g/re/p, an ed command that means
global/regular expression/print
9. Regular Expressions Class Basics
Class Basics
A character class is a symbol or collection of symbols that describes a
group of characters.
. (period): This matches any single character.
[...]: This matches any one character in the set.
[aeiou] matches one of the vowels.
[a-z] matches one lowercase letter.
[0-5] matches one numeral 0 through 5.
You will not remember all of these until you use them often, but
there are many special classes that can save you some typing.
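The classes above can be tried out with a short sketch. It uses Python's re module rather than grep itself, and the helper name matches is made up for this illustration.

```python
import re

def matches(pattern, text):
    """Return a list of every substring of text that the pattern matches."""
    return re.findall(pattern, text)

sample = "grep v3.11"

print(matches(r".", "a1!"))        # ['a', '1', '!']
print(matches(r"[aeiou]", sample)) # ['e']
print(matches(r"[a-z]", sample))   # ['g', 'r', 'e', 'p', 'v']
print(matches(r"[0-5]", sample))   # ['3', '1', '1']
```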
10. Regular Expressions Class Basics
Common Classes
Special Class   Meaning                  Simple RE
\d              Digit characters         [0-9]
\D              Non-digit characters     [^0-9]
\w              Word characters          [a-zA-Z_0-9]
\W              Non-word characters      [^a-zA-Z_0-9]
\s              Whitespace characters    [ \f\n\r\t]
\S              Non-space characters     [^ \f\n\r\t]
\b              Word boundary
The word boundary class is very special as it is zero length and matches
transitions between \w and \W (and vice versa), as well as the start or
end of the text next to a word character.
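A small sketch of the shorthand classes and the word boundary, again using Python's re module as a stand-in for grep. The helper find_all is a made-up name, and the re.ASCII flag is an assumption added so the classes behave like the simple REs in the table.

```python
import re

def find_all(pattern, text):
    """Return every match, with the shorthand classes limited to ASCII."""
    return re.findall(pattern, text, flags=re.ASCII)

print(find_all(r"\d", "a1b22"))                    # ['1', '2', '2']
print(find_all(r"\w+", "snake_case x-1"))          # ['snake_case', 'x', '1']
print(find_all(r"\S+", "two  words"))              # ['two', 'words']

# The boundary is zero length: only the two standalone words match.
print(find_all(r"\bcat\b", "cat concat cat_2 cat"))  # ['cat', 'cat']
```

The last call skips "concat" and "cat_2" because a letter or underscore sits next to "cat" there, so no boundary exists on that side.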
11. Regular Expressions Class Basics
More Common Classes
Special Class   Meaning                         Simple RE
[:alpha:]       All alphabetic characters       [a-zA-Z]
[:alnum:]       All alphabetic and numeric      [a-zA-Z0-9]
[:blank:]       Tab and space                   [ \t]
[:cntrl:]       Control characters              [\x00-\x1F\x7F]
[:digit:]       A numeric digit                 [0-9]
[:graph:]       Any visible character           [\x21-\x7E]
[:lower:]       Lowercase characters            [a-z]
[:print:]       Printables (i.e. no controls)   [\x20-\x7E]
[:punct:]       Punctuation & symbols           [!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~]
[:space:]       Space, tab, newline, etc.       [ \t\r\n\v\f]
[:upper:]       Uppercase characters            [A-Z]
[:word:]        Word characters                 [a-zA-Z0-9_]
[:xdigit:]      Hex digits                      [A-Fa-f0-9]
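Python's re module does not understand the [: ... :] names, so the sketch below substitutes the "Simple RE" column by hand. The POSIX_CLASSES table and the expand_posix helper are made up for this illustration and cover only a few rows.

```python
import re

# Contents of the "Simple RE" column without the outer brackets
# (illustrative subset only).
POSIX_CLASSES = {
    "alpha": "a-zA-Z",
    "alnum": "a-zA-Z0-9",
    "blank": r" \t",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "word": "a-zA-Z0-9_",
    "xdigit": "A-Fa-f0-9",
}

def expand_posix(pattern):
    """Replace each [:name:] with its simple equivalent (KeyError if unknown)."""
    return re.sub(r"\[:(\w+):\]", lambda m: POSIX_CLASSES[m.group(1)], pattern)

print(expand_posix("[[:alnum:]-]+"))                             # [a-zA-Z0-9-]+
print(re.findall(expand_posix("[[:xdigit:]]+"), "0xDEADbeef zz"))  # ['0', 'DEADbeef']
```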
12. Regular Expressions Quantifiers & Grouping
Quantifiers & Grouping
Quantifiers are how an RE counts things.
? Zero or one occurrence
* Zero or more occurrences
+ One or more occurrences
*? Zero or more occurrences non-greedy
+? One or more occurrences non-greedy
{x} Exactly x occurrences
{x,} At least x occurrences
{x,y} At least x but no more than y occurrences
Grouping is used to collect patterns together and to create
back-references. A group is simply a set of parentheses ().
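The quantifiers and a back-reference can be compared side by side in a short sketch. It uses Python's re module as a stand-in, and the sample strings are invented for this illustration.

```python
import re

html = "<b>bold</b> and <i>italic</i>"

# Greedy + grabs as much as it can; non-greedy +? stops at the first '>'.
greedy = re.findall(r"<.+>", html)
lazy = re.findall(r"<.+?>", html)

# ? makes the 'u' optional.
optional = re.findall(r"colou?r", "color colour")

# {2,3} accepts only numbers with two or three digits.
counted = re.findall(r"\b\d{2,3}\b", "7 42 365 2024")

# The group (\w+) is captured and \1 refers back to whatever it matched.
doubled = re.findall(r"\b(\w+) \1\b", "it is is the the end")

print(greedy)    # ['<b>bold</b> and <i>italic</i>']
print(lazy)      # ['<b>', '</b>', '<i>', '</i>']
print(optional)  # ['color', 'colour']
print(counted)   # ['42', '365']
print(doubled)   # ['is', 'the']
```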
13. Regular Expressions Online Tools
Helpful Tools
The best way to understand the rest of this presentation is to see what is
being matched live. Here are some online tools that work for our needs.
RegExr - www.gskinner.com/RegExr
beware Flash, but it works well
regexpal - regexpal.com
very simple
reanimator - osteele.com/tools/reanimator
beware Flash, recommend CS 4/570 first
rubular - rubular.com
nice on-page reference
14. Regular Expressions Examples
Your First RE
Let’s skip trivial REs and get on to something useful. These may be more
complex than you’re used to but the quicker you are able to read long,
complex REs the better. This is a nice, but not perfect, email address
matcher.
[[:alnum:]][[:word:]\.%+-]*@(?:[[:alnum:]-]+\.)+[[:alpha:]]{2,4}
[[:alnum:]][[:word:]\.%+-]*
Match a word that doesn’t start with [.%+-].
@(?:[[:alnum:]-]+\.)+
Match the @ symbol and any number of subdomains followed by
periods.
[[:alpha:]]{2,4}
Match the top level domain of 2, 3 or 4 characters.
15. Regular Expressions Examples
Your First RE - Part 2
Let’s examine the first part.
[[:alnum:]][[:word:]\.%+-]*
[[:alnum:]] - Must start with an alphanumeric character.
NB: All [: ... :] classes must live in a set like [[: ... :]].
[[:word:]\.%+-] - Other characters may be a ‘word’ character,
a literal period, percent symbol, plus symbol or a dash.
NB: The period is escaped here; outside a set, an unescaped period
matches any character.
* - repeat the previous set zero or more times.
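The first part can be tested on its own. The sketch below is a hand translation for Python's re module, which lacks the [: ... :] names: [[:alnum:]] becomes [a-zA-Z0-9] and [:word:] becomes \w. The helper name local_part_ok is made up for this illustration.

```python
import re

# Translation of [[:alnum:]][[:word:]\.%+-]* for Python's re module.
LOCAL_RE = re.compile(r"[a-zA-Z0-9][\w.%+-]*", re.ASCII)

def local_part_ok(text):
    """Return True if the whole string matches the first part of the RE."""
    return LOCAL_RE.fullmatch(text) is not None

print(local_part_ok("john.doe+tag"))  # True
print(local_part_ok("a"))             # True: * allows zero further characters
print(local_part_ok(".john"))         # False: starts with a period
print(local_part_ok("-x"))            # False: starts with a dash
```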
18. Regular Expressions Examples
Your First RE - Part 3
Now the second part, the subdomains, sub-subdomains, etc.
@(?:[[:alnum:]-]+\.)+
@ - Well that literally matches the ‘at’ character.
The opening parenthesis denotes the beginning of a group.
The ?: is a confusing notation that suppresses the creation of a
back reference. It is here so you’ll know of it, but it is rarely needed.
Again we see a special class for alphanumerics, but we’ve also
included a dash. The plus symbol tells us to look for one or more of
these characters, followed by a period.
And lastly we close the group and the plus symbol now tells us to
look for one or more of these groups.
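What ?: changes is easiest to see by comparing the two kinds of group. This sketch uses Python's re module with [a-zA-Z0-9-] standing in for [[:alnum:]-], and the address is invented for this illustration.

```python
import re

address = "user@mail.cs.example.edu"

# Non-capturing group: nothing is stored, only the overall match exists.
plain = re.search(r"@(?:[a-zA-Z0-9-]+\.)+", address)

# Capturing group: the last repetition is kept for a back-reference.
captured = re.search(r"@([a-zA-Z0-9-]+\.)+", address)

print(plain.group(0))     # @mail.cs.example.
print(plain.groups())     # ()
print(captured.group(0))  # @mail.cs.example.
print(captured.group(1))  # example.
```

Both versions match the same text. The only difference is whether a back-reference is created.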
22. Regular Expressions Examples
Your First RE - Part 4
Finally the third part, the domain.
[[:alpha:]]{2,4}
Well, now this part is easy. Just match 2, 3 or 4 alphabetical
characters.
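Putting the three parts together gives the sketch below. It is a hand translation for Python's re module, with the [: ... :] names written out, and the find_emails helper and sample addresses are invented for this illustration.

```python
import re

# Translation of the full RE: local part, subdomains, top level domain.
EMAIL_RE = re.compile(
    r"[a-zA-Z0-9][\w.%+-]*@(?:[a-zA-Z0-9-]+\.)+[a-zA-Z]{2,4}",
    re.ASCII,
)

def find_emails(text):
    """Return every substring of text that the email RE matches."""
    return [m.group(0) for m in EMAIL_RE.finditer(text)]

text = "Contact user@cs.example.edu or bob@example.museum today."
print(find_emails(text))
# ['user@cs.example.edu', 'bob@example.muse']
```

The second result shows why the matcher is nice but not perfect: {2,4} cuts a longer top level domain off after four letters.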
23. Regular Expressions Examples
Your Second RE
Now we’ll look at an RE that can help us build a header file for a C
program file, given that some neglectful programmer has failed to design
his/her C program properly. This will be a quicker example.
^[\w\s]*\([\w\s\*&,]*\)\s*{
^[\w\s]*\(
At the beginning of a line match some keywords and types and
the function name and then literal parenthesis.
[\w\s\*&,]*
Match some more words, keywords, variable modifiers and commas.
\)\s*{
Finally match the closing parenthesis, some whitespace and the
left curly brace, denoting the start of the function body.
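As a rough sketch of the header-building idea, the RE can be run over a small C source held in a string. This uses Python's re module rather than grep. The prototypes helper and the C snippet are invented for this illustration, and the brace is escaped as \{ only to keep the pattern unambiguous.

```python
import re

# re.MULTILINE lets ^ match at the start of every line, as grep would.
FUNC_RE = re.compile(r"^[\w\s]*\([\w\s\*&,]*\)\s*\{", re.MULTILINE)

C_SOURCE = """\
#include <stdio.h>

int add(int a, int b) {
    return a + b;
}

static void swap(int *x, int *y) {
    int t = *x; *x = *y; *y = t;
}
"""

def prototypes(source):
    """Turn each matched function header into a prototype line."""
    protos = []
    for m in FUNC_RE.finditer(source):
        # Drop the opening brace and surrounding whitespace, then add ';'.
        header = m.group(0).rstrip("{").strip()
        protos.append(header + ";")
    return protos

for line in prototypes(C_SOURCE):
    print(line)
# int add(int a, int b);
# static void swap(int *x, int *y);
```

The pattern is deliberately loose, so a statement such as `if (x) {` at the start of a line would match as well.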
24. Regular Expressions Examples
Your Second RE - Fine Details
^[\w\s]*\([\w\s\*&,]*\)\s*{
In general, most RE parsers will not match across multiple lines, even
though the \s class matches the newline character. This is very
bothersome but is easily overcome by using pcregrep. PCRE stands for Perl
Compatible Regular Expressions. This is all I will ever say about Perl.
Notice that the literal * must be escaped like so, \*.
As must the parentheses due to their special RE meaning.
Escaping so many characters is very annoying, but unfortunately it is
necessary.
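The multi-line issue can be reproduced in a small sketch by feeding the same RE one line at a time, as a line-oriented tool would, and then the whole text at once. It uses Python's re module rather than grep or pcregrep, and the helper names and C snippet are invented for this illustration.

```python
import re

FUNC_RE = re.compile(r"^[\w\s]*\([\w\s\*&,]*\)\s*\{", re.MULTILINE)

# The opening brace sits on its own line.
SOURCE = "int main(void)\n{\n    return 0;\n}\n"

def count_line_by_line(source):
    """Search each line separately, so \\s never sees a newline."""
    return sum(1 for line in source.splitlines() if FUNC_RE.search(line))

def count_whole_text(source):
    """Search the whole text, so \\s can cross the line break."""
    return len(FUNC_RE.findall(source))

print(count_line_by_line(SOURCE))  # 0
print(count_whole_text(SOURCE))    # 1
```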