Introduction to Regular Expressions. Matt Casto, Quick Solutions. http://google.com/profiles/mattcasto
“Some people, when confronted with a problem, think ‘I know, I'll use regular expressions.’ Now they have two problems.” - Jamie Zawinski, August 12, 1997
What are Regular Expressions?
 ^\w+@[a-zA-Z_]+?\.[a-zA-Z]{2,3}$
 [\w-]+@([\w-]+\.)+[\w-]+
 ^.+@[^\.].*\.[a-z]{2,}$
 ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
History: Stephen Cole Kleene. American mathematician credited with inventing regular expressions in the 1950s, using a mathematical notation called regular sets.
History: Ken Thompson. American pioneer of computer science who, among many other things, used Kleene’s regular sets for searching in his QED and ed text editors.
History: grep. Global Regular Expression Print.
History: Henry Spencer. Wrote the regex library that the Perl and Tcl languages used for regular expressions.
Why Should You Care? Example: finding duplicate words in a file. Requirements:
 Find doubled words that span lines
 Ignore capitalization differences
 Ignore HTML tags
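Why Should You Care? Example: finding duplicate words in a file. Solution:
    $/ = ".\n";                # read the input in chunks that end with a period and newline
    while (<>) {
      # highlight each doubled word, allowing whitespace or HTML tags in between
      next if !s/\b([a-z]+)((?:\s|<[^>]+>)+)(\1\b)/\e[7m$1\e[m$2\e[7m$3\e[m/ig;
      s/^(?:[^\e]*\n)+//mg;    # remove any unmarked lines
      s/^/$ARGV: /mg;          # start each line with the filename
      print;
    }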
Literal Characters: Any character except a small list of reserved characters. Regex: is Target string: Jack is a boy (match in target string)
Literal Characters: Literals will match characters in the middle of words. Regex: a Target string: Jack is a boy (matches in target string)
Literal Characters: Literals are case sensitive – capitalization matters! Regex: j Target string: Jack is a boy (NOT a match)
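The three literal-character slides can be tried out with a short sketch. It assumes Python's standard `re` module (the slides themselves are engine-agnostic), and the variable names are only illustrative.

```python
import re

target = "Jack is a boy"

# A literal pattern matches wherever that exact text occurs.
is_match = re.search(r"is", target)

# Literals also match in the middle of words: "a" occurs inside "Jack"
# and again as a separate word.
a_positions = [m.start() for m in re.finditer(r"a", target)]

# Matching is case sensitive by default: lowercase "j" does not match "J".
j_match = re.search(r"j", target)

print(is_match.span(), a_positions, j_match)
```

Passing `re.IGNORECASE` as a flag would make the last search succeed, which is one way to confirm that case is the only obstacle.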
Special Characters: [ \ ^ $ . | ? * + ( )
Special Characters: You can match special characters by escaping them with a backslash. Regex: 1\+1=2 Target string: I wrote 1+1=2 on the chalkboard.
Special Characters: Some characters, such as { and }, are reserved only in certain contexts. if (true)  else if (true) { beep; }
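A minimal sketch of why the backslash matters, assuming Python's `re` module. `re.escape` is a Python convenience for escaping literal text and is not something the slides mention.

```python
import re

target = "I wrote 1+1=2 on the chalkboard."

# Unescaped, "+" is a quantifier: the pattern means one or more "1",
# then "1=2", which does not occur in the target.
unescaped = re.search(r"1+1=2", target)

# Escaped, "\+" matches a literal plus sign.
escaped = re.search(r"1\+1=2", target)

# re.escape builds an escaped pattern from arbitrary literal text.
auto = re.search(re.escape("1+1=2"), target)

print(unescaped, escaped.group(), auto.group())
```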
Non-Printable Characters: Some literal characters can be escaped to represent non-printable characters. \t – tab, \r – carriage return, \n – line feed, \a – bell, \e – escape, \f – form feed, \v – vertical tab
Period: The period character matches any single character (in most engines, any character except a line break). Regex: a.boy Target string: Jack is a boy
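As a quick check of the period, here is a sketch assuming Python's `re` module, where the period skips line breaks unless the `re.DOTALL` flag is given. Other engines name this option differently.

```python
import re

# The period stands in for the space between "a" and "boy".
space_match = re.search(r"a.boy", "Jack is a boy")

# By default the period does not match a newline in Python...
newline_default = re.search(r"a.boy", "a\nboy")

# ...unless DOTALL ("single-line" mode in some other engines) is enabled.
newline_dotall = re.search(r"a.boy", "a\nboy", re.DOTALL)

print(space_match.group())
```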
Character Classes: Used to match only one of the characters inside square brackets. Regex: [Gg]r[ae]y Target string: Grayson drives a grey sedan.
Character Classes: A hyphen is a reserved character inside a character class and indicates a range. Regex: [0-9a-fA-F] Target string: The HTML code for White is #FFFFFF
Character Classes: A caret at the start of a character class negates the match. Regex: q[^u] Target string: Qatar is home to quite a lot of Iraqi citizens, but is not a city in Iraq
Character Classes: Most special characters are ordinary literals inside a character class. Only ] \ ^ and - are reserved. Regex: [+*] Target string: 6 * 7 and 18 + 24 both equal 42
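The character-class slides can be reproduced with a small sketch, assuming Python's `re` module. Note that the negated class still has to consume one character, so the final "q" of "Iraq" is not matched.

```python
import re

# One character from each set: G or g, then r, then a or e, then y.
gray = re.findall(r"[Gg]r[ae]y", "Grayson drives a grey sedan.")

# A hyphen between two characters denotes a range; this class is one hex digit.
hex_digits = re.findall(r"[0-9a-fA-F]", "#FFFFFF")

# A leading caret negates the class: "q" followed by any character except "u".
sentence = ("Qatar is home to quite a lot of Iraqi citizens, "
            "but is not a city in Iraq")
q_not_u = re.findall(r"q[^u]", sentence)

# Inside a class, "+" and "*" are plain literals.
operators = re.findall(r"[+*]", "6 * 7 and 18 + 24 both equal 42")

print(gray, hex_digits, q_not_u, operators)
```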
Shorthand Character Classes: \d – digit or [0-9]; \w – word or [A-Za-z0-9_]; \s – whitespace or [ \t\r\n] (space, tab, CR, LF). Regex: [] Target string: 1 + 2 = 3
Shorthand Character Classes: \D – non-digit or [^\d]; \W – non-word or [^\w]; \S – non-whitespace or [^\s]. Regex: [] Target string: 1 + 2 = 3
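A sketch of the shorthand classes applied to the slide's target string, assuming Python's `re` module. The combined class `[\s\d]` is an illustrative choice, not necessarily the pattern shown on the original slide. In Python 3, `\d` and `\w` also match non-ASCII digits and letters unless `re.ASCII` is passed.

```python
import re

target = "1 + 2 = 3"

digits = re.findall(r"\d", target)        # digit characters
non_digits = re.findall(r"\D", target)    # everything that is not a digit
non_space = re.findall(r"\S", target)     # everything that is not whitespace

# Shorthands can be combined inside a bracketed class (illustrative pattern).
space_or_digit = re.findall(r"[\s\d]", target)

print(digits, non_digits, non_space, space_or_digit)
```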
Repetition: The asterisk repeats the preceding token (here, a character class) 0 or more times. Regex: <[A-Za-z][A-Za-z0-9]*> Target string: <HTML>Regex is <b>Awesome</b></HTML>
Repetition: The plus repeats the preceding token 1 or more times. Regex: <[A-Za-z0-9]+> Target string: Watch out for invalid <HTML> tags like <1> and <>!
Repetition: The question mark repeats the preceding token 0 or 1 times, in effect making it optional. Regex: </?[A-Za-z][A-Za-z0-9]*> Target string: <HTML>Regex is <b>Awesome</b></HTML>
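The three repetition operators can be compared side by side with the slides' own patterns. This is a sketch assuming Python's `re` module.

```python
import re

html = "<HTML>Regex is <b>Awesome</b></HTML>"

# "*": a letter followed by zero or more letters or digits (opening tags only).
opening = re.findall(r"<[A-Za-z][A-Za-z0-9]*>", html)

# "+": one or more; this looser pattern accepts "<1>" but still rejects "<>".
loose = re.findall(r"<[A-Za-z0-9]+>",
                   "Watch out for invalid <HTML> tags like <1> and <>!")

# "?": the slash becomes optional, so closing tags match as well.
all_tags = re.findall(r"</?[A-Za-z][A-Za-z0-9]*>", html)

print(opening, loose, all_tags)
```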
Anchors: The caret anchor matches the position before the first character in a string. Regex: ^vac Target string: vacation evacuation
Anchors: The dollar sign anchor matches the position after the last character in a string. Regex: tion$ Target string: vacation evacuation
Anchors: The caret and dollar sign anchors match the start and end of each line if the engine has multi-line mode turned on. Regex: tion$ Target string (two lines): vacation evacuation / has ruined my evaluation
Anchors: The \A and \Z anchors are like ^ and $ but match only the start and end of the whole string, even in multi-line mode. Regex: tion\Z Target string (two lines): vacation evacuation / has ruined my evaluation
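The difference between the anchors shows up on a two-line string. The sketch below assumes Python's `re` module, where multi-line mode is the `re.MULTILINE` flag. The exact behaviour of `\Z` near a trailing newline varies between engines, so the target here deliberately has no trailing newline.

```python
import re

text = "vacation evacuation\nhas ruined my evaluation"

# "^" without multi-line mode matches only at the very start of the string.
start = re.findall(r"^vac", text)

# "$" without multi-line mode: only the "tion" at the end of the string.
end_default = [m.start() for m in re.finditer(r"tion$", text)]

# With multi-line mode, "$" also matches before each line break.
end_multiline = [m.start() for m in re.finditer(r"tion$", text, re.MULTILINE)]

# "\Z" ignores multi-line mode and matches only at the end of the string.
end_absolute = [m.start() for m in re.finditer(r"tion\Z", text, re.MULTILINE)]

print(start, end_default, end_multiline, end_absolute)
```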
Word Boundaries: The \b anchor matches at the…
 position before the first character in a string (like ^)
 position after the last character in a string (like $)
 position between two characters where one is a word character and the other is not
Regex: \b4\b Target string: We’ve got 4 orders for 44 lbs of C4
Word Boundaries: The \B anchor is the negated word boundary – any position between two word characters or two non-word characters. Regex: \Bat\B Target string: vacation evacuation at that time ate my evaluation
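A sketch of both boundary anchors, assuming Python's `re` module. The patterns `\b4\b` and `\Bat\B` are assumptions chosen to fit the slides' target strings. Raw strings (`r"..."`) matter here, because in an ordinary Python string `"\b"` is a backspace character.

```python
import re

orders = "We've got 4 orders for 44 lbs of C4"

# Only the standalone "4" has a word boundary on both sides.
standalone_4 = [m.start() for m in re.finditer(r"\b4\b", orders)]

words = "vacation evacuation at that time ate my evaluation"

# Every "at", wherever it occurs.
all_at = re.findall(r"at", words)

# Only an "at" with word characters on both sides (strictly inside a word).
inner_at = re.findall(r"\Bat\B", words)

print(standalone_4, len(all_at), len(inner_at))
```

The standalone word "at", the end of "that" and the start of "ate" are all excluded by `\Bat\B`, which leaves the three occurrences inside "vacation", "evacuation" and "evaluation".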
Alternation: The pipe symbol separates two or more alternatives, any one of which can match. Regex: cat|dog Target string: A cat and dog are expected to follow the dogma that their presence with one another leads to catastrophe.
Alternation: Each alternative includes everything on its side of the pipe, not just the adjacent character or class. Regex: cat|dog Target string: A cat and dog are expected to follow the dogma that their presence with one another leads to catastrophe.
Alternation: Use parentheses to group alternatives when you want to limit the reach of alternation. Regex: (cat|dog) Target string: A cat and dog are expected to follow the dogma that their presence with one another leads to catastrophe.
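To see how far alternation reaches, the sketch below adds word boundaries to the slides' `cat|dog` pattern. The `\b` variants are assumptions for illustration, and the code assumes Python's `re` module.

```python
import re

text = ("A cat and dog are expected to follow the dogma that their "
        "presence with one another leads to catastrophe.")

# Plain alternation matches "cat" or "dog" anywhere, even inside longer words.
anywhere = re.findall(r"cat|dog", text)

# Without grouping, the pipe splits the whole pattern:
# either "\bcat" (cat at a word start) or "dog\b" (dog at a word end).
ungrouped = re.findall(r"\bcat|dog\b", text)

# Parentheses limit the alternation, so both boundaries apply to both words.
grouped = re.findall(r"\b(cat|dog)\b", text)

print(anywhere, ungrouped, grouped)
```

The ungrouped pattern still finds the "cat" in "catastrophe", because only its first alternative carries the leading boundary.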
Eagerness: Eagerness causes the order of alternatives to matter. Regex: and|android Target string: A robot and an android fight. The ninja wins.
Greediness: Greediness means that the engine will always try to match as much as possible. Regex: an+ Target string: A robot and an android fight. The ninja wins.
Laziness: Laziness, or reluctance, modifies a repetition operator to match only as much as it needs to. Regex: an+? Target string: A robot and an android fight. The ninja wins.
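Eagerness, greediness and laziness are easiest to compare in code. This sketch assumes Python's `re` module, a backtracking engine that tries alternatives left to right. The string "bannner" is a made-up input, added because the slides' sentence contains no repeated "n".

```python
import re

text = "A robot and an android fight. The ninja wins."

# Eagerness: the first alternative that succeeds wins, so "android"
# is never reported when "and" is listed first.
eager = re.findall(r"and|android", text)
reordered = re.findall(r"android|and", text)

# Greedy "+" takes as many "n" as it can; lazy "+?" stops after one.
greedy = re.findall(r"an+", "bannner")   # hypothetical input
lazy = re.findall(r"an+?", "bannner")    # hypothetical input

print(eager, reordered, greedy, lazy)
```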
Limiting Repetition: You can limit repetition with curly braces. Quantifier: {2,4} Target string: 1 111111111 11111
Limiting Repetition: The second number can be omitted to mean no upper limit. Essentially {0,} is the same as * and {1,} is the same as +. Quantifier: {2,} Target string: 1 11111111111111
Limiting Repetition: A single number can be used to match an exact number of times. Quantifier: {4} Target string: 1 11 111 1111 11111
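A sketch of the curly-brace quantifiers, assuming they are applied to `\d` (the token in front of the braces is an assumption) and assuming Python's `re` module.

```python
import re

runs = "1 11 111 1111 11111"

# Between two and four digits; the run of five yields "1111" and the
# leftover single "1" is too short to match.
two_to_four = re.findall(r"\d{2,4}", runs)

# Two or more digits, with no upper limit.
two_or_more = re.findall(r"\d{2,}", runs)

# Exactly four digits.
exactly_four = re.findall(r"\d{4}", runs)

# {0,} and {1,} behave like * and + respectively.
star_equivalent = re.findall(r"1{0,}", runs) == re.findall(r"1*", runs)
plus_equivalent = re.findall(r"1{1,}", runs) == re.findall(r"1+", runs)

print(two_to_four, two_or_more, exactly_four)
```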
Back References: Parentheses around part of a pattern group it and capture the text it matched; a back reference such as \1 then matches that same text again. Regex: ([ai]).\1. Target string: The magician said abracadabra!
Named Groups: Named groups let you reference matched groups by their name rather than just index. Regex: (?<vowel>[ai]).\k<vowel>. Target string: The magician said abracadabra!
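Here is a sketch of numbered and named back references, assuming Python's `re` module. Python spells named groups `(?P<name>...)` and named back references `(?P=name)`, which differs from the `(?<name>...)` and `\k<name>` syntax used by .NET, Perl and several other engines.

```python
import re

text = "The magician said abracadabra!"

# \1 must repeat whatever group 1 captured: a vowel, any character,
# the same vowel again, then any character.
numbered = [m.group(0) for m in re.finditer(r"([ai]).\1.", text)]

# The same pattern with a named group (Python syntax).
named_pattern = re.compile(r"(?P<vowel>[ai]).(?P=vowel).")
named = [m.group(0) for m in named_pattern.finditer(text)]
first_vowel = named_pattern.search(text).group("vowel")

print(numbered, named, first_vowel)
```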

Negative Lookahead: Negative lookaheads match something that is not there. Regex: q(?!u) Target string: Qatar is home to quite a lot of Iraqi citizens, but is not a city in Iraq
Positive Lookahead: Positive lookaheads match something that is there without having that group included in the match. Regex: q(?=u) Target string: Qatar is home to quite a lot of Iraqi citizens, but is not a city in Iraq
Positive & Negative Lookbehind: Lookbehinds are just like lookaheads, but working backwards. Regex: (?<=a)q Target string: Qatar is home to quite a lot of Iraqi citizens, but is not a city in Iraq
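The lookaround patterns from the last three slides can be compared with the earlier negated class `q[^u]`. This is a sketch assuming Python's `re` module, which requires lookbehind patterns to have a fixed width.

```python
import re

text = ("Qatar is home to quite a lot of Iraqi citizens, "
        "but is not a city in Iraq")

# A negated class must consume a character, so the final "q" is missed.
negated_class = re.findall(r"q[^u]", text)

# A negative lookahead consumes nothing: it also matches the "q" ending "Iraq".
not_before_u = re.findall(r"q(?!u)", text)

# A positive lookahead checks for "u" without including it in the match.
before_u = re.findall(r"q(?=u)", text)

# Lookbehinds look to the left: "q" preceded (or not preceded) by "a".
after_a = re.findall(r"(?<=a)q", text)
not_after_a = re.findall(r"(?<!a)q", text)

print(negated_class, not_before_u, before_u, after_a, not_after_a)
```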
Resources: Lots of web pages: http://del.icio.us/mattcasto/regex. “Mastering Regular Expressions” by Jeffrey Friedl: http://oreilly.com/catalog/9780596528126/

Editor's Notes

  1. I just wanted to get this famous quote out of the way from the beginning. Like all great quotes, it has been falsely attributed all over the place. See http://regex.info/blog/2006-09-15/247 for a comprehensive investigation into where the quote originated.
  2. If you’ve never seen a regular expression before, it might look like Q*bert’s language to you.
  3. These are all examples of regular expressions. They were all created with the intent of matching email addresses. As you can see, they’re very different from one another.
  4. Regular expressions are often called a “write only language” and aren’t easily understood. To a novice they don’t even look like they have any meaning.
  5. Kleene is pronounced “clay-knee”.
  6. Kenneth Lane Thompson, or just ‘ken’ in hacker circles, is often credited with creating the Unix operating system along with Dennis Ritchie.
  7. grep is a Unix command-line tool that borrowed regular expression pattern matching from the “ed” editor.
  8. Spencer’s regex library was the basis of Perl’s early regular expression support, and Perl’s syntax was later reimplemented as the PCRE library (Perl Compatible Regular Expressions). Most major programming languages these days, including Java, .NET and Ruby, support a Perl-style regular expression syntax.
  9. Without knowing regular expressions you could certainly accomplish this task, but the solution would take much longer to write, be more complex, and be much more likely to contain bugs.
  10. Here’s some C# code that I threw together in a few minutes to parse a text file. There are bugs in the code and requirements that it doesn’t meet. Also, if requirements were to be added, this code would be more difficult to refactor than a regex.
  11. This solution is in Perl. I could have rewritten it in C# for a good comparison with my previous slide, but I decided that the original example solution looks much nicer. By the way, this example is taken from Mastering Regular Expressions by Jeffrey Friedl.
  12. Knowing regular expressions won’t make you a hero, but you may feel like one when you’ve saved lots of time.
  13. The basics of regular expressions start with literal characters. Any character, except for a small list of reserved characters which will be covered next, is considered a literal character. This example has a regex “is” which is two literal characters.
  14. The regex “a” matches any letter “a” in the target string, even if it’s inside a word.
  15. Some regex engines have the option to turn off case sensitivity. Some engines may have this option on by default.
  16. These are all reserved characters, also known as meta-characters.
  17. Special characters can be escaped with a backslash, as demonstrated in the example.
  18. The curly brace characters are reserved, but only when in the context of a repetition modifier, and don’t need to be escaped otherwise. You can escape them without any side effects.
  19. There are other non-printable characters too, especially when you use Unicode. You can also reference ASCII character codes, but I don’t see the need to show an example of this.
  20. The period, or dot, character matches any single character, with one exception: unless the engine is in single-line mode (which is usually off by default), the period will not match a new line. Older JavaScript engines and VBScript don’t have a single-line option, but [\s\S] works as a substitute. The period is the most common meta-character, but that is because it is often misused.
  21. Character classes, also known as character sets, match only one of the characters inside the square brackets.
  22. The hyphen, or dash, character inside a character class indicates a range of characters. The example would match any hexadecimal value. Note that the hyphen can be escaped inside a character class, but doesn’t need to be escaped if it’s at the beginning or end because it’s not a range in that context.
  23. A caret just inside a character class negates the class. Note that the example won’t match the string “Iraq” because there is no character following the q. Also, “Qatar” isn’t matched because the Q is capitalized.
  24. The caret and hyphen characters are only reserved when they’re used in a context that could suggest a negation (for carets) or a range (for hyphens).
  25. These can be used inside and outside of character classes. None of them are strictly necessary, but they are shortcuts that make it easier to write readable regular expressions. HA! Note that in the example the match includes the space before the numbers, but I couldn’t easily represent that with coloring.
  26. These are the negated versions of the shorthand character classes from the previous slide.
  27. Like the period, the asterisk is overused and dangerous.
  28. Like the period, the asterisk is overused and dangerous.
  29. The question mark can also modify a repetition symbol to make it non-greedy or lazy, but that’s a more advanced subject. It’s all about context.
  30. Anchors don’t match a character; they match a position, which could be before, after or between characters. In this example, the string only matches because there is no whitespace before the word “vacation”.
  31. Note that anchors can result in a zero length match. For example, a regex of just “$” matches the position at the end of a line, but the match has no characters. This can be useful or cause issues in your code!
  32. Note that another exception is that a new line at the end of a file will not be matched by $ or \Z. The \z anchor will handle this circumstance.
  33. Note that the words “at”, “that” and “ate” don’t contain a match in the example.
  34. Regular expressions are eager, which means they match as soon as they can. This means that the order of similar alternatives in an alternation can affect the result. In the example, the “and” alternative is tried first, so only the beginning of “android” is matched even though the full word exists. If the alternatives were switched, the standalone word “and” would still match, but the whole word “android” would match instead of just its beginning.
  35. Greediness causes the example to always match the full “android” instead of just “and” which is also a valid match.
  36. A question mark after a repetition operator (*, + or ?) makes it lazy.
  37. Using parentheses, or round brackets, to group a character set not only groups those characters together so that repetition can be applied to them, but also creates a back reference. A back reference stores the grouped part of the match for use later in the regex. Back references can also be used in replacements. In the example, “agici” doesn’t match because the group is “a” not “i”.
  38. The group name can be delimited with single quotes instead of less-than/greater-than characters.
  39. This example matches a “q” not followed by a “u”. Unlike the example earlier in the presentation, this regex won’t match the character following the “q”.
  40. This example matches a “q” followed by a “u”, but notice that the “u” is not included with that match.
  41. This example matches a “q” followed by a “u”, but notice that the “u” is not included with that match.
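
The notes on back references, named groups and lookarounds can be checked with a small Python sketch. It is an illustration under stated assumptions rather than the presenter's code: Python's `re` module writes named groups as `(?P<vowel>...)` and refers back to them with `(?P=vowel)`, which differs from the `(?<vowel>...)` form shown on the slide, and the `q[^u]` pattern is included here only as a contrast with the negative lookahead.

```python
import re

magic = "The magician said abracadabra!"
qatar = ("Qatar is home to quite a lot of Iraqi citizens, "
         "but is not a city in Iraq")

# Back reference: \1 must repeat whatever the first group captured.
numbered = [m.group(0) for m in re.finditer(r"([ai]).\1.", magic)]

# Named group, written in Python's (?P<name>...) / (?P=name) syntax.
named = [m.group(0)
         for m in re.finditer(r"(?P<vowel>[ai]).(?P=vowel).", magic)]

# Negative lookahead: a "q" not followed by "u". It also succeeds at the
# very end of the string, where nothing follows the "q".
not_before_u = re.findall(r"q(?!u)", qatar)

# A negated character class must consume a character, so the final "q"
# in "Iraq" is missed and the match includes the following letter.
negated_class = re.findall(r"q[^u]", qatar)

# Positive lookahead: "q" followed by "u", with "u" left out of the match.
before_u = re.findall(r"q(?=u)", qatar)

# Positive lookbehind: "q" preceded by "a", with "a" left out of the match.
after_a = re.findall(r"(?<=a)q", qatar)

print(numbered)       # ['icia', 'acad']
print(named)          # ['icia', 'acad']
print(not_before_u)   # ['q', 'q']
print(negated_class)  # ['qi']
print(before_u)       # ['q']
print(after_a)        # ['q', 'q']
```

Lookarounds only test a position, which is why every lookaround match above is the single character “q”.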