Regular Expressions
Powerful string validation and extraction
Ignaz Wanders – Architect @ Archimiddle
@ignazw
Topics
• What are regular expressions?
• Patterns
• Character classes
• Quantifiers
• Capturing groups
• Boundaries
• Internationalization
• Regular expressions in Java
• Quiz
• References
What are regular expressions?
• A regex is a string pattern used to search and manipulate text
• A regex has special syntax
• Very powerful for any type of String manipulation ranging from simple to very
complex structures:
– Input validation
– S(ubs)tring replacement
– ...
• Example:
• [A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z0-9._%-]{2,4}
History
• Originates from automata and formal-language theories of computer science
• Stephen Kleene, 1950s: Kleene algebra
• Kenneth Thompson, 1969: Unix: qed, ed
• 70’s - 90’s: unix: grep, awk, sed, emacs
• Programming languages:
– C, Perl
– JavaScript, Java
Patterns
• Regex is based on pattern matching: Strings are searched for certain patterns
• Simplest regex is a string-literal pattern
• Metacharacters: ([{\^$|)?*+.
– Period means “any character”
– To search for a period as a string literal, escape it with a backslash: “\.”
REGEX: fox
TEXT: The quick brown fox
RESULT: fox
REGEX: fo.
TEXT: The quick brown fox
RESULT: fox
REGEX: .o.
TEXT: The quick brown fox
RESULT: row, fox
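A minimal Java sketch of the searches above, using the java.util.regex API that this deck covers later. The class name PatternBasics and the helper findAll are illustrative names, not part of the original slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternBasics {
    // Collects every substring of text that the regex finds, left to right.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox";
        System.out.println(findAll("fox", text));   // [fox]
        System.out.println(findAll(".o.", text));   // [row, fox]
        // An escaped period only matches a literal '.', which the text lacks.
        System.out.println(findAll("fo\\.", text)); // []
    }
}
```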
Character classes (1/3)
• Syntax: any characters between [ and ]
• A character class matches exactly one character
• Negation: ^
REGEX: [rcb]at
TEXT: bat
RESULT: bat
REGEX: [rcb]at
TEXT: rat
RESULT: rat
REGEX: [rcb]at
TEXT: cat
RESULT: cat
REGEX: [rcb]at
TEXT: hat
RESULT: -
REGEX: [^rcb]at
TEXT: rat
RESULT: -
REGEX: [^rcb]at
TEXT: hat
RESULT: hat
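The same checks can be sketched in Java. SimpleClasses and matchesWhole are hypothetical names chosen for this illustration; Pattern.matches requires the whole text to match.

```java
import java.util.regex.Pattern;

class SimpleClasses {
    // True when the entire text is matched by the regex.
    static boolean matchesWhole(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        for (String word : new String[] {"bat", "rat", "cat", "hat"}) {
            System.out.println(word
                    + ": [rcb]at=" + matchesWhole("[rcb]at", word)
                    + ", [^rcb]at=" + matchesWhole("[^rcb]at", word));
        }
    }
}
```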
Character classes (2/3)
• Ranges: [a-z], [0-9], [i-n], [a-zA-Z]...
• Unions: [0-4[6-8]], [a-p[r-w]], ...
• Intersections: [a-f&&[efg]], [a-f&&[e-k]], ...
• Subtractions: [a-f&&[^efg]], ...
REGEX: [rcb]at[1-5]
TEXT: bat4 RESULT: bat4
REGEX: [rcb]at[1-5[7-8]]
TEXT: hat7 RESULT: -
REGEX: [rcb]at[1-7&&[78]]
TEXT: rat7 RESULT: rat7
REGEX: [rcb]at[1-5&&[^34]]
TEXT: bat4 RESULT: -
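A short Java sketch of unions, intersections and subtractions, assuming the nested character-class syntax of java.util.regex. The class and method names are illustrative only.

```java
import java.util.regex.Pattern;

class NestedClasses {
    // True when the entire text is matched by the regex.
    static boolean accepts(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // Union: digits 1-5 or 7-8.
        System.out.println(accepts("[rcb]at[1-5[7-8]]", "bat7"));   // true
        System.out.println(accepts("[rcb]at[1-5[7-8]]", "bat6"));   // false
        // Intersection: only 7 is in both 1-7 and {7, 8}.
        System.out.println(accepts("[rcb]at[1-7&&[78]]", "rat7"));  // true
        System.out.println(accepts("[rcb]at[1-7&&[78]]", "rat8"));  // false
        // Subtraction: 1-5 without 3 and 4.
        System.out.println(accepts("[rcb]at[1-5&&[^34]]", "bat4")); // false
        System.out.println(accepts("[rcb]at[1-5&&[^34]]", "bat5")); // true
    }
}
```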
Character classes (3/3)
Predefined class   Meaning                          Equivalent
.                  any character
\d                 any digit                        [0-9]
\D                 any non-digit                    [^0-9], [^\d]
\s                 any white-space character        [ \t\n\x0B\f\r]
\S                 any non-white-space character    [^\s]
\w                 any word character               [a-zA-Z_0-9]
\W                 any non-word character           [^\w]
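In Java source code each backslash of a predefined class must be doubled inside a string literal. A small sketch, with hypothetical helper names:

```java
class PredefinedClasses {
    // "\\d+" in Java source is the regex \d+ (one or more digits).
    static boolean isDigits(String s) {
        return s.matches("\\d+");
    }

    // Replaces each run of white-space characters with a single space.
    static String collapseWhitespace(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // True when s consists only of word characters [a-zA-Z_0-9].
    static boolean isWord(String s) {
        return s.matches("\\w+");
    }

    public static void main(String[] args) {
        System.out.println(isDigits("2024"));                // true
        System.out.println(collapseWhitespace("a \t\n b"));  // a b
        System.out.println(isWord("foo-1"));                 // false
    }
}
```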
Quantifiers (1/5)
• Quantifiers allow character classes to match more than one character at a time.
Quantifiers for character classes X
X? zero or one time
X* zero or more times
X+ one or more times
X{n} exactly n times
X{n,} at least n times
X{n,m} at least n and at most m times
Quantifiers (2/5)
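• Examples of X?, X*, X+
• For non-empty text, a? and a* also match the empty string at the end of the text; that extra zero-length match is omitted from the results below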
REGEX: “a?”
TEXT: “”
RESULT: “”
REGEX: “a*”
TEXT: “”
RESULT: “”
REGEX: “a+”
TEXT: “”
RESULT: -
REGEX: “a?”
TEXT: “a”
RESULT: “a”
REGEX: “a*”
TEXT: “a”
RESULT: “a”
REGEX: “a+”
TEXT: “a”
RESULT: “a”
REGEX: “a?”
TEXT: “aaa”
RESULT: “a”, “a”, “a”
REGEX: “a*”
TEXT: “aaa”
RESULT: “aaa”
REGEX: “a+”
TEXT: “aaa”
RESULT: “aaa”
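The examples above can be reproduced with a find() loop. This is a sketch with illustrative names; each hit is reported as match@startIndex so that zero-length matches stay visible. Note that java.util.regex also reports a zero-length match at the end of the text for a? and a*.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Lists every match as "<matched text>@<start index>".
    static List<String> spans(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group() + "@" + m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(spans("a?", "aaa")); // [a@0, a@1, a@2, @3]
        System.out.println(spans("a*", "aaa")); // [aaa@0, @3]
        System.out.println(spans("a+", "aaa")); // [aaa@0]
        System.out.println(spans("a?", ""));    // [@0]
        System.out.println(spans("a+", ""));    // []
    }
}
```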
Quantifiers (3/5)
REGEX: “[abc]{3}”
TEXT: “abccabaaaccbbbc”
RESULT: “abc”, “cab”, “aaa”, “ccb”, “bbc”
REGEX: “abc{3}”
TEXT: “abccabaaaccbbbc”
RESULT: -
REGEX: “(dog){3}”
TEXT: “dogdogdogdogdogdog”
RESULT: “dogdogdog”, “dogdogdog”
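The three regexes differ only in what the quantifier applies to: a character class, a single character, or a group. A small sketch using String.matches (whole-string matching); the class name is illustrative.

```java
class QuantifierScope {
    // True when the entire text is matched by the regex.
    static boolean whole(String regex, String text) {
        return text.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(whole("[abc]{3}", "cab"));       // true: any three of a, b, c
        System.out.println(whole("abc{3}", "abccc"));       // true: {3} applies to 'c' only
        System.out.println(whole("abc{3}", "abcabcabc"));   // false
        System.out.println(whole("(abc){3}", "abcabcabc")); // true: {3} applies to the group
    }
}
```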
Quantifiers (4/5)
• Greedy quantifiers:
– read complete string
– work backwards until match found
– syntax: X?, X*, X+, ...
• Reluctant quantifiers:
– read one character at a time
– work forward until match found
– syntax: X??, X*?, X+?, ...
• Possessive quantifiers:
– read complete string
– try match only once
– syntax: X?+, X*+, X++, ...
Quantifiers (5/5)
Greedy:
REGEX: “.*foo”
TEXT: “xfooxxxxxxfoo”
RESULT: “xfooxxxxxxfoo”
Reluctant:
REGEX: “.*?foo”
TEXT: “xfooxxxxxxfoo”
RESULT: “xfoo”, “xxxxxxfoo”
Possessive:
REGEX: “.*+foo”
TEXT: “xfooxxxxxxfoo”
RESULT: -
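A sketch comparing the three quantifier modes on the same text. QuantifierModes and findAll are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierModes {
    // Collects every match the regex finds in the text.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "xfooxxxxxxfoo";
        System.out.println(findAll(".*foo", text));  // greedy:     [xfooxxxxxxfoo]
        System.out.println(findAll(".*?foo", text)); // reluctant:  [xfoo, xxxxxxfoo]
        // Possessive .*+ consumes everything and never gives characters back,
        // so the trailing "foo" can never match.
        System.out.println(findAll(".*+foo", text)); // possessive: []
    }
}
```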
Capturing groups (1/2)
• Capturing groups treat multiple characters as a single unit
• Syntax: between parentheses ( and )
• Example: (dog){3}
• Numbering from left to right
– Example: ((A)(B(C)))
• Group 1: ((A)(B(C)))
• Group 2: (A)
• Group 3: (B(C))
• Group 4: (C)
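Group numbering can be inspected with Matcher.groupCount() and Matcher.group(int). A minimal sketch with illustrative names; group 0 always denotes the entire match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // Returns groups 0..groupCount() if the whole text matches, else an empty list.
    static List<String> groups(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        if (m.matches()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Groups are numbered by their opening parenthesis, left to right.
        System.out.println(groups("((A)(B(C)))", "ABC")); // [ABC, ABC, A, BC, C]
    }
}
```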
Capturing groups (2/2)
• Backreferences to capturing groups are denoted by \i, where i is the number of the group
REGEX: “(\d\d)\1”
TEXT: “1212”
RESULT: “1212”
REGEX: “(\d\d)\1”
TEXT: “1234”
RESULT: -
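A sketch of the backreference example in Java. Inside the regex the reference is written \1 (doubled to "\\1" in a string literal); in a replacement string the captured group is written $1. The method names are illustrative.

```java
class Backreferences {
    // True when the text is a two-digit pair followed by the same pair.
    static boolean isDoubledPair(String text) {
        return text.matches("(\\d\\d)\\1");
    }

    // Replaces each doubled pair such as "1212" with a single pair "12".
    static String collapsePairs(String text) {
        return text.replaceAll("(\\d\\d)\\1", "$1");
    }

    public static void main(String[] args) {
        System.out.println(isDoubledPair("1212"));          // true
        System.out.println(isDoubledPair("1234"));          // false
        System.out.println(collapsePairs("1212 and 7777")); // 12 and 77
    }
}
```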
Boundaries (1/2)
Boundary characters
^     beginning of line
$     end of line
\b    a word boundary
\B    a non-word boundary
\A    beginning of input
\G    end of previous match
\z    end of input
\Z    end of input, but before final terminator, if any
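A small sketch of word boundaries and of the difference between \Z and \z. The helper names are illustrative; Pattern.quote is used so that the searched word is taken literally.

```java
import java.util.regex.Pattern;

class Boundaries {
    // True when word occurs in text as a whole word.
    static boolean containsWord(String word, String text) {
        return Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text).find();
    }

    // True when the regex is found anywhere in the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("fox", "The quick brown fox")); // true
        System.out.println(containsWord("fox", "foxes run"));           // false
        // \Z tolerates one final line terminator, \z does not.
        System.out.println(found("fox\\Z", "fox\n")); // true
        System.out.println(found("fox\\z", "fox\n")); // false
    }
}
```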
Boundaries (2/2)
• Be aware:
• End-of-line marker is $
– Unix EOL is \n
– Windows EOL is \r\n
– JDK uses any of the following as EOL:
• '\n', '\r\n', '\r', '\u0085', '\u2028', '\u2029'
• Always test your regular expressions on the target OS
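By default java.util.regex lets ^ and $ match only at the beginning and end of the whole input ($ also before a final line terminator); the Pattern.MULTILINE flag makes them match at every line. A sketch with a hypothetical count helper and mixed Windows and Unix line endings:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineEnds {
    // Counts how often the regex, compiled with the given flags, is found in text.
    static int count(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "red fox\r\nblue fox\nfox cub";
        // Without MULTILINE, $ only matches at the end of the input ("cub").
        System.out.println(count("fox$", 0, text));                 // 0
        // With MULTILINE, $ matches before both \r\n and \n.
        System.out.println(count("fox$", Pattern.MULTILINE, text)); // 2
    }
}
```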
Internationalization (1/2)
• Regular expressions were originally designed for the ASCII (Basic Latin) set of characters.
– Thus “België” is not matched by ^\w+$
• Extension to Unicode character sets is denoted by \p{...}
• Character set: [\p{InCharacterSet}]
– Create character classes from symbols in character sets.
– “België” is matched by ^[\w[\p{InLatin-1Supplement}]]+$
Internationalization (2/2)
• Note that there are non-letters in character sets as well:
– Latin-1 Supplement: ¡¢£¤¥¦§¨©ª«-®¯°±²³´µ·¸¹º»¼½¾¿÷
• Categories:
– Letters: \p{L}
– Uppercase letters: \p{Lu}
– “België” is matched by ^\p{L}+$
• Other (POSIX) categories:
– Unicode currency symbols: \p{Sc}
– ASCII punctuation characters: \p{Punct}
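A sketch of Unicode-aware matching in Java. The helper names are illustrative. The Pattern.UNICODE_CHARACTER_CLASS flag is not mentioned on the slides; it is available from Java 7 onward and makes \w Unicode-aware. The word “België” is written with a Unicode escape so the source stays ASCII.

```java
import java.util.regex.Pattern;

class UnicodeClasses {
    // True when s consists only of Unicode letters.
    static boolean isLetters(String s) {
        return s.matches("\\p{L}+");
    }

    // Default \w: ASCII word characters only.
    static boolean isAsciiWord(String s) {
        return s.matches("\\w+");
    }

    // \w with Unicode semantics (assumes Java 7 or later).
    static boolean isUnicodeWord(String s) {
        return Pattern.compile("\\w+", Pattern.UNICODE_CHARACTER_CLASS).matcher(s).matches();
    }

    public static void main(String[] args) {
        String belgie = "Belgi\u00eb"; // ends in e with diaeresis
        System.out.println(isAsciiWord(belgie));   // false
        System.out.println(isLetters(belgie));     // true
        System.out.println(isUnicodeWord(belgie)); // true
    }
}
```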
Regular expressions in Java
• Since JDK 1.4
• Package java.util.regex
– Pattern class
– Matcher class
• Convenience methods in java.lang.String
• Alternative for JDK 1.3
– Jakarta ORO project
java.util.regex.Pattern
• Wrapper class for regular expressions
• Useful methods:
– compile(String regex): Pattern
– matches(String regex, CharSequence text): boolean
– split(CharSequence text): String[]
String regex = "(\\d\\d)\\1"; // backslashes are doubled in Java string literals
Pattern p = Pattern.compile(regex);
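A short sketch of the three Pattern methods listed above. The comma-separated input and the class name are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternUsage {
    // A compiled Pattern can be reused for many inputs.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Splits on commas, ignoring white space around them.
    static String[] fields(String line) {
        return COMMA.split(line);
    }

    // One-off check without keeping the compiled Pattern.
    static boolean isDoubledPair(String text) {
        return Pattern.matches("(\\d\\d)\\1", text);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields("a , b,c"))); // [a, b, c]
        System.out.println(isDoubledPair("1212"));              // true
    }
}
```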
java.util.regex.Matcher
• Useful methods:
– matches(): boolean
– find(): boolean
– find(int start): boolean
– group(): String
– replaceFirst(String replace): String
– replaceAll(String replace): String
String regex = "(\\d\\d)\\1"; // backslashes are doubled in Java string literals
Pattern p = Pattern.compile(regex);
String text = "1212";
Matcher m = p.matcher(text);
boolean matches = m.matches(); // true
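Where matches() tests the whole text, find() walks through it match by match. A sketch with illustrative names and a made-up input string:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherUsage {
    private static final Pattern DOUBLED = Pattern.compile("(\\d\\d)\\1");

    // Lists each match as "<matched text>@<start index>".
    static List<String> describe(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = DOUBLED.matcher(text);
        while (m.find()) {
            out.add(m.group() + "@" + m.start());
        }
        return out;
    }

    // Replaces every doubled pair with the pair captured in group 1.
    static String collapse(String text) {
        return DOUBLED.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String text = "1212 3434 1234";
        System.out.println(describe(text)); // [1212@0, 3434@5]
        System.out.println(collapse(text)); // 12 34 1234
    }
}
```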
java.lang.String
• Pattern and Matcher methods in String:
– matches(String regex): boolean
– split(String regex): String[]
– replaceFirst(String regex, String replace): String
– replaceAll(String regex, String replace): String
Examples
• Validation
• Searching text
• Filtering
• Parsing
• Removing duplicate lines
• On-the-fly editing
Examples: validation
• Validate an e-mail address:
[A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z0-9._%-]{2,4}
• Validate a URL:
(http|https|ftp)://([a-zA-Z0-9](\w+\.)+\w{2,7}|local\w*)(:\d+)?(/(\w+[\w/.-]*)?)?
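A sketch of e-mail validation with the regex above, assuming the period before the top-level domain is escaped. Because the regex lists upper-case letters only, it is compiled with Pattern.CASE_INSENSITIVE. The addresses are made-up examples, and the regex is a simplification that does not cover every valid address.

```java
import java.util.regex.Pattern;

class EmailValidation {
    private static final Pattern EMAIL = Pattern.compile(
            "[A-Z0-9._%-]+@[A-Z0-9._%-]+\\.[A-Z0-9._%-]{2,4}",
            Pattern.CASE_INSENSITIVE);

    // True when the whole input has the shape local-part@domain.tld.
    static boolean isEmail(String input) {
        return EMAIL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isEmail("jane.doe@example.com")); // true
        System.out.println(isEmail("jane.doe@example"));     // false: no top-level domain
        System.out.println(isEmail("no-at-sign.com"));       // false: no @
    }
}
```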
Examples: searching text
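• Write an HttpUnit test that submits an HTML form and checks whether the HTTP response is a
confirmation screen containing a generated form number of the form 9xxxxxx-xxxxxx:
9[0-9]{6}-[0-9]{6}
String regexp = "9[0-9]{6}-[0-9]{6}";
Pattern p = Pattern.compile(regexp);
Matcher m = p.matcher(text); // text holds the HTTP response body
boolean ok = m.find();
String nr = ok ? m.group() : null; // group() throws IllegalStateException if nothing was found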
Examples: filtering
• Filter e-mails whose subject is in capitals only, optionally preceded by “Re:”
(R[eE]:)*[^a-z]*$
Examples: parsing
• Match an opening XML tag and its corresponding closing tag:
– Note the use of the back reference
<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>
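A sketch of the tag regex in Java, assuming the closing tag is written with the backreference \1 and that tag names are matched case-insensitively. The sample markup and names are illustrative. This is not a substitute for a real XML parser: nested tags with the same name are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagParsing {
    private static final Pattern ELEMENT = Pattern.compile(
            "<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\\1>",
            Pattern.CASE_INSENSITIVE);

    // Returns "tag=content" for every element found in the text.
    static List<String> elements(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = ELEMENT.matcher(text);
        while (m.find()) {
            out.add(m.group(1) + "=" + m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(elements("<b class=\"x\">bold</b> and <i>it</i>")); // [b=bold, i=it]
        // The closing tag must repeat the opening tag name.
        System.out.println(elements("<b>text</i>")); // []
    }
}
```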
Examples: duplicate lines
• Suppose you want to remove duplicate lines from a text.
– Requirement: the lines are sorted alphabetically, so that duplicates are adjacent
^(.*)(\r?\n\1)+$
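A sketch of duplicate-line removal in Java, assuming the regex is compiled with Pattern.MULTILINE so that ^ and $ match at every line, and that each run of identical lines is replaced by the first captured group ($1). The names are illustrative.

```java
import java.util.regex.Pattern;

class DuplicateLines {
    private static final Pattern REPEATED = Pattern.compile(
            "^(.*)(\\r?\\n\\1)+$",
            Pattern.MULTILINE);

    // Collapses each run of identical adjacent lines into one line.
    static String dedupe(String sortedText) {
        return REPEATED.matcher(sortedText).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(dedupe("a\na\nb\nc\nc\nc")); // prints a, b, c on separate lines
    }
}
```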
Examples: on-the-fly editing
• Suppose you want to edit a file in batch: all occurrences of a certain string pattern
should be replaced with another string.
• In unix: use the sed command with a regex
• In Java: use string.replaceAll(regex, "mystring")
• In Ant: use replaceregexp optional task to, e.g., edit deployment descriptors
depending on environment
Quiz
• What are the following regular expressions looking for?
\d+                           at least one digit
[-+]?\d+                      any integer
((\d*\.?)?\d+|\d+(\.?\d*))    any positive decimal
[\p{L}']['\-.\p{L} ]+         a place name
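The quiz answers can be checked with a few sample inputs. This is a sketch: the backslash positions in the regexes are a reconstruction (in particular the escaped hyphen in the place-name regex is an assumption), and the sample inputs and names are illustrative.

```java
class QuizAnswers {
    static final String DIGITS = "\\d+";
    static final String INTEGER = "[-+]?\\d+";
    static final String DECIMAL = "((\\d*\\.?)?\\d+|\\d+(\\.?\\d*))";
    static final String PLACE = "[\\p{L}']['\\-.\\p{L} ]+";

    // True when the entire text is matched by the regex.
    static boolean is(String regex, String text) {
        return text.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(is(DIGITS, "42"));               // true
        System.out.println(is(INTEGER, "-42"));             // true
        System.out.println(is(DECIMAL, "3.14"));            // true
        System.out.println(is(DECIMAL, ".5"));              // true
        System.out.println(is(DECIMAL, "-1"));              // false: no sign allowed
        System.out.println(is(PLACE, "'s-Hertogenbosch"));  // true
    }
}
```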
Conclusion
• When doing one of the following:
– validating strings
– on-the-fly editing of strings
– searching strings
– filtering strings
• think regex!
References
• http://www.regular-expressions.info/
• http://www.regexlib.com/
• http://developer.java.sun.com/developer/technicalArticles/releases/1.4regex/
• http://java.sun.com/docs/books/tutorial/extra/regex/
• http://www.wellho.net/regex/javare.html
• >JDK 1.4 API
• Mastering Regular Expressions

More Related Content

What's hot

Regular Expressions 101 Introduction to Regular Expressions
Regular Expressions 101 Introduction to Regular ExpressionsRegular Expressions 101 Introduction to Regular Expressions
Regular Expressions 101 Introduction to Regular ExpressionsDanny Bryant
 
Introduction to regular expressions
Introduction to regular expressionsIntroduction to regular expressions
Introduction to regular expressionsBen Brumfield
 
Regular expressions tutorial for SEO & Website Analysis
Regular expressions tutorial for SEO & Website AnalysisRegular expressions tutorial for SEO & Website Analysis
Regular expressions tutorial for SEO & Website AnalysisGlobal Media Insight
 
Regular expression
Regular expressionRegular expression
Regular expressionLarry Nung
 
1 - Introduction to Compilers.ppt
1 - Introduction to Compilers.ppt1 - Introduction to Compilers.ppt
1 - Introduction to Compilers.pptRakesh Kumar
 
Regular expressions
Regular expressionsRegular expressions
Regular expressionsEran Zimbler
 
Class 5 - PHP Strings
Class 5 - PHP StringsClass 5 - PHP Strings
Class 5 - PHP StringsAhmed Swilam
 
Regular expressions-Theory of computation
Regular expressions-Theory of computationRegular expressions-Theory of computation
Regular expressions-Theory of computationBipul Roy Bpl
 
JavaScript: Variables and Functions
JavaScript: Variables and FunctionsJavaScript: Variables and Functions
JavaScript: Variables and FunctionsJussi Pohjolainen
 
SQL Joins With Examples | Edureka
SQL Joins With Examples | EdurekaSQL Joins With Examples | Edureka
SQL Joins With Examples | EdurekaEdureka!
 
Solving linear homogeneous recurrence relations
Solving linear homogeneous recurrence relationsSolving linear homogeneous recurrence relations
Solving linear homogeneous recurrence relationsDr. Maamoun Ahmed
 
SQL Tutorial - Basic Commands
SQL Tutorial - Basic CommandsSQL Tutorial - Basic Commands
SQL Tutorial - Basic Commands1keydata
 
LINUX:Control statements in shell programming
LINUX:Control statements in shell programmingLINUX:Control statements in shell programming
LINUX:Control statements in shell programmingbhatvijetha
 
Advanced regular expressions
Advanced regular expressionsAdvanced regular expressions
Advanced regular expressionsNeha Jain
 

What's hot (20)

Regular Expressions 101 Introduction to Regular Expressions
Regular Expressions 101 Introduction to Regular ExpressionsRegular Expressions 101 Introduction to Regular Expressions
Regular Expressions 101 Introduction to Regular Expressions
 
Introduction to regular expressions
Introduction to regular expressionsIntroduction to regular expressions
Introduction to regular expressions
 
Regular expressions tutorial for SEO & Website Analysis
Regular expressions tutorial for SEO & Website AnalysisRegular expressions tutorial for SEO & Website Analysis
Regular expressions tutorial for SEO & Website Analysis
 
Regular Expressions
Regular ExpressionsRegular Expressions
Regular Expressions
 
Testing In Django
Testing In DjangoTesting In Django
Testing In Django
 
Regular expression
Regular expressionRegular expression
Regular expression
 
Grammar
GrammarGrammar
Grammar
 
1 - Introduction to Compilers.ppt
1 - Introduction to Compilers.ppt1 - Introduction to Compilers.ppt
1 - Introduction to Compilers.ppt
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
Training basic latex
Training basic latexTraining basic latex
Training basic latex
 
Class 5 - PHP Strings
Class 5 - PHP StringsClass 5 - PHP Strings
Class 5 - PHP Strings
 
Regular expressions-Theory of computation
Regular expressions-Theory of computationRegular expressions-Theory of computation
Regular expressions-Theory of computation
 
Software testing
Software testingSoftware testing
Software testing
 
Regular Expressions
Regular ExpressionsRegular Expressions
Regular Expressions
 
JavaScript: Variables and Functions
JavaScript: Variables and FunctionsJavaScript: Variables and Functions
JavaScript: Variables and Functions
 
SQL Joins With Examples | Edureka
SQL Joins With Examples | EdurekaSQL Joins With Examples | Edureka
SQL Joins With Examples | Edureka
 
Solving linear homogeneous recurrence relations
Solving linear homogeneous recurrence relationsSolving linear homogeneous recurrence relations
Solving linear homogeneous recurrence relations
 
SQL Tutorial - Basic Commands
SQL Tutorial - Basic CommandsSQL Tutorial - Basic Commands
SQL Tutorial - Basic Commands
 
LINUX:Control statements in shell programming
LINUX:Control statements in shell programmingLINUX:Control statements in shell programming
LINUX:Control statements in shell programming
 
Advanced regular expressions
Advanced regular expressionsAdvanced regular expressions
Advanced regular expressions
 

Viewers also liked

Lecture: Regular Expressions and Regular Languages
Lecture: Regular Expressions and Regular LanguagesLecture: Regular Expressions and Regular Languages
Lecture: Regular Expressions and Regular LanguagesMarina Santini
 
Introduction to Regular Expressions
Introduction to Regular ExpressionsIntroduction to Regular Expressions
Introduction to Regular ExpressionsMatt Casto
 
Learn PHP Lacture1
Learn PHP Lacture1Learn PHP Lacture1
Learn PHP Lacture1ADARSH BHATT
 
Regular Expressions: JavaScript And Beyond
Regular Expressions: JavaScript And BeyondRegular Expressions: JavaScript And Beyond
Regular Expressions: JavaScript And BeyondMax Shirshin
 
Bitcoin: the future money, or a scam?
Bitcoin: the future money, or a scam?Bitcoin: the future money, or a scam?
Bitcoin: the future money, or a scam?Ignaz Wanders
 
The Service doing "Ping"
The Service doing "Ping"The Service doing "Ping"
The Service doing "Ping"Ignaz Wanders
 
Web Service Versioning
Web Service VersioningWeb Service Versioning
Web Service VersioningIgnaz Wanders
 
Lecture 03 lexical analysis
Lecture 03 lexical analysisLecture 03 lexical analysis
Lecture 03 lexical analysisIffat Anjum
 
Finite Automata
Finite AutomataFinite Automata
Finite AutomataShiraz316
 
Regular expression with DFA
Regular expression with DFARegular expression with DFA
Regular expression with DFAMaulik Togadiya
 
Field Extractions: Making Regex Your Buddy
Field Extractions: Making Regex Your BuddyField Extractions: Making Regex Your Buddy
Field Extractions: Making Regex Your BuddyMichael Wilde
 

Viewers also liked (17)

Lecture: Regular Expressions and Regular Languages
Lecture: Regular Expressions and Regular LanguagesLecture: Regular Expressions and Regular Languages
Lecture: Regular Expressions and Regular Languages
 
Introduction to Regular Expressions
Introduction to Regular ExpressionsIntroduction to Regular Expressions
Introduction to Regular Expressions
 
Regular expression (compiler)
Regular expression (compiler)Regular expression (compiler)
Regular expression (compiler)
 
Learn PHP Lacture1
Learn PHP Lacture1Learn PHP Lacture1
Learn PHP Lacture1
 
Regular Expressions: JavaScript And Beyond
Regular Expressions: JavaScript And BeyondRegular Expressions: JavaScript And Beyond
Regular Expressions: JavaScript And Beyond
 
Bitcoin: the future money, or a scam?
Bitcoin: the future money, or a scam?Bitcoin: the future money, or a scam?
Bitcoin: the future money, or a scam?
 
The Service doing "Ping"
The Service doing "Ping"The Service doing "Ping"
The Service doing "Ping"
 
Reflexive Access List
Reflexive Access ListReflexive Access List
Reflexive Access List
 
Regular Expressions
Regular ExpressionsRegular Expressions
Regular Expressions
 
Tests
TestsTests
Tests
 
Regular expression examples
Regular expression examplesRegular expression examples
Regular expression examples
 
Lecture2 B
Lecture2 BLecture2 B
Lecture2 B
 
Web Service Versioning
Web Service VersioningWeb Service Versioning
Web Service Versioning
 
Lecture 03 lexical analysis
Lecture 03 lexical analysisLecture 03 lexical analysis
Lecture 03 lexical analysis
 
Finite Automata
Finite AutomataFinite Automata
Finite Automata
 
Regular expression with DFA
Regular expression with DFARegular expression with DFA
Regular expression with DFA
 
Field Extractions: Making Regex Your Buddy
Field Extractions: Making Regex Your BuddyField Extractions: Making Regex Your Buddy
Field Extractions: Making Regex Your Buddy
 

Similar to Regular expressions

Regular Expressions: QA Challenge Accepted Conf (March 2015)
Regular Expressions: QA Challenge Accepted Conf (March 2015)Regular Expressions: QA Challenge Accepted Conf (March 2015)
Regular Expressions: QA Challenge Accepted Conf (March 2015)Svetlin Nakov
 
Regular Expressions and You
Regular Expressions and YouRegular Expressions and You
Regular Expressions and YouJames Armes
 
Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013Ben Brumfield
 
Bioinformatics p2-p3-perl-regexes v2013-wim_vancriekinge
Bioinformatics p2-p3-perl-regexes v2013-wim_vancriekingeBioinformatics p2-p3-perl-regexes v2013-wim_vancriekinge
Bioinformatics p2-p3-perl-regexes v2013-wim_vancriekingeProf. Wim Van Criekinge
 
Course 102: Lecture 13: Regular Expressions
Course 102: Lecture 13: Regular Expressions Course 102: Lecture 13: Regular Expressions
Course 102: Lecture 13: Regular Expressions Ahmed El-Arabawy
 
Regular Expressions grep and egrep
Regular Expressions grep and egrepRegular Expressions grep and egrep
Regular Expressions grep and egrepTri Truong
 
Regular expressions-ada-2018
Regular expressions-ada-2018Regular expressions-ada-2018
Regular expressions-ada-2018Emma Burrows
 
FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdf
FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdfFUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdf
FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdfBryan Alejos
 
Regular Expressions Boot Camp
Regular Expressions Boot CampRegular Expressions Boot Camp
Regular Expressions Boot CampChris Schiffhauer
 
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in Practice
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in PracticeWeek-2: Theory & Practice of Data Cleaning: Regular Expressions in Practice
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in PracticeBertram Ludäscher
 
Js reg正则表达式
Js reg正则表达式Js reg正则表达式
Js reg正则表达式keke302
 
Introduction to Regular Expressions
Introduction to Regular ExpressionsIntroduction to Regular Expressions
Introduction to Regular ExpressionsJesse Anderson
 

Similar to Regular expressions (20)

Regular expression for everyone
Regular expression for everyoneRegular expression for everyone
Regular expression for everyone
 
Regular Expressions: QA Challenge Accepted Conf (March 2015)
Regular Expressions: QA Challenge Accepted Conf (March 2015)Regular Expressions: QA Challenge Accepted Conf (March 2015)
Regular Expressions: QA Challenge Accepted Conf (March 2015)
 
Regular Expressions and You
Regular Expressions and YouRegular Expressions and You
Regular Expressions and You
 
Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013
 
Regular expressions
Regular expressionsRegular expressions
Regular expressions
 
Bioinformatica p2-p3-introduction
Bioinformatica p2-p3-introductionBioinformatica p2-p3-introduction
Bioinformatica p2-p3-introduction
 
Bioinformatics p2-p3-perl-regexes v2013-wim_vancriekinge
Bioinformatics p2-p3-perl-regexes v2013-wim_vancriekingeBioinformatics p2-p3-perl-regexes v2013-wim_vancriekinge
Bioinformatics p2-p3-perl-regexes v2013-wim_vancriekinge
 
Course 102: Lecture 13: Regular Expressions
Course 102: Lecture 13: Regular Expressions Course 102: Lecture 13: Regular Expressions
Course 102: Lecture 13: Regular Expressions
 
JavaScript.pptx
JavaScript.pptxJavaScript.pptx
JavaScript.pptx
 
Regular Expressions grep and egrep
Regular Expressions grep and egrepRegular Expressions grep and egrep
Regular Expressions grep and egrep
 
Json the-x-in-ajax1588
Json the-x-in-ajax1588Json the-x-in-ajax1588
Json the-x-in-ajax1588
 
Regular expressions-ada-2018
Regular expressions-ada-2018Regular expressions-ada-2018
Regular expressions-ada-2018
 
Json demo
Json demoJson demo
Json demo
 
FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdf
FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdfFUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdf
FUNDAMENTALS OF REGULAR EXPRESSION (RegEX).pdf
 
Regular Expressions Boot Camp
Regular Expressions Boot CampRegular Expressions Boot Camp
Regular Expressions Boot Camp
 
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in Practice
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in PracticeWeek-2: Theory & Practice of Data Cleaning: Regular Expressions in Practice
Week-2: Theory & Practice of Data Cleaning: Regular Expressions in Practice
 
Js reg正则表达式
Js reg正则表达式Js reg正则表达式
Js reg正则表达式
 
Introduction to Regular Expressions
Introduction to Regular ExpressionsIntroduction to Regular Expressions
Introduction to Regular Expressions
 
Quick start reg ex
Quick start reg exQuick start reg ex
Quick start reg ex
 
Regex posix
Regex posixRegex posix
Regex posix
 

Recently uploaded

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Recently uploaded (20)

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Regular expressions

  • 1. Regular Expressions Powerful string validation and extraction Ignaz Wanders – Architect @ Archimiddle @ignazw
  • 2. Topics • What are regular expressions? • Patterns • Character classes • Quantifiers • Capturing groups • Boundaries • Internationalization • Regular expressions in Java • Quiz • References
  • 3. What are regular expressions? • A regex is a string pattern used to search and manipulate text • A regex has special syntax • Very powerful for any type of String manipulation ranging from simple to very complex structures: – Input validation – S(ubs)tring replacement – ... • Example: • [A-Z0-9._%-]+@[A-Z0-9._%-]+.[A-Z0-9._%-]{2,4}
  • 4. History • Originates from automata and formal-language theories of computer science • Stephen Kleene, 1950s: Kleene algebra • Kenneth Thompson, 1969: Unix: qed, ed • 1970s to 1990s: Unix: grep, awk, sed, emacs • Programming languages: – C, Perl – JavaScript, Java
  • 5. Patterns • Regex is based on pattern matching: Strings are searched for certain patterns • Simplest regex is a string-literal pattern • Metacharacters: ([{\^$|)?*+. – Period means “any character” – To search for period as string literal, escape with “\” REGEX: fox TEXT: The quick brown fox RESULT: fox REGEX: fo. TEXT: The quick brown fox RESULT: fox REGEX: .o. TEXT: The quick brown fox RESULT: row, fox
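The searches on this slide can be reproduced with a small Java sketch. The class name `PatternBasics` and the helper `findAll` are illustrative names, not part of the slides; the sample text gets a trailing period so that the escaped-period search has something to find.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PatternBasics {
    // Collects every non-overlapping match of regex in text, left to right.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox.";
        System.out.println(findAll("fox", text));  // string literal
        System.out.println(findAll("fo.", text));  // period = any character
        System.out.println(findAll(".o.", text));  // finds "row" and "fox"
        // In Java source the backslash itself must be doubled: "\\." is the regex \.
        System.out.println(findAll("\\.", text));  // only the literal period
    }
}
```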
  • 6. Character classes (1/3) • Syntax: any characters between [ and ] • A character class matches exactly one character • Negation: ^ REGEX: [rcb]at TEXT: bat RESULT: bat REGEX: [rcb]at TEXT: rat RESULT: rat REGEX: [rcb]at TEXT: cat RESULT: cat REGEX: [rcb]at TEXT: hat RESULT: - REGEX: [^rcb]at TEXT: rat RESULT: - REGEX: [^rcb]at TEXT: hat RESULT: hat
  • 7. Character classes (2/3) • Ranges: [a-z], [0-9], [i-n], [a-zA-Z]... • Unions: [0-4[6-8]], [a-p[r-w]], ... • Intersections: [a-f&&[efg]], [a-f&&[e-k]], ... • Subtractions: [a-f&&[^efg]], ... REGEX: [rcb]at[1-5] TEXT: bat4 RESULT: bat4 REGEX: [rcb]at[1-5[7-8]] TEXT: hat7 RESULT: - REGEX: [rcb]at[1-7&&[78]] TEXT: rat7 RESULT: rat7 REGEX: [rcb]at[1-5&&[^34]] TEXT: bat4 RESULT: -
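Unions, intersections and subtractions inside a character class are a feature of Java's regex syntax, so they are easiest to check in Java itself. A minimal sketch follows; `CharClassOps` and `fullMatch` are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class CharClassOps {
    // True when the whole text (not just a part of it) matches the regex.
    static boolean fullMatch(String regex, String text) {
        return Pattern.matches(regex, text);
    }

    public static void main(String[] args) {
        // Range: final digit must be 1-5.
        System.out.println(fullMatch("[rcb]at[1-5]", "bat4"));
        // Union: final digit is 1-5 or 7-8; "hat7" fails on its first letter.
        System.out.println(fullMatch("[rcb]at[1-5[7-8]]", "rat7"));
        System.out.println(fullMatch("[rcb]at[1-5[7-8]]", "hat7"));
        // Intersection: digits that are both in 1-7 and in {7, 8}, so only 7.
        System.out.println(fullMatch("[rcb]at[1-7&&[78]]", "rat7"));
        // Subtraction: 1-5 without 3 and 4.
        System.out.println(fullMatch("[rcb]at[1-5&&[^34]]", "bat4"));
        System.out.println(fullMatch("[rcb]at[1-5&&[^34]]", "bat5"));
    }
}
```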
  • 8. Character classes (3/3) • Predefined character classes and their equivalents:
      .    any character
      \d   any digit                        [0-9]
      \D   any non-digit                    [^0-9], [^\d]
      \s   any white-space character        [ \t\n\x0B\f\r]
      \S   any non-white-space character    [^\s]
      \w   any word character               [a-zA-Z_0-9]
      \W   any non-word character           [^\w]
  • 9. Quantifiers (1/5) • Quantifiers allow character classes to match more than one character at a time. • Quantifiers for a character class X:
      X?       zero or one time
      X*       zero or more times
      X+       one or more times
      X{n}     exactly n times
      X{n,}    at least n times
      X{n,m}   at least n and at most m times
  • 10. Quantifiers (2/5) • Examples of X?, X*, X+
      REGEX: “a?”   TEXT: “”      RESULT: “”
      REGEX: “a*”   TEXT: “”      RESULT: “”
      REGEX: “a+”   TEXT: “”      RESULT: -
      REGEX: “a?”   TEXT: “a”     RESULT: “a”
      REGEX: “a*”   TEXT: “a”     RESULT: “a”
      REGEX: “a+”   TEXT: “a”     RESULT: “a”
      REGEX: “a?”   TEXT: “aaa”   RESULT: “a”, “a”, “a”
      REGEX: “a*”   TEXT: “aaa”   RESULT: “aaa”
      REGEX: “a+”   TEXT: “aaa”   RESULT: “aaa”
  • 11. Quantifiers (3/5)
      REGEX: “[abc]{3}”   TEXT: “abccabaaaccbbbc”      RESULT: “abc”, “cab”, “aaa”, “ccb”, “bbc”
      REGEX: “abc{3}”     TEXT: “abccabaaaccbbbc”      RESULT: -
      REGEX: “(dog){3}”   TEXT: “dogdogdogdogdogdog”   RESULT: “dogdogdog”, “dogdogdog”
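The three cases differ only in what the quantifier {3} is attached to: a character class, a single character, or a group. A short Java sketch makes that visible; `QuantifierDemo` and `findAll` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Collects every non-overlapping match of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // {3} applies to the class: any three characters out of a, b, c.
        System.out.println(findAll("[abc]{3}", "abccabaaaccbbbc"));
        // {3} applies only to the c: this looks for "abccc", which never occurs.
        System.out.println(findAll("abc{3}", "abccabaaaccbbbc"));
        // {3} applies to the group: "dog" three times in a row.
        System.out.println(findAll("(dog){3}", "dogdogdogdogdogdog"));
    }
}
```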
  • 12. Quantifiers (4/5) • Greedy quantifiers: – read complete string – work backwards until match found – syntax: X?, X*, X+, ... • Reluctant quantifiers: – read one character at a time – work forward until match found – syntax: X??, X*?, X+?, ... • Possessive quantifiers: – read complete string – try match only once – syntax: X?+, X*+, X++, ...
  • 13. Quantifiers (5/5)
      greedy:       REGEX: “.*foo”    TEXT: “xfooxxxxxxfoo”   RESULT: “xfooxxxxxxfoo”
      reluctant:    REGEX: “.*?foo”   TEXT: “xfooxxxxxxfoo”   RESULT: “xfoo”, “xxxxxxfoo”
      possessive:   REGEX: “.*+foo”   TEXT: “xfooxxxxxxfoo”   RESULT: -
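The difference between the three quantifier modes shows up directly in what `find()` returns. The sketch below uses hypothetical names (`QuantifierModes`, `findAll`) and the same text as the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierModes {
    // Collects every non-overlapping match of regex in text.
    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String text = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then gives back just enough for "foo".
        System.out.println(findAll(".*foo", text));
        // Reluctant: .*? takes as little as possible, so two shorter matches.
        System.out.println(findAll(".*?foo", text));
        // Possessive: .*+ takes everything and never gives back, so "foo" cannot match.
        System.out.println(findAll(".*+foo", text));
    }
}
```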
  • 14. Capturing groups (1/2) • Capturing groups treat multiple characters as a single unit • Syntax: between parentheses ( and ) • Example: (dog){3} • Numbering from left to right, by opening parenthesis – Example: ((A)(B(C))) • Group 1: ((A)(B(C))) • Group 2: (A) • Group 3: (B(C)) • Group 4: (C)
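The numbering can be checked by printing every group of a match. In Java, group 0 is always the whole match, so the numbered groups from the slide start at index 1. `GroupNumbering` and `groups` are names made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupNumbering {
    // Returns group 0 (whole match) followed by groups 1..n,
    // or an empty list when the text does not match as a whole.
    static List<String> groups(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        if (m.matches()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                result.add(m.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.size(); i++) {
            System.out.println("Group " + i + ": " + g.get(i));
        }
    }
}
```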
  • 15. Capturing groups (2/2) • Backreferences to capturing groups are denoted by \i, with i an integer number
      REGEX: “(\d\d)\1”   TEXT: “1212”   RESULT: “1212”
      REGEX: “(\d\d)\1”   TEXT: “1234”   RESULT: -
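A backreference matches the same text that the group captured, not the same pattern again. The following sketch searches a longer string for repeated two-digit pairs; `BackreferenceDemo` and `findRepeats` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackreferenceDemo {
    // Two digits followed by the very same two digits.
    private static final Pattern REPEATED_PAIR = Pattern.compile("(\\d\\d)\\1");

    // Returns every repeated pair found in the text.
    static List<String> findRepeats(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = REPEATED_PAIR.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // "1234" is skipped: "34" is two digits, but not the captured "12".
        System.out.println(findRepeats("1212 1234 4747"));
    }
}
```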
  • 16. Boundaries (1/2) • Boundary characters:
      ^    beginning of line
      $    end of line
      \b   a word boundary
      \B   a non-word boundary
      \A   beginning of input
      \G   end of previous match
      \z   end of input
      \Z   end of input, but before final terminator, if any
  • 17. Boundaries (2/2) • Be aware: • End-of-line marker is $ – Unix EOL is \n – Windows EOL is \r\n – JDK uses any of the following as EOL: • '\n', '\r\n', '\r', '\u0085', '\u2028', '\u2029' • Always test your regular expressions on the target OS
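In Java, ^ and $ match at the start and end of every line only when the pattern is compiled with `Pattern.MULTILINE`; by default they apply to the whole input. The sketch below, with the hypothetical names `LineBoundaries` and `wordLines`, shows both behaviours on text that mixes Unix and Windows line endings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineBoundaries {
    // Returns every line that consists of word characters only.
    static List<String> wordLines(String text) {
        List<String> lines = new ArrayList<>();
        Matcher m = Pattern.compile("^\\w+$", Pattern.MULTILINE).matcher(text);
        while (m.find()) {
            lines.add(m.group());
        }
        return lines;
    }

    // Same pattern without MULTILINE: ^ and $ refer to the whole input.
    static boolean singleWordInput(String text) {
        return Pattern.compile("^\\w+$").matcher(text).find();
    }

    public static void main(String[] args) {
        String mixed = "one\r\ntwo\nthree";
        System.out.println(wordLines(mixed));       // each line is found separately
        System.out.println(singleWordInput(mixed)); // false: the input spans several lines
        System.out.println(singleWordInput("one")); // true
    }
}
```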
  • 18. Internationalization (1/2) • Regular expressions were originally designed for the ASCII Basic Latin set of characters. – Thus “België” is not matched by ^\w+$ • Extension to Unicode character sets denoted by \p{...} • Character set: [\p{InCharacterSet}] – Create character classes from symbols in character sets. – “België” is matched by ^[\w[\p{InLatin-1Supplement}]]+$
  • 19. Internationalization (2/2) • Note that there are non-letters in character sets as well: – Latin-1 Supplement: ¡¢£¤¥¦§¨©ª«-®¯°±²³´µ·¸¹º»¼½¾¿÷ • Categories: – Letters: \p{L} – Uppercase letters: \p{Lu} – “België” is matched by ^\p{L}+$ • Other (POSIX) categories: – Unicode currency symbols: \p{Sc} – ASCII punctuation characters: \p{Punct}
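The “België” example can be tried out as follows. This is a sketch with illustrative names (`UnicodeClasses` and its methods); the ë is written as the escape \u00eb so that the result does not depend on the source-file encoding.

```java
class UnicodeClasses {
    // \w covers only [a-zA-Z_0-9] by default, so accented letters fail.
    static boolean isAsciiWord(String s) {
        return s.matches("\\w+");
    }

    // \p{L} covers letters of any script.
    static boolean isLetters(String s) {
        return s.matches("\\p{L}+");
    }

    // Union of \w with the whole Latin-1 Supplement block (which also holds non-letters).
    static boolean isWordOrLatin1(String s) {
        return s.matches("[\\w[\\p{InLatin-1Supplement}]]+");
    }

    public static void main(String[] args) {
        String belgie = "Belgi\u00eb";
        System.out.println(isAsciiWord(belgie));    // false
        System.out.println(isLetters(belgie));      // true
        System.out.println(isWordOrLatin1(belgie)); // true
        // The block-based class also accepts symbols such as the pound sign (U+00A3).
        System.out.println(isWordOrLatin1("\u00a3"));
        System.out.println(isLetters("\u00a3"));
    }
}
```

The last two lines illustrate why the category \p{L} is usually the better choice for names: the block contains currency signs and punctuation as well as letters.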
  • 20. Regular expressions in Java • Since JDK 1.4 • Package java.util.regex – Pattern class – Matcher class • Convenience methods in java.lang.String • Alternative for JDK 1.3 – Jakarta ORO project
  • 21. java.util.regex.Pattern • Wrapper class for regular expressions • Useful methods: – compile(String regex): Pattern – matches(String regex, CharSequence text): boolean – split(CharSequence text): String[] String regex = "(\\d\\d)\\1"; Pattern p = Pattern.compile(regex);
  • 22. java.util.regex.Matcher • Useful methods: – matches(): boolean – find(): boolean – find(int start): boolean – group(): String – replaceFirst(String replace): String – replaceAll(String replace): String String regex = "(\\d\\d)\\1"; Pattern p = Pattern.compile(regex); String text = "1212"; Matcher m = p.matcher(text); boolean matches = m.matches();
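The snippet on this slide can be expanded into a complete program that also shows the difference between `matches()` (the whole text must match) and `find()` (a match anywhere is enough). `MatcherDemo` and its method names are assumptions made for this sketch.

```java
import java.util.regex.Pattern;

class MatcherDemo {
    // Compile once and reuse: a Pattern is immutable and thread-safe.
    private static final Pattern REPEATED_PAIR = Pattern.compile("(\\d\\d)\\1");

    // matches(): the entire text must be a repeated pair.
    static boolean isRepeatedPair(String text) {
        return REPEATED_PAIR.matcher(text).matches();
    }

    // find(): a repeated pair anywhere in the text is enough.
    static boolean containsRepeatedPair(String text) {
        return REPEATED_PAIR.matcher(text).find();
    }

    // replaceAll(): "$1" in the replacement refers to capturing group 1.
    static String collapsePairs(String text) {
        return REPEATED_PAIR.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(isRepeatedPair("1212"));
        System.out.println(isRepeatedPair("x1212"));
        System.out.println(containsRepeatedPair("x1212"));
        System.out.println(collapsePairs("x1212y"));
    }
}
```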
  • 23. java.lang.String • Pattern and Matcher methods in String: – matches(String regex): boolean – split(String regex): String[] – replaceFirst(String regex, String replace): String – replaceAll(String regex, String replace): String
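For one-off operations the String methods save the explicit Pattern and Matcher objects, at the cost of recompiling the regex on every call. A small sketch with made-up helper names:

```java
import java.util.Arrays;

class StringRegexDemo {
    // String.matches: the whole string must match.
    static boolean looksLikeIsoDate(String s) {
        return s.matches("\\d{4}-\\d{2}-\\d{2}");
    }

    // String.split: the regex describes the separator.
    static String[] splitOnDigits(String s) {
        return s.split("\\d+");
    }

    // String.replaceAll: every run of white space becomes one blank.
    static String normalizeSpaces(String s) {
        return s.replaceAll("\\s+", " ");
    }

    // String.replaceFirst: only the first match is replaced.
    static String maskFirstNumber(String s) {
        return s.replaceFirst("\\d+", "#");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIsoDate("2024-01-15"));
        System.out.println(Arrays.toString(splitOnDigits("a1b22c")));
        System.out.println(normalizeSpaces("a  b \t c"));
        System.out.println(maskFirstNumber("room 12, floor 3"));
    }
}
```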
  • 24. Examples • Validation • Searching text • Filtering • Parsing • Removing duplicate lines • On-the-fly editing
  • 25. Examples: validation • Validate an e-mail address: [A-Z0-9._%-]+@[A-Z0-9._%-]+\.[A-Z0-9._%-]{2,4} • A URL: (http|https|ftp)://([a-zA-Z0-9](\w+\.)+\w{2,7}|local\w*)(:\d+)?(/(\w+[\w/\-.]*)?)?
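The e-mail pattern lists only upper-case letters, so in Java it needs the `CASE_INSENSITIVE` flag to accept ordinary addresses. The following is a minimal sketch with assumed names (`EmailCheck`, `isValidEmail`); like the slide's regex it is a plausibility check, not a full implementation of the e-mail address grammar.

```java
import java.util.regex.Pattern;

class EmailCheck {
    private static final Pattern EMAIL = Pattern.compile(
            "[A-Z0-9._%-]+@[A-Z0-9._%-]+\\.[A-Z0-9._%-]{2,4}",
            Pattern.CASE_INSENSITIVE);

    // matches() anchors the pattern to the whole input, so no ^ and $ are needed.
    static boolean isValidEmail(String candidate) {
        return EMAIL.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidEmail("john.doe@example.com"));
        System.out.println(isValidEmail("JOHN@EXAMPLE.ORG"));
        System.out.println(isValidEmail("john.doe@example"));      // no top-level domain
        System.out.println(isValidEmail("no-at-sign.example.com"));
    }
}
```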
  • 26. Examples: searching text • Write HttpUnit test to submit HTML form and check whether HTTP response is a confirmation screen containing a generated form number of the form 9xxxxxx-xxxxxx: 9[0-9]{6}-[0-9]{6} Pattern p = Pattern.compile(regexp); Matcher m = p.matcher(text); boolean ok = m.find(); String nr = ok ? m.group() : null;
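Calling `group()` is only legal after a successful `find()`; otherwise it throws an IllegalStateException. A sketch of the extraction step without the HttpUnit part follows, with the response text passed in as a plain string. `FormNumberFinder` and `findFormNumber` are hypothetical names.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FormNumberFinder {
    // A 9, six more digits, a hyphen, six digits.
    private static final Pattern FORM_NUMBER = Pattern.compile("9[0-9]{6}-[0-9]{6}");

    // Returns the first form number in the text, or an empty Optional if there is none.
    static Optional<String> findFormNumber(String text) {
        Matcher m = FORM_NUMBER.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String response = "<p>Your form 9123456-654321 has been received.</p>";
        System.out.println(findFormNumber(response).orElse("no form number"));
        System.out.println(findFormNumber("<p>Error</p>").orElse("no form number"));
    }
}
```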
  • 27. Examples: filtering • Filter e-mail with subjects with capitals only, and including a leading “Re:” (R[eE]:)*[^a-z]*$
  • 28. Examples: parsing • Matches any opening and closing XML tag: – Note the use of the back reference <([A-Z][A-Z0-9]*)[^>]*>(.*?)</\1>
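The back reference \1 forces the closing tag to carry the same name as the opening tag. The sketch below applies the pattern case-insensitively, since the regex lists only upper-case letters; `XmlTagDemo` and `elements` are illustrative names. It handles simple, non-nested elements on one line and is not a substitute for an XML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XmlTagDemo {
    // Group 1: tag name, group 2: element content (reluctant, so it stops at the first closing tag).
    private static final Pattern ELEMENT = Pattern.compile(
            "<([A-Z][A-Z0-9]*)[^>]*>(.*?)</\\1>",
            Pattern.CASE_INSENSITIVE);

    // Returns "name:content" for every element found.
    static List<String> elements(String markup) {
        List<String> found = new ArrayList<>();
        Matcher m = ELEMENT.matcher(markup);
        while (m.find()) {
            found.add(m.group(1) + ":" + m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(elements("<b class=\"x\">bold</b> and <i>italic</i>"));
        // Opening and closing names differ, so nothing is found.
        System.out.println(elements("<b>text</i>"));
    }
}
```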
  • 29. Examples: duplicate lines • Suppose you want to remove duplicate lines from a text. – requirement here is that the lines are sorted alphabetically ^(.*)(\r?\n\1)+$
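In Java this pattern needs the `MULTILINE` flag so that ^ and $ apply to each line; replacing every match with group 1 then keeps one copy of each run of identical lines. A sketch under those assumptions, with the hypothetical names `DedupLines` and `removeAdjacentDuplicates`:

```java
import java.util.regex.Pattern;

class DedupLines {
    // A line, followed one or more times by a line break and the same line again.
    private static final Pattern DUPLICATES =
            Pattern.compile("^(.*)(\\r?\\n\\1)+$", Pattern.MULTILINE);

    // Keeps the first line of every run of identical adjacent lines.
    static String removeAdjacentDuplicates(String text) {
        return DUPLICATES.matcher(text).replaceAll("$1");
    }

    public static void main(String[] args) {
        String sorted = "apple\napple\nbanana\ncherry\ncherry\ncherry";
        System.out.println(removeAdjacentDuplicates(sorted));
    }
}
```

Only adjacent duplicates are removed, which is why the slide requires the lines to be sorted first.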
  • 30. Examples: on-the-fly editing • Suppose you want to edit a file in batch: all occurrences of a certain string pattern should be replaced with another string. • In Unix: use the sed command with a regex • In Java: use string.replaceAll(regex, "mystring") • In Ant: use replaceregexp optional task to, e.g., edit deployment descriptors depending on environment
  • 31. Quiz • What are the following regular expressions looking for?
      \d+                           at least one digit
      [-+]?\d+                      any integer
      ((\d*\.?)?\d+|\d+(\.?\d*))    any positive decimal
      [\p{L}']['\-.\p{L} ]+         a place name
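The quiz answers can be checked by running a few sample strings against each expression. The sketch below uses made-up names (`QuizDemo` and its methods) and tests whole strings with `String.matches`.

```java
class QuizDemo {
    static boolean isDigits(String s) {
        return s.matches("\\d+");
    }

    static boolean isInteger(String s) {
        return s.matches("[-+]?\\d+");
    }

    static boolean isPositiveDecimal(String s) {
        return s.matches("((\\d*\\.?)?\\d+|\\d+(\\.?\\d*))");
    }

    // A letter or apostrophe, then letters, blanks, apostrophes, hyphens or periods.
    static boolean isPlaceName(String s) {
        return s.matches("[\\p{L}']['\\-.\\p{L} ]+");
    }

    public static void main(String[] args) {
        System.out.println(isDigits("2024"));
        System.out.println(isInteger("-7"));
        System.out.println(isPositiveDecimal("3.14"));
        System.out.println(isPositiveDecimal(".5"));
        System.out.println(isPositiveDecimal("5."));
        System.out.println(isPlaceName("'s-Hertogenbosch"));
        System.out.println(isPlaceName("St. John's"));
    }
}
```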
  • 32. Conclusion • When doing one of the following: – validating strings – on-the-fly editing of strings – searching strings – filtering strings • think regex!
  • 33. References • http://www.regular-expressions.info/ • http://www.regexlib.com/ • http://developer.java.sun.com/developer/technicalArticles/releases/1.4regex/ • http://java.sun.com/docs/books/tutorial/extra/regex/ • http://www.wellho.net/regex/javare.html • >JDK 1.4 API • Mastering Regular Expressions