Regular Expressions for the
Web Application Developer
             By Andrew Kandels
Regular Expressions
Regular expressions provide a concise, flexible means for
matching strings of text, such as words or patterns of
characters.
POSIX (Portable Operating System Interface)
• Traditional Unix regular expression syntax
• PHP’s ereg_ functions
• Basic and extended versions

PCRE (Perl Compatible Regular Expressions)
• Perl 5 Extended Features
• Native C Extension
• Generally Faster
• Optimization Qualifiers

Used by:
• Programming languages
• Apache and other servers
Why Use Them?
•   Input Validation
•   Input Filtering
•   Search and Replace
•   Parsing and Data Extraction
•   Dynamic Recursion
•   Automation
In PHP, POSIX = Deprecated
The ereg_* functions are deprecated as of PHP 5.3 and were removed in
PHP 7.0.
Switching to preg_* is generally painless. Pain points:

•   Different matching criteria (greed)
•   preg_* requires delimiters
•   Different characters require escape sequences
•   preg_* favors option modifiers (e.g. /i) over separate functions (e.g. eregi)
Anatomy of a PHP Regular Expression


                           /foo/i
• Delimiters
• Pattern to match
• Options/modifiers
preg_replace(
   '/(href|src)=\'([^\']*)\'/i',
   '\1="\2"',
   $str
);
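As a rough illustration of the three parts working together, the sketch below runs a pattern like the one above against a made-up string. The `$str` value is an assumption chosen for this example; the call rewrites single-quoted `href`/`src` attributes to use double quotes.

```php
<?php
// Hypothetical input, made up for illustration.
$str = "<a HREF='/home'><img src='/logo.png'>";

$fixed = preg_replace(
    '/(href|src)=\'([^\']*)\'/i',  // delimiters: /   modifier: i (ignore case)
    '\1="\2"',                      // \1 = attribute name, \2 = attribute value
    $str
);

echo $fixed, "\n";  // <a HREF="/home"><img src="/logo.png">
```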
PHP Regular Expressions

• Must use a delimiter: ! @ # /
• Use PHP’s single-quoted strings (fewer backslashes to escape, no $ interpolation)

preg_match        Match against a pattern and extract text
preg_replace      Like str_replace() with a pattern (and sub-patterns)
preg_match_all    Like preg_match, but returns an array of every match, plus the count
preg_split        Like explode() but with a pattern
preg_quote        Escapes text for use in a regular expression
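A minimal sketch of the five functions side by side may help. All input strings below are invented for illustration; only the `preg_*` functions themselves come from PHP.

```php
<?php
// Hypothetical sample input, made up for illustration.
$html = '<a href="/home">Home</a> <img src="/logo.png">';

// preg_match: stops at the first match; returns 1 on a match, 0 otherwise.
$found = preg_match('/href="([^"]*)"/', $html, $m);
// $m[0] is the whole match, $m[1] the first sub-pattern.

// preg_match_all: collects every match and returns how many there were.
$count = preg_match_all('/(?:href|src)="([^"]*)"/', $html, $all);
// $all[1] holds the first sub-pattern of each match.

// preg_replace: like str_replace(), but driven by a pattern.
$squeezed = preg_replace('/\s+/', ' ', 'too    many   spaces');

// preg_split: like explode(), but the separator is a pattern.
$parts = preg_split('/[\s,]+/', 'a, b  c,d');

// preg_quote: escape literal text before embedding it in a pattern.
// The second argument is the delimiter to escape as well.
$literal = preg_quote('Hello.World?', '/');
$isLiteral = preg_match('/' . $literal . '/', 'Say Hello.World? now');
```

Note the `#`-free, single-quoted patterns: nothing inside them needs extra escaping for PHP itself.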
Modifiers and Options
i   PCRE_CASELESS – Ignores case

m   PCRE_MULTILINE – ^ and $ also match at the start and end of each line

s   PCRE_DOTALL – The dot (.) also matches newlines

U   PCRE_UNGREEDY – Quantifiers are non-greedy by default
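The effect of each modifier can be seen by running the same pattern with and without it. This is a small sketch on invented input, not an exhaustive tour of PCRE options.

```php
<?php
// Hypothetical two-line input, made up for illustration.
$text = "first line\nSecond line";

// i: case-insensitive matching.
$withI = preg_match('/second/i', $text);

// m: ^ and $ match at every line boundary, not only at the string ends.
$withoutM = preg_match('/^Second/', $text);
$withM    = preg_match('/^Second/m', $text);

// s: the dot also matches newline characters.
$withoutS = preg_match('/first.*Second/', $text);
$withS    = preg_match('/first.*Second/s', $text);

// U: quantifiers match as little as possible instead of as much as possible.
preg_match('/l.+e/', 'line line', $greedy);
preg_match('/l.+e/U', 'line line', $ungreedy);
```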
Performance Killers

Slow-downs in performance generally come from:

• Alternation, the pipe/OR operator (|)
  Use [abcd] when possible over (a|b|c|d)
• Dot-all matching over multi-line text (PCRE_DOTALL or /s)
• Recursion (backtracking): (\d+)\d*
  Use explicit lengths such as \d{1,5} when possible

It’s not that slow!
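The rewrites suggested above can be sketched as follows. The subject string and the `{1,10}` bound are assumptions picked for the example; the point is only that the cheaper form finds the same text.

```php
<?php
// Hypothetical input, made up for illustration.
$subject = 'order 12345 shipped to dock c';

// A character class does the job of single-character alternation.
preg_match('/dock (a|b|c|d)/', $subject, $alternation);
preg_match('/dock [abcd]/', $subject, $charClass);

// A bounded quantifier avoids two open-ended quantifiers competing
// for the same digits.
preg_match('/(\d+)\d*/', $subject, $openEnded);
preg_match('/(\d{1,10})/', $subject, $bounded);
```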
Sub-Patterns

Sub-patterns (parenthesized groups) let you extract relevant text from a match:

• For preg_replace, use either \1 or $1 in your replacement string
• Sub-patterns are numbered from left to right by the position of their opening parenthesis “(”
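A short sketch of sub-pattern extraction and numbering, using an invented date string as input:

```php
<?php
// Hypothetical input, made up for illustration.
$text = 'Released on 2011-03-15.';

// $m[0] is the full match; $m[1], $m[2], $m[3] are the sub-patterns.
preg_match('/(\d{4})-(\d{2})-(\d{2})/', $text, $m);

// In a replacement string, $1 (or \1) refers to the first sub-pattern.
$us = preg_replace('/(\d{4})-(\d{2})-(\d{2})/', '$2/$3/$1', $text);

// Nested groups are numbered by where their "(" opens:
// group 1 = year-month, group 2 = year, group 3 = month.
preg_match('/((\d{4})-(\d{2}))-\d{2}/', $text, $nested);
```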
Named Sub-Patterns




(?P<name>pattern)
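As a small sketch of the syntax above, named groups show up in the match array under their names as well as their numbers. The group names `year` and `month` and the input string are chosen only for this example.

```php
<?php
// Hypothetical input, made up for illustration.
preg_match('/(?P<year>\d{4})-(?P<month>\d{2})/', 'Invoice 2011-03', $m);

// Each named group is available by name and by position.
$year  = $m['year'];   // same value as $m[1]
$month = $m['month'];  // same value as $m[2]
```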
Lookaheads
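Lookaheads are zero-width: they test what follows without consuming any characters, and the text they test is not included in the match or in any sub-pattern.

                   (?=pattern)   must be followed by pattern
                   (?!pattern)   must not be followed by pattern
                   Pattern can be any valid regex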
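A minimal sketch of both lookahead forms on invented input. Note that the text tested by the lookahead never appears in the result.

```php
<?php
// Positive lookahead: digits only when "px" follows; "px" is not captured.
preg_match_all('/\d+(?=px)/', 'width: 100px; height: 50em; margin: 8px', $px);

// Negative lookahead: "foo" only when "bar" does not follow.
// PREG_OFFSET_CAPTURE also reports where the match starts.
preg_match('/foo(?!bar)/', 'foobar foobaz', $foo, PREG_OFFSET_CAPTURE);
```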
Lookbehinds




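   (?<=pattern)   must be preceded by pattern
   (?<!pattern)   must not be preceded by pattern
Accepts only fixed-length patterns (no * or + quantifiers)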
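A short sketch of positive and negative lookbehinds, again on a made-up string. Both lookbehind patterns here are one character long, which keeps them fixed-length.

```php
<?php
// Hypothetical input, made up for illustration.
$text = 'cost $40, qty 12, fee $5';

// Positive lookbehind: digits that directly follow a dollar sign.
preg_match_all('/(?<=\$)\d+/', $text, $prices);

// Negative lookbehind: digits not preceded by a dollar sign or another digit.
preg_match_all('/(?<![\$\d])\d+/', $text, $others);
```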
Multi-Line Processing




                     /msU
(Multi-line, include newlines with dots, non-greedy)
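One way to see the three modifiers cooperate is a pattern that pulls paragraphs out of text spanning several lines. The input and the use of `#` as the delimiter are choices made for this sketch.

```php
<?php
// Hypothetical input, made up for illustration.
$doc = "<p>one\ntwo</p>\n<p>three</p>";

// m: ^ and $ anchor at each line; s: . crosses newlines; U: stop at the
// first closing tag that ends a line rather than the last one.
preg_match_all('#^<p>(.*)</p>$#msU', $doc, $m);
```

Without the U modifier the same pattern would return a single match running from the first `<p>` to the last `</p>`.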
Once-Only Sub-Patterns

Eliminates slow backtracking from wildcard searching: once the group has matched, the engine never goes back into it.

       (?>pattern)

       Fewer scans = more speed.
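The difference from an ordinary group is easiest to see with a pattern that only succeeds if the group gives characters back. This is a sketch on an invented subject string; it shows the behavior, not the speed gain.

```php
<?php
// Ordinary group: \d+ grabs "12345", then gives back the "5" so the
// literal 5 after the group can match.
$normal = preg_match('/(\d+)5/', '12345');

// Once-only group: \d+ keeps everything it matched, so the literal 5
// has nothing left to match and the pattern fails.
$onceOnly = preg_match('/(?>\d+)5/', '12345');

// The possessive quantifier ++ is shorthand for the same behavior.
$possessive = preg_match('/\d++5/', '12345');
```

On a non-matching subject, that refusal to retry shorter digit runs is what saves the extra scans.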
Greedy

By default, PCRE quantifiers are greedy: they return the biggest possible match.




        100,000 runs took 0.2791 seconds
Non-Greedy with Modifier

The /U modifier makes quantifiers return the SMALLEST possible match.




       100,000 runs took 0.2638 seconds
               (a little better, and it’s right)
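A minimal sketch of greedy versus non-greedy matching on an invented string. It illustrates only which text is returned; it is not the benchmark behind the timings above.

```php
<?php
// Hypothetical input, made up for illustration.
$html = '<b>one</b> and <b>two</b>';

// Greedy (default): .* runs to the last </b> in the string.
preg_match('/<b>.*<\/b>/', $html, $greedy);

// Non-greedy via the U modifier: .* stops at the first </b>.
preg_match('/<b>.*<\/b>/U', $html, $ungreedy);

// The same effect for a single quantifier, using .*? instead of U.
preg_match('/<b>.*?<\/b>/', $html, $lazy);
```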
Restrictive Wild-Carding

No greedy flag needed; a restrictive pattern such as [^<]* is faster than a broad wild-card like .*




         100,000 runs took 0.2271 seconds
                (fastest yet, no options needed)
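Restrictive wild-carding can be sketched with a negated character class, which cannot run past the next tag in the first place. The input is invented, and `[^<]*` is one possible restrictive pattern rather than necessarily the one benchmarked above.

```php
<?php
// Hypothetical input, made up for illustration.
$html = '<b>one</b> and <b>two</b>';

// [^<]* matches anything except "<", so it stops at the closing tag
// on its own, with no U modifier and no lazy quantifier.
preg_match_all('/<b>([^<]*)<\/b>/', $html, $m);
```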
grep

Use grep -E or egrep for extended regular expressions (+, ?, |)
and advanced functionality.

-A n         Print the next n lines after each match.
-B n         Print the previous n lines before each match.
-i           Ignore case
-m n         Stop after n matches
-r           Recursively search the file system
-n           Show line numbers
-v           Only show lines that don’t match
sed

Use -r (-E on OS X / FreeBSD) for extended regular expressions.
The End

  Web: http://andrewkandels.com

  Mail: akandels@gmail.com

Twitter: @andrewkandels
