SlideShare uma empresa Scribd logo
1 de 24
Baixar para ler offline
Perl Programming
                 Course
             Working with text
            Regular expressions



Krassimir Berov

I-can.eu
Contents
1. Simple word matching
2. Character classes
3. Matching this or that
4. Grouping
5. Extracting matches
6. Matching repetitions
7. Search and replace
8. The split operator
Simple word matching
• It's all about identifying patterns in text
• The simplest regex is simply
  a word – a string of characters.
• A regex consisting of a word matches any
  string that contains that word
• The sense of the match can be reversed
  by using !~ operator
 my $string ='some probably big string containing just
 about anything in it';
 print "found 'string'n" if $string =~ /string/;
 print "it is not about dogsn" if $string !~ /dog/;
Simple word matching
                                                  (2)
• The literal string in the regex can be
  replaced by a variable
• If matching against $_ , the $_ =~ part
  can be omitted:

 my $string ='stringify this world';
 my $word = 'string'; my $animal = 'dog';
 print "found '$word'n" if $string =~ /$word/;
 print "it is not about ${animal}sn"
     if $string !~ /$animal/;
 for('dog','string','dog'){
     print "$wordn" if /$word/
 }
Simple word matching
                                                   (3)
• The // default delimiters for a match can be
  changed to arbitrary delimiters by putting an 'm'
  in front
• Regexes must match a part of the string exactly
  in order for the statement to be true
 my $string ='Stringify this world!';
 my $word = 'string'; my $animal = 'dog';
 print "found '$word'n" if $string =~ m#$word#;
 print "found '$word' in any casen"
    if $string =~ m#$word#i;
 print "it is not about ${animal}sn"
     if $string !~ m($animal);
 for('dog','string','Dog'){
     local $=$/;
     print if m|$animal|
 }
Simple word matching
                                                     (4)
• perl will always match at the earliest possible
  point in the string
 my $string ='Stringify this stringy world!';
 my $word = 'string';
 print "found '$word' in any casen"
    if $string =~ m{$word}i;

• Some characters, called metacharacters, are
  reserved for use in regex notation. The
  metacharacters are (14):
 { } [ ] ( ) ^ $ . | * + ? 

• A metacharacter can be matched by putting a
  backslash before it
 print "The string n'$string'n contains a DOTn"
     if $string =~ m|.|;
Simple word matching
                                                        (5)
• Non-printable ASCII characters are
  represented by escape sequences
• Arbitrary bytes are represented by octal
  escape sequences
 use utf8;
 binmode(STDOUT, ':utf8') if $ENV{LANG} =~/UTF-8/;
 $=$/;
 my $string ="containsrn Then we have sometttabs.
 б";
 print 'matched б(x{431})'
     if $string =~ /x{431}/;
 print 'matched б' if $string =~/б/;
 print 'matched rn' if $string =~/rn/;
 print 'The string was:"' . $string.'"';
Simple word matching
                                                          (6)
• To specify where it should match, use the
  anchor metacharacters ^ and $ .



 use strict; use warnings;$=$/;
 my $string ='A probably long chunk of text containing
 strings';
 print 'matched "A"' if $string =~ /^A/;
 print 'matched "strings"' if $string =~ /strings$/;
 print 'matched "A", matched "strings" and something in
 between'
 if $string =~ /^A.*?strings$/;
Character classes
• A character class allows a set of possible
  characters, to match
• Character classes are denoted by brackets [ ]
  with the set of characters to be possibly matched
  inside
• The special characters for a character class are
  - ]  ^ $ and are matched using an escape
• The special character '-' acts as a range operator
  within character classes so you can write [0-9]
  and [a-z]
Character classes
• Example
 use strict; use warnings;$=$/;
 my $string ='A probably long chunk of text containing
 strings';
 my $thing = 'ong ung ang enanything';
 my $every = 'iiiiii';
 my $nums   = 'I have 4325 Euro';
 my $class = 'dog';
 print 'matched any of a, b or c'
    if $string =~ /[abc]/;

 for($thing, $every, $string){
     print 'ingy brrrings nothing using: '.$_
         if /[$class]/
 }
 print $nums if $nums =~/[0-9]/;
Character classes
• Perl has several abbreviations for common character
  classes
   • d is a digit – [0-9]
   • s is a whitespace character – [ trnf]
   • w is a word character
     (alphanumeric or _) – [0-9a-zA-Z_]
   • D is a negated d – any character but a digit [^0-9]
   • S is a negated s; it represents any non-whitespace
     character [^s]
   • W is a negated w – any non-word character
   • The period '.' matches any character but "n"
   • The dswDSW inside and outside of character classes
   • The word anchor b matches a boundary between a word
     character and a non-word character wW or Ww
Character classes
• Example
 my $digits ="here are some digits3434 and then ";

 print 'found digit' if $digits =~/d/;

 print 'found alphanumeric' if $digits =~/w/;

 print 'found space' if $digits =~/s/;

 print 'digit followed by space, followed by letter'
    if $digits =~/ds[A-z]/;
Matching this or that
• We can match different character strings
  with the alternation metacharacter '|'
• perl will try to match the regex at the
  earliest possible point in the string
 my $digits ="here are some digits3434 and then ";

 print 'found "are" or "and"' if $digits =~/are|and/;
Extracting matches
• The grouping metacharacters () also
  allow the extraction of the parts of a
  string that matched
• For each grouping, the part that matched
  inside goes into the special variables $1 ,
  $2 , etc.
• They can be used just as ordinary
  variables
Extracting matches
• The grouping metacharacters () allow a part of a
  regex to be treated as a single unit
• If the groupings in a regex are nested, $1 gets the
  group with the leftmost opening parenthesis, $2
  the next opening parenthesis, etc.
  my $digits ="here are some digits3434 and then678 ";

  print 'found a letter followed by leters or digits":'.$1
      if $digits =~/[a-z]([a-z]|d+)/;
  print 'found a letter followed by digits":'.$1
      if $digits =~/([a-z](d+))/;
  #                   $1    $2
  print 'found letters followed by digits":'.$1
      if $digits =~/([a-z]+)(d+)/;
  #                   $1     $2
Matching repetitions
• The quantifier metacharacters ?, * , + , and {} allow
  us to determine the number of repeats of a portion
  of a regex
• Quantifiers are put immediately after the character,
  character class, or grouping
   • a? = match 'a' 1 or 0 times
   • a* = match 'a' 0 or more times, i.e., any number of times
   • a+ = match 'a' 1 or more times, i.e., at least once
   • a{n,m} = match at least n times, but not more than m
     times
   • a{n,} = match at least n or more times
   • a{n} = match exactly n times
Matching repetitions


use strict; use warnings;$=$/;
my $digits ="here are some digits3434 and then678 ";

print 'found some letters followed by leters or
digits":'.$1 .$2
if $digits =~/([a-z]{2,})(w+)/;

print 'found three letter followed by   digits":'.$1 .$2
if $digits =~/([a-z]{3}(d+))/;

print 'found up to four letters followed by   digits":'.
$1 .$2
if $digits =~/([a-z]{1,4})(d+)/;
Matching repetitions
• Greeeedy

 use strict; use warnings;$=$/;
 my $digits ="here are some digits3434 and then678 ";

 print 'found as much as possible letters
 followed by digits":'.$1 .$2
 if $digits =~/([a-z]*)(d+)/;
Search and replace
• Search and replace is performed using
  s/regex/replacement/modifiers.
• The replacement is a Perl double quoted string
  that replaces in the string whatever is matched with
  the regex .
• The operator =~ is used to associate a string with
  s///.
• If matching against $_ , the $_ =~ can be dropped.
• If there is a match, s/// returns the number of
  substitutions made, otherwise it returns false
Search and replace
• The matched variables $1 , $2 , etc. are immediately
  available for use in the replacement expression.
• With the global modifier, s///g will search and
  replace all occurrences of the regex in the string
• The evaluation modifier s///e wraps an eval{...}
  around the replacement string and the evaluated
  result is substituted for the matched substring.
• s/// can use other delimiters, such as s!!! and s{}{},
  and even s{}//
• If single quotes are used s''', then the regex and
  replacement are treated as single quoted strings
Search and replace
• Example

 #TODO....
The split operator
• split /regex/, string
 splits string into a list of substrings and
 returns that list
• The regex determines the character sequence
  that string is split with respect to
 #TODO....
Regular expressions

• Resources
  • perlrequick - Perl regular expressions quick start
  • perlre - Perl regular expressions
  • perlreref - Perl Regular Expressions Reference
  • Beginning Perl
    (Chapter 5 – Regular Expressions)
Regular expressions




Questions?

Mais conteúdo relacionado

Mais procurados

Perl programming language
Perl programming languagePerl programming language
Perl programming language
Elie Obeid
 
String variable in php
String variable in phpString variable in php
String variable in php
chantholnet
 
Scripting3
Scripting3Scripting3
Scripting3
Nao Dara
 
Perl 101 - The Basics of Perl Programming
Perl  101 - The Basics of Perl ProgrammingPerl  101 - The Basics of Perl Programming
Perl 101 - The Basics of Perl Programming
Utkarsh Sengar
 

Mais procurados (19)

Perl programming language
Perl programming languagePerl programming language
Perl programming language
 
Intro to Perl and Bioperl
Intro to Perl and BioperlIntro to Perl and Bioperl
Intro to Perl and Bioperl
 
Introduction to Perl - Day 1
Introduction to Perl - Day 1Introduction to Perl - Day 1
Introduction to Perl - Day 1
 
Perl 5.10 for People Who Aren't Totally Insane
Perl 5.10 for People Who Aren't Totally InsanePerl 5.10 for People Who Aren't Totally Insane
Perl 5.10 for People Who Aren't Totally Insane
 
Class 5 - PHP Strings
Class 5 - PHP StringsClass 5 - PHP Strings
Class 5 - PHP Strings
 
Bioinformatics p1-perl-introduction v2013
Bioinformatics p1-perl-introduction v2013Bioinformatics p1-perl-introduction v2013
Bioinformatics p1-perl-introduction v2013
 
Intoduction to php strings
Intoduction to php  stringsIntoduction to php  strings
Intoduction to php strings
 
Introduction to Perl and BioPerl
Introduction to Perl and BioPerlIntroduction to Perl and BioPerl
Introduction to Perl and BioPerl
 
DBIx::Class introduction - 2010
DBIx::Class introduction - 2010DBIx::Class introduction - 2010
DBIx::Class introduction - 2010
 
LPW: Beginners Perl
LPW: Beginners PerlLPW: Beginners Perl
LPW: Beginners Perl
 
PHP Strings and Patterns
PHP Strings and PatternsPHP Strings and Patterns
PHP Strings and Patterns
 
Intermediate Perl
Intermediate PerlIntermediate Perl
Intermediate Perl
 
String variable in php
String variable in phpString variable in php
String variable in php
 
Introduction to Perl
Introduction to PerlIntroduction to Perl
Introduction to Perl
 
Scripting3
Scripting3Scripting3
Scripting3
 
Perl 101 - The Basics of Perl Programming
Perl  101 - The Basics of Perl ProgrammingPerl  101 - The Basics of Perl Programming
Perl 101 - The Basics of Perl Programming
 
You Can Do It! Start Using Perl to Handle Your Voyager Needs
You Can Do It! Start Using Perl to Handle Your Voyager NeedsYou Can Do It! Start Using Perl to Handle Your Voyager Needs
You Can Do It! Start Using Perl to Handle Your Voyager Needs
 
Introduction to Perl Best Practices
Introduction to Perl Best PracticesIntroduction to Perl Best Practices
Introduction to Perl Best Practices
 
DBIx::Class beginners
DBIx::Class beginnersDBIx::Class beginners
DBIx::Class beginners
 

Semelhante a Working with text, Regular expressions

Lecture 23
Lecture 23Lecture 23
Lecture 23
rhshriva
 
Regular Expressions grep and egrep
Regular Expressions grep and egrepRegular Expressions grep and egrep
Regular Expressions grep and egrep
Tri Truong
 
Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013
Ben Brumfield
 
PERL Regular Expression
PERL Regular ExpressionPERL Regular Expression
PERL Regular Expression
Binsent Ribera
 
Eloquent Ruby chapter 4 - Find The Right String with Regular Expression
Eloquent Ruby chapter 4 - Find The Right String with Regular ExpressionEloquent Ruby chapter 4 - Find The Right String with Regular Expression
Eloquent Ruby chapter 4 - Find The Right String with Regular Expression
Kuyseng Chhoeun
 

Semelhante a Working with text, Regular expressions (20)

Lecture 23
Lecture 23Lecture 23
Lecture 23
 
Bioinformatics p2-p3-perl-regexes v2014
Bioinformatics p2-p3-perl-regexes v2014Bioinformatics p2-p3-perl-regexes v2014
Bioinformatics p2-p3-perl-regexes v2014
 
regex.ppt
regex.pptregex.ppt
regex.ppt
 
Bioinformatica p2-p3-introduction
Bioinformatica p2-p3-introductionBioinformatica p2-p3-introduction
Bioinformatica p2-p3-introduction
 
Bioinformatics p2-p3-perl-regexes v2013-wim_vancriekinge
Bioinformatics p2-p3-perl-regexes v2013-wim_vancriekingeBioinformatics p2-p3-perl-regexes v2013-wim_vancriekinge
Bioinformatics p2-p3-perl-regexes v2013-wim_vancriekinge
 
Basta mastering regex power
Basta mastering regex powerBasta mastering regex power
Basta mastering regex power
 
Php Chapter 4 Training
Php Chapter 4 TrainingPhp Chapter 4 Training
Php Chapter 4 Training
 
Regular Expressions grep and egrep
Regular Expressions grep and egrepRegular Expressions grep and egrep
Regular Expressions grep and egrep
 
Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013Introduction to Regular Expressions RootsTech 2013
Introduction to Regular Expressions RootsTech 2013
 
PERL Regular Expression
PERL Regular ExpressionPERL Regular Expression
PERL Regular Expression
 
Bioinformatica 06-10-2011-p2 introduction
Bioinformatica 06-10-2011-p2 introductionBioinformatica 06-10-2011-p2 introduction
Bioinformatica 06-10-2011-p2 introduction
 
Bioinformatica: Esercizi su Perl, espressioni regolari e altre amenità (BMR G...
Bioinformatica: Esercizi su Perl, espressioni regolari e altre amenità (BMR G...Bioinformatica: Esercizi su Perl, espressioni regolari e altre amenità (BMR G...
Bioinformatica: Esercizi su Perl, espressioni regolari e altre amenità (BMR G...
 
Regular Expressions
Regular ExpressionsRegular Expressions
Regular Expressions
 
Regular Expression
Regular ExpressionRegular Expression
Regular Expression
 
Course 102: Lecture 13: Regular Expressions
Course 102: Lecture 13: Regular Expressions Course 102: Lecture 13: Regular Expressions
Course 102: Lecture 13: Regular Expressions
 
Regexp secrets
Regexp secretsRegexp secrets
Regexp secrets
 
Tutorial on Regular Expression in Perl (perldoc Perlretut)
Tutorial on Regular Expression in Perl (perldoc Perlretut)Tutorial on Regular Expression in Perl (perldoc Perlretut)
Tutorial on Regular Expression in Perl (perldoc Perlretut)
 
Eloquent Ruby chapter 4 - Find The Right String with Regular Expression
Eloquent Ruby chapter 4 - Find The Right String with Regular ExpressionEloquent Ruby chapter 4 - Find The Right String with Regular Expression
Eloquent Ruby chapter 4 - Find The Right String with Regular Expression
 
Introduction to regular expressions
Introduction to regular expressionsIntroduction to regular expressions
Introduction to regular expressions
 
Looking for Patterns
Looking for PatternsLooking for Patterns
Looking for Patterns
 

Mais de Krasimir Berov (Красимир Беров)

Mais de Krasimir Berov (Красимир Беров) (11)

Хешове
ХешовеХешове
Хешове
 
Списъци и масиви
Списъци и масивиСписъци и масиви
Списъци и масиви
 
Скаларни типове данни
Скаларни типове данниСкаларни типове данни
Скаларни типове данни
 
Въведение в Perl
Въведение в PerlВъведение в Perl
Въведение в Perl
 
System Programming and Administration
System Programming and AdministrationSystem Programming and Administration
System Programming and Administration
 
Network programming
Network programmingNetwork programming
Network programming
 
Processes and threads
Processes and threadsProcesses and threads
Processes and threads
 
Working with databases
Working with databasesWorking with databases
Working with databases
 
IO Streams, Files and Directories
IO Streams, Files and DirectoriesIO Streams, Files and Directories
IO Streams, Files and Directories
 
Syntax
SyntaxSyntax
Syntax
 
Introduction to Perl
Introduction to PerlIntroduction to Perl
Introduction to Perl
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Último (20)

MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Working with text, Regular expressions

  • 1. Perl Programming Course Working with text Regular expressions Krassimir Berov I-can.eu
  • 2. Contents 1. Simple word matching 2. Character classes 3. Matching this or that 4. Grouping 5. Extracting matches 6. Matching repetitions 7. Search and replace 8. The split operator
  • 3. Simple word matching • It's all about identifying patterns in text • The simplest regex is simply a word – a string of characters. • A regex consisting of a word matches any string that contains that word • The sense of the match can be reversed by using !~ operator my $string ='some probably big string containing just about anything in it'; print "found 'string'n" if $string =~ /string/; print "it is not about dogsn" if $string !~ /dog/;
  • 4. Simple word matching (2) • The literal string in the regex can be replaced by a variable • If matching against $_ , the $_ =~ part can be omitted: my $string ='stringify this world'; my $word = 'string'; my $animal = 'dog'; print "found '$word'n" if $string =~ /$word/; print "it is not about ${animal}sn" if $string !~ /$animal/; for('dog','string','dog'){ print "$wordn" if /$word/ }
  • 5. Simple word matching (3) • The // default delimiters for a match can be changed to arbitrary delimiters by putting an 'm' in front • Regexes must match a part of the string exactly in order for the statement to be true my $string ='Stringify this world!'; my $word = 'string'; my $animal = 'dog'; print "found '$word'n" if $string =~ m#$word#; print "found '$word' in any casen" if $string =~ m#$word#i; print "it is not about ${animal}sn" if $string !~ m($animal); for('dog','string','Dog'){ local $=$/; print if m|$animal| }
  • 6. Simple word matching (4) • perl will always match at the earliest possible point in the string my $string ='Stringify this stringy world!'; my $word = 'string'; print "found '$word' in any casen" if $string =~ m{$word}i; • Some characters, called metacharacters, are reserved for use in regex notation. The metacharacters are (14): { } [ ] ( ) ^ $ . | * + ? • A metacharacter can be matched by putting a backslash before it print "The string n'$string'n contains a DOTn" if $string =~ m|.|;
  • 7. Simple word matching (5) • Non-printable ASCII characters are represented by escape sequences • Arbitrary bytes are represented by octal escape sequences use utf8; binmode(STDOUT, ':utf8') if $ENV{LANG} =~/UTF-8/; $=$/; my $string ="containsrn Then we have sometttabs. б"; print 'matched б(x{431})' if $string =~ /x{431}/; print 'matched б' if $string =~/б/; print 'matched rn' if $string =~/rn/; print 'The string was:"' . $string.'"';
  • 8. Simple word matching (6) • To specify where it should match, use the anchor metacharacters ^ and $ . use strict; use warnings;$=$/; my $string ='A probably long chunk of text containing strings'; print 'matched "A"' if $string =~ /^A/; print 'matched "strings"' if $string =~ /strings$/; print 'matched "A", matched "strings" and something in between' if $string =~ /^A.*?strings$/;
  • 9. Character classes • A character class allows a set of possible characters, to match • Character classes are denoted by brackets [ ] with the set of characters to be possibly matched inside • The special characters for a character class are - ] ^ $ and are matched using an escape • The special character '-' acts as a range operator within character classes so you can write [0-9] and [a-z]
  • 10. Character classes • Example use strict; use warnings;$=$/; my $string ='A probably long chunk of text containing strings'; my $thing = 'ong ung ang enanything'; my $every = 'iiiiii'; my $nums = 'I have 4325 Euro'; my $class = 'dog'; print 'matched any of a, b or c' if $string =~ /[abc]/; for($thing, $every, $string){ print 'ingy brrrings nothing using: '.$_ if /[$class]/ } print $nums if $nums =~/[0-9]/;
  • 11. Character classes • Perl has several abbreviations for common character classes • d is a digit – [0-9] • s is a whitespace character – [ trnf] • w is a word character (alphanumeric or _) – [0-9a-zA-Z_] • D is a negated d – any character but a digit [^0-9] • S is a negated s; it represents any non-whitespace character [^s] • W is a negated w – any non-word character • The period '.' matches any character but "n" • The dswDSW inside and outside of character classes • The word anchor b matches a boundary between a word character and a non-word character wW or Ww
  • 12. Character classes • Example my $digits ="here are some digits3434 and then "; print 'found digit' if $digits =~/d/; print 'found alphanumeric' if $digits =~/w/; print 'found space' if $digits =~/s/; print 'digit followed by space, followed by letter' if $digits =~/ds[A-z]/;
  • 13. Matching this or that • We can match different character strings with the alternation metacharacter '|' • perl will try to match the regex at the earliest possible point in the string my $digits ="here are some digits3434 and then "; print 'found "are" or "and"' if $digits =~/are|and/;
  • 14. Extracting matches • The grouping metacharacters () also allow the extraction of the parts of a string that matched • For each grouping, the part that matched inside goes into the special variables $1 , $2 , etc. • They can be used just as ordinary variables
  • 15. Extracting matches • The grouping metacharacters () allow a part of a regex to be treated as a single unit • If the groupings in a regex are nested, $1 gets the group with the leftmost opening parenthesis, $2 the next opening parenthesis, etc. my $digits ="here are some digits3434 and then678 "; print 'found a letter followed by leters or digits":'.$1 if $digits =~/[a-z]([a-z]|d+)/; print 'found a letter followed by digits":'.$1 if $digits =~/([a-z](d+))/; # $1 $2 print 'found letters followed by digits":'.$1 if $digits =~/([a-z]+)(d+)/; # $1 $2
  • 16. Matching repetitions • The quantifier metacharacters ?, * , + , and {} allow us to determine the number of repeats of a portion of a regex • Quantifiers are put immediately after the character, character class, or grouping • a? = match 'a' 1 or 0 times • a* = match 'a' 0 or more times, i.e., any number of times • a+ = match 'a' 1 or more times, i.e., at least once • a{n,m} = match at least n times, but not more than m times • a{n,} = match at least n or more times • a{n} = match exactly n times
  • 17. Matching repetitions use strict; use warnings;$=$/; my $digits ="here are some digits3434 and then678 "; print 'found some letters followed by leters or digits":'.$1 .$2 if $digits =~/([a-z]{2,})(w+)/; print 'found three letter followed by digits":'.$1 .$2 if $digits =~/([a-z]{3}(d+))/; print 'found up to four letters followed by digits":'. $1 .$2 if $digits =~/([a-z]{1,4})(d+)/;
  • 18. Matching repetitions • Greeeedy use strict; use warnings;$=$/; my $digits ="here are some digits3434 and then678 "; print 'found as much as possible letters followed by digits":'.$1 .$2 if $digits =~/([a-z]*)(d+)/;
  • 19. Search and replace • Search and replace is performed using s/regex/replacement/modifiers. • The replacement is a Perl double quoted string that replaces in the string whatever is matched with the regex . • The operator =~ is used to associate a string with s///. • If matching against $_ , the $_ =~ can be dropped. • If there is a match, s/// returns the number of substitutions made, otherwise it returns false
  • 20. Search and replace • The matched variables $1 , $2 , etc. are immediately available for use in the replacement expression. • With the global modifier, s///g will search and replace all occurrences of the regex in the string • The evaluation modifier s///e wraps an eval{...} around the replacement string and the evaluated result is substituted for the matched substring. • s/// can use other delimiters, such as s!!! and s{}{}, and even s{}// • If single quotes are used s''', then the regex and replacement are treated as single quoted strings
  • 21. Search and replace • Example #TODO....
  • 22. The split operator • split /regex/, string splits string into a list of substrings and returns that list • The regex determines the character sequence that string is split with respect to #TODO....
  • 23. Regular expressions • Resources • perlrequick - Perl regular expressions quick start • perlre - Perl regular expressions • perlreref - Perl Regular Expressions Reference • Beginning Perl (Chapter 5 – Regular Expressions)