Regular Expressions
      Redux
Scope

• medium to advanced
• 30 minutes
• performance / backtracking irrelevant
• no compatibility charts (yet)
TOC

• basic matching, quantifiers
• character classes, types, properties, anchors
• groups, options, replace string
• look-ahead/behind
• subexpressions
RE overview

              match "foo"           replace with "bar"
Perl          /foo/ (on $_)         s/foo/bar/ (on $_)
JavaScript    /foo/                 "foolish".replace(/foo/, "bar")
Vi            /foo/                 :s/foo/bar/
TextMate      ⌘-F, Find: foo        ⌘-F, Find: foo, Replace: bar
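A minimal sketch of the JavaScript row, assuming a plain string input. The variable names are illustrative only.

```javascript
const text = "foolish";

// Match: does the pattern occur anywhere in the string?
const matched = /foo/.test(text);

// Replace: without the g flag only the first occurrence is replaced
const replaced = text.replace(/foo/, "bar");

console.log(matched, replaced);
```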
Quantifiers
• classic greedy: ?, *, +
• specific: {1,5}, {,5}
  •   ? == {0,1}
  •   * == {0,}
  •   + == {1,}
• non-greedy: ??, *?, +?, {5,7}?
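The equivalences above can be checked with a short JavaScript sketch. The sample string is an assumption, and the bounds are always written out in full because not every engine accepts the {,5} shorthand.

```javascript
const sample = "aaaaaaa"; // seven a's

// ? behaves like {0,1}, * like {0,}, + like {1,}
const pairs = [
  [/^a?/, /^a{0,1}/],
  [/^a*/, /^a{0,}/],
  [/^a+/, /^a{1,}/],
];
const sameResults = pairs.every(
  ([short, explicit]) => sample.match(short)[0] === sample.match(explicit)[0]
);

// Greedy takes as many repetitions as allowed, non-greedy as few as allowed
const greedy = sample.match(/a{1,5}/)[0];
const lazy = sample.match(/a{1,5}?/)[0];

console.log(sameResults, greedy, lazy);
```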
Example
This reveals that plain text is in fact the
technical user's way to regard a file or a
sequence of bytes. In this sense, there is no
plain text.

              /reveal(.*)plain/
             /reveal(.*?)plain/
                  /t.{2,3}t/
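A sketch of how the three patterns behave in JavaScript, assuming the sample text is a single line (the slide only wraps it for display).

```javascript
const text =
  "This reveals that plain text is in fact the technical user's way to " +
  "regard a file or a sequence of bytes. In this sense, there is no plain text.";

// Greedy: (.*) runs to the LAST "plain" in the text
const greedy = text.match(/reveal(.*)plain/)[1];

// Non-greedy: (.*?) stops at the FIRST "plain"
const lazy = text.match(/reveal(.*?)plain/)[1];

// A "t", then two or three arbitrary characters, then another "t"
const firstT = text.match(/t.{2,3}t/)[0];

console.log(JSON.stringify(lazy), greedy.length, firstT);
```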
Character Classes / Properties
• [0-9a-z] (classes)
  •   \+420[0-9]{9} = simplified Czech phone nr.
  •   don't: [A-z0-]
• [a-z&&[^j-n]] == [a-io-z]
• \p{Upper} (properties)
  •   works great on Unicode text (Latin, Katakana)
• [:alnum:], [:^space:] (POSIX bracket)
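A hedged JavaScript sketch of the class examples. JavaScript spells the properties differently from the slide (\p{Lu} and \p{Script=...}, and only with the u flag), so those spellings are a substitution. Class intersection and POSIX brackets are left out because they depend on the engine.

```javascript
// Simplified Czech phone number; the leading + is escaped so it is a literal
const czechPhone = /^\+420[0-9]{9}$/;
const phoneOk = czechPhone.test("+420123456789");
const phoneShort = czechPhone.test("+42012345");

// [A-z] also covers the ASCII punctuation between "Z" and "a": [ \ ] ^ _ `
const sloppyMatchesCaret = /^[A-z]$/.test("^");
const strictMatchesCaret = /^[A-Za-z]$/.test("^");

// Unicode properties: uppercase letters, and a run of Katakana
const upper = "Brno Perl Mongers".match(/\p{Lu}/gu);
const katakana = "カタカナ text".match(/\p{Script=Katakana}+/u)[0];

console.log(phoneOk, phoneShort, sloppyMatchesCaret, upper, katakana);
```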
Character Types
• . == anything (apart from newline)
• \s == space == [\t\n\v\f\r ]
  •   more in Unicode
• \w == word char == approx. [0-9a-zA-Z_]
  •   is complicated in Unicode
• \d == digit == [0-9]
  •   \h == hexadecimal digit == [0-9a-fA-F] (Oniguruma/TextMate; in Perl \h is horizontal whitespace)
• \S \W \D == [^\s] [^\w] [^\d]
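The shorthand types can be tried out in JavaScript as sketched below. The input strings are made up for illustration, and the hexadecimal class is spelled out because JavaScript has no \h.

```javascript
// . does not match a newline by default
const dotMatchesNewline = /^.$/.test("\n");

// \s covers tab, newline and space (among others)
const pieces = "a\tb\nc d".split(/\s/);

// \w is letters, digits and underscore; "-" and " " end a word
const words = "foo_bar-baz 42".match(/\w+/g);

// \d picks digits, the capital \D is its negation
const digits = "tel. 420 123".match(/\d+/g);
const digitsOnly = "a1b2".replace(/\D/g, "");

// Hexadecimal digits written as an explicit class
const hex = "0xCAFE".match(/[0-9a-fA-F]+/g);

console.log(dotMatchesNewline, pieces, words, digits, digitsOnly, hex);
```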
Example
This reveals that plain text is in fact the
technical user's way to regard a file or a
sequence of bytes. In this sense, there is no
plain text.

           /\b[\w&&[^aA]]+\b/
              /\W{2,}\w+\b/
Anchors

• ^ - beginning (line, string)
• $ - end (line, string)
• \b - word boundary ~ \w\W (almost)
  •   \b.{5}\b != \W\w{5}\W
• zero width!
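The difference between a zero-width boundary and a consumed non-word character can be sketched in JavaScript as follows. The sample strings are illustrative assumptions.

```javascript
const text = "hello world";

// \b is zero width, so it also succeeds at the very start and end of the string
const withBoundary = text.match(/\b\w{5}\b/g);

// \W must consume a real character, so neither word qualifies here
const withNonWord = text.match(/\W\w{5}\W/g);

// When the surrounding characters exist, \W makes them part of the match
const consumed = " hello ".match(/\W\w{5}\W/)[0];
const bounded = " hello ".match(/\b\w{5}\b/)[0];

// With the m flag ^ anchors at the beginning of every line
const lineStarts = "one\ntwo".match(/^\w+/gm);

console.log(withBoundary, withNonWord, consumed.length, bounded.length, lineStarts);
```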
Options
• /foo/imsx
  •   i - case insensitive
  •   m - multiline (^, $ match at the start/end of every line, not only of the whole string)
  •   s - single line (. matches newlines)
  •   x - extended!
  •   g - global
• can be written inline
  •   (?imsx-imsx)
  •   (?imsx-imsx:...)

(?x-i)
  #this is cool
  (
    foo #my important value
    | #don't forget the alternative
    bar
  ) # result equals to (foo|bar)
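A sketch of the i, m, s and g options in JavaScript. JavaScript has no x flag, so the last part only imitates extended mode by assembling the pattern from commented string pieces. That is a stand-in technique, not the x option itself.

```javascript
const insensitive = /foo/i.test("FOO"); // i: ignore case

const text = "foo\nbar";
const wholeString = text.match(/^\w+$/g); // no m: ^ and $ only at the string ends
const perLine = text.match(/^\w+$/gm);    // m: ^ and $ at every line

const dotDefault = /foo.bar/.test(text);  // . stops at the newline
const dotAll = /foo.bar/s.test(text);     // s: . also matches the newline

const once = "foo foo".replace(/foo/, "bar"); // first match only
const all = "foo foo".replace(/foo/g, "bar"); // g: every match

// Imitation of extended mode: comments live in JavaScript, not in the pattern
const fooOrBar = new RegExp(
  "(" +
    "foo" + // my important value
    "|" +   // don't forget the alternative
    "bar" +
  ")"
);

console.log(insensitive, wholeString, perLine, dotDefault, dotAll, once, all, fooOrBar.source);
```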
Groups/Replacing
• (...) - matched group
• $1 - $9
  •   alternatively \1 - \9 (not recommended)
• nested groups ordered by left bracket
• (?:...) - non-captured group
  •   useful for (?:foo)+ or (?:foo|bar)
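A short JavaScript sketch of group numbering and non-capturing groups. The date and the sample strings are made-up inputs.

```javascript
// $1..$3 in the replacement string refer to the captured groups
const reordered = "2015-05-28".replace(/(\d{4})-(\d{2})-(\d{2})/, "$3.$2.$1");

// Nested groups are numbered by the position of their opening bracket
const nested = "abc".match(/((a)(b))(c)/).slice(1);

// (?:...) groups for repetition or alternation without creating a capture
const repeated = "foofoofoo!".match(/(?:foo)+/)[0];
const captureCount = "bar".match(/(?:foo|bar)/).length - 1;

console.log(reordered, nested, repeated, captureCount);
```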
Example
"foobarman".replace(
  /(?:f)((o)+)(bar)|(baz|man)/g,
  '$1, $2, $3, $4, $5')

    • foobar                       • man
      •   1 -- oo                    •   1 --
      •   2 -- o                     •   2 --
      •   3 -- bar                   •   3 --
      •   4 --                       •   4 -- man
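The group contents for both matches can be listed with matchAll, as in this sketch. The replacement string here stops at $4 because the pattern defines only four capturing groups. That is a deliberate simplification of the slide's call.

```javascript
const re = /(?:f)((o)+)(bar)|(baz|man)/g;

// One array of captured groups per match; unused groups are undefined
const groups = [..."foobarman".matchAll(re)].map((m) => m.slice(1));

// Unused groups expand to empty strings in the replacement
const result = "foobarman".replace(re, "$1, $2, $3, $4");

console.log(groups, JSON.stringify(result));
```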
Look-ahead/behind
• defines custom zero-width anchors
                   positive negative

          ahead     (?=...)   (?!...)

          behind   (?<=...)   (?<!...)
Example

zdenek@gooddata.com
   /.*?@gooddata/


zdenek@gooddata.com
 /.*?(?=@gooddata)/
Recursive RE

• very important!
 •   quote & bracket matching

 •   technically not part of regular grammar

• two styles
 •   \g<name> or \g<n> - TextMate

 •   (?R) - Perl
Example
(?x:
  \(                # match the initial opening parenthesis

  # Now make a named group 'balanced' which
  # matches a balanced substring.
  (?<balanced>
    [^()]           # A balanced substring is either something
                    # that is not a parenthesis:
  |                 # …or a parenthesised string:
    \(              # A parenthesised string begins with an opening parenthesis
      \g<balanced>* # …followed by a sequence of balanced substrings
    \)              # …and ends with a closing parenthesis
  )*                # Look for a sequence of balanced substrings

  \)                # Finally, the outer closing parenthesis
)

or: \(([^()]|(?R))*\)
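JavaScript's RegExp has no recursion, so the sketch below swaps in a plain depth counter to find the same thing the recursive patterns describe: the first balanced parenthesised substring. The function name firstBalanced is hypothetical, and this is a stand-in technique rather than a recursive regular expression.

```javascript
// Hypothetical helper: returns the first balanced "( ... )" substring, or null
function firstBalanced(text) {
  // Try every opening parenthesis as a start, like a regex engine retrying
  for (
    let start = text.indexOf("(");
    start !== -1;
    start = text.indexOf("(", start + 1)
  ) {
    let depth = 0;
    for (let i = start; i < text.length; i++) {
      if (text[i] === "(") depth++;
      else if (text[i] === ")") depth--;
      // Depth back at zero means the parenthesis opened at start is closed
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null; // no balanced group anywhere in the text
}

console.log(firstBalanced("f(a(b)c)d)"), firstBalanced("((a)"), firstBalanced("none"));
```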


Editor's Notes

  • escaping???
  • examples! possessive (?+, *+, ++)
  • unicode compat table!
  • notice the space at the end, capital reverses
  • how about /g??