Regex lecture

Jun Shimizu, CEO / Director at PT. Buzoo Indonesia
By Niko Adrianus Yuwono
BUZOO PHP TEAM
REGULAR EXPRESSIONS LECTURE
What are Regular Expressions?
• Regular expressions, or regex (the term we’ll mostly use in this presentation), are a powerful tool for examining and modifying text.
• Regex uses a general pattern notation that lets you describe and parse text.
• PHP has supported two types of regular expressions: POSIX-extended and Perl-Compatible Regular Expressions (PCRE). This lecture focuses on PCRE (the POSIX functions were deprecated in PHP 5.3 and removed in PHP 7.0).
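As a small illustration of "examining and modifying text", here is a minimal sketch. It uses Python's `re` module as a stand-in for PHP's PCRE functions (an assumption made for this example; the pattern syntax shown is the same in both), and the sample string is made up.

```python
import re

text = "Order 66 shipped on 2003-02-23"

# Examine: find the first run of digits in the text.
first_number = re.search(r"[0-9]+", text).group(0)

# Modify: replace every run of digits with a placeholder.
masked = re.sub(r"[0-9]+", "#", text)

print(first_number)  # 66
print(masked)        # Order # shipped on #-#-#
```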
Delimiters
• When using the PCRE functions, the pattern must be enclosed in delimiters.
• Commonly used delimiters are forward slashes (/), hash signs (#) and tildes (~).
• Examples of usage (a delimiter that appears inside the pattern must be escaped with a backslash):
  /([^\/ | ^-]+)\.html/
  /<\/span>(.*?)<\/span>/
Literal Characters
• Literal characters are normal characters that match themselves. Alphanumeric characters and symbols with no special meaning are examples of literal characters.
• To match a meta-character literally, put a backslash (\) before it. The backslash tells the engine to treat that character as a literal, not as a meta-character.
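The effect of the backslash can be sketched as follows. The example uses Python's `re` module as a stand-in for PHP's PCRE functions (an assumption for illustration; escaping works the same way in both), and the sample strings are made up.

```python
import re

# Unescaped, the dot is a meta-character and matches any character.
loose = re.search(r"3.14", "3x14") is not None         # True

# Escaped, the dot is a literal and only matches an actual ".".
strict_miss = re.search(r"3\.14", "3x14") is not None  # False
strict_hit = re.search(r"3\.14", "3.14") is not None   # True

# re.escape adds the needed backslashes to arbitrary text for you.
escaped = re.escape("1+1=2")
escaped_ok = re.fullmatch(escaped, "1+1=2") is not None  # True

print(loose, strict_miss, strict_hit, escaped_ok)
```

In PHP the counterpart of `re.escape` is `preg_quote`.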
Meta-characters
• Meta-characters are the main power of regular expressions: they make it possible to encode alternatives and repetitions in the pattern.
• Meta-characters fall into two types: those recognized outside a character class and those recognized inside one.
Meta-characters Cont’d
• Meta-characters recognized outside a class:
  \ , ^ , $ , . , [ , ] , | , ( , ) , ? , * , + , { , }
• Meta-characters recognized inside a class:
  \ , ^ , -
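A short sketch of the inside/outside distinction, using Python's `re` module as a stand-in for PHP's PCRE functions (an assumption for illustration); the subject string is made up.

```python
import re

subject = "a.b$c"

# Outside a class, "." is a meta-character: it matches every character here.
outside = re.findall(r".", subject)         # ['a', '.', 'b', '$', 'c']

# Inside a class, "." and "$" lose their special meaning and match literally.
inside = re.findall(r"[.$]", subject)       # ['.', '$']

# Inside a class, a leading "^" negates and "-" between characters forms a range.
not_lower = re.findall(r"[^a-z]", subject)  # ['.', '$']

print(outside, inside, not_lower)
```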
Character Classes
• A character class starts with an opening square bracket ([) and ends with a closing square bracket (]).
• A character class matches a single character in the subject; the character must be in the set of characters defined by the class.
• Examples:
  [a-z] matches any lowercase letter
  [^A-Z] matches any character that is not an uppercase letter
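The two example classes can be tried out like this. The sketch uses Python's `re` module as a stand-in for PHP's PCRE functions (an assumption for illustration), with a made-up subject string. Each match is exactly one character long.

```python
import re

subject = "Ab3"

# [a-z] matches one lowercase letter at a time.
lower = re.findall(r"[a-z]", subject)       # ['b']

# [^A-Z] matches one character that is NOT an uppercase letter,
# so digits qualify as well as lowercase letters.
non_upper = re.findall(r"[^A-Z]", subject)  # ['b', '3']

print(lower, non_upper)
```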
Subpatterns
• Subpatterns are delimited by parentheses (round brackets) and can be nested.
• Marking part of a pattern as a subpattern does two things:
1. It localizes a set of alternatives. For example, the pattern hen(dy|rio|ri) matches one of the words “hendy”, “henrio”, or “henri”. Without the parentheses, it would match “hendy”, “rio”, or “ri”.
2. It sets up the subpattern as a capturing subpattern: the part of the subject that it matched is captured and can be read back after the match.
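How parentheses localize the alternatives can be checked with a minimal sketch. It uses Python's `re` module as a stand-in for PHP's PCRE functions (an assumption for illustration); the word list is taken from the example above.

```python
import re

words = ["hendy", "henrio", "henri", "rio", "ri"]

# With parentheses, the alternatives only apply to what follows "hen".
grouped = [w for w in words if re.fullmatch(r"hen(dy|rio|ri)", w)]

# Without parentheses, "|" splits the whole pattern into hendy, rio, ri.
ungrouped = [w for w in words if re.fullmatch(r"hendy|rio|ri", w)]

print(grouped)    # ['hendy', 'henrio', 'henri']
print(ungrouped)  # ['hendy', 'rio', 'ri']
```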
Subpatterns Cont’d
• For example, if the string “kafji tinggi” is matched against the pattern ((kafji|niko) (tinggi|tampan)), the captured substrings are “kafji tinggi”, “kafji”, and “tinggi”, numbered 1, 2, and 3. Groups are numbered by the position of their opening parenthesis, from left to right.
• Often we need the grouping but not the capture. In that case, add ?: right after the opening parenthesis, as in (?:kafji|niko), to make the subpattern non-capturing.
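The numbering of captured substrings, and the effect of ?:, can be sketched as follows. The example uses Python's `re` module as a stand-in for PHP's PCRE functions (an assumption for illustration; in PHP the captures would arrive in the matches array of `preg_match`).

```python
import re

subject = "kafji tinggi"

# Three capturing groups, numbered by their opening parenthesis.
capturing = re.search(r"((kafji|niko) (tinggi|tampan))", subject)
all_groups = capturing.groups()   # ('kafji tinggi', 'kafji', 'tinggi')

# (?: ... ) still groups the alternatives but captures nothing,
# so only two numbered groups remain.
partial = re.search(r"((?:kafji|niko) (tinggi|tampan))", subject)
fewer_groups = partial.groups()   # ('kafji tinggi', 'tinggi')

print(all_groups)
print(fewer_groups)
```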
Optional Items
• The question mark makes the preceding token in the regular expression optional.
• Example: colou?r matches both colour and color.
• You can also wrap several characters in parentheses to make them optional as a group.
• Example: Jan(uary)? matches both Jan and January.
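Both optional-item examples can be run as a quick sketch. It uses Python's `re` module as a stand-in for PHP's PCRE functions (an assumption for illustration); the subject strings are made up.

```python
import re

# "u?" makes only the single letter u optional.
colours = re.findall(r"colou?r", "color colour colr")  # ['color', 'colour']

# "(uary)?" makes the whole group optional. finditer is used so that
# we get the full match text rather than the group contents.
months = [m.group(0) for m in re.finditer(r"Jan(uary)?", "Jan, January")]

print(colours)
print(months)  # ['Jan', 'January']
```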
Repetition
• There are two basic repetition characters: star (*) and plus (+).
• The star (*) tries to match the preceding token zero or more times.
• The plus (+) tries to match the preceding token one or more times.
• Examples:
  [\s\S]+ matches any character, one or more times
  [\s\S]* matches any character, zero or more times
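The difference between star and plus shows up on empty or minimal input. The sketch below uses Python's `re` module as a stand-in for PHP's PCRE functions (an assumption for illustration). It writes the "any character" class as `[\s\S]`, that is, whitespace or non-whitespace, which also covers newlines.

```python
import re

# "+" needs at least one occurrence, "*" accepts zero.
star_ok = re.fullmatch(r"ba*", "b") is not None   # True: zero "a"s is fine
plus_ok = re.fullmatch(r"ba+", "b") is not None   # False: needs one "a"

# Both repeat as long as the token keeps matching.
runs = re.findall(r"a+", "caaandy")               # ['aaa']

# [\s\S] matches any character, including the newline.
multi = re.fullmatch(r"[\s\S]+", "line 1\nline 2") is not None  # True
empty_star = re.fullmatch(r"[\s\S]*", "") is not None           # True
empty_plus = re.fullmatch(r"[\s\S]+", "") is not None           # False

print(star_ok, plus_ok, runs, multi, empty_star, empty_plus)
```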
Limiting Repetition
• Sometimes we need to limit the number of repetitions. To do that, use curly braces { }.
• The syntax is {min,max}. min is required. If you leave max empty ({min,}), there is no upper limit. If you omit both the comma and max ({min}), the token is repeated exactly min times.
• Example:
  ([A-Z]{3}|[0-9]{4}) matches three uppercase letters or four digits
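The three forms of the brace quantifier can be compared in a minimal sketch. It uses Python's `re` module as a stand-in for PHP's PCRE functions (an assumption for illustration); the test strings are made up.

```python
import re

pattern = r"([A-Z]{3}|[0-9]{4})"

# {n}: exactly n repetitions.
exact = [s for s in ["ABC", "1234", "AB", "12345"]
         if re.fullmatch(pattern, s)]                      # ['ABC', '1234']

# {min,}: at least min repetitions, no upper limit.
at_least_two = re.findall(r"[0-9]{2,}", "1 22 333")        # ['22', '333']

# {min,max}: between min and max repetitions. On "4444" the match
# stops after three digits, and the leftover single "4" is too short.
two_to_three = re.findall(r"[0-9]{2,3}", "1 22 333 4444")  # ['22', '333', '444']

print(exact, at_least_two, two_to_three)
```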
Greediness
• Greediness concerns what the regex engine does when it has a choice between matching an optional or repeated token and skipping it.
• By default the engine always tries to match the token first. This can cause trouble and return an unexpected result.
• For example, when the regex Feb 23(rd)? is applied to the string Today is Feb 23rd, 2003, the match is always Feb 23rd and never Feb 23.
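The Feb 23 example can be reproduced with a short sketch. It uses Python's `re` module as a stand-in for PHP's PCRE functions (an assumption for illustration); the second subject string is made up to show the case where the optional part is absent.

```python
import re

pattern = r"Feb 23(rd)?"

# The optional group is available in the text, so the engine takes it.
with_suffix = re.search(pattern, "Today is Feb 23rd, 2003").group(0)

# Only when "rd" is absent does the engine fall back to skipping it.
without_suffix = re.search(pattern, "Today is Feb 23, 2003").group(0)

print(with_suffix)     # Feb 23rd
print(without_suffix)  # Feb 23
```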
Greediness Cont’d
• Example with repetition:
• Suppose you want to extract HTML tags while crawling a website. Beginners often use <.+> to match a tag, but it returns a different result than expected. Let’s match that pattern against the string “Saya <b>suka</b> makan”.
• The result is <b>suka</b>.
• Why?
Greediness Cont’d
• The cause is greediness: in the pattern <.+>, the .+ tries to match as many characters as possible.
• Let’s go through it step by step.
• First the engine searches for < in the string “Saya <b>suka</b> makan”, so “Saya ” is skipped.
• After finding <, it runs .+, which means any character, one or more times, so it reads from b all the way to the end of the string. The pattern still needs a >, so the engine backtracks one character at a time until it reaches the last > in the string. The result is therefore <b>suka</b>, not <b> and </b> separately.
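The greedy result described above can be confirmed with a minimal sketch. It uses Python's `re` module as a stand-in for PHP's PCRE functions (an assumption for illustration); the subject is the string from the slide.

```python
import re

subject = "Saya <b>suka</b> makan"

# Greedy: ".+" runs to the end of the string, then backtracks
# only as far as the LAST ">".
greedy = re.findall(r"<.+>", subject)

# The match starts at the first "<" (index 5) and ends just after
# the final ">" (index 16), swallowing both tags and the text between.
span = re.search(r"<.+>", subject).span()

print(greedy)  # ['<b>suka</b>']
print(span)    # (5, 16)
```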
Laziness
• How do we fix the greediness problem? Make the quantifier lazy by adding a question mark after it (*?, +?, ??, or {min,max}?). A lazy quantifier matches as few characters as possible, so <.+?> matches <b> and </b> separately.
• An alternative to laziness is a negated character class.
• Example for the previous problem:
  <[^>]+> matches <, then one or more characters that are not >, then >
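Both fixes can be compared side by side in a minimal sketch. It uses Python's `re` module as a stand-in for PHP's PCRE functions (an assumption for illustration); the subject is the string from the greediness example.

```python
import re

subject = "Saya <b>suka</b> makan"

# Lazy quantifier: ".+?" expands one character at a time and stops
# at the FIRST ">" that lets the whole pattern match.
lazy = re.findall(r"<.+?>", subject)

# Negated class: "[^>]+" can never cross a ">", so no backtracking
# past the end of the tag is possible.
negated = re.findall(r"<[^>]+>", subject)

print(lazy)     # ['<b>', '</b>']
print(negated)  # ['<b>', '</b>']
```

The negated class is usually the cheaper choice, since the engine does not have to try the rest of the pattern after every single character.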