Outline
 Who am I? Why did I do this?
 Introduction to PEG
 Introduction to programming language
 Write a parser in PEG
 No demo QQ
About Me
 葉闆, Yodalee <lc85301@gmail.com>
 Studied EE in college and microwave engineering in graduate school;
now a rookie engineer at Synopsys.
GitHub: yodalee
Blog: http://yodalee.blogspot.tw
Why Did I Do This
 “Understanding Computation: From Simple Machines to Impossible Programs”
 The book implements a programming language parser and a regular expression parser with Ruby Treetop, which is a PEG parser.
 I rewrote all the code in Rust, so I did a little research on PEG.
https://github.com/yodalee/computationbook-rust
Introduction to PEG
Parsing Expression Grammar, PEG
 Bryan Ford, “Parsing Expression Grammars: A Recognition-Based Syntactic Foundation”, 2004
 An alternative to Chomsky's generative grammars that removes ambiguity from the grammar.
 Ambiguity is useful for modeling natural language, but not for a precise, unambiguous programming language.
Language <- Subject Verb Noun
Subject <- He | Lisa …
Verb <- is | has | sees…
Noun <- student | a toy …
PEG Basic Rule
 By definition, a PEG is very similar to a CFG: it is composed of rules.
 Applying a rule has one of three outcomes:
 Match succeeds: the input is consumed.
 Match fails: no input is consumed.
 As a predicate: only reports success or failure; no input is consumed.
PEG Basic Rule
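 Replace the choice ‘|’ with the prioritized choice ‘/’.
 Consider the following:
 CFG: A = "a" | "ab"
PEG: A = "a" / "ab" (the second alternative can never match: it is only tried after "a" has already failed)
 PEG: A = a* a (never matches: a* is greedy and leaves no "a" for the final a)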
Operator      Meaning
""            String literal
[]            Character set
.             Any character
(e1 e2 ..)    Grouping
e? e+ e*      Optional, repetition
&e            And-predicate
!e            Not-predicate
e1 e2         Sequence
e1 / e2       Prioritized choice
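The two PEG rules A = "a" / "ab" and A = a* a can be tried out with a few hand-written matchers. This is only a minimal sketch, not how any PEG library works internally: each matcher takes the input and a position and returns `Some(new_position)` on success, or `None` on failure without consuming anything. All function names here (`lit`, `choice_a_ab`, `star_a_then_a`, `not_pred`, `matches_all`) are made up for the illustration.

```rust
/// Match a string literal at `pos`; return the new position on success.
fn lit(input: &str, pos: usize, s: &str) -> Option<usize> {
    if input[pos..].starts_with(s) {
        Some(pos + s.len())
    } else {
        None
    }
}

/// PEG rule  A <- "a" / "ab"  (prioritized choice: try "a" first).
fn choice_a_ab(input: &str, pos: usize) -> Option<usize> {
    lit(input, pos, "a").or_else(|| lit(input, pos, "ab"))
}

/// PEG rule  A <- a* a  (a* is greedy and never gives characters back).
fn star_a_then_a(input: &str, pos: usize) -> Option<usize> {
    let mut p = pos;
    while let Some(next) = lit(input, p, "a") {
        p = next; // a* consumes every available "a"
    }
    lit(input, p, "a") // nothing is left for the final "a"
}

/// Not-predicate !e: succeeds without consuming input exactly when `e` fails.
fn not_pred(result_of_e: Option<usize>, pos: usize) -> Option<usize> {
    match result_of_e {
        Some(_) => None,
        None => Some(pos),
    }
}

/// A rule matches the whole input only if it stops at the end of the input.
fn matches_all(input: &str, end: Option<usize>) -> bool {
    end == Some(input.len())
}
```

On the input "ab", `choice_a_ab` stops after the first character, so the rule cannot match the whole string, and `star_a_then_a` fails on any run of "a".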
Some Examples
 NUMBER <- [1-9] [0-9]*
 COMMENT <- "//" (!"\n" .)* "\n"
 EXPRESSION <- TERM ([+-] TERM)*
TERM <- FACTOR ([*/] FACTOR)*
 STAT_IF <-
"if" COND "then" STATEMENT "else" STATEMENT /
"if" COND "then" STATEMENT
PEG is not CFG
 PEG is equivalent in power to the Top-Down Parsing Language (TDPL).
 The language a^n b^n c^n is not context-free, but a PEG can recognize it with the and-predicate.
 In CFG, A <- aAa | a matches any odd number of "a".
In PEG, A <- aAa / a matches only 2^n - 1 "a".
 Whether every context-free language can be recognized by a PEG is an open problem.
A <- "a" A? "b"
B <- "b" B? "c"
S <- &(A "c") "a"+ B !.
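A hand-written recognizer for a^n b^n c^n (n >= 1) shows how the and-predicate does the work. This sketch assumes the formulation S <- &(A "c") "a"+ B !. with A <- "a" A? "b" and B <- "b" B? "c". A is kept non-nullable on purpose: if A could match the empty string, the lookahead would also succeed on inputs such as aaabbcc. The helper names (`byte`, `rule_a`, `rule_b`, `is_anbncn`) are made up for the illustration.

```rust
/// Consume one expected byte at `pos`, or fail without consuming.
fn byte(input: &[u8], pos: usize, b: u8) -> Option<usize> {
    if input.get(pos) == Some(&b) {
        Some(pos + 1)
    } else {
        None
    }
}

/// A <- "a" A? "b"   (matches a^k b^k, k >= 1)
fn rule_a(input: &[u8], pos: usize) -> Option<usize> {
    let p = byte(input, pos, b'a')?;
    let p = rule_a(input, p).unwrap_or(p); // A? keeps the position on failure
    byte(input, p, b'b')
}

/// B <- "b" B? "c"   (matches b^k c^k, k >= 1)
fn rule_b(input: &[u8], pos: usize) -> Option<usize> {
    let p = byte(input, pos, b'b')?;
    let p = rule_b(input, p).unwrap_or(p);
    byte(input, p, b'c')
}

/// S <- &(A "c") "a"+ B !.
fn is_anbncn(s: &str) -> bool {
    let input = s.as_bytes();
    // And-predicate: a^n b^n must be followed by "c"; it consumes nothing.
    let lookahead = rule_a(input, 0).and_then(|p| byte(input, p, b'c'));
    if lookahead.is_none() {
        return false;
    }
    // "a"+ : the lookahead already guarantees at least one "a".
    let mut pos = 0;
    while let Some(p) = byte(input, pos, b'a') {
        pos = p;
    }
    // B must consume the rest; `!.` means no input may be left over.
    rule_b(input, pos) == Some(input.len())
}
```

The lookahead checks that the counts of a and b agree, and B then checks that the counts of b and c agree.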
Using PEG
 Many libraries support PEG:
 Rust: rust-peg, pest, nom-peg …
 C++: PEGTL, Boost …
 Ruby: kpeg, raabro, Treetop …
 Python: pyPEG, parsimonious …
 Haskell: Peggy …
 …
 So why Rust?
Introduction to Programming Language
Simple Language
 Three types of statements: assignment, if/else, and while.
 Supports integer arithmetic.
 Supports pairs, lists, and functions with one argument.
Simple, but we can still do fairly complex things, like recursion and map.
factorfun = function factor(x) {
    if (x > 1) { x * factor(x - 1) } else { 1 }
}
result = factorfun(10); // 3628800
function last(l) {
    if (isnothing(snd(l))) {
        fst(l)
    } else {
        last(snd(l))
    }
}
Abstract Syntax Tree
 Use a Rust enum whose variants carry their payload.
 “Programming” then looks like this:
pub enum Node {
    Number(i64),
    Boolean(bool),
    Add(Box<Node>, Box<Node>),
    Subtract(Box<Node>, Box<Node>),
    LT(Box<Node>, Box<Node>),
    …
}
let n = Node::add(Node::number(3), Node::number(4));
[Figure: AST tree diagram showing an Add node with children 3 and 4, and an LT node with the number 8]
Abstract Syntax Tree
 All statements are Nodes as well:
pub enum Node {
    Variable(String),
    Assign(String, Box<Node>),
    If(Box<Node>, Box<Node>, Box<Node>),
    While(Box<Node>, Box<Node>),
    …
}
Pair, List and Nothing
 Node::pair(Node::number(3), Node::number(4))
 List [3,4,5] = pair(3, pair(4, pair(5, nothing)))
 Nothing special
[Figure: tree diagrams of Pair(3, 4) and of the list [3,4,5] as nested Pairs ending in Nothing]
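The list encoding pair(3, pair(4, pair(5, nothing))) can be sketched in a few lines. This is a cut-down Node with only the variants needed here; the lower-case helpers follow the Node::number / Node::pair style of the slides, while `list` and `to_vec` are hypothetical helpers added for the illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Number(i64),
    Pair(Box<Node>, Box<Node>),
    Nothing,
}

impl Node {
    // Lower-case constructors that box the node for us.
    pub fn number(n: i64) -> Box<Node> {
        Box::new(Node::Number(n))
    }
    pub fn pair(l: Box<Node>, r: Box<Node>) -> Box<Node> {
        Box::new(Node::Pair(l, r))
    }
    pub fn nothing() -> Box<Node> {
        Box::new(Node::Nothing)
    }
}

/// Build pair(3, pair(4, pair(5, nothing))) from a slice, back to front.
pub fn list(items: &[i64]) -> Box<Node> {
    items
        .iter()
        .rev()
        .fold(Node::nothing(), |tail, &n| Node::pair(Node::number(n), tail))
}

/// Walk the nested pairs and collect the numbers until Nothing is reached.
pub fn to_vec(list: &Node) -> Vec<i64> {
    let mut out = Vec::new();
    let mut cur = list;
    while let Node::Pair(head, tail) = cur {
        if let Node::Number(n) = **head {
            out.push(n);
        }
        cur = &**tail;
    }
    out
}
```

A list is just a chain of pairs, so no dedicated list node is needed in the AST.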
Environment and Machine
 The Environment stores a HashMap<String, Box<Node>> and exposes an add/get interface.
 A Machine takes an AST and an Environment, and evaluates the AST inside the machine.
pub struct Environment {
    pub vars: HashMap<String, Box<Node>>,
}
pub struct Machine {
    pub environment: Environment,
    expression: Box<Node>,
}
Evaluate the AST
 Add an evaluate function to every AST node using a trait.
 The result is a new Node.
fn evaluate(&self, env: &mut Environment) -> Box<Node> {
    match *self {
        Node::Add(ref l, ref r) => {
            Node::number(l.evaluate(env).value() + r.evaluate(env).value())
        }
        …
    }
}
Evaluate the AST
 How do we evaluate a While node (condition, body)?
 Evaluate the condition; if it is true, evaluate the body and then the While node itself again.
x = 3;
while (x < 9) { x = x * 2; }
Evaluate x = 3
Evaluate while (x < 9) x = x * 2      (x = 3, condition true)
    Evaluate x = x * 2                (x becomes 6)
Evaluate while (x < 9) x = x * 2      (x = 6, condition true)
    Evaluate x = x * 2                (x becomes 12)
Evaluate while (x < 9) x = x * 2      (x = 12, condition false, stop)
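The trace for x = 3; while (x < 9) { x = x * 2; } can be reproduced with a small evaluator. This is a simplified sketch, not the code from the repository: it returns Node by value instead of Box<Node>, stores plain Nodes in the environment, and uses a `DoNothing` variant (an assumption, not shown on the slides) as the result of a statement. `run_example` is a hypothetical helper.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Number(i64),
    Boolean(bool),
    Variable(String),
    Multiply(Box<Node>, Box<Node>),
    LT(Box<Node>, Box<Node>),
    Assign(String, Box<Node>),
    While(Box<Node>, Box<Node>),
    DoNothing, // assumed result of evaluating a statement
}

#[derive(Default)]
pub struct Environment {
    pub vars: HashMap<String, Node>,
}

impl Node {
    fn value(&self) -> i64 {
        match self {
            Node::Number(n) => *n,
            other => panic!("not a number: {:?}", other),
        }
    }

    pub fn evaluate(&self, env: &mut Environment) -> Node {
        match self {
            Node::Number(_) | Node::Boolean(_) | Node::DoNothing => self.clone(),
            // Panics on an undefined variable; good enough for a sketch.
            Node::Variable(name) => env.vars[name].clone(),
            Node::Multiply(l, r) => {
                Node::Number(l.evaluate(env).value() * r.evaluate(env).value())
            }
            Node::LT(l, r) => Node::Boolean(l.evaluate(env).value() < r.evaluate(env).value()),
            Node::Assign(name, expr) => {
                let v = expr.evaluate(env);
                env.vars.insert(name.clone(), v);
                Node::DoNothing
            }
            Node::While(cond, body) => {
                if cond.evaluate(env) == Node::Boolean(true) {
                    body.evaluate(env);
                    self.evaluate(env) // evaluate the While node itself again
                } else {
                    Node::DoNothing
                }
            }
        }
    }
}

/// Runs: x = 3; while (x < 9) { x = x * 2; } and returns the final x.
pub fn run_example() -> i64 {
    let mut env = Environment::default();
    let x = || Box::new(Node::Variable("x".to_string()));
    Node::Assign("x".to_string(), Box::new(Node::Number(3))).evaluate(&mut env);
    Node::While(
        Box::new(Node::LT(x(), Box::new(Node::Number(9)))),
        Box::new(Node::Assign(
            "x".to_string(),
            Box::new(Node::Multiply(x(), Box::new(Node::Number(2)))),
        )),
    )
    .evaluate(&mut env);
    env.vars["x"].value()
}
```

The loop needs no loop construct in the evaluator: the While arm simply calls evaluate on itself after the body.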
Function
 A function is also a kind of Node. When evaluated, it is wrapped into a Closure together with the environment at that time.
 A Call evaluates the function body in the closure's environment.
Node::Fun(String, String, Box<Node>)    // name, argument name, body
Node::Closure(Environment, Box<Node>)   // captured environment, function
fn evaluate(&self, env: &mut Environment) -> Box<Node> {
    match *self {
        Node::Fun(ref name, ref arg, ref body) => {
            Node::closure(env.clone(), Box::new(self.clone()))
        }
        // other variants omitted
    }
}
Call a Function
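fn evaluate(&self, env: &mut Environment) -> Box<Node> {
    match *self {
        Node::Call(ref closure, ref arg) => match **closure {
            Node::Closure(ref closure_env, ref fun) => {
                if let Node::Fun(funname, argname, body) = *fun.clone() {
                    // Start from the environment captured by the closure.
                    let mut newenv = closure_env.clone();
                    // Bind the function's own name so the body can recurse.
                    newenv.add(&funname, closure.evaluate(env));
                    // Evaluate the argument in the caller's environment.
                    newenv.add(&argname, arg.evaluate(env));
                    body.evaluate(&mut newenv)
                }
                // remaining cases omitted
            }
        },
    }
}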
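Closure creation and calling can be sketched end to end with a cut-down Node. The assumptions are loud ones: evaluate returns Node by value, the environment is passed immutably because this sketch has no assignment, and the callee expression is evaluated to a Closure inside the Call arm. `run_example` is a hypothetical helper that evaluates addx(3)(4) for function addx(x) { function addy(y) { x + y } }.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Number(i64),
    Variable(String),
    Add(Box<Node>, Box<Node>),
    Fun(String, String, Box<Node>),  // name, argument name, body
    Closure(Environment, Box<Node>), // captured environment, Fun node
    Call(Box<Node>, Box<Node>),      // callee expression, argument
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct Environment {
    pub vars: HashMap<String, Node>,
}

impl Node {
    pub fn evaluate(&self, env: &Environment) -> Node {
        match self {
            Node::Number(_) | Node::Closure(..) => self.clone(),
            Node::Variable(name) => env.vars[name].clone(),
            Node::Add(l, r) => match (l.evaluate(env), r.evaluate(env)) {
                (Node::Number(a), Node::Number(b)) => Node::Number(a + b),
                other => panic!("cannot add {:?}", other),
            },
            // A function evaluates to a closure over the current environment.
            Node::Fun(..) => Node::Closure(env.clone(), Box::new(self.clone())),
            Node::Call(callee, arg) => {
                let closure = callee.evaluate(env);
                if let Node::Closure(captured, fun) = &closure {
                    if let Node::Fun(funname, argname, body) = &**fun {
                        let mut newenv = captured.clone();
                        // Bind the function's own name so it can recurse.
                        newenv.vars.insert(funname.clone(), closure.clone());
                        // The argument is evaluated in the caller's environment.
                        newenv.vars.insert(argname.clone(), arg.evaluate(env));
                        return body.evaluate(&newenv);
                    }
                }
                panic!("callee is not a closure: {:?}", closure)
            }
        }
    }
}

/// Evaluates addx(3)(4).
pub fn run_example() -> Node {
    let var = |s: &str| Box::new(Node::Variable(s.to_string()));
    let addy = Node::Fun("addy".into(), "y".into(), Box::new(Node::Add(var("x"), var("y"))));
    let addx = Node::Fun("addx".into(), "x".into(), Box::new(addy));
    let call = Node::Call(
        Box::new(Node::Call(Box::new(addx), Box::new(Node::Number(3)))),
        Box::new(Node::Number(4)),
    );
    call.evaluate(&Environment::default())
}
```

The inner function still sees x = 3 when it is called later, because the closure captured the environment in which addy was evaluated.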
Free Variable
 Compute the free variables of a function to avoid copying the whole environment.
 Node::Variable
 Node::Assign
 Node::Fun
function addx(x) { function addy(y) { x + y }}
-> no free variables
function addy(x) { x + y }
-> free variable y
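A free-variable computation over a cut-down Node might look like the sketch below. How Assign is treated is an assumption of this sketch: only the variables used on its right-hand side are counted. The function name `free_vars` matches the call used on the slides, but its signature and return type here are guesses.

```rust
use std::collections::BTreeSet;

#[derive(Debug, Clone)]
pub enum Node {
    Number(i64),
    Variable(String),
    Add(Box<Node>, Box<Node>),
    Assign(String, Box<Node>),
    Fun(String, String, Box<Node>), // name, argument name, body
}

/// Collect the variables used in `node` that are not bound inside it.
pub fn free_vars(node: &Node) -> BTreeSet<String> {
    match node {
        Node::Number(_) => BTreeSet::new(),
        Node::Variable(name) => BTreeSet::from([name.clone()]),
        Node::Add(l, r) => {
            let mut vars = free_vars(l);
            vars.extend(free_vars(r));
            vars
        }
        // Assumption: only the right-hand side contributes free variables.
        Node::Assign(_, expr) => free_vars(expr),
        Node::Fun(name, arg, body) => {
            let mut vars = free_vars(body);
            vars.remove(arg); // the argument is bound by the function
            vars.remove(name); // so is the function's own name (recursion)
            vars
        }
    }
}
```

For function addx(x) { function addy(y) { x + y } } the result is empty, and for function addy(x) { x + y } it is {y}, matching the two examples above.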
Call a Function
if let Node::Fun(funname, argname, body) = *fun.clone() {
    // Copy only the free variables instead of cloning the whole environment.
    let mut newenv = Environment { vars: HashMap::new() };
    for var in free_vars(fun) {
        newenv.add(var, env.get(var));
    }
    newenv.add(&funname, closure.evaluate(env));
    newenv.add(&argname, arg.evaluate(env));
    body.evaluate(&mut newenv)
}
What is a Language?
 A language makes some concepts abstract, like a virtual machine. Designing a language means designing that abstraction.
 The function “evaluate” implements the concept. Of course, we could implement it as anything, such as returning 42 on every evaluation.
Concept     Simple (virtual machine)   Real machine
Number 3    Node::number(3)            0b11 in memory
+           Node::add(l, r)            add r1 r2
Choice      Node::if                   branch instruction
What is a Language?
 Abstraction brings some precision issues, as with floating point. We have no way to express the concept of <infinite>.
 We can create a language for geometry as below. Which representation of a line is best?
 Consider all the pros and cons each abstraction brings.
Concept        In programming language
Point          (x: u32, y: u32)
Line           (Point, Point)
               (Point, Slope)
               (Point, Point, type{vertical, horizontal, angled})
Intersection   Calculate intersection
Implement a Parser with PEG
The Pest Package
 Rust Pest
 https://github.com/pest-parser/pest
 My simple language parser grammar at:
 https://github.com/yodalee/simplelang
 Parsing Flow
Source code → Grammar parser → Pest Pair structure → Simple AST
The Pest Package
use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar = "simple.pest"]
struct SimpleParser;

let pairs = SimpleParser::parse(Rule::simple, "<source code>");
 A Pair represents the parse result of a rule.
 Pair.as_rule() => the rule
 Pair.as_span() => the matched span
 Pair.as_str() => the matched text
 Pair.into_inner() => the sub-rules
Grammar <-> Build AST
number   = @{ ASCII_NONZERO_DIGIT ~ ASCII_DIGIT* }
variable = @{ ASCII_ALPHA ~ ASCII_ALPHANUMERIC* }
call     = { variable ~ "(" ~ expr ~ ")" }
factor   = { "(" ~ expr ~ ")" | call | variable | number }
fn build_factor(pair: Pair<Rule>) -> Box<Node> {
    match pair.as_rule() {
        Rule::number => Node::number(pair.as_str().parse::<i64>().unwrap()),
        Rule::variable => Node::variable(pair.as_str()),
        Rule::expr => ...,
        Rule::call => ...,
    }
}
Climb the Expression
 An expression can be written as a single rule:
expr = { factor ~ (op_binary ~ factor)* }
 Pest provides a precedence-climbing template; you only define:
 A build-factor function => creates a Factor node
 An infix-rule function => creates an Operator node
 Operator precedence => a vector of operators with their precedence and left/right associativity
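The division of labour (a factor builder, an infix builder, and a precedence table) can be shown without Pest at all. The sketch below is a hand-written precedence climber over a token slice; it is not Pest's API, the `Token` type and all function names are made up, and every operator is treated as left-associative.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Number(i64),
    Binary(char, Box<Node>, Box<Node>),
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Token {
    Num(i64),
    Op(char),
}

/// Precedence table: a higher number binds tighter.
fn precedence(op: char) -> Option<u8> {
    match op {
        '<' => Some(1),
        '+' | '-' => Some(2),
        '*' | '/' => Some(3),
        _ => None,
    }
}

/// "Build factor": turn one primary token into a Node.
fn build_factor(tokens: &[Token], pos: &mut usize) -> Node {
    match tokens.get(*pos) {
        Some(Token::Num(n)) => {
            *pos += 1;
            Node::Number(*n)
        }
        other => panic!("expected a number, found {:?}", other),
    }
}

/// "Infix rule": combine two operands with an operator.
fn infix(lhs: Node, op: char, rhs: Node) -> Node {
    Node::Binary(op, Box::new(lhs), Box::new(rhs))
}

/// Parse factor (op factor)* while honouring the precedence table.
pub fn climb(tokens: &[Token], pos: &mut usize, min_prec: u8) -> Node {
    let mut lhs = build_factor(tokens, pos);
    while let Some(&Token::Op(op)) = tokens.get(*pos) {
        let prec = match precedence(op) {
            Some(p) if p >= min_prec => p,
            _ => break,
        };
        *pos += 1;
        // Left-associative: the right operand may only contain tighter operators.
        let rhs = climb(tokens, pos, prec + 1);
        lhs = infix(lhs, op, rhs);
    }
    lhs
}

/// Evaluate the tree, just to make the grouping visible.
pub fn eval(node: &Node) -> i64 {
    match node {
        Node::Number(n) => *n,
        Node::Binary(op, l, r) => {
            let (a, b) = (eval(l), eval(r));
            match *op {
                '+' => a + b,
                '-' => a - b,
                '*' => a * b,
                '/' => a / b,
                '<' => (a < b) as i64,
                _ => panic!("unknown operator {}", op),
            }
        }
    }
}
```

Calling `climb(&tokens, &mut pos, 0)` on the tokens of 1 + 2 * 3 - 4 groups them as (1 + (2 * 3)) - 4, even though the grammar rule itself is a flat list of factors and operators.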
Challenges
 Producing good error messages for syntax errors.
 How to deal with optional parts, like the clauses of a C for loop?
 A more systematic way to deal with a large language, like C.
// Original CFG (left-recursive)
compound_statement <- block_list
block_list <- block_list block | ε
block <- declaration_list | statement_list
declaration_list <- declaration_list declaration | ε
statement_list <- statement_list statement | ε
// Wrong PEG: block can match the empty string, so block* makes no progress
compound_statement <- block*
block <- declaration* ~ statement*
// Correct PEG
compound_statement <- block*
block <- (declaration | statement)+
Conclusion
 PEG is a newer grammar formalism than CFG, and an unambiguous one. It is fast and convenient for creating a parser for a small language.
 The most important concept in programming languages? Abstraction.
 Is there a best abstraction? No. It is engineering.
Reference
 Bryan Ford, “Parsing Expression Grammars: A Recognition-Based Syntactic Foundation”
 “Understanding Computation: From Simple Machines to Impossible Programs”
 “Programming Languages, Part B” on Coursera, University of Washington
Thank You for Listening
IB502 1430 – 1510
Build Yourself a Nixie Tube Clock

Use PEG to Write a Programming Language Parser
