What has attracted researchers from the 1960s to the present day? What do computer science engineers study at university and then promptly forget? What lies at the heart of the compilers that every software developer uses daily? Parsers! Taking a practical point of view with a small dose of theory, this session sheds light on questions such as: if there are so many parser generators based on formal theory, why are javac, GCC and Clang all hand-written? And how do we, insiders of the world of parsing, do this at SonarSource for languages like Java, C/C++, C#, JavaScript, Python and COBOL?
2. @dbolkensteyn @_godin_#parsing 2/56
The Art of Parsing
// TODO: don't forget to add a huge disclaimer that all opinions herein
are our own and not our employer's (they wish they had them)
Evgeny Mandrikov
@_godin_
Dinesh Bolkensteyn
@dbolkensteyn
What is the plan?
Why
• are javac and GCC hand-written?
• do we use parser generators?
Together we will implement a parser for
• arithmetic expressions
• common constructions from Java
• C++ ;)
Arithmetic expressions
expr ➙ NUM – expr
| NUM
expr ➙ expr – expr
| NUM
expr ➙ expr – NUM
| NUM
(4 – 3) – 2 = -1
4 – (3 – 2) = 3
[Figure: two parse trees for 4 – 3 – 2, one grouping it as 4 – (3 – 2) and one as (4 – 3) – 2]
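To see why the choice of grammar matters, here is a minimal sketch of a recursive descent parser written directly from the right-recursive grammar expr ➙ NUM – expr | NUM. The class name, the string-based scanner and the use of the ASCII '-' character are assumptions made for illustration only; they are not taken from the slides.

```java
// Sketch: recursive descent for  expr -> NUM '-' expr | NUM  (right-recursive).
// All names here are hypothetical; input uses ASCII '-' and non-negative integers.
final class RightRecursiveExpr {
    private final String src;
    private int pos = 0;

    RightRecursiveExpr(String src) {
        this.src = src.replace(" ", ""); // whitespace is simply dropped
    }

    static int eval(String src) {
        return new RightRecursiveExpr(src).expr();
    }

    int expr() {
        int left = num();
        if (pos < src.length() && src.charAt(pos) == '-') {
            pos++;                 // consume '-'
            return left - expr();  // the recursive call evaluates everything to the right first
        }
        return left;
    }

    int num() {
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("number expected at position " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("4 - 3 - 2")); // prints 3, i.e. 4 - (3 - 2)
    }
}
```

The parser terminates and accepts the input, but it groups the expression as 4 – (3 – 2), which is not the usual left-associative meaning of subtraction.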
Show me the code
int expr() {
  int res = expr();   // left recursion: expr() calls itself before consuming any token, so it never terminates
  if (token == '-')
    return res - num();
  return num();
}
expr ➙ expr – NUM
| NUM
Show me the right code
expr ➙ expr – NUM
| NUM
int expr() {
  int res = num();
  while (token == '-')   // a loop replaces the left recursion; token advancing is omitted for brevity
    res = res - num();   // folds left to right: (4 - 3) - 2
  return res;
}
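The loop-based version can be turned into a small runnable sketch. The scanner below (a string with a position index, ASCII '-' as the operator) and the class name are assumptions added for illustration; the slide code leaves token handling out.

```java
// Sketch: the left-recursive rule  expr -> expr '-' NUM | NUM  implemented with a loop.
// Hypothetical names; input uses ASCII '-' and non-negative integers.
final class LeftAssocExpr {
    private final String src;
    private int pos = 0;

    LeftAssocExpr(String src) {
        this.src = src.replace(" ", "");
    }

    static int eval(String src) {
        LeftAssocExpr p = new LeftAssocExpr(src);
        int value = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return value;
    }

    int expr() {
        int res = num();
        while (pos < src.length() && src.charAt(pos) == '-') {
            pos++;             // consume '-'
            res = res - num(); // accumulate from the left
        }
        return res;
    }

    int num() {
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("number expected at position " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(eval("4 - 3 - 2")); // prints -1, i.e. (4 - 3) - 2
    }
}
```

Because the running result is updated inside the loop, each subtraction is applied to everything parsed so far, which gives the left-associative grouping without any recursion.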
Show me the code
int subs() {
  int res = mult();        // '*' binds tighter, so each operand of '-' is a whole product
  while (token == '-')
    res = res - mult();
  return res;
}
int mult() {
  int res = num();
  while (token == '*')
    res = res * num();
  return res;
}
subs ➙ subs – mult
| mult
mult ➙ mult * NUM
| NUM
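The two-level grammar above maps onto one method per precedence level. The following is a minimal runnable sketch of that idea; the string scanner, the `at` helper and the class name are hypothetical additions, and ASCII '-' and '*' are assumed as operator characters.

```java
// Sketch: one parsing method per precedence level.
//   subs -> subs '-' mult | mult
//   mult -> mult '*' NUM  | NUM
// Hypothetical names; non-negative integers only.
final class SubsMultParser {
    private final String src;
    private int pos = 0;

    SubsMultParser(String src) {
        this.src = src.replace(" ", "");
    }

    static int eval(String src) {
        SubsMultParser p = new SubsMultParser(src);
        int value = p.subs();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + p.pos);
        }
        return value;
    }

    int subs() {
        int res = mult();
        while (at('-')) {
            pos++;              // consume '-'
            res = res - mult(); // right operand is a complete product
        }
        return res;
    }

    int mult() {
        int res = num();
        while (at('*')) {
            pos++;              // consume '*'
            res = res * num();
        }
        return res;
    }

    int num() {
        int start = pos;
        while (pos < src.length() && Character.isDigit(src.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("number expected at position " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    private boolean at(char c) {
        return pos < src.length() && src.charAt(pos) == c;
    }

    public static void main(String[] args) {
        System.out.println(eval("8 - 2 * 3 - 1")); // prints 1, i.e. (8 - (2 * 3)) - 1
    }
}
```

Adding another precedence level would, under the same scheme, mean adding one more method between `mult` and `num`.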
Quiz
if (false)
if (true) System.out.println("foo");
else System.out.println("bar");
«Dangling else»
if (false)
if (true) System.out.println("foo");
else System.out.println("bar");
if-stmt ➙ IF (cond) stmt ELSE stmt
/ IF (cond) stmt
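The ordered choice in this rule (try the alternative with ELSE first, fall back to the one without) can be sketched in a hand-written parser as "after the then-branch, take an else if one is there". The toy language below is an assumption made for illustration: tokens are separated by whitespace, a condition is a single word, and any other word is a statement. The parser returns a bracketed string that makes the nesting visible.

```java
// Sketch: if-stmt -> IF cond stmt ELSE stmt / IF cond stmt, in left-factored form.
// Toy token format and all names are hypothetical.
final class DanglingElse {
    private final String[] tokens;
    private int pos = 0;

    DanglingElse(String src) {
        this.tokens = src.trim().split("\\s+");
    }

    static String parse(String src) {
        DanglingElse p = new DanglingElse(src);
        String tree = p.stmt();
        if (p.pos != p.tokens.length) {
            throw new IllegalArgumentException("unexpected token: " + p.tokens[p.pos]);
        }
        return tree;
    }

    String stmt() {
        if (peek("if")) {
            pos++;                      // consume 'if'
            String cond = next();
            String thenBranch = stmt(); // an inner 'if' gets the first chance to claim an 'else'
            if (peek("else")) {         // first alternative: the ELSE part is present
                pos++;
                String elseBranch = stmt();
                return "if(" + cond + "){" + thenBranch + "}else{" + elseBranch + "}";
            }
            return "if(" + cond + "){" + thenBranch + "}"; // second alternative: no ELSE
        }
        return next();                  // any other word is a plain statement
    }

    private boolean peek(String expected) {
        return pos < tokens.length && tokens[pos].equals(expected);
    }

    private String next() {
        if (pos >= tokens.length) {
            throw new IllegalArgumentException("unexpected end of input");
        }
        return tokens[pos++];
    }

    public static void main(String[] args) {
        // prints if(false){if(true){foo}else{bar}}
        System.out.println(parse("if false if true foo else bar"));
    }
}
```

The else ends up attached to the inner if, so with an outer condition of false neither branch runs, which matches how Java treats the quiz snippet.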
C++: all the pains of the world
int *B;
typedef int A;
(A)*B; // cast of the dereferenced expression 'B'
       // to type 'A' (an alias for 'int')

int A, B;
(A)*B; // multiplication of 'A' and 'B'
       // with redundant parentheses around 'A'
Java is good because it
learned from the bad experience of C++
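The two readings of (A)*B differ only in whether A currently names a type, so a parser needs information from earlier declarations to choose. The following is a deliberately simplified sketch of that dependency: a set of known type names consulted at the point of decision. The class, its methods and the output strings are hypothetical, and real C++ name lookup (scopes, namespaces, templates) is far more involved.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: deciding how to read "(X)*Y;" based on declarations seen so far.
// Hypothetical and heavily simplified; not how any particular compiler is implemented.
final class CastOrMultiply {
    private final Set<String> typeNames = new HashSet<>();

    // Records that a typedef introduced this name as a type.
    void declareTypedef(String name) {
        typeNames.add(name);
    }

    // Classifies a statement of the exact shape "(x)*y;".
    String classify(String x, String y) {
        if (typeNames.contains(x)) {
            return "cast of *" + y + " to " + x;
        }
        return "multiplication of " + x + " and " + y;
    }

    public static void main(String[] args) {
        CastOrMultiply parser = new CastOrMultiply();
        System.out.println(parser.classify("A", "B")); // prints multiplication of A and B
        parser.declareTypedef("A");
        System.out.println(parser.classify("A", "B")); // prints cast of *B to A
    }
}
```

The same token sequence produces two different results depending on parser state, which is why such a grammar cannot be handled by a purely context-free parser without extra bookkeeping.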
Back to the future «dangling else»
if (…)
if (…) then-stmt
else else-stmt
[Figure: parse tree with nodes outer-if, inner-if, then-stmt and else-stmt; parser position marked as inner-if · else-stmt]