GNU GCC - what just a compiler...?
2012
Saket Kr. Pathak
Software developer
3D Graphics
A quick overview of GCC, the GNU Compiler Collection.
Most of us have learnt or studied compilers, and the languages these compilers support,
at some point in the past (*however many multiples of 365 days ago that might be ... :) ),
and many of us had a paper titled "Compiler Design" ... or some other name with a
similar syllabus. That's really great, mate, because I never got the fortunate chance to
study "Compiler Design" and all. Whatever ... it was my own interest and good fortune
that led me to something valuable and sensible to learn and study.
How many times has anyone asked you which compiler you work with? This question comes
up for working professionals and students alike. I have been asked a few times myself,
and most of the time I had an answer ready.
My answer was GNU GCC or VC++, depending on the environment. Today I realized this
is quite a foolish answer. It is as if someone asked you about the flavor of the coffee (on
a date with a friend) ... and you replied ... "The coffee was good" ... what a sense of
humor ... haaa haa. :)
So I realized it and studied my answer more closely, in terms of the GNU GCC compiler.
GCC stands for GNU Compiler Collection. It was originally named the GNU C Compiler
because it only handled the C programming language. GCC 1.0 was released in 1987,
and the compiler was extended to compile C++ in December of that year. Later on,
compilers for languages like Objective-C, Objective-C++, Fortran, Java, Ada, and Go
were added to it.
Since it is a collection of compilers, "GCC" alone can't be the exact answer when the
question is about the type and version of your compiler. For C++, the compiler within
this collection has a specific name, g++; similarly, for C it is the GNU C Compiler (gcc).
Many other languages, with their compilers, are supported by GCC, as listed below:
Seq. Language Compiler
1. C gcc
2. C++ g++
3. Objective-C gobjc
4. Fortran gfortran
5. Java gcj
6. Ada GNAT
7. Go gccgo
8. Pascal gpc
9. Mercury Mercury
10. PL/I PL/1
11. VHDL GHDL
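To be specific about both the compiler and its version, a program can simply ask the compiler. The C sketch below relies only on predefined macros: `__GNUC__`, `__GNUC_MINOR__` and `__GNUC_PATCHLEVEL__` are defined by GCC, while `__clang__` is defined by Clang, which also defines `__GNUC__` for compatibility, so it is checked first. The function name `describe_compiler` is a hypothetical helper for illustration; from the shell, `gcc --version` or `g++ --version` gives the same information.

```c
#include <stdio.h>

/* Writes a specific compiler identification (driver name and version)
 * into buf and returns what snprintf returns. Uses only macros that the
 * compilers predefine; Clang also defines __GNUC__, so test it first. */
int describe_compiler(char *buf, size_t size)
{
#if defined(__clang__)
    return snprintf(buf, size, "clang %d.%d.%d (not GCC)",
                    __clang_major__, __clang_minor__, __clang_patchlevel__);
#elif defined(__GNUC__)
#  if defined(__cplusplus)
    const char *driver = "g++ (GNU C++ compiler)";
#  else
    const char *driver = "gcc (GNU C compiler)";
#  endif
    return snprintf(buf, size, "%s %d.%d.%d", driver,
                    __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__);
#else
    return snprintf(buf, size, "unknown (not GCC)");
#endif
}
```

Calling `describe_compiler(buf, sizeof buf)` and printing `buf` gives an answer such as "gcc (GNU C compiler)" followed by the exact version, which is the kind of specific answer the question deserves.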
Saket Kr. Pathak Page 2
The GNU Project has some component modules that I found worth discussing here as an
add-on, because I wondered how this long list of compilers could be handled within a
single GNU toolchain, with all its compiler logic, programming libraries and syntaxes.
Then I found a quite insightful overview of the GCC architecture, which is divided into
three hierarchical modules; each compiler includes the following three components:
a Front End, a Middle End, and a Back End. GCC compiles one file at a time. A source
file goes through all three components one after another. These three components are
discussed in a bit more detail as follows:
GCC basic components
Front-End The purpose of the front end is to read the source file, parse it, and convert it into
the standard Abstract Syntax Tree (AST) representation. There is one front end
for each programming language. Because of the differences between languages, the
format of the generated ASTs is slightly different for each language. The next
step after AST generation is the unification step, in which the AST is
converted into a unified form called GENERIC.
Middle-End Next, the middle end of the compiler takes control. First, the tree is converted into
another representation called GIMPLE. In this form, each expression contains no
more than three operands, all control flow constructs are represented as
combinations of conditional statements and goto operators, and the arguments of a
function call can only be variables or constants. GIMPLE is a convenient representation
for optimizing the source code. From GIMPLE, the code is converted into the
Static Single Assignment (SSA) representation, in which each variable is assigned
only once but can be used on the right-hand side of an expression any number of times.
GCC performs more than 20 different optimizations on SSA trees. The tree is then
converted back to GIMPLE form, which is used to generate a Register Transfer
Language (RTL) form of the tree. RTL is a hardware-based representation that
corresponds to an abstract target architecture with an infinite number of registers. An
RTL optimization pass then optimizes the tree in RTL form.
Back-End Finally, a GCC back end generates the assembly code for the target architecture
using the RTL representation. Examples of back ends are the x86 back end, the MIPS
back end, etc.
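The SSA idea can be illustrated in plain C. The sketch below (function names are hypothetical) writes the same computation twice: once with a variable that is reassigned, and once with every name assigned exactly once. The conditional expression stands in for the phi node that real SSA inserts where two control-flow paths join; unlike true SSA, the "then" value is computed unconditionally here to keep the example short. GCC's own SSA form can be inspected with its `-fdump-tree-ssa` option, whose output looks different from this hand-written version.

```c
/* Ordinary form: x is assigned up to three times. */
int sum_then_adjust(int a, int b)
{
    int x = a + b;
    x = x * 2;
    if (a > b)
        x = x - a;
    return x;
}

/* SSA-style form of the same function: every name is assigned exactly
 * once. x4 plays the role of the phi node x4 = PHI(x3, x2) that merges
 * the value from the "then" path (x3) and the fall-through path (x2). */
int sum_then_adjust_ssa(int a, int b)
{
    const int x1 = a + b;
    const int x2 = x1 * 2;
    const int x3 = x2 - a;              /* computed unconditionally here */
    const int x4 = (a > b) ? x3 : x2;   /* stands in for PHI(x3, x2) */
    return x4;
}
```

Because each name in the second version has a single definition, a reader (or an optimizer) can see at a glance where every value comes from, which is what makes SSA convenient for the optimizations mentioned above.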
Hence, the short descriptions above give us an overview of all three components.
Front-End:
Front ends vary internally, but all have to produce trees that can be handled by the back end.
Currently, the parsers are all hand-coded recursive descent parsers, though there is
no reason why a parser generator could not be used for new front ends in the future;
indeed, version 2 of the C compiler used a Bison-based grammar. A recursive descent
parser is a top-down parser built from a set of mutually recursive procedures (or a
non-recursive equivalent), where each such procedure usually implements one of the
production rules of the grammar. GNU Bison, commonly known as Bison, is a parser
generator that is part of the GNU Project. Bison reads a specification of a context-free
language, warns about any parsing ambiguities, and generates a parser (in C, C++, or
Java) that reads sequences of tokens and decides whether the sequence conforms to
the syntax specified by the grammar.
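A recursive descent parser is easy to sketch for a tiny grammar. The example below is a hypothetical, self-contained C parser for integer arithmetic; it is not taken from GCC. Each of `parse_expr`, `parse_term` and `parse_factor` implements one production rule, and they call each other recursively. To stay short, each function returns the value of what it parsed instead of building an AST node, which is what a real front end would do.

```c
#include <ctype.h>

/* Grammar, one function per production rule:
 *   expr   := term   (('+' | '-') term)*
 *   term   := factor (('*' | '/') factor)*
 *   factor := NUMBER | '(' expr ')'
 */
typedef struct {
    const char *p;  /* current position in the input */
    int error;      /* set on the first syntax error */
} Parser;

static void skip_spaces(Parser *ps)
{
    while (isspace((unsigned char)*ps->p))
        ps->p++;
}

static long parse_expr(Parser *ps);   /* forward declaration: mutual recursion */

static long parse_factor(Parser *ps)
{
    skip_spaces(ps);
    if (*ps->p == '(') {
        ps->p++;
        long v = parse_expr(ps);      /* recurse back to the top-level rule */
        skip_spaces(ps);
        if (*ps->p != ')') {
            ps->error = 1;
            return 0;
        }
        ps->p++;
        return v;
    }
    if (!isdigit((unsigned char)*ps->p)) {
        ps->error = 1;
        return 0;
    }
    long v = 0;
    while (isdigit((unsigned char)*ps->p))
        v = v * 10 + (*ps->p++ - '0');
    return v;
}

static long parse_term(Parser *ps)
{
    long v = parse_factor(ps);
    for (;;) {
        skip_spaces(ps);
        char op = *ps->p;
        if (ps->error || (op != '*' && op != '/'))
            return v;
        ps->p++;
        long rhs = parse_factor(ps);
        if (op == '/' && rhs == 0) {  /* reject division by zero */
            ps->error = 1;
            return 0;
        }
        v = (op == '*') ? v * rhs : v / rhs;   /* left-associative */
    }
}

static long parse_expr(Parser *ps)
{
    long v = parse_term(ps);
    for (;;) {
        skip_spaces(ps);
        char op = *ps->p;
        if (ps->error || (op != '+' && op != '-'))
            return v;
        ps->p++;
        long rhs = parse_term(ps);
        v = (op == '+') ? v + rhs : v - rhs;
    }
}

/* Returns 0 and stores the result on success, -1 on a syntax error. */
int parse_and_eval(const char *src, long *out)
{
    Parser ps = { src, 0 };
    long v = parse_expr(&ps);
    skip_spaces(&ps);
    if (ps.error || *ps.p != '\0')
        return -1;
    *out = v;
    return 0;
}
```

Operator precedence falls out of the grammar itself: because `parse_term` sits below `parse_expr`, `"1 + 2 * 3"` evaluates to 7, while `"(1 + 2) * 3"` evaluates to 9.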
The front end converts the source file into an abstract syntax tree. The AST used to
have somewhat different meanings for different language front ends, and front ends
could provide their own tree codes. This was simplified with the introduction of
GENERIC and GIMPLE, two new forms of language-independent trees that arrived with
GCC 4.0. GENERIC is more complex, based on the GCC 3.x Java front end's intermediate
representation. GIMPLE is a simplified GENERIC, in which various constructs are
lowered to multiple GIMPLE instructions. The C, C++ and Java front ends produce
GENERIC directly in the front end. Other front ends instead have different intermediate
representations after parsing and convert these to GENERIC.
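Lowering can be illustrated by hand. The C sketch below shows a structured `for` loop and an equivalent version written only with labels, a conditional jump and plain `goto` statements, which is roughly the shape GIMPLE gives to control flow. The function names are hypothetical and the lowered form is an approximation; GCC's `-fdump-tree-gimple` option prints the actual GIMPLE it produces for a source file.

```c
/* Structured source form: sums 1..n. */
int sum_loop(int n)
{
    int s = 0;
    for (int i = 1; i <= n; i++)
        s = s + i;
    return s;
}

/* Hand-lowered form in the spirit of GIMPLE: the loop becomes labels,
 * one conditional jump and plain gotos, with the test at the bottom. */
int sum_loop_lowered(int n)
{
    int s = 0;
    int i = 1;
    goto test;
body:
    s = s + i;
    i = i + 1;
test:
    if (i <= n)
        goto body;
    else
        goto done;
done:
    return s;
}
```

Both functions compute the same sums; the second one simply has no loop construct left for the optimizer to special-case.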
Middle-end:
GENERIC is an intermediate representation language used as a "middle end" while
compiling source code into executable binaries. A subset, called GIMPLE, is targeted by
all the front ends of GCC. The middle end is responsible for all the code analysis and
optimization, working independently of both the compiled language and the target
architecture, starting from the GENERIC representation and expanding it to
Register Transfer Language. The GENERIC representation contains only the subset of
the imperative programming constructs optimized by the middle end. In transforming
the source code to GIMPLE, complex expressions are split into three-address code
using temporary variables. This representation was inspired by the SIMPLE
representation proposed in the McCAT compiler by Laurie J. Hendren for simplifying
the analysis and optimization of imperative programs.
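The split into three-address code can be mimicked in C. In the sketch below (function names are hypothetical), the temporaries `t1` to `t5` play the role of compiler-generated temporaries: every statement performs at most one operation on at most two operands. GCC names its real temporaries differently; this only shows the shape of the transformation.

```c
/* Source form: one nested expression. */
int expr_source(int a, int b, int c, int d)
{
    return (a + b) * (c - d) + a * c;
}

/* Three-address form: one operator per statement, results kept in
 * temporaries, evaluated in the same order the nested expression implies. */
int expr_three_address(int a, int b, int c, int d)
{
    int t1 = a + b;
    int t2 = c - d;
    int t3 = t1 * t2;
    int t4 = a * c;
    int t5 = t3 + t4;
    return t5;
}
```

With every intermediate value named, the optimizer can reason about each small step individually, for example noticing that `t4` does not depend on `t1`.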
Optimization can occur during any phase of compilation; however, the bulk of
optimizations are performed after the syntax and semantic analysis of the front end and
before the code generation of the back end. The exact set of GCC optimizations varies
from release to release as it develops, but includes the standard algorithms, such as
loop optimization, jump threading, common subexpression elimination, instruction
scheduling, and so forth. The RTL optimizations are of less importance since the
addition of global SSA-based optimizations on GIMPLE trees. Some of the
optimizations performed at this level include dead code elimination, partial
redundancy elimination, global value numbering, sparse conditional
constant propagation, and scalar replacement of aggregates. Array-dependence-based
optimizations such as automatic vectorization and automatic
parallelization are also performed.
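The effect of three of the optimizations named above can be shown with a before-and-after pair in C. The "after" function is written by hand to show what constant propagation, common subexpression elimination and dead code elimination conceptually leave behind; it is an illustration, not actual GCC output (which can be examined with `-O2 -fdump-tree-optimized`). Function names are hypothetical.

```c
/* Before: a value known at compile time, a repeated sub-expression,
 * and a computation whose result is never used. */
int before_opt(int x, int y)
{
    int k = 4;               /* constant: candidate for propagation */
    int unused = x * 100;    /* dead: its value never reaches the result */
    int a = (x + y) * k;     /* x + y computed here ... */
    int b = (x + y) - k;     /* ... and again here */
    (void)unused;            /* only silences the unused-variable warning */
    return a + b;
}

/* After: k replaced by 4 (constant propagation), x + y computed once
 * (common subexpression elimination), 'unused' gone (dead code elimination). */
int after_opt(int x, int y)
{
    int t = x + y;
    return t * 4 + (t - 4);
}
```

Both versions return the same value for every input where `before_opt` does not overflow, but the second performs fewer operations.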
Back-end:
The behavior of GCC's back end is partly specified by preprocessor macros and functions
specific to a target architecture, for instance to define the endianness, word size, and
calling conventions. The front part of the back end uses these to help decide RTL
generation, so although GCC's RTL is nominally processor-independent, the initial
sequence of abstract instructions is already adapted to the target. At any moment, the
actual RTL instructions forming the program representation have to comply with the
machine description of the target architecture. At the end of compilation, valid RTL is
further reduced to a strict form in which each instruction refers to real machine
registers and real instructions from the target's instruction set. Forming strict RTL is a
very complicated task, done mostly by register allocation but completed only by
a separate "reloading" phase, which must account for the vagaries of all of GCC's targets.
The final phase is somewhat anticlimactic, because the patterns to match were generally
chosen during reloading, and so the assembly code is simply built by running
substitutions of registers and addresses into the strings specifying the instructions.
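The final substitution step can be sketched as simple string templating. In GCC machine descriptions, output templates refer to operands as `%0`, `%1`, and so on; the hypothetical `fill_template` below replaces each single-digit `%N` with the Nth operand string. It ignores the many other `%` directives and escapes that real templates support, and the MIPS-style template and register names in the usage note are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Copies tmpl into out, replacing every "%N" (N a single digit) with
 * operands[N]. Returns the length written, or -1 if an operand index is
 * out of range or out is too small. Real GCC templates support far more
 * directives (e.g. "%%" for a literal percent); those are not handled. */
int fill_template(const char *tmpl, const char *const operands[],
                  size_t n_operands, char *out, size_t out_size)
{
    size_t len = 0;

    if (out_size == 0)
        return -1;

    for (const char *p = tmpl; *p != '\0'; p++) {
        const char *piece = p;   /* text to append: one literal char ... */
        size_t piece_len = 1;

        if (p[0] == '%' && p[1] >= '0' && p[1] <= '9') {
            size_t idx = (size_t)(p[1] - '0');
            if (idx >= n_operands)
                return -1;       /* template refers to a missing operand */
            piece = operands[idx];   /* ... or a whole operand string */
            piece_len = strlen(piece);
            p++;                 /* skip the digit as well */
        }
        if (len + piece_len >= out_size)
            return -1;           /* keep room for the terminating NUL */
        memcpy(out + len, piece, piece_len);
        len += piece_len;
    }
    out[len] = '\0';
    return (int)len;
}
```

For example, filling `"addu %0,%1,%2"` with the operands `$2`, `$4` and `$5` yields `addu $2,$4,$5`. By this point the hard choices (which instruction pattern, which registers) have already been made, which is why the text calls this phase anticlimactic.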
Compatible IDEs
Integrated development environments written for GNU/Linux and some for other
operating systems support GCC. These include:
Anjuta
Code::Blocks
CodeLite
Dev-C++
Eclipse
Geany
KDevelop
NetBeans
Qt Creator
Xcode
Hmmm ... So please never tell anyone, like a fool, "I work on GNU GCC" ... be
specific: gcc (the GNU C Compiler) or g++ for C or C++ respectively. I would like to
mention a few links here; if any of you would like to read about the GNU project in a bit
more detail, you can definitely enjoy your time with these ... But I know ... you are quite
busy ... :) ... whatever ...
References:
http://en.wikipedia.org/wiki/GNU_Project
http://en.wikipedia.org/wiki/GNU_Compiler_Collection#cite_note-8
http://gcc.gnu.org/frontends.html
http://en.wikipedia.org/wiki/GNU_Compiler_Collection
http://www.enotes.com/topic/GNU_Compiler_Collection
http://www.enotes.com/topic/GNU_Compiler_Collection#Back-end