2. Languages & Grammar
Before discussing languages and grammars, let us deal
with some related concepts.
Alphabet: a nonempty finite set of symbols (letters),
such as {a1, a2, …, ak}.
In the theory of computation the most commonly used
alphabet is the binary alphabet {0, 1}.
An alphabet is usually denoted by Σ.
3. Languages & Grammar
A string (word or sentence) is a finite sequence
of symbols over an alphabet Σ, e.g. aaabbbabbb is a
string over Σ = {a, b}. A string is usually denoted by w.
A string may have no symbols at all. In this
case it is called the empty string (null string)
and is denoted by ε or λ.
4. Note:
Sometimes a string (word/sentence) is represented
as a function. A string x = x1 x2 x3 … xn can be
viewed as the function
x : [n] → Σ with x(k) = xk, where [n] = {1, 2, …, n}.
Here n is the length of the string x, denoted
by |x|, e.g. |aabbaab| = 7.
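The function view can be sketched in a few lines of Python. This is a minimal illustration, assuming 1-based positions [n] = {1, …, n}. The helper names `as_function` and `length` are hypothetical, chosen only for this example.

```python
from typing import Callable, Tuple

# A string of length n viewed as a function x: [n] -> Sigma, where [n] = {1, ..., n}.
# The pair (n, x) keeps the domain explicit, so |x| is simply n.
StringFn = Tuple[int, Callable[[int], str]]


def as_function(w: str) -> StringFn:
    """Hypothetical helper: view an ordinary Python string as (n, x)."""
    n = len(w)

    def x(k: int) -> str:
        if not 1 <= k <= n:
            raise ValueError(f"position {k} is outside [{n}]")
        return w[k - 1]  # x(k) = x_k; Python indexes from 0, the slide from 1

    return n, x


def length(s: StringFn) -> int:
    """|x| is the size n of the domain [n]."""
    return s[0]


n, x = as_function("aabbaab")
print(length((n, x)))                   # 7
print([x(k) for k in range(1, n + 1)])  # ['a', 'a', 'b', 'b', 'a', 'a', 'b']
```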
5. Operations on strings
For an alphabet Σ, given any two strings u: [m] → Σ
and v: [n] → Σ, the concatenation u · v (also written
as uv) of u and v is the string uv: [m + n] → Σ,
defined such that:
$$uv(i) = \begin{cases} u(i) & \text{if } 1 \le i \le m, \\ v(i - m) & \text{if } m + 1 \le i \le m + n. \end{cases}$$
In particular, uε = εu = u.
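The case definition of uv translates almost line for line into code. The sketch below is an illustration under the assumption that a string is stored as a pair (m, u) with u defined on positions 1..m. The names `from_str`, `to_str`, `concat` and `EPSILON` are hypothetical.

```python
from typing import Callable, Tuple

# (m, u) represents a string u: [m] -> Sigma with positions 1..m.
StringFn = Tuple[int, Callable[[int], str]]


def from_str(w: str) -> StringFn:
    return len(w), lambda i: w[i - 1]  # only called with 1 <= i <= len(w)


def to_str(s: StringFn) -> str:
    m, u = s
    return "".join(u(i) for i in range(1, m + 1))


def concat(s: StringFn, t: StringFn) -> StringFn:
    """uv: [m + n] -> Sigma, defined case by case as on the slide."""
    m, u = s
    n, v = t

    def uv(i: int) -> str:
        if 1 <= i <= m:
            return u(i)          # positions 1..m come from u
        if m + 1 <= i <= m + n:
            return v(i - m)      # positions m+1..m+n are shifted back into [n]
        raise ValueError(f"position {i} is outside [{m + n}]")

    return m + n, uv


EPSILON = from_str("")  # the empty string: the function with empty domain [0]

print(to_str(concat(from_str("ab"), from_str("ba"))))  # abba
print(to_str(concat(from_str("ab"), EPSILON)) == to_str(concat(EPSILON, from_str("ab"))) == "ab")  # True
```

The last line checks the identity uε = εu = u for u = ab.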
6. Formal Languages
Formal vs. Natural Languages
Natural Languages: used by humans to communicate,
e.g. Arabic, English, …
Their syntax is very complicated and not completely specified.
7. Formal Languages
Formal vs. Natural Languages
Formal Language: specified by a well-defined set
of grammatical rules,
e.g. programming languages.
Its syntax is completely defined by a given grammar.
8. Why study Formal Languages?
Programming languages are formal.
Very useful for pattern matching.
Important for research in natural language
processing (NLP) with computers.
Closely tied to the study of abstract machines.
9. Formal Languages
Let Σ be any nonempty finite set of symbols,
called an alphabet.
A word (string) over Σ is any finite
sequence of letters from Σ.
A formal language is simply a set of strings over Σ.
10. Formal Languages
Σ* is defined as the set of all possible words
over Σ, including the empty word (denoted
by ε).
Example
Let Σ = {a, b}. Then
Σ* = {ε, a, b, aa, ab, ba, bb, aaa, aab, aba, …}.
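Σ* is infinite, but its words can be listed one length at a time, exactly in the order used in the example above. The following is a minimal Python sketch. The generator name `sigma_star` is hypothetical, and the order (shortest first, then in the order of the alphabet) is a choice made here.

```python
from itertools import count, islice, product
from typing import Iterator, Sequence


def sigma_star(sigma: Sequence[str]) -> Iterator[str]:
    """Yield every word over sigma, shortest first; equal-length words follow the order of sigma."""
    for n in count(0):                       # n = 0 yields the empty word ε (printed as '')
        for letters in product(sigma, repeat=n):
            yield "".join(letters)


print(list(islice(sigma_star("ab"), 10)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb', 'aaa', 'aab', 'aba']
```

Because the generator never stops, any particular word of Σ* is produced after finitely many steps. This is one way to see that Σ* is countable.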
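11. A language over Σ is defined as a set of strings over Σ,
i.e. a subset of Σ*.
Examples: Σ = {a, b}
{ε, a, aa, aab}
{x ∈ {a, b}* : |x| ≤ 8}
{x ∈ {a, b}* : |x| is odd}
{x ∈ {a, b}* : N_a(x) = N_b(x)}, where N_a(x) and N_b(x) are the numbers of a's and b's in x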
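A language can be described by a membership test, and any finite slice of Σ* can then be filtered through that test. The following is a minimal Python sketch of the four example languages, as read from the slide. The function names and the length cut-off `max_len` are assumptions made only for this illustration.

```python
from itertools import product
from typing import Callable, Iterator, Set

Language = Callable[[str], bool]  # a language given by its membership test


def words_up_to(sigma: str, max_len: int) -> Iterator[str]:
    """All words over sigma of length at most max_len (a finite slice of Sigma*)."""
    for n in range(max_len + 1):
        for letters in product(sigma, repeat=n):
            yield "".join(letters)


def finite_example(x: str) -> bool:
    return x in {"", "a", "aa", "aab"}           # {ε, a, aa, aab}


def at_most_8(x: str) -> bool:
    return len(x) <= 8                          # {x : |x| <= 8}


def odd_length(x: str) -> bool:
    return len(x) % 2 == 1                      # {x : |x| is odd}


def balanced(x: str) -> bool:
    return x.count("a") == x.count("b")         # {x : N_a(x) = N_b(x)}


def sample(lang: Language, max_len: int, sigma: str = "ab") -> Set[str]:
    """The words of lang with length <= max_len."""
    return {w for w in words_up_to(sigma, max_len) if lang(w)}


print(sorted(sample(balanced, 2)))  # ['', 'ab', 'ba']
```

The first two languages are finite. The last two are infinite, so `sample` only ever shows a finite part of them.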
12. Example
Let Σ = {a, b}. Then
Σ* = {ε, a, b, aa, ab, ba, bb, aaa, aab, …}.
The language L = {a, aa, aab}
is a finite language on Σ.
Note: a language can be infinite, and the most
interesting languages are infinite.
13. New languages can be constructed using any of the
set operations.
For example, the union of two languages over Σ
is also a language over Σ.
Given {as, so} and {if, soon, possible} as two
languages over Σ, the concatenation of the two
languages gives another language over Σ, as follows:
{as, so} {if, soon, possible} =
{asif, assoon, aspossible, soif, sosoon, sopossible}.
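Concatenating two languages means joining every word of the first with every word of the second. A minimal Python sketch of the union and concatenation examples above is shown below. Languages are stored as Python sets, and the function name `concat_languages` is hypothetical.

```python
from typing import Set


def concat_languages(L1: Set[str], L2: Set[str]) -> Set[str]:
    """L1 L2 = {xy | x in L1, y in L2}."""
    return {x + y for x in L1 for y in L2}


A = {"as", "so"}
B = {"if", "soon", "possible"}
print(sorted(concat_languages(A, B)))
# ['asif', 'aspossible', 'assoon', 'soif', 'sopossible', 'sosoon']
print(sorted(A | B))  # the union is also a language
# ['as', 'if', 'possible', 'so', 'soon']
```

Concatenation with {ε}, written `{""}` here, leaves a language unchanged. Concatenation with the empty language gives the empty language.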
14. Regular Expression & Language
Definition: A regular expression over Σ, which
corresponds to a language, can be defined
recursively as follows:
• ε is a regular expression denoting the language that contains only the
empty string (L(ε) = {ε}).
• φ is a regular expression denoting the empty language (L(φ) = { }).
• For every a ∈ Σ, a is a regular expression with L(a) = {a}.
15. Regular Expression & Language
• If x is a regular expression denoting the
language L(x) and y is a regular expression denoting the
language L(y), then
o x + y is a regular expression corresponding to the
language L(x) ∪ L(y), i.e. L(x + y) = L(x) ∪ L(y).
o x . y is a regular expression corresponding to the
language L(x) . L(y), i.e. L(x . y) = L(x) . L(y).
o x* is a regular expression corresponding to the language
L(x*) = (L(x))*.
• Any expression obtained by applying the above rules a finite
number of times is a regular expression.
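The recursive definition also tells us how to compute L(r): there is one case per rule. L(x*) is usually infinite, so this Python sketch only returns the words of L(r) up to a length bound k. The tuple encoding of expressions and the name `lang` are hypothetical choices made for illustration.

```python
from typing import Set, Tuple

# Hypothetical encoding of the recursive definition:
#   ("eps",) | ("phi",) | ("sym", a) | ("+", x, y) | (".", x, y) | ("*", x)
Regex = Tuple


def lang(r: Regex, k: int) -> Set[str]:
    """All words of L(r) whose length is at most k (L(r) itself may be infinite)."""
    op = r[0]
    if op == "eps":
        return {""}                                   # L(ε) = {ε}
    if op == "phi":
        return set()                                  # L(φ) = { }
    if op == "sym":
        return {r[1]} if k >= 1 else set()            # L(a) = {a}
    if op == "+":
        return lang(r[1], k) | lang(r[2], k)          # L(x + y) = L(x) ∪ L(y)
    if op == ".":
        left, right = lang(r[1], k), lang(r[2], k)
        return {u + v for u in left for v in right if len(u + v) <= k}  # L(x . y)
    if op == "*":
        # L(x*) = L(x)*: keep appending words of L(x) until nothing new of length <= k appears.
        base = lang(r[1], k)
        result, frontier = {""}, {""}
        while frontier:
            frontier = {u + v for u in frontier for v in base if len(u + v) <= k} - result
            result |= frontier
        return result
    raise ValueError(f"unknown operator {op!r}")


a, b = ("sym", "a"), ("sym", "b")
ends_in_a = (".", ("*", ("+", a, b)), a)  # (a + b)* . a
print(sorted(lang(ends_in_a, 2)))         # ['a', 'aa', 'ba']
```

The star case stops because only words of length at most k are kept. For example, φ* yields {ε}.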
16. A language over Σ is a regular language if there is
some regular expression over Σ corresponding to it.
17. Examples of regular expressions over Σ = {0, 1}
(Inside an expression, Σ stands for (0 + 1).)
Expression              Meaning
1*01*                   All words over Σ containing exactly one 0.
Σ*1Σ*                   All words over Σ containing at least one 1.
Σ*110Σ*                 All words over Σ containing the substring 110.
(00 + 01 + 10 + 11)*    All strings of 0's and 1's of even length; they
                        are obtained by concatenating any combination of
                        the strings 00, 01, 10 and 11, including ε.
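A regular expression can be compared with its stated meaning by testing every short string. The following is a minimal sketch using Python's standard `re` module. Python writes union as `|` rather than `+`, and Σ is spelled here as the character class `[01]`. The check up to length 10 is evidence, not a proof.

```python
import re
from itertools import product


def words(max_len):
    """Every string over {0, 1} of length <= max_len."""
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            yield "".join(t)


# slide expression -> (Python pattern, membership test taken from the "Meaning" column)
examples = {
    "1*01*": (r"1*01*", lambda w: w.count("0") == 1),
    "Σ*1Σ*": (r"[01]*1[01]*", lambda w: "1" in w),
    "Σ*110Σ*": (r"[01]*110[01]*", lambda w: "110" in w),
    "(00 + 01 + 10 + 11)*": (r"(?:00|01|10|11)*", lambda w: len(w) % 2 == 0),
}

for name, (pattern, meaning) in examples.items():
    ok = all(bool(re.fullmatch(pattern, w)) == meaning(w) for w in words(10))
    print(f"{name:24} agrees with its meaning up to length 10: {ok}")
```

`re.fullmatch` is used instead of `re.search` because a regular expression describes the whole word, not just part of it.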
18. More examples over Σ = {0, 1}
Expression              Meaning
(ΣΣ)*                   All words over Σ of even length.
(ΣΣΣ)*                  All words over Σ whose length is a multiple of three.
0 + 1 + 0Σ*0 + 1Σ*1     All words over Σ that start and end with the
                        same symbol.
1Σ*01                   All strings over Σ that begin with 1 and end
                        with 01.
(0 + 1(01*0)*1)*        All strings over Σ that represent binary numbers
                        that are multiples of 3.
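The same brute-force check can be applied to this table. This is a sketch using Python's `re` module, where `|` replaces `+` and `[01]` replaces Σ. We also assume that the empty string counts as the number 0 in the last row, which is why that expression accepts ε.

```python
import re
from itertools import product


def words(max_len):
    """Every string over {0, 1} of length <= max_len."""
    for n in range(max_len + 1):
        for t in product("01", repeat=n):
            yield "".join(t)


examples = {
    "(ΣΣ)*": (r"(?:[01][01])*", lambda w: len(w) % 2 == 0),
    "(ΣΣΣ)*": (r"(?:[01]{3})*", lambda w: len(w) % 3 == 0),
    "0 + 1 + 0Σ*0 + 1Σ*1": (r"0|1|0[01]*0|1[01]*1", lambda w: w != "" and w[0] == w[-1]),
    "1Σ*01": (r"1[01]*01", lambda w: w.startswith("1") and w.endswith("01")),
    # Read as a binary number (most significant bit first); ε is treated as 0.
    "(0 + 1(01*0)*1)*": (r"(?:0|1(?:01*0)*1)*", lambda w: int(w or "0", 2) % 3 == 0),
}

for name, (pattern, meaning) in examples.items():
    ok = all(bool(re.fullmatch(pattern, w)) == meaning(w) for w in words(10))
    print(f"{name:24} agrees with its meaning up to length 10: {ok}")
```

The last expression can be understood through the remainders 0, 1 and 2 modulo 3. The expression 1(01*0)*1 describes each way of leaving remainder 0 and returning to it for the first time.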
19. Formal Grammar
Formal definition
A generative grammar, first proposed by Noam Chomsky
in the 1950s, is a grammar G defined as a quadruple
G = (V, T, S, P),
where:
20. V is a finite set of objects called variables.
T is a finite set of objects called terminal symbols.
S ∈ V is a special symbol called the start variable.
P is a finite set of productions.
It is assumed that V and T are nonempty and disjoint.
22. Example
Consider a grammar G = (V, T, S, P),
where
V = {S, B}, T = {a, b, c},
and P consists of the following production rules:
S → aBSc    S → abc
Ba → aB     Bb → bb
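23. The language L(G) generated by G can be shown to consist of the
following strings:
L(G) = {abc, aabbcc, aaabbbccc, …}
Therefore L(G) can be represented as
L(G) = {aⁿbⁿcⁿ | n > 0}
     = {w ∈ L(a*b*c*) | n_a(w) = n_b(w) = n_c(w) > 0},
where n_a(w) is the number of a's in w (similarly for b and c).
Equal counts alone are not enough: cba has one of each symbol but is not in L(G).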
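The strings of L(G) can be derived mechanically. Start from S and apply every production wherever its left side occurs, then keep the sentential forms that contain only terminals. This Python sketch does that with a breadth-first search. It assumes the convention that uppercase letters are variables. `PRODUCTIONS` and `generate` are hypothetical names.

```python
from collections import deque
from typing import Set

# Productions of G, written (lhs, rhs); uppercase letters are variables.
PRODUCTIONS = [("S", "aBSc"), ("S", "abc"), ("Ba", "aB"), ("Bb", "bb")]


def generate(max_len: int) -> Set[str]:
    """Terminal strings of length <= max_len derivable from S.

    No production of this G shortens a sentential form, so any form longer
    than max_len can be discarded without losing short terminal strings.
    """
    seen = {"S"}
    queue = deque(["S"])
    words = set()
    while queue:
        form = queue.popleft()
        if form.islower():          # only terminals left: a word of L(G)
            words.add(form)
            continue
        for lhs, rhs in PRODUCTIONS:
            start = form.find(lhs)
            while start != -1:      # apply the rule at every occurrence of lhs
                new = form[:start] + rhs + form[start + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                start = form.find(lhs, start + 1)
    return words


print(sorted(generate(9), key=len))  # ['abc', 'aabbcc', 'aaabbbccc']
```

For example, S ⇒ aBSc ⇒ aBabcc ⇒ aaBbcc ⇒ aabbcc. The rule Ba → aB moves each B to the right past the a's until Bb → bb turns it into a terminal.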
24. Representation of Language using Graphs
A transition graph over Σ is a finite directed graph in
which every arrow (edge) is labeled by some word w ∈ Σ*
(possibly the empty word ε).
There is at least one vertex labeled by a '−' sign
(such vertices are called initial vertices), and a
(possibly empty) set of vertices labeled by a '+'
sign (called the final vertices). A vertex can be both
initial and final.
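A transition graph accepts a word w if some path from an initial vertex to a final vertex spells w when the edge labels are read in order. The following is a minimal Python sketch of that acceptance test. It explores pairs of (vertex, amount of w consumed) and remembers visited pairs, so ε-edges that form cycles cannot loop forever. The graph in the usage lines is a hypothetical example, not one taken from the slides.

```python
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (source vertex, word label, target vertex); "" stands for ε


def accepts(edges: Iterable[Edge], initial: Set[str], final: Set[str], w: str) -> bool:
    """True if some path from an initial vertex to a final vertex spells exactly w."""
    edges = list(edges)
    stack = [(v, 0) for v in initial]      # (current vertex, how much of w is consumed)
    seen = set(stack)
    while stack:
        v, i = stack.pop()
        if i == len(w) and v in final:
            return True
        for src, label, dst in edges:
            if src == v and w.startswith(label, i):   # the label matches w at position i
                state = (dst, i + len(label))
                if state not in seen:
                    seen.add(state)
                    stack.append(state)
    return False


# Hypothetical graph accepting (ab)*(ba)*: vertex 1 is initial, vertex 2 is final.
EDGES = [("1", "ab", "1"), ("1", "", "2"), ("2", "ba", "2")]
print(accepts(EDGES, {"1"}, {"2"}, "abba"))  # True
print(accepts(EDGES, {"1"}, {"2"}, "baab"))  # False
```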
25. Example
Consider the following transition graphs over
Σ = {a, b}.
[Figure: transition graphs over Σ = {a, b}]
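26. [Figure: transition graph with edges labeled a and b]
All words over Σ beginning with an a followed by b's.
[Figure: transition graph with loops labeled a, b and edges labeled aa and bb]
All words over Σ containing two consecutive a's
or two consecutive b's.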
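The two captions above can also be written as regular expressions, which is a useful cross-check against the graphs. These expressions are our own reading of the captions, ab* (treating "followed by b's" as zero or more b's) and (a + b)*(aa + bb)(a + b)*. They were not read off the figures. This sketch tests them with Python's `re` module on every word up to length 8.

```python
import re
from itertools import product


def words(max_len):
    """Every string over {a, b} of length <= max_len."""
    for n in range(max_len + 1):
        for t in product("ab", repeat=n):
            yield "".join(t)


examples = {
    "ab*": (r"ab*", lambda w: w[:1] == "a" and set(w[1:]) <= {"b"}),
    "(a + b)*(aa + bb)(a + b)*": (r"[ab]*(?:aa|bb)[ab]*", lambda w: "aa" in w or "bb" in w),
}

for name, (pattern, meaning) in examples.items():
    ok = all(bool(re.fullmatch(pattern, w)) == meaning(w) for w in words(8))
    print(f"{name:28} agrees with its caption up to length 8: {ok}")
```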
27. [Figure: transition graph with edges labeled aa, bb and ab, ba]
All words over Σ containing an even number of a's
and an even number of b's.
[Figure: transition graph with edges labeled b, aa and a, b]
All words over Σ either starting with b or
containing aa.
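The even-a's, even-b's graph reads the input two letters at a time. The pairs aa and bb leave the parity of both counts unchanged, and ab and ba flip both. This Python sketch assumes one reading of the figure: two vertices, each with loops labeled aa and bb, joined in both directions by edges labeled ab and ba, with vertex 1 both initial and final. It uses a self-contained path-search acceptance test and compares the graph with a direct count on every word up to length 8.

```python
from itertools import product
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source vertex, word label, target vertex)

# Assumed reading of the figure: vertex 1 = both counts even, vertex 2 = both counts odd.
EVEN_EVEN: List[Edge] = [
    ("1", "aa", "1"), ("1", "bb", "1"), ("1", "ab", "2"), ("1", "ba", "2"),
    ("2", "aa", "2"), ("2", "bb", "2"), ("2", "ab", "1"), ("2", "ba", "1"),
]


def accepts(edges: Iterable[Edge], initial: Set[str], final: Set[str], w: str) -> bool:
    """True if some path from an initial vertex to a final vertex spells exactly w."""
    edges = list(edges)
    stack = [(v, 0) for v in initial]
    seen = set(stack)
    while stack:
        v, i = stack.pop()
        if i == len(w) and v in final:
            return True
        for src, label, dst in edges:
            if src == v and w.startswith(label, i):
                state = (dst, i + len(label))
                if state not in seen:
                    seen.add(state)
                    stack.append(state)
    return False


def words(max_len):
    for n in range(max_len + 1):
        for t in product("ab", repeat=n):
            yield "".join(t)


ok = all(
    accepts(EVEN_EVEN, {"1"}, {"1"}, w) == (w.count("a") % 2 == 0 and w.count("b") % 2 == 0)
    for w in words(8)
)
print(ok)  # True
```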