Final formal languages
1. Formal Languages
● Language: Medium for communication.
● Rules of a language should specify
- words which belong to the language.
- words which do not.
2. Formal Languages
Formal Language
● All the rules for the language are explicitly stated.
● No liberties are tolerated.
● No “deeper understanding” involved.
3. Formal Languages
Alphabet
● Set of legitimate symbols which can form words in a language.
● Represented by “∑”.
● Sequences of symbols built from the alphabet are called “strings”.
● Strings of characters permissible in the language are called “words”.
5. Formal Languages
Null Word
● Also called “Null string” or “Empty string”.
● Represented by the symbol “λ”.
● A string or word with no letters, i.e. empty.
● Could belong to a language, depending on the rules.
7. Formal Languages
Null String and Null Set
● Null string is a word with no letters.
● Null set is a language with no words.
8. Formal Languages
λ is a word in the language ɸ . True or False?
● False.
● ɸ cannot have any words, not even the word λ.
9. Formal Languages
Union Operation
● Set operation which gives the union of the words of both sets.
● Represented by “+”.
● Example:
● S1 = {b, aba, abba, abbba} and
● S2 = {a, bab, baab, baaab} then
● S1 + S2 = { a, b, aba, bab, abba, baab, abbba, baaab }
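The union example can be reproduced with a minimal sketch, assuming languages are modelled as Python sets of strings (a modelling choice made here, not something the slides prescribe):

```python
# Languages modelled as sets of strings; the "+" of the slides is set union.
S1 = {"b", "aba", "abba", "abbba"}
S2 = {"a", "bab", "baab", "baaab"}

union = S1 | S2  # written S1 + S2 in the slides

# Sort by length, then alphabetically, to match the listing in the example.
listing = sorted(union, key=lambda w: (len(w), w))
print(listing)
```

Note that Python's own `+` on strings means concatenation, not union, so the set operator `|` is used here.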
10. Formal Languages
If there is a defined language L, what about the language L + {λ}? Will the new language be the same as L, or different from L?
● It depends on the definition of L.
● If the definition of L already includes λ as a valid word, then the language L does not change.
● If L does not contain λ, then the resulting language has one extra word, λ.
11. Formal Languages
If there is a defined language L, what is the new language obtained from the operation L + ɸ?
● The language would be the same as L.
● The new language is always the same as L, because ɸ is the empty set and so does not add any new words to L.
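The difference between the null word and the null set, and the two union questions, can be checked with a small sketch. The languages `L` and `M` below are hypothetical examples chosen only for illustration:

```python
LAMBDA = ""      # the null word: a string with no letters
EMPTY = set()    # the null set: a language with no words

L = {"a", "ab"}        # hypothetical language that does not contain the null word
M = {"", "a", "ab"}    # hypothetical language that already contains it

print(LAMBDA in EMPTY)       # False: the null set has no words, not even the null word
print((L | {LAMBDA}) == L)   # False: L gains one extra word
print((M | {LAMBDA}) == M)   # True: nothing changes
print((L | EMPTY) == L)      # True: the null set adds no words
```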
12. Formal Languages
Definition of a Language
A language can be defined in either of two ways:
● Exhaustively list all the words in the language.
● State all the grammatical rules of the language.
13. Formal Languages
According to English language grammatical rules
● “I ate three mangoes.” -> Valid sentence.
● “I ate three saturdays.” -> Grammatically correct, but absurd in meaning.
● The theory of formal languages accepts the second sentence as valid.
● No reference to any “deeper meaning”.
● Interested only in syntax, not semantics or diction.
14. Formal Languages
Language “MY_PET”
● ∑ = { a, c, d, g, o, t }.
● Rules:
- “If the earth and the moon ever collide, then MY_PET = {cat}.”
- “But if the earth and the moon never collide, then MY_PET = {dog}.”
● What do you think about this language definition? Is it valid?
15. Formal Languages
Language “MY_PET”
● The definition of “MY_PET” is not valid.
● At this point in the history of the universe, it is impossible to be certain whether the word “dog” or the word “cat” is in the language.
● For a proper specification, a set of rules must enable us to decide, in a finite amount of time, whether a given string of alphabet letters is or is not a word in the language.
16. Formal Languages
Concatenation
● Represented by the dot (“.”) operator between two strings.
● When we concatenate two strings, they are written side by side to form a new, longer string.
● For example, if there is a string “aba” and another string “bb”, then the concatenated string is “ababb”.
● When concatenating two strings, the order matters in the theory of formal languages.
17. Formal Languages
Concatenation
● If S1 = “ababa” and S2 = “bbaa”, what are S1.S2 and S2.S1?
● S1.S2 = abababbaa while S2.S1 = bbaaababa.
● The two strings obtained are treated as different.
● In the theory of automata and formal languages, the order does matter, i.e. ab ≠ ba, as the order of the letters is different in the two strings.
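A short sketch of the same example in Python, where string concatenation is written `+` rather than the dot used in the slides:

```python
s1 = "ababa"
s2 = "bbaa"

print(s1 + s2)             # abababbaa
print(s2 + s1)             # bbaaababa
print(s1 + s2 == s2 + s1)  # False: concatenation is order-sensitive
print("ab" == "ba")        # False
```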
18. Formal Languages
Example of Language
● We shall define a new language L1 where ∑ = {x} and the rule for forming words in L1 is that any non-empty string of alphabet characters is a word.
● If we start exhaustively listing L1 in lexicographical order, it would be represented as
L1 = { x, xx, xxx, xxxx, xxxxx, ... }
or, equivalently, as
L1 = { x^n for n = 1, 2, 3, ... }
19. Formal Languages
Example of Language
● Let us define another language L2 where ∑ = {x} and the rule for forming words in L2 is that any non-empty string of alphabet characters whose total length is odd is a word.
● If we start exhaustively listing L2 in lexicographical order, it would be represented as
L2 = { x, xxx, xxxxx, xxxxxxx, ... }
or, equivalently, as
L2 = { x^(2n+1) for n = 0, 1, 2, ... }
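The rules for L1 and L2 are simple enough to state as membership tests. The following is a minimal sketch; the function names `in_L1` and `in_L2` are introduced here for illustration only:

```python
def in_L1(w: str) -> bool:
    """Non-empty strings over the alphabet {x}."""
    return len(w) >= 1 and set(w) == {"x"}


def in_L2(w: str) -> bool:
    """Words of L1 whose total length is odd."""
    return in_L1(w) and len(w) % 2 == 1


candidates = ["x" * n for n in range(8)]  # the null word up to xxxxxxx
print([w for w in candidates if in_L1(w)])
print([w for w in candidates if in_L2(w)])
```

Each test finishes in a finite number of steps for any input string, which is what a proper language specification requires.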
20. Formal Languages
Example of Language
● We define another language L3 where ∑ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} and the rule is that any non-empty finite string of alphabet letters that does not start with the letter '0' is a word.
● What words would be present in L3?
L3 = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, ... }
● L3 represents the set of all natural numbers (positive integers). If we had also allowed words starting with '0', we would have got invalid numerals like 01, 001, 02, etc.
21. Formal Languages
Example of Language
● How would you change the definition of L3 to include zero, and form the set of all “whole” numbers?
● If we change the rule for L3 to “any non-empty finite string of alphabet letters that, if it starts with the letter '0', has no more letters after the first”, then the language L3 includes all whole numbers.
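Both versions of the L3 rule can be sketched as membership tests. The function names are hypothetical, and the sketch assumes the null word is excluded, since the listing of L3 does not contain it:

```python
DIGITS = set("0123456789")


def in_L3(w: str) -> bool:
    """Digit strings that do not start with '0' (null word excluded by assumption)."""
    return len(w) > 0 and set(w) <= DIGITS and w[0] != "0"


def in_L3_with_zero(w: str) -> bool:
    """Revised rule: a word that starts with '0' has no letters after the first."""
    return w == "0" or in_L3(w)


for w in ["7", "10", "01", "0", "00"]:
    print(w, in_L3(w), in_L3_with_zero(w))
```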
22. Formal Languages
Length Function
● The length function takes a string as an argument and returns the number of letters in the string.
● If a = 'xxxx' then length(a) = 4.
● If c = '428' then length(c) = 3.
● What would be the answer of length(λ)?
● The answer would be zero, as the null word contains no letters.
23. Formal Languages
Reverse Function
● If 'a' is a word in some language L, then reverse(a) is the same string of letters spelled backward, called the reverse of a, even if this backward string is not a word in L.
● Reverse(xxx) = xxx.
● Reverse(145) = 541.
● Reverse(140) = 041.
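The length and reverse functions map directly onto Python string operations. A minimal sketch, with the null word modelled as the empty string:

```python
def length(w: str) -> int:
    """Number of letters in the string."""
    return len(w)


def reverse(w: str) -> str:
    """The same letters spelled backward."""
    return w[::-1]


print(length("xxxx"), length("428"), length(""))  # 4 3 0
print(reverse("xxx"), reverse("145"), reverse("140"))  # xxx 541 041
```

The result of `reverse("140")` starts with '0', so it is a string over the digit alphabet without being a word in a language that forbids a leading zero.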
24. Formal Languages
Palindrome Language
● If ∑ = {a, b} then the language Palindrome is defined as follows:
Palindrome = { λ, and all strings x such that Reverse(x) = x }
● Palindrome = { λ, a, b, aa, bb, aaa, aba, bab, bbb, aaaa, abba, ... }
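The listing of Palindrome can be regenerated by enumerating all strings over the alphabet by length and keeping those equal to their own reverse. This is a brute-force sketch with hypothetical helper names, bounded by a maximum length:

```python
from itertools import product


def is_palindrome(w: str) -> bool:
    return w == w[::-1]


def palindromes(alphabet: str, max_len: int):
    """Yield every palindrome over the alphabet with at most max_len letters."""
    for n in range(max_len + 1):
        for letters in product(sorted(alphabet), repeat=n):
            w = "".join(letters)
            if is_palindrome(w):
                yield w


found = list(palindromes("ab", 4))
print(found)
```

The first entry printed is the empty string, which plays the role of λ.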
25. Formal Languages
Kleene Closure
● Also called Kleene star.
● Given an alphabet ∑, we wish to define a language in which any string of letters from ∑ is a word, including the null string. This language is called the Kleene closure of the alphabet.
● Represented as ∑*.
26. Formal Languages
Kleene Closure
● Example 1: Let ∑ = {x}. Then what would be ∑*?
- ∑* = { λ, x, xx, xxx, xxxx, ... }.
- All the letters of the alphabet can be repeated any number of times in any possible order.
● Example 2: Let ∑ = {0, 1}. Then what would be ∑*?
- ∑* = { λ, 0, 1, 00, 01, 10, 11, 000, 001, ... }.
● Example 3: Let ∑ = {a, b, c}. Then what would be ∑*?
- ∑* = { λ, a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc, aaa, ... }
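Since ∑* is infinite for a non-empty alphabet, it can only be enumerated lazily. The sketch below, assuming a non-empty alphabet and using a hypothetical generator name, lists words by increasing length in the same order as the examples:

```python
from itertools import count, islice, product


def kleene_star(alphabet):
    """Lazily yield the words of the closure of a non-empty alphabet, shortest first."""
    letters = sorted(alphabet)
    for n in count(0):  # n = 0 yields the null word
        for combo in product(letters, repeat=n):
            yield "".join(combo)


print(list(islice(kleene_star({"x"}), 5)))
print(list(islice(kleene_star({"0", "1"}), 9)))
print(list(islice(kleene_star({"a", "b", "c"}), 14)))
```

`islice` takes only a finite prefix of the generator, which is why the calls terminate.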
27. Formal Languages
Kleene Closure for set S
● Given by S*.
● S* is the set of all finite strings formed by concatenating words from S, where any word may be used as often as we like, and where the null string is also included.
28. Formal Languages
Kleene Closure for set S
● Example 4: If S = {aa, b}, then what is S*?
● S* = { λ and any word composed of the factors 'aa' and 'b' }.
● S* = { λ, b, aa, bb, aab, baa, bbb, aaaa, aabb, baab, bbaa, bbbb, aaaab, aabbb, baaaa, ... }
● Does the string 'aabaaab' belong to S*?
● No, the string 'aabaaab' does not belong to S*. It cannot be factored using only 'aa' and 'b': the letter a can enter a word only in pairs, and the run 'aaa' in the middle has odd length.
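Whether a string belongs to S* for a finite set S can be decided mechanically. The following is one possible sketch using dynamic programming over prefixes; the function name `in_closure` is an assumption of this example, not notation from the slides:

```python
def in_closure(word: str, S) -> bool:
    """True if word can be written as a concatenation of words from S."""
    # reachable[i] is True when the prefix word[:i] can be factored into words of S.
    reachable = [True] + [False] * len(word)  # the empty prefix is always reachable
    for i in range(len(word)):
        if reachable[i]:
            for s in S:
                if s and word.startswith(s, i):
                    reachable[i + len(s)] = True
    return reachable[len(word)]


S = {"aa", "b"}
print(in_closure("aabaaab", S))  # False
print(in_closure("aabbaab", S))  # True: (aa)(b)(b)(aa)(b)
print(in_closure("", S))         # True: the null word is always in the closure
```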
29. Formal Languages
Kleene Closure
● Example 5: If S = {a, ab}, then what is S*?
● S* = { λ and any word composed of the factors 'a' and 'ab' }
● S* = { λ, a, aa, ab, aaa, aab, aba, aaaa, aaab, aaba, abaa, abab, aaaaa, aaaab, aaaba, aabaa, aabab, abaaa, abaab, ababa, ... }
● Now analyse whether the word 'abaab' belongs to S*. What do you think?
● To prove that a certain word is in the closure language S*, we must show how it can be written as a concatenation of words from the base set S. Here abaab = (ab)(a)(ab), so it is in S*.
30. Formal Languages
Kleene Closure
● There can be more than one way to factor a given word.
● If there is only one way to factor it, we say that the factoring is unique.
● If there is more than one way, we say that the factoring is not unique.
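Uniqueness of factoring can be explored by counting the ways a word splits into words of S. This is a minimal sketch with a hypothetical function name; the set {a, b, ab} in the second call is chosen here only to show a non-unique case:

```python
def count_factorings(word: str, S) -> int:
    """Number of ways to write word as a concatenation of non-empty words from S."""
    # ways[i] counts the factorings of the prefix word[:i].
    ways = [1] + [0] * len(word)
    for i in range(len(word)):
        if ways[i]:
            for s in set(S):
                if s and word.startswith(s, i):
                    ways[i + len(s)] += ways[i]
    return ways[len(word)]


print(count_factorings("abaab", {"a", "ab"}))   # 1: only (ab)(a)(ab)
print(count_factorings("ab", {"a", "b", "ab"}))  # 2: (a)(b) and (ab)
```

A count of 1 means the factoring is unique, a count above 1 means it is not, and a count of 0 means the word is not in S* at all.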
31. Formal Languages
Q1: If ∑ = ɸ (the empty set), then what is ∑*?
● The answer to this question is ∑* = { λ }.
● If the alphabet has no letters, then its closure is the language with the null string as its only word.
● This is because λ is always a word in the Kleene closure, as stated by the definition of Kleene closure.
32. Formal Languages
Q2: If S = { λ }, then what is S*?
● The answer to this question is again S* = { λ }, but for a different reason.
● As the Kleene closure is the set of all finite strings formed by concatenating words from S, we get S* = { λ, λ.λ, λ.λ.λ, ... } and so on.
● But concatenating λ with another λ gives λ, as λ.λ = λ, so every word in this list is just λ.
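Both edge cases can be checked with a bounded closure computation. The sketch below builds every word of S* up to a chosen length; the helper name `closure_up_to` and the length bound are assumptions made for the example:

```python
def closure_up_to(S, max_len: int) -> set:
    """All words of the closure of S that have at most max_len letters."""
    words = {""}      # the null word is always in the closure
    frontier = {""}
    while frontier:
        new = set()
        for w in frontier:
            for s in S:
                candidate = w + s
                if len(candidate) <= max_len and candidate not in words:
                    new.add(candidate)
        words |= new
        frontier = new
    return words


print(closure_up_to(set(), 5))  # {''}: closure of the empty set
print(closure_up_to({""}, 5))   # {''}: closure of the set containing only the null word
```

The two results are equal, but for the different reasons given above: in the first case there is nothing to concatenate, in the second every concatenation collapses to the null word.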
33. Formal Languages
● Example 6: S = {a, b, ab} and T = {a, b, bb}. What would be S* and T*?
● Both S* and T* are the language of all strings of a's and b's.
● This is because any string of a's and b's can be factored into syllables of either (a) or (b), both of which are in S and in T.
● Thus, we conclude that the Kleene closures of two sets can end up being the same language even if the two sets that we started with are not the same.
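For a finite set S of non-empty words, S* corresponds to the regular expression `(s1|s2|...)*`. The sketch below uses that correspondence to compare S* and T* on every string of a's and b's up to length 6. The helper name and the length bound are assumptions of this example, and a bounded check is evidence rather than a proof:

```python
import re
from itertools import product


def closure_pattern(S):
    """Compile a regex matching the closure of a finite set S of non-empty words."""
    alternatives = "|".join(re.escape(s) for s in sorted(S))
    return re.compile(f"(?:{alternatives})*")


S_star = closure_pattern({"a", "b", "ab"})
T_star = closure_pattern({"a", "b", "bb"})

strings = ["".join(p) for n in range(7) for p in product("ab", repeat=n)]
same = all(bool(S_star.fullmatch(w)) == bool(T_star.fullmatch(w)) for w in strings)
everything = all(S_star.fullmatch(w) for w in strings)
print(same, everything)  # True True
```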
34. Formal Languages
Positive Closure
● The positive closure of a set S is represented by S+.
● If S is a set of strings not including λ, then S+ is the language S* without the word λ.
● S+ is the set of all finite strings formed by concatenating one or more words from S, where any word may be used as often as we like.
● Unlike S*, S+ does not automatically contain the word λ.
● S+ contains λ only if the set S itself contains λ.
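The role of λ in the positive closure can be seen in a bounded sketch that concatenates one or more words of S. The function name and the length bound are assumptions made for illustration:

```python
def positive_closure_up_to(S, max_len: int) -> set:
    """Concatenations of one or more words of S, with at most max_len letters."""
    found = set()
    layer = {s for s in S if len(s) <= max_len}  # exactly one factor
    while not layer <= found:                    # stop once a layer adds nothing new
        found |= layer
        layer = {w + s for w in layer for s in S if len(w + s) <= max_len}
    return found


print(sorted(positive_closure_up_to({"aa", "b"}, 3), key=lambda w: (len(w), w)))
print("" in positive_closure_up_to({"aa", "b"}, 3))  # False: S has no null word
print("" in positive_closure_up_to({"", "a"}, 2))    # True: S contains the null word
```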
36. Formal Languages
Theorem
For any set S of strings we have S* = S**.
Proof:
● In order to prove that two sets A and B are equal, we need to prove two things:
i) A ⊇ B and
ii) B ⊇ A.
We will prove the theorem in the same way, in two parts.
37. Formal Languages
Theorem
For any set S of strings we have S* = S**.
Part 1: S* ⊇ S**
● Every word in S** is made up of factors from S*.
● Every factor from S* is made up of factors from S.
● Therefore, every word in S** is made up of factors from S.
● Therefore, every word in S** is also a word in S*.
● Thus we can write S* ⊇ S**.
38. Formal Languages
Theorem
For any set S of strings we have S* = S**.
Part 2: S** ⊇ S*
● In general, for any set A we have A* ⊇ A, since in A* we can choose as a word any single factor from A.
● So if we take A to be our set S*, we have S** ⊇ S*.
From Part 1 and Part 2, it follows that S* = S**.
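The theorem can be sanity-checked on a small example by computing closures up to a fixed length. This is an illustration under assumed names (`closure_up_to`, the sample set, the bound 6), not a substitute for the proof:

```python
def closure_up_to(S, max_len: int) -> set:
    """All words of the closure of S that have at most max_len letters."""
    words = {""}      # the null word is always in the closure
    frontier = {""}
    while frontier:
        new = set()
        for w in frontier:
            for s in S:
                candidate = w + s
                if len(candidate) <= max_len and candidate not in words:
                    new.add(candidate)
        words |= new
        frontier = new
    return words


S = {"aa", "b"}                       # sample set chosen for the illustration
star = closure_up_to(S, 6)            # words of the closure, up to length 6
star_star = closure_up_to(star, 6)    # closure of the closure, same bound
print(star == star_star)              # True
```

Truncating at a fixed length is safe here because a word of length at most 6 can only be built from factors of length at most 6.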