Implementation of Devanagari OCR on Cell Broadband Engine
Team Details

Name                     Roll Number   Email Id                         Contact No
Amulya K (Team Leader)   MT2012017     amulya.k@iiitb.org               9880345463
Astha Goyal              MT2012028     astha.goyal@iiitb.org            9740126090
Kodamasimham Pridhvi     MT2012066     pridhvi.kodamasimham@iiitb.org   8123160887
Mehak Dhiman             MT2012080     mehak.dhiman@iiitb.org           9535942619

Technical Report IIITB-OS-2012-10b
April 2013
Abstract
Devanagari OCR is an Optical Character Recognition system which takes scanned images of Devanagari text as input and converts them to corresponding editable text files. The major modules in the system are segmentation of the image, feature extraction from the segmented image, recognition using these features, and mapping of each segmented image to its corresponding Unicode representation, to obtain an editable document. In this project, GOCR [1], an open-source English OCR system released under the GNU General Public License, has been used. The GOCR system has been modified so that it works for Devanagari script: the code now converts Devanagari documents in the form of images to editable Unicode text.
Processing the images and converting them to Unicode takes considerable time on a single processor. This is where the Cell Broadband Engine (CBE) comes into the picture. The CBE is a multicore processor with nine microprocessors, on which data can be processed in parallel; this reduces the overall processing time compared to a single-processor system, where the images are processed serially. This has been exploited in this project to process images faster: given a set of several images, each image is assigned to a different processor, and the processors pass their images through the OCR system in parallel. The existing OCR system thus becomes more efficient, with a lower total processing time.
Acknowledgements
We would like to express our thanks to Prof. Shrisha Rao for the support and guidance he gave us in completing this project.
We would also like to thank Mr. Jorg Schulenburg, the author of the open-source English OCR program 'GOCR', which we have used and modified to suit Devanagari.
We would also like to thank the authors of the various papers we have referred to.
List of Figures

1  CBE Architecture
2  Architecture for implementation of OCR on CBE
3  Plot of performance of Devanagari OCR on single core vs multi core
4  Plot of no. of images vs time taken in serial mode against parallel mode

List of Tables

1  Comparison of performance of Devanagari OCR on single core against multiple cores of the CBE
2  Comparison of no. of images vs time taken in serial mode against parallel mode
Contents

List of Figures
List of Tables
1 Introduction
2 Problem Statement
3 Gap Analysis
4 Literature Survey
5 Architecture
  5.1 Optical Character Recognition System (OCR)
  5.2 Cell Broadband Engine (CBE)
  5.3 Overall system Architecture: Implementation of OCR on CBE
6 Implementation details
7 Results
  7.1 Recognition rate
  7.2 Performance of the OCR system on CBE
8 Conclusion and future work
9 References
1 Introduction
Devanagari is an age-old script used for writing many languages, such as Sanskrit, Hindi, Marathi, Punjabi, Konkani, and Thula. This has generated a lot of interest in developing new methods of processing documents in Devanagari. Optical Character Recognition (OCR) is an indispensable part of Document Image Analysis.
OCR is the process of converting printed or hand-written text into machine-encoded text by recognizing the characters in a document image and converting them to a Unicode format. The output is a parsable document which a machine can interpret, and on which it can perform operations or execute commands based on the interpreted text. General applications include helping the blind "read" by converting the Unicode text to speech, intelligent cars which interpret road signs and navigate on their own without human intervention, and intelligent robotics, among many others.
The Cell Broadband Engine Architecture (CBEA) and the Cell Broadband
Engine are the result of a joint venture by Sony, Toshiba, and IBM (STI).
The Cell Broadband Engine was initially intended for application in game
consoles and media-rich consumer-electronics devices. Implementation
of an OCR on a CBE comes with the advantage of fast processing and
recognition, owing to the immense parallel processing ability the CBE
brings with it.
2 Problem Statement
The project is aimed at processing the image of a document to produce machine-encoded text (in Unicode format) as the output. This will be implemented on the Cell processor for printed text, and possibly extended to hand-written text. The challenges which make the problem interesting are the clarity of the document image provided as input, the lighting conditions under which the image has been taken, possible degradation of the paper quality due to age, and skew in the scanned documents. A significant difference between English and Devanagari also lies in the fact that the English alphabet does not have anything above or below the line, which is not the case in Devanagari. This project caters to all these challenges.
3 Gap Analysis
The following points are noteworthy:
• The high performance provided by the CBE can be exploited for large-data applications. OCR is one such application: it has a large dataset of characters, and sometimes even words, which need to be pre-processed and stored in a database after feature extraction. The parallelization provided by the CBE will make the time taken for this process negligible. OCR has not been implemented on the CBE before, and the fact that there is no literature about OCR on the CBE is testimony to this.
• Devanagari is a script used widely across a large geographical area, yet it has not received the importance that needs to be attached to it. The research that has happened on Devanagari OCR has been sparse, with little co-ordination among the individual research groups, so it is difficult to pinpoint the exact state of the art in this field. Moreover, the number of IEEE papers published in this field in 2011 and 2012 put together is only three. This clearly shows that this area needs to be addressed urgently.
• A lot of research on Devanagari OCR has focused on feature-extraction techniques, and very few papers have been published on segmentation techniques. The literature survey has shown that most OCR implementations suffer due to errors in the segmentation stage, usually because characters from one line get joined with characters on the line below, and so on. This area needs to be addressed in this project.
4 Literature Survey
Literature on OCR can be classified into two categories: that on segmentation and that on recognition. The most widely used method for segmentation is that of horizontal and vertical projection profiles [3] of an image. Kompalli et al. [4] have proposed a recognition-driven framework along with stochastic models, where a graph representation is used to segment characters; the paper also discusses the use of linguistics and language models. As for recognition, Neural Networks (NN) [5] [6] have been most widely used. Singh et al. [5] have proposed an OCR system based on an artificial NN, where the features include mean distance, a histogram of projection based on the spatial position of pixels, and a histogram of projection based on pixel value. [7] discusses topographic feature extraction, where the topographic features are closed regions, convexity of strokes, and straight-line strokes. These features are represented as a shape-based graph, which acts as an invariant feature set for efficiently discriminating very similar characters. [8] proposes a bilingual recognizer based on PCA, followed by support vector classification, for a Hindi-Telugu OCR. A good survey can be found in [9].
5 Architecture
Figure 2 shows the architecture for the implementation of the OCR system on CBE. This section is in three parts:
• The first part describes the exact steps and algorithms used in the OCR
• The second part describes the CBE architecture
• The third part shows how the OCR system described above is implemented on the CBE
5.1 Optical Character Recognition System (OCR)
An OCR system involves the following four main stages:
1. Pre-processing: Pre-processing involves threshold-value detection, the process of converting the input image into a binary image. It also includes filtering of the image, with modules to increase contrast, clean dust, and remove coffee-mug stains. Removal of serifs (horizontal lines glued to the lower end of vertical lines) and making barely connected characters fully connected are the other pre-processing steps.
Algorithm:
• Binarization of an image is done by taking a threshold value and converting all intensity values above the threshold to 1 and all intensity values below it to 0. This produces a 'black and white' image, which is the binarized image
• Blind pixels, i.e. dust, are removed using a fill algorithm
• Serifs are removed by classifying horizontal and vertical pixels (comparing the distances to the next vertical and next horizontal white pixels), measuring the mean thickness of the vertical and horizontal clusters, and erasing unusually thin horizontal pixels at the bottom line
• A 2x2 filter sets pixels to 0 near barely connected pixels, thereby making them connected
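The binarization step above can be sketched as follows. This is an illustrative pure-Python sketch, not the actual GOCR code, and the default threshold of 128 is an assumption (GOCR detects the threshold from the image):

```python
def binarize(image, threshold=128):
    """Binarize a grayscale image (a 2D list of 0-255 intensities):
    intensities above the threshold become 1, the rest become 0."""
    return [[1 if pixel > threshold else 0 for pixel in row]
            for row in image]

gray = [[200, 30, 210],
        [40, 220, 50]]
binary = binarize(gray)   # [[1, 0, 1], [0, 1, 0]]
```

The 'black and white' image produced here is the input to all the later stages.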
2. Segmentation: Segmentation is the process of splitting the pre-processed image, first into individual lines, then into the words of each line, and finally into the individual characters of each word. This step is required because these characters need to be recognized in a later step in order to print them in Unicode format.
Algorithm: The Horizontal Projection Profile (HPP) and Vertical Projection Profile (VPP) are the two most widely used algorithms for segmentation of a document image. After an image is binarized, each row of the image is summed to get its HPP. There will be a dip in the HPP wherever the image has white space between two lines, and this is taken as the reference for segmenting lines. A VPP is then taken on each line to find the inter-word spaces (and hence dips in the VPP), from which word segmentation is done. For the individual characters of a word, a connected-component analysis is done, in which every connected entity (each character in this case) is labelled as a separate object. Every part of the word below the line is output as a separate entity and sent to a different classifier.
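The HPP-based line segmentation described above can be sketched as follows. This is an illustrative sketch assuming ink pixels are 1 and a row sum of 0 marks the white space between lines; the real implementation would treat small non-zero dips as gaps too:

```python
def horizontal_projection(binary):
    """Sum each row of a binary image (2D list of 0/1, 1 = ink)."""
    return [sum(row) for row in binary]

def segment_lines(binary):
    """Split a binary image into text lines at rows whose projection
    dips to zero (the white space between consecutive lines)."""
    hpp = horizontal_projection(binary)
    lines, start = [], None
    for i, value in enumerate(hpp):
        if value > 0 and start is None:
            start = i                      # a text line begins here
        elif value == 0 and start is not None:
            lines.append(binary[start:i])  # the line ended at the dip
            start = None
    if start is not None:
        lines.append(binary[start:])       # image ends mid-line
    return lines
```

The same routine applied to the transposed image (columns instead of rows) yields the VPP and hence the word boundaries.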
3. Feature Extraction: This is the process of extracting specific characteristics of an image (in this case, a character), referred to as its "features". These features are used to distinguish one character from another during the recognition stage.
Algorithm: The features extracted for recognition are shape-based image invariants and other essential features of a character. The other essential features include the list of longest lines and bows, the number of points and the lines connecting these points to form a character, and the relative positions of these points.
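As a toy stand-in for these shape-based features, the sketch below computes two simple normalized quantities of a binary glyph: its ink density and its centroid as a fraction of the glyph's height and width. These are purely illustrative; the actual feature set (longest lines, bows, connection points) is considerably richer:

```python
def extract_features(glyph):
    """Compute two simple, illustrative features of a binary glyph
    (2D list of 0/1): ink density, and the mean ink position as a
    fraction of the glyph's height and width."""
    rows, cols = len(glyph), len(glyph[0])
    ink = [(r, c) for r in range(rows) for c in range(cols) if glyph[r][c]]
    density = len(ink) / (rows * cols)               # fraction of ink pixels
    cy = sum(r for r, _ in ink) / len(ink) / rows    # centroid row, 0..1
    cx = sum(c for _, c in ink) / len(ink) / cols    # centroid col, 0..1
    return (density, cy, cx)
```

Features like these, stored per training sample, form the feature database that the recognition stage compares against.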
4. Recognition: Training samples are processed initially, and a database of their features is created and stored. Recognition is the process in which each character is labelled. This is done by comparing the features of each 'test' image against the features of every 'trained' image in the database; the class of the image with the maximum match is reported as the match class.
Algorithm: Euclidean distance is used for matching, and a support vector machine (SVM) acts as the classifier.
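The Euclidean-distance matching step can be sketched as a nearest-neighbour lookup over the feature database. Note this shows only the distance-based matching, not the SVM; the labels and feature vectors below are hypothetical:

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(test_features, database):
    """Return the label of the stored sample whose feature vector is
    closest (smallest Euclidean distance) to the test features.
    `database` maps a class label to its stored feature vector."""
    return min(database, key=lambda label: euclidean(test_features,
                                                     database[label]))

db = {"ka": (0.4, 0.5, 0.5), "kha": (0.7, 0.4, 0.6)}
best = classify((0.42, 0.49, 0.52), db)   # nearest stored class
```

The class returned here is what the final stage maps to its Devanagari Unicode code point.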
5.2 Cell Broadband Engine (CBE)
The IBM Cell Broadband Engine has a multi-core architecture designed for high-performance and distributed multi-core computing. The architecture features nine microprocessors on a single chip. The multi-core chip, capable of massive floating-point processing, is optimized for compute-intensive workloads, rich broadband media applications, and signal and image processing.
Figure 1: CBE Architecture
Figure 1 shows the architecture of the Cell Broadband Engine. The Cell
processor can be split into four components:
• External input and output structures
• Power Processing Element (PPE)
• Synergistic Processing Elements (SPE)
• Element Interconnect Bus (EIB)
The Power Processing Element (PPE), the main processor, controls and coordinates the remaining eight co-processing Synergistic Processing Elements (SPEs). The SPEs are designed to accelerate media and streaming workloads; their area and power efficiency are important enablers for multi-core designs that take advantage of parallelism in applications. It is on these SPEs that the OCR runs, controlled by the PPE.
5.3 Overall system Architecture: Implementation of OCR on CBE
Figure 2: Architecture for implementation of OCR on CBE
Figure 2 shows the architecture for the implementation of the OCR system
on CBE.
1. Training and testing are the most important steps in an OCR system. Training is a one-time process in which character samples of the Devanagari script are taken and their features (obtained through a combination of PCA and image-invariant methods) are extracted. Being a one-time process, this step can be performed either on the CBE or on the local Ubuntu machine itself. The features so obtained are stored in a feature database.
2. During testing, an input (document) image is given to the CBE. The PPE instructs one of the SPEs to carry out the segmentation process. Since there are eight SPEs, multiple images can be segmented in the time usually taken to segment one image. Segmentation is done by the CBE, and this is the intermediate output.
3. The PPE gets the segmented words and instructs the SPE to perform feature extraction. The extracted features are the output of this step. The PPE stores them in its memory temporarily, which is required because multiple images are processed at a time.
4. The PPE now fetches the features of an individual image along with the features stored in memory. Comparison of the test features with the trained features happens across multiple SPEs. The match values obtained by all the SPEs are compared by the PPE, and the class matched with the least value (in the case of Euclidean distance) is reported as the match class.
5. The output given by the CBE is the matched class. This is used to write the corresponding character in Unicode format using the Devanagari fonts present in the system.
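The PPE's dispatch pattern in the steps above — one coordinator handing whole images to parallel workers and collecting the results — can be sketched in miniature. Real SPE code is written in C against IBM's Cell SDK; this Python sketch with a thread pool only mirrors the dispatch structure, and `ocr_pipeline` is a stand-in for the per-image work:

```python
from concurrent.futures import ThreadPoolExecutor

def ocr_pipeline(image_id):
    """Stand-in for one SPE's work on a single document image:
    segmentation, feature extraction, and matching (details omitted)."""
    return f"unicode text of image {image_id}"

def process_documents(image_ids, workers=6):
    """Mirror the PPE's role: hand each input image to one of the
    available worker 'SPEs' and collect the results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(ocr_pipeline, image_ids))

texts = process_documents([1, 2, 3])
```

The default of 6 workers matches the number of SPUs available to applications on the Sony PS3 used for testing.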
6 Implementation details
Performance comparison has been done with respect to two aspects:
• For different numbers of lines of input in a single document, the time taken to produce the OCR output using a single core has been compared against the time taken using multiple cores of the CBE. The number of lines has been varied from 5 to 70 in increments of 5, and the time taken on a single core has been compared against the time taken on multiple SPUs.
• For different numbers of input documents, the time taken to produce the OCR output using a single core has been compared against the time taken using multiple cores of the CBE. The number of input documents given simultaneously to the system has been varied from 1 to 6 (so that each image is processed on one SPU core; there are 6 SPUs in the Sony PS3 on which the system has been tested), and the time taken to process those documents on a single core has been compared with the time taken to process all of them simultaneously on multiple SPUs.
In order to obtain a fair comparison of the times, the comparison has to be done on systems with the same RAM capacity and processor speed. Hence, when reporting the time taken to run on a single core, only the PPU of the Cell processor has been used to run the OCR system. This represents serial execution: since there is only one PPU, a given number of inputs have to be processed one by one. When reporting the time taken to run on multiple cores, the PPU along with the 6 SPUs of the Sony PS3 system has been used. Here, the PPU takes in the input and creates multiple threads for the SPUs. Up to 6 input documents can be processed simultaneously on the 6 SPUs, and the time taken for such parallel execution has been reported.
7 Results

7.1 Recognition rate
The recognition rate obtained for the GOCR system modified for Devanagari is 87%. The recognition rate has been calculated as the ratio of correctly recognised words to the total number of words, measured over 50 documents.
7.2 Performance of the OCR system on CBE
Dependency of the time taken on the no. of lines of input
Table 1 shows the time taken to produce the OCR output using a single core against the time taken using multiple cores of the CBE, for different numbers of lines of input.
Table 1: Comparison of performance of Devanagari OCR on single core against multiple cores of the CBE

No. of Lines   Single Processor (in sec)   Multi Processor (in sec)
5              81                          74
10             182                         173
15             277                         251
20             365                         333
25             478                         435
30             560                         510
35             694                         630
40             1325                        1200
45             1480                        1340
50             1657                        1500
60             1950                        1640
70             2334                        2130
Figure 3: Plot of performance of Devanagari OCR on single core vs multi
core
Figure 3 shows the plot of the time taken on a single core and on multiple cores for different numbers of lines of input. It can be seen that as the number of lines increases, the multi-core CBE performs increasingly better than a single core; that is, the CBE takes much less time than a single-core processor as the input size grows.
Dependency of the time taken on the no. of input documents run serially vs in parallel
Table 2 shows the time taken to produce the OCR output on a single core against the time taken on multiple cores of the CBE, for different numbers of input documents.
Table 2: Comparison of no. of images vs time taken in serial mode against parallel mode

No. of Images   Serial (in sec)   Parallel (in sec)
1               96                95
2               191               194
3               287               266
4               382               368
5               478               435
6               575               523
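As a quick sanity check on Table 2, the serial-to-parallel time ratio of each row can be computed directly; a ratio above 1.0 means parallel mode finished first:

```python
# (images, serial seconds, parallel seconds), taken from Table 2
rows = [(1, 96, 95), (2, 191, 194), (3, 287, 266),
        (4, 382, 368), (5, 478, 435), (6, 575, 523)]

# serial/parallel ratio per row, rounded to two decimal places
speedups = {n: round(serial / parallel, 2) for n, serial, parallel in rows}
```

The ratio for 2 images dips slightly below 1.0, consistent with the table, while the larger batches show the parallel gain growing with the number of documents.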
Figure 4: Plot of no. of images vs time taken in serial mode against parallel
mode
Figure 4 shows the plot of the time taken on a single core and on multiple cores for different numbers of input documents. As the number of documents run in parallel increases, the CBE system performs better than a single processor in terms of the time taken.
8 Conclusion and future work
Devanagari OCR has been successfully implemented on a CBE. Since the OCR is implemented on a multicore architecture, its efficiency is better than on a single-core architecture. It was observed that the CBE gave better performance for large input sizes than for smaller ones.
In the current implementation, the OCR is able to recognize scanned images of printed text. More algorithms can be incorporated into the OCR, and the database can be trained accordingly, to work for handwritten documents and for different languages. The algorithms can also be optimized further to give a better recognition rate.
9 References
[1] J. Schulenburg, GOCR. http://jocr.sourceforge.net
[2] Michael Kistler, Michael Perrone, Fabrizio Petrini, "Cell Multiprocessor Communication Network: Built for Speed", IEEE Micro, May-June 2006.
[3] Manoj Kumar Shukla, Haider Banka, "An Efficient Segmentation Scheme for the Recognition of Printed Devanagari Script", IJCST, Vol. 2, Issue 4, Oct.-Dec. 2011.
[4] Suryaprakash Kompalli, Srirangaraj Setlur, Venu Govindaraju, "Devanagari OCR using a recognition driven segmentation framework and stochastic language models", IJDAR (2009), Vol. 12, pp. 123-138, DOI 10.1007/s10032-009-0086-8.
[5] Raghuraj Singh, C. S. Yadav, Prabhat Verma, Vibhash Yadav, "Optical Character Recognition (OCR) for Printed Devnagari Script Using Artificial Neural Network", International Journal of Computer Science Communication, Vol. 1, No. 1, January-June 2010, pp. 91-95.
[6] Anilkumar N. Holambe, Sushilkumar N. Holambe, Ravinder C. Thool, "Comparative Study of Devanagari Handwritten and Printed Character & Numerals Recognition using Nearest-Neighbor Classifiers", IEEE 2010, 978-1-4244-5540-9.
[7] Soumen Bag, Gaurav Harit, "Topographic Feature Extraction for Bengali and Hindi Character Images", Signal & Image Processing: An International Journal (SIPIJ), Vol. 2, No. 2, June 2011.
[8] C. V. Jawahar, M. N. S. S. K. Pavan Kumar, S. S. Ravi Kiran, "A Bilingual OCR for Hindi-Telugu Documents and its Applications", International Institute of Information Technology, Hyderabad, 2006.
[9] R. Jayadevan, Satish R. Kolhe, Pradeep M. Patil, Umapada Pal, "Offline Recognition of Devanagari Script: A Survey", IEEE Transactions on Systems, Man, and Cybernetics-Part C: Applications and Reviews, Vol. 41, No. 6, November 2011.
[10] U. Pal, T. Wakabayashi, F. Kimura, "Comparative study of Devanagari handwritten character recognition using different features and classifiers", in Proc. 10th Conf. Document Anal. Recognit., 2009, pp. 1111-1115.
[11] S. Arora, D. Bhattacharjee, M. Nasipuri, D. K. Basu, M. Kundu, L. Malik, "Study of different features on handwritten Devnagari character", in Proc. 2nd Emerging Trends Eng. Technol., 2009, pp. 929-933.