International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-
6367(Print), ISSN 0976 – 6375(Online) Volume 4, Issue 2, March – April (2013), © IAEME
SCRIPT IDENTIFICATION FROM PRINTED DOCUMENT IMAGES
USING STATISTICAL FEATURES
M. M. Kodabagi (1), S. R. Karjol (2)
(1) Department of Computer Science and Engineering,
Basaveshwar Engineering College, Bagalkot-587102, Karnataka, India
(2) Department of Computer Science and Engineering,
Basaveshwar Engineering College, Bagalkot-587102, Karnataka, India
ABSTRACT
Automatic identification of the script in a document image facilitates many important
applications, such as automatic archiving of multilingual documents, searching online archives of
document images, and selecting a script-specific OCR in a multilingual environment. In this work,
a technique for script identification from document images is proposed. The method uses the
vertical and horizontal run components/objects of the words in a single line of text to
distinguish three scripts used in India: Kannada, Hindi and English. First, the method segments
words from a selected line of text in the document image. Then, statistics of the horizontal and
vertical run objects are determined. Finally, a linear discriminant function is used to identify
the script of the document image as Kannada, Hindi or English. The method has been tested on 300
document images and was found to be robust and efficient. The proposed system achieves 93%
identification accuracy for Hindi script, 90% for English script and 86% for Kannada script.
International Journal of Computer Engineering & Technology (IJCET)
ISSN 0976 – 6367 (Print), ISSN 0976 – 6375 (Online)
Volume 4, Issue 2, March – April (2013), pp. 607-622
© IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com
1. INTRODUCTION
In recent years, the growing use of physical documents has spurred progress towards creating
electronic documents that are easy to communicate and store. However, physical documents are
still prevalent in most communications. The amount of electronic documents created and stored is
increasing rapidly with advances in computer technology. Such data include multilingual
documents. For example, museums store images of old, fragile documents in typically large
databases. These documents have scientific, historical or artistic value and can be written in
different scripts. Document analysis systems that help process these stored images are of
interest both for efficient archival and for providing access to researchers. Script
identification is a key step in document image analysis, especially when the environment is
multi-script and multilingual. An automatic script identification scheme is useful to (i) sort
document images, (ii) select appropriate script-specific OCRs and (iii) search online archives of
document images for those containing a particular script.
India is a multi-script, multilingual country, and hence most documents, including
official ones, may contain text printed in more than one script or language. For such
multi-script documents, it is necessary to determine the language type of the document before
applying a particular OCR to it. In this context, it is proposed to work on the prioritized
requirements of a particular region: Karnataka, a state in India.
In a multilingual country like India (India has 18 regional languages derived from 12
different scripts; a script can be a common medium for different languages), documents such as
bus reservation forms, passport application forms, examination question papers, bank challans,
language translation books and money-order forms may contain words in more than one language.
For such an environment, a multilingual OCR system is needed to read the multilingual documents.
To make a multilingual OCR system successful, it is necessary to separate the regions of
different languages in the document before feeding them to the individual OCR systems. In this
direction, multilingual document segmentation has strong direct application potential,
especially in a multilingual country like India. In the context of Indian languages, some
research work has been reported. Further, there is a growing demand for automatic document
processing in every state in India, including Karnataka. Under the three-language formula
adopted by most Indian states, a document in a state may be printed in the official regional
language of that state, in the national language Hindi, and in English. Accordingly, a document
produced in Karnataka may be printed in its official regional language Kannada, in Hindi and in
English.
Documents produced in Karnataka are therefore composed of text in Kannada (the regional
language), Hindi (the national language) and English. Such trilingual documents are found in the
majority of private and government sectors, railways, airlines, banks and post offices of
Karnataka. For automatic processing of such trilingual documents through the respective OCRs, a
pre-processor is necessary that can identify the language type of the text words. It is
therefore proposed to develop a model to identify the script of documents containing Kannada,
Hindi and English text.
Several essential factors need to be considered before choosing or designing a script
identification scheme for any multilingual application. These factors are: (a) complexity of
pre-processing, (b) complexity of feature extraction and classification, (c) computational speed
of the entire scheme, (d) sensitivity of the scheme to variation in the text of the document
(font style, font size and document skew), (e) performance of the scheme, and (f) range of
applications in which the scheme can be used. Performance of the scheme includes the accuracy
reported and the selection of testing data. Currently, individual approaches are designed to
deal effectively with some, but not all, of the factors listed above.
Some of the key challenges identified in script identification works [1-10], based on the
factors listed above, are document degradation, skew, and varying font size and font type. The
four most common types of document degradation are poor image resolution, salt-and-pepper noise,
Gaussian noise and physical document degradation. All of these degradations must be compensated
for before script identification. Skew refers to an image that is slanted too far in one
direction or is misaligned. Compensating for the dominant skew angle of an entire page image may
not be a sufficient adjustment to allow accurate script identification. When font size and font
type vary, the relative offsets are widely distributed, so it is difficult to estimate results
accurately from a limited range of font sizes. It is also difficult to classify documents
printed in unfamiliar font types; most of the images that are difficult for script
identification appear to be difficult because of their unfamiliar font types.
As noted in the reported works [1-10] on script identification, documents produced in
Karnataka are usually composed of text in Kannada, Hindi and English. Although a great amount of
work has been carried out on identifying these three languages, very few works perform script
identification by processing the document image at the word/line level. Based on a study of the
work on word-level identification of Kannada, English and Hindi, the present work generalizes
existing approaches to obtain more accurate script identification from document images. In
addition, processing at the word/line level reduces the number of computations.
Language identification is a vision application problem. Generally, the human visual
system identifies the language in a document using visible characteristic features such as
texture, horizontal lines and vertical lines, which are visually perceivable. This human visual
perception capability has motivated the development of the proposed system. In this context, an
attempt has been made to simulate the human visual system and identify the type of script from
visual clues, without reading the contents of the document. This motivated the development of a
technique for script identification of Kannada, Hindi and English from printed document images
used in Karnataka, with the aim of better recognition.
In this work, a technique for identifying Kannada, English and Hindi script from document
images is proposed. The method uses the vertical and horizontal run components/objects of the
words in a single line of text to identify the script of the document image. First, the method
segments words from a selected line of text in the document image. Then, statistics of the
horizontal and vertical run objects are determined. Finally, a linear discriminant function is
used to identify the script of the document image as Kannada, Hindi or English. The method has
been tested on 300 document images and was found to be robust and efficient. The proposed system
achieves 93% identification accuracy for Hindi script, 90% for English script and 86% for
Kannada script.
The rest of the paper is organized as follows. A detailed survey of script identification
from printed document images is given in Section 2. The proposed method is presented in
Section 3. The experimental results and discussion are given in Section 4. Section 5 concludes
the work and lists future directions.
2. RELATED WORKS
A substantial amount of research has addressed script identification from printed
document images. Some of the related works are summarized below.
A robust method for determining the script and language content of document images is
proposed in [1]. The algorithm determines connected components, locates upward concavities, and
then classifies the script into two broad classes: Han-based (Chinese, Japanese and Korean) and
Latin-based (English, French, German and Russian) languages. Rotation-invariant texture features
are extracted and used for automatic script identification in [2]. The method computes features
from text blocks using multi-channel Gabor filters and constructs a representative feature
vector; a Euclidean distance classifier is then used to identify the script of six languages
(Chinese, English, Greek, Russian, Persian and Malayalam). In [3], multiple-channel Gabor
filters and gray-level co-occurrence matrices (GLCMs) are used to extract texture features, and
a K-NN classifier is used to classify seven languages: Chinese, English, Greek, Korean,
Malayalam, Persian and Russian.
Cluster-based templates are used for automatic script identification from document
images in [4]. Texture features for script identification are evaluated in [5]. A method for
automatic identification of English, Chinese, Arabic, Devanagari and Bangla script lines is
discussed in [6]. A method for script and language identification in noisy and degraded
document images is presented in [7]. Script identification based on morphological
reconstruction in document images is described in [8]. A simple technique based on the
characteristic features of the top profile and bottom profile of individual text lines, for
identifying Kannada, Hindi and English text lines in a printed document, is proposed in [9].
Script identification at both the paragraph and word level using appearance-based models is
presented in [10].
A survey of script identification techniques for multi-script document images is carried out
in [11]. A two-stage approach for word-wise script identification of English (Roman), Devanagari
and Bengali (Bangla) scripts is proposed in [12]. Zone-based structural feature extraction is
employed in [13] to recognize four south Indian scripts, namely Kannada, Telugu, Tamil and
Malayalam, along with English and Hindi. The technique presented in [14] uses voting for script
identification in a trilingual document. The technique presented in [15] extracts features
consistent with human perception from the responses of a multi-channel log-Gabor filter bank,
designed at an optimal scale and multiple orientations, for script identification from Indian
documents.
A simple and efficient technique that uses the horizontal projection profile to identify the
script of Kannada, Hindi and English text lines in a printed document is presented in [16]. A
method for word-level script identification in scanned document images, in which a Gabor filter
is applied and 16 channels of features are extracted during both training and testing, is
evaluated in [17]. A multi-script identification technique that identifies different text lines
of Indian scripts in a document is presented in [18].
The method in [19] uses a texture-based approach with wavelet packet based features to
identify the script type of documents printed in seven scripts: Kannada, Tamil, Telugu,
Malayalam, Urdu, Hindi and English.
The technique proposed in [20] identifies the language in document images, discriminating
five major Indian languages (Hindi, Marathi, Sanskrit, Assamese and Bengali) that belong to the
Devanagari and Bangla scripts. In the current work, in contrast, horizontal and vertical run
objects determined from a text line of the document image are used to determine the script of
the document. The methodology is described in detail in the following section.
3. PROPOSED METHODOLOGY FOR SCRIPT IDENTIFICATION
The proposed methodology uses horizontal and vertical run objects to determine the script of
a document image containing Kannada, Hindi or English text. The methodology comprises image
acquisition followed by four phases: preprocessing, segmentation, feature extraction and linear
discriminant analysis. The block diagram of the proposed model is given in Figure 3a. Each
processing step is described in detail in the following subsections.
3.1 Image acquisition
The process begins with acquiring document images in the three scripts Kannada, Hindi and
English. The document images are scanned images downloaded from the internet. The document
images considered as input are free of skew and noise. About 300 sample images, i.e., 100
samples of each script, were collected.
Fig. 3a. Block diagram of the proposed model: input document image -> preprocessing
(binarization and bounding box) -> segmentation (line and word segmentation) -> feature
extraction (horizontal run objects and vertical run objects) -> linear discriminant analysis ->
identified script (Kannada/English/Hindi)
3.2 Preprocessing
In the preprocessing phase, the input text document images are binarized and a bounding
box is generated. Binarization converts the image into a binary (black-and-white) image in which
each pixel is represented by either 0 or 1. The bounding box is generated by applying
horizontal and vertical run objects. The purpose of this phase is to make the image easier to
handle during feature extraction and classification.
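As a rough illustration of this phase, the following Python sketch binarizes a grayscale image and computes the bounding box of the text pixels. The fixed threshold of 128, the convention that 1 marks dark (text) pixels, and the function names are assumptions made for illustration; the paper does not specify them, and the bounding box here is simply taken from the extent of the text pixels.

```python
def binarize(gray, threshold=128):
    """Convert rows of 0-255 intensities to 0/1 pixels.

    Assumption: pixels darker than `threshold` are text and become 1.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def bounding_box(binary):
    """Return (top, bottom, left, right), inclusive, of the text pixels.

    Returns None when the image contains no text pixel.
    """
    rows = [r for r, row in enumerate(binary) if any(row)]
    if not rows:
        return None
    cols = [c for c, col in enumerate(zip(*binary)) if any(col)]
    return rows[0], rows[-1], cols[0], cols[-1]


gray = [
    [255, 255, 255, 255],
    [255, 10, 20, 255],
    [255, 255, 30, 255],
    [255, 255, 255, 255],
]
binary = binarize(gray)
box = bounding_box(binary)
```

In practice the threshold would be chosen per image (for example with a global thresholding method) rather than fixed.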
3.3. Segmentation
In this phase, a single line is segmented from the document image, and a bounding box is
generated around the segmented line. From the selected line, the words are segmented, and
bounding boxes are generated around the segmented words. The segmentation of lines and words is
described below.
• Segmentation of line
The horizontal projection features are determined to segment a line from the document
image. A bounding box is generated around the segmented line. The line segmentation of Hindi,
English and Kannada script is shown in Figures 3b, 3c and 3d, respectively.
Fig. 3b, c, d. Sample images of segmented lines of Hindi, English and Kannada script
• Segmentation of words
The vertical projection features are determined to extract words from the selected lines.
The words are segmented using the boundary between two consecutive vertical projections.
Bounding boxes are then generated around the segmented words. The segmented words of
Figures 3b, 3c and 3d are given in Figures 3e, 3f and 3g, respectively.
Fig. 3e, f, g. Sample images of segmented words of Hindi, English and Kannada lines.
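The line and word segmentation described above can be sketched in Python with projection profiles. This is a minimal sketch, not the authors' implementation: the helper names and the `min_gap` parameter (the number of empty columns treated as a word boundary, so that narrow gaps between characters do not split a word) are assumptions introduced for illustration.

```python
def ink_segments(profile, min_gap=1):
    """Return inclusive (start, end) index pairs where profile > 0.

    Stretches separated by fewer than `min_gap` empty positions are merged.
    """
    segments, start, end, gap = [], None, None, 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            end, gap = i, 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                segments.append((start, end))
                start = None
    if start is not None:
        segments.append((start, end))
    return segments


def segment_lines(binary):
    """Split a page into text lines using the horizontal projection (row sums)."""
    return ink_segments([sum(row) for row in binary])


def segment_words(line, min_gap=3):
    """Split a text line into words using the vertical projection (column sums)."""
    return ink_segments([sum(col) for col in zip(*line)], min_gap)


page = [
    [0] * 12,
    [1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0],
    [1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0],
    [0] * 12,
    [0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
]
lines = segment_lines(page)
top, bottom = lines[0]
words = segment_words(page[top:bottom + 1])
```

Here the single empty column inside the first word is shorter than `min_gap`, so the word is kept whole, while the four-column gap separates the two words.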
3.4. Feature extraction
In this phase, the horizontal run objects and vertical run objects of each segmented text
word are determined.
Horizontal run object
In the binary image of each text word, a set of consecutive pixels in a row whose length
is greater than the threshold value (HT) forms a horizontal run object.
Vertical run object
In the binary image of each text word, a set of consecutive pixels in a column whose
length is greater than the threshold value (VT) forms a vertical run object. The numbers of
horizontal and vertical run objects are determined and stored in a feature vector Fv, as given
in equation (1).
$F_v = [N_H, N_V]$ (1)
where $F_v$ is the feature vector,
$N_H$ is the number of horizontal run objects, and
$N_V$ is the number of vertical run objects.
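A minimal Python sketch of this feature extraction is given below. It counts, for a binary word image, the runs of consecutive text pixels that are longer than the threshold in each row and in each column. The function names are hypothetical, and the default thresholds HT = 3 and VT = 5 are the values quoted in Section 3.5; treating 1 as the text pixel is an assumption.

```python
def count_runs(pixel_lines, threshold):
    """Count maximal runs of 1-pixels whose length is greater than `threshold`."""
    count = 0
    for line in pixel_lines:
        length = 0
        for px in list(line) + [0]:  # trailing 0 closes a run that reaches the edge
            if px:
                length += 1
            else:
                if length > threshold:
                    count += 1
                length = 0
    return count


def feature_vector(word, ht=3, vt=5):
    """Return (number of horizontal run objects, number of vertical run objects)."""
    horizontal = count_runs(word, ht)       # scan every row
    vertical = count_runs(zip(*word), vt)   # scan every column
    return horizontal, vertical


word = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 0, 1, 1, 1, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0],
]
fv = feature_vector(word)
```

In this toy word, the top and bottom rows each contain one run longer than 3 pixels, and only the first column contains a run longer than 5 pixels.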
3.5. Linear Discriminant Analysis
The discriminant analysis phase of the proposed model uses the horizontal and vertical
run-object features to classify the segmented words of the document image as Hindi, Kannada or
English script.
Condition 1:
If the length of any horizontal run object in a word is greater than half of the number of
columns (n2) in the word, the script of the word is identified as Hindi (HT is n2/2).
$\text{length of a horizontal run object} > n_2/2 \;\Rightarrow\; \text{word is Hindi script}$ (2)
Condition 2:
If the value of feature is greater the value of feature , then the script of the word
is identified as Kannada. (HT considered is 3 and VT considered is 5)
> = Word is Kannada script (3)
Condition 3:
Else if the value of feature is greater the value of feature then, the script of the
word is identified as English. (HT considered is 3 and VT considered is 5)
> = Word is English script (4)
After the script of each segmented word has been identified using the above conditions,
the script of the document image is classified as follows.
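The word-level rules in Conditions 1 to 3 can be sketched in Python as follows. This is a hedged sketch under stated assumptions, not the authors' code. The symbols in equations (3) and (4) did not survive in the text, so which of the two counts must be larger for Kannada is not recoverable here; the defaults below (more horizontal than vertical run objects gives Kannada, the reverse gives English) are an assumption exposed as parameters. Returning `None` on a tie is also an assumption, since the text does not cover that case.

```python
def run_lengths(line):
    """Lengths of the maximal runs of 1-pixels in one row or column."""
    lengths, current = [], 0
    for px in list(line) + [0]:  # trailing 0 closes a run at the edge
        if px:
            current += 1
        elif current:
            lengths.append(current)
            current = 0
    return lengths


def classify_word(word, ht=3, vt=5,
                  horizontal_dominant="Kannada", vertical_dominant="English"):
    """Label a binary word image as 'Hindi', 'Kannada', 'English' or None."""
    n_cols = len(word[0])
    h_runs = [n for row in word for n in run_lengths(row)]
    v_runs = [n for col in zip(*word) for n in run_lengths(col)]
    # Condition 1: a horizontal run longer than half the word width.
    if any(n > n_cols / 2 for n in h_runs):
        return "Hindi"
    n_h = sum(1 for n in h_runs if n > ht)
    n_v = sum(1 for n in v_runs if n > vt)
    # Conditions 2 and 3: ASSUMED direction of the comparison (see lead-in).
    if n_h > n_v:
        return horizontal_dominant
    if n_v > n_h:
        return vertical_dominant
    return None  # tie: not covered by the text


headline_word = [[1, 1, 1, 1, 1, 1], [0, 1, 0, 0, 1, 0], [0, 1, 0, 0, 1, 0]]
stem_word = [[1, 0, 0, 0, 0, 0, 0, 1] for _ in range(6)]
bar_word = [[1, 1, 1, 1, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]]
labels = [classify_word(w) for w in (headline_word, stem_word, bar_word)]
```

Swapping the two keyword arguments reverses the assumed direction without changing the rest of the logic.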
Condition 4:
If, in the selected line of the document image, the number of words identified as Hindi
script by equation (2) is greater than the total number of words in the selected line, the
script of the document image is identified as Hindi.
Condition 5:
If the document image is not Hindi script, and the number of words in the selected line
identified as Kannada script by equation (3) is greater than or equal to the number of words
identified as English script by equation (4), the script of the document image is identified as
Kannada.
Condition 6:
Otherwise, the document image is not Kannada script, which means that the number of words
in the selected line identified as English script by equation (4) is greater than the number
identified as Kannada script by equation (3). Hence, the script of the document image is
identified as English.
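The document-level decision in Conditions 4 to 6 amounts to a vote over the word labels of the selected line, which can be sketched in Python as below. Condition 4 as printed compares the Hindi word count with the total number of words, which a count can never exceed; this sketch therefore assumes a majority threshold (`hindi_fraction=0.5`), which is an interpretation and not something the text states. The function name is hypothetical.

```python
from collections import Counter


def classify_line(word_labels, hindi_fraction=0.5):
    """Decide the script of a document from the labels of one line's words.

    `hindi_fraction` is an ASSUMED threshold for Condition 4.
    """
    counts = Counter(word_labels)
    # Condition 4 (assumed majority rule).
    if counts["Hindi"] > hindi_fraction * len(word_labels):
        return "Hindi"
    # Condition 5: Kannada wins ties against English.
    if counts["Kannada"] >= counts["English"]:
        return "Kannada"
    # Condition 6: more English words than Kannada words.
    return "English"


decisions = [
    classify_line(["Hindi", "Hindi", "English"]),
    classify_line(["Kannada", "English", "Kannada", "Hindi"]),
    classify_line(["English", "English", "Kannada"]),
    classify_line(["Kannada", "English"]),
]
```

Note that, following Condition 5, a line with equal Kannada and English counts is labeled Kannada.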
4. EXPERIMENTAL RESULTS AND DISCUSSION
For the purpose of experimentation, we created our own database of document images. The
document images are scanned images downloaded from the internet. The document images considered
as input are free of skew and noise. About 300 sample images, i.e., 100 samples of each script,
were collected, and the proposed methodology has been tested on these document images containing
Kannada, Hindi and English script. Horizontal and vertical run objects are used for feature
extraction, and linear discriminant analysis is then carried out to identify the script of the
document image as Hindi, Kannada or English. Documents with different font sizes have been
considered. Exhaustive experiments were carried out to analyze the performance of the system on
different image patterns.
4.1. An Experimental Analysis for a Sample Hindi Document Image.
Fig. 4a. Sample Input Document Image
Figure 4a shows a sample input document image. The input document image is binarized and
its bounding box is generated. A line is then segmented from the document image; the segmented
line is shown in Figure 4b.
Fig. 4b. Segmented Line from Input Image
After the line is segmented, the words are segmented from the selected line. The
segmented words from the line in Figure 4b are given in Figure 4c.
Fig. 4c. Segmented words
Feature extraction and linear discriminant analysis are then carried out, and the
document image is finally identified as Hindi script. Figure 4d shows the displayed result.
Fig. 4d. Dialog box
4.2. An Experimental Analysis for a Sample English Document Image.
Example 2: English sample
Fig. 4e. Sample English Input document image
Figure 4e shows the original English document image. After the image is binarized and its
bounding box is generated, a line is segmented from the document image. The segmented line is
shown in Figure 4f.
Fig. 4f. Segmented line
After the line is segmented, the words are segmented from the selected line. The
segmented words are given in Figure 4g.
Fig. 4g. Segmented words
Feature extraction and linear discriminant analysis are then carried out, and the
document image is finally classified as English script. Figure 4h shows the displayed result.
Fig. 4h. Dialog box
4.3 System Performance Analysis
The overall performance of the system for script identification from printed document
images is shown in Table 1.
Table 1: Overall System Performance
Tested script     Number of document images   Word-wise classification rate   Line-wise classification rate
Hindi script      100                         94% (987/1053)                  93%
Kannada script    100                         78% (496/636)                   86%
English script    100                         83% (781/936)                   90%
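The word-wise rates in Table 1 follow directly from the word counts given in parentheses, as the short Python check below shows. The pooled figure over all three scripts is derived here from those counts and is not reported in the paper.

```python
# (correctly classified words, total words) per script, taken from Table 1.
word_counts = {
    "Hindi": (987, 1053),
    "Kannada": (496, 636),
    "English": (781, 936),
}

# Word-wise classification rate per script, rounded to the nearest percent.
rates = {script: round(100 * correct / total)
         for script, (correct, total) in word_counts.items()}

# Pooled rate over all words of all scripts (derived, not from the paper).
total_correct = sum(correct for correct, _ in word_counts.values())
total_words = sum(total for _, total in word_counts.values())
pooled_rate = round(100 * total_correct / total_words)

print(rates, pooled_rate)
```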
4.4 An Experimental Analysis dealing with various issues
The proposed methodology has been evaluated on various issues such as variation in font
size and style, color, noise, and varying spacing between words. The results of the experiments
are given below.
Example 1: Sample noisy document image.
Fig. 4i. Input document image
Fig. 4j. Segmented line
Fig 4k. Extracted words
Fig. 4l. Dialog box
Example 2: Sample image with smaller font size
Fig. 4m. Input document image
Fig. 4n. Segmented line
Fig. 4o. Segmented words
Fig. 4p. Dialog box
5. CONCLUSION
In this work, line-wise and word-wise identification models for identifying Kannada,
Hindi and English text words in Indian multilingual machine-printed documents have been
presented. The proposed model is based on visual discriminating features, which serve as useful
visual clues for script identification. Horizontal and vertical run objects are used for feature
extraction, and a linear discriminant function is used to identify the script of the document
image as Kannada, Hindi or English. The method helps to identify and separate the Kannada,
English and Hindi portions of a document. The experimental results show that the method is
effective enough to identify and separate the three language portions of the document, which in
turn helps to feed the individual language regions to a specific OCR system. The method has been
tested on 300 document images and was found to be robust and efficient. The proposed system
achieves 93% identification accuracy for Hindi script, 90% for English script and 86% for
Kannada script. The proposed system can also be extended to identify other Indian languages and
foreign languages.
REFERENCES
[1] A. L. Spitz, 1997, “Determination of script and language content of document
images”, IEEE Trans. On Pattern Analysis and Machine Intelligence, Vol. 19, No.3,
pp. 235–245, 1997.
[2] T. N. Tan, 1998, “Rotation Invariant Texture Features and their use in Automatic Script
Identification”, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 7,
pp. 751-756, 1998.
[3] G. S. Peake and T. N. Tan, 1997, “Script and Language Identification from
Document Images”, Proc.Workshop Document Image Analysis, vol. 1, pp. 10-17,
1997.
[4] J. Hochberg, P. Kelly, T. Thomas, L. Kerns, 1997 “Automatic Script Identification
from Document Images using Cluster–based Templates”, IEEE Transaction on
Pattern Analysis and Machine Intelligence, pp. 176-181, 1997.
[5] Andrew Busch, Wageeh W. Boles and Sridha Sridharan, 2005, “Texture for Script
Identification”, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.7,
NO. 11, pp. 1720-1732 November 2005.
[6] U. Pal, B. B. Chaudhuri, 2001, “Automatic Identification of English, Chinese, Arabic,
Devanagari and Bangla Script Line”, Proc. 6th International Conference on Document
Analysis and Recognition, pp. 790-794, 2001.
[7] Lu Shijian and Chew Lim Tan, 2008, “Script and Language Identification in Noisy
and Degraded Document Images”, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 30, no. 1, January 2008.
[8] B.V.Dhandra, H.Mallikarjun, Ravindra Hegadi, V.S.Malemath, 2006, “Word- wise
Script Identification from Bilingual Documents Based on Morphological
Reconstruction” Digital Information Management, 2006 1st International Conference,
pp. 389 – 394, December 2006.
[9] M. C. Padma, Dr P. A Vijaya, 2008“Language identification of Kannada, Hindi and
English Text Words Through Visual Discrimination Features”, International Journal
of Computational Intelligence Systems, Vol.1, No. 2 (May, 2008), 116– 126.
[10] T. N. Vikram and D. S. Guru, 2006, “Appearance based models in document script
identification”, International School of Information Management and Department of
Studies in Computer Science, University of Mysore, Manasagangotri, Mysore, India.
[11] S. Abirami, Dr. D. Manjula, 2009, “A Survey of Script Identification Techniques for
Multi-Script Document Images”, International Journal of Recent Trends in
Engineering, Vol. No.2, May 2009.
[12] Sukalpa Chanda, Srikanta Pal, Katrin Franke, Umapada Pal, 2009, “Two-stage
Approach for Word-wise Script Identification”, IEEE 10th International Conference on
Document Analysis and Recognition (ICDAR), pp.926-930,2009.
[13] Rajesh Gopakumar, N V Subbareddy, Krishnamoorthi Makkithaya, U Dinesh
Acharya,2010, “Zone-based Structural feature extraction for Script Identification from
Indian Documents”, 5th International Conference on Industrial and Information
Systems, pp. 420-425, Jul 29 - Aug 01, 2010.
[14] M. C Padma and P. A Vijaya, 2010, “Script Identification of Text Words from a Tri
Lingual Document using Voting Technique” International Journal of Image
Processing, Volume (4): Issue (1). pp. 35-52. 2010.
[15] Gopal Datt Joshi, Saurabh Garg and Jayanthi Sivaswamy, 2006, “Script Identification
from Indian Documents”, In, proceedings of seventh IAPR workshop on
Document Analysis System, New Zealand, pp-255-267, 2006.
[16] Prakash K. Aithal, Rajesh G., Dinesh U. Acharya, Krishnamoorthi M., Subbareddy N.
V., 2010, “Text Line Script Identification for a Tri-lingual Document”, IEEE 2010
Second International Conference on Computing, Communication and Networking
Technologies, pp. 1-3, 2010.
[17] Huanfeng Ma and David Doermann, 2004, “Word level Script Identification for
scanned document images”, In SPIE Conference Document Recognition and
Retrieval (San Jose, CA), in press, 2004.
[18] U. Pal, S. Sinha, B. B. Chaudhuri, 2003, “Multi-Script Line Identification from Indian
Documents”, Proc. 7th International Conference on Document Analysis and
Recognition (ICDAR 2003), vol. 2, pp. 880-884, 2003.
[19] M. C Padma and P. A Vijaya ,2010, “Global Approach for script identification using
Wavelet Packet Based Features” International Journal of Signal Processing, Image
processing and Pattern Recognition Vol. 3, No. 3 September, 2010.
[20] Mallikarjun Hangarge and B.V.Dhandra, 2008, “Shape and Morphological
Transformation based Features for Language Identification in Indian Document
Images” First International Conference on Emerging Trends in Engineering and
Technology (IEEE Comput. Soc. Press), pp. 1175-1180, July 2008.
[21] M. M. Kodabagi, S. A. Angadi and Chetana. R. Shivanagi, “Character Recognition of
Kannada Text In Scene Images Using Neural Network”, International Journal Of
Graphics And Multimedia (IJGM), Volume 4, Issue 1, 2013, pp. 9 - 19, ISSN Print:
0976 – 6448, ISSN Online : 0976 –6456.
[22] Gunjan Singh, Avinash Pokhriyal and Sushma Lehri, “Fuzzy Rule Based Classification
and Recognition of Handwritten Hindi Curve Script”, International journal of Computer
Engineering & Technology (IJCET), Volume 4, Issue 1, 2013, pp. 337 - 357, ISSN Print:
0976 – 6367, ISSN Online: 0976 – 6375.

Script identification from printed document images using statistical

International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 2, March – April (2013), pp. 607-622, © IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com

SCRIPT IDENTIFICATION FROM PRINTED DOCUMENT IMAGES USING STATISTICAL FEATURES

M. M. Kodabagi¹, S. R. Karjol²
¹ Department of Computer Science and Engineering, Basaveshwar Engineering College, Bagalkot-587102, Karnataka, India
² Department of Computer Science and Engineering, Basaveshwar Engineering College, Bagalkot-587102, Karnataka, India

ABSTRACT

Automatic identification of the script in a document image facilitates many important applications, such as automatic archiving of multilingual documents, searching online archives of document images, and selecting a script-specific OCR in a multilingual environment. In this work a technique for script identification from document images is proposed. The method uses vertical and horizontal run components/objects of the words of a single line of text to distinguish 3 scripts: Kannada, Hindi and English. Initially, the method segments words from the selected line of text in a document image. Then statistics of horizontal and vertical run objects are determined. Further, a linear discriminant function is used to identify the script of the document image as Kannada, Hindi or English. The method has been tested on 300 document images and was found to be robust and efficient. The proposed system achieves 93% identification accuracy for Hindi script, 90% for English script and 86% for Kannada script.

1. INTRODUCTION

In recent years, the escalating use of physical documents has driven progress towards the creation of electronic documents, which make documents easier to communicate and store. However, physical documents are still prevalent in most communication. The creation and storage of electronic documents is increasing rapidly with advances in computer technology. Such data include multi-lingual documents. For example, museums store images of old, fragile documents in typically large databases. These documents have scientific, historical or artistic value and can be written in different scripts. Document analysis systems that help process these stored images are of interest both for efficient archival and for providing access to researchers.

Script identification is a key step in document image analysis, especially when the environment is multi-script and multi-lingual. An automatic script identification scheme is useful to (i) sort document images, (ii) select appropriate script-specific OCRs and (iii) search online archives of document images for those containing a particular script. India is a multi-script, multi-lingual country, and hence most documents, including official ones, may contain text printed in more than one script/language. For such multi-script documents, it is necessary to determine the language type of the document before applying a particular OCR to it. In this context, it is proposed to work on the prioritized requirements of a particular region: Karnataka, a state in India.

In a multi-lingual country like India (India has 18 regional languages derived from 12 different scripts; a script can be a common medium for different languages), documents such as bus reservation forms, passport application forms, examination question papers, bank challans, language translation books and money-order forms may contain words in more than one language. For such an environment, a multi-lingual OCR system is needed to read the multilingual documents.
To make a multi-lingual OCR system successful, it is necessary to separate the regions of the document written in different languages before feeding them to individual OCR systems. Multi-lingual document segmentation therefore has strong direct application potential, especially in a multilingual country like India. Some research work has been reported in the context of Indian languages, and there is a growing demand for automatic document processing in every Indian state, including Karnataka.

Under the three-language formula adopted by most Indian states, a document may be printed in the official regional language of the state, in the national language Hindi, and in English. Accordingly, documents produced in Karnataka are composed of text in Kannada (the regional language), Hindi and English. Such trilingual documents are found in the majority of private and Government sectors, railways, airlines, banks and post offices of Karnataka state. For automatic processing of such tri-lingual documents through the respective OCRs, a pre-processor is necessary that can identify the language type of the text words. So, it is proposed to develop a model to identify the script of documents containing Kannada, Hindi and English text.

Some essential factors need to be considered before choosing or designing a script identification scheme for any multi-lingual application.
These factors are: (a) complexity of pre-processing, (b) complexity of feature extraction and classification, (c) computational speed of the entire scheme, (d) sensitivity of the scheme to variation in the document text (font style, font size and document skew), (e) performance of the scheme, and (f) range of applications in which the scheme could be used. Performance of the scheme includes the accuracy reported and the selection of testing data. Currently, individual approaches are designed so that they deal effectively with some, but not all, of the factors listed above.

Some of the key challenges identified in script identification works [1-10] are document degradation, skew, and varying font size and font type. The four most common types of document degradation are poor image resolution, salt and pepper noise, Gaussian noise and physical document degradation. All of these must be compensated for before script identification. Skew refers to an image that slants too far in one direction or is misaligned. Compensating for the dominant skew angle of an entire page image may not be a sufficient adjustment to allow accurate script identification. With varying font sizes and font types, the relative offsets are spread out, so it is difficult to estimate results accurately from a limited range of font sizes. It is also difficult to classify documents printed in unfamiliar font types; much of the difficulty in script identification appears to stem from unfamiliar font types.

According to the reported works [1-10], the documents produced in Karnataka are usually composed of text in Kannada, Hindi and English. Though a great amount of work has been carried out on identification of the three languages Kannada, Hindi and English, very few works address script identification by processing the document image at word/line level.
By analysing the work carried out on word-level identification of Kannada, English and Hindi, a generalisation of existing work with more accurate results for script identification from document images has been carried out. Also, processing at word/line level reduces the number of computations. Language identification is one of the vision application problems. Generally, humans identify the language in a document using visible characteristic features such as texture, horizontal lines and vertical lines, which are visually perceivable. This human visual perception capability has been the motivation for the development of the proposed system: an attempt has been made to simulate the human visual system and identify the type of script based on visual clues, without reading the contents of the document. This motivated the development of a technique for script identification of Kannada, Hindi and English from printed document images used in Karnataka, with the aim of better recognition.

In this work a technique for script identification of Kannada, English and Hindi from document images is proposed. The method uses vertical and horizontal run components/objects of the words of a single line of text to identify the script of the document image, and distinguishes 3 scripts: Kannada, Hindi and English. Initially, the method segments words from the selected line of text in a document image. Then statistics of horizontal and vertical run objects are determined. Further, a linear discriminant function is used to identify the script of the document image as Kannada, Hindi or English. The method has been tested on 300 document images and was found to be robust and efficient.
The proposed system achieves 93% identification accuracy for Hindi script, 90% identification accuracy for English script and 86% identification accuracy for Kannada script. The literature survey related to current work is summarized in the following section.
The rest of the paper is organized as follows: a detailed survey of script identification from printed document images is given in Section 2. The proposed method is presented in Section 3. The experimental results and discussion are given in Section 4. Section 5 concludes the work and lists future directions.

2. RELATED WORKS

A substantial amount of research has gone into script identification from printed document images. Some of the related works are summarized below.

A robust method for determining the script and language content of document images is proposed in [1]. The algorithm determines connected components, locates upward concavities and then classifies the script into two broad classes: Han-based (Chinese, Japanese and Korean) and Latin-based (English, French, German and Russian) languages. Rotation-invariant texture features are extracted and used for automatic script identification in [2]. The method computes features from text blocks using multi-channel Gabor filters and constructs a representative feature vector; a Euclidean distance classifier is then used to identify the script of 6 languages (Chinese, English, Greek, Russian, Persian and Malayalam). In [3], script and language identification from document images uses multiple-channel Gabor filters and gray level co-occurrence matrices (GLCMs) to extract texture features, and a K-NN classifier to classify seven languages: Chinese, English, Greek, Korean, Malayalam, Persian and Russian. Cluster-based templates are used for automatic script identification from document images in [4]. Texture features for script identification are evaluated in [5].
A method for automatic identification of English, Chinese, Arabic, Devnagari and Bangla script lines is discussed in [6]. A method for script and language identification in noisy and degraded document images is presented in [7]. Script identification based on morphological reconstruction in document images is described in [8]. A simple technique based on the characteristic features of the top-profile and bottom-profile of individual text lines, for identifying Kannada, Hindi and English text lines in a printed document, is proposed in [9]. Script identification at both paragraph and word level using appearance-based models is presented in [10]. A survey of script identification techniques for multi-script document images is carried out in [11]. A two-stage approach for word-wise script identification of English (Roman), Devnagari and Bengali (Bangla) scripts is proposed in [12]. Zone-based structural feature extraction is employed in [13] to recognize four south Indian scripts, namely Kannada, Telugu, Tamil and Malayalam, along with English and Hindi. The technique presented in [14] uses a voting technique for script identification in a tri-lingual document. The technique presented in [15] extracts features consistent with human perception from the responses of a multi-channel log-Gabor filter bank, designed at an optimal scale and multiple orientations, for script identification from Indian documents. A simple and efficient technique for script identification of Kannada, Hindi and English text lines from a printed document using the horizontal projection profile is presented in [16]. A method for word-level script identification in scanned document images, in which a Gabor filter is applied and 16 channels of features are extracted during both training and testing, is evaluated in [17]. In [18], a multi-script identification technique for Indian languages identifies the different text lines of Indian scripts in a document. The method in [19] uses a texture-based approach with wavelet packet based features to identify the script type of documents printed in seven scripts: Kannada, Tamil, Telugu, Malayalam, Urdu, Hindi and English. A technique is proposed in [20] for language identification in document images that discriminates five major Indian languages, Hindi, Marathi, Sanskrit, Assamese and Bengali, which belong to the Devnagari and Bangla scripts.

In the current work, by contrast, horizontal and vertical run objects determined from a text line of the document image are used to determine the script of the document. The detailed description of the methodology is given in the following section.

3. PROPOSED METHODOLOGY FOR SCRIPT IDENTIFICATION

The proposed methodology uses horizontal and vertical run objects to determine the script of a document image containing Kannada, Hindi or English text. The methodology comprises five phases: Image Acquisition, Preprocessing, Segmentation, Feature Extraction and Linear Discriminant Analysis. The block diagram of the proposed model is given in Figure 3a. The detailed description of each processing step is presented in the following subsections.

Fig. 3a. Block Diagram of Proposed Model: Input document image → Preprocessing (Binarization and Bounding Box) → Segmentation (Line and Words segmentation) → Feature Extraction (Horizontal run objects and Vertical run objects) → Linear Discriminant Analysis → Identified script as Kannada/English/Hindi

3.1 Image acquisition

The process begins with acquiring document images of the three scripts Kannada, Hindi and English. The document images are scanned images downloaded from the internet. The document images considered as input are skew free and noise free. About 300 sample images, i.e., 100 samples of each script, are collected.
3.2 Preprocessing

In the preprocessing phase, the input text document images are binarized and a bounding box is generated. Binarization converts the image into a binary (black and white) image in which each pixel is represented by either 0 or 1. The bounding box is generated by applying horizontal and vertical run objects. The purpose of this phase is to make the image easier to handle in feature extraction and classification.

3.3. Segmentation

In this phase a single line is segmented from the document image, and a bounding box is generated around the segmented line. From the selected line, the words are segmented and a bounding box is generated for each segmented word. The segmentation of lines and words is described below.

• Segmentation of line
The horizontal projection features are determined to segment a line from the document image. A bounding box is generated for the segmented line. The line segmentation is shown in Figure 3b for Hindi script, Figure 3c for English script and Figure 3d for Kannada script.

Fig. 3b, c, d. Sample Images of segmented lines of Hindi, English and Kannada script

• Segmentation of words
The vertical projection features are determined to extract words from the selected line. The words are segmented at the boundaries between two consecutive vertical projections. Then bounding boxes are generated for the segmented words. The segmented words of the lines in Figures 3b, 3c and 3d are given in Figures 3e, 3f and 3g respectively.
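The preprocessing and segmentation steps above can be sketched with NumPy projection profiles. This is a minimal illustration under stated assumptions, not the authors' implementation: the fixed binarization threshold of 128, the `min_gap` parameter for merging character-level gaps into words, and all function names are hypothetical choices made for the example.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Map a grayscale image to {0, 1}; 1 marks dark (text) pixels.
    The fixed threshold is an assumption made for this sketch."""
    return (np.asarray(gray) < threshold).astype(np.uint8)

def runs_of_true(mask):
    """Return (start, end) pairs, end exclusive, for each run of True values."""
    padded = np.concatenate(([False], np.asarray(mask, dtype=bool), [False]))
    change = np.flatnonzero(padded[1:] != padded[:-1])
    return list(zip(change[0::2].tolist(), change[1::2].tolist()))

def segment_lines(binary):
    """Split a page into text lines using the horizontal projection profile."""
    profile = binary.sum(axis=1)          # text pixels per row
    return runs_of_true(profile > 0)      # row ranges that contain text

def segment_words(line, min_gap=3):
    """Split a line into words using the vertical projection profile.
    Gaps narrower than min_gap columns are treated as gaps between
    characters of the same word (min_gap is an assumed parameter)."""
    profile = line.sum(axis=0)            # text pixels per column
    words = []
    for start, end in runs_of_true(profile > 0):
        if words and start - words[-1][1] < min_gap:
            words[-1] = (words[-1][0], end)   # small gap: extend current word
        else:
            words.append((start, end))
    return words

# Synthetic page: two lines; the first holds two words.
page = np.full((12, 20), 255)
page[2:5, 1:4] = 0      # first glyph of word 1
page[2:5, 5:8] = 0      # second glyph of word 1 (1-column gap)
page[2:5, 12:16] = 0    # word 2 (4-column gap)
page[8:10, 2:6] = 0     # second line
binary = binarize(page)
lines = segment_lines(binary)
top, bottom = lines[0]
words = segment_words(binary[top:bottom])
```

Each `(start, end)` pair, together with the row range of its line, gives the bounding box of a segmented word.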
Fig. 3e, f, g. Sample Images of Segmented words of Hindi, English and Kannada lines.

3.4. Feature extraction

In this phase, the horizontal run objects and vertical run objects of each segmented text word are determined.

Horizontal run object
In the binary image of a text word, a set of consecutive pixels in a row whose length is greater than the threshold value (HT) forms a horizontal run object.

Vertical run object
In the binary image of a text word, a set of consecutive pixels in a column whose length is greater than the threshold value (VT) forms a vertical run object.

The numbers of horizontal and vertical run objects are determined and stored in a feature vector $F_v$ as given in equation (1).

$F_v = [N_h, N_v]$ (1)

where
$F_v$ is the feature vector,
$N_h$ is the number of horizontal run objects,
$N_v$ is the number of vertical run objects.
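Counting run objects as defined above can be sketched as follows. This is an illustrative sketch, not the authors' code: the function names and the names `n_h` and `n_v` for the two counts are hypothetical, text pixels are assumed to be 1, and the default thresholds 3 and 5 are the HT and VT values quoted in Section 3.5.

```python
import numpy as np

def count_runs(binary_word, min_len, axis):
    """Count runs of consecutive 1-pixels whose length exceeds min_len.
    axis=1 scans along rows (horizontal runs); axis=0 scans along columns."""
    img = np.asarray(binary_word, dtype=np.int8)
    if axis == 0:
        img = img.T                      # scan columns as if they were rows
    count = 0
    for row in img:
        padded = np.concatenate(([0], row, [0]))
        edges = np.flatnonzero(np.diff(padded))   # alternating run starts/ends
        lengths = edges[1::2] - edges[0::2]
        count += int(np.sum(lengths > min_len))
    return count

def feature_vector(binary_word, ht=3, vt=5):
    """Return (n_h, n_v): counts of horizontal and vertical run objects."""
    n_h = count_runs(binary_word, ht, axis=1)
    n_v = count_runs(binary_word, vt, axis=0)
    return n_h, n_v

# Toy word image: one long top stroke and two vertical strokes.
word = np.zeros((8, 10), dtype=np.uint8)
word[0, :] = 1        # horizontal run of length 10
word[0:7, 2] = 1      # vertical run of length 7
word[0:6, 7] = 1      # vertical run of length 6
fv = feature_vector(word)
```

On the toy image, only the top stroke exceeds HT in any row, and only the two vertical strokes exceed VT in any column.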
3.5. Linear Discriminant Analysis

The discriminant analysis phase of the proposed model uses the $N_h$ and $N_v$ features to classify the segmented words of the document image as Hindi, Kannada or English script.

Condition 1: If the length $\ell_h$ of one of the horizontal run objects in a word is greater than half the number of columns ($n_2$) in the word, then the script of the word is identified as Hindi. (HT is $n_2/2$.)

$\ell_h > n_2/2 \Rightarrow$ word is Hindi script (2)

Condition 2: If the value of feature $N_h$ is greater than the value of feature $N_v$, then the script of the word is identified as Kannada. (HT considered is 3 and VT considered is 5.)

$N_h > N_v \Rightarrow$ word is Kannada script (3)

Condition 3: Else, if the value of feature $N_v$ is greater than the value of feature $N_h$, then the script of the word is identified as English. (HT considered is 3 and VT considered is 5.)

$N_v > N_h \Rightarrow$ word is English script (4)

After the script of each segmented word has been identified, the script of the document image is classified on the basis of the above conditions.

Condition 4: If, in the selected line of the document image, the number of words identified as Hindi script by equation (2) is greater than half of the total number of words in the line, then the script of the document image is identified as Hindi.

Condition 5: If the document image is not Hindi script, and the number of words in the selected line identified as Kannada script by equation (3) is greater than or equal to the number identified as English script by equation (4), then the script of the document image is identified as Kannada.

Condition 6: Else, the document image is not Kannada script, which means that the words in the selected line identified as English script by equation (4) outnumber the words identified as Kannada script by equation (3). Hence, the script of the document image is identified as English.
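The six conditions can be sketched as two small decision functions, one per word and one per line. This is a hedged illustration rather than the authors' implementation. Several points are assumptions: the extracted text does not preserve which run-object count dominates for Kannada versus English, so the sketch assumes horizontal greater than vertical for Kannada and the reverse for English; Condition 4 is read as a majority vote (more than half of the words); and a word whose two counts tie is left unlabelled, since the conditions do not cover that case. All names are hypothetical.

```python
from collections import Counter

def classify_word(n_h, n_v, longest_h_run, n_cols):
    """Label one word from its run-object statistics.
    n_h, n_v: counts of horizontal/vertical run objects (HT=3, VT=5).
    longest_h_run: length of the longest horizontal run in the word.
    n_cols: width of the word image in columns."""
    if longest_h_run > n_cols / 2:       # Condition 1: headline-like stroke
        return "Hindi"
    if n_h > n_v:                        # Condition 2 (direction assumed)
        return "Kannada"
    if n_v > n_h:                        # Condition 3 (direction assumed)
        return "English"
    return None                          # tie: not covered by the conditions

def classify_line(word_labels):
    """Label the document from the word labels of the selected line."""
    votes = Counter(word_labels)
    total = len(word_labels)
    if votes["Hindi"] > total / 2:       # Condition 4, read as a majority
        return "Hindi"
    if votes["Kannada"] >= votes["English"]:   # Condition 5: ties go to Kannada
        return "Kannada"
    return "English"                     # Condition 6

labels = [
    classify_word(4, 9, 30, 40),   # long horizontal run: Hindi
    classify_word(7, 3, 5, 40),    # more horizontal runs: Kannada
    classify_word(2, 6, 5, 40),    # more vertical runs: English
]
```

Checking Condition 1 before the count comparison mirrors the order of the conditions in the text, so a word with a long headline stroke is labelled Hindi regardless of its run counts.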
4. EXPERIMENTAL RESULTS AND DISCUSSION

For the purpose of experimentation we created our own database of document images. The document images are scanned images downloaded from the internet. The document images considered as input are skew free and noise free. About 300 sample images, i.e., 100 samples of each script, were collected. The proposed methodology has been tested on these 300 document images containing Kannada, Hindi and English script. Horizontal and vertical run objects are used for feature extraction. Further, linear discriminant analysis is carried out to identify the script of the document image as Hindi, Kannada or English. Documents with different font sizes have been considered. Exhaustive experiments were done to analyze the performance of the system for different image patterns.

4.1. An Experimental Analysis for a Sample Hindi Document Image

Fig. 4a. Sample Input Document Image

Figure 4a shows a sample input document image. The input document image is binarized and its bounding box is generated. A line is then segmented from the document image. The segmented line is shown in Figure 4b.

Fig. 4b. Segmented Line from Input Image
After segmentation of the line, the words are segmented from the selected line. The segmented words from the line in Figure 4b are given in Figure 4c.

Fig. 4c. Segmented words

Feature extraction and linear discriminant analysis are carried out, and finally the document image is identified as Hindi script. Figure 4d shows the result displayed.

Fig. 4d. Dialog box

4.2. An Experimental Analysis for a Sample English Document Image

Example 2: English sample

Fig. 4e. Sample English Input document image
Figure 4e shows the original English document image. After the bounding box is applied and the image is binarized, a line is segmented from the document image. The segmented line is shown in Figure 4f.

Fig. 4f. Segmented line

After segmentation of the line, the words are segmented from the selected line. The segmented words are given in Figure 4g.

Fig. 4g. Segmented words

Feature extraction and linear discriminant analysis are carried out, and finally the document image is classified as English script. Figure 4h shows the result displayed.

Fig. 4h. Dialog box

4.3 System Performance Analysis

The overall system performance of script identification from printed document images is shown in Table 1.

Table 1: Overall System Performance

Tested script     Number of document images   Classification rate, word wise   Classification rate, line wise
Hindi script      100                         94% (987/1053)                   93%
Kannada script    100                         78% (496/636)                    86%
English script    100                         83% (781/936)                    90%
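As a quick arithmetic check, the word-wise percentages in Table 1 follow from the word counts given in parentheses. The short sketch below recomputes them; the pooled figure at the end is derived here from those counts and is not a number reported in the paper, and the variable names are illustrative.

```python
# Word counts from Table 1: (correctly classified words, total words).
word_counts = {
    "Hindi": (987, 1053),
    "Kannada": (496, 636),
    "English": (781, 936),
}

# Word-wise classification rate per script, rounded to a whole percent.
word_rates = {
    script: round(100 * correct / total)
    for script, (correct, total) in word_counts.items()
}

# Pooled word-wise rate over all three scripts (derived, not reported).
all_correct = sum(correct for correct, _ in word_counts.values())
all_total = sum(total for _, total in word_counts.values())
pooled_rate = 100 * all_correct / all_total

for script, rate in word_rates.items():
    print(f"{script}: {rate}%")
```

The rounded rates agree with the 94%, 78% and 83% listed in the table.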
4.4 An Experimental Analysis dealing with various issues

The proposed methodology has been evaluated against various issues such as variation in font size and style, color, noise, and varying spacing between words. The results of the experiments are given below.

Example 1: Sample noisy document image

Fig. 4i. Input document image

Fig. 4j. Segmented line

Fig. 4k. Extracted words
Fig. 4l. Dialog box

Example 2: Sample image with smaller font size

Fig. 4m. Input document image

Fig. 4n. Segmented line
Fig. 4o. Segmented words

Fig. 4p. Dialog box

5. CONCLUSION

In this work, line-wise and word-wise identification models to identify Kannada, Hindi and English text words from Indian multilingual machine-printed documents have been presented. The proposed model is based on visual discriminating features, which serve as useful visual clues for script identification. Horizontal and vertical run objects are used for feature extraction, and a linear discriminant function is used to identify the script of the document image as Kannada, Hindi or English. The experimental results show that the method is effective enough to identify and separate the Kannada, English and Hindi portions of a document, which in turn helps to feed individual language regions to a specific OCR system. The method has been tested on 300 document images and was found to be robust and efficient. The proposed system achieves 93% identification accuracy for Hindi script, 90% for English script and 86% for Kannada script. The proposed system can also be extended to identify other Indian and foreign languages.

REFERENCES

[1] A. L. Spitz, “Determination of script and language content of document images”, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 19, No. 3, pp. 235-245, 1997.
[2] T. N. Tan, “Rotation Invariant Texture Features and their use in Automatic Script Identification”, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 20, No. 7, pp. 751-756, 1998.
[3] G. S. Peake and T. N. Tan, “Script and Language Identification from Document Images”, Proc. Workshop Document Image Analysis, Vol. 1, pp. 10-17, 1997.
[4] J. Hochberg, P. Kelly, T. Thomas, L. Kerns, “Automatic Script Identification from Document Images using Cluster-based Templates”, IEEE Trans. on Pattern Analysis and Machine Intelligence, pp. 176-181, 1997.
[5] Andrew Busch, Wageeh W. Boles and Sridha Sridharan, “Texture for Script Identification”, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 27, No. 11, pp. 1720-1732, November 2005.
[6] U. Pal, B. B. Chaudhuri, “Automatic Identification of English, Chinese, Arabic, Devanagari and Bangla Script Line”, Proc. 6th International Conference on Document Analysis and Recognition, pp. 790-794, 2001.
[7] Lu Shijian and Chew Lim Tan, “Script and Language Identification in Noisy and Degraded Document Images”, IEEE Trans. on Pattern Analysis and Machine Intelligence, Vol. 30, No. 1, January 2008.
[8] B. V. Dhandra, H. Mallikarjun, Ravindra Hegadi, V. S. Malemath, “Word-wise Script Identification from Bilingual Documents Based on Morphological Reconstruction”, 1st International Conference on Digital Information Management, pp. 389-394, December 2006.
[9] M. C. Padma, P. A. Vijaya, “Language identification of Kannada, Hindi and English Text Words Through Visual Discrimination Features”, International Journal of Computational Intelligence Systems, Vol. 1, No. 2, pp. 116-126, May 2008.
[10] T. N. Vikram and D. S. Guru, “Appearance based models in document script identification”, International School of Information Management and Department of Studies in Computer Science, University of Mysore, Manasagangotri, Mysore, India, 2006.
[11] S. Abirami, D. Manjula, “A Survey of Script Identification Techniques for Multi-Script Document Images”, International Journal of Recent Trends in Engineering, Vol. No. 2, May 2009.
[12] Sukalpa Chanda, Srikanta Pal, Katrin Franke, Umapada Pal, “Two-stage Approach for Word-wise Script Identification”, IEEE 10th International Conference on Document Analysis and Recognition (ICDAR), pp. 926-930, 2009.
[13] Rajesh Gopakumar, N. V. Subbareddy, Krishnamoorthi Makkithaya, U. Dinesh Acharya, “Zone-based Structural feature extraction for Script Identification from Indian Documents”, 5th International Conference on Industrial and Information Systems, pp. 420-425, Jul 29 - Aug 01, 2010.
[14] M. C. Padma and P. A. Vijaya, “Script Identification of Text Words from a Tri Lingual Document using Voting Technique”, International Journal of Image Processing, Volume 4, Issue 1, pp. 35-52, 2010.
[15] Gopal Datt Joshi, Saurabh Garg and Jayanthi Sivaswamy, “Script Identification from Indian Documents”, Proceedings of the Seventh IAPR Workshop on Document Analysis Systems, New Zealand, pp. 255-267, 2006.
[16] Prakash K. Aithal, Rajesh G., Dinesh U. Acharya, Krishnamoorthi M., Subbareddy N. V., “Text Line Script Identification for a Tri-lingual Document”, IEEE Second International Conference on Computing, Communication and Networking Technologies, pp. 1-3, 2010.
[17] Huanfeng Ma and David Doermann, “Word level Script Identification for scanned document images”, SPIE Conference on Document Recognition and Retrieval (San Jose, CA), in press, 2004.
[18] U. Pal, S. Sinha, B. B. Chaudhuri, “Multi-Script Line Identification from Indian Documents”, Proc. 7th International Conference on Document Analysis and Recognition (ICDAR 2003), Vol. 2, pp. 880-884, 2003.
[19] M. C. Padma and P. A. Vijaya, “Global Approach for script identification using Wavelet Packet Based Features”, International Journal of Signal Processing, Image Processing and Pattern Recognition, Vol. 3, No. 3, September 2010.
[20] Mallikarjun Hangarge and B. V. Dhandra, “Shape and Morphological Transformation based Features for Language Identification in Indian Document Images”, First International Conference on Emerging Trends in Engineering and Technology (IEEE Comput. Soc. Press), pp. 1175-1180, July 2008.
[21] M. M. Kodabagi, S. A. Angadi and Chetana R. Shivanagi, “Character Recognition of Kannada Text In Scene Images Using Neural Network”, International Journal of Graphics and Multimedia (IJGM), Volume 4, Issue 1, 2013, pp. 9-19, ISSN Print: 0976 – 6448, ISSN Online: 0976 – 6456.
[22] Gunjan Singh, Avinash Pokhriyal and Sushma Lehri, “Fuzzy Rule Based Classification and Recognition of Handwritten Hindi Curve Script”, International Journal of Computer Engineering & Technology (IJCET), Volume 4, Issue 1, 2013, pp. 337-357, ISSN Print: 0976 – 6367, ISSN Online: 0976 – 6375.