SlideShare uma empresa Scribd logo
1 de 5
Baixar para ler offline
ACEEE Int. J. on Information Technology, Vol. 02, No. 01, March 2012



     Improvement of Random Forest Classifier through
        Localization of Persian Handwritten OCR
                                                     M. Zahedi, S. Eslami
                                      Shahrood University of Technology, Shahrood, Iran
                                         {Zahedi, Saeideh_eslami} @shahroodut.ac.ir

Abstract — The random forest (RF) classifier is an ensemble        or location of the dots while the basic shape of these letters
classifier derived from decision tree idea. However the parallel   is totally the same. The next table presents printed Persian
operations of several classifiers along with use of randomness     characters grouped based on their basic shape [2]. Besides;
in sample and feature selection has made the random forest a       most of characters may appear both in isolated and cursive
very strong classifier with accuracy rates comparable to most
                                                                   form; and there are several cursive forms for some of
of currently used classifiers. Although, the use of random
forest on handwritten digits has been considered before, in                                      TABLE I.
this paper RF is applied in recognizing Persian handwritten               THE DOT-LESS FORM OF CHARACTERS IN EACH COLUMN IS THE SAME

characters. Trying to improve the recognition rate, we suggest
converting the structure of decision trees from a binary tree
to a multi branch tree. The improvement gained this way
proves the applicability of the idea.

Index Terms— Random forest classifier, Persian/ Arabic
alphabet, handwritten characters, loci features, splitting
function

                       I. INTRODUCTION
    The problem of character recognition OCR has widely
been the object of interest in the machine learning and AI
issues. Handwritten character recognition entails its own          characters, depending on their location in a word [3]. These
challenges as careless and rough handwritings often misguide       cursive forms make a total number of more than 56 characters
the classifier. Cursive alphabets like Arabic and Persian add      along with their single forms.
to the complexity of the task hence the number of relevant                                      TABLE II.
researches is rather few [1]. In this paper, random forest         SOME CHARACTERS HAVE SEVERAL CURSIVE FORMS DEPENDING ON THEIR POSITION IN
classifier is used to recognize 33 Persian handwritten single                                    THE WORD.

characters. The random forest (RF) is a robust and fast
classifier which is also able to handle large number of feature
variables. Also it is suggested to improve the efficiency of
the RF through a localization process which makes the
structure of decision trees like a multi-class tree. The
implementation results verify the theory. The characteristics
of Persian/Arabic alphabet will be discussed in detail in the
next part. We go over a brief review on the common features
and methods used for Farsi/Arabic handwritten OCR in
section III. Section IV summarizes the loci features which are
used in recognition. A synopsis of random forest structure
and its classification strategy comes in section V. Part VI and
VII discusses in turn some main evolution algorithms applied
on random forest algorithm and our localization theory. Finally    The problem is worse when it comes to handwritten docu-
the section VIII gives the experimental results and diagrams.      ments. The form of writing dots in 2 or 3-dotted characters
                                                                   vary for different writers. The next figure illustrates some
II. CHALLENGES IN PERSIAN/ARABIC CHARACTER RECOGNITION             common forms of writing dots.
    Most of the difficulty in Persian/ Arabic OCR goes back
to the large number of different letters in these languages.
The presence of dotted characters causes another challenge.
16 out of 32 Persian characters contain dots. Some of these
characters can be grouped based on their number of dots. It
means the only difference within such a group is the number               Fig. 1. Different forms of writing 3-dot components

© 2012 ACEEE                                                 13
DOI: 01.IJIT.02.01. 31
ACEEE Int. J. on Information Technology, Vol. 02, No. 01, March 2012


As a result of careless writing, dots can be misplaced or can
have an unreasonable distance from the character they belong
to. This can lead to a notable decrease in the recognition
accuracy while noise and image degradation make the problem
much complicated.

          II. CLASSIFICATION METHODS AND FEATURES
    Characters can be classified using structural, statistical
and global features. Structural features refer to the skeleton              Fig. 2. Projection in 4-main directions to compute loci features
or basic topology of the character such as loops, branch-
points, zigzags, height, width, number of crossing points and                              IV. RANDOM FOREST CLASSIFIER
endpoints. They are also very helpful in extracting dot                       Boosting is an ensemble classification method when each
information to classify letters [4], [5]. Statistical features are        individual classifier try to contribute to the final result by
numerical information like pixel densities, histogram of chain            improving the result received from its previous counterpart.
code directions, moments, characteristic loci and Fourier                 Bagging also means building an ensemble of base classifiers,
descriptors [4], [5], [6]. Statistical features are effective and         each one trained on a bootstrap replicate of the training set.
fast to compute but may be affected by noise [5]. Global                  Decisions are then made on the basis of voting [12]. Based
transformations methods mostly include: horizontal and                    on the bagging theory [13], Breiman introduced the concept
vertical projections, coding, Hough transform, Gabor                      of random forests in 2001 [14]. A decision forest is an ensemble
transform [5]. Hidden Markov model is known to learn and                  of several decision trees which act in a parallel manner. The
find characteristics that are difficult to describe [4]. So;              decision forest is different from boosting in way that each
structural features can be used very well with HMM to                     tree tries to classify the data totally independent of the other
recognize a letter. Some researchers suggest using a                      trees [13]. Breiman suggested that randomness can be applied
combination of classifiers like HMM with SVM, SOM, RBF                    on the selection of training data samples and set of features
or another neural network [7], [8], [9]. Classification results           used to classify the data [14]. The reason is not all features
of such systems are promising but the high complexity and                 contribute to the correct recognition and some may deteriorate
required training time are considered as their drawbacks.                 the results. The accuracy of RF depends on:
                                                                                 - The correlation between each pair of trees in the
                    III. FEATURE EXTRACTION                                          forest; Increasing the correlation or dependency of
    The loci features have been proven to be very prosperous                         the trees in prediction, increases the error rate as
in OCR [10]. The notable increase in the recognition accuracy                       well.
has emerged many researchers to use them in handwritten                           - The strength of each single tree in the forest;
OCR as well [10], [11]. The loci features for each sample image                     Increasing the strength of the individual trees
build an 81-length feature vector. Taking only background                           increases the forest‘s overall recognition accuracy
pixels into account, the number of times a direct line starting                    [15].
from a center pixel and continuing to reach the image                     With enough number of classifiers (decision trees) and
boundary, passes through a foreground area in each north,                 different sets of features, we can hope that only the most
south, east and west directions, is calculated and converted              persistent features exist in the data for classification and the
to a ternary code according to the following formula:                     irrelevant features do not affect its efficiency so much [16].
                                                                          The random selection also helps the classifier to handle
value east _counter *27                                                 missing values. If the number of trees is controlled, the random
                                                                          forest classifier can escape overfitting too.
west _counter *9 north _counter *3                  (1)
                                                                          A. Random forest training
south _counter
                                                                              All the data is fed to all trees at once. For each tree, a set
where “east_counter” is the obtained counter during                       of ‘n’ samples are randomly extracted with replacement from
eastward projection and similarly for other counters. Note                the available training set. Again at each node, a random set
that there is no priority between counters and they can be                of features consisting of a fixed number of feature variables
replaced but a fixed format should be used for all cases. The             (often equal to the square root of the total number of features)
feature vector is not affected by the size of image. Using                is selected from the feature vector [17]. Several random
ternary code, there will not be more than 81 different values             functions and linear combinations are produced out of
per image. The 81-length histogram of the counters will build             selected features and are tested to classify the data. Training
up the feature vector of the image. This is fixed for all images          means finding the best splitting formula or the best linear
of all sizes.                                                             combination of feature variables. Such a formula minimizes a
                                                                          cost function or rather maximizes an attribute evaluation
                                                                          function and splits the data in the best way at that node [14].
                                                                          Random forests use the Gini criterion derived from the CART
© 2012 ACEEE                                                         14
DOI: 01.IJIT.02.01.31
ACEEE Int. J. on Information Technology, Vol. 02, No. 01, March 2012


decision trees [18], [19].                                                     a single random feature to split the data at each node [13] but
                                                                               he later suggested using a linear combination of several
         Gini      p (1  p ) .
                  i  0,1
                             i             i                        (1)        features instead of one [14], [17]. The latter results in less
                                                                               correlation between trees and better management of missing
 The other important criterion is the entropy function which                   feature values. The accuracy of recognition can be increased
is calculated as follows:                                                      by eliminating or lowering the effect of weak classifiers, i.e.
                                                                               the trees with high rate of misclassification. Robnik‘s
          Entropy               
                            i  0,1,2,..., n
                                                p i log( p i ) .   (2)        weighted voting system selects a subset of decision trees as
                                                                               the classifiers using the accuracy records gained during
where, pi stands for proportions of data samples from class i.                 training phase [12]. The system finds the training sample(s)
At each node of each tree, a set of random functions and                       with most similarity to the unknown sample. The trees which
linear combinations are produced [16]. The best function is                    have mislabeled such cases are then put away to avoid further
selected to classify the future samples at that node. The nodes                error. Finding similar samples follows the algorithm in [17]. It
often break until there is only one sample in each node. The                   is claimed that weighted voting is often better and never
probability of belonging to a class for each data sample is                    worse than the original voting method. This method acts
calculated. The same process runs through other trees. The                     similar to the AdaBoost algorithm in which the training set of
final class is determined by the voting mechanism; i.e. the                    each classifier is weighted according to the correctness of
class chosen by most of trees for a data sample is declared as                 the prediction [12]. The main reason of the random forest‘s
its final class.                                                               strength and efficiency is the induction of randomness both
B. Random forest parameters                                                    in sample and feature selection. It is the Gini criterion or the
    The determinant factors in RF classifier‘s recognition rate                entropy function that decides on the selection of splitting
are the number and depth of the decision trees, number of                      threshold but in [18], [20] it is reported that a totally random
splitting features and the minimum number of remained                          cut-point also yields comparably accurate results. Although
samples to need a node split into another level . The most                     the individual tree classifiers are very weak, they are
critical parameter which needs to be adjusted is the number                    uncorrelated [20]. Boinee et al. [18] tries to build a better
of randomly selected variables used to form a splitting                        classifier by combining the AdaBoost and Bagging
criterion at each node. The splitting formula divides the                      techniques with the random forests. The paper actually
available data into different categories. However the                          proposes to use an ensemble of random forests instead of
recognition accuracy is not very sensitive to chosen value                     the original random forest algorithm based on the theory that
as long as its difference with the real optimum value is eligible.             a combined classifier often acts better than individual
So, based on simple assumptions, the default value is often                    classifiers.
set to the square root of number of features which gives
rather near optimal results and can also be used as a start                                           VI. LOCALIZAION
point in searching for the real optimum value [16], [17].                          There is no limit in the number of categories when using
                                                                               RF however the random forest actually builds up a binary
                   V. EVOLUTION ALGORITHMS                                     structure. This means the inputs break into two branches
    Since the introduction of RF, there has been no standard                   when reaching a node. They go down the tree until there is
study on adjustment of critical parameters such as number of                   no sample remains unclassified. In each non-leaf node, a
trees, number of used features and depth of a decision tree.                   unique splitting function is selected among other randomly
The set of training parameters are often adjusted according                    produced ones when the function splits the data better than
to some default formulas which produce acceptable results                      the others. The problem of Persian character recognition has
but they have been gained by try and error. As it is mentioned                 been discussed before in the paper and it is noted that some
before, the number of features used in splitting is the most                   characters appear basically in the same group ignoring their
crucial variable since as the number of used features                          dots. Complication occurs when trying to distinguish such
increases, the possibility of using similar sets of features                   characters in their own group. Here we suggest applying a
also increases. The consequent correlation between different                   separate splitting function for each group which recognizes
trees has proved to reduce the recognition rate. A near optimal                these characters inside the group. Thus each node converts
value is the square root of the number of features used as the                 to a multi-branch classifier rather than the previous binary
default number of splitting features but without any                           one. In our case the decision tree becomes a 10-branch
verification. The number of trees commonly varies between                      structure with single characters being recognized all with
100 and 150. Greater values do not improve the recognition                     one common splitting function .The improvement gained in
much and there is the possibility of overfitting for small                     recognition is depicted in next section.
datasets. In contrast to parameter evaluation, some researches
did suggest some innovative methods to increase the overall                                      VII. EXPERIMENTAL RESULTS
efficiency of the random forest by moderate modifications                         The training database consists of 2838 binary image
on the original algorithm [15]. Breiman first proposed to use                  samples of 33 Persian single characters written by 86 different
© 2012 ACEEE                                                              15
DOI: 01.IJIT.02.01.31
ACEEE Int. J. on Information Technology, Vol. 02, No. 01, March 2012


writers. A complete set of 1650 samples was used to train the              2009.
classifier and the rest 1180 were used as the test set to verify           [2] H. Izakian, S. A. Monadjemi, B. Tork Ladani, and K. Zamanifar,
it‘s efficiency. The inputs to classifier were loci feature vectors        “Multi-font Farsi/Arabic isolated character recognition using chain
                                                                           codes”, World Academy of Science, Engineering and Technology,
of length 81 extracted from binary images. As there is no
                                                                           vol. 43, 2008.
verified formula to find optimal parameter values for the forest,          [3] M. Zahedi, S. Eslami,” Farsi/Arabic Optical Font Recognition
numerous trials were done with different adjustment of RF                  Using SIFT Features”, World Conference on Information Technology
parameters (number of trees and splitters). The diagrams                   (WCIT), Procedia Computer Science, vol. 3, pp. 1055-1059, 2011.
depict the effect of changing the training variables on                    [4] L.M. Lorigo, V. Govindaraju, “Off-line Arabic Handwriting
recognition rate and also on the elapsed training time. In the             Recognition: A Survey” , IEEE Transactions on Pattern Analysis
initial phase the random forest is tested using its original               and Machine Intelligence, vol. 28, no. 5, pp. 712-724, 2006.
structure. The highest accuracy was reached using 120 trees                [5] A. M. AL-Shatnawi, S. AL-Salaimeh, F. H. AL-Zawaideh and
and 7 random splitting features with 87% of the samples                    K. Omar, “ Offline Arabic Text Recognition – An Overview”, World
                                                                           of Computer Science and Information Technology Journal (WCSIT),
recognized correctly. The total time needed for training and
                                                                           Vol. 1, No. 5, 184-192, 2011.
testing a 120-tree forest with 2838 feature vectors of size 81,            [6] R. J. Ramteke, “ Invariant Moments Based Feature Extraction
is just about 5s. More detailed results are shown in fig. 3. Part          for Handwritten Devanagari Vowels Recognition ”, 2010
of error goes back to those dotted groups mentioned earlier                International Journal of Computer Applications, vol. 1, No. 18.
in the paper. Trying to fix the error, these characters can be             [7] M. Dehghan, K. Faez, M. Ahmadi and M. Shridhar, “
grouped and considered as one class. Another set of features               Handwritten Farsi (Arabic) word recognition:a holistic approach
then can be utilized to classify the inter-group characters.               using discrete HMM”, Pattern Recognition vol. 34, pp. 1057-1065,
Results of such classification are compared to the previous                2001.
ones in fig. 4. The next two figures demonstrate the localization          [8] N. Ben Amor, “Multifont Arabic Characters Recognition Using
                                                                           HoughTransform and HMM/ANN Classification”, Journal of
effect on the recognition accuracy. There is an apparent
                                                                           Multimedia vol. 1, no. 2, 2006.
improvement on the recognition both in the original
                                                                           [9] R. Al-Jawfi, “Handwriting Arabic Character Recognition LeNet
classification and in the joint-class character recognition                Using Neural Network”, The International Arab Journal of
problem. The implementation of the localization idea raises                Information Technology, vol. 6, no. 3, 2009.
the accuracy rate to 94%. The average accuracy for Persian                 [10] I.D. Trier and A.K. Jain, “ Feature extraction Methods for
handwritten OCR reported by different researchers on different             Character recognition- A Survey”, Pattern Recognition, vol. 29, no.
datasets is currently about 95% [21], [22], [23]. Nevertheless             4, pp. 641-662, 1996.
the main focus of this paper is not improvement of Persian                 [11] R. Ebrahimpour, M.R. Moradian, A. Esmkhani, F. M. Jafarlou,
handwritten OCR but proposing a method for increasing the                  “ Recognition of Persian handwritten digits using Characterization
                                                                           Loci and Mixture of Experts”, InternationalJournal of Digital
random forest‘s recognition rate especially when the similarity
                                                                           Content Technology and Its Applications, vol. 3, no. 3, pp. 42-46,
between different classes becomes very troublesome.
                                                                           2009.
                                                                           [12] M. Robnik,”Improving Random Forests”, Machine Learning,
             VIII. CONCLUSION AND FUTURE WORK                              ECML, Springer, Berlin, 2004.
                                                                           [13] Breiman,L., “Bagging predictors”, Mach. Learn, vol. 24, pp.
    In this paper, the use of random forest in the field of
                                                                           123–140, 1996.
Persian handwritten character recognition has been
                                                                           [14] Breiman,L., “Random forests”, Mach. Learn., vol. 45, pp. 5–
discussed. Most of Persian/Arabic characters can be grouped                32, 2001.
based on their basic shape. The main challenge in Persian                  [15] S. Bernard, L. Heutte, S. Adam,” Using Random Forests for
OCR goes back to distinguishing such characters. A similar                 Handwritten Digit Recognition”, Proceedings of the 9th IAPR/
problem occurs when samples belonging to different classes                 IEEE International Conference on Document Analysis and
have several common features. In such cases a single entropy               Recognition ICDAR’07, Curitiba : Brazil, 2007
function may not recognize the difference between the                      [16] D. Amaratunga, J. Cabrera and Y. S. Lee,” Enriched random
samples. Thus we suggested using several entropy functions                 forests”, Bioinformatics, Vol. 24, No. 18, pp. 2010-2014, 2008.
                                                                           [17] Breiman,L. and Cutler,A., “Random forests manual (version
each dedicated to one of these groups rather than one global
                                                                           4.0). Technical Reportof the University of California, Berkeley,
entropy criterion. The improvement in recognition accuracy
                                                                           Department of Statistics, 2003.
shows possibility of further improvement in recognition by                 [18] P. Boinee, A. De Angelis and G. L. Foresti, “Meta Random
random forests. The coding also has great impact on the                    Forests”, International Journal of Information and Mathematical
results. Professional coding and better structures can increase            Sciences, 2006.
both speed and accuracy. We are going to test the contribution             [19] R B.H. Menza, B.M. Kelm, R. Masuch, U. Himmelreich, P.
on other datasets and features to evaluate its applicability in            Bachert, W. Petrich and F.A. Hamprecht, “A Cpmparison of
different areas.                                                           Random Forest And Its Gini Importance With Standard
                                                                           Chemometric Methods For The Feature Selection And
                                                                           Classification of Spectral Data”, BMC, 2009.
                          REFERENCES
                                                                           [20] A. Cutler and G. Zhao, “PERT- Perfect Random Tree
[1] S. Mozaffari and H. Soltanizadeh,” ICDAR 2009 Handwritten              Ensembles”, Computing Science and Statistics, vol. 33, 2001.
Farsi/Arabic Character Recognition Competition”, 10th                      [21] A. Ebrahimi and E. Kabir, “ A pictorial Dictionary for Printed
International Conference on Document Analysis and Recognition,             Farsi Subwords” , Pattern Recognition Letters vol. 29, No. 5, 656–

© 2012 ACEEE                                                          16
DOI: 01.IJIT.02.01.31
ACEEE Int. J. on Information Technology, Vol. 02, No. 01, March 2012


663, 2008
[22] Borji A, Hamidi M, Mahmoudi F (2008) Robust Handwritten
Character Recognition With Features Inspired By Visual Ventral
Stream. Neural Process Letters vol. 28, No. 2, 97–111.
[23] M. Dehghan and K. Faez, “Farsi handwritten character
recognition with moment invariants”, Procceedings of 13th
International Conference on Digital Signal Processing-DSP-97, vol.
2, 1997.




                                                                              Fig. 5. Comparative diagram: the results before and after
                                                                                 ocalization; localization error versus original state;




  Fig. 3. RF classification results, with different number of decision
                           trees in the forest.




                                                                              Fig. 7. Comparative diagram: the results before and after
                                                                               localization; Localization error versus joint class state.




 Fig. 4. Comparative diagram: the results before and after joining
                         similar classes




© 2012 ACEEE                                                             17
DOI: 01.IJIT.02.01.31

Mais conteúdo relacionado

Mais procurados

An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...ijnlc
 
An Abstraction-Based Genetic Programming System
An Abstraction-Based Genetic Programming SystemAn Abstraction-Based Genetic Programming System
An Abstraction-Based Genetic Programming Systemfbinard
 
Devnagari document segmentation using histogram approach
Devnagari document segmentation using histogram approachDevnagari document segmentation using histogram approach
Devnagari document segmentation using histogram approachVikas Dongre
 
Recognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural NetworkRecognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural NetworkIJERA Editor
 
Dbms 10: Conversion of ER model to Relational Model
Dbms 10: Conversion of ER model to Relational ModelDbms 10: Conversion of ER model to Relational Model
Dbms 10: Conversion of ER model to Relational ModelAmiya9439793168
 
Bca3020– data base management system(dbms)
Bca3020– data base management system(dbms)Bca3020– data base management system(dbms)
Bca3020– data base management system(dbms)smumbahelp
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlpeSAT Journals
 
Handwritten character recognition using method filters
Handwritten character recognition using method filtersHandwritten character recognition using method filters
Handwritten character recognition using method filterseSAT Journals
 

Mais procurados (14)

An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...An exhaustive font and size invariant classification scheme for ocr of devana...
An exhaustive font and size invariant classification scheme for ocr of devana...
 
An Abstraction-Based Genetic Programming System
An Abstraction-Based Genetic Programming SystemAn Abstraction-Based Genetic Programming System
An Abstraction-Based Genetic Programming System
 
Fuzzy
FuzzyFuzzy
Fuzzy
 
Devnagari document segmentation using histogram approach
Devnagari document segmentation using histogram approachDevnagari document segmentation using histogram approach
Devnagari document segmentation using histogram approach
 
Relational Model
Relational ModelRelational Model
Relational Model
 
Recognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural NetworkRecognition of Words in Tamil Script Using Neural Network
Recognition of Words in Tamil Script Using Neural Network
 
Ijetcas14 399
Ijetcas14 399Ijetcas14 399
Ijetcas14 399
 
E123440
E123440E123440
E123440
 
Dbms 10: Conversion of ER model to Relational Model
Dbms 10: Conversion of ER model to Relational ModelDbms 10: Conversion of ER model to Relational Model
Dbms 10: Conversion of ER model to Relational Model
 
09.01 normalization
09.01 normalization09.01 normalization
09.01 normalization
 
Bca3020– data base management system(dbms)
Bca3020– data base management system(dbms)Bca3020– data base management system(dbms)
Bca3020– data base management system(dbms)
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
 
Usage of regular expressions in nlp
Usage of regular expressions in nlpUsage of regular expressions in nlp
Usage of regular expressions in nlp
 
Handwritten character recognition using method filters
Handwritten character recognition using method filtersHandwritten character recognition using method filters
Handwritten character recognition using method filters
 

Destaque

Specification-based Verification of Incomplete Programs
Specification-based Verification of Incomplete ProgramsSpecification-based Verification of Incomplete Programs
Specification-based Verification of Incomplete ProgramsIDES Editor
 
Development of Dual Frequency Alternator Technology Based Power Source For Mi...
Development of Dual Frequency Alternator Technology Based Power Source For Mi...Development of Dual Frequency Alternator Technology Based Power Source For Mi...
Development of Dual Frequency Alternator Technology Based Power Source For Mi...IDES Editor
 
A Generic Describing Method of Memory Latency Hiding in a High-level Synthesi...
A Generic Describing Method of Memory Latency Hiding in a High-level Synthesi...A Generic Describing Method of Memory Latency Hiding in a High-level Synthesi...
A Generic Describing Method of Memory Latency Hiding in a High-level Synthesi...IDES Editor
 
A New Soft-Switched Resonant DC-DC Converter
A New Soft-Switched Resonant DC-DC ConverterA New Soft-Switched Resonant DC-DC Converter
A New Soft-Switched Resonant DC-DC ConverterIDES Editor
 
Identification of Reactive Power Reserve in Transmission Network
Identification of Reactive Power Reserve in Transmission NetworkIdentification of Reactive Power Reserve in Transmission Network
Identification of Reactive Power Reserve in Transmission NetworkIDES Editor
 
KidzFrame: Supporting Awareness in the Daycare
KidzFrame: Supporting Awareness in the DaycareKidzFrame: Supporting Awareness in the Daycare
KidzFrame: Supporting Awareness in the DaycareIDES Editor
 
Application of Solar Powered High Voltage Discharge Plasma for NOX Removal in...
Application of Solar Powered High Voltage Discharge Plasma for NOX Removal in...Application of Solar Powered High Voltage Discharge Plasma for NOX Removal in...
Application of Solar Powered High Voltage Discharge Plasma for NOX Removal in...IDES Editor
 
Resource Identification Using Mobile Queries
Resource Identification Using Mobile QueriesResource Identification Using Mobile Queries
Resource Identification Using Mobile QueriesIDES Editor
 
ANN Based PID Controlled Brushless DC drive System
ANN Based PID Controlled Brushless DC drive SystemANN Based PID Controlled Brushless DC drive System
ANN Based PID Controlled Brushless DC drive SystemIDES Editor
 

Destaque (9)

Specification-based Verification of Incomplete Programs
Specification-based Verification of Incomplete ProgramsSpecification-based Verification of Incomplete Programs
Specification-based Verification of Incomplete Programs
 
Development of Dual Frequency Alternator Technology Based Power Source For Mi...
Development of Dual Frequency Alternator Technology Based Power Source For Mi...Development of Dual Frequency Alternator Technology Based Power Source For Mi...
Development of Dual Frequency Alternator Technology Based Power Source For Mi...
 
A Generic Describing Method of Memory Latency Hiding in a High-level Synthesi...
A Generic Describing Method of Memory Latency Hiding in a High-level Synthesi...A Generic Describing Method of Memory Latency Hiding in a High-level Synthesi...
A Generic Describing Method of Memory Latency Hiding in a High-level Synthesi...
 
A New Soft-Switched Resonant DC-DC Converter
A New Soft-Switched Resonant DC-DC ConverterA New Soft-Switched Resonant DC-DC Converter
A New Soft-Switched Resonant DC-DC Converter
 
Identification of Reactive Power Reserve in Transmission Network
Identification of Reactive Power Reserve in Transmission NetworkIdentification of Reactive Power Reserve in Transmission Network
Identification of Reactive Power Reserve in Transmission Network
 
KidzFrame: Supporting Awareness in the Daycare
KidzFrame: Supporting Awareness in the DaycareKidzFrame: Supporting Awareness in the Daycare
KidzFrame: Supporting Awareness in the Daycare
 
Application of Solar Powered High Voltage Discharge Plasma for NOX Removal in...
Application of Solar Powered High Voltage Discharge Plasma for NOX Removal in...Application of Solar Powered High Voltage Discharge Plasma for NOX Removal in...
Application of Solar Powered High Voltage Discharge Plasma for NOX Removal in...
 
Resource Identification Using Mobile Queries
Resource Identification Using Mobile QueriesResource Identification Using Mobile Queries
Resource Identification Using Mobile Queries
 
ANN Based PID Controlled Brushless DC drive System
ANN Based PID Controlled Brushless DC drive SystemANN Based PID Controlled Brushless DC drive System
ANN Based PID Controlled Brushless DC drive System
 

Semelhante a Improvement of Random Forest Classifier through Localization of Persian Handwritten OCR

Multitier holistic Approach for urdu Nastaliq Recognition
Multitier holistic Approach for urdu Nastaliq RecognitionMultitier holistic Approach for urdu Nastaliq Recognition
Multitier holistic Approach for urdu Nastaliq RecognitionDr. Syed Hassan Amin
 
Optimal Clustering Technique for Handwritten Nandinagari Character Recognition
Optimal Clustering Technique for Handwritten Nandinagari Character RecognitionOptimal Clustering Technique for Handwritten Nandinagari Character Recognition
Optimal Clustering Technique for Handwritten Nandinagari Character RecognitionEditor IJCATR
 
Classifier fusion method to recognize
Classifier fusion method to recognizeClassifier fusion method to recognize
Classifier fusion method to recognizeIJCI JOURNAL
 
Indexing for Large DNA Database sequences
Indexing for Large DNA Database sequencesIndexing for Large DNA Database sequences
Indexing for Large DNA Database sequencesCSCJournals
 
A NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITION
A NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITIONA NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITION
A NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITIONsipij
 
A NOVEL FEATURE SET FOR RECOGNITION OF SIMILAR SHAPED HANDWRITTEN HINDI CHARA...
A NOVEL FEATURE SET FOR RECOGNITION OF SIMILAR SHAPED HANDWRITTEN HINDI CHARA...A NOVEL FEATURE SET FOR RECOGNITION OF SIMILAR SHAPED HANDWRITTEN HINDI CHARA...
A NOVEL FEATURE SET FOR RECOGNITION OF SIMILAR SHAPED HANDWRITTEN HINDI CHARA...cscpconf
 
High level speaker specific features modeling in automatic speaker recognitio...
High level speaker specific features modeling in automatic speaker recognitio...High level speaker specific features modeling in automatic speaker recognitio...
High level speaker specific features modeling in automatic speaker recognitio...IJECEIAES
 
SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text IJERA Editor
 
Review of research on devnagari character recognition
Review of research on devnagari character recognitionReview of research on devnagari character recognition
Review of research on devnagari character recognitionVikas Dongre
 
An effective approach to offline arabic handwriting recognition
An effective approach to offline arabic handwriting recognitionAn effective approach to offline arabic handwriting recognition
An effective approach to offline arabic handwriting recognitionijaia
 
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...acijjournal
 
IRJET- A Survey on MSER Based Scene Text Detection
IRJET-  	  A Survey on MSER Based Scene Text DetectionIRJET-  	  A Survey on MSER Based Scene Text Detection
IRJET- A Survey on MSER Based Scene Text DetectionIRJET Journal
 
13 random forest
13 random forest13 random forest
13 random forestVishal Dutt
 
Dimensionality Reduction and Feature Selection Methods for Script Identificat...
Dimensionality Reduction and Feature Selection Methods for Script Identificat...Dimensionality Reduction and Feature Selection Methods for Script Identificat...
Dimensionality Reduction and Feature Selection Methods for Script Identificat...ITIIIndustries
 
A New Method for Identification of Partially Similar Indian Scripts
A New Method for Identification of Partially Similar Indian ScriptsA New Method for Identification of Partially Similar Indian Scripts
A New Method for Identification of Partially Similar Indian ScriptsCSCJournals
 
The effect of training set size in authorship attribution: application on sho...
The effect of training set size in authorship attribution: application on sho...The effect of training set size in authorship attribution: application on sho...
The effect of training set size in authorship attribution: application on sho...IJECEIAES
 
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...iosrjce
 

Semelhante a Improvement of Random Forest Classifier through Localization of Persian Handwritten OCR (20)

P-6
P-6P-6
P-6
 
Multitier holistic Approach for urdu Nastaliq Recognition
Multitier holistic Approach for urdu Nastaliq RecognitionMultitier holistic Approach for urdu Nastaliq Recognition
Multitier holistic Approach for urdu Nastaliq Recognition
 
Optimal Clustering Technique for Handwritten Nandinagari Character Recognition
Optimal Clustering Technique for Handwritten Nandinagari Character RecognitionOptimal Clustering Technique for Handwritten Nandinagari Character Recognition
Optimal Clustering Technique for Handwritten Nandinagari Character Recognition
 
Classifier fusion method to recognize
Classifier fusion method to recognizeClassifier fusion method to recognize
Classifier fusion method to recognize
 
Indexing for Large DNA Database sequences
Indexing for Large DNA Database sequencesIndexing for Large DNA Database sequences
Indexing for Large DNA Database sequences
 
A NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITION
A NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITIONA NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITION
A NOVEL DATA DICTIONARY LEARNING FOR LEAF RECOGNITION
 
A NOVEL FEATURE SET FOR RECOGNITION OF SIMILAR SHAPED HANDWRITTEN HINDI CHARA...
A NOVEL FEATURE SET FOR RECOGNITION OF SIMILAR SHAPED HANDWRITTEN HINDI CHARA...A NOVEL FEATURE SET FOR RECOGNITION OF SIMILAR SHAPED HANDWRITTEN HINDI CHARA...
A NOVEL FEATURE SET FOR RECOGNITION OF SIMILAR SHAPED HANDWRITTEN HINDI CHARA...
 
High level speaker specific features modeling in automatic speaker recognitio...
High level speaker specific features modeling in automatic speaker recognitio...High level speaker specific features modeling in automatic speaker recognitio...
High level speaker specific features modeling in automatic speaker recognitio...
 
SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text SVM Based Identification of Psychological Personality Using Handwritten Text
SVM Based Identification of Psychological Personality Using Handwritten Text
 
Review of research on devnagari character recognition
Review of research on devnagari character recognitionReview of research on devnagari character recognition
Review of research on devnagari character recognition
 
An effective approach to offline arabic handwriting recognition
An effective approach to offline arabic handwriting recognitionAn effective approach to offline arabic handwriting recognition
An effective approach to offline arabic handwriting recognition
 
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...
FREEMAN CODE BASED ONLINE HANDWRITTEN CHARACTER RECOGNITION FOR MALAYALAM USI...
 
20120140505011
2012014050501120120140505011
20120140505011
 
mlss
mlssmlss
mlss
 
IRJET- A Survey on MSER Based Scene Text Detection
IRJET-  	  A Survey on MSER Based Scene Text DetectionIRJET-  	  A Survey on MSER Based Scene Text Detection
IRJET- A Survey on MSER Based Scene Text Detection
 
13 random forest
13 random forest13 random forest
13 random forest
 
Dimensionality Reduction and Feature Selection Methods for Script Identificat...
Dimensionality Reduction and Feature Selection Methods for Script Identificat...Dimensionality Reduction and Feature Selection Methods for Script Identificat...
Dimensionality Reduction and Feature Selection Methods for Script Identificat...
 
A New Method for Identification of Partially Similar Indian Scripts
A New Method for Identification of Partially Similar Indian ScriptsA New Method for Identification of Partially Similar Indian Scripts
A New Method for Identification of Partially Similar Indian Scripts
 
The effect of training set size in authorship attribution: application on sho...
The effect of training set size in authorship attribution: application on sho...The effect of training set size in authorship attribution: application on sho...
The effect of training set size in authorship attribution: application on sho...
 
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
An Efficient Segmentation Technique for Machine Printed Devanagiri Script: Bo...
 

Mais de IDES Editor

Power System State Estimation - A Review
Power System State Estimation - A ReviewPower System State Estimation - A Review
Power System State Estimation - A ReviewIDES Editor
 
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...IDES Editor
 
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...IDES Editor
 
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...IDES Editor
 
Line Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCLine Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCIDES Editor
 
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...IDES Editor
 
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingAssessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingIDES Editor
 
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...IDES Editor
 
Selfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsSelfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsIDES Editor
 
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...IDES Editor
 
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...IDES Editor
 
Cloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkCloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkIDES Editor
 
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetGenetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetIDES Editor
 
Enhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyEnhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyIDES Editor
 
Low Energy Routing for WSN’s
Low Energy Routing for WSN’sLow Energy Routing for WSN’s
Low Energy Routing for WSN’sIDES Editor
 
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...IDES Editor
 
Rotman Lens Performance Analysis
Rotman Lens Performance AnalysisRotman Lens Performance Analysis
Rotman Lens Performance AnalysisIDES Editor
 
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesBand Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesIDES Editor
 
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...IDES Editor
 
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...IDES Editor
 

Mais de IDES Editor (20)

Power System State Estimation - A Review
Power System State Estimation - A ReviewPower System State Estimation - A Review
Power System State Estimation - A Review
 
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...Artificial Intelligence Technique based Reactive Power Planning Incorporating...
Artificial Intelligence Technique based Reactive Power Planning Incorporating...
 
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
Design and Performance Analysis of Genetic based PID-PSS with SVC in a Multi-...
 
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
Optimal Placement of DG for Loss Reduction and Voltage Sag Mitigation in Radi...
 
Line Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFCLine Losses in the 14-Bus Power System Network using UPFC
Line Losses in the 14-Bus Power System Network using UPFC
 
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
Study of Structural Behaviour of Gravity Dam with Various Features of Gallery...
 
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric ModelingAssessing Uncertainty of Pushover Analysis to Geometric Modeling
Assessing Uncertainty of Pushover Analysis to Geometric Modeling
 
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
Secure Multi-Party Negotiation: An Analysis for Electronic Payments in Mobile...
 
Selfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive ThresholdsSelfish Node Isolation & Incentivation using Progressive Thresholds
Selfish Node Isolation & Incentivation using Progressive Thresholds
 
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
Various OSI Layer Attacks and Countermeasure to Enhance the Performance of WS...
 
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
Responsive Parameter based an AntiWorm Approach to Prevent Wormhole Attack in...
 
Cloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability FrameworkCloud Security and Data Integrity with Client Accountability Framework
Cloud Security and Data Integrity with Client Accountability Framework
 
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP BotnetGenetic Algorithm based Layered Detection and Defense of HTTP Botnet
Genetic Algorithm based Layered Detection and Defense of HTTP Botnet
 
Enhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through SteganographyEnhancing Data Storage Security in Cloud Computing Through Steganography
Enhancing Data Storage Security in Cloud Computing Through Steganography
 
Low Energy Routing for WSN’s
Low Energy Routing for WSN’sLow Energy Routing for WSN’s
Low Energy Routing for WSN’s
 
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
Permutation of Pixels within the Shares of Visual Cryptography using KBRP for...
 
Rotman Lens Performance Analysis
Rotman Lens Performance AnalysisRotman Lens Performance Analysis
Rotman Lens Performance Analysis
 
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral ImagesBand Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
Band Clustering for the Lossless Compression of AVIRIS Hyperspectral Images
 
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
Microelectronic Circuit Analogous to Hydrogen Bonding Network in Active Site ...
 
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
Texture Unit based Monocular Real-world Scene Classification using SOM and KN...
 

Último

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingThe Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingSelcen Ozturkcan
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 

Último (20)

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central BankingThe Evolution of Money: Digital Transformation and CBDCs in Central Banking
The Evolution of Money: Digital Transformation and CBDCs in Central Banking
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

Improvement of Random Forest Classifier through Localization of Persian Handwritten OCR

  • 1. ACEEE Int. J. on Information Technology, Vol. 02, No. 01, March 2012 Improvement of Random Forest Classifier through Localization of Persian Handwritten OCR M. Zahedi, S. Eslami Shahrood University of Technology, Shahrood, Iran {Zahedi, Saeideh_eslami} @shahroodut.ac.ir Abstract — The random forest (RF) classifier is an ensemble or location of the dots while the basic shape of these letters classifier derived from decision tree idea. However the parallel is totally the same. The next table presents printed Persian operations of several classifiers along with use of randomness characters grouped based on their basic shape [2]. Besides; in sample and feature selection has made the random forest a most of characters may appear both in isolated and cursive very strong classifier with accuracy rates comparable to most form; and there are several cursive forms for some of of currently used classifiers. Although, the use of random forest on handwritten digits has been considered before, in TABLE I. this paper RF is applied in recognizing Persian handwritten THE DOT-LESS FORM OF CHARACTERS IN EACH COLUMN IS THE SAME characters. Trying to improve the recognition rate, we suggest converting the structure of decision trees from a binary tree to a multi branch tree. The improvement gained this way proves the applicability of the idea. Index Terms— Random forest classifier, Persian/ Arabic alphabet, handwritten characters, loci features, splitting function I. INTRODUCTION The problem of character recognition OCR has widely been the object of interest in the machine learning and AI issues. Handwritten character recognition entails its own characters, depending on their location in a word [3]. These challenges as careless and rough handwritings often misguide cursive forms make a total number of more than 56 characters the classifier. Cursive alphabets like Arabic and Persian add along with their single forms. to the complexity of the task hence the number of relevant TABLE II. researches is rather few [1]. In this paper, random forest SOME CHARACTERS HAVE SEVERAL CURSIVE FORMS DEPENDING ON THEIR POSITION IN classifier is used to recognize 33 Persian handwritten single THE WORD. characters. The random forest (RF) is a robust and fast classifier which is also able to handle large number of feature variables. Also it is suggested to improve the efficiency of the RF through a localization process which makes the structure of decision trees like a multi-class tree. The implementation results verify the theory. The characteristics of Persian/Arabic alphabet will be discussed in detail in the next part. We go over a brief review on the common features and methods used for Farsi/Arabic handwritten OCR in section III. Section IV summarizes the loci features which are used in recognition. A synopsis of random forest structure and its classification strategy comes in section V. Part VI and VII discusses in turn some main evolution algorithms applied on random forest algorithm and our localization theory. Finally The problem is worse when it comes to handwritten docu- the section VIII gives the experimental results and diagrams. ments. The form of writing dots in 2 or 3-dotted characters vary for different writers. The next figure illustrates some II. CHALLENGES IN PERSIAN/ARABIC CHARACTER RECOGNITION common forms of writing dots. Most of the difficulty in Persian/ Arabic OCR goes back to the large number of different letters in these languages. The presence of dotted characters causes another challenge. 16 out of 32 Persian characters contain dots. Some of these characters can be grouped based on their number of dots. It means the only difference within such a group is the number Fig. 1. Different forms of writing 3-dot components © 2012 ACEEE 13 DOI: 01.IJIT.02.01. 31
  • 2. ACEEE Int. J. on Information Technology, Vol. 02, No. 01, March 2012 As a result of careless writing, dots can be misplaced or can have an unreasonable distance from the character they belong to. This can lead to a notable decrease in the recognition accuracy while noise and image degradation make the problem much complicated. II. CLASSIFICATION METHODS AND FEATURES Characters can be classified using structural, statistical and global features. Structural features refer to the skeleton Fig. 2. Projection in 4-main directions to compute loci features or basic topology of the character such as loops, branch- points, zigzags, height, width, number of crossing points and IV. RANDOM FOREST CLASSIFIER endpoints. They are also very helpful in extracting dot Boosting is an ensemble classification method when each information to classify letters [4], [5]. Statistical features are individual classifier try to contribute to the final result by numerical information like pixel densities, histogram of chain improving the result received from its previous counterpart. code directions, moments, characteristic loci and Fourier Bagging also means building an ensemble of base classifiers, descriptors [4], [5], [6]. Statistical features are effective and each one trained on a bootstrap replicate of the training set. fast to compute but may be affected by noise [5]. Global Decisions are then made on the basis of voting [12]. Based transformations methods mostly include: horizontal and on the bagging theory [13], Breiman introduced the concept vertical projections, coding, Hough transform, Gabor of random forests in 2001 [14]. A decision forest is an ensemble transform [5]. Hidden Markov model is known to learn and of several decision trees which act in a parallel manner. The find characteristics that are difficult to describe [4]. So; decision forest is different from boosting in way that each structural features can be used very well with HMM to tree tries to classify the data totally independent of the other recognize a letter. Some researchers suggest using a trees [13]. Breiman suggested that randomness can be applied combination of classifiers like HMM with SVM, SOM, RBF on the selection of training data samples and set of features or another neural network [7], [8], [9]. Classification results used to classify the data [14]. The reason is not all features of such systems are promising but the high complexity and contribute to the correct recognition and some may deteriorate required training time are considered as their drawbacks. the results. The accuracy of RF depends on: - The correlation between each pair of trees in the III. FEATURE EXTRACTION forest; Increasing the correlation or dependency of The loci features have been proven to be very prosperous the trees in prediction, increases the error rate as in OCR [10]. The notable increase in the recognition accuracy well. has emerged many researchers to use them in handwritten - The strength of each single tree in the forest; OCR as well [10], [11]. The loci features for each sample image Increasing the strength of the individual trees build an 81-length feature vector. Taking only background increases the forest‘s overall recognition accuracy pixels into account, the number of times a direct line starting [15]. from a center pixel and continuing to reach the image With enough number of classifiers (decision trees) and boundary, passes through a foreground area in each north, different sets of features, we can hope that only the most south, east and west directions, is calculated and converted persistent features exist in the data for classification and the to a ternary code according to the following formula: irrelevant features do not affect its efficiency so much [16]. The random selection also helps the classifier to handle value east _counter *27 missing values. If the number of trees is controlled, the random forest classifier can escape overfitting too. west _counter *9 north _counter *3 (1) A. Random forest training south _counter All the data is fed to all trees at once. For each tree, a set where “east_counter” is the obtained counter during of ‘n’ samples are randomly extracted with replacement from eastward projection and similarly for other counters. Note the available training set. Again at each node, a random set that there is no priority between counters and they can be of features consisting of a fixed number of feature variables replaced but a fixed format should be used for all cases. The (often equal to the square root of the total number of features) feature vector is not affected by the size of image. Using is selected from the feature vector [17]. Several random ternary code, there will not be more than 81 different values functions and linear combinations are produced out of per image. The 81-length histogram of the counters will build selected features and are tested to classify the data. Training up the feature vector of the image. This is fixed for all images means finding the best splitting formula or the best linear of all sizes. combination of feature variables. Such a formula minimizes a cost function or rather maximizes an attribute evaluation function and splits the data in the best way at that node [14]. Random forests use the Gini criterion derived from the CART © 2012 ACEEE 14 DOI: 01.IJIT.02.01.31
  • 3. ACEEE Int. J. on Information Technology, Vol. 02, No. 01, March 2012 decision trees [18], [19]. a single random feature to split the data at each node [13] but he later suggested using a linear combination of several Gini   p (1  p ) . i  0,1 i i (1) features instead of one [14], [17]. The latter results in less correlation between trees and better management of missing The other important criterion is the entropy function which feature values. The accuracy of recognition can be increased is calculated as follows: by eliminating or lowering the effect of weak classifiers, i.e. the trees with high rate of misclassification. Robnik‘s Entropy   i  0,1,2,..., n  p i log( p i ) . (2) weighted voting system selects a subset of decision trees as the classifiers using the accuracy records gained during where, pi stands for proportions of data samples from class i. training phase [12]. The system finds the training sample(s) At each node of each tree, a set of random functions and with most similarity to the unknown sample. The trees which linear combinations are produced [16]. The best function is have mislabeled such cases are then put away to avoid further selected to classify the future samples at that node. The nodes error. Finding similar samples follows the algorithm in [17]. It often break until there is only one sample in each node. The is claimed that weighted voting is often better and never probability of belonging to a class for each data sample is worse than the original voting method. This method acts calculated. The same process runs through other trees. The similar to the AdaBoost algorithm in which the training set of final class is determined by the voting mechanism; i.e. the each classifier is weighted according to the correctness of class chosen by most of trees for a data sample is declared as the prediction [12]. The main reason of the random forest‘s its final class. strength and efficiency is the induction of randomness both B. Random forest parameters in sample and feature selection. It is the Gini criterion or the The determinant factors in RF classifier‘s recognition rate entropy function that decides on the selection of splitting are the number and depth of the decision trees, number of threshold but in [18], [20] it is reported that a totally random splitting features and the minimum number of remained cut-point also yields comparably accurate results. Although samples to need a node split into another level . The most the individual tree classifiers are very weak, they are critical parameter which needs to be adjusted is the number uncorrelated [20]. Boinee et al. [18] tries to build a better of randomly selected variables used to form a splitting classifier by combining the AdaBoost and Bagging criterion at each node. The splitting formula divides the techniques with the random forests. The paper actually available data into different categories. However the proposes to use an ensemble of random forests instead of recognition accuracy is not very sensitive to chosen value the original random forest algorithm based on the theory that as long as its difference with the real optimum value is eligible. a combined classifier often acts better than individual So, based on simple assumptions, the default value is often classifiers. set to the square root of number of features which gives rather near optimal results and can also be used as a start VI. LOCALIZAION point in searching for the real optimum value [16], [17]. There is no limit in the number of categories when using RF however the random forest actually builds up a binary V. EVOLUTION ALGORITHMS structure. This means the inputs break into two branches Since the introduction of RF, there has been no standard when reaching a node. They go down the tree until there is study on adjustment of critical parameters such as number of no sample remains unclassified. In each non-leaf node, a trees, number of used features and depth of a decision tree. unique splitting function is selected among other randomly The set of training parameters are often adjusted according produced ones when the function splits the data better than to some default formulas which produce acceptable results the others. The problem of Persian character recognition has but they have been gained by try and error. As it is mentioned been discussed before in the paper and it is noted that some before, the number of features used in splitting is the most characters appear basically in the same group ignoring their crucial variable since as the number of used features dots. Complication occurs when trying to distinguish such increases, the possibility of using similar sets of features characters in their own group. Here we suggest applying a also increases. The consequent correlation between different separate splitting function for each group which recognizes trees has proved to reduce the recognition rate. A near optimal these characters inside the group. Thus each node converts value is the square root of the number of features used as the to a multi-branch classifier rather than the previous binary default number of splitting features but without any one. In our case the decision tree becomes a 10-branch verification. The number of trees commonly varies between structure with single characters being recognized all with 100 and 150. Greater values do not improve the recognition one common splitting function .The improvement gained in much and there is the possibility of overfitting for small recognition is depicted in next section. datasets. In contrast to parameter evaluation, some researches did suggest some innovative methods to increase the overall VII. EXPERIMENTAL RESULTS efficiency of the random forest by moderate modifications The training database consists of 2838 binary image on the original algorithm [15]. Breiman first proposed to use samples of 33 Persian single characters written by 86 different © 2012 ACEEE 15 DOI: 01.IJIT.02.01.31
  • 4. ACEEE Int. J. on Information Technology, Vol. 02, No. 01, March 2012 writers. A complete set of 1650 samples was used to train the 2009. classifier and the rest 1180 were used as the test set to verify [2] H. Izakian, S. A. Monadjemi, B. Tork Ladani, and K. Zamanifar, it‘s efficiency. The inputs to classifier were loci feature vectors “Multi-font Farsi/Arabic isolated character recognition using chain codes”, World Academy of Science, Engineering and Technology, of length 81 extracted from binary images. As there is no vol. 43, 2008. verified formula to find optimal parameter values for the forest, [3] M. Zahedi, S. Eslami,” Farsi/Arabic Optical Font Recognition numerous trials were done with different adjustment of RF Using SIFT Features”, World Conference on Information Technology parameters (number of trees and splitters). The diagrams (WCIT), Procedia Computer Science, vol. 3, pp. 1055-1059, 2011. depict the effect of changing the training variables on [4] L.M. Lorigo, V. Govindaraju, “Off-line Arabic Handwriting recognition rate and also on the elapsed training time. In the Recognition: A Survey” , IEEE Transactions on Pattern Analysis initial phase the random forest is tested using its original and Machine Intelligence, vol. 28, no. 5, pp. 712-724, 2006. structure. The highest accuracy was reached using 120 trees [5] A. M. AL-Shatnawi, S. AL-Salaimeh, F. H. AL-Zawaideh and and 7 random splitting features with 87% of the samples K. Omar, “ Offline Arabic Text Recognition – An Overview”, World of Computer Science and Information Technology Journal (WCSIT), recognized correctly. The total time needed for training and Vol. 1, No. 5, 184-192, 2011. testing a 120-tree forest with 2838 feature vectors of size 81, [6] R. J. Ramteke, “ Invariant Moments Based Feature Extraction is just about 5s. More detailed results are shown in fig. 3. Part for Handwritten Devanagari Vowels Recognition ”, 2010 of error goes back to those dotted groups mentioned earlier International Journal of Computer Applications, vol. 1, No. 18. in the paper. Trying to fix the error, these characters can be [7] M. Dehghan, K. Faez, M. Ahmadi and M. Shridhar, “ grouped and considered as one class. Another set of features Handwritten Farsi (Arabic) word recognition:a holistic approach then can be utilized to classify the inter-group characters. using discrete HMM”, Pattern Recognition vol. 34, pp. 1057-1065, Results of such classification are compared to the previous 2001. ones in fig. 4. The next two figures demonstrate the localization [8] N. Ben Amor, “Multifont Arabic Characters Recognition Using HoughTransform and HMM/ANN Classification”, Journal of effect on the recognition accuracy. There is an apparent Multimedia vol. 1, no. 2, 2006. improvement on the recognition both in the original [9] R. Al-Jawfi, “Handwriting Arabic Character Recognition LeNet classification and in the joint-class character recognition Using Neural Network”, The International Arab Journal of problem. The implementation of the localization idea raises Information Technology, vol. 6, no. 3, 2009. the accuracy rate to 94%. The average accuracy for Persian [10] I.D. Trier and A.K. Jain, “ Feature extraction Methods for handwritten OCR reported by different researchers on different Character recognition- A Survey”, Pattern Recognition, vol. 29, no. datasets is currently about 95% [21], [22], [23]. Nevertheless 4, pp. 641-662, 1996. the main focus of this paper is not improvement of Persian [11] R. Ebrahimpour, M.R. Moradian, A. Esmkhani, F. M. Jafarlou, handwritten OCR but proposing a method for increasing the “ Recognition of Persian handwritten digits using Characterization Loci and Mixture of Experts”, InternationalJournal of Digital random forest‘s recognition rate especially when the similarity Content Technology and Its Applications, vol. 3, no. 3, pp. 42-46, between different classes becomes very troublesome. 2009. [12] M. Robnik,”Improving Random Forests”, Machine Learning, VIII. CONCLUSION AND FUTURE WORK ECML, Springer, Berlin, 2004. [13] Breiman,L., “Bagging predictors”, Mach. Learn, vol. 24, pp. In this paper, the use of random forest in the field of 123–140, 1996. Persian handwritten character recognition has been [14] Breiman,L., “Random forests”, Mach. Learn., vol. 45, pp. 5– discussed. Most of Persian/Arabic characters can be grouped 32, 2001. based on their basic shape. The main challenge in Persian [15] S. Bernard, L. Heutte, S. Adam,” Using Random Forests for OCR goes back to distinguishing such characters. A similar Handwritten Digit Recognition”, Proceedings of the 9th IAPR/ problem occurs when samples belonging to different classes IEEE International Conference on Document Analysis and have several common features. In such cases a single entropy Recognition ICDAR’07, Curitiba : Brazil, 2007 function may not recognize the difference between the [16] D. Amaratunga, J. Cabrera and Y. S. Lee,” Enriched random samples. Thus we suggested using several entropy functions forests”, Bioinformatics, Vol. 24, No. 18, pp. 2010-2014, 2008. [17] Breiman,L. and Cutler,A., “Random forests manual (version each dedicated to one of these groups rather than one global 4.0). Technical Reportof the University of California, Berkeley, entropy criterion. The improvement in recognition accuracy Department of Statistics, 2003. shows possibility of further improvement in recognition by [18] P. Boinee, A. De Angelis and G. L. Foresti, “Meta Random random forests. The coding also has great impact on the Forests”, International Journal of Information and Mathematical results. Professional coding and better structures can increase Sciences, 2006. both speed and accuracy. We are going to test the contribution [19] R B.H. Menza, B.M. Kelm, R. Masuch, U. Himmelreich, P. on other datasets and features to evaluate its applicability in Bachert, W. Petrich and F.A. Hamprecht, “A Cpmparison of different areas. Random Forest And Its Gini Importance With Standard Chemometric Methods For The Feature Selection And Classification of Spectral Data”, BMC, 2009. REFERENCES [20] A. Cutler and G. Zhao, “PERT- Perfect Random Tree [1] S. Mozaffari and H. Soltanizadeh,” ICDAR 2009 Handwritten Ensembles”, Computing Science and Statistics, vol. 33, 2001. Farsi/Arabic Character Recognition Competition”, 10th [21] A. Ebrahimi and E. Kabir, “ A pictorial Dictionary for Printed International Conference on Document Analysis and Recognition, Farsi Subwords” , Pattern Recognition Letters vol. 29, No. 5, 656– © 2012 ACEEE 16 DOI: 01.IJIT.02.01.31
  • 5. ACEEE Int. J. on Information Technology, Vol. 02, No. 01, March 2012 663, 2008 [22] Borji A, Hamidi M, Mahmoudi F (2008) Robust Handwritten Character Recognition With Features Inspired By Visual Ventral Stream. Neural Process Letters vol. 28, No. 2, 97–111. [23] M. Dehghan and K. Faez, “Farsi handwritten character recognition with moment invariants”, Procceedings of 13th International Conference on Digital Signal Processing-DSP-97, vol. 2, 1997. Fig. 5. Comparative diagram: the results before and after ocalization; localization error versus original state; Fig. 3. RF classification results, with different number of decision trees in the forest. Fig. 7. Comparative diagram: the results before and after localization; Localization error versus joint class state. Fig. 4. Comparative diagram: the results before and after joining similar classes © 2012 ACEEE 17 DOI: 01.IJIT.02.01.31