SlideShare uma empresa Scribd logo
1 de 6
Baixar para ler offline
Kavitha Balakrishnan, Kavitha Sunil, Sreedhanya A.V, K.P Soman / International Journal of
     Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                       Vol. 2, Issue4, July-August 2012, pp.1529-1534
 Effect Of Pre-Processing On Historical Sanskrit Text Documents
      Kavitha Balakrishnan, Kavitha Sunil, Sreedhanya A.V, K.P Soman
                      Centre for Excellence in Computational Engineering and Networking



ABSTRACT
         In this paper, the effect of pre-processing
on binarization is explored. Here, pre-processing                           PAPER DOCUMENT
and binarization operations are performed on a
historical Sanskrit text document. After scanning,
pre-processing operations are applied on the image
                                                                                 SCANNING
to remove noise. Pre-processing techniques play an
important role in binarization. Newly developed
pre-processing techniques are Non Local means
and Total Variation methods. Total Variation
methods are motivated by the developments in                                 NOISE REMOVAL
compressive sensing methods like l1 optimization.
Binarization is used as a pre-processor before
OCR, because most OCR packages work only on                                   BINARIZATION
black and white images.

Keywords - Pre-processing filters, Total variation         Fig. 1 Overview of basic steps in preprocessing
methods,     NL      means,     Binarization,     Otsu
Binarization                                                        Before going into the OCR process, one must
                                                           scan the text document. It is important to do a good
1. INTRODUCTION                                            quality scanning since if the quality is poor then it will
          There are a number of factors that affect the    be hard for the OCR software to read the text and to
accuracy of a text document recognized through OCR.        make correct interpretation. The scanned image is
Scanned documents often contain noise that arises due      then stored in jpeg/bmp format. In order to improve
to printer, scanner, print quality, age of the document.   the quality of the image we then have to go for noise
In the case of historical documents, the quality is        removal.
usually very low, and the images suffer high
degradation. The degradation on the historical             Noise Removal
document images is mainly due to fading of ink,                     Scanning of text documents itself can
scanning, paper aging and bleed-through.                   introduce some amount of noise. In order to make it
                                                           suitable for further processing, a scanned document
         The importance of the preprocessing stage of      image is to be freed from any existing noise. This can
a character recognition system lies in its ability to      be achieved by image enhancement.               Image
remedy some of the problems that may occur due to          enhancement mainly involves noise removal which is
factors presented above. Thus, the use of                  done by filtering. Filtering is a neighbourhood
preprocessing techniques may enhance the document          operation, in which the value of any given pixel in the
image and prepare it for the next stage in the character   output image is determined by applying some
recognition system.                                        algorithm to the values of the pixels in the
                                                           neighbourhood of the corresponding input pixel.
         The paper is organized as follows. Section 2      Various methods are applied to reduce noise. The
shows the steps involved in preprocessing. Section 3       most important reason to reduce noise is to obtain
deals with noise removal techniques. It comprises of       easy way of recognition of documents.
conventional and new filtering methods. Otsu
Binarization algorithm is discussed in Section 4.          Binarization
Section 5 deals with the experimental results and in                 Image binarization converts a gray-scale
Section 6 we conclude the discussion.                      document image into a black and white (0s&1s)
                                                           format. The choice of binarization algorithm depends
2. STEPS INVOLVED IN PREPROCESSING                         on the quality of the image. Hence the binarization
        The steps involved in preprocessing is given       algorithms that work best on one image may not be
below Scanning and image digitization                      best for another.



                                                                                                   1529 | P a g e
Kavitha Balakrishnan, Kavitha Sunil, Sreedhanya A.V, K.P Soman / International Journal of
      Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                       Vol. 2, Issue4, July-August 2012, pp.1529-1534
3. NOISE REMOVAL METHODS                       window. It is widely used because it will remove the
3.1. Conventional methods                                     noise while preserving the edges present in the image.
3.1.1. Mean filter
           Mean filter is one of the simplest linear          3.1.3. Wiener filter
filters. It is a sliding window spatial filter, which                   Wiener filter is an adaptive linear filter,
replaces the center pixel of the window by the average        which takes local variance of the image into account.
of all the values in the window. The size of the              When the variance in an image is large, the Wiener
window determines the image contrast. As we                   filter results in light local smoothing, while when the
increase the size of the window, the image becomes            variance is small, it gives an improved local
more and more blurred.                                        smoothing.
                                                              Figure 2 shows the effect of filters on text documents.
3.1.2. Median filter                                          Mean filters smoothen the edges of the character and
          It is a non linear filter. It is a sliding window   background. The median filter works better than the
spatial filter, which replaces the center pixel of the        mean filter and preserves useful details in the image.
window by the median of all the values in the                 Wiener filter produces a fair amount of edge blurring.




Fig. 2. Samples showing the effect of preprocessing filter (a) The original text image, (b) Mean filter (c)
Median filter (d) Wiener filter
3.2 Non Local Means                                     contains only low frequencies. But some images
         The conventional filtering methods remove contain fine details and structures which have high
fine details present in the image along with noise. frequencies. Filtering removes these high frequency
Most denoising algorithms make two assumptions details in addition to the high frequency noise, and
about the noisy image. The first assumption is that the these methods do nothing to remove low frequency
noise contained in the image is white noise. The noise present in the image. These assumptions can
second assumption is the true image is smooth i.e. it




                                                                                                    1530 | P a g e
Kavitha Balakrishnan, Kavitha Sunil, Sreedhanya A.V, K.P Soman / International Journal of
     Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                       Vol. 2, Issue4, July-August 2012, pp.1529-1534
cause blurring and loss of detail in the resulting       N r , s denotes the neighborhood centered over pixel
denoised images.
         The Non-local means assumes that the image      ( x, y) and (r , s) respectively. Let the window size
contains an extensive amount of redundancy and           be MxM, where M is the odd. Similarity between the
exploits these redundancies to remove the noise          two neighborhoods can be found using L2-
present in the image. Adjacent pixels tend to have
                                                         norm: N x , y  N r , s
similar neighborhoods, but non-adjacent pixels can                                  2
also have similar neighborhoods’ as shown in the
figure below. The self-similarity assumption is          This norm then defines a weight to be used in our
exploited to denoise an image. Pixels with similar       weighted average
neighborhoods can be used to determine the denoised                                             exp(( N x, y  N r , s ))2
value of a pixel.                                               w( N x , y , N x , y ) 
                                                                                                            h2
                                                         where h is a parameter that needs to be fine-tuned.
                                                         Similar neighborhoods give w  1 . If the two
                                                         neighborhoods are very different, w  0 .

                                                         Then each pixel in our new image        g ( x, y) is a
                                                         weighted average of the pixels in f ( x, y ) , weighted
                                                         by how similar the neighborhoods are
Fig. 3. Example for self-similarity is shown. Non

                                                                            f  r, s  .w  N , N 
          adjacent pixels p, q and r have similar
          neighborhood in the image.                                                                                x, y     r ,s
The non-local means replaces a pixel by the weighted          g ( x, y )       r       s
average of other neighborhoods in the image. This
method is the best possible denoising method for
                                                                             w  N , N   r     s
                                                                                                            x, y      r ,s

natural images. First we make a list of all similar      When historical document image with noise is
neighborhoods in the image. Neighborhood of each         subjected to NL means algorithm, we get result as
pixel is then linearized to form a row in a matrix and   shown in figure. Smaller neighborhoods remove noise
L2 norm is computed between each row. Let N x , y and    better.




Fig. 4. Effect of NL means algorithm is shown (a) original image, (b) image denoised using NL means
with window size=3.

3.3 Total Variation                                      statistical properties as the original image, along with
3.3.1 Tikhnov model                                      sharp edges and low noise.
         The Total Variation approach is used when       Finding the denoised text image can be
characters in the text document are highly degraded.     mathematically described as an optimization problem:
TV Denoising reduces total variation of the image. It                       u*  maxu { p(u / f )}
filters out noise while preserving the edges. The
resulting filtered image should have the same



                                                                                                                   1531 | P a g e
Kavitha Balakrishnan, Kavitha Sunil, Sreedhanya A.V, K.P Soman / International Journal of
     Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                       Vol. 2, Issue4, July-August 2012, pp.1529-1534
where p(u / f ) is the posterior probability of a                                                                                              1                     1
                                                                                                                              min E (u )       u( x, y) d   2   f  u  d 
                                                                                                                                                             2                     2

hypothesis u (describing our best solution) might be                                                                            u              2
true given by the observation f.
The conditional probability p(u / f ) can be written                                                                                    This is Tikhonov model. The first term is the
                                                                                                                              regularization term which is derived from prior and
as
                                                                                                                              the second term as the data fidelity term. The data
                                         p( f / u ) p(u )
                  p(u / f )                                                                                                  fidelity measures how far the current solution u is
                                             p( f )                                                                           from the observed image f. The parameter λ is a non-
          where p(u) is the prior probability of u and                                                                        negative coefficient that governs the balance between
                                                                                                                              the data fidelity and the regularization term. A large
p( f u )   is the conditional describing how well the                                                                         value for λ will produce an image with few details,
observed data f can be explained by the solution u.                                                                           removing small features, while a small value will
The probability of the low noise image with respect to                                                                        yield an image same as u. In Tikhnov model, since we
the image f (that is defining our inverse problem):                                                                           use image prior as a set of smooth images, we obtain
                                                                                                                              blurred images as output.
                                                                                                   2                      2
                                                                   ( f ( x , y )  u ( x , y ))           u ( x , y )
                                                                                                       
               p ( f , u ) p (u )                       1
p (u / f )                                                                 2                               2
                                                                                   2                                2

                                                              e                                                               3.3.2 ROF Model
                    p( f )              ( x , y )D   4                                                                              The ROF model is similar to the Tikhonov
                                                                                                                              model but the regularization term has been changed to
                                                                                                                              the TV norm instead of the quadratic norm. In its
          Maximizing this probability is equivalent to                                                                        original formulation, the ROF model is defined as the
minimizing the negative term in the exponent on the                                                                           constrained optimization problem
set of pixels D, that is the overall integral of
         ( f ( x, y)  u ( x, y))2 u ( x, y )
                                  
                                                                                         2
                                                                                                                                    min
                                                                                                                                     u
                                                                                                                                          
                                                                                                                                           
                                                                                                                                                      
                                                                                                                                               u d  s.t  (u  f )2 d    2
                                                                                                                                                            
                   2 2              2 2
                                                                                                                                       where f is the observed image function which
This leads us to the following variational formulation                                                                        is assumed to be corrupted by Gaussian noise of
                                                                                                                              variance  and u is the unknown denoised image.
                                                                                                                                         2
of our image restoration problem




          Fig. 5. Result of denoising using Tikhnov model (a) The original image, (b) TV using λ=1.




                                                                                                                                                                       1532 | P a g e
Kavitha Balakrishnan, Kavitha Sunil, Sreedhanya A.V, K.P Soman / International Journal of
     Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                       Vol. 2, Issue4, July-August 2012, pp.1529-1534




Figure 6. Result of denoising using ROF model (a) The original image, (b) TV using λ=3, (c) TV using
λ=5.

         The original non-convex ROF model can be         the threshold value, then it is classified as black.
turned into a convex problem by replacing the             Binarization techniques can be either global or local.
equality constraint                                       In global binarization we choose a single threshold for
                                                          4.1 Otsu Binarization Algorithm

    (u  f )2 d    2 by the inequality constraint               Otsu is a global thresholding binarization
                                                         algorithm .It assumes that the image contains two

    (u  f )2 d    2                                  classes of pixels; one foreground (black) and one
                                                         background (white). Then it calculates the optimum
which in turn can further be transformed to the           threshold separating those two classes so that their
unconstrained (or Lagrangian) model                       intra-class variance is minimal. This is equivalent to
                                                          maximizing the between-class scatter. This establishes
                 1                                      an optimum threshold K.
min   u d       (u  f ) d .
                               2
                                                                                 1, if I gray ( x, y)  K
 u
               2                                                             
                                                                    I ( x, y )  
                                                                                 0, if I gray ( x, y )  K
                                                                                 
where      is a Lagrange multiplier.
                                                          5. RESULTS
4. BINARIZATION                                                    The text image is first filtered by each of the
          Image binarization converts a gray-scale        noise removal algorithms described in Section 3. Then
document image into a black and white (0s&1s)             each filter output along with the unfiltered original
format. In binarization, we choose a threshold value to   was then binarized by using Otsu binarization
classify the pixels as black and white. If the pixel      algorithm. Two methods are used to do the evaluation
value is greater than the threshold value, then it is     measures – MSE and PSNR.
classified as white and if the pixel value is less than
the whole document image, but these techniques are
not suitable for degraded images. In local binarization
technique, the local threshold can be calculated by
using different information of the document images,
such as the mean and standard variation of pixel
values within a local window.



                                                          (b) median, (c) wiener, (d) Tikhnov, (e) TV ROF
Figure 7. Result of binarization using Otsu               using λ=3, (f) TV ROF using λ=5
algorithm on denoised image using (a) mean filter,

                                                                                                    1533 | P a g e
Kavitha Balakrishnan, Kavitha Sunil, Sreedhanya A.V, K.P Soman / International Journal of
     Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com
                       Vol. 2, Issue4, July-August 2012, pp.1529-1534
         Mean Square Error (MSE) for two m  n
images I and K is given by,                             M I is the maximum possible pixel value of the
              1 m1 n1
        MSE     [ I (i, j)  K (i, j)]2
              mn i 0 j 0
                                                        image.
                                                                 These measures were calculated for every
                                                        image-filter combination. The score obtained using
where one of the images is considered a noisy           preprocessing algorithms are shown in Tables 1. The
approximation of the other .                            image binarized using Otsu algorithms are measured
         Peak Signal-To-Noise Ratio (PSNR) is           and the scores got are shown in Table 2.
generally used to analyze quality of image in dB
(decibels). PSNR calculation of two images, one                  In most cases the best pre-processing filter
original and an altered image, describes how far two    was NL means. Next best results are shown by Total
images are equal.                                       Variation filter with λ=3. For the conventional filters
                             M2                       mean, median and Wiener filter, Wiener gives best
             PSNR  10log10  I                        results.
                             MSE 
                          MI 
                20log10       
                          MSE 

Table 1. Performance measure of various preprocessing algorithm
             Mean       Median       Wiener       Tikhonov      ROF λ=5              ROF λ=3        NLmeans
MSE          702.1072   713.0556     50.9573      338.8237      22.8120              13.4386        2.7225
PSNR         19.6668    19.5996      31.0587      22.8311       34.5492              36.8473        43.76

Table 2. Performance measure after Otsu binarization
                        Median       Wiener        Tikhonov           ROF λ=5        ROF λ=3       NLmeans
MSE          4476.7     3557.5       325.4874      1923.7             169.1009       111.8863      84.2275
PSNR         11.6212    12.6194      23.0055       15.2895            25.8493        27.6430       28.9978
                                                                 International Conference on Computer
6. CONCLUSION                                                    Vision and Pattern Recognition, 2005.
This paper presents a system that enhances the            [5]    W. Evans. Image denoising with the non-
readability of an old Sanskrit document, through                 local                                 means
effective preprocessing methods for binarization. The            algorithm.Available:http://www.cs.wisc.edu .
pre-processing techniques have been successfully          [6]    Chambolle, A. and Lions, “Image recovery
tested on a degraded document. Experiments show                  via total variation minimization and related
best results for the NL means algorithm, thereby                 problems”, Numer. Math., 76:167-188(1997)
showing good binarization performance. So the             [7]    Chan, T., Golub, and Mulet, P. (1999), “A
proposed methods can be used for preprocessing of                nonlinear primal-dual method for total
historical machine-printed documents.                            variation-based image restoration”, SIAM J.
                                                                 Sci. Comp., 20(6):1964-1977.
7. REFERENCES                                             [8]    Rudin, L., Osher, S., and Fatemi, “Nonlinear
  [1] Laurence Likforman-Sulem, Jérôme Darbon                    total variation based noise removal
      Elisa H. Barney Smith, “Enhancement of                     algorithms”. Physica D, 60:259-268. 1992.
      Historical Printed Document Images by
      Combining Total Variation Regularization
      and Non-Local Means Filtering”, Image and
      Vision Computing (2011). Vol. 29, Nr. 5, p.
      351-363.
  [2] Elisa H. Barney Smith, Laurence Likforman-
      Sulem, Jérôme Darbon, “Effect of Pre-
      Processing         on          Binarization”,
      Conference: Document Recognition and
      Retrieval - DRR , pp. 1-10, 2010

  [3]    A. Buades, B. Coll, and J Morel. “On Image
         denoising     Methods”, Technical Report
         2004-15, CMLA, 2004.
  [4]    A. Buades, B. Coll, and J Morel., “A non-
         local algorithm for image denoising”, IEEE

                                                                                              1534 | P a g e

Mais conteúdo relacionado

Mais procurados

Labview with dwt for denoising the blurred biometric images
Labview with dwt for denoising the blurred biometric imagesLabview with dwt for denoising the blurred biometric images
Labview with dwt for denoising the blurred biometric imagesijcsa
 
A SURVEY : On Image Denoising and its Various Techniques
A SURVEY :  On Image Denoising and its Various TechniquesA SURVEY :  On Image Denoising and its Various Techniques
A SURVEY : On Image Denoising and its Various TechniquesIRJET Journal
 
A NOVEL ALGORITHM FOR IMAGE DENOISING USING DT-CWT
A NOVEL ALGORITHM FOR IMAGE DENOISING USING DT-CWT A NOVEL ALGORITHM FOR IMAGE DENOISING USING DT-CWT
A NOVEL ALGORITHM FOR IMAGE DENOISING USING DT-CWT sipij
 
image denoising technique using disctere wavelet transform
image denoising technique using disctere wavelet transformimage denoising technique using disctere wavelet transform
image denoising technique using disctere wavelet transformalishapb
 
23 an investigation on image 233 241
23 an investigation on image 233 24123 an investigation on image 233 241
23 an investigation on image 233 241Alexander Decker
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD Editor
 
Performance analysis of image filtering algorithms for mri images
Performance analysis of image filtering algorithms for mri imagesPerformance analysis of image filtering algorithms for mri images
Performance analysis of image filtering algorithms for mri imageseSAT Publishing House
 
Lectures 1 3 final (4)
Lectures 1 3 final (4)Lectures 1 3 final (4)
Lectures 1 3 final (4)seemakashyap15
 
A Decision tree and Conditional Median Filter Based Denoising for impulse noi...
A Decision tree and Conditional Median Filter Based Denoising for impulse noi...A Decision tree and Conditional Median Filter Based Denoising for impulse noi...
A Decision tree and Conditional Median Filter Based Denoising for impulse noi...IJERA Editor
 
Design Approach of Colour Image Denoising Using Adaptive Wavelet
Design Approach of Colour Image Denoising Using Adaptive WaveletDesign Approach of Colour Image Denoising Using Adaptive Wavelet
Design Approach of Colour Image Denoising Using Adaptive WaveletIJERD Editor
 
De-speckling of underwater ultrasound images
De-speckling of underwater ultrasound images De-speckling of underwater ultrasound images
De-speckling of underwater ultrasound images ajujohnkk
 
A Novel and Robust Wavelet based Super Resolution Reconstruction of Low Resol...
A Novel and Robust Wavelet based Super Resolution Reconstruction of Low Resol...A Novel and Robust Wavelet based Super Resolution Reconstruction of Low Resol...
A Novel and Robust Wavelet based Super Resolution Reconstruction of Low Resol...CSCJournals
 
www.ijera.com 68 | P a g e Leaf Disease Detection Using Arm7 and Image Proces...
www.ijera.com 68 | P a g e Leaf Disease Detection Using Arm7 and Image Proces...www.ijera.com 68 | P a g e Leaf Disease Detection Using Arm7 and Image Proces...
www.ijera.com 68 | P a g e Leaf Disease Detection Using Arm7 and Image Proces...IJERA Editor
 

Mais procurados (20)

Labview with dwt for denoising the blurred biometric images
Labview with dwt for denoising the blurred biometric imagesLabview with dwt for denoising the blurred biometric images
Labview with dwt for denoising the blurred biometric images
 
IJSRDV3I40293
IJSRDV3I40293IJSRDV3I40293
IJSRDV3I40293
 
Bh044365368
Bh044365368Bh044365368
Bh044365368
 
A SURVEY : On Image Denoising and its Various Techniques
A SURVEY :  On Image Denoising and its Various TechniquesA SURVEY :  On Image Denoising and its Various Techniques
A SURVEY : On Image Denoising and its Various Techniques
 
A NOVEL ALGORITHM FOR IMAGE DENOISING USING DT-CWT
A NOVEL ALGORITHM FOR IMAGE DENOISING USING DT-CWT A NOVEL ALGORITHM FOR IMAGE DENOISING USING DT-CWT
A NOVEL ALGORITHM FOR IMAGE DENOISING USING DT-CWT
 
L011117884
L011117884L011117884
L011117884
 
Ch2
Ch2Ch2
Ch2
 
image denoising technique using disctere wavelet transform
image denoising technique using disctere wavelet transformimage denoising technique using disctere wavelet transform
image denoising technique using disctere wavelet transform
 
Cp31608611
Cp31608611Cp31608611
Cp31608611
 
23 an investigation on image 233 241
23 an investigation on image 233 24123 an investigation on image 233 241
23 an investigation on image 233 241
 
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
IJERD (www.ijerd.com) International Journal of Engineering Research and Devel...
 
Performance analysis of image filtering algorithms for mri images
Performance analysis of image filtering algorithms for mri imagesPerformance analysis of image filtering algorithms for mri images
Performance analysis of image filtering algorithms for mri images
 
Lectures 1 3 final (4)
Lectures 1 3 final (4)Lectures 1 3 final (4)
Lectures 1 3 final (4)
 
Lecture 2
Lecture 2Lecture 2
Lecture 2
 
A Decision tree and Conditional Median Filter Based Denoising for impulse noi...
A Decision tree and Conditional Median Filter Based Denoising for impulse noi...A Decision tree and Conditional Median Filter Based Denoising for impulse noi...
A Decision tree and Conditional Median Filter Based Denoising for impulse noi...
 
Aw044306308
Aw044306308Aw044306308
Aw044306308
 
Design Approach of Colour Image Denoising Using Adaptive Wavelet
Design Approach of Colour Image Denoising Using Adaptive WaveletDesign Approach of Colour Image Denoising Using Adaptive Wavelet
Design Approach of Colour Image Denoising Using Adaptive Wavelet
 
De-speckling of underwater ultrasound images
De-speckling of underwater ultrasound images De-speckling of underwater ultrasound images
De-speckling of underwater ultrasound images
 
A Novel and Robust Wavelet based Super Resolution Reconstruction of Low Resol...
A Novel and Robust Wavelet based Super Resolution Reconstruction of Low Resol...A Novel and Robust Wavelet based Super Resolution Reconstruction of Low Resol...
A Novel and Robust Wavelet based Super Resolution Reconstruction of Low Resol...
 
www.ijera.com 68 | P a g e Leaf Disease Detection Using Arm7 and Image Proces...
www.ijera.com 68 | P a g e Leaf Disease Detection Using Arm7 and Image Proces...www.ijera.com 68 | P a g e Leaf Disease Detection Using Arm7 and Image Proces...
www.ijera.com 68 | P a g e Leaf Disease Detection Using Arm7 and Image Proces...
 

Destaque (20)

Jj2416341637
Jj2416341637Jj2416341637
Jj2416341637
 
Iy2415661571
Iy2415661571Iy2415661571
Iy2415661571
 
Ip2415101517
Ip2415101517Ip2415101517
Ip2415101517
 
Hc2512901294
Hc2512901294Hc2512901294
Hc2512901294
 
Gw2512481257
Gw2512481257Gw2512481257
Gw2512481257
 
Ij2414761479
Ij2414761479Ij2414761479
Ij2414761479
 
Jd2416001606
Jd2416001606Jd2416001606
Jd2416001606
 
Jf2416121616
Jf2416121616Jf2416121616
Jf2416121616
 
Mu2522002204
Mu2522002204Mu2522002204
Mu2522002204
 
T25102106
T25102106T25102106
T25102106
 
U25107111
U25107111U25107111
U25107111
 
Mn2521592162
Mn2521592162Mn2521592162
Mn2521592162
 
Mh2521282131
Mh2521282131Mh2521282131
Mh2521282131
 
Ml2521492153
Ml2521492153Ml2521492153
Ml2521492153
 
O25074078
O25074078O25074078
O25074078
 
Mi2521322136
Mi2521322136Mi2521322136
Mi2521322136
 
Mk2521432148
Mk2521432148Mk2521432148
Mk2521432148
 
W25116118
W25116118W25116118
W25116118
 
Diccionario informático
Diccionario informáticoDiccionario informático
Diccionario informático
 
Ip2515381543
Ip2515381543Ip2515381543
Ip2515381543
 

Semelhante a Is2415291534

A binarization technique for extraction of devanagari text from camera based ...
A binarization technique for extraction of devanagari text from camera based ...A binarization technique for extraction of devanagari text from camera based ...
A binarization technique for extraction of devanagari text from camera based ...sipij
 
Survey Paper on Image Denoising Using Spatial Statistic son Pixel
Survey Paper on Image Denoising Using Spatial Statistic son PixelSurvey Paper on Image Denoising Using Spatial Statistic son Pixel
Survey Paper on Image Denoising Using Spatial Statistic son PixelIJERA Editor
 
IRJET- A Review on Various Restoration Techniques in Digital Image Processing
IRJET- A Review on Various Restoration Techniques in Digital Image ProcessingIRJET- A Review on Various Restoration Techniques in Digital Image Processing
IRJET- A Review on Various Restoration Techniques in Digital Image ProcessingIRJET Journal
 
Filter for Removal of Impulse Noise By Using Fuzzy Logic
Filter for Removal of Impulse Noise By Using Fuzzy LogicFilter for Removal of Impulse Noise By Using Fuzzy Logic
Filter for Removal of Impulse Noise By Using Fuzzy LogicCSCJournals
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Performance analysis of image filtering algorithms for mri images
Performance analysis of image filtering algorithms for mri imagesPerformance analysis of image filtering algorithms for mri images
Performance analysis of image filtering algorithms for mri imageseSAT Publishing House
 
Adapter Wavelet Thresholding for Image Denoising Using Various Shrinkage Unde...
Adapter Wavelet Thresholding for Image Denoising Using Various Shrinkage Unde...Adapter Wavelet Thresholding for Image Denoising Using Various Shrinkage Unde...
Adapter Wavelet Thresholding for Image Denoising Using Various Shrinkage Unde...muhammed jassim k
 
V2 i2087
V2 i2087V2 i2087
V2 i2087Rucku
 
noise remove in image processing by fuzzy logic
noise remove in image processing by fuzzy logicnoise remove in image processing by fuzzy logic
noise remove in image processing by fuzzy logicRucku
 
Paper on image processing
Paper on image processingPaper on image processing
Paper on image processingSaloni Bhatia
 
Sliced Ridgelet Transform for Image Denoising
Sliced Ridgelet Transform for Image DenoisingSliced Ridgelet Transform for Image Denoising
Sliced Ridgelet Transform for Image DenoisingIOSR Journals
 
Comparison of Denoising Filters on Greyscale TEM Image for Different Noise
Comparison of Denoising Filters on Greyscale TEM Image for  Different NoiseComparison of Denoising Filters on Greyscale TEM Image for  Different Noise
Comparison of Denoising Filters on Greyscale TEM Image for Different NoiseIOSR Journals
 
Translation Invariance (TI) based Novel Approach for better De-noising of Dig...
Translation Invariance (TI) based Novel Approach for better De-noising of Dig...Translation Invariance (TI) based Novel Approach for better De-noising of Dig...
Translation Invariance (TI) based Novel Approach for better De-noising of Dig...IRJET Journal
 
Iaetsd a review on enhancement of degraded
Iaetsd a review on enhancement of degradedIaetsd a review on enhancement of degraded
Iaetsd a review on enhancement of degradedIaetsd Iaetsd
 
Denoising and Edge Detection Using Sobelmethod
Denoising and Edge Detection Using SobelmethodDenoising and Edge Detection Using Sobelmethod
Denoising and Edge Detection Using SobelmethodIJMER
 

Semelhante a Is2415291534 (20)

A binarization technique for extraction of devanagari text from camera based ...
A binarization technique for extraction of devanagari text from camera based ...A binarization technique for extraction of devanagari text from camera based ...
A binarization technique for extraction of devanagari text from camera based ...
 
Survey Paper on Image Denoising Using Spatial Statistic son Pixel
Survey Paper on Image Denoising Using Spatial Statistic son PixelSurvey Paper on Image Denoising Using Spatial Statistic son Pixel
Survey Paper on Image Denoising Using Spatial Statistic son Pixel
 
IRJET- A Review on Various Restoration Techniques in Digital Image Processing
IRJET- A Review on Various Restoration Techniques in Digital Image ProcessingIRJET- A Review on Various Restoration Techniques in Digital Image Processing
IRJET- A Review on Various Restoration Techniques in Digital Image Processing
 
Filter for Removal of Impulse Noise By Using Fuzzy Logic
Filter for Removal of Impulse Noise By Using Fuzzy LogicFilter for Removal of Impulse Noise By Using Fuzzy Logic
Filter for Removal of Impulse Noise By Using Fuzzy Logic
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Jc3515691575
Jc3515691575Jc3515691575
Jc3515691575
 
Performance analysis of image filtering algorithms for mri images
Performance analysis of image filtering algorithms for mri imagesPerformance analysis of image filtering algorithms for mri images
Performance analysis of image filtering algorithms for mri images
 
Adapter Wavelet Thresholding for Image Denoising Using Various Shrinkage Unde...
Adapter Wavelet Thresholding for Image Denoising Using Various Shrinkage Unde...Adapter Wavelet Thresholding for Image Denoising Using Various Shrinkage Unde...
Adapter Wavelet Thresholding for Image Denoising Using Various Shrinkage Unde...
 
Sub1586
Sub1586Sub1586
Sub1586
 
V2 i2087
V2 i2087V2 i2087
V2 i2087
 
noise remove in image processing by fuzzy logic
noise remove in image processing by fuzzy logicnoise remove in image processing by fuzzy logic
noise remove in image processing by fuzzy logic
 
Paper on image processing
Paper on image processingPaper on image processing
Paper on image processing
 
Sliced Ridgelet Transform for Image Denoising
Sliced Ridgelet Transform for Image DenoisingSliced Ridgelet Transform for Image Denoising
Sliced Ridgelet Transform for Image Denoising
 
Comparison of Denoising Filters on Greyscale TEM Image for Different Noise
Comparison of Denoising Filters on Greyscale TEM Image for  Different NoiseComparison of Denoising Filters on Greyscale TEM Image for  Different Noise
Comparison of Denoising Filters on Greyscale TEM Image for Different Noise
 
Translation Invariance (TI) based Novel Approach for better De-noising of Dig...
Translation Invariance (TI) based Novel Approach for better De-noising of Dig...Translation Invariance (TI) based Novel Approach for better De-noising of Dig...
Translation Invariance (TI) based Novel Approach for better De-noising of Dig...
 
W4101139143
W4101139143W4101139143
W4101139143
 
Iaetsd a review on enhancement of degraded
Iaetsd a review on enhancement of degradedIaetsd a review on enhancement of degraded
Iaetsd a review on enhancement of degraded
 
Denoising and Edge Detection Using Sobelmethod
Denoising and Edge Detection Using SobelmethodDenoising and Edge Detection Using Sobelmethod
Denoising and Edge Detection Using Sobelmethod
 
J010245458
J010245458J010245458
J010245458
 
P180203105108
P180203105108P180203105108
P180203105108
 

Is2415291534

  • 1. Kavitha Balakrishnan, Kavitha Sunil, Sreedhanya A.V, K.P Soman / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.1529-1534 Effect Of Pre-Processing On Historical Sanskrit Text Documents Kavitha Balakrishnan, Kavitha Sunil, Sreedhanya A.V, K.P Soman Centre for Excellence in Computational Engineering and Networking ABSTRACT In this paper, the effect of pre-processing on binarization is explored. Here, pre-processing PAPER DOCUMENT and binarization operations are performed on a historical Sanskrit text document. After scanning, pre-processing operations are applied on the image SCANNING to remove noise. Pre-processing techniques play an important role in binarization. Newly developed pre-processing techniques are Non Local means and Total Variation methods. Total Variation methods are motivated by the developments in NOISE REMOVAL compressive sensing methods like l1 optimization. Binarization is used as a pre-processor before OCR, because most OCR packages work only on BINARIZATION black and white images. Keywords - Pre-processing filters, Total variation Fig. 1 Overview of basic steps in preprocessing methods, NL means, Binarization, Otsu Binarization Before going into the OCR process, one must scan the text document. It is important to do a good 1. INTRODUCTION quality scanning since if the quality is poor then it will There are a number of factors that affect the be hard for the OCR software to read the text and to accuracy of a text document recognized through OCR. make correct interpretation. The scanned image is Scanned documents often contain noise that arises due then stored in jpeg/bmp format. In order to improve to printer, scanner, print quality, age of the document. the quality of the image we then have to go for noise In the case of historical documents, the quality is removal. usually very low, and the images suffer high degradation. The degradation on the historical Noise Removal document images is mainly due to fading of ink, Scanning of text documents itself can scanning, paper aging and bleed-through. introduce some amount of noise. In order to make it suitable for further processing, a scanned document The importance of the preprocessing stage of image is to be freed from any existing noise. This can a character recognition system lies in its ability to be achieved by image enhancement. Image remedy some of the problems that may occur due to enhancement mainly involves noise removal which is factors presented above. Thus, the use of done by filtering. Filtering is a neighbourhood preprocessing techniques may enhance the document operation, in which the value of any given pixel in the image and prepare it for the next stage in the character output image is determined by applying some recognition system. algorithm to the values of the pixels in the neighbourhood of the corresponding input pixel. The paper is organized as follows. Section 2 Various methods are applied to reduce noise. The shows the steps involved in preprocessing. Section 3 most important reason to reduce noise is to obtain deals with noise removal techniques. It comprises of easy way of recognition of documents. conventional and new filtering methods. Otsu Binarization algorithm is discussed in Section 4. Binarization Section 5 deals with the experimental results and in Image binarization converts a gray-scale Section 6 we conclude the discussion. document image into a black and white (0s&1s) format. The choice of binarization algorithm depends 2. STEPS INVOLVED IN PREPROCESSING on the quality of the image. Hence the binarization The steps involved in preprocessing is given algorithms that work best on one image may not be below Scanning and image digitization best for another. 1529 | P a g e
  • 2. Kavitha Balakrishnan, Kavitha Sunil, Sreedhanya A.V, K.P Soman / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.1529-1534 3. NOISE REMOVAL METHODS window. It is widely used because it will remove the 3.1. Conventional methods noise while preserving the edges present in the image. 3.1.1. Mean filter Mean filter is one of the simplest linear 3.1.3. Wiener filter filters. It is a sliding window spatial filter, which Wiener filter is an adaptive linear filter, replaces the center pixel of the window by the average which takes local variance of the image into account. of all the values in the window. The size of the When the variance in an image is large, the Wiener window determines the image contrast. As we filter results in light local smoothing, while when the increase the size of the window, the image becomes variance is small, it gives an improved local more and more blurred. smoothing. Figure 2 shows the effect of filters on text documents. 3.1.2. Median filter Mean filters smoothen the edges of the character and It is a non linear filter. It is a sliding window background. The median filter works better than the spatial filter, which replaces the center pixel of the mean filter and preserves useful details in the image. window by the median of all the values in the Wiener filter produces a fair amount of edge blurring. Fig. 2. Samples showing the effect of preprocessing filter (a) The original text image, (b) Mean filter (c) Median filter (d) Wiener filter 3.2 Non Local Means contains only low frequencies. But some images The conventional filtering methods remove contain fine details and structures which have high fine details present in the image along with noise. frequencies. Filtering removes these high frequency Most denoising algorithms make two assumptions details in addition to the high frequency noise, and about the noisy image. The first assumption is that the these methods do nothing to remove low frequency noise contained in the image is white noise. The noise present in the image. These assumptions can second assumption is the true image is smooth i.e. it 1530 | P a g e
  • 3. Kavitha Balakrishnan, Kavitha Sunil, Sreedhanya A.V, K.P Soman / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.1529-1534 cause blurring and loss of detail in the resulting N r , s denotes the neighborhood centered over pixel denoised images. The Non-local means assumes that the image ( x, y) and (r , s) respectively. Let the window size contains an extensive amount of redundancy and be MxM, where M is the odd. Similarity between the exploits these redundancies to remove the noise two neighborhoods can be found using L2- present in the image. Adjacent pixels tend to have norm: N x , y  N r , s similar neighborhoods, but non-adjacent pixels can 2 also have similar neighborhoods’ as shown in the figure below. The self-similarity assumption is This norm then defines a weight to be used in our exploited to denoise an image. Pixels with similar weighted average neighborhoods can be used to determine the denoised exp(( N x, y  N r , s ))2 value of a pixel. w( N x , y , N x , y )  h2 where h is a parameter that needs to be fine-tuned. Similar neighborhoods give w  1 . If the two neighborhoods are very different, w  0 . Then each pixel in our new image g ( x, y) is a weighted average of the pixels in f ( x, y ) , weighted by how similar the neighborhoods are Fig. 3. Example for self-similarity is shown. Non  f  r, s  .w  N , N  adjacent pixels p, q and r have similar neighborhood in the image. x, y r ,s The non-local means replaces a pixel by the weighted g ( x, y )  r s average of other neighborhoods in the image. This method is the best possible denoising method for  w  N , N  r s x, y r ,s natural images. First we make a list of all similar When historical document image with noise is neighborhoods in the image. Neighborhood of each subjected to NL means algorithm, we get result as pixel is then linearized to form a row in a matrix and shown in figure. Smaller neighborhoods remove noise L2 norm is computed between each row. Let N x , y and better. Fig. 4. Effect of NL means algorithm is shown (a) original image, (b) image denoised using NL means with window size=3. 3.3 Total Variation statistical properties as the original image, along with 3.3.1 Tikhnov model sharp edges and low noise. The Total Variation approach is used when Finding the denoised text image can be characters in the text document are highly degraded. mathematically described as an optimization problem: TV Denoising reduces total variation of the image. It u*  maxu { p(u / f )} filters out noise while preserving the edges. The resulting filtered image should have the same 1531 | P a g e
  • 4. Kavitha Balakrishnan, Kavitha Sunil, Sreedhanya A.V, K.P Soman / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.1529-1534 where p(u / f ) is the posterior probability of a 1 1 min E (u )   u( x, y) d   2   f  u  d  2 2 hypothesis u (describing our best solution) might be u 2 true given by the observation f. The conditional probability p(u / f ) can be written This is Tikhonov model. The first term is the regularization term which is derived from prior and as the second term as the data fidelity term. The data p( f / u ) p(u ) p(u / f )  fidelity measures how far the current solution u is p( f ) from the observed image f. The parameter λ is a non- where p(u) is the prior probability of u and negative coefficient that governs the balance between the data fidelity and the regularization term. A large p( f u ) is the conditional describing how well the value for λ will produce an image with few details, observed data f can be explained by the solution u. removing small features, while a small value will The probability of the low noise image with respect to yield an image same as u. In Tikhnov model, since we the image f (that is defining our inverse problem): use image prior as a set of smooth images, we obtain blurred images as output. 2 2  ( f ( x , y )  u ( x , y )) u ( x , y )  p ( f , u ) p (u ) 1 p (u / f )    2 2 2 2 e 3.3.2 ROF Model p( f ) ( x , y )D 4  The ROF model is similar to the Tikhonov model but the regularization term has been changed to the TV norm instead of the quadratic norm. In its Maximizing this probability is equivalent to original formulation, the ROF model is defined as the minimizing the negative term in the exponent on the constrained optimization problem set of pixels D, that is the overall integral of ( f ( x, y)  u ( x, y))2 u ( x, y )  2 min u    u d  s.t  (u  f )2 d    2  2 2 2 2 where f is the observed image function which This leads us to the following variational formulation is assumed to be corrupted by Gaussian noise of variance  and u is the unknown denoised image. 2 of our image restoration problem Fig. 5. Result of denoising using Tikhnov model (a) The original image, (b) TV using λ=1. 1532 | P a g e
  • 5. Kavitha Balakrishnan, Kavitha Sunil, Sreedhanya A.V, K.P Soman / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.1529-1534 Figure 6. Result of denoising using ROF model (a) The original image, (b) TV using λ=3, (c) TV using λ=5. The original non-convex ROF model can be the threshold value, then it is classified as black. turned into a convex problem by replacing the Binarization techniques can be either global or local. equality constraint In global binarization we choose a single threshold for 4.1 Otsu Binarization Algorithm  (u  f )2 d    2 by the inequality constraint Otsu is a global thresholding binarization  algorithm .It assumes that the image contains two  (u  f )2 d    2 classes of pixels; one foreground (black) and one  background (white). Then it calculates the optimum which in turn can further be transformed to the threshold separating those two classes so that their unconstrained (or Lagrangian) model intra-class variance is minimal. This is equivalent to maximizing the between-class scatter. This establishes  1  an optimum threshold K. min   u d    (u  f ) d . 2 1, if I gray ( x, y)  K u   2   I ( x, y )   0, if I gray ( x, y )  K  where  is a Lagrange multiplier. 5. RESULTS 4. BINARIZATION The text image is first filtered by each of the Image binarization converts a gray-scale noise removal algorithms described in Section 3. Then document image into a black and white (0s&1s) each filter output along with the unfiltered original format. In binarization, we choose a threshold value to was then binarized by using Otsu binarization classify the pixels as black and white. If the pixel algorithm. Two methods are used to do the evaluation value is greater than the threshold value, then it is measures – MSE and PSNR. classified as white and if the pixel value is less than the whole document image, but these techniques are not suitable for degraded images. In local binarization technique, the local threshold can be calculated by using different information of the document images, such as the mean and standard variation of pixel values within a local window. (b) median, (c) wiener, (d) Tikhnov, (e) TV ROF Figure 7. Result of binarization using Otsu using λ=3, (f) TV ROF using λ=5 algorithm on denoised image using (a) mean filter, 1533 | P a g e
  • 6. Kavitha Balakrishnan, Kavitha Sunil, Sreedhanya A.V, K.P Soman / International Journal of Engineering Research and Applications (IJERA) ISSN: 2248-9622 www.ijera.com Vol. 2, Issue4, July-August 2012, pp.1529-1534 Mean Square Error (MSE) for two m  n images I and K is given by, M I is the maximum possible pixel value of the 1 m1 n1 MSE  [ I (i, j)  K (i, j)]2 mn i 0 j 0 image. These measures were calculated for every image-filter combination. The score obtained using where one of the images is considered a noisy preprocessing algorithms are shown in Tables 1. The approximation of the other . image binarized using Otsu algorithms are measured Peak Signal-To-Noise Ratio (PSNR) is and the scores got are shown in Table 2. generally used to analyze quality of image in dB (decibels). PSNR calculation of two images, one In most cases the best pre-processing filter original and an altered image, describes how far two was NL means. Next best results are shown by Total images are equal. Variation filter with λ=3. For the conventional filters  M2  mean, median and Wiener filter, Wiener gives best PSNR  10log10  I  results.  MSE   MI   20log10    MSE  Table 1. Performance measure of various preprocessing algorithm Mean Median Wiener Tikhonov ROF λ=5 ROF λ=3 NLmeans MSE 702.1072 713.0556 50.9573 338.8237 22.8120 13.4386 2.7225 PSNR 19.6668 19.5996 31.0587 22.8311 34.5492 36.8473 43.76 Table 2. Performance measure after Otsu binarization Median Wiener Tikhonov ROF λ=5 ROF λ=3 NLmeans MSE 4476.7 3557.5 325.4874 1923.7 169.1009 111.8863 84.2275 PSNR 11.6212 12.6194 23.0055 15.2895 25.8493 27.6430 28.9978 International Conference on Computer 6. CONCLUSION Vision and Pattern Recognition, 2005. This paper presents a system that enhances the [5] W. Evans. Image denoising with the non- readability of an old Sanskrit document, through local means effective preprocessing methods for binarization. The algorithm.Available:http://www.cs.wisc.edu . pre-processing techniques have been successfully [6] Chambolle, A. and Lions, “Image recovery tested on a degraded document. Experiments show via total variation minimization and related best results for the NL means algorithm, thereby problems”, Numer. Math., 76:167-188(1997) showing good binarization performance. So the [7] Chan, T., Golub, and Mulet, P. (1999), “A proposed methods can be used for preprocessing of nonlinear primal-dual method for total historical machine-printed documents. variation-based image restoration”, SIAM J. Sci. Comp., 20(6):1964-1977. 7. REFERENCES [8] Rudin, L., Osher, S., and Fatemi, “Nonlinear [1] Laurence Likforman-Sulem, Jérôme Darbon total variation based noise removal Elisa H. Barney Smith, “Enhancement of algorithms”. Physica D, 60:259-268. 1992. Historical Printed Document Images by Combining Total Variation Regularization and Non-Local Means Filtering”, Image and Vision Computing (2011). Vol. 29, Nr. 5, p. 351-363. [2] Elisa H. Barney Smith, Laurence Likforman- Sulem, Jérôme Darbon, “Effect of Pre- Processing on Binarization”, Conference: Document Recognition and Retrieval - DRR , pp. 1-10, 2010 [3] A. Buades, B. Coll, and J Morel. “On Image denoising Methods”, Technical Report 2004-15, CMLA, 2004. [4] A. Buades, B. Coll, and J Morel., “A non- local algorithm for image denoising”, IEEE 1534 | P a g e