Devanagari Character Recognition

The presentation describes an algorithm for recognizing Devanagari characters. Devanagari is the script used to write Hindi. The algorithm
automatically segments the characters in an image of Devanagari text and then recognizes them.
To extract the individual characters from the image, the algorithm segments the image several
times using vertical and horizontal projections.

The algorithm first segments the document into lines using the horizontal projection, and then segments each line
into words using the vertical projection of the line. Separating Devanagari characters needs one more step: the
header line of each word, which connects its characters, is located from the horizontal projection of the word
and removed. The characters are then extracted using the vertical projection of the word below the header line.
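The segmentation steps above can be sketched in Python as follows. This is a minimal illustration and not the project's code, and all function names are hypothetical. It assumes a binary image stored as a list of rows (1 = black ink, 0 = white background). It also assumes a header line exactly one pixel thick, following the slides' 1-pixel pen width, so the character projection starts on the row just below the header. The header row has to be left out because it joins the characters: including it would leave no all-white column inside a word.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # assumed binary image: 1 = black ink, 0 = white background


def horizontal_projection(img: Image) -> List[int]:
    """Number of black pixels in every row."""
    return [sum(row) for row in img]


def vertical_projection(img: Image) -> List[int]:
    """Number of black pixels in every column."""
    return [sum(col) for col in zip(*img)]


def runs(profile: List[int]) -> List[Tuple[int, int]]:
    """Half-open (start, end) ranges of consecutive non-zero bins.

    Zero bins (all-white rows or columns) act as delimiters.
    """
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, len(profile)))
    return ranges


def segment_lines(page: Image) -> List[Image]:
    """Rows whose horizontal projection is zero separate text lines."""
    return [page[a:b] for a, b in runs(horizontal_projection(page))]


def segment_words(line: Image) -> List[Image]:
    """Columns whose vertical projection is zero separate words."""
    return [[row[a:b] for row in line] for a, b in runs(vertical_projection(line))]


def header_line_position(word: Image) -> int:
    """hLinePos: the row of the word box with the most black pixels."""
    profile = horizontal_projection(word)
    return profile.index(max(profile))


def segment_characters(word: Image) -> List[Image]:
    """Split a word into character boxes using only the rows below the header line."""
    below = word[header_line_position(word) + 1:]  # assumes a 1-pixel-thick header
    if not below:
        return []
    return [[row[a:b] for row in below] for a, b in runs(vertical_projection(below))]
```

In this sketch `segment_lines` cuts the page at all-white rows and `segment_words` cuts a line at all-white columns. `segment_characters` then finds the extra white columns that appear once the header row is dropped.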

The algorithm uses a Kohonen neural network for recognition. After a character has been separated from the
image, its image matrix is downsampled to a fixed size so that recognition does not depend on the size of the
character. The downsampled matrix is fed to the input neurons of the Kohonen network, and the winning output
neuron identifies the recognized character. This association is stored in the network during its training phase.
Training starts by assigning random weights to the connections from the input neurons to the output neurons.
Then, for each training pattern, the winning neuron is found as the one that produces the largest output, and
its weights are adjusted so that it responds more strongly to that pattern the next time.
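The downsampling and Kohonen recognition steps can be sketched in Python as follows. This is an illustrative winner-take-all implementation and not the project's code, and every name is hypothetical. Several details come from the slides:

- the 15 x 20 target size, taken here as width 15 and height 20;
- the black-if-any box rule for downsampling;
- the dot-product output and the largest-output winner;
- the "forcing a winner" step.

Other details are assumptions:

- input and weight vectors are normalized to unit length;
- the update rule is w + rate * (x - w) with a rate of 0.5;
- training runs for a fixed number of epochs instead of stopping at an error threshold;
- forcing is applied once per epoch to neurons that won nothing in that epoch;
- after training, each neuron is mapped to the character of the training sample it wins.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Image = List[List[int]]  # assumed binary image: 1 = black ink, 0 = white


def downsample(img: Image, out_w: int = 15, out_h: int = 20) -> List[int]:
    """Bring a character image to out_w x out_h, flattened row by row.

    An output pixel is black if any pixel in its RatioX x RatioY box is black.
    """
    h, w = len(img), len(img[0])
    ratio_x, ratio_y = w / out_w, h / out_h
    out = []
    for oy in range(out_h):
        y0 = int(oy * ratio_y)
        y1 = min(h, max(y0 + 1, int((oy + 1) * ratio_y)))  # box is at least 1 pixel
        for ox in range(out_w):
            x0 = int(ox * ratio_x)
            x1 = min(w, max(x0 + 1, int((ox + 1) * ratio_x)))
            black = any(img[y][x] for y in range(y0, y1) for x in range(x0, x1))
            out.append(1 if black else 0)
    return out


def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


class KohonenNetwork:
    """Winner-take-all layer with one output neuron per character."""

    def __init__(self, n_inputs: int, n_outputs: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Random initial weights from every input neuron to every output neuron.
        self.weights = [normalize([rng.uniform(-1.0, 1.0) for _ in range(n_inputs)])
                        for _ in range(n_outputs)]
        self.neuron_labels: Dict[int, str] = {}

    def outputs(self, x: Sequence[float]) -> List[float]:
        """Each neuron's output: dot product of the normalized input and its weights."""
        xn = normalize(x)
        return [sum(wi * xi for wi, xi in zip(w, xn)) for w in self.weights]

    def winner(self, x: Sequence[float]) -> int:
        """Index of the neuron with the largest output."""
        out = self.outputs(x)
        return max(range(len(out)), key=out.__getitem__)

    def _adjust(self, j: int, x: Sequence[float], rate: float) -> None:
        """Move neuron j's weights toward x so it reacts more strongly to x next time."""
        xn = normalize(x)
        self.weights[j] = normalize([w + rate * (xi - w) for w, xi in zip(self.weights[j], xn)])

    def train(self, samples: List[Tuple[str, List[int]]], epochs: int = 50, rate: float = 0.5) -> None:
        patterns = [x for _, x in samples]
        for _ in range(epochs):
            won = set()
            for x in patterns:
                j = self.winner(x)
                won.add(j)
                self._adjust(j, x, rate)
            losers = [j for j in range(len(self.weights)) if j not in won]
            if losers:
                # Forcing a winner: find the pattern with the least activation, then
                # pull the non-winning neuron that responds to it most strongly.
                worst = min(patterns, key=lambda p: max(self.outputs(p)))
                worst_out = self.outputs(worst)
                j = max(losers, key=worst_out.__getitem__)
                self._adjust(j, worst, rate)
        # Map each output neuron to the character of the training sample it wins.
        self.neuron_labels = {self.winner(x): label for label, x in samples}

    def recognize(self, x: Sequence[float]) -> Optional[str]:
        """Label of the winning neuron, or None if that neuron won no training sample."""
        return self.neuron_labels.get(self.winner(x))
```

For a usage example, feed the output of `downsample` for each training character into `train` as `(label, vector)` pairs. Then call `recognize(downsample(character_image))` on a segmented character. Normalizing both the inputs and the weights means the dot product is driven by the pattern's shape rather than by how many black pixels it has.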

  2. DEVANAGARI CHARACTER RECOGNITION
      • Project supervisor: Dr. Anupam Agrawal
      • Project members: Pulkit Goyal (RIT 2007029), Sapan Diwakar (RIT 2007043)
  3. Introduction
      • The aim is to create software that can recognize characters in images of Devanagari text.
      • Character recognition is a growing area of research in the field of pattern recognition.
  4. The Problem
      • Most websites in Devanagari use images to represent text. Such images need to be indexed by the text they contain so that they can be searched easily.
      • The aim of the project is to develop software that can recognize Devanagari characters in scanned images of printed documents.
      [Figure: image and recognized characters]
  5. The Solution
  6. Recognition Methods
      • Recognition methods can be broadly classified into template matching, the statistical approach, the structural approach, and artificial neural networks.
  7. Our Choice
      • Kohonen neural network, a type of artificial neural network. Kohonen networks:
      • Require less formal statistical training
      • Are relatively simple to construct
      • Can be trained very rapidly
      • Are very robust
  8. Constraints
      • The input image is assumed not to be tilted.
      • The background should be white and the text black (or other shades of grey).
      • The pen width is assumed to be 1 pixel.
      • Matras: no matras are recognized.
      • Characters whose parts are not joined, e.g., ग, श, are not considered.
      • Fused characters: the image contains no fused characters.
  9. Steps of the Project
      • Segmentation: breaking the image into lines, lines into words, and words into characters. This is done using horizontal and vertical projections [5].
      • Down sampling: bringing each character to a fixed size, also referred to as windowing [2]. We use a size of 15 x 20.
      • Training the network: adjusting the weights so that recognition can be carried out successfully.
      • Recognition: recognizing the character presented.
  10. Line Segmentation
      • A line is separated from the previous and following lines by white space, so line segmentation is based on the horizontal histogram (projection) of the document. Let HP[j] be the number of black pixels in row j, for j = 1, 2, …, H, where H is the height of the document image. Rows for which HP[j] is zero serve as delimiters between successive text lines.
  11. Word Segmentation
      • A text line is segmented into words using its vertical projection. A vertical histogram of the line is computed, and white-space columns (columns with no black pixels) are used as word delimiters.
  12. Locating the Header Line
      • After extracting the subimages of the words in a text line, we locate the header line of each word. The top-left corner of the word image box has coordinates (0, 0) and the bottom-right corner has coordinates (W, H), where W is the width and H is the height of the box. We compute the horizontal projection of the word image box, and the row containing the maximum number of black pixels is taken to be the header line. Its position is denoted by hLinePos.
  13. Character Segmentation
      • Separating the character/symbol boxes below the header line: we take the vertical projection of the word image from the row just below hLinePos to the bottom row of the word image box. The header row itself is excluded because it joins the characters. Columns with no black pixels are treated as boundaries for extracting the image boxes of the characters.
  14. Segmented Output
  15. Down Sampling
      • Down sampling makes the recognition of characters independent of their size. It involves the following steps:
      • Define the down sampling ratios:
          RatioX = (width of character image) / (width of down sampled image)
          RatioY = (height of character image) / (height of down sampled image)
      • Check for the presence of a black pixel in each box of size RatioX × RatioY in the character image.
      • If a black pixel is present, the corresponding pixel of the down sampled image is black; otherwise it is white.
  16. Recognition using Kohonen Neural Network
      • Recognition with a Kohonen neural network involves the following steps [4]:
      • Defining the structure of the Kohonen network: the matrix of the down sampled character is the input to the network, and the output neurons are the characters to be recognized.
      • Calculating each neuron's output: the output is the dot product of the input neurons and their connection weights.
  17. Recognition using Kohonen Neural Network
      • Choosing a winner: we calculate the final output value of each neuron and choose, as the winner, the neuron with the largest output value.
      • This is the neuron most suitable for the given input.
      • Successful recognition depends on the training of the network.
  18. Training Kohonen Neural Network
      • Successful recognition of characters depends on the weights between the input neurons and the output neurons. These weights are determined by the training process.
      • Overall, training a Kohonen neural network involves stepping through several epochs until the error of the network is below an acceptable level.
      • The training process for a Kohonen network is competitive: for each training pattern, one neuron “wins”.
      • The winning neuron has its weights adjusted so that it reacts more strongly to that input next time.
      • As different neurons win for different patterns, their ability to recognize those particular patterns increases.
  19. Training Kohonen Neural Network (contd.)
      • Forcing a winner [4]:
      • Some output neurons never win and so never learn. To handle them, we go through the entire training set to find the training pattern that causes the least activation.
      • This pattern is considered to be the one least well represented by the current winning neurons.
      • Next, we choose an output neuron to modify so that it better classifies this pattern. We go through every output neuron that has never won and select the one with the highest activation for this pattern.
      • Finally, we modify the weights of this neuron so that it better classifies this pattern next time.
  20. GUI of the project
  21. Adding characters using the drawing area
  22. Recognition of a handwritten character
  23. Training using an image
  24. Recognizing an image
  25. Applications of Devanagari Character Recognition
      • Digitizing books: the software can be used to digitize books written in Devanagari. Digitized books offer many advantages, such as searching through the book, reduced cost, and easy storage.
      • Indexing of images in search engines: many websites use images to represent Devanagari text. Search engines that accept keywords in Devanagari script need software that recognizes Devanagari text in such images so that the images can be indexed.
      • Indexing of videos: the approach can be extended to recognize Devanagari text in videos so that they can be indexed properly.
      • Recognizing addresses on envelopes in post offices: such software could recognize Devanagari addresses on envelopes, automating the overall process.
      • Use for those who don't know Hindi: combined with transliteration or translation, this software could help people who do not understand Hindi but want to read a book written in Hindi. It would also make such books easier to translate into other languages.
  26. References
      [1] R. M. K. Sinha and Veena Bansal, “On Devanagari Document Processing”, IEEE International Conference on Systems, Man and Cybernetics, Vancouver, Canada.
      [2] K. Y. Rajput and Sangeeta Mishra, “Recognition and Editing of Devnagari Handwriting Using Neural Network”, IEEE Colloquium and International Conference, Mumbai, Vol. 1, pp. 66-70.
      [3] Vishwanatha Kaushik and C. V. Jawahar, “Detection of Devanagiri Text in Digital Images using Connected Component Analysis”, National Conference on Document Analysis and Recognition (NCDAR), pp. 41-48.
      [4] Jeff Heaton, Introduction to Neural Networks.
      [5] Su Liang, M. Shridhar and M. Ahmad, “Segmentation of Touching Characters in Printed Document Recognition”, Pattern Recognition, Vol. 27, pp. 825-840, 1994.