An OCR System for
recognition of Urdu text in
Nastaliq Font
By
S. Hassan Amin
Supervised By
Dr. S. Afaq Hussain
Faculty of Computer Science & Engineering
Ghulam Ishaq Khan Institute of Engineering
Sciences & Technology, Topi-Swabi, 2004
Layout
♦ Introduction
♦ Research Scope
♦ Objectives
♦ Optical Character Recognition
Steps in OCR
♦ Urdu Writing Characteristics
♦ Cursive Script Recognition Schemes
♦ Methodology
Multi-Tier Holistic Approach
Multi-Stage Classification Approach
♦ Results and Discussion
♦ Conclusion
♦ Future Directions
♦ References
Introduction
♦ Urdu is the national language of Pakistan, and is
understood by well over 300 million people
around the world.
♦ There is a need to convert the historical body of
Urdu literature into electronic form, so that Urdu
can prosper in the age of computers.
♦ Urdu text recognition endeavors to convert
scanned Urdu documents automatically into
computerized text files.
Research Scope
♦ Paper documents have been the most important
means of exchanging information for ages, but this
is changing as we rapidly move towards a
paperless society.
♦ IBM has estimated that about $250
billion is spent worldwide each year (largely on
operator salaries, etc.) on keying in information
from paper documents, and this is the cost of
manually capturing information from only 5% of
the available documents [1].
♦ Urdu Text Recognition
♦ Urdu Text Transliteration
♦ Machine Translation
Objectives
♦ The main objective of this research is to build an
OCR system for the Urdu language that is effective for
Nastaliq script irrespective of font size and orientation.
To achieve this objective, there are a number of sub-goals:
• To investigate the problem of Urdu OCR in depth, and to
propose new and better ways to solve it.
• To investigate an appropriate set of features for
Urdu OCR.
• To establish a database of Urdu ligatures for investigating
the problem of Urdu OCR.
• To investigate classification methods that can be useful for
Urdu OCR.
Optical Character
Recognition(OCR)
♦ Character Recognition or Optical Character
Recognition (OCR) is the process of converting
scanned images of machine printed or handwritten
text (numerals, letters and symbols), into a
computer processable format (such as ASCII and
Unicode) [2].
♦ Offline character recognition is performed after
the writing or printing has been completed.
♦ In online character recognition, the computer
recognizes characters as they are drawn, using
timing information.
Steps in OCR
1. Image Acquisition
2. Preprocessing
3. Segmentation
4. Feature Extraction
5. Classification
6. Post Processing
1. Image Acquisition
♦ The conversion of a document into a digital image is
done by a digitizer, which can be a scanner or
camera (offline recognition) or a tablet
digitizer (online recognition).
2. Preprocessing
♦ Preprocessing involves noise reduction,
skew detection, slant normalization,
document decomposition, etc.
♦ For slant estimation there are methods such
as the projection method and the chain code
method [4].
♦ For estimating the skew angle of a page there
are methods such as the orientation-
dependent histogram [3].
3. Segmentation
♦ Segmentation is the process of dividing an
image into regions, each likely to
contain a single object or a group of
objects of the same type. For instance, an
object can be a character on a text page or a
line segment in an engineering drawing.
♦ In OCR, the commonly used segmentation
algorithms are XY tree decomposition,
run-length smearing and the Hough transform.
4. Feature Extraction
♦ Selection of appropriate feature extraction
method is probably the single most
important factor in achieving high
recognition performance [5].
♦ A newcomer to the field is faced with the
challenge of selecting appropriate features
for his/her application.
Feature Extraction(Contd)
♦ Some useful feature extraction methods in the
field of OCR are :-
1. Geometric Features
2. Structural Features
3. Moment based Features
4. Template Matching
5. Unitary Image Transforms
6. Zoning
7. Contour Profiles
8. Fourier Descriptors
5. Classification
♦ Classification is the process of identifying
each character and assigning to it the
correct character class. Two major
approaches for classification methods are:
1. Decision theoretic method
2. Structural Methods
1. Decision theoretic method
♦ These methods are used when the
description of the character can be
represented numerically in a feature vector.
♦ The principal approaches to decision-theoretic
recognition are minimum distance
classifiers, statistical classifiers and neural
networks.
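A minimum distance classifier can be sketched as below. This is only an illustration of the general idea, assuming Euclidean distance to per-class mean vectors; the function names `train_means` and `classify` and the toy data are hypothetical and do not come from the slides.

```python
import math

def train_means(samples):
    """samples maps a class label to a list of feature vectors.
    Returns one mean feature vector per class."""
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in samples.items()
    }

def classify(means, x):
    """Assign x to the class whose mean vector is nearest (Euclidean)."""
    return min(means, key=lambda label: math.dist(means[label], x))

# Toy two-class example with two features per sample.
toy = {"a": [[0, 0], [0, 2]], "b": [[10, 10], [12, 10]]}
means = train_means(toy)
```

Here `classify(means, [1, 1])` picks class "a" because the mean of "a" is closer than the mean of "b".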
2. Structural Methods
♦ Within the area of the structural
recognition, syntactic methods are among
the most common approaches.
♦ In Syntactic pattern recognition, measures
of similarity based on the relationship
between structural components are
formulated using grammatical concepts.
6. Post Processing
♦ Post processing consists of:
1. Grouping
2. Error Detection and Correction
1. Grouping
♦ The result of plain symbol recognition is a set of
individual symbols.
♦ These symbols in themselves usually do not
contain enough information.
♦ We would like to associate the individual symbols
that belong to the same string with each other,
making up words and numbers.
♦ The process of performing this association of
symbols into strings is commonly referred to as
grouping.
2. Error Detection and Correction
♦ Along with the grouping of characters,
another issue to take care of is the context in
which each character appears.
♦ Even the best OCR systems
cannot identify each character with 100%
accuracy. These errors may be detected or
even corrected by use of context.
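One simple way to use context is a lexicon lookup: a recognized word that is not in the lexicon is flagged and replaced by its closest lexicon entry. The sketch below assumes such a lexicon is available and uses Python's standard `difflib` for string similarity; the function name, the cutoff value and the Latin-script example words are illustrative assumptions, not part of the presented system.

```python
import difflib

def correct_word(word, lexicon, cutoff=0.8):
    """Return word unchanged if it is in the lexicon; otherwise return the
    most similar lexicon entry, or the word itself if nothing is close."""
    if word in lexicon:
        return word
    matches = difflib.get_close_matches(word, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else word

# Hypothetical lexicon; Latin script is used only for readability.
lexicon = ["recognition", "segmentation"]
```

With this lexicon, `correct_word("recogmtion", lexicon)` returns "recognition", while a string with no close entry is left as it is.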
Urdu Writing Characteristics
♦ Urdu is written in a cursive script and has
evolved from the Arabic, Persian and Turkish
languages.
♦ Depending on the source, the Urdu alphabet has
36, 37, 42, 51 or 53 characters [8].
♦ The UZT 1.01 standard has 42 characters.
Urdu Writing
Characteristics(Contd)
Figure : Urdu Character Set UZT 1.01
Urdu Writing
Characteristics(Contd)
Characteristics           Urdu    Arabic  Latin  Hebrew  Hindi
H-Justification           RL      RL      LR     RL      LR
V-Justification           Center  Base    No     No      Top
Cursive                   Yes     Yes     No     No      Yes
Diacritics                Yes     Yes     No     No      Yes
# Vowels                  2       2       5      11      -
# Letters                 37      28      26     22      40
Letter Shapes             1-28    1-4     2      1       1
Complementary Characters  5       3       -      -       -
Cursive Script Recognition
Schemes
♦ There are two strategies that have been
applied to cursive script recognition. As
mentioned by Amin and Khorsheed [6,7],
they can be categorized as follows:
1. Holistic Strategies in which the
recognition is globally performed on the
whole representation of words and where
there is no attempt to identify characters
individually.
Cursive Script Recognition
Schemes(Contd)
2. Analytical strategies, in which words are
not considered as a whole but as
sequences of small units, and
recognition is not performed directly at
word level but at an intermediate level
dealing with these units, which can be
graphemes, segments, pseudo-letters, etc.
Research Methodology
♦ Two approaches to recognizing Urdu ligatures
printed in Nastaliq script are presented. Both
approaches are holistic in nature. They
are tested on a set of the
most frequent ligatures printed in Noori Nastaliq
script. The suggested approaches to recognizing
Urdu text are:
1. Multi-Tier Holistic Approach
2. Multi-Stage Classification Approach
Multi-Tier Holistic Approach to
Urdu Nastaliq Recognition
♦ A multi-tier Holistic Approach using feed
forward back propagation neural network
was implemented[12].
(Contd)
Figure :Multi-Tier Holistic Approach to Urdu Nastaliq Recognition
1. Segmentation
♦ Connected Component Labeling is applied to the
image of Urdu text.
♦ This technique assigns to each connected
component of binary image a distinct label.
♦ The labels are usually natural numbers from 1 to
the number of connected components in the input
image.
♦ The algorithm scans the image from left-to-right
and top-to-bottom.
Segmentation(Contd)
♦ On the first line containing black pixels, a unique
label is assigned to each contiguous run of black
pixels.
♦ For each black pixel, the pixels in its
eight-neighborhood are examined. If any of these
pixels has already been labeled, the same label is assigned
to the current pixel; otherwise a new label is
assigned to it. The procedure continues to the
bottom of the image.
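The raster-scan labeling described above can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes the image is a list of rows with 1 for black pixels, and it adds a union-find table plus a second pass to merge labels that turn out to belong to the same component, a step the slides do not spell out.

```python
def label_components(image):
    """Label the 8-connected components of a binary image (1 = black)."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[i] is the union-find parent of label i; 0 = background

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # First pass: scan left-to-right, top-to-bottom.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            # Already-visited 8-neighbours: W, NW, N, NE.
            neighbours = []
            for dr, dc in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc]:
                    neighbours.append(find(labels[rr][cc]))
            if neighbours:
                root = min(neighbours)
                labels[r][c] = root
                for n in neighbours:
                    parent[n] = root  # record that these labels are equivalent
            else:
                parent.append(len(parent))  # create a new label
                labels[r][c] = len(parent) - 1

    # Second pass: resolve equivalences and renumber labels as 1..N.
    renumber = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = renumber.setdefault(root, len(renumber) + 1)
    return labels, len(renumber)
```

The function returns the label image and the number of components, so each ligature body, dot or diacritic can then be cut out by its label.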
Feature Extraction I
♦ In this stage, we extract
some features that will
help us in the recognition
of special ligatures, see
figure. These features are
Solidity, Number of
Holes, Axis Ratio,
Eccentricity, Moments,
Normalized segment
length, curvature, ratio of
bounding box width and
height.
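A few of these features can be computed directly from the pixel coordinates of a labeled component. The sketch below is one possible formulation under stated assumptions: the axis ratio is taken as minor/major axis of the ellipse with the same second moments, eccentricity as sqrt(1 - ratio^2), and "moment 1" as Hu's first invariant. Solidity and hole count need a convex hull and a background labeling and are left out. The function name and dictionary keys are hypothetical.

```python
import math

def shape_features(pixels):
    """pixels: (row, col) coordinates of the black pixels of one ligature."""
    n = len(pixels)
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    r_mean, c_mean = sum(rows) / n, sum(cols) / n
    # Second-order central moments, divided by the area n.
    mu_rr = sum((r - r_mean) ** 2 for r in rows) / n
    mu_cc = sum((c - c_mean) ** 2 for c in cols) / n
    mu_rc = sum((r - r_mean) * (c - c_mean) for r, c in pixels) / n
    # Eigenvalues of the 2x2 moment matrix give the squared axis scales.
    spread = math.sqrt((mu_rr - mu_cc) ** 2 + 4 * mu_rc ** 2)
    lam_major = (mu_rr + mu_cc + spread) / 2
    lam_minor = max((mu_rr + mu_cc - spread) / 2, 0.0)
    axis_ratio = math.sqrt(lam_minor / lam_major) if lam_major > 0 else 1.0
    return {
        "moment_1": (mu_rr + mu_cc) / n,  # Hu's first invariant, eta20 + eta02
        "axis_ratio": axis_ratio,
        "eccentricity": math.sqrt(1 - axis_ratio ** 2),
        "bbox_ratio": (max(cols) - min(cols) + 1) / (max(rows) - min(rows) + 1),
    }
```

For example, a horizontal stroke of five pixels gives an axis ratio of 0 and an eccentricity of 1, while a 2x2 square gives an axis ratio of 1 and an eccentricity of 0.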
Special Ligature Identification
♦ A feed-forward BPN network is trained on
the feature vectors obtained in the Feature
Extraction I stage. During testing, this
network is used to identify an input ligature as
one of the special ligatures. If no valid output is
returned, the ligature is identified as a
base ligature.
Feature Extraction II
♦ In this stage, special ligatures are associated with
the base ligatures. A special ligature is associated
with the base ligature whose centroid-to-centroid
distance from it is minimum.
♦ A number of lines are grown from the center of
each special ligature; when one of these lines
touches a base ligature, the special ligature is
associated with that base ligature.
♦ Because special ligatures are now associated
with base ligatures, twenty new features are
added to the feature vector of the base ligature.
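The centroid-to-centroid rule can be sketched as a nearest-neighbour search. This minimal example covers only that rule, not the line-growing test, and the function names and coordinates are hypothetical.

```python
import math

def centroid(pixels):
    """Mean (row, col) position of a component's pixels."""
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)

def associate(special_centroids, base_centroids):
    """For each special ligature, return the index of the base ligature
    whose centroid is nearest in Euclidean distance."""
    return [
        min(range(len(base_centroids)),
            key=lambda j: math.dist(s, base_centroids[j]))
        for s in special_centroids
    ]

# Two base ligatures and two marks (for example dots) above and below them.
bases = [(10.0, 4.0), (10.0, 20.0)]
specials = [(2.0, 5.0), (12.0, 21.0)]
owners = associate(specials, bases)
```

Each entry of `owners` tells which base ligature's feature vector should receive the features derived from that special ligature.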
Classification and Recognition
♦ In this stage, the final feature vector
consisting of 34 features is fed into Feed
Forward Back propagation neural network.
The network architecture consists of 34
inputs, 65 hidden neurons and 45 output
neurons.
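A 34-65-45 feed-forward network trained by back propagation can be sketched in plain Python as below. The sigmoid activations, squared-error loss, weight initialisation and the random input are assumptions made for illustration; the momentum term listed with the training parameters in the results is left out to keep the example short.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Each neuron holds n_in weights followed by one bias."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
            for _ in range(n_out)]

def forward_layer(layer, inputs):
    """Sigmoid of the weighted sum plus bias, for every neuron."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(neuron, inputs))
                                + neuron[-1])))
        for neuron in layer
    ]

def train_step(hidden, output, x, target, lr=0.1):
    """One back-propagation update; returns the error before the update."""
    h = forward_layer(hidden, x)
    y = forward_layer(output, h)
    # Output deltas for squared error with sigmoid units.
    d_out = [(yk - tk) * yk * (1 - yk) for yk, tk in zip(y, target)]
    # Hidden deltas, computed from the output weights before they change.
    d_hid = [
        hj * (1 - hj) * sum(d_out[k] * output[k][j] for k in range(len(output)))
        for j, hj in enumerate(h)
    ]
    for k, neuron in enumerate(output):
        for j in range(len(h)):
            neuron[j] -= lr * d_out[k] * h[j]
        neuron[-1] -= lr * d_out[k]
    for j, neuron in enumerate(hidden):
        for i in range(len(x)):
            neuron[i] -= lr * d_hid[j] * x[i]
        neuron[-1] -= lr * d_hid[j]
    return sum((yk - tk) ** 2 for yk, tk in zip(y, target)) / 2

rng = random.Random(0)
hidden = make_layer(34, 65, rng)   # 34 inputs -> 65 hidden neurons
output = make_layer(65, 45, rng)   # 65 hidden -> 45 output neurons
x = [rng.random() for _ in range(34)]          # stand-in feature vector
target = [0.0] * 45
target[7] = 1.0                                # one output neuron per ligature
errors = [train_step(hidden, output, x, target) for _ in range(30)]
```

The index of the largest output is read as the recognized ligature class.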
Multi-Stage Classification Approach
to Urdu Text Recognition
♦ The motivation behind this approach is the
belief that classification performance
can be improved by combining multiple
classifiers [9,10,11].
(Contd)
♦ As shown in the figure, the first three stages are
similar to the multi-tier approach.
♦ Intermediate Classification
In the training phase, we first train a competitive network
on the feature vectors of base ligatures, to divide the input
data into the desired number of clusters.
An LVQ/BPN network is then trained
on the output of the competitive network, to assign
an input pattern to a particular class or cluster.
In the testing phase, the input feature vector is
presented to the trained LVQ/BPN network, which gives
us the desired class/cluster.
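The clustering step can be illustrated with a winner-take-all competitive layer. This is a simplified sketch under stated assumptions: prototypes are initialised from evenly spaced training vectors, only the winning prototype is moved, and the nearest prototype stands in for the trained LVQ/BPN router. Function names and data are hypothetical.

```python
def winner(prototypes, v):
    """Index of the prototype closest to v (squared Euclidean distance)."""
    return min(
        range(len(prototypes)),
        key=lambda k: sum((p - x) ** 2 for p, x in zip(prototypes[k], v)),
    )

def train_competitive(vectors, n_clusters, epochs=20, lr=0.1):
    """Winner-take-all learning: move the winning prototype toward each input."""
    # Assumption: start from evenly spaced training vectors.
    step = len(vectors) // n_clusters
    prototypes = [list(vectors[i * step]) for i in range(n_clusters)]
    for _ in range(epochs):
        for v in vectors:
            w = winner(prototypes, v)
            prototypes[w] = [p + lr * (x - p) for p, x in zip(prototypes[w], v)]
    return prototypes

# Toy feature vectors forming two well-separated groups.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
prototypes = train_competitive(data, n_clusters=2)
cluster_ids = [winner(prototypes, v) for v in data]
```

At test time the cluster index selects which per-cluster network receives the feature vector.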
(Contd)
♦ Ligature Identification
A BPN network is trained for all the ligatures
belonging to a particular class/cluster in the
classification and recognition stage of the
system.
Results and Discussion
Frequency Analysis
♦ To establish a database of Urdu images for training and testing, it was
decided that most frequent Urdu ligatures would be identified from the
World Wide Web.
♦ This was a challenge, since most Urdu sites are based on images of Urdu
text, so there was no way of counting Urdu ligatures without first
identifying them.
♦ The BBC Urdu news site http://www.bbc.co.uk/urdu/ was selected for
frequency analysis because it is a font-based Urdu site.
♦ The hex codes of BBC Urdu font were studied.
♦ A study of Urdu font was also done. There are three types of Urdu
characters, given as follows:
1. Characters which do not connect on both sides, e.g. alif
2. Characters which connect on both sides, e.g. bay, tay
3. Characters which do not connect from the left, e.g. wow, ray
♦ There are two types of breaks in an Urdu text file: a hard break, identified
by 0x0020, and a soft break, identified by the nature of the character. On the basis of
these breaks and punctuation marks we decide where ligatures are separated,
and hence keep a count of ligatures.
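The break rules above can be sketched as a small ligature counter. The set of letters that do not join to the following letter is an assumption chosen for illustration, not the list used in the study, and diacritics are not handled.

```python
from collections import Counter

# Assumed non-joining letters: alif, alif madda, dal, ddal, zal, ray,
# rray, zay, zhay, wow, bari yay. Written as escapes to avoid bidi issues.
NON_JOINERS = set(
    "\u0627\u0622\u062f\u0688\u0630\u0631\u0691\u0632\u0698\u0648\u06d2"
)

def split_ligatures(text):
    """Split text into ligatures using hard and soft breaks."""
    ligatures, current = [], ""
    for ch in text:
        if not ch.isalpha():        # hard break: space (0x0020) or punctuation
            if current:
                ligatures.append(current)
            current = ""
            continue
        current += ch
        if ch in NON_JOINERS:       # soft break: this letter cannot join forward
            ligatures.append(current)
            current = ""
    if current:
        ligatures.append(current)
    return ligatures

def count_ligatures(text):
    """Frequency of every ligature in the text."""
    return Counter(split_ligatures(text))
```

Sorting the resulting counts in descending order gives a frequency list of the kind shown in the table of most frequent ligatures.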
Frequency Analysis(Contd)
S.No. Lig Count S.No. Lig Count
1 ‫ا‬ 2904 11 ‫کا‬ 408
2 ‫ر‬ 1600 12 ‫ہے‬ 377
3 ‫و‬ 1240 13 ‫کر‬ 338
4 ‫کے‬ 745 14 ‫کو‬ 309
5 ‫د‬ 718 15 ‫ہ‬ 295
6 ‫ں‬ 480 16 ‫سے‬ 290
7 ‫کی‬ 469 17 ‫ی‬ 269
8 ‫نے‬ 456 18 ‫ہو‬ 269
9 ‫میں‬ 445 19 ‫س‬ 260
10 ‫ن‬ 439 20 ‫کہ‬ 256
Table : List of 20 most frequent ligatures
1. Segmentation
Feature Vectors
S.No.  Name     Moment 1  Moment 2  Moment 3  Moment 4  Moment 5  Moment 6   Moment 7
1      1.bmp    0.52283   0.24376   0.00496   0.004624  2.21E-05  0.002274   -5.63E-07
2      10.bmp   0.16563   9.28E-05  0.000277  5.48E-06  2.01E-10  -1.78E-08  -7.16E-11
3      100.bmp  0.16949   0.000171  9.12E-05  3.05E-06  4.94E-11  -2.88E-08  1.26E-11
4      101.bmp  0.64308   0.37256   0.008243  0.005689  3.88E-05  0.003196   3.37E-06
5      102.bmp  0.16488   0.0007    0.000168  6.87E-06  1.89E-10  1.61E-07   -1.37E-10
6      103.bmp  0.40757   0.03951   0.039031  0.031366  0.001002  0.006081   -0.00045
7      104.bmp  0.29624   0.048083  0.000436  8.78E-05  1.66E-08  1.91E-05   4.48E-09
8      105.bmp  0.16481   0.000165  4.32E-05  1.16E-06  6.75E-12  -7.19E-09  4.63E-12
9      106.bmp  0.26849   0.033972  0         0         0         0          0
Figure : Moment based features for some ligatures

S.No.  Name     Solidity  Minor Axis Length  Major Axis Length  Eccentricity  Orientation  Axis Ratio
1      1.bmp    0.82051   4.0294             22.8431            0.98432       86.8416      0.1764
2      10.bmp   0.80645   5.7038             6.0321             0.32538       56.7493      0.94558
3      100.bmp  0.75      5.6006             6.0319             0.37135       -16.0531     0.92849
4      101.bmp  0.61702   2.9867             17.0919            0.98461       41.8065      0.17474
5      102.bmp  0.83871   5.4889             6.4133             0.51721       17.1527      0.85586
6      103.bmp  0.44898   16.0802            27.3559            0.809         109.504      0.58781
7      104.bmp  0.66667   5.3315             13.5202            0.91897       1.5099       0.39433
8      105.bmp  0.81081   6.1484             6.6314             0.37467       16.9823      0.92716
9      106.bmp  0.7       6.2487             14.2896            0.89932       -3.3781      0.43729
Figure : Geometric features for some ligatures
Special Ligature Identification
Figure : Importance of Special ligature in identifying ligatures
Network: BPN    Configuration: 52-26-8
Goal: 0.01    Mc: 0.4    Lr: 0.1
Figure : Network configuration used to identify special ligatures
Special Ligature
Identification(Contd)
Figure : Training to identify special ligatures
Intermediate Classification
Figure : Analysis for identification of clusters
Intermediate
Classification(Contd)
Features Used: Moment 1, Solidity, Eccentricity, Axis Ratio
No. of Clusters: 4
No. of Images: 216
Neural Net Used: BPN    Configuration: 64-32-4
Percentage Distribution of Clusters
Cluster 1  Cluster 2  Cluster 3  Cluster 4
16.67      29.63      27.78      25.93
Figure : Network Configuration
Intermediate
Classification(Contd)
Figure : Training to identify clusters
Feature Extraction II
Ligature Identification
Cluster 4: Configuration 80-40-8, Lr 0.1, mc 0.3
Ligature Identification(Contd)
Cluster 2: Configuration 80-40-8, Lr 0.1, mc 0.3, goal 0.019
Conclusion
♦ Two different approaches for recognition of cursive Urdu
text written in Nastaliq script have been presented.
♦ A set of the 1000 most frequent ligatures has been identified.
♦ Our approach minimizes errors due to segmentation by
using a segmentation-free approach.
♦ By using different types of features, we have increased the
number of ligatures that can be identified.
♦ Classification performance has been improved by
implementing a multi-stage classification approach; this
approach is especially useful for a large number of
ligatures [9,10,11].
Future Directions
♦ A number of possible directions are under consideration
for enhancing the system for practical use, namely:
• Study of the effectiveness of the features used, and a search for new features
that can be effective for Urdu OCR.
• Enlargement of the number of ligatures used for training.
• Addition of special characters, numerals and Aerab for
recognition as special ligatures.
• Recognition of intonation marks in the document.
• Addition of multilingual support to the system.
References
1. http://www.almaden.ibm.com/cs/dare.html
2. Sargur N. Srihari, Stephen W. Lam, “Character Recognition”.
3. H. Bunke and P. S. P. Wang, “Handbook of Character Recognition and
Document Image Analysis”, World Scientific.
4. M. Shridhar, F. Kimura, “Segmentation Based Cursive Handwriting
Recognition”, Handbook of Character Recognition.
5. Øivind Due Trier, Anil K. Jain and Torfinn Taxt, “Feature Extraction
Methods for Character Recognition - A Survey”, Pattern
Recognition, vol. 29, no. 4, pp. 641-662, 1996.
References(Contd)
6. Adnan Amin, “Arabic Character Recognition”, Handbook of
Character Recognition.
7. Mohammad S. Khorsheed, “Structural Features of Cursive Arabic
Script”.
8. Muhammad Afzal, Sarmad Hussain, “Urdu Computing
Standards: Development of Urdu Zabta Takhti - WG2 N2413-2-SC2
N3589-2 (UZT) 1.01”.
9. L. Xu, A. Krzyzak and C. Y. Suen, “Methods of Combining
Multiple Classifiers and their Applications to Handwriting
Recognition,” IEEE Trans. Systems, Man and Cybernetics, vol. 22,
no. 3, pp. 418-435, 1992.
10. T. K. Ho, J. J. Hull and S. N. Srihari, “Decision Combination in
Multiple Classifier Systems,” IEEE Trans. Pattern Analysis and
Machine Intelligence, vol. 16, no. 1, pp. 66-75, 1994.
References(Contd)
11. J. Kittler, M. Hatef, R. P. W. Duin and J.
Matas, “On Combining Classifiers,” IEEE
Trans. Pattern Analysis and Machine
Intelligence, vol. 20, no. 3, pp. 226-239, 1998.
12. Syed Afaq Husain, S. Hassan Amin, “Multi-Tier
Holistic Approach to Urdu Nastaliq
Recognition,” IEEE INMIC, Dec. 2002, Karachi.
Questions ?
Thank You

More Related Content

What's hot

Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...iosrjce
 
Handwriting Recognition
Handwriting RecognitionHandwriting Recognition
Handwriting RecognitionBindu Karki
 
Optical character recognition IEEE Paper Study
Optical character recognition IEEE Paper StudyOptical character recognition IEEE Paper Study
Optical character recognition IEEE Paper StudyEr. Ashish Pandey
 
Project report of OCR Recognition
Project report of OCR RecognitionProject report of OCR Recognition
Project report of OCR RecognitionBharat Kalia
 
Character recognition project
Character recognition projectCharacter recognition project
Character recognition projectMonsif sakienah
 
Optical Character Recognition (OCR) System
Optical Character Recognition (OCR) SystemOptical Character Recognition (OCR) System
Optical Character Recognition (OCR) Systemiosrjce
 
Handwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural networkHandwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural networkHarshana Madusanka Jayamaha
 
Big Data & Text Mining
Big Data & Text MiningBig Data & Text Mining
Big Data & Text MiningMichel Bruley
 
Computer vision basics
Computer vision basicsComputer vision basics
Computer vision basicsShilpa Sharma
 
Multi media Data mining
Multi media Data miningMulti media Data mining
Multi media Data mininghome
 
Optical Character Recognition (OCR)
Optical Character Recognition (OCR)Optical Character Recognition (OCR)
Optical Character Recognition (OCR)Vidyut Singhania
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Chiranjeevi Adi
 
FAKE CURRENCY DETECTION PDF NEW PPT.pptx
FAKE CURRENCY DETECTION PDF NEW PPT.pptxFAKE CURRENCY DETECTION PDF NEW PPT.pptx
FAKE CURRENCY DETECTION PDF NEW PPT.pptxBasavaPrabhu14
 
Final Report on Optical Character Recognition
Final Report on Optical Character Recognition Final Report on Optical Character Recognition
Final Report on Optical Character Recognition Vidyut Singhania
 
XML Schema
XML SchemaXML Schema
XML Schemayht4ever
 
Optical Character Recognition
Optical Character RecognitionOptical Character Recognition
Optical Character RecognitionDurjoy Saha
 
optical character recognition system
optical character recognition systemoptical character recognition system
optical character recognition systemVijay Apurva
 

What's hot (20)

Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
Handwritten Character Recognition: A Comprehensive Review on Geometrical Anal...
 
Handwriting Recognition
Handwriting RecognitionHandwriting Recognition
Handwriting Recognition
 
Optical character recognition IEEE Paper Study
Optical character recognition IEEE Paper StudyOptical character recognition IEEE Paper Study
Optical character recognition IEEE Paper Study
 
Project report of OCR Recognition
Project report of OCR RecognitionProject report of OCR Recognition
Project report of OCR Recognition
 
Character recognition project
Character recognition projectCharacter recognition project
Character recognition project
 
Optical Character Recognition (OCR) System
Optical Character Recognition (OCR) SystemOptical Character Recognition (OCR) System
Optical Character Recognition (OCR) System
 
Handwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural networkHandwritten character recognition using artificial neural network
Handwritten character recognition using artificial neural network
 
Big Data & Text Mining
Big Data & Text MiningBig Data & Text Mining
Big Data & Text Mining
 
Computer vision basics
Computer vision basicsComputer vision basics
Computer vision basics
 
Multi media Data mining
Multi media Data miningMulti media Data mining
Multi media Data mining
 
Optical Character Recognition (OCR)
Optical Character Recognition (OCR)Optical Character Recognition (OCR)
Optical Character Recognition (OCR)
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks
 
FAKE CURRENCY DETECTION PDF NEW PPT.pptx
FAKE CURRENCY DETECTION PDF NEW PPT.pptxFAKE CURRENCY DETECTION PDF NEW PPT.pptx
FAKE CURRENCY DETECTION PDF NEW PPT.pptx
 
Ocr abstract
Ocr abstractOcr abstract
Ocr abstract
 
Final Report on Optical Character Recognition
Final Report on Optical Character Recognition Final Report on Optical Character Recognition
Final Report on Optical Character Recognition
 
Handwritten Character Recognition
Handwritten Character RecognitionHandwritten Character Recognition
Handwritten Character Recognition
 
XML Schema
XML SchemaXML Schema
XML Schema
 
Optical Character Recognition
Optical Character RecognitionOptical Character Recognition
Optical Character Recognition
 
optical character recognition system
optical character recognition systemoptical character recognition system
optical character recognition system
 
Text Detection and Recognition
Text Detection and RecognitionText Detection and Recognition
Text Detection and Recognition
 

Viewers also liked

Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...
Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...
Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...CSCJournals
 
Search space reduction for holistic ligature recognition in Urdu Nastaliq scr...
Search space reduction for holistic ligature recognition in Urdu Nastaliq scr...Search space reduction for holistic ligature recognition in Urdu Nastaliq scr...
Search space reduction for holistic ligature recognition in Urdu Nastaliq scr...Akram El-Korashy
 
Building a Naive OCR System
Building a Naive OCR SystemBuilding a Naive OCR System
Building a Naive OCR SystemKaur Alasoo
 
Estimating the Impact of OCR Quality on Research Tasks in the Digital Humanities
Estimating the Impact of OCR Quality on Research Tasks in the Digital HumanitiesEstimating the Impact of OCR Quality on Research Tasks in the Digital Humanities
Estimating the Impact of OCR Quality on Research Tasks in the Digital HumanitiesMyriam Traub
 
Quran Majeed with Urdu Tafseer PDF
Quran Majeed with Urdu Tafseer PDFQuran Majeed with Urdu Tafseer PDF
Quran Majeed with Urdu Tafseer PDFFahad M. Siddique
 
Machine learning
Machine learningMachine learning
Machine learningAmit Gupta
 
final year project_leaf recognition
final year project_leaf recognitionfinal year project_leaf recognition
final year project_leaf recognitionNupur Aggarwal
 
Matlab Image Enhancement Techniques
Matlab Image Enhancement TechniquesMatlab Image Enhancement Techniques
Matlab Image Enhancement Techniquesmatlab Content
 
Matlab and Image Processing Workshop-SKERG
Matlab and Image Processing Workshop-SKERG Matlab and Image Processing Workshop-SKERG
Matlab and Image Processing Workshop-SKERG Sulaf Almagooshi
 
ENHANCED SIGNATURE VERIFICATION AND RECOGNITION USING MATLAB
ENHANCED SIGNATURE VERIFICATION AND RECOGNITION USING MATLABENHANCED SIGNATURE VERIFICATION AND RECOGNITION USING MATLAB
ENHANCED SIGNATURE VERIFICATION AND RECOGNITION USING MATLABAM Publications
 
Text to speech conversation in gujarati
Text to speech conversation in gujaratiText to speech conversation in gujarati
Text to speech conversation in gujaratiAshvin Nakum
 
Artificial intelligence in medical image processing
Artificial intelligence in medical image processingArtificial intelligence in medical image processing
Artificial intelligence in medical image processingFarzad Jahedi
 
Azim akhtar decline of urdu &impact on education in up
Azim akhtar decline of urdu &impact on education in upAzim akhtar decline of urdu &impact on education in up
Azim akhtar decline of urdu &impact on education in upsatyendraurinfo
 
Automated attendance system based on facial recognition
Automated attendance system based on facial recognitionAutomated attendance system based on facial recognition
Automated attendance system based on facial recognitionDhanush Kasargod
 
Readymade M Tech Thesis
Readymade M Tech ThesisReadymade M Tech Thesis
Readymade M Tech Thesise2-matrix
 
Anhoring Script For Annual Function
Anhoring Script For Annual FunctionAnhoring Script For Annual Function
Anhoring Script For Annual FunctionAnushkaSahu
 

Viewers also liked (20)

Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...
Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...
Optical Character Recognition System for Urdu (Naskh Font)Using Pattern Match...
 
Search space reduction for holistic ligature recognition in Urdu Nastaliq scr...
Search space reduction for holistic ligature recognition in Urdu Nastaliq scr...Search space reduction for holistic ligature recognition in Urdu Nastaliq scr...
Search space reduction for holistic ligature recognition in Urdu Nastaliq scr...
 
Building a Naive OCR System
Building a Naive OCR SystemBuilding a Naive OCR System
Building a Naive OCR System
 
Estimating the Impact of OCR Quality on Research Tasks in the Digital Humanities
Estimating the Impact of OCR Quality on Research Tasks in the Digital HumanitiesEstimating the Impact of OCR Quality on Research Tasks in the Digital Humanities
Estimating the Impact of OCR Quality on Research Tasks in the Digital Humanities
 
Quran Majeed with Urdu Tafseer PDF
Quran Majeed with Urdu Tafseer PDFQuran Majeed with Urdu Tafseer PDF
Quran Majeed with Urdu Tafseer PDF
 
Machine learning
Machine learningMachine learning
Machine learning
 
Image processing
Image processingImage processing
Image processing
 
final year project_leaf recognition
final year project_leaf recognitionfinal year project_leaf recognition
final year project_leaf recognition
 
Matlab Image Enhancement Techniques
Matlab Image Enhancement TechniquesMatlab Image Enhancement Techniques
Matlab Image Enhancement Techniques
 
Matlab and Image Processing Workshop-SKERG
Matlab and Image Processing Workshop-SKERG Matlab and Image Processing Workshop-SKERG
Matlab and Image Processing Workshop-SKERG
 
ENHANCED SIGNATURE VERIFICATION AND RECOGNITION USING MATLAB
ENHANCED SIGNATURE VERIFICATION AND RECOGNITION USING MATLABENHANCED SIGNATURE VERIFICATION AND RECOGNITION USING MATLAB
ENHANCED SIGNATURE VERIFICATION AND RECOGNITION USING MATLAB
 
Text to speech conversation in gujarati
Text to speech conversation in gujaratiText to speech conversation in gujarati
Text to speech conversation in gujarati
 
Artificial intelligence in medical image processing
Artificial intelligence in medical image processingArtificial intelligence in medical image processing
Artificial intelligence in medical image processing
 
Azim akhtar decline of urdu &impact on education in up
Azim akhtar decline of urdu &impact on education in upAzim akhtar decline of urdu &impact on education in up
Azim akhtar decline of urdu &impact on education in up
 
Lecture4 - Machine Learning
Lecture4 - Machine LearningLecture4 - Machine Learning
Lecture4 - Machine Learning
 
Automated attendance system based on facial recognition
Automated attendance system based on facial recognitionAutomated attendance system based on facial recognition
Automated attendance system based on facial recognition
 
OCR
OCROCR
OCR
 
Readymade M Tech Thesis
Readymade M Tech ThesisReadymade M Tech Thesis
Readymade M Tech Thesis
 
Getting Things Done
Getting Things DoneGetting Things Done
Getting Things Done
 
Anhoring Script For Annual Function
Anhoring Script For Annual FunctionAnhoring Script For Annual Function
Anhoring Script For Annual Function
 

Similar to An OCR System for recognition of Urdu text in Nastaliq Font

Multitier holistic Approach for urdu Nastaliq Recognition
Multitier holistic Approach for urdu Nastaliq RecognitionMultitier holistic Approach for urdu Nastaliq Recognition
Multitier holistic Approach for urdu Nastaliq RecognitionDr. Syed Hassan Amin
 
An OCR System for recognition of Urdu text in Nastaliq Font

  • 1. An OCR System for recognition of Urdu text in Nastaliq Font By S. Hassan Amin Supervised By Dr. S. Afaq Hussain Faculty of Computer Science & Engineering Ghulam Ishaq Khan Institute of Engineering Sciences & Technology, Topi-Swabi, 2004
  • 2. Layout ♦ Introduction ♦ Research Scope ♦ Objectives ♦ Optical Character Recognition Steps in OCR ♦ Urdu Writing Characteristics ♦ Cursive Script Recognition Schemes ♦ Methodology Multi-Tier Holistic Approach Multi-Stage Classification Approach ♦ Results and Discussion ♦ Conclusion ♦ Future Directions ♦ References
  • 3. Introduction ♦ Urdu is the national language of Pakistan and is understood by well over 300 million people around the world. ♦ There is a need to convert the historical database of Urdu literature into electronic form, so that Urdu can prosper in the age of computers. ♦ Urdu text recognition aims to convert scanned Urdu documents automatically into computerized text files.
  • 4. Research Scope ♦ Paper documents have been the most important means of exchanging information for ages, but this is changing as we rapidly move towards a paperless society. ♦ IBM has estimated that about $250 billion is spent worldwide every year (largely on operator salaries, etc.) on keying in information from paper documents, and this is the cost of manually capturing information from only 5% of the available documents [1]. ♦ Urdu Text Recognition ♦ Urdu Text Transliteration ♦ Machine Translation
  • 5. Objectives ♦ The main objective of this research is to build an OCR system for the Urdu language that is effective for Nastaliq script irrespective of font size and orientation. To achieve this objective, there are a number of sub-goals: ▪ To investigate the problem of Urdu OCR in depth, and to propose new and better ways to solve it. ▪ To investigate the use of an appropriate set of features for Urdu OCR. ▪ To establish a database of Urdu ligatures for investigating the problem of Urdu OCR. ▪ To investigate classification methods that can be useful for the problem of Urdu OCR.
  • 6. Optical Character Recognition (OCR) ♦ Character Recognition, or Optical Character Recognition (OCR), is the process of converting scanned images of machine-printed or handwritten text (numerals, letters and symbols) into a computer-processable format (such as ASCII or Unicode) [2]. ♦ Offline character recognition is performed after the writing or printing has been completed. ♦ In online character recognition, the computer recognizes characters as they are drawn, which makes timing information available.
  • 7. Steps in OCR 1. Image Acquisition 2. Preprocessing 3. Segmentation 4. Feature Extraction 5. Classification 6. Post Processing
  • 8. 1. Image Acquisition ♦ This conversion is accomplished by a digitizer, which can be a scanner (offline recognition), or a camera or tablet digitizer (online recognition).
  • 9. 2. Preprocessing ♦ Preprocessing involves noise reduction, skew detection, slant normalization, document decomposition, etc. ♦ For slant estimation there are methods such as the projection method and the chain code method [4]. ♦ For estimating the skew angle of a page there are methods such as the orientation-dependent histogram [3].
  • 10. 3. Segmentation ♦ Segmentation is the process of dividing an image into regions, each likely to contain a single object or a group of objects of the same type. For instance, an object can be a character on a text page or a line segment in an engineering drawing. ♦ In OCR, the commonly used segmentation algorithms are XY tree decomposition, run-length smearing and the Hough transform.
  • 11. 4. Feature Extraction ♦ Selection of an appropriate feature extraction method is probably the single most important factor in achieving high recognition performance [5]. ♦ A newcomer to the field is faced with the challenge of selecting appropriate features for his/her application.
  • 12. Feature Extraction (Contd) ♦ Some useful feature extraction methods in the field of OCR are: 1. Geometric Features 2. Structural Features 3. Moment-based Features 4. Template Matching 5. Unitary Image Transforms 6. Zoning 7. Contour Profiles 8. Fourier Descriptors
  • 13. 5. Classification ♦ Classification is the process of identifying each character and assigning it to the correct character class. The two major approaches to classification are: 1. Decision-theoretic methods 2. Structural methods
  • 14. 1. Decision-theoretic methods ♦ These methods are used when the description of the character can be represented numerically as a feature vector. ♦ The principal approaches to decision-theoretic recognition are minimum distance classifiers, statistical classifiers and neural networks.
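To make the simplest of these decision-theoretic approaches concrete, here is a minimal sketch of a minimum distance (nearest class mean) classifier. The function name, the two-feature toy data and the class labels "A" and "B" are illustrative assumptions and do not come from the slides.

```python
import math

def nearest_mean_classifier(train):
    """Build a minimum distance classifier.

    train maps a class label to a list of numeric feature vectors.
    Returns a function that assigns a vector to the class with the closest mean.
    """
    means = {}
    for label, vectors in train.items():
        dim = len(vectors[0])
        means[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    def classify(x):
        # Pick the class whose mean is nearest in Euclidean distance.
        return min(means, key=lambda label: math.dist(x, means[label]))

    return classify

# Hypothetical two-feature training data (not taken from the slides).
train = {"A": [[0.0, 0.1], [0.2, 0.0]], "B": [[1.0, 0.9], [0.8, 1.1]]}
classify = nearest_mean_classifier(train)
print(classify([0.1, 0.2]))
```

Statistical classifiers and neural networks replace the fixed distance rule with a learned decision function, but they consume the same kind of numeric feature vector.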
  • 15. 2. Structural Methods ♦ Within the area of structural recognition, syntactic methods are among the most common approaches. ♦ In syntactic pattern recognition, measures of similarity based on the relationship between structural components are formulated using grammatical concepts.
  • 16. 6. Post Processing ♦ Post processing consists of: 1. Grouping 2. Error Detection and Correction
  • 17. 1. Grouping ♦ The result of plain symbol recognition is a set of individual symbols. ♦ These symbols in themselves usually do not contain enough information. ♦ We would like to associate the individual symbols that belong to the same string with each other, making up words and numbers. ♦ The process of performing this association of symbols into strings is commonly referred to as grouping.
  • 18. 2. Error Detection and Correction ♦ Along with the grouping of characters, another issue to take care of is the context in which each character appears. ♦ Even the best OCR systems cannot identify every character with 100% accuracy; such errors may be detected, or even corrected, by use of context.
  • 19. Urdu Writing Characteristics ♦ Urdu is written in a cursive script and has evolved from the Arabic, Persian and Turkish languages. ♦ Depending on the source, Urdu has 36, 37, 42, 51 or 53 characters [8]. ♦ The UZT 1.01 standard has 42 characters.
  • 20. Urdu Writing Characteristics (Contd) Figure: Urdu Character Set UZT 1.01
  • 21. Urdu Writing Characteristics (Contd)
      Characteristics            Urdu     Arabic   Latin   Hebrew   Hindi
      H-Justification            RL       RL       LR      RL       LR
      V-Justification            Center   Base     No      No       Top
      Cursive                    Yes      Yes      No      No       Yes
      Diacritics                 Yes      Yes      No      No       Yes
      # Vowels                   2        2        5       11       -
      # Letters                  37       28       26      22       40
      Letter Shapes              1-28     1-4      2       1        1
      Complementary Characters   5        3-       -       -        -
  • 22. Cursive Script Recognition Schemes ♦ Two strategies have been applied to cursive script recognition. As mentioned by Amin and Khorsheed [6,7], they can be categorized as follows: 1. Holistic strategies, in which recognition is performed globally on the whole representation of a word and there is no attempt to identify characters individually.
  • 23. Cursive Script Recognition Schemes (Contd) 2. Analytical strategies, in which words are not considered as a whole but as sequences of small units, and recognition is not performed directly at word level but at an intermediate level dealing with these units, which can be graphemes, segments, pseudo-letters, etc.
  • 24. Research Methodology ♦ Two approaches to recognizing Urdu ligatures printed in Nastaliq script are presented. Both approaches are holistic in nature. They are tested on the identification of a set of the most frequent ligatures printed in Noori Nastaliq script. The suggested approaches to recognizing Urdu text are: 1. Multi-Tier Holistic Approach 2. Multi-Stage Classification Approach
  • 25. Multi-Tier Holistic Approach to Urdu Nastaliq Recognition ♦ A multi-tier holistic approach using a feed-forward back-propagation neural network (BPN) was implemented [12].
  • 26. (Contd) Figure: Multi-Tier Holistic Approach to Urdu Nastaliq Recognition
  • 27. 1. Segmentation ♦ Connected component labeling is applied to the image of Urdu text. ♦ This technique assigns a distinct label to each connected component of the binary image. ♦ The labels are usually natural numbers from 1 to the number of connected components in the input image. ♦ The algorithm scans the image from left to right and top to bottom.
  • 28. Segmentation (Contd) ♦ On the first line containing black pixels, a unique label is assigned to each contiguous run of black pixels. ♦ For each black pixel, the pixels in its 8-neighborhood are examined. If any of these pixels has already been labeled, the same label is assigned to the current pixel; otherwise a new label is assigned to it. The procedure continues to the bottom of the image.
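The scanning procedure described on these two slides can be sketched as a two-pass, 8-connected labeling routine. The slides do not say how clashing labels are reconciled when two differently labeled runs meet lower in the image, so the union-find merge and the second relabeling pass below are assumptions added to make the sketch correct; the function name and the test image are also illustrative.

```python
def label_components(image):
    """Two-pass 8-connected component labeling.

    image is a list of rows; a truthy value marks a black (foreground) pixel.
    Returns (labels, count) where labels uses 1..count and 0 for background.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # parent[k] is the union-find parent of provisional label k

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    # First pass: scan left to right, top to bottom.
    for r in range(rows):
        for c in range(cols):
            if not image[r][c]:
                continue
            # Only the four already-visited neighbours can carry a label.
            seen = []
            for dr, dc in ((0, -1), (-1, -1), (-1, 0), (-1, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc]:
                    seen.append(find(labels[rr][cc]))
            if not seen:
                parent.append(len(parent))      # new provisional label
                labels[r][c] = len(parent) - 1
            else:
                smallest = min(seen)
                labels[r][c] = smallest
                for root in seen:               # record label equivalences
                    parent[root] = smallest

    # Second pass: replace provisional labels by consecutive final labels.
    final = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(labels[r][c])
                labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels, len(final)

# Hypothetical 4x4 binary image with three components.
image = [
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 1, 0, 0],
]
labels, count = label_components(image)
print(count)
```

With 8-connectivity the diagonal pair in the top-left corner counts as one component, which matters for Nastaliq text where strokes often touch only at corners.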
  • 29. Feature Extraction I ♦ In this stage, we extract features that help in the recognition of special ligatures (see figure). These features are solidity, number of holes, axis ratio, eccentricity, moments, normalized segment length, curvature, and the ratio of bounding box width to height.
  • 30. Special Ligature Identification ♦ A feed-forward BPN network is trained on the feature vectors obtained in the Feature Extraction I stage. During testing, this network is used to identify an input ligature as one of the special ligatures. If no valid output is returned, the ligature is identified as a base ligature.
  • 31. Feature Extraction II ♦ In this stage, special ligatures are associated with base ligatures. Each special ligature is associated with the base ligature at minimum centroid-to-centroid distance. ♦ A number of lines are grown from the center of each special ligature; when one of these lines touches a base ligature, the special ligature is associated with that base ligature. ♦ As a result of this association, twenty new features are added to the feature vector of the base ligature.
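The centroid-to-centroid rule from this slide can be sketched as follows. Representing each ligature as a list of (row, column) pixel coordinates, and the names used here, are assumptions made for illustration; the line-growing rule from the second bullet is not covered.

```python
import math

def centroid(pixels):
    """Mean (row, col) position of a component given as a list of pixel coordinates."""
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)

def associate(special, bases):
    """Return the name of the base ligature whose centroid is closest to the special one.

    special is a list of (row, col) pixels; bases maps a name to such a list.
    """
    sc = centroid(special)
    return min(bases, key=lambda name: math.dist(sc, centroid(bases[name])))

# Hypothetical components: one dot-like special ligature and two base ligatures.
special = [(0, 5), (0, 6)]
bases = {
    "base_1": [(3, 5), (3, 6), (4, 5)],
    "base_2": [(3, 20), (4, 20)],
}
owner = associate(special, bases)
print(owner)
```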
  • 32. Classification and Recognition ♦ In this stage, the final feature vector consisting of 34 features is fed into a feed-forward back-propagation neural network. The network architecture consists of 34 inputs, 65 hidden neurons and 45 output neurons.
  • 33. Multi-Stage Classification Approach to Urdu Text Recognition ♦ The motivation behind this approach is the belief that classification performance can be improved by combining multiple classifiers [9,10,11].
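To illustrate the shape of a 34-65-45 network, the sketch below runs one forward pass through randomly initialized layers. Only the layer sizes come from the slide; the logistic activation, the bias terms, the weight range and the all-zero input vector are assumptions, and no training is performed.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """One fully connected layer: n_out rows of n_in weights plus one bias weight."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]

def forward(layer, x):
    """Apply the layer with a logistic activation (assumed; not stated on the slide)."""
    xb = list(x) + [1.0]  # append the constant bias input
    return [1.0 / (1.0 + math.exp(-sum(w * v for w, v in zip(row, xb)))) for row in layer]

rng = random.Random(0)
hidden = make_layer(34, 65, rng)   # 34 inputs -> 65 hidden neurons
output = make_layer(65, 45, rng)   # 65 hidden -> 45 output neurons

features = [0.0] * 34              # placeholder feature vector
scores = forward(output, forward(hidden, features))
predicted = max(range(len(scores)), key=scores.__getitem__)

# Trainable parameters under these assumptions: 65*(34+1) + 45*(65+1)
n_weights = sum(len(row) for row in hidden) + sum(len(row) for row in output)
print(len(scores), n_weights)
```

Under these assumptions the network has 5245 trainable weights, which gives a feel for how much training data a ligature set of this size needs.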
  • 34.
  • 35. (Contd) ♦ As shown in the figure, the first three stages are similar to the multi-tier approach. ♦ Intermediate Classification: In the training phase, we train a competitive network on the feature vectors of base ligatures to divide the input data into the desired number of clusters. Also in the training phase, an LVQ/BPN network is trained on the output of the competitive network to classify an input pattern into a particular class or cluster. In the testing phase, the input feature vector is presented to the trained LVQ/BPN network, which gives the desired class/cluster.
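The clustering step can be illustrated with a basic winner-take-all competitive layer. This is a generic sketch, not the network used in the slides: the learning rate, epoch count, initial weights and toy data are all assumptions, and the subsequent LVQ/BPN stage is not shown.

```python
import math

def train_competitive(data, weights, lr=0.5, epochs=20):
    """Winner-take-all learning: move only the closest weight vector towards each input."""
    weights = [list(w) for w in weights]
    for _ in range(epochs):
        for x in data:
            k = min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))
            weights[k] = [w + lr * (xi - w) for w, xi in zip(weights[k], x)]
    return weights

def assign(x, weights):
    """Index of the cluster whose weight vector is nearest to x."""
    return min(range(len(weights)), key=lambda i: math.dist(x, weights[i]))

# Hypothetical two-feature data forming two well-separated groups.
data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
weights = train_competitive(data, weights=[[0.0, 0.0], [1.0, 1.0]])
print(assign([0.05, 0.0], weights), assign([0.95, 1.0], weights))
```

The cluster index produced by `assign` plays the role of the target that the second-stage classifier would then be trained to reproduce.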
  • 36. (Contd) ♦ Ligature Identification A BPN network is trained for all the ligatures belonging to a particular class/cluster in the classification and recognition stage of the system.
  • 38. Frequency Analysis ♦ To establish a database of Urdu images for training and testing, it was decided to identify the most frequent Urdu ligatures from the World Wide Web. ♦ This was a challenge, since most Urdu sites are based on images of Urdu text, so there was no way of counting Urdu ligatures without first identifying them. ♦ The BBC Urdu news site http://www.bbc.co.uk/urdu/ was selected for frequency analysis because it is a font-based Urdu site. ♦ The hex codes of the BBC Urdu font were studied. ♦ A study of the Urdu font was also carried out. There are three types of Urdu characters: 1. Characters which do not connect on either side, e.g. alif 2. Characters which connect on both sides, e.g. bay, tay 3. Characters which do not connect from the left, e.g. wow, ray ♦ There are two types of breaks in an Urdu text file: a hard break, identified by the code 0x0020 (space), and a soft break, identified by the nature of the character. On the basis of these breaks and punctuation marks we decide where ligatures are separated, and hence keep a count of ligatures.
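The counting rule described on this slide can be sketched as below: a ligature ends at a space, at a punctuation mark, or after a letter that does not join to the following letter. The sets of non-joining letters and punctuation marks are illustrative and incomplete assumptions, as are the function and variable names; the slides do not list the exact characters that were used.

```python
from collections import Counter

# Assumed, incomplete set of letters that do not connect to the next letter:
# alif, alif madda, dal, ddal, zal, ray, rray, zay, zhay, wow, bari ye.
NON_JOINERS = set("\u0627\u0622\u062f\u0688\u0630\u0631\u0691\u0632\u0698\u0648\u06d2")
# Assumed punctuation: Urdu full stop, Arabic comma and question mark, plus ASCII marks.
PUNCTUATION = set("\u06d4\u060c\u061f.,:;!?")

def split_ligatures(text):
    """Split text into ligatures using hard breaks (spaces) and soft breaks (non-joiners)."""
    ligatures, current = [], ""
    for ch in text:
        if ch.isspace() or ch in PUNCTUATION:   # hard break or punctuation
            if current:
                ligatures.append(current)
                current = ""
            continue
        current += ch
        if ch in NON_JOINERS:                   # soft break after a non-joining letter
            ligatures.append(current)
            current = ""
    if current:
        ligatures.append(current)
    return ligatures

# Two Urdu words, "aur" (alif, wow, ray) and "ka" (kaf, alif), separated by a space.
sample = "\u0627\u0648\u0631 \u06a9\u0627"
counts = Counter(split_ligatures(sample))
print(counts.most_common())
```

Running such a counter over a font-based news corpus is one way a frequency table of ligatures could be produced.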
  • 39. Frequency Analysis (Contd)
      S.No.  Lig   Count    S.No.  Lig   Count
      1      ا     2904     11     کا    408
      2      ر     1600     12     ہے    377
      3      و     1240     13     کر    338
      4      کے    745      14     کو    309
      5      د     718      15     ہ     295
      6      ں     480      16     سے    290
      7      کی    469      17     ی     269
      8      نے    456      18     ہو    269
      9      میں   445      19     س     260
      10     ن     439      20     کہ    256
      Table: List of 20 most frequent ligatures
  • 41. Feature Vectors
    S.No.  Name     Moment 1  Moment 2  Moment 3  Moment 4  Moment 5  Moment 6   Moment 7
    1      1.bmp    0.52283   0.24376   0.00496   0.004624  2.21E-05  0.002274   -5.63E-07
    2      10.bmp   0.16563   9.28E-05  0.000277  5.48E-06  2.01E-10  -1.78E-08  -7.16E-11
    3      100.bmp  0.16949   0.000171  9.12E-05  3.05E-06  4.94E-11  -2.88E-08  1.26E-11
    4      101.bmp  0.64308   0.37256   0.008243  0.005689  3.88E-05  0.003196   3.37E-06
    5      102.bmp  0.16488   0.0007    0.000168  6.87E-06  1.89E-10  1.61E-07   -1.37E-10
    6      103.bmp  0.40757   0.03951   0.039031  0.031366  0.001002  0.006081   -0.00045
    7      104.bmp  0.29624   0.048083  0.000436  8.78E-05  1.66E-08  1.91E-05   4.48E-09
    8      105.bmp  0.16481   0.000165  4.32E-05  1.16E-06  6.75E-12  -7.19E-09  4.63E-12
    9      106.bmp  0.26849   0.033972  0         0         0         0          0
    Figure : Moment based features for some ligatures

    S.No.  Name     Solidity  Minor Axis Length  Major Axis Length  Eccentricity  Orientation  Axis Ratio
    1      1.bmp    0.82051   4.0294             22.8431            0.98432       86.8416      0.1764
    2      10.bmp   0.80645   5.7038             6.0321             0.32538       56.7493      0.94558
    3      100.bmp  0.75      5.6006             6.0319             0.37135       -16.0531     0.92849
    4      101.bmp  0.61702   2.9867             17.0919            0.98461       41.8065      0.17474
    5      102.bmp  0.83871   5.4889             6.4133             0.51721       17.1527      0.85586
    6      103.bmp  0.44898   16.0802            27.3559            0.809         109.504      0.58781
    7      104.bmp  0.66667   5.3315             13.5202            0.91897       1.5099       0.39433
    8      105.bmp  0.81081   6.1484             6.6314             0.37467       16.9823      0.92716
    9      106.bmp  0.7       6.2487             14.2896            0.89932       -3.3781      0.43729
    Figure : Geometric features for some ligatures
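Two of the geometric features on this slide can be checked from the axis lengths alone. The sketch below assumes the usual fitted-ellipse definitions (axis ratio = minor / major, eccentricity = sqrt(1 - (minor / major)^2)). The slides do not state these formulas, but the tabulated values for 1.bmp and 10.bmp are consistent with them to about four decimal places. The function names are illustrative.

```python
import math


def axis_ratio(minor, major):
    """Ratio of minor to major axis length of the fitted ellipse."""
    return minor / major


def eccentricity(minor, major):
    """Eccentricity of an ellipse with the given axis lengths (assumed definition)."""
    return math.sqrt(1.0 - (minor / major) ** 2)


# (name, minor axis length, major axis length) taken from the slide's table.
rows = [
    ("1.bmp", 4.0294, 22.8431),
    ("10.bmp", 5.7038, 6.0321),
]

for name, minor, major in rows:
    print(name, round(axis_ratio(minor, major), 4), round(eccentricity(minor, major), 4))
```

An elongated ligature such as 1.bmp has a small axis ratio and an eccentricity close to 1, while a compact one such as 10.bmp has an axis ratio close to 1 and a low eccentricity, which is why these features help separate ligature shapes.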
  • 42. Special Ligature Identification Figure : Importance of Special ligature in identifying ligatures Network: BPN; Configuration: 52-26-8; Goal: 0.01; Mc: 0.4; Lr: 0.1 Figure : Network configuration used to identify special ligatures
  • 43. Special Ligature Identification(Contd) Figure : Training to identify special ligatures
  • 44. Intermediate Classification Figure : Analysis for identification of clusters
  • 45. Intermediate Classification(Contd) Features used: Moment 1, Solidity, Eccentricity, Axis Ratio. No. of clusters: 4. No. of images: 216. Neural net used: BPN. Configuration: 64-32-4. Percentage distribution of clusters: Cluster 1: 16.67, Cluster 2: 29.63, Cluster 3: 27.78, Cluster 4: 25.93. Figure : Network Configuration
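The percentage distribution can be converted back into approximate image counts per cluster. The counts below are derived from the slide's figures (216 images and the four percentages) and are not stated in the slides themselves. The function name is illustrative.

```python
TOTAL_IMAGES = 216
PERCENTAGES = {
    "Cluster 1": 16.67,
    "Cluster 2": 29.63,
    "Cluster 3": 27.78,
    "Cluster 4": 25.93,
}


def cluster_counts(percentages, total):
    """Convert percentage shares into whole-image counts by rounding."""
    return {name: round(total * pct / 100) for name, pct in percentages.items()}


counts = cluster_counts(PERCENTAGES, TOTAL_IMAGES)
print(counts)
```

The rounded counts sum to the full set of 216 images, so each ligature identification network would be trained on roughly 36 to 64 images under this reading.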
  • 48. Ligature Identification Cluster 4: Configuration: 80-40-8; Lr: 0.1; mc: 0.3
  • 49. Ligature Identification(Contd) Cluster 2: Configuration: 80-40-8; Lr: 0.1; mc: 0.3; goal: 0.019
  • 50. Conclusion ♦ Two different approaches for the recognition of cursive Urdu text written in Nastaliq script have been presented. ♦ A set of the 1000 most frequent ligatures has been identified. ♦ Our approach minimizes the errors due to segmentation by using a segmentation-free approach. ♦ By using different types of features, we have increased the number of ligatures that can be identified. ♦ Classification performance has been improved by implementing a multi-stage classification approach; this approach is especially useful for a large number of ligatures [9,10,11].
  • 51. Future Directions ♦ A number of possible directions are under consideration for enhancing the system for practical use, namely:  Study of the effectiveness of the features used, and a search for new features that can be effective for Urdu OCR.  Enhancement of the number of ligatures used for training.  Addition of special characters, numerals and Aerab for recognition as special ligatures.  Recognition of intonation marks in the document.  Addition of multilingual support to the system.
  • 52. References 1. http://www.almaden.ibm.com/cs/dare.html 2. Sargur N. Srihari, Stephen W. Lam, “Character Recognition”. 3. H. Bunke and P. S. P. Wang, “Handbook of Character Recognition and Document Image Analysis”, World Scientific. 4. M. Shridhar, F. Kimura, “Segmentation Based Cursive Handwriting Recognition”, Handbook of Character Recognition. 5. Øivind Due Trier, Anil K. Jain and Torfinn Taxt, “Feature Extraction Methods for Character Recognition - A Survey”, Pattern Recognition, vol. 29, no. 4, pp. 641-662, 1996.
  • 53. References(Contd) 6. Adnan Amin, “Arabic Character Recognition”, Handbook of Character Recognition. 7. Mohammad S. Khorsheed, “Structural Features of Cursive Arabic Script”. 8. Muhammad Afzal, Sarmad Hussain, “Urdu Computing Standards: Development of Urdu Zabta Takhti-WG2 N2413-2-SC2 N3589-2 (UZT) 1.01”. 9. L. Xu, A. Krzyzak, and C. Y. Suen, “Methods of Combining Multiple Classifiers and their Applications to Handwriting Recognition,” IEEE Trans. Systems, Man and Cybernetics, vol. 22, no. 3, pp. 418-435, 1992. 10. T. K. Ho, J. J. Hull and S. N. Srihari, “Decision Combination in Multiple Classifier Systems,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 16, no. 1, pp. 66-75, 1994.
  • 54. References(Contd) 11. J. Kittler, M. Hatef, R. P. W. Duin and J. Matas, “On Combining Classifiers,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 20, no. 3, pp. 226-239, 1998. 12. Syed Afaq Husain, S. Hassan Amin, “Multi-Tier Holistic Approach to Urdu Nastaliq Recognition,” IEEE INMIC, Dec. 2002, Karachi.