Multilingual OCR                                                                 Introduction


                                       ABSTRACT
       The aim of the project ‘Multilingual OCR’ is to develop OCR software for
online/offline handwriting recognition. Optical character recognition (OCR) is the
mechanical or electronic translation of images of handwritten or typewritten text (usually
captured by a scanner) into machine-editable text. OCR is a field of research in pattern
recognition, artificial intelligence and machine vision.
       Handwriting recognition most often describes the ability of a computer to
translate human writing into text. This can take place in one of two ways: either by scanning
written text (offline) or by writing directly on a peripheral input device (online).




PES’s Modern College of Engineering, Shivajinagar, Pune-5                              Page 1


Aim: To develop an OCR system for online/offline handwriting recognition.


Description:
       We will implement software that recognizes the characters in an online or offline
document (in image format) and uses them to create an individual user profile.
       The OCR developed here recognizes handwritten English characters. Optical character
recognition (OCR) is the mechanical or electronic translation of images of handwritten or
typewritten text (usually captured by a scanner) into machine-editable text. OCR is a field
of research in pattern recognition, artificial intelligence and machine vision.






Scope of the project:
       This system can be used by multiple users, provided the software is extended to
recognize the handwriting of more than one user. If stroke information is also captured and
passed to the system, it becomes possible to recognize cursive script as well.

The recognized characters are stored in a text file. Words can be mapped to sound files that
the program plays back, so that the recognized words are read aloud. In this way the computer
can read a handwritten document aloud.


Block Diagram:



[Figure: block diagram. Input side: Touch Pad (online / real-time input) and Scanned
Document (offline input), together with a Stored Characters block and the PC.
Software domain stages, in order: Grayscale Conversion, Filtering, Thinning, Feature
Extraction, Pattern Recognition, Recognition Output.]

                                 Fig. Block Diagram for OCR
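
The first software-domain stage in the block diagram, grayscale conversion, can be sketched as
follows. This is only an illustrative sketch in Python and not the project's own implementation.
The image representation (rows of (r, g, b) tuples), the function names, the luminance weights
(the common 0.299/0.587/0.114 weighting) and the fixed threshold of 128 are all assumptions
made for this example.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance values."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def binarize(gray_image, threshold=128):
    """Map every pixel to pure black (0) or pure white (255).

    The fixed threshold is an assumption; a real system might pick it per image.
    """
    return [[255 if p >= threshold else 0 for p in row] for row in gray_image]


# A tiny 2x2 colour image: black, white, pure red, light gray.
rgb = [
    [(0, 0, 0), (255, 255, 255)],
    [(255, 0, 0), (200, 200, 200)],
]
gray = to_grayscale(rgb)
binary = binarize(gray)
```

The later stages (filtering, thinning, feature extraction) would then operate on the
single-channel image produced here.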








1. Introduction

1.1 Problem Statement:

         To develop an OCR system for online/offline handwriting recognition.


1.2 Project Scope:

         This system can be used by multiple users, provided the software is extended to
recognize the handwriting of more than one user. If stroke information is also captured and
passed to the system, it becomes possible to recognize cursive script as well.
         The recognized characters are stored in a text file. Words can be mapped to sound
files that the program plays back, so that the recognized words are read aloud. In this way
the computer can read a handwritten document aloud.


1.3 Project Objectives:

         This software recognizes handwritten characters and creates a profile for each
individual user. It supports various languages (except Marathi and Hindi). The software can
be used for security purposes and for creating a font from a user’s handwriting.


1.4 Assumptions and dependencies:

    1. “Multilingual OCR” requires an input image with a black background and a white
       foreground. For this purpose, the software has an Invert Image option, which converts
       the image to the proper format.
    2. The system is designed only for Windows. It may not work on other operating systems.
    3. The system will recognize any set of characters, provided they are written legibly.
    4. The characters must be properly separated for greater accuracy.
    5. The input given to the system must be a bitmap, PNG, or JPEG (.jpeg/.jpg) file.
    6. The distance between characters and between rows should be constant to ensure accuracy.




1.5 Applications of OCR:

•   Practical Applications:
       In recent years, OCR (Optical Character Recognition) technology has been applied
throughout the entire spectrum of industries, revolutionizing the document management
process. OCR has enabled scanned documents to become more than just image files, turning
into fully searchable documents with text content that is recognized by computers. With the
help of OCR, people no longer need to manually retype important documents when entering
them into electronic databases. Instead, OCR extracts relevant information and enters it
automatically. The result is accurate, efficient information processing in less time.


•   Banking:
    The uses of OCR vary across different fields. One widely known application is in
banking, where OCR is used to process checks without human involvement. A check can be
inserted into a machine, the writing on it is scanned instantly, and the correct amount of
money is transferred. This technology has nearly been perfected for printed checks, and is
fairly accurate for handwritten checks as well, though it occasionally requires manual
confirmation. Overall, this reduces wait times in many banks.


•   Legal:
    In the legal industry, there has also been a significant movement to digitize paper
documents. In order to save space and eliminate the need to sift through boxes of paper files,
documents are being scanned and entered into computer databases. OCR further simplifies
the process by making documents text-searchable, so that they are easier to locate and work
with once in the database. Legal professionals now have fast, easy access to a huge library of
documents in electronic format, which they can find simply by typing in a few keywords.


•   Healthcare:
    Healthcare has also seen an increase in the use of OCR technology to process
paperwork. Healthcare professionals always have to deal with large volumes of forms for
each patient, including insurance forms as well as general health forms. To keep up with all
of this information, it is useful to input relevant data into an electronic database that can be
accessed as necessary. Form processing tools, powered by OCR, are able to extract
information from forms and put it into databases, so that every patient's data is promptly
recorded. As a result, healthcare providers can focus on delivering the best possible service to
every patient.


•   OCR in Other Industries:
    OCR is widely used in many other fields, including education, finance, and government
agencies. OCR has made countless texts available online, saving money for students and
allowing knowledge to be shared. Invoice imaging applications are used in many businesses
to keep track of financial records and prevent a backlog of payments from piling up. In
government agencies and independent organizations, OCR simplifies data collection and
analysis, among other processes. As the technology continues to develop, more and more
applications are found for OCR technology, including increased use of handwriting
recognition. Furthermore, other technologies related to OCR, such as barcode recognition, are
used daily in retail and other industries.








1.6 Literature Survey:

       Currently available software recognizes only English characters. It recognizes
the characters and stores them in ASCII format.

       Optical character recognition, usually abbreviated to OCR, is the mechanical or
electronic translation of images of handwritten, typewritten or printed text (usually captured
by a scanner) into machine-editable text.

       OCR is a field of research in pattern recognition, artificial intelligence and machine
vision. Though academic research in the field continues, the focus on OCR has shifted to
implementation of proven techniques. Optical character recognition (using optical techniques
such as mirrors and lenses) and digital character recognition (using scanners and computer
algorithms) were originally considered separate fields. Because very few applications survive
that use true optical techniques, the OCR term has now been broadened to include digital
image processing as well.

       Early systems required training (the provision of known samples of each character) to
read a specific font. "Intelligent" systems with a high degree of recognition accuracy for most
fonts are now common. Some systems are even capable of reproducing formatted output that
closely approximates the original scanned page including images, columns and other non-
textual components.

         In about 1965, Reader's Digest and RCA collaborated to build an OCR Document
reader designed to digitize the serial numbers on Reader's Digest coupons returned from
advertisements. The fonts used on the documents were printed by an RCA Drum printer
using the OCR-A font. The reader was connected directly to an RCA 301 computer (one of
the first solid state computers). This reader was followed by a specialised document reader
installed at TWA where the reader processed Airline Ticket stock. The readers processed
documents at a rate of 1,500 documents per minute, and checked each document, rejecting
those it was not able to process correctly. The product became part of the RCA product line
as a reader designed to process "Turn around Documents" such as those utility and insurance
bills returned with payments.

       The United States Postal Service has been using OCR machines to sort mail since
1965 based on technology devised primarily by the prolific inventor Jacob Rabinow. The first
use of OCR in Europe was by the British General Post Office (GPO). In 1965 it began
planning an entire banking system, the National Giro, using OCR technology, a process that
revolutionized bill payment systems in the UK. Canada Post has been using OCR systems
since 1971.

        In 1974 Ray Kurzweil started the company Kurzweil Computer Products, Inc. and led
development of the first omni-font optical character recognition system — a computer
program capable of recognizing text printed in any normal font. He decided that the best
application of this technology would be to create a reading machine for the blind, which
would allow blind people to have a computer read text to them out loud. This device required
the invention of two enabling technologies — the CCD flatbed scanner and the text-to-speech
synthesizer.

        In 1978 Kurzweil Computer Products began selling a commercial version of the
optical character recognition computer program. LexisNexis was one of the first customers,
and bought the program to upload paper legal and news documents onto its nascent online
databases.

        From 1992 to 1996, commissioned by the U.S. Department of Energy (DOE), the
Information Science Research Institute (ISRI) conducted the most authoritative Annual Test
of OCR Accuracy for five consecutive years. ISRI is a research and development unit of the
University of Nevada, Las Vegas. It was established in 1990 with funding from the U.S.
Department of Energy. Its mission is to foster the improvement of automated technologies
for understanding machine-printed documents.

       One study based on recognition of 19th and early 20th century newspaper pages
concluded that character-by-character OCR accuracy for commercial OCR software varied
from 71% to 98%; total accuracy can only be achieved by human review. Other areas—
including recognition of hand printing, cursive handwriting, and printed text in other scripts
(especially those East Asian language characters which have many strokes for a single
character)—are still the subject of active research.
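
To make the notion of character-by-character accuracy concrete, the sketch below computes it
for a pair of strings. This is illustrative Python under stated assumptions: accuracy is
taken here as (n - errors) / n, where n is the length of the reference text and errors is the
edit (Levenshtein) distance between the reference and the recognized text. This is one common
definition and may differ from the exact measure used in the study cited above. The function
names are hypothetical.

```python
def edit_distance(reference, recognized):
    """Minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(recognized) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, rec_char in enumerate(recognized, start=1):
            substitution = previous[j - 1] + (ref_char != rec_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def character_accuracy(reference, recognized):
    """Fraction of reference characters recognized correctly (assumed definition)."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, recognized)
    return (len(reference) - errors) / len(reference)


# One misread character ("o" read as "0") in an 11-character word.
accuracy = character_accuracy("recognition", "recogniti0n")
```

With one error in eleven characters the accuracy is 10/11, or roughly 91%, which falls
inside the 71% to 98% range quoted above.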








3.5 User Characteristics:

   •   The user should be given proper training to operate the whole system.
   •   The user must have basic knowledge of computers.
   •   The user must know how to handle the different devices, e.g. scanner, mouse, etc.


3.6 Specific Requirement:

3.6.1 User Interfaces

   The user will interact with the system in the following ways:

   •   By writing directly on the text area provided on the GUI.
   •   By first writing in an image file and then giving it as input to the system.
   •   Depending on the type of user, the required output will be generated.
   •   The user will be asked to save the generated text in a .TXT file.


3.6.2 Hardware Requirements

   •   Intel Pentium II processor
   •   CPU speed of at least 500 MHz
   •   At least 64 MB of RAM
   •   Mouse
   •   Keyboard
   •   Scanner
   •   Monitor






3.6.3 Software Requirements

   •   Microsoft Windows 98/NT/2000/XP
   •   JDK 1.4 or later
   •   Java 2D API
   •   Java Advanced Imaging (JAI) API
   •   Java Image I/O API
   •   Java Media Framework (JMF)


3.6.4 Performance Requirements:
   •   Accuracy: The extent to which a program satisfies its specification and fulfils the
       customer’s mission objectives.
   •   Reliability: The extent to which a program can be expected to perform its
       intended function with the required precision.
   •   Speed: The time required for a program to perform a given task.
   •   Maintainability: The effort required to locate and fix an error in the program.
   •   Portability: The effort required to transfer a program from one hardware and/or
       software system environment to another.
   •   Availability: The system is expected to be available around the clock.

3.6.5 Functional Requirements:
    1. For static OCR, the software should provide a way to load a scanned document for
       recognition.
    2. If the scanned image does not have a black background and white foreground, the
       software should provide a facility for image inversion.
    3. The software should process the image and extract the characters.
    4. The user should be able to save the extracted data in the format of his or her choice.
    5. For dynamic OCR, the software should recognize characters as the user draws
       them.
    6. If the software does not give proper output, there should be a way to train the
       software’s database.
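
Requirement 3 above (extracting characters from the processed image) can be sketched for the
simple case where characters are properly separated, as the assumptions require. The
illustrative Python below uses a vertical projection: columns that contain no foreground
pixel are treated as gaps between characters. The technique, the function names, and the
image representation (a binary image as a list of rows, white foreground 255 on black
background 0) are assumptions for this example and not necessarily the project's method.

```python
def column_spans(binary_image, foreground=255):
    """Return (start, end) column ranges, end exclusive, that contain foreground."""
    if not binary_image:
        return []
    width = len(binary_image[0])
    has_ink = [
        any(row[x] == foreground for row in binary_image) for x in range(width)
    ]
    spans = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a character begins
        elif not ink and start is not None:
            spans.append((start, x))       # a gap ends the character
            start = None
    if start is not None:
        spans.append((start, width))       # character touches the right edge
    return spans


def crop_characters(binary_image, foreground=255):
    """Cut one sub-image per character out of a single text line."""
    return [
        [row[start:end] for row in binary_image]
        for start, end in column_spans(binary_image, foreground)
    ]


# One text line holding two separated shapes.
line = [
    [0, 255, 0, 0, 255, 255],
    [0, 255, 0, 0, 255, 0],
    [0, 255, 0, 0, 255, 255],
]
spans = column_spans(line)
characters = crop_characters(line)
```

Touching or cursive characters produce no empty column between them, which is why this
simple approach depends on the separation assumption.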

3.6.6 Other Requirements:

    •   The input image must be in the bitmap file format.
    •   For a scanned image, a high-quality scanner and good paper quality are
        required. The scanner resolution should be set to at least 300 dots per inch
        (dpi).
    •   A tilt of up to 20° introduced during scanning can be corrected.
    •   For discontinuities in handwritten characters, a gap of up to 3 pixels
        wide is tolerable.
    •   A first-order median filter is used.
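
The median filtering mentioned in the last requirement can be sketched as follows. This is
illustrative Python, not the project's implementation. It assumes a 3x3 window (the source
does not state the window size), an 8-bit grayscale image stored as a list of rows, and
border pixels that are simply copied unchanged.

```python
def median_filter_3x3(gray_image):
    """Replace each interior pixel by the median of its 3x3 neighbourhood."""
    height = len(gray_image)
    width = len(gray_image[0]) if height else 0
    result = [row[:] for row in gray_image]    # borders stay as they are
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = sorted(
                gray_image[y + dy][x + dx]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            )
            result[y][x] = window[4]           # middle of the 9 sorted values
    return result


# A single bright noise pixel on a black background.
noisy = [
    [0, 0, 0, 0],
    [0, 255, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
filtered = median_filter_3x3(noisy)
```

Isolated specks like the one above disappear, while larger uniform regions keep their
value. Strokes only one pixel wide can also be removed by the same mechanism.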




3.7 Position Statement:

        Optical handwriting recognition most often describes
the ability of a computer to translate human writing into text. This system can
be used for:
                •   Railway reservation forms
                •   Libraries
                •   Government agencies
                •   School/college admission forms
                •   Making other lengthy documents available electronically




 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 

Ocr abstract

ABSTRACT
The aim of the project ‘Multilingual OCR’ is to develop OCR software for online/offline handwriting recognition. OCR (optical character recognition) is the mechanical or electronic translation of images of handwritten or typewritten text (usually captured by a scanner) into machine-editable text. OCR is a field of research in pattern recognition, artificial intelligence and machine vision.

Handwriting recognition most often describes the ability of a computer to translate human writing into text. This can happen in one of two ways: by scanning written text, or by writing directly on a peripheral input device.
Aim: To develop an OCR system for online/offline handwriting recognition.

Description:
We will implement software that recognizes the characters in an online or offline document (in image format) and uses the result as an individual user profile. The OCR system developed here recognizes handwritten English characters. OCR (optical character recognition) is the mechanical or electronic translation of images of handwritten or typewritten text (usually captured by a scanner) into machine-editable text. OCR is a field of research in pattern recognition, artificial intelligence and machine vision.
Scope of the project:
This system can be used by multiple users, provided the software is improved to recognize the handwriting of more than one user. If stroke information is also captured and given to the system, it becomes possible to recognize cursive script as well. The recognized characters are stored in a text file. Words can be linked to sound files that the program invokes, so that the recognized words are read aloud. In this way the computer can read a handwritten document.

Block Diagram:
[Figure: Block Diagram for OCR. Labels recoverable from the diagram: Touch Pad (On Line / Real Time Input), Scanned Document (Off Line Input), PC, and, inside the Software Domain, Grayscale Conversion, Filtering, Thinning, Feature Extraction, Pattern Recognition, Stored Characters and Recognition Output.]
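The grayscale conversion stage named in the block diagram might look like the following minimal Java sketch. The report does not say how the conversion is done, so the class name `GrayscaleSketch`, the method `toGray`, and the choice of the ITU-R BT.601 luma weights (0.299, 0.587, 0.114) are assumptions made for illustration only.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not taken from the report.
final class GrayscaleSketch {

    /**
     * Converts an RGB image to a matrix of 8-bit gray levels,
     * indexed as gray[y][x] with values in 0..255.
     */
    static int[][] toGray(BufferedImage src) {
        int w = src.getWidth();
        int h = src.getHeight();
        int[][] gray = new int[h][w];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = src.getRGB(x, y);      // packed 0xAARRGGBB
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // Integer form of 0.299 R + 0.587 G + 0.114 B (assumed weights).
                gray[y][x] = (299 * r + 587 * g + 114 * b) / 1000;
            }
        }
        return gray;
    }
}
```

Returning a plain `int[][]` rather than another `BufferedImage` keeps the later stages (filtering, thinning) independent of the imaging API; that is a design choice of this sketch, not something the report prescribes.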
1. Introduction

1.1 Problem Statement:
To develop an OCR system for online/offline handwriting recognition.

1.2 Project Scope:
This system can be used by multiple users, provided the software is improved to recognize the handwriting of more than one user. If stroke information is also captured and given to the system, it becomes possible to recognize cursive script as well. The recognized characters are stored in a text file. Words can be linked to sound files that the program invokes, so that the recognized words are read aloud. In this way the computer can read a handwritten document.

1.3 Project Objectives:
This software recognizes handwritten characters and creates a profile for each user. It supports various languages (except Marathi and Hindi). The software can be used for security purposes and for creating a font from a user’s handwriting.

1.4 Assumptions and dependencies:
1. “Multilingual OCR” requires an input image with a black background and a white foreground. For this purpose, the software has an Invert Image option, which converts the image to the proper format.
2. The system is designed only for Windows OS. It may not work on other operating systems.
3. The system will recognize any set of characters, provided that they are written legibly.
4. The characters must be properly separated for greater accuracy.
5. The input given to the system must be a bitmap, PNG or JPEG (.jpeg/.jpg) file.
6. The distance between characters and between rows should be constant to ensure accuracy.
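The Invert Image option in assumption 1 can be illustrated with a short Java sketch. It assumes the image has already been reduced to 8-bit gray levels held in an `int[][]`. The class `InvertSketch` and both methods are hypothetical names, and the mean-brightness test for detecting a light background is an assumption of this sketch, not something the report describes.

```java
// Hypothetical helper, not taken from the report.
final class InvertSketch {

    /** Returns a new image in which every gray level v becomes 255 - v. */
    static int[][] invert(int[][] gray) {
        int[][] out = new int[gray.length][];
        for (int y = 0; y < gray.length; y++) {
            out[y] = new int[gray[y].length];
            for (int x = 0; x < gray[y].length; x++) {
                out[y][x] = 255 - gray[y][x];
            }
        }
        return out;
    }

    /**
     * Assumed heuristic: a mean gray level above 127 suggests dark ink
     * on a light page, i.e. an image that needs inverting first.
     */
    static boolean hasLightBackground(int[][] gray) {
        long sum = 0;
        long count = 0;
        for (int[] row : gray) {
            for (int v : row) {
                sum += v;
                count++;
            }
        }
        return count > 0 && (double) sum / count > 127.0;
    }
}
```

A caller would typically invert only when `hasLightBackground` returns true, so that scans of dark writing on white paper reach the recognizer as white strokes on black.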
1.5 Applications of OCR:

• Practical Applications: In recent years, OCR (Optical Character Recognition) technology has been applied across the entire spectrum of industries, revolutionizing the document management process. OCR has enabled scanned documents to become more than just image files, turning them into fully searchable documents with text content that computers can recognize. With the help of OCR, people no longer need to manually retype important documents when entering them into electronic databases. Instead, OCR extracts the relevant information and enters it automatically. The result is accurate, efficient information processing in less time.

• Banking: One widely known application is in banking, where OCR is used to process checks without human involvement. A check can be inserted into a machine, the writing on it is scanned instantly, and the correct amount of money is transferred. This technology has nearly been perfected for printed checks and is fairly accurate for handwritten checks as well, though it occasionally requires manual confirmation. Overall, this reduces wait times in many banks.

• Legal: In the legal industry, there has also been a significant movement to digitize paper documents. To save space and eliminate the need to sift through boxes of paper files, documents are scanned and entered into computer databases. OCR further simplifies the process by making documents text-searchable, so that they are easier to locate and work with once in the database. Legal professionals now have fast, easy access to a huge library of documents in electronic format, which they can find simply by typing in a few keywords.

• Healthcare: Healthcare has also seen an increase in the use of OCR technology to process paperwork. Healthcare professionals have to deal with large volumes of forms for each patient, including insurance forms as well as general health forms. To keep up with all of this information, it is useful to enter the relevant data into an electronic database that can be accessed as necessary. Form processing tools, powered by OCR, can extract information from forms and put it into databases, so that every patient's data is promptly recorded. As a result, healthcare providers can focus on delivering the best possible service to every patient.

• OCR in Other Industries: OCR is widely used in many other fields, including education, finance, and government agencies. OCR has made countless texts available online, saving money for students and allowing knowledge to be shared. Invoice imaging applications are used in many businesses to keep track of financial records and prevent a backlog of payments from piling up. In government agencies and independent organizations, OCR simplifies data collection and analysis, among other processes. As the technology continues to develop, more and more applications are found for OCR, including increased use of handwriting recognition. Furthermore, other technologies related to OCR, such as barcode recognition, are used daily in retail and other industries.
1.6 Literature Survey:
Nowadays, there is software that recognizes only English characters; it stores the recognized characters in ASCII format.

Optical character recognition, usually abbreviated to OCR, is the mechanical or electronic translation of images of handwritten, typewritten or printed text (usually captured by a scanner) into machine-editable text. OCR is a field of research in pattern recognition, artificial intelligence and machine vision. Though academic research in the field continues, the focus on OCR has shifted to implementation of proven techniques. Optical character recognition (using optical techniques such as mirrors and lenses) and digital character recognition (using scanners and computer algorithms) were originally considered separate fields. Because very few applications survive that use true optical techniques, the OCR term has now been broadened to include digital image processing as well.

Early systems required training (the provision of known samples of each character) to read a specific font. "Intelligent" systems with a high degree of recognition accuracy for most fonts are now common. Some systems are even capable of reproducing formatted output that closely approximates the original scanned page, including images, columns and other non-textual components.

In about 1965, Reader's Digest and RCA collaborated to build an OCR document reader designed to digitize the serial numbers on Reader's Digest coupons returned from advertisements. The documents were printed by an RCA drum printer using the OCR-A font. The reader was connected directly to an RCA 301 computer (one of the first solid-state computers). This reader was followed by a specialised document reader installed at TWA, where it processed airline ticket stock. The readers processed documents at a rate of 1,500 documents per minute and checked each document, rejecting those they were not able to process correctly. The product became part of the RCA product line as a reader designed to process "turnaround documents" such as the utility and insurance bills returned with payments.

The United States Postal Service has been using OCR machines to sort mail since 1965, based on technology devised primarily by the prolific inventor Jacob Rabinow. The first use of OCR in Europe was by the British General Post Office (GPO). In 1965 it began planning an entire banking system, the National Giro, using OCR technology, a process that revolutionized bill payment systems in the UK. Canada Post has been using OCR systems since 1971.

In 1974 Ray Kurzweil started the company Kurzweil Computer Products, Inc. and led development of the first omni-font optical character recognition system, a computer program capable of recognizing text printed in any normal font. He decided that the best application of this technology would be a reading machine for the blind, which would allow blind people to have a computer read text to them out loud. This device required the invention of two enabling technologies: the CCD flatbed scanner and the text-to-speech synthesizer. In 1978 Kurzweil Computer Products began selling a commercial version of the optical character recognition computer program. LexisNexis was one of the first customers and bought the program to upload paper legal and news documents onto its nascent online databases.

From 1992 to 1996, commissioned by the U.S. Department of Energy (DOE), the Information Science Research Institute (ISRI) conducted the most authoritative annual tests of OCR accuracy for five consecutive years. ISRI is a research and development unit of the University of Nevada, Las Vegas. It was established in 1990 with funding from the U.S. Department of Energy. Its mission is to foster the improvement of automated technologies for understanding machine-printed documents.

One study based on recognition of 19th and early 20th century newspaper pages concluded that character-by-character OCR accuracy for commercial OCR software varied from 71% to 98%; total accuracy can only be achieved by human review. Other areas, including recognition of hand printing, cursive handwriting, and printed text in other scripts (especially East Asian characters, which have many strokes per character), are still the subject of active research.
3.5 User Characteristics:
• The user should be given proper training to operate the whole system.
• The user must have basic knowledge of computers.
• The user must know how to handle the relevant devices, e.g. scanner and mouse.

3.6 Specific Requirement:

3.6.1 User Interfaces
The user will interact with the system, and the required output will be generated depending on the type of user. Input is given in one of two ways:
• By writing directly on the text area provided on the GUI.
• By first writing in an image file and then giving that file as input to the system.
The user will be asked to save the generated text in a .TXT file.

3.6.2 Hardware Requirements
• Intel Pentium II processor
• CPU: minimum 500 MHz
• Minimum 64 MB of RAM
• Mouse
• Keyboard
• Scanner
• Monitor
3.6.3 Software Requirements
• Microsoft Windows 98/NT/XP/2000
• JDK 1.4 (minimum)
• Java 2D API
• Java Advanced Imaging API
• Java Image I/O API
• Java Media Framework

3.6.4 Performance Requirements:
• Accuracy: The extent to which a program satisfies its specification and fulfils the customer's mission objectives.
• Reliability: The extent to which a program can be expected to perform its intended function with the required precision.
• Speed: The time required for a program to perform the given task.
• Maintainability: The effort required to locate and fix an error in the program.
• Portability: The effort required to transfer a program from one hardware and/or software environment to another.
• Availability: The system is expected to be available around the clock at the installed site.

3.6.5 Functional Requirements:
1. For static OCR, the software should provide a way to load a scanned document for recognition.
2. If the scanned image does not have a black background and a white foreground, the software should provide a facility for image inversion.
3. The software should process the image and extract characters.
4. The user should be able to save the extracted data in a format of their choice.
5. For dynamic OCR, the software should recognize characters as the user draws them.
6. If the software does not give proper output, there should be a way to train the software's database.

3.6.6 Other Requirements:
• The input image is to be in the bitmap file format.
• For a scanned image, a high-quality scanner and good paper quality are required. The scanner resolution should be set to a minimum of 300 dots per inch (dpi).
• During scanning, a tilt of up to 20° can be corrected.
• Where handwritten characters contain discontinuities, a gap of up to 3 pixels wide is tolerated.
• A first order median filter is used.

3.7 Position Statement:
Optical handwriting recognition most often describes the ability of a computer to translate human writing into text. This system can be used for:
• Railway reservation forms
• Libraries
• Government agencies
• School/college admission forms
• Making other lengthy documents available electronically
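The median filtering listed under Other Requirements can be sketched in Java as follows. The report does not define "first order", so this sketch assumes a 3x3 window (one pixel of neighbourhood in each direction) and handles borders by clamping coordinates. The class `MedianFilterSketch` and the method `median3x3` are hypothetical names, and the input is assumed to be a rectangular matrix of 8-bit gray levels.

```java
import java.util.Arrays;

// Hypothetical helper, not taken from the report.
final class MedianFilterSketch {

    /** Replaces each pixel by the median of its 3x3 neighbourhood. */
    static int[][] median3x3(int[][] gray) {
        int h = gray.length;
        int w = (h == 0) ? 0 : gray[0].length;
        int[][] out = new int[h][w];
        int[] window = new int[9];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int n = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        // Clamp so border pixels reuse their nearest neighbours.
                        int yy = Math.min(h - 1, Math.max(0, y + dy));
                        int xx = Math.min(w - 1, Math.max(0, x + dx));
                        window[n++] = gray[yy][xx];
                    }
                }
                Arrays.sort(window);
                out[y][x] = window[4];   // middle of nine sorted values
            }
        }
        return out;
    }
}
```

A median filter of this kind removes isolated specks (salt-and-pepper noise) while keeping stroke edges sharper than an averaging filter would, which is why it is a common choice before thinning.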