Improving OCR Accuracy
Clean Up and Enhance Scanned Images
Cleaner Image = More Accurate OCR
Your acceptable level of OCR
accuracy may depend on your
application
Healthcare and Legal applications have high
OCR accuracy requirements.
Optimizing for the highest OCR
accuracy is generally divided into two
phases:
• Pre-Scanning
• During Scanning
Form Design
• adequate white
space
• limited lines
Font Selection
• monospace like
Courier or sans serif
fonts like Helvetica
• at least 10-13
points
Color Selection
• limited use of color
Set pre-processing standards and
procedures
During scanning…
Scan at
300 dpi or higher
and CLEAN.
Most capture applications
include basic cleaning features.
Go beyond the basics with DocuFi’s
Adaptive thresholding assists in cleaning “dirty” documents or documents with
a colored background which interferes with the foreground data.
Adaptive Thresholding
Most scanner and capture software can apply basic thresholding technology.
ImageRamp uses Adaptive Thresholding with advanced algorithms and
Sensitivity settings allowing you to optimize the thresholding for your
documents.
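To illustrate the general idea, here is a minimal sketch of mean-based adaptive thresholding next to a single global cutoff. It is not ImageRamp's algorithm; the function names and the `radius` and `sensitivity` parameters are assumptions made for this example, and the page is a tiny grayscale grid (0 = black, 255 = white) with a background that darkens from left to right.

```python
def global_threshold(gray, cutoff=128):
    """Mark a pixel as ink (1) when it is darker than one fixed cutoff."""
    return [[1 if value < cutoff else 0 for value in row] for row in gray]


def adaptive_threshold(gray, radius=1, sensitivity=10):
    """Mark a pixel as ink (1) when it is darker than its local mean minus `sensitivity`."""
    height, width = len(gray), len(gray[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Neighbourhood around (y, x), clipped at the page borders.
            window = [
                gray[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            if gray[y][x] < local_mean - sensitivity:
                out[y][x] = 1
    return out


# Hypothetical page: shaded background plus two "text" pixels.
background = [220, 200, 180, 140, 110, 100]
gray = [background[:] for _ in range(5)]
gray[2][1] = 120  # stroke on the light side
gray[2][4] = 40   # stroke on the dark side

adaptive_ink = [(y, x) for y, row in enumerate(adaptive_threshold(gray))
                for x, v in enumerate(row) if v]
global_ink_count = sum(sum(row) for row in global_threshold(gray))
```

On this sample the global cutoff turns the whole dark side of the page into ink, while the local comparison keeps only the two strokes. A larger `sensitivity` ignores fainter marks; a smaller one keeps more of them.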
This option smooths the edges of text. Smoothing text fills small pits in the
edges of a character and removes small bumps on the edges. This improves
legibility and reduces storage needs.
Smooth Text
Dither Form Fills
Black and white printed images may use dithering, often called dot shading, to
simulate shades of gray by varying the patterns of dots. The Dither Form Fills
feature removes areas of dot shading from an image, so that a black and white
TIFF image appears as true black and white rather than as a grayscale image.
This option locates the outermost raster data, or pixels, on the page and
resizes the document based on them.
Reset Margins
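One way to picture this is cropping a page to the bounding box of its ink. The sketch below assumes a 0/1 bitmap (1 = ink) and a hypothetical `reset_margins` helper with an optional `margin` parameter; it shows the idea only and is not how ImageRamp implements the feature.

```python
def reset_margins(bitmap, margin=0):
    """Crop a 0/1 bitmap to the bounding box of its ink pixels, plus an optional margin."""
    width = len(bitmap[0])
    ink_rows = [y for y, row in enumerate(bitmap) if any(row)]
    ink_cols = [x for x in range(width) if any(row[x] for row in bitmap)]
    if not ink_rows:
        # Blank page: there is no raster data to crop to, so return a copy.
        return [row[:] for row in bitmap]
    top = max(0, ink_rows[0] - margin)
    bottom = min(len(bitmap), ink_rows[-1] + 1 + margin)
    left = max(0, ink_cols[0] - margin)
    right = min(width, ink_cols[-1] + 1 + margin)
    return [row[left:right] for row in bitmap[top:bottom]]


# Hypothetical 6 x 8 page with two ink pixels.
page = [[0] * 8 for _ in range(6)]
page[2][3] = 1
page[3][5] = 1

tight = reset_margins(page)             # 2 rows x 3 columns
padded = reset_margins(page, margin=1)  # 4 rows x 5 columns
```

Running a despeckle step first matters here: a single stray speck near the page edge would otherwise define the new margin.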
Using detected text as the basis for alignment, this tool is designed to work with
scanned office documents and eliminate rescans.
Deskew or
Straighten Page
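A common way to find the skew of a text page is a projection profile: try several candidate angles and keep the one that packs the ink into the fewest rows. The sketch below is an assumption-laden illustration, not ImageRamp's method. It works on a 0/1 bitmap, tests whole-degree candidates only, and corrects with a vertical shear rather than a true rotation, which is a reasonable approximation for small angles. A positive angle here means text drifts downward from left to right.

```python
import math


def estimate_skew(bitmap, candidates=range(-5, 6)):
    """Return the candidate angle (degrees) whose correction gives the sharpest row profile."""
    best_angle, best_score = 0, -1
    for angle in candidates:
        slope = math.tan(math.radians(angle))
        counts = {}
        for y, row in enumerate(bitmap):
            for x, ink in enumerate(row):
                if ink:
                    # Undo the vertical drift this angle would cause at column x.
                    corrected = y - round(x * slope)
                    counts[corrected] = counts.get(corrected, 0) + 1
        # Ink concentrated in few rows gives a high sum of squared row counts.
        score = sum(c * c for c in counts.values())
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


def deskew(bitmap, angle):
    """Shear columns vertically to undo `angle`; pixels pushed off the page are dropped."""
    height, width = len(bitmap), len(bitmap[0])
    slope = math.tan(math.radians(angle))
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            target = y - round(x * slope)
            if bitmap[y][x] and 0 <= target < height:
                out[target][x] = 1
    return out


# Hypothetical page: two text baselines skewed by 3 degrees.
slope = math.tan(math.radians(3))
skewed = [[0] * 40 for _ in range(30)]
for x in range(40):
    for baseline in (8, 18):
        skewed[baseline + round(x * slope)][x] = 1

angle = estimate_skew(skewed)
straight = deskew(skewed, angle)
```

After correction each baseline sits on a single row again, which is the condition most OCR engines expect when they segment a page into text lines.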
This selection detects and removes lines which may interfere with OCR
interpretation.
Remove Lines
Whether the scan was contaminated or the original was poor, this option
removes extraneous black specks and fills in white holes on black areas of an
image.
Remove Noise or
Despeckle
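The simplest form of this cleanup can be sketched as follows: flip any pixel whose neighbours all disagree with it. The example assumes a 0/1 bitmap (1 = black) and handles only single-pixel specks and pinholes; production tools typically let you set a maximum speck size, which this hypothetical `despeckle` helper does not.

```python
def despeckle(bitmap):
    """Flip any pixel whose 8 neighbours all have the opposite value (speck or pinhole)."""
    height, width = len(bitmap), len(bitmap[0])
    out = [row[:] for row in bitmap]
    for y in range(height):
        for x in range(width):
            neighbours = [
                bitmap[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            ]
            # Decisions read the original bitmap, so one flip cannot trigger another.
            if neighbours and all(n != bitmap[y][x] for n in neighbours):
                out[y][x] = 1 - bitmap[y][x]
    return out


# Hypothetical page: one isolated speck and a solid block with a pinhole.
noisy = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
clean = despeckle(noisy)
```

The speck at row 1, column 1 is removed and the hole in the middle of the block is filled, while the block's edges are left alone.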
Auto Rotate automatically evaluates orientation based on the text and rotates
misoriented pages. Optionally, select a degree of rotation for ImageRamp to
rotate all pages based on the selection.
Auto Rotate and
Rotate Pages
This can be used to eliminate unnecessary blank pages in a document and make
the file size smaller. Blank page detection can also play a role in file splitting.
Many users divide documents in a scanning stack with blank pages and
ImageRamp can be set to split the stack of documents into multiple files when
blanks are detected.
Remove Blank
Pages
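The blank-page logic described above can be sketched as an ink-ratio test plus a splitter that treats blank pages as separators. The 0.5% threshold, the function names, and the 0/1 page bitmaps are assumptions for illustration; a real threshold would need tuning so that a page with a few stray specks still counts as blank.

```python
def is_blank(bitmap, max_ink_ratio=0.005):
    """Treat a page as blank when at most `max_ink_ratio` of its pixels are ink."""
    total = sum(len(row) for row in bitmap)
    ink = sum(sum(row) for row in bitmap)
    return total == 0 or ink / total <= max_ink_ratio


def split_on_blank_pages(pages, max_ink_ratio=0.005):
    """Group pages into documents, using blank pages as separators and dropping them."""
    documents, current = [], []
    for page in pages:
        if is_blank(page, max_ink_ratio):
            if current:
                documents.append(current)
                current = []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents


def make_page(ink_pixels, size=20):
    """Build a hypothetical size x size test page with the given number of ink pixels."""
    page = [[0] * size for _ in range(size)]
    for i in range(ink_pixels):
        page[i // size][i % size] = 1
    return page


text, separator = make_page(40), make_page(1)  # the separator carries one speck
stack = [text, text, separator, text, separator, separator, text]
documents = split_on_blank_pages(stack)
page_counts = [len(doc) for doc in documents]
```

Here the seven-page stack becomes three documents of two, one, and one pages, and consecutive separator sheets do not create empty documents.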
Besides cleaning and enhancing
the image, ImageRamp has other
ways to improve OCR accuracy.
OCR with validation during processing is a very powerful way to
eliminate entries not meeting a specific format rule.
For instance, if an inventory item number should contain three alpha
characters followed by five digits, all documents with item
numbers that are not identified in the OCR process with that pattern
may be tagged for manual inspection before further processing is
done.
Field Validation Improves Accuracy.
PEN21096
CAP36581
INV98453
PA568793
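The rule above (three alpha characters followed by five digits) maps directly onto a regular expression. The sketch below checks the four sample values from the slide; it assumes uppercase ASCII letters, and `needs_review` is a hypothetical helper name rather than an ImageRamp setting.

```python
import re

# Three uppercase letters followed by exactly five digits (assumed reading of the rule).
ITEM_NUMBER = re.compile(r"[A-Z]{3}[0-9]{5}")


def needs_review(ocr_value):
    """Return True when an OCR'd item number does not match the expected format."""
    return ITEM_NUMBER.fullmatch(ocr_value.strip()) is None


ocr_results = ["PEN21096", "CAP36581", "INV98453", "PA568793"]
flagged = [value for value in ocr_results if needs_review(value)]
```

Only `PA568793` is flagged: it has two letters and six digits, so it would be routed to manual inspection. A format check catches malformed values, but not a misread that still fits the pattern, such as one digit read as another.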
ImageRamp offers
significant preview and
testing options to fine-
tune settings.
Additionally,
ImageRamp offers PDF
or TIFF output, which
may differ in OCR
accuracy.
Wrap up: 3 Ways to
Improve OCR Accuracy
• Set Pre-Processing Standards
• Scan at 300+ dpi
• Capture with Clean-up
Pre-Processing Standards
Good pre-processing can be as important as the scanning technologies.
Encourage accuracy by setting document procedures
and guidelines to:
• Use adequate white space
• Limit lines and gridlines
• Limit the use of color
• Use OCR-friendly fonts and sizes
Use an Intelligent Capture
Solution such as ImageRamp
Learn More about Document Imaging and Capture
For more on:
• Clean scans,
• Ways to improve OCR
scanning,
• Cleaning documents for
scanning,
• Enhancing your images for
improved OCR,
• Watching folders,
• Batch Processing,
• Bulk scanning,
• Split files with barcodes,
• Barcode splitting,
• DocuFi,
• ImageRamp,
• Watch folders,
• Data capture,
• Intelligent Data Capture
Contact Us
DocuFi, makers of ImageRamp,
Document Management Capture Solution
30 years’ experience in the Document Imaging market.
Capture Products www.docufi.com
ImageRamp Cleanup and Enhance for OCR
Copyright ©2014
Image Credits
• Tim Evanson, “Albert V Bryan Federal District
Courthouse - Alexandria Va - 0014 - 2012-03-10”,
http://bit.ly/1iGIBpF
• takacsi75, “Medicine 02”, http://bit.ly/1dtsIxK
• ToastyKen, “New Mophead”, http://bit.ly/1ijjkkD
• mjtmail (tiggy), “Day 307”, http://bit.ly/1g4G3Bw

Improve OCR Accuracy, Clean Up and Enhance Scanned Images

  • 1. Improving OCR Accuracy: Clean Up and Enhance Scanned Images
  • 2. Cleaner Image = More Accurate OCR
  • 3. Your acceptable level of OCR accuracy may depend on your application
  • 4. Healthcare and Legal applications have high OCR accuracy requirements.
  • 5. Optimizing for the highest OCR accuracy is generally divided into two phases: Pre-Scanning and During Scanning.
  • 6. Set pre-processing standards and procedures. Form Design: adequate white space; limited lines. Font Selection: monospace fonts like Courier or sans serif fonts like Helvetica; at least 10-13 points. Color Selection: limited use of color.
  • 7. During scanning… Scan at 300 dpi or higher and CLEAN.
  • 8. Most capture applications include basic cleaning features.
  • 9. Go beyond the basics with DocuFi’s
  • 10. Adaptive Thresholding: Adaptive thresholding assists in cleaning “dirty” documents or documents with a colored background which interferes with the foreground data.
  • 11. Adaptive Thresholding: Adaptive thresholding assists in cleaning “dirty” documents or documents with a colored background which interferes with the foreground data. Most scanner and capture software can apply basic thresholding technology.
  • 12. Adaptive Thresholding: ImageRamp uses Adaptive Thresholding with advanced algorithms and Sensitivity settings, allowing you to optimize the thresholding for your documents.
  • 13. Smooth Text: This option smooths the edges of text. Smoothing text fills small pits in the edges of a character and removes small bumps on the edges. This improves legibility and reduces storage needs.
  • 14. Dither Form Fills: Black and white printed images may use dithering, often called dot shading, to simulate shades of gray by varying the patterns of dots. The Dither Form Fills feature removes areas of dot shading from an image. This function is used to make a black and white TIFF image appear as black and white and not as a grayscale image.
  • 15. Reset Margins: This searches the document for the outermost raster data (pixels) and resizes the page based on what it finds.
  • 16. Deskew or Straighten Page: Using detected text as the basis for alignment, this tool is designed to work with scanned office documents and eliminate rescans.
  • 17. Remove Lines: This selection detects and removes lines that may interfere with OCR interpretation.
  • 18. Remove Noise or Despeckle: Whether the scanned image is contaminated or the original is bad, this option removes extraneous black specks and fills in white holes on black areas of an image.
  • 19. Auto Rotate and Rotate Pages: Auto Rotate automatically evaluates orientation based on the text and rotates misoriented pages. Optionally, select a degree of rotation and ImageRamp rotates all pages by that amount.
  • 20. Remove Blank Pages: This can be used to eliminate unnecessary blank pages in a document and make the file size smaller. Blank page detection can also play a role in file splitting. Many users divide documents in a scanning stack with blank pages, and ImageRamp can be set to split the stack of documents into multiple files when blanks are detected.
  • 21. Besides cleaning and enhancing the image, ImageRamp has other ways to improve OCR accuracy.
  • 22. Field Validation Improves Accuracy: OCR with validation during processing is a very powerful way to eliminate entries not meeting a specific format rule. For instance, if an inventory item should contain three alpha characters followed by five numbers, all documents with item numbers that are not identified in the OCR process with that pattern may be tagged for manual inspection before further processing is done. Sample item numbers: PEN21096, CAP36581, INV98453, PA568793.
  • 23. ImageRamp offers significant preview and testing options to fine-tune settings. Additionally, ImageRamp offers PDF or TIFF output, which may differ in OCR accuracy.
  • 24. Wrap up: 3 Ways to Improve OCR Accuracy: Set Pre-Processing Standards; Scan at 300+ dpi; Capture with Clean-up.
  • 25. Pre-Processing Standards: Good pre-processing can be as important as the scanning technologies. Encourage accuracy by setting document procedures and guidelines to: • Use adequate white space • Limit lines and gridlines • Limit the use of color • Use OCR-friendly fonts and sizes
  • 26. Use an Intelligent Capture Solution such as ImageRamp
  • 27. Learn More about Document Imaging and Capture
  • 28. For more on: • Clean scans, • Ways to improve OCR scanning, • Cleaning documents for scanning, • Enhancing your images for improved OCR, • Watching folders, • Batch Processing, • Bulk scanning, • Split files with barcodes, • Barcode splitting, • DocuFi, • ImageRamp, • Watch folders, • Data capture, • Intelligent Data Capture. Contact Us: DocuFi, 30 years’ experience in the Document Imaging market. Capture Products: www.docufi.com. ImageRamp: Cleanup and Enhance for OCR. Copyright ©2014 makers of ImageRamp, Document Management Capture Solution
  • 29. Image Credits • Tim Evanson, “Albert V Bryan Federal District Courthouse - Alexandria Va - 0014 - 2012-03-10”, http://bit.ly/1iGIBpF • takacsi75, “Medicine 02”, http://bit.ly/1dtsIxK • ToastyKen, “New Mophead”, http://bit.ly/1ijjkkD • mjtmail (tiggy), “Day 307”, http://bit.ly/1g4G3Bw
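
Slides 10 to 12 describe adaptive thresholding only in general terms. The sketch below illustrates the general idea with a simple local-mean threshold in plain Python. It is not ImageRamp's algorithm, which the slides do not specify. The function name, the `window` size, and the `offset` parameter (a rough stand-in for a sensitivity setting) are assumptions made for illustration.

```python
def adaptive_threshold(gray, window=3, offset=30):
    """Binarize a grayscale image given as a list of rows (values 0-255).

    A pixel becomes black (0) when it is darker than the mean of its
    window x window neighborhood by more than `offset`; otherwise it
    becomes white (255). `offset` is a hypothetical sensitivity knob.
    """
    height, width = len(gray), len(gray[0])
    half = window // 2
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Clip the neighborhood at the image borders.
            y0, y1 = max(0, y - half), min(height, y + half + 1)
            x0, x1 = max(0, x - half), min(width, x + half + 1)
            neighbors = [gray[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            local_mean = sum(neighbors) / len(neighbors)
            row.append(0 if gray[y][x] < local_mean - offset else 255)
        result.append(row)
    return result


# Tiny made-up page: light background (200) on the left, darker colored
# background (100) on the right, one "text" pixel in each half.
page = [
    [200, 200, 200, 100, 100, 100],
    [200, 150, 200, 100,  50, 100],
    [200, 200, 200, 100, 100, 100],
]

binarized = adaptive_threshold(page)

# A single global cutoff for comparison (128 is an arbitrary choice).
global_result = [[0 if p < 128 else 255 for p in row] for row in page]

for row in binarized:
    print(row)
```

On this toy page the local threshold keeps both text pixels and whitens both backgrounds, while the single global cutoff loses the text pixel on the light side and turns the whole darker background black. Real documents need larger windows and tuning, which is what the preview and testing options mentioned in slide 23 are for.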
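
The field validation rule in slide 22 (three alpha characters followed by five numbers) can be sketched as a regular-expression check. This is only an illustration of the idea, not how ImageRamp implements validation. The function name and the choice to accept both upper and lower case letters are assumptions.

```python
import re

# Rule from slide 22: three letters followed by five digits.
# Accepting lower case letters is an assumption, not stated in the slides.
ITEM_PATTERN = re.compile(r"[A-Za-z]{3}[0-9]{5}")


def validate_item_numbers(values):
    """Split OCR'd item numbers into (accepted, flagged_for_review)."""
    accepted, flagged = [], []
    for value in values:
        candidate = value.strip()
        # fullmatch requires the whole string to fit the pattern.
        if ITEM_PATTERN.fullmatch(candidate):
            accepted.append(candidate)
        else:
            flagged.append(value)
    return accepted, flagged


# Sample item numbers shown on the slide.
samples = ["PEN21096", "CAP36581", "INV98453", "PA568793"]
accepted, flagged = validate_item_numbers(samples)
print("accepted:", accepted)
print("flagged for manual inspection:", flagged)
```

Of the four sample values, PA568793 has only two leading letters, so it is the one set aside for manual inspection.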