Learn about the basic decisions required for business document scanning: indexing, file formats, document resolution, color space, and more. Learn about estimating volumes and automated capture technology such as barcode recognition, OCR, batch document processing, and more.
5. Lessons:
Lesson 1: Simplex or Duplex
Lesson 2: Resolution
Lesson 3: Color Depth
Lesson 4: File Formats
Lesson 5: Indexing
Lesson 6: Document Prep and Estimating Volumes
Homework: Learn More About Data Capture and Document Management
6. Lesson 1: Simplex or Duplex
Are the documents single or double-sided?
This may seem obvious, but…
7. You may not want documents such as purchase invoices scanned in duplex when the back of each page contains only terms and conditions.
On the other hand, if the documents have high legal importance, you may want every conceivable item of information captured, such as small signatures or notes on the back.
12. What is Resolution?
Resolution is expressed as the number of dots per inch (dpi) or, less frequently, pixels per inch (ppi). A pixel, or “picture element,” is one of the dots that make up the image; the resolution describes how finely the page was sampled when it was scanned.
14. Implications of Resolution
• If we halved the size of the grid squares horizontally and vertically (doubling the resolution), the edges of the characters would appear smoother and the image quality would improve; the inverse would be true if we doubled the size of the squares.
• If we kept the squares the same size but reduced the size of the characters significantly, the resolution would be insufficient.
15. Implications of Resolution
• The higher the resolution, the better the image
quality.
• For small characters, increase the resolution to capture them effectively.
So:
16. And, the higher the resolution,
the slower the scan and the
larger the file.
Which means higher scanning
and file storage costs, Einstein.
18. Typical Scanning Resolutions
• Web graphic – 96 dpi
• Standard archive document – 200 dpi
• Document required for optical character
recognition (OCR) – 300 dpi
• Plans/drawings for vectorization – 400 dpi
• Documents required for historical archiving –
600 dpi
Resolution is generally determined by intended
use.
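The resolution trade-off is easy to put in numbers. As a rough sketch, this converts a page size and a resolution into pixel dimensions. It assumes a US Letter page (8.5 × 11 inches), which the lessons do not specify, and `pixel_dimensions` is a hypothetical helper. Doubling the resolution doubles both the width and the height, so the pixel count quadruples.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Return (width_px, height_px) for a page sampled at `dpi` dots per inch."""
    return round(width_in * dpi), round(height_in * dpi)


# US Letter page (an assumed example size) at the typical resolutions listed above
for dpi in (96, 200, 300, 400, 600):
    w, h = pixel_dimensions(8.5, 11, dpi)
    print(f"{dpi:>3} dpi: {w} x {h} px = {w * h:,} pixels")
```

At 300 dpi this page has 8,415,000 pixels versus 3,740,000 at 200 dpi, 2.25 times as many, which is where the slower scans and larger files come from.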
20. Understanding Black and White
Documents scanned in black and white are always scanned as grayscale within the scanner. The scanner then applies a process known as thresholding to the image to produce the black and white image.
Thresholding simply decides, based on how dark each gray pixel is, whether it becomes black or white.
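The thresholding step described above can be sketched in a few lines. This is a minimal illustration, not how any particular scanner implements it: it assumes 8-bit grayscale values (0 = black, 255 = white) and a single fixed cutoff of 128, whereas many scanners offer adaptive or dynamic thresholding instead.

```python
def threshold(gray_rows: list[list[int]], cutoff: int = 128) -> list[list[int]]:
    """Convert 8-bit grayscale rows (0 = black, 255 = white) to 1-bit rows.

    A pixel darker than `cutoff` becomes 1 (black ink); anything else
    becomes 0 (white paper). The cutoff of 128 is an assumed default.
    """
    return [[1 if value < cutoff else 0 for value in row] for row in gray_rows]


# A tiny hypothetical 2 x 5 grayscale patch: paper, faint marks, a dark stroke
patch = [
    [250, 200, 40, 30, 240],
    [245, 140, 35, 120, 250],
]
print(threshold(patch))  # [[0, 0, 1, 1, 0], [0, 0, 1, 1, 0]]
```

Raising the cutoff keeps fainter marks (the 140 value above would turn black at a cutoff of 160); lowering it suppresses more background noise.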
21. Understanding Grayscale
Grayscale is used when the image contains color or grayscale data and the tone of the image needs to be retained, e.g. photographs or shaded graphics.
22. Understanding Color
Color is used when the image contains color data that must be preserved. Some users wish to retain important color information, for example land boundaries or graphical data, but not incidental color such as letterhead logos or highlighter marks.
24. File Storage Requirements
Bits per pixel: color 24, grayscale 8, black and white 1.
So, before compression, a grayscale image requires 8 times as much storage as a black and white image, and a color image requires 24 times as much.
And remember, Einstein, larger files mean higher costs.
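Using the bits-per-pixel figures above, the raw (uncompressed) size of a scanned page is its pixel count × bits per pixel ÷ 8 bytes. The sketch below assumes a US Letter page at 200 dpi, an example not given in the lessons, and the helper name `raw_size_bytes` is hypothetical. Real files are usually much smaller because TIFF, PDF and JPEG apply compression.

```python
BITS_PER_PIXEL = {"black and white": 1, "grayscale": 8, "color": 24}


def raw_size_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed image size in bytes (ignores file headers and compression)."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8


# Assumed example: US Letter page scanned at 200 dpi
for mode, bpp in BITS_PER_PIXEL.items():
    size = raw_size_bytes(8.5, 11, 200, bpp)
    print(f"{mode:>15}: {size:,} bytes")
```

For this page that works out to 467,500 bytes in black and white, 3,740,000 bytes in grayscale and 11,220,000 bytes in color, the same 1 : 8 : 24 ratio.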
25. Lesson 4: File Formats
TIFF
JPEG
PDF
For an in-depth look, visit: PDF v. TIFF
26. Understanding TIFF*
*Tagged Image File Format
• Well-established format
• Most often used for black and white documents
• Supports multiple pages
• Interpreted correctly by most applications, although certain color implementations can cause problems
• “Group 4” refers to the compression method used on black and white images. It is a “lossless” compression: no original data is lost during compression and decompression.
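Group 4 itself is a fairly involved two-dimensional coding scheme (standardized as ITU-T T.6), so the sketch below does not implement it. Instead it uses plain run-length encoding, a simpler technique in the same family (fax coding also describes scan lines as runs of white and black pixels), to show what “lossless” means: decoding the runs gives back exactly the original pixels. The function names and the sample scan line are hypothetical.

```python
from itertools import groupby


def rle_encode(row: list[int]) -> list[tuple[int, int]]:
    """Encode a 1-bit scan line as (pixel_value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(row)]


def rle_decode(runs: list[tuple[int, int]]) -> list[int]:
    """Expand (pixel_value, run_length) pairs back into a scan line."""
    return [value for value, length in runs for _ in range(length)]


# One hypothetical scan line: mostly white paper (0) with two black strokes (1)
line = [0] * 40 + [1] * 3 + [0] * 20 + [1] * 5 + [0] * 32
runs = rle_encode(line)
print(runs)  # [(0, 40), (1, 3), (0, 20), (1, 5), (0, 32)]
assert rle_decode(runs) == line  # lossless: the original line is recovered exactly
```

Here 100 pixels collapse into 5 runs, which hints at why mostly white document pages compress well in black and white.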
27. Understanding PDF*
*Portable Document Format
• Well-established format, originally developed by Adobe
• Supports color, grayscale, and black and white
• Supports multiple pages
• Generally stored using Group 4 or JPEG compression, although other compression methods are also supported
• Used when more advanced features are needed within the file, such as embedded optical character recognition (OCR) text, hyperlinking, digital signing, and other security features
28. Understanding PDF Variations
Searchable PDF:
Many scanning applications can create searchable PDF files. Here, the software applies OCR technology to make the file text searchable. Your application may label this option “make searchable,” “apply OCR,” “text-under-image,” or “searchable PDF.” If it is selected, your file will be text searchable and selectable within the Acrobat viewer and many other programs that search PDF files.
29. Understanding PDF Variations
PDF/A:
PDF/A is an ISO standard for the digital preservation, or archiving, of electronic documents.
It differs from standard PDF by prohibiting features unsuited to long-term archiving, such as font linking (fonts must be embedded instead).
Its use is growing internationally in government and industry segments, including legal systems, libraries, newspapers, and regulated industries.
30. Understanding JPEG*
*Joint Photographic Experts Group
• Well-established format
• Most often used for photographs and graphics
• Supports a single page only
• A “lossy” compression format; that is, some of the data is lost during compression. However, it provides good compression ratios for grayscale and color images.
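JPEG's loss comes mainly from quantization: after a frequency transform, values are divided by a step size and rounded, and the rounding error cannot be undone. The sketch below skips the transform and quantizes raw grayscale values directly, a deliberate simplification, to show why decompressed data differs from the original and why a larger step (roughly, a lower quality setting) loses more. The pixel values are hypothetical.

```python
def quantize(values: list[int], step: int) -> list[int]:
    """Lossy step: keep only the nearest multiple of `step` (as an index) per value."""
    return [round(v / step) for v in values]


def dequantize(levels: list[int], step: int) -> list[int]:
    """Reconstruct approximate values; the rounding error is gone for good."""
    return [q * step for q in levels]


original = [52, 55, 61, 66, 70, 61, 64, 73]  # hypothetical 8-pixel grayscale strip
for step in (4, 16):  # smaller step ~ higher quality, larger step ~ smaller file
    restored = dequantize(quantize(original, step), step)
    max_error = max(abs(a - b) for a, b in zip(original, restored))
    print(f"step {step:>2}: {restored} (max error {max_error})")
```

Each restored value is within half a step of the original, so the error grows as the step grows.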
32. Compression and File Size
[Figure: compression and file size comparison, courtesy of Wikipedia]
OMG, right?
The bottom line: experiment with your images and their file sizes. A medium-quality setting may meet your needs and save a tremendous amount of file space.
34. What is Indexing?
Document indexing (assigning descriptive data, sometimes referred to as metadata, to each document) enables users to quickly and efficiently locate their documents, whether through a folder structure, a database, or an electronic document management system.
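As a minimal sketch of database-backed indexing, the example below stores a few index fields per scanned file in an in-memory SQLite table and finds documents by those fields. The field names (`doc_type`, `customer`, `doc_date`) and the file names are hypothetical; a real scheme should follow how your users actually search for documents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database for the example
conn.execute(
    """CREATE TABLE document_index (
           file_path TEXT PRIMARY KEY,
           doc_type  TEXT NOT NULL,
           customer  TEXT NOT NULL,
           doc_date  TEXT NOT NULL  -- ISO 8601 dates sort correctly as text
       )"""
)

# Hypothetical index records captured at scan time
conn.executemany(
    "INSERT INTO document_index VALUES (?, ?, ?, ?)",
    [
        ("scans/000101.pdf", "invoice", "Acme Corp", "2013-11-04"),
        ("scans/000102.pdf", "invoice", "Globex", "2013-11-05"),
        ("scans/000103.pdf", "contract", "Acme Corp", "2013-12-01"),
    ],
)

# Find every Acme Corp invoice without opening a single image
rows = conn.execute(
    "SELECT file_path FROM document_index WHERE doc_type = ? AND customer = ? ORDER BY doc_date",
    ("invoice", "Acme Corp"),
).fetchall()
print(rows)  # [('scans/000101.pdf',)]
```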
36. Avoid a disaster
Great care should be taken to design an efficient indexing scheme. If the scheme is not designed correctly at the outset, rectifying it later can be both difficult and costly.
Sometimes it makes sense to replicate the current manual method of locating documents, creating a familiar but faster system.
37. Don’t worry, there is automation
Technologies such as
• Barcode recognition
• OCR
• Batch processing
• Data Mining, Text Mining
can save time and money by automating indexing and
more.
38. Using Barcodes for Indexing
Intelligent data capture software can extract data from barcodes to create index information and send it to a document management system.
For an in-depth look at barcodes in data capture, visit: What Can Barcodes Do For Me?
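The barcode itself is decoded by the capture software's recognition engine. Assuming the barcode has already been read into a text string, the sketch below shows the next step: splitting that value into named index fields. The pipe-delimited layout `DOCTYPE|ACCOUNT|DATE` and the function `barcode_to_index` are made-up conventions for illustration, not a standard or any product's API.

```python
def barcode_to_index(value: str) -> dict[str, str]:
    """Split a decoded barcode string into index fields.

    Assumes a hypothetical pipe-delimited layout: DOCTYPE|ACCOUNT|DATE.
    """
    parts = value.split("|")
    if len(parts) != 3:
        raise ValueError(f"unexpected barcode layout: {value!r}")
    doc_type, account, date = (part.strip() for part in parts)
    return {"doc_type": doc_type, "account": account, "date": date}


fields = barcode_to_index("INVOICE|A-10442|2014-01-15")
print(fields)  # {'doc_type': 'INVOICE', 'account': 'A-10442', 'date': '2014-01-15'}
```

Rejecting unexpected layouts, rather than guessing, keeps a misread barcode from silently filing a document under the wrong index values.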
39. With OCR, make your image-based file fully
text searchable or extract data from a zone for
indexing.
40. Using OCR for Indexing
With zonal OCR, document areas are identified for automatic OCR capture.
Additionally, drag-and-drop OCR allows an operator to highlight document text, which is automatically OCR'd and dropped into index fields.
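A zone is typically defined once per document layout and then applied to every page. As a minimal sketch, assuming zones are measured in inches from the page's top-left corner, this converts a zone into a pixel box at the scan resolution and crops that region from the page. The OCR call itself is left out because it depends on the product; the `invoice_number` zone position is hypothetical.

```python
from typing import NamedTuple


class Zone(NamedTuple):
    """A rectangular capture zone in inches, measured from the page's top-left corner."""
    left: float
    top: float
    width: float
    height: float


def zone_to_pixels(zone: Zone, dpi: int) -> tuple[int, int, int, int]:
    """Return the (left, top, right, bottom) pixel box for `zone` at `dpi`."""
    left, top = round(zone.left * dpi), round(zone.top * dpi)
    return left, top, left + round(zone.width * dpi), top + round(zone.height * dpi)


def crop(page: list[list[int]], box: tuple[int, int, int, int]) -> list[list[int]]:
    """Cut the boxed region out of a page stored as rows of pixel values."""
    left, top, right, bottom = box
    return [row[left:right] for row in page[top:bottom]]


invoice_number = Zone(left=6.0, top=0.5, width=2.0, height=0.5)  # hypothetical zone
box = zone_to_pixels(invoice_number, dpi=300)
print(box)  # (1800, 150, 2400, 300)
```

Because the zone is stored in inches, the same definition works whether a page was scanned at 200 or 300 dpi.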
41. TIPS for OCR
• Scan at 300 dpi for greater accuracy
and ensure that small text is captured.
• Limit the use of color on documents.
• Pre-process the image with image enhancement software (available in many data capture products).
42. What is Batch Processing?
Intelligent data capture solutions often use batch processing, which lets you process a whole folder of documents at a time. Some products can “watch folders” and process files as they are scanned into the folder.
For an in-depth look, visit: What is Batch Document Processing?
Processing can include indexing, file routing, file splitting,
and cleaning/enhancing the scans. Learn more.
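Products implement “watch folders” in their own ways; the sketch below is a minimal polling version using only the Python standard library. The `process()` step is a hypothetical placeholder (it only prints), and the function names are illustrative. A production tool would also wait until the scanner has finished writing each file and would record processed files persistently rather than in memory.

```python
import time
from pathlib import Path


def process(path: Path) -> None:
    """Hypothetical processing step (indexing, routing, splitting, cleanup)."""
    print(f"processing {path.name}")


def poll_once(folder: Path, seen: set[Path], pattern: str = "*.pdf") -> list[Path]:
    """Process files in `folder` that have not been seen before; return them."""
    new_files = sorted(p for p in folder.glob(pattern) if p not in seen)
    for path in new_files:
        process(path)
        seen.add(path)
    return new_files


def watch(folder: Path, interval_s: float = 5.0) -> None:
    """Poll `folder` forever, handling each newly scanned file once."""
    seen: set[Path] = set()
    while True:
        poll_once(folder, seen)
        time.sleep(interval_s)
```

Calling `watch(Path("scan_inbox"))` would then poll that (hypothetical) folder every five seconds until interrupted.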
45. Preparation, quality control and indexing are the most time-consuming elements of any scanning job, and usually the most costly.
46. TIPS for Document Prep
Typically, a good operator can prepare 750 to 1,000 documents per hour; however, a number of factors may reduce throughput to 300 to 500.
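The throughput figures above translate directly into labor estimates: divide the document count by the hourly prep rate. This is a rough sketch; the 50,000-document job is a made-up example, `prep_hours` is a hypothetical helper, and the eight-hour day is an assumption.

```python
import math


def prep_hours(documents: int, docs_per_hour: int) -> float:
    """Hours of document preparation for one operator at a steady rate."""
    return documents / docs_per_hour


job = 50_000  # hypothetical job size
for rate in (1000, 750, 500, 300):  # good conditions down to difficult material
    hours = prep_hours(job, rate)
    print(f"{rate:>4} docs/hour: {hours:6.1f} hours (~{math.ceil(hours / 8)} eight-hour days)")
```

Dropping from 1,000 to 300 documents per hour takes the same job from 50 to roughly 167 hours of prep.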
47. Factors that Influence Document Prep
• Odd size document types: sales receipts, photos, plans/drawings
• Bindings: three ring, spiral, glue, folder
• Fasteners: staples, paper clips, binder clips, rubber bands
• Attachments: Post-its, tabs
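48. Estimating Volumes and Storage
Average number of images per storage type:
• Paper folders: 30 to 100 simplex, 60 to 200 duplex
• Ring binder: 200 simplex, 400 duplex
• Lever arch folder: 500 simplex, 1,000 duplex
• Transfer cases: 500 simplex, 1,000 duplex
• Bankers boxes: 500 simplex, 1,000 duplex
• Archive boxes: 2,500 simplex, 5,000 duplex
• Filing cabinets: 3,000 simplex, 6,000 duplex per drawer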
Learn more about estimating volumes
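To turn a physical inventory into an image count, multiply the number of each storage type by its average image count. The sketch below uses the simplex averages, takes 65 as the midpoint of the 30 to 100 range for paper folders (an assumption made here), and counts filing cabinets per drawer; the inventory itself is invented. Duplex scanning doubles the figure.

```python
# Average simplex images per container, from the averages above
# (65 is an assumed midpoint of the 30 to 100 range for paper folders).
PAGES_PER_CONTAINER = {
    "paper folder": 65,
    "ring binder": 200,
    "lever arch folder": 500,
    "transfer case": 500,
    "bankers box": 500,
    "archive box": 2500,
    "filing cabinet drawer": 3000,
}


def estimate_images(inventory: dict[str, int], duplex: bool = False) -> int:
    """Estimate scanned images from a count of each container type."""
    pages = sum(PAGES_PER_CONTAINER[kind] * count for kind, count in inventory.items())
    return pages * 2 if duplex else pages


# Hypothetical office inventory
inventory = {"filing cabinet drawer": 12, "archive box": 20, "ring binder": 15}
print(estimate_images(inventory))               # 89000
print(estimate_images(inventory, duplex=True))  # 178000
```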
50. Document Management
Determine whether you require a full document management system or just a simple search and retrieval system.
Can a simple system serve as a stepping stone while you evaluate a full document management system?
By DocuFi
Makers of ImageRamp Data Capture Solutions
30 years’ Experience in the Document Imaging
Market
Proven Fujitsu ISV Partner
Find out more at ImageRamp and
www.docufi.com
53. Image Credits
All images are owned or licensed by DocuFi with acknowledgement given to:
• Pjohnkeane, “Requirements, requirements, requirements”, http://bit.ly/1fcULDf
• Doug Waldron, “Files (85)”, http://bit.ly/1bfciII
• UBC Learning Commons, “Scanner_icon-1024x671”, http://bit.ly/1eewI4P
• Knile, “Lucy, you have some sorting to do!”, http://bit.ly/19bSgjF
• Michael 1952, “SJSA Fifth Grade - I Fell in Love With The Teacher”, http://bit.ly/1eevu9A
• Ton Haex, “Einstein show....”, http://bit.ly/LVqeBi
• Loco Steve, “Sunrise under scrutiny”, http://bit.ly/1eevSVv
• Tax Credits, “Coins”, http://bit.ly/1mtQj5j
• j_baer, “Ubuntu Color Wheel”, http://bit.ly/1jARikx
• Marcin Wichary, “Alphabetical”, http://bit.ly/1aILOku
• David Erickson (e-strategyblog.com), “Hindenburg Disaster”, http://bit.ly/1jASeFF
• William Warby (wwarby), “Gears”, http://bit.ly/1dwtU1S
• Alan Cleaver, “watching”, http://bit.ly/1h1k9k7
• Zoetnet, “overflowing”, http://bit.ly/KHW9Em
• Seattle Municipal Archives, “Comptroller's Office employees, 1960”, http://bit.ly/1eBvLGE
• Seattle Municipal Archives, “City Light worker with office machine, 1954”, http://bit.ly/1eBw3NM
• Patrick Hoesly, “Thank you”, http://bit.ly/17xKErE