
Scanning Magazine, Book and Newspaper Articles for Web Publication and Historical Archives

By Eric Neyerlin of


Scanning Guideline Basics
Scanning and article conversion FAQ
Scanning body text sections for OCR
Scanning Photos for Web Display
Scanning photos for printer output
Index to Sample Screen Shots (not online yet)

INTRODUCTION

 
From time to time we all run across treasured magazine articles, many long out of print, that have great historical significance to a community of fellow enthusiasts and that we would like to share with others online.  Unfortunately, scans of entire 8" x 10" magazine pages, with images included, turn out to be too large in file size to publish on the internet or even to send as e-mail attachments.
The good news is that by using OCR (optical character recognition) technology, the huge file size of scanned text can be reduced to a mere fraction by converting the scanned image of the text page into editable text that a word processor can open and print. The following article offers practical advice to those who wish to convert book, magazine and newspaper articles into editable text and compact graphics suitable for publication on the web or printer output.
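To put rough numbers on that size difference, here is a minimal Python sketch. The 300 dpi, 24-bit color scan and the figure of about 3,000 characters per page are assumptions chosen for illustration, not values from this article, and the image size shown is the uncompressed size (a saved .jpg would be smaller).

```python
def raw_scan_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Uncompressed scan size: pixels wide x pixels high x bytes per pixel."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bytes_per_pixel


# Assumption: a full 8" x 10" page scanned in 24-bit color at 300 dpi.
page_image_bytes = raw_scan_bytes(8, 10, 300)

# Assumption: the same page holds about 3,000 characters, 1 byte each.
page_text_bytes = 3000

print(f"Scanned page image: {page_image_bytes / 1_000_000:.1f} MB")
print(f"Same page as text:  {page_text_bytes / 1000:.1f} KB")
print(f"Ratio: about {page_image_bytes // page_text_bytes:,} to 1")
```

Even allowing for image compression, the text version of a page stays far smaller than any scan of it.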
 
When scanning printed material that mixes text, photos and/or illustrations for OCR processing and eventual republication in HTML format, two separate scanning stages must be performed for optimum results. The aim of this article is to explain the basics of these two separate steps.
 
Step One: All text pages to be OCR'ed must be scanned and saved using scanner control settings different from those used for the photos and continuous tone graphics on the same pages.
 
Step Two: All photos and continuous tone graphics in a paper based publication must be scanned using a different set of scanner settings, plus descreening features (when available), in order for them to be reproduced small enough for a browser to view on the web or to be sent as e-mail attachments over relatively slow modem connections.

Scanning Guideline Basics for all scans:

 
  1. Place the top or bottom edge of the magazine article as square as possible against the top edge of the flatbed scanner’s glass area.  If the page is cocked more than 3% out of square alignment, text may not OCR accurately and graphics may become degraded if later rotated to correct the skewed angle.
 
  2. If available, place a black sheet of art paper behind the text and photo page being scanned to avoid bleed through of text and images from the opposite side of thin magazine paper. If this is not done, a ghost-like bleed-through image of the text and graphics on the back of the page will most likely appear in the scanned image, degrading its usefulness for both OCR and visual sharing with others.
 
  3. To reduce the scanned image's final file size, use the scanner's cropping tool to crop out excess page margins and non-article sections such as advertisements on the page.
  4. Do not digitally "sharpen" the image during scanning or afterwards if scanning for OCR purposes or if you intend to edit the image in an image editing program such as Photoshop. Sharpening can permanently corrupt the image file for later use. Sharpening should only be done as the last step before publishing for online viewing.
 

Scanning body text sections for OCR:

 
All areas of the page that have blocks of body text (not headlines or fancy type), tables of data and black & white line illustrations, need to be scanned at the following settings in order for the OCR program to open and convert them to editable text.
 
Scanner control software settings for OCR conversions:

Scanning Photos for Web Display:

 
Because graphics for the web do not display at any higher resolution than 72 dpi on a common computer monitor, color photos, black and white photos and continuous tone illustrations do not need to be scanned at any resolution finer than 72 dpi.  However, some web designers and graphic prep specialists prefer to scan at roughly twice that resolution (150 dpi) and then scale down to 72 dpi just before saving the final web graphics, because they feel that the final image will look cleaner.
 
Because a 56k modem limits how quickly such scans can be sent over the internet as e-mail attachments, most modem users would be wise to scan at 72 dpi and save in either .gif or .jpg format to keep file size to a minimum.  If both sender and receiver have high bandwidth internet connections with adequate mailbox sizes, then scanning at 72-150 dpi and saving the scans in .tif, .eps or .psd graphic formats will result in a superior quality web display graphic.
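The trade-off between scan resolution and modem transfer time can be estimated with a little arithmetic. The sketch below is illustrative only: the 4" x 6" photo size and the 10:1 compression ratio are hypothetical values, and the transfer time is a best case, since real modem throughput is lower than the rated 56 kbit/s.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of a scan at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def modem_seconds(file_bytes, bits_per_second=56_000):
    """Best-case transfer time in seconds at the rated line speed."""
    return file_bytes * 8 / bits_per_second


# Hypothetical compression ratio for a saved .jpg; real ratios vary by image.
ASSUMED_COMPRESSION = 10

for dpi in (72, 150):
    width_px, height_px = pixel_dimensions(4, 6, dpi)  # assumed 4" x 6" photo
    raw_bytes = width_px * height_px * 3               # 24-bit color
    saved_bytes = raw_bytes / ASSUMED_COMPRESSION
    print(f"{dpi} dpi: {width_px} x {height_px} px, "
          f"about {saved_bytes / 1000:.0f} KB saved, "
          f"about {modem_seconds(saved_bytes):.0f} s at 56k")
```

Doubling the resolution roughly quadruples the pixel count, so file size and transfer time grow much faster than the dpi figure suggests.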
 
 

Scanner control software settings for Photos & continuous tone graphics

 
CAUTION: DO NOT DIGITALLY "SHARPEN" SCANNED IMAGES IF INTENDED FOR OCR PROCESSING OR FURTHER DIGITAL EDITING. YOU COULD CORRUPT THE IMAGE PERMANENTLY OR RENDER IT NON-EDITABLE AFTERWARDS. ONLY SHARPEN AN IMAGE JUST BEFORE YOU INTEND TO PUBLISH IT ONLINE. WHEN SENDING YOUR IMAGES TO A WEB DESIGNER OR PUBLISHER, NEVER SHARPEN YOUR IMAGES. LET THEM DO THAT WHEN THEY THINK BEST.

Scanning and article conversion FAQ.


 
What is descreening? Unlike photographs printed by your local photo finisher, photos and continuous tone images printed by a printing press, such as those in newspapers, magazines and books, are actually made of thousands of very small ink dots referred to as "halftone dots". Because a digital image, such as the one a scanner generates, is also made of small dots of its own, the two dot patterns can interfere with each other when overlaid, producing a strange visual effect called moiré.  Descreen technology attempts to digitally rearrange the relative proximity of these two dot patterns so that the moiré effect disappears or decreases in the final scanned image.
The halftone dot and moiré elimination features included in the scanner control software bundled with most popular scanners are well worth investigating, in order to avoid ugly moiré and hatch patterns being overlaid on attractive graphics as a result of scanning without descreening applied.
 
When a scanner's descreening features work well and the moiré pattern overlay is removed, the visual quality of the final scan will appear to match that of the original printed image.  If your first attempt to use your scanner's descreen feature does not correct the problem sufficiently, try rotating the paper image slightly around an imaginary center axis on the scanner's glass surface and scan again.  Sometimes the descreen feature will be more effective after these slight rotations.
 
Descreen features that come bundled with quality scanners often have multiple configuration settings.  Reviewing your scanner software manual may reveal that you have these very powerful extra features that may provide you with more descreening control than you first suspected.  Be sure to read your manual.
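The interference between the halftone dots and the scanner's own dots can be roughly modeled with the standard aliasing formula from sampling theory. This is a simplified one-dimensional sketch, not a description of how any scanner's descreen feature works, and the 133 lines-per-inch halftone screen is an assumed value for illustration.

```python
def alias_frequency(pattern_per_inch, samples_per_inch):
    """Apparent frequency of a repeating pattern after sampling.

    Standard aliasing formula: the pattern shows up at its distance
    from the nearest whole multiple of the sampling rate.
    """
    nearest_multiple = round(pattern_per_inch / samples_per_inch) * samples_per_inch
    return abs(pattern_per_inch - nearest_multiple)


HALFTONE_LPI = 133  # assumed halftone screen, in dots per inch of print

for scan_dpi in (72, 150, 300):
    apparent = alias_frequency(HALFTONE_LPI, scan_dpi)
    print(f"scan at {scan_dpi} dpi: pattern appears at about {apparent} per inch")
```

In this model, a low-resolution scan turns the fine dot pattern into a much coarser false pattern that is easy to see, while a scan at more than twice the dot frequency records the dots at their true spacing, where they can later be smoothed away.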

Alternatives to De-screening

My friends, Tom and Greg, have also reminded me of an alternative method that I had forgotten to mention. By scanning at higher resolutions, usually around 300 dpi, and then reducing the image file's resolution down to 72 dpi using an image editing program, such as Photoshop or Paint Shop Pro, a person can often create scans of half-toned images that will display with little or no moiré or hatched patterns visible across the 72 dpi image afterwards. Some experimentation may be required for best results, but it can be worth the effort.
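Why reducing a high-resolution scan helps can be seen in a small Python sketch. It uses plain block averaging on a synthetic checkerboard standing in for halftone dots. This is an assumption-laden illustration: image editing programs use their own resampling methods, and the whole-number factor of 4 here (300 dpi down to 75 dpi) is a simplification of the 300 to 72 dpi reduction.

```python
def downsample(pixels, factor):
    """Shrink a grayscale image (a list of rows) by averaging
    each factor x factor block of pixels into one output pixel."""
    out_height = len(pixels) // factor
    out_width = len(pixels[0]) // factor
    result = []
    for block_y in range(out_height):
        row = []
        for block_x in range(out_width):
            block = [
                pixels[block_y * factor + dy][block_x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) // len(block))  # integer average
        result.append(row)
    return result


# Synthetic "halftone": black (0) and white (255) dots, each 2 x 2 pixels,
# arranged as a checkerboard on a 16 x 16 pixel scan.
halftone = [
    [255 if ((x // 2) + (y // 2)) % 2 else 0 for x in range(16)]
    for y in range(16)
]

small = downsample(halftone, 4)
for row in small:
    print(row)
```

Each output pixel averages an equal number of black and white dots, so the dot pattern collapses into an even mid-gray, which is the tone the halftone was printed to represent.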

Another alternative is to use special filters found in some image editing programs that attempt to detect and correct moiré and hatch patterns in a scanned image. I have found the results to be somewhat hit and miss.
 

How much to scan at one time? 

 
 
Scanning article photos for digital printing instead of web display.  For best results, scan all graphics for printer output at 300 dpi or higher; otherwise images may reproduce as jagged or fuzzy looking rather than sharp.  For best printing quality, scanned image files should be saved in .tif, .eps or .psd format rather than .gif or .jpg.
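The 300 dpi guideline translates directly into pixel counts. The following Python sketch shows the arithmetic in both directions; the 4" x 6" print size is just an example value.

```python
def pixels_needed(width_in, height_in, dpi=300):
    """Pixel dimensions required to print at the given size and resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def max_print_size(width_px, height_px, dpi=300):
    """Largest print size, in inches, an image supports at the given resolution."""
    return width_px / dpi, height_px / dpi


# Example: a 4" x 6" print at 300 dpi.
print(pixels_needed(4, 6))

# The same photo scanned at 72 dpi for the web is only 288 x 432 pixels,
# which covers a much smaller area when printed at 300 dpi.
print(max_print_size(288, 432))
```

A scan made at 72 dpi for web display holds too few pixels to print at its original size without looking jagged or fuzzy, which is why photos meant for printing need a separate, higher-resolution scan.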
 
 

©2001 Eric Neyerlin - all rights reserved.

No part of this article may be displayed or reproduced in any form of written, digital or online media nor sold in any form without written permission of the author. This restriction also applies to any website attempting to display this page within a frame of another website not authorized by the author to do so.
http://www.zparts.com
eric@zparts.com
Last Updated January 30, 2004