Document Number 2204
Optical Character Recognition Technology Information
May 1993

In the Beginning . . .

Although optical character recognition (OCR) has only recently been 
popularized, OCR, or at least the concept of OCR, has existed since the 
beginning of the nineteenth century.  In 1809, the first patents for reading 
devices to aid the blind were awarded.  These inventions were the first real 
"seeds" of OCR's development.  

The next 100 years saw numerous advances in optical scanning. One important 
invention was the "retina scanner" that used a mosaic of photocells in an 
image transmission system.  Another important milestone in the evolution of 
OCR was the invention of the "Nipkow Disk," a sequential scanning disk which 
made possible the technique of line-by-line analysis of images, as well as 
other future innovations. For example, the principle of Nipkow's sequential 
scanning process is used in the operation of modern television cameras and is 
incorporated in many current OCR systems.  

Shortly before World War I, the first true "readers," or machines that were 
able to convert printed characters into another form, were made commercially 
available.   In 1912, Emmanuel Goldberg patented a machine which directly read
characters and converted those symbols into standard telegraph code.  
Goldberg's machine read typed messages, converted  them to paper tape, and 
then used the tape to transmit telegraphic messages over wires without human 
intervention.  His invention demonstrated a practical application of OCR.  
Around the same time, but independently, Fournier D'Albe invented an OCR 
device called the "Optophone."  The Optophone was a hand-held scanner that 
optically scanned printed material and produced a series of audible tones 
while being moved along a page.  Each tone corresponded to a specific letter 
or character, which allowed a visually impaired person to interpret written 
material.  In the late 1920's, AT&T patented systems which scanned messages 
and encoded them into "Morse Code" for telegraphic transmission.   

Emmanuel Goldberg was responsible for yet another significant development in 
1931, when he patented a device that searched photographic transparencies of 
data records and attempted to match them against a template of the desired 
search pattern. The hypothesis behind this system was that once a match was
located, the coincidence of pattern would cause a light source to be 
completely blocked from a detection device, more specifically, a photographic 
cell.  This idea was the origin of "template matching," a technique that was 
applied in the first working character readers, which appeared in the 
1950's.  

In the mid 1940's, the birth of the electronic data processing industry 
created the need for a productive method of data entry.  Although IBM entered 
the optical scanning field in 1938, and was awarded various OCR-related 
patents, including one for a "Light Sensitive Device," the computer pioneer 
made no attempts to market commercial OCR devices until after 1960.

A nonscientific article in the mid 1950's introduced the public to the first 
piece of OCR equipment with commercial potential -- an invention named 
"Gismo."  Developed by a Department of Defense engineer, Gismo 
was capable of reading 23 letters of the alphabet, which had been
produced by a standard typewriter.  Gismo could also understand Morse Code, 
read musical notations, and even read aloud from printed pages.  The inventor 
was quoted as saying that once "Gismo" got into production, the machine would 
have about 99.9-percent reading accuracy and would sell for approximately 
$1000.00.  Of course, this was all theory in 1950.  But it generated 
substantial enthusiasm and pointed to a bright future for OCR.

Shortly after "Gismo" captured the public's attention, the same engineer 
founded Intelligent Machines Research Corporation (IMR).  IMR developed and 
applied OCR technology to the problems and needs of commercial data 
processing.  The company went on to achieve a major first in OCR with the 
installation of a commercial OCR reader at Reader's Digest in New York in 
1954. The initial reader was used to convert typewritten documents (sales 
reports) into punched cards for input into the subscription department 
computer.  This equipment enabled the magazine to reduce order processing 
time from one month to a little more than a day.  The Reader's Digest 
scanner, often cited as "paying for itself" twice each year, had already read 
its billionth character by September of 1959.  

Numerous other companies were early adopters of OCR:  First National City 
Bank, New York (processing travelers' checks); National Biscuit Company 
(converting sales records to cards); AT&T (dividend checks and stockholder 
records); Ohio Bell Telephone; Arizona Public Service Company; Atlantic City
Electric (cash accounting); and numerous government agencies, including the 
U.S. Post Office.  At this time, most OCR systems were combined hardware and 
software devices that cost hundreds of thousands of dollars and were 
restricted to reading two specialized fonts: OCR A and OCR B.

"Matrix matching" dominated OCR technology during the 1970's. In matrix 
matching systems, the software compares small parts of each bit-image scanned 
to bit-patterns stored in a library, finding which stored character is the 
most similar to the bit-pattern scanned.  However, the large variety of fonts, 
type sizes, and styles created a major problem for matrix matching.  For 
example, an Italic "A" has a different pattern from a Roman "A," even within 
the same size and type family.  Because of this, a matrix-matching OCR system 
must either have an enormous library of bit-patterns (which requires a 
time-consuming search for each match) or be limited to matching a few type 
styles.
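
The comparison step described above can be sketched in a few lines of Python.
This is only an illustration under stated assumptions, not the method of any
particular product: the 5x5 bitmaps, the three-character library, and the
names mismatches and match are all hypothetical, and similarity is measured
simply by counting the pixels that differ.

```python
# Hypothetical library of 5x5 bit-patterns; '#' is ink, '.' is background.
LIBRARY = {
    "I": ["..#..", "..#..", "..#..", "..#..", "..#.."],
    "L": ["#....", "#....", "#....", "#....", "#####"],
    "T": ["#####", "..#..", "..#..", "..#..", "..#.."],
}


def mismatches(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def match(glyph, library=LIBRARY):
    """Return the stored character whose bit-pattern differs least from glyph."""
    return min(library, key=lambda ch: mismatches(glyph, library[ch]))


# A scanned "T" with one stray pixel of noise in the fourth row.
noisy_t = ["#####", "..#..", "..#..", ".##..", "..#.."]
print(match(noisy_t))
```

Every stored pattern is examined for every scanned character, so the search
time grows with the size of the library, which is the cost noted above.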

Matrix matching systems are commonly referred to as "trainable," since they 
allow the user to "train" the program to recognize different fonts.  
Generally, after a document has been scanned, the program separates out what 
it believes to be character images and asks the user to identify each image.  
It then stores each bitmap as the assigned character in its library and 
matches later images against that collection of bitmaps in order to identify 
characters.  This process is very time-consuming, given the number of fonts 
available today.
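
As a rough illustration of the training cycle just described, the sketch
below stores each user-identified bitmap and then matches later images
against the stored collection.  The class name TrainableMatcher, its methods,
and the tiny 3x3 bitmaps are assumptions made for this example only.

```python
def mismatches(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(pa != pb for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


class TrainableMatcher:
    """Hypothetical trainable matcher: the library grows as the user labels images."""

    def __init__(self):
        self.library = []  # list of (bitmap, character) pairs

    def train(self, bitmap, character):
        """Store a bitmap under the character the user assigned to it."""
        self.library.append((tuple(bitmap), character))

    def recognize(self, bitmap):
        """Return the character of the closest stored bitmap, or None if untrained."""
        if not self.library:
            return None
        best = min(self.library, key=lambda entry: mismatches(bitmap, entry[0]))
        return best[1]


matcher = TrainableMatcher()
matcher.train(["#.#", "#.#", "###"], "U")  # user identifies this image as "U"
matcher.train(["###", "#.#", "###"], "O")  # user identifies this image as "O"

# A later scan of an "O" with one missing pixel is matched to the stored "O".
print(matcher.recognize(["###", "#.#", "##."]))
```

Each new font requires another round of train calls, so the library, and the
time spent building it, keeps growing.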

In 1974, a company named Kurzweil was formed to extend the capabilities of OCR
beyond a fixed set of fonts.  The company's initial goal was to enable 
blind people to hear written documents through OCR software and voice 
synthesis.  A new technology was sorely needed, since matrix matching was 
becoming increasingly difficult, as word processors and laser printers gave 
rise to a rapid proliferation of fonts and heavily kerned, touching text. The 
technology pioneered by Kurzweil for the blind was called "OmniFont."

OmniFont, also known as "feature extraction," looks at the features of a 
character to recognize it, instead of looking at the entire letter and 
matching it to a letter in its library. The features of each character are 
matched to the features of a known character.  For example, a figure 
characterized by two slanted lines with a horizontal line across the center 
is an "A."  A vertical line with a circle attached on the lower right-hand 
side is a "b."  If the circle is on the other side, it is a "d."  OmniFont 
works on 
most normal fonts because most fonts, as different as they are, share the same 
features.  
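
The "b" versus "d" rule above can be illustrated with a minimal Python
sketch.  It is not Kurzweil's or Caere's algorithm; it assumes upright glyphs
whose vertical line is the column with the most ink, it handles only these
two letters, and the names bowl_side, FEATURE_TABLE, and classify are
hypothetical.

```python
def bowl_side(glyph):
    """Report whether the circle lies to the 'right' or 'left' of the vertical line."""
    width = len(glyph[0])
    column_ink = [sum(row[x] == "#" for row in glyph) for x in range(width)]
    stem = column_ink.index(max(column_ink))  # column with most ink = vertical line
    ink_left = sum(column_ink[:stem])
    ink_right = sum(column_ink[stem + 1:])
    return "right" if ink_right > ink_left else "left"


# Generic feature table: it holds one entry per character, not one per font.
FEATURE_TABLE = {"right": "b", "left": "d"}


def classify(glyph):
    """Classify a glyph as 'b' or 'd' from the side its circle is on."""
    return FEATURE_TABLE[bowl_side(glyph)]


small_b = ["#...",
           "#...",
           "###.",
           "#..#",
           "###."]

large_d = [".....#",
           ".....#",
           "..####",
           ".#...#",
           ".#...#",
           "..####"]

print(classify(small_b), classify(large_d))
```

The two sample glyphs differ in size and shape, yet the same two-entry table
handles both, because only the shared feature is examined rather than the
exact bit-pattern.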

The major benefits of OmniFont over matrix matching are speed and the ability 
to read most normal fonts.  The increase in speed comes from keeping the 
table of samples small relative to the number of fonts supported.  A matrix 
matching table can include multiple samples of each character and grows as 
the user trains it.  OmniFont uses only a table of generic features, which 
does not increase in size as fonts are added and therefore makes the search 
process much quicker.

In 1976, DEST pioneered an OCR solution for the business and office market, 
and in 1980, introduced a product called the Workless Station.  The company 
claimed that the Workless Station garnered 65% of the flatbed scanner market.  
However, the product was specialized and not for the mass commercial market.

In 1988, Caere brought OCR to the mass commercial market with the OmniPage 
product -- an OmniFont OCR package aimed at the rapidly expanding flatbed 
scanner market.  What had cost many thousands of dollars and had run only on 
expensive hardware was now offered to owners of personal computers with 
flatbed scanners.

OCR on Every Desktop

Scanners -- the electronic "partner" of OCR -- give "eyes" to the computer by 
providing a bridge between the analog world of everyday reality and the 
digital world of the computer.  But before Caere revolutionized the scanner 
market with the introduction of OmniFont technology, flatbed scanners were 
seen as devices for capturing images, not text.  Today, scanners are seen as 
both graphics and text solutions.

Until recently, quality images could only be captured and digitized with 
extremely expensive flatbed and sheetfed scanners.  However, the same 
functionality and sophistication are now available in the smaller, more 
affordable hand-held scanner.  As a result, hand-held scanners have evolved 
from a tech toy for computer hobbyists into integral, productive desktop 
tools for business people as well as home users. 

As this evolution takes place, users are demanding capabilities beyond image 
capture, as they purchase hand-held scanners to create complex documents that 
incorporate both text and graphics.  Flatbed scanners are already able to 
perform optical character recognition (OCR) at a high level of speed and 
accuracy; the challenge lies in bringing this capability to the hand-held 
scanner.  

In 1988, Logitech introduced the first ScanMan hand-held scanner and brought 
scanning to the individual desktop.  The unit was intended for graphics 
scanning and limited to 200 dpi hardware resolution.  In addition, it was 
difficult to scan straight with this early model, which contained only one 
set of rollers. Thus, OCR was not a recommended use for the scanner.  What's
more, initial OCR packages for hand-held scanners were expensive and, in many 
cases, too slow and inaccurate to truly enhance individual productivity. 

ScanMan Plus for DOS, introduced by Logitech in late 1989, paved the way for 
OCR in the hand-held environment.  With its 400 dpi hardware resolution, 
extra set of rollers, straightedge head design, and scanning speed 
indicator, ScanMan Plus enabled users to control their scans and achieve a 
level of resolution necessary for OCR.  

The first version of CatchWord, a DOS-based OCR package by Logitech, followed
the introduction of ScanMan Plus.  CatchWord marked the second stage in the 
evolution of Logitech hand-held scanners into highly functional, multipurpose 
input devices. CatchWord used OmniFont technology, giving hand-held scanners
the flexibility to capture a wide range of fonts and styles. CatchWord was 
also able to capture full pages of text by stitching together two partial 
scans of the page.

In 1992, Logitech introduced CatchWord Pro for Windows. CatchWord Pro for 
Windows represented a new generation of OCR software that kept the special 
requirements of hand-held scanners in mind.

Logitech is now directly partnering with the acknowledged market leader in OCR 
software for the personal computer and the founder of OmniFont technology -- 
Los Gatos, Calif.-based Caere Corporation.  Caere is tailoring its popular 
OmniPage Direct product for use with Logitech's Windows-based ScanMan 
hand-held scanners.  The application -- OmniPage Direct for Logitech -- is 
positioned as an affordable basic utility designed to meet the needs of 
ScanMan users who wish to capture a few pages of text to incorporate into 
other documents.

Acknowledgment

Much of the history of OCR was obtained with permission from the book The 
History of OCR by Herbert Schantz.  Herbert Schantz is the Director and Vice 
President of the Recognition Technology Users Group and a member of the 
OCR/Scanner/Fax Association.  He has written many papers and given numerous 
presentations on the theory, economics, and application of OCR dating back to 
1969.  Logitech would like to thank him for writing such an exciting and 
informative book on a subject about which relatively little has been written.
