Napster, MP3, YouTube, iPhone and MySpace. You may be asking yourself, “What does this have to do with document scanning?” In reality, not much, other than large file sizes. However, when we draw an analogy with large audio files such as the WAV and MP3 files found on Napster, and with video files such as those found on YouTube and MySpace, which you can play on a mobile device like the new iPhone, we can get an appreciation of the challenges of sharing large files. And because viewing document images has become just as important, if not more important, from a business perspective, we need a clear understanding of the general technological trends in information sharing and of the trend towards higher resolution scanning.
In the perfect world of scanning technology, someone would drop a document into the scanner’s automatic document feeder, scan the page and “voila”, all the vital business data has been automatically extracted for immediate use by an Enterprise Content Management (ECM) system or for general retrieval via a keyword search. This is similar to using a search engine to find the information we are looking for. Sounds like magic? It nearly is, but there are many underlying technologies that create this magic. Technically speaking, advanced forms processing, or the ability to perform these sophisticated tasks automatically, is a reality that is available today, and this ‘magic’ starts with high quality scanned images that closely resemble the original document. In Automated Forms Processing applications there is a lot going on behind the scenes, and the quality of the image, good or poor, dictates the success or failure of the other related processes in the document imaging system. These features need to be highly functional, extremely accurate and transparent to the users themselves.
In a recent study of scanner users Susan Moyse of Moyse Technology Consulting summed up the current trend quite well, “Scanner users need applications that do more automatically. This requires vendors to deliver sophisticated functionality almost invisibly. The less these users know of the underlying technologies the better. Business users just want their scanning solutions to solve their problems.”
Traditional Obstacles Addressed By Advanced Technology
Advanced features are great from the standpoint that custom systems can be designed by systems integrators, value-added resellers or a professional services organization to fit individual business needs. An effective document capture system is a system that operators don’t have to think about. Capturing more dots per inch at scan time gives your scanning solution the greatest chance of automation success. Most likely no solution will be absolutely perfect. Nevertheless, good image quality, more dots per inch and great paper handling can dramatically increase your level of automatic document capture.
There are advanced techniques such as automatic document classification, document separation and free-form processing, all of which depend greatly on the computer being able to read the dots on scanned pages in order to make intelligent and critical decisions about these images. After all, garbage-in is garbage-out, and your document capture solution is the on-ramp that transforms paper into usable electronic data. Most often you get one chance to capture these images before they are filed in a permanent archive or the physical paper is destroyed forever.
To understand these trends and to develop our hypothesis for the future of document scanning, we must evaluate what inhibited the sharing of large files in the early days of file sharing. While the ability to share audio, video and document images has been around a long time, this sharing was limited by some rather common factors. What all of these file formats have in common is that their files have historically been large and difficult, if not impossible, to use over computer networks. Let’s take a look back into the not-so-distant past and get a glimpse of what ultimately made the likes of YouTube, MySpace and Napster successful and what will drive the trend of scanning at higher resolutions for automation. One of the most obvious drawbacks to sharing large files was the lack of bandwidth. Whether it was a remote user on a dial-up connection or a corporate network that had not been planned with the sharing of large files in mind, customer dissatisfaction was high and people were reluctant to use these services because of the frustration of waiting for large downloads to complete. Likewise, video sharing had been, until recently, slow to be adopted by many of us. Times are changing on the bandwidth front, however, and we need to refer to history to understand what limited the adoption rate of these technologies.
Contributing Factors to the Trends Towards Higher Resolution Scanning
Most leading Automated Forms Processing software companies recommend scanning at a minimum resolution of 300 dots per inch for effective data extraction. In other words, for every square inch of paper the scanner captures 300 dots horizontally and 300 dots vertically, or 90,000 total dots (300 x 300 = 90,000 dots per square inch). Automated data extraction reduces manual intervention tasks such as ‘key index values from images’, which in turn decreases costs and improves efficiency. Some extraction techniques, which you might be familiar with, include Optical Character Recognition (OCR), Intelligent Character Recognition (ICR) and Optical Mark Recognition (OMR).
Presume we settled for scanning at 200 dpi resolution. We would have captured only 40,000 total dots per square inch (200 x 200) versus 90,000. Why is this important? Below is an illustration which demonstrates how file sizes grow incrementally when scanning at higher resolutions or in color.
Higher Resolution Scanning Equals Improved Automated Accuracy
“The accuracy of the OCR systems declined dramatically when the resolution of the images was reduced from 300 to 200 dpi…”
Source: The Fourth Annual Test of OCR Accuracy (http://stephenvrice.com/images/AT-1995.pdf)
“Scan resolution: The number of dots per inch can affect the clarity of the image and accuracy of OCR. Recent tests found that reducing from 300 dpi to 200 dpi increased the OCR error rate for a complex document by 75%…”
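The dot counts above, and the file sizes they imply, can be worked through in a few lines. This is a minimal sketch rather than a vendor formula: it assumes a letter-size page (8.5 x 11 inches, which the text does not specify), counts uncompressed data only, and ignores the row padding that real image formats add. The function names are illustrative.

```python
# Assumed page size in inches (letter); the article does not name one.
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0


def dots_per_square_inch(dpi: int) -> int:
    """Dots captured in one square inch at the given resolution."""
    return dpi * dpi


def uncompressed_bytes(dpi: int, bits_per_pixel: int) -> int:
    """Raw image size for one page, before any compression.

    bits_per_pixel is 1 for black & white (bi-tonal) and 24 for color.
    Row padding used by real file formats is ignored.
    """
    width_px = round(PAGE_WIDTH_IN * dpi)
    height_px = round(PAGE_HEIGHT_IN * dpi)
    return width_px * height_px * bits_per_pixel // 8


for dpi in (200, 300):
    print(
        f"{dpi} dpi: {dots_per_square_inch(dpi):,} dots per square inch, "
        f"bi-tonal {uncompressed_bytes(dpi, 1):,} bytes, "
        f"color {uncompressed_bytes(dpi, 24):,} bytes"
    )
```

Moving from 200 to 300 dpi multiplies the dot count, and therefore the uncompressed size, by 2.25 (90,000 / 40,000), and saving in 24-bit color instead of bi-tonal multiplies it by a further 24.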
So the question is, “Why wouldn’t everyone simply scan documents at 300 dots per inch?” Traditionally, several legitimate concerns have made higher resolution scanning unattractive to users and systems operators. These include limited bandwidth (as in the audio and video file size scenarios), lossy image compression technology, physical scanners that might slow to two-thirds or less of the speed they are rated for at 200 dpi when scanning at higher resolutions, and the larger file sizes that higher resolutions create. Now, through advanced technologies and innovation, the document capture industry is addressing all of these obstacles, which should truly enhance the adoption rate of higher resolution scanning. Let me be specific about each:
• Increased Bandwidth for Remote Users and Corporate Networks –
For those of you who have tried sending a large file via your e-mail client, you can certainly relate to the ‘pain’ involved in sending even one file over a low bandwidth connection. Now, just imagine a customer service operator who has to retrieve hundreds of images during the normal course of a work day. Decreased costs and better availability of higher bandwidth networking components allow network administrators, and even remote users, to upgrade to high speed connections such as T1 internet lines, DSL, cable modems, Gigabit routers and cabling, or even fiber optic networks. All of this bodes well for the future of sharing large file types, including audio, video and scanned images.
• Improved Image Compression Techniques of Scanned Images –
Many new image compression techniques have been introduced recently which drastically decrease the file sizes of both color and black & white images while still retaining great image quality. Previously, some compression techniques produced poor image quality that would sharply reduce automatic forms processing accuracy. In addition to better and more highly compressed images, technology such as Automatic Color Detection can determine at scan-time whether to save a scanned image in black & white or in color, thus eliminating the need to separate documents into stacks of bi-tonal and color pages. A bi-tonal image compresses far more compactly than a color one, so this is an ideal example of combining emerging technologies for the benefit of users and systems administrators.
• Scanning Higher Resolutions at Rated Speed –
Just as your car’s engine is designed to reach a maximum speed based on the combination of its parts, your document scanner is only as good as its weakest link. Certain document scanners these days have been engineered specifically to perform at rated speed while scanning in higher resolution modes. They excel at Automated Forms Processing tasks and eliminate the need to sacrifice accuracy for throughput.
• Decreased Storage Costs –
When storage cost a dollar, or several dollars, per megabyte, businesses had to make a serious decision about their choice of data storage medium. At the time, the choice was among low-capacity/high-availability hard disk drives, which were expensive; optical disks, which offered moderate capacity and moderate availability at a mid-range price; and tape drives, which were typically high-capacity/slow-availability but the most affordable. Times have changed quickly with the evolution of CD-ROMs, DVDs and extremely high-capacity hard disk drives. The storage industry has reached the ‘critical mass’ stage where vendors are creating great technology while competing for market share, which drives down costs to users. Businesses and individuals are consuming data storage devices at a greater rate, and the end of this trend seems to be nowhere in sight. Increased storage capacities, smaller form factors and decreased costs are a clear trend and bode well for the storage of large files.
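The bandwidth and storage points above can be put into rough numbers. The following is a minimal sketch, not a sizing tool: the link speeds are nominal line rates, protocol overhead and latency are ignored, and the per-page size and daily page volume in the example are hypothetical values chosen purely for illustration.

```python
# Nominal line rates in bits per second.
DIAL_UP_BPS = 56_000            # 56k dial-up modem
T1_BPS = 1_544_000              # T1 line
GIGABIT_BPS = 1_000_000_000     # Gigabit Ethernet


def transfer_seconds(file_bytes: int, link_bps: float) -> float:
    """Idealized time to move one file; ignores overhead and latency."""
    return file_bytes * 8 / link_bps


def archive_bytes(pages_per_day: int, bytes_per_page: int, days: int) -> int:
    """Total storage consumed by a steady daily scanning volume."""
    return pages_per_day * bytes_per_page * days


# Hypothetical example values, not measurements.
EXAMPLE_PAGE_BYTES = 100_000    # one compressed scanned page
EXAMPLE_PAGES_PER_DAY = 500
EXAMPLE_WORK_DAYS = 250

for name, bps in (("dial-up", DIAL_UP_BPS), ("T1", T1_BPS), ("Gigabit", GIGABIT_BPS)):
    seconds = transfer_seconds(EXAMPLE_PAGE_BYTES, bps)
    print(f"{name}: {seconds:.4f} s per page image")

total = archive_bytes(EXAMPLE_PAGES_PER_DAY, EXAMPLE_PAGE_BYTES, EXAMPLE_WORK_DAYS)
print(f"Archive after {EXAMPLE_WORK_DAYS} working days: {total / 1e9:.1f} GB")
```

Substituting your own page sizes and volumes shows quickly whether retrieval time or archive growth is the tighter constraint for a given installation.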
Benefits of Higher Resolution Scanning to Automation
Consider that Automated Forms Processing involves computer-based intelligence to make crucial decisions concerning your scanned images. For example: Classification: What type of document is this? Separation: How many pages is the form? Anchor Points or Free-Form: Where is the information on the page? Quality Control: Are these characters meeting my defined accuracy criteria? Essentially, scanning hardware and software technologies have progressed to a level of automation that allows for sophisticated document capture, advanced forms processing and mission critical data extraction, all of which can be completely transparent or invisible to the user. However, this high level of automation begins with high resolution scanning. The ability to drop a document into the scanner’s automatic document feeder and perform these advanced tasks has become a reality, without the sacrifices traditionally associated with higher resolution scanning.
The trend towards more and more distributed scanning is obvious. As more document scanners find their way into the workplace, the demand for sophistication that remains invisible to the user will continue. Appreciate the technology, yet allow users to be experts in their respective professions instead of having to become scanning experts as well. Capture more dots per inch with higher scanning resolutions and give your document capture system the greatest chance for success.
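The Quality Control question above, whether characters meet a defined accuracy criterion, is often handled by routing low-confidence results to a person. The sketch below illustrates that idea only. It assumes the recognition engine reports a confidence score between 0 and 1 for each extracted field; the `RecognizedField` type, the field names and the 0.90 threshold are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RecognizedField:
    name: str          # e.g. "invoice_no" (hypothetical field name)
    text: str          # what the recognition engine read
    confidence: float  # assumed to range from 0.0 to 1.0


def split_for_review(
    fields: List[RecognizedField], threshold: float = 0.90
) -> Tuple[List[RecognizedField], List[RecognizedField]]:
    """Separate fields that pass the accuracy criterion from those
    that should be keyed or checked by an operator."""
    accepted = [f for f in fields if f.confidence >= threshold]
    review = [f for f in fields if f.confidence < threshold]
    return accepted, review


# Hypothetical results for one scanned form.
fields = [
    RecognizedField("invoice_no", "12345", 0.98),
    RecognizedField("total", "1O0.00", 0.62),  # letter O misread for zero
]
accepted, review = split_for_review(fields)
print("accepted:", [f.name for f in accepted])
print("needs review:", [f.name for f in review])
```

The better the source image, the fewer fields fall below the threshold, which is where higher resolution scanning reduces manual keying.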