Archival science / علم الأرشيف

This specialized blog presents and discusses scholarly and professional issues related to archives management and publishes news from the field. In this space we present ideas relating to archival science and records management and gather technical and scientific publications on the field.

Thursday, November 26, 2009

Best Practices for Scanning

Information that organizations need comes from many sources. Increasingly, that information arrives in electronic format, making it easy to include it in our repositories. Some information, however, still arrives in hard copy and should be scanned for ready accessibility by the appropriate people in your organization.

While converting these paper documents to electronic images can be time-consuming, following best practices makes it easier. The following standards will help you do just that:

ISO 10196, Document Imaging Applications – Recommendations for the Creation of Original Documents
AIIM MS52, Recommended Practice for the Requirements and Characteristics of Original Documents Intended for Optical Scanning
AIIM TR15, Planning Considerations, Addressing Preparation of Documents for Image Capture
These three standards provide guidance on converting hard-copy, paper documents to electronic images by identifying issues you may encounter when preparing them for capture. They will also help you understand the physical characteristics of paper documents that can make scanning difficult.

If your organization is initiating a scanning operation, these documents will also help in the process design.

It’s not enough to simply have the right scanning processes in place, however; the legibility of the images captured is extremely important, especially when you intend to discard the original paper. The following standards will help you ensure that the quality of the images is the best it can possibly be by providing test targets to be used with your scanner as well as procedures for sampling the documents that have been scanned.

ISO 12653-1, Electronic imaging – Test Target for the Black-and-White Scanning of Office Documents – Part 1: Characteristics
ISO 12653-2, Electronic imaging – Test Target for the Black-and-White Scanning of Office Documents – Part 2: Method of Use
AIIM TR34, Sampling Procedures for Inspection by Attributes of Images in Electronic Image Management (EIM) and Micrographics Systems

When used correctly, test targets will help you ensure that your scanner operates at peak performance. If you’re conducting a large operation, it’s not feasible to inspect the image quality of each and every scanned document. The use of sampling, whereby you inspect the quality of a representative sample of the scanned images, as well as the indexed information, is an acceptable means of ensuring that the quality of the documents captured meets your organization’s requirements.
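As a rough illustration of the sampling idea, the sketch below draws a simple random sample of scanned files for manual inspection. The function name `pick_inspection_sample`, the 5% rate, and the file names are assumptions made for this example; they are not taken from AIIM TR34, which defines its own sampling procedures.

```python
import math
import random


def pick_inspection_sample(image_names, rate=0.05, minimum=1, seed=None):
    """Return a random subset of image_names for manual quality inspection.

    rate and minimum are illustrative assumptions, not values from a standard.
    """
    names = list(image_names)
    if not names:
        return []
    size = max(minimum, math.ceil(len(names) * rate))
    size = min(size, len(names))
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    return sorted(rng.sample(names, size))


# Hypothetical batch of 1,000 scanned images.
batch = [f"scan_{n:05d}.tif" for n in range(1, 1001)]
sample = pick_inspection_sample(batch, rate=0.05, seed=2009)
print(len(sample), "of", len(batch), "images selected for inspection")
```

The same selection can be used to check the index data entered for each sampled image, since both the image and its indexing need to meet your requirements.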

Large-Format Scanning -- Key Considerations



When documents are larger than legal-size paper, you need to think about a few extra steps to ensure proper capture. To paraphrase the warning on rearview mirrors, “Drawings in the computer are bigger than they appear.”

As a service provider for large-format scanning, I often encounter the mistaken belief that scanning large-format documents is simple. It’s not! These projects are larger than they look and have hidden costs and traps. Based on my experience, here are some key considerations for running a productive large-format scanning project.

Document Prep
Large-format drawings can require a lot of repair and preparation. Pulling staples, removing notes, taping up torn edges, and the like can involve more work than you think. You may also need to remove the drawings from sticks (hanging racks) or books. Large documents are often rolled up and stored (sometimes for decades) in pigeonholes. Drawings that have been rolled for long periods of time need to be either reverse-rolled or flattened out for at least 48 hours.

Staples can be located in the middle of a drawing, with a document attached. Understanding the purpose of these attachments is critical before you begin. Many times, clients will tell us that there aren’t any documents or notes on the drawings. However, there are ALWAYS exceptions (I’m tempted to say that, at times, exceptions are the rule). Be prepared.

In addition, there can be dust (from storage) that needs to be vacuumed from the drawings. Dust can damage a scanner’s glass, so you must address this issue up front. Remove as much dust as possible to avoid replacing the glass. A scratch in the glass can look like a piece of dirt and create a line on the scanned image that is not on the original drawing.

No laundry service for drawings. Don’t be surprised if you have to get an ironing board and iron drawings to get them flat. We have ironed drawings on previous projects to get them flat enough for scanning. Just glad we were scanning a hotel’s records -- they provided the iron and ironing board. ;-)

Sorting
Some projects are more effective when the drawings are sorted by size and/or by quality. This minimizes the amount of time spent changing settings during the scanning process. Length matters too, because the PDF and JPG formats limit how long an image can be. You can scan a drawing longer than 100 inches, but you cannot save it as a PDF at that length.

Control Numbers
During all projects, we apply a unique, sequential ID number to each drawing. This number shows that the drawing has been scanned. Because the types of media vary, we use stickers (as opposed to a Bates stamper), since ink will bleed on Mylar. The number also helps with quality assurance issues.

For example, one client took the scanned drawings and moved them to another location. A few years later and with new staff, they were identifying drawings that needed scanning. The control number, which was a small sticker with a sequential number, was the only way they were able to tell whether these drawings had been scanned, allowing them to avoid duplicating work.

File Naming
Special file naming when drawings will be indexed into a database is, in my opinion, a waste of resources. If you don’t have a document management system, then special file naming makes sense. If you are scanning projects (sets of drawings), you can batch-name the files so that every drawing carries the project name followed by a sequential number.

You can also do simple batch file naming with two or three characters at the beginning of the file name, followed by sequential numbers. This works for separating facilities or other types of categorization without special file naming.
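The batch naming described here can be sketched in a few lines. The function name `batch_file_names`, the underscore separator, the four-digit padding, and the sample prefixes are all assumptions chosen for illustration; scanning software usually offers its own batch naming settings.

```python
def batch_file_names(prefix, count, start=1, digits=4, extension="tif"):
    """Build names such as 'WH_0120.tif' for one batch of drawings."""
    return [
        f"{prefix}_{number:0{digits}d}.{extension}"
        for number in range(start, start + count)
    ]


# Hypothetical project name followed by sequential numbers.
print(batch_file_names("TOWER-REMODEL", 3))
# Hypothetical two-character facility code as the prefix.
print(batch_file_names("WH", 2, start=120))
```

Zero-padding the sequence keeps the files in scan order when they are sorted by name.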

Scanning
Single page or multipage? Sets of drawings can be scanned as single images, or you can create a multipage document. Just remember that if you are scanning to a multipage document and have a color scan in the middle of it, you may have backed yourself into a corner. Scanning to a multipage TIFF will work with monochrome documents, but you will need to change your format to PDF to get monochrome and color images into one file. I am confident you will not want a color TIFF because of the file size.

Color -- Or Not
Many times, color or shading within a drawing carries additional information. Highlighted data can become a black block if you scan it as monochrome with an incorrect setting. Scanning in grayscale or color captures that information so the data is as readable as it is in the original. You can also batch-convert the grayscale images back to monochrome with software. Keeping the grayscale scan means you can always adjust and enhance the image later, which you cannot do with a monochrome image.
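A minimal sketch of why a monochrome setting can turn highlighted data into a black block: converting to monochrome applies a threshold, and everything darker than the threshold becomes black. The gray values, the threshold values, and the function name `to_monochrome` below are hypothetical and chosen only to show the effect.

```python
def to_monochrome(gray_rows, threshold=128):
    """Convert grayscale pixels (0 = black, 255 = white) to 1-bit values.

    Returns rows of 0 (black) and 1 (white).
    """
    return [[1 if pixel >= threshold else 0 for pixel in row] for row in gray_rows]


# One row of a hypothetical scan: white paper, highlighter over text, white paper.
# The highlighter reads as mid-gray (110) and the text under it as dark (30).
row = [250, 110, 30, 110, 30, 110, 250]

# With a threshold of 128, the highlight and the text both fall below it.
print(to_monochrome([row], threshold=128))
# With a threshold of 80, only the text falls below it.
print(to_monochrome([row], threshold=80))
```

Because the grayscale file keeps the original values, the threshold can be chosen (and changed) after scanning. A monochrome scan has already thrown that information away.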

Resolution
According to most agencies (as well as AIIM standards), large-format documents (which can be 11 x 17 inches or larger) should be scanned at 300 dpi resolution. Anything less is not recommended, even if you have a good original.
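To make the numbers concrete, the sketch below works out pixel dimensions and uncompressed image size at 300 dpi for each color mode. The 24 x 36 inch sheet is a hypothetical example, and the bit depths (1, 8, and 24 bits per pixel) are the usual ones for monochrome, grayscale, and color. Real files are normally compressed, so actual sizes will be smaller than these figures.

```python
BITS_PER_PIXEL = {"monochrome": 1, "grayscale": 8, "color": 24}


def scan_size(width_in, height_in, dpi=300, mode="monochrome"):
    """Return (pixel width, pixel height, uncompressed size in bytes)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    size_bytes = width_px * height_px * BITS_PER_PIXEL[mode] // 8
    return width_px, height_px, size_bytes


# Hypothetical 24 x 36 inch drawing scanned at 300 dpi in each mode.
for mode in BITS_PER_PIXEL:
    w, h, size = scan_size(24, 36, dpi=300, mode=mode)
    print(f"{mode:>10}: {w} x {h} px, about {size / 1_000_000:.1f} MB uncompressed")
```

The color version of the same sheet holds 24 times as much raw data as the monochrome one, which is the size problem with color images in multipage sets.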

Quality Assurance
Most large-format scanning projects never get a quality assurance check. All drawings should be verified with fresh eyes to confirm that the edges have not been cut off and that the settings were correct to capture the image at the highest quality. We sometimes find lines on a scanned image that are not part of the drawing. The cause is often a piece of dirt on the scanner, which leaves a line running from the top to the bottom of the image. These drawings need to be rescanned. A good scan tech will catch these issues and rotate the drawings to their proper orientation before QC.
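Part of this check can be pre-screened in software. The sketch below flags pixel columns that are black in nearly every row of a monochrome image, which is what a dirt streak running from top to bottom looks like. The function name `find_vertical_streaks` and the 98% cutoff are assumptions for illustration. A drawing can contain legitimate full-height lines (a border, for example), so flagged columns are only candidates for a person to review.

```python
def find_vertical_streaks(mono_rows, min_fraction=0.98):
    """Return column indexes that are black in nearly every row.

    mono_rows holds rows of 0 (black) and 1 (white) pixels.
    """
    if not mono_rows:
        return []
    height = len(mono_rows)
    width = len(mono_rows[0])
    streaks = []
    for col in range(width):
        black = sum(1 for row in mono_rows if row[col] == 0)
        if black / height >= min_fraction:
            streaks.append(col)
    return streaks


# Tiny hypothetical image: column 3 is black from top to bottom.
image = [
    [1, 1, 0, 0, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1, 1],
    [1, 1, 1, 0, 1, 0],
]
print(find_vertical_streaks(image))
```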

Large format documents are in a separate world, but are the key to complete enterprise-wide document management.

Lisa A. Desautels (702-222-3590 or lad@graphicimaging.net) has been President of Graphic Imaging Services, Inc. (http://www.graphicimaging.net/) since 1997 and has been providing specialized document imaging solutions for large-format documents for more than 12 years. Her knowledge of and experience with CAD (Autodesk) and GIS (ESRI) since 1991 allow her to bring a more holistic understanding of the issues around document management to her clients.


Saturday, November 21, 2009

The Cost of Ignoring Information Governance


The dramatic growth in the use of electronic information within businesses, combined with the need for higher governance standards in business practice, requires new technology solutions that can fully address governance in an automated fashion. Information governance refers to the way an enterprise manages and controls its business information. At the heart of many of the new challenges is the need for control. Control of information means:

• Knowing what information you have
• Understanding its value and taking the appropriate actions based on the value of each piece of content, whether that means long-term archival or quick disposal
• Ensuring it is discoverable, quickly accessible and can be secured for legal hold
• Ensuring it is only accessible to those with right of access
• Ensuring it is retained and disposed of according to applicable corporate and legislative rules
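Two of the points above, retention rules and legal hold, lend themselves to a small automated check. The sketch below is only an illustration under stated assumptions: the record categories, the retention periods in years, and the names `Record` and `may_dispose` are hypothetical and do not come from the article or from any regulation.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical retention periods in years, keyed by record category.
RETENTION_YEARS = {"invoice": 7, "drawing": 25, "newsletter": 1}


@dataclass
class Record:
    name: str
    category: str
    created: date
    legal_hold: bool = False


def may_dispose(record, today):
    """True only if the retention period has passed and no legal hold applies."""
    if record.legal_hold:
        return False
    years = RETENTION_YEARS.get(record.category)
    if years is None:  # unknown category: keep until someone classifies it
        return False
    try:
        expires = record.created.replace(year=record.created.year + years)
    except ValueError:  # created on Feb 29 and the target year is not a leap year
        expires = record.created.replace(year=record.created.year + years, day=28)
    return today >= expires


today = date(2009, 11, 21)
records = [
    Record("newsletter-2007-03", "newsletter", date(2007, 3, 1)),
    Record("invoice-1042", "invoice", date(2001, 6, 30), legal_hold=True),
    Record("site-plan-A1", "drawing", date(1991, 1, 15)),
]
for record in records:
    print(record.name, "may be disposed:", may_dispose(record, today))
```

Records in an unknown category are kept by default here, on the assumption that knowing what information you have comes before deciding what to discard.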

To download this very interesting article: http://www.aiim.org/pdfdocuments/37234.pdf

Thursday, November 19, 2009

Documents. With Class/Chris Riley



Document classification is a key part of any data capture strategy. However, it can also be used in advance of rolling out your entire capture strategy. Here are a few thoughts on the importance of document classification.

Within any type of advanced technology there are several components that could stand alone for other purposes. Data mining has basic search. Content management has basic tagging. Data capture is no different. While most people consider “data capture” a single thing, a trend is emerging, as the market demands more education and explanation, toward looking at the sub-components of data capture. This trend allows organizations to deploy only those pieces that make the most sense and have a clear path to success. Once success is achieved, they can then move on to the entirety of data capture.

One component of data capture that has been overlooked and extremely underestimated is document classification.

What Class Is Your Document In?
Before data capture technologies can do the magic of field location and extraction using optical character recognition (OCR) or intelligent character recognition (ICR), they must first decide the page type (sometimes, the type of an entire document). Types might be obvious to the world, or only specific to an organization. Types can be determined by layout (lines, barcodes, graphics), or by context (words, codes, dictionaries). All data capture solutions have this built in as a part of the template matching or document identification process. When companies deployed data capture packages, classification was geared towards feeding the data capture process, not necessarily to stand alone as a function. Interestingly enough, however, many organizations have bought data capture applications just for the purpose of classification. They have done so with a success rate that seems to dwarf the overall data capture process. Let’s look at why.
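Classification by context can be pictured with a very small keyword-matching sketch. Everything in it is an assumption for illustration: the page types, the keyword lists, and the function name `classify_page` are hypothetical, and commercial data capture products use far richer layout and context rules than a word count.

```python
# Hypothetical page types and the words that suggest each one.
KEYWORDS = {
    "purchase_order": {"purchase", "order", "ship", "buyer"},
    "vendor_invoice": {"invoice", "remit", "due", "total"},
    "disclaimer": {"disclaimer", "liability", "warranty"},
}


def classify_page(text, keywords=KEYWORDS):
    """Return the page type whose keywords match the most words, or 'unknown'."""
    words = {word.strip(".,:;()#").lower() for word in text.split()}
    scores = {page_type: len(words & terms) for page_type, terms in keywords.items()}
    best_type = max(scores, key=scores.get)
    return best_type if scores[best_type] > 0 else "unknown"


# Hypothetical OCR text from three pages.
print(classify_page("INVOICE #1042. Total due: 350.00. Remit to address below."))
print(classify_page("Purchase Order 7781. Ship to: Warehouse 4. Buyer: J. Smith"))
print(classify_page("Meeting notes from Tuesday"))
```

Pages that match nothing come back as "unknown" so they can be routed to a person, which is usually safer than forcing every page into a type.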

One major challenge with data capture is the human labor associated with putting documents into groups. With documents automatically classed, this expense and drain on time goes away. Because of this, I think using document classification on its own is only going to become more popular as companies see that they can first tackle that one major problem. Once successful, a company can then embark on the next, laborious steps towards data capture – but with a better chance of success. This approach also allows a company to better frame the process step-by-step for the technology vendors – tightly nailing down a well-defined problem and then moving outward from there with the technology. Vendors are often inclined to be helpful because they want the license value (for their bottom line) of the company’s entire data capture process.

Politics? What Politics!?
Classification can be a dream or a true nightmare to set up. It all depends on the documents. (I'm using the term “document” to mean a record that could be single or multiple pages, with each page somehow relating to all the others.) If you are a little confused, you should be. Understanding your documents is the greatest stumbling block to classification. Sometimes, documents are very clear. Take accounts payable processing as an example. A document could be a purchase order that connects to a received invoice: this is the entire document. Within this document are two types: purchase order and vendor invoice. That was not so bad. Now what happens if you scan in duplex and the back of the invoice has payment instructions or disclaimers? What do you do with this page? That’s still probably not too complicated, as you may just decide to omit the page from the document if it does not have pertinent payable data. The point: this is just a small illustration of how quickly the definition of a document gets complicated for an organization.

The desired approach is to study what your objective types (page-level understanding) are. This could go as deep as disclaimers, waivers, and descriptor pages. Once this is done, determine the rules that combine the pages into documents. In most environments the rules are flexible. For example, an invoice from a vendor can be 1 to 10 pages: the first page will have a header, the last page will have a total, and everything in between is a detail page. Doing this lets you use all the cool tools automated document classification has to offer. The only problem with this approach is the possibility of a never-ending list of objective page-level types.
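The header/detail/total rule can be sketched as a small grouping routine that runs after each page has been given a type. This is a simplified illustration, not a vendor's method: the type labels and the function name `group_invoice_pages` are hypothetical, and the sketch does not handle a single-page invoice whose header and total share one page, nor does it enforce the 10-page upper limit.

```python
def group_invoice_pages(page_types):
    """Group a stream of page types into documents.

    Rule assumed here: a 'header' page opens a document and a 'total' page
    closes it; 'detail' pages in between belong to the open document.
    Pages that break the rule are returned separately for manual review.
    """
    documents, exceptions, current = [], [], None
    for index, page_type in enumerate(page_types):
        if page_type == "header":
            if current is not None:  # previous document never reached a total
                exceptions.extend(current)
            current = [index]
        elif current is not None and page_type in ("detail", "total"):
            current.append(index)
            if page_type == "total":
                documents.append(current)
                current = None
        else:
            exceptions.append(index)  # page that fits no open document
    if current is not None:  # stream ended before a total page arrived
        exceptions.extend(current)
    return documents, exceptions


# Hypothetical page types for seven scanned pages, in scan order.
pages = ["header", "detail", "detail", "total", "disclaimer", "header", "total"]
documents, exceptions = group_invoice_pages(pages)
print("documents:", documents)
print("needs review:", exceptions)
```

Keeping the rule-breaking pages in their own list gives the exception queue a clear definition: anything the rules could not place.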

Why Is Class Important?
What is so cool about classification is that you get even tighter control over the quality of the automatic results, because it's much easier to judge what is right or wrong. Once an organization has a clear understanding of its documents and of their complexity relative to automated classification, it can determine an actual ROI (or at least get close). Also, because it's just one component of the whole data capture process, classification allows the organization to deploy exceptions faster and perform initial setup faster with less expertise. Document classification – whether acknowledged or not – is a mandatory step in any data capture process and cannot be avoided. Why not excel at it?

As I mentioned before, the trend of tackling data capture piece by piece rather than as a whole is becoming increasingly popular as market education on this type of technology increases. Companies are seeking a path to success in document automation. The step-by-step path is much less overwhelming than taking on an entire data capture process. When an organization decides to do this and to truly understand its documents, it is taking the accuracy of an automated system into its own hands and really giving technology the best chance to work for it.

Chris Riley is founder of http://www.livinganalytics.com, where he uses his in-depth knowledge of data capture technologies to advise clients and proselytize the value of these tools. Chris was recently the featured speaker for our webinar on March 5, “Tips and Tricks to Help You Automate your Office Documents (for Effective Data Capture).”

Scanning and Capture Technologies: Process Integration and ROI Enhancement

This survey of over 1,000 end users explores the primary business drivers, implementation obstacles and user issues related to capture, scanning, and recognition technologies.

Data Breakouts by Organizational Role:
Business Types, Document & Records Types, and IT Types
Key Findings:

Risk reduction and compliance arguments relative to scanning and capture are important.
Justifying the investment is a key obstacle that must be overcome in selling capture technologies.
Two "change" obstacles are also clearly important to end users - "change management" and "integration of new technologies".
Records managers are important in influencing capture decisions, as is the case with other document and content technologies, yet tend not to be the final authority for purchasing.
On-line information resources (search engines, AIIM web site, company web sites, and webinars) are extremely important to end users and prospective end users as they consider scanning and capture technologies.
The ROI experience of end users mirrors the impressive results reported in 2006.
Organizations are clearly seeking to leverage the investment they have made in multi-function devices (MFDs) by extending their use into scanning.

To download the document: http://www.aiim.org/PDFDocuments/32877.pdf