Intelligent document recognition and data extraction

Once the scanning is complete, the images are submitted for image processing, intelligent classification, document recognition, and data extraction.

How automated data capture works

Step 1
Survey and form design

Build new structured form templates optimised for accurate recognition in an easy-to-use interface.

Print-merge variable information to forms and distribute to recipients.

Step 2
Scan paper surveys

Load the document scanner with returned paper surveys and commence scanning.

Automated data capture (ADC) software will automatically receive the images directly from the high-speed scanner and begin classification.

Step 3
Document recognition and data extraction

ADC software automatically identifies scanned images and extracts handprint (ICR), machine-print (OCR), optical-mark (OMR) and barcode data.

Step 4
Data verification

If ADC software is unable to evaluate a data entry field with sufficient confidence, it is highlighted to a user for manual confirmation or correction.

Step 5
Data export and archive

Once verified, ADC software automatically exports data and images directly to third-party databases or common file formats (e.g., Excel, CSV, XML).

eStore, our document storage and retrieval system, is available for organisations seeking to archive data/scanned images for regulatory compliance.
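
As a rough illustration of the verification step (Step 4), the sketch below splits extracted fields into accepted values and a manual review queue based on a recognition confidence score. The `ExtractedField` type, the `route_fields` helper and the 0.85 threshold are hypothetical choices made for this example and are not taken from any specific ADC product.

```python
from dataclasses import dataclass


@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # recognition confidence in the range 0.0 to 1.0


def route_fields(fields, threshold=0.85):
    """Split fields into (accepted, needs_review) by confidence.

    The threshold is an assumed value chosen for illustration only.
    """
    accepted, needs_review = [], []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review


fields = [
    ExtractedField("surname", "SMITH", 0.97),
    ExtractedField("date_of_birth", "12/O3/1984", 0.41),  # letter O misread in a date
]
accepted, needs_review = route_fields(fields)
print([f.name for f in needs_review])
```

Only the fields in `needs_review` would be shown to an operator, so confident reads pass straight through to export.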

Image processing and quality control

Image quality is a critical factor in determining how well characters and documents are identified and processed.

Automated data capture software includes advanced image enhancement and correction capabilities, including de-skewing, auto-rotating and character enhancement.

Built-in quality control features identify and correct low-quality images early in the process to reduce human intervention.

Automated document recognition and classification

Each image received will be compared to one or more pre-defined templates to identify the form type. The software can recognise and distinguish between many different form types and process them accordingly. Unrecognised pages (such as cover pages or damaged pages) will be routed to an unclassified queue for manual handling.
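
The template comparison described above can be sketched in a simplified form. This example assumes each page and template has already been reduced to a set of named layout features (the feature names, the `classify_page` helper and the 0.6 cut-off are all hypothetical); commercial software works on the image itself, so treat this only as an outline of the matching and fallback logic.

```python
def jaccard(a, b):
    """Similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify_page(page_features, templates, min_score=0.6):
    """Return the best-matching template name, or None if nothing is close enough.

    `templates` maps a template name to the set of features expected on that form.
    A result of None means the page belongs in the unclassified queue.
    """
    best_name, best_score = None, 0.0
    for name, template_features in templates.items():
        score = jaccard(page_features, template_features)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_score else None


templates = {
    "patient_survey": {"logo_top_left", "barcode_footer", "q1_grid", "q2_grid"},
    "consent_form": {"logo_top_left", "signature_box", "date_box"},
}

page = {"logo_top_left", "barcode_footer", "q1_grid", "q2_grid"}
cover_sheet = {"logo_top_left"}

print(classify_page(page, templates))         # patient_survey
print(classify_page(cover_sheet, templates))  # None
```

The cover sheet shares one feature with each template but scores below the cut-off against both, so it is left for manual handling rather than forced into the nearest form type.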

Extract data from scanned documents using OCR, ICR, OMR and barcode recognition

After document processing and classification, data capture software uses handprint (ICR), machine print (OCR) and checkbox (OMR) recognition technology to automatically extract data from scanned documents. This includes constrained print fields, tick boxes, variable scales, comments and signatures.

Intelligent character recognition (ICR)

ICR reads short hand-printed responses from constrained print fields (one character per box), such as names, dates and numbers.

Optical mark recognition (OMR)

OMR technology identifies whether a checkbox has been filled, with automatic handling of crossed-out and amended responses.

Optical character recognition (OCR)

OCR captures machine-printed text.

Image zone capture

Predictive key from image, coding and image snippet capture can be used to capture drawings or cursive handwriting.

Barcode recognition

Barcode recognition reads all standard barcode types, including 2D matrix barcodes.

Signature detection

Signature detection confirms whether a paper form has been authorised with a signature by calculating the fill percentage of the signature field.
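
A fill-percentage check of this kind can be sketched as follows. The example assumes the signature field has already been cropped to a small greyscale grid; the function names, the darkness threshold of 128 and the 5% minimum fill are hypothetical values for illustration, not settings from any particular product.

```python
def fill_percentage(zone, dark_threshold=128):
    """Percentage of pixels in a greyscale zone darker than dark_threshold.

    `zone` is a list of rows, each a list of 0-255 grey values (0 = black).
    """
    total = sum(len(row) for row in zone)
    if total == 0:
        return 0.0
    dark = sum(1 for row in zone for pixel in row if pixel < dark_threshold)
    return 100.0 * dark / total


def is_signed(zone, min_fill=5.0):
    """Treat the field as signed when enough of it is covered by ink."""
    return fill_percentage(zone) >= min_fill


blank_zone = [[255] * 10 for _ in range(4)]
signed_zone = [[255] * 10 for _ in range(4)]
for x in range(2, 8):
    signed_zone[2][x] = 30  # a short pen stroke across one row

print(fill_percentage(signed_zone))                    # 15.0
print(is_signed(blank_zone), is_signed(signed_zone))   # False True
```

A check like this only shows that something was written in the box; it does not verify whose signature it is.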

Simple rules such as alphabetic or numeric formats, dictionaries, date ranges, look-ups and mandatory fields are checked at this stage, with any unrecognised fields or characters queued for human review.

These common-sense logic rules are applied to the extracted data to ensure that invalid responses are not exported (e.g. more than one response ticked for a single-choice question). In such situations, the exception is routed to the appropriate human operators to review and correct. The entire process takes seconds, meaning thousands of forms can be processed each day.
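
To make the rule checks concrete, here is a minimal sketch of how a few of them (mandatory field, numeric-only field, date range, single-choice question) might be applied to one extracted record. The field names, the date range and the `validate_form` helper are hypothetical; real ADC software defines such rules in its form designer rather than in code like this.

```python
from datetime import date


def validate_form(record):
    """Return a list of (field, problem) tuples.

    An empty list means the record passed every rule and can be exported.
    """
    problems = []

    # Mandatory field: must not be empty or whitespace
    if not record.get("surname", "").strip():
        problems.append(("surname", "mandatory field is empty"))

    # Numeric-only field
    age = record.get("age", "")
    if not age.isdigit():
        problems.append(("age", "expected digits only"))

    # Date range (assumed limits for this example)
    visit = record.get("visit_date")
    if visit is None or not (date(2020, 1, 1) <= visit <= date(2030, 12, 31)):
        problems.append(("visit_date", "outside the allowed date range"))

    # Single-choice question: exactly one box may be ticked
    ticked = record.get("q1_ticked", [])
    if len(ticked) != 1:
        problems.append(("q1", "single-choice question needs exactly one tick"))

    return problems


record = {
    "surname": "SMITH",
    "age": "4Z",                     # misread character in a numeric field
    "visit_date": date(2024, 5, 17),
    "q1_ticked": ["yes", "no"],      # two boxes ticked
}
for field, problem in validate_form(record):
    print(field, "->", problem)
```

Records that return any problems would go to the review queue instead of being exported.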

Intelligent data capture software

We select the best data capture systems to meet your requirements.

OpenText TeleForm

TeleForm automatically captures and indexes data and images from any form type, using handprint (ICR), machine print (OCR) and checkbox (OMR) recognition technology, ready for export to a database.

It aims to reduce manual data entry time by 90% or more and can eliminate hundreds of operator keystrokes.

ABBYY FlexiCapture

ABBYY FlexiCapture is a highly scalable forms processing solution for intelligent, accurate data extraction from structured, semi-structured and unstructured forms and documents. The extracted data is passed to back-end applications for further processing and archiving.