Intelligent document recognition and data extraction

Once the scanning is complete, the images are submitted for image processing, intelligent classification, document recognition, and data extraction.

How automated data capture works

  • Step 1

    Survey and form design

    Build new structured form templates optimised for accurate recognition in an easy-to-use interface.

    Print-merge variable information to forms and distribute to recipients.

  • Step 2

    Scan paper surveys

    Load the document scanner with returned paper surveys and commence scanning.

    Automated data capture (ADC) software receives the images directly from the high-speed scanner and begins classification automatically.

  • Step 3

    Document recognition and data extraction

    ADC software automatically identifies scanned images and extracts handwritten (ICR), machine-printed (OCR), optical mark (OMR) and barcode data.

  • Step 4

    Data verification

    If ADC software cannot read a data entry field with sufficient confidence, the field is highlighted for an operator to confirm or correct manually.

  • Step 5

    Data export and archive

    Once verified, ADC software automatically exports data and images directly to third-party databases or common file formats (e.g., Excel, CSV, XML).

    eStore, our document storage and retrieval system, is available for organisations that need to archive data and scanned images for regulatory compliance.

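
The confidence check in Step 4 can be pictured as a simple threshold test. The sketch below is a minimal illustration, assuming the recognition engine reports a confidence score between 0 and 1 for each field; the `Field` type, the `route_fields` function and the 0.90 threshold are hypothetical and are not taken from TeleForm, FlexiCapture or any other product.

```python
from dataclasses import dataclass

# Hypothetical cut-off: fields recognised below this confidence go to manual verification.
CONFIDENCE_THRESHOLD = 0.90


@dataclass
class Field:
    name: str
    value: str
    confidence: float  # assumed recognition confidence in the range 0..1


def route_fields(fields, threshold=CONFIDENCE_THRESHOLD):
    """Split recognised fields into those accepted automatically and those needing review."""
    accepted, needs_review = [], []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review


fields = [
    Field("surname", "SMITH", 0.98),
    Field("date_of_birth", "12/03/1991", 0.74),
    Field("q1", "B", 0.95),
]
accepted, needs_review = route_fields(fields)
print([f.name for f in needs_review])  # ['date_of_birth']
```

A higher threshold sends more fields to an operator but lets fewer recognition errors through to export, so the value is a trade-off to tune for each project.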

    Image processing and quality control

    Image quality is a critical factor in determining how well characters and documents are identified and processed.

    Automated data capture software provides advanced image enhancement and correction, such as de-skewing, auto-rotation and character enhancement.

    Built-in quality control features identify and correct low-quality images early in the process to reduce human intervention.
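
De-skewing means estimating how far a page was rotated during scanning. One common textbook approach, shown here as an assumption rather than the method any particular ADC product uses, is a projection profile: try a range of correction angles and keep the one whose row histogram of dark pixels is most sharply peaked, because level text lines fall into narrow horizontal bands. The sketch works on a list of dark-pixel `(x, y)` coordinates; the function names, the ±5° search range and the 0.1° step are illustrative choices.

```python
import math
from collections import Counter


def profile_score(points, angle_deg):
    """Rotate dark-pixel coordinates by -angle_deg and measure how peaked the row histogram is."""
    theta = math.radians(-angle_deg)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    # Count dark pixels per row after rotation; only the rotated y coordinate matters.
    rows = Counter(round(x * sin_t + y * cos_t) for x, y in points)
    # The sum of squared row counts is largest when pixels pile up in few rows (level text lines).
    return sum(count * count for count in rows.values())


def estimate_skew(points, max_angle=5.0, step=0.1):
    """Return the candidate skew angle (degrees) that best straightens the text lines."""
    n = int(round(max_angle / step))
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda angle: profile_score(points, angle))
```

Rotating the image by the negative of the returned angle would then level the text; the estimate is only as precise as the chosen step.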

    Automated document recognition and classification

    Each image received will be compared to one or more pre-defined templates to identify the form type. The software can recognise and distinguish between many different form types and process them accordingly. Unrecognised pages (such as cover pages or damaged pages) will be routed to an unclassified queue for manual handling.
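
Template matching can be sketched as a similarity score between the features found on a scanned page and the features expected for each template, with a minimum score below which the page goes to the unclassified queue. This is a simplified illustration that is not drawn from any specific product: it assumes each page has already been reduced to a set of feature positions (for example, grid cells containing registration marks or printed anchors), and the `classify_page` function, the template names and the 0.6 cut-off are hypothetical.

```python
# Hypothetical templates: each maps a form type to the grid cells where its anchor marks appear.
TEMPLATES = {
    "consent_form": {(1, 1), (1, 9), (12, 1), (12, 9), (3, 5)},
    "questionnaire_p1": {(1, 1), (1, 9), (12, 1), (12, 9), (6, 2), (6, 7)},
}


def jaccard(a, b):
    """Overlap between two feature sets: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def classify_page(page_features, templates, min_score=0.6):
    """Return the best-matching template name, or 'unclassified' if no template matches well."""
    best_name, best_score = "unclassified", 0.0
    for name, features in templates.items():
        score = jaccard(page_features, features)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= min_score else "unclassified"


print(classify_page({(5, 5)}, TEMPLATES))  # unclassified (e.g. a cover page)
```

Using a minimum score rather than always taking the best match is what lets cover pages and badly damaged pages fall through to manual handling instead of being forced into the wrong form type.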

    Extract data from scanned documents using OCR, ICR, OMR and barcode recognition

    After image processing and classification, data capture software uses handprint (ICR), machine print (OCR) and checkbox (OMR) recognition technology to extract data from scanned documents automatically, including constrained print fields, tick boxes, variable scales, comments and signatures.

    Intelligent character recognition (ICR)

    To read short handprinted responses from constrained print fields (one character per box), such as names, dates and numbers.

    Optical mark recognition (OMR)

    OMR technology identifies whether a checkbox has been filled and automatically handles crossed-out and amended responses.
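
Checkbox detection is commonly based on how much of the box interior is dark once the image has been binarised. A minimal sketch, assuming the box interior has already been cropped to a grid of 0/1 pixels (1 = dark); the `fill_ratio` and `mark_state` names and both thresholds are illustrative, and treating a fully blacked-out box as a cancelled mark is an assumed convention for crossed-out responses, not a documented rule of any product.

```python
def fill_ratio(region):
    """Fraction of dark pixels (1s) in a binarised checkbox interior."""
    total = sum(len(row) for row in region)
    dark = sum(sum(row) for row in region)
    return dark / total if total else 0.0


def mark_state(region, empty_max=0.10, cancel_min=0.80):
    """Classify a checkbox as 'empty', 'marked' or 'cancelled' from its fill ratio."""
    ratio = fill_ratio(region)
    if ratio < empty_max:
        return "empty"
    if ratio >= cancel_min:
        # Assumed convention: a box that is almost completely blacked out cancels the mark.
        return "cancelled"
    return "marked"


# A 10x10 box containing an "X" made of its two diagonals (20 dark pixels).
cross = [[1 if c in (r, 9 - r) else 0 for c in range(10)] for r in range(10)]
print(mark_state(cross))  # marked
```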

    Optical character recognition (OCR)

    To capture machine-printed text.

    Image zone capture

    Predictive key-from-image entry, coding and image snippet capture can be used to capture drawings or cursive handwriting.

    Barcode recognition

    To recognise all standard barcode types, including 2D matrix barcodes.

    Signature detection

    To confirm whether a paper form has been signed, by calculating the fill percentage of the signature field.
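
The fill-percentage test for signatures can be sketched as measuring the ink in the signature box that was not already printed on the blank form, so that a pre-printed border or "Sign here" label is not mistaken for a signature. This is a minimal illustration under assumed inputs: dark pixels are given as sets of `(x, y)` coordinates, the pre-printed pixels of the blank form are known, and the 2% minimum fill is a placeholder; the `signature_present` function is hypothetical.

```python
def signature_present(field_pixels, blank_pixels, width, height, min_fill=0.02):
    """Return (signed, fill) for a signature box of width x height pixels.

    field_pixels: set of (x, y) dark pixels found in the scanned signature box.
    blank_pixels: dark pixels already printed on the blank form (assumed known),
                  removed so that pre-printed ink does not count towards the fill.
    """
    added_ink = field_pixels - blank_pixels
    fill = len(added_ink) / (width * height)
    return fill >= min_fill, fill


# Blank form: a 100 x 30 box whose top and bottom borders are printed (200 dark pixels).
border = {(x, 0) for x in range(100)} | {(x, 29) for x in range(100)}
stroke = {(x, 15) for x in range(5, 95)}  # a 90-pixel pen stroke
print(signature_present(border | stroke, border, 100, 30))  # (True, 0.03)
```

Subtracting the blank form first means an unsigned box scores zero however heavy its printed border is; a stray dot of a few pixels stays below the threshold.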

    Simple rules (alphabetic, numeric, dictionary, date-range, look-up and mandatory-field checks) are applied at this stage, and any unrecognised fields or characters are queued for human review.

    Common-sense logic rules are also applied to the extracted data so that invalid responses are not exported (e.g. more than one box ticked for a single-choice question). Each exception is routed to the appropriate operator to review and correct. Because each form takes only seconds to process, thousands of forms can be processed each day.
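
The rule checks described above can be sketched as a function that returns a list of exceptions for each extracted record, where an empty list means the record is ready for export. The rule format, field names and look-up values are hypothetical; dates are assumed to be extracted as ISO `YYYY-MM-DD` strings, and the ticked options for a question are assumed to arrive as a list.

```python
from datetime import date


def validate(record, rules):
    """Return (field, problem) exceptions for one extracted record; [] means it may be exported."""
    exceptions = []
    for field, rule in rules.items():
        value = record.get(field)
        # Blank fields only matter when the rule marks them as mandatory.
        if value in (None, "", []):
            if rule.get("mandatory"):
                exceptions.append((field, "missing mandatory value"))
            continue
        kind = rule.get("kind")
        if kind == "alpha" and not value.isalpha():
            exceptions.append((field, "not alphabetic"))
        elif kind == "numeric" and not value.isdigit():
            exceptions.append((field, "not numeric"))
        elif kind == "date":
            try:
                day = date.fromisoformat(value)
            except ValueError:
                exceptions.append((field, "not a valid date"))
                continue
            earliest, latest = rule["range"]
            if not earliest <= day <= latest:
                exceptions.append((field, "date out of range"))
        elif kind == "lookup" and value not in rule["allowed"]:
            exceptions.append((field, "not in look-up list"))
        elif kind == "single_choice" and len(value) > 1:
            exceptions.append((field, "more than one box ticked"))
    return exceptions


RULES = {
    "surname": {"kind": "alpha", "mandatory": True},
    "age": {"kind": "numeric"},
    "visit_date": {"kind": "date", "range": (date(2020, 1, 1), date(2025, 12, 31))},
    "site": {"kind": "lookup", "allowed": {"BRS", "KEL"}},
    "q1": {"kind": "single_choice", "mandatory": True},
}
record = {"surname": "SMITH", "age": "4O", "visit_date": "2019-06-01", "site": "BRS", "q1": ["A", "C"]}
print(validate(record, RULES))
# [('age', 'not numeric'), ('visit_date', 'date out of range'), ('q1', 'more than one box ticked')]
```

Returning every failed rule, rather than stopping at the first, lets an operator fix all problems on a form in a single review.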

    Intelligent data capture software

    We select the best data capture systems to meet your requirements.

    OpenText TeleForm

    TeleForm automatically captures and indexes data and images from any form type, using handprint (ICR), machine print (OCR) and checkbox (OMR) recognition technology, ready for export to a database.

    It aims to reduce manual data entry time by 90% or more and can eliminate hundreds of operator keystrokes.

    ABBYY FlexiCapture

    ABBYY FlexiCapture is a highly scalable forms-processing solution that intelligently and accurately extracts data from structured, semi-structured and unstructured forms and documents, and passes it to back-end applications for further processing and archiving.

    Case studies

    • University of Bristol

      Based at the University of Bristol, the ‘Children of the 90s’ study uses TeleForm to capture data from the paper versions of its questionnaires.

    • Civica


      Following an OJEU tender process, the British Council commissioned Civica as the prime contractor to deliver an on-screen marking (OSM) solution for marking up to 2.25 million ‘pen and paper’ IELTS tests each year.

      As part of the agreement, Civica appointed ePC as its sub-contractor for the script capture, processing and verification work.

    • Keele University

      The Keele Clinical Trials Unit (CTU) saves time by using TeleForm to capture data from paper-based clinical trials.
