When a picture isn’t worth a single word. Have you ever tried to find information in a long PDF file using good old Ctrl+F, only to get hit with a “no matches found” warning on every keyword you try? Your troubleshooting instincts probably lead you to furiously click on the paragraphs and try to highlight words, but that’s when you realize that the document pages were scanned as images. Just because you can read the words doesn’t mean your computer can. Needless to say, copying, pasting, or editing this file is also out of the question.
Businesses whose processes rely heavily on hard copies of documents, but also on good record keeping, know this manual entry nightmare very well. They usually depend on staff who might spend countless hours transcribing data into company systems or databases. Not only are resources like time and talent being spent inefficiently, but typos and inaccuracies can be costly in many ways. Important business decisions often come from these documents, so there is little room for mistakes or slower processing times.
There's a Better Way!
A great alternative to manual entry, and a solution to the inefficiencies it causes, is Intelligent Document Processing. Intelligent Document Processing (IDP) uses technology to extract data from many kinds of documents and process it into a more structured form. One of the technologies behind it is Optical Character Recognition (OCR), which electronically identifies text and makes it machine-readable, whether that text is printed, handwritten, or scanned. Ever wondered how some traffic cameras can read your license plates? That’s OCR.
Ephesoft Transact is a market-leading product for OCR and data extraction. It provides quick and accurate IDP solutions to help organizations with their business and content processes. It can be configured to extract data from specific document types and to integrate with your existing systems.
How Ephesoft Does It
- Ingestion – This is the process of bringing the documents into the system. Paper format documents could come in from sources such as a scanner. Electronic ones can be retrieved from email, shared folders, or existing content management systems, to name a few. This can also be done in batches.
- Image Processing – Documents are prepared and cleaned up for further processing (e.g. rotating pages and removing page noise). The OCR data is also generated in this step, as a layer of machine-readable data created for each page of these documents. Now the document is readable by both you and the computer.
- Classification – This is how Ephesoft Transact recognizes different types of documents. The system is configured and trained in advance to recognize document types from samples. For example, if a batch of scanned documents contains invoices as well as expense reports, that’s ok! It will separate the documents automatically, so there is no need to use separator sheets or cover pages. Ephesoft Transact uses confidence levels and the training in the batch class setup to set the document type and then identify each page’s position (first, middle, or last) within that document. Transact uses exception-based processing: whenever there is a low confidence score, an operator is taken straight to that document, so human interaction is only needed for the documents that require review.
- Extraction – The system identifies the contents of the documents and returns the desired ones as structured data. Ephesoft Transact supports different types of extraction rules that tell it where to look. These include key/value pairs (for example, a rule that places the text following “Vendor Name” into the “Vendor Name” field), paragraph extraction, and even barcode data extraction.
- Validation – This step involves human interaction, and for good reason! Maybe a coffee stain or a printer smudge covers part of a word, and the system is not sure whether it is reading a “B” or an “8”, for example. Through the use of exception-based processing, if extracted data falls below minimum confidence levels, or extracted fields fail validation rules, the documents will be flagged for review. Operators save time by only reviewing flagged documents.
- Exporting – Ephesoft Transact can be integrated with ERP, ECM, CRM, and other applications. Once documents are processed, the extracted data is entered as business document metadata. You can take it a step further and convert your image-only scans to searchable documents using the OCR layer created during the Image Processing step, essentially making your files Ctrl+F friendly. This also matters when documents are ingested into downstream ECM systems, many of which provide full-text indexing of content. That means you can not only extract data within Ephesoft Transact, but also index this content once it’s in a system like Alfresco.
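To make the Extraction and Validation steps a little more concrete, here is a minimal Python sketch of a key/value rule combined with exception-based review. It is not Ephesoft Transact code and does not use its API: the function names, the 0.80 threshold, the sample OCR lines, and the validation patterns are all made up for illustration, and the confidence scores are simply assumed to come from an OCR engine.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical minimum confidence; not a Transact default.
MIN_CONFIDENCE = 0.80


@dataclass
class Field:
    name: str
    value: Optional[str]
    confidence: float


def extract_field(ocr_lines, key):
    """Key/value rule: return the text that follows 'key:' on an OCR line."""
    pattern = re.compile(
        rf"^\s*{re.escape(key)}[ \t]*:[ \t]*(.+)$", re.IGNORECASE
    )
    for text, confidence in ocr_lines:
        match = pattern.match(text)
        if match:
            return Field(key, match.group(1).strip(), confidence)
    # Key not found: report an empty field with zero confidence.
    return Field(key, None, 0.0)


def needs_review(field, rule=None):
    """Flag a field that is missing, low confidence, or fails its rule."""
    if field.value is None or field.confidence < MIN_CONFIDENCE:
        return True
    return rule is not None and re.fullmatch(rule, field.value) is None


# Made-up OCR output: (line text, confidence reported by the OCR engine).
ocr_lines = [
    ("Vendor Name: Acme Supply Co.", 0.97),
    ("Invoice Number: INV-00B1", 0.62),  # smudged: "B" or "8"?
    ("Total: 1,250.00", 0.93),
]

# Field name -> optional validation pattern (illustrative only).
rules = {
    "Vendor Name": None,
    "Invoice Number": r"INV-\d{4}",
    "Total": r"[\d,]+\.\d{2}",
}

fields = [extract_field(ocr_lines, key) for key in rules]
flagged = [f.name for f in fields if needs_review(f, rules[f.name])]
print(flagged)
```

In this sketch only the invoice number ends up in the review queue, which is the point of exception-based processing: an operator looks at the one doubtful field instead of re-keying the whole document.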
Automation Enables Transformation
Digital Transformation has us rethinking how businesses use and leverage not just technology, but also people and processes. Removing manual data entry is an important step, especially in document-centric organizations looking to increase organizational responsiveness to customer needs and market challenges in order to be more competitive.
Another important advantage of Ephesoft Transact is its ability to adapt to growth and demand. Whether business is booming or fiscal deadlines are approaching, an increase in the number of documents to process does not affect your workload and productivity to the extent that it would with manual entry.
If you’d like to learn more about how we use Ephesoft Transact to help minimize manual entry tasks, watch our demo: Automated Metadata Extraction with Oil & Gas Engineering Documentation.