What’s New in Ephesoft 3.1 (part 2)

One of the key reasons we like Ephesoft so much is its straightforward implementation of
Intelligent Document Capture concepts. Ephesoft’s Search Classification mode automatically
learns from the semantic content of the sample pages you provide and builds a Lucene index from this information.
Ephesoft then uses this index to interpret and classify any new documents that are fed
into the system.

As new documents are fed in, Ephesoft’s high-level classification process is basically:

“How do I figure out what kind of document this is? I’ll compare this doc’s written content with the learning samples
I have, and make an educated guess from there.” In this way, Ephesoft mimics the common-sense, content-based approach a human
reader would take, and therein lies its flexibility and utility.
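The “educated guess” in that quote can be illustrated with a toy model. The sketch below is not Ephesoft’s implementation (Ephesoft relies on a Lucene index); it assumes a simple TF-IDF cosine-similarity score, and the function names and sample texts are hypothetical. Each document type’s training text becomes a weighted term profile, and a new page is assigned to the type whose profile it most resembles.

```python
import math
import re
from collections import Counter

# Toy illustration only: TF-IDF cosine similarity, NOT Ephesoft's Lucene-based scoring.


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(samples: dict[str, list[str]]) -> tuple[dict[str, Counter], dict[str, float]]:
    """Build one term-frequency profile per document type, plus IDF weights.

    `samples` maps a (hypothetical) document type name to its training texts.
    """
    profiles = {
        doc_type: Counter(t for text in texts for t in tokenize(text))
        for doc_type, texts in samples.items()
    }
    n = len(profiles)
    # Document frequency: in how many type profiles does each term occur?
    df = Counter(term for profile in profiles.values() for term in profile)
    # Smoothed IDF: terms shared by every type get a low (but nonzero) weight.
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return profiles, idf


def weighted(counts: Counter, idf: dict[str, float]) -> dict[str, float]:
    """Apply IDF weights; terms never seen during training are ignored."""
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(text: str, profiles: dict[str, Counter], idf: dict[str, float]) -> list[tuple[str, float]]:
    """Score a new page against every document type, best match first."""
    query = weighted(Counter(tokenize(text)), idf)
    scores = [
        (doc_type, cosine(query, weighted(profile, idf)))
        for doc_type, profile in profiles.items()
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    training = {
        "Invoice": ["Invoice number 1001 amount due total payable"],
        "Purchase Order": ["Purchase order number ship to vendor quantity"],
    }
    profiles, idf = build_index(training)
    print(classify("Total amount due on this invoice", profiles, idf)[0][0])  # prints "Invoice"
```

The IDF weighting keeps words that appear in every document type from dominating the score, so the distinguishing vocabulary of each type drives the guess.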

Until now, most of this classification work was hidden under the hood in Ephesoft. In version 3.1, however,
this classification logic is exposed in the batch class administration area via the new Test Classification
feature. We are very excited about this addition to the software, and we’ll explain why below.

The new Test Classification feature works much like the existing Test KV feature. First, you’ll need to set up
a new batch class with a few document types, and then go through the normal document training process. If you need
help with these steps, Ephesoft’s documentation is a good place to start.
Once training is complete, we can use the new Test Classification feature.

The batch class directory now contains a new folder, test-classification, where test candidates can be
loaded in the same way as in the existing test-extraction folder. To use it, copy in
some document or page samples that are, ideally, representative of the larger population of documents that will be fed into
Ephesoft during production.
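If you refresh your test samples often, a small script can save some manual copying. The sketch below assumes your samples are TIFF or PDF files and that you know where your batch class folder lives; the function name, the accepted extensions, and the example paths are hypothetical, and only the test-classification folder name comes from Ephesoft itself.

```python
import shutil
from pathlib import Path

# Assumption: TIFF and PDF samples; adjust to whatever your batch class accepts.
SUPPORTED_SUFFIXES = {".tif", ".tiff", ".pdf"}


def stage_test_samples(sample_dir: Path, batch_class_dir: Path) -> list[Path]:
    """Copy sample files into <batch class folder>/test-classification.

    Hypothetical helper: skips unsupported files and returns the copied paths.
    """
    target = batch_class_dir / "test-classification"
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(sample_dir.iterdir()):
        if src.is_file() and src.suffix.lower() in SUPPORTED_SUFFIXES:
            dest = target / src.name
            shutil.copy2(src, dest)  # copy2 also preserves file timestamps
            copied.append(dest)
    return copied


# Example usage (hypothetical paths):
# stage_test_samples(Path("samples/invoices"), Path("/path/to/SharedFolders/BC1"))
```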

Next, find and click the Test Classification button in the batch class administration
area. When this runs, Ephesoft applies whichever classification methods have been assigned to each document type
to the files loaded into the test-classification folder. These can include any of Ephesoft’s recognition modes (image classification, search classification, barcode recognition, etc.),
but we’ll focus on search classification, since that’s the mode we use most often with our clients.

On the first run there may be a short delay while Ephesoft performs OCR on the files. If your environment is not especially powerful, or if you’ve loaded a large set of samples, allow several minutes for the first run to complete. When the process finishes, the classification results appear on screen in a modal window, and the full power of this new feature should immediately be evident.

Essentially, Ephesoft now provides a fully transparent readout of its classification logic on a
per-document and per-page basis. The table as a whole reveals how the entire set of files is interpreted, how the pages
are grouped into individually classified documents, which pages make up each document, and with what level of
confidence each classification was made. Furthermore, if any questions arise about how these confidences and classifications were reached,
you can simply open the _HOCR files that now accompany each page in the test-classification folder, just as with the
existing test-extraction folder.

We are really pleased with how Ephesoft has set up this new feature, and we think it will be an invaluable addition for
use cases that involve “fuzzy” data sets with a lot of document variability. This feature brings the same fine-grained
tuning we’ve enjoyed when developing extraction logic to the classification
of document types. In certain use cases this will prove even more valuable than extraction testing, because
some document populations are so variable that classification really is the biggest value that
Ephesoft brings. And without correct classification of each document, any custom extraction logic that has been developed
is effectively moot. We give a big thumbs-up to Ephesoft for giving us such valuable insight into this most
important step in the document capture workflow.
