The power of document filters inside enterprise search
Imagine if the South African Police Service performed a simple enterprise search, looking for material related to a burglary case. The search returns a connection between this active case and an archived case. Case documents and a quick scan of recent internet sales postings confirm a link between the two burglaries, which helps identify a key suspect. With a bit more investigative work, family heirlooms are returned to their rightful owners.
Just the facts, ma’am
Let’s take a moment to stop and think about how old-fashioned police work can be supplemented with the latest technology to help investigators collect information to solve the case.
Each file returned in the enterprise search result list would be processed through a document filters engine that parses and identifies the varying bits and bytes of data inside each document. Breaking down the various document layers is the first step in making the data mineable by the investigator. Document filters would then perform a deep inspection of each document layer to identify the file format and any special handling (such as markups) that needs to be addressed.
Finally, a series of sophisticated steps would unfold to render near pixel-perfect image renditions of the area maps, sketch photos and eyewitness descriptions that eventually aid the police department in locating and identifying the suspect. The investigator would use a high-definition rendering of the original documents, without ever opening more than one application.
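To make the file identification step a little more concrete, here is a minimal Python sketch of one common approach: checking the leading "magic" bytes of a file against a table of known signatures. The table and the identify_format function are illustrative assumptions, not the API of any particular document filters product, and a commercial engine would recognise far more formats and inspect far deeper than the first few bytes.

```python
# Hypothetical signature table for illustration only.
# Each entry pairs the leading bytes of a file with a format label.
MAGIC_SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    # ZIP container: also used by DOCX, XLSX and PPTX files.
    (b"PK\x03\x04", "zip-container"),
    # OLE2 compound file: used by legacy DOC, XLS and PPT files.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2-container"),
]


def identify_format(data: bytes) -> str:
    """Return a format label based on the file's leading bytes."""
    for signature, label in MAGIC_SIGNATURES:
        if data.startswith(signature):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first 16 bytes of a file and identify its format."""
    with open(path, "rb") as handle:
        return identify_format(handle.read(16))
```

Because identification relies on the file's content rather than its extension, a renamed or mislabelled attachment would still be routed to the right parser. Container formats such as ZIP need a second look inside the archive to tell a Word document from a spreadsheet, which is part of the deep inspection described above.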
Time is money
Consider the time lost if the investigator needed to open each individual document, report, photograph and sketch in Word, Excel, Media Center or InDesign, or to work through manual paper documents, as is the case at many of our police stations. When minutes matter (as they do when solving a case), so does quick user access to documents. And while your business may not be catching burglars, you are also losing time, dollars and experience without the correct document filters solution.
This file transformation and document conversion technology is complex, and the solutions available to fill this need are varied. Often, document filters (also referred to as file readers or file parsers) are a front-end ingestion technology that enables file identification, content and metadata extraction, and file conversion, such as doc-to-HTML and doc-to-PDF conversion, among a variety of other output formats. Document filters can also enable content redaction in documents, allowing users to remove unnecessary or classified information from results.
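As a rough illustration of two of these capabilities, redaction and conversion to HTML, the Python sketch below works on text that has already been extracted from a document. The redaction patterns, the mask text and the function names are assumptions made for the example. A real document filters solution would work on the native file formats and preserve layout, which this sketch does not attempt.

```python
import html
import re

# Hypothetical redaction rules for illustration only.
REDACTION_PATTERNS = [
    re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # e-mail addresses
    re.compile(r"\b\d{10}\b"),                    # 10-digit numbers, e.g. phone numbers
]


def redact(text: str, mask: str = "[REDACTED]") -> str:
    """Replace every match of every redaction pattern with the mask."""
    for pattern in REDACTION_PATTERNS:
        text = pattern.sub(mask, text)
    return text


def to_html(text: str, title: str) -> str:
    """Convert extracted plain text to a minimal HTML page.

    Blank lines separate paragraphs; special characters are escaped
    so that the extracted content cannot inject markup.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><title>{html.escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>"
    )


if __name__ == "__main__":
    statement = "Witness can be reached on 0215550123.\n\nMail: jane@example.org"
    print(to_html(redact(statement), "Witness statement"))
```

Redacting before conversion means the sensitive values never reach the output format, whichever format the viewer ends up requesting.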
It’s all about the experience
The breadth, depth and quality of these solutions will determine the quality of the end-user experience, like we see in the investigative example above. How broadly a solution recognises content format types, how well it replicates the original document for viewing, how much flexibility it provides in the speed and quality of the document viewing process and how easily it converts documents into new formats all contribute to a successful end-user experience.
The quality of this viewing experience is often projected onto the perceived quality of the underlying solution. For the product manager of the enterprise content management/data repository provider, the e-Discovery provider, the archiving provider, the search solution provider or the data loss prevention provider, the quality of the document filters experience heavily affects user satisfaction for the entire solution, even though it is only a small subset of the value proposition.
As you use an internal search engine, or view an email attachment or a document from a data repository (or a burglary case file) today, pause for a moment to consider how challenging it used to be to download these files, launch a native application (if you even had it and/or its correct version) and then view it … just to see if you had actually found the document you were looking for. There’s your evidence of the importance of a quality document filter solution: case closed.