Entering data and moving it from one place to another is a time-consuming, repetitive task. One employee can easily spend up to three hours a day just moving data around. In addition to eating up workers’ time, manual data handling is prone to errors, which lead to revenue losses. A Dun & Bradstreet report on the past and future of data revealed that one in five businesses loses money due to incomplete data.
Optical character recognition (OCR) technology can help businesses solve these issues. OCR algorithms can transform paper-based documents into editable, searchable text. They can also extract information from files and enter it into the corresponding fields in a company’s IT systems.
So, how does OCR work? How can this technology help you achieve business goals? And should you contact an artificial intelligence solutions provider to help you build and set up OCR software?

What is optical character recognition, and how does it work?

OCR definition

Optical character recognition is a technology that converts typed or handwritten text, as well as printed images containing text, into a machine-readable digital format. OCR algorithms help turn large volumes of paper documents into digital files, facilitating text storage, processing, and searching.
OCR systems consist of hardware and software. The hardware part can be an optical scanner or a similar device that can convert paper documents to the digital format. The software part is the OCR algorithm itself.

How does OCR work?

It is hard for computers to recognize characters because of the many fonts and the variations in how a single letter can be written. Handwritten letters complicate matters even further. Nevertheless, optical character recognition algorithms take on this challenge. Every OCR solution operates in four main steps:

Image acquisition

The process involves using an optical scanner to capture a digital copy of the paper document. The document has to be properly aligned and sized.

Pre-processing

The goal of this phase is to make the input file usable by the OCR algorithm. The noise and background are eliminated. Pre-processing includes the following steps:
  • Layout analysis: identifying captions, columns, and graphs as blocks
  • De-skew: rotating the digital document to make the lines horizontal if it wasn’t properly aligned during scanning
  • Image refinement: smoothing the edges, removing dust particles, increasing contrast between text and background
  • Text detection: some algorithms detect separate words and divide them into letters, while others work with text directly without splitting it into characters
  • Binarization: converting the scanned document into a black-and-white format, where dark areas represent characters (alphabetic or numeric) and white areas are identified as background. This step helps to recognize different fonts.
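The binarization step can be illustrated with a small sketch. It assumes the scan is already available as rows of greyscale values from 0 (black) to 255 (white) and picks the cut-off with Otsu's method, which is one common choice and not necessarily what any particular OCR engine uses. The function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that best separates dark from light pixels."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += histogram[level]
        sum_dark += level * histogram[level]
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance: large when the two groups are well separated.
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(image):
    """Map each pixel to 1 (ink) or 0 (background)."""
    threshold = otsu_threshold([value for row in image for value in row])
    return [[1 if value <= threshold else 0 for value in row] for row in image]


# Tiny hypothetical scan: dark strokes on a light page.
scan = [
    [250, 245, 30],
    [240, 20, 25],
    [235, 250, 10],
]
binary = binarize(scan)
```

Real pipelines usually rely on an imaging library for this step; the pure-Python version only shows the idea.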

Character detection

During this phase, optical character recognition algorithms perform different manipulations to recognize letters and numbers. There are two main approaches:
  • Pattern recognition: OCR algorithms are trained on a wide variety of fonts, text formats, and handwriting styles to compare distinct characters from the input file to what they have learned.
  • Feature recognition: some algorithms benefit from known character properties, such as crossed and curved lines, to identify characters in input files. For example, a letter “H” is identified as two vertical lines and one crossing horizontal line. OCR algorithms powered by neural networks (NN) use a different logic where the first NN layers aggregate pixels from the input file to create a low-level feature map of the image.
After detecting characters, the program converts them to American Standard Code for Information Interchange (ASCII) to facilitate further manipulations.
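As a toy illustration of the pattern recognition approach and the final conversion to ASCII, the sketch below compares a small glyph against stored templates and returns the best match. The 3x5 templates and function names are hypothetical; production engines learn from far larger and more varied samples.

```python
# Hypothetical 3x5 templates; "X" marks ink, "." marks background.
TEMPLATES = {
    "H": ["X.X", "X.X", "XXX", "X.X", "X.X"],
    "I": ["XXX", ".X.", ".X.", ".X.", "XXX"],
    "L": ["X..", "X..", "X..", "X..", "XXX"],
}


def similarity(glyph, template):
    """Count the cells where the glyph and the template agree."""
    return sum(
        glyph_cell == template_cell
        for glyph_row, template_row in zip(glyph, template)
        for glyph_cell, template_cell in zip(glyph_row, template_row)
    )


def recognize(glyph):
    """Return the template character that agrees with the glyph in the most cells."""
    return max(TEMPLATES, key=lambda char: similarity(glyph, TEMPLATES[char]))


# An "H" with one pixel lost to noise in the bottom-right corner.
noisy_h = ["X.X", "X.X", "XXX", "X.X", "X.."]
character = recognize(noisy_h)
ascii_code = ord(character)  # numeric ASCII code of the recognized character
```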

Post-processing

The output can be as basic as a character string or a file. More advanced OCR solutions can retain the original page structure and create a PDF file with searchable text.
No tool so far guarantees 100% accuracy across different input files, but some optical character recognition algorithms achieve an impressive 99.8% accuracy on familiar texts. Handwriting significantly compromises the results, and with poor training or unfamiliar texts the error rate can be as high as 20%. Hence, users need to continuously monitor, proofread, and correct the OCR algorithms’ output, especially when a new type of document enters the pipeline.
The post-processing phase can also involve natural language processing (NLP) and other AI techniques for data verification. AI can not only correct the text but also catch mistakes in calculations. Let’s assume that while processing an invoice, an OCR algorithm identified the total sum to be $500. AI can verify this by adding up all the expenses, and if they don’t amount to $500, it can notify a human employee to review this particular case.
If you want to improve the algorithm’s quality, you can experiment with open-source OCR libraries, such as Tesseract, that use their own dictionary for character segmentation. Another approach is to create a specialized glossary of terms recurring in your domain. Reviewers’ corrections can also serve as input for another training session of the optical character recognition algorithm.
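The invoice check described in this section can be sketched as a simple rule. The function name and the sample amounts are hypothetical, and a real verification step would likely combine several such checks.

```python
from decimal import Decimal


def check_invoice_total(line_items, extracted_total):
    """Compare the sum of the extracted line items with the extracted total.

    Amounts are passed as strings, as they would come out of OCR, and are
    converted to Decimal to avoid floating-point rounding problems.
    Returns (matches, computed_total).
    """
    computed = sum((Decimal(amount) for amount in line_items), Decimal("0"))
    return computed == Decimal(extracted_total), computed


# The OCR step read a total of $500, but the line items add up to less.
matches, computed = check_invoice_total(["120.00", "250.50", "99.50"], "500.00")
if not matches:
    print(f"Flag for human review: items add up to {computed}, total reads 500.00")
```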

How can OCR algorithms benefit your business?

Here is what optical character recognition solutions can do for you:
  • Cut down costs: converting files to the digital format and automating data entry reduces costs in terms of employee hours
  • Increase customer satisfaction: this technology will enable people to update their personal information remotely by scanning identification documents instead of physically visiting a bank or any other establishment
  • Offer cheaper backup options: there is no need to store paper-based documents together with their duplicates and triplicates, which consumes expensive physical storage units
  • Facilitate translation among different languages: some OCR tools have the ability to translate documents from one language into another
  • Automate workflows: searching through digital files with a good management system in place is faster than dealing with paper documents. Fewer processes will be put on hold while someone looks for a lost physical file. If you are interested in a more comprehensive automation solution, you can utilize intelligent process automation services that include OCR and other advanced capabilities.

OCR solutions available on the market

If you are thinking about incorporating OCR features into your IT systems, you’ve got several options to choose from.

Open-source optical character recognition algorithms

There are several open-source OCR algorithms that businesses can adapt to their needs. These solutions are easier to customize as their source code is universally accessible. However, there is no central authority. Developers of open-source solutions don’t assume responsibility and don’t offer further support. Hence, the code’s quality can be questionable. This option is more suitable for companies with strong IT departments capable of fixing any malfunctions. Alternatively, you can reach out to machine learning consultants who can customize and retrain this software for you. Here are some commonly used open-source OCR solutions:

Tesseract

The Tesseract open-source engine is one of the most popular OCR tools, and it is believed to be among the most accurate free tools. It was developed by Hewlett-Packard between 1985 and 1994. Starting from 2006, this platform was managed and further developed by Google. Tesseract is written in C++, but it offers wrappers in Java, Python, Swift, Ruby, R, and a few more common programming languages. The tool operates using a command line and doesn’t have a graphical user interface. However, there are several GUI options that you can deploy to make this solution user friendly. One example is gImageReader. This interface is developed using Python and supports different image formats, including PNG, GIF, and PNM.
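Early versions of Tesseract lacked page layout analysis and output formatting, and the command line interface required all images to be submitted in TIFF format. Newer releases have lifted some of these restrictions, so check the documentation for the version you plan to deploy. Additionally, this OCR solution is not optimized for GPU and doesn’t allow batch processing.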
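Because Tesseract is driven from the command line, a script can call it as an external program. The sketch below is a minimal example, assuming a Tesseract 4 or later executable on the PATH (older releases spell the page segmentation flag differently); the helper names and file names are hypothetical.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, lang="eng", psm=3):
    """Build the argument list: tesseract <image> <output base> -l <lang> --psm <mode>.

    psm 3 is Tesseract's fully automatic page segmentation mode.
    """
    return ["tesseract", image_path, output_base, "-l", lang, "--psm", str(psm)]


def run_ocr(image_path, output_base, lang="eng"):
    """Run Tesseract and return the path of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return output_base + ".txt"  # Tesseract appends .txt to the output base


command = build_tesseract_command("scan.tif", "scan_out")
```

Calling `run_ocr("scan.tif", "scan_out")` would then produce `scan_out.txt`, provided the image exists and the English language data is installed.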

OCRopus

OCRopus was originally written in Python and now has a separate C++ version. It is supported by Google and was used as an OCR engine for Google ReCaptcha algorithm.
OCRopus has three main features:
  • Physical layout analysis: identifies text blocks, columns, and lines and determines the reading order. For example, to detect columns, it uses a maximal whitespace rectangle algorithm to detect white spaces between columns.
  • Line recognition: recognizes lines within each block or column, whether they are vertical or left-to-right lines.
  • Statistical language modeling: uses dictionaries and stochastic grammar to resolve the problem of missing and unidentified letters.
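The idea of using a dictionary to repair misread letters can be sketched with Python's standard `difflib` module. This is a much simpler stand-in for the statistical language modeling described above, not how OCRopus implements it; the vocabulary, cutoff, and function names are assumptions for illustration.

```python
import difflib

# Hypothetical domain vocabulary.
VOCABULARY = ["invoice", "total", "amount"]


def correct_word(word, vocabulary, cutoff=0.75):
    """Replace a word with its closest vocabulary entry if one is similar enough."""
    if word.lower() in vocabulary:
        return word
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word  # leave unknown words untouched


def correct_text(text, vocabulary):
    """Apply word-level correction to whitespace-separated OCR output."""
    return " ".join(correct_word(word, vocabulary) for word in text.split())


corrected = correct_text("inv0ice tota1 xyz", VOCABULARY)
```

A lower cutoff repairs more errors but also raises the risk of replacing a correct but unknown word, so the value is worth tuning on real output.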

EasyOCR

Jaided AI, an optical character recognition company, built the EasyOCR package using Python and the PyTorch library with its deep learning models. It supports over 80 languages, including Cyrillic scripts, Chinese, and Arabic, and this base keeps expanding. As a part of the implementation roadmap, there are plans to add configurable options for recognizing handwritten text.

Commercial OCR solutions

Software as a service (SaaS) solutions allow you to benefit from high-quality algorithms and receive full vendor support. Depending on the selected platform, you might be able to retrain the OCR algorithm on your dataset and even further adapt it to your unique needs.

Amazon Textract

Amazon Textract is a machine learning-based service that extracts printed and handwritten text from scanned documents. It can work with unstructured data and with formatted text, such as forms and tables. The solution uses AI and doesn’t need any extra configuration steps or templates. This service is secure and compliant with data protection regulations, such as HIPAA and GDPR. Amazon Textract offers four APIs that customers can use and pay for accordingly:
  • Detect document text API: extracts unstructured printed text and handwriting from scans. Costs $0.0015 per page for the first one million pages; afterwards, the price decreases.
  • Analyze document API: works with structured data. Extracts text from forms and tables. Clients will pay $0.015 per page when processing tables, and $0.05 per page in the case of forms. The price decreases after the first million pages.
  • Analyze expense API: works with invoices. This service has a common taxonomy of receipt-related fields. For example, it can recognize invoice number. Users will pay $0.01 per page for the first million pages.
  • Analyze ID API: understands the context of identity documents, such as driver’s licenses and passports, and can extract text from specific fields. You can benefit from this service for $0.025 per page for the first 100,000 pages.
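To see what these rates mean for a monthly bill, the sketch below multiplies a page count by the first-tier prices quoted in the list above. The dictionary keys and function name are hypothetical, the Analyze ID rate is assumed to be per page, and the lower prices beyond the first tier are not modeled because the article doesn't state them. Check the vendor's current pricing before budgeting.

```python
from decimal import Decimal

# First-tier prices per page, as quoted in the list above.
PRICE_PER_PAGE = {
    "detect_text": Decimal("0.0015"),
    "analyze_tables": Decimal("0.015"),
    "analyze_forms": Decimal("0.05"),
    "analyze_expense": Decimal("0.01"),
    "analyze_id": Decimal("0.025"),  # assumed to be a per-page rate
}

# Number of pages covered by the first pricing tier.
FIRST_TIER_PAGES = {
    "detect_text": 1_000_000,
    "analyze_tables": 1_000_000,
    "analyze_forms": 1_000_000,
    "analyze_expense": 1_000_000,
    "analyze_id": 100_000,
}


def estimate_cost(api, pages):
    """Estimate the cost in USD for a page count within the first pricing tier."""
    if pages > FIRST_TIER_PAGES[api]:
        raise ValueError("page count exceeds the first tier; later tiers are not modeled")
    return PRICE_PER_PAGE[api] * pages


text_cost = estimate_cost("detect_text", 10_000)   # 10,000 pages of plain text
forms_cost = estimate_cost("analyze_forms", 2_000)  # 2,000 form pages
```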

Google Cloud Vision

Google offers Vision API, which can extract printed and handwritten text from documents and images. It contains two features for optical character recognition:
  • TEXT_DETECTION: extracts text from images, like photographs of traffic signs
  • DOCUMENT_TEXT_DETECTION: captures text in documents and images. It differs from the previous feature in that its response is optimized for dense text.
Both features allow users to process the first 1,000 units per month for free. After that, you will pay $1.50 per 1,000 units. This price will decrease as you submit more units per month.

Microsoft Azure Computer Vision

Microsoft offers OCR services as a part of its generic computer vision API, not as a stand-alone feature. So, you pay for the whole package, which, in addition to optical character recognition, includes identification of celebrities, landmarks, brands, and general object detection. This API will cost you $1 per 1,000 transactions for the first million units. Afterwards, the price decreases to $0.65 per 1,000 transactions, and will keep declining as you submit more content.
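The Google and Microsoft prices quoted above are tiered, so the bill depends on how many units fall into each tier. The sketch below assumes each tier's rate applies only to the units inside that tier and models only the tiers stated in this article; further volume discounts are left out. The names are hypothetical.

```python
from decimal import Decimal

# Each tier is (number of units in the tier, price per 1,000 units).
# None means "all remaining units".
GOOGLE_VISION_TIERS = [(1_000, "0"), (None, "1.50")]
AZURE_VISION_TIERS = [(1_000_000, "1.00"), (None, "0.65")]


def tiered_cost(units, tiers):
    """Return the cost in USD, charging each tier's rate on the units inside it."""
    remaining = units
    cost = Decimal("0")
    for tier_size, price_per_thousand in tiers:
        in_tier = remaining if tier_size is None else min(remaining, tier_size)
        cost += Decimal(in_tier) / 1000 * Decimal(price_per_thousand)
        remaining -= in_tier
        if remaining == 0:
            break
    return cost


google_cost = tiered_cost(5_000, GOOGLE_VISION_TIERS)      # first 1,000 units are free
azure_cost = tiered_cost(1_500_000, AZURE_VISION_TIERS)    # spans both tiers
```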

Top OCR use cases in different industries

Optical character recognition algorithms are gaining traction in different industries. Below are some of the most prominent OCR applications.

OCR in banking

Banking institutions use loads of paper-based documents in their workflows. These include checks, customer records, loan applications, bank statements, etc. Adopting OCR algorithms allows employees to store and access all these documents digitally and prevents paperwork loss and damage.

Check handling

One example of OCR in this sector is using banking apps to deposit paper-based checks digitally. These solutions deploy optical character recognition algorithms to identify relevant fields in checks and perform operations accordingly without the need for an employee to transfer all this data manually. Additionally, such apps can perform signature validation against the existing database and clear the check immediately.

Customer onboarding

Instead of having an employee verify clients’ identity manually, OCR-powered solutions can extract and validate all relevant information from the person’s passport and other ID documents. This allows for instant verification and improves customer experience.

Client information updating

Instead of having to visit or call a bank, clients can scan their documents to update information automatically with the help of OCR. For example, Alfa-Bank collaborated with Smart Engines to enhance their banking app with optical character recognition capabilities. With this new feature, customers can place ID documents in front of their smartphone’s camera, confirm the extracted data, and update their information in the banking system.

OCR in healthcare

Similar to the banking sector, healthcare organizations accumulate many paper documents, such as X-ray scans, test results, treatment plans, and so on. OCR algorithms help digitize these files to prevent loss of physical documents and reduce effort wasted on handling paper files manually. Additionally, some OCR solutions that recognize handwritten text can process patient enrollment papers and prescriptions.

Medical claims system

There are software vendors who specialize in OCR-enabled medical claim processing. One such company is OCR Solutions. It developed a product that can scan, verify, and correctly route medical claims for further handling. This program is trained and configured to work with common formats, such as Dental Claim Forms and CMS-1500, among others.

Fax

Many medical facilities still rely on fax. Optical character recognition solutions can convert incoming material into an accessible, digitally stored format.

Invoicing

OCR-powered solutions help healthcare organizations digitize invoices and file them correctly. One OCR example comes from San Francisco-based Nanonets, which offers an OCR-powered solution that specializes in invoice processing. The company claims its software will reduce invoice data entry time from three minutes per invoice to just 30 seconds.

OCR in retail

Optical character recognition algorithms enable retail employees to save time on processing purchase orders, invoices, packing lists, and other documents. These solutions can also extract serial numbers from products’ barcodes and enable customers to scan their vouchers and extract serial codes.

ID scanning

Store employees may need to scan personal information for many reasons, such as age verification, filling in information for customer loyalty programs, and more. OCR vendors capitalize on this opportunity. For instance, OCR Solutions, based in Florida, developed idMax, OCR-powered software that can scan ID documents, extract relevant fields, and populate the retailer’s database with corresponding information. idMax can be installed locally or accessed through the cloud.
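Once an ID document has been converted to text, extracting the relevant fields is often a matter of pattern matching. The sketch below is purely illustrative: the field labels, the text layout, and the function names are hypothetical and do not describe idMax or any real ID format.

```python
import re
from datetime import date

# Hypothetical labels as they might appear in the OCR output of an ID card.
FIELD_PATTERNS = {
    "name": re.compile(r"^NAME:\s*(.+)$", re.MULTILINE),
    "date_of_birth": re.compile(r"^DOB:\s*(\d{4}-\d{2}-\d{2})$", re.MULTILINE),
    "document_number": re.compile(r"^ID NO:\s*([A-Z0-9]+)$", re.MULTILINE),
}


def extract_fields(ocr_text):
    """Return a record with one entry per known field (None if not found)."""
    record = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(ocr_text)
        record[field] = match.group(1).strip() if match else None
    return record


def age_on(date_of_birth, today):
    """Compute the age in full years on a given day, e.g. for age verification."""
    born = date.fromisoformat(date_of_birth)
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


sample_text = "NAME: JANE DOE\nDOB: 1990-04-17\nID NO: X1234567\n"
record = extract_fields(sample_text)
age = age_on(record["date_of_birth"], date(2024, 4, 16))
```

A field that comes back as `None` is a natural trigger for manual review before anything is written to the database.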

Challenges of adopting an OCR solution in your business

If you decide to deploy OCR algorithms to improve your operations, there are several aspects that you need to consider:
  • Input material: make sure all input files are suitable for the OCR algorithm. For example, the files need to be free of damage that can interfere with the algorithm’s ability to recognize their content, the contrast needs to be high enough, and the pages need to be properly aligned. Some algorithms have powerful pre-processing capabilities and can resolve some of these issues for you. If this is not the case, it may be a good idea to invest in a high-quality scanner and ensure proper page alignment.
  • Training dataset: if you decide to train or retrain optical character recognition algorithms, you need to make sure the data you plan to use faithfully represents your input material and contains enough correct annotations. If your training dataset is too small, or does not contain adequate annotations, the algorithm will not produce desired results. Also, during training, you need to pay special attention to similar characters/symbols. For example, numbers 2 and 7 may look rather similar, especially if the algorithm is expected to work with handwritten text. Data scientists need to cover such distinctions in the training data. Another example can be using OCR algorithms to detect and capture license plates on cars. You need to make sure your algorithm doesn’t go for a custom sticker with text on the back of a car mistaking it for a license plate.
  • Handwritten text: handwriting brings numerous additional OCR challenges. Writing styles vary widely between people, and even one person’s writing can be inconsistent. Gathering a reliable, representative training dataset is a challenge as you need to account for all the different styles. Cursive handwriting is particularly challenging to process. Also, while printed text comes in a straight line, handwriting tends to have variable rotations, which complicates matters even more.
  • Scaling: if you increase the number of users or the number of requests per time slot, the system can collapse, especially if you are using an open-source solution and relying on your own computing power. In the case of commercial OCR products that run in the cloud, you can arrange and pay for more capacity.
  • OCR algorithm’s performance monitoring: after deployment, the algorithm’s performance might start degrading due to different factors. One example is the change in distribution between the training data and the actual production data. This occurs when the model starts working on datasets it wasn’t prepared for, such as different fonts or characters with unusual inclines. These changes will affect the model’s output over time, and you need to detect these issues and retrain the model accordingly to maintain its initial accuracy level.
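One way to monitor performance after deployment is to compare a sample of OCR output with proofread text and track the character error rate (CER). The sketch below uses a standard edit-distance calculation; the function names, the baseline, and the margin are assumptions to adapt to your own pipeline.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (char_a != char_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def character_error_rate(ocr_output, reference):
    """Edit distance divided by the length of the proofread reference text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    return edit_distance(ocr_output, reference) / len(reference)


def needs_retraining(samples, baseline_cer, margin=0.02):
    """Flag the model when the average CER exceeds the baseline by more than the margin.

    samples is a list of (ocr_output, proofread_reference) pairs.
    """
    rates = [character_error_rate(output, reference) for output, reference in samples]
    return sum(rates) / len(rates) > baseline_cer + margin


cer = character_error_rate("T0tal: 5OO", "Total: 500")
flag = needs_retraining([("T0tal: 5OO", "Total: 500")], baseline_cer=0.002)
```

Running such a check on a regular sample of production documents makes a shift in fonts or layouts visible before it affects downstream systems.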

To sum up

Optical character recognition algorithms have the potential to speed up your business processes. However, there are associated challenges to consider. The selected algorithm is likely to need retraining, and it’s a tedious task to properly annotate a large dataset. You also need to think about potential scaling as your business expands. Adopting an open-source solution seems tempting price-wise, but it comes with disadvantages, such as lack of support and updates, which can open security loopholes. Commercial solutions are more reliable in this regard but can be costly and hard to customize. If you are unsure of how to proceed and which OCR solution is the best fit for your business, don’t hesitate to reach out. At ITRex, we will be happy to conduct a thorough evaluation of your business needs to determine the best OCR option. We can also help you retrain the selected solution and integrate it into your system, or build a custom OCR algorithm if needed.