Image recognition is one of the technologies used to automate business processes. OCR lets you improve processes whenever you work with a large number of documents, invoices and PDF files. How can you use this technology in practice to improve the flow of documents in a company? I will try to show the process and its effects from the perspective of a programmer who helps clients from various industries implement and improve Robotic Process Automation (RPA) solutions.
What is OCR technology?
OCR (Optical Character Recognition) is a set of techniques used to recognize characters and entire texts in a raster graphic file (i.e. a bitmap). Put simply, OCR allows us to take handwriting or scanned text (in general, any text stored as an image) and convert it into digital, machine-readable text. Recognition of handwriting is possible thanks to methods from the field of pattern recognition, which is a branch of Artificial Intelligence. So, every time we use image recognition, we use more advanced technology than we may think.
Use of OCR technology in RPA
Robotic Process Automation is a field that uses various information technologies to automate business processes. At some stage, in many processes from diverse areas of a company’s operations, scans of documents are used (e.g. as input data). Most of these documents are delivered or generated as PDF files, and employees need to extract specific data from them. In such a situation, you can use automation based on image recognition. The format of the data provided is important. If the documents are generated as text and have a regular structure, developers can analyze them with regular expressions, which are used to validate text data or to search the text for data matching a pattern. However, if the document is scanned as an image, the only way to read the data automatically is to use OCR technology.
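For the text-based case, the regular-expression approach can be sketched as follows. This is a minimal illustration rather than code from any project: the sample text, the field labels ("Order No", "Order Date", "Total") and the value formats are all assumptions made up for the example.

```python
import re

# Hypothetical text layer of a PDF; labels and value formats are assumptions.
text = """Order Acknowledgment
Order No: PO-2023-00417
Order Date: 2023-05-12
Total: 1,250.00 EUR
"""

# One pattern per field; each captures the value in a named group.
PATTERNS = {
    "order_number": re.compile(r"Order No:\s*(?P<value>[A-Z]{2}-\d{4}-\d{5})"),
    "order_date": re.compile(r"Order Date:\s*(?P<value>\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total:\s*(?P<value>[\d,]+\.\d{2})"),
}


def extract_fields(document_text):
    """Return field name -> matched value, or None when a pattern is not found."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(document_text)
        result[name] = match.group("value") if match else None
    return result


fields = extract_fields(text)
print(fields)
```

A field that comes back as `None` signals that the document does not follow the expected layout, which is a useful validation result in its own right.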
Efficiency of OCR
Can every document be read using OCR? The most important factor is whether the data read is complete and correct; otherwise it will be useless. Reading efficiency depends on the quality of the documents. If, for example, the documents provided are of poor quality, the text is not high contrast, or the documents contain handwriting or are reversed, this will significantly affect the quality of reading. This is where advanced text recognition algorithms come in handy, and there are many companies on the market that offer ready-made software or advanced OCR algorithms. When deciding on such a solution, you need to make sure that the software has an interface that allows you to submit a document for processing and receive the processed data from OCR. To make sure the recognition efficiency is adequate, RPA developers automate testing of the quality of the data returned by OCR.
Practical use of OCR in RPA – a case study
As a developer, I support companies that want to speed up document processing. In one of the projects, I implemented a solution for a client who processed large volumes of orders in the form of PDFs and wanted to expedite and streamline this process. The project was in the area of procurement, and the OCR solution was delivered by one of the leading OCR tool providers. Here’s how the process looked on the client side:
- The Vendor provides the Order Acknowledgment in PDF format. This document is sent to a dedicated e-mail box.
- Employees on the business side receive the document, read the data and validate it by comparing it with the data in the ERP system.
- If the document is validated, the read data is processed in several ERP transactions.
The challenge in the project was to effectively read specific fields. From the submitted order confirmation, 15 fields had to be extracted, 5 of which contained key data necessary for the validation and processing of a given document.
1. The provider of the solution committed to a figure of 90% accuracy of the read data, especially the key data.
2. The client’s representatives provided data patterns which were compared with the OCR extract.
3. The project was divided into rounds. During each round, the software provider adjusted the OCR algorithm to the specifications of the documents sent to improve the efficiency of data recognition. In the initial phases of the project, the recognition efficiency of some data, especially text data, was very low, often below 60%. The algorithm worked better for numerical data. The first observations were as follows:
- The best recognition efficiency was achieved by numerical data and standardized data, such as postal codes.
- I noticed the biggest difficulties in recognizing text data. For example, OCR returned the number 8 instead of the letter B, and the letter O instead of the number 0.
- Date recognition was also an issue, as the dates in the documents were in a variety of non-standard formats.
4. Developers created a robot that “read” e-mails from a dedicated mailbox, and if the e-mail contained an attachment in the PDF format, it was sent to the OCR tool for processing.
In order to send data to OCR and receive data from it, a web interface (WebAPI) was made available. In one sub-process, the robot sent a given document for processing. In the next sub-process, the processed data was received in JSON, a readable format for representing structured data.
5. A sub-process was created to transform data into a format containing not only the values of individual fields on the document, but also information about the quality of the data expressed as a percentage, the so-called Confidence Score. The Confidence Score was presented both for individual fields and for the entire document.
6. On the basis of the Confidence Score, the robot classified a given document for further processing. If the Confidence Score of the entire document was below 90%, it required manual validation and data completion by employees. If the Confidence Score was above 90%, the document could be processed in the ERP system.
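The classification in steps 5 and 6 can be sketched in a few lines. This is only an illustration under stated assumptions: the JSON shape (`document_confidence`, `fields`, `value`, `confidence`) and the route names are hypothetical, since the schema of the OCR tool's response is not given here. A score of exactly 90% is treated as sufficient, which is also an assumption.

```python
import json

THRESHOLD = 90.0  # percent, the limit used in the process described above

# Hypothetical response body; the real tool's JSON schema is not known.
response_body = """
{
  "document_confidence": 93.5,
  "fields": {
    "order_number": {"value": "PO-2023-00417", "confidence": 98.0},
    "delivery_date": {"value": "12.05.2023", "confidence": 71.0}
  }
}
"""


def classify(ocr_json, threshold=THRESHOLD):
    """Choose a route for the document and list the fields scoring below the threshold."""
    data = json.loads(ocr_json)
    low_fields = sorted(
        name
        for name, field in data["fields"].items()
        if field["confidence"] < threshold
    )
    # Assumption: a document score equal to the threshold is processed automatically.
    if data["document_confidence"] < threshold:
        route = "manual_validation"
    else:
        route = "erp_processing"
    return {"route": route, "low_confidence_fields": low_fields}


decision = classify(response_body)
print(decision)
```

Returning the low-scoring fields alongside the route means that an employee who does have to validate a document can go straight to the doubtful values.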
Additionally, to test the effectiveness of the OCR algorithm, the robot extracted the data returned by OCR, which was then compared with the pattern data provided by the client. To automate the testing process, I created a tool whose algorithm looked up the test data and compared it with the OCR data.
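A comparison of this kind can be sketched as below. It is not the tool from the project, only a minimal version of the idea: it assumes exact string matching after trimming whitespace, and the document IDs, field names and values are invented for the example.

```python
from collections import defaultdict


def field_accuracy(expected_docs, ocr_docs):
    """Return, per field, the percentage of documents where OCR matched the pattern data."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for doc_id, expected in expected_docs.items():
        actual = ocr_docs.get(doc_id, {})
        for field, value in expected.items():
            totals[field] += 1
            # A missing field counts as a miss.
            if (actual.get(field) or "").strip() == value.strip():
                hits[field] += 1
    return {f: round(100.0 * hits[f] / totals[f], 1) for f in totals}


# Invented sample data: pattern values from the client vs. values returned by OCR.
expected_docs = {
    "doc1": {"order_number": "AB1208", "postal_code": "00-950"},
    "doc2": {"order_number": "CB7730", "postal_code": "30-001"},
}
ocr_docs = {
    "doc1": {"order_number": "A81208", "postal_code": "00-950"},  # B read as 8
    "doc2": {"order_number": "CB7730", "postal_code": "30-001"},
}

accuracy = field_accuracy(expected_docs, ocr_docs)
print(accuracy)
```

Running the same comparison after each round of algorithm adjustments makes it easy to see which fields improve and which still fall short of the target.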
After several months and 6-7 rounds of customization of the OCR algorithm by the tool provider, the recognition accuracy for many fields, especially the key data fields, was close to 90%. The testing tool generated accurate percentages for each of the fields read. As a result, more documents were processed automatically and it was possible to automate the invoice validation process, which was previously performed manually.
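The misreads noted in the early rounds (8 for B, O for 0) and the varied date formats can also be partly handled on the robot side, after recognition. The sketch below is an assumption about how such post-processing might look, not a description of what was done in the project, where the provider adjusted the OCR algorithm itself. The character mapping and the list of date formats are illustrative.

```python
from datetime import datetime

# Assumed mapping of letters that OCR may return in place of digits.
LETTER_TO_DIGIT = str.maketrans({"O": "0", "B": "8"})

# Assumed date layouts; ambiguous dates such as 03/04/2023 need a per-vendor rule.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%d/%m/%Y")


def fix_numeric(value):
    """Replace look-alike letters with digits; only safe for fields known to be numeric."""
    return value.upper().translate(LETTER_TO_DIGIT)


def normalize_date(value):
    """Return the date in ISO format, or None if no known layout matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


print(fix_numeric("1O5B"))
print(normalize_date("12.05.2023"))
```

The mapping must only be applied to fields that are known to contain digits; applied to free text, it would introduce errors instead of removing them. A date that returns `None` is a good candidate for manual validation.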
OCR in RPA – the greatest benefits
- Unburdening data validation departments and expediting the process
- Faster circulation of documents
- Automation of more processes thanks to the use of advanced technology
As you can see from the example above, OCR technology has a highly practical use in robotic process automation. The key factors are the efficiency and precision of text recognition, which affect the final success of a given project. More advanced and extensive OCR solutions also use Machine Learning, thanks to which the effectiveness of text recognition improves over time as the number of documents delivered increases. Technology gives us amazing possibilities and allows us to achieve effects that would be impossible for humans. RPA specialists and development teams are constantly working to improve these solutions so that clients can achieve increasingly better results thanks to intelligent solutions.