The client is a major European carrier. Established in 1928, it is one of the world's oldest operating airlines. With a fleet of almost 100 aircraft, the client flies to over 120 destinations across Europe, Asia and North America.
The client serves millions of passengers and cooperates with numerous business partners. Its ground handling office processes over 10,000 invoices a month, delivered by nearly 600 independent companies and contractors, each using its own format and language. Processing such a large volume of varied documents is time-consuming and involves multiple people and teams. It also delays the analysis of the data and any resulting actions, and makes the entire process error-prone.
The client wanted to explore the possibilities for optimising this process. They engaged Objectivity to find a solution that would allow for automated verification of documents and easier error detection. The Objectivity team was tasked with building a Proof of Concept (PoC) on Microsoft’s technology stack and using it to validate documents from different sources and in a variety of formats.
Objectivity decided to use Azure Form Recognizer for document extraction. The team built a tool to visualise the structure of PDF files, which was needed to test the results from Form Recognizer against the actual structure of the invoices. In the next step, over 200 sample invoices were checked and collected as the reference data set for further testing. The team worked out methods for pre- and post-processing the data extracted from the collected documents and for correcting the tabular data structure where needed. The solution was then tested and fine-tuned to achieve over 95% accuracy in data detection on the test set of invoices.
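To illustrate how extraction results can be scored against a reference data set, here is a minimal Python sketch. The case study does not say how the accuracy figure was calculated, so the exact-match metric, the `field_accuracy` helper and the sample values below are all assumptions made for illustration.

```python
from typing import Dict, List

Invoice = Dict[str, str]


def field_accuracy(extracted: List[Invoice], reference: List[Invoice]) -> float:
    """Share of reference fields whose extracted value matches exactly.

    Hypothetical metric: the PoC's real scoring method is not documented.
    """
    total = 0
    correct = 0
    for got, expected in zip(extracted, reference):
        for field, expected_value in expected.items():
            total += 1
            if got.get(field) == expected_value:
                correct += 1
    return correct / total if total else 0.0


# Made-up sample values, for illustration only.
reference = [
    {"InvoiceId": "A-001", "InvoiceDate": "2021-03-01", "InvoiceTotal": "120.00"},
    {"InvoiceId": "B-417", "InvoiceDate": "2021-03-04", "InvoiceTotal": "89.50"},
]
extracted = [
    {"InvoiceId": "A-001", "InvoiceDate": "2021-03-01", "InvoiceTotal": "120.00"},
    {"InvoiceId": "B-417", "InvoiceDate": "2021-03-04", "InvoiceTotal": "39.50"},
]

accuracy = field_accuracy(extracted, reference)
print(f"{accuracy:.1%}")
```

An exact-match comparison is the strictest choice. It only makes sense if both sides are normalised to the same format before scoring.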
The solution included a digitalisation mechanism with four main functions:
- Initial pre-processing of documents and correcting basic errors in the PDF files.
- Utilising Azure Cognitive Services and the Form Recognizer to extract data such as: Invoice Date, Invoice ID, Invoice Total, SubTotal and Total Tax, as well as tabular data.
- Post-processing of the extracted data, including data parsing, format and localisation unification, and currency detection.
- Restoring previously missing data.
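The post-processing step above can be sketched in Python as follows. This is an assumption-laden illustration, not the PoC's actual code: `parse_amount`, `detect_currency`, the currency tables and the separator heuristics are all hypothetical, and the input is assumed to be a single amount field that has already been extracted.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical lookup tables; a real system would cover more currencies.
ISO_CODES = ("EUR", "USD", "GBP", "PLN", "CHF")
CURRENCY_SYMBOLS = {"€": "EUR", "$": "USD", "£": "GBP", "zł": "PLN"}


def detect_currency(text: str) -> Optional[str]:
    """Return an ISO 4217 code found in the text, or None."""
    upper = text.upper()
    for code in ISO_CODES:
        # Match the code only when it is not part of a longer word.
        if re.search(rf"(?<![A-Z]){code}(?![A-Z])", upper):
            return code
    lower = text.lower()
    for symbol, code in CURRENCY_SYMBOLS.items():
        if symbol in lower:  # note: "$" is ambiguous; USD is assumed here
            return code
    return None


def parse_amount(text: str) -> Decimal:
    """Normalise amounts such as '1.234,56' or '1,234.56' to a Decimal."""
    cleaned = re.sub(r"[^\d.,\-]", "", text)  # drop symbols, letters, spaces
    last_dot, last_comma = cleaned.rfind("."), cleaned.rfind(",")
    if last_dot >= 0 and last_comma >= 0:
        # Both separators present: the right-most one marks the decimals.
        decimal_sep = "." if last_dot > last_comma else ","
    else:
        sep = "." if last_dot >= 0 else ","
        tail = cleaned.rsplit(sep, 1)[-1]
        # Assumption: a repeated separator, or exactly three trailing
        # digits, means thousands grouping rather than decimals.
        is_grouping = cleaned.count(sep) > 1 or len(tail) == 3
        decimal_sep = None if is_grouping else sep
    for grouping_sep in {".", ","} - {decimal_sep}:
        cleaned = cleaned.replace(grouping_sep, "")
    if decimal_sep:
        cleaned = cleaned.replace(decimal_sep, ".")
    return Decimal(cleaned)


for raw in ("1.234,56 €", "$1,234.56", "1 234,56 zł", "12.500 CHF"):
    print(raw, "->", parse_amount(raw), detect_currency(raw))
```

A value such as `1,234` is ambiguous without knowing the supplier's locale. The sketch resolves it as thousands grouping, but a production system would more likely take the locale from the supplier's profile.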
Moreover, the structure of tabular data was corrected where the formatting was overly complicated or there were discrepancies in headers or footers. This correction leveraged machine learning algorithms such as DBSCAN.
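The case study does not describe how DBSCAN was applied. One plausible use, shown here purely as an assumption, is clustering the horizontal positions of extracted words to recover table columns. The sketch below is a small pure-Python DBSCAN for one-dimensional points; `dbscan_1d`, the sample coordinates and the parameter values are hypothetical.

```python
from typing import List, Optional

NOISE = -1


def dbscan_1d(values: List[float], eps: float, min_samples: int) -> List[int]:
    """Minimal DBSCAN for 1-D points; returns a label per value (-1 = noise)."""
    n = len(values)
    labels: List[Optional[int]] = [None] * n  # None means "not visited yet"

    def neighbours(i: int) -> List[int]:
        # All points within eps of point i, including i itself.
        return [j for j in range(n) if abs(values[i] - values[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: join, but do not expand
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # core point: keep expanding
    return [label if label is not None else NOISE for label in labels]


# Hypothetical left-edge x-coordinates of words on one invoice page.
x_positions = [50.0, 51.5, 49.8, 200.2, 201.0, 199.5, 350.0, 351.2, 349.0, 120.0]
column_labels = dbscan_1d(x_positions, eps=5.0, min_samples=2)
print(column_labels)
```

Words that share a label fall into the same column, while a stray word such as the one at `120.0` is marked as noise and can be reviewed separately. DBSCAN suits this kind of task because the number of columns does not have to be known in advance.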
The PoC gave the Client a clear picture of the improvements that document digitalisation can bring to their processes, and of how well the right Microsoft technologies address their needs. It allowed the Client’s teams to look into the possibilities for automating their invoice processing and proved that a solution like the one designed by Objectivity can handle their extensive data sets efficiently and accurately. Once such a solution is deployed to production, manual correction will only be needed for data the application marks as unreliable. The automated transfer of data to the Client’s SAP system will significantly accelerate the entire process and allow for extended data analysis. The PoC empowered the Client to make informed decisions regarding the direction and scope of their process digitalisation, in accordance with their business priorities.