Extract Accounting Data with Artificial Intelligence


The question most bookkeepers and accountants are asking is how to extract data with artificial intelligence or machine learning, rather than relying only on basic data extraction / OCR techniques.

The scenario is straightforward. You are a bookkeeper or accountant. You might work for a company that offers outsourced bookkeeping or accounting services, or in the internal finance team of a business. It doesn’t matter.

What does matter is whether you’re doing cash-based or accrual accounting. If you’re doing cash-based accounting, you’re primarily creating the bookkeeping entries from bank statements or credit card statements. That means you don’t have a huge amount of documentation to process into bookkeeping entries.

However, if you are doing accrual-based accounting, you need to look at each document (and sometimes you first need to reconcile groups of documents), and then work out the bookkeeping entry from each of these documents.

You can either do this data extraction manually, or you can use one of the technologies available to help you with it.

Artificial Intelligence around data extraction is a confusing term

There is so much information and misinformation around data extraction using AI. If you do a couple of Google searches, you’ll find articles on topics such as:

  • data extraction from documents
  • ai extraction
  • intelligent data extraction
  • document ai
  • deep learning for data extraction
  • ai data entry

Really confusing, isn’t it!

Artificial Intelligence is being used in automated accounting
Even though artificial intelligence, machine learning and other technologies are being marketed in more and more accounting and bookkeeping solutions, it’s still highly confusing. People are not sure of the differences between the technologies or how each can help them automate their accounting.

To make matters more complicated, some companies are using really basic technology together with people in the background, while other companies are using technology alone.

The idea of this article is to try to give you clarity on what the different choices are, and on how DOKKA is approaching the problem.

What is OCR in accounting

Before we even get to more advanced technology like artificial intelligence and machine learning, it’s important to understand the basics. That way you can learn how to Extract Accounting Data with Artificial Intelligence.

OCR has been around for many years, and is defined by Wikipedia as follows:

Optical character recognition or optical character reader is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene-photo or from subtitle text superimposed on an image.

OCR data extraction works relatively well for English-language documents, as long as the document is clear and the information required is actually on it.

There are a variety of services offering OCR data extraction, some good and some the exact opposite.

What about OCR in non-English speaking languages?

OCR in languages other than English still doesn’t work as well, especially when the language doesn’t use the Latin (Roman) alphabet.

Think about a language like Hebrew, where many different dots can appear underneath the letters. It’s very different from English letters or numbers.

Data extraction using computers for financial documentation
Technology today is changing the way bookkeeping and accounting data processing is done. In the past, bookkeeping entries used to be created manually from the source documentation. Today, with data extraction / OCR technology combined with machine learning and artificial intelligence, bookkeepers and accountants globally are coming to rely on automated accounting. And part of automated accounting is being able to Extract Accounting Data with Artificial Intelligence.

There is technology that can cater for different languages. However, generally speaking, OCR data extraction for non-English documents (and especially for non-Latin alphabets) still doesn’t function as well as OCR data extraction for English.

What about data extraction with Handwriting?

My colleagues always joke that the entrepreneur who successfully creates a technology solution that can convert handwriting into a digital format with a high degree of accuracy will have a billion-dollar business.

If you try any software solution for extracting handwriting, you’ll find that it works better at some times than at others. It depends on many factors, from the quality of the image you try to extract from to, more specifically, the style and neatness of the handwriting.

Data extraction of handwriting
The company that can create a perfect OCR data extraction solution for handwriting will be the next San Francisco unicorn. Handwriting has so many styles and nuances that perfect data extraction from it is currently difficult. If the handwriting is neat and in block letters, it’s possible to extract it with a high degree of success, but handwriting can also be extremely messy and difficult to read.

So do I just choose a data extraction software for my Accounting needs?

It’s not as simple as that. When you want to extract data from financial documentation, specifically to create a bookkeeping entry, there are a number of considerations:

  1. Is all the data extraction being done by technology, or are there people involved in the background?
  2. What about the parts of the bookkeeping entry that data extraction won’t help with, where you’ll need more than OCR?

The second consideration deals with how to Extract Accounting Data with Artificial Intelligence. However, as we’ll see, it’s not always AI that is used. Sometimes it is other technology.

Let’s look at these 2 considerations that bookkeepers and accountants need to weigh up for data extraction.

Is your data extraction being done by technology?

It’s so interesting. There are companies that provide data extraction using pure technology. There are also companies that provide data extraction “technology” by using people to act as the technology.

Often these people come from poorer countries, where the cost of labor is lower than in your home country. So you submit a financial document for data extraction, and when you view the result, it appears that the data extraction has been done correctly. But it hasn’t been done by a technology solution. It’s been processed by people, by hand.


One of the primary issues with this is that people on the other side of the world are seeing your, or your clients’, confidential information.

There was a situation a year back where a large expense management company got into hot water. It came out that they were using “Mechanical Turk” to process a lot of the data from expense slips, rather than using technology.

Another example we came across was when a potential client mentioned that DOKKA wasn’t able to correctly automate the extraction of tips from restaurant expense slips.

I’m not referring to the cases where there is a printed total, the person writes in a tip amount, and then neatly writes the final total at the end. In those cases there is an amount before tip, the tip amount, and the amount after tip. DOKKA can often handle situations like that.

I’m referring to cases where the total amount including tip was “scrawled” somewhere on the expense slip, and a competing “technology data extraction solution” was somehow able to understand that the scrawled amount was the final total and included the tip. That isn’t technology! That’s people.

So how do you know if people are involved in the process of data extraction?

The best way of quickly discovering whether a data extraction solution is purely technology, or whether people are involved in the process, is to see how long it takes to return a result. If it takes a couple of seconds, or under a minute or 2, then it’s pretty much guaranteed to be a technology data extraction solution.

Upload a financial document and see if you need to wait anywhere from a few minutes to a day or more. If so, then there’s a strong likelihood that there are people behind your data extraction.

The rule of thumb is that less than a minute is technology, and anything longer than a few minutes is probably people (or people combined with technology), unless the pure data extraction technology solution happens to be having an issue at that moment. So upload a couple of documents over the course of a few days and you’ll know for yourself what type of data extraction solution is being used.
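The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, the 60-second threshold comes from the "less than a minute" rule, and the 180-second threshold is an assumed stand-in for "a few minutes". Using the median of several uploads means one slow response (a service having a temporary issue) does not change the verdict.

```python
from statistics import median

def likely_processing_mode(turnaround_seconds):
    """Guess how a service processes documents from several upload timings.

    turnaround_seconds: seconds each test upload took to come back.
    Thresholds are assumptions based on the rule of thumb in the text.
    """
    typical = median(turnaround_seconds)
    if typical < 60:          # "less than a minute is technology"
        return "technology"
    if typical > 180:         # assumed meaning of "a few minutes"
        return "probably people (or people plus technology)"
    return "inconclusive"

# Three uploads over a few days; one was slow because of an outage.
print(likely_processing_mode([6, 7, 900]))
```

Collecting timings over several days, as suggested above, is what makes the median meaningful.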

Considerations for bookkeeping entries in addition to data extraction

Bookkeeping entries are not simply about extracting a total, a date, and the VAT / GST / Sales tax from an expense slip.

What about the general ledger account the financial document should be allocated to? And what if an amount or amounts on the financial document should be allocated to multiple accounts?

For example, you have an electricity bill for your company, and you want to split it 80% to electricity and 20% to a different general ledger account for some reason.
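To make the 80/20 split concrete, here is a minimal Python sketch. The function name, account names, and bill amount are made up for illustration. It uses `Decimal` so that cents are not lost to floating-point rounding, and it gives the last account whatever remains so that the lines always add back up to the bill total.

```python
from decimal import Decimal, ROUND_HALF_UP

def split_amount(total, shares):
    """Split a bill total across ledger accounts by percentage.

    total:  the bill amount as a string, e.g. "123.45"
    shares: {account name: whole-number percentage}, summing to 100
    """
    if sum(shares.values()) != 100:
        raise ValueError("shares must sum to 100")
    cent = Decimal("0.01")
    total = Decimal(total)
    accounts = list(shares)
    lines = {}
    allocated = Decimal("0")
    # Round every account except the last to the nearest cent...
    for account in accounts[:-1]:
        amount = (total * Decimal(shares[account]) / 100).quantize(
            cent, rounding=ROUND_HALF_UP
        )
        lines[account] = amount
        allocated += amount
    # ...and give the last account the remainder, so nothing is lost.
    lines[accounts[-1]] = total - allocated
    return lines

lines = split_amount("123.45", {"Electricity": 80, "Other account": 20})
for account, amount in lines.items():
    print(account, amount)
```

For a bill of 123.45 this allocates 98.76 to the first account and 24.69 to the second.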

What about the description? If you simply want to write electricity each month, that’s one thing, but what if you want the description to have more detail?

Has the financial document been paid yet? If so, which bank account or credit card should it be allocated to? And if it isn’t paid?

There are so many considerations when creating bookkeeping entries other than the actual data extraction.

My best example is the vendor name. Whenever I speak to clients, they raise the same issue: a lot of competitor technology simply extracts the name of the company as it appears on the invoice or receipt. That often includes a spelling mistake, or it pulls only part of the name when the vendor is already in the accounting software under a slightly different version of the name. The result? Multiple versions of the same vendor in the accounting software.
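One common way to avoid duplicate vendors is to compare the extracted name against the vendors already in the accounting software before creating a new one. The sketch below uses Python's standard `difflib` module for fuzzy matching. It is a generic illustration, not a description of how DOKKA or any other product does it; the function name, vendor names, and the 0.8 similarity cutoff are assumptions.

```python
from difflib import get_close_matches

def match_vendor(extracted_name, known_vendors, cutoff=0.8):
    """Return the existing vendor closest to the extracted name, or None.

    cutoff is a similarity ratio between 0 and 1; 0.8 is an assumed
    starting point that would need tuning on real vendor lists.
    """
    # Compare case-insensitively, but return the name as stored.
    normalized = {name.casefold().strip(): name for name in known_vendors}
    hits = get_close_matches(
        extracted_name.casefold().strip(), list(normalized), n=1, cutoff=cutoff
    )
    return normalized[hits[0]] if hits else None

known = ["Acme Electric Ltd", "Blue Ocean Supplies"]
print(match_vendor("Acme Electrc Ltd", known))   # misspelled on the receipt
print(match_vendor("Zebra Catering", known))     # genuinely new vendor
```

The first call resolves to the existing "Acme Electric Ltd" record and the second returns `None`, signalling a new vendor. A heavily truncated name may fall below the cutoff, so a real system would likely need more than string similarity.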

How does DOKKA handle both data extraction and the creation of the bookkeeping entry?

DOKKA takes a totally different approach to everyone else in the data extraction and automated accounting space.

After DOKKA receives a financial document from you, the following actions are done to automatically create the right bookkeeping entry for you:

  1. DOKKA has developed proprietary visual recognition technology which looks at the structure of the financial document. It can then analyse whether a duplicate document has already been uploaded, and what type of document you’ve submitted (bill, expense, credit note, receipt etc.).
  2. Within 7 seconds (and this is how you can see that DOKKA is purely a data extraction and automated accounting technology solution), we create a page showing the document you submitted on the left and the proposed bookkeeping entry on the right.
  3. The bookkeeping entry is slightly different depending on the accounting software integration you have selected. Haven’t connected any accounting software to DOKKA? We have a default layout. Connected Xero, Zoho, QBO, QBD, SAGE50 or one of the other accounting software solutions we support? You’ll see a slightly different layout for each of them.
  4. If it’s the first time you’ve submitted a document from this particular vendor, DOKKA will make an educated “machine learning / artificial intelligence” attempt to understand your specific requirements for this bookkeeping entry, for this vendor, for this client. Which VAT / GST / Sales tax rate? If it’s paid, which bank account or credit card should it go to? Which reference number is correct if there are multiple possible reference numbers on the document? Which general ledger account or accounts should it be allocated to? Usually DOKKA will be highly accurate on the 1st attempt, but you might need to tweak the bookkeeping entry a little.
  5. Tweaking a bookkeeping entry in DOKKA is simple. You can manually adjust the bookkeeping entry, but an even easier way is to use our “drag and drop” bookkeeping adjustment method. Simply find the right reference number on the document and drag it to the bookkeeping entry. Find the right amount on the document and drag it to the bookkeeping entry. Find the right date and drag it to the bookkeeping entry.
  6. If it’s the 2nd time you’ve submitted a document from this particular vendor or supplier, then DOKKA will already know what your specific requirements are for this document, and will present you with the correct bookkeeping entry.
  7. Glance at the document, glance at the bookkeeping entry, and click “approve”. That’s it. If you have integrated DOKKA with Xero, QBO, QBD, SAGE50, Zoho, or one of our other integrated accounting software solutions, the bookkeeping entry is pushed in real time into the accounting software (and in most cases a copy of the document as well). Haven’t connected any accounting software? You can download the approved bookkeeping entry as a CSV or Excel file anytime you want.

Data Extraction meets AI
The DOKKA platform includes technology to recognize the structure of any financial document, data extraction / OCR technology to extract relevant data from the right part of the financial document, and machine learning / AI. When the bookkeeper makes changes using drag-and-drop or by manually editing cells, DOKKA becomes smarter the next time a similar document is uploaded. You don’t always Extract Accounting Data with Artificial Intelligence. Sometimes Machine Learning can do a better job.

So the amazing thing about DOKKA is that it’s FAR more than data extraction. It combines data extraction with machine learning / artificial intelligence logic, and brings it all together with drag-and-drop functionality and other clever accounting automation techniques. That means you, the bookkeeper, don’t need to create or edit rules to teach the system.

Instead, the more you use DOKKA, the more it will automatically understand what your specific bookkeeping requirements are for each particular vendor in each company you work with.
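To illustrate the general idea of learning from approvals instead of hand-written rules, here is a deliberately simple Python sketch. It is not DOKKA's implementation, and it is a plain lookup table rather than machine learning. The class, method, and field names are all hypothetical. It remembers the choices a bookkeeper approved for each vendor in each company, and pre-fills them the next time a document from that vendor arrives.

```python
class VendorMemory:
    """Remember approved bookkeeping choices per (company, vendor)."""

    # Choices that tend to stay the same for a vendor (assumed field names).
    STABLE_FIELDS = ("ledger_account", "tax_rate", "payment_account")

    def __init__(self):
        self._approved = {}

    def propose(self, company, vendor, extracted):
        """Combine freshly extracted values with remembered choices."""
        entry = dict(extracted)
        entry.update(self._approved.get((company, vendor), {}))
        return entry

    def approve(self, company, vendor, entry):
        """Store only the stable choices, not per-document values."""
        self._approved[(company, vendor)] = {
            field: entry[field] for field in self.STABLE_FIELDS if field in entry
        }

memory = VendorMemory()

# 1st document: the bookkeeper fills in the account and tax rate, then approves.
first = memory.propose("Client A", "Acme Electric Ltd", {"total": "123.45"})
first.update({"ledger_account": "Electricity", "tax_rate": "20%"})
memory.approve("Client A", "Acme Electric Ltd", first)

# 2nd document from the same vendor: the choices are pre-filled.
print(memory.propose("Client A", "Acme Electric Ltd", {"total": "98.10"}))
```

Keying the memory on both company and vendor reflects the point above that requirements are specific to each vendor in each company you work with.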

Conclusion: Extract Accounting Data with Artificial Intelligence or Machine Learning to achieve the best results

You now know that you don’t always Extract Accounting Data with Artificial Intelligence. Sometimes you can simply extract data without any additional intelligence. And sometimes you use AI or other technology like Machine Learning to achieve the desired results.

Hopefully this article has given you some insight into data extraction and how OCR works. Most importantly, even if a company claims to have a data extraction / OCR solution that will work for you, you now know that data extraction and OCR are only a very small part of the process of creating the right bookkeeping entries that you require to automate your accounting.
