What data do we need?
To check this, we need two datasets:
- Purchase invoices – This is the list of all received purchase invoices, including invoice numbers, suppliers, invoice amounts, and invoice dates. Download the dummy data here.
- Bookings in the financial administration – This is the list of all booked purchase invoices, with amounts and invoice numbers as recorded in the administration. Download the dummy data here.
Now, the question is: how do we match these lists and find discrepancies?
Step-by-step analysis with Python
Let's use Python to perform this check. We will look at:
- Missing invoices
- Duplicate bookings
- Differences in amounts
Step 1: Load dummy data
First, we load the dummy datasets.
import pandas as pd
# Read the audit file (bookings) and invoice file with date parsing
bookings = pd.read_csv("dummy_auditfile_accounts_payable.csv", parse_dates=["Date"])
invoices = pd.read_csv("dummy_invoices_analysis.csv", parse_dates=["InvoiceDate"])
# We filter only the purchase bookings, as we know they are recorded under account 2000.
purchase_bookings = bookings[bookings['AccountNumber'] == 2000]
# Merge the files based on InvoiceNumber
# Using 'outer' join to include all records from both files, including mismatches.
df = purchase_bookings.merge(invoices, on="InvoiceNumber", how="outer", suffixes=("_invoice", "_auditfile"))
# Display the first few rows of the merged dataset to verify the merge was successful
print(df.head())
Analysis 1: Identify missing invoices
Often, we reason from the audit file; do we see the bookings also reflected in the invoices?
# Identify bookings without a corresponding invoice
bookings_without_invoice = df[df['TotalAmount'].isna()].reset_index()
print(bookings_without_invoice)
In some cases, it is also useful to check purchase invoices that were not booked.
# Identify invoices missing from bookings
missing = invoices[~invoices['InvoiceNumber'].isin(bookings['InvoiceNumber'])]
print(missing)
If you are sharp, you will notice that in this last check, we do not use the merged dataframe at all but work directly with the source tables. I wanted to show you this as well. This approach is only useful for checking whether certain values exist. For comparisons (as in Analysis 3), it is much better to use a merged table.
Analysis 2: Detect duplicate bookings
Sometimes invoices are accidentally booked twice. We check if an invoice number appears more than once in the bookings.
duplicate_bookings = bookings[bookings.duplicated('InvoiceNumber', keep=False)]
print(duplicate_bookings)
Analysis 3: Detect amount discrepancies
Now, we check whether the booked amounts match the original invoices.
# Identify discrepancies between invoice amounts and booked amounts
# This also accounts for negative amounts (creditor invoices)
discrepancies = df[(df['Amount']*-1 != df['TotalAmount']) & (~df['TotalAmount'].isna())]
print(discrepancies)
In this dummy set, we have included quite a few mismatches. To calculate the total discrepancy, you can use the following:
total_discrepancy = discrepancies.groupby("InvoiceNumber").apply(lambda x: (x["Amount"] - x["TotalAmount"])).sum()
print(total_discrepancy)
Conclusion
These kinds of checks are incredibly helpful for detecting errors, and implementing them wasn’t too difficult, right? Download the dummy set and try it out yourself!