
First, let's talk about the process in general. When you receive a purchase invoice, you create a liability to the supplier. In accounting, this invoice is recorded by debiting expenses (or inventory, depending on the nature of the purchase) and crediting accounts payable. This makes the invoice visible as a liability on the balance sheet. The liability is only cleared once payment is made.
In the reconciliation with the purchase invoices, we mainly focus on that first posting: verifying whether each received invoice has been recorded correctly and whether the total booked liabilities match the submitted invoices.
What data do we need?
To check this, we need two datasets:
- Purchase invoices – This is the list of all received purchase invoices, including invoice numbers, suppliers, invoice amounts, and invoice dates. Download the dummy data here.
- Bookings in the financial administration – This is the list of all booked purchase invoices, with amounts and invoice numbers as recorded in the administration. Download the dummy data here.
Now, the question is: how do we match these lists and find discrepancies?
Step-by-step analysis with Python
Let's use Python to perform this check. We will look at:
- Missing invoices
- Duplicate bookings
- Differences in amounts
Step 1: Load dummy data
First, we load the dummy datasets.
import pandas as pd
# Read the audit file (bookings) and invoice file with date parsing
bookings = pd.read_csv("dummy_auditfile_accounts_payable.csv", parse_dates=["Date"])
invoices = pd.read_csv("dummy_invoices_analysis.csv", parse_dates=["InvoiceDate"])
# We filter only the purchase bookings, as we know they are recorded under account 2000.
purchase_bookings = bookings[bookings['AccountNumber'] == 2000]
# Merge the files based on InvoiceNumber
# Using 'outer' join to include all records from both files, including mismatches.
df = purchase_bookings.merge(invoices, on="InvoiceNumber", how="outer", suffixes=("_invoice", "_auditfile"))
# Display the first few rows of the merged dataset to verify the merge was successful
print(df.head())
Analysis 1: Identify missing invoices
Often, we reason from the audit file; do we see the bookings also reflected in the invoices?
# Identify bookings without a corresponding invoice
bookings_without_invoice = df[df['TotalAmount'].isna()].reset_index()
print(bookings_without_invoice)
In some cases, it is also useful to check purchase invoices that were not booked.
# Identify invoices missing from bookings
missing = invoices[~invoices['InvoiceNumber'].isin(bookings['InvoiceNumber'])]
print(missing)
If you are sharp, you will notice that in this last check, we do not use the merged dataframe at all but work directly with the source tables. I wanted to show you this as well. This approach is only useful for checking whether certain values exist. For comparisons (as in Analysis 3), it is much better to use a merged table.
Analysis 2: Detect duplicate bookings
Sometimes invoices are accidentally booked twice. We check if an invoice number appears more than once in the bookings.
duplicate_bookings = bookings[bookings.duplicated('InvoiceNumber', keep=False)]
print(duplicate_bookings)
Analysis 3: Detect amount discrepancies
Now, we check whether the booked amounts match the original invoices.
# Identify discrepancies between invoice amounts and booked amounts
# This also accounts for negative amounts (creditor invoices)
discrepancies = df[(df['Amount']*-1 != df['TotalAmount']) & (~df['TotalAmount'].isna())]
print(discrepancies)
In this dummy set, we have included quite a few mismatches. To calculate the total discrepancy, you can use the following:
total_discrepancy = discrepancies.groupby("InvoiceNumber").apply(lambda x: (x["Amount"] - x["TotalAmount"])).sum()
print(total_discrepancy)
Conclusion
These kinds of checks are incredibly helpful for detecting errors, and implementing them wasn’t too difficult, right? Download the dummy set and try it out yourself!



