Aging Analysis & Doubtful Debtors: Analysis with Python

Every organization naturally wants to be paid quickly for the products or services it delivers, but in practice, you often have to wait a while. It's then useful to know how long you've been waiting and whether that customer will pay at all.

In this article, we delve into the aging analysis of accounts receivable and show step-by-step how you can quickly gain insight into who pays on time and who doesn't, using data analysis (in Python).

Aging analysis and doubtful debtors cover

Why is this aging analysis relevant in audits?

Before we begin, first the question: Why would you even perform this analysis as an auditor?

Firstly, annual financial statements must provide a true and fair view of the accounts receivable position at the balance sheet date. Therefore, you check whether outstanding items are not presented too optimistically. A large amount of old, unpaid invoices can mean that accounts receivable (and revenues) may be overstated.

Often, in such cases, a provision for doubtful debtors must be made: this is an adjustment to the value of accounts receivable to account for expected uncollectibility. A common practice, for example, is to consider invoices older than 90 days (three months) as (partially) doubtful and to write off an amount for this (e.g., 25% for >2 months, 50% for >3 months, increasing to 100% for >9 months overdue).

During an audit, the list of outstanding items at the balance sheet date is usually reviewed, and it is assessed which of these items have since been paid (after the balance sheet date). Invoices that remain open long after the balance sheet date are often considered a risk: such an item may have to be written off as uncollectible if it remains open for more than a year. You will need to discuss this with the company and check whether an adequate provision has been made for these doubtful debtors.

In short, the aging analysis helps identify potential 'problem' debtors and is an important tool for assessing both the valuation of accounts receivable and the collection risk.

Doubtful debtors

An important concept here is 'doubtful debtors'. We speak of a doubtful debtor when it is uncertain (doubtful) whether the customer will still pay the outstanding invoice. In other words: a doubtful debtor is a customer whose claim is presumably uncollectible.

Meaning of doubtful debtors

As long as an invoice falls within the normal payment term or only shortly thereafter, it is considered collectible. Only when an invoice is (much) too long overdue or there are concrete doubts about payment, is it reclassified from regular accounts receivable to doubtful debtors. If it ultimately turns out that an invoice is not paid at all, it is an uncollectible claim.

The difference between a collectible debtor and a doubtful debtor therefore lies in the assessment of the probability of payment: collectible items are expected to be collected, doubtful items are considered likely uncollectible.

Analysis

Now it's time for the analysis. We will show you step-by-step how to perform an aging analysis on outstanding items using Python (pandas). We use a (simplified) dummy dataset for this, but feel free to try it yourself with your own data!

Step 1: Load Data

For this example, we create a small dataset in Python. In practice, you would export this data from a financial system (e.g., as CSV/Excel) and import it with pandas. You would expect a minimum number of fields to be included in your dataset:

Customer – the name or code of the customer.
Invoice Date – the date the invoice was issued.
Due Date – the date by which payment must be received (e.g., 30 days after the invoice date). *
Amount – the outstanding amount of the invoice.

*If your company always uses a fixed payment term, you can also calculate this yourself based on the invoice date, for example.

We first import the pandas library and then manually create a DataFrame with example data:

import pandas as pd

# Example dataset with outstanding invoices
data = [
    {"Customer": "Customer A", "Invoicedate": "2024-08-01", "Duedate": "2024-09-01", "Amount": 4000},
    {"Customer": "Customer A", "Invoicedate": "2024-10-01", "Duedate": "2024-11-01", "Amount": 2500},
    {"Customer": "Customer A", "Invoicedate": "2024-11-01", "Duedate": "2024-12-01", "Amount": 3000},
    {"Customer": "Customer B", "Invoicedate": "2024-07-01", "Duedate": "2024-08-01", "Amount": 5000},
    {"Customer": "Customer B", "Invoicedate": "2024-10-15", "Duedate": "2024-11-15", "Amount": 1500},
    {"Customer": "Customer B", "Invoicedate": "2024-12-15", "Duedate": "2025-01-15", "Amount": 2000},
    # ... (more customers and invoices) ...
]

df = pd.DataFrame(data)
df.head()

In a real situation, you might also have columns such as Payment Date or an indicator of whether payment has already been made (partially). For this aging analysis, we suffice with outstanding items. Any payment behavior of the customer will be derived from how late their invoices are paid (or not paid) on average.

Step 2: Calculate the age of outstanding items

The next step is to determine how long each invoice has been outstanding. We first define a cut-off date for the analysis, for example, the end of 2024. Relative to this cut-off date, we calculate for each invoice the number of days that have passed since the due date. This indicates the age or delinquency of the item in days.

In Python, we can do this as follows:

# Set the cut-off date (analysis date)
cut_off_date = pd.to_datetime("2024-12-31")

# Convert date columns to actual date types
df["Invoicedate"] = pd.to_datetime(df["Invoicedate"])
df["Duedate"] = pd.to_datetime(df["Duedate"])

# Calculate the number of days overdue
df["days_overdue"] = (cut_off_date - df["Duedate"]).dt.days

# If the result is negative (due date after cut-off date), set to 0
df["days_overdue"] = df["days_overdue"].apply(lambda x: x if x > 0 else 0)

df[["Customer", "Invoicedate", "Duedate", "Amount", "days_overdue"]].head(6)

Here we subtract the due date from the cut-off date. If we get a positive number, it means that the invoice is overdue by that many days. If the result is negative, the due date is still in the future (so the invoice is not yet due on the cut-off date) – in that case, we set the number of days overdue to 0.

In the output, we now see a new column 'days_overdue'. For illustration, a few rows from a dataset:

Customer	Invoice Date	Due Date	Amount	Days Overdue
Customer A	2024-08-01	2024-09-01	€4000	121
Customer A	2024-10-01	2024-11-01	€2500	60
Customer A	2024-11-01	2024-12-01	€3000	30
Customer B	2024-07-01	2024-08-01	€5000	152
Customer B	2024-10-15	2024-11-15	€1500	46
Customer B	2024-12-15	2025-01-15	€2000	0

Let's clarify this. For Customer A, the invoice of August 1, 2024, for €4,000 was due on September 1, 2024. On the cut-off date of December 31, 2024, it is therefore 121 days overdue (more than 4 months late). The invoice of €2,500 (due date November 1, 2024) is 60 days overdue on December 31, and the invoice of €3,000 (due date December 1, 2024) is 30 days overdue. For Customer B, for example, we see the invoice of €5,000 with a due date of August 1, 2024, is 152 days overdue on December 31, 2024. In contrast, Customer B also has an invoice of €2,000 with a due date of January 15, 2025; on December 31, 2024, this is not yet due, so we record 0 days overdue (the payment term has not yet expired).

This column 'days_overdue' forms the basis for our aging categories.

Step 3: Categorize into aging buckets

In practice, outstanding accounts receivable amounts are often grouped into buckets based on their age. Commonly used buckets are, for example: 0-30 days, 31-60 days, 61-90 days, and >90 days overdue. This makes it easier to see at a glance how much is relatively recently outstanding and how much has been outstanding for a very long time.

We now add a 'Bucket' column to the dataset, based on the value in 'days_overdue'. In Python, we can use the pd.cut function to divide into classes:

# Define the boundaries (in days) and labels of the buckets
bins = [0, 30, 60, 90, float("inf")]
labels = ["0-30", "31-60", "61-90", ">90"]

# Divide the 'days_overdue' column according to these buckets
df["Bucket"] = pd.cut(df["days_overdue"], bins=bins, labels=labels, right=True, include_lowest=True)

df[["Customer", "Amount", "days_overdue", "Bucket"]].head(6)

The resulting Bucket column shows the age category for each invoice. Some examples from our data:

| Customer | Amount | Days Overdue | Bucket | |---|---|---|---|---| | Customer A | €4000 | 121 | >90 | | Customer A | €2500 | 60 | 31-60 | | Customer A | €3000 | 30 | 0-30 | | Customer B | €5000 | 152 | >90 | | Customer B | €1500 | 46 | 31-60 | | Customer B | €2000 | 0 | 0-30 |

We see that this is correct: an item that is 121 days overdue falls into the >90 bucket, 60 days falls into 31-60, 30 days into 0-30, etc. With this, we have the outstanding items ready for analysis by age category.

Step 4: Insight into the distribution (per customer and total)

Now that the data per invoice has been categorized, we can analyze the distribution of outstanding amounts across the buckets. This can be done at a total level (all customers combined) and at an individual customer level.

First, we look at the total. We group the dataset by the Bucket column and sum the amounts:

# Total outstanding amount per bucket
totaal_per_bucket = df.groupby("Bucket")["Amount"].sum()
print(totaal_per_bucket)

Suppose this (fictitious) result for our dataset is as follows:

Bucket	Sum Amount
0-30	€19,500
31-60	€4,000
61-90	€6,000
>90	€36,000

This distribution shows that, in total, the majority of outstanding items fall into the oldest category (>90 days). This is a potential risk: €36k in invoices has been outstanding for more than three months. A healthy accounts receivable portfolio ideally has the largest part of the outstanding items in the younger buckets (i.e., recently due or not yet due). A lot in the >90-day category can mean that accounts receivable management can be improved or that certain claims are problematic.

Next, we look at the distribution per customer. With a so-called pivot_table or multi-index groupby, we can sum the outstanding amount per customer and per bucket:

verdeling_Customer = df.pivot_table(index="Customer", columns="Bucket", values="Amount", aggfunc="sum", fill_value=0)
print(verdeling_Customer)

From our dummy data, for example, the following picture emerges (amounts per customer per bucket):

Customer A: €3,000 in 0-30 days, €2,500 in 31-60, €0 in 61-90, €4,000 in >90.

Customer B: €2,000 in 0-30, €1,500 in 31-60, €0 in 61-90, €5,000 in >90.

Customer C: €0 in 0-30, €0 in 31-60, €0 in 61-90, €20,000 in >90.

Here, Customer C clearly stands out: this customer has €20k outstanding, and everything is in the oldest category (>90 days). This indicates that Customer C pays very late or possibly not at all.

As an auditor, such a breakdown immediately provides direction: you can focus attention primarily on customers and items in the highest aging categories. For example, for Customer C, you want to know what is going on (are there disputes, is the customer in financial trouble, etc.), and whether that €20,000 is still collectible or whether a provision needs to be made for it.

Step 5: Detection of structurally poor payers

Finally, we can go a step further and look at payment behavior per customer over time. We analyze which customers consistently pay late, as opposed to a single late payment. With the aging data, we can do this in several ways:

A simple approach is to calculate the average age of outstanding invoices per customer. This gives one number per customer: high means that this customer takes a very long time on average for their (unpaid) invoices.

gemiddelde_dagen = df.groupby("Customer")["days_overdue"].mean()
slechte_betalers = gemiddelde_dagen[gemiddelde_dagen > 90]
print(slechte_betalers)

Suppose this results in:

Customer	Avg. Days Overdue
Customer C	594.5
Customer E	183.0

Here we see that Customer C is on average almost 595 days overdue and Customer E 183 days. These two stand out above the 90-day limit, meaning they consistently pay very late (Customer E only has one outstanding invoice in this example, but it has been outstanding for an extremely long time). Customers A and B also had an old invoice, but because they also have more recent outstanding items, their averages are around ~60-70 days.

An alternative approach is to look at the percentage of invoices or amount per customer that falls into the longest bucket. In our example, Customer C had 100% of their invoices in the >90 days category (2 out of 2 invoices), Customer E 100% (but 1 out of 1), while Customers A/B/D had around 33% of their invoices >90 days and Customer F 0%. These metrics also confirm that especially Customer C – and to a lesser extent E – consistently pays late.

In practice, you can map payment history even more accurately by also analyzing actually paid invoices. This way, you can calculate the average actual payment term or the median per customer and compare customers. In our example, we would, for example, conclude that Customer C is most likely a doubtful debtor (perhaps a provision needs to be made for this), while for other customers, the chance of collection is better.

Conclusion

So, we have seen how you can gain insight step-by-step with Python into which invoices have been outstanding for a long time and which customers may pose a credit risk. We started with a simple list of outstanding items and calculated the age of each item relative to a cut-off date. Then we divided the items into buckets (0-30, 31-60, 61-90, >90 days) to analyze the distribution of the outstanding amounts. This immediately revealed where the biggest risks lie. Now try it yourself with your own data!