Process Mining in Practice: Identifying Payments Before Receipt with Python

Process mining is often sold as a complex technology with expensive licenses, but at its core it is surprisingly simple: it looks at the order in which things happen. In this article, we leave the theory behind and dive into practice: how do you use a simple dataset to expose a concrete risk?


Imagine you are auditing a trading company. The procurement procedure is clear:

An order is placed (Purchase Order).
The goods arrive (Goods Receipt).
The invoice is posted (Invoice).
Only when all three match does the payment follow (Payment).

Management assures you that the system enforces this sequence, but the IT auditors have already established that records can still be changed manually afterwards and that payments can also be created separately. Reliable logging is available, however.

You have received an export of 50,000 transactions from the ERP system to work with.

Defining the Norm

Before we dive into the technique, let's sketch out what we want to see. We aren't looking for a pretty picture of the entire process. We are looking for deviations from our norm.

Our norm is simple: PO -> GR -> INV -> PAY (purchase order, goods receipt, invoice, payment).

Anything that deviates from this is a potential risk. We want a list that looks like this:

Case ID	Sequence	Judgment
PO-1001	`PO -> GR -> INV -> PAY`	According to norm
PO-1002	`PO -> INV -> PAY`	No receipt!
PO-1003	`PO -> INV -> PAY -> GR`	Paid before receipt!
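The judgments in this table follow a few simple rules. Below is a minimal sketch of those rules, assuming the abbreviated labels PO, GR, INV and PAY from the table; the helper name `judge_path` and the fallback label "Other deviation" are our own illustrative choices, not part of any tool.

```python
# Hypothetical helper: assigns one of the judgments from the table
# to a case's ordered list of abbreviated activities (PO, GR, INV, PAY).
NORM = ["PO", "GR", "INV", "PAY"]


def judge_path(path: list[str]) -> str:
    """Return a short audit judgment for one case's sequence of activities."""
    if path == NORM:
        return "According to norm"
    if "PAY" in path:
        if "GR" not in path:
            return "No receipt!"
        # Compare the position of the first payment with the first goods receipt
        if path.index("PAY") < path.index("GR"):
            return "Paid before receipt!"
    # Anything else (e.g. a payment before the invoice) still deviates from the norm
    return "Other deviation"


for case_id, path in [
    ("PO-1001", ["PO", "GR", "INV", "PAY"]),
    ("PO-1002", ["PO", "INV", "PAY"]),
    ("PO-1003", ["PO", "INV", "PAY", "GR"]),
]:
    print(case_id, " -> ".join(path), judge_path(path), sep="\t")
```

The order of the checks matters: a case without any goods receipt is reported as "No receipt!" before the timing question is even asked.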

How Do You Do This?

You don't need specialized process mining software for this. With a bit of logic and a standard script (in Python, for example), you can get a long way.

The trick is to organize your data well first. A computer doesn't understand the "process", but it does understand "time" and "sorting".

Step 1: Preparing the Data

We need a simple table with three columns:

CaseID: What belongs together? (e.g., the Purchase Order number)
Activity: What happened? (e.g., 'GOODS_RECEIPT')
Timestamp: When did it happen?
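An ERP export rarely arrives in exactly this shape. A minimal preparation sketch, assuming a raw export whose columns are called `PO_Number`, `EventType` and `EventTime` (hypothetical names; replace them with the ones in your own export), and using a hypothetical helper `prepare_event_log`:

```python
import pandas as pd

# Hypothetical raw export; the column names are assumptions for illustration only
raw = pd.DataFrame(
    {
        "PO_Number": ["PO-1001", "PO-1001", "PO-1002", None],
        "EventType": ["PO_CREATED", "GOODS_RECEIPT", "PO_CREATED", "PAYMENT"],
        "EventTime": ["2024-01-05 09:00", "2024-01-12 14:30", "2024-01-06 10:15", "2024-01-20 08:00"],
    }
)


def prepare_event_log(raw: pd.DataFrame) -> pd.DataFrame:
    """Reduce a raw export to the three columns CaseID, Activity and Timestamp."""
    log = raw.rename(
        columns={"PO_Number": "CaseID", "EventType": "Activity", "EventTime": "Timestamp"}
    )[["CaseID", "Activity", "Timestamp"]].copy()
    # Unparseable dates become NaT instead of stopping the script
    log["Timestamp"] = pd.to_datetime(log["Timestamp"], errors="coerce")
    # Rows without a case, an activity or a valid time cannot be placed in a sequence;
    # report how many are dropped so they can be followed up separately
    incomplete = log.isna().any(axis=1)
    print(f"Rows dropped as incomplete: {incomplete.sum()}")
    return log[~incomplete].reset_index(drop=True)


event_log = prepare_event_log(raw)
print(event_log)
```

Reporting the dropped rows, rather than discarding them silently, keeps the data preparation itself auditable.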

Step 2: The Logic

Below you can see how to approach this in Python. Don't be put off by the code; read the comments to follow the logic.

import pandas as pd

# 1. Read the data
df = pd.read_csv("purchase_data.csv")

# Ensure the script understands that 'Timestamp' is a date
df["Timestamp"] = pd.to_datetime(df["Timestamp"])

# 2. Sort everything: First by Purchase Order, then by Time
df_sorted = df.sort_values(["CaseID", "Timestamp"])

# 3. Create a 'chain' of activities per Purchase Order
# This turns individual rows into one readable sentence per order
process_paths = (
    df_sorted.groupby("CaseID")["Activity"]
    .apply(list)  # Create a list of all steps
    .reset_index(name="ExecutedPath")
)

# Make it readable text, for example "PO -> GR -> INV"
process_paths["Path_Text"] = process_paths["ExecutedPath"].apply(lambda x: " -> ".join(x))

print(process_paths.head())

With this small piece of code, you have mapped out the exact sequence of activities for every order in the export.
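One caveat before trusting those sequences: if two events in the same order carry exactly the same timestamp (for instance because only a date is recorded), sorting cannot tell which one happened first. A minimal check, assuming the same `CaseID`, `Activity` and `Timestamp` columns and a small illustrative dataset standing in for `purchase_data.csv`:

```python
import pandas as pd

# Small illustrative event log (hypothetical data)
df = pd.DataFrame(
    {
        "CaseID": ["PO-1001", "PO-1001", "PO-1001", "PO-1002", "PO-1002"],
        "Activity": ["PO_CREATED", "GOODS_RECEIPT", "INVOICE_POSTED", "PO_CREATED", "PAYMENT"],
        "Timestamp": pd.to_datetime(
            ["2024-01-05 09:00", "2024-01-12 14:30", "2024-01-12 14:30",
             "2024-01-06 10:15", "2024-01-20 08:00"]
        ),
    }
)

# keep=False marks every event that shares its timestamp with another event in the same case
tied = df[df.duplicated(subset=["CaseID", "Timestamp"], keep=False)]
cases_with_ties = sorted(tied["CaseID"].unique())

print(f"Cases with simultaneous events: {len(cases_with_ties)}")
print(tied)
```

For such cases the path text depends on how the export happened to order the rows, so they deserve a manual look before any conclusion is drawn.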

Step 3: Determining Deviations

Now we can ask the question that really matters: Which orders deviate from our norm?

# Define what we expect (the norm), using the activity labels exactly as they appear in the export
NORM_PATH = "PO_CREATED -> GOODS_RECEIPT -> INVOICE_POSTED -> PAYMENT"

# Filter the cases that do NOT meet the norm
deviations = process_paths[process_paths["Path_Text"] != NORM_PATH].copy()

print(f"Number of deviations found: {len(deviations)}")
print("Most common deviating patterns:")
print(deviations["Path_Text"].value_counts().head(5))
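The path comparison flags every deviation at once. To answer the specific question of payments before receipt, you can also compare timestamps directly. A minimal sketch, assuming the activity labels `PAYMENT` and `GOODS_RECEIPT` from the norm above and hypothetical illustrative data:

```python
import pandas as pd

# Illustrative event log (hypothetical data); each step is one day after the previous one
paths = {
    "PO-1001": ["PO_CREATED", "GOODS_RECEIPT", "INVOICE_POSTED", "PAYMENT"],
    "PO-1002": ["PO_CREATED", "INVOICE_POSTED", "PAYMENT"],
    "PO-1003": ["PO_CREATED", "INVOICE_POSTED", "PAYMENT", "GOODS_RECEIPT"],
}
df = pd.DataFrame(
    [
        {"CaseID": case, "Activity": act,
         "Timestamp": pd.Timestamp("2024-01-01") + pd.Timedelta(days=day)}
        for case, acts in paths.items()
        for day, act in enumerate(acts)
    ]
)

# Earliest timestamp of each activity per case: one row per case, one column per activity.
# Activities that never occurred in a case show up as NaT.
first_time = df.groupby(["CaseID", "Activity"])["Timestamp"].min().unstack("Activity")

# Comparisons with NaT evaluate to False, so cases without any goods receipt are not
# flagged here; they belong to the separate "no receipt" category.
paid_first = first_time["PAYMENT"] < first_time["GOODS_RECEIPT"]
paid_before_receipt = paid_first[paid_first].index.tolist()

print(paid_before_receipt)
```

Comparing with the first goods receipt is a design choice. With partial deliveries, taking the last goods receipt (`max` instead of `min` for that activity) is stricter, because it also catches payments made before the final delivery arrived.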

Follow-up

Suppose the script runs and reports 450 deviating cases. What now?

You see, for example:

150x PO -> INV -> PAY: Invoices paid without goods receipt.
- Possible action: Is this service procurement (rent, consultancy)? Then it makes sense. Is it procurement of laptops? Then you have a major internal control issue.
20x PO -> INV -> PAY -> GR: Goods arrived only after payment.
- Possible action: Were these prepayments, and does the procedure allow them? Or is the administration simply slow to book receipts?
5x INV -> PAY: No purchase order found.
- Possible action: These may be 'maverick buying' cases or ghost invoices. Select these 5 for a full, detailed investigation.
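For that detailed investigation you need the underlying events, not just the path text. A minimal sketch that selects all cases without a `PO_CREATED` event and pulls their rows from the sorted log; the case ID `INV-9001` is hypothetical and stands for a case that starts with an invoice:

```python
import pandas as pd

# Illustrative event log (hypothetical data); in practice this is df_sorted from Step 2
paths = {
    "PO-1001": ["PO_CREATED", "GOODS_RECEIPT", "INVOICE_POSTED", "PAYMENT"],
    "INV-9001": ["INVOICE_POSTED", "PAYMENT"],
}
df_sorted = pd.DataFrame(
    [
        {"CaseID": case, "Activity": act,
         "Timestamp": pd.Timestamp("2024-01-01") + pd.Timedelta(days=day)}
        for case, acts in paths.items()
        for day, act in enumerate(acts)
    ]
).sort_values(["CaseID", "Timestamp"])

# Cases in which a purchase order was created at some point
cases_with_po = set(df_sorted.loc[df_sorted["Activity"] == "PO_CREATED", "CaseID"])
# Every other case lacks a purchase order
no_po_cases = sorted(set(df_sorted["CaseID"]) - cases_with_po)

# All events of those cases in chronological order, as a starting point for follow-up
follow_up = df_sorted[df_sorted["CaseID"].isin(no_po_cases)]
print(follow_up)
```

The `follow_up` table can then be exported, for example with `to_csv`, as a working file for the detailed review.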

Conclusion

Process mining in auditing doesn't have to be abstract. By making process sequences explicit, your focus shifts from "does the procedure exist?" to "is the procedure being followed?". Handy, right?

Strictly speaking, this article does not apply full process mining but an audit-focused process sequence analysis: a light form of process mining in which we test concretely whether a certain sequence occurs in the process. Process mining in the broad sense also reconstructs all process variants that actually occur, which immediately shows whether there are other important side streams. That requires considerably more capacity, however, and is unnecessary for many smaller audit clients. The focus of this article is therefore on compliance checking rather than process discovery.
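For readers curious about that broader direction: a common first building block of process discovery is the directly-follows relation, which counts how often one activity is immediately followed by another within the same case. A minimal sketch, assuming the same three columns and hypothetical illustrative data; this is only a starting point, not a full discovery algorithm:

```python
import pandas as pd

# Illustrative event log (hypothetical data); each step is one day after the previous one
paths = {
    "PO-1001": ["PO_CREATED", "GOODS_RECEIPT", "INVOICE_POSTED", "PAYMENT"],
    "PO-1002": ["PO_CREATED", "INVOICE_POSTED", "PAYMENT"],
    "PO-1003": ["PO_CREATED", "INVOICE_POSTED", "PAYMENT", "GOODS_RECEIPT"],
}
df = pd.DataFrame(
    [
        {"CaseID": case, "Activity": act,
         "Timestamp": pd.Timestamp("2024-01-01") + pd.Timedelta(days=day)}
        for case, acts in paths.items()
        for day, act in enumerate(acts)
    ]
)
df_sorted = df.sort_values(["CaseID", "Timestamp"])

# Pair each event with the next event in the same case (the last event of a case gets NaN)
df_sorted["NextActivity"] = df_sorted.groupby("CaseID")["Activity"].shift(-1)

# Count each (activity, next activity) pair across all cases, most frequent first
directly_follows = (
    df_sorted.dropna(subset=["NextActivity"])
    .groupby(["Activity", "NextActivity"])
    .size()
    .sort_values(ascending=False)
)
print(directly_follows)
```

Rare pairs such as `PAYMENT` followed by `GOODS_RECEIPT` are exactly the kind of side streams that a full discovery approach would surface without you having to define the norm first.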