
Imagine: you are auditing a trading company. The procurement procedure is clear:
- An order is placed (Purchase Order).
- The goods arrive (Goods Receipt).
- The invoice is posted (Invoice).
- Only when those three match does the payment follow (Payment).
Management assures you that the system enforces this, but the IT auditors have already established that records can still be changed manually afterwards and that payments can be created on their own. Reliable logging is available, however.
You have received an export of 50,000 transactions from the ERP system, which gives us something to work with.
Defining the Norm
Before we dive into the technique, let's sketch out what we want to see. We aren't looking for a pretty picture of the entire process. We are looking for deviations from our norm.
Our norm is simple: PO -> GR -> INV -> PAY.
Anything that deviates from this is a potential risk. We want a list that looks like this:
| Case ID | Sequence | Judgment |
|---|---|---|
| PO-1001 | PO -> GR -> INV -> PAY | According to norm |
| PO-1002 | PO -> INV -> PAY | No receipt! |
| PO-1003 | PO -> INV -> PAY -> GR | Paid before receipt! |
How Do You Do This?
You don't need specialized process mining software for this. With a bit of logic and a standard script (in Python, for example), you can already get there.
The trick is to organize your data well first. A computer doesn't understand the "process", but it does understand "time" and "sorting".
Step 1: Preparing the Data
We need a simple table with three columns:
- Case ID: What belongs together? (e.g., Purchase Order number)
- Activity: What happened? (e.g., 'Goods Receipt')
- Timestamp: When did it happen?
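ERP exports rarely arrive in exactly this three-column shape. As a minimal sketch, assume the export has one row per purchase order with a separate date column per step. The column names `po_number`, `po_date`, `gr_date`, `inv_date` and `pay_date` and all values are hypothetical, chosen only for illustration; your export will look different. Under that assumption, `melt` turns the wide table into one row per event:

```python
import pandas as pd

# Hypothetical wide export: one row per purchase order, one date column per step.
# All column names and values are made up for illustration.
wide = pd.DataFrame({
    "po_number": ["PO-1001", "PO-1002"],
    "po_date":   ["2024-01-02", "2024-01-03"],
    "gr_date":   ["2024-01-05", None],   # PO-1002 has no goods receipt
    "inv_date":  ["2024-01-08", "2024-01-06"],
    "pay_date":  ["2024-01-20", "2024-01-15"],
})

# Map each date column to the activity name used in the analysis
activity_names = {
    "po_date": "PO_CREATED",
    "gr_date": "GOODS_RECEIPT",
    "inv_date": "INVOICE_POSTED",
    "pay_date": "PAYMENT",
}

# Reshape: one row per (purchase order, step)
event_log = wide.melt(
    id_vars="po_number",
    value_vars=list(activity_names),
    var_name="Activity",
    value_name="Timestamp",
).rename(columns={"po_number": "CaseID"})

event_log["Activity"] = event_log["Activity"].map(activity_names)
event_log["Timestamp"] = pd.to_datetime(event_log["Timestamp"])

# A step that never happened has no date; drop it so it is not counted as an event
event_log = event_log.dropna(subset=["Timestamp"]).reset_index(drop=True)

print(event_log.sort_values(["CaseID", "Timestamp"]))
```

If your export is already a log with one row per event, renaming the relevant columns to `CaseID`, `Activity` and `Timestamp` is probably all the preparation you need.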
Step 2: The Logic
Below you can see how to approach this in Python. Don't be put off by the code; read the comments to follow the logic.
import pandas as pd
# 1. Read the data
df = pd.read_csv("purchase_data.csv")
# Ensure the script understands that 'Timestamp' is a date
df["Timestamp"] = pd.to_datetime(df["Timestamp"])
# 2. Sort everything: First by Purchase Order, then by Time
df_sorted = df.sort_values(["CaseID", "Timestamp"])
# 3. Create a 'chain' of activities per Purchase Order
# This turns individual rows into one readable sentence per order
process_paths = (
    df_sorted.groupby("CaseID")["Activity"]
    .apply(list)  # Create a list of all steps
    .reset_index(name="ExecutedPath")
)
# Make it readable text, for example "PO -> GR -> INV"
process_paths["Path_Text"] = process_paths["ExecutedPath"].apply(lambda x: " -> ".join(x))
print(process_paths.head())
With this small piece of code, you have mapped out the recorded sequence of steps for every purchase order in the 50,000-transaction export.
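Before running this on the full export, it can help to try the same steps on a handful of events where you already know the answer. The sketch below wraps the logic in a helper called `build_paths` (a name introduced here, not part of pandas) and feeds it made-up events for the three example orders from the table in "Defining the Norm", deliberately in the wrong order:

```python
import pandas as pd

def build_paths(events: pd.DataFrame) -> pd.DataFrame:
    """Return one row per CaseID with its time-ordered activities as text."""
    ordered = events.sort_values(["CaseID", "Timestamp"])
    paths = (
        ordered.groupby("CaseID")["Activity"]
        .apply(list)
        .reset_index(name="ExecutedPath")
    )
    paths["Path_Text"] = paths["ExecutedPath"].apply(" -> ".join)
    return paths

# Made-up events for three purchase orders, deliberately shuffled
sample = pd.DataFrame(
    [
        ("PO-1002", "PAY", "2024-03-20"),
        ("PO-1001", "PO",  "2024-03-01"),
        ("PO-1003", "GR",  "2024-03-25"),
        ("PO-1001", "INV", "2024-03-06"),
        ("PO-1002", "PO",  "2024-03-02"),
        ("PO-1003", "PO",  "2024-03-03"),
        ("PO-1001", "GR",  "2024-03-04"),
        ("PO-1003", "PAY", "2024-03-18"),
        ("PO-1002", "INV", "2024-03-09"),
        ("PO-1001", "PAY", "2024-03-15"),
        ("PO-1003", "INV", "2024-03-10"),
    ],
    columns=["CaseID", "Activity", "Timestamp"],
)
sample["Timestamp"] = pd.to_datetime(sample["Timestamp"])

paths = build_paths(sample)
print(paths[["CaseID", "Path_Text"]])
```

The three orders come out with the same sequences as in the table. One caveat: if two steps of the same order carry an identical timestamp (for example because the export contains dates without times), sorting alone cannot tell you which came first, so check the timestamp precision before relying on the result.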
Step 3: Determining Deviations
Now we can ask the question that really matters: Which orders deviate from our norm?
# Define what we expect (the norm). The labels must match the Activity values in your export exactly.
NORM_PATH = "PO_CREATED -> GOODS_RECEIPT -> INVOICE_POSTED -> PAYMENT"
# Filter the cases that do NOT meet the norm
deviations = process_paths[process_paths["Path_Text"] != NORM_PATH].copy()
print(f"Number of deviations found: {len(deviations)}")
print("Most common deviating patterns:")
print(deviations["Path_Text"].value_counts().head(5))
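To get closer to the list from "Defining the Norm", with a judgment per order, you could translate each sequence into a label. The sketch below is one possible set of rules, not a complete classification: the function name `judge`, the order in which the rules are checked and the catch-all label "Other deviation" are assumptions made for this example, and the sample paths are made up.

```python
import pandas as pd

NORM = ["PO_CREATED", "GOODS_RECEIPT", "INVOICE_POSTED", "PAYMENT"]

def judge(path: list[str]) -> str:
    """Attach an audit judgment to one executed path (illustrative rules only)."""
    if path == NORM:
        return "According to norm"
    if "PO_CREATED" not in path:
        return "No PO seen"
    if "PAYMENT" in path and "GOODS_RECEIPT" not in path:
        return "No receipt!"
    # From here on, a path that contains a payment also contains a goods receipt
    if "PAYMENT" in path and path.index("PAYMENT") < path.index("GOODS_RECEIPT"):
        return "Paid before receipt!"
    return "Other deviation"

# Made-up paths in the same shape as the ExecutedPath column
example_paths = pd.DataFrame({
    "CaseID": ["PO-1001", "PO-1002", "PO-1003", "PO-1004"],
    "ExecutedPath": [
        ["PO_CREATED", "GOODS_RECEIPT", "INVOICE_POSTED", "PAYMENT"],
        ["PO_CREATED", "INVOICE_POSTED", "PAYMENT"],
        ["PO_CREATED", "INVOICE_POSTED", "PAYMENT", "GOODS_RECEIPT"],
        ["INVOICE_POSTED", "PAYMENT"],
    ],
})

example_paths["Judgment"] = example_paths["ExecutedPath"].apply(judge)
print(example_paths[["CaseID", "Judgment"]])
```

Working on the list of steps rather than on the joined text makes rules such as "payment before receipt" easy to express, because you can compare positions within the sequence.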
Follow-up
Suppose the script runs and spits out that there are 450 cases that deviate. What now?
You see, for example:
- 150x PO -> INV -> PAY: Invoices paid without goods receipt.
  - Possible action: Is this service procurement (rent, consultancy)? Then it makes sense. Is it procurement of laptops? Then you have a major internal control issue.
- 20x PO -> INV -> PAY -> GR: Goods arrived only after payment.
  - Possible action: Were these prepayments? Allowed according to procedure? Or is the administration just slow at booking?
- 5x INV -> PAY: No PO seen.
  - Possible action: These could be 'maverick buying' cases or ghost invoices. You pick these 5 out for full detailed investigation.
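The question "services or laptops?" can partly be answered with data as well, provided you can obtain something like a purchase category per order. The sketch below assumes a separate master data table with a `Category` column; that table, its column names and all values are hypothetical. It joins the category onto the deviations and counts cases per pattern and category:

```python
import pandas as pd

# Made-up deviations in the same shape as the 'deviations' table
deviations = pd.DataFrame({
    "CaseID": ["PO-2001", "PO-2002", "PO-2003", "PO-2004"],
    "Path_Text": [
        "PO_CREATED -> INVOICE_POSTED -> PAYMENT",
        "PO_CREATED -> INVOICE_POSTED -> PAYMENT",
        "PO_CREATED -> INVOICE_POSTED -> PAYMENT -> GOODS_RECEIPT",
        "INVOICE_POSTED -> PAYMENT",
    ],
})

# Hypothetical master data: purchase category per order
po_master = pd.DataFrame({
    "CaseID": ["PO-2001", "PO-2002", "PO-2003"],
    "Category": ["Services", "Goods", "Goods"],
})

# Left join keeps every deviation, even when no master data exists for it
enriched = deviations.merge(po_master, on="CaseID", how="left")
enriched["Category"] = enriched["Category"].fillna("Unknown")

# Count cases per deviating pattern and category
summary = (
    enriched.groupby(["Path_Text", "Category"])
    .size()
    .reset_index(name="Cases")
    .sort_values("Cases", ascending=False)
)
print(summary)

# Goods paid without a receipt are the cases to look at first
no_receipt_goods = enriched[
    (enriched["Path_Text"] == "PO_CREATED -> INVOICE_POSTED -> PAYMENT")
    & (enriched["Category"] == "Goods")
]
print(no_receipt_goods["CaseID"].tolist())
```

A split like this does not replace professional judgment, but it lets you set aside the explainable cases quickly and spend your time on the ones that remain.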
Conclusion
Process mining in auditing doesn't have to be abstract. By making process sequences explicit, your focus shifts from "does the procedure exist?" to "is the procedure being followed?". Handy, right?
Process mining
Strictly speaking, we aren't applying full process mining in this article, but an audit-focused process sequence analysis: a light form of process mining in which we test concretely whether a certain sequence occurs in the process. Full process mining is broader. It maps out all paths that actually occur in the process, which also shows you right away whether there are other important side streams. That requires much more capacity, however, and is unnecessary for many smaller audit clients. The focus in this article is therefore on compliance rather than on process discovery.
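To give a feel for the difference, here is a very small step in the direction of discovery: instead of testing one expected sequence, count how often each step is directly followed by each other step. This is only a sketch on made-up data, not a substitute for a process mining tool:

```python
import pandas as pd

# Made-up event log: case A follows the norm, case B skips the goods receipt
events = pd.DataFrame({
    "CaseID": ["A", "A", "A", "A", "B", "B", "B"],
    "Activity": [
        "PO_CREATED", "GOODS_RECEIPT", "INVOICE_POSTED", "PAYMENT",
        "PO_CREATED", "INVOICE_POSTED", "PAYMENT",
    ],
    "Timestamp": pd.to_datetime([
        "2024-05-01", "2024-05-03", "2024-05-06", "2024-05-20",
        "2024-05-02", "2024-05-07", "2024-05-21",
    ]),
})

ordered = events.sort_values(["CaseID", "Timestamp"])

# For each event, look up the next activity within the same case
ordered["NextActivity"] = ordered.groupby("CaseID")["Activity"].shift(-1)

# Count each observed transition; the last event of a case has no successor
transitions = (
    ordered.dropna(subset=["NextActivity"])
    .groupby(["Activity", "NextActivity"])
    .size()
    .reset_index(name="Count")
    .sort_values("Count", ascending=False)
)
print(transitions)
```

A table like this shows every transition that occurs in the data, including ones you did not think to test for, which is the kind of insight full process discovery builds on.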

