An important point of attention within the audit of cash and bank balances is the use of multiple bank accounts by a single beneficiary. Sometimes this is perfectly normal, for example, when an organization has different accounts for currencies or subsidiaries, but it can also indicate administrative inefficiencies, payment processing errors, or deliberate attempts to conceal or split payments.
In this article, we’ll perform this data analysis. It’s actually a very simple analysis, something you could also do in Excel, but it’s an important and commonly used check. By performing this analysis, you can:
- Quickly detect duplicate payments or unauthorized bank accounts
- Identify inconsistencies in master data (vendors, employees, customers)
When this analysis is performed automatically based on bank data (for example, through a CAMT export), it can significantly reduce manual review work.
CAMT File
I mentioned that this is a fairly simple analysis, but sometimes the source file can make it trickier. Especially in Europe, CAMT files are now often used. CAMT files are XML files used by banks to report transactions. That’s great because it introduces standardization, but converting CAMT files can sometimes feel like solving a puzzle.
You can find more details about the format here, and in another article, you’ll find a converter to Excel.
Batch Details
Unfortunately, many CAMT exports don’t yet include batch details. When an organization uses them (for example, for payroll payments), you’ll be missing that information. Sometimes these details are still available in the client’s ERP system. If you can extract them separately, you could combine them with the results of the CAMT converter.
Data Analysis in Python
Now let’s dive into the technical part. For this analysis, we’ll use Python. The scripts are easy to follow and can handle large datasets. We’ll use one main package:
pandas
for data manipulation
You can find more information about pandas in audit analytics on this website.
1. Reading the CAMT File
First, we’ll read the CAMT file. In the script below, I’ve kept it fairly straightforward. More information about the CAMT converter can be found in another article.
import xml.etree.ElementTree as ET
import pandas as pd
def parse_camt053(file_path):
tree = ET.parse(file_path)
root = tree.getroot()
ns = {'ns': 'urn:iso:std:iso:20022:tech:xsd:camt.053.001.02'}
transactions = []
for entry in root.findall('.//ns:Ntry', ns):
try:
amount = float(entry.find('ns:Amt', ns).text)
except:
amount = 0.0
currency = entry.find('ns:Amt', ns).attrib.get('Ccy', 'N/A')
date = entry.find('.//ns:BookgDt/ns:Dt', ns).text
debtor = entry.find('.//ns:Dbtr/ns:Nm', ns).text
debtor_iban = entry.find('.//ns:DbtrAcct/ns:Id/ns:IBAN', ns).text
creditor = entry.find('.//ns:Cdtr/ns:Nm', ns).text
creditor_iban = entry.find('.//ns:CdtrAcct/ns:Id/ns:IBAN', ns).text
transactions.append({
"Date": date,
"Debtor": debtor,
"Debtor IBAN": debtor_iban,
"Creditor": creditor,
"Creditor IBAN": creditor_iban,
"Amount": amount,
"Currency": currency
})
return pd.DataFrame(transactions)
file_path = "camt053_dummy.xml"
df = parse_camt053(file_path)
print(df.head())
2. Analyzing Duplicates
Once we’ve converted the CAMT file, the actual analysis is basically just creating a pivot table. With Python, we can use groupby
and nunique()
to count how many unique IBANs each creditor has.
# Group and count unique IBANs per beneficiary
duplicates = df.groupby('Creditor')['Creditor IBAN'].nunique().reset_index()
duplicates = duplicates[duplicates['Creditor IBAN'] > 1] # Filter beneficiaries with multiple IBANs
print("Beneficiaries with multiple IBANs:")
print(duplicates)
The result might look like this:
Creditor | Creditor IBAN |
---|---|
Durban Inc. | 3 |
City Escapes BV | 2 |
Math Engines Inc. | 2 |
In that case, you know to check whether these duplicate accounts are legitimate.
2.1 IBANs Linked to Multiple Beneficiaries
In addition to identifying beneficiaries with multiple IBANs, we can also analyze whether an IBAN is linked to multiple beneficiaries. This may indicate shared accounts or suspicious transactions.
# Group and count unique beneficiaries per IBAN
iban_duplicates = df.groupby('Creditor IBAN')['Creditor'].nunique().reset_index()
iban_duplicates = iban_duplicates[iban_duplicates['Creditor'] > 1] # Filter IBANs linked to multiple beneficiaries
print("IBANs linked to multiple beneficiaries:")
print(iban_duplicates)
Example result:
Creditor IBAN | Creditor |
---|---|
DE12DEUT123456789 | 3 |
9876543123 | 2 |
NL98ABNA987654321 | 2 |
In this case, these accounts are used under multiple names. That could be just a typo or punctuation, but it could also be deliberate.
3. Exporting the Results
After performing these analyses, it’s best to export the results to Excel so you can include them in your audit documentation and annotate any findings.
duplicates.to_excel("liquidity_duplicate_ibans.xlsx", index=False)
iban_duplicates.to_excel("liquidity_multiple_beneficiaries.xlsx", index=False)
Fuzzy Logic
This analysis is very strict — which is good — but it can also create extra work. For instance, if one record says “Brood B.V.” and another says “Brood BV,” it’s the same company, but the text differs, so this analysis flags it as an issue.
You can also use Fuzzy Logic. Fuzzy logic (or “fuzzy matching”) is a technique that doesn’t look for exact matches but for the likelihood of a match. Instead of saying “these two texts are 100% or 0% identical,” it calculates a similarity score (for example, 95%). This is particularly useful for detecting duplicate payments or vendors, because real-world data is rarely perfect.
Think of variations like:
- Typos: “Jansen” vs. “Janssen”.
- Order differences: “Jan de Vries” vs. “Vries, Jan de”.
A strict analysis would treat these as unique. A fuzzy logic analysis can group them as “probably the same entity” or “likely a duplicate payment.”
Fuzzy Logic in Practice
Implementing a reliable fuzzy matching model requires specialized Python libraries (such as thefuzz
or recordlinkage
) and careful consideration of which fields to compare and which threshold to set for a “match.”
To help you apply this advanced technique effectively, the CAMT Analytics Toolkit includes a ready-to-use script. It performs both the traditional and fuzzy logic methods, bridging the gap between strict analysis and smarter duplicate detection.