Data Source: CAMT File
CAMT files are XML files used by banks to report transactions. We will use a CAMT.053 file, which contains bank statements.
Relevant fields:
- Beneficiary Name (
<Nm>
) - IBAN (
<IBAN>
) - Amount (
<Amt>
) - Date (
<BookgDt>
)
To download a test CAMT file for this analysis, click here.
For more information on the CAMT.053 format and its specifications, visit this external page.
Data Analysis in Python
To perform this analysis, we need the following Python packages:
pandas
for data manipulationxml.etree.ElementTree
for parsing XML files (included by default in Python)
If you haven’t installed pandas yet, you can do so using:
pip install pandas
Below, we walk through the process of reading a CAMT file and detecting duplicate IBANs.
1. Reading the CAMT File
We use xml.etree.ElementTree
to parse the XML file and extract relevant fields.
import xml.etree.ElementTree as ET
import pandas as pd
# Function to parse a CAMT.053 XML file
def parse_camt053(file_path):
tree = ET.parse(file_path) # Parse the XML file
root = tree.getroot() # Get the root element
ns = {'ns': 'urn:iso:std:iso:20022:tech:xsd:camt.053.001.02'} # Define XML namespace
transactions = []
for entry in root.findall('.//ns:Ntry', ns):
try:
amount = float(entry.find('ns:Amt', ns).text) if entry.find('ns:Amt', ns) is not None else 0.0
except ValueError:
amount = 0.0 # Fallback for conversion errors
currency = entry.find('ns:Amt', ns).attrib.get('Ccy', 'N/A') if entry.find('ns:Amt', ns) is not None else 'N/A'
date = entry.find('.//ns:BookgDt/ns:Dt', ns).text if entry.find('.//ns:BookgDt/ns:Dt', ns) is not None else '-'
debtor = entry.find('.//ns:Dbtr/ns:Nm', ns).text if entry.find('.//ns:Dbtr/ns:Nm', ns) is not None else '-'
debtor_iban = entry.find('.//ns:DbtrAcct/ns:Id/ns:IBAN', ns).text if entry.find('.//ns:DbtrAcct/ns:Id/ns:IBAN', ns) is not None else '-'
creditor = entry.find('.//ns:Cdtr/ns:Nm', ns).text if entry.find('.//ns:Cdtr/ns:Nm', ns) is not None else '-'
creditor_iban_element = entry.find('.//ns:CdtrAcct/ns:Id/ns:IBAN', ns)
creditor_iban = creditor_iban_element.text if creditor_iban_element is not None else '-'
# Store extracted transaction details in a list
transactions.append({
"Date": date,
"Debtor": debtor,
"Debtor IBAN": debtor_iban,
"Creditor": creditor,
"Creditor IBAN": creditor_iban,
"Amount": amount,
"Currency": currency
})
return pd.DataFrame(transactions) # Convert to a Pandas DataFrame
# Load and parse the CAMT.053 file
file_path = "camt053_dummy.xml"
df = parse_camt053(file_path)
print(df.head()) # Display the first few rows
2. Analyzing Duplicates
We group by beneficiary and count the number of unique IBANs.
# Group and count unique IBANs per beneficiary
duplicates = df.groupby('Creditor')['Creditor IBAN'].nunique().reset_index()
duplicates = duplicates[duplicates['Creditor IBAN'] > 1] # Filter only beneficiaries with multiple IBANs
# Display results
print("Beneficiaries with multiple IBANs:")
print(duplicates)
2.1 Analyzing IBANs with Multiple Beneficiaries
In addition to identifying beneficiaries with multiple IBANs, we can also analyze whether an IBAN is linked to multiple beneficiaries. This may indicate joint accounts or suspicious transactions.
# Group and count unique beneficiaries per IBAN
iban_duplicates = df.groupby('Creditor IBAN')['Creditor'].nunique().reset_index()
iban_duplicates = iban_duplicates[iban_duplicates['Creditor'] > 1] # Filter only IBANs with multiple beneficiaries
# Display results
print("IBANs linked to multiple beneficiaries:")
print(iban_duplicates)
3. Visualizing the Results
We can display the results in a table and export them to Excel.
# Export results to an Excel file
duplicates.to_excel("liquid_assets_duplicate_iban.xlsx", index=False)
iban_duplicates.to_excel("liquid_assets_duplicate_beneficiaries.xlsx", index=False)
Conclusion
Hopefully, this analysis has helped you gain insights into duplicate IBANs in transactions!