Using Logging in Auditing

Logging is like the "black box" in an airplane: it tells you what happened behind the scenes, when it happened, and (sometimes) by whom. That makes logs very useful in an audit, because they can help you map out processes or verify that they work as intended. In this article, we will explore the different types of audit logs, what to consider before using them in your audit, and what checks you can perform to ensure a reliable dataset.

Logging in an audit

What types of logging are commonly encountered?

In practice, you will encounter various types of logs. Below are a few examples and why they are interesting for an audit:

Access and user logging
- Think of login attempts (successful and unsuccessful), sessions started and ended, and changes in user permissions.
- It is useful to see who accessed the system and whether someone has excessive rights. For example, if a user has an outdated password or more privileges than their role requires, the logs show whether that account has actually been used.
Database and system logging
- Logs events at a lower level in the database or operating system (e.g., queries and table modifications, or changes to server configurations).
- Useful for verifying changes to the underlying configuration or permission structure.
Transaction and process logging
- These logs record activities related to specific business processes, such as creating a sales order, booking an invoice, modifying master data, or approving a payment.
- Common uses include reviewing changes to master data, such as vendor bank accounts or sales prices, and checking that the required approvals were given.
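As an illustration of the master data example, the sketch below filters a change log for vendor bank account changes and flags those without a recorded approver. It is a minimal sketch under assumptions: the column names (`table`, `field`, `record_key`, `changed_by`, `approved_by`) and the sample rows are hypothetical, and whether the approver is recorded in the same log depends entirely on your system.

```python
import pandas as pd

# Hypothetical change-log extract: one row per changed field.
# Column names and values are illustrative assumptions only.
changes = pd.DataFrame({
    "table": ["vendor", "vendor", "customer", "vendor"],
    "field": ["bank_account", "name", "credit_limit", "bank_account"],
    "record_key": ["V001", "V002", "C010", "V003"],
    "changed_by": ["alice", "bob", "carol", "alice"],
    "approved_by": ["dave", None, "dave", None],
})


def vendor_bank_changes(log: pd.DataFrame) -> pd.DataFrame:
    """Return only the rows that change a vendor's bank account."""
    mask = (log["table"] == "vendor") & (log["field"] == "bank_account")
    return log.loc[mask]


selected = vendor_bank_changes(changes)
# Bank account changes without a recorded approver deserve a closer look
unapproved = selected[selected["approved_by"].isna()]
print(selected)
print(unapproved)
```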

Depending on your objective, you may focus on one type of log or combine multiple sources.
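Combining sources can be as simple as joining two extracts. The sketch below, assuming a user list with roles and an access log with a login event code (all column names, event codes, and rows are hypothetical), attaches each user's last successful login and lists privileged accounts that never logged in successfully. Such accounts may hold rights they do not need.

```python
import pandas as pd

# Hypothetical extracts: a user/role list and an access log.
# Column names and event codes are assumptions.
users = pd.DataFrame({
    "user_id": ["alice", "bob", "svc_batch", "dave"],
    "role": ["admin", "clerk", "admin", "admin"],
})
logins = pd.DataFrame({
    "user_id": ["alice", "bob", "alice", "bob"],
    "event": ["LOGIN_SUCCESS", "LOGIN_SUCCESS", "LOGIN_FAILED", "LOGIN_SUCCESS"],
    "timestamp": pd.to_datetime(
        ["2024-01-03", "2024-01-04", "2024-02-10", "2024-02-11"]
    ),
})


def last_successful_login(users: pd.DataFrame, logins: pd.DataFrame) -> pd.DataFrame:
    """Add each user's most recent successful login (NaT if there is none)."""
    ok = logins[logins["event"] == "LOGIN_SUCCESS"]
    last = (
        ok.groupby("user_id", as_index=False)["timestamp"].max()
        .rename(columns={"timestamp": "last_login"})
    )
    # Left join keeps every user, including those who never logged in
    return users.merge(last, on="user_id", how="left")


overview = last_successful_login(users, logins)
# Privileged accounts without any successful login in the extract
dormant_admins = overview[(overview["role"] == "admin") & overview["last_login"].isna()]
print(dormant_admins)
```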

System considerations

This all sounds great, but not everything is logged, and even when something is, you need to assess the log critically. Here are some key considerations:

Log configuration

Check whether logging is enabled, properly configured, and capturing all relevant events. What gets logged by default varies by system, and sometimes end users can adjust the settings. If that’s the case, verify whether the correct settings have been applied.
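If the system lets you export its logging settings, comparing them with an agreed baseline makes deviations explicit. This is a minimal sketch: the setting names and the baseline are hypothetical, and how you obtain the actual settings (screen print, export, query) differs per system.

```python
# Hypothetical baseline of logging settings, e.g. agreed with IT.
# Setting names and values are illustrative assumptions.
EXPECTED_SETTINGS = {
    "log_logins": True,
    "log_master_data_changes": True,
    "log_permission_changes": True,
}


def compare_settings(actual: dict, expected: dict) -> dict:
    """Return {setting: (expected, actual)} for every deviating setting.

    A setting that is absent from `actual` is reported with actual=None.
    """
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


# Example: settings as exported from the (hypothetical) system
actual = {"log_logins": True, "log_master_data_changes": False}
for name, (want, got) in compare_settings(actual, EXPECTED_SETTINGS).items():
    print(f"{name}: expected {want}, found {got}")
```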

What is logged? But also: what is not?

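Too little logging means critical events go unrecorded; too much buries them in noise and makes analysis slow. Aim for a balance in which at least all critical events are logged. You don’t want to wade through 100 GB of irrelevant data. Also check the reverse: which relevant events are not logged at all?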
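A quick way to see what is and is not in a log is to count entries per event type and compare that with the event types you consider critical. The sketch below assumes an `event_type` column and a hypothetical list of critical events. Note that a critical type that does not appear may simply not have occurred, so treat it as a question for the system owner rather than proof that it is not logged.

```python
import pandas as pd

# Hypothetical set of event types you consider critical for this audit.
CRITICAL_EVENTS = {"LOGIN_FAILED", "ROLE_CHANGED", "VENDOR_BANK_CHANGED"}

# Hypothetical log extract with an 'event_type' column.
log = pd.DataFrame({
    "event_type": ["LOGIN_SUCCESS", "LOGIN_FAILED", "LOGIN_SUCCESS", "ROLE_CHANGED"],
})


def event_coverage(log: pd.DataFrame, critical: set) -> tuple:
    """Count entries per event type and list critical types that never occur."""
    counts = log["event_type"].value_counts()
    missing = sorted(critical - set(counts.index))
    return counts, missing


counts, missing = event_coverage(log, CRITICAL_EVENTS)
print(counts)  # shows which event types dominate the log
print("Critical event types not found:", missing)
```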

Log retention period

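If logs are deleted after a short period, they may no longer cover your audit period and are then of little use to you. Check how long logs are retained, and make sure the logs you need are exported or archived before they are deleted.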
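Once you have an export, you can check whether its oldest and newest entries actually span your audit period. This sketch compares calendar dates only; the function name and example dates are hypothetical. A quiet first or last day will also be flagged, so a `False` result is a reason to ask the administrator, not proof that entries were purged.

```python
import pandas as pd


def retention_covers_period(timestamps, period_start, period_end) -> bool:
    """Check whether the log's first and last dates span the audit period.

    Compares calendar dates only, so an entry at 08:00 on the first day counts.
    """
    ts = pd.to_datetime(pd.Series(timestamps))
    first_day, last_day = ts.min().normalize(), ts.max().normalize()
    print(f"Log runs from {first_day.date()} to {last_day.date()}")
    return bool(
        first_day <= pd.Timestamp(period_start).normalize()
        and last_day >= pd.Timestamp(period_end).normalize()
    )


# Example: audit period is calendar year 2024, but the log starts in March
entries = ["2024-03-15 08:00", "2024-06-30 12:00", "2024-12-31 17:00"]
print(retention_covers_period(entries, "2024-01-01", "2024-12-31"))
```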

What about GITC (general IT controls)?

Logs are often not directly linked to other documentation, which makes a reliable system environment even more important. Changes to the system can affect what is logged, so change management matters. If you want to use logs to trace actions to individual users, authentication must be robust. And if logging can be switched on or off, who has the right to do so?
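If the system records changes to its own logging, you can list those events and check who made them. The sketch below assumes hypothetical event codes and a hypothetical list of authorised users; look up the real codes for your system. It only helps if such events are logged at all, and the window between a "disabled" and an "enabled" event is a period for which you have no log.

```python
import pandas as pd

# Hypothetical event codes for switching logging on/off or changing its settings.
LOGGING_CONTROL_EVENTS = {"AUDIT_LOG_DISABLED", "AUDIT_LOG_ENABLED", "AUDIT_CONFIG_CHANGED"}
# Hypothetical list of users authorised to change logging settings.
AUTHORISED_USERS = {"it_admin"}

log = pd.DataFrame({
    "user_id": ["it_admin", "bob", "alice", "bob"],
    "event_type": ["AUDIT_CONFIG_CHANGED", "AUDIT_LOG_DISABLED",
                   "LOGIN_SUCCESS", "AUDIT_LOG_ENABLED"],
    "timestamp": pd.to_datetime(["2024-02-01 09:00", "2024-03-01 22:00",
                                 "2024-03-02 08:00", "2024-03-04 07:00"]),
})


def logging_control_changes(log: pd.DataFrame) -> pd.DataFrame:
    """Return all changes to the logging itself, marking whether the user is authorised."""
    control = log[log["event_type"].isin(LOGGING_CONTROL_EVENTS)].copy()
    control["authorised"] = control["user_id"].isin(AUTHORISED_USERS)
    return control


changes = logging_control_changes(log)
print(changes[~changes["authorised"]])
```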

Start early: determine whether the logging you need is available and check the points above. System administrators do not always have this on their radar and may need time to investigate.

What checks can be performed on logging?

Once you have obtained the logs, you need to establish that they are complete and reliable. Because logs are often not cross-referenced with other documentation, you cannot simply agree them to source documents; instead, you test the data itself.

1. Do you have everything?

Many systems indicate the number of records exported. Compare this with your dataset to ensure no records are missing.

2. Unique & sequential IDs

Each event should have a unique ID. Sometimes these IDs are sequential, which allows you to check for gaps. If you expect IDs from 1 to 1000 but IDs 435 to 450 are absent, that gap indicates missing records.
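Listing every missing ID individually quickly becomes unreadable; grouping them into ranges, as in the 435 to 450 example, is easier to discuss with a system administrator. This sketch assumes integer IDs that increase by exactly 1. Some systems skip numbers by design (for example when a transaction is rolled back), so a gap calls for an explanation rather than an immediate conclusion.

```python
def missing_id_ranges(ids):
    """Group IDs absent between min(ids) and max(ids) into (first, last) ranges."""
    present = sorted(set(int(i) for i in ids))
    ranges = []
    # Compare each ID with the next one; a jump larger than 1 is a gap
    for prev, curr in zip(present, present[1:]):
        if curr - prev > 1:
            ranges.append((prev + 1, curr - 1))
    return ranges


ids = [i for i in range(1, 1001) if not 435 <= i <= 450]
print(missing_id_ranges(ids))  # [(435, 450)]
```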

3. Integrity and consistency

Ensure that all necessary fields (for example event ID, timestamp, user, and event type) are present and consistently populated, without empty values or changing formats.
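A basic integrity check reports which required columns are missing and how many empty values each present column contains. The list of required fields below is a hypothetical example; adjust it to your export. Empty strings and whitespace-only values are counted as empty alongside real missing values.

```python
import pandas as pd

# Hypothetical fields that every log entry should contain.
REQUIRED_FIELDS = ["event_id", "timestamp", "user_id", "event_type"]


def field_integrity(df: pd.DataFrame, required=REQUIRED_FIELDS):
    """Return (missing columns, {column: number of empty values})."""
    missing_cols = [c for c in required if c not in df.columns]
    empty_counts = {}
    for col in required:
        if col in missing_cols:
            continue
        values = df[col]
        # Count NaN/None as well as empty or whitespace-only strings
        empty = values.isna() | values.astype(str).str.strip().eq("")
        if empty.any():
            empty_counts[col] = int(empty.sum())
    return missing_cols, empty_counts


sample = pd.DataFrame({
    "event_id": [1, 2, 3],
    "timestamp": ["2024-01-01 09:00", None, "2024-01-03 10:30"],
    "user_id": ["alice", "  ", "bob"],
})
print(field_integrity(sample))
```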

4. Time and date validation

Check the timestamp field. Are the timestamps in chronological order? Do all entries fall within the period you requested? Are there unexplained gaps in activity?
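Gaps in activity can be found by sorting the timestamps and measuring the time between consecutive entries. In this sketch the threshold `max_gap` (one day by default) is an assumption you should tune: weekends and holidays will naturally exceed a one-day threshold, so each gap needs a plausible explanation, not automatic suspicion.

```python
import pandas as pd


def activity_gaps(timestamps, max_gap="1D") -> pd.DataFrame:
    """List periods longer than max_gap without any log entry."""
    ts = pd.to_datetime(pd.Series(timestamps)).sort_values().reset_index(drop=True)
    gaps = ts.diff()  # time since the previous entry (NaT for the first)
    mask = gaps > pd.Timedelta(max_gap)
    return pd.DataFrame({
        "gap_start": ts.shift(1)[mask],
        "gap_end": ts[mask],
        "duration": gaps[mask],
    }).reset_index(drop=True)


entries = ["2024-01-01 09:00", "2024-01-01 17:00", "2024-01-04 08:00", "2024-01-04 12:00"]
print(activity_gaps(entries))
```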

Example checks using Python (with pandas)

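For reference, here are some Python checks that implement several of the points above using pandas. The column names event_id and timestamp and the file name my_logging.csv are examples; adjust them to your own export.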

import pandas as pd

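def check_completeness(df, expected_count=9500):
    # Compare the number of records in the dataset with the count reported
    # by the system (9500 is only an example; use your system's figure)
    total_records = len(df)
    print(f"Total log entries: {total_records}")

    if total_records != expected_count:
        print(f"Record count mismatch: expected {expected_count}, found {total_records}. "
              "Some log records might be missing or duplicated!")
    else:
        print("The number of log records matches the expected count.")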

def check_ids(df):
    # Check for unique and (where applicable) sequential IDs
    if 'event_id' not in df.columns:
        print("No 'event_id' column found!")
        return

    # Every event_id should occur exactly once
    duplicates = df[df.duplicated(subset=['event_id'], keep=False)]
    if not duplicates.empty:
        print("Duplicate event_id found!")
        print(duplicates)

    # Check for gaps in the sequence (only meaningful if the system
    # assigns consecutive integer IDs); empty IDs are ignored here
    ids = df['event_id'].dropna().astype(int)
    if ids.empty:
        return
    expected_range = range(int(ids.min()), int(ids.max()) + 1)
    missing_ids = sorted(set(expected_range) - set(ids.tolist()))
    if missing_ids:
        print(f"{len(missing_ids)} event_ids missing from the sequence: {missing_ids}")

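def check_timestamps(df, period_start=None, period_end=None):
    # Ensure timestamps parse, are chronological and, optionally, fall
    # within the audit period (period_end is exclusive, e.g. '2025-01-01')
    if 'timestamp' not in df.columns:
        print("No 'timestamp' column found!")
        return

    # Try parsing the timestamp column as datetime
    try:
        parsed = pd.to_datetime(df['timestamp'])
    except (ValueError, TypeError) as e:
        print(f"Could not parse timestamps: {e}")
        return

    # Check chronological order in the order the records were exported
    # (optional, depends on the system). Sorting first would make this
    # check pass every time.
    if not parsed.is_monotonic_increasing:
        print("Warning: timestamps are not in chronological order.")

    if period_start is not None and parsed.min() < pd.Timestamp(period_start):
        print("Warning: log contains entries before the audit period.")
    if period_end is not None and parsed.max() >= pd.Timestamp(period_end):
        print("Warning: log contains entries on or after the end of the audit period.")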

def main():
    log_path = 'my_logging.csv'
    df = pd.read_csv(log_path)
    check_completeness(df)
    check_ids(df)
    check_timestamps(df)

if __name__ == '__main__':
    main()

Conclusion

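That’s it! Logs can provide strong audit evidence, provided you have established that logging is configured correctly, retained long enough, and complete. Do you already have an idea of how to use logging in your audit?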