Tracking database changes in Flask-SQLAlchemy for audit logs

You may come across a situation where you need to track changes in the database for auditing purposes, or to keep a history of all changes. This kind of requirement often comes up when we need to adhere to some regulation, or when we want a trail of all data changes so customer support can help the user.

There are many ways to implement this kind of audit log. I only know two, but I am sure there are more. Let me know if you know another way to solve the same problem.

In this blog, we will talk about how to solve this problem using Flask-SQLAlchemy. But we need an application to start with. You can check out the code used for this blog in my GitHub repo here.

# app.py

from flask import Flask, abort, redirect, render_template, url_for
from flask_sqlalchemy import SQLAlchemy
from flask_wtf import FlaskForm
from wtforms import StringField
from wtforms.validators import DataRequired

app = Flask(__name__, template_folder='./templates')
app.config['DEBUG'] = True
app.config['SECRET_KEY'] = "randomkey"
app.config['SQLALCHEMY_DATABASE_URI'] = 'sqlite:///test.db'
db = SQLAlchemy(app)

class User(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    first_name = db.Column(db.String(80), nullable=False)
    last_name = db.Column(db.String(120), nullable=False)

class UserForm(FlaskForm):
    first_name = StringField('First Name', validators=[DataRequired()])
    last_name = StringField('Last Name', validators=[DataRequired()])

@app.route('/')
def index():
    users = User.query.all()
    return render_template('index.html', users=users)


@app.route('/create', methods=['GET', 'POST'])
def create():
    userCreateForm = UserForm()
    if userCreateForm.validate_on_submit():
        user = User(first_name=userCreateForm.first_name.data, last_name=userCreateForm.last_name.data)
        db.session.add(user)
        db.session.commit()
        return redirect(url_for('.index'))
    return render_template('create.html', form=userCreateForm)

@app.route('/edit/<user_id>', methods=['GET', 'POST'])
def edit(user_id):
    user = User.query.get(user_id)
    if not user:
        abort(404)
    userEditForm = UserForm(first_name=user.first_name, last_name=user.last_name)
    if userEditForm.validate_on_submit():
        user.first_name = userEditForm.first_name.data
        user.last_name = userEditForm.last_name.data
        db.session.commit()
        return redirect(url_for('.index'))
    return render_template('edit.html', form=userEditForm)

Important Note: This is just an example app, which is why all the config values are hard-coded in plain text, including the secret key. Don't do that in a real application.

This small application allows us to add and edit users with first and last names. Ignore the application's CSS styling; in this post we only care about the backend.

Let's start building the audit logs. First, we will add a before-request hook to Flask to generate a unique ID for each request. We will use this ID to batch all the changes made by one request together. So in the future, if a request makes more than one change, we will be able to tell that they were all done by a single request.

from uuid import uuid4

from flask import g

@app.before_request
def before_request_handler():
    # Attach a unique ID to the request context so that every change
    # made during this request can be grouped together later
    g.request_id = uuid4()

Every time a user makes a request, the code above generates a UUID and attaches it to the g object, which is available throughout the request.
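Here is a minimal standalone sketch of the same idea, with a made-up route that just echoes the ID, so you can see that each request gets its own value:

```python
from uuid import uuid4

from flask import Flask, g

app = Flask(__name__)

@app.before_request
def assign_request_id():
    # Runs before every request; g is scoped to the current request
    g.request_id = uuid4()

@app.route('/whoami')
def whoami():
    # Any code running during this request can read g.request_id
    return str(g.request_id)

client = app.test_client()
first = client.get('/whoami').get_data(as_text=True)
second = client.get('/whoami').get_data(as_text=True)
print(first != second)  # True: each request gets its own ID
```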

We also need one more table in the database to store the history of changes that make up the audit log. For simplicity, we are creating the AuditLog table in the same database in which we store all of our data, but you could use a separate database just for audit logs. You don't even have to use a database: you could write the logs to a file or send them to another service. Just replace the database insertion code with your own. Here is the SQLAlchemy model of the AuditLog table:

class AuditLog(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    request_id = db.Column(db.String(80), nullable=False)
    history = db.Column(db.Text(), nullable=True)
    model_name = db.Column(db.String(80), nullable=False)
    original_id = db.Column(db.Integer)
    db_event_name = db.Column(db.String(80), nullable=False)

Now comes the most important part: how to track database changes. For that, we will use the events provided by the SQLAlchemy library.

import json

from sqlalchemy import event
from sqlalchemy.orm.attributes import get_history

@event.listens_for(db.session, 'after_flush')
def db_after_flush(session, flush_context):
    for instance in session.new:
        if not isinstance(instance, User):
            continue
        # The primary key is only populated after the flush
        al = AuditLog(request_id=str(g.request_id), model_name="user",
                      original_id=instance.id, db_event_name="create")
        session.add(al)

@event.listens_for(db.session, 'before_flush')
def db_before_flush(session, flush_context, instances):
    for instance in session.dirty:
        if isinstance(instance, User):
            history = {}
            if get_history(instance, 'first_name').deleted:
                history['first_name'] = get_history(instance, 'first_name').deleted[0]
            if get_history(instance, 'last_name').deleted:
                history['last_name'] = get_history(instance, 'last_name').deleted[0]
            if history:
                al = AuditLog(request_id=str(g.request_id), model_name="user",
                              original_id=instance.id, db_event_name="edit",
                              history=json.dumps(history))
                session.add(al)
    for instance in session.deleted:
        if isinstance(instance, User):
            # Record the values as they were before the delete
            history = {
                'first_name': get_history(instance, 'first_name').unchanged[0],
                'last_name': get_history(instance, 'last_name').unchanged[0]
            }
            al = AuditLog(request_id=str(g.request_id), model_name="user",
                          original_id=instance.id, db_event_name="delete",
                          history=json.dumps(history))
            session.add(al)

Here you can see that we are listening to two events emitted by SQLAlchemy. The first one is after_flush. As you can guess, this event is emitted after the data has been flushed to the database. We use this event only to write a log when a new row is being created. The reason is that we need the primary id of the row being inserted, and the id is only generated after the data is flushed to the database. Once the data is flushed, we can read the id from the instance and store it in the audit log. Adding the audit log row to the session will cause another flush. That's why we use the after_flush event only when creating new rows: we don't want to fire extra flush events when we don't have to.

The second event we are listening to is before_flush. This event is fired before the data is flushed to the database. Here we can check whether an entity is going to be deleted or modified. All the modified entities are available in session.dirty (an IdentitySet object) and all the deleted entities are available in session.deleted (also an IdentitySet). You can loop through all the instances that are being modified or deleted and check which values have changed using the get_history function provided by SQLAlchemy.
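To see what get_history returns, here is a small standalone example using plain SQLAlchemy (the Person model is made up for illustration):

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base
from sqlalchemy.orm.attributes import get_history

Base = declarative_base()

class Person(Base):
    __tablename__ = 'person'
    id = Column(Integer, primary_key=True)
    name = Column(String(80))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    p = Person(name='Ada')
    session.add(p)
    session.flush()      # persist the row, but keep the transaction open

    p.name = 'Grace'     # modify a loaded attribute
    hist = get_history(p, 'name')
    # .added holds the new, pending value; .deleted the value it replaced
    print(hist.added[0], hist.deleted[0])  # Grace Ada
```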

One important thing here is that the audit logs are written in the same database transaction as the data changes. So if the transaction fails, you don't have to worry about wrong audit logs getting written. But this only holds when you use the same database for your data and your audit logs. If you use two different databases, it is your responsibility to discard the audit logs if the transaction fails.
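A quick way to convince yourself of this: in the sketch below (plain SQLAlchemy, with the AuditLog model trimmed down), the audit row disappears along with the rest of the failed transaction:

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class AuditLog(Base):
    __tablename__ = 'audit_log'
    id = Column(Integer, primary_key=True)
    db_event_name = Column(String(80))

engine = create_engine('sqlite://')
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(AuditLog(db_event_name='create'))
    session.flush()       # audit row sent to the database...
    session.rollback()    # ...but the transaction fails, so it is undone

with Session(engine) as session:
    remaining = session.query(AuditLog).count()
    print(remaining)  # 0
```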

There is one optimization you can do: store the UUID using Postgres's native UUID datatype instead of storing it as a string.
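A sketch of that change, assuming Postgres (the request_id column would replace the String version in the AuditLog model above):

```python
import uuid

from sqlalchemy import Column, Integer
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import UUID
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class AuditLog(Base):
    __tablename__ = 'audit_log'
    id = Column(Integer, primary_key=True)
    # Stored as Postgres's native 16-byte uuid type and
    # returned to Python as uuid.UUID objects
    request_id = Column(UUID(as_uuid=True), nullable=False, default=uuid.uuid4)

# Check the DDL type this column compiles to on Postgres
ddl_type = AuditLog.__table__.c.request_id.type.compile(dialect=postgresql.dialect())
print(ddl_type)  # UUID
```

With as_uuid=True you can also drop the str(g.request_id) conversion and assign the UUID object directly.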

If you don't want to use SQLAlchemy to make audit logs, you can also use the database's native triggers to do the same thing, depending on which database you are using. Almost all modern databases provide a way to write triggers.
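For example, with SQLite (the table and trigger names here are made up), an update trigger can record the old values without any application code at all:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE audit_log (id INTEGER PRIMARY KEY, original_id INTEGER,
                        db_event_name TEXT, history TEXT);

-- Fires on every UPDATE of user and stores the pre-update values
CREATE TRIGGER user_audit_edit AFTER UPDATE ON user
BEGIN
    INSERT INTO audit_log (original_id, db_event_name, history)
    VALUES (OLD.id, 'edit', OLD.first_name || ' ' || OLD.last_name);
END;
""")

conn.execute("INSERT INTO user (first_name, last_name) VALUES ('Ada', 'Lovelace')")
conn.execute("UPDATE user SET first_name = 'Grace' WHERE id = 1")
row = conn.execute("SELECT db_event_name, history FROM audit_log").fetchone()
print(row)  # ('edit', 'Ada Lovelace')
```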

I would recommend application-level logging because it gives you more flexibility. You can do other operations on the data before storing it. The second benefit is that you can unit test your audit logs, so you can make sure everything is stored the way you want. And application-level logs are database-independent.
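As a sketch of that last point, here is a trimmed-down, self-contained test (plain SQLAlchemy rather than Flask-SQLAlchemy, covering only the create case) asserting that a change produces exactly one audit row:

```python
from sqlalchemy import Column, Integer, String, create_engine, event
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class User(Base):
    __tablename__ = 'user'
    id = Column(Integer, primary_key=True)
    first_name = Column(String(80))

class AuditLog(Base):
    __tablename__ = 'audit_log'
    id = Column(Integer, primary_key=True)
    original_id = Column(Integer)
    db_event_name = Column(String(80))

@event.listens_for(Session, 'after_flush')
def audit_creates(session, flush_context):
    # Simplified version of the listener from the blog post
    for instance in session.new:
        if isinstance(instance, User):
            session.add(AuditLog(original_id=instance.id, db_event_name='create'))

def run_audit_scenario():
    engine = create_engine('sqlite://')
    Base.metadata.create_all(engine)
    with Session(engine) as session:
        session.add(User(first_name='Ada'))
        session.commit()
        return [(log.db_event_name, log.original_id)
                for log in session.query(AuditLog).all()]

# The "unit test": one create should yield exactly one audit entry
logs = run_audit_scenario()
assert logs == [('create', 1)]
print('ok')
```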