Integration testing of business logic using dockerized PostgreSQL with Testcontainers in C#
Testing is hard, especially writing good tests. In this article, I'll show you a simple way of creating integration tests against a real database. In my case it's applied to a cronjob (a regular console application), but with small modifications it should be easily applicable to other application types.
Here you can find the whole project on GitHub
Business logic & application
Imagine you have a pretty simple cronjob to create. It's a small console application which grabs the data from one table in the database, transforms it, and then inserts the result back into the database. In our case, it will be logs. We have a table into which raw logs are collected. Think of them as automatically gathered logs of all the events happening in the production cluster, which need to be filtered and cleaned up so they can be stored and browsed long term without wasting space.
create table raw_logs
(
id uuid primary key,
message text not null,
severity int not null,
source text not null,
timestamp timestamp not null,
additional_field1 text,
additional_field2 text
);
Most of the fields from the raw_logs table are not interesting for us; we only care about message and severity. Message is just the text associated with the log, and severity is an enum stored as an int, with the values:
Info = 1,
Warning = 2,
Error = 3,
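Put together, the Severity enum used throughout the code could look like this (a sketch: the three values come from the list above, and the Unknown member is implied by the switch in the Transform method shown later):
public enum Severity
{
    // Unknown = 0 is assumed; it is implied by the Severity.Unknown case handled in Transform below
    Unknown = 0,
    Info = 1,
    Warning = 2,
    Error = 3
}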
Our task is to filter and transform the data so that another application can process it without knowing the exact details of the raw logs. The rules are as follows:
if the log is info, we don't care about it
if the log is a warning, we only care about the number of such logs
if the log is an error, we care about its message
And we want to save the result into another PostgreSQL table. So let's get into the implementation of the C# application. First, we need a way to read raw logs from the DB: a DB DTO that Dapper maps rows into, and a business-logic object built from it. The business object, RawLog, looks like this:
public class RawLog
{
    public readonly Guid Id;
    public readonly string Message;
    public readonly Severity Severity;

    private RawLog(Guid id, string message, Severity severity)
    {
        Id = id;
        Message = message;
        Severity = severity;
    }
}
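The DB DTO itself, RawLogDto, isn't listed here; a plausible minimal shape, assuming plain settable properties matching the raw_logs columns the logic actually uses (Dapper simply ignores columns without a matching member):
// Assumed shape of the DB DTO; property names and types are a guess based on
// the raw_logs columns consumed by the business logic.
public class RawLogDto
{
    public Guid Id { get; set; }
    public string Message { get; set; } = string.Empty;
    public int Severity { get; set; }
}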
And a function reading the logs in the DatabaseReader class:
public async Task<List<RawLogDto>> ReadRawData()
{
string sql = "select * from raw_logs";
var query = await _connection.QueryAsync<RawLogDto>(sql);
return query.ToList();
}
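The rest of DatabaseReader isn't shown. Since Program.cs later constructs it with a connection string and disposes it with await using, a minimal skeleton could look roughly like this (an assumption, not necessarily the author's actual class):
// Assumed skeleton; only ReadRawData appears in the article.
public class DatabaseReader : IAsyncDisposable
{
    private readonly NpgsqlConnection _connection;

    public DatabaseReader(string connectionString)
    {
        // Dapper opens the connection on demand, so creating it here is enough.
        _connection = new NpgsqlConnection(connectionString);
    }

    // ReadRawData from the snippet above lives here.

    public ValueTask DisposeAsync() => _connection.DisposeAsync();
}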
Now that we have the raw logs, we want to keep only the ones we care about: warnings and errors. We don't want to operate on DB DTOs, so in this step we also map them to business objects. We do this in the Worker class, which will be the main entry point of our logic.
private static IEnumerable<ITransformResult> Transform(List<RawLogDto> data)
{
var logs = data
.Select(RawLog.FromDto)
.Select(x =>
{
ITransformResult result = x.Severity switch
{
Severity.Unknown => throw new InvalidOperationException("Unknown log severity"),
Severity.Info => new DropTransformResult(),
Severity.Warning => new WarningTransformResult(),
Severity.Error => new ErrorTransformResult(x.Id, x.Message),
_ => throw new ArgumentOutOfRangeException()
};
return result;
});
return logs;
}
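The ITransformResult hierarchy isn't shown in the article either; judging by how Transform above and ToDbDto below use it, it could be as simple as this (a sketch under that assumption):
// Assumed definitions, derived from how the transformation code pattern-matches on them.
public interface ITransformResult { }

public record DropTransformResult : ITransformResult;                           // info logs are dropped
public record WarningTransformResult : ITransformResult;                        // only the count matters
public record ErrorTransformResult(Guid Id, string Message) : ITransformResult; // the message is kept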
We also need the mapping function from the DB DTO to the business-logic object:
public static RawLog FromDto(RawLogDto dto)
{
return new RawLog(dto.Id, dto.Message, (Severity) dto.Severity);
}
Now that we have the transformed logs, we want to convert them into their final, DB-writable form: for errors we keep each individual message, for warnings we only care about the count.
private static IEnumerable<LogWriteDbDto> ToDbDto(IEnumerable<ITransformResult> data)
{
foreach (var d in data)
{
switch (d)
{
case ErrorTransformResult e:
yield return new LogWriteDbDto(Severity.Error, e.Message);
break;
case WarningTransformResult:
yield return new LogWriteDbDto(Severity.Warning, null);
break;
case DropTransformResult:
break;
default:
throw new InvalidOperationException("Unknown worker result");
}
}
}
And we also need the LogWriteDbDto:
public class LogWriteDbDto(Severity severity, string? message)
{
    public Guid Id { get; init; } = Guid.NewGuid();
    public Severity Severity { get; init; } = severity;
    public string? Message { get; init; } = message;
}
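If you prefer, this could equally be a positional record, e.g. public record LogWriteDbDto(Severity Severity, string? Message) with Id kept as an init-only Guid.NewGuid() default; the primary-constructor class above behaves the same way for our purposes.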
Finally, with the logs transformed and processed, we need a DatabaseWriter:
public async Task WriteTransformedLogs(List<LogWriteDbDto> dto)
{
var sql = "INSERT INTO transformed_logs (id, severity, message) VALUES (@id, @severity, @message)";
await _connection.ExecuteAsync(sql, dto);
}
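One Dapper detail worth knowing here: when a list is passed as the parameter object, ExecuteAsync runs the INSERT once per element. That's perfectly fine for a small cronjob, but it is not a bulk insert; for large batches PostgreSQL's COPY (exposed by Npgsql as BeginBinaryImport) would be the faster route.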
Along with the writer comes the target SQL table:
create table transformed_logs
(
id uuid primary key,
message text,
severity int not null
);
And now that we have all the pieces, we can stitch them together:
public async Task DoAsync()
{
var data = await _databaseReader.ReadRawData();
var transformed = Transform(data);
var dbDtos = ToDbDto(transformed);
await _databaseWriter.WriteTransformedLogs(dbDtos.ToList());
}
And then in Program.cs we can have all the setup, configuration, HostBuilder, etc., ultimately leading to one entry point: Worker.DoAsync(), which is responsible for all the business logic. In my case, the whole Program.cs looks like this:
// Connection string of the local database, hosted inside Docker
const string connectionString = "Host=localhost:5432;Username=postgres;Password=admin;Database=postgres";
await using var databaseReader = new DatabaseReader(connectionString);
await using var databaseWriter = new DatabaseWriter(connectionString);
var worker = new Worker(databaseReader, databaseWriter);
await worker.DoAsync();
Console.WriteLine("Goodbye");
Testing
Now that we have the application working, we can get into the testing part. Although testing the whole application end to end would be nice, it would be considerably more complicated than testing just the Worker.DoAsync() method against an actual PostgreSQL database.
First, we need to be able to spin up the PostgreSQL container. For this I'm using the Testcontainers and Testcontainers.PostgreSql NuGet packages:
public PostgreSqlContainer Get()
{
return new PostgreSqlBuilder()
.WithImage("postgres:16")
.WithCleanUp(true)
.Build();
}
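A small note on WithCleanUp(true): this is already the default in Testcontainers for .NET. It registers the container with the Resource Reaper (Ryuk) sidecar, so the container is removed even if the test process dies before it gets disposed.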
With the container factory in place, we can already use it in our integration test:
[Fact]
public async Task LogsTransformationTest()
{
// Arrange
var container = new TestContainerFactory().Get();
await container.StartAsync();
}
As a prerequisite to these tests, we need a working DB with a schema identical to the production one. We can achieve this by storing the whole schema in a schema.sql file. For ease of use I've put it into the test project, but it could also be added to a solution folder so that it's accessible from multiple projects. In our case, schema.sql is pretty short, as it just creates the two tables:
create table raw_logs
( .. );
create table transformed_logs
( .. );
We also need some test data in the raw_logs table. I have a second file for that, called data.sql:
insert into raw_logs (id, message, severity, source, timestamp, additional_field1, additional_field2) values ('c154b819-3e7b-4fa8-bf8b-5de0b7e35d3f', 'Info log 1', 1, 'Source 1', '2024-03-08 12:00:00', 'Value 1', 'Value 2');
insert into raw_logs (id, message, severity, source, timestamp, additional_field1, additional_field2) values ('f6a4dc05-2b70-44b6-90cb-238bb6f19a93', 'Warning log 1', 2, 'Source 2', '2024-03-08 12:10:00', 'Value 3', 'Value 4');
insert into raw_logs (id, message, severity, source, timestamp, additional_field1, additional_field2) values ('ecb27850-1a6c-4b2d-aa30-63f94ff541f8', 'Error log 1', 3, 'Source 3', '2024-03-08 12:20:00', 'Value 5', 'Value 6');
insert into raw_logs (id, message, severity, source, timestamp, additional_field1, additional_field2) values ('bae4b6b2-8e46-4224-967a-40528ed06a68', 'Info log 2', 1, 'Source 4', '2024-03-08 12:30:00', 'Value 7', 'Value 8');
insert into raw_logs (id, message, severity, source, timestamp, additional_field1, additional_field2) values ('a8cfdb45-d6c1-4814-a65c-58d942f52a69', 'Warning log 2', 2, 'Source 5', '2024-03-08 12:40:00', 'Value 9', 'Value 10');
insert into raw_logs (id, message, severity, source, timestamp, additional_field1, additional_field2) values ('c3bf6d44-515d-4945-b3b7-810620fb3e82', 'Error log 2', 3, 'Source 6', '2024-03-08 12:50:00', 'Value 11', 'Value 12');
insert into raw_logs (id, message, severity, source, timestamp, additional_field1, additional_field2) values ('cfd5dab5-1342-47b8-8c96-04ec5e5795ec', 'Info log 3', 1, 'Source 7', '2024-03-08 13:00:00', 'Value 13', 'Value 14');
insert into raw_logs (id, message, severity, source, timestamp, additional_field1, additional_field2) values ('c68b5e19-1097-4ec3-967e-67190d22c3f5', 'Warning log 3', 2, 'Source 8', '2024-03-08 13:10:00', 'Value 15', 'Value 16');
insert into raw_logs (id, message, severity, source, timestamp, additional_field1, additional_field2) values ('eddb52b0-70d7-45b3-9fb4-c40ddbf5ec9b', 'Error log 3', 3, 'Source 9', '2024-03-08 13:20:00', 'Value 17', 'Value 18');
insert into raw_logs (id, message, severity, source, timestamp, additional_field1, additional_field2) values ('1b9e8608-fa0b-4b4c-98cf-96b84c4bf8a6', 'Info log 4', 1, 'Source 10', '2024-03-08 13:30:00', 'Value 19', 'Value 20');
insert into raw_logs (id, message, severity, source, timestamp, additional_field1, additional_field2) values ('b01d1e47-dcb4-472f-b6f4-7b20b14e2b70', 'Warning log 4', 2, 'Source 11', '2024-03-08 13:40:00', 'Value 21', 'Value 22');
insert into raw_logs (id, message, severity, source, timestamp, additional_field1, additional_field2) values ('e4c5245b-8631-4f1e-93ee-42b2e87eab07', 'Error log 4', 3, 'Source 12', '2024-03-08 13:50:00', 'Value 23', 'Value 24');
insert into raw_logs (id, message, severity, source, timestamp, additional_field1, additional_field2) values ('3dfe7d9d-dcaa-438f-9a2d-8e10962886b0', 'Info log 5', 1, 'Source 13', '2024-03-08 14:00:00', 'Value 25', 'Value 26');
insert into raw_logs (id, message, severity, source, timestamp, additional_field1, additional_field2) values ('293fb847-9cfb-4a4c-b22d-6f2a1b319b16', 'Warning log 5', 2, 'Source 14', '2024-03-08 14:10:00', 'Value 27', 'Value 28');
insert into raw_logs (id, message, severity, source, timestamp, additional_field1, additional_field2) values ('f2430cf6-1c6e-4343-9b73-b3b594b62814', 'Error log 5', 3, 'Source 15', '2024-03-08 14:20:00', 'Value 29', 'Value 30');
Now we need to read the .sql files and apply them to the container:
public async Task ExecuteScriptFromFile(string connectionString, string path)
{
var fileText = await FileReader.ReadFile(path);
await using var connection = new NpgsqlConnection(connectionString);
await connection.OpenAsync();
await connection.ExecuteAsync(fileText);
}
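FileReader is a tiny helper the article doesn't show; presumably it is little more than a wrapper over File.ReadAllTextAsync, along these lines (assumed):
// Assumed helper; the original project may implement this differently.
public static class FileReader
{
    public static Task<string> ReadFile(string path) => File.ReadAllTextAsync(path);
}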
[Fact]
public async Task LogsTransformationTest()
{
...
var testDbWriter = new TestDbWriter();
string schemaPath = @"./DatabaseScripts/schema.sql";
string dataPath = @"./DatabaseScripts/data.sql";
await testDbWriter.ExecuteScriptFromFile(container.GetConnectionString(), schemaPath);
await testDbWriter.ExecuteScriptFromFile(container.GetConnectionString(), dataPath);
}
Now we can add a worker factory and call it from the test:
public static class WorkerFactory
{
public static Worker Get(string connectionString)
{
var databaseReader = new DatabaseReader(connectionString);
var databaseWriter = new DatabaseWriter(connectionString);
return new Worker(databaseReader, databaseWriter);
}
}
[Fact]
public async Task LogsTransformationTest()
{
...
var worker = WorkerFactory.Get(container.GetConnectionString());
// Act
await worker.DoAsync();
}
Now all the work is done, and the results should be waiting for us in the DB. So we need a test DB reader for the assertions:
public class TestDbReader
{
public async Task<List<TransformedLogs>> ReadAllLogs(string connectionString)
{
await using var connection = new NpgsqlConnection(connectionString);
await connection.OpenAsync();
var logs = await connection.QueryAsync<TransformedLogs>("SELECT * FROM transformed_logs");
return logs.ToList();
}
}
public class TransformedLogs
{
public Guid Id { get; init; }
public string? Message { get; init; }
public Severity Severity { get; init; }
}
Going back to the test, we can use the newly created TestDbReader to assert that the worker did its work as expected:
[Fact]
public async Task LogsTransformationTest()
{
...
// Assert
var logs = await new TestDbReader().ReadAllLogs(container.GetConnectionString());
logs.Should().HaveCount(10);
logs.Where(x => x.Severity is Severity.Warning).Should().HaveCount(5);
logs.Where(x => x.Severity is Severity.Error).Should().HaveCount(5);
logs.Where(x => x.Severity is Severity.Error).Select(x => x.Message).All(x => x is null).Should().BeFalse();
}
Here I'm using the FluentAssertions library to improve the asserts. Instead of
Assert.Equal(5, logs.Count(x => x.Severity is Severity.Warning));
I can write
logs.Where(x => x.Severity is Severity.Warning).Should().HaveCount(5);
And now, as a final touch, we should clean up the test container:
[Fact]
public async Task LogsTransformationTest()
{
...
// Cleanup
await container.DisposeAsync();
}
So the whole test looks like this:
[Fact]
public async Task LogsTransformationTest()
{
// Arrange
var container = new TestContainerFactory().Get();
await container.StartAsync();
var worker = WorkerFactory.Get(container.GetConnectionString());
var testDbWriter = new TestDbWriter();
string schemaPath = @"./DatabaseScripts/schema.sql";
string dataPath = @"./DatabaseScripts/data.sql";
await testDbWriter.ExecuteScriptFromFile(container.GetConnectionString(), schemaPath);
await testDbWriter.ExecuteScriptFromFile(container.GetConnectionString(), dataPath);
// Act
await worker.DoAsync();
// Assert
var logs = await new TestDbReader().ReadAllLogs(container.GetConnectionString());
logs.Should().HaveCount(10);
logs.Where(x => x.Severity is Severity.Warning).Should().HaveCount(5);
logs.Where(x => x.Severity is Severity.Error).Should().HaveCount(5);
logs.Where(x => x.Severity is Severity.Error).Select(x => x.Message).All(x => x is null).Should().BeFalse();
// Cleanup
await container.DisposeAsync();
}
Conclusion
I think this way of testing strikes a nice balance between test complexity and the level of confidence it provides. Running this test takes only a couple of seconds, and the best thing about it is that it exercises the whole depth of the business part of the application: not only the business rules, but also the DB queries, SQL writes, and so on. You can also check whether your DB code copes with multiple instances: simply create 10 workers and run them in parallel to verify that nothing breaks when 10 or even 100 instances try to run at the same time. Being able to test that behaviour locally, every time you run your test suite, is invaluable. With integration tests done this way, you don't need extra infrastructure to run DB tests in your pipelines, which makes the tests easier to scale. And because a fresh database is created for each run, you avoid parallel test runs corrupting each other's state on a shared DB; the tests stay consistent and isolated. This technique has served me well, and I hope you will also find it useful.