Structuring tests using Given-When-Then
Introduction
All software requires automated testing for quality assurance. Tests are written at various levels, such as unit, integration and end-to-end tests. The readability and maintainability of tests are as important as those of production code. For example, suppose you modify code and a test breaks. Did you introduce a bug, or did the design change so that the test needs to be updated? To determine the correct course of action, you need to understand the structure and intent of the test.
Here we discuss tactics for writing tests using a given-when-then structure, and suggest refactoring patterns to make tests conform to the structure. Given-when-then is often associated with behavioral tests, but it can be used at any testing level.
Given-When-Then-Finally
A test case is a linear execution of code sections labeled given, when, then and finally. Only when and then are mandatory. A test should not contain any code outside these sections, and the sections should not be interleaved or reordered.
The purpose of each section is:
- Given: establish a known state in the system
- When: execute an action on the system
- Then: verify (assert) that the system responded according to the specification
- Finally: revert the system to the state it had before the given section. This section is executed even if when or then fails.
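As a framework-free illustration, the four sections can be sketched with a plain try/finally block. The InMemoryStore class and the test function below are hypothetical stand-ins invented for this sketch; a real test framework would normally provide the assertion and cleanup facilities.

```typescript
// Hypothetical in-memory store standing in for the system under test.
class InMemoryStore {
  private items = new Map<string, string>();
  put(key: string, value: string): void { this.items.set(key, value); }
  get(key: string): string | undefined { return this.items.get(key); }
  remove(key: string): void { this.items.delete(key); }
  size(): number { return this.items.size; }
}

const store = new InMemoryStore();

function testGetReturnsStoredValue(): void {
  // given: the store contains one known entry
  store.put("greeting", "hello");
  try {
    // when: the entry is read back
    const value = store.get("greeting");
    // then: the stored value is returned
    if (value !== "hello") {
      throw new Error(`expected "hello", got ${String(value)}`);
    }
  } finally {
    // finally: restore the pre-given state, even if when or then threw
    store.remove("greeting");
  }
}

testGetReturnsStoredValue();
```

Because the cleanup sits in the finally block, the store is empty again after the test regardless of whether the assertion held.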
Hoare triples
In parallel to the code-centric formulation, we can think of the test structure in terms of logic. Hoare triples (Hoare, 1969) capture the intent of the given-when-then sections. A Hoare triple has the form <precondition, call, postcondition>, where the pre- and postcondition are logical propositions and call is the when code block. A triple is valid (the test passes) if, whenever the precondition is true before executing the call, the postcondition is true after executing the call.
The code and logic aspects are related: the given section makes the precondition true, and the then section asserts that the postcondition is true.
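The relationship can be sketched in code. The Triple interface and the runTriple helper below are hypothetical names introduced only for this illustration; they are not part of any test framework, and a plain string array stands in for the system under test.

```typescript
// A Hoare triple <precondition, call, postcondition> over some state type S.
interface Triple<S> {
  precondition: (state: S) => boolean;
  call: (state: S) => void;
  postcondition: (state: S) => boolean;
}

// Hypothetical helper: runs one triple as a test against the state
// produced by the given step.
function runTriple<S>(given: () => S, triple: Triple<S>): boolean {
  // given: build a state in which the precondition is true
  const state = given();
  if (!triple.precondition(state)) {
    throw new Error("given did not establish the precondition");
  }
  // when: execute the call
  triple.call(state);
  // then: check that the postcondition is true
  return triple.postcondition(state);
}

const pushOnEmpty: Triple<string[]> = {
  precondition: (arr) => arr.length === 0,
  call: (arr) => { arr.push("item"); },
  postcondition: (arr) => arr.length === 1,
};

const passed = runTriple<string[]>(() => [], pushOnEmpty);
```

Here the given step only constructs state, the call is the single action, and the postcondition is checked once at the end, mirroring the given, when and then sections.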
Mapping to test frameworks
Some test frameworks, such as Cucumber, are built around the given-when-then structure, but others use a different design. However, the structure can be adapted to any test framework, as the important aspects are:
- All test code belongs to one of the four code sections
- The sections form a linear sequence and are not mixed
Minimal: mark sections
Minimally, we can just mark the sections using comments when no explicit syntax is available. In the following example, we use Jest to test a custom array implementation. In such a simple test, it may seem pedantic to mark the sections, but it ensures that all tests are structurally sound.
test('an array with one element should have length 1', () => {
  // given
  const array = new CustomArray();
  // when
  array.push("item");
  // then
  expect(array.length).toBe(1);
});
Detailed: describe sections
For complex tests, the purpose of the test is clearer when the sections are described using natural language. When describing the given and then sections, it is useful to think in terms of the pre- and postconditions (logical state) instead of the actual code.
The description for the above test could be:
- Given an empty array
- When an item is pushed
- Then the length of the array is one
This could be embedded as code comments or function documentation, or one can use a more structured method, such as the describe function from Jest:
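describe('GIVEN an empty array', () => {
  let array: CustomArray;
  // Setup goes into hooks so that every test starts from a fresh array;
  // code placed directly in a describe body runs once, at collection time.
  beforeEach(() => {
    array = new CustomArray();
  });
  describe('WHEN an item is pushed', () => {
    beforeEach(() => {
      array.push("item");
    });
    test('THEN the length of the array is one', () => {
      expect(array.length).toBe(1);
    });
  });
});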
Next, we look at two examples.
Example: unit testing a pure function
Unit tests can use the given section to create arguments for calling the unit under test. Example of testing a timestamp parser:
test('should parse an ISO format timestamp with UTC timezone', () => {
  // given
  const timestamp = "2021-10-18T10:28:15Z";
  // when
  const parsed = myTimestampParser.parse(timestamp);
  // then
  expect(parsed).toBe(...);
});
If the arguments are trivial, they can be embedded into the when section and the given section omitted.
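A minimal sketch of that shortened form follows. Since the API of myTimestampParser is not shown here, the built-in Date.parse is used as a stand-in, and the function and variable names are made up for the illustration.

```typescript
function testParsesUtcTimestamp(): boolean {
  // when (the argument is a trivial literal, so there is no given section)
  const parsed = Date.parse("2021-10-18T10:28:15Z");
  // then: the result equals the same instant built from its UTC components
  // (months are zero-based in Date.UTC, so 9 means October)
  return parsed === Date.UTC(2021, 9, 18, 10, 28, 15);
}

const utcTimestampTestPassed = testParsesUtcTimestamp();
```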
Example: functional API testing
In functional testing, state setup and teardown are often done using helper facilities from the test framework. This means that the given-when-then-finally sections are split across the test file. Comments can identify which code belongs to which section.
In the following example, we insert an object using an API with a fixed object ID, and verify that the object can be found. Finally, we remove the object to clear the state for the next test.
const testObjectId = "12345678-1234";

beforeEach(() => {
  // given
  return api.insertObject({id: testObjectId});
});

afterEach(() => {
  // finally
  return api.deleteObject(testObjectId);
});

test('should find an object that has been inserted', async () => {
  // when
  const result = await api.queryObject(testObjectId);
  // then
  expect(result.id).toBe(testObjectId);
});
Next, we suggest how to refactor tests to conform to the given-when-then structure.
Refactoring: parameterized tests
You often want to test a function with multiple inputs. The following is an antipattern for this. It breaks the given-when-then structure, because the when and then sections are mixed. It is also unclear whether we have one test or three.
test('should compute the square of numbers', () => {
  expect(myMathModule.square(1)).toBe(1); // when, then
  expect(myMathModule.square(2)).toBe(4); // when, then
  expect(myMathModule.square(3)).toBe(9); // when, then
});
Such patterns can be refactored using parameterized tests. They are supported by many test frameworks, including Jest (test.each), JUnit (@ParameterizedTest) and pytest (@pytest.mark.parametrize). Example using Jest:
test.each([
  [1, 1],
  [2, 4],
  [3, 9],
])('should compute square of %d', (input: number, expected: number) => {
  // when
  const squared = myMathModule.square(input);
  // then
  expect(squared).toBe(expected);
});
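To see why the parameterized form keeps the structure intact, here is a framework-free sketch of the same idea: every row of the table becomes its own test case with exactly one when and one then. The square function and the result shape are hypothetical stand-ins for this illustration and do not show how Jest implements test.each.

```typescript
type SquareCase = { input: number; expected: number };

// Hypothetical function under test.
function square(x: number): number {
  return x * x;
}

const squareCases: SquareCase[] = [
  { input: 1, expected: 1 },
  { input: 2, expected: 4 },
  { input: 3, expected: 9 },
];

// Each row is run and reported separately, so a failure in one row
// does not hide the outcome of the others.
const squareResults = squareCases.map(({ input, expected }) => {
  // when
  const squared = square(input);
  // then
  return {
    name: `should compute square of ${input}`,
    passed: squared === expected,
  };
});
```

The table makes it explicit that there are three tests, one per row.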
Refactoring: decomposing test sequences
In functional testing, it is common to test sequences of operations. It is tempting to combine the whole sequence into one test, but such large tests are hard to understand:
test('length should tell how many elements the array has', () => {
  // GIVEN an empty array
  const array = new CustomArray();
  // WHEN pushing an element to the array
  array.push("item1");
  // THEN the array has one element
  expect(array.length).toBe(1);
  // WHEN pushing a second element to the array
  array.push("item2");
  // THEN the array has two elements
  expect(array.length).toBe(2);
});
This sequence can be decomposed into multiple tests. This is done by splitting the sequence after the first then, and using the postcondition of the first test (array has one element) as the precondition (given) of the next.
test('an array with one element should have length 1', () => {
  // GIVEN an empty array
  const array = new CustomArray();
  // WHEN pushing an element to the array
  array.push("item1");
  // THEN the array has one element
  expect(array.length).toBe(1);
});

test('an array with two elements should have length 2', () => {
  // GIVEN an array with one element
  const array = new CustomArray();
  array.push("item1");
  // WHEN pushing a second element to the array
  array.push("item2");
  // THEN the array has two elements
  expect(array.length).toBe(2);
});
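When several decomposed tests share the same precondition, the given code can be moved into a named helper so that each test states its starting point in one line. The sketch below is one possible way to do this; the CustomArray class is a minimal hypothetical stand-in for the implementation under test, and the helper name is invented for the illustration.

```typescript
// Minimal hypothetical stand-in for the custom array under test.
class CustomArray {
  private items: string[] = [];
  push(item: string): void { this.items.push(item); }
  get length(): number { return this.items.length; }
}

// Given helper: builds the state that was the postcondition of the
// first test (an array with one element).
function givenArrayWithOneElement(): CustomArray {
  const array = new CustomArray();
  array.push("item1");
  return array;
}

function testSecondPush(): boolean {
  // given
  const array = givenArrayWithOneElement();
  // when
  array.push("item2");
  // then
  return array.length === 2;
}

const secondPushPassed = testSecondPush();
```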
Conclusions
Systematically organizing tests into given-when-then-finally sections ensures that the tests are structurally sound. Sections can be marked using lightweight comments, or with direct support from the test framework.
References
Hoare, C. A. R. (1969). An axiomatic basis for computer programming. Communications of the ACM, 12(10).
This article is written by Senior Software Architect Kristian Ovaska.