Flaky tests are the bane of any software engineer's existence. These tests pass one time and fail the next, seemingly at random, making it incredibly difficult to have confidence in your test suite. Fixing flaky tests can consume huge amounts of engineering time that could be better spent building features or improving product quality.
In this comprehensive guide, we'll cover everything you need to know to avoid writing flaky tests in the first place and to diagnose and fix existing flaky tests.
What Are Flaky Tests?
A flaky test is one that exhibits inconsistent or unexpected behavior: it may pass the first time you run it, fail the next, and pass again on the third run. The result is that you can't be confident the test is actually verifying what it's supposed to verify.
Some examples of flaky test behavior:
- A test that makes an assertion about something non-deterministic, like time or randomness
- A test that depends on external services or resources that change state
- A test that asserts on things in a non-deterministic order
- A test that encounters race conditions due to incorrect threading or concurrency
While a single flaky test may seem like a minor inconvenience, at scale across a large test suite, flaky tests can completely erode confidence in the quality of your software. Engineers stop trusting the test results and disabling or ignoring failing tests becomes commonplace.
Causes of Flaky Tests
Flaky tests occur for a variety of reasons, but some root causes are more common than others:
Asynchrony
JavaScript and other languages make heavy use of callbacks, promises, and asynchronous logic. It's easy to introduce flakiness if you make assertions before an asynchronous task has completed.
// flaky - makes an assertion before the async task finishes
it('returns correct value', () => {
  const value = fetchData(); // fetchData returns a pending promise, not the data
  expect(value).toEqual(5);
});

// better - assert after the promise resolves
it('returns correct value', async () => {
  const value = await fetchData();
  expect(value).toEqual(5);
});
Concurrency
Multi-threaded and concurrent code opens the door for all kinds of race conditions and non-deterministic behavior that make tests flaky. Common causes include threads interacting in unexpected ways and shared memory being mutated at unexpected times.
Resource Leakage
Tests that mutate global state or accidentally share state between runs can easily become flaky. For example, a test that inserts test data into a shared database but fails to clean up after itself can cause cascading failures.
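One way to guard against this kind of leakage is to clean up in a finally block so the test leaves no state behind even when its assertions fail. The sketch below simulates the shared database with a plain in-memory object; the names (db, insertUser, testCreatesUser) are illustrative, not from any real framework.

```javascript
// A minimal sketch of per-test cleanup. The in-memory `db` stands in
// for a shared database a real test would talk to.
const db = { users: [] };

function insertUser(name) {
  db.users.push({ name });
}

function deleteUser(name) {
  db.users = db.users.filter((u) => u.name !== name);
}

function testCreatesUser() {
  insertUser('test-user');
  try {
    // ... assertions against the inserted row would go here ...
    if (db.users.length !== 1) throw new Error('expected exactly one user');
  } finally {
    // Runs even if an assertion throws, so later tests (and later
    // runs of this test) start from a clean state.
    deleteUser('test-user');
  }
}
```

Most test frameworks offer afterEach hooks that serve the same purpose; the key point is that cleanup must run on failure as well as success.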
Brittle Assertions
Making assertions that are overly specific or sensitive to non-deterministic factors makes tests extremely fragile. For example, asserting on the exact contents of a large object graph or making assertions that depend on time.
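A common fix is to assert only on the fields the test actually cares about, rather than on the entire object. In this sketch, buildOrder is a hypothetical function whose result includes a timestamp that varies between runs:

```javascript
const assert = require('assert');

// Hypothetical unit under test; createdAt differs on every run,
// which makes exact whole-object equality checks flaky.
function buildOrder() {
  return { id: 42, status: 'shipped', createdAt: Date.now() };
}

const order = buildOrder();

// brittle - fails whenever the timestamp differs between runs:
// assert.deepStrictEqual(order, { id: 42, status: 'shipped', createdAt: 1700000000000 });

// better - assert only on the stable fields the test cares about
assert.strictEqual(order.id, 42);
assert.strictEqual(order.status, 'shipped');
```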
External Services
Networked services like APIs and databases are out of your control and can change state between test runs. Network blips or service restarts can cause tests to fail unexpectedly.
Complex Test Setup/Teardown
Some test frameworks encourage complex fixture setup and teardown between tests. If this shared environment is not properly reset, it can enable state leakage across tests.
Resource Contention
Slow tests running in parallel may contend for shared resources like CPU, memory, network ports, or database connections. This can cause surprising failures.
Test Timeout
An overly tight timeout on a test can cause flaky failures if the task takes slightly longer than expected to complete from one run to the next.
Best Practices for Avoiding Flaky Tests
Luckily, there are some best practices you can follow to avoid introducing flaky tests in your test suite:
Isolate Tests Completely
Aim for complete isolation between tests. Each test should start with a clean environment and avoid mutating any shared state. Reset databases, mocks, processes, and other dependencies between tests.
Make Tests Idempotent
Structure tests so they can be run multiple times with the same result. Avoid one-time setup in tests and instead reuse or reset state as needed.
Control Asynchrony Explicitly
Use language mechanisms like async/await to control asynchrony within tests. Sequence assertions so they evaluate promises and callbacks in a deterministic order.
Leverage Waiting and Polling
For asynchronous processes, wait explicitly for certain conditions using polling or wait timeouts. This avoids race conditions due to premature assertions.
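Many frameworks ship such a helper; as a rough sketch of what one looks like under the hood (the name waitFor and its options are our own, not any library's API), it polls a condition until it holds or a deadline passes:

```javascript
// Polls `condition` every `intervalMs` until it returns true,
// rejecting if `timeoutMs` elapses first.
function waitFor(condition, { timeoutMs = 2000, intervalMs = 25 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const poll = () => {
      if (condition()) return resolve();
      if (Date.now() > deadline) {
        return reject(new Error('waitFor: condition not met within timeout'));
      }
      setTimeout(poll, intervalMs);
    };
    poll();
  });
}

// Usage: wait for the async side effect instead of asserting immediately.
let ready = false;
setTimeout(() => { ready = true; }, 50);
waitFor(() => ready).then(() => {
  // safe to make assertions here - the condition is known to hold
});
```

Polling on a condition is more robust than a fixed sleep: it succeeds as soon as the condition holds instead of guessing how long the async work will take.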
Prune Flaky Tests Aggressively
Don't ignore flaky tests - delete them! Tests that are consistently flaky after multiple attempts to fix should be removed from the suite.
Analyze Test Failures
Dig into the root causes of test failures; don't just rerun them until they pass. Look for patterns across test runs to pinpoint sources of non-determinism.
Start with Small Tests
Unit test individual components before creating complex, end-to-end tests. Large tests are far more difficult to debug and isolate.
Avoid Unnecessary Mocks
While mocks are useful, overusing them can make tests brittle and unreliable. Test with real implementations when feasible.
Debugging Flaky Tests
Once you have flaky tests in your test suite, debugging them can be challenging. Here are some tips for tracking down the root cause:
- Reproduce Locally: Flaky tests that only fail on CI can be extra difficult to debug. Try to reproduce them locally first.
- Review Recent Changes: Think about any recent code changes that could have impacted the flaky test. Reverting those changes to confirm is a good first step.
- Add Debug Logs: Log liberally within tests to trace the order of execution. This can uncover async issues and race conditions.
- Inspect Test Artifacts: Examine the state of databases, files, caches, etc after tests run to look for unreset shared state.
- Run Tests in Isolation: Temporarily disable all other test suites and run only the flaky test repeatedly. This can help surface ordering issues.
- Slow Down Execution: Add deliberate pauses around thread sleeps, message queues, and other timing-sensitive operations. Changing the timing can expose race conditions.
- Depend on Time Less: Avoid wall-clock timing whenever possible. Replace direct time reads with injected, easier-to-control values.
- Review Differences: Compare stack traces and other artifacts between passing and failing runs to spot differences.
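The "depend on time less" tip above can be sketched as injecting a clock function instead of reading the wall clock directly. Everything here (makeExpiryChecker and the fake clock) is a hypothetical example, not a library API:

```javascript
// Sketch: accept a clock as a dependency instead of calling Date.now()
// inline. Production code uses the real clock by default.
function makeExpiryChecker(clock = () => Date.now()) {
  return {
    isExpired(expiresAt) {
      return clock() >= expiresAt;
    },
  };
}

// In tests, a fixed fake clock makes the behavior fully deterministic.
const fakeClock = () => 1000;
const checker = makeExpiryChecker(fakeClock);

checker.isExpired(999);  // true - regardless of how fast the test runs
checker.isExpired(1001); // false
```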
Fixing Flaky Tests
Once you've diagnosed the cause of your flaky tests, here are some tips for fixing them:
- Quarantine Tests: As a short-term stopgap, move or disable flaky tests so they don't block others from running.
- Increase Timeouts: If tests are failing due to timing issues, increasing timeouts and polling intervals may help.
- Add Explicit Waits: Use framework-provided waits, such as Cypress's cy.wait(), to wait for asynchronous events before making assertions.
- Eliminate Global State: Pass dependencies explicitly into tests rather than relying on shared global state.
- Separate Threads: Use thread isolation, like running database logic in a separate process, to avoid shared state.
- Reseed Randomness: Reseed random number generators in test setup to avoid randomness leading to flakiness.
- Recreate Databases: Fully recreate database schema and test data per test to avoid accumulated state.
- Retry Failed Tests: Add retry logic to automatically rerun failed tests a few times before reporting failure.
- Refactor Tests: When all else fails, refactoring tests to be smaller and more targeted can surface new ways to eliminate flakiness.
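The retry tip above can be sketched as a small wrapper; withRetry here is our own illustrative helper, though many test runners offer an equivalent built-in:

```javascript
// Reruns `fn` up to `attempts` times, rethrowing the last error
// if every attempt fails.
async function withRetry(fn, attempts = 3) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

Note that retries mask flakiness rather than fix it; it's worth tracking which tests needed retries so their root causes still get investigated.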
Key Takeaways
- Flaky tests undermine confidence in your test suite and waste countless engineering hours.
- Strike a balance between small focused unit tests and larger integration tests.
- Design tests to be idempotent and isolate test runs from the start.
- Wait for asynchronous logic explicitly within test cases.
- Analyze test failures thoroughly to pinpoint sources of non-determinism.
- Quarantine, increase timeouts on, or rewrite consistently flaky tests.
By applying these flaky test prevention and debugging techniques, you can eliminate test flakiness before it becomes unmanageable. Your test suite will be more robust, run faster, and provide you with greater confidence for refactoring, release management, and other critical engineering workflows.