Go Testing: Contexts and t.Parallel()
We develop Coder on Coder, which lets our engineers do their work on big, shared infrastructure (32 cores and 256 GB of RAM in our case). With that much hardware, extensive use of t.Parallel() in our Go backend shortens test times considerably.
We also make extensive use of context.Context in our production code, so it shows up in our tests too. And even if the code under test doesn’t accept a context, a context is a very convenient tool for testing concurrent code.
But there is a dangerous little sharp edge when using both t.Parallel() and context.WithTimeout(). The TL;DR:
Always call t.Parallel() before context.WithTimeout()
To understand why this is, and what kinds of gnarly test bugs you’ll find by not taking my advice, we need to dig a little bit into what t.Parallel() does under the hood. But first, it’s worth some background on why we make extensive use of context.WithTimeout() in our tests at Coder.
Test Context Timeouts
Building programs that correctly handle concurrency is hard, even with Go. With concurrency comes the possibility of race conditions and deadlocks. Here we’re focused on the latter. Consider a unit test of a hypothetical component that takes inputs and produces outputs via channels:
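A minimal sketch of such a test might look like the following. The doubler component and its behavior are invented purely for illustration; any component that communicates over channels would do.

```go
package main

import "testing"

// doubler is a hypothetical component, invented for this sketch: it reads
// ints from in, doubles them, and writes the results to out. It closes out
// once in is closed.
func doubler(in <-chan int, out chan<- int) {
	defer close(out)
	for v := range in {
		out <- v * 2
	}
}

func TestDoubler(t *testing.T) {
	in := make(chan int)
	out := make(chan int)
	go doubler(in, out)

	// Both channel operations block forever if doubler deadlocks.
	in <- 2
	got := <-out
	if got != 4 {
		t.Fatalf("expected 4, got %d", got)
	}
	close(in)
}
```

Note that nothing in this test bounds how long the send or the receive may take.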
When the component is working fine, the test passes. But if the component is broken and deadlocks, the test also deadlocks. The default timeout on go test is 10 minutes. During the initial development and any maintenance of the component, we’d expect a developer to run the tests iteratively: code, test, code, test, and so on. If you have to wait 10 minutes to discover a broken test, it kills productivity!
You can, of course, change the go test timeout with the -timeout flag, but that’s a relatively crude instrument: it sets one timeout for the entire test run, which often grows over time, and different tests might be expected to take different times. I expect most unit tests to execute in less than 1ms, but a complex integration test could take several seconds.
Let’s set a timeout on our test case with a context.
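Here is one way that could look, again using the invented doubler component as a stand-in. Every channel operation is wrapped in a select that also watches the context, so a stuck component turns into a test failure instead of a hang. The 100ms value is only an example.

```go
package main

import (
	"context"
	"testing"
	"time"
)

// doubler is a hypothetical component, invented for this sketch: it doubles
// each value read from in and writes the result to out.
func doubler(in <-chan int, out chan<- int) {
	defer close(out)
	for v := range in {
		out <- v * 2
	}
}

func TestDoublerWithTimeout(t *testing.T) {
	ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
	defer cancel()

	in := make(chan int)
	out := make(chan int)
	go doubler(in, out)
	defer close(in)

	// Give up on the send if the context expires first.
	select {
	case in <- 2:
	case <-ctx.Done():
		t.Fatal("timed out sending input")
	}

	// Likewise for the receive.
	select {
	case got := <-out:
		if got != 4 {
			t.Fatalf("expected 4, got %d", got)
		}
	case <-ctx.Done():
		t.Fatal("timed out waiting for output")
	}
}
```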
Now, if the component deadlocks, the test fails after 100ms. You’ll have to choose this number based on what the unit test is doing: pick a reasonable multiple of the usual runtime to account for different systems and for testing under load, but keep it short enough to avoid dragging out the code/test cycle.
You can make this pattern a lot more readable by adding some helper functions.
Parallel Tests with Timed Contexts
The godoc for t.Parallel() is very simple, perhaps deceptively so:
Parallel signals that this test is to be run in parallel with (and only with) other parallel tests. When a test is run multiple times due to use of -test.count or -test.cpu, multiple instances of a single test never run in parallel with each other.
The mental model of parallel testing you get from this description is that each test is one indivisible block: serial tests run one after another, and parallel tests run side by side.
You might be tempted to think of t.Parallel() as basically a little flag (or like a decorator in other programming languages). Its presence signals something, but you can just kinda throw it anywhere in your test setup code.
If you do this, then your tests might mysteriously start to fail with timeouts, even if you haven’t changed the tests or the component you are testing!
In fact, what happens is that t.Parallel() sort of chops your test code into two halves. Any statements before t.Parallel() run serially along with other serial test cases, and then any statements after t.Parallel() run in parallel. Here is a great blog post that covers this in more detail, including how t.Run() is handled.
To keep the diagram readable, I’m going to abbreviate the serial portion of Parallel Test1 (the part before t.Parallel() is called) sPT1, and so on.
Consequently, if you create a timeout context in the serial portion of a parallel test, that is, before the call to t.Parallel(), then it may expire before or during the parallel portion.
Another way to think about this is to just look at the green boxes in the diagram.
A call to t.Parallel() is like a time.Sleep() for an indeterminate time. The time depends on how big the test suite is and how much of it is parallel. So, doing any sort of time-based computation on different sides of the t.Parallel() call is fraught with danger.
Generally, the best thing to do is just put t.Parallel() as the first statement of your test case.
In rare cases, you might need to do some test setup serially, and then want to execute your test cases in parallel. This is often not worth the trouble unless the parallel parts are numerous or take a long time (relative to the setup). But, if you really, really decide you need it, create a parent context without a timeout, tied to the lifetime of the test case (defer testCancel() below). Then create child contexts for before and after t.Parallel() as needed. As the child contexts are tied to the parent’s lifetime, they are also canceled when the test completes.
We make extensive use of t.Parallel() at Coder, to the extent our linters yell if you don’t use it. We care a lot about keeping developers in flow here, including our own. I hope I’ve convinced you of the value of test timeouts to avoid disrupting flow during code/test iterations. When you use context timeouts and t.Parallel() in the same test, save yourself some grief and make sure t.Parallel() is first!