In my previous post I discussed some of the issues I have with the Agile Testing Pyramid and why if we follow it blindly it can lead to poor quality tests.
In this post, I will discuss a different approach to testing. Instead of setting out how many Unit Tests and Service Tests one should write (an impossible task because there is no formula or way to know this), I’ll be discussing what makes a good test and what trade-offs we can make when we write our tests.
My hope is that if you know what qualities you want to achieve from your tests and you know what trade-offs to make to achieve those qualities, you’ll be able to make informed decisions and test your application effectively.
The first thing you need to do in order to start writing effective tests is stop thinking about tests in terms of labels: Unit, Component, Integration, Service, etc… How many times have you heard people arguing about what a unit is or what a component is, or insisting that unit tests are not supposed to use real dependencies while component tests must use them?
The truth is that those definitions and discussions don’t matter much in practice. Your project is unique and the way you test it depends on many factors. Even if you’re creating a standard, by-the-book, framework-specific application, factors like what level of confidence you need in your application, what external dependencies your application has, and how much effort you can afford to spend writing tests, all determine how you end up testing your application.
What makes a good test
As I wrote in my previous post, not all tests are created equal. Let’s see what qualities we would like our tests to have.
We want tests that have a good chance of catching regression bugs. This is probably obvious. What might not be so obvious is that, as a consequence, we want to concentrate our efforts on testing things that have a good chance of breaking. This is why I generally consider testing getters, setters and one-line methods a waste of time. These are unlikely to break, which means that tests for them are unlikely to catch any bugs.
We’ll see below that depending on how we write our tests, they’ll be more or less likely to catch bugs.
We want to have fast feedback and short build times. For this to be possible, having tests that run fast is a must.
If a test fails, we want it to be because some piece of functionality has actually broken. This means that our tests have to be deterministic and reliable. It also means that we want to avoid testing implementation details. As long as the required functionality has not been impacted, our tests should pass, even if parts of the internal implementation have changed.
Easy to maintain
Once a test has been added to the codebase, it’s likely to remain there for a long time. Like any other piece of code, and perhaps more so, tests have to be understood and maintained over time. Hard-to-maintain tests will definitely slow you down.
We want tests for which the effort required to write and maintain them is proportional to the level of confidence they’ll give us; if you are going to spend several hours writing a test, that test better be worth it.
Once in a while I’ll review unit tests with complex mock setups and assertions. Since tests that use real dependencies are easier to set up and tend to give us higher confidence, my suggestion is usually to remove the mocks and use the real dependencies instead (more on this below).
In an ideal world we would only have tests that can catch all potential bugs, are extremely fast, don’t break, and are easy to maintain. In practice, we find that some of these qualities have negative correlations. For example, those tests that have higher chances of catching bugs tend to be more likely to break, and are also harder to maintain. Since we can’t have everything, there will be trade-offs we need to make. Let’s see now what some of those trade-offs are:
There is always the question of what to test. Should we test individual methods? Should we test multiple classes working together? Should we test entire flows? In general, the more code a test activates, the more confidence that test will give you. For example, a test that covers the functionality of an entire API all the way to the database will give you more confidence than a test that only tests if your Data Access Layer is working correctly.
In practice, we tend to see that the bigger the scope of a test, the more difficult it becomes to write and maintain. For example, finding out why a Service Test has failed is harder than doing the same for a Unit test.
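Tests that have small scopes will also be faster. Depending on the size of your service and how many tests you have, this might or might not make a big difference. To summarize: the bigger the scope of a test, the more confidence it gives you, but the harder it is to write and maintain and the slower it runs.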
One trade-off teams make, probably without even realizing it, relates to what they actually assert in their tests. For example, say we are writing a test to validate that a given API returns the correct data. The way I would test this is by asserting that the whole response from the API is equal to a given expected, fixed value. What I’ve seen teams do over and over, especially when the data models are large, is assert only part of the API’s response. For example, they check that the id of the returned entity is the expected one and nothing else. While this clearly makes our tests easier to write and maintain, the confidence level we have from such tests decreases: we know that the API returns something, and that the id field is correct, but what about the rest of the payload? Is it also correct? In summary: the less of the response we assert, the easier the test is to write and maintain, but the less confidence it gives us.
I don’t recommend doing this. Writing tests that don’t have the proper assertions is even worse than not having them, because they give you a false sense of confidence. I’m mentioning this potential trade-off because I’ve seen many teams doing this without realizing what the implications are.
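To make the difference concrete, here is a minimal sketch. The `get_user` handler, its fields, and the regression are all hypothetical stand-ins for "the API"; the point is only that an id-only check keeps passing after the payload breaks, while a whole-response comparison does not.

```python
# Hypothetical handler standing in for "the API"; names and fields are illustrative.
def get_user(user_id: int) -> dict:
    return {
        "id": user_id,
        "name": "Ada Lovelace",
        "email": "ada@example.com",
        "roles": ["admin"],
    }


def get_user_with_regression(user_id: int) -> dict:
    """The same handler after a bad change: the email field is now wrong."""
    response = get_user(user_id)
    response["email"] = None  # the regression
    return response


# Fixed, expected value for the whole response.
EXPECTED = {
    "id": 42,
    "name": "Ada Lovelace",
    "email": "ada@example.com",
    "roles": ["admin"],
}


def partial_assertion_passes(handler) -> bool:
    """Checks only the id, like the weak tests described above."""
    return handler(42)["id"] == 42


def full_assertion_passes(handler) -> bool:
    """Compares the whole response against the fixed expected value."""
    return handler(42) == EXPECTED
```

Both checks pass against `get_user`. Against `get_user_with_regression`, the partial check still passes and only the full comparison notices the broken field.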
When we test a given piece of code, we also need to decide what to do with its dependencies. Should we use the real ones, or should we mock them?
The more real dependencies we use, the more confidence the test will give us. To put this in different terms, the more things you mock in your test, the lower its quality (I’ll explain why below). On the other hand, mocking dependencies has clear advantages. If we’re talking about external dependencies, mocking them will make your test more reliable and definitely faster. Sometimes there are also dependencies that you can’t call from your tests, even if you want to.
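By having to explicitly define the behavior of our mocks, we also tend to couple our tests to the internal implementation of the code under test. This has a negative effect on maintainability. In summary: mocking dependencies makes tests faster and more reliable, but it lowers the confidence they give us and tends to make them harder to maintain.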
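Here is a small sketch of that coupling, using Python’s `unittest.mock`. Everything in it (`InMemoryPriceRepository`, `cart_total_v1`, `cart_total_v2`) is a hypothetical example, not code from a real project. The two implementations behave identically; the second one just fetches prices in a single batched call.

```python
from unittest.mock import Mock


class InMemoryPriceRepository:
    """A simple real dependency (hypothetical) used instead of a mock."""

    def __init__(self, prices):
        self._prices = dict(prices)

    def find_price(self, sku):
        return self._prices[sku]

    def find_prices(self, skus):
        return {sku: self._prices[sku] for sku in skus}


def cart_total_v1(repo, skus):
    # Original implementation: one lookup per item.
    return sum(repo.find_price(sku) for sku in skus)


def cart_total_v2(repo, skus):
    # Refactored implementation: one batched lookup. Behavior is unchanged.
    prices = repo.find_prices(skus)
    return sum(prices[sku] for sku in skus)


def mock_based_test(cart_total):
    """Scripts the mock around find_price, an internal detail of v1."""
    repo = Mock()
    repo.find_price.side_effect = lambda sku: {"apple": 2, "pear": 3}[sku]
    assert cart_total(repo, ["apple", "pear"]) == 5


def real_dependency_test(cart_total):
    """Checks only the observable result."""
    repo = InMemoryPriceRepository({"apple": 2, "pear": 3})
    assert cart_total(repo, ["apple", "pear"]) == 5


def passes(test, cart_total) -> bool:
    """Runs a test function and reports whether it completed without error."""
    try:
        test(cart_total)
        return True
    except Exception:
        return False
```

Both tests pass against `cart_total_v1`. After the refactoring, `real_dependency_test` still passes, but `mock_based_test` breaks even though no functionality changed, because the mock was only told how to answer `find_price`.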
More than the sum of their parts
Just like with complex systems, services are more than just the sum of their parts. The interactions between the parts (or units) also matter. Obviously, if the individual units don’t work correctly, there is no hope that the system as a whole will work correctly. However, the converse is not true. Even if the individual units work correctly, there is no guarantee that the system works correctly as a whole. Why is that? Because when we use a dependency, we need to know how to use it: what the dependency receives, what it returns, what exceptions it throws, etc…
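Assume you have a layered class structure in which a controller at the top ultimately depends on a repository at the bottom, with each layer calling the one below it.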
You might want to take an inductive approach and test each layer independently. That is, assume that the layer below works correctly (i.e., mock it) and test the unit in isolation. Something like this:
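As a minimal sketch, assume a hypothetical three-layer flow: `UserController` calls `UserService`, which calls `UserRepository`. The class names, methods, and data are illustrative assumptions, and the repository is an in-memory stand-in for a real database. Each layer is tested on its own, with the layer below replaced by a `unittest.mock.Mock`.

```python
from unittest.mock import Mock


class UserRepository:
    """Bottom layer: in a real service this would talk to a database."""

    def __init__(self, rows):
        self._rows = rows  # hypothetical in-memory stand-in for a table

    def find_by_id(self, user_id):
        return self._rows.get(user_id)


class UserService:
    """Middle layer: business logic on top of the repository."""

    def __init__(self, repository):
        self._repository = repository

    def display_name(self, user_id):
        row = self._repository.find_by_id(user_id)
        return row["name"].title()


class UserController:
    """Top layer: turns service results into a response."""

    def __init__(self, service):
        self._service = service

    def get(self, user_id):
        return {"status": 200, "body": self._service.display_name(user_id)}


def test_repository_alone():
    repository = UserRepository({1: {"name": "ada lovelace"}})
    assert repository.find_by_id(1) == {"name": "ada lovelace"}


def test_service_with_mocked_repository():
    repository = Mock()
    repository.find_by_id.return_value = {"name": "ada lovelace"}
    assert UserService(repository).display_name(1) == "Ada Lovelace"


def test_controller_with_mocked_service():
    service = Mock()
    service.display_name.return_value = "Ada Lovelace"
    assert UserController(service).get(1) == {"status": 200, "body": "Ada Lovelace"}
```

All three tests pass, and each one is small and fast. Note, though, that every mock encodes an assumption about how the layer below behaves.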
Are those tests enough to prove that the entire flow, from the controller to the repository, works? Unfortunately, the answer is no. Every time we mock a dependency we are not just assuming that this dependency works correctly, we are also assuming that the dependency is being used correctly. This assumption may or may not be correct, but it’s something that you must verify as well. This is why there are bug types that you cannot catch unless you have tests that activate all the (real) classes in a flow.
Earlier in this post I wrote that with each dependency we mock, we reduce the quality of our test. A more accurate statement would be: with every assumption we make, we reduce the quality of our test. The fewer assumptions, the better, at least in terms of confidence and ability to catch regression bugs.
In this post we discussed what qualities we want from our tests and how the different decisions we make affect the tests we end up with.
In the next post, I’ll discuss specific examples of how I take the principles and trade-offs above and decide what tests to write.