Writing Effective Unit Tests

You’re patting yourself on the back, proud of the code you’ve just written for a cool new feature. You’re itching to add it to your organization’s repository and see customers use it as soon as possible, but you can’t: you still need to write a bunch of unit tests that meet your organization’s mandated “code coverage” level. You begrudgingly sit down and start hammering out the tests, thinking: “I’ve already written the code, so why do I have to go through this bureaucracy?”

In this post, I’ll discuss what unit tests are useful for, why they can seem so annoying to write, and how to write them more effectively.

Why Unit Tests Are Useful

To see why unit tests are useful, let’s consider what would happen if we didn’t have them.

Suppose we’re trying to develop a project made up of many, many source files, modules, and lines of code. For a specific example, say there are 50 files. You reason that the whole point is that the end result “works” — your users won’t see the effort you put into your code, only the result of it. So there’s no point in adding the “bells and whistles” of unit tests. You can write everything and then test directly whether your code does what you expect.

But say you make roughly two errors per file. Since you waited and didn’t test as you went along, when you finally try to run the full system, you’re facing around 50 times 2, or 100, errors all at once!

Unit tests are important for incrementally testing smaller parts as you go. For larger and larger systems — including products we use every day, like the Google search engine or your operating system — waiting to test at the very end would leave an infeasible number of errors to deal with.
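
For instance, here is a minimal sketch in Python (using the standard unittest module, with a made-up subtotal function standing in for one small piece of a larger system) of testing a part the moment it’s written:

    import unittest

    # One small, hypothetical piece of a much larger system.
    def subtotal(prices):
        return sum(prices)

    class SubtotalTest(unittest.TestCase):
        def test_typical_order(self):
            self.assertEqual(subtotal([2, 3, 5]), 10)

        def test_empty_order(self):
            # Testing this piece now, in isolation, means an empty-input
            # bug surfaces here instead of after 50 files are integrated.
            self.assertEqual(subtotal([]), 0)

    if __name__ == "__main__":
        unittest.main()

Each file’s errors get caught while that file is still fresh in your mind, rather than piling up into the hundred-error avalanche above.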

These errors can also take many different forms — not just “obvious” syntax issues, but deeper, more insidious bugs where the code runs perfectly fine yet does something different from what you want. These deviations from expected behavior also tend to hide in hard-to-find cases, especially once you have enough experience to instinctively cover the “easy ones.” With many of these hard-to-find errors lurking, it can be a long time before you get your code into a state that works consistently.
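
As a made-up illustration: suppose the intent is “orders of $50 or more ship free,” but the code checks strictly greater than. Nothing crashes, the easy cases all pass, and only a test at the exact boundary exposes the gap between what the code does and what you want:

    def ships_free(order_total):
        # Intended rule: orders of $50 or more ship free.
        # Insidious bug: ">" should be ">=". The code still runs fine.
        return order_total > 50

    # The "easy" cases pass, so casual testing misses the bug...
    assert ships_free(80) is True
    assert ships_free(10) is False

    # ...but the hard-to-find boundary case reveals it:
    assert ships_free(50) is True  # fails: AssertionError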

OK, so this logic may make sense. But when you sit down to write the unit tests, it seems so frustrating. Why is this?

Why Unit Tests Are Frustrating

Fundamentally, a computer is “dumb” — it can only do what you program it to do. You, on the other hand, are a human being who has a vision for what your end result looks like. This challenge of communicating what you want to the computer is what fundamentally underlies many aspects of tech, and it is why testing in the first place is necessary.

Now, computers can sometimes guess, with machine learning and other techniques, where you are going, given what you have provided as input so far. But even then, at the end of the day, the task of verifying your intentions still falls on you.

As the programmer, you are an inescapable part of not just the development of the product, but the understanding of what the product should do.

Now, the unit test code itself is code. If this test code is not written properly, it can itself have bugs! In fact, sometimes test code can seem even more complicated than what it is testing, which can make us suspicious of the correctness of the test code. How do we ensure the accuracy of our tests?
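
One common way this plays out (sketched here with a hypothetical tax calculation) is a test that computes its expected value with the same logic as the code under test, so the two share any bug and the test passes regardless:

    def price_with_tax(cents):
        # Hypothetical 8% tax, in integer cents so the arithmetic is exact.
        return cents + cents * 8 // 100

    def test_buggy():
        price = 1000
        # Bug in the *test*: the expected value mirrors the implementation.
        # If price_with_tax is wrong, this test is wrong in the same way.
        assert price_with_tax(price) == price + price * 8 // 100

    def test_better():
        # Better: a hand-computed literal. $10.00 plus 8% tax is $10.80.
        assert price_with_tax(1000) == 1080

    test_buggy()
    test_better()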

To discuss this, we will take a closer look at the discrepancy between expected and actual behavior of code.

The Human Vs. the Computer

Constructs like loops and functions form the “nuts and bolts” of source code. Fundamentally, when we use features like these, we are specifying generalized rules.

Now, many of our “rules” in real life have exceptional cases. They may still be important as rules, but they are not “formally correct” the way axioms or theorems are in math. One example is the grammar of human languages. Some of these exceptions are ones we hadn’t even thought of when devising the rules, but that we recognize as exceptional cases once they arise.

But the computer doesn’t know this. It will run generalized rules every single time, without fail. So this can be one great source of errors where the code doesn’t do what we expect.
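
The grammar analogy translates directly into code. In this sketch, a naive pluralizer implements the generalized “add an s” rule, and the computer applies it, without fail, to the exceptions too:

    def pluralize(noun):
        # The generalized rule: add "s".
        return noun + "s"

    # The rule holds for the regular cases we had in mind...
    assert pluralize("cat") == "cats"
    assert pluralize("tree") == "trees"

    # ...but the computer applies it to the exceptions without fail:
    assert pluralize("child") == "children"  # fails: returns "childs"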

Let’s consider how we would write our test code. If we use generalized rules in the tests, then we run the risk of errors in the tests themselves, which would make them unreliable.

The alternative to generalized rules is to spell out specific cases. For a specific case, we know exactly what result to expect. So we can ensure accuracy by writing out a list of specific cases.
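
In code, this often takes the shape of a table-driven test: a literal list of inputs paired with hand-verified outputs, with no clever logic in the test that could itself be wrong. A sketch, reusing the hypothetical pluralizer (now with its exceptions handled):

    # A hypothetical pluralizer that handles its exceptions explicitly.
    EXCEPTIONS = {"child": "children", "sheep": "sheep"}

    def pluralize(noun):
        return EXCEPTIONS.get(noun, noun + "s")

    # The test is just a list of specific cases, each with a
    # hand-verified expected result.
    CASES = [
        ("cat", "cats"),
        ("tree", "trees"),
        ("child", "children"),
        ("sheep", "sheep"),
    ]

    for noun, expected in CASES:
        assert pluralize(noun) == expected, f"pluralize({noun!r})"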

But how many possible cases can there be? If your code takes, say, string input, then there is an unlimited number of possible string values. In general, our programs can cover a large, large number of cases, depending on the type of input we expect.

So we’re at an impasse. If we want to be representative of the cases that could occur with our program, we need to include a lot of specific cases, and that gets repetitive and boring quickly. But if we don’t, we run the risk of buggier test code.

What to Keep in Mind

In general, there are many, many possible cases out there, and we can necessarily only test a sample — otherwise we’d spend an infeasibly large amount of time simply writing tests.

It’s hard to provide a single “formula” that churns out exactly how to write effective tests. Indeed, if such a formula existed, the computer could just implement it and generate the tests itself. And verifying that the formula worked would still fall to the programmer.

In place of a formula, I argue that a good mindset is what matters, so that you can more effectively translate your intentions into your tests. I cannot guess what those intentions are, but I contend that if you keep the points above in mind, you can better appreciate the purpose of your tests, which lets you construct them to target your objectives.

Metrics like method coverage and line coverage can always help. Even if you know the metrics are superfluous in your case, and the resulting tests are repetitive and annoying to write, the alternative of not having those metrics as standards is worse. Other projects, in different parts of the organization or even in the future, may have representative test cases that the mandate surfaces and that programmers wouldn’t think to test without it. In the grand scheme of tech products, many of which must deal with physically unreliable systems (for example, distributed systems built on top of networks), some repetition is better than in-the-wild failures in parts of the product beyond what you worked on.

Beyond these metrics, it is important to pick cases that are representative of the main uses you imagine, and also cases that are representative of potential issues. These are the cases that will be in sharper focus when your product is released and running in the wild at your users’ whims.

Safer coding practices also don’t hurt. For instance, if you have multiple operations that can query and modify the state of something, it’s a good idea to carefully analyze whether states remain consistent and whether your operations step on each other’s toes. (Problems in this area specifically lead to interesting and challenging issues with multithreading.)
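
As a sketch (with a hypothetical Inventory class), a test can exercise a sequence of query and modify operations and check that the state’s invariants hold throughout, including on the exceptional paths:

    class Inventory:
        # A hypothetical class with a query (count) and modifiers (add, remove).
        def __init__(self):
            self._counts = {}

        def add(self, item):
            self._counts[item] = self._counts.get(item, 0) + 1

        def remove(self, item):
            # Guard the query-then-modify step: a count must never go negative.
            if self._counts.get(item, 0) == 0:
                raise ValueError(f"no {item!r} in stock")
            self._counts[item] -= 1

        def count(self, item):
            return self._counts.get(item, 0)

    inv = Inventory()
    inv.add("widget")
    inv.add("widget")
    inv.remove("widget")
    assert inv.count("widget") == 1

    # The exceptional path must not silently corrupt state.
    try:
        inv.remove("gadget")
    except ValueError:
        pass
    assert inv.count("gadget") == 0

(Under multiple threads, that query-then-modify sequence inside remove would itself need a lock; that is exactly where the multithreading issues mentioned above come from.)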

Even with all this, you must keep in mind that bugs will still happen. Humans make mistakes and don’t anticipate everything; they can’t anticipate everything. Cases will come up later that you should have tested for but missed. All you can do is your best right now, and at the end of the day you must make some judgment calls about which cases are representative of potential issues and which are redundant.
