Wednesday, December 7, 2016

Correctness and Maintainability

In my career, I've focused a lot on making my code bug-resistant, and I've spent a lot of time trying to learn or invent techniques that would systematically prevent the types of bugs I would run into the most frequently. I can think of three basic types of approach I've taken, which can be placed on a spectrum from more formal to less:

  • Exploiting type systems;
  • Defining code-based tests;
  • Simplifying reviewability or debugability.

The three approaches can be characterized based on what resources we use to we check that the code is correct. You can:

  • Use the compiler to verify correctness (via type-checking);
  • Use code to verify correctness (via testing assertions);
  • Use people to verify correctness (via code review or debugging).

And then you might observe how relatively likely it is that the verification missed a program bug:

  • The compiler will never pass a program that contains a type-checking error, though it's possible that the type-system is unsound (in which case a bug the type-system is supposed to prevent may still occur). In mature languages, such issues are either rare or well understood, so we can generally say that type-checking provides rigorous proof of program correctness -- inasmuch as program correctness is assessable by the type-system, but more on this later.
  • Testing will always detect code that fails the tests, but to be robust, tests must cover the entire interesting semantic range of the Unit Under Test, and this range can be extremely difficult to characterize. Designing for testability is largely about reducing the semantic range of our program units (such that meaningful tests can be written more easily), but even given that, it is difficult to determine the full semantic range of a system unit, and tests will never be absolute. Moreover, even if full semantic coverage were possible to determine, there will still be parts of all "interesting" systems where code-based tests are too difficult to write, too expensive to run, or where correctness is too difficult to determine (or otherwise too uncertain), to make testing worthwhile.
  • And, finally, when it comes to review by people, I'm not sure anything can be said with certainty about what bugs might be missed. Usability testing may require unvarnished input from neophytes, your professor may use rigorous methods to prove your program correct, you should try to know your audience.

And, by implication from the above, one can look at those three types of approach and determine how likely it is that the approach will prevent bugs during code maintenance:

  • The type system systematically prevents the introduction of type errors into the program.
  • Tests prevent bug regressions during maintenance, and following a test-oriented implementation methodology can strongly discourage the introduction of bugs into the system.
  • Techniques for improving readability weakly discourage the introduction of bugs into the system.

So, we should prefer to use the type system, and failing that use tests, and failing that just try to make our code clearer, right? Not exactly.

First of all, these three approaches are not exclusive: one can make advanced use of type systems, and write test code, and improve human reviewability, all at once. Indeed, using the type system, or writing unit tests can improve a human reader's ability to build a mental model of the system under analysis, improving solution elegance.

Secondly, in my experience, the human maintainer is always the most important consideration in software design. If your tests or your data types aren't maintainable, then they are technical debt of one form or another.1

And finally, different people will have different capabilities and preferences, and more robust approaches may be less maintainable in your developer community. OK, so maybe (and I'm not sure that this is true, but it seems plausible) Idris can codify the invariants required by a Red-Black Tree at the type-level, so that compilation would fail if the implementation contains bugs. This sounds really cool, and it makes me want to learn Idris, but Idris's type system is simply more difficult to learn than C#'s, or even Haskell's. If you stick to the "safe" subset of Rust, your code should not have use-after-free bugs or buffer overruns, but people that learn Rust invariably describe their early efforts as, essentially, fighting the borrow-checker. Even writing code for testability is an acquired skill, which takes a significant investment to learn, and your market window may not afford you the time to invest.

This is to say: for the most part, trying to improve correctness will also result in a more maintainable system. But not always. And the more advanced techniques you use in exploiting the type system, or even in designing your tests, the fewer developers will be able to work with your code. In the extreme case, you may get code that is proven correct by the compiler, but that is unmaintainable by humans. Know your audience, and be judicious in the techniques you apply.

  1. I use the term "technical debt" in the sense of something that will require extra work, later, to maintain. By this definition, paying down technical debt does not necessarily require code changes: the payment may be the time and effort taken training developers to work with the code in its existing form.