I recently encountered an argument from Jim Coplien (Why Most Unit Testing is Waste) arguing forcefully against unit testing as it is generally practiced in the industry today. I generally like his thinking, but I find I cannot agree with this thesis, because I think of unit tests as filling a different role than Mr. Coplien (and, I think, most others in our profession) believes they fill. In order to explain what I mean, I'll start by taking a step back.
At root, I think the design process for a piece of software consists of answering three questions:
- Why? What problem am I trying to solve, and why does this problem need a solution?
- What? What would the results be if I had a solution to this problem?
- How? How could the solution work?
There is a natural order in which these questions are asked during design, and it is the same order that they are listed above: you should understand "why" you need a solution before you decide "what" your solution will do, and you should understand "what" your solution will do before you can decide "how" it will do it.
These questions are also often asked hierarchically: the "why" answer characterizing why you create a product might result in a "how" answer characterizing the domain objects with which your product is concerned. But that "how" answer, "how do I organize the concepts in my domain?", raises another "why" question: "why are these the correct concepts to model?". And this "why" question leads to another "how" answer, characterizing the roles and responsibilities of each domain object. And so on down: "how" answers at one level of abstraction become "why" questions at the next, more concrete level, until one reaches implementation code, below which it is unnecessary to descend.
It's also notable that there is only one question in this list whose answer is guaranteed to be visible in a design artifact, and guaranteed to be consistent with the execution model of the system: the question "how does this work?" is ultimately answered in code. Neither of the other questions will necessarily be answered in a design artifact, and even when they are, that artifact is likely to become inconsistent with the system as built over time, unless some force works against the drift. And as design artifacts grow stale, they become less useful. In the end (and again, in the absence of some force pulling in the other direction), the only documentation guaranteed to be useful in understanding a design is the code itself.
This is unfortunate. Because design (including implementation!) is a learning process, our understanding of why we make certain decisions can and will change significantly during design, almost guaranteeing significant drift between early design documentation and the system as built. If, in mitigating this problem, one relies primarily on the code for design documentation, then it takes significant mental work to infer "what" a module does from "how" it does it, and still more work to go backwards from "what" the module does to "why" it does it - that is, in relying primarily on the code for design documentation, you are two degrees removed from understanding the design motivation.
Consider, instead, code with a useful test suite. While the "how does this work?" question is answered by the system code itself, the "what does this do?" question is answered by the test code. With a good test suite, you will see the set of "what" answers that the designer thought were relevant in building the code under test. You do not need to work backwards from "how", and you have removed a degree of uncertainty in trying to understand the "why" behind a piece of code. And, if the test suite is run with sufficient frequency, then the documentation for these "what" answers (that is, the test code itself) is much less likely to drift away from the execution model of the system. Having these two views into the execution model of the system — two orthogonal views — should help maintainers more rapidly develop a deeper understanding of the system under maintenance.
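To make this concrete, here is a minimal sketch (the function and its behaviours are hypothetical, chosen only for illustration) of the two views side by side: the implementation documents "how it works", while the test class documents "what it does".

```python
import unittest

# "How it works": the implementation. A maintainer reading only this must
# reverse-engineer the intended behaviour from the mechanics below.
def normalize_username(raw):
    """Collapse a raw login string into a canonical username."""
    return raw.strip().lower().replace(" ", "_")

# "What it does": the tests. Each test names one behaviour the designer
# considered part of the unit's contract.
class NormalizeUsernameTest(unittest.TestCase):
    def test_surrounding_whitespace_is_ignored(self):
        self.assertEqual(normalize_username("  alice  "), "alice")

    def test_case_is_not_significant(self):
        self.assertEqual(normalize_username("Alice"), "alice")

    def test_internal_spaces_become_underscores(self):
        self.assertEqual(normalize_username("alice smith"), "alice_smith")

if __name__ == "__main__":
    unittest.main()
```

Because the tests are executable, running them frequently keeps this "what" documentation consistent with the system by construction, which is exactly the property that prose documentation lacks.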
Furthermore, on the designer's part, the discipline of maintaining documentation not just for "how does this process work" but also for "what does this process do" (in other words, having to maintain both the code and the tests) encourages more rigorous consideration of the system model than would otherwise be the case: if I can't figure out a reasonable way to represent "what does this do" in code (that is, if I can't write reasonable test code), then I take it as a hint that I should reconsider my modular breakdown. Remember Dijkstra's admonition:
As a slow-witted human being I have a very small head and I had better learn to live with it and to respect my limitations and give them full credit, rather than try to ignore them, for the latter vain effort will be punished by failure.
Because it takes more cognitive effort to maintain both the tests and the code than to maintain the code alone, I must consider simpler entities if I'm to keep both the code and the tests in my head at once. When this constraint is applied throughout a system, it encourages complexity reduction in every unit of the system — including the complexity of inter-unit interactions. Since complexity is being reduced while maintaining system function, it must be the accidental complexity of the system being taken out, so that a higher proportion of the remaining system complexity is essential to the problem domain. By previous argument, this implies that the unit-testing discipline encourages increased solution elegance.
This makes unit-testing discipline an example of what business types call "synergy" - a win/win scenario. On the design side, following the discipline encourages a design composed of simpler, more orthogonal units. On the maintenance side, the existence of tests provides an orthogonal view of the design, making the design of individual units more comprehensible. This makes it easier for maintainers to develop a mental model of the system (going back to the "why" question that motivated the system in the first place), so that required maintenance will be less likely to result in model inconsistency. A more comprehensible, internally consistent system is less risky to maintain than a less comprehensible or internally inconsistent one would be. Unit testing encourages design elegance.
Thanks for your thoughts, Aidan. If you want to take a design perspective on this, I think you have centuries or millennia of design experience working against you. Tests cannot prove success, only failure; test-oriented thinking by a coder can avoid only those pitfalls that the coder conceived during design. The reason this is important is that goodness is much more than the absence of badness.
As I write in Chapter 2 (soon to be published), the software engineering data bear this out. Capers Jones notes that the efficiency of unit testing is as low as 10% and that it is, in any case, the least efficient way we know to remove defects from code. I think coders celebrate unit testing because it is within their sphere of control. On the other hand, so are code walkthroughs and participation in design inspections, which are many times more effective.
In terms of documentation of code, we know many ways that are more effective than tests. Tests are a very low-context form of communication: they in fact communicate less information than the code, in a form inaccessible to most stakeholders, in a way that covers a minuscule set of the concerns about code behaviour and, furthermore, in a way that covers a minuscule set of the possible paths through the code. From the perspective of information theory they are one of the most inefficient ways that one can devise to convey information about the workings of an algorithm. And the working of an algorithm is a tiny fraction of the concerns one must master to use a given API. I think it will be a long time before we displace natural language as the dominant form of code documentation, and I think there are good reasons far beyond mere inertia that help it hold its dominant place as the way we communicate code's functionality.
In the end I think it's a combination of aversion for teamwork and social activities, and a favour for individual, introverted action, combined with a feeling of autonomous control over one's fate, that lead to the broad acceptance of unit testing. In such a bubble it's easy to become unaccountable to business goals and to ignore the forces that contribute to code maintainability and quality.
Hi Cope - thank you for taking the time to respond! I look forward to reading your book when it is published. On reading your response, I see that I did not address your argument directly - I was more concerned with showing how unit tests can be used effectively than I was with addressing your specific arguments. I'll try to remedy that here...
In my opinion, unit tests (as used in TDD) are not primarily about defect detection or resolution. They do not eliminate the need for smoke tests or acceptance tests. Rather, they are primarily for defect prevention, and a design aid to enforce unit testability (where by "testability", I mean that the unit's semantics can be fully exercised by test code -- in terms of the original post, that "what the unit does" can be reasonably expressed as test cases). Although these tests are not primarily about defect detection or resolution, the testability that results from development using TDD means that, when defects are detected, the circumstances that led to the defect can be reproduced with a unit test. This unit test will also be able to verify that the problem has been resolved, while the existing body of unit tests will ensure that no regressions have been introduced by the fix. The test suite, in my view, is operational documentation of the unit's behavioral model, and defect detection and resolution make this documentation more complete.
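As an illustration of that workflow (the function and the defect below are hypothetical, invented only to show the shape of the idea), a reported defect can be captured as a failing unit test, which then documents the fix and guards against regression:

```python
import unittest

def parse_quantity(text):
    """Parse a user-supplied quantity string into an int.

    The original version called int(text) directly and raised ValueError
    on inputs with a thousands separator such as "1,000".
    """
    return int(text.replace(",", ""))  # the one-line fix

class ParseQuantityTest(unittest.TestCase):
    # Pre-existing test: the unit's documented behaviour.
    def test_plain_integer(self):
        self.assertEqual(parse_quantity("3"), 3)

    # Added when the defect was reported: it failed against the old
    # implementation, passes with the fix, and now guards against regression.
    def test_thousands_separator_is_tolerated(self):
        self.assertEqual(parse_quantity("1,000"), 1000)

if __name__ == "__main__":
    unittest.main()
```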
If you do not consider TDD tests to be primarily about defect detection or resolution, it casts a much different light on many of your concerns. For example, Capers Jones's data shows that unit testing is not particularly effective at defect removal, but it also shows that TDD is an effective process for defect prevention.
In the paper that inspired this post, you talk about the difficulty of testing routines with high cyclomatic complexity, noting that the effort to achieve complete code coverage encourages breaking such routines down into smaller routines of lower cyclomatic complexity, so that an algorithm is no longer encapsulated by a single function. That is true, but I would argue that it is a benefit. Consider how Wikipedia presents the insertion algorithm for a Red-Black tree (https://en.wikipedia.org/wiki/Red-black_tree#Insertion): it breaks the algorithm down into five cases, with each case having an associated function. This breakdown encourages presentational clarity: each function provides a natural anchor against which its pre- and post-conditions can be described. In the wiki article, these conditions are described in natural language, while in TDD they would be described with test code, but in either case this presentation of the implementation is easier to follow -- and, for the same reason, easier to verify -- than if the whole algorithm were captured in a single function.
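The full Red-Black insertion code is too long to reproduce here, so here is a much smaller, hypothetical sketch of the same structural move: a branchy classification routine decomposed into named case functions, each of which can carry its own tests in place of the wiki article's natural-language case descriptions.

```python
import unittest

# A deliberately small stand-in for the Red-Black-tree cases: one branchy
# routine decomposed into named helpers, each with a stated precondition
# or postcondition and its own tests.

def is_valid_triangle(a, b, c):
    """Sides are positive and satisfy the triangle inequality."""
    return a > 0 and b > 0 and c > 0 and a + b > c and a + c > b and b + c > a

def is_equilateral(a, b, c):
    """Precondition: (a, b, c) is a valid triangle."""
    return a == b == c

def is_isosceles(a, b, c):
    """Precondition: (a, b, c) is a valid triangle."""
    return a == b or b == c or a == c

def classify_triangle(a, b, c):
    """Top-level routine; each branch defers to one named case."""
    if not is_valid_triangle(a, b, c):
        return "invalid"
    if is_equilateral(a, b, c):
        return "equilateral"
    if is_isosceles(a, b, c):
        return "isosceles"
    return "scalene"

class TriangleCaseTests(unittest.TestCase):
    # Each case gets its own test, mirroring the per-case prose
    # descriptions in the Wikipedia presentation.
    def test_inequality_violation_is_invalid(self):
        self.assertFalse(is_valid_triangle(1, 2, 3))

    def test_equilateral_case(self):
        self.assertEqual(classify_triangle(2, 2, 2), "equilateral")

    def test_isosceles_case(self):
        self.assertEqual(classify_triangle(2, 2, 3), "isosceles")

    def test_scalene_case(self):
        self.assertEqual(classify_triangle(3, 4, 5), "scalene")

if __name__ == "__main__":
    unittest.main()
```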
(to be continued - blogger does not allow long replies)
Back to your comment above, I agree that tests communicate less information than code, though the same can be said of any form of non-code documentation: the code will always provide a more complete picture of the system than any documents surrounding it. The form of communication provided by test code is indeed inaccessible to most stakeholders, but it is among the *most* accessible forms of documentation for performing code maintenance. (There is no ambiguity in what test code does, while there is often high ambiguity in natural language descriptions. Ambiguity creates maintenance risk, especially when working with unfamiliar code. Since maintenance in general is so valuable, and so risky, it does not seem problematic to me that the documentation provided by test code is primarily useful to maintainers.) I also disagree that one must understand an algorithm ("how it works") to use an API that captures the algorithm: I do not need to know how quicksort works to understand that it generates sorted output from unsorted input in (usually) O(n log n) time. (In fact, as Parnas showed, it is precisely the ability of an interface to abstract detail away from callers, so that the caller does not need to understand everything about the algorithm, that defines a good interface.)
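To illustrate the point (with a deliberately naive quicksort, written only for this example), the tests below state the interface contract, namely sorted output containing the same elements, without referring to pivots, partitioning, or anything else about how the sort works:

```python
import unittest

def quicksort(items):
    """A naive quicksort, standing in for any sort implementation."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    return (quicksort([x for x in rest if x < pivot])
            + [pivot]
            + quicksort([x for x in rest if x >= pivot]))

class QuicksortContractTest(unittest.TestCase):
    # The tests state only "what" the caller can rely on: sorted output
    # containing the same elements as the input.
    def test_output_is_sorted(self):
        result = quicksort([3, 1, 2, 5, 4])
        self.assertEqual(result, sorted([3, 1, 2, 5, 4]))

    def test_input_elements_are_preserved(self):
        data = [2, 2, 1, 3]
        self.assertEqual(sorted(quicksort(data)), sorted(data))

if __name__ == "__main__":
    unittest.main()
```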
It may be that, as you say, "a combination of aversion for teamwork and social activities, and a favour for individual, introverted action, combined with a feeling of autonomous control over one's fate, [] lead to the broad acceptance of unit testing." But if TDD helps improve code quality and maintainability (which strongly reflects my personal experience with TDD, as well as that of many well-respected developers, and is reflected in Capers Jones's data), then what is wrong with using a methodology that works with an engineer's natural preferences?
Thank you again for writing - I am a big fan of your work, and I hope we can continue this conversation.
And thanks for a gracious retort. I see we still have some disagreement, but I appreciate you for keeping the pot boiling on the stove. The rest, we'll have to take up over beers some day.