Sunday, July 27, 2014

Sources of Accidental Complexity

Recalling my preferred definition of solution elegance:

TotalComplexity = EssentialComplexity + AccidentalComplexity
Elegance = EssentialComplexity / TotalComplexity

So we want to reduce accidental complexity. Which requires identifying accidental complexity. This is subjective - the perception of complexity will vary by observer (I wrote about my experience here), and the same observer will perceive the same code as differently complex at different times. It will often be the case that two types of accidental complexity will be in opposition to each other: reducing accidental complexity of type A results in accidental complexity of type B, and reducing complexity of type B results in complexity of type A. So this will be a balancing act, and choosing the right balance means knowing your audience.

So far, so abstract. This will hopefully become clearer through example. If we accept that perceived complexity is proportional to the effort that must be spent to work with a system under study, then we can look at different types of effort we might spend, and then say when that type of effort can be considered "accidental" or "essential" to the system. This may become an ongoing series, so I'll start with two such types of effort that I'll call "lookup effort" and "interpretation effort".

Lookup effort

"Lookup effort" is the effort spent referring to external documents to understand the meaning of a piece of code. I use the term "document" broadly in this context: an external document may refer to traditional published documentation, or it may refer to source code located elsewhere in the system. The important point is that it is external: having to maintain the context of the code under study in your head while you look something up feels like an extra effort, which means it makes the system feel more complex.

This type of effort can feel "essential" to understanding the system as a whole. It will feel "essential" when the external reference neatly matches a domain boundary: in this case, the fact that the lookup is external reinforces that the domain includes a separation of concerns. The fact of having to traverse to an external reference can teach you something about the domain.

On the other hand, this type of effort will feel "accidental" in just about every other scenario:

  • when reading code, having to look up a library function, or language feature, that isn't well understood by the reader (as opposed to inlining the called construct, using mechanisms that the reader already understands);
  • when debugging code, building a highly detailed model of runtime behavior (which often requires moving rapidly through several layers of abstraction, to come to complete understanding);
  • when writing code, determining the calling conventions (function name, argument order and interpretation) of a function to be called.

In fact, I'd argue that the major force preventing this type of effort from becoming overwhelming is the fact that it disappears as our system vocabulary improves: as our language vocabulary improves, we don't need to refer to the dictionary as frequently. The time spent looking concepts up disappears when the concept is already understood. For example, if I am reasonably familiar with C, and read the code:

strncpy(CustomerName, Arg, MaxNameLength);

I do not need to refer to any reference documentation to understand that we are copying the string at Arg to the memory at CustomerName: I've already internalized what strncpy means and how it works, so that reading this line of code does not cause me to break my flow. On the other hand, at the time I wrote this question on stackoverflow, I was not prepared to understand mapM (enumFromTo 0). Since I understand strncpy very well already, it has no associated lookup effort. However, at the time I wrote that question, mapM (enumFromTo 0) had an extremely high lookup complexity, as it relied on familiarity with concepts I did not yet understand, and vocabulary I had not yet developed.

Interpretation effort

"Interpretation effort" is the effort that must be spent to develop an understanding of a linear block of code. There are several metrics that have been developed that can give a sense of the scale of interpretation effort (some of my favorites are McCabe's cyclomatic complexity, and variable live time (I couldn't easily find an on-line reference, but the metric is described in McConnell's Code Complete), but it will usually be true that a longer linear block of code will be more effort to interpret than a shorter block.

Linear blocks of code will ideally be dense with domain information, so that interpretation effort should feel essential to understanding the sub-domain. Since code blocks are the basic unit in which we read and write code, and since the problem domain should be dominant in our thinking as we read and write code, linear code blocks will naturally have a high density of information about the domain. Except when they don't. Which will happen by accident. I'm trying (failing?) to be cute, but to be more straightforward about it, what I mean is that the default state of a linear code block is to consist of problem essence. It is accident that pulls us out of this state.

But software is accident-prone. Among the sources of accidental complexity in linear code blocks are:

  • Repeated blocks of structurally identical code, AKA cut-and-paste code. This code must be re-interpreted each time it is encountered. If it were (properly) abstracted into a named external block, it would only need to be read and interpreted once, and given a name by which the common function can become part of the system vocabulary.
  • Inconsistent style. When software is maintained by multiple authors, it is unlikely that the authors will naturally have the same preferences regarding indentation style, symbol naming, or other aspects of style. To the extent that the written code does not look like the product of a single mind, there will be greater interpretation effort, as the reader must try to see the code through each maintainer's mind as she tries to understand the code in question.
  • Multiple levels of abstraction visible at once. See this.
  • Boilerplate code.
  • And much more!


"Interpretation effort" forms a natural pair with "lookup effort": decreasing lookup effort (by inlining code) will naturally come at the expense of increasing interpretation effort, while decreasing interpretation effort (by relying on external code for part of the function's behavior) will tend to increase lookup effort. There are guidelines that will generally be useful in picking the right balance in your design. (Aiming for high cohesion in code is intended to reduce interpretation effort, aiming for low coupling is intended to reduce lookup effort, and aiming for high fan-in and low fan-out is intended to help minimize the required system vocabulary.) In general, I would bias towards having greater "lookup effort" than "interpretation effort" in a design, as lookup effort can be eliminated by improving system vocabulary while interpretation effort will always be present. This advice will apply in most situations, but will not necessarily apply in all. Internalizing not just the rules, but also the rationale for the rules, will make it possible for you to make the right decisions for your audiences.