Wednesday, December 10, 2014

Bootstrapping Rust

There are two stories I particularly like about the problem of bootstrapping self-referential tools. This one:

Another way to show that LISP was neater than Turing machines was to write a universal LISP function and show that it is briefer and more comprehensible than the description of a universal Turing machine. This was the LISP function eval[e,a], which computes the value of a LISP expression e - the second argument a being a list of assignments of values to variables. (a is needed to make the recursion work). Writing eval required inventing a notation representing LISP functions as LISP data, and such a notation was devised for the purposes of the paper with no thought that it would be used to express LISP programs in practice. Logical completeness required that the notation used to express functions used as functional arguments be extended to provide for recursive functions, and the LABEL notation was invented by Nathaniel Rochester for that purpose. D.M.R. Park pointed out that LABEL was logically unnecessary since the result could be achieved using only LAMBDA - by a construction analogous to Church's Y-operator, albeit in a more complicated way.

S.R. Russell noticed that eval could serve as an interpreter for LISP, promptly hand coded it, and we now had a programming language with an interpreter.

And this one:

Then, one day, a student who had been left to sweep up after a particularly unsuccessful party found himself reasoning in this way: If, he thought to himself, such a machine is a virtual impossibility, it must have finite improbability. So all I have to do in order to make one is to work out how exactly improbable it is, feed that figure into the finite improbability generator, give it a fresh cup of really hot tea... and turn it on!

Rust is, I think, the programming language I've been waiting for: it takes a large proportion of the ideas I like from functional languages, applies them in a systems programming language context with 0-overhead abstractions, can run without a supporting runtime, and makes "ownership" and "lifetime" into first-class concepts. In my work trying to build highly complex and robust embedded systems, these are exactly the features I've been waiting for, and I'm thrilled to see a modern programming language take this application space seriously.

Building the language on a second-class platform, though, was intensely frustrating. The frustration came from several places, including that my Unix systems-administration skills have gotten (*ahem*) rusty, but I'd say the bulk of my difficulty came from a single factor: circular dependencies in the tooling.

  • Building the Rust compiler requires a working Rust compiler.
  • Building the Cargo package-manager requires a working Cargo package-manager.

These problems turned out to be ultimately surmountable: The rustc compiler had been previously built for a different version of FreeBSD, but once I re-compiled the required libraries on my version, I managed to get the pre-built compiler running. For the Cargo package manager, I ended up writing a hacky version of Cargo in Ruby, and using that to bootstrap the "true" build. I'm glossing over it here, but this turned out to be a lot of effort to create: Rust is evolving rapidly, and it was difficult to track new versions of Cargo (and its dependencies) that depended on new versions of the Rust compiler, while the Rust compiler itself sometimes did not work on my platform.

This was, necessarily, my first exposure to the platform in general, and it was, unfortunately, not a positive experience for me. Circular dependencies in developer tools are completely understandable, but there should be a way to break the cycle. Making bootstrap a first-order concern does not seem that difficult to me, in these cases, and would greatly enhance the portability of the tools (maybe even getting Rust to work on NetBSD, on GNU/Hurd, or on other neglected platforms):

  • The dependency of the compiler on itself can be broken by distributing the bootstrap compiler as LLVM IR code. Then use LLVM's IR assembler, and other tools, to re-link the stage0 Rust compiler on the target platform. The stage0 compiler could then be used to build the stage1 compiler, and the rest of the build could proceed as it does today. (Rust issue #19706)
  • The dependency of the package manager on itself can be broken by adding a package manager target to, say, tar up the source code of the package manager and its dependencies, along with a shell script of build commands necessary to link all that source code together to create the package manager executable. (Cargo issue #1026)

I am not suggesting that we should avoid circular dependencies in developer tools: Eating our own dogfood is an extremely important way in which our tools improve over time, and tools like this should be capable of self-hosting. But the more difficult it is to disentangle the circular dependency that results, the more difficult it becomes to bootstrap our tools, which will ultimately mean less people adopting our tools in new environments. Most of us aren't building Infinite Improbability Drives. Bootstrapping should be a first-order concern.