Recursive Make Considered Harmful

I'm setting up a new project, and I find that beginnings are a time for reflection and thought. One area of investigation is how I want to design the build system. As part of my research, I stumbled across Peter Miller's 1998 paper "Recursive make Considered Harmful". Many projects I've worked on have involved a recursive make setup, and I have definitely struggled with the problems that Miller mentions in the paper.

For those who are not familiar, recursive make is a build system strategy. To quote the paper, the strategy involves a "hierarchy of directories containing source files for the modules which make up the project, where each of the sub-directories contains a Makefile which describes the rules and instructions for the make program. The complete project build is done by arranging for the top level Makefile to change directory into each of the sub-directories and recursively invoke make."
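
To make that concrete, a minimal recursive setup looks something like the sketch below (directory names are hypothetical):

```make
# Top-level Makefile in a recursive build. It knows nothing about
# dependencies; it only descends into each module directory and
# invokes make again.
SUBDIRS = lib app

all:
	@for dir in $(SUBDIRS); do \
	    $(MAKE) -C $$dir all || exit 1; \
	done
```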

Why should you care about build system issues or proper build system design? As Miller puts it:

Building your project is a fundamental activity. If it is performing poorly, so are development, debugging and testing. Building your project needs to be so simple the newest recruit can do it immediately with only a single page of instructions. Building your project needs to be so simple that it rarely needs any development effort at all. Is your build process this simple?

Even if you don't care about build systems, reviewing this paper will expose you to common make pitfalls, leaving you better prepared to debug a build problem in the future.

Read "Recursive make Considered Harmful" here.

My Highlights

The beginning of the paper summarizes problems with the recursive make strategy:

There are numerous problems with recursive make, and they are usually observed daily in practice. Some of these problems include:

  • It is very hard to get the order of the recursion into the sub-directories correct. This order is very unstable and frequently needs to be manually “tweaked.” Increasing the number of directories, or increasing the depth in the directory tree, cause this order to be increasingly unstable.
  • It is often necessary to do more than one pass over the sub-directories to build the whole system. This, naturally, leads to extended build times.
  • Because the builds take so long, some dependency information is omitted, otherwise development builds take unreasonable lengths of time, and the developers are unproductive. This usually leads to things not being updated when they need to be, requiring frequent “clean” builds from scratch, to ensure everything has actually been built.
  • Because interdirectory dependencies are either omitted or too hard to express, the Makefiles are often written to build too much to ensure that nothing is left out.
  • The inaccuracy of the dependencies, or the simple lack of dependencies, can result in a product which is incapable of building cleanly, requiring the build process to be carefully watched by a human.
  • Related to the above, some projects are incapable of taking advantage of various “parallel make” implementations, because the build does patently silly things.

A summary of make's strategy:

Make determines how to build the target by constructing a directed acyclic graph, the DAG familiar to many Computer Science students. The vertices of this graph are the files in the system, the edges of this graph are the inter-file dependencies. The edges of the graph are directed because the pair-wise dependencies are ordered; resulting in an acyclic graph - things which look like loops are resolved by the direction of the edges.
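
Even a tiny Makefile implies such a graph. A minimal sketch, assuming a hypothetical two-file program:

```make
# The implied DAG:
#
#   prog --> main.o  --> main.c
#        \-> parse.o --> parse.c
#
prog: main.o parse.o
	$(CC) -o prog main.o parse.o

main.o: main.c
	$(CC) -c main.c

parse.o: parse.c
	$(CC) -c parse.c
```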

Recursive make injects problems into both phases of make's operation:

The use of recursive make affects both phases of the operation of make: it causes make to construct an inaccurate DAG, and it forces make to traverse the DAG in an inappropriate order.
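
The paper walks through this with two modules, ant and bee, where a source file in one module includes a header generated in the other. Split the project into per-directory Makefiles and that cross-directory edge is easy to lose; a rough sketch of the broken state:

```make
# ant/Makefile (sketch): this fragment of the DAG has no idea that
# main.c also #includes ../bee/parse.h. If parse.h changes (or has
# not been generated yet), make run in this directory sees nothing
# out of date.
main.o: main.c
	$(CC) -I../bee -c main.c
```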

A fair comment about build system modularity:

This example is intentionally artificial, and thoroughly so. However, all “modularity” of all projects is artificial, to some extent. Consider: for many projects, the linker flattens it all out again, right at the end.

On reshuffling library build order to fix builds:

To answer this question, it is necessary to look, not at the graphs, but the order of traversal of the graphs. In order to operate correctly, make needs to perform a postorder traversal, but in separating the DAG into two pieces, make has not been allowed to traverse the graph in the necessary order - instead the project has dictated an order of traversal. An order which, when you consider the original graph, is plain wrong. Tweaking the top-level Makefile corrects the order to one similar to that which make could have used. Until the next dependency is added...
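
In practice the "tweak" is a manually maintained ordering in the top-level Makefile, something like the hypothetical line below:

```make
# bee must be listed before ant because ant consumes a header that
# bee generates - a fact recorded nowhere in this file. The next
# cross-module dependency means reshuffling by hand again.
SUBDIRS = bee ant
```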

When libraries have an order dependency, your opportunity to use parallel builds is lost:

Note that “make -j” (parallel build) invalidates many of the ordering assumptions implicit in the reshuffle solution, making it useless. And then there are all of the sub-makes all doing their builds in parallel, too.

On repeating builds to get a proper binary output: it's just not scalable!

This doubles the length of time it takes to perform the build. But that is not all: there is no guarantee that two passes are enough! The upper bound of the number of passes is not even proportional to the number of modules, it is instead proportional to the number of graph edges which cross module boundaries.

But avoiding these problems is "simple":

To avoid these problems, don’t break the DAG into pieces; instead, use one Makefile for the entire project. It is not the recursion itself which is harmful, it is the crippled Makefiles which are used in the recursion which are wrong. It is not a deficiency of make itself that recursive make is broken, it does the best it can with the flawed input it is given.

Miller then addresses common complaints about single-Makefile strategies. I have selected some comments here:

If the entire project build description were placed into a single Makefile this would certainly be true, however modern make implementations have include statements. By including a relevant fragment from each module, the total size of the Makefile and its include files need be no larger than the total size of the Makefiles in the recursive case.
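
The shape this takes is a top-level Makefile that includes one fragment per module, so every rule lands in a single DAG. A sketch, with illustrative file and variable names:

```make
# Top-level Makefile: one make process, one whole-project DAG.
MODULES = ant bee

# Pull in a rules fragment from each module directory.
include $(patsubst %,%/module.mk,$(MODULES))

prog: $(OBJ)
	$(CC) -o $@ $(OBJ)
```

```make
# ant/module.mk: a module's fragment. Paths are written relative
# to the project root, so a cross-module dependency such as
# bee/parse.h can be stated directly.
OBJ += ant/main.o

ant/main.o: ant/main.c bee/parse.h
	$(CC) -c -o $@ ant/main.c
```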

The complexity of using a single top-level Makefile which includes a fragment from each module is no more than in the recursive case. Because the DAG is not segmented, this form of Makefile becomes less complex, and thus more maintainable, simply because fewer “tweaks” are required to keep it working.

Recursive Makefiles have a great deal of repetition. Many projects solve this by using include files. By using a single Makefile for the project, the need for the “common” include files disappears - the single Makefile is the common part.

Is doing a full project build every time so absurd? If a change made in a module has repercussions in other modules, because there is a dependency the developer is unaware of (but the Makefile is aware of), isn’t it better that the developer find out as early as possible? Dependencies like this will be found, because the DAG is more complete than in the recursive case.

Breaking the set of files up into 100 modules, and running it as a recursive make takes about 25 seconds. The repeated process creation for the subordinate make invocations takes quite a long time.

I then learned some interesting facts about make that I hadn't come across before!

The include dependencies will be recomputed unnecessarily, or will be interpreted incorrectly. This is because make is string based, and thus “.” and “../ant” are two different places, even when you are in the ant directory. This is of concern when include dependencies are automatically generated - as they are for all large projects.
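
In other words, when generated dependency lines spell the same file two different ways, make treats them as two distinct vertices in the DAG. A hypothetical fragment, as seen from inside the ant directory:

```make
# Both prerequisites name the same header on disk, but make
# compares strings, so "parse.h" and "../ant/parse.h" are two
# different nodes as far as make is concerned.
main.o: parse.h
main.o: ../ant/parse.h
```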

The input language for Makefiles is deceptively simple. A crucial distinction that often escapes both novices and experts alike is that make’s input language is text based, as opposed to token based, as is the case for C or AWK. Make does the very least possible to process input lines and stash them away in memory.
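
The "case" referred to in the next excerpt is an assignment along these lines (the paper's listing is not reproduced in this post; this is a reconstruction):

```make
SRC = main.c parse.c

# A human reads this as "OBJ is main.o parse.o". With deferred
# (recursive) assignment, make actually stores the literal string
# "$(SRC:.c=.o)" and only expands it when OBJ is referenced.
OBJ = $(SRC:.c=.o)
```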

In this case humans expect make to assign two filenames to OBJ, but make actually assigns the string “$(SRC:.c=.o)”. This is because it is a macro language with deferred evaluation, as opposed to one with variables and immediate evaluation.

As a rule of thumb: always use immediate evaluation assignment unless you knowingly want deferred evaluation.
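
In GNU make, that means preferring `:=` over `=`:

```make
SRC := main.c parse.c

# ":=" expands the right-hand side once, at assignment time;
# OBJ immediately holds "main.o parse.o".
OBJ := $(SRC:.c=.o)

# "=" defers expansion until each use, so this is re-evaluated
# every time and silently picks up later changes to SRC.
OBJ_DEFERRED = $(SRC:.c=.o)
```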

Many Makefiles perform the same text processing (the filters above, for example) for every single make run, but the results of the processing rarely change. Wherever practical, it is more efficient to record the results of the text processing into a file, and have the Makefile include this file.
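
A sketch of that caching pattern (file and variable names are hypothetical): compute the expensive result once, write it to a fragment, and include the fragment. GNU make rebuilds the fragment only when its prerequisites change, then restarts and reads the cached result:

```make
# filelist.mk is regenerated only when the Makefile changes; every
# other run just includes the cached result instead of re-running
# the shell pipeline.
include filelist.mk

filelist.mk: Makefile
	echo "SRC := $$(ls *.c)" > $@
```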

Positive side effects of a single Makefile:

There are a couple of desirable side-effects of using a single Makefile.

  • The GNU make -j option, for parallel builds, works even better than before. It can find even more unrelated things to do at once, and no longer has some subtle problems.
  • The general make -k option, to continue as far as possible even in the face of errors, works even better than before. It can find even more things to continue with.

Miller ends with a killer argument:

Building your project is a fundamental activity. If it is performing poorly, so are development, debugging and testing. Building your project needs to be so simple the newest recruit can do it immediately with only a single page of instructions. Building your project needs to be so simple that it rarely needs any development effort at all. Is your build process this simple?