2018-11-13

mtime comparison considered harmful

tl;dr: Rebuilding a target because its mtime is older than the mtimes of its dependencies, like make does, is very error prone. redo does it better, and so can you.

A recent twitter discussion (pro tip: never do those) led me to realize that I have studied the problem of mtime comparison in considerably more depth than most people. I don't know whether to be proud of this or very concerned, but nevertheless, here we are. Soon, you'll know everything I do about the topic. I expect you will regret that as much as I have.

What is an mtime, anyway?

mtime is the "modified time" for the content associated with a given file. Generally, if anyone writes bytes anywhere in a file, the mtime will be updated. If a file has more than one name (ie. it's hardlinked to more than one place), all the names share the same inode and content, and thus all share the same mtime.

Annoyingly, when you update the content of a file, the mtime of its containing directory is not changed. All sorts of very convenient tree traversals would be possible if the directory mtime were updated (recursively to the root) when contained files changed, but no. This is probably because of hardlinks: since the kernel doesn't, in general, know all the filenames of an open file, it literally cannot update all the containing directories because it doesn't know what they are either. And anyway, purists might argue that the "content" of a directory doesn't change when the files it points to change; the content is merely a list of filenames and inode numbers, after all, and those stay the same, no matter what happens inside those inodes. Purists make me sad.
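
To see this concretely, here is a minimal sketch, assuming Python 3 and a POSIX-style filesystem; the directory and file names are arbitrary. Rewriting the file's contents bumps the file's mtime, but the containing directory's mtime stays where it was:

    import os, time

    os.makedirs('mtime_demo', exist_ok=True)
    path = os.path.join('mtime_demo', 'file.txt')
    open(path, 'w').close()               # creating the entry *does* touch the directory

    dir_before = os.stat('mtime_demo').st_mtime
    time.sleep(1.1)                       # stay clear of coarse timestamp granularity

    with open(path, 'w') as f:
        f.write('new content')            # rewrite the file's contents

    print(os.stat(path).st_mtime > dir_before)            # True: the file's mtime moved
    print(os.stat('mtime_demo').st_mtime == dir_before)   # True: the directory's did not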

(Random side note: on MacOS, the kernel does know all the filenames of a hardlink, because hardlinks are secretly implemented as fancy symlink-like data structures. You normally don't see any symptoms of this except that hardlinks are suspiciously slow on MacOS. But in exchange for the slowness, the kernel actually can look up all filenames of a hardlink if it wants. I think this has something to do with Aliases and finding .app files even if they move around, or something.)

Related to mtime is the ctime, which most people would guess means "create time," but it absolutely does not. It means "attribute change time," which is different from the "modified time" because it updates whenever various inode fields change, not just the file contents. mtime is one of the inode fields, so whenever mtime changes, ctime also changes, but not vice versa. Among other things, ctime changes when file ownership, size, or link count change.

Link count is especially interesting: if you create or delete a hardlink to a given file, its ctime changes. Renaming is defined as creating a new hardlink and then removing another one, which means it updates the ctime (not the mtime), even though when it finishes, the link count is back to normal so the inode looks unchanged (other than the ctime). (Whether rename's create and unlink are supposed to be a single atomic transaction is a subject of much debate.)
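
You can watch the ctime/link-count interaction with a small sketch (Python 3; the filename is made up, and the sleep is only there to outwait coarse timestamp granularity):

    import os, time

    open('ctime_demo', 'w').close()
    before = os.stat('ctime_demo')
    time.sleep(1.1)                            # make any ctime change visible

    os.link('ctime_demo', 'ctime_demo.link')   # link count goes 1 -> 2

    after = os.stat('ctime_demo')
    print(after.st_mtime == before.st_mtime)   # True: contents untouched
    print(after.st_ctime > before.st_ctime)    # True: the attribute change was recorded
    print(after.st_nlink)                      # 2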

So anyway, ctime changes much more sensitively than mtime. It turns out that mostly you don't care about the changes ctime measures, so it causes false positives, especially because of that pesky link count, but if you're paranoid, this can be helpful. Let's mostly talk about mtimes for now.

For completeness, there is also the atime, which means "access time." Originally, this would update whenever anyone "accessed" a file, usually defined as reading bytes from it. But this is unhelpful for two reasons: first, it means reading a filesystem causes writes to that filesystem, which greatly increases disk load (some people estimate by ~30%). Second, the definition of "access time" does not match what end users mean, which means various programs (especially backup software and search engines) try to avoid updating it. This workaround is so common that Linux added an O_NOATIME flag to open(2) to prevent updating atime. The default atime performance hit is so bad that many filesystems now have a relatime mount flag which decreases the precision of atime, thus reducing disk load. (Trivia: the Debian popularity-contest, which I started long ago, uses atime to figure out which installed packages you actually use.) (More trivia: if you mount your filesystem readonly, it is technically not POSIX compliant anymore because the atimes won't be updated.)

Popular misconceptions about mtime

  • How precise is it? It depends on the OS and filesystem. Originally, mtime had a precision of one second, which is all you can safely rely on. Nowadays most OSes have a stat(2) syscall that returns a struct timespec, which contains nanoseconds, but almost no filesystems provide that level of precision, and it depends on your kernel and disk format. For example, on my system (Debian Linux 4.9.0-7 with ext4), I get about 0.01s granularity. Stackoverflow has an explanation.

  • Is mtime monotonically increasing? No, it can go backwards. For example, the utimes(2) syscall, used by the touch command, can set the mtime to any value at all. (And tar might do this when extracting a tarball, for example.) If your system clock jumps from one time to another, it will set subsequent mtimes to match the new clock, even if the jump was backwards. And so on.

  • Does mtime get set to >= the current time? No, this depends on clock granularity. For example, gettimeofday() can return times in microseconds on my system, but ext4 rounds timestamps down to the previous ~10ms (but not exactly 10ms) increment, with the surprising result that a newly-created file is almost always created in the past:

      $ python3 -c "
      import os, time
      t0 = time.time()
      open('testfile', 'w').close()
      print(os.stat('testfile').st_mtime - t0)
      "
    
      -0.00234484672546
    
  • Does mtime get set to <= the current time? No, it might be set to a future time. For example, imagine you have an NFS server whose clock is set 5 seconds in the future relative to your client. The mtime is assigned by the server, so when you create the file, its mtime will be 5 seconds in the future. (Changing the standard so that mtime is set by the client doesn't really help: then programs running on the server will see a file 5 seconds in the past. And relying on ntpd isn't perfect either: it can only reduce clock skew between machines, not eliminate it.) For extra inconsistency, if a client uses utimes(2) to force the time to a particular value, this gets passed through to the server unchanged.

  • Is mtime always nonzero? No. Various cheaply-written virtual filesystems, like many fuse-based ones, don't bother setting mtime.

  • Does a changed mtime guarantee that a file has different content? No. Perhaps you wrote a block that happened to be identical to the block that already existed at that point in the file; the mtime changes anyway. Perhaps you wrote a block and then changed it back; the mtime changes twice.

  • Does changed content guarantee a changed mtime? No. Clock skew, low precision, or utimes(2) can cause an mtime to be the same as last time you checked. (This is also true for ctime, etc.)

  • Do version control systems like git save the mtime? No, not really. The tree and blob objects stored by git contain no timestamp information at all. (This is very good for deduplication.) commit objects contain various timestamps (commit time, author time, etc), and you could use that to reverse-engineer a guess for the mtime of a given file: the commit time of the most recent commit that changed that file's content, for example. But that's not what people do, mostly because it creates problems with make, which we'll get to shortly. (Git doesn't have the dangerous mtime-setting feature built in, but it does seem to exist in svn. You probably still shouldn't do it.)

    (This all creates interesting philosophical questions. Is the "last modified" time of a file the time when the new content itself was created, or when this particular instance of it was written to disk? If you had a sci-fi device that could make a perfect scan of my physical being and run me in a simulation, what would be the mtime of the input file? And so on.)

    (The bup project I started, which uses a git-formatted repo to back up your filesystem, does need to save mtime and other metadata. It stores metadata in separate hidden files in the git tree and reapplies it at restore time.)

  • Does switching branches in git screw up the mtime? No, not more than anything else. git just rewrites the changed files and lets the kernel update the mtime, so they look as if someone edited them with a text editor.

  • Does writing to a file via mmap() update the mtime? Hah. Well, maybe. See, POSIX guarantees that the mtime "will be marked for update at some point in the interval between a write reference to the mapped region and the next call to msync() ... If there is no such call, these fields may be marked for update at any time after a write reference." This definition actually leaves a lot of leeway for weirdness.

    I wrote a little test program (mmap_test.c) to check how this works nowadays, and, of course, it varies between OSes. On Linux (4.9.0, ext4), the mtime is updated at the first page dirty after an mmap() or msync(). On FreeBSD (11.2, ufs), it's updated at msync() or munmap() time. On MacOS (10.11.6), it updates only at msync() time, not at munmap() time. I even tried with the "WSL personality" (4.4.0-17134-Microsoft) on Windows 10, which had especially terrible results: mmapped writes never updated the mtime at all. (A rough Python analogue of this test appears just after this list.)

    I think the MacOS behaviour is allowed because the spec says "may" instead of "will" in that second sentence, but it's a stretch. The Linux behaviour may be illegal depending how you define "a write reference"; Linux seems to interpret it as "the first" or "a randomly selected" write reference, while I would expect to interpret it as "each" write reference (with the result that mtime must be updated at least once between the last reference and the msync(), which would be fine).

    Of all these, the only useful behaviour seems to be FreeBSD's; at minimum, we surely want mtime to be updated at least once after all changes to a file have been done. MacOS and Linux don't always do so, and WSL never does so. This lends credence to the claim that the .git/index file, which uses mmap, is synced incorrectly by file sync tools relying on mtime. Ironically, the faster and better the file sync tool, the more likely it is to hit the race condition. An easy fix would be to have git always write() a useless byte before closing the index file. But I'd prefer if the kernel were less dumb.
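
Here's that rough Python analogue of the mmap experiment (a sketch only, not the author's mmap_test.c; it assumes Python 3 and a local filesystem). It dirties a page through the mapping and reports whether the mtime has moved at each stage:

    import mmap, os, time

    def mtime(path):
        return os.stat(path).st_mtime

    path = 'mmap_demo'
    with open(path, 'wb') as f:
        f.write(b'\0' * 4096)

    t0 = mtime(path)
    time.sleep(1.1)                       # dodge coarse timestamp granularity

    with open(path, 'r+b') as f:
        m = mmap.mmap(f.fileno(), 4096)
        m[0:5] = b'hello'                 # a write reference through the mapping
        print('after dirty :', mtime(path) != t0)
        m.flush()                         # msync()
        print('after msync :', mtime(path) != t0)
        m.close()                         # munmap()
        print('after munmap:', mtime(path) != t0)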

Okay! That's the introduction. Now let's move on to application.

mtimes and make

I've kinda ruined the surprise by listing the caveats above. But let's look at what all that means when we try to use mtime for something.

make dependencies work in a very simple way. Now that we, as an industry, have decades of experience learning all the above caveats, we might describe it as "naive" in the sense that, when make was first invented, nobody had heard of all these problems, so it would be unfair to expect the author to design around them. In the world where make was first written:

  • there was no NFS;
  • there was no mmap;
  • there was no version control;
  • there were no fuse filesystems;
  • computers and compilers were so slow that a one-second timestamp granularity was never a problem.

In that world, they made the seemingly obvious decision to rebuild any target if the mtime of any of its dependencies was > the mtime of the target. (If you want to be extra safe in the presence of granularity problems, rebuild if >= rather than >.) This was an exciting innovation at the time.
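
The whole freshness check boils down to something like this minimal sketch (Python 3, and of course not make's actual source; the function name is made up):

    import os

    def needs_rebuild(target, deps):
        if not os.path.exists(target):
            return True                   # no target yet: always build
        target_mtime = os.stat(target).st_mtime
        # Plain make effectively compares with ">"; the extra-safe variant
        # described above uses ">=" to survive coarse timestamp granularity.
        return any(os.stat(dep).st_mtime > target_mtime for dep in deps)

    # e.g. needs_rebuild('foo.o', ['foo.c', 'foo.h'])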

Unfortunately, we now know that this can lead to numerous mistakes:

  • With NFS and clock skew, if a source file is edited on one machine and you run make on another, the input file might have mtime < target mtime, so nothing will happen. Or, you might rebuild the target and its mtime will still be < source mtime, so it'll be rebuilt again later.

  • If you accidentally set your system clock ahead by a day and build some stuff, then set your clock back to the present, all the stuff you built during that time will show up as "in the future" and therefore newer than source files you edit today, preventing all rebuilds. (Eventually GNU make started detecting future-dated files and printing a warning.)

  • If you have files modified through mmap, the mtime might not be up to date. (Luckily mmap is a rarity when editing source files or building software. Usually you aren't directly using live databases as your source files.)

  • If you replace one source file with another, eg.

      mv foo.c foo.c.bak
      mv foo.c.new foo.c
    

    then the mtime is not updated, and make will see the old mtime of foo.c.new. That might be older than your foo binary, even though the binary does not yet contain the new foo.c. It won't be rebuilt. (A short demonstration of this appears after this list.)

  • If you have a dependency like

      foo.a: $(patsubst %.c,%.o,$(wildcard *.c))
    

    (ie. produce foo.a from all the .o files built from all the C source files), then if one of the source files is deleted, it will no longer be one of the dependencies at all. But all the remaining dependencies are still older than foo.a, so foo.a will not be rebuilt.

  • If you put automake/autoconf-generated files (like ./configure and Makefile) in version control, you can get surprising results. Let's say automake has a Makefile rule to regenerate Makefile whenever the automake input files (eg. Makefile.am) change. In a tarball, which preserves mtimes, this will work, because Makefile will be newer than Makefile.am. But in a version control system, which uses the default kernel-assigned mtime when writing the files, it's undefined whether Makefile or Makefile.am is written first. If your timestamps are high precision (or they're low precision and you get unlucky), then Makefile could be "older" than Makefile.am, and automake will try to run anyway. Or if not, then it won't. So different people checking out the same source code will get different results based on random luck.

  • Computers are now so fast that you can save foo.c in your editor, and then produce foo.o, and then compile foo, all in the same one-second time period. If you do this and, say, save foo.c twice in the same second (and you have one-second granularity mtimes), then make can't tell if foo.o and foo are up to date or not. (As above, make can work around this by assuming if source mtime == target mtime, the target still needs to be rebuilt. This could cause spurious rebuilds, but is less dangerous than missing rebuilds.)

    (This often happens if you're using one of those fancy new inotify-based tools that fires off a compile immediately, every time you hit save in your editor. Typescript does something like this, for example, as do auto-reloaders for various modern web languages. Symptom: needing to save your source file twice before the autocompiler catches it. And it happens more on MacOS, which has 1-second mtime granularity, than on Linux, which has 0.01-second mtimes.)

  • If your source files are in a virtual filesystem where mtime is always 0, then make will always think your source files have not changed and the target will never rebuild.
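
Here's the promised demonstration of the file-replacement pitfall (a hedged sketch in Python 3, reusing the same made-up filenames as the example above): the replacement source keeps its own old mtime, so a naive mtime comparison decides the target is still up to date.

    import os, time

    open('foo.c', 'w').close()
    time.sleep(0.1)
    open('foo', 'w').close()                   # pretend we just built the target

    # foo.c.new was written a while ago, so its mtime predates the target.
    with open('foo.c.new', 'w') as f:
        f.write('completely different source\n')
    old = time.time() - 3600
    os.utime('foo.c.new', (old, old))

    os.replace('foo.c.new', 'foo.c')           # the "mv foo.c.new foo.c" step
    print(os.stat('foo.c').st_mtime > os.stat('foo').st_mtime)
    # False: by make's rule, foo looks up to date even though foo.c changed.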

While we're here, there are some other common problems that aren't really the fault of mtime, but are common dependency problems with make:

  • If you upgrade your toolchain (eg. your C compiler), make doesn't know to rebuild your source files, unless you declare an explicit dependency on the toolchain files, which nobody does because it's hard to write that system-dependent stuff as a Makefile dependency rule. (This is one reason autoconf needs to be a ./configure script that generates a Makefile, instead of just a dependency executed by your Makefile.)

    For that matter, when you update your toolchain, it's often from a distro-provided package (basically a tarball) with timestamps helpfully in the past, which are probably older than all your output files. So make won't see it as updated anyway!

  • If you pass variables on the make command line, like CFLAGS=-O2, they will usually not be part of a dependency and so won't cause a rebuild, and you'll end up with programs built halfway with the old flags, and halfway with the new ones. You can fix this by writing CFLAGS to a file, atomically replacing it only if the content differs, and depending on that file (sketched after this list). But nobody does.

  • If you modify the Makefile, make will not by default rebuild any targets. You can fix this by adding an explicit dependency on Makefile, but this is a giant pain during development, because Makefile contains all your build rules; you don't want to recompile every source file just because you changed the linker command line, for example. (Some nowadays-rare versions of make actually tried to track Makefile changes, per rule, and cause rebuilds for these cases.)
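
The write-the-flags-to-a-file trick mentioned above can be tiny. Here's one hedged way to do it (a Python 3 sketch; the cflags.dep filename and the helper name are invented): the file is only replaced when its content actually changes, so its mtime only moves when the flags do.

    import os

    def write_if_changed(path, content):
        try:
            with open(path) as f:
                if f.read() == content:
                    return                     # unchanged: leave the mtime alone
        except OSError:
            pass
        tmp = path + '.tmp'
        with open(tmp, 'w') as f:
            f.write(content)
        os.replace(tmp, path)                  # atomic rename over the old file

    # e.g. write_if_changed('cflags.dep', 'CFLAGS=-O2\n'), then have targets
    # depend on cflags.dep instead of a raw command-line variable.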

make is not the only program that is affected by naive use of mtime. It's fairly common. For example, Go had so much trouble that they recently changed the Go compiler to just read and hash all the input files every time it runs. (Thanks to bradfitz for this link.)

redo: mtime dependencies done right

I happened to be aware of all these problems (well, not the mmap() madness; bleah!) when I set out to write redo so many years ago. I was also influenced by djb's design for redo, in which he writes, "When redo is asked to create a file that it hasn't heard of before, it presumes that the file is a source file if it exists, or a target file otherwise. In the second case (new target), redo immediately saves this decision to disk."

In other words, redo's design fundamentally depends on keeping a database of targets, if only to remember which files were produced by redo and which were not. From there, it's easy enough to extend that database to include mtime information about sources. And from there, we can add a bit more metadata to make the timestamp even more reliable.

My implementation of redo remembers the following information about each source and target (a minimal stamping sketch follows the list):

  • mtime
  • size
  • inode number
  • file mode
  • owner uid and gid
  • (targets only) the sequence number of the last time it was built
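
Collecting those attributes is basically one stat() call per file. Roughly (a minimal Python sketch, not redo's actual code):

    import os

    def stamp(path):
        st = os.stat(path)
        return (st.st_mtime, st.st_size, st.st_ino,
                st.st_mode, st.st_uid, st.st_gid)

    def is_dirty(path, old_stamp):
        # Any difference at all counts; we never ask which mtime is "newer".
        return stamp(path) != old_stamp

    # At build time, remember stamp() for every input; on the next run,
    # is_dirty() decides whether the target needs rebuilding.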

redo considers a dependency dirty if any of those attributes changed since the last time a target was built. Notice how this dodges the various problems of mtime skew:

  • NFS client/server time skew doesn't matter; as long as the mtime changes in any direction, it's fine.

  • mmap() weirdness is reduced, because we notice changes in file size, as well as source mtimes that changed but are still older than the target.

  • If you mv a file to replace another, it will have a different inode number, which we notice. It also probably has a different size and (even if not newer than the target) mtime, any of which are sufficient.

  • Because redo has a database of all the dependencies used to produce a given target, if one of those inputs disappears, the target needs to be rebuilt. make doesn't remember the dependencies used last time, it only remembers the dependencies declared this time, so it can miss important changes in the list of dependencies.

    (More generally, it's an interesting mathematical phenomenon that to correctly build software, we need to know not only the dependencies as they are now, but as they were before. Those two lists are used very differently. I don't think most build systems are designed with this realization, and it leads to subtle failures.)

  • If you put autoconf/automake generated files in your source repo, redo will "presume that the file is a source file," make a note of that, and not rebuild it. (It's still probably not a great idea to check those into version control. But at least now your build system won't go crazy.) If you then delete them, redo will consider them targets to be built.

  • redo has special treatment of source files whose mtime == the target mtime, so it can correct for overlaps even when your filesystem has very coarse timestamp granularity. Also, if you continue editing a source file, it will usually end up with a changed size, which also marks it as changed.

  • If your source files are in a braindead fuse filesystem, redo can use inode number and size to detect changes (although it still sucks and you should fix your fuse filesystem).

We can also fix the non-mtime-related missing dependencies:

  • It's easy to declare dependencies on your toolchain, because the rule for each target can track which parts of the toolchain were used while building, then retroactively declare a dependency on those. And we still notice a change if the new mtimes are in the past.

  • redo doesn't allow you to set variables on the command line; you have to write them to a file instead. This lets you easily declare dependencies on the file.

  • Since rules are written in separate .do files instead of one big Makefile, it's reasonable for redo to auto-declare a dependency on the .do file it used for a given target. When you edit a rule, the affected targets are automatically rebuilt.

I mentioned above that the Go compiler had problems with naive mtime-based dependency checking. I don't expect Go to switch to redo, but they could solve their problems in a similar way: generate a "database" (which might just be a text file) at build time. In the database, list the source files and their stamp information (mtime, inode, etc). Also list the toolchain version and relevant command line flags. Next time, read the database and compare against the new list of source files, the new stamps, and the new flags. If any are different, run the build. (Of course, all this is just a performance optimization that allows the compiler to avoid opening and reading files unnecessarily. The Go developers might reasonably continue to opt for the slower choice with fewer edge cases.)
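
Concretely, such a database could be as simple as a JSON manifest. Here's a hypothetical sketch (the file layout, field names, and helper names below are invented for illustration, not anything the Go toolchain actually does):

    import json, os

    def manifest(sources, toolchain_version, flags):
        def fields(path):
            st = os.stat(path)
            return [st.st_mtime, st.st_size, st.st_ino]
        return {'toolchain': toolchain_version,
                'flags': flags,
                'stamps': {path: fields(path) for path in sources}}

    def must_rebuild(db_path, sources, toolchain_version, flags):
        try:
            with open(db_path) as f:
                old = json.load(f)
        except (OSError, ValueError):
            return True                        # no usable database yet: build
        return old != manifest(sources, toolchain_version, flags)

    def record(db_path, sources, toolchain_version, flags):
        with open(db_path, 'w') as f:
            json.dump(manifest(sources, toolchain_version, flags), f)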

Why not use checksums instead of mtimes?

Inevitably when the discussion of build dependencies comes up, someone who has heard part of the above story (usually some of the make problems caused by mtime comparisons) suggests throwing away mtimes entirely and always doing dependencies based on file checksums.

This can work, sometimes. And wow, I love checksums a lot (I wrote bup after all). But it isn't perfect for every situation.

As a clue to how complicated this can get: most people talking about this option suggest checksums as a way to avoid false negatives, ie. failing to rebuild when a source file has changed. But inode attributes change, in theory, at least as often as the content hash changes. Checksums are more useful for reducing false positives (ie. to avoid rebuilding in situations where we know the output will be identical). If someone is talking to you about rebuilding based on checksums, ask if they have thought about that difference.

Anyway, here are some specific problems with checksum-based dependencies:

  • Sometimes building a target has side effects. For example, imagine you have a redo rule for deploying a container to AWS. This does not really produce a "file" locally that you can checksum; it usually produces just log messages, or blank output, and the checksum of that will usually not change. Now, imagine you have a second container that you want to deploy only if the first container gets deployed correctly. If the checksum of the first container deployment is unchanged, the second one will think all its dependencies are unchanged, and not run, which might be incorrect. There are numerous other examples of side effects where this always-use-checksums behaviour is undesirable.

    (On the other hand, some systems out there, like blaze/bazel, specialize in build systems without side effects. In that case a pure-checksum system is more appropriate. But then you have to escape from such systems if you want to do fun stuff like deploying containers. You end up punting the dependency problem elsewhere.)

  • Checksumming every output after building it is somewhat slow. This requires the build system to read the whole content of the file and do some math on it. Mostly this is not too serious: the file is probably already in disk cache (since you just wrote it a moment ago!) and calculating a checksum is almost always much faster than generating the file in the first place. And it only happens when a build was needed, which is expensive anyway. But it does add time to every build step.

  • Checksumming every input file before building is very slow. If you're considering whether to rebuild foo.a, and foo.a depends on *.o, and each .o depends on each .c, then you have to checksum the full content of every .c file every time you consider making an incremental build of foo. In large projects, this could be thousands, or tens of thousands of files, each of which we have to open(), read(), checksum, and close(), possibly over a network filesystem. For small projects this is fine, but for large projects, this sucks a lot.

    blaze/bazel come from a world where source files are stored in a virtual filesystem, which happens to have the ability to tell you a precalculated checksum for every source file (except the ones you've changed locally). If you only have to checksum your locally-changed files on each build, that'll be very fast. But you need filesystem support to make this possible, and we can't assume that everywhere.

redo does support checksum-based dependencies, but it avoids the above problems as much as possible:

  • If you do nothing, redo uses database-mtime-based dependency checking, which is extremely fast on all operating systems. It's even reasonably fast on NFS.

  • redo-stamp lets you provide, after building a target, the data used to calculate that target's checksum (which might differ from the target itself, if you want).

  • redo-stamp records the checksum in its database after building a target. Any downstream target remembers that checksum in its list of dependencies; if it changes later, then the downstream target needs to be rebuilt. There is no need to actually recalculate any checksums when checking dependencies in the future. No special filesystem support is needed.

So you can use redo-stamp, in appropriate places, to reduce false positives in a way that causes overhead only at build time (not for checking dependencies later), and only for targets that need it.

That mmap() behaviour though. Seriously.
