Four-way list


Purpose

Today I publish a cute little program that I wrote back in 1994 and still use quite often. Its purpose is to compare two sets of files in two directory trees that have the same structure.

4waylist.c is a self-contained C source, so it can be compiled by the command
cc 4waylist.c -o 4waylist
for example. Usage:
4waylist [-prefixlength] leftfile ritefile

Output

The description goes from output to input. The program produces four text files containing file paths, each with a two-letter filename.

Input

Input in each of the two files should look like this, for example:

CRC C8EA79E8 Size      5127 ruud/4waylist/4waylist.c
CRC 40F3E7C7 Size       759 ruud/4waylist/README-locale.htm
CRC 1D2D0493 Size      2448 ruud/4waylist/filecrc.c

So we see a 28-byte prefix containing the CRC32 and the byte count of each file, followed by the path.
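As an illustration of that layout, here is a minimal C sketch that splits one input line into its parts. It is not taken from 4waylist.c: the names PREFIX_LEN, struct record and parse_record are hypothetical, and the value 28 is simply the prefix length seen in the sample lines above.

```c
#include <stdio.h>
#include <string.h>

#define PREFIX_LEN 28   /* assumption: prefix length of the sample lines */

/* Hypothetical record type; the field names are illustrative only. */
struct record {
    unsigned long crc;   /* the eight hex digits after "CRC " */
    unsigned long size;  /* the decimal byte count after "Size" */
    const char *path;    /* points into the original line, after the prefix */
};

/* Returns 1 on success, 0 if the line does not match the expected layout. */
int parse_record(const char *line, struct record *out)
{
    if (strlen(line) <= PREFIX_LEN)
        return 0;                       /* no room for a path */
    if (sscanf(line, "CRC %8lx Size %lu", &out->crc, &out->size) != 2)
        return 0;                       /* prefix fields not found */
    out->path = line + PREFIX_LEN;      /* everything after the prefix */
    return 1;
}
```

Because the prefix has a fixed width, the path can be found by plain pointer arithmetic, which is presumably why the prefix length is the only thing a comparison tool needs to know about it.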

How to obtain the input

Input for 4waylist can be obtained like this, for example:
find . | LC_ALL=C sort | filecrc > ~/thisfileset

You should run this on both filesets, of course. They can reside on the same machine or on different ones, as long as you have a way to bring the results together so 4waylist can use them.

filecrc

filecrc.c, which computes the CRC32s, is a little program I wrote two days ago, using code from Wikipedia. It is compatible with a program bincrc that I wrote in the second half of the 1980s, as part of a package binmnt for binary maintenance. I no longer use that and I won’t publish it, because rsync is way better and more powerful.

bincrc.c used freely usable code that was © 1986 by Gary S. Brown.

I later polluted binmnt with some sort of encryption, and stupidly I can no longer find the original, simpler sources. Hence the recent rewrite of bincrc.c as filecrc.c.
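For readers who have never seen a CRC32 computed, here is a minimal bitwise sketch in C. It assumes the common CRC-32 parameters (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF); whether filecrc.c uses exactly these, or a table-driven variant, is not stated here. The function name crc32_bytes is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 over a buffer, without a lookup table.
   Assumed parameters: reflected polynomial 0xEDB88320,
   initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
uint32_t crc32_bytes(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; apply the polynomial if a 1 bit fell out. */
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

With these parameters the nine ASCII bytes "123456789" give 0xCBF43926, the usual check value for this CRC variant. A table-driven version does the same work one byte at a time instead of one bit at a time.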

filecrc, like bincrc, can take its filenames from stdin, or as command line arguments.

Both programs write their output (CRC, size, name) to stdout, ready for redirection to a file. In addition, they write file size and path to stderr, with a \r at the end of each line, so in a lengthy operation involving many and/or large files, the user will see that the program is still active, and what progress it is making.
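The two output streams could be sketched as follows. This is not the code of filecrc.c: the format string is an assumption chosen to reproduce the sample lines (eight hex digits, a nine-column size field, a 28-byte prefix), the stderr format is a guess, and the names format_record and report_progress are hypothetical.

```c
#include <stdio.h>

/* Format one result line (without newline) into buf.
   Assumed layout: "CRC xxxxxxxx Size nnnnnnnnn path".
   Returns what snprintf returns: the length the line needs. */
int format_record(char *buf, size_t bufsize,
                  unsigned long crc, unsigned long size, const char *path)
{
    return snprintf(buf, bufsize, "CRC %08lX Size %9lu %s", crc, size, path);
}

/* Progress line for the terminal: ends in \r rather than \n, so the
   next progress line overwrites it instead of scrolling the screen. */
void report_progress(FILE *err, unsigned long size, const char *path)
{
    fprintf(err, "%9lu %s\r", size, path);
    fflush(err);    /* make sure it shows up right away */
}
```

A caller would write the formatted record plus a newline to stdout and pass stderr to report_progress, so redirecting stdout to a file leaves only the progress lines on the terminal.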

Collating order

In the olden days when 4waylist was written, there was only an implicit C or POSIX locale, and there were only characters, not wide characters. Or maybe all of that did already exist but I wasn’t aware of it. Either way, 4waylist compares strings by byte value. To achieve any meaningful results, the input files must be in that same order. So if a locale is active that sorts uppercase and lowercase letters together, despite their very different byte values, that will disrupt the algorithm.
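To make the byte-value ordering concrete, here is a small C sketch that checks whether a list of paths is in the order strcmp expects. It is only an illustration, not part of 4waylist.c, and the function name is_bytewise_sorted is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if paths[0..n-1] are in non-descending byte order
   (the order strcmp uses), 0 otherwise. */
int is_bytewise_sorted(const char *const *paths, size_t n)
{
    for (size_t i = 1; i < n; i++)
        if (strcmp(paths[i - 1], paths[i]) > 0)
            return 0;   /* predecessor sorts after successor */
    return 1;
}
```

In byte order, Makefile and README come before main.c, because the uppercase ASCII letters have lower byte values than the lowercase ones. A locale that sorts letters case-insensitively would typically put main.c first, and this check would then reject the list.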

Hence the recommended LC_ALL=C sort.

File types

filecrc will skip anything that is not a regular file, so something like
find . -type f
isn’t necessary. If you want to include paths findable through symbolic links, you’ll probably have to use find option -L.
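A check of this kind can be sketched in C with the POSIX stat call. This is an assumption about the general technique, not the actual code of filecrc.c; in particular, whether filecrc uses stat (which follows symbolic links) or lstat (which does not) is not stated here. The function name is_regular_file is hypothetical.

```c
#include <sys/stat.h>

/* Returns 1 if path names a regular file, 0 for directories, devices,
   FIFOs, sockets, or anything that cannot be examined at all.
   stat() follows symbolic links, so a link to a regular file counts. */
int is_regular_file(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;               /* nonexistent or inaccessible */
    return S_ISREG(st.st_mode) ? 1 : 0;
}
```

With a test like this in front of the CRC computation, directory names coming out of find are silently dropped, which is why find needs no -type f.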