Four-way list
8–
Purpose
Today I publish a cute little program that I wrote back in 1994 and still use quite often. Its purpose is to compare two sets of files in two directory trees that have the same structure.
4waylist.c
is a self-contained C source, so it can be compiled by the command
cc 4waylist.c -o 4waylist
for example. Usage:
4waylist [-prefixlength] leftfile ritefile
Output
The description goes from output to input. The program produces four text files containing file paths, each with a two-letter filename.
eq, equal: Files that occur in both file sets, and have the exact same contents.
ne, not equal: Files that occur in both file sets, but have different contents.
le, left: Files that occur only in the file set, the counts and paths of which were specified as the first command line argument.
ri, right: Files that occur only in the file set, the counts and paths of which were specified as the second command line argument.
Input
Input in each of the two files should look like this, for example:
CRC C8EA79E8 Size 5127 ruud/4waylist/4waylist.c
CRC 40F3E7C7 Size 759 ruud/4waylist/README-locale.htm
CRC 1D2D0493 Size 2448 ruud/4waylist/filecrc.c
So each line starts with a 28-byte prefix containing the CRC32 and the byte count of the file, followed by the path.
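With input in this shape, the four-way split comes down to one comparison per step of a merge over the two sorted lists. The sketch below is not taken from 4waylist.c. It is a minimal illustration, assuming each line has already been read into a string that is at least as long as the prefix, and the names fourway and classify are made up for the example. Paths are compared first; only when they match are the prefixes (CRC and size) compared.

```c
#include <string.h>

/* Hypothetical result codes, named after the four output files. */
enum fourway { FW_EQ, FW_NE, FW_LE, FW_RI };

/*
 * Classify the current line of the left list against the current line
 * of the right list. prefixlen is the length of the "CRC ... Size ..."
 * prefix (28 in the example input). Both lines are assumed to be at
 * least prefixlen bytes long.
 */
enum fourway classify(const char *left, const char *right, size_t prefixlen)
{
    /* strcmp compares byte values, so the input must be sorted that way. */
    int order = strcmp(left + prefixlen, right + prefixlen);

    if (order < 0)
        return FW_LE;   /* left path sorts first: not in the right list */
    if (order > 0)
        return FW_RI;   /* right path sorts first: not in the left list */

    /* Same path on both sides: equal only if CRC and size match too. */
    return memcmp(left, right, prefixlen) == 0 ? FW_EQ : FW_NE;
}
```

A driver loop would write the left line to le and advance the left list on FW_LE, do the mirror image on FW_RI, and otherwise write the path to eq or ne and advance both lists. Whatever remains in one list when the other runs out goes to le or ri.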
How to obtain the input
Input for 4waylist can be obtained like this, for
example:
find . | LC_ALL=C sort | filecrc > ~/thisfileset
You should run this on both file sets, of course. They can reside
on the same machine or on different machines, as long as you have a
way to bring the results together so that 4waylist
can use them.
filecrc
filecrc.c
for computing the CRC32s is a little program I wrote two days ago,
using code from
Wikipedia.
It is compatible with a program bincrc that I
wrote in the second half of the 1980s, as part of a package
binmnt for binary maintenance. I no longer use
that and I won’t publish it, because rsync is
way better and more powerful.
bincrc.c used freely usable code that was © 1986 by
Gary S. Brown.
I later polluted binmnt with some sort of encryption, and
stupidly I can no longer find the original, simpler sources. Hence
the recent rewrite of bincrc.c as filecrc.c.
filecrc, like bincrc, can take its filenames
from stdin, or as command line arguments.
Both programs write their output (CRC, size, name) to stdout,
ready for redirection to a file. In addition, they write file size
and path to stderr, with a \r at the end of
each line, so in a lengthy operation involving many and/or large files,
the user will see that the program is still active, and what progress
it is making.
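For readers who want to see what such a checksum involves, here is a minimal bitwise sketch of the common CRC-32 variant (reflected polynomial 0xEDB88320, initial value and final XOR of all ones). It is not the code from filecrc.c, and the text above does not state which CRC parameters filecrc and bincrc use, so treat the variant as an assumption. The function name crc32_update is made up for the example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Update a running CRC-32 with len bytes from buf.
 * Start with crc = 0; the result of one call can be passed to the next,
 * so a file can be processed in chunks.
 */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    crc = ~crc;                         /* undo the final XOR of the previous call */
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* Shift right; XOR in the polynomial if the low bit was set. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;                        /* final XOR with all ones */
}
```

A table-driven version does the same work one byte at a time instead of one bit at a time, which matters when many large files are involved.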
Collating order
In the olden days when 4waylist was written, there was
only an implicit C or POSIX locale, and there were only characters,
not wide characters. Or maybe all of that already existed and
I just wasn’t aware of it. That means 4waylist compares
strings by byte value. To achieve any meaningful results, input
files must also be in that order. So if a locale is active that
sorts uppercase and lowercase letters together, despite their
very different byte values, that will disrupt the algorithm.
Hence the recommended LC_ALL=C sort .
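To make the ordering requirement concrete, the following sketch checks whether a list of paths is in ascending byte order, which is the order strcmp uses. The function name is_byte_sorted is hypothetical and not part of 4waylist; it only illustrates the kind of input the byte-wise comparison expects.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if paths[0..n-1] are in ascending byte order, 0 otherwise.
 * strcmp compares bytes as unsigned values, so every uppercase ASCII
 * letter (0x41..0x5A) sorts before every lowercase one (0x61..0x7A).
 */
int is_byte_sorted(const char *const paths[], size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (strcmp(paths[i - 1], paths[i]) > 0)
            return 0;   /* an earlier entry sorts after a later one */
    }
    return 1;
}
```

For example, the sequence Makefile, README, a.c passes this check, while the dictionary-like sequence a.c, Makefile, README does not.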
File types
filecrc will skip anything that is not a regular file,
so something like
find . -type f
isn’t necessary. If you want to include paths findable through
symbolic links, you’ll probably have to use find
option -L .
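Skipping non-regular files can be done with a check along these lines on a POSIX system. This is a sketch under assumptions, not the code from filecrc.c: the name is_regular_file is made up, and it uses stat(), which follows symbolic links. Whether filecrc itself uses stat() or lstat() is not stated here.

```c
#include <sys/stat.h>

/*
 * Return 1 if path names a regular file, 0 for directories, devices,
 * sockets and so on, and also 0 if the path cannot be examined at all.
 * stat() follows symbolic links, so a link to a regular file counts
 * as regular in this sketch.
 */
int is_regular_file(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;                       /* missing or inaccessible: skip */
    return S_ISREG(st.st_mode) ? 1 : 0;
}
```

With a check like this in place, directory names produced by a plain find . are simply ignored, which is why no -type f filter is needed.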
Copyright © 2025 by R. Harmsen, all rights reserved.