Fixing a hole where the rain gets in

2 November 2014

Introduction

The title of course is a quote from the Beatles’ song. Not very well chosen, because my article has nothing to do with rain, but is about files and Unix. Sorry, nothing better came to mind.

Where the hole comes into the story isn’t easy to say in a few words. You’ll find out if you ever manage to finish reading and don’t judge my article as “TL;DR” (IHUA!).

The directory name, truncexp, is short for Truncate Experiment. That doesn’t quite describe it either. OK, enough unclarity for now, let’s begin.

From scratch, from memory

For reasons irrelevant to the article, I recently recommissioned my web and mail server VPS (Virtual Private Server) from scratch, installing a fresh FreeBSD 10.0 after having used earlier versions of that same operating system until then.

Contrary to my usual modus operandi (sorry, in Dutch only) I didn’t do the new installation on a new server keeping the old one live until being done. Instead I did the fresh install on the live server, cold turkey. So there was some downtime, no website, no e-mail. I didn’t mind this time.

Of course I had backups. But they proved to be incomplete. So I ended up having to reconstruct my crontabs (as root and as myself) from memory. Stupid me. I did manage though.

Reducing the log file

I don’t want Apache’s log file to keep growing and growing. So once every night I have superuser root reduce it to say 5000 lines. I couldn’t quite remember how I used to do that in the earlier incarnations of my server, and I thought of this:

15 3 * * * cd /usr/local/www/apache24/logs; tail -n 5000 access_log > a; mv a access_log

But that doesn’t work. It results in the log file getting frozen in its quarter past three state, and no new log files lines are ever noticeably written to it (although they are!). Why’s that?

Writing to a black hole after moving

To understand why moving causes an apparent file freeze, we have to look at what a Unix mv (move) actually does. Or what I suppose it does, I never actually opened any source code.

To me it seems reasonably likely that mv x y first tries to rm (actually unlink) file y, just in case it existed. Then x is linked under the new name y also, so now the file known so far as x, is accessible under two names, x and y.

The last step is to unlink the name x, so its contents and attributes and everything else that may somehow belong to it, lives on under its new name y, and the mv (move, or rename) has effectively taken place.

If you do that to an Apache log file, as I did, the command mv a access_log, where a is a partial copy of the original log file, created by tail, the new file access_log won’t have the inode number and birth date (on an OS and filesystem that supports the latter) of the original log file. Instead, it gets those of the partial copy.

What’s more, what is usually felt as a delete action, Unix rm (remove), is actually just an unlike. That is, the name – in fact a name, one of the names – of the file is removed from the directory it resides in.

Only if a file ends up with zero names, and no process still has the file open, is it actually deleted or removed. Or as the manual page to unlink(2) (which can be invoked by the command man 2 unlink) so nicely puts it: “all resources associated with the file are reclaimed.”

In the situation under consideration, this means that mv a access_log doesn’t really delete the existing, larger log file that I wanted to reduce to just the most recent lines. Apache 2.4 still has the file open, so the file remains in existence and can be written to without error, which is what Apache keeps doing. (I later verified in my little experiment program (see below), that in this situation no error code is returned.)

But because the shorter file a has taken over the name access_log, and the original log file is left with no name at all, there is no way to access that file, for no directory entry exists to it anywhere. Apache more or less writes its log lines to a black hole, never to be seen by anyone, human or machine in nature. (Note that this is a different kind of black hole than /dev/null .)

Pretty stupid. Not a good solution.

The feline approach

My next idea was a feline approach also known as the cat method, as follows:

15 3 * * * cd /usr/local/www/apache24/logs; tail -n 5000 access_log > a; cat a > access_log; rm a

This way I did not put the shortened file in the place of the older one, but I wrote its contents over the existing log file. So the log file remained the same file, had the same inode number as before and remained accessible under the same name, access_log.

This worked well, and has worked well for years, because after reinventing this method, I realised it was what was there already on previous versions of my VPS.

Yet, after a night’s sleep, I wondered why it worked and if it really did work as intended. I wanted to know more about it. I have been pretty familiar with UNIX file operations already since my first encounter in 1985, but it’s never too late to detail one’s knowledge.

So I wrote some silly little C programs to do some testing to find out what really happens.

Not a black hole, but still a hole

Using my test programs, I had loglines write 600 lines (I cut the more realistic situation by a factor ten) of nonsense log file information (log file opened in "w" mode), just to get some substance to the file. Then once every 10 seconds or so, loglines writes an additional line.

Meanwhile, I used wrimod to copy the most recent 400 files to a temp file, and then use those contents as the new logging information, created by various methods.

Now when using the ‘cat method’ (invocation wrimod -c, or doing the same thing directly, option -w), the result was a file with a hole in it!

That is, it starts with line number 205 or so, through 604 (making the total of 400 lines that Unix command tail selected), and then it goes on with 607 for example, written by program loglines later on. But when you look at the byte positions, and the line numbers calculated from that, you can see something strange:

Writing a line to the logfile, count =   602 (calc'd:   602), position =     49966
Writing a line to the logfile, count =   603 (calc'd:   603), position =     50049
Writing a line to the logfile, count =   604 (calc'd:   604), position =     50132
Writing a line to the logfile, count =   607 (calc'd:   607), position =     50381
Writing a line to the logfile, count =   608 (calc'd:   608), position =     50464
Writing a line to the logfile, count =   609 (calc'd:   609), position =     50547

The numbers mean that loglines continues to write where it used to do, despite of the cat (or cat-like write operation) having made the file smaller! So the log file consists of lines 205 – 604, then has a gap that is physically there but isn’t shown when you read the file (e.g. with cat, more or wc), after which follow the lines starting from number 607.

ls and stat show that the file size is unaffected although the contents are smaller. This is undesirable behaviour, creating such a file with a hole doesn’t make sense, although this is possible and intended behaviour in Unix filesystems. I became aware of that already sometime during my first Unix period, 1985 – 1990.

(When invoking wrimod with its -p option, it opens the file in read-write mode. This results in the hole being filled up with the old contents. Lines 205 – 604, then 405 to six hundred something.

OK, so cat was not the way and I came up with this instead, in root’s crontab:

15 3,15 * * * cd /usr/local/www/apache24/logs; apachectl stop; tail -n 5000 access_log > a; mv a access_log; apachectl start

So I properly stopped web server Apache, did my log file manipulations, and then restarted Apache. Much more comme il faut and reliable. So far so good.

No hole when appending

After another night’s sleep however, I realised that it cannot possibly be true that Apache opens the log file in write mode (with "w" in fopen), because that would truncate the file every time Apache restarts. (A truncate reduces file length to zero bytes, but does not assign a new inode number and it keeps the birth date the same.)

Instead, Apache logically could only open the log file with "a", i.e., in append mode. And, quoting the fopen(3) man page and as verified by my testing, “[s]ubsequent writes to the file will always end up at the then current end of file, irrespective of any intervening fseek(3) or similar.”

That means that in practice, the problem with the hole in the file, as described in the previous chapter, does not occur now. That explains why on former web server VPSes, I never noticed a physically ever growing log file that content-wise remained at a reasonable number of lines. Its physical size, in bytes or blocks, was also always reasonable.

Sample test output from loglines having opened the file in append mode, again in combination with wrimod:

Writing a line to the logfile, count =   607 (calc'd:   607), position =     50381
Writing a line to the logfile, count =   608 (calc'd:   608), position =     50464
Writing a line to the logfile, count =   609 (calc'd:   609), position =     50547
Writing a line to the logfile, count =   612 (calc'd:   612), position =     50796
Writing a line to the logfile, count =   613 (calc'd:   401), position =     33283
Writing a line to the logfile, count =   614 (calc'd:   402), position =     33366
Writing a line to the logfile, count =   615 (calc'd:   403), position =     33449
Writing a line to the logfile, count =   616 (calc'd:   404), position =     33532

Note the abrupt change in the byte position (append mode adapting to the current end-of-file position) and as a result, in the calculated line number as opposed to the counted line number.

Using the test programs

The source code to my test programs is here and here.

When testing, I usually start loglines as
./loglines &.
"./" is because the current directory is usually not in the PATH and I keep the executable in the same directory as the source, doing the complication simply as
cc loglines.c -o loglines. (Although of course employing a makefile is much better.)

The ampersand & is used to put the program in the background. (To end it prematurely I often use the command killall loglines.)

After letting loglines run for some time, I start ./wrilog -w (for example, other mode options are -m, -c, -p).

Before and after doing its thing, and once again afterwards, wrimod lists the files. I could have used ls -li for that, which lists size and inode of the files, but instead I used stat, which can also show the access, modification, change and birth dates.

I don’t know if other Unixes than FreeBSD, such as various Linux flavours, also have stat(1) and if so, if the details are the same. You may wish to create your own modified testing code if they have not.

FreeBSD, yes please

The file attribute ‘birth date’ I’m sure is specific to FreeBSD, or rather, to the type of file system that this OS comes with by default. For me, this feature is the prime reason to prefer FreeBSD over Linux: with it, I can sort sitemap.xml by modified date, so its order reflects even minor changes like typo corrections or short later additions to articles, while my RSS feeds are sorted by birth date, so only articles that were really freshly written are on top.

Update 9 January 2022: no longer.

Addition 10 November 2014

A quote from the FreeBSD 10.0 man page to find:

“-sparse
True if the current file is sparse, i.e. has fewer blocks allocated than expected based on its size in bytes. This might also match files that have been compressed by the filesystem.”

So what I called a “file with holes” here, is in fact a “sparse file” in technical Unix terms. Now that I see this word again, I vaguely remember having seen it before, in the late nineteen eighties.