
What and how?

Two days ago I finally made the improvement I had wanted to make for years. Until then, my visits statistics page showed my most visited pages with the number of visits over an unspecified period. Not very informative, as only comparisons between pages were meaningful, not the visit counts themselves. My new algorithm is:

  1. Get date-time of the oldest log entry considered.
  2. Get date-time of the most recent log entry.
  3. Calculate the difference in seconds.
  4. Count the actual visits to each page in that period.
  5. Multiply those counts by the number of seconds in one week.
  6. Divide the result by the number of seconds found in step 3.
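
As a quick check of steps 3 to 6, the scaling can be tried on made-up numbers. This is only a sketch: the helper name weekly_rate is hypothetical and is not part of the actual script.

```shell
# Hypothetical helper: scale a raw visit count to "visits per week".
# Usage: weekly_rate COUNT SPAN_SECONDS
weekly_rate() {
    awk -v count="$1" -v span="$2" 'BEGIN {
        week = 7 * 24 * 3600                 # seconds in one week (604800)
        printf("%.0f\n", count * week / span)
    }'
}

# 300 visits logged over 21 days correspond to 100 visits per week.
weekly_rate 300 $((21 * 24 * 3600))
```

A period shorter than a week scales the counts up, a longer one scales them down.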

The implementation details are rather complicated. The standard Unix utility date is quite powerful for date and time calculations, but sadly it does not recognize the default timestamp format of nginx's logging out of the box, for example 20/Feb/2023:09:41:24 +0100. So I specified that format explicitly, using the -D option.

This led to the following code for Bourne shell compatible shells:

FRSTDATE=`zcat /var/log/nginx/access.log.2.gz |
	head -n1 | sed -E 's@.+\[(.+) .+\].+@\1@'`
LASTDATE=`cat  /var/log/nginx/access.log      |
	tail -n1 | sed -E 's@.+\[(.+) .+\].+@\1@'`
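# Fall back to the current date-time in case the log is empty, just after a rotate
if test -z "$LASTDATE"
then
   LASTDATE=`date "+%d/%b/%Y:%H:%M:%S"`
fi
# Convert both timestamps to seconds since the epoch; -D gives the input format
FRSTSEC=`date -d "$FRSTDATE" -D "%d/%b/%Y:%H:%M:%S" "+%s"`
LASTSEC=`date -d "$LASTDATE" -D "%d/%b/%Y:%H:%M:%S" "+%s"`
SECSBETWEEN=`expr "$LASTSEC" - "$FRSTSEC"`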
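
To see what the sed expression extracts, it can be run on a single sample entry. The log line below is made up, in nginx's default "combined" format, and the helper name log_date is hypothetical.

```shell
# Hypothetical helper: print the date-time part of each log line on stdin,
# using the same sed expression as in the listing above.
log_date() {
    sed -E 's@.+\[(.+) .+\].+@\1@'
}

# Made-up log entry; the time zone offset after the space is dropped.
echo '203.0.113.7 - - [20/Feb/2023:09:41:24 +0100] "GET /index.html HTTP/1.1" 200 1234 "-" "curl/8.0"' | log_date
# prints: 20/Feb/2023:09:41:24
```

The expression relies on the entry containing exactly one pair of square brackets; a request or user agent string with brackets of its own would confuse it.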

I then use the variable SECSBETWEEN, which holds the number of seconds between the two timestamps (LASTSEC minus FRSTSEC), in:

awk -v SECS=$SECSBETWEEN '{printf("%.0f %s\n", $1*7*24*3600/SECS, $2);}'

This assumes input lines, derived from the log entries, that contain the number of visits and the URL between [ ], separated by white space. The idea is that awk rescales each visit count to what it would be if the log period considered covered exactly one week.
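
How that intermediate format might be produced and consumed can be sketched as follows. The counting step is an assumption: it takes the request path from field 7 of nginx's default "combined" log format, and the helper names count_visits and per_week are hypothetical. The real script may transform the log differently.

```shell
# Hypothetical helper: count requests per path (field 7 of the "combined"
# format) and print them as "COUNT [URL]", sorted by URL.
count_visits() {
    awk '{ n[$7]++ } END { for (p in n) printf("%d [%s]\n", n[p], p) }' | sort -k2
}

# The awk line from above, wrapped in a function; $1 is the period in seconds.
per_week() {
    awk -v SECS="$1" '{printf("%.0f %s\n", $1*7*24*3600/SECS, $2);}'
}

# Four made-up entries, assumed to span half a week (302400 seconds).
printf '%s\n' \
    '203.0.113.7 - - [20/Feb/2023:09:41:24 +0100] "GET /a.html HTTP/1.1" 200 512' \
    '203.0.113.8 - - [21/Feb/2023:10:02:11 +0100] "GET /b.html HTTP/1.1" 200 734' \
    '203.0.113.9 - - [22/Feb/2023:18:30:05 +0100] "GET /a.html HTTP/1.1" 200 512' \
    '203.0.113.7 - - [23/Feb/2023:21:41:24 +0100] "GET /a.html HTTP/1.1" 200 512' |
    count_visits | per_week 302400
# prints:
# 6 [/a.html]
# 2 [/b.html]
```

Three visits in half a week become six per week, which is the kind of normalized figure the statistics page is meant to show.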


While writing this article, I found that my solution relies on specifics of Alpine Linux's implementation of the date utility, which comes from BusyBox. The GNU implementation used by Linux Mint (and probably by Debian and Ubuntu too) doesn't have that -D option. I want everything on my website, i.e. any installation code and anything that it installs, to be compatible with Debian and its derivatives, and with Alpine Linux.

Under the POSIX standard, date can display the system's date in various formats, and it can set it, given appropriate privileges. But it cannot convert a given date between formats.

This makes date unusable for me. I don't want to rely upon non-standard extensions that differ between Linux distributions.


Solution: I wrote it myself, in C, using strptime and mktime. In my shell script, I replaced the lines:

FRSTSEC=`date -d "$FRSTDATE" -D "%d/%b/%Y:%H:%M:%S" "+%s"`
LASTSEC=`date -d "$LASTDATE" -D "%d/%b/%Y:%H:%M:%S" "+%s"`

with:

FRSTSEC=`./fmtd2sec.cgi "$FRSTDATE" "%d/%b/%Y:%H:%M:%S"`
LASTSEC=`./fmtd2sec.cgi "$LASTDATE" "%d/%b/%Y:%H:%M:%S"`

The C program can be downloaded for perusal from this link.
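
A compiled helper is not the only portable route. Purely as a sketch, and not the method used on this site, the same conversion can be approximated with awk arithmetic alone. The helper name nginx_to_epoch is hypothetical. It treats the timestamp as UTC and ignores the time zone offset, which is harmless when only the difference between two timestamps matters, as long as no daylight saving change falls in between. It does no input validation.

```shell
# Hypothetical helper: convert "20/Feb/2023:09:41:24" to seconds since
# 1970-01-01 00:00:00, treating the timestamp as UTC.
nginx_to_epoch() {
    echo "$1" | awk -F'[/:]' '{
        d = $1; y = $3
        # Month name to number: Jan -> 1 ... Dec -> 12
        m = (index("JanFebMarAprMayJunJulAugSepOctNovDec", $2) + 2) / 3
        # Count January and February as months 13 and 14 of the previous year
        if (m <= 2) { y--; m += 12 }
        # Days since 1970-01-01 in the Gregorian calendar
        days = 365*y + int(y/4) - int(y/100) + int(y/400)
        days += int((153*(m - 3) + 2)/5) + d - 719469
        printf("%d\n", days*86400 + $4*3600 + $5*60 + $6)
    }'
}

FRSTSEC=`nginx_to_epoch "13/Feb/2023:09:41:24"`
LASTSEC=`nginx_to_epoch "20/Feb/2023:09:41:24"`
expr "$LASTSEC" - "$FRSTSEC"
# prints: 604800
```

Unlike strptime and mktime, this knows nothing about locales or local time, so it is only a fit for the fixed English month names that nginx writes.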