Simple word indexer (18)

Continued from a previous article.

Positioning to a character (5)

Where exactly?

The exact location of the infinite loop in GNU’s fseek, which I wrote about in articles 14, 15, and 17, is now known.

It is in source file libio/wfileops.c, function adjust_wide_data, and the repeated lines are 567, 568, 576, 582. I tested and debugged with GNU glibc version 2.31, but these line numbers also apply to the sources of version 2.34.

In the do-while loop there, the enum variable status always remains __codecvt_partial, so the loop never ends.
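The shape of such a hang can be shown with a toy model. This is not the glibc code and not a proposed fix: the names toy_status, toy_convert and toy_adjust are invented for illustration, and the iteration cap max_iter exists only so that the demonstration terminates. The assumption is simply that the converter consumes no input and keeps reporting a partial result.

```c
#include <stddef.h>

/* Hypothetical status codes, loosely modelled on the __codecvt_* names. */
enum toy_status { TOY_OK, TOY_PARTIAL };

/* Hypothetical converter that models the stuck case: it consumes no
   input and always reports a partial conversion. */
static enum toy_status toy_convert(const char **in, const char *in_end)
{
    (void)in;
    (void)in_end;
    return TOY_PARTIAL;
}

/* Runs a do-while loop of the same shape, but gives up after max_iter
   rounds so that the demonstration ends. Returns the rounds executed. */
static long toy_adjust(const char *in, size_t len, long max_iter)
{
    const char *in_end = in + len;
    enum toy_status status;
    long rounds = 0;

    do {
        status = toy_convert(&in, in_end);
        rounds++;
    } while (status == TOY_PARTIAL && rounds < max_iter);

    return rounds;
}
```

Without the cap, the loop condition would depend only on status, and since status never changes, the loop would never end. With the cap, toy_adjust always runs for exactly max_iter rounds.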

Propose a fix?

I cannot propose a fix, because I don’t understand what this part of the code does, or why it is necessary in the first place. As I remarked before, the way I see it, fseek should not involve any file reading or buffer filling, nor any buffer flushing or file writing, and any conversion of wide or multibyte characters should also be unnecessary at this stage. In essence, all that fseek does is set a position for future read or write operations. Whether those operations are wide-character oriented or not should make no difference. The implementation could and should be simple, efficient, fast, error-free and easy to maintain.
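To make the idea concrete, here is a minimal sketch of a seek that only records a position. It is a toy model under stated assumptions, not the layout of any real libc FILE: the struct toy_stream and the function toy_seek are invented, the stream is read-only, and pending writes (which a real implementation has to deal with somehow) are left out entirely.

```c
#include <stddef.h>

/* Hypothetical toy stream, read-only, for illustration only. */
struct toy_stream {
    long   file_pos;  /* offset where the next read will start */
    size_t buf_len;   /* number of valid bytes currently buffered */
    size_t buf_next;  /* index of the next unread byte in the buffer */
};

/* Sets the position for future reads. No file I/O happens here and no
   characters are converted: the buffered data is simply forgotten. */
static int toy_seek(struct toy_stream *s, long offset)
{
    if (offset < 0)
        return -1;        /* reject positions before the start */

    s->buf_len  = 0;      /* drop whatever was buffered */
    s->buf_next = 0;
    s->file_pos = offset; /* the next read refills from here */
    return 0;
}
```

In this model the cost of a seek does not depend on the file contents, nor on whether the stream is later read as bytes or as wide characters.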

As I also wrote before, it is easy to say that without fully understanding all of the code. I am aware of that.

Still, when looking at how fseek is implemented in FreeBSD, I get the impression that it is much simpler and cleaner, and that GNU’s version is needlessly complicated in comparison. Also, FreeBSD seems to do what it does for fseek without ever checking whether we are dealing with wide characters or not. And that makes sense to me.

The slowness of ftell is also telling in this respect (pun intended).


Addition 30 September 2021

This is what I call clean code: the library musl, by Rich Felker and others. See also this comparison.

Strangely, this musl library does consider an invalid byte consumed. So my trick of using getc() cannot be used here: it skips too much. In Siworin, I therefore now always call fseek(), unless the preprocessor symbol __GLIBC__ is defined, in which case I call getc().
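The choice described above could look roughly like this in C. This is a sketch, not the actual Siworin source: the function name resume_after_invalid and its parameter resume_pos are made up for illustration, and it is an assumption that the caller already knows the offset at which reading should continue.

```c
#include <stdio.h>  /* with glibc, including a libc header defines __GLIBC__ */

/* Hypothetical helper: continue reading just past an invalid byte.
   resume_pos is the file offset where reading should go on (assumed to
   be known to the caller). Returns 0 on success, -1 on failure. */
static int resume_after_invalid(FILE *fp, long resume_pos)
{
#ifdef __GLIBC__
    /* glibc: the invalid byte has not been consumed, so read it away
       with getc() and stay clear of fseek(). */
    (void)resume_pos;
    return getc(fp) == EOF ? -1 : 0;
#else
    /* Other libraries, such as musl: the invalid byte already counts as
       consumed, so getc() would skip too much. Position explicitly. */
    return fseek(fp, resume_pos, SEEK_SET) == 0 ? 0 : -1;
#endif
}
```

The test on __GLIBC__ has to come after at least one libc header has been included, because it is the library headers that define the symbol, not the compiler itself.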

Addition 5 November 2022

Later I found that the C library musl is also at the heart of busybox and Alpine Linux! I had probably come across that before, but had forgotten about it.


To the next article