
[22feb2024] iostream api rant

"I wish to register a complaint…"

Working on a bespoke streambuf implementation. After diving into the details of the iostream/streambuf APIs, here's a list of observations/complaints:

1. istream.read() doesn't report the number of bytes/chars read.

Instead of:

istream & istream::read (char_type * s, std::streamsize count);

I'd prefer signature

istream & istream::read (char_type * s, std::streamsize count, std::streamsize * p_gcount);

Developers are expected to use:

std::streamsize istream::gcount () const;

I think this is inferior, since it relies on state held by the istream that will be overwritten by the next unformatted read operation.

2. istream.read(s, n) always expects to read n chars.

It sets failbit (along with eofbit) if fewer than n chars are read.

Apparent alternatives are unsatisfactory:

  1. std::streamsize readsome(s, n) isn't required to do any physical i/o; instead it reports what's already available in memory.
  2. istream & get(s, n, delim) only reads up to the first occurrence of delim.
  3. istream & get(s, n) is just a convenience for istream::get(s, n, '\n').
  4. could try writing a loop using a combination of istream::sync() and istream::readsome(), but that won't work if the istream is actually unbuffered.
  5. in.rdbuf()->sgetn(s, n), for an istream in, bypasses the istream code for the sentry object etc., and can't set the istream's eofbit.

The following workaround is viable, except that it will read one-byte-at-a-time if input alternates between byte values '\x0' and '\xff':

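template<typename istream>
std::streamsize
read_upto(istream & in, typename istream::char_type * s, std::streamsize n)
{
    // note: get() also stores a terminating null, so s needs room for n+1 chars
    std::streamsize n_read = 0;

    constexpr char c_bits = '\x0'; /*any char value will do here*/

    char delim = c_bits;

    for (; in.good() && (n_read < n); delim = delim ^ '\xff') {
        // each iteration alternates between {c_bits, ~c_bits} as delimiter,
        // so guarantees at least one byte progress every two iterations

        // get() extracts at most count-1 chars, so ask for remaining+1
        in.get(s, n - n_read + 1, delim);

        std::streamsize nr = in.gcount();
        if (nr > 0) {
            n_read += nr;
            s += nr;
        } else if (in.fail() && !in.eof() && !in.bad()) {
            // next char is delim: get() extracted nothing and set failbit.
            // clear it and retry with the other delimiter
            in.clear(in.rdstate() & ~std::ios::failbit);
        }
    }

    return n_read;
}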

I'd prefer to support this behavior (without the performance-accident-waiting-to-happen) directly from istream.

Another strategy is to use istream::peek() to check for input and istream::readsome() to fetch it:

template<typename istream>
std::streamsize
read_upto(istream & in, typename istream::char_type * s, std::streamsize n)
{
    std::streamsize n_read = 0;

    while (in.good() && (n_read < n)) {
        /* ensure at least one byte available in streambuf; stop at eof */
        if (in.peek() == istream::traits_type::eof())
            break;

        std::streamsize nr = in.readsome(s + n_read, n - n_read);

        n_read += nr;
    }

    return n_read;
}

This works if streambuf actually does buffering. It may be very slow if streambuf is unbuffered.

istream::sentry looks interesting, but doesn't do any reading (except to possibly skip whitespace).

gcc 12.2.0's implementation:

template<typename _CharT, typename _Traits>
basic_istream<_CharT, _Traits>::sentry::
sentry(basic_istream<_CharT, _Traits>& __in, bool __noskip) : _M_ok(false)
{
    ios_base::iostate __err = ios_base::goodbit;
    if (__in.good())
    {
        __try
        {
            if (__in.tie())
                __in.tie()->flush();
            if (!__noskip && bool(__in.flags() & ios_base::skipws))
            {
                const __int_type __eof = traits_type::eof();
                __streambuf_type* __sb = __in.rdbuf();
                __int_type __c = __sb->sgetc();

                const __ctype_type& __ct = __check_facet(__in._M_ctype);
                while (!traits_type::eq_int_type(__c, __eof)
                       && __ct.is(ctype_base::space,
                                  traits_type::to_char_type(__c)))
                    __c = __sb->snextc();

                // _GLIBCXX_RESOLVE_LIB_DEFECTS
                // 195. Should basic_istream::sentry's constructor ever
                // set eofbit?
                if (traits_type::eq_int_type(__c, __eof))
                    __err |= ios_base::eofbit;                // (A)
            }
        }
        __catch(__cxxabiv1::__forced_unwind&)
        {
            __in._M_setstate(ios_base::badbit);
            __throw_exception_again;
        }
        __catch(...)
        { __in._M_setstate(ios_base::badbit); }
    }

    if (__in.good() && __err == ios_base::goodbit)            // (B)
        _M_ok = true;
    else
    {
        __err |= ios_base::failbit;                           // (C)
        __in.setstate(__err);
    }
}

with

template<typename _Facet>
inline const _Facet&
__check_facet(const _Facet* __f)
{
    if (!__f)
        __throw_bad_cast();
    return *__f;
}

Note that if __noskip is false (and skipws is set) and the sentry encounters eof while skipping whitespace, then the line marked (A) executes -> test (B) fails -> (C) executes, setting failbit along with eofbit, so subsequent reads fail until the state is cleared. The line (A) appears to be mandatory (in spite of the inline comment).

From https://cppreference.com:

explicit sentry( std::basic_istream<CharT, Traits>& is, bool noskipws = false );

Prepares the stream for formatted input.

If is.good() is false, calls is.setstate(std::ios_base::failbit) and returns. Otherwise, if is.tie() is not a null pointer, calls is.tie()->flush() to synchronize the output sequence with external streams. This call can be suppressed if the put area of is.tie() is empty. The implementation may defer the call to flush() until a call of is.rdbuf()->underflow() occurs. If no such call occurs before the sentry object is destroyed, it may be eliminated entirely.

If noskipws is zero and is.flags() & std::ios_base::skipws is nonzero, the function extracts and discards all whitespace characters until the next available character is not a whitespace character (as determined by the currently imbued locale in is). If is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() returns traits::eof(), the function calls setstate(std::ios_base::failbit | std::ios_base::eofbit) (which may throw std::ios_base::failure).

Additional implementation-defined preparation may take place, which may call setstate(std::ios_base::failbit) (which may throw std::ios_base::failure).

If after preparation is completed, is.good() == true, then any subsequent calls to operator bool will return true.

However, we can bypass this with __noskip set to true:

template<typename istream>
std::streamsize
read_upto(istream & in, typename istream::char_type * s, std::streamsize n)
{
    typename istream::sentry sentry(in, true /*noskipws*/);

    std::streamsize n_read = 0;

    if (sentry) {
        try {
            n_read = in.rdbuf()->sgetn(s, n);

            /* a short read means the streambuf reached end of input */
            if (n_read < n)
                in.setstate(std::ios::eofbit);
        } catch(__cxxabiv1::__forced_unwind &)  {
            /* gcc-specific (needs <cxxabi.h>): forced unwind must be rethrown */
            in.setstate(std::ios::failbit);
            throw;
        } catch(...) {
            in.setstate(std::ios::failbit);
        }
    }

    return n_read;
}

Another alternative would be to post-process read(), and clear failbit if set along with eofbit:

template<typename istream>
std::streamsize
read_upto(istream & in, typename istream::char_type * s, std::streamsize n)
{
    in.read(s, n);

    std::streamsize n_read = in.gcount();

    if ((n_read < n) && in.eof() && in.fail()) {
        /* clear failbit, keep eofbit */
        in.clear(in.rdstate() & ~std::ios::failbit);
    }

    return n_read;
}

3. Iostream get isn't monotonic

iostream.get(s, n, delim) sets failbit if first character matches delim.

This interferes with using iostream.get as a building block for a longer i/o sequence.

I tripped over this while writing zstream.read_until for my cmake-examples project. Instead of:

std::streamsize read_until(char_type * s,
                           std::streamsize n,
                           bool check_delim_flag,
                           char_type delim)
    {
        ...

        std::streamsize nr = 0;

        this->get(s, n, delim);
        nr = this->gcount();

        ...

        return nr;
    }

We need a carve-out:

std::streamsize read_until(char_type * s,
                           std::streamsize n,
                           bool check_delim_flag,
                           char_type delim)
    {
        ...

        std::streamsize nr = 0;

        int_type nextc = this->rdbuf_.sgetc();

        if (nextc == Traits::to_int_type(delim)) {
            nr = 0;
        } else {
            this->get(s, n, delim);

            nr = this->gcount();
        }

        ...

        return nr;
    }

4. Iostream position reporting isn't monotonic.

iostream.tellg() and iostream.tellp() report the current position w.r.t. the beginning of the stream for input (get) and output (put) respectively.

Unfortunately, they are not monotonic, and code like this is subtly broken:

istream & input = ...; // some binary stream
struct foo part1;
struct foo part2;

istream::pos_type p0 = input.tellg();

input >> part1 >> part2;

istream::pos_type p1 = input.tellg();

std::streamoff n_read = p1 - p0;   // difference of two positions is an offset

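If the stream reaches end-of-file at the end of part2, then reading was in fact successful, but p1 will be -1, and n_read will be nonsense. (Since C++11, tellg() constructs a sentry; with eofbit already set the sentry sets failbit, and tellg() reports failure.)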

Presumably this is why iostream.gcount() exists: otherwise there'd be no way to determine how many bytes/chars a preceding read obtained.

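A correct (but awkward and error-prone) implementation, assuming operator>> for foo is built on a single unformatted read (gcount() only reflects the last unformatted input operation):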

istream & input = ...;
struct foo part1;
struct foo part2;

std::streamsize n_read = 0;

input >> part1;
n_read += input.gcount();

input >> part2;
n_read += input.gcount();

5. Streambuf not responsible for eofbit.

istream.eofbit probably belongs in streambuf. streambuf has to recognize end-of-file anyway, since it's responsible for physical I/O. It might as well record and report it.

6. Stream position reporting from streambuf

It would be simpler for streambuf to support istream::tellg() and ostream::tellp() directly instead of relying on streambuf::seekoff(). The argument here is that even for a non-seekable stream buffer, it still makes sense to support tellg() and tellp(). As things stand, this requires the streambuf author to implement at least a restricted version of streambuf::seekoff(), which muddies the waters.

Author: Roland Conybeare

Created: 2024-09-08 Sun 18:49