[22feb2024] iostream api rant
"I wish to register a complaint…"
Working on a bespoke streambuf implementation. After diving into the details of the iostream/streambuf APIs, here's a list of observations/complaints:
1. istream.read()
doesn't report the number of bytes/chars read.
Instead of:
istream & istream::read (char_type * s, std::streamsize count);
I'd prefer signature
istream & istream::read (char_type * s, std::streamsize count, std::streamsize * p_gcount);
Developers are instead expected to call:
std::streamsize istream::gcount () const;
I think this is inferior, since it relies on state held by istream, which will be overwritten by the next unformatted read operation.
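A minimal sketch of the preferred signature, written as a free-function wrapper. The name read_counted is hypothetical (nothing like it exists in the standard library); it simply captures gcount() immediately after the read, so the caller never depends on state that a later read would overwrite.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Hypothetical helper (not a standard API): like istream::read(), but also
// reports the number of chars extracted through p_gcount.
template <typename IStream>
IStream &
read_counted(IStream & in,
             typename IStream::char_type * s,
             std::streamsize count,
             std::streamsize * p_gcount)
{
    in.read(s, count);
    // capture the count right away, before another unformatted
    // input operation overwrites it
    *p_gcount = in.gcount();
    return in;
}

// Example use: ask for 4 chars from a 6-char stream.
inline std::streamsize
read_counted_example()
{
    std::istringstream in("abcdef");
    char buf[4];
    std::streamsize n = 0;
    read_counted(in, buf, 4, &n);
    assert(in.good());
    return n;
}
```

The wrapper does not remove the hidden state inside istream; it only narrows the window in which a caller can forget to read it.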
2. istream.read(s, n)
always expects to read n chars.
It sets failbit (together with eofbit) if fewer than n chars are read.
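To make the complaint concrete, here is a small sketch of a short read. The function name short_read_sets_failbit is made up for illustration; the stream contents and buffer size are arbitrary.

```cpp
#include <cassert>
#include <sstream>

// Requests 8 chars from a stream holding only 3, and reports whether
// read() left both eofbit and failbit set.
inline bool
short_read_sets_failbit()
{
    std::istringstream in("abc");
    char buf[8];
    in.read(buf, 8);
    assert(in.gcount() == 3);   // the 3 available chars were still delivered
    return in.eof() && in.fail();
}
```

The data arrives, yet the stream reports failure, so a caller that only checks `if (in)` will treat a perfectly usable partial read as an error.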
Apparent alternatives are unsatisfactory:
- std::streamsize readsome(s, n) isn't required to do any physical i/o; instead it reports what's already available in memory.
- istream & get(s, n, delim) only reads up to the first occurrence of delim.
- istream & get(s, n) is just a convenience for istream::get(s, n, '\n').
- could try writing a loop using a combination of istream::sync() and istream::readsome(), but that won't work if istream is actually unbuffered.
- istream in; in.rdbuf()->sgetn(s, n) bypasses istream code for sentry object etc, and can't set istream's eofbit.
The following workaround is viable, except that it will read one byte at a time if input alternates between byte values '\x0' and '\xff':
template<typename istream>
std::streamsize
read_upto(istream & in, typename istream::char_type * s, std::streamsize n)
{
    /* NOTE: get() stores a terminating null, so s must have room for n+1 chars */
    std::streamsize n_read = 0;

    constexpr char c_bits = '\x0'; /*any char value will do here*/

    char delim = c_bits;

    for (; in.good() && !in.eof() && (n_read < n); delim = delim ^ '\xff') {
        // each iteration alternates between {c_bits, ~c_bits} as delimiter,
        // so guarantees at least one byte progress every two iterations
        in.get(s, n - n_read + 1, delim);

        std::streamsize nr = in.gcount();

        if (nr > 0) {
            n_read += nr;
            s += nr;
        } else if (!in.eof()) {
            // next char matched delim, so get() set failbit; clear it and
            // retry with the other delimiter
            in.clear(in.rdstate() & ~std::ios::failbit);
        }
    }

    return n_read;
}
I'd prefer istream to support this behavior directly (without the performance-accident-waiting-to-happen).
Another strategy is to use istream::peek() to check for input and istream::readsome() to fetch it:
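template<typename istream>
std::streamsize
read_upto(istream & in, typename istream::char_type * s, std::streamsize n)
{
    using traits_type = typename istream::traits_type;

    std::streamsize n_read = 0;

    while (in.good() && !in.eof() && (n_read < n)) {
        /* ensure at least one byte available in streambuf; stop at eof,
         * otherwise readsome() would set failbit
         */
        if (traits_type::eq_int_type(in.peek(), traits_type::eof()))
            break;

        std::streamsize nr = in.readsome(s + n_read, n - n_read);

        n_read += nr;
    }

    return n_read;
}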
This works if streambuf
actually does buffering. It may be very slow if streambuf
is unbuffered.
istream::sentry
looks interesting, but doesn't do any reading (except to possibly skip whitespace).
gcc 12.2.0's implementation:
template<typename _CharT, typename _Traits>
  basic_istream<_CharT, _Traits>::sentry::
  sentry(basic_istream<_CharT, _Traits>& __in, bool __noskip) : _M_ok(false)
  {
    ios_base::iostate __err = ios_base::goodbit;
    if (__in.good())
      {
        __try
          {
            if (__in.tie())
              __in.tie()->flush();
            if (!__noskip && bool(__in.flags() & ios_base::skipws))
              {
                const __int_type __eof = traits_type::eof();
                __streambuf_type* __sb = __in.rdbuf();
                __int_type __c = __sb->sgetc();

                const __ctype_type& __ct = __check_facet(__in._M_ctype);
                while (!traits_type::eq_int_type(__c, __eof)
                       && __ct.is(ctype_base::space,
                                  traits_type::to_char_type(__c)))
                  __c = __sb->snextc();

                // _GLIBCXX_RESOLVE_LIB_DEFECTS
                // 195. Should basic_istream::sentry's constructor ever
                // set eofbit?
                if (traits_type::eq_int_type(__c, __eof))
                  __err |= ios_base::eofbit;                 // (A)
              }
          }
        __catch(__cxxabiv1::__forced_unwind&)
          {
            __in._M_setstate(ios_base::badbit);
            __throw_exception_again;
          }
        __catch(...)
          { __in._M_setstate(ios_base::badbit); }
      }

    if (__in.good() && __err == ios_base::goodbit)           // (B)
      _M_ok = true;
    else
      {
        __err |= ios_base::failbit;                          // (C)
        __in.setstate(__err);
      }
  }
with
template<typename _Facet>
  inline const _Facet&
  __check_facet(const _Facet* __f)
  {
    if (!__f)
      __throw_bad_cast();
    return *__f;
  }
Note that if __noskip is false and sentry encounters eof while skipping whitespace,
then the line marked (A) executes -> test (B) fails -> (C) executes,
flagging the stream as failed (failbit set in addition to eofbit).
The line (A) appears to be mandatory (in spite of the inline comment).
From https://cppreference.com:
explicit sentry( std::basic_istream<CharT, Traits>& is, bool noskipws = false );
Prepares the stream for formatted input.
If is.good() is false, calls is.setstate(std::ios_base::failbit) and returns. Otherwise, if is.tie() is not a null pointer, calls is.tie()->flush() to synchronize the output sequence with external streams. This call can be suppressed if the put area of is.tie() is empty. The implementation may defer the call to flush() until a call of is.rdbuf()->underflow() occurs. If no such call occurs before the sentry object is destroyed, it may be eliminated entirely.
If noskipws is zero and is.flags() & std::ios_base::skipws is nonzero, the function extracts and discards all whitespace characters until the next available character is not a whitespace character (as determined by the currently imbued locale in is). If is.rdbuf()->sbumpc() or is.rdbuf()->sgetc() returns traits::eof(), the function calls setstate(std::ios_base::failbit | std::ios_base::eofbit) (which may throw std::ios_base::failure).
Additional implementation-defined preparation may take place, which may call setstate(std::ios_base::failbit) (which may throw std::ios_base::failure).
If after preparation is completed, is.good() == true, then any subsequent calls to operator bool will return true.
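The behavior quoted above can be checked with a small probe. This is only a sketch: probe_sentry and sentry_result are hypothetical names, and the input is a string of spaces chosen so that whitespace skipping runs into end-of-input.

```cpp
#include <cassert>
#include <istream>
#include <sstream>

// Stream state observed right after constructing a sentry.
struct sentry_result { bool ok; bool eof; bool fail; };

// Builds a stream over `text`, constructs a sentry with the given
// noskipws flag, and reports what happened.
inline sentry_result
probe_sentry(const char * text, bool noskipws)
{
    assert(text != nullptr);
    std::istringstream in(text);
    std::istream::sentry guard(in, noskipws);
    return { static_cast<bool>(guard), in.eof(), in.fail() };
}
```

With probe_sentry("   ", false) the sentry skips the spaces, hits end-of-input, and the stream should come back with both eofbit and failbit set. With probe_sentry("   ", true) nothing is skipped, so the sentry should succeed and leave the state untouched.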
However, we can bypass this by setting __noskip to true:
template<typename istream>
std::streamsize
read_upto(istream & in, typename istream::char_type * s, std::streamsize n)
{
    typename istream::sentry sentry(in, true /*noskipws*/);

    std::streamsize n_read = 0;

    if (sentry) {
        try {
            n_read = in.rdbuf()->sgetn(s, n);
            /* a short read means streambuf ran out of input */
            if (n_read < n)
                in.setstate(std::ios::eofbit);
        } catch(__cxxabiv1::__forced_unwind &) { /* gcc-specific; needs <cxxabi.h> */
            in.setstate(std::ios::failbit);
            throw;
        } catch(...) {
            in.setstate(std::ios::failbit);
        }
    }

    return n_read;
}
Another alternative would be to post-process read(), and clear failbit if it is set along with eofbit:
template<typename istream>
std::streamsize
read_upto(istream & in, typename istream::char_type * s, std::streamsize n)
{
    in.read(s, n);

    std::streamsize n_read = in.gcount();

    if ((n_read < n) && in.eof() && in.fail()) {
        /* clear failbit */
        in.clear(in.rdstate() & ~std::ios::failbit);
    }

    return n_read;
}
3. Iostream get isn't monotonic
iostream.get(s, n, delim)
sets failbit
if first character matches delim.
This interferes with using iostream.get as a building block for a longer i/o sequence.
I tripped over this while writing zstream.read_until for my cmake-examples project:
Instead of:
std::streamsize
read_until(char_type * s, std::streamsize n, bool check_delim_flag, char_type delim)
{
    ...
    std::streamsize nr = 0;

    this->get(s, n, delim);
    nr = this->gcount();
    ...
    return nr;
}
We need a carve-out:
std::streamsize
read_until(char_type * s, std::streamsize n, bool check_delim_flag, char_type delim)
{
    ...
    std::streamsize nr = 0;

    int_type nextc = this->rdbuf_.sgetc();

    if (nextc == Traits::to_int_type(delim)) {
        nr = 0;
    } else {
        this->get(s, n, delim);
        nr = this->gcount();
    }
    ...
    return nr;
}
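The failbit behavior that motivates the carve-out can be reproduced in isolation. A minimal sketch, with a made-up function name and an arbitrary comma delimiter:

```cpp
#include <cassert>
#include <sstream>

// Returns true if get(s, n, delim) sets failbit (but not eofbit) when the
// very first char in the stream is the delimiter, so nothing is extracted.
inline bool
get_fails_on_leading_delim()
{
    std::istringstream in(",abc");
    char buf[8];
    in.get(buf, 8, ',');
    assert(in.gcount() == 0);   // delimiter is left in the stream
    return in.fail() && !in.eof();
}
```

An empty field is a legitimate result for a "read until delimiter" primitive, which is why the caller has to either peek first (as in the carve-out) or clear failbit afterwards.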
4. Iostream position reporting isn't monotonic.
iostream.tellg() and iostream.tellp() report the current position w.r.t. the
beginning of the stream for input (get) and output (put) respectively.
Unfortunately, they are not monotonic, and code like this is subtly broken:
istream & input = ...; // some binary stream

struct foo part1;
struct foo part2;

istream::pos_type p0 = input.tellg();
input >> part1 >> part2;
istream::pos_type p1 = input.tellg();

istream::pos_type n_read = p1 - p0;
If the stream reaches end-of-file at the end of part2, then in fact reading was successful,
but p1 will be -1, and n_read will be nonsense.
Presumably this is why iostream.gcount()
exists: otherwise there'd be no way to
determine how many bytes/chars a preceding read obtained.
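A correct (but awkward and error-prone) implementation, assuming operator>> for foo reads through a single unformatted call such as read() so that gcount() reflects it: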
istream & input = ...;

struct foo part1;
struct foo part2;

std::streamsize n_read = 0;

input >> part1;
n_read += input.gcount();
input >> part2;
n_read += input.gcount();
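The tellg() problem from this section can be reproduced with a string stream. This is a sketch with hypothetical names (tell_result, tellg_after_eof); it uses two ints rather than struct foo so that it is self-contained.

```cpp
#include <cassert>
#include <ios>
#include <sstream>

// Positions reported before and after a read that ends exactly at eof.
struct tell_result { bool read_ok; std::streamoff p0; std::streamoff p1; };

inline tell_result
tellg_after_eof()
{
    std::istringstream in("12 34");
    int a = 0, b = 0;

    std::istream::pos_type p0 = in.tellg();
    in >> a >> b;                  // parsing "34" runs into end-of-input
    bool read_ok = !in.fail() && (a == 12) && (b == 34);
    assert(in.eof());              // eofbit is set even though the read succeeded

    // tellg() constructs a sentry; with eofbit set the sentry fails,
    // so tellg() reports pos_type(-1)
    std::istream::pos_type p1 = in.tellg();

    return { read_ok, static_cast<std::streamoff>(p0),
                      static_cast<std::streamoff>(p1) };
}
```

Both values are read correctly and p0 is 0, yet p1 comes back as -1 instead of 5, so p1 - p0 says nothing about how much was consumed.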
5. Streambuf not responsible for eofbit.
istream.eofbit probably belongs in streambuf.
streambuf
has to recognize end-of-file anyway, since it's responsible for physical I/O.
It might as well record and report it.
6. Stream position reporting from streambuf
It would be simpler for streambuf to support istream::tellg() and ostream::tellp() directly, instead of routing them through streambuf::seekoff().
The argument here is that even for a non-seekable stream buffer, it still makes sense to support tellg() and tellp().
As things stand, that requires the streambuf author to implement at least a restricted version of streambuf::seekoff(),
which muddies the waters.
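For illustration, here is roughly what such a restricted seekoff() looks like. This is a sketch under stated assumptions, not code from my streambuf: counting_streambuf and tell_after_reading are hypothetical names, and the "non-seekable source" is simulated by handing out a string in 4-char chunks.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <ios>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>

// Hypothetical non-seekable input buffer: serves `src` in small chunks and
// supports only the seekoff(0, cur, in) query that tellg() relies on.
class counting_streambuf : public std::streambuf {
public:
    explicit counting_streambuf(std::string src) : src_(std::move(src)) {}

protected:
    int_type underflow() override {
        if (gptr() < egptr())
            return traits_type::to_int_type(*gptr());
        if (next_ >= src_.size())
            return traits_type::eof();
        // refill: copy the next chunk into the small internal buffer
        std::size_t n = std::min(sizeof(buf_), src_.size() - next_);
        std::copy_n(src_.data() + next_, n, buf_);
        next_ += n;
        setg(buf_, buf_, buf_ + n);
        return traits_type::to_int_type(buf_[0]);
    }

    pos_type seekoff(off_type off, std::ios_base::seekdir dir,
                     std::ios_base::openmode which) override {
        // restricted version: answer "where am I?" only, refuse real seeks
        if (off != 0 || dir != std::ios_base::cur
            || (which & std::ios_base::in) != std::ios_base::in)
            return pos_type(off_type(-1));
        // chars fetched from the source, minus those still unread in buf_
        return pos_type(off_type(next_) - off_type(egptr() - gptr()));
    }

private:
    std::string src_;
    std::size_t next_ = 0;   // offset in src_ of the next chunk to fetch
    char buf_[4];
};

// Example use: consume n_chars from an 11-char source, then ask tellg().
inline std::streamoff
tell_after_reading(std::size_t n_chars)
{
    assert(n_chars <= 11);
    counting_streambuf sb("hello world");
    std::istream in(&sb);
    for (std::size_t i = 0; i < n_chars; ++i)
        in.get();
    return in.tellg();
}
```

The position query is a few lines of arithmetic, but it has to be smuggled through an interface whose name and parameters are all about seeking, and every other combination of arguments must be explicitly rejected.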