Standard IO Buffering in Rust and C
I know that in Rust, the FileDesc struct uses libc::read and libc::write to move data between the program and the OS kernel's buffers. This FileDesc struct is used for stdin, stdout, stderr, files, and networking: basically anything that has a file descriptor associated with it.
I also knew that libc implementations keep their own buffer in user space, separate from the kernel's buffer (which lives in kernel space). The purpose is to reduce excessive system calls when doing frequent reads/writes on a file descriptor. My question was: does Rust's std library delegate the user-space buffer initialization and management to libc, or does it take care of that itself?
Disclaimer:
- Even though this blog post focuses on standard input/output (stdin/stdout) only, the same general principles apply to file IO as well, since they all use file descriptors under the hood.
- This article is based on my personal exploration of the Rust and C source code. I may have misunderstood some parts of the implementation. If you find any mistakes, please let me know!
Rust IO Buffering
Stdout
I decided to dig through the Rust source code for std::io::Stdout. The Stdout handle that we use is a wrapper that handles thread safety & line buffering for the underlying unbuffered, unsynchronized StdoutRaw struct:
// io/stdio.rs
pub struct Stdout {
// FIXME: this should be LineWriter or BufWriter depending on the state of
// stdout (tty or not). Note that if this is not line buffered it
// should also flush-on-panic or some form of flush-on-abort.
inner: &'static ReentrantLock<RefCell<LineWriter<StdoutRaw>>>,
}
ReentrantLock is a locking primitive like Mutex, except that it allows the thread that already holds the lock to acquire it again, something that would normally result in a deadlock with a Mutex.
Buffering implementation for write
LineWriter wraps a BufWriter, while BufWriter holds a Vec<u8> buffer:
// io/buffered/linewriter.rs
pub struct LineWriter<W: ?Sized + Write> {
inner: BufWriter<W>,
}
// io/buffered/bufwriter.rs
pub struct BufWriter<W: ?Sized + Write> {
// The buffer. Avoid using this like a normal `Vec` in common code paths.
// That is, don't use `buf.push`, `buf.extend_from_slice`, or any other
// methods that require bounds checking or the like. This makes an enormous
// difference to performance (we may want to stop using a `Vec` entirely).
buf: Vec<u8>,
// #30888: If the inner writer panics in a call to write, we don't want to
// write the buffered data a second time in BufWriter's destructor. This
// flag tells the Drop impl if it should skip the flush.
panicked: bool,
inner: W,
}
LineWriter::write initializes a LineWriterShim on the inner BufWriter, which handles writing the given &[u8] data to its inner BufWriter’s Vec<u8> buffer:
// io/buffered/linewriter.rs
impl<W: ?Sized + Write> Write for LineWriter<W> {
fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
LineWriterShim::new(&mut self.inner).write(buf)
}
}
LineWriterShim::write maintains an invariant on BufWriter's buf: a newline character can only exist at the end of the buffered data, or not at all. So if the currently buffered data ends with a newline, the buffer is flushed first (via BufWriter::flush_buf) before new data is written.
If the new data itself contains a newline, LineWriterShim will write the data up to that newline (inclusive) straight through, and buffer the rest in BufWriter's buf.
Because of this line-buffering behavior of LineWriter, Stdout will hold the written data in memory until:
- a newline is written, or
- BufWriter's buf is full and needs to flush the currently buffered data.
LineWriter’s default buffer size is 1024, while BufWriter’s default buffer size when used alone is 8192 (or 512 on bare metal 😀):
// io/buffered/linewriter.rs
impl<W: Write> LineWriter<W> {
pub fn new(inner: W) -> LineWriter<W> {
// Lines typically aren't that long, don't use a giant buffer
LineWriter::with_capacity(1024, inner)
}
}
// io/buffered/bufwriter.rs
impl<W: Write> BufWriter<W> {
pub fn new(inner: W) -> BufWriter<W> {
BufWriter::with_capacity(DEFAULT_BUF_SIZE, inner)
}
}
So, what do you do when you want to print a partial line (text with no ending newline) to the terminal, for example when prompting for user input? You have to tell Stdout to write BufWriter's buffered data to stdout (i.e., to file descriptor 1). People usually call this flushing. In Rust, Stdout::flush has to go through several flush() calls before it eventually reaches BufWriter::flush, which does this:
// io/buffered/bufwriter.rs
impl<W: ?Sized + Write> BufWriter<W> {
pub fn get_mut(&mut self) -> &mut W {
&mut self.inner
}
}
impl<W: ?Sized + Write> Write for BufWriter<W> {
fn flush(&mut self) -> io::Result<()> {
self.flush_buf().and_then(|()| self.get_mut().flush())
}
}
What else to do besides flushing BufWriter??
We already know that flush_buf() flushes its own data buffer, but what does self.get_mut().flush() do?
Remember that in Stdout, LineWriter wraps a BufWriter around StdoutRaw, so the W in BufWriter is StdoutRaw. Let's go to StdoutRaw's Write implementation and see!
// io/stdio.rs
struct StdoutRaw(stdio::Stdout);
impl Write for StdoutRaw {
fn flush(&mut self) -> io::Result<()> {
handle_ebadf(self.0.flush(), || Ok(()))
}
}
Yet another wrapper 🥲. This time it’s std::sys::stdio::Stdout (private), not std::io::Stdout (public). Let’s see how stdio::Stdout implements flush:
// sys/stdio/unix.rs
impl io::Write for Stdout {
fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
unsafe { ManuallyDrop::new(FileDesc::from_raw_fd(STDOUT_FILENO)).write(buf) }
}
fn write_vectored(&mut self, bufs: &[IoSlice<'_>]) -> io::Result<usize> {
unsafe { ManuallyDrop::new(FileDesc::from_raw_fd(STDOUT_FILENO)).write_vectored(bufs) }
}
#[inline]
fn is_write_vectored(&self) -> bool {
true
}
#[inline]
fn flush(&mut self) -> io::Result<()> {
Ok(())
}
}
WTF??? Nothing happens in flush??? Why?
This is where I got really confused. If libc has an internal buffer for IO, wouldn't stdio::Stdout need to call a libc function that flushes that internal buffer? Does stdio::Stdout::write even buffer writes at all? Seeing that stdio::Stdout uses the FileDesc::write method, let's go and see:
// sys/fd.rs
impl FileDesc {
pub fn write(&self, buf: &[u8]) -> io::Result<usize> {
let ret = cvt(unsafe {
libc::write(
self.as_raw_fd(),
buf.as_ptr() as *const libc::c_void,
cmp::min(buf.len(), READ_LIMIT),
)
})?;
Ok(ret as usize)
}
}
So FileDesc::write calls libc::write. Nothing surprising. But the question is: Does libc::write itself initialize a buffer? Why do people say that libc has an internal buffer for IO?
Stdin
A similar story goes for std::io::Stdin, which uses a BufReader that wraps the struct StdinRaw(stdio::Stdin) for buffered reads. And guess how stdio::Stdin implements Read?
// io/stdio.rs
struct StdinRaw(stdio::Stdin);
pub struct Stdin {
inner: &'static Mutex<BufReader<StdinRaw>>,
}
// sys/stdio/unix.rs
impl io::Read for Stdin {
fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
unsafe { ManuallyDrop::new(FileDesc::from_raw_fd(STDIN_FILENO)).read(buf) }
}
}
So the Read::read and Write::write implementations for stdio::Stdin and stdio::Stdout both delegate to FileDesc. Does FileDesc::read call a libc::read function? Yes!
// sys/fd.rs
impl FileDesc {
pub fn read(&self, buf: &mut [u8]) -> io::Result<usize> {
let ret = cvt(unsafe {
libc::read(
self.as_raw_fd(),
buf.as_mut_ptr() as *mut libc::c_void,
cmp::min(buf.len(), READ_LIMIT),
)
})?;
Ok(ret as usize)
}
}
But does libc::read or libc::write initialize buffers? If not, how do functions in libc like printf, fwrite, fread, and many others have buffered IO? With these questions flying around in my head, I decided to look at glibc’s source code to find out myself 🫠. Brace yourself. It’s gonna be quite a journey.
C IO Buffering
read and write do NOT initialize any buffers
For C programs to get efficient file IO out of libc, libc must implement buffering somewhere. However, I was surprised to find no such thing in the write implementation:
#include <unistd.h>
#include <sysdep-cancel.h>
/* Write NBYTES of BUF to FD. Return the number written, or -1. */
ssize_t
__libc_write (int fd, const void *buf, size_t nbytes)
{
return SYSCALL_CANCEL (write, fd, buf, nbytes);
}
libc_hidden_def (__libc_write)
weak_alias (__libc_write, __write)
libc_hidden_weak (__write)
weak_alias (__libc_write, write)
libc_hidden_weak (write)
__libc_write only involves a system call, and is aliased to write. No user-space buffering found. What about read? Turns out it’s the same story:
#include <unistd.h>
#include <sysdep-cancel.h>
/* Read NBYTES into BUF from FD. Return the number read or -1. */
ssize_t
__libc_read (int fd, void *buf, size_t nbytes)
{
return SYSCALL_CANCEL (read, fd, buf, nbytes);
}
libc_hidden_def (__libc_read)
weak_alias (__libc_read, __read)
libc_hidden_weak (__read)
weak_alias (__libc_read, read)
libc_hidden_weak (read)
__libc_read only involves a system call, and is aliased to read. So the libc::read and libc::write functions used in the Rust code actually do not do any buffering at all! They just call the system calls directly, writing data straight to the kernel’s buffer. This means that stdio::Stdout::flush being a no-op makes sense, as buffering is done by the BufWriter already. In fact, I later found out that most implementations of std::io::Write either call their inner structs or are just no-ops. The exceptions that I know of are BufWriter and LineWriter.
So if libc’s read and write do not do any buffering, buffering has to be managed by the callers of these functions. There would be a lot of them I suppose.
File pointers initialize buffers?
Looking at fwrite and fread, I see a common function argument, FILE *fp, used extensively in the code. Might this FILE struct be where buffering is implemented? What does the FILE struct look like?
// libio/bits/types/struct_FILE.h
/* The tag name of this struct is _IO_FILE to preserve historic
C++ mangled names for functions taking FILE* arguments.
That name should not be used in new code. */
struct _IO_FILE
{
int _flags; /* High-order word is _IO_MAGIC; rest is flags. */
/* The following pointers correspond to the C++ streambuf protocol. */
char *_IO_read_ptr; /* Current read pointer */
char *_IO_read_end; /* End of get area. */
char *_IO_read_base; /* Start of putback+get area. */
char *_IO_write_base; /* Start of put area. */
char *_IO_write_ptr; /* Current put pointer. */
char *_IO_write_end; /* End of put area. */
char *_IO_buf_base; /* Start of reserve area. */
char *_IO_buf_end; /* End of reserve area. */
/* The following fields are used to support backing up and undo. */
char *_IO_save_base; /* Pointer to start of non-current get area. */
char *_IO_backup_base; /* Pointer to first valid character of backup area */
char *_IO_save_end; /* Pointer to end of non-current get area. */
struct _IO_marker *_markers;
struct _IO_FILE *_chain;
int _fileno;
int _flags2:24;
/* Fallback buffer to use when malloc fails to allocate one. */
char _short_backupbuf[1];
__off_t _old_offset; /* This used to be _offset but it's too small. */
/* 1+column number of pbase(); 0 is unknown. */
unsigned short _cur_column;
signed char _vtable_offset;
char _shortbuf[1];
_IO_lock_t *_lock;
#ifdef _IO_USE_OLD_IO_FILE
};
struct _IO_FILE_complete
{
struct _IO_FILE _file;
#endif
__off64_t _offset;
/* Wide character stream stuff. */
struct _IO_codecvt *_codecvt;
struct _IO_wide_data *_wide_data;
struct _IO_FILE *_freeres_list;
void *_freeres_buf;
struct _IO_FILE **_prevchain;
int _mode;
#if __WORDSIZE == 64
int _unused3;
#endif
__uint64_t _total_written;
#if __WORDSIZE == 32
int _unused3;
#endif
/* Make sure we don't get into trouble again. */
char _unused2[12 * sizeof (int) - 5 * sizeof (void *)];
};
// libio/bits/types/FILE.h
struct _IO_FILE;
/* The opaque type of streams. This is the definition used elsewhere. */
typedef struct _IO_FILE FILE;
Okay, that's A LOT of stuff, but we do see the byte pointers _IO_buf_base and _IO_buf_end, which point to the buffer associated with the FILE. We also see a _total_written counter. So it looks like there is a buffering mechanism here.
But before diving more, let’s start with a few simple C programs to confirm buffering does happen.
Stdout buffering behavior
// stdout.c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
void print_text() {
const char *text = "This is a sample text. ";
while (1) {
fwrite(text, 1, strlen(text), stdout); // write the text to stdout
sleep(1);
}
}
int main() {
print_text();
return 0;
}
When you compile and run this program in the terminal, you’ll see that nothing is printed there.
❯ gcc stdout.c -o stdout && ./stdout
^C
Now let’s try adding \n at the end of text:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
void print_text() {
const char *text = "This is a sample text.\n"; // added newline
while (1) {
fwrite(text, 1, strlen(text), stdout); // write the text to stdout
sleep(1);
}
}
int main() {
print_text();
return 0;
}
Now we see text being printed line by line.
❯ gcc stdout.c -o stdout && ./stdout
This is a sample text.
This is a sample text.
This is a sample text.
This is a sample text.
This is a sample text.
This is a sample text.
This is a sample text.
This is a sample text.
^C
Let's try writing a newline-terminated text on every third iteration:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
void print_text() {
const char *text = "This is a sample text. ";
const char *text_newline = "This is a sample text.\n";
int i = 0;
while (1) {
i++;
if (i % 3 == 0) {
fwrite(text_newline, 1, strlen(text_newline), stdout);
} else {
fwrite(text, 1, strlen(text), stdout);
}
sleep(1);
}
}
int main() {
print_text();
return 0;
}
You will see that a new line of text will appear every 3 seconds:
❯ gcc stdout.c -o stdout && ./stdout
This is a sample text. This is a sample text. This is a sample text.
This is a sample text. This is a sample text. This is a sample text.
This is a sample text. This is a sample text. This is a sample text.
^C
You can also flush the data written to stdout:
#include <stdio.h>
#include <string.h>
#include <unistd.h>
void print_text() {
const char *text = "This is a sample text. ";
while (1) {
fwrite(text, 1, strlen(text), stdout);
fflush(stdout);
sleep(1);
}
}
int main() {
print_text();
return 0;
}
New text appears every 1 second:
❯ gcc stdout.c -o stdout && ./stdout
This is a sample text. This is a sample text. This is a sample text. This is a sample text. This is a sample text. This is a sample text. This is a sample te
xt. This is a sample text. This is a sample text. ^C
Note that printf, which writes to stdout, has the same line-buffering behavior. You can replace the fwrite(text, 1, strlen(text), stdout) line with printf("%s", text) and the results will be the same.
Interestingly, making a read from stdin will flush all existing content from stdout for you. This is useful when prompting for user input, for example:
#include <stdio.h>
#include <string.h>
#define NAME_SIZE 3
void ask_name() {
char name[NAME_SIZE + 2]; // room for NAME_SIZE chars + '\n' + '\0'
printf("Enter your name ");
printf("(max %d characters): ", NAME_SIZE);
// ^ All stdout data still buffered at this point
fgets(name, sizeof name, stdin); // This read flushes the buffered stdout data
// remove newline if present (not if input exceeds buffer or sees EOF)
size_t len = strlen(name);
if (len > 0 && name[len - 1] == '\n') {
name[len - 1] = '\0';
}
printf("Hello, %s\n", name);
}
int main() {
ask_name();
return 0;
}
❯ gcc stdout.c -o stdout && ./stdout
Enter your name (max 3 characters): 123
Hello, 123
So fwrite does buffer its data, and it automatically flushes the buffered data when it sees a newline character. And we have fflush to send the currently buffered data to its destination (the kernel's buffer). Where in the glibc code does this buffering happen?
Buffer allocation
From looking around the codebase, I came across fileops.c and genops.c, which contain various functions operating on the FILE struct. There I discovered _IO_default_allocate, _IO_setb, _IO_doallocbuf, and _IO_file_doallocate, which look like they are associated with buffer initialization.
First, take a look at _IO_file_doallocate:
// libio/filedoalloc.c
/* Allocate a file buffer, or switch to unbuffered I/O. Streams for
TTY devices default to line buffered. */
int
_IO_file_doallocate (FILE *fp)
{
size_t size;
char *p;
struct __stat64_t64 st;
size = BUFSIZ;
if (fp->_fileno >= 0 && __builtin_expect (_IO_SYSSTAT (fp, &st), 0) >= 0)
{
if (S_ISCHR (st.st_mode))
{
/* Possibly a tty. */
if (
#ifdef DEV_TTY_P
DEV_TTY_P (&st) ||
#endif
__isatty_nostatus (fp->_fileno))
fp->_flags |= _IO_LINE_BUF;
}
#if defined _STATBUF_ST_BLKSIZE
if (st.st_blksize > 0 && st.st_blksize < BUFSIZ)
size = st.st_blksize;
#endif
}
p = malloc (size);
if (__glibc_unlikely (p == NULL))
return EOF;
_IO_setb (fp, p, p + size, 1);
return 1;
}
libc_hidden_def (_IO_file_doallocate)
// libio/genops.c
void
_IO_setb (FILE *f, char *b, char *eb, int a)
{
if (f->_IO_buf_base && !(f->_flags & _IO_USER_BUF))
free (f->_IO_buf_base);
f->_IO_buf_base = b;
f->_IO_buf_end = eb;
if (a)
f->_flags &= ~_IO_USER_BUF;
else
f->_flags |= _IO_USER_BUF;
}
libc_hidden_def (_IO_setb)
Looks like _IO_file_doallocate is where, given a FILE *fp, the buffer associated with this file is allocated (BUFSIZ bytes by default) and attached to the FILE's _IO_buf_base and _IO_buf_end pointers. Makes sense. But how would a user-facing function like fwrite trigger this function?
fwrite implementation
fwrite is defined in iofwrite.c:
size_t
_IO_fwrite (const void *buf, size_t size, size_t count, FILE *fp)
{
size_t request = size * count;
size_t written = 0;
CHECK_FILE (fp, 0);
if (request == 0)
return 0;
_IO_acquire_lock (fp);
if (_IO_vtable_offset (fp) != 0 || _IO_fwide (fp, -1) == -1)
{
/* Compute actually written bytes plus pending buffer
contents. */
uint64_t original_total_written
= fp->_total_written + (fp->_IO_write_ptr - fp->_IO_write_base);
written = _IO_sputn (fp, (const char *) buf, request);
if (written == EOF)
{
/* An error happened and we need to find the appropriate return
value. There 3 possible scenarios:
1. If the number of bytes written is between 0..[buffer content],
we need to return 0 because none of the bytes from this
request have been written;
2. If the number of bytes written is between
[buffer content]+1..request-1, that means we managed to write
data requested in this fwrite call;
3. We might have written all the requested data and got an error
anyway. We can't return success, which means we still have to
return less than request. */
if (fp->_total_written > original_total_written)
{
written = fp->_total_written - original_total_written;
/* If everything was reported as written and somehow an
error occurred afterwards, avoid reporting success. */
if (written == request)
--written;
}
else
/* Only already-pending buffer contents was written. */
written = 0;
}
}
_IO_release_lock (fp);
/* We have written all of the input in case the return value indicates
this. */
if (written == request)
return count;
else
return written / size;
}
libc_hidden_def (_IO_fwrite)
# include <stdio.h>
weak_alias (_IO_fwrite, fwrite)
libc_hidden_weak (fwrite)
# ifndef _IO_MTSAFE_IO
weak_alias (_IO_fwrite, fwrite_unlocked)
libc_hidden_weak (fwrite_unlocked)
# endif
_IO_fwrite is aliased as fwrite, probably to hide the internal implementation. I don't really know what _IO_vtable_offset and _IO_fwide are for, but essentially we lock the file pointer, write all the bytes from buf, handle errors, release the lock, and return the number of items written. So the main work in this function is this line:
written = _IO_sputn (fp, (const char *) buf, request);
V-table look-up
_IO_sputn is a macro that eventually expands to a v-table lookup that finds the actual function. _IO_JUMPS_FUNC returns the v-table for a given FILE*, and the entry we look up in that v-table is __xsputn.
// libio/libioP.h
#define JUMP_FIELD(TYPE, NAME) TYPE NAME
struct _IO_jump_t
{
JUMP_FIELD(size_t, __dummy);
JUMP_FIELD(size_t, __dummy2);
JUMP_FIELD(_IO_finish_t, __finish);
JUMP_FIELD(_IO_overflow_t, __overflow);
JUMP_FIELD(_IO_underflow_t, __underflow);
JUMP_FIELD(_IO_underflow_t, __uflow);
JUMP_FIELD(_IO_pbackfail_t, __pbackfail);
/* showmany */
JUMP_FIELD(_IO_xsputn_t, __xsputn);
JUMP_FIELD(_IO_xsgetn_t, __xsgetn);
JUMP_FIELD(_IO_seekoff_t, __seekoff);
JUMP_FIELD(_IO_seekpos_t, __seekpos);
JUMP_FIELD(_IO_setbuf_t, __setbuf);
JUMP_FIELD(_IO_sync_t, __sync);
JUMP_FIELD(_IO_doallocate_t, __doallocate);
JUMP_FIELD(_IO_read_t, __read);
JUMP_FIELD(_IO_write_t, __write);
JUMP_FIELD(_IO_seek_t, __seek);
JUMP_FIELD(_IO_close_t, __close);
JUMP_FIELD(_IO_stat_t, __stat);
JUMP_FIELD(_IO_showmanyc_t, __showmanyc);
JUMP_FIELD(_IO_imbue_t, __imbue);
};
/* Perform vtable pointer validation. If validation fails, terminate
the process. */
static inline const struct _IO_jump_t *
IO_validate_vtable (const struct _IO_jump_t *vtable)
{
uintptr_t ptr = (uintptr_t) vtable;
uintptr_t offset = ptr - (uintptr_t) &__io_vtables;
if (__glibc_unlikely (offset >= IO_VTABLES_LEN))
/* The vtable pointer is not in the expected section. Use the
slow path, which will terminate the process if necessary. */
_IO_vtable_check ();
return vtable;
}
# define _IO_JUMPS_FUNC(THIS) \
(IO_validate_vtable \
(*(struct _IO_jump_t **) ((void *) &_IO_JUMPS_FILE_plus (THIS) \
+ (THIS)->_vtable_offset)))
#define JUMP2(FUNC, THIS, X1, X2) (_IO_JUMPS_FUNC(THIS)->FUNC) (THIS, X1, X2)
#define _IO_XSPUTN(FP, DATA, N) JUMP2 (__xsputn, FP, DATA, N)
#define _IO_sputn(__fp, __s, __n) _IO_XSPUTN (__fp, __s, __n)
Now we need to know how _IO_JUMPS_FUNC gets the v-table for a given file pointer:
// libio/libioP.h
struct _IO_FILE_plus
{
FILE file;
const struct _IO_jump_t *vtable;
};
/* Type of MEMBER in struct type TYPE. */
#define _IO_MEMBER_TYPE(TYPE, MEMBER) __typeof__ (((TYPE){}).MEMBER)
/* Essentially ((TYPE *) THIS)->MEMBER, but avoiding the aliasing
violation in case THIS has a different pointer type. */
#define _IO_CAST_FIELD_ACCESS(THIS, TYPE, MEMBER) \
(*(_IO_MEMBER_TYPE (TYPE, MEMBER) *)(((char *) (THIS)) \
+ offsetof(TYPE, MEMBER)))
#define _IO_JUMPS_FILE_plus(THIS) \
_IO_CAST_FIELD_ACCESS ((THIS), struct _IO_FILE_plus, vtable)
Hmm… So it just casts a FILE* to an _IO_FILE_plus* and gets its vtable. This means the FILE *fp passed as an argument to _IO_fwrite has to sit right next to a const struct _IO_jump_t * in memory. In other words, any FILE* that is handed to the user must have been initialized as part of an _IO_FILE_plus struct. To confirm this, let's find out how we get the global stdout used in our C example.
How is a FILE associated with a v-table?
// libio/stdio.c
#include "libioP.h"
#include "stdio.h"
#undef stdin
#undef stdout
#undef stderr
FILE *stdin = (FILE *) &_IO_2_1_stdin_;
FILE *stdout = (FILE *) &_IO_2_1_stdout_;
FILE *stderr = (FILE *) &_IO_2_1_stderr_;
// libio/stdfiles.c
#include "libioP.h"
#ifdef _IO_MTSAFE_IO
# define DEF_STDFILE(NAME, FD, CHAIN, FLAGS) \
static _IO_lock_t _IO_stdfile_##FD##_lock = _IO_lock_initializer; \
static struct _IO_wide_data _IO_wide_data_##FD \
= { ._wide_vtable = &_IO_wfile_jumps }; \
struct _IO_FILE_plus NAME \
= {FILEBUF_LITERAL(CHAIN, FLAGS, FD, &_IO_wide_data_##FD), \
&_IO_file_jumps};
#else
# define DEF_STDFILE(NAME, FD, CHAIN, FLAGS) \
static struct _IO_wide_data _IO_wide_data_##FD \
= { ._wide_vtable = &_IO_wfile_jumps }; \
struct _IO_FILE_plus NAME \
= {FILEBUF_LITERAL(CHAIN, FLAGS, FD, &_IO_wide_data_##FD), \
&_IO_file_jumps};
#endif
DEF_STDFILE(_IO_2_1_stdin_, 0, 0, _IO_NO_WRITES);
DEF_STDFILE(_IO_2_1_stdout_, 1, &_IO_2_1_stdin_, _IO_NO_READS);
DEF_STDFILE(_IO_2_1_stderr_, 2, &_IO_2_1_stdout_, _IO_NO_READS+_IO_UNBUFFERED);
From stdfiles.c we can see that for the globals stdin, stdout, and stderr, their respective _IO_2_1_stdin_, _IO_2_1_stdout_, and _IO_2_1_stderr_ are initialized as _IO_FILE_plus structs, with file descriptor 0 for stdin, 1 for stdout, and 2 for stderr. Interestingly, the standard streams are chained together (via the struct _IO_FILE *_chain member of FILE), which may be related to the flush-stdout-on-stdin-read behavior we saw earlier. Also, stderr has the _IO_UNBUFFERED flag, which explains stderr's unbuffered IO behavior, unlike stdout.
You might guess that _IO_file_jumps is the v-table assigned to all three file structs. In fact, it is:
// libio/libioP.h
#define JUMP_INIT(NAME, VALUE) VALUE
#define JUMP_INIT_DUMMY JUMP_INIT(dummy, 0), JUMP_INIT (dummy2, 0)
enum
{
IO_STR_JUMPS = 0,
IO_WSTR_JUMPS = 1,
IO_FILE_JUMPS = 2,
IO_FILE_JUMPS_MMAP = 3,
IO_FILE_JUMPS_MAYBE_MMAP = 4,
IO_WFILE_JUMPS = 5,
IO_WFILE_JUMPS_MMAP = 6,
IO_WFILE_JUMPS_MAYBE_MMAP = 7,
IO_COOKIE_JUMPS = 8,
IO_PROC_JUMPS = 9,
IO_MEM_JUMPS = 10,
IO_WMEM_JUMPS = 11,
IO_PRINTF_BUFFER_AS_FILE_JUMPS = 12,
IO_WPRINTF_BUFFER_AS_FILE_JUMPS = 13,
#if SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_1)
IO_OLD_FILE_JUMPS = 14,
IO_OLD_PROC_JUMPS = 15,
IO_OLD_COOKIED_JUMPS = 16,
IO_VTABLES_NUM = IO_OLD_COOKIED_JUMPS + 1,
#elif SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_2)
IO_OLD_COOKIED_JUMPS = 14,
IO_VTABLES_NUM = IO_OLD_COOKIED_JUMPS + 1,
#else
IO_VTABLES_NUM = IO_WPRINTF_BUFFER_AS_FILE_JUMPS + 1
#endif
};
#define _IO_file_jumps (__io_vtables[IO_FILE_JUMPS])
// libio/vtables.c
#include <libioP.h>
const struct _IO_jump_t __io_vtables[] attribute_relro =
{
// ...
/* _IO_file_jumps */
[IO_FILE_JUMPS] = {
JUMP_INIT_DUMMY,
JUMP_INIT (finish, _IO_file_finish),
JUMP_INIT (overflow, _IO_file_overflow),
JUMP_INIT (underflow, _IO_file_underflow),
JUMP_INIT (uflow, _IO_default_uflow),
JUMP_INIT (pbackfail, _IO_default_pbackfail),
JUMP_INIT (xsputn, _IO_file_xsputn),
JUMP_INIT (xsgetn, _IO_file_xsgetn),
JUMP_INIT (seekoff, _IO_new_file_seekoff),
JUMP_INIT (seekpos, _IO_default_seekpos),
JUMP_INIT (setbuf, _IO_new_file_setbuf),
JUMP_INIT (sync, _IO_new_file_sync),
JUMP_INIT (doallocate, _IO_file_doallocate),
JUMP_INIT (read, _IO_file_read),
JUMP_INIT (write, _IO_new_file_write),
JUMP_INIT (seek, _IO_file_seek),
JUMP_INIT (close, _IO_file_close),
JUMP_INIT (stat, _IO_file_stat),
JUMP_INIT (showmanyc, _IO_default_showmanyc),
JUMP_INIT (imbue, _IO_default_imbue)
},
// ...
};
__io_vtables holds all functions for different kinds of IO. I guess this is how people implement polymorphism for FILE in C. Every FILE pointer is created so that it sits next to a pointer to a corresponding v-table in memory.
Coming back to xsputn: the position of the xsputn entry in __io_vtables[IO_FILE_JUMPS] matches the position of the __xsputn field in the _IO_jump_t definition. So we can be confident that the _IO_sputn call for stdout dispatches to _IO_file_xsputn. Let's see how _IO_file_xsputn is implemented.
Buffering implementation for write
// libio/fileops.c
size_t
_IO_new_file_xsputn (FILE *f, const void *data, size_t n)
{
const char *s = (const char *) data;
size_t to_do = n;
int must_flush = 0;
size_t count = 0;
if (n <= 0)
return 0;
/* This is an optimized implementation.
If the amount to be written straddles a block boundary
(or the filebuf is unbuffered), use sys_write directly. */
/* First figure out how much space is available in the buffer. */
if ((f->_flags & _IO_LINE_BUF) && (f->_flags & _IO_CURRENTLY_PUTTING))
{
count = f->_IO_buf_end - f->_IO_write_ptr;
if (count >= n)
{
const char *p;
for (p = s + n; p > s; )
{
if (*--p == '\n')
{
count = p - s + 1;
must_flush = 1;
break;
}
}
}
}
else if (f->_IO_write_end > f->_IO_write_ptr)
count = f->_IO_write_end - f->_IO_write_ptr; /* Space available. */
/* Then fill the buffer. */
if (count > 0)
{
if (count > to_do)
count = to_do;
f->_IO_write_ptr = __mempcpy (f->_IO_write_ptr, s, count);
s += count;
to_do -= count;
}
if (to_do + must_flush > 0)
{
size_t block_size, do_write;
/* Next flush the (full) buffer. */
if (_IO_OVERFLOW (f, EOF) == EOF)
/* If nothing else has to be written we must not signal the
caller that everything has been written. */
return to_do == 0 ? EOF : n - to_do;
/* Try to maintain alignment: write a whole number of blocks. */
block_size = f->_IO_buf_end - f->_IO_buf_base;
do_write = to_do - (block_size >= 128 ? to_do % block_size : 0);
if (do_write)
{
count = new_do_write (f, s, do_write);
to_do -= count;
if (count < do_write)
return n - to_do;
}
/* Now write out the remainder. Normally, this will fit in the
buffer, but it's somewhat messier for line-buffered files,
so we let _IO_default_xsputn handle the general case. */
if (to_do)
to_do -= _IO_default_xsputn (f, s+do_write, to_do);
}
return n - to_do;
}
libc_hidden_ver (_IO_new_file_xsputn, _IO_file_xsputn)
// libio/libioP.h
#define _IO_OVERFLOW(FP, CH) JUMP1 (__overflow, FP, CH)
// libio/genops.c
size_t
_IO_default_xsputn (FILE *f, const void *data, size_t n)
{
const char *s = (char *) data;
size_t more = n;
if (more <= 0)
return 0;
for (;;)
{
/* Space available. */
if (f->_IO_write_ptr < f->_IO_write_end)
{
size_t count = f->_IO_write_end - f->_IO_write_ptr;
if (count > more)
count = more;
if (count > 20)
{
f->_IO_write_ptr = __mempcpy (f->_IO_write_ptr, s, count);
s += count;
}
else if (count)
{
char *p = f->_IO_write_ptr;
ssize_t i;
for (i = count; --i >= 0; )
*p++ = *s++;
f->_IO_write_ptr = p;
}
more -= count;
}
if (more == 0 || _IO_OVERFLOW (f, (unsigned char) *s++) == EOF)
break;
more--;
}
return n - more;
}
libc_hidden_def (_IO_default_xsputn)
We can see how _IO_new_file_xsputn implements buffering (or line buffering when _IO_LINE_BUF and _IO_CURRENTLY_PUTTING are set). But neither _IO_new_file_xsputn nor _IO_default_xsputn implements buffer initialization or flushing. So it can only be _IO_OVERFLOW, which, through a v-table lookup similar to _IO_sputn, calls _IO_file_overflow.
Finally, buffer initialization found??
// libio/fileops.c
int
_IO_new_file_overflow (FILE *f, int ch)
{
if (f->_flags & _IO_NO_WRITES) /* SET ERROR */
{
f->_flags |= _IO_ERR_SEEN;
__set_errno (EBADF);
return EOF;
}
/* If currently reading or no buffer allocated. */
if ((f->_flags & _IO_CURRENTLY_PUTTING) == 0 || f->_IO_write_base == NULL)
{
/* Allocate a buffer if needed. */
if (f->_IO_write_base == NULL)
{
_IO_doallocbuf (f);
_IO_setg (f, f->_IO_buf_base, f->_IO_buf_base, f->_IO_buf_base);
}
/* Otherwise must be currently reading.
If _IO_read_ptr (and hence also _IO_read_end) is at the buffer end,
logically slide the buffer forwards one block (by setting the
read pointers to all point at the beginning of the block). This
makes room for subsequent output.
Otherwise, set the read pointers to _IO_read_end (leaving that
alone, so it can continue to correspond to the external position). */
if (__glibc_unlikely (_IO_in_backup (f)))
{
size_t nbackup = f->_IO_read_end - f->_IO_read_ptr;
_IO_free_backup_area (f);
f->_IO_read_base -= MIN (nbackup,
f->_IO_read_base - f->_IO_buf_base);
f->_IO_read_ptr = f->_IO_read_base;
}
if (f->_IO_read_ptr == f->_IO_buf_end)
f->_IO_read_end = f->_IO_read_ptr = f->_IO_buf_base;
f->_IO_write_ptr = f->_IO_read_ptr;
f->_IO_write_base = f->_IO_write_ptr;
f->_IO_write_end = f->_IO_buf_end;
f->_IO_read_base = f->_IO_read_ptr = f->_IO_read_end;
f->_flags |= _IO_CURRENTLY_PUTTING;
if (f->_mode <= 0 && f->_flags & (_IO_LINE_BUF | _IO_UNBUFFERED))
f->_IO_write_end = f->_IO_write_ptr;
}
if (ch == EOF)
return _IO_do_write (f, f->_IO_write_base,
f->_IO_write_ptr - f->_IO_write_base);
if (f->_IO_write_ptr == f->_IO_buf_end ) /* Buffer is really full */
if (_IO_do_flush (f) == EOF)
return EOF;
*f->_IO_write_ptr++ = ch;
if ((f->_flags & _IO_UNBUFFERED)
|| ((f->_flags & _IO_LINE_BUF) && ch == '\n'))
if (_IO_do_write (f, f->_IO_write_base,
f->_IO_write_ptr - f->_IO_write_base) == EOF)
return EOF;
return (unsigned char) ch;
}
libc_hidden_ver (_IO_new_file_overflow, _IO_file_overflow)
_IO_new_file_overflow does indeed handle buffer allocation, as well as flushing. The two prominent calls are _IO_doallocbuf and _IO_do_flush:
// libio/genops.c
void
_IO_doallocbuf (FILE *fp)
{
if (fp->_IO_buf_base)
return;
if (!(fp->_flags & _IO_UNBUFFERED) || fp->_mode > 0)
if (_IO_DOALLOCATE (fp) != EOF)
return;
_IO_setb (fp, fp->_shortbuf, fp->_shortbuf+1, 0);
}
libc_hidden_def (_IO_doallocbuf)
// libio/libioP.h
#define _IO_DOALLOCATE(FP) JUMP0 (__doallocate, FP)
Now, where does _IO_DOALLOCATE lead? That's right: _IO_file_doallocate, which we glimpsed earlier. Let's see how it's implemented:
// libio/filedoalloc.c
/* Allocate a file buffer, or switch to unbuffered I/O. Streams for
TTY devices default to line buffered. */
int
_IO_file_doallocate (FILE *fp)
{
size_t size;
char *p;
struct __stat64_t64 st;
size = BUFSIZ;
if (fp->_fileno >= 0 && __builtin_expect (_IO_SYSSTAT (fp, &st), 0) >= 0)
{
if (S_ISCHR (st.st_mode))
{
/* Possibly a tty. */
if (
#ifdef DEV_TTY_P
DEV_TTY_P (&st) ||
#endif
__isatty_nostatus (fp->_fileno))
fp->_flags |= _IO_LINE_BUF;
}
#if defined _STATBUF_ST_BLKSIZE
if (st.st_blksize > 0 && st.st_blksize < BUFSIZ)
size = st.st_blksize;
#endif
}
p = malloc (size);
if (__glibc_unlikely (p == NULL))
return EOF;
_IO_setb (fp, p, p + size, 1);
return 1;
}
libc_hidden_def (_IO_file_doallocate)
// libio/stdio.h
/* Default buffer size. */
#define BUFSIZ 8192
// libio/genops.c
void
_IO_setb (FILE *f, char *b, char *eb, int a)
{
if (f->_IO_buf_base && !(f->_flags & _IO_USER_BUF))
free (f->_IO_buf_base);
f->_IO_buf_base = b;
f->_IO_buf_end = eb;
if (a)
f->_flags &= ~_IO_USER_BUF;
else
f->_flags |= _IO_USER_BUF;
}
libc_hidden_def (_IO_setb)
Finally, a malloc 🙂. The size is BUFSIZ (8192) by default. After allocating a chunk of memory at p, the file pointer’s _IO_buf_base and _IO_buf_end are set accordingly.
We've come full circle 🎉. We have connected the dots: how stdout is initialized with its v-table, how an fwrite call eventually reaches _IO_file_doallocate through that v-table, and how _IO_file_doallocate initializes the buffer for a given FILE* if it hasn't been allocated yet.
Conclusion
Through this endeavor, I can now confirm that Rust manages its own buffer and only uses libc::write and libc::read for unbuffered file IO. Buffering is handled by the BufWriter struct, while line buffering is layered on top by LineWriter. The buffer initialization is eager: as soon as you get yourself a std::io::Stdout handle, a 1024-byte buffer is initialized for you. In C, every FILE gets an 8192-byte (BUFSIZ) buffer by default (smaller if the device reports a smaller block size, and line-buffered when the stream is a tty, like when stdout is connected to a terminal), but the buffer is allocated lazily on the first read/write operation.
We can also see how polymorphism is implemented differently in Rust and C. Rust expresses it through the Read and Write traits, whose implementations here are resolved at compile time via generics, while glibc builds an array of v-tables and associates each FILE with one, dispatching through it at runtime. I got a taste of how Rust and C organize their code in large projects like std and glibc. Personally, I found the Rust code much easier to navigate, but I could be wrong though 😀.