c++ linux fstream file-pointer filebuf

std::filebuf passed to std::ifstream not always called


The Goal

Have unbuffered I/O and a disabled kernel page cache for a legacy C++11 program. This feature must be available on demand (through an executable parameter). The idea is to reduce the memory overhead of I/O operations, regardless of performance. I am not sure this is the right way to achieve this, though...

My attempt

The code base is quite big, with massive use of std::ifstream and std::ofstream spread across different binaries and libraries. My goal is therefore to implement a class deriving from std::filebuf that relies on C I/O facilities (FILE *, open() so I can pass the O_DIRECT flag, etc.), and to pass it to a std::ifstream object (inputs only for the moment) using the inherited method std::basic_streambuf<CharT,Traits>* std::basic_ios<CharT,Traits>::rdbuf(std::basic_streambuf<CharT,Traits>*).

The issue

The issue is that the std::ifstream object seems to have in fact two internal buffers. See the code to understand my experiment (there might still be some obvious mistakes).

My filebuf

// FileBuf.h
#include <cstdio>   // FILE
#include <fstream>  // std::filebuf
#include <string>

class FileBuf : public std::filebuf {
public:
    FileBuf();
    virtual ~FileBuf();
    virtual std::filebuf* open(const char* filename,  std::ios_base::openmode mode);
    virtual std::filebuf* open(const std::string filename, std::ios_base::openmode mode);
    virtual bool is_open() const;
    virtual std::filebuf* close();
    virtual std::streambuf* setbuf(char_type* s, std::streamsize n);
    virtual int_type overflow(int_type c = traits_type::eof());
    virtual int_type underflow();
    virtual int sync();
private:
    int _fd;
    FILE * _fp;
    char _buff[1]; // minimal size
};
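// FileBuf.cpp
#include <fcntl.h>     // ::open, O_RDONLY
#include <sys/stat.h>  // S_IRUSR
#include <iostream>
#include "FileBuf.h"

FileBuf::FileBuf()
: std::filebuf(), _fd(-1), _fp(NULL) // -1 means "no descriptor" (0 would be stdin)
{}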

FileBuf::~FileBuf() {
  close(); // RAII
}

std::filebuf* FileBuf::open(const char* filename,  std::ios_base::openmode mode) {
  std::cout << "open(const char*, ..): filename=" << filename << ", mode=" << mode << std::endl;
  // not finished, need to handle all modes
  int flags = O_RDONLY;
  mode_t fmode = S_IRUSR;
  std::string smode = "r";
  _fd = ::open(filename, flags, fmode);
  _fp = ::fdopen(_fd, smode.c_str());
  return _fp != NULL ? this : nullptr;
}

std::filebuf* FileBuf::open(const std::string filename, std::ios_base::openmode mode) {
  std::cout << "open(const std::string, ..): filename=" << filename << ", mode=" << mode << std::endl;
  return open(filename.c_str(), mode);
}

std::streambuf* FileBuf::setbuf(char_type* s, std::streamsize n) {
  return this;
}

bool FileBuf::is_open() const {
  return (_fp != NULL);
}

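std::filebuf* FileBuf::close() {
  std::cout << "close()" << std::endl;
  if (_fp) {
    // fclose() also closes the descriptor wrapped by fdopen().
    const int rc = std::fclose(_fp);
    // Reset the handles so the destructor does not close them a second time.
    _fp = NULL;
    _fd = -1;
    return (rc == 0) ? this : nullptr;
  }
  return nullptr;
}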

FileBuf::int_type FileBuf::overflow(int_type c) {
  std::cout << "overflow()" << std::endl;
  if (traits_type::eq_int_type(c, traits_type::eof())) {
    return (sync() == 0) ? traits_type::not_eof(c) : traits_type::eof();
  } else {
    return ((std::fputc(c, _fp) != EOF) ? traits_type::not_eof(c) : traits_type::eof());
  }
}

FileBuf::int_type FileBuf::underflow()
{
  std::cout << "underflow(): _fp=" << _fp << std::endl;
  if (gptr() == NULL || gptr() >= egptr()) {
    int gotted = fgetc(_fp);
    if (gotted == EOF) {
      return traits_type::eof();
    } else {
      *_buff = gotted;
      setg(_buff, _buff, _buff + 1);
      return traits_type::to_int_type(*_buff);
    }
  } else {
    return traits_type::to_int_type(*_buff);
  }
}

int FileBuf::sync()
{
  std::cout << "sync()" << std::endl;
  return (std::fflush(_fp) == 0) ? 0 : -1;
}

Client code

std::string buff(1024, '\0');
std::ifstream ifs;
FileBuf fileBuf;

ifs.std::istream::rdbuf(&fileBuf); // file buf passed here

std::cout << "rdbuf()=" << static_cast<void*>(ifs.rdbuf()) << ", istream.rdbuf()=" << static_cast<void*>(ifs.std::istream::rdbuf()) << ", &fileBuf=" << static_cast<void*>(&fileBuf) << std::endl;
ifs.open("data/test1/delta");
ifs.read(&buff[0], 1024);

The output

rdbuf()=0x7fffffffdb10, istream.rdbuf()=0x7fffffffd9f0, &fileBuf=0x7fffffffd9f0
underflow(): _fp=0
// !! SEGFAULT !!

As the output shows, the two flavors of rdbuf() do not refer to the same internal buffer, and FileBuf::open is never called, although it should be according to the specification of std::basic_ifstream<CharT,Traits>::open:

Effectively calls rdbuf()->open(filename, mode | ios_base::in)

I understand what is happening: the internal buffer object returned by std::basic_ifstream::rdbuf is being called instead of the one from std::basic_ios<CharT,Traits>::rdbuf, but still, I have no clue how to get the behavior I want.
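The split between the two buffers can be reproduced without any custom class. The sketch below is a minimal illustration, assuming a standard-conforming library; buffers_differ and read_word_via_associated are hypothetical helper names, and a std::stringbuf stands in for the custom FileBuf.

```cpp
#include <cassert>
#include <fstream>
#include <sstream>
#include <string>

// True when the stream's internal filebuf is not the buffer it reads from.
bool buffers_differ(std::ifstream& ifs) {
    std::streambuf* internal = ifs.rdbuf();                           // basic_ifstream::rdbuf()
    std::streambuf* associated = static_cast<std::ios&>(ifs).rdbuf(); // basic_ios::rdbuf()
    return internal != associated;
}

// Reads one word through a replacement buffer; the internal filebuf is never opened.
std::string read_word_via_associated(const std::string& text) {
    std::stringbuf sb(text);                 // stands in for a custom streambuf
    std::ifstream ifs;                       // declared after sb so sb outlives it
    static_cast<std::ios&>(ifs).rdbuf(&sb);  // replaces only the basic_ios pointer
    assert(buffers_differ(ifs));             // two distinct buffers now
    assert(!ifs.is_open());                  // is_open() still asks the internal filebuf
    std::string word;
    ifs >> word;                             // formatted input is served by sb
    return word;
}
```

Calling read_word_via_associated("hello world") should yield "hello" while is_open() stays false, which mirrors the situation above: input is served by the replacement buffer, but open() never reaches it.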

I'd like to avoid, at all costs, replacing every std::ifstream with a custom implementation, since that would mean changing the type in all existing declarations.

NOTE: I am compiling with gcc and libstdc++.


Solution

  • std::ifstream always owns and uses its own internal std::filebuf. That std::filebuf is separate from the stream buffer pointer stored in std::basic_ios. open(), close() and is_open() always act on the internal one, so replacing the std::basic_ios pointer never causes your FileBuf::open to be called; reads then go to a FileBuf that was never opened. See also Difference between "internal" vs "associated" stream buffer.

    What you can do instead is override the implementation of the file input/output operations in libstdc++ with your own. Take the original implementation from libstdc++-v3/config/io/basic_file_stdio.cc and patch it with the custom behavior. The dynamic linker will prefer the symbol defined in your executable over the one in the shared library. For example:

    #include <iostream>
    #include <fstream>
    #include <bits/basic_file.h>
    #include <cstdio>
    #include <cstring>
    #include <fcntl.h>
    #include <errno.h>
    #include <unistd.h>
    
    // code copied from  libstdc++-v3/config/io/basic_file_stdio.cc
    namespace {
      // Map ios_base::openmode flags to a string for use in fopen().
      // Table of valid combinations as given in [lib.filebuf.members]/2.
      static const char*
      fopen_mode(std::ios_base::openmode mode)
      {
        enum
          {
            in     = std::ios_base::in,
            out    = std::ios_base::out,
            trunc  = std::ios_base::trunc,
            app    = std::ios_base::app,
            binary = std::ios_base::binary
          };
        // _GLIBCXX_RESOLVE_LIB_DEFECTS
        // 596. 27.8.1.3 Table 112 omits "a+" and "a+b" modes.
        switch (mode & (in|out|trunc|app|binary))
          {
          case (   out                 ): return "w";
          case (   out      |app       ): return "a";
          case (             app       ): return "a";
          case (   out|trunc           ): return "w";
          case (in                     ): return "r";
          case (in|out                 ): return "r+";
          case (in|out|trunc           ): return "w+";
          case (in|out      |app       ): return "a+";
          case (in          |app       ): return "a+";
          case (   out          |binary): return "wb";
          case (   out      |app|binary): return "ab";
          case (             app|binary): return "ab";
          case (   out|trunc    |binary): return "wb";
          case (in              |binary): return "rb";
          case (in|out          |binary): return "r+b";
          case (in|out|trunc    |binary): return "w+b";
          case (in|out      |app|binary): return "a+b";
          case (in          |app|binary): return "a+b";
          default: return 0; // invalid
          }
      }
    }
    
    namespace std
    {
      __basic_file<char>*
      __basic_file<char>::open(const char* __name, ios_base::openmode __mode,
                               int /*__prot*/)
      {
        __basic_file* __ret = NULL;
        const char* __c_mode = fopen_mode(__mode);
        if (__c_mode && !this->is_open())
          {
            // Added for illustration: this is where the custom behavior
            // (e.g. opening with O_DIRECT) would go.
            const char *str = "TODO: set O_DIRECT here\n";
            write(STDOUT_FILENO, str, strlen(str));
    #ifdef _GLIBCXX_USE_LFS
            if ((_M_cfile = fopen64(__name, __c_mode)))
    #else
            if ((_M_cfile = fopen(__name, __c_mode)))
    #endif
              {
                _M_cfile_created = true;
                __ret = this;
              }
          }
        return __ret;
      }
    }
    
    int main() {
        std::string buff(1024, '\0');
        std::ifstream ifs;
        ifs.open("/tmp/1.cpp");
        ifs.read(&buff[0], 1024);
    }
    

    The program compiles and prints the following on my system (the file is opened successfully):

    TODO: set O_DIRECT here
    

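    You will have to replace fopen with open + fdopen so that you can pass O_DIRECT. Also review __basic_file::close(): fclose() on a stream obtained from fdopen() already closes the underlying descriptor, so an explicit close(fileno(..)) is only needed if you keep the descriptor outside the FILE*.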
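
As a rough sketch of the open + fdopen replacement mentioned above, the hypothetical helper below (open_read_maybe_direct is not a libstdc++ name) opens a file read-only, optionally with O_DIRECT, and wraps the descriptor in a FILE*. It assumes Linux with glibc. O_DIRECT generally imposes alignment requirements on the buffer address, transfer size and file offset, which this sketch does not handle, so actual reads through the returned stream may still fail unless the buffers are aligned.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE 1 // O_DIRECT is a Linux extension; g++ normally defines this already
#endif
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: opens `path` read-only and wraps the descriptor in a FILE*.
// When `direct` is true, O_DIRECT is requested; some filesystems (tmpfs, for
// instance) reject it with EINVAL, in which case this sketch falls back to a
// regular open. Returns nullptr on failure.
std::FILE* open_read_maybe_direct(const char* path, bool direct) {
    assert(path != nullptr);
    int flags = O_RDONLY;
    if (direct) {
        flags |= O_DIRECT;
    }
    int fd = ::open(path, flags);
    if (fd < 0 && direct && errno == EINVAL) {
        fd = ::open(path, O_RDONLY); // assumption: falling back is acceptable
    }
    if (fd < 0) {
        return nullptr;
    }
    std::FILE* fp = ::fdopen(fd, "r");
    if (fp == nullptr) {
        ::close(fd); // fdopen failed, so the descriptor is still ours to close
    }
    return fp;
}
```

In the patched __basic_file<char>::open, a call like this would take the place of fopen, with the direct flag driven by the executable parameter; only the read-only case is covered here, so the other open modes would still need their own flag mapping.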