Tags: c++, large-data, fmt

How to use {fmt} with large data


I'm starting to play with {fmt} and wrote a little program to see how it handles large containers. It seems that fmt::print() (which ultimately sends output to stdout) internally first composes the entire result as a string. In the test program below, where I format a 10,000,000-element vector<char> using a format string that produces about 100 bytes per entry, the process amasses the full 100 * 10,000,000 = 1 GB of RAM before starting to dump the result to stdout. Although you can't tell from the output of my test program, almost all of the 1.7 seconds it took to format and output the result is spent in the formatting, not the outputting. (If you don't redirect to /dev/null, there's a long pause before anything starts printing to stdout.) This is not good behavior if you're trying to build pipelining tools.

Q1. I do see some references in the docs to fmt::format_to(). Can that somehow be used to start streaming and discarding the result before the formatting is complete and thereby avoid the in-core composition of the full result?

Q2. Continuing along this line of exploration, instead of passing a container, is there a way I can pass, say, two iterators (that perhaps point at the beginning and ending of a very large file) and pump that data through {fmt} for processing (and thereby avoid having to first read the entire file into memory)?

#include <cstdio>
#include <string>
#include <vector>
#include "fmt/format.h"
#include "fmt/ranges.h"
#include <time.h>

using namespace std;

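// Current time from CLOCK_MONOTONIC_RAW, in nanoseconds.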
inline long long
clock_monotonic_raw() {
    struct timespec ct;
    clock_gettime(CLOCK_MONOTONIC_RAW, &ct);
    return ct.tv_sec * 1000000000LL + ct.tv_nsec;
}

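// Seconds elapsed since the first call to dt().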
inline double
dt() {
    static long long t0 = 0;
    if (t0 == 0) {
        t0 = clock_monotonic_raw();
        return 0.0;
    }
    long long t1 = clock_monotonic_raw();
    return (t1 - t0) / 1.0e9;
}

int main(int argc, char** argv) {
    fprintf(stderr, "%10.6f: ENTRY\n", dt());
    vector<char> v;
    for (int i = 0; i < 10'000'000; ++i)
        v.push_back('A' + i % 26);
    string pad(98, ' ');
    fprintf(stderr, "%10.6f: INIT\n", dt());
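    // Each output line is 98 spaces of padding plus one character and a
    // newline, i.e. ~100 bytes per element (~1 GB of formatted output).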
    fmt::print(pad + "{}\n", fmt::join(v, "\n" + pad));
    fprintf(stderr, "%10.6f: DONE\n", dt());
    return 0;
}

matt@dworkin:fmt_test$ g++ -o mem_fmt -O3 -I ../fmt/include/ mem_fmt.cpp ../fmt/libfmt.a
matt@dworkin:fmt_test$ ./mem_fmt > /dev/null
  0.000000: ENTRY
  0.034582: INIT
  1.769687: DONE

[from another window whilst it's running]

matt@dworkin:fmt_test$ ps -aux | egrep 'COMMAND|mem_fmt' | grep -v grep
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
matt       30292  2.8  6.2 1097864 999208 pts/0  S+   17:40   0:01 ./mem_fmt

Note the VSZ of roughly 1.1 GB (1,097,864 KiB)


Solution

  • First, let's address your example. The current version of {fmt} has an optimization that allows writing directly into a stream buffer. Right now it is only enabled for fundamental and string types. Once it is enabled for join_view in this commit, no additional dynamic memory will be allocated in your example; fmt::print will just use the C stream buffer.

    It will also be faster than the ostream_iterator approach.

    Before:

    % time ./a.out > /dev/null
    ...
    ./a.out > /dev/null  0.23s user 0.38s system 71% cpu 0.857 total
    

    After:

    % time ./a.out > /dev/null
    ...
    ./a.out > /dev/null  0.12s user 0.01s system 96% cpu 0.135 total
    

    This optimization is also proposed (and accepted) for std::print in P3107R5 Permit an efficient implementation of std::print.

    In older versions of {fmt} you can simply replace fmt::join with writing lines individually; fmt::join provides no benefit in your case anyway.
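
    For example, here is a minimal sketch of that per-line approach, replacing the single fmt::print call in the test program above (it reuses v and pad exactly as defined there):

    // Per-line alternative to fmt::join: each call formats only one short
    // line, so the full ~1 GB result is never assembled in memory.
    for (char c : v)
        fmt::print("{}{}\n", pad, c);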

    Now to the questions:

    Q1. I do see some references in the docs to fmt::format_to(). Can that somehow be used to start streaming and discarding the result before the formatting is complete and thereby avoid the in-core composition of the full result?

    Yes. In general, formatting functions, including format_to, write into a fixed-size buffer (print was an exception, but it is being fixed as described above). They might still need to allocate for a single argument (but not for the full output) if you use padding.
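
    As an illustration, here is a minimal sketch that streams the question's output straight into std::cout's stream buffer through fmt::format_to and an output iterator (it assumes <iostream> and <iterator> are included and reuses v and pad from the test program above):

    // format_to fills an internal fixed-size buffer and flushes it to the
    // output iterator as it goes, so the full result is never composed as
    // one big string in memory.
    fmt::format_to(std::ostreambuf_iterator<char>(std::cout),
                   "{}{}\n", pad, fmt::join(v, "\n" + pad));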

    Q2. Continuing along this line of exploration, instead of passing a container, is there a way I can pass, say, two iterators (that perhaps point at the beginning and ending of a very large file) and pump that data through {fmt} for processing (and thereby avoid having to first read the entire file into memory)?

    Yes. {fmt} iterates over a range element by element and supports single-pass input iterators. So you can read lazily and discard parts of the input after they have been consumed to save memory. Iterators can be passed as part of a range or via fmt::join.
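
    For example, here is a minimal sketch of reading a file lazily through a pair of single-pass iterators (the file name "input.txt" and the ", " separator are placeholders for illustration):

    #include <fstream>
    #include <iterator>
    #include "fmt/format.h"
    #include "fmt/ranges.h"

    int main() {
        std::ifstream in("input.txt");
        // Single-pass input iterators over the file's bytes; {fmt} consumes
        // them one element at a time, so the file is never loaded whole.
        fmt::print("{}\n", fmt::join(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>(),
                                     ", "));
    }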