c++iterator polymorphism api-design input-iterator

How to return a variant from an input iterator with high performance?

I have some file format decoder which returns a custom input iterator. The value type of this iterator (when dereferencing it with *iter) can be one of many token types.

Here's a simplified usage example:

File file {"/path/to/file"};

for (const auto& token : file) {
    // do something with token
}

How can this token have multiple possible types? Depending on the type of the token, the type of its payload changes too.

Performance is important here during traversal. I don't want any unnecessary allocation, for example. This is why the iterator's type is an input iterator: as soon as you advance the iterator, the previous token is invalidated as per the requirements of the InputIterator tag.

I have two ideas in mind so far:

Use a single Token class with a private union of all the possible payloads (with their public getters) and a public type ID (enum) getter. The user needs to switch on this type ID to know which payload getter to call:
```
for (const auto& token : file) {
    switch (token.type()) {
    case Token::Type::APPLE:
        const auto& apple = token.apple();
        // ...
        break;

    case Token::Type::BANANA:
        const auto& banana = token.banana();
        // ...
        break;

    // ...
    }
}
```
Although this is probably what I would choose in C, I'm not a fan of this solution in C++ because the user can still call the wrong getter and nothing can enforce this (except run-time checks which I want to avoid for performance concerns).
Create an abstract Token base class which has an accept() method to accept a visitor, and multiple concrete classes (one for each payload type) inheriting this base class. In the iterator object, instantiate one of each concrete class at creation time. Also have a Token *token member. When iterating, fill the appropriate pre-allocated payload object, and set this->token = this->specificToken. Make operator*() return this->token (reference to). Ask the user to use a visitor during the iteration (or worse, use dynamic_cast):
```
class MyVisitor : public TokenVisitor {
public:
    void visit(const AppleToken& token) override {
        // ...
    }

    void visit(const BananaToken& token) override {
        // ...
    }
};

TokenVisitor visitor;

for (const auto& token : file) {
    token.accept(visitor);
}
```
This introduces additional function calls for each token, at least one of them which is virtual, but this might not be the end of the world; I remain open to this solution.

Is there any other interesting solution? I consider that returning a boost::variant or std::variant is the same as idea #2.

Solution

Although this is probably what I would choose in C, I'm not a fan of this solution in C++ because the user can still call the wrong getter and nothing can enforce this (except run-time checks which I want to avoid for performance concerns).

You can reverse the approach and accept a callable object instead of returning an iterator to the user. Then you can iterate the container internally and dispatch the right type. This way users cannot do mistakes anymore by ignoring the information carried up with your tagged union, for you are in charge of taking it in consideration.

Here is a minimal, working example to show what I mean:

#include <vector>
#include <utility>
#include <iostream>

struct A {};
struct B {};

class C {
    struct S {
        enum { A_TAG, B_TAG } tag;
        union { A a; B b; };
    };

public:
    void add(A a) {
        S s;
        s.a = a;
        s.tag = S::A_TAG;
        vec.push_back(s);
    }

    void add(B b) {
        S s;
        s.b = b;
        s.tag = S::B_TAG;
        vec.push_back(s);
    }

    template<typename F>
    void iterate(F &&f) {
        for(auto &&s: vec) {
            if(s.tag == S::A_TAG) {
                std::forward<F>(f)(s.a);
            } else {
                std::forward<F>(f)(s.b);
            }
        }
    }

private:
    std::vector<S> vec;
};

void f(const A &) {
    std::cout << "A" << std::endl;
}

void f(const B &) {
    std::cout << "B" << std::endl;
}

int main() {
    C c;
    c.add(A{});
    c.add(B{});
    c.add(A{});
    c.iterate([](auto item) { f(item); });

}

See it up and running on Coliru.