c++performance primes sieve-of-eratosthenes

Checking Uniqueness of Shifting Sieve of Eratosthenes

My code is as follows:

int main(){
    vector<int> primes;
    vector<int> primesSum;
    int tally = 1;
    bool primeFound = false;
    while (true) {
        int i = 0;
        while (!primeFound) {
            primeFound = true;
            tally++;
            for (i = 0; i < (int)primes.size(); i++) {
                if (tally == primesSum[i]) {
                    primeFound = false;
                    primesSum[i] += primes[i];
                }
            }
        }
        primeFound = false;
        primesSum.push_back(tally*2);
        primes.push_back(tally);
    }
}

It seems to generate prime numbers fine but I was wondering if I could increase its efficiency by implementing the rule that all primes are 6n +/- 1 after 2 and 3. That would seem to sacrifice my achievement of initially empty vectors though.

I've seen prime number verifiers that make use of the 6n +/- i rule but not really in sieve programs, likely as it breaks the 2 filter.

Edit: as an aside I would prefer to not use modulo as that is another achievement for this program, unless a modulo is more efficient (time-wise not space-wise) than what I currently have.

Solution

Here is some code that I wrote to efficiently generate primes using a sieve that only considers integers that are +-1 mod 6. It is written as a generator using a coroutine library, but the logic is equivalent. It uses a vector of uint64_t's as a bit mask.

There are no modulo's related to computing the primes, but they are used to access the bitmap.

Update

I extracted my original code from the generator and created a simple function that puts the primes in a vector. One method uses the +-1 mod 2 search and the other +-1 mod 6 search. I also extracted the OP's code into a function.

I ran some simple timing measurements on an M1 Max (arm64).

function	primes<1M	primes<10M	primes<100M
sieve_index	12904 ms	still running...
sieve_2n	2 ms	30 ms	222 ms
sieve_6n.	1 ms	15 ms	132 ms

The code and build instructions can be found at the GitHub repo cpp-core/so

Code

auto sieve_mod6_prime_seq(int max = int{1} << 20) {
    std::vector<int> primes;
    primes.push_back(2);
    primes.push_back(3);

    auto max_index = max / 3;
    auto bits_per = sizeof(uint64_t) * CHAR_BIT;
    auto nwords = (bits_per + max_index - 1) / bits_per;
    std::vector<uint64_t> words(nwords);

    words[0] |= 1;
    size_t wdx = 0;
    while (wdx < nwords) {
        auto b = std::countr_one(words[wdx]);
        auto p = 3 * (64 * wdx + b) + 1 + (b bitand 1);
        if (b < 64 and p < max) {
            primes.push_back(p);

            for (auto j = p; j < max; j += 6 * p) {
                auto idx = j / 3;
                auto jdx = idx / 64;
                auto jmask = uint64_t{1} << (idx % 64);
                words[jdx] |= jmask;
            }

            for (auto j = 5 * p; j < max; j += 6 * p) {
                auto idx = j / 3;
                auto jdx = idx / 64;
                auto jmask = uint64_t{1} << (idx % 64);
                words[jdx] |= jmask;
            }
        }
        else {
            ++wdx;
        }
    }
    return primes;
}

auto sieve_mod2_prime_seq(int max = int{1} << 20) {
    std::vector<int> primes;

    auto bits_per = 2 * sizeof(uint64_t) * CHAR_BIT;
    auto nwords = (bits_per + max - 1) / bits_per;
    std::vector<uint64_t> words(nwords);
    if (max > 2)
        primes.push_back(2);
    words[0] |= 1;
    size_t idx = 0;
    while (idx < nwords) {
        auto bdx = 2 * std::countr_one(words[idx]) + 1;
        auto p = 128 * idx + bdx;
        if (bdx < 128 and p < max) {
            primes.push_back(p);
            for (auto j = p; j < max; j += 2 * p) {
                auto jdx = j / 128;
                auto jmask = uint64_t{1} << ((j % 128) / 2);
                words[jdx] |= jmask;
            }
        }
        else {
            ++idx;
        }
    }
    return primes;
}

auto sieve_index(int max = int{1} << 20) {
    std::vector<int> primes;
    std::vector<int> primesSum;
    int tally = 1;
    bool primeFound = false;
    while (tally < max) {
        int i = 0;
        while (!primeFound) {
            primeFound = true;
            tally++;
            for (i = 0; i < (int)primes.size(); i++) {
                if (tally == primesSum[i]) {
                    primeFound = false;
                    primesSum[i] += primes[i];
                }
            }
        }
        primeFound = false;
        primesSum.push_back(tally*2);
        primes.push_back(tally);
    }
    return primes;
}

template<class Work>
void measure(std::ostream& os, std::string_view desc, Work&& work) {
    chron::StopWatch timer;
    timer.mark();
    work();
    auto millis = timer.elapsed_duration<std::chrono::milliseconds>().count();
    os << fmt::format("{:>30s}: {} ms", desc, millis) << endl;
}

int tool_main(int argc, const char *argv[]) {
    ArgParse opts
        (
         argValue<'n'>("number", 1000, "Number of primes"),
         argFlag<'v'>("verbose", "Verbose diagnostics")
         );
    opts.parse(argc, argv);
    auto n = opts.get<'n'>();
    // auto verbose = opts.get<'v'>();

    measure(cout, "sieve_index", [&]() { sieve_index(n); });
    measure(cout, "sieve_2n", [&]() { sieve_mod2_prime_seq(n); });
    measure(cout, "sieve_6n", [&]() { sieve_mod6_prime_seq(n); });
    return 0;
}

End Update

You can easily adapt it to construct a vector of primes by replacing each co_yield with vec.push_back.

coro::Generator<uint64_t> sieve_mod6_prime_seq(uint64_t max = uint64_t{1} << 20) {
    co_yield 2;
    co_yield 3;

    auto max_index = max / 3;
    auto bits_per = sizeof(uint64_t) * CHAR_BIT;
    auto nwords = (bits_per + max_index - 1) / bits_per;
    std::vector<uint64_t> words(nwords);

    words[0] |= 1;
    size_t wdx = 0;
    while (wdx < nwords) {
        auto b = std::countr_one(words[wdx]);
        auto p = 3 * (64 * wdx + b) + 1 + (b bitand 1);
        if (b < 64 and p < max) {
            co_yield p;

            for (auto j = p; j < max; j += 6 * p) {
                auto idx = j / 3;
                auto jdx = idx / 64;
                auto jmask = uint64_t{1} << (idx % 64);
                words[jdx] |= jmask;
            }

            for (auto j = 5 * p; j < max; j += 6 * p) {
                auto idx = j / 3;
                auto jdx = idx / 64;
                auto jmask = uint64_t{1} << (idx % 64);
                words[jdx] |= jmask;
            }
        }
        else {
            ++wdx;
        }
    }
    co_return;
}