256-bit arithmetic in Clang (extended integers)

I'm in the design phase of a project that needs to do a lot of simple 256-bit integer arithmetic (add, sub, mul and div only), and I need something that is reasonably well optimised for these four operations.

I'm already familiar with GMP, NTL and most of the other heavyweight bignum implementations. However, their overhead is pushing me towards writing my own low-level implementation, which I really don't want to do: this stuff is notoriously hard to get right.
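To give a sense of what even the easy part of a hand-rolled implementation involves, here is a minimal sketch of 256-bit unsigned addition over four 64-bit limbs with explicit carry propagation. The type u256_limbs and the function u256_limbs_add are hypothetical names made up for this illustration, not part of any library.

```c
#include <stdint.h>

/* Hypothetical 256-bit unsigned type: four 64-bit limbs, least significant first. */
typedef struct { uint64_t w[4]; } u256_limbs;

/* r = (a + b) mod 2^256; returns the carry out of the top limb (0 or 1).
   r may alias a or b, since each limb is read before it is written. */
static unsigned u256_limbs_add(u256_limbs *r, const u256_limbs *a, const u256_limbs *b)
{
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s = a->w[i] + carry;    /* wraps to 0 only if a->w[i] == UINT64_MAX and carry == 1 */
        unsigned c1 = s < carry;
        uint64_t t = s + b->w[i];        /* unsigned overflow wraps, so t < s signals a carry */
        unsigned c2 = t < s;
        r->w[i] = t;
        carry = c1 | c2;                 /* c1 and c2 are never both 1 */
    }
    return carry;
}
```

The carry checks rely on unsigned wraparound, which is well defined in C; subtraction, multiplication and especially division need considerably more care than this.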

In my research I noticed the new extended integer types in Clang (I am a gcc user), and I was wondering whether anyone has experience of using them in real-life, in-anger implementations. Are they optimised for the "obvious" bit sizes (256, 512, etc.)?

I'm working in C on x86-64 under Linux (currently Ubuntu, though open to other distributions if necessary). I mostly use gcc for production work.

Edited to add: @phuclv identified an earlier question, C++ 128/256-bit fixed size integer types. (Thanks @phuclv.) That Q&A focuses on C++ support; I was hoping to find out whether anyone had specific experience with the new Clang types.

Solution

Update as of August 2025 (thanks Marc Glisse):

C23 has adopted _BitInt in place of the earlier nonstandard _ExtInt, with similar syntax. Compiler support has improved, but is highly target-dependent.
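As a minimal sketch of what this looks like for the question's use case, the four operations on a 256-bit unsigned type reduce to ordinary C operators. This assumes a compiler that predefines __BITINT_MAXWIDTH__ (recent GCC and Clang do on targets where _BitInt is available); the u256 typedef, the wrapper functions and the HAVE_BITINT256 macro are hypothetical names for this example, and the guard simply skips the code where the limit is below 256.

```c
#include <assert.h>

/* Only use the 256-bit type where the compiler/target limit allows it. */
#if defined(__BITINT_MAXWIDTH__) && __BITINT_MAXWIDTH__ >= 256
#define HAVE_BITINT256 1

typedef unsigned _BitInt(256) u256;

/* Unsigned arithmetic wraps modulo 2^256, as for any unsigned type. */
static u256 u256_add(u256 a, u256 b) { return a + b; }
static u256 u256_sub(u256 a, u256 b) { return a - b; }
static u256 u256_mul(u256 a, u256 b) { return a * b; }  /* low 256 bits of the product */
static u256 u256_div(u256 a, u256 b) { assert(b != 0); return a / b; }

#else
#define HAVE_BITINT256 0
#endif
```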


On x86-64, Clang 16 and later support _BitInt, including division, for very large sizes: up to BITINT_MAXWIDTH = 8388608 (0x800000) on that ISA. However, code size and compilation time grow with the width, apparently because the algorithms are inlined and unrolled. (A 65536-bit division took about 24 seconds to compile at -O3 on my laptop and produced about 600KB of code. -Os did not reduce the code size significantly, and -Oz takes a very long time to compile and actually makes the code much larger, which I reported as #156251.) Even simple bitwise operations are emitted as straight-line code rather than loops, even at huge sizes.
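A minimal way to check the limit for a given compiler and target, assuming C23's BITINT_MAXWIDTH from <limits.h> or, outside C23 mode, the __BITINT_MAXWIDTH__ macro that recent GCC and Clang predefine. The function name bitint_max_width is hypothetical, and returning 0 is this sketch's own convention for "no _BitInt here".

```c
#include <limits.h>

/* Largest N accepted in _BitInt(N), or 0 if the compiler/target offers none.
   C23 requires BITINT_MAXWIDTH to be at least the width of unsigned long long. */
static unsigned long bitint_max_width(void)
{
#if defined(BITINT_MAXWIDTH)
    return BITINT_MAXWIDTH;          /* C23 <limits.h> */
#elif defined(__BITINT_MAXWIDTH__)
    return __BITINT_MAXWIDTH__;      /* compiler-predefined macro */
#else
    return 0;                        /* _BitInt not available */
#endif
}
```

Printing the result with printf("%lu\n", bitint_max_width()) and comparing across compilers on godbolt is a quick way to see the target dependence described here.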

On the other hand, for Clang on ARM64, _BitInt remains limited to 128 bits.

GCC 14 and later support _BitInt with sizes up to 65535 on at least x86-64 and ARM64, including division, which calls a runtime library function. However, some other targets, e.g. RISC-V, do not support _BitInt at all, regardless of size.

It's worth noting that on every compiler/target I tried, when _BitInt(N) was supported at all, the support included division.

Original answer from 2020:

It looks like division with these types is not currently supported beyond 128 bits.

As of 2 August 2020, using clang trunk on godbolt, compiling the following code for x86-64:

```c
typedef unsigned _ExtInt(256) uint256;

uint256 div(uint256 a, uint256 b) {
    return a/b;
}
```

fails with the error message

fatal error: error in backend: Unsupported library call operation!


The same thing happens with _ExtInt(129) and everything larger that I tried. _ExtInt(128) and smaller seem to work, though they call the internal library function __udivti3 instead of inlining the division.
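For comparison, the 128-bit case can be reproduced with the long-standing unsigned __int128 extension in GCC and Clang on 64-bit targets. The x86-64 div instruction only divides a 128-bit value by a 64-bit one, so a general 128-by-128-bit division is typically compiled to a call to __udivti3 (provided by libgcc or compiler-rt) rather than inlined. The names u128 and u128_div are hypothetical.

```c
#include <assert.h>

/* GCC/Clang extension type, available on 64-bit targets. */
typedef unsigned __int128 u128;

/* With non-constant operands this usually becomes a call to __udivti3. */
static u128 u128_div(u128 a, u128 b)
{
    assert(b != 0);   /* division by zero is undefined */
    return a / b;
}
```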

It has been reported as LLVM bug 45649. There is some discussion there, but the upshot seems to be that the LLVM developers do not really want to write a full arbitrary-precision division routine.
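To illustrate what a "full arbitrary-precision division" involves, here is a minimal sketch of restoring (shift-and-subtract) binary long division on a 256-bit value stored as four 64-bit limbs. It produces one quotient bit per iteration, so it is simple but slow; production code would normally use a word-at-a-time method such as Knuth's Algorithm D. This is only an illustration of the technique, not the code any particular compiler generates, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 256-bit unsigned type: four 64-bit limbs, least significant first. */
typedef struct { uint64_t w[4]; } u256_limbs;

/* Returns -1, 0 or 1 as a is less than, equal to or greater than b. */
static int u256_cmp(const u256_limbs *a, const u256_limbs *b)
{
    for (int i = 3; i >= 0; i--)
        if (a->w[i] != b->w[i])
            return a->w[i] < b->w[i] ? -1 : 1;
    return 0;
}

/* a -= b, limb by limb with borrow; the caller guarantees a >= b. */
static void u256_sub_in_place(u256_limbs *a, const u256_limbs *b)
{
    uint64_t borrow = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t ai = a->w[i], bi = b->w[i];
        a->w[i] = ai - bi - borrow;                 /* wraps modulo 2^64 */
        borrow = (ai < bi) || (ai == bi && borrow);
    }
}

/* a <<= 1 across all four limbs. */
static void u256_shl1(u256_limbs *a)
{
    for (int i = 3; i > 0; i--)
        a->w[i] = (a->w[i] << 1) | (a->w[i - 1] >> 63);
    a->w[0] <<= 1;
}

/* Restoring binary long division: *q = n / d and *r = n % d.
   Brings down one bit of n per iteration (256 iterations). */
static void u256_divmod(const u256_limbs *n, const u256_limbs *d,
                        u256_limbs *q, u256_limbs *r)
{
    const u256_limbs zero = {{0, 0, 0, 0}};
    assert(u256_cmp(d, &zero) != 0);             /* division by zero */
    u256_limbs quot = zero, rem = zero;
    for (int bit = 255; bit >= 0; bit--) {
        /* rem = 2*rem + (next bit of n). rem never exceeds the prefix of n
           consumed so far, so this shift cannot overflow 256 bits. */
        u256_shl1(&rem);
        rem.w[0] |= (n->w[bit / 64] >> (bit % 64)) & 1;
        if (u256_cmp(&rem, d) >= 0) {
            u256_sub_in_place(&rem, d);
            quot.w[bit / 64] |= (uint64_t)1 << (bit % 64);
        }
    }
    *q = quot;
    *r = rem;
}
```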

Addition, subtraction and multiplication do work with _ExtInt(256) on this version.
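For comparison with what the compiler handles for you here, a minimal sketch of a hand-written 256-bit product over four 64-bit limbs might look like this. It keeps only the low 256 bits, matching the wraparound of unsigned _ExtInt(256) multiplication, and assumes the GCC/Clang unsigned __int128 extension for the 64x64 to 128-bit partial products. The names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical 256-bit unsigned type: four 64-bit limbs, least significant first. */
typedef struct { uint64_t w[4]; } u256_limbs;

/* r = (a * b) mod 2^256 by schoolbook multiplication.
   Uses a temporary so r may alias a or b. */
static void u256_limbs_mul(u256_limbs *r, const u256_limbs *a, const u256_limbs *b)
{
    uint64_t out[4] = {0, 0, 0, 0};
    for (int i = 0; i < 4; i++) {
        uint64_t carry = 0;
        /* Only partial products landing in limbs 0..3 affect the low 256 bits. */
        for (int j = 0; i + j < 4; j++) {
            /* At most (2^64-1)^2 + 2*(2^64-1) = 2^128 - 1, so this never overflows. */
            unsigned __int128 t = (unsigned __int128)a->w[i] * b->w[j]
                                  + out[i + j] + carry;
            out[i + j] = (uint64_t)t;
            carry = (uint64_t)(t >> 64);
        }
        /* Any carry out of limb 3 is discarded (wraps modulo 2^256). */
    }
    for (int i = 0; i < 4; i++)
        r->w[i] = out[i];
}
```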