Is there a 256-bit integer type in C?

OS: Linux (Debian 10)

CC: GCC 8.3

CPU: i7-5775C

There is an unsigned __int128/__int128 in GCC, but is there any way to have a uint256_t/int256_t in GCC?

I have read of a __m256i which seems to be from Intel. Is there any header that I can include to get it?

Is it as usable as a hypothetical unsigned __int256? I mean, can you assign from/to it, compare values, use bitwise operations, etc.?

What is its signed equivalent (if any)?

EDIT 1:

I achieved this:

#include <immintrin.h>
typedef __m256i uint256_t;

and compiled. If I can do some operations with it, I'll update it here.

EDIT 2:

Issues found:

uint256_t   m;
int         l = 5;

m = ~((uint256_t)1 << l);

output:

error: can’t convert a value of type ‘int’ to vector type ‘__vector(4) long long int’ which has different size
  m = ~((uint256_t)1 << l);

Solution

Clang 16 and GCC 14 have full support for C23 unsigned _BitInt(256) on x86.
(GCC also supports it on AArch64.)
For other ISAs, GCC and Clang are limited to 128-bit or don't support _BitInt at all.
LLVM-based ICX works like clang.

Clang had _ExtInt extended integers that support operations other than division, but SIMD isn't useful for that because of carry between elements¹. Other mainstream x86-64 compilers don't even have that; you need a library or something to define a custom type and use the same add-with-carry instructions clang will use. (Or a less efficient emulation in pure C².)

This has now been renamed _BitInt(n), and is part of ISO C23. (clang -std=gnu2x).
As an extension, clang also accepts _BitInt in C++, regardless of revision, even with -std=c++11 rather than -std=gnu++11. It also accepts it in earlier C revisions, like -std=gnu11 or -std=c11.

typedef unsigned _BitInt(256) u256;
typedef /*signed*/ _BitInt(256) i256;

// works with clang 16 and later
// GCC had no support as of April 2023; GCC 14 added it

int lt0(i256 a){
    return a < 0;  // just the sign bit in the top limb
}

// -march=broadwell allows mulx which clang uses, and adox/adcx which it doesn't
u256 mul256(u256 a, u256 b){
    return a*b;
}

Godbolt with clang -std=gnu2x - works even with -m32, where it's 8x 32-bit limbs instead of just 4x 64-bit. Multiply and divide expand inline to a large amount of code, not calling helper functions, so use carefully.

- Clang 11 and 12 supported _ExtInt(256), except for divide wider than 128. Not _BitInt. a<0 required explicit casts like a < (i256)0.
- Clang 13 added implicit conversion from int to _ExtInt types. Still no divide support for integers wider than 128-bit.
- Clang 14 and 15 support _BitInt(n), but only for sizes up to _BitInt(128), so all supported sizes support division.
- Clang 16 and later accept unsigned _BitInt(256) bar;, including mul and div (but it's expanded inline, not a helper function, so code-size is large for those ops.)
  - x86-64 and i386: max width 8388608
  - AArch64 and ARM 32: max width 128
  - RISC-V 32 and 64: max width 128
  - WebAssembly: max width 128
- GCC 14 on x86-64 supports _BitInt(256), with BITINT_MAXWIDTH 65535. It uses helper functions for mul and div for widths > 128 (passing the bit-width as a function arg.)
  - x86-64 and i386: supported, max width 65535
  - AArch64: supported, max width 65535
  - ARM-32: no support, not even 128-bit, not even with GCC15-trunk
  - RISC-V 32 and 64: no support, not even 128-bit, not even with GCC15-trunk
  - POWER64 / powerpc64le: no support.
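Since support varies this much by compiler and target, it can help to test for it at compile time. Below is a minimal sketch, assuming the compiler predefines __BITINT_MAXWIDTH__ (recent GCC and Clang do, as far as I know; C23 code can use BITINT_MAXWIDTH from <limits.h> instead). The function name top_limb_of_one_shifted is made up for illustration, and the fallback branch only covers this one operation.

```c
#include <stdint.h>

/* Hypothetical helper: returns bits 192..255 of (1 << n), for n < 256. */
#if defined(__BITINT_MAXWIDTH__) && __BITINT_MAXWIDTH__ >= 256
typedef unsigned _BitInt(256) u256;

uint64_t top_limb_of_one_shifted(unsigned n)
{
    u256 v = (u256)1 << n;        /* a genuine 256-bit shift */
    return (uint64_t)(v >> 192);  /* keep only the top 64 bits */
}
#else
/* Fallback for compilers/targets without a 256-bit _BitInt:
   only the top limb can be non-zero, and only when n >= 192. */
uint64_t top_limb_of_one_shifted(unsigned n)
{
    return n >= 192 ? (uint64_t)1 << (n - 192) : 0;
}
#endif
```

Both branches give the same results, e.g. top_limb_of_one_shifted(200) is 256 and top_limb_of_one_shifted(5) is 0, so callers don't need to know which one was compiled.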

SIMD 256-bit vectors aren't 256-bit scalar integers

__m256i is AVX2 SIMD 4x uint64_t (or a narrower element size like 8x uint32_t). It's not a 256-bit scalar integer type and you can't use it for scalar operations; __m256i var = 1 won't even compile. There is no x86 SIMD support for integers wider than 64-bit, and the Intel intrinsic types like __m128i and __m256i are purely for SIMD. You can do bitwise boolean ops with them.
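To make the difference concrete, here is a small portable model of what a 4x 64-bit SIMD add does (roughly the semantics of _mm256_add_epi64), written with a plain struct so it runs without AVX2. The type lanes256 and the function names are hypothetical, chosen for this sketch; it models the behaviour rather than using the real intrinsics.

```c
#include <stdint.h>

/* Model of a 256-bit vector as four independent 64-bit lanes;
   lane[0] would be the least significant part of a 256-bit integer. */
typedef struct { uint64_t lane[4]; } lanes256;

/* Lane-wise add: each lane wraps on its own, nothing carries
   into the next lane. */
lanes256 add_lanewise(lanes256 a, lanes256 b)
{
    lanes256 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

/* Lane-wise AND: bitwise ops have no cross-lane dependency, so this
   is the same result a true 256-bit AND would give. */
lanes256 and_lanewise(lanes256 a, lanes256 b)
{
    lanes256 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] & b.lane[i];
    return r;
}
```

Adding {1,0,0,0} to {UINT64_MAX,0,0,0} lane-wise gives all zeros, whereas a real 256-bit integer add would produce {0,1,0,0}. The AND, on the other hand, matches the scalar result exactly.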

GCC's __int128 / unsigned __int128 typically uses scalar add/adc, and/or scalar mul / imul, because AVX2 is generally not helpful for extended precision, unless you use a partial-word storage format so you can defer carry. (SIMD is helpful for stuff like bitwise AND/OR/XOR where element boundaries are irrelevant.)
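For reference, a short sketch of the kind of code where unsigned __int128 is handy, assuming a 64-bit target where GCC or Clang provides the type. The function names are made up for this example, and what asm you get depends on the compiler and options.

```c
#include <stdint.h>

/* Full 64x64 -> 128-bit product, returned as two 64-bit halves.
   On x86-64 this typically compiles to a single widening multiply. */
void mul64_wide(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    unsigned __int128 p = (unsigned __int128)a * b;
    *lo = (uint64_t)p;          /* low 64 bits */
    *hi = (uint64_t)(p >> 64);  /* high 64 bits */
}

/* 128-bit addition; on x86-64 typically an add followed by an adc. */
unsigned __int128 add128(unsigned __int128 a, unsigned __int128 b)
{
    return a + b;
}
```

For example, mul64_wide(UINT64_MAX, UINT64_MAX, &hi, &lo) leaves hi = 0xFFFFFFFFFFFFFFFE and lo = 1.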

Footnote 1: There actually is some scope for using SIMD for BigInteger types, but only with a specialized format. And more importantly, you have to manually choose when to re-normalize (propagate carry) so your calculations have to be designed around it; it's not a drop-in replacement. See Mysticial's answer on Can long integer routines benefit from SSE?

Footnote 2: Unfortunately C does not provide carry-out from addition / subtraction, so it's not even convenient to write in C. sum = a+b / carry = sum<a works for carry-out when there's no carry-in, but it's much harder to write a full adder in C. And compilers typically make bad asm that doesn't just use native add-with-carry instructions on machines where they're available. Extended-precision libraries for very big integers, like GMP, are typically written in asm.
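As a rough illustration of that footnote, here is one way to write a 256-bit add over four 64-bit limbs in portable C, with the sum<a trick applied twice per limb to handle carry-in. The struct u256_limbs and the function add256 are names invented for this sketch, and nothing guarantees the compiler turns it into an adc chain.

```c
#include <stdint.h>

/* 256-bit value as four 64-bit limbs; limb[0] is least significant. */
typedef struct { uint64_t limb[4]; } u256_limbs;

/* r = a + b (mod 2^256); returns the carry out of the top limb. */
unsigned add256(u256_limbs *r, const u256_limbs *a, const u256_limbs *b)
{
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s = a->limb[i] + b->limb[i];  /* may wrap */
        unsigned c1 = s < a->limb[i];          /* carry from a + b */
        uint64_t t = s + carry;                /* add the carry-in */
        unsigned c2 = t < s;                   /* carry from the carry-in */
        r->limb[i] = t;
        carry = c1 | c2;                       /* at most one of them is set */
    }
    return carry;
}
```

With a = {UINT64_MAX, UINT64_MAX, 0, 0} and b = {1, 0, 0, 0}, the carry ripples through two limbs and the result is {0, 0, 1, 0} with a return value of 0.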

Update: Assembly ADC (Add with carry) to C++ shows an add-with-carry in C/C++ which new-enough clang can optimize to a chain of adc instructions, with the middle ones having carry-in and carry-out.

Also related: Producing good add with carry code from clang discusses GNU C __builtin_add_overflow and Clang's __builtin_addcl.
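A minimal sketch of the __builtin_add_overflow approach for four 64-bit limbs, assuming GCC or Clang (the builtin is a GNU C extension). The function name add256_builtin is hypothetical, and whether you actually get a clean adc chain depends on the compiler version and optimization level.

```c
#include <stdint.h>

/* r = a + b over four 64-bit limbs (index 0 least significant);
   returns the carry out of the top limb. */
unsigned add256_builtin(uint64_t r[4], const uint64_t a[4], const uint64_t b[4])
{
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {
        uint64_t s;
        /* Each call returns nonzero if the unsigned result wrapped. */
        unsigned c1 = __builtin_add_overflow(a[i], b[i], &s);
        unsigned c2 = __builtin_add_overflow(s, (uint64_t)carry, &r[i]);
        carry = c1 | c2;  /* both cannot wrap for the same limb */
    }
    return carry;
}
```

Compared with the sum<a idiom, this states the intent (carry-out of an add) directly, which gives the optimizer a better chance of recognizing the pattern.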