How should GMP/MPFR limbs be interpreted?


The arbitrary-precision libraries GMP and MPFR use heap-allocated arrays of machine-word-sized integers to store the limbs that make up the high-precision number/mantissa.

How should this array of limbs be interpreted to recover the arbitrary-precision integer? In other words: for N limbs holding B bits each, how should I interpret them to recover the N*B-bit number?

Does the limb size really affect the in-memory representation (see below)? If so, what is the rationale behind this?


Background:

I wrote a small program to look inside the representation, but I was confused by what I saw. Each limb seems to be printed most-significant digit first, whereas the limbs themselves are stored in native least-significant-first order. When representing the 64-bit value 0xAAAABBBBCCCCDDDD using 32-bit limbs and a precision fixed at 128 bits, I see:

% c++ limbs.cpp -lgmp -lmpfr -o limbs && ./limbs
ccccdddd|aaaabbbb|00000000|00000000
00000000|00000000|ccccdddd|aaaabbbb

This seems to imply that the in-memory representation cannot be read back as a string of bits to recover the arbitrary-precision number (e.g., if I loaded it into a register on a machine that supported N*B-sized words). It also seems to suggest that the limb size changes the representation, so I would not be able to deserialize a number without knowing which limb size was used to serialize it.

Here's my test program (uses 32-bit limbs with the __GMP_SHORT_LIMB macro):

#define __GMP_SHORT_LIMB
#include <gmp.h>
#include <mpfr.h>

#include <iomanip>
#include <iostream>

constexpr int PRECISION = 128;

void PrintLimbs(mp_limb_t const *const limbs) {
  std::cout << std::hex;
  constexpr int NUM_LIMBS = PRECISION / (8 * sizeof(mp_limb_t));
  for (int i = 0; i < NUM_LIMBS; ++i) {
    std::cout << std::setfill('0') << std::setw(2 * sizeof(mp_limb_t)) << limbs[i];
    if (i < NUM_LIMBS - 1) {
      std::cout << "|";
    }
  }
  std::cout << "\n";
}

int main() {
  {  // GMP
    mpz_t num;
    mpz_init2(num, PRECISION);
    mpz_set_ui(num, 0xAAAABBBBCCCCDDDD);
    PrintLimbs(num->_mp_d);
    mpz_clear(num);
  }
  {  // MPFR
    mpfr_t num;
    mpfr_init2(num, PRECISION);
    mpfr_set_ui(num, 0xAAAABBBBCCCCDDDD, MPFR_RNDN);
    PrintLimbs(num->_mpfr_d);
    mpfr_clear(num);
  }
  return 0;
}

Solution

  • It finally clicked for me. The limb size does not affect the in-memory representation.

    The data in GMP/MPFR is stored consistently in little-endian order: the least significant limb comes first, and on a little-endian machine such as x86 the bytes within each limb are also least significant first, so the whole limb array can be read as a single little-endian byte string. Registers themselves have no byte order; a value sitting in a register is simply printed most-significant digit first.

    The seemingly inconsistent output comes from how the words are interpreted when read back from memory. When a word is loaded into a register, its bytes are assembled from little-endian memory order into a value, and printing that value writes it most-significant digit first. The wider the word we load, the more bytes get "flipped" relative to their order in memory.
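
    To make the interpretation concrete, here is a minimal sketch that rebuilds the number directly from the limb array using GMP's documented accessors (FromLimbs is just a name I picked; it assumes a non-negative value that fits in 128 bits and a compiler providing unsigned __int128). Limb i contributes limb[i] * 2^(i*GMP_NUMB_BITS), least significant limb first:

    #include <gmp.h>

    // Minimal sketch (assumes a non-negative value that fits in 128 bits and
    // a compiler with unsigned __int128). Limb i holds bits
    // [i*GMP_NUMB_BITS, (i+1)*GMP_NUMB_BITS), least significant limb first,
    // so value = sum over i of limb[i] << (i * GMP_NUMB_BITS).
    unsigned __int128 FromLimbs(const mpz_t num) {
      unsigned __int128 value = 0;
      for (size_t i = 0; i < mpz_size(num); ++i) {
        value |= static_cast<unsigned __int128>(mpz_getlimbn(num, i))
                 << (i * GMP_NUMB_BITS);
      }
      return value;
    }

    Since mpz_size and GMP_NUMB_BITS adapt to however GMP was built, this recovers the same value with 32-bit or 64-bit limbs.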

    I've modified the example to show that it is in fact the word size with which we reinterpret memory that determines how the content is printed: the output below is the same whether GMP was built with 32-bit or 64-bit limbs:

    #define __GMP_SHORT_LIMB
    #include <gmp.h>
    #include <mpfr.h>
    
    #include <iomanip>
    #include <iostream>
    #include <cstdint>
    
    constexpr int PRECISION = 128;
    
    template <typename InterpretAs>
    void PrintLimbs(mp_limb_t const *const limbs) {
      constexpr int LIMB_BITS = 8 * sizeof(InterpretAs); 
      constexpr int NUM_LIMBS = PRECISION / LIMB_BITS;
      std::cout << LIMB_BITS << "-bit: ";
      for (int i = 0; i < NUM_LIMBS; ++i) {
        const auto limb = reinterpret_cast<InterpretAs const *>(limbs)[i];
        for (int b = 0; b < LIMB_BITS; ++b) {
          if (b > 0 && b % 16 == 0) {
            std::cout << " ";
          }
          uint64_t bit = (limb >> (LIMB_BITS - 1 - b)) & 0x1; 
          std::cout << bit; 
        }
        if (i < NUM_LIMBS - 1) {
          std::cout << "|";
        }
      }
      std::cout << "\n";
    }
    
    int main() {
      uint64_t literal = 0b1111000000000000000000000000000000000000000000000000000000001001;
      {  // GMP
        mpz_t num;
        mpz_init2(num, PRECISION);
        mpz_set_ui(num, literal);
        std::cout << "GMP where limbs are interpreted as:\n";
        PrintLimbs<uint64_t>(num->_mp_d);
        PrintLimbs<uint32_t>(num->_mp_d);
        PrintLimbs<uint16_t>(num->_mp_d);
        mpz_clear(num);
      }
      {  // MPFR
        mpfr_t num;
        mpfr_init2(num, PRECISION);
        mpfr_set_ui(num, literal, MPFR_RNDN);
        std::cout << "MPFR where limbs are interpreted as:\n";
        PrintLimbs<uint64_t>(num->_mpfr_d);
        PrintLimbs<uint32_t>(num->_mpfr_d);
        PrintLimbs<uint16_t>(num->_mpfr_d);
        mpfr_clear(num);
      }
      return 0;
    }
    

    This prints (regardless of limb size):

    GMP where limbs are interpreted as:
    64-bit: 1111000000000000 0000000000000000 0000000000000000 0000000000001001|0000000000000000 0000000000000000 0000000000000000 0000000000000000
    32-bit: 0000000000000000 0000000000001001|1111000000000000 0000000000000000|0000000000000000 0000000000000000|0000000000000000 0000000000000000
    16-bit: 0000000000001001|0000000000000000|0000000000000000|1111000000000000|0000000000000000|0000000000000000|0000000000000000|0000000000000000
    MPFR where limbs are interpreted as:
    64-bit: 0000000000000000 0000000000000000 0000000000000000 0000000000000000|1111000000000000 0000000000000000 0000000000000000 0000000000001001
    32-bit: 0000000000000000 0000000000000000|0000000000000000 0000000000000000|0000000000000000 0000000000001001|1111000000000000 0000000000000000
    16-bit: 0000000000000000|0000000000000000|0000000000000000|0000000000000000|0000000000001001|0000000000000000|0000000000000000|1111000000000000
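
    Coming back to the deserialization concern from the question: if you serialize through GMP's mpz_export/mpz_import using bytes as the word size, the resulting byte string does not depend on the limb size GMP was built with, so the reader never needs to know it. A minimal sketch for non-negative values (Serialize/Deserialize are names I picked; sign and empty-input handling are left out):

    #include <gmp.h>

    #include <cstdint>
    #include <vector>

    // Minimal sketch: dump an mpz_t as a base-256, least-significant-byte-first
    // string and read it back. Using size = 1 (bytes) makes the result
    // independent of GMP's limb size. Assumes a non-negative value.
    std::vector<uint8_t> Serialize(const mpz_t num) {
      std::vector<uint8_t> bytes((mpz_sizeinbase(num, 2) + 7) / 8);
      size_t count = 0;
      mpz_export(bytes.data(), &count, /*order=*/-1, /*size=*/1,
                 /*endian=*/0, /*nails=*/0, num);
      bytes.resize(count);
      return bytes;
    }

    // num must already be initialized (mpz_init) before calling this.
    void Deserialize(mpz_t num, const std::vector<uint8_t> &bytes) {
      mpz_import(num, bytes.size(), /*order=*/-1, /*size=*/1,
                 /*endian=*/0, /*nails=*/0, bytes.data());
    }

    On a little-endian machine this byte string is exactly what the limb array already looks like in memory; it is only the word size used to reinterpret those bytes that changes what gets printed.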