I am currently creating a BIG_INTEGER struct of my own in pure C as a pastime project. One of the functions I decided to implement is a PRNG that returns a random BIG_INTEGER with a specified number of digits. This is the BIG_INTEGER struct:
typedef struct BIG_INTEGER
{
    BOOL bHasSign, bInit;
    SIZE_T szLength;
    INT* arrnDigits;
} BIG_INTEGER, *PBIG;
This is the code I came up with for the PRNG.
VOID birndl(PBIG pbigOut, ULONG szLength)
{
    if (pbigOut->bInit != 1)
    {
        // Initialize the input with binew() first.
        fprintf(stderr, "birndl!1: no content\n");
        abort();
    }
    else
        // Simply resets the values and frees the int array
        bidel(pbigOut);
    // Prepare
    pbigOut->bInit = TRUE;
    pbigOut->bHasSign = FALSE;
    pbigOut->szLength = szLength;
    pbigOut->arrnDigits = malloc(szLength * sizeof(INT));
    if (pbigOut->arrnDigits != NULL)
    {
        NTSTATUS ntErr;
        UCHAR buf[512];
        SIZE_T index1, index2, index3, index4;
        // Fill buffer with random bytes
        ntErr = BCryptGenRandom(NULL, buf, 512, BCRYPT_USE_SYSTEM_PREFERRED_RNG);
        if (!NT_SUCCESS(ntErr))
        {
            fprintf(stderr, "birndl!2: bcrypt failed with code %ld\n", ntErr);
            abort();
        }
        index1 = index2 = index3 = index4 = 0;
        // By multiplying 4 selected UCHARs in the buffer mod 10, a digit from 0-9 is obtained.
        // Cast to UINT: the product can reach 255^4, which overflows a signed INT.
        *(pbigOut->arrnDigits) = ((UINT)buf[index1] * buf[index2] * buf[index3] * buf[index4]) % 10;
        // Zero cannot be the first digit, since the resulting length would be szLength - 1.
        // Slightly biased, but the show must go on; transfer the chances of getting a 0 to 5.
        if (*(pbigOut->arrnDigits) == 0)
            *(pbigOut->arrnDigits) = 5;
        for (SIZE_T _iter = 1; _iter < szLength; ++_iter)
        {
            // Select a different combination of UCHARs for every iteration.
            // Valid indices are 0..511, so compare against 511; comparing against
            // 512 allowed an out-of-bounds read of buf[512].
            if (index1 < 511)
                index1++;
            else if (index2 < 511)
            {
                index2++;
                index1 = 0;
            }
            else if (index3 < 511)
            {
                index3++;
                index2 = index1 = 0;
            }
            else if (index4 < 511)
            {
                index4++;
                index3 = index2 = index1 = 0;
            }
            else
            {
                fprintf(stderr, "birndl!3: exceeding capability\n");
                abort();
            }
            *(pbigOut->arrnDigits + _iter) = ((UINT)buf[index1] * buf[index2] * buf[index3] * buf[index4]) % 10;
        }
    }
    else
    {
        fprintf(stderr, "birndl!4: memory error\n");
        abort();
    }
}
Out of curiosity, I wanted to test how fast it can generate digits, so what I did is pretty straightforward.
LARGE_INTEGER frequency, start, end;
QueryPerformanceFrequency(&frequency);
QueryPerformanceCounter(&start);
ULONGLONG length = 10000000;
BIG_INTEGER num;
binew(NULL, &num);
birndl(&num, length);
PCHAR str = bistr(&num);
bidel(&num);
free(str);
QueryPerformanceCounter(&end);
printf("QPC: Program took %fs to finish. (digits=%I64u)\n",
(double)((end.QuadPart - start.QuadPart)) / frequency.QuadPart, length);
_CrtDumpMemoryLeaks();
Note that I tried this with other powers of 10 lower than 10M, too. This is a summary of its performance, measured the same way, except that I changed the value of length accordingly.
1-10 K digits: instantly
100 K digits: 0.5~0.6 seconds
1 M digits: 47~53 seconds
2 M digits: ~197 seconds
10 M digits: >10 minutes and still not finished
With common sense, who generates 10M digits, right? Still, I wanted to know if there is a better, faster way to generate such big amounts. I thought that if the for-loop decays in speed as the iterations progress, maybe I could split the work first, pass each part through birndl with fewer iterations, convert those BIG_INTEGER results into strings, and finally concatenate them into one big string. Here's the entry-point code along with bistr, which converts my BIG_INTEGER into a string.
INT main(VOID)
{
    LARGE_INTEGER frequency, start, end;
    QueryPerformanceFrequency(&frequency);
    QueryPerformanceCounter(&start);
    // Compute necessary values
    ULONGLONG aim_digits = 10000000;
    UINT divisions = 2500;
    ULONGLONG per_division = aim_digits / divisions;
    // Create an array of BIG_INTEGER values
    BIG_INTEGER* list = malloc(sizeof(BIG_INTEGER) * divisions);
    // Store the string forms of the BIG_INTEGER values here.
    // Last element (hence +1) will contain the concatenation of all strings in the array.
    PCHAR* strlist = malloc(sizeof(PCHAR) * (divisions + 1));
    assert(list != NULL && strlist != NULL);
    // Iterate for each division
    for (SIZE_T i = 0; i < divisions; ++i)
    {
        binew(NULL, &list[i]);
        birndl(&list[i], per_division);
        // Convert generated portion into a string
        strlist[i] = bistr(&list[i]);
    }
    // Allocate
    strlist[divisions] = malloc(aim_digits + 1);
    assert(strlist[divisions] != NULL);
    // Start concatenating everything
    strcpy(strlist[divisions], strlist[0]);
    for (SIZE_T i = 1; i < divisions; ++i)
        strcat(strlist[divisions], strlist[i]);
    // Make sure to free all the used memory by the end
    for (SIZE_T i = 0; i < divisions; ++i)
    {
        bidel(&list[i]);
        free(strlist[i]);
    }
    free(strlist[divisions]);
    free(strlist);
    free(list);
    QueryPerformanceCounter(&end);
    printf("QPC: Program took %fs to finish. (digits=%I64u)\n",
        (double)(end.QuadPart - start.QuadPart) / frequency.QuadPart, aim_digits);
    _CrtDumpMemoryLeaks();
    return 0;
}
PCHAR bistr(PBIG pbigIn)
{
    if (pbigIn->bInit != 1)
    {
        fprintf(stderr, "bistr!1: no content\n");
        abort();
    }
    PCHAR pstrOut = calloc(pbigIn->szLength + pbigIn->bHasSign + 1, sizeof(CHAR));
    if (!pstrOut)
    {
        fprintf(stderr, "bistr!2: memory error\n");
        abort();
    }
    if (pbigIn->bHasSign)
        strcat(pstrOut, "-");
    for (SIZE_T i = 0; i < pbigIn->szLength; ++i)
    {
        CHAR digit[2];
        snprintf(digit, 2, "%d", pbigIn->arrnDigits[i]);
        strcat(pstrOut, digit);
    }
    return pstrOut;
}
I did a test run "logarithmically" with around 4000 digits per division, and this gave amazing results!
d - digits; v - divisions
10d,5v: }
100d,25v: }
1Kd,25v: } instantly
10Kd,25v: }
100Kd,25v: }
1Md,250v: 0.5~0.6 seconds
10Md,2500v: 6.6~7.0 seconds
100Md,25000v: ~197 seconds
This is definitely a lot faster than the straightforward way of using the PRNG. It generates 100 million digits in about the same time the first method took to generate just 2 million!
However, as the title suggests, I also wonder why birndl causes the for-loop to deteriorate in speed over time. Is there a way to optimize it so that it performs the same, or at least nearly the same, as the "split method"? Or is it simply the nature of for-loops to slow down as they iterate, so that one should keep the number of iterations low to maintain efficiency? Why would the second method be faster, given that it calls birndl many times and adds extra steps (bistr, strcat), but with fewer iterations per call, compared to the first method, which simply tells birndl to generate a very large number of digits in one go?
Thank you!
For loops do not inherently get slower over time.
I can't be certain what is really happening on your machine, but with that shape of workload your measurements do not surprise me. At low enough counts, all the memory required fits in the CPU caches, which are extremely fast. As the count increases you start hitting system RAM, which is considerably slower, and (because neither the code nor the data structure you're using is designed to mitigate this, as far as I can see) at large enough counts you will run into cache conflicts and failed prefetches, which are a further step slower. Beyond that, you may start using virtual memory backed by something other than system RAM, which is slower still.
Some other hints: