I'm dealing with a problem where I have a huge collection of 16-bit signed integers. I must multiply each one by a constant factor of type double and convert the result back to the integer type. I need to check for both overflow and underflow; in either case, the result should be set to the corresponding integer limit, and a warning should be printed informing the user about the problem.
I came up with a solution in C, but I'm not very happy with it, and I'm also unsure whether the check is efficient. Basically, I'm looking for a "best practice" approach.
```c
/* Needs <math.h>, <stdint.h>, <stdio.h>, <stdlib.h>;
   inputf (FILE *) and factor (double) are declared elsewhere. */
int16_t *sample = malloc(sizeof *sample);
for (unsigned long i = 1; fread(sample, sizeof *sample, 1, inputf); i++)
{
    // Check overflow/underflow
    if (fabs(INT16_MAX / (double) *sample) < fabs(factor))
    {
        if (copysign(1.0, factor) * *sample > 0)
        {
            printf("Overflow in sample #%lu\n", i);
            *sample = INT16_MAX;
        }
        else
        {
            printf("Underflow in sample #%lu\n", i);
            *sample = INT16_MIN;
        }
    }
    else
    {
        *sample = (int16_t) (*sample * factor);
    }
    /* Do stuff with sample */
}
free(sample);
```
I have already considered saturating or wrapping the integers, but according to this question, signed integer overflow is undefined behavior, so I can't rely on either. I also came across this question, but gosh.
The range of int16_t is [−32,768, +32,767], since it is a 16-bit two's complement type (these limits are specified in C 2018 7.20.2.1 1). When a floating-point value is converted to an integer type, it is truncated toward zero (6.3.1.4 1), so every value in the open interval (−32,769, +32,768) has a defined result (does not overflow), while values outside it overflow, and the C standard does not define the behavior.
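The condition above can be written as a small predicate. This is a minimal sketch; fits_int16 and fits_int16_examples are hypothetical names, not standard functions. Because NaN compares false with everything, it fails both comparisons and is rejected as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when converting x to int16_t is defined: x must lie in the open
   interval (-32769, +32768), so that truncation toward zero lands in
   [INT16_MIN, INT16_MAX]. NaN fails both comparisons and is rejected. */
static bool fits_int16(double x)
{
    return x > INT16_MIN - 1.0 && x < INT16_MAX + 1.0;
}

/* Spot checks of the boundaries (call from main). */
static void fits_int16_examples(void)
{
    assert(fits_int16(32767.9));              /* truncates to 32767 */
    assert((int16_t) 32767.9 == 32767);
    assert(fits_int16(-32768.9));             /* truncates to -32768 */
    assert((int16_t) -32768.9 == -32768);
    assert(!fits_int16(32768.0));             /* would be 32768: overflow */
    assert(!fits_int16(-32769.0));            /* would be -32769: overflow */
}
```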
The C standard's requirements for float are such that all integers from −32,769 to +32,768 can be represented. Per 5.2.4.2.2 14, the spacing between representable numbers in a neighborhood of 1 is at most 10⁻⁵ (FLT_EPSILON ≤ 1E-5), so the spacing in the neighborhood of 32,769 is at most 32,769 × 10⁻⁵ = 0.32769. Since that spacing is less than 1 and is an integer power of the radix, every integer of that magnitude is a multiple of it and is therefore representable. Further, 5.2.4.2.2 13 tells us this is within the range of the float format (FLT_MAX is at least 10³⁷). So integers up to 32,769 in magnitude are representable. The float values are a subset of the double values per 6.2.5 10, so the same holds for double.
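The argument above relies only on the standard's minimum guarantees. As a sanity check for a particular implementation (not a proof for all of them), the sketch below asserts those minimums and confirms by brute force that every integer from −32,769 to +32,768 survives a round trip through float; check_float_covers_int16 is a hypothetical name.

```c
#include <assert.h>
#include <float.h>

/* Checks this implementation's float against the facts used above:
   the standard minimums hold, and every integer in [-32769, +32768]
   converts to float and back without change. */
static void check_float_covers_int16(void)
{
    assert(FLT_EPSILON <= 1E-5);   /* 5.2.4.2.2 14 minimum */
    assert(FLT_MAX >= 1E+37);      /* 5.2.4.2.2 13 minimum */
    for (long n = -32769; n <= 32768; n++) {
        float f = (float) n;       /* exact if the argument above holds */
        assert((long) f == n);     /* f is well within long's range */
    }
}
```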
Therefore, we can perform the desired scaling, test, and conversion with:
```c
float t = *sample * factor;
if (t <= INT16_MIN - 1.f)
{
    fprintf(stderr, "Warning, underflow in sample #%lu.\n", i);
    *sample = INT16_MIN;
}
else if (INT16_MAX + 1.f <= t)
{
    fprintf(stderr, "Warning, overflow in sample #%lu.\n", i);
    *sample = INT16_MAX;
}
else
    *sample = t;  // t is in (-32769, 32768), so the truncating conversion is defined
```
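Wrapped into a self-contained function, the same test might look like the sketch below. The name scale_sample and its parameters are hypothetical. It departs from the snippet above in two assumed ways: t is a double, since s * factor is already computed in double when factor is a double, and converting a very large double to float is itself undefined unless the implementation follows IEEE-754 (Annex F); and a NaN product, which fails both comparisons, is reported and mapped to 0, because converting NaN to an integer type is also undefined.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <stdio.h>

/* Scales one sample by factor, truncating toward zero. Out-of-range results
   saturate to INT16_MIN or INT16_MAX with a warning on stderr. i is the
   1-based sample number, used only in the warning text. */
static int16_t scale_sample(int16_t s, double factor, unsigned long i)
{
    double t = s * factor;   /* s is converted to double before multiplying */

    if (isnan(t)) {
        /* Assumed policy: NaN (e.g. from a NaN factor) becomes 0, because
           converting NaN to an integer type is undefined. */
        fprintf(stderr, "Warning, NaN in sample #%lu.\n", i);
        return 0;
    }
    if (t <= INT16_MIN - 1.0) {
        fprintf(stderr, "Warning, underflow in sample #%lu.\n", i);
        return INT16_MIN;
    }
    if (INT16_MAX + 1.0 <= t) {
        fprintf(stderr, "Warning, overflow in sample #%lu.\n", i);
        return INT16_MAX;
    }
    return (int16_t) t;      /* t is in (-32769, 32768): conversion is defined */
}

/* A few spot checks (call from main). */
static void scale_sample_examples(void)
{
    assert(scale_sample(100, 2.0, 1) == 200);
    assert(scale_sample(-7, 0.5, 2) == -3);               /* -3.5 truncates toward zero */
    assert(scale_sample(30000, 2.0, 3) == INT16_MAX);     /* 60000: overflow */
    assert(scale_sample(30000, -2.0, 4) == INT16_MIN);    /* -60000: underflow */
    assert(scale_sample(INT16_MIN, 1.0, 5) == INT16_MIN); /* exactly in range */
}
```

In the question's loop, the body would then reduce to *sample = scale_sample(*sample, factor, i);.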
The above uses float since it suffices for range, but double may be used if more precision is desired in the representation of factor or the product.
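To see why double can matter even though float suffices for range, the sketch below converts the same in-range product two ways. convert_via_float and convert_via_double are hypothetical helpers that skip the range test, so they are only valid for in-range values like the ones used here.

```c
#include <assert.h>
#include <stdint.h>

/* Both helpers assume s * factor is already known to be in (-32769, 32768). */
static int16_t convert_via_float(int16_t s, double factor)
{
    float t = s * factor;   /* product rounded to float (24-bit significand) */
    return (int16_t) t;
}

static int16_t convert_via_double(int16_t s, double factor)
{
    double t = s * factor;  /* product kept in double */
    return (int16_t) t;
}

static void float_vs_double_example(void)
{
    /* 32767 * (1 - 1e-9) is about 32766.99997. Near 32767 the float spacing
       is 2^-9 (about 0.002), so float rounds the product up to 32767.0,
       while double keeps it below 32767 and truncation gives 32766. */
    assert(convert_via_float(32767, 1.0 - 1e-9) == 32767);
    assert(convert_via_double(32767, 1.0 - 1e-9) == 32766);
}
```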
For a 32-bit int, the IEEE-754 binary64 format commonly used for double is sufficient, as its 53-bit significand ensures that all integers up to 2⁵³ in magnitude can be represented. However, the C standard alone does not guarantee this: it only guarantees a spacing of at most 10⁻⁹ in the neighborhood of 1 (DBL_EPSILON ≤ 1E-9), and since a 32-bit int may have values up to 2,147,483,647, the spacing at that magnitude may be as large as about 2.1, which is insufficient to guarantee that the test can be performed in the same way as above. For general portable conversion of floating-point values to integer types, this answer provides safe code.
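Assuming double is IEEE-754 binary64, the same test carries over to a 32-bit type. The sketch below refuses to compile unless FLT_RADIX is 2 and DBL_MANT_DIG is at least 53, which is stronger than strictly needed but matches the binary64 assumption. scale_int32 and its NaN-to-0 policy are hypothetical; on implementations that fail the check, a fully portable conversion is needed instead.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdint.h>

/* Assumption: double is a binary format with at least a 53-bit significand
   (as IEEE-754 binary64 is). The C standard does not guarantee this. */
_Static_assert(FLT_RADIX == 2 && DBL_MANT_DIG >= 53,
               "scale_int32 assumes a binary double with >= 53 significand bits");

/* Saturating, truncating scale of a 32-bit value. INT32_MIN - 1.0 and
   INT32_MAX + 1.0 are exactly representable under the assumption above. */
static int32_t scale_int32(int32_t s, double factor)
{
    double t = s * factor;           /* s converts to double exactly */
    if (isnan(t))
        return 0;                    /* assumed policy: NaN maps to 0 */
    if (t <= INT32_MIN - 1.0)
        return INT32_MIN;
    if (INT32_MAX + 1.0 <= t)
        return INT32_MAX;
    return (int32_t) t;              /* t is in (-2147483649, 2147483648) */
}

/* A few spot checks (call from main). */
static void scale_int32_examples(void)
{
    assert(scale_int32(1000, 2.5) == 2500);
    assert(scale_int32(2000000000, 2.0) == INT32_MAX);
    assert(scale_int32(2000000000, -2.0) == INT32_MIN);
    assert(scale_int32(INT32_MIN, 1.0) == INT32_MIN);
}
```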