java, floating-point, precision, ieee-754, strictfp

Example of Code with and without strictfp Modifier


I know this question might seem overly familiar to the community, but I swear I've never been able to reproduce the issue related to this question even once throughout my programming journey.

I understand what the strictfp modifier does and how it ensures full compliance with the IEEE 754 standard. However, I've never encountered a situation in practice where an extended-exponent value set is actually used, as described in the official specification.

I've tried using options like -XX:+UseFPUForSpilling to force the x87 FPU to be used for calculations on my relatively modern processor, but it had no effect.

I even went as far as installing Windows 98 SE on a virtual machine and emulating an Intel Pentium II processor through Bochs, which does not support the SSE instruction set, hoping that the use of the FPU block in this case would be virtually the only option. However, even such an experiment yielded no results.

The essence of the experiment was to take the maximum value of the double type and multiply it by 2, pushing the intermediate result beyond the representable range of double. I then divided the obtained value by 4 and stored the final result back into a double variable. In theory, with an extended-exponent value set in play, I should have gotten a meaningful finite result; in practice, I always ended up with Infinity.

More generally, I haven't found a single reproducible example on the entire internet (even as of 2024!) that shows different results with and without strictfp. Is it really possible that in almost 30 years of the language's history, there isn't a single example that clearly demonstrates the difference?
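For concreteness, the experiment can be sketched like this (class and method names are mine). On any JVM whose intermediates are rounded to binary64, the first multiplication already overflows, so the division cannot bring the value back into range:

```java
public class OverflowProbe {
    // Runtime operands prevent javac from constant-folding the expression.
    static double maxTimes2Div4() {
        double two = 2.0;
        double four = 4.0;
        // In binary64 the intermediate product is already Infinity,
        // and Infinity / 4 stays Infinity.
        return Double.MAX_VALUE * two / four;
    }

    public static void main(String[] args) {
        System.out.println(maxTimes2Div4()); // prints Infinity on a binary64-conforming JVM
    }
}
```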

P.S. I'm well aware of Java 17+. All experiments were conducted on earlier versions, where the difference should, in theory, be observable. I installed Java SE 1.3 on the virtual machine.


Solution

  • Understanding strictfp in Java: A Deep Dive Into JVM Behavior

    If you’ve ever worked with floating-point arithmetic in Java, you may have come across the strictfp keyword. It guarantees platform-independent results by strictly adhering to the IEEE 754 floating-point standard. But how does it actually work under the hood? In this post, I’ll walk you through my detailed exploration of strictfp, including examples, assembly code, and insights into the JVM’s behavior on different architectures.

    This is not just theoretical – I spent a significant amount of time analyzing the output of a 32-bit JVM on x86 processors, including disassembled JIT-compiled code. This might be one of the few hands-on explanations you’ll find, showcasing real examples of how strictfp affects floating-point calculations.


    What Is strictfp?

    Floating-point types (float and double) in Java are governed by the IEEE 754 standard. The Java Language Specification (JLS §4.2.3) (link) defines two standard value sets for floating-point numbers:

      • The float value set: IEEE 754 binary32 values (24-bit significand, 8-bit exponent).
      • The double value set: IEEE 754 binary64 values (53-bit significand, 11-bit exponent).

    In addition to these, the JVM may support extended-exponent value sets:

      • The float-extended-exponent value set: the same 24-bit significand, but a wider exponent range.
      • The double-extended-exponent value set: the same 53-bit significand, but a wider exponent range (on x87 hardware, the 15-bit exponent of the 80-bit extended format).

    Key Differences Between strictfp and Default Behavior:

      • With strictfp, every intermediate result must belong to the standard float or double value set, making results bit-for-bit reproducible across platforms.
      • Without strictfp, an implementation is permitted to keep intermediate results in the corresponding extended-exponent value set, which can prevent premature overflow or underflow of intermediates but makes results platform-dependent.
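    Syntactically, the modifier applies to classes, interfaces, and methods. A minimal sketch (names are mine; on a modern JVM the strictfp and default results coincide anyway):

```java
public class StrictDemo {
    // strictfp on a method: all floating-point intermediates inside it must
    // stay in the standard float/double value sets (no extended exponents).
    static strictfp double strictUnderflow() {
        double two = 2.0, four = 4.0;
        // Double.MIN_VALUE / 2 is 2^-1075, which rounds to 0.0 in binary64;
        // multiplying 0.0 by 4 cannot recover the lost value.
        return Double.MIN_VALUE / two * four;
    }

    public static void main(String[] args) {
        System.out.println(strictUnderflow()); // 0.0 under strict semantics
    }
}
```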


    The Experiment: How Does strictfp Affect Results?

    To explore the effects of strictfp, I tested two examples illustrating overflow and underflow behavior on an x86 processor using a 32-bit JVM. These examples demonstrate how intermediate results behave differently with and without strictfp.


    Why Local Variables Were Used Instead of Compile-Time Constants

    It’s important to highlight that local variables were deliberately used instead of compile-time constants. This decision was crucial for ensuring that calculations were performed at runtime rather than being optimized away by the compiler.

    If compile-time constants (e.g., System.out.println(Double.MIN_VALUE / 2 * 4);) were used directly, the Java compiler would compute the result at compile time, because the whole expression is a compile-time constant. During constant folding, the compiler adheres strictly to the IEEE 754 standard, enforcing binary32 or binary64 precision for intermediate results. This means the calculations would effectively mimic the behavior of strictfp, regardless of whether the modifier is present or not.

    By introducing local variables, we force the JVM to defer the computation to runtime. This runtime calculation allows us to observe the effects of extended precision (80-bit x87 registers) or strict IEEE 754 conformance in real-time, as influenced by the presence or absence of the strictfp modifier. Without this approach, the experimental results would not reflect the differences we’re trying to illustrate.
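    The distinction can be sketched directly (class and method names are mine): a final local initialized with a constant makes the whole expression a compile-time constant that javac folds under strict semantics, while a non-final local defers the arithmetic to runtime bytecodes.

```java
public class FoldingDemo {
    static double folded() {
        // Constant expression: javac evaluates this at compile time with
        // strict binary64 rounding and emits the result as a literal.
        final double two = 2.0, four = 4.0;
        return Double.MIN_VALUE / two * four;
    }

    static double deferred() {
        // Non-constant operands: the division and multiplication are emitted
        // as ddiv/dmul bytecodes and performed at runtime by the JVM.
        double two = 2.0, four = 4.0;
        return Double.MIN_VALUE / two * four;
    }

    public static void main(String[] args) {
        System.out.println(folded());   // always 0.0, baked in by javac
        System.out.println(deferred()); // 0.0 on SSE/64-bit JVMs; 1.0E-323 on a 32-bit x87 JIT without strictfp
    }
}
```

    Comparing the two methods with javap -c would show ldc2_w of a precomputed constant in the first case and actual arithmetic bytecodes in the second.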


    Example 1: Underflow Behavior

    public class StrictTest {
        public static void main(String[] args) {
            double secondOperand = 2;
            double thirdOperand = 4;
    
            System.out.println(Double.MIN_VALUE / secondOperand * thirdOperand);
        }
    }
    

    Results:

      • Without strictfp (32-bit JVM, x87): 1.0E-323 — the intermediate quotient 2^-1075 survives in extended precision and is scaled back into range by the multiplication.
      • With strictfp: 0.0 — Double.MIN_VALUE / 2 underflows to 0.0 in binary64, and 0.0 * 4 remains 0.0.


    Example 2: Overflow Behavior

    public class StrictTest {
        public static void main(String[] args) {
            double secondOperand = 2;
            double thirdOperand = 4;
    
            System.out.println(Double.MAX_VALUE * secondOperand / thirdOperand);
        }
    }
    

    Results:

      • Without strictfp (32-bit JVM, x87): the finite value Double.MAX_VALUE / 2 (about 8.99E307) — the intermediate Double.MAX_VALUE * 2 survives in the extended-exponent range, and dividing by 4 brings it back into binary64.
      • With strictfp: Infinity — Double.MAX_VALUE * 2 overflows to Infinity in binary64, and Infinity / 4 remains Infinity.


    Key Insight:

    The use of local variables ensured that these calculations occurred at runtime, allowing us to capture the runtime differences between strictfp and non-strictfp behavior. If compile-time constants had been used, the compiler would have optimized the calculations based on strict IEEE 754 conformance, negating the ability to observe the effects of extended precision on intermediate results. This distinction is critical for reproducibility and understanding the nuances of strictfp.


    What Happens Under the Hood?

    Using a disassembler (hsdis), I examined the assembly code generated by the JVM to understand how calculations are performed. The goal was to observe how the strictfp modifier impacts floating-point operations at the machine code level.

    JVM Options

    To replicate the results, the following JVM options were used:

    -server -Xcomp -XX:UseSSE=0 -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:CompileCommand=compileonly,StrictTest.main
    

    For the minimal setup required to observe differences, use:

    -Xcomp -XX:UseSSE=0
    

    Why These Options Are Necessary

    1. -Xcomp: This option forces the JVM to compile all methods using the Just-In-Time (JIT) compiler immediately. It is mandatory in this experiment because:
      • Without -Xcomp, or when using -Xint (interpreted mode), the methods might not be compiled, and the JVM will execute them in interpreted mode. This results in no JIT-compiled assembly output, which is essential for the disassembler (hsdis) to provide meaningful results.
      • In interpreted mode, floating-point operations would rely entirely on the bytecode interpreter, making it impossible to observe the low-level differences caused by strictfp.
    2. -XX:UseSSE=0: This disables the use of Streaming SIMD Extensions (SSE) instructions for floating-point operations. Instead, the JVM falls back to the x87 FPU instructions, which utilize 80-bit extended precision registers. This option was critical because:
      • By default, modern JVMs on x86 use SSE instructions for floating-point operations, which comply with IEEE 754 by default and do not use extended precision. As a result, there would be no observable difference in behavior with or without strictfp.
      • Disabling SSE ensures that the JVM uses x87 FPU instructions, where intermediate results can utilize 80-bit extended precision unless constrained by strictfp. This allows us to demonstrate the impact of strictfp effectively.
    3. -XX:+PrintAssembly: This option outputs the generated assembly code for the compiled methods. Combined with hsdis, it allows for precise observation of how floating-point calculations are executed at the machine level.
    4. -XX:CompileCommand=compileonly,StrictTest.main: This restricts compilation to the specific method under investigation (StrictTest.main), reducing noise in the assembly output.

    By combining these options, the experiment isolates the floating-point operations affected by strictfp and ensures that the results are observable at the assembly level. Without this configuration, the differences introduced by strictfp would remain hidden, or the disassembly would lack the necessary precision.


    Assembly Analysis: Without strictfp

    Here’s the disassembly output when running the underflow example without the strictfp modifier:

    0x02f52326: fldl    0x2f522c0   ; Load Double.MIN_VALUE
    0x02f5232c: fdivl   0x2f522c8   ; Divide by secondOperand (2.0)
    0x02f52332: fmull   0x2f522d0   ; Multiply by thirdOperand (4.0)
    0x02f52338: fstpl   (%esp)      ; Store the result for printing
    

    Explanation:

      • The entire computation stays on the x87 register stack: fldl loads Double.MIN_VALUE, and fdivl and fmull operate in the FPU's 80-bit registers, with only the final fstpl rounding the result down to binary64.
      • Because the intermediate quotient is never rounded to binary64, the value 2^-1075 (which would underflow to 0.0 in binary64) survives in extended precision and is scaled back up by the multiplication.


    Assembly Analysis: With strictfp

    When the strictfp modifier is applied, the disassembly for the underflow example includes additional type conversion steps to enforce strict adherence to binary64 precision:

    0x02fe2306: fldl    0x2fe22a0   ; Load Double.MIN_VALUE
    0x02fe230c: fldt    0x6f4c40a4  ; Extended load
    0x02fe2312: fmulp   %st(1)      ; Multiply and store in st(1)
    0x02fe2314: fdivl   0x2fe22a8   ; Divide by secondOperand (2.0)
    0x02fe231a: fldt    0x6f4c40b0  ; Extended load
    0x02fe2320: fmulp   %st(1)      ; Multiply and store in st(1)
    0x02fe2322: fstpl   0x18(%esp)  ; Store intermediate result
    0x02fe2326: fldl    0x18(%esp)  ; Reload and enforce binary64 rounding
    0x02fe232a: fldt    0x6f4c40a4  ; Extended load
    0x02fe2330: fmulp   %st(1)      ; Multiply again
    0x02fe2332: fmull   0x2fe22b0   ; Multiply by thirdOperand (4.0)
    0x02fe2338: fldt    0x6f4c40b0  ; Extended load
    0x02fe233e: fmulp   %st(1)      ; Multiply and store in st(1)
    0x02fe2340: fstpl   0x20(%esp)  ; Final result stored
    

    Explanation:

      • The fldt/fmulp pairs load 80-bit scaling constants that pre- and post-scale the operands; this biases the exponent so that overflow and underflow happen exactly at the binary64 limits even though the arithmetic runs on x87 registers.
      • The fstpl/fldl pair in the middle stores the intermediate result to memory as a 64-bit double and reloads it, forcing an explicit rounding to binary64 between the division and the multiplication.
      • The net effect is strict IEEE 754 binary64 behavior, at the cost of several extra instructions per floating-point operation.


    Behavior on Modern 64-Bit JVMs

    On modern 64-bit JVMs, the behavior is fundamentally different from 32-bit JVMs due to architectural and implementation changes. Extended precision (80-bit x87 floating-point registers) is not utilized, even when SIMD (SSE or AVX) is explicitly disabled via JVM options. Instead:

    1. Relying on Native Implementations: Calculations appear to rely on native libraries or other internal JVM mechanisms for processing floating-point arithmetic. This can be inferred from the runtime call observed in the disassembled assembly code:

      0x00000230aeae7e13: callq        0x230aea25820  ; OopMap{off=24}
                                                ;*getstatic out
                                                ; - StrictTest::main@8 (line 6)
                                                ;   {runtime_call}
      

      This instruction indicates that instead of performing the floating-point calculation directly via hardware registers, the JVM delegates it to a runtime component. This component likely ensures that intermediate results conform to the binary64 (double) precision standard.

    2. Disabling SSE and AVX Has No Effect: When using the -XX:UseSSE=0 and -XX:UseAVX=0 flags, one might expect the JVM to fall back to utilizing x87 80-bit FPU registers for floating-point operations. However, the runtime behavior remains unchanged, and x87 registers are not employed. Even the additional flag -XX:+UseFPUForSpilling, which should theoretically allow spilling intermediate results to x87 FPU registers, has no noticeable effect on the 64-bit JVM.

    3. Intermediate Results Conform to Binary64 Rules: Regardless of the absence of strictfp, intermediate floating-point calculations adhere to IEEE 754 binary64 standards. This behavior ensures consistent results, simplifying cross-platform development. However, it also means that the potential benefits of extended precision for intermediate calculations (e.g., reducing rounding errors) are not available.

    4. Internal Handling of Floating-Point Arithmetic: The reliance on a runtime component, as indicated by the disassembled code, suggests that floating-point calculations in a 64-bit JVM are heavily abstracted. This aligns with the broader trend of modern JVMs to use platform-independent mechanisms for floating-point arithmetic, reducing reliance on specific hardware features.
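    This can be checked empirically on any 64-bit JVM: a strictfp and a default version of the same computation produce bit-identical results (a sketch; names are mine):

```java
public class Binary64Check {
    static strictfp double strictCalc() {
        double two = 2.0, four = 4.0;
        return Double.MAX_VALUE * two / four;
    }

    static double defaultCalc() {
        double two = 2.0, four = 4.0;
        return Double.MAX_VALUE * two / four;
    }

    public static void main(String[] args) {
        // On a 64-bit JVM both overflow identically: the intermediate
        // Double.MAX_VALUE * 2 is already Infinity in binary64.
        System.out.println(strictCalc());  // Infinity
        System.out.println(defaultCalc()); // Infinity
    }
}
```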


    Implications

    While the strictfp modifier remains important for ensuring cross-platform consistency, its significance is diminished on 64-bit JVMs due to the inherent adherence of intermediate calculations to binary64 standards. This behavior is consistent even when hardware optimizations (like SSE or AVX) are disabled, and no fallback to x87 FPU registers occurs.

    This architectural design underscores the JVM's emphasis on platform independence, even at the cost of foregoing hardware-specific optimizations for extended precision.


    Diving Into the Java Language Specification

    The JLS §4.2.3 (link) spells out how implementations may substitute extended-exponent value sets for the standard ones in certain regions of code.

    Quote From the JLS:

    "The float, float-extended-exponent, double, and double-extended-exponent value sets are not types. It is always correct for an implementation of the Java programming language to use an element of the float value set to represent a value of type float; however, it may be permissible in certain regions of code for an implementation to use an element of the float-extended-exponent value set instead."


    System Configuration

    Here’s my setup for these experiments:

    Notes on Potential Variability

    These experiments were conducted exclusively on an x86-64 processor architecture. Results may differ on other architectures (e.g., ARM64), operating systems, or JVM versions/vendors. This variability arises from the differences in how specific architectures and JVM implementations handle floating-point arithmetic and their internal optimizations.

    Several factors that could influence results include:

    1. Bytecode Compiler Optimizations: The Java compiler may optimize code differently depending on the runtime context or specific constructs used.

    2. JVM Implementation Details: The behavior may vary based on the JVM vendor or version due to differences in policies around extended-exponent value set support and floating-point arithmetic handling.

    3. OS and Hardware Optimizations: Operating systems and processor microarchitectures may influence how low-level instructions are executed, potentially affecting intermediate results.

    4. JVM Flags: The specific flags used to launch the JVM can have a substantial impact on how calculations are handled. For instance, options like -XX:UseSSE or -XX:+UseFPUForSpilling directly alter the floating-point arithmetic behavior.

    Understanding these dependencies is crucial for accurately interpreting experimental results and for reproducing the behavior across different environments.


    Compatibility with Older JVM Versions

    This analysis extends beyond the JVM versions explicitly mentioned in the earlier sections. I successfully reproduced the observed behavior on 32-bit JVMs starting from J2SE 1.4. Notably, these results were achieved on the Java HotSpot™ Client VM (version 1.4.2_18), which predates the widespread adoption of the SSE instruction set for floating-point calculations.

    Key Findings on J2SE 1.4:

    1. Critical Role of the -Xcomp Flag:

      • The -Xcomp flag is essential for achieving the desired results on J2SE 1.4. Without this flag, the JVM operates in interpreted mode or mixed mode, which prevents the Just-In-Time (JIT) compiler from generating the assembly-level output necessary for observing the behavior of floating-point operations.
      • Enabling -Xcomp ensures that all methods, including those under test, are compiled immediately, exposing the differences in intermediate precision with and without strictfp.
    2. No Need for -XX:UseSSE=0:

      • Unlike modern JVMs, the -XX:UseSSE=0 flag is not recognized in J2SE 1.4. This is likely because, during that era, the SSE instruction set was either not fully utilized or had minimal integration into JVM implementations.
      • Despite the absence of this flag, the behavior is consistent with what was observed on more recent 32-bit JVMs using x87 FPU instructions, further confirming the reliance on 80-bit extended precision for intermediate floating-point calculations.
    3. Reproducibility on HotSpot-Based JVMs:

      • The experiments were conducted on a system running the following configuration:
        Processor: Intel Core i7-2960XM Extreme Edition
        JVM: Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_18-b06)
        
      • Results were reproducible, confirming that HotSpot-based JVMs consistently exhibit this behavior when strictfp is absent, provided that the computation is deferred to runtime (e.g., using local variables instead of compile-time constants).

    Broader Implications:

    These findings reinforce the idea that the behavior described in this post is not exclusive to modern JVM versions. Instead, it aligns with a long-standing design choice in the HotSpot VM to leverage x87 FPU instructions for floating-point arithmetic on 32-bit architectures. This historical consistency ensures that users can reproduce these experiments across various JVM versions, provided that they use the correct configuration and flags (notably, -Xcomp).

    This compatibility further emphasizes the importance of understanding both the historical evolution of JVM implementations and the subtle ways in which flags and internal mechanisms influence runtime behavior.


    Final Thoughts

    This exploration demonstrates the nuanced behavior of strictfp and its impact on floating-point calculations in Java. The examples provided offer a rare glimpse into how intermediate precision is handled by the JVM, supported by real assembly output. By understanding these details, you can make informed decisions about when to use strictfp in your code.


    P.S.

    Starting from Java SE 17, the strictfp modifier is redundant: per JEP 306, strict IEEE 754 semantics became the default and only floating-point behavior in the JVM.


    Update (November 23, 2024): Revisiting How Extended-Exponent Value Sets Are Activated

    After a series of additional experiments and thorough analysis, I have reached an important new conclusion about the conditions under which extended-exponent value sets can be utilized. Previously, I claimed that using the -Xcomp flag was mandatory for achieving this behavior on 32-bit JVMs. However, further testing revealed that my earlier understanding was incomplete. Below, I present the refined insights, supported by new experimental evidence and practical examples.


    JVM Execution Modes: A Crucial Context

    The JVM can operate in three primary execution modes, and understanding these is key to replicating the behavior:

    1. Interpretation Mode (-Xint): All code is executed by the bytecode interpreter. No JIT compilation occurs. In this mode, extended-exponent value sets cannot be used, as the interpreter enforces strict rounding of all intermediate results to either binary32 or binary64, depending on the expected result type.
    2. Compilation Mode (-Xcomp): All code is eagerly compiled by the JIT compiler, bypassing the interpreter entirely. This mode reliably activates extended-exponent value sets for floating-point calculations, as JIT-compiled machine code utilizes the x87 FPU instructions (for 32-bit JVMs).
    3. Mixed Mode (default): Combines interpretation and JIT compilation. Code is initially interpreted, but frequently executed or "hot" code is compiled by the JIT compiler as needed. In this mode, results vary depending on whether a specific block of code is interpreted or compiled.

    Key Discovery: JIT Compilation Is the Real Enabler

    The earlier assumption that -Xcomp was mandatory stemmed from the fact that it guarantees JIT compilation of all methods. However, my latest findings suggest that it is not the flag itself, but the use of JIT compilation that enables extended-exponent value sets. In mixed mode, it is possible to achieve the same results by ensuring that the relevant code is compiled. Here’s how:

    Example: Forcing JIT Compilation Without -Xcomp

    The following code demonstrates this principle:

    public class StrictTest {
        public static void main(String[] args) {
            double result = 0.0;
    
            for (int i = 0; i < 1000000; i++) { 
                double secondOperand = 2.0;
                double thirdOperand = 4.0;
    
                result = Double.MIN_VALUE / secondOperand * thirdOperand;
            }
    
            System.out.println(result);
        }
    }
    

    Here, the repeated execution (1,000,000 iterations) ensures that the loop is compiled by the JIT compiler in mixed mode. As a result, the intermediate calculation avoids underflow, yielding the following output:

    1.0E-323
    

    This behavior is identical to what was observed with -Xcomp. It confirms that JIT compilation, not the mode flag, is the crucial factor for enabling extended-exponent calculations.


    Historical Compatibility: Testing on Earlier JVM Versions

    The extended-exponent value set has been supported since J2SE 1.2, aligning with the introduction of IEEE 754 compliance. Testing across various 32-bit JVM versions revealed the following:

    1. Classic VM (J2SE 1.2–1.3):
      • Classic VM (e.g., java version "1.2.2") already supports extended-exponent calculations when JIT compilation is enabled via the symcjit compiler.
      • Results are consistent with later HotSpot versions when the same conditions are met.
    2. HotSpot VM (J2SE 1.4 and beyond):
      • The introduction of HotSpot VM in J2SE 1.3 as an add-on (and as the default VM in J2SE 1.4) solidified this behavior.
      • On J2SE 1.4 and later versions, results were identical across all 32-bit JVMs, confirming that the reliance on x87 FPU instructions remained unchanged.
    3. 32-bit JVMs (up to Java SE 9):
      • This behavior persisted until Java SE 9, the last version to offer 32-bit JVMs. Beyond this, 32-bit JVM support was deprecated.
    4. 64-bit JVMs:
      • Extended-exponent value sets are not available on 64-bit JVMs. Testing on J2SE 5.0 and later confirmed that these JVMs adhere strictly to binary64 precision for all intermediate calculations, regardless of flags.

    Important Observations on JVM Flags and Versions

    Early JVMs (J2SE 1.2–1.5): The -XX:UseSSE flag is not recognized; the JIT emits x87 FPU instructions by default, so no extra flag is needed to observe extended-exponent behavior.

    JVMs Starting From Java SE 6: SSE is used for floating-point operations by default, so -XX:UseSSE=0 is required to force the x87 code path.

    64-bit JVMs: Extended-exponent value sets are never used; intermediate results conform to binary64 regardless of flags.


    Broader Implications

    This refined understanding clarifies several points about JVM behavior:

      • It is JIT compilation itself, not any particular launch flag, that enables extended-exponent value sets on 32-bit JVMs; -Xcomp is merely the most convenient way to guarantee it.
      • The bytecode interpreter always rounds intermediate results to binary32 or binary64, so interpreted code behaves as if strictfp were present.
      • In mixed mode, the same expression can produce different results before and after the surrounding code is compiled, which explains why the behavior is so hard to reproduce casually.


    Updated Testing Results

    I successfully reproduced the behavior across all tested 32-bit JVM versions, from J2SE 1.2 to Java SE 9, provided that JIT compilation was enabled. The table below summarizes the results:

    JVM Version          Architecture   Behavior   Notes
    -------------------- -------------- ---------- -----------------------------------------
    J2SE 1.2.2 (Classic) 32-bit         Success    Enabled by symcjit; no SSE support.
    J2SE 1.4 (HotSpot)   32-bit         Success    Default behavior with JIT compilation.
    Java SE 6 (HotSpot)  32-bit         Success    Requires -XX:UseSSE=0 to disable SSE.
    Java SE 9 (HotSpot)  32-bit         Success    Last version supporting 32-bit architecture.
    J2SE 5.0–Java SE 16  64-bit         Failure    x87 FPU not utilized; no extended precision.
    

    Final Thoughts

    This update reinforces the nuanced relationship between JVM internals and extended-exponent value sets. By ensuring JIT compilation, it is possible to activate this behavior on 32-bit JVMs across a wide range of versions. This finding highlights the importance of understanding how different execution modes and JVM implementations interact with floating-point arithmetic.

    For anyone exploring this area, I recommend replicating the tests with and without -Xcomp and experimenting with "hot code" to better understand the role of JIT compilation in this process.