I know this question might seem overly familiar to the community, but I swear I've never been able to reproduce the issue related to this question even once throughout my programming journey.
I understand what the strictfp modifier does and how it ensures full compliance with the IEEE 754 standard. However, I've never encountered a situation in practice where the set of values with an extended exponent is used, as described in the official specification.
I've tried using options like -XX:+UseFPUForSpilling to stimulate the use of the FPU block for calculations on my relatively modern processor, but it had no effect.
I even went as far as installing Windows 98 SE on a virtual machine and emulating an Intel Pentium II processor through Bochs, which does not support the SSE instruction set, hoping that the use of the FPU block in this case would be virtually the only option. However, even such an experiment yielded no results.
The essence of the experiment was to take the maximum possible value of the double type and multiply it by 2, taking the intermediate result beyond the permissible range of double. Then I divided the obtained value by 4, and the final result was saved back into a double variable. In theory, I should have gotten some more meaningful result, but in all situations I ended up with Infinity. In general, I haven't found a single reproducible example on the entire internet (even as of 2024!) that would show different results with and without the use of strictfp. Is it really possible that in almost 30 years of the language's history, there isn't a single example on this topic that clearly demonstrates the difference?
P.S. I'm well aware of Java 17+. All experiments were conducted on earlier versions, where the difference should, in theory, be observable. I installed Java SE 1.3 on the virtual machine.
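For concreteness, the experiment from the question can be sketched like this (the class name is mine; on any modern JVM it prints Infinity, which is exactly the outcome the question reports):

```java
public class MaxValueExperiment {
    public static void main(String[] args) {
        // Use a runtime variable so the expression is not a compile-time
        // constant and must be evaluated at runtime.
        double two = 2;
        // Double.MAX_VALUE * 2 exceeds the binary64 range; without an
        // extended-exponent value set the intermediate result is already
        // Infinity, and dividing Infinity by 4 cannot bring it back.
        double result = Double.MAX_VALUE * two / 4;
        System.out.println(result); // prints Infinity on a modern JVM
    }
}
```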
strictfp in Java: A Deep Dive Into JVM Behavior

If you’ve ever worked with floating-point arithmetic in Java, you may have come across the strictfp keyword. It guarantees platform-independent results by strictly adhering to the IEEE 754 floating-point standard. But how does it actually work under the hood? In this post, I’ll walk you through my detailed exploration of strictfp, including examples, assembly code, and insights into the JVM’s behavior on different architectures.

This is not just theoretical – I spent a significant amount of time analyzing the output of a 32-bit JVM on x86 processors, including disassembled JIT-compiled code. This might be one of the few hands-on explanations you’ll find, showcasing real examples of how strictfp affects floating-point calculations.
What Is strictfp?

Floating-point types (float and double) in Java are governed by the IEEE 754 standard. The Java Language Specification (JLS §4.2.3) (link) defines two standard value sets for floating-point numbers: the float value set (binary32) and the double value set (binary64).

In addition to these, the JVM may support extended-exponent value sets: the float-extended-exponent and double-extended-exponent value sets, which allow a wider exponent range for intermediate results.
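For orientation, the bounds of the standard double value set are exposed as constants on java.lang.Double (a small illustrative snippet of mine, not from the original experiments):

```java
public class ValueSetLimits {
    public static void main(String[] args) {
        // Largest finite binary64 value: the upper bound of the double value set.
        System.out.println(Double.MAX_VALUE);  // 1.7976931348623157E308
        // Smallest positive (subnormal) binary64 value: the lower bound.
        System.out.println(Double.MIN_VALUE);  // 4.9E-324
        // One doubling past the upper bound overflows in strict binary64.
        System.out.println(Double.MAX_VALUE * 2 == Double.POSITIVE_INFINITY); // true
    }
}
```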
strictfp and Default Behavior

Without strictfp: The JVM can use extended precision for intermediate calculations. For example, on x86 processors, it may use 80-bit floating-point registers. This can lead to platform-specific results due to differences in rounding and precision.

With strictfp: All intermediate calculations are confined to the binary32 (float) or binary64 (double) value sets, ensuring consistency across platforms.
How Does strictfp Affect Results?

To explore the effects of strictfp, I tested two examples illustrating overflow and underflow behavior on an x86 processor using a 32-bit JVM. These examples demonstrate how intermediate results behave differently with and without strictfp.
It’s important to highlight that local variables were deliberately used instead of compile-time constants. This decision was crucial for ensuring that calculations were performed at runtime rather than being optimized away by the compiler.
If compile-time constants (e.g., System.out.println(Double.MIN_VALUE / 2 * 4);) were used directly, the Java compiler would likely compute the result at compile time. During this process, the compiler adheres strictly to the IEEE 754 standard, enforcing binary32 or binary64 precision for intermediate results. This means the calculations would effectively mimic the behavior of strictfp, regardless of whether the modifier is present or not.
By introducing local variables, we force the JVM to defer the computation to runtime. This runtime calculation allows us to observe the effects of extended precision (80-bit x87 registers) or strict IEEE 754 conformance, as influenced by the presence or absence of the strictfp modifier. Without this approach, the experimental results would not reflect the differences we’re trying to illustrate.
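The contrast can be sketched directly (an illustrative class of mine; on a modern 64-bit JVM both lines print 0.0, since runtime arithmetic there is also strict):

```java
public class FoldingDemo {
    public static void main(String[] args) {
        // Constant expression: javac computes this at compile time with
        // strict binary64 semantics, so the class file simply contains 0.0.
        System.out.println(Double.MIN_VALUE / 2 * 4);

        // Runtime form: the division and multiplication are emitted as
        // bytecode and executed (or JIT-compiled) at runtime - the only
        // form that can expose extended-exponent behavior.
        double secondOperand = 2;
        System.out.println(Double.MIN_VALUE / secondOperand * 4);
    }
}
```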
Example 1: Underflow

public class StrictTest {
    public static void main(String[] args) {
        double secondOperand = 2;
        double thirdOperand = 4;
        System.out.println(Double.MIN_VALUE / secondOperand * thirdOperand);
    }
}
Results:
Without strictfp: extended precision (80-bit x87 registers) avoids underflow, preserving the intermediate result:

1.0E-323

With strictfp: intermediate calculations adhere to binary64 precision, causing underflow:

0.0
Example 2: Overflow

public class StrictTest {
    public static void main(String[] args) {
        double secondOperand = 2;
        double thirdOperand = 4;
        System.out.println(Double.MAX_VALUE * secondOperand / thirdOperand);
    }
}
Results:
Without strictfp: extended precision allows the intermediate result to fit within the 80-bit range, avoiding immediate overflow:

8.988465674311579E307

With strictfp: calculations confined to binary64 precision result in an overflow to positive infinity:

Infinity
The use of local variables ensured that these calculations occurred at runtime, allowing us to capture the runtime differences between strictfp and non-strictfp behavior. If compile-time constants had been used, the compiler would have optimized the calculations based on strict IEEE 754 conformance, negating the ability to observe the effects of extended precision on intermediate results. This distinction is critical for reproducibility and understanding the nuances of strictfp.
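For reference, here is the strictfp variant used for comparison in both examples (a sketch; the class name is mine). Under strictfp, the underflow example prints 0.0 and the overflow example prints Infinity on any compliant JVM:

```java
public strictfp class StrictVariant {
    public static void main(String[] args) {
        double secondOperand = 2;
        double thirdOperand = 4;
        // Underflow case: Double.MIN_VALUE / 2 rounds to 0.0 in binary64,
        // so the final result is 0.0 rather than 1.0E-323.
        System.out.println(Double.MIN_VALUE / secondOperand * thirdOperand);
        // Overflow case: Double.MAX_VALUE * 2 overflows to Infinity in
        // binary64, and Infinity / 4 stays Infinity.
        System.out.println(Double.MAX_VALUE * secondOperand / thirdOperand);
    }
}
```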
Using a disassembler (hsdis), I examined the assembly code generated by the JVM to understand how calculations are performed. The goal was to observe how the strictfp modifier impacts floating-point operations at the machine code level.
To replicate the results, the following JVM options were used:
-server -Xcomp -XX:UseSSE=0 -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly -XX:CompileCommand=compileonly,StrictTest.main
For the minimal setup required to observe differences, use:
-Xcomp -XX:UseSSE=0
-Xcomp: This option forces the JVM to compile all methods using the Just-In-Time (JIT) compiler immediately. It is mandatory in this experiment because:

- Without -Xcomp, or when using -Xint (interpreted mode), the methods might not be compiled, and the JVM will execute them in interpreted mode. This results in no JIT-compiled assembly output, which is essential for the disassembler (hsdis) to provide meaningful results.
- In interpreted mode, intermediate results are rounded strictly, so the output is indistinguishable from strictfp.

-XX:UseSSE=0: This disables the use of Streaming SIMD Extensions (SSE) instructions for floating-point operations. Instead, the JVM falls back to the x87 FPU instructions, which utilize 80-bit extended precision registers. This option was critical because:

- With SSE enabled, intermediate calculations already conform to binary64, masking any difference introduced by strictfp.
- The x87 FPU’s 80-bit registers can hold intermediate results that differ from binary64 when running without strictfp. This allows us to demonstrate the impact of strictfp effectively.

-XX:+PrintAssembly: This option outputs the generated assembly code for the compiled methods. Combined with hsdis, it allows for precise observation of how floating-point calculations are executed at the machine level.

-XX:CompileCommand=compileonly,StrictTest.main: This restricts compilation to the specific method under investigation (StrictTest.main), reducing noise in the assembly output.

By combining these options, the experiment isolates the floating-point operations affected by strictfp and ensures that the results are observable at the assembly level. Without this configuration, the differences introduced by strictfp would remain hidden, or the disassembly would lack the necessary precision.
Disassembly Without strictfp

Here’s the disassembly output when running the underflow example without the strictfp modifier:
0x02f52326: fldl 0x2f522c0 ; Load Double.MIN_VALUE
0x02f5232c: fdivl 0x2f522c8 ; Divide by secondOperand (2.0)
0x02f52332: fmull 0x2f522d0 ; Multiply by thirdOperand (4.0)
0x02f52338: fstpl (%esp) ; Store the result for printing
Explanation:
The JVM uses 80-bit extended precision for intermediate calculations, preserving the value beyond the IEEE 754 binary64 precision. As a result, underflow is avoided, and the intermediate result is preserved:
Result: 1.0E-323
Disassembly With strictfp

When the strictfp modifier is applied, the disassembly for the underflow example includes additional type conversion steps to enforce strict adherence to binary64 precision:
0x02fe2306: fldl 0x2fe22a0 ; Load Double.MIN_VALUE
0x02fe230c: fldt 0x6f4c40a4 ; Extended load
0x02fe2312: fmulp %st(1) ; Multiply and store in st(1)
0x02fe2314: fdivl 0x2fe22a8 ; Divide by secondOperand (2.0)
0x02fe231a: fldt 0x6f4c40b0 ; Extended load
0x02fe2320: fmulp %st(1) ; Multiply and store in st(1)
0x02fe2322: fstpl 0x18(%esp) ; Store intermediate result
0x02fe2326: fldl 0x18(%esp) ; Reload and enforce binary64 rounding
0x02fe232a: fldt 0x6f4c40a4 ; Extended load
0x02fe2330: fmulp %st(1) ; Multiply again
0x02fe2332: fmull 0x2fe22b0 ; Multiply by thirdOperand (4.0)
0x02fe2338: fldt 0x6f4c40b0 ; Extended load
0x02fe233e: fmulp %st(1) ; Multiply and store in st(1)
0x02fe2340: fstpl 0x20(%esp) ; Final result stored
Explanation:
The key difference lies in the intermediate rounding and type conversion steps (e.g., fstpl followed by fldl). This forces compliance with the binary64 value set, leading to underflow:
Result: 0.0
Behavior on 64-bit JVMs

On modern 64-bit JVMs, the behavior is fundamentally different from 32-bit JVMs due to architectural and implementation changes. Extended precision (80-bit x87 floating-point registers) is not utilized, even when SIMD (SSE or AVX) is explicitly disabled via JVM options. Instead:
Relying on Native Implementations: Calculations appear to rely on native libraries or other internal JVM mechanisms for processing floating-point arithmetic. This can be inferred from the runtime call observed in the disassembled assembly code:
0x00000230aeae7e13: callq 0x230aea25820 ; OopMap{off=24}
;*getstatic out
; - StrictTest::main@8 (line 6)
; {runtime_call}
This instruction indicates that instead of performing the floating-point calculation directly via hardware registers, the JVM delegates it to a runtime component. This component likely ensures that intermediate results conform to the binary64 (double) precision standard.
Disabling SSE and AVX Has No Effect: When using the -XX:UseSSE=0 and -XX:UseAVX=0 flags, one might expect the JVM to fall back to utilizing x87 80-bit FPU registers for floating-point operations. However, the runtime behavior remains unchanged, and x87 registers are not employed. Even the additional flag -XX:+UseFPUForSpilling, which should theoretically allow spilling intermediate results to x87 FPU registers, has no noticeable effect on the 64-bit JVM.
Intermediate Results Conform to Binary64 Rules: Regardless of the absence of strictfp, intermediate floating-point calculations adhere to IEEE 754 binary64 standards. This behavior ensures consistent results, simplifying cross-platform development. However, it also means that the potential benefits of extended precision for intermediate calculations (e.g., reducing rounding errors) are not available.
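This per-operation strictness is easy to verify on a modern 64-bit JVM (a small check of mine, separate from the disassembly experiment): each operation is rounded to binary64 before the next one runs.

```java
public class PerOpRounding {
    public static void main(String[] args) {
        double two = 2;
        // Step 1: Double.MIN_VALUE / 2 already underflows to 0.0 -
        // the intermediate value is rounded to binary64 immediately.
        double half = Double.MIN_VALUE / two;
        System.out.println(half);     // 0.0
        // Step 2: multiplying the already-rounded intermediate by 4
        // stays 0.0, so no extended-exponent value set is in play.
        System.out.println(half * 4); // 0.0
    }
}
```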
Internal Handling of Floating-Point Arithmetic: The reliance on a runtime component, as indicated by the disassembled code, suggests that floating-point calculations in a 64-bit JVM are heavily abstracted. This aligns with the broader trend of modern JVMs to use platform-independent mechanisms for floating-point arithmetic, reducing reliance on specific hardware features.
This instruction explicitly calls into a runtime function for handling floating-point operations, bypassing hardware-level x87 or SIMD (SSE/AVX) capabilities.
While the strictfp modifier remains important for ensuring cross-platform consistency, its significance is diminished on 64-bit JVMs due to the inherent adherence of intermediate calculations to binary64 standards. This behavior is consistent even when hardware optimizations (like SSE or AVX) are disabled, and no fallback to x87 FPU registers occurs.
This architectural design underscores the JVM's emphasis on platform independence, even at the cost of foregoing hardware-specific optimizations for extended precision.
The JLS §4.2.3 (link) provides detailed insights into floating-point value sets. Here are the key points:

- Every implementation must support the standard float and double value sets (binary32, binary64).
- An implementation may additionally support the extended-exponent float and double value sets.
- Extended-exponent value sets may be used for intermediate results only in code that is not subject to strictfp.

"The float, float-extended-exponent, double, and double-extended-exponent value sets are not types. It is always correct for an implementation of the Java programming language to use an element of the float value set to represent a value of type float; however, it may be permissible in certain regions of code for an implementation to use an element of the float-extended-exponent value set instead."
Here’s my setup for these experiments: a 32-bit HotSpot JVM running on an x86 processor, with the hsdis disassembler installed.

Notes on Potential Variability
These experiments were conducted exclusively on an x86-64 processor architecture. Results may differ on other architectures (e.g., ARM64), operating systems, or JVM versions/vendors. This variability arises from the differences in how specific architectures and JVM implementations handle floating-point arithmetic and their internal optimizations.
Several factors that could influence results include:
Bytecode Compiler Optimizations: The Java compiler may optimize code differently depending on the runtime context or specific constructs used.
JVM Implementation Details: The behavior may vary based on the JVM vendor or version due to differences in policies around extended-exponent value set support and floating-point arithmetic handling.
OS and Hardware Optimizations: Operating systems and processor microarchitectures may influence how low-level instructions are executed, potentially affecting intermediate results.
JVM Flags: The specific flags used to launch the JVM can have a substantial impact on how calculations are handled. For instance, options like -XX:UseSSE or -XX:+UseFPUForSpilling directly alter the floating-point arithmetic behavior.
Understanding these dependencies is crucial for accurately interpreting experimental results and for reproducing the behavior across different environments.
This analysis extends beyond the JVM versions explicitly mentioned in the earlier sections. I successfully reproduced the observed behavior on 32-bit JVMs starting from J2SE 1.4. Notably, these results were achieved on the Java HotSpot™ Client VM (version 1.4.2_18), which predates the widespread adoption of the SSE instruction set for floating-point calculations.
Key Findings on J2SE 1.4:
Critical Role of the -Xcomp Flag:

- The -Xcomp flag is essential for achieving the desired results on J2SE 1.4. Without this flag, the JVM operates in interpreted mode or mixed mode, which prevents the Just-In-Time (JIT) compiler from generating the assembly-level output necessary for observing the behavior of floating-point operations.
- -Xcomp ensures that all methods, including those under test, are compiled immediately, exposing the differences in intermediate precision with and without strictfp.

No Need for -XX:UseSSE=0:

- The -XX:UseSSE=0 flag is not recognized in J2SE 1.4. This is likely because, during that era, the SSE instruction set was either not fully utilized or had minimal integration into JVM implementations.

Reproducibility on HotSpot-Based JVMs:

Processor: Intel Core i7-2960XM Extreme Edition
JVM: Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_18-b06)
Broader Implications:
These findings reinforce the idea that the behavior described in this post is not exclusive to modern JVM versions. Instead, it aligns with a long-standing design choice in the HotSpot VM to leverage x87 FPU instructions for floating-point arithmetic on 32-bit architectures. This historical consistency ensures that users can reproduce these experiments across various JVM versions, provided that they use the correct configuration and flags (notably, -Xcomp).
This compatibility further emphasizes the importance of understanding both the historical evolution of JVM implementations and the subtle ways in which flags and internal mechanisms influence runtime behavior.
This exploration demonstrates the nuanced behavior of strictfp and its impact on floating-point calculations in Java. The examples provided offer a rare glimpse into how intermediate precision is handled by the JVM, supported by real assembly output. By understanding these details, you can make informed decisions about when to use strictfp in your code.
Starting from Java SE 17, the strictfp modifier is redundant, as strict IEEE 754 adherence became the default and only mode of operation in the JVM.
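A quick sanity check of that change (an illustrative sketch; the method names are mine): on Java 17+ the two methods below behave identically, and on a modern 64-bit JVM both print 0.0.

```java
public class Jep306Check {
    // Explicitly strict: binary64 intermediates on every Java version.
    strictfp static double withStrict() {
        double secondOperand = 2;
        return Double.MIN_VALUE / secondOperand * 4;
    }

    // No modifier: on Java 17+ this is strict by default as well.
    static double withoutStrict() {
        double secondOperand = 2;
        return Double.MIN_VALUE / secondOperand * 4;
    }

    public static void main(String[] args) {
        System.out.println(withStrict());    // 0.0
        System.out.println(withoutStrict()); // 0.0 on Java 17+
    }
}
```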
After a series of additional experiments and thorough analysis, I have reached an important new conclusion about the conditions under which extended-exponent value sets can be utilized. Previously, I claimed that using the -Xcomp flag was mandatory for achieving this behavior on 32-bit JVMs. However, further testing revealed that my earlier understanding was incomplete. Below, I present the refined insights, supported by new experimental evidence and practical examples.
JVM Execution Modes: A Crucial Context
The JVM can operate in three primary execution modes, and understanding these is key to replicating the behavior:
1. Interpreted Mode (-Xint): All code is executed by the bytecode interpreter. No JIT compilation occurs. In this mode, extended-exponent value sets cannot be used, as the interpreter enforces strict rounding of all intermediate results to either binary32 or binary64, depending on the expected result type.

2. Mixed Mode (the default): Code starts out interpreted, and methods that become "hot" are compiled by the JIT compiler at runtime.

3. Compiled Mode (-Xcomp): All code is eagerly compiled by the JIT compiler, bypassing the interpreter entirely. This mode reliably activates extended-exponent value sets for floating-point calculations, as JIT-compiled machine code utilizes the x87 FPU instructions (for 32-bit JVMs).

Key Discovery: JIT Compilation Is the Real Enabler
The earlier assumption that -Xcomp was mandatory stemmed from the fact that it guarantees JIT compilation of all methods. However, my latest findings suggest that it is not the flag itself, but the use of JIT compilation that enables extended-exponent value sets. In mixed mode, it is possible to achieve the same results by ensuring that the relevant code is compiled. Here’s how:
Example: Forcing JIT Compilation Without -Xcomp
The following code demonstrates this principle:
public class StrictTest {
    public static void main(String[] args) {
        double result = 0.0;
        for (int i = 0; i < 1000000; i++) {
            double secondOperand = 2.0;
            double thirdOperand = 4.0;
            result = Double.MIN_VALUE / secondOperand * thirdOperand;
        }
        System.out.println(result);
    }
}
Here, the repeated execution (1,000,000 iterations) ensures that the loop is compiled by the JIT compiler in mixed mode. As a result, the intermediate calculation avoids underflow, yielding the following output:
1.0E-323
This behavior is identical to what was observed with -Xcomp. It confirms that JIT compilation, not the mode flag, is the crucial factor for enabling extended-exponent calculations.
Historical Compatibility: Testing on Earlier JVM Versions
The extended-exponent value set has been supported since J2SE 1.2, aligning with the introduction of IEEE 754 compliance. Testing across various 32-bit JVM versions revealed the following:
- The Classic VM (java version "1.2.2") already supports extended-exponent calculations when JIT compilation is enabled via the symcjit compiler.
- With the JIT disabled, the same VM confines results to binary64 precision for all intermediate calculations, regardless of flags.

Important Observations on JVM Flags and Versions
Early JVMs (J2SE 1.2–1.5):
- The -XX:UseSSE=0 flag is unnecessary and unrecognized in 32-bit JVMs during this period, as SSE instructions were either not utilized or minimally integrated.
- The -XX:UseSSE=N flag is available exclusively in 64-bit JVMs. In the corresponding 32-bit versions, this flag is not supported, as 32-bit JVMs in this era relied solely on x87 FPU instructions for floating-point calculations.

JVMs Starting From Java SE 6:

- The -XX:UseSSE=0 flag becomes mandatory in 32-bit JVMs to explicitly disable SSE instructions and enable x87 FPU behavior. Without this flag, calculations default to SSE-based precision, resulting in strict binary64 adherence.

64-bit JVMs:

- -XX:UseSSE=0 has no effect in 64-bit JVMs across all versions. Intermediate results remain confined to binary64, as x87 FPU registers are not utilized.

Broader Implications
This refined understanding clarifies several points about JVM behavior:
- In interpreted mode, intermediate results are always rounded to binary32 or binary64.
- The -Xcomp flag is helpful but not mandatory, provided the relevant code is compiled by the JIT in mixed mode.

Updated Testing Results
I successfully reproduced the behavior across all tested 32-bit JVM versions, from J2SE 1.2 to Java SE 9, provided that JIT compilation was enabled. The table below summarizes the results:
JVM Version Architecture Behavior Notes
-------------------- -------------- ---------- -----------------------------------------
J2SE 1.2.2 (Classic) 32-bit Success Enabled by symcjit; no SSE support.
J2SE 1.4 (HotSpot) 32-bit Success Default behavior with JIT compilation.
Java SE 6 (HotSpot) 32-bit Success Requires -XX:UseSSE=0 to disable SSE.
Java SE 9 (HotSpot) 32-bit Success Last version supporting 32-bit architecture.
J2SE 5.0–Java SE 16 64-bit Failure x87 FPU not utilized; no extended precision.
Final Thoughts
This update reinforces the nuanced relationship between JVM internals and extended-exponent value sets. By ensuring JIT compilation, it is possible to activate this behavior on 32-bit JVMs across a wide range of versions. This finding highlights the importance of understanding how different execution modes and JVM implementations interact with floating-point arithmetic.
For anyone exploring this area, I recommend replicating the tests with and without -Xcomp and experimenting with "hot code" to better understand the role of JIT compilation in this process.