I'm trying to measure the latency between my Java publisher and an (industry-standard) C++ message broker.
The broker records the time it receives each message, and for the Java publisher I'm using the following code to get microsecond-accurate timestamps (which are stamped onto each outgoing message):
private static final long NANOTIME_OFFSET;

static {
    Instant instant = Instant.now();    // Get absolute time
    long nanoTime = System.nanoTime();  // Get relative time
    // Offset to convert from relative to absolute time
    NANOTIME_OFFSET = TimeUnit.SECONDS.toNanos(instant.getEpochSecond()) + instant.getNano() - nanoTime;
}

public static long currentTimeNanos() {
    return System.nanoTime() + NANOTIME_OFFSET;
}
However, I'm noticing the measured latency between publisher and broker creeps up over the course of the day and doesn't come back down until I restart my Java publisher.
The latency starts off well under 1 ms, then suddenly jumps to around 500 ms. Further jumps take it closer to the 1-second mark, which is hard to believe since both processes reside on the same machine.
Profiling with VisualVM indicates there's no resource issue in my Java process, and a packet capture with Wireshark confirms the issue is with the timestamps produced by currentTimeNanos().
So why does System.nanoTime() lose accuracy over the course of the day?
I guess the correct thing to do is to always use Instant.now(), but it's a heavier call than System.nanoTime() (it allocates a new Instant object every time), so it would add more overhead, especially with a high volume of messages.

Edit: I'd also add that I went with a nanoTime()-based clock because Instant.now() only yielded millisecond precision on my machine.
Never mix System.nanoTime with Instant.now.
The OpenJDK source code shows the System.nanoTime implementation as:
@IntrinsicCandidate
public static native long nanoTime();
That means the method is implemented as native code.
Without digging further, we know that nanoTime does not use the same source as Instant.now. The Instant class uses the date-time clock of the host OS. On conventional computer hardware (laptops, desktops, common servers), the date-time clock of the host OS resolves to microseconds at best, not nanoseconds. So the nanoTime feature must use another source for tracking elapsed nanos.
The CPU is the likely source. Modern CPUs have a regular “heartbeat” that keeps them running, timing their calculations. For example, a 3 GHz CPU has three billion heartbeats per second. I imagine the nanoTime native code taps into this heartbeat for its count of elapsed nanoseconds.
The point here is that these two sources of time are completely separate and distinct.
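You can demonstrate the separation yourself. The following is a minimal sketch (plain standard-library calls, nothing assumed beyond them) that prints one reading from each source; the nanoTime value is counted from some arbitrary origin chosen by the JVM, so its raw number bears no relation to the epoch that Instant uses:

import java.time.Instant;

public class TwoClocks {
    public static void main(String[] args) {
        // Wall-clock time: seconds and nanos counted from the epoch 1970-01-01T00:00:00Z.
        Instant wallClock = Instant.now();

        // Monotonic time: nanoseconds since some fixed but arbitrary origin,
        // meaningful only for measuring elapsed time within this JVM.
        long monotonic = System.nanoTime();

        System.out.println("Instant.now()     = " + wallClock);
        System.out.println("System.nanoTime() = " + monotonic);
        // The two readings are unrelated numbers drawn from separate clocks.
    }
}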
Instant.now reads the date-time clock of the host OS. That clock drifts, so the OS periodically corrects it by checking in with a time server (via NTP or similar). When such a correction lands, the reported time can jump, forward or backward(!). The jump may be minuscule or may be major, depending on (a) how long since the last drift-correction, and (b) the drifting tendency of your particular computer’s clock. (Another issue is the accuracy of your time server.†)

You have mixed these two separate time sources with your code:

NANOTIME_OFFSET = TimeUnit.SECONDS.toNanos(instant.getEpochSecond()) + instant.getNano() - nanoTime;

The clock behind Instant is a moving target, repeatedly being adjusted at any moment, jumping ahead or jumping backward. So your NANOTIME_OFFSET, captured once at class-load time, is naïve and invalid.
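To see the offset move, you can recompute it periodically rather than once. Here is a minimal sketch along those lines (the class name and the one-minute interval are arbitrary choices of mine, not anything from the question): it repeats the question's calculation and prints how far the result has wandered since start-up, which is precisely the creeping "latency" being measured.

import java.time.Instant;
import java.util.concurrent.TimeUnit;

public class OffsetDrift {
    public static void main(String[] args) throws InterruptedException {
        long initialOffset = computeOffset();
        while (true) {
            long currentOffset = computeOffset();
            // How far the supposedly fixed offset has moved since start-up, in microseconds.
            long driftMicros = TimeUnit.NANOSECONDS.toMicros(currentOffset - initialOffset);
            System.out.println("offset drift since start: " + driftMicros + " µs");
            TimeUnit.MINUTES.sleep(1);
        }
    }

    // The same calculation as the question's static initializer, just done repeatedly.
    private static long computeOffset() {
        Instant instant = Instant.now();
        long nanoTime = System.nanoTime();
        return TimeUnit.SECONDS.toNanos(instant.getEpochSecond()) + instant.getNano() - nanoTime;
    }
}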
The Javadoc for nanoTime says: 👉🏽 “This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time.”

For the sake of completeness, I’ll mention that System.currentTimeMillis (now legacy) is supplanted by Instant.now. Both use the date-time clock of the host OS. Neither should be mixed with System.nanoTime.
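Here is a sketch of keeping the two clocks apart, per that Javadoc: stamp anything that leaves the process with a wall-clock reading, and reserve System.nanoTime for elapsed-time measurement inside a single JVM. The class and the doSomeWork placeholder are hypothetical, just for illustration.

import java.time.Instant;

public class SeparateClocks {
    public static void main(String[] args) {
        // Wall-clock timestamp for anything leaving the process, e.g. a message header.
        Instant sentAt = Instant.now();
        System.out.println("stamp outgoing message with: " + sentAt);

        // Monotonic clock strictly for elapsed time within this one process.
        long start = System.nanoTime();
        doSomeWork();
        long elapsedNanos = System.nanoTime() - start;
        System.out.println("work took " + elapsedNanos + " ns");
    }

    // Hypothetical stand-in for whatever work is being timed.
    private static void doSomeWork() {
        double sink = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sink += Math.sqrt(i);
        }
        if (sink < 0) System.out.println(sink); // keep the loop from being optimized away
    }
}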
† Nowadays you can buy an atomic clock for much money (though perhaps less than you might expect). A recent topic on YouTube is rolling your own super-accurate, very cheap time server by capturing the current moment as broadcast by satellite navigation (GPS, etc.), then serving it from an inexpensive computer such as a Raspberry Pi. To go down this time-geek rabbit hole, you might start with the Jeff Geerling channels.