java amazon-web-services amazon-ec2 memory-mapped-files

AWS EC2 Java program memory map issue


I have a Java program that memory-maps a file of around 800MB via java.io.RandomAccessFile. I'm hosting it on an EC2 m5.8xlarge instance (32 CPUs, 128GB RAM) with JVM OPTS set to -Xms64g -Xmx64g. While starting the service, I hit this error:

 [thread 3606 also had an error]
 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGBUS (0x7) at pc=0x00007f214f3c5e73, pid=3556, tid=3637
 #
 # JRE version: OpenJDK Runtime Environment Temurin-17.0.6+10 (17.0.6+10) (build 17.0.6+10)
 # Java VM: OpenJDK 64-Bit Server VM Temurin-17.0.6+10 (17.0.6+10, mixed mode, sharing, tiered, compressed class ptrs, z gc, linux-amd64)
 # Problematic frame:
 # V  [libjvm.so+0x602e73]  Copy::fill_to_memory_atomic(void*, unsigned long, unsigned char)+0x103
 #
 # No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
 #
 # An error report file with more information is saved as:
 # /home/user/builds/current/hs_err_pid3556.log
 #
 # If you would like to submit a bug report, please visit:
 #   https://github.com/adoptium/adoptium-support/issues
 #
 /home/user/builds/current/start.sh: line 78:  3556 Aborted                 java ${JVM_OPTS} -cp 'lib/*' ${LAUNCH_CLASS} $@

The hs_err_pid3556.log mentioned above contains the following, which shows that sun.misc.Unsafe.setMemory failed while zeroing a block of memory:

Current thread (0x00007fcaf0274fd0):  JavaThread "ForkJoinPool-1-worker-7" daemon [_thread_in_vm, id=3199, stack(0x00007fcb15fda000,0x00007fcb160db000)]
Stack: [0x00007fcb15fda000,0x00007fcb160db000],  sp=0x00007fcb160d8ce8,  free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x602e73]  Copy::fill_to_memory_atomic(void*, unsigned long, unsigned char)+0x103
j  jdk.internal.misc.Unsafe.setMemory0(Ljava/lang/Object;JJB)V+0 java.base@17.0.6
j  jdk.internal.misc.Unsafe.setMemory(Ljava/lang/Object;JJB)V+25 java.base@17.0.6
j  jdk.internal.misc.Unsafe.setMemory(JJB)V+6 java.base@17.0.6
j  sun.misc.Unsafe.setMemory(JJB)V+7 jdk.unsupported@17.0.6
j  example.com.buffer.MemoryMappedBuffer.set(JJB)V+58
j  example.com.buffer.Buffer.zeroed()Lexample/com/buffer/Buffer;+9
j  example.com.collections.BufferSupplierMapped.supplyBuffers(JJ)Lorg/apache/commons/lang3/tuple/Pair;+37
j  example.com.collections.ConcurrentOffheapLongObjMap$MapImpl.<init>(Ljava/lang/String;Lexample/com/collections/OffheapMapBufferSupplier;Lexample/com/collections/OffheapValueSerDe;JJF)V+65
j  example.com.collections.ConcurrentOffheapLongObjMap.<init>(Ljava/lang/String;JJLexample/com/collections/OffheapMapBufferSupplier;Lexample/com/collections/OffheapValueSerDe;F)V+64
j  example.com.collections.ConcurrentOffheapLongObjMap.<init>(Ljava/lang/String;JJLexample/com/collections/OffheapMapBufferSupplier;Lexample/com/collections/OffheapValueSerDe;)V+11
j  example.com.collections.OffheapMapUtil.readToMapped(Ljava/lang/String;Lexample/com/collections/OffheapValueSerDe;Ljava/lang/String;Ljava/lang/String;)Lexample/com/collections/ConcurrentOffheapLongObjMap;+99
j  example.com.index.job.WritableSiteIndex.lambda$snapshotLoad$21(Lorg/apache/commons/lang3/mutable/MutableObject;Lexample/com/model/Site;Ljava/lang/String;Ljava/lang/String;)V+20
j  example.com.index.job.WritableSiteIndex$$Lambda$362+0x0000000801044f58.run()V+16
j  example.com.thread.AsyncTaskList.lambda$add$0(Ljava/lang/String;Lexample/com/function/ThrowingRunnable;)Ljava/lang/Void;+19
j  example.com.thread.AsyncTaskList$$Lambda$223+0x0000000800e31428.call()Ljava/lang/Object;+8
j  example.com.thread.AsyncTaskList.lambda$execute$1(Ljava/util/concurrent/Callable;)Ljava/lang/Boolean;+1
j  example.com.thread.AsyncTaskList$$Lambda$230+0x0000000800e30400.get()Ljava/lang/Object;+4
j  example.com.thread.WorkerService$$Lambda$231+0x0000000800e38000.call()Ljava/lang/Object;+4
j  java.util.concurrent.ForkJoinTask$AdaptedCallable.exec()Z+5 java.base@17.0.6
j  java.util.concurrent.ForkJoinTask.doExec()I+10 java.base@17.0.6
j  java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Ljava/util/concurrent/ForkJoinTask;Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V+13 java.base@17.0.6
j  java.util.concurrent.ForkJoinPool.scan(Ljava/util/concurrent/ForkJoinPool$WorkQueue;II)I+193 java.base@17.0.6
j  java.util.concurrent.ForkJoinPool.runWorker(Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V+53 java.base@17.0.6
j  java.util.concurrent.ForkJoinWorkerThread.run()V+31 java.base@17.0.6
v  ~StubRoutines::call_stub
V  [libjvm.so+0x822715]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, JavaThread*)+0x315
V  [libjvm.so+0x823f0b]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, JavaThread*)+0x1cb
V  [libjvm.so+0x8eda53]  thread_entry(JavaThread*, JavaThread*)+0xa3
V  [libjvm.so+0xe5e974]  JavaThread::thread_main_inner()+0x184
V  [libjvm.so+0xe62020]  Thread::call_run()+0xc0
V  [libjvm.so+0xc187e1]  thread_native_entry(Thread*)+0xe1
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  jdk.internal.misc.Unsafe.setMemory0(Ljava/lang/Object;JJB)V+0 java.base@17.0.6
j  jdk.internal.misc.Unsafe.setMemory(Ljava/lang/Object;JJB)V+25 java.base@17.0.6
j  jdk.internal.misc.Unsafe.setMemory(JJB)V+6 java.base@17.0.6
j  sun.misc.Unsafe.setMemory(JJB)V+7 jdk.unsupported@17.0.6
j  example.com.buffer.MemoryMappedBuffer.set(JJB)V+58
j  example.com.buffer.Buffer.zeroed()Lexample/com/buffer/Buffer;+9
j  example.com.collections.BufferSupplierMapped.supplyBuffers(JJ)Lorg/apache/commons/lang3/tuple/Pair;+37
j  example.com.collections.ConcurrentOffheapLongObjMap$MapImpl.<init>(Ljava/lang/String;Lexample/com/collections/OffheapMapBufferSupplier;Lexample/com/collections/OffheapValueSerDe;JJF)V+65
j  example.com.collections.ConcurrentOffheapLongObjMap.<init>(Ljava/lang/String;JJLexample/com/collections/OffheapMapBufferSupplier;Lexample/com/collections/OffheapValueSerDe;F)V+64
j  example.com.collections.ConcurrentOffheapLongObjMap.<init>(Ljava/lang/String;JJLexample/com/collections/OffheapMapBufferSupplier;Lexample/com/collections/OffheapValueSerDe;)V+11
j  example.com.collections.OffheapMapUtil.readToMapped(Ljava/lang/String;Lexample/com/collections/OffheapValueSerDe;Ljava/lang/String;Ljava/lang/String;)Lexample/com/collections/ConcurrentOffheapLongObjMap;+99
j  example.com.index.job.WritableSiteIndex.lambda$snapshotLoad$21(Lorg/apache/commons/lang3/mutable/MutableObject;Lexample/com/model/Site;Ljava/lang/String;Ljava/lang/String;)V+20
j  example.com.index.job.WritableSiteIndex$$Lambda$362+0x0000000801044f58.run()V+16
j  example.com.thread.AsyncTaskList.lambda$add$0(Ljava/lang/String;Lexample/com/function/ThrowingRunnable;)Ljava/lang/Void;+19
j  example.com.thread.AsyncTaskList$$Lambda$223+0x0000000800e31428.call()Ljava/lang/Object;+8
j  example.com.thread.AsyncTaskList.lambda$execute$1(Ljava/util/concurrent/Callable;)Ljava/lang/Boolean;+1
j  example.com.thread.AsyncTaskList$$Lambda$230+0x0000000800e30400.get()Ljava/lang/Object;+4
j  example.com.thread.WorkerService$$Lambda$231+0x0000000800e38000.call()Ljava/lang/Object;+4
j  java.util.concurrent.ForkJoinTask$AdaptedCallable.exec()Z+5 java.base@17.0.6
j  java.util.concurrent.ForkJoinTask.doExec()I+10 java.base@17.0.6
j  java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Ljava/util/concurrent/ForkJoinTask;Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V+13 java.base@17.0.6
j  java.util.concurrent.ForkJoinPool.scan(Ljava/util/concurrent/ForkJoinPool$WorkQueue;II)I+193 java.base@17.0.6
j  java.util.concurrent.ForkJoinPool.runWorker(Ljava/util/concurrent/ForkJoinPool$WorkQueue;)V+53 java.base@17.0.6
j  java.util.concurrent.ForkJoinWorkerThread.run()V+31 java.base@17.0.6
v  ~StubRoutines::call_stub
siginfo: si_signo: 7 (SIGBUS), si_code: 2 (BUS_ADRERR), si_addr: 0x00007fc979848000

What's interesting is that the same program maps the same file without any problem on my laptop (Ubuntu with 64GB RAM), and that the AWS instance has no problem loading a very similar but smaller file (560MB vs 800MB). So I'm pretty sure the Java program is working as expected and the file to be mapped is intact.


Solution

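  • It turned out there was not enough disk space left on the instance. The OS could not allocate backing blocks for the mapped file, so the write into the mapping failed with SIGBUS (a bus error, not a segfault).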
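
One way to turn this kind of crash into an ordinary exception is to compare the usable space on the volume with the size you are about to map, before mapping. The following is only a minimal sketch under some assumptions: the class and method names (MappedFileSpaceCheck, mapWithSpaceCheck, additionalBytesNeeded) are made up for illustration and are not from the program in the question, and the check is best-effort. It compares logical file sizes, so it will not catch an existing sparse file, and another process can still fill the disk after the check passes.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the original program.
final class MappedFileSpaceCheck {

    /** Bytes the file would have to grow by to reach mapSize (0 if it is already that large). */
    static long additionalBytesNeeded(long currentFileSize, long mapSize) {
        return Math.max(0L, mapSize - currentFileSize);
    }

    /**
     * Maps the file read-write for mapSize bytes, but fails with an IOException
     * up front if the volume does not appear to have room for the file to grow.
     * A single FileChannel.map call is limited to Integer.MAX_VALUE bytes.
     */
    static MappedByteBuffer mapWithSpaceCheck(Path file, long mapSize) throws IOException {
        Path abs = file.toAbsolutePath();
        boolean exists = Files.exists(abs);
        long currentSize = exists ? Files.size(abs) : 0L;
        long needed = additionalBytesNeeded(currentSize, mapSize);

        // Look at the volume that holds (or will hold) the file.
        FileStore store = Files.getFileStore(exists ? abs : abs.getParent());
        long usable = store.getUsableSpace();
        if (usable < needed) {
            throw new IOException("Not enough disk space to map " + abs
                    + ": need " + needed + " more bytes, only " + usable + " usable");
        }

        try (RandomAccessFile raf = new RandomAccessFile(abs.toFile(), "rw");
             FileChannel channel = raf.getChannel()) {
            // The mapping remains valid after the channel is closed.
            return channel.map(FileChannel.MapMode.READ_WRITE, 0, mapSize);
        }
    }
}
```

Calling something like MappedFileSpaceCheck.mapWithSpaceCheck(Path.of("index.bin"), 800L * 1024 * 1024) (the file name is a placeholder) would then report a full disk as an IOException at startup instead of a JVM abort while zeroing the buffer.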