I encountered this stack trace while running a Presto query on top of Alluxio. Sometimes my query is able to succeed, but sometimes it fails with this error. What does it mean, and how can I fix it?
com.facebook.presto.spi.PrestoException: Error opening Hive split alluxio://xxxxx:19998/s3/data/m-00025 (offset=100663296, length=53990296) using org.apache.hadoop.mapred.TextInputFormat: Channel [id: 0xfa748b02, L:/xxxxx:34874 ! R:xxxxx/xxxxx:29999] is closed. at com.facebook.presto.hive.HiveUtil.createRecordReader(HiveUtil.java:219) at com.facebook.presto.hive.GenericHiveRecordCursorProvider.lambda$createRecordCursor$0(GenericHiveRecordCursorProvider.java:71) at com.facebook.presto.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:23) at com.facebook.presto.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:80) at com.facebook.presto.hive.GenericHiveRecordCursorProvider.createRecordCursor(GenericHiveRecordCursorProvider.java:70) at com.facebook.presto.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:183) at com.facebook.presto.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:93) at com.facebook.presto.spi.connector.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:44) at com.facebook.presto.split.PageSourceManager.createPageSource(PageSourceManager.java:56) at com.facebook.presto.operator.ScanFilterAndProjectOperator.getOutput(ScanFilterAndProjectOperator.java:216) at com.facebook.presto.operator.Driver.processInternal(Driver.java:379) at com.facebook.presto.operator.Driver.lambda$processFor$8(Driver.java:283) at com.facebook.presto.operator.Driver.tryWithLock(Driver.java:675) at com.facebook.presto.operator.Driver.processFor(Driver.java:276) at com.facebook.presto.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1053) at com.facebook.presto.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:162) at com.facebook.presto.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:456) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: java.io.IOException: Channel [id: 0xfa748b02, L:/xxxxx:34874 ! R:xxxxx/xxxxx:29999] is closed. at alluxio.client.block.stream.NettyPacketReader$PacketReadHandler.channelUnregistered(NettyPacketReader.java:314) at alluxio.core.client.runtime.io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:176)
This means the connection between the Alluxio client (Presto) and Alluxio worker was closed unexpectedly.
Usually this is caused by a long GC pause on the client. The Alluxio client periodically sends a keep-alive on the connection, but this can be delayed (to the point of the worker closing the connection) by full GCs.
You can verify if there is GC pressure by adding the Java options -XX:+PrintGCDetails
and -Xloggc:<file name here>
to the Presto daemons.