Tags: python, apache-spark, pyspark, pyarrow

Apache Arrow with Apache Spark - UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer not available


I am trying to integrate Apache Arrow with Apache Spark in a PySpark application, but I am encountering an error about sun.misc.Unsafe or java.nio.DirectByteBuffer not being available during execution.

import os
import pandas as pd
from pyspark.sql import SparkSession

# Read extra executor JVM options from the environment (not used further below)
extra_java_options = os.getenv("SPARK_EXECUTOR_EXTRA_JAVA_OPTIONS", "")

spark = SparkSession.builder \
    .appName("ArrowPySparkExample") \
    .getOrCreate()

# Intended as the JVM flag -Dio.netty.tryReflectionSetAccessible=true,
# but set here as a runtime conf after the JVM has already started
spark.conf.set("Dio.netty.tryReflectionSetAccessible", "true")
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")

pdf = pd.DataFrame(["midhun"])
df = spark.createDataFrame(pdf)
result_pdf = df.select("*").toPandas()

Error Message:

in stage 0.0 (TID 11) (192.168.140.22 executor driver): java.lang.UnsupportedOperationException: sun.misc.Unsafe or java.nio.DirectByteBuffer.<init>(long, int) not available
    at org.apache.arrow.memory.util.MemoryUtil.directBuffer(MemoryUtil.java:174)
    at org.apache.arrow.memory.ArrowBuf.getDirectBuffer(ArrowBuf.java:229)

Environment:

Apache Spark version: 3.4
Apache Arrow version: 1.5
Java version: JDK 21


Solution

  • Same issue here. The first thing to test is downgrading Java to a version Spark actually supports: Spark 3.4 runs on Java 8, 11, and 17, so JDK 21 is outside the supported range.

    Update: the underlying cause is that on recent JDKs, Arrow's MemoryUtil can no longer use reflection to reach sun.misc.Unsafe or the private DirectByteBuffer(long, int) constructor unless the JVM is started with flags that allow it. Setting the Netty flag with spark.conf.set() after the session exists (as in the question) has no effect; the flags have to be passed when the driver and executor JVMs launch, as sketched below.
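
Below is a minimal sketch of the flag-based route, assuming a plain PySpark script where getOrCreate() is what launches the driver JVM (if the JVM is already running, e.g. under spark-submit in client mode, the driver flags must go on the command line instead). The --add-opens flag follows the Arrow Java documentation's recommendation for JDK 16+; treat the exact flag list as an assumption to verify against your Arrow version.

import pandas as pd
from pyspark.sql import SparkSession

# JVM flags Arrow's memory code needs on modern JDKs (assumed here, per Arrow docs):
# - io.netty.tryReflectionSetAccessible=true: allows reflective access to
#   direct-buffer internals (needed on Java 9+)
# - --add-opens java.base/java.nio: opens the java.nio package so the
#   DirectByteBuffer constructor is reachable (needed on Java 16+)
arrow_jvm_flags = (
    "-Dio.netty.tryReflectionSetAccessible=true "
    "--add-opens=java.base/java.nio=ALL-UNNAMED"
)

spark = (
    SparkSession.builder
    .appName("ArrowPySparkExample")
    # Must be set before the JVMs start; spark.conf.set() at runtime is too late
    .config("spark.driver.extraJavaOptions", arrow_jvm_flags)
    .config("spark.executor.extraJavaOptions", arrow_jvm_flags)
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")
    .getOrCreate()
)

pdf = pd.DataFrame({"name": ["midhun"]})
df = spark.createDataFrame(pdf)
result_pdf = df.toPandas()  # should no longer hit MemoryUtil.directBuffer
print(result_pdf)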
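
If you launch through spark-submit instead, the equivalent is passing the same flags via --driver-java-options and --conf spark.executor.extraJavaOptions. If the flags make no difference in your setup, dropping back to JDK 17, the newest Java release Spark 3.4 supports, remains the reliable fix.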