apache-spark, pyspark, databricks, rdd, spark-connect

RDD "is not implemented" error on pyspark.sql.connect.dataframe.DataFrame


I have a DataFrame on Databricks on which I would like to use the RDD API. After reading from the catalog, its type is pyspark.sql.connect.dataframe.DataFrame, which I found out is associated with Spark Connect. The documentation on Spark Connect says:

> In Spark 3.4, Spark Connect supports most PySpark APIs, including DataFrame, Functions, and Column. However, some APIs such as SparkContext and RDD are not supported.

Is there any way to get around this?


Solution

  • I was having a similar RDD problem on Databricks. Since you did not share more details about your case, here is how I fixed the error I was seeing:

    [NOT_IMPLEMENTED] rdd is not implemented.
    

    Possible alternatives to fix this issue:
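One common alternative is to express row-level logic through `DataFrame.mapInPandas`, which Spark Connect does support, instead of `rdd.map`. Below is a minimal sketch; the column names (`id`, `value`) and the doubling logic are hypothetical stand-ins for your own transformation. The per-batch function is plain Python over pandas DataFrames, so it can be written and unit-tested without a cluster.

```python
from typing import Iterator
import pandas as pd

def double_value(batches: Iterator[pd.DataFrame]) -> Iterator[pd.DataFrame]:
    """Per-batch transform: Spark calls this with chunks of the DataFrame
    as pandas DataFrames and stitches the yielded chunks back together."""
    for batch in batches:
        batch = batch.copy()
        # Hypothetical column name; replace with your own row logic.
        batch["value"] = batch["value"] * 2
        yield batch

# On a Spark Connect DataFrame `df` (schema and columns are assumptions):
#   result = df.mapInPandas(double_value, schema="id long, value long")
# instead of the unsupported:
#   result = df.rdd.map(lambda row: (row.id, row.value * 2))
```

Because `mapInPandas` runs entirely through the DataFrame API, it avoids `SparkContext` and `RDD`, both of which raise `[NOT_IMPLEMENTED]` under Spark Connect.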