apache-spark apache-spark-sql

Spark "storage partitioned join" (SPJ)


Storage partitioned join (SPJ) has been available since Spark 3.3. However, have any data sources other than Iceberg been updated to make use of it?

For example, can I use SPJ with the 'parquet' data source (without moving to Iceberg)?

(I couldn't find this information in JIRA, the SPIP, the documentation, the release notes, or the YouTube video.)


Solution

  • As of this writing, no data source other than Iceberg appears to support SPJ, so a plain 'parquet' source would not benefit from it.
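
One way to check this on your own data is to join two Parquet datasets with the SPJ flag turned on and look for a shuffle in the physical plan. The following is a minimal PySpark sketch, not a definitive test: `parquet_join_shuffles`, `has_shuffle`, the paths, and the join key are hypothetical names introduced here, and it assumes an existing `SparkSession` on Spark 3.3 or later. The string match on `Exchange hashpartitioning` is a heuristic based on how Spark usually prints shuffle nodes.

```python
# Conf key present in Spark 3.3+; it gates SPJ for DataSource V2 sources.
SPJ_CONF = "spark.sql.sources.v2.bucketing.enabled"


def has_shuffle(plan_text):
    """Return True if a physical plan string contains a hash-partitioned shuffle."""
    return "Exchange hashpartitioning" in plan_text


def parquet_join_shuffles(spark, left_path, right_path, key):
    """Join two Parquet datasets on `key` and report whether the plan shuffles.

    `spark` is an existing SparkSession; the paths and key are hypothetical inputs.
    """
    spark.conf.set(SPJ_CONF, "true")
    # Disable broadcast joins so a small table does not hide the shuffle.
    spark.conf.set("spark.sql.autoBroadcastJoinThreshold", "-1")

    left = spark.read.parquet(left_path)
    right = spark.read.parquet(right_path)
    joined = left.join(right, on=key)

    # _jdf is PySpark's handle to the underlying JVM Dataset.
    plan_text = joined._jdf.queryExecution().executedPlan().toString()
    return has_shuffle(plan_text)
```

If the answer above holds, calling `parquet_join_shuffles(spark, "/data/a", "/data/b", "id")` should return `True` even with the flag enabled, because the built-in Parquet source does not report its partitioning to the planner. You can also inspect the plan by eye with `joined.explain()`.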