I have two code workbooks. If I run a computationally expensive PySpark transform in workbook A and then try to run something in workbook B, both queue indefinitely until the build in workbook A is stopped, at which point the one in workbook B runs immediately, as if it had been waiting on the build in workbook A.
Are executors shared across all code workbooks for a single user? What is going on?
For Foundry running in Palantir Cloud, executors are set by the Spark configuration settings and managed by Rubix. This guarantees execution times with lower variance than fixed resources in YARN would (and enables additional Rubix security features such as containerization).
Because permissions in Foundry are set at the project level, if a user is running (in interactive mode) more than one code workbook in the same project with the same profile (the same set of libraries and Spark configuration), the SparkSession is shared between them to save computational resources.
You can check the Spark session by running:
print(spark)
<pyspark.sql.session.SparkSession object at 0x7ffb605ef048>
If I have another workbook in the same project, I would get the same result:
print(spark)
<pyspark.sql.session.SparkSession object at 0x7ffb605ef048>
If I have another workbook in a different project using the same profile, I would get a different Spark session:
print(spark)
<pyspark.sql.session.SparkSession object at 0x7f45800df7f0>
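Note that the hex address in the printed repr only identifies the Python wrapper object, so it is not the most robust comparison. As a less ambiguous check (a small sketch using the standard PySpark API), you can compare the underlying Spark application IDs instead; workbooks sharing a session will print the same ID:
# Unique ID of the Spark application backing this session;
# identical output in two workbooks means they share executors
print(spark.sparkContext.applicationId)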
If it's important for a workbook to run in a separate SparkSession (and not share executors), the user can make a slight modification to the packages in one of the workbooks, or create another pre-warmed Spark session profile (instead of using the default one).