I wanted to ask if there is a way to use a Python .whl (wheel), .egg, or plain .py file as a dependency in the Kubeflow Spark Operator.
The resulting manifest I have in mind would look something like this; the dependency would go either under jars or files (I presume files would make more sense):
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: spark-pi-python
  namespace: default
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: spark:3.5.3
  imagePullPolicy: IfNotPresent
  mainApplicationFile: local:///path/to/my/python/script.py
  deps:
    jars:
      - local:///path/to/python/functions.py
    files:
      - gs://path/to/python/functions.py
  sparkVersion: 3.5.3
  driver:
    cores: 1
    memory: 512m
    serviceAccount: spark-operator-spark
  executor:
    instances: 1
    cores: 1
    memory: 512m
It is possible to use Python files as dependencies via deps.pyFiles; see the link. The following has worked for me:
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: view-creator-test
  namespace: default
spec:
  type: Python
  pythonVersion: "3"
  mode: cluster
  image: spark:3.5.3
  imagePullPolicy: IfNotPresent
  mainApplicationFile: local:///path/to/my/python/script.py
  arguments: []
  sparkVersion: 3.5.3
  deps:
    pyFiles:
      - local:///mnt/spark/dependency_1.py
      - local:///mnt/spark/dependency_2.py
  driver:
    labels:
      version: 3.5.3
    cores: 1
    memory: 512m
    volumeMounts:
      - name: view-creator-volume
        mountPath: /mnt/spark
  executor:
    labels:
      version: 3.5.3
    instances: 1
    cores: 1
    memory: 512m
    volumeMounts:
      - name: view-creator-volume
        mountPath: /mnt/spark
  volumes:
    - name: view-creator-volume
      persistentVolumeClaim:
        claimName: view-creator-pvc
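The operator passes deps.pyFiles through as spark.submit.pyFiles (the same thing --py-files does for spark-submit), so the listed files end up on the driver's and executors' PYTHONPATH and can be imported as top-level modules in the main script. Here is a minimal sketch of what script.py could look like; the helper functions (load_source, create_view) are hypothetical placeholders for whatever your dependency modules actually expose:

from pyspark.sql import SparkSession

# dependency_1.py and dependency_2.py are shipped via deps.pyFiles,
# so they are importable here without any package installation.
import dependency_1
import dependency_2

if __name__ == "__main__":
    spark = SparkSession.builder.appName("view-creator-test").getOrCreate()

    # Hypothetical calls; replace with the functions your modules define.
    df = dependency_1.load_source(spark)
    dependency_2.create_view(df, "my_view")

    spark.stop()

Note that --py-files (and therefore deps.pyFiles) is documented to accept .py, .zip, and .egg files, so several modules can be bundled into one archive and listed as a single entry. A .whl is not on that list, so wheels are usually installed into the container image (pip install in the Dockerfile) rather than shipped through deps.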