I am running a Dask cluster and a Jupyter notebook server on cloud resources, using Kubernetes and Helm. I am using a YAML file for the Dask cluster and Jupyter, initially taken from https://docs.dask.org/en/latest/setup/kubernetes-helm.html:
```yaml
worker:
  replicas: 2 # number of workers
  resources:
    limits:
      cpu: 2
      memory: 2G
    requests:
      cpu: 2
      memory: 2G
  env:
    - name: EXTRA_PIP_PACKAGES
      value: s3fs --upgrade

# We want to keep the same packages on the worker and jupyter environments
jupyter:
  enabled: true
  env:
    - name: EXTRA_PIP_PACKAGES
      value: s3fs --upgrade
  resources:
    limits:
      cpu: 1
      memory: 2G
    requests:
      cpu: 1
      memory: 2G
```
and I am using another YAML file to create the storage locally:
```yaml
# CREATE A PERSISTENT VOLUME CLAIM (attached to our pod config)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dask-cluster-persistent-volume-claim
spec:
  accessModes:
    - ReadWriteOnce # usable by a single node; ReadOnlyMany: read-only by many nodes; ReadWriteMany: read/write by many nodes
  resources:
    requests:
      storage: 2Gi # storage capacity
```
I would like to add a persistent volume claim to the first YAML file, but I couldn't figure out where to add the `volumes` and `volumeMounts`. If you have an idea, please share it. Thank you.
I started by creating a PVC with this YAML file:
```yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: pdask-cluster-persistent-volume-claim
spec:
  accessModes: # https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
    - ReadWriteOnce # usable by a single node; ReadOnlyMany: read-only by many nodes; ReadWriteMany: read/write by many nodes
  resources:
    requests:
      storage: 2Gi
```
and then applied it in bash:

```shell
kubectl apply -f Dask-Persistent-Volume-Claim.yaml
# persistentvolumeclaim/pdask-cluster-persistent-volume-claim created
```
I checked the creation of the persistent volume:

```shell
kubectl get pv
```
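The claim itself can also be checked directly; a small sketch, using the PVC name created above (the status column should report `Bound` once a volume is provisioned):

```shell
# Check the claim status; it should be Bound once a PV backs it
kubectl get pvc pdask-cluster-persistent-volume-claim

# If it stays Pending, the events in the description usually explain why
kubectl describe pvc pdask-cluster-persistent-volume-claim
```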
I made major changes to the Dask cluster YAML file: I added the `volumes` and `volumeMounts` so that I can read/write a directory `/save_data` backed by the persistent volume created previously, and I set the `serviceType` to `LoadBalancer` with a port:
```yaml
# DASK SCHEDULER
scheduler:
  name: scheduler
  enabled: true
  image:
    repository: "daskdev/dask"
    tag: 2021.8.1
    pullPolicy: IfNotPresent
  replicas: 1 # should always be 1
  serviceType: "LoadBalancer" # Scheduler service type. Set to `LoadBalancer` to expose outside of your cluster.
  # serviceType: "NodePort"
  # serviceType: "ClusterIP"
  # loadBalancerIP: null # Some cloud providers allow you to specify the loadBalancerIP when using the `LoadBalancer` service type. If your cloud does not support it this option will be ignored.
  servicePort: 8786 # Scheduler service internal port.

# DASK WORKERS
worker:
  name: worker # Dask worker name.
  image:
    repository: "daskdev/dask" # Container image repository.
    tag: 2021.8.1 # Container image tag.
    pullPolicy: IfNotPresent # Container image pull policy.
  dask_worker: "dask-worker" # Dask worker command. E.g. `dask-cuda-worker` for a GPU worker.
  replicas: 2
  resources:
    limits:
      cpu: 2
      memory: 2G
    requests:
      cpu: 2
      memory: 2G
  mounts: # Worker Pod volumes and volume mounts. mounts.volumes follows the Kubernetes API v1 Volume spec; mounts.volumeMounts follows the Kubernetes API v1 VolumeMount spec.
    volumes:
      - name: dask-storage
        persistentVolumeClaim:
          claimName: pdask-cluster-persistent-volume-claim # must match the PVC's metadata.name
    volumeMounts:
      - name: dask-storage
        mountPath: /save_data # folder for storage
  env:
    - name: EXTRA_PIP_PACKAGES
      value: s3fs --upgrade

# We want to keep the same packages on the worker and jupyter environments
jupyter:
  name: jupyter # Jupyter name.
  enabled: true # Enable/disable the bundled Jupyter notebook.
  # rbac: true # Create RBAC service account and role to allow the Jupyter pod to scale worker pods and access logs.
  image:
    repository: "daskdev/dask-notebook" # Container image repository.
    tag: 2021.8.1 # Container image tag.
    pullPolicy: IfNotPresent # Container image pull policy.
  replicas: 1 # Number of notebook servers.
  serviceType: "LoadBalancer" # Jupyter service type. Set to `LoadBalancer` to expose outside of your cluster.
  # serviceType: "NodePort"
  # serviceType: "ClusterIP"
  servicePort: 80 # Jupyter service internal port.
  # This hash corresponds to the password 'dask'
  # password: 'sha1:aae8550c0a44:9507d45e087d5ee481a5ce9f4f16f37a0867318c' # Password hash.
  env:
    - name: EXTRA_PIP_PACKAGES
      value: s3fs --upgrade
  resources:
    limits:
      cpu: 1
      memory: 2G
    requests:
      cpu: 1
      memory: 2G
  mounts: # Jupyter Pod volumes and volume mounts, same spec as for the workers.
    volumes:
      - name: dask-storage
        persistentVolumeClaim:
          claimName: pdask-cluster-persistent-volume-claim # must match the PVC's metadata.name
    volumeMounts:
      - name: dask-storage
        mountPath: /save_data # folder for storage
```
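Before installing, the rendered manifests can be previewed to confirm that the mounts end up in the right place in the worker and Jupyter pod specs; a quick sanity check, assuming the chart reference `dask/dask` and the file name `values.yaml` used below:

```shell
# Render the chart locally without installing and inspect the volume wiring
helm template my-config dask/dask -f values.yaml | grep -B 2 -A 4 "dask-storage"
```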
Then, I installed my Dask configuration using Helm:

```shell
helm install my-config dask/dask -f values.yaml
```
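After installation, one way to check that the pods come up and to find the external addresses of the `LoadBalancer` services (the exact service names depend on the release name):

```shell
# Worker, scheduler and jupyter pods should reach the Running state
kubectl get pods

# The EXTERNAL-IP columns expose the scheduler (port 8786) and Jupyter (port 80)
kubectl get services
```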
Finally, I accessed my Jupyter pod interactively:

```shell
kubectl exec -ti [pod-name] -- /bin/bash
```

to check that the `/save_data` folder exists.
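The same check can also be done non-interactively; a small sketch that keeps the `[pod-name]` placeholder from above, confirming the mount and testing that the volume is writable:

```shell
# Confirm the mount point is present and backed by the volume, then write a test file
kubectl exec -ti [pod-name] -- sh -c "df -h /save_data && touch /save_data/hello.txt && ls -l /save_data"
```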