MLRun: issue with slow response times

I see higher throughput and a long average response delay (requests wait 20-50 seconds for a worker), based on the outputs from Grafana.

I know that part of the optimization can be:

use more workers (for each pod/replica)
increase resources for each pod/replica
use more pods/replicas in k8s

I tuned performance by increasing resources and pods/replicas:

# increase resources (for faster execution)
fn.with_requests(mem="500Mi", cpu=0.5)  # default resources
fn.with_limits(mem="2Gi", cpu=1)        # maximum resources

# increase parallel execution by increasing pods/replicas
fn.spec.replicas = 2        # default replicas
fn.spec.min_replicas = 2    # min replicas
fn.spec.max_replicas = 5    # max replicas

Do you know how I can increase the number of workers, and what the expected impact on CPU/memory is?

Solution

I got it. Each worker runs in a separate worker scope. This means that each worker has its own copy of all variables, and all changes stay within that worker (a change made by worker x does not affect worker y). Therefore, when you add workers, it is useful to increase the request/limit resources, at least for memory, at the pod/replica level.
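Because each worker keeps its own copy of all variables, the memory a pod needs grows roughly with the number of workers. Here is a minimal sketch of that arithmetic. The helper `estimate_pod_memory_mi` and its `shared_overhead_mi` parameter are hypothetical (they are not part of MLRun), and linear scaling is only an assumption. Measure the real footprint of your function before relying on it.

```python
def estimate_pod_memory_mi(per_worker_mi: int, workers: int, shared_overhead_mi: int = 0) -> int:
    """Rough memory request (in Mi) for one pod running `workers` isolated workers.

    Hypothetical helper: assumes memory scales linearly with the worker count.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    # Each worker holds its own copy of all variables, so per-worker memory is multiplied.
    return shared_overhead_mi + per_worker_mi * workers


# 500Mi is the memory request used in the question for a single worker.
single = estimate_pod_memory_mi(per_worker_mi=500, workers=1)
double = estimate_pod_memory_mi(per_worker_mi=500, workers=2)
print(f"1 worker: {single}Mi, 2 workers: {double}Mi")
```

Under this assumption, going from one to two workers doubles the memory request, from 500Mi to about 1Gi.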

You can set the number of workers for the HTTP trigger with fn.with_http(workers=<n>); see the documentation for more information. I updated the code with the following resource tuning:

# increase workers (two workers) for each pod/replica
fn.with_http(workers=2)

# increase resources (for faster execution)
fn.with_requests(mem="1Gi", cpu=0.7)    # memory doubled and CPU raised slightly, because of two workers
fn.with_limits(mem="2Gi", cpu=1)        # maximum resources (unchanged)

# increase parallel execution by increasing pods/replicas
fn.spec.replicas = 2        # default replicas 
fn.spec.min_replicas = 2    # min replicas
fn.spec.max_replicas = 5    # max replicas
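
To see what the settings above mean for parallelism, here is a rough sketch that multiplies replicas by workers. It assumes that each worker handles one request at a time, so that replicas × workers is the number of requests processed concurrently. The function `concurrent_capacity` is a hypothetical helper, not an MLRun API.

```python
def concurrent_capacity(replicas: int, workers_per_replica: int) -> int:
    """Requests that can be processed at once, assuming one request per worker."""
    if replicas < 1 or workers_per_replica < 1:
        raise ValueError("replicas and workers_per_replica must be at least 1")
    return replicas * workers_per_replica


workers = 2  # matches fn.with_http(workers=2)
min_capacity = concurrent_capacity(replicas=2, workers_per_replica=workers)  # min_replicas = 2
max_capacity = concurrent_capacity(replicas=5, workers_per_replica=workers)  # max_replicas = 5
print(f"concurrent requests: {min_capacity} (min replicas) to {max_capacity} (max replicas)")
```

With one worker per pod, the same replica range would give only 2 to 5 concurrent requests, so doubling the workers doubles the capacity at every replica count, at the cost of the extra memory per pod.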