python health-check feature-store mlrun

MLRun, ErrorMessage, No space left on device


I got this error while ingesting data into a FeatureSet:

Error - Failed to save aggregation for /k78/online_detail/nosql/sets/online_detail/0354467518.ed74fc2b
Response status code was 400: b'{\n\t"ErrorCode": -28,\n\t"ErrorMessage": "No space left on device"\n} 
Update expression was: pr_ph='0354467518';id=7309877;type='r77'

I used standard ingestion code:

import mlrun
import mlrun.feature_store as fs
...
project = mlrun.get_or_create_project(project_name, context='./', user_project=False)
feature_set = featureGetOrCreate(True, project_name, 'sample')  # own helper (not shown)
...
fs.ingest(feature_set, df)

It looks like a disk space issue, but I am 100% sure that I had enough free space for the ingest, so the cause must be something else. Has anyone had a similar issue?


Solution

  • The issue was the number of objects on the data nodes, which is related to the platform limits.

    You can run the HealthCheckScript (hcs) to see the number of objects in the v3io containers, using this command (with these settings):

    hcs -v --dark --test check_cluster_engine_number_of_objects
    

    and you will see output like this:

    ...
    [2023-05-25 20:30:20] [TASK] [check_cluster_engine_number_of_objects] Checking the number of objects in the cluster.
    [2023-05-25 20:30:20] [SUBTASK] Checking the data cluster's operational status...
    [2023-05-25 20:30:21] [INFO] The data cluster is online.
    [2023-05-25 20:30:21] [SUBTASK] Counting the number of objects in each container...
    [2023-05-25 20:30:30] [INFO] +----------------+------+----------+
    [2023-05-25 20:30:30] [INFO] | Container Name | ID   | Items    |
    [2023-05-25 20:30:30] [INFO] +----------------+------+----------+
    [2023-05-25 20:30:30] [INFO] | projects       | 1033 | 28715350 |
    [2023-05-25 20:30:30] [INFO] | users          | 1034 | 179598   |
    [2023-05-25 20:30:30] [INFO] | bigdata        | 1035 | 271      |
    [2023-05-25 20:30:30] [INFO] | users          | 1036 | 285995   |
    [2023-05-25 20:30:30] [INFO] | Total          |      | 29181214 |
    [2023-05-25 20:30:30] [INFO] +----------------+------+----------+
    ...
    

    BTW: The hcs command is available on the data nodes.
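
If you want to check the hcs output automatically, the table rows can be parsed with a few lines of Python. This is only a minimal sketch: the function names and the limit value are hypothetical and used for illustration (the real limit has to be taken from the platform limits for your version), and it assumes the row layout shown in the output above. Containers are keyed by ID because the same name ("users") can appear more than once.

```python
import re

# Sample rows copied from the hcs output above.
SAMPLE = """\
[2023-05-25 20:30:30] [INFO] | Container Name | ID   | Items    |
[2023-05-25 20:30:30] [INFO] | projects       | 1033 | 28715350 |
[2023-05-25 20:30:30] [INFO] | users          | 1034 | 179598   |
[2023-05-25 20:30:30] [INFO] | bigdata        | 1035 | 271      |
[2023-05-25 20:30:30] [INFO] | users          | 1036 | 285995   |
[2023-05-25 20:30:30] [INFO] | Total          |      | 29181214 |
"""

# Matches data rows only: a one-word name, a numeric ID, a numeric item count.
# The header row and the "Total" row (empty ID) do not match.
ROW = re.compile(r"\|\s*(?P<name>\S+)\s*\|\s*(?P<id>\d+)\s*\|\s*(?P<items>\d+)\s*\|")


def parse_object_counts(log_text):
    """Return {container_id: (name, items)} for every data row in the log."""
    counts = {}
    for line in log_text.splitlines():
        match = ROW.search(line)
        if match:
            counts[int(match["id"])] = (match["name"], int(match["items"]))
    return counts


def containers_over(counts, limit):
    """Return (id, name, items) for containers holding more than `limit` objects."""
    return [
        (cid, name, items)
        for cid, (name, items) in sorted(counts.items())
        if items > limit
    ]


if __name__ == "__main__":
    HYPOTHETICAL_LIMIT = 25_000_000  # illustration only, not the real platform limit
    counts = parse_object_counts(SAMPLE)
    print("total objects:", sum(items for _, items in counts.values()))
    for cid, name, items in containers_over(counts, HYPOTHETICAL_LIMIT):
        print(f"container {name} (ID {cid}) has {items} objects")
```

In practice you would feed the function the captured hcs output instead of SAMPLE and alert before ingestion starts failing with "No space left on device".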