databasemachine-learningdata-scienceartificial-intelligencehub

What does stopping the runtime while uploading a dataset to Hub cause?


I am getting the following error while trying to upload a dataset to Hub (dataset format for AI) S3SetError: Connection was closed before we received a valid response from endpoint URL: "<...>".

So, I tried to delete the dataset and it is throwing this error below.

CorruptedMetaError: 'boxes/tensor_meta.json' and 'boxes/chunks_index/unsharded' have a record of different numbers of samples. Got 0 and 6103 respectively.

Using Hub version: v2.3.1


Solution

  • Seems like when you were uploading the dataset the runtime got interrupted which led to the corruption of the data you were trying to upload. Using force=True while deleting should allow you to delete it.

    For more information feel free to check out the Hub API basics docs for details on how to delete datasets in Hub.

    If you stop uploading a Hub dataset midway through your dataset will be only partially uploaded to Hub. So, you will need to restart the upload. If you would like to re-create the dataset, you can use the overwrite = True flag in hub.empty(overwrite = True). If you are making updates to an existing dataset, you should use version control to checkpoint the states that are in good shape.