I want to add a raw dataset file to my DagsHub repo (my first repo, and it's being used alongside an MLflow tutorial).
This is the line that is giving me trouble:
import dagshub.upload
repo = dagshub.upload.Repo(USER_NAME, REPO_NAME)
repo.upload(local_path='data/winequality.txt',
            remote_path='data/raw/winequality.txt',
            commit_message='Added Raw Data',
            versioning='dvc')
and this is the error I get:
Uploading files (1) to "USER_NAME/REPO_NAME"...
---------------------------------------------------------------------------
DagsHubAPIError Traceback (most recent call last)
<ipython-input-49-e8d1e8493248> in <cell line: 4>()
2 repo = dagshub.upload.Repo(USER_NAME,REPO_NAME)
3
----> 4 repo.upload(local_path='data/winequality.txt',
5 remote_path='data/raw/winequality.txt',
6 commit_message='Added Raw Data',
2 frames
/usr/local/lib/python3.10/dist-packages/dagshub/upload/wrapper.py in upload(self, local_path, commit_message, remote_path, **kwargs)
286 else:
287 file_to_upload = DataSet.get_file(str(local_path), remote_path)
--> 288 self.upload_files([file_to_upload], commit_message=commit_message, **kwargs)
289
290 def upload_files(
/usr/local/lib/python3.10/dist-packages/dagshub/upload/wrapper.py in upload_files(self, files, directory_path, commit_message, versioning, new_branch, last_commit, force)
375 timeout=None,
376 )
--> 377 self._log_upload_details(data, res, files)
378
379 # The ETag header contains the hash of the uploaded commit,
/usr/local/lib/python3.10/dist-packages/dagshub/upload/wrapper.py in _log_upload_details(self, data, res, files)
413 log_message(f"Got unknown successful status code {res.status_code}")
414 else:
--> 415 raise determine_upload_api_error(res)
416
417 def _poll_mirror_up_to_date(self):
DagsHubAPIError: file missing from storage:
Required resource is missing from the storage, is '' stored in your repository DagsHub storage?
The Repo file structure looks like this:
Local disk:
root/
└── data/
    └── winequality.txt

Remote:
root/
└── data/
    └── raw/
Note that 'raw' is version controlled by DVC, but the DagsHub documentation ("Upload Data") shows that this is the way to do it.
Not sure what I am missing.
The issue seems to be caused by missing DVC-tracked files, which prevents adding new files to the directory. To solve it, first install DVC with S3 support, if it isn't already installed:
pip install dvc "dvc[s3]"
Then clone the repo and configure the DagsHub storage as a DVC remote:
git clone https://dagshub.com/<user_name>/<repo_name>.git
cd <repo_name>
dvc remote add origin --local s3://dvc
dvc remote modify origin --local endpointurl https://dagshub.com/<user_name>/<repo_name>.s3
dvc remote modify origin --local access_key_id <your_token>
dvc remote modify origin --local secret_access_key <your_token>
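The two credential lines above take your DagsHub token. If you don't have it handy, the dagshub Python client can print it for you; this is a minimal sketch assuming a reasonably recent client version (otherwise you can copy a token from your DagsHub account settings instead):

import dagshub.auth

# Prints a DagsHub token to paste into the access_key_id / secret_access_key lines above.
# If you aren't authenticated yet, this should trigger a login flow first.
print(dagshub.auth.get_token())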
Once everything is configured, run the following:
mkdir -p data/raw
dvc commit data/raw.dvc
dvc push -r origin
Then run your code. It will now work!
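For reference, this is the same upload call from the question; it should now succeed unchanged (USER_NAME and REPO_NAME are still your own placeholders):

import dagshub.upload

# 'data/raw' is the DVC-tracked directory we just committed and pushed above.
repo = dagshub.upload.Repo(USER_NAME, REPO_NAME)
repo.upload(local_path='data/winequality.txt',
            remote_path='data/raw/winequality.txt',
            commit_message='Added Raw Data',
            versioning='dvc')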
That being said, this is probably something we can improve on our end too, so I'll share it with the engineering team!
Thanks for the question :)