I'm having a problem running "dvc pull" on Google Colab. I have two repositories (let's call them A and B): repository A holds my machine learning code and repository B holds my dataset.
I've successfully pushed my dataset to repository B with DVC (using gdrive as my remote storage), and I've also successfully run "dvc import" (as well as "dvc pull" and "dvc update") in my local copy of repository A.
The problem occurs when I run my project on Colab. Here is what I did:
At step 4, I got the following error. This is the full stack trace; note that I changed the repo URL in it for confidentiality reasons:
2022-03-08 08:53:31,863 DEBUG: Adding '/content/<my_project_A>/.dvc/config.local' to gitignore file.
2022-03-08 08:53:31,866 DEBUG: Adding '/content/<my_project_A>/.dvc/tmp' to gitignore file.
2022-03-08 08:53:31,866 DEBUG: Adding '/content/<my_project_A>/.dvc/cache' to gitignore file.
2022-03-08 08:53:31,916 DEBUG: Creating external repo https://gitlab.com/<my-dataset-repo-B>.git@3a3f2019efabff8ec71429da39b86688d1c98e75
2022-03-08 08:53:31,916 DEBUG: erepo: git clone 'https://gitlab.com/<my-dataset-repo-B>.git' to a temporary dir
Everything is up to date.
2022-03-08 08:53:32,154 ERROR: failed to pull data from the cloud - Failed to clone repo 'https://gitlab.com/<my-dataset-repo-B>.git' to '/tmp/tmp2x6y9z0edvc-clone'
------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/scmrepo/git/backend/gitpython.py", line 185, in clone
tmp_repo = clone_from()
File "/usr/local/lib/python3.7/dist-packages/git/repo/base.py", line 1148, in clone_from
return cls._clone(git, url, to_path, GitCmdObjectDB, progress, multi_options, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/git/repo/base.py", line 1079, in _clone
finalize_process, decode_streams=False)
File "/usr/local/lib/python3.7/dist-packages/git/cmd.py", line 176, in handle_process_output
return finalizer(process)
File "/usr/local/lib/python3.7/dist-packages/git/util.py", line 386, in finalize_process
proc.wait(**kwargs)
File "/usr/local/lib/python3.7/dist-packages/git/cmd.py", line 502, in wait
raise GitCommandError(remove_password_if_present(self.args), status, errstr)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
cmdline: git clone -v --no-single-branch --progress https://gitlab.com/<my-dataset-repo-B>.git /tmp/tmp2x6y9z0edvc-clone
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/dvc/scm.py", line 104, in clone
return Git.clone(url, to_path, progress=pbar.update_git, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/scmrepo/git/__init__.py", line 121, in clone
backend.clone(url, to_path, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/scmrepo/git/backend/gitpython.py", line 190, in clone
raise CloneError(url, to_path) from exc
scmrepo.exceptions.CloneError: Failed to clone repo 'https://gitlab.com/<my-dataset-repo-B>.git' to '/tmp/tmp2x6y9z0edvc-clone'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/dist-packages/dvc/command/data_sync.py", line 41, in run
glob=self.args.glob,
File "/usr/local/lib/python3.7/dist-packages/dvc/repo/__init__.py", line 49, in wrapper
return f(repo, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/dvc/repo/pull.py", line 38, in pull
run_cache=run_cache,
File "/usr/local/lib/python3.7/dist-packages/dvc/repo/__init__.py", line 49, in wrapper
return f(repo, *args, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/dvc/repo/fetch.py", line 50, in fetch
revs=revs,
File "/usr/local/lib/python3.7/dist-packages/dvc/repo/__init__.py", line 437, in used_objs
with_deps=with_deps,
File "/usr/local/lib/python3.7/dist-packages/dvc/repo/index.py", line 190, in used_objs
filter_info=filter_info,
File "/usr/local/lib/python3.7/dist-packages/dvc/stage/__init__.py", line 660, in get_used_objs
for odb, objs in out.get_used_objs(*args, **kwargs).items():
File "/usr/local/lib/python3.7/dist-packages/dvc/output.py", line 918, in get_used_objs
return self.get_used_external(**kwargs)
File "/usr/local/lib/python3.7/dist-packages/dvc/output.py", line 973, in get_used_external
return dep.get_used_objs(**kwargs)
File "/usr/local/lib/python3.7/dist-packages/dvc/dependency/repo.py", line 94, in get_used_objs
used, _ = self._get_used_and_obj(**kwargs)
File "/usr/local/lib/python3.7/dist-packages/dvc/dependency/repo.py", line 108, in _get_used_and_obj
locked=locked, cache_dir=local_odb.cache_dir
File "/usr/lib/python3.7/contextlib.py", line 112, in __enter__
return next(self.gen)
File "/usr/local/lib/python3.7/dist-packages/dvc/external_repo.py", line 35, in external_repo
path = _cached_clone(url, rev, for_write=for_write)
File "/usr/local/lib/python3.7/dist-packages/dvc/external_repo.py", line 155, in _cached_clone
clone_path, shallow = _clone_default_branch(url, rev, for_write=for_write)
File "/usr/local/lib/python3.7/dist-packages/funcy/decorators.py", line 45, in wrapper
return deco(call, *dargs, **dkwargs)
File "/usr/local/lib/python3.7/dist-packages/funcy/flow.py", line 274, in wrap_with
return call()
File "/usr/local/lib/python3.7/dist-packages/funcy/decorators.py", line 66, in __call__
return self._func(*self._args, **self._kwargs)
File "/usr/local/lib/python3.7/dist-packages/dvc/external_repo.py", line 220, in _clone_default_branch
git = clone(url, clone_path)
File "/usr/local/lib/python3.7/dist-packages/dvc/scm.py", line 106, in clone
raise CloneError(str(exc))
dvc.scm.CloneError: Failed to clone repo 'https://gitlab.com/<my-dataset-repo-B>.git' to '/tmp/tmp2x6y9z0edvc-clone'
------------------------------------------------------------
2022-03-08 08:53:32,161 DEBUG: Analytics is enabled.
2022-03-08 08:53:32,192 DEBUG: Trying to spawn '['daemon', '-q', 'analytics', '/tmp/tmp4x5js0dk']'
2022-03-08 08:53:32,193 DEBUG: Spawned '['daemon', '-q', 'analytics', '/tmp/tmp6x11s0dk']'
By the way, this is how I cloned my Git repository (repo A):
!git config --global user.name "Zharfan"
!git config --global user.email "zharfan@myemail.com"
!git clone https://<MyTokenName>:<MyToken>@link-to-my-repo-A.git
Does anyone know why? Any help would be greatly appreciated. Thank you in advance!
To summarize the discussion in the comment thread:
This is most likely happening because DVC can't access a private repo on GitLab. (The error message is obscure and should be improved.)
In the same way, you would not be able to run:
!git clone https://gitlab.com/org/<private-repo>
It also returns a pretty obscure error:
Cloning into '<private-repo>'...
fatal: could not read Username for 'https://gitlab.com': No such device or address
(I think it's related to how the TTY is set up in Colab?)
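One way to check this explanation before running "dvc pull" is to ask Git for the remote's branches with terminal prompts disabled. This is a minimal sketch, not part of DVC: the helper names (noninteractive_git_env, check_git_access) are hypothetical, and it assumes the git binary is on PATH, as it is on Colab. Setting GIT_TERMINAL_PROMPT=0 makes Git fail immediately instead of trying to ask for a username, so an inaccessible private repo shows up as ok == False with Git's error text alongside it.

```python
import os
import subprocess
from typing import Dict, Optional, Tuple


def noninteractive_git_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with Git's terminal prompt disabled."""
    env = dict(os.environ if base is None else base)  # copy, never mutate the input
    env["GIT_TERMINAL_PROMPT"] = "0"  # fail fast instead of asking for a username
    return env


def check_git_access(url: str, timeout: float = 30.0) -> Tuple[bool, str]:
    """Run `git ls-remote --heads <url>` without prompts; return (ok, stderr)."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", url],
        env=noninteractive_git_env(),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode == 0, result.stderr.strip()
```

For example, calling check_git_access("https://gitlab.com/<my-dataset-repo-B>.git") in the Colab runtime would be expected to return False together with a "could not read Username" message if the runtime has no credentials for repo B, which is the same condition that makes DVC's internal clone fail.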
The best way to solve this is to use SSH, as described here, for example.
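A minimal sketch of the SSH route in a Colab notebook, under several assumptions: you have already generated an ed25519 key pair and registered the public key with GitLab (for example as a deploy key on repo B), the private key file is available to the runtime (the path in the usage note is hypothetical), and ssh-keyscan is installed. The Git setting url.<base>.insteadOf rewrites https://gitlab.com/ URLs to git@gitlab.com: whenever Git runs, so the internal "git clone" that DVC performs for the imported repo (the cmdline shown in the traceback) should go over SSH without editing the URL recorded by "dvc import". The function names are illustrative, not DVC or Git APIs.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List
from urllib.parse import urlparse


def https_to_ssh(url: str) -> str:
    """Turn https://host/group/repo.git into the scp-style git@host:group/repo.git."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.hostname:
        raise ValueError(f"not an https URL: {url!r}")
    return f"git@{parsed.hostname}:{parsed.path.lstrip('/')}"


def insteadof_args(host: str = "gitlab.com") -> List[str]:
    """Arguments for a global git config rule that sends https://<host>/ over SSH."""
    return ["git", "config", "--global",
            f"url.git@{host}:.insteadOf", f"https://{host}/"]


def setup_gitlab_ssh(private_key_src: str, host: str = "gitlab.com") -> None:
    """Install a private key, trust the host key, and add the insteadOf rule."""
    ssh_dir = Path.home() / ".ssh"
    ssh_dir.mkdir(mode=0o700, parents=True, exist_ok=True)

    # ssh refuses private keys that other users can read, hence 0600.
    key_path = ssh_dir / "id_ed25519"
    shutil.copyfile(private_key_src, key_path)
    key_path.chmod(0o600)

    # Record the host key up front: there is no TTY in Colab to answer
    # the "Are you sure you want to continue connecting?" question.
    scan = subprocess.run(["ssh-keyscan", host], check=True,
                          capture_output=True, text=True)
    with open(ssh_dir / "known_hosts", "a") as fh:
        fh.write(scan.stdout)

    # Rewrite https://<host>/... to git@<host>:... for every git command,
    # including the clone DVC runs for the imported repository.
    subprocess.run(insteadof_args(host), check=True)
```

Usage would look like setup_gitlab_ssh("/content/drive/MyDrive/keys/id_ed25519") (hypothetical path) followed by "!dvc pull". With the rule in place, a URL such as https://gitlab.com/org/data.git is effectively cloned as https_to_ssh("https://gitlab.com/org/data.git"), i.e. git@gitlab.com:org/data.git. Appending raw ssh-keyscan output trusts whatever key the host presents on first contact; comparing it with GitLab's published SSH fingerprints is safer.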