I have a remote repository that I want to use with DVC. I want to access my files through DVC in Python using the dvc.api module. Here's the code I'm using:
import dvc.api
path = 'data/test.csv'
repo = 's3://xxx/DVC_test/'
version = 'v1'
data_url = dvc.api.get_url(path=path, repo=repo, rev=version)
However, I'm encountering the following error:
Cloning | |0.00/? [00:00, ?obj
Cloning | |0.00/? [00:00,
Traceback (most recent call last):
  File "<input>", line 1, in <module>
    data_url = dvc.api.get_url(path=path, repo=repo, rev=version)
  File "/home/asokolov/Documents/BG/DVC_pipeline/dvc_test_venv/lib/python3.9/site-packages/dvc/api/data.py", line 21, in get_url
    with Repo.open(repo, rev=rev, subrepos=True, uninitialized=True) as _repo:
  File "/usr/local/lib/python3.9/contextlib.py", line 119, in __enter__
    return next(self.gen)
  File "/home/asokolov/Documents/BG/DVC_pipeline/dvc_test_venv/lib/python3.9/site-packages/dvc/external_repo.py", line 45, in external_repo
    path = _cached_clone(url, rev, for_write=for_write)
  File "/home/asokolov/Documents/BG/DVC_pipeline/dvc_test_venv/lib/python3.9/site-packages/dvc/external_repo.py", line 173, in _cached_clone
    clone_path, shallow = _clone_default_branch(url, rev, for_write=for_write)
  File "/home/asokolov/Documents/BG/DVC_pipeline/dvc_test_venv/lib/python3.9/site-packages/funcy/decorators.py", line 45, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "/home/asokolov/Documents/BG/DVC_pipeline/dvc_test_venv/lib/python3.9/site-packages/funcy/flow.py", line 274, in wrap_with
    return call()
  File "/home/asokolov/Documents/BG/DVC_pipeline/dvc_test_venv/lib/python3.9/site-packages/funcy/decorators.py", line 66, in __call__
    return self._func(*self._args, **self._kwargs)
  File "/home/asokolov/Documents/BG/DVC_pipeline/dvc_test_venv/lib/python3.9/site-packages/dvc/external_repo.py", line 241, in _clone_default_branch
    git = clone(url, clone_path)
  File "/home/asokolov/Documents/BG/DVC_pipeline/dvc_test_venv/lib/python3.9/site-packages/dvc/scm.py", line 165, in clone
    raise CloneError("SCM error") from exc
dvc.scm.CloneError: SCM error
At the same time, running dvc pull works without errors.
Here's my dvc doctor output:
dvc doctor
DVC version: 2.47.0 (pip)
-------------------------
Platform: Python 3.9.16 on Linux-5.19.0-31-generic-x86_64-with-glibc2.36
Subprojects:
dvc_data = 0.42.1
dvc_objects = 0.21.1
dvc_render = 0.2.0
dvc_task = 0.2.0
scmrepo = 0.1.15
Supports:
http (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
https (aiohttp = 3.8.4, aiohttp-retry = 2.8.3),
s3 (s3fs = 2023.3.0, boto3 = 1.24.59)
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/nvme0n1p2
Caches: local
Remotes: s3
Workspace directory: ext4 on /dev/nvme0n1p2
Repo: dvc, git
And my .dvc/config.local:
['remote "dvc-remote"']
url = s3://xxx/DVC_test/
access_key_id = xxx
secret_access_key = xxx
region = xxx
Could you please suggest a solution to resolve the issue?
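I believe you are slightly misusing the Python API; see https://dvc.org/doc/api-reference/get_url
The repo argument has to point at the DVC project itself, i.e. the Git repository (a Git URL or a local path), not at the S3 storage. DVC tries to clone whatever you pass as repo, which is why an s3:// URL ends in CloneError. The storage is selected by its remote name through the remote argument instead.
It looks like you would want something like this: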
import dvc.api
path = "data/test.csv"
remote_name = "dvc-remote"
repo = "https://github.com/username/repo.git"
version = "v1"
url = dvc.api.get_url(
    path=path,
    remote=remote_name,
    repo=repo,
    rev=version,
)
print(url)
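
To catch this mistake earlier, here is a minimal sketch of a guard that rejects storage URLs before DVC attempts the clone. The names STORAGE_ONLY_SCHEMES, is_storage_url and safe_get_url are hypothetical helpers, not part of DVC, and the scheme list is an assumption about which URL schemes can only be storage and never a Git repository. Only dvc.api.get_url itself is the real API call.

```python
from urllib.parse import urlsplit

# Assumption: URL schemes that can only be DVC storage, never a Git repository.
STORAGE_ONLY_SCHEMES = {"s3", "gs", "azure", "oss", "hdfs", "webhdfs"}


def is_storage_url(repo):
    """Return True if `repo` looks like a storage URL rather than a Git repo."""
    return urlsplit(repo).scheme.lower() in STORAGE_ONLY_SCHEMES


def safe_get_url(path, repo=None, rev=None, remote=None):
    """Hypothetical wrapper: validate `repo`, then delegate to dvc.api.get_url."""
    if repo is not None and is_storage_url(repo):
        raise ValueError(
            f"repo={repo!r} looks like a storage URL; pass the Git repository "
            "(URL or local path) as repo and the remote name as remote"
        )
    # Imported lazily so the check above works even where DVC is not installed.
    import dvc.api

    return dvc.api.get_url(path=path, repo=repo, rev=rev, remote=remote)
```

Since dvc pull already works in your working copy, calling safe_get_url("data/test.csv", rev="v1", remote="dvc-remote") from inside that directory (repo left as None, so the current project is used) may be enough, assuming v1 is a tag or branch in that Git repository.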