I work on a project that uses DVC (Data Version Control). Let's say I make a lot of local commits, something like this:
# make changes for experiment 1
dvc add my_data_file
git add my_data_file.dvc
git commit -m "Experiment 1"
# make changes for experiment 2
# which change both code and data
dvc add my_data_file
git add my_data_file.dvc
git commit -m "Experiment 2"
# make changes for experiment 3
# which change both code and data
dvc add my_data_file
git add my_data_file.dvc
git commit -m "Experiment 3"
# Finally I'm done
# push changes:
dvc push
git push
However, there is one problem: dvc push will only push the data from experiment 3. Is there any way to push the data from all local commits (i.e. starting from the first commit that diverged from the remote branch)?
Currently I see two options:
dvc push -T
git checkout commit-hash && dvc push
for all local commits not yet pushed to the remote.
Both of these options seem cumbersome and error-prone. Is there a better way to do it?
@NShiny, there is a related ticket:
support push/pull/metrics/gc, etc across different commits.
Please give it a vote so that we know how to prioritize it.
As a workaround, I would recommend running dvc install. It installs a pre-push Git hook that runs dvc push automatically:
Git pre-push hook executes dvc push before git push to upload files and directories under DVC control to remote.
It does mean, though, that you need to run git push after every git commit :(
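If pushing after every commit is not practical, the second option from the question (checking out each unpushed commit and running dvc push) can be scripted so it is less error-prone. The following is only a rough sketch, not an official DVC feature: the function name push_unpushed_dvc_data is made up for illustration, and it assumes a clean working tree, a current branch that tracks a remote branch, and that the data for every commit is still in the local DVC cache.

```shell
#!/bin/sh
# Sketch: run `dvc push` once for every local commit that is not yet on the
# upstream branch. `push_unpushed_dvc_data` is a hypothetical helper name.
# Assumes a clean working tree and a branch with an upstream configured.

push_unpushed_dvc_data() {
    # Remember the starting branch (or commit, if HEAD is already detached).
    start=$(git symbolic-ref --quiet --short HEAD || git rev-parse HEAD)

    # Commits reachable from HEAD but not from the upstream branch,
    # oldest first. The list is computed once, before any checkout.
    for commit in $(git rev-list --reverse '@{upstream}..HEAD'); do
        git checkout --quiet "$commit" || return 1
        # Push the data referenced by the .dvc files of this commit.
        dvc push || { git checkout --quiet "$start"; return 1; }
    done

    # Go back to where we started.
    git checkout --quiet "$start"
}
```

After sourcing this, running push_unpushed_dvc_data && git push would upload the data for each local commit before the commits themselves are pushed.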