In my project, I already have some files tracked by DVC that I added with dvc add. Now I want to create stages that use these files as outputs and dependencies, but when I try to create a stage I get an error that says ERROR: output '[FILE NAME]' is already specified in stages.
I assume that dvc add adds the files to the dependency graph as outputs, so including them in a stage creates a conflict, but I couldn't find anything in the official documentation confirming this. So now I am confused about how to add outputs to a stage if these outputs are already tracked by DVC.
Here is an example of the error I get when creating a stage:
>>> dvc stage add -n train -d data/data.csv -o models/model python train.py
ERROR: output 'models/model' is already specified in stages:
- models/model.dvc
- train
In this example, the file data/data.csv and the directory models/model have already been added to DVC but do not belong to any stage; however, they are present in the dependency graph.
So how do I include these files in a DVC stage? Is there a way to do it without having to remove the files from DVC and then add them again through a stage?
DVC stage outputs are automatically tracked by DVC, so you don't need to run dvc add on them. If you have already done so, you can safely un-track the output with dvc remove first:
Note that the actual output files or directories of the stage (outs field) are not removed by this command, unless the --outs option is used.
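Applied to the example from the question, the sequence could look roughly like the sketch below. The function name retrack_as_stage_output and its argument order are made up for illustration and are not part of DVC; it also assumes the output was added with a plain dvc add, so that its tracking file sits next to it as models/model.dvc (as the error message suggests). The dependency data/data.csv needs no change, since only the output appears in the conflict.

```shell
#!/bin/sh
# Hypothetical helper (not part of DVC): stop tracking an output that was
# added with `dvc add`, then declare it as the output of a stage instead.
retrack_as_stage_output() {
    out_path=$1    # e.g. models/model
    stage_name=$2  # e.g. train
    dep_path=$3    # e.g. data/data.csv
    shift 3        # the remaining arguments are the stage command

    # Removes the .dvc file only; the data itself stays in the workspace
    # because --outs is not passed.
    dvc remove "${out_path}.dvc" || return 1

    # The path is no longer claimed by a .dvc file, so a stage can own it.
    dvc stage add -n "$stage_name" -d "$dep_path" -o "$out_path" "$@"
}
```

With that defined, the call matching the question would be retrack_as_stage_output models/model train data/data.csv python train.py, which is equivalent to running dvc remove models/model.dvc followed by the original dvc stage add command.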
One more thing to note: when you create a stage and run it, DVC removes the stage's outputs before running the command (unless the output is marked as persistent, e.g. with --outs-persist). This is done for reproducibility: your stage is expected to produce its outputs every time it runs.