dvc

How do I add outputs that are already tracked by DVC to a DVC stage?


In my project, I already have some files tracked by DVC that I added with dvc add. Now I want to create stages that use these files as outputs and dependencies, but when I try to create a stage I get the error ERROR: output '[FILE NAME]' is already specified in stages.

I assume that dvc add adds the files to the dependency graph as outputs, so including them in a stage creates a conflict, but I couldn't find anything in the official documentation confirming this. So now I am confused about how to add outputs to a stage if these outputs are already tracked by DVC.

Here is an example of the error I get when creating a stage:

>>> dvc stage add -n train -d data/data.csv -o models/model python train.py

ERROR: output 'models/model' is already specified in stages:
        - models/model.dvc
        - train

In this example, the file data/data.csv and the directory models/model have already been added to DVC but do not belong to any stage. They are, however, present in the dependency graph.

So how do I include these files in a DVC stage? Is there a way to do it without having to remove the files from DVC and then add them again through a stage?


Solution

  • DVC stage outputs are tracked by DVC automatically, so you don't need to run dvc add on them. If you have already done so, you can safely untrack them with dvc remove first:

    Note that the actual output files or directories of the stage (outs field) are not removed by this command, unless the --outs option is used.

    One thing to note: when you create a stage and run it, DVC removes the stage's outputs before the run (unless a persistence flag is specified). This is done for reproducibility: the stage is expected to produce its outputs every time it runs.
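
    Applied to the example from the question, the sequence could look roughly like the sketch below. It assumes that models/model.dvc (the file named in the error message) is the only thing currently tracking models/model. The run helper and the DRY_RUN variable are hypothetical additions for illustration, not part of DVC: by default they only print each command so it can be reviewed before anything is changed.

```shell
# Hypothetical helper (not part of DVC): prints each command by default.
# Set DRY_RUN=0 to actually execute the commands.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Stop tracking the output that was added with `dvc add`.
#    Without --outs, the files in models/model stay on disk.
run dvc remove models/model.dvc

# 2. Create the stage. data/data.csv can remain tracked through `dvc add`,
#    because here it is only a dependency (-d), not an output.
run dvc stage add -n train -d data/data.csv -o models/model python train.py
```

    If the contents of models/model should be kept between runs rather than deleted before each one, dvc stage add also accepts --outs-persist in place of -o for that output.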