dvc

DVC *.dvc files not created even though data files are in the outs section of a stage within the dvc.yaml file


I am a little confused as to how DVC keeps track of changes within datasets. If I execute "dvc add ./data/a.csv", DVC adds ./data/a.csv to ./data/.gitignore and creates a ./data/a.csv.dvc file. On the other hand, if I have something like this in a dvc.yaml file:

stages:
  gen-ref-arts:
    cmd: make gen-ref-arts
    outs:
      - ./data/b.csv

When I execute "dvc repro", DVC adds ./data/b.csv to ./data/.gitignore; however, it does not create a b.csv.dvc file.

In the documentation of DVC (https://dvc.org/doc/start/data-pipelines/data-pipelines) I can read: "DVC uses the pipeline definition to automatically track the data used and produced by any stage, so there's no need to manually run dvc add for data/prepared!"

Why does it not generate a ./data/b.csv.dvc file? Is this normal? If so why?


Solution

  • Yes, this is normal. Pipeline outputs are tracked in the dvc.lock file rather than in individual .dvc files. dvc.lock has a structure similar to that of .dvc files, but it combines the information for all stages in one place, which keeps complex pipelines simpler to manage.

    See the DVC documentation on the dvc.lock file for more details.
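
To make this concrete, here is a rough sketch of what dvc.lock might contain after "dvc repro" runs the stage from the question. The hash and size values are placeholders rather than real output, and the exact set of fields (for example the schema version, or an extra "hash" key in newer releases) depends on your DVC version, so treat this as an illustration and not as the definitive format.

```yaml
# dvc.lock (generated by "dvc repro"; not meant to be edited by hand)
schema: '2.0'
stages:
  gen-ref-arts:                 # same stage name as in dvc.yaml
    cmd: make gen-ref-arts
    outs:
    - path: data/b.csv          # the output that has no b.csv.dvc file
      md5: 0123456789abcdef0123456789abcdef   # placeholder, not a real hash
      size: 1234                # placeholder size in bytes
```

The "outs" entry plays the same role as the entry in a.csv.dvc: it records the content hash that DVC uses to find the file in its cache. Committing dvc.yaml and dvc.lock to Git is therefore what versions b.csv, just as committing a.csv.dvc versions a.csv.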