Use case: id-10T proofing data removal with a zero-trust command.
I am looking through the documentation and I don't see clear-cut guidelines for what can go into DVC as a file name.
Right now, I know that DVC applies some filename filtering. I cannot, for example, add a file with a newline in its name:
$: touch 'foo
bar.txt'
$: dvc add foo$'\n'bar.txt
Adding...
ERROR: output 'foobar.txt' does not exist
Can someone point me to the documentation that explains exactly what is allowed to go into the yaml file as a path?
There is no documentation on allowed filenames in DVC, but the issue is that DVC currently uses urllib.parse.urlsplit and urllib.parse.urlunsplit when normalizing path names, and urlsplit removes the newline because it's not a valid path character in RFC-compliant URLs. DVC needs to support both local paths and remote URL paths like s3://bucket/object/path, so it currently treats everything as a URL.
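You can see the same thing outside DVC by round-tripping the filename through the standard library directly. This is a standalone sketch, not DVC's own code. Note that urlsplit's stripping of tab, carriage return and newline characters arrived as a security fix in newer Python releases (around 3.9.5 and the matching backports), so much older interpreters may behave differently.

```python
from urllib.parse import urlsplit, urlunsplit

# A local filename containing a newline, as in the dvc add example above.
name = "foo\nbar.txt"

# urlsplit treats the string as a URL; with no scheme, everything lands in .path,
# and the newline is stripped before splitting even happens.
parts = urlsplit(name)
print(repr(parts.path))

# Reassembling gives back a different name than the one on disk.
# On current Python this is 'foobar.txt', the name in DVC's error message.
roundtrip = urlunsplit(parts)
print(repr(roundtrip))

# Tabs and carriage returns are removed the same way.
print(repr(urlunsplit(urlsplit("foo\tbar\r.txt"))))
```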
The intended behavior is that DVC should support any character that is valid on your local filesystem, so it seems pretty clear that this is a bug: DVC should account for characters that are invalid in URLs but valid in local filesystem paths. I've opened an issue you can follow for further updates: https://github.com/iterative/dvc-objects/issues/177
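One way to avoid this is to run a string through the URL parser only when it carries an explicit scheme such as s3://, and otherwise treat it as a local path. The sketch below is purely illustrative: it is not the fix that DVC or dvc-objects adopted, and the helper normalize_path and its scheme-detection rule are hypothetical.

```python
import os
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical rule: a string is a URL only if it starts with "scheme://".
# Requiring "://" keeps Windows drive paths like "C:\data" out of the URL branch.
_URL_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def normalize_path(path: str) -> str:
    """Hypothetical helper: URL rules for remote locations, filesystem rules otherwise."""
    if _URL_SCHEME.match(path):
        # Remote location such as s3://bucket/object/path.
        return urlunsplit(urlsplit(path))
    # Local path: only os.path.normpath cleanup (separators, ".", "..");
    # characters such as "\n" pass through untouched.
    return os.path.normpath(path)


print(repr(normalize_path("foo\nbar.txt")))
print(normalize_path("s3://bucket/object/path"))
print(normalize_path("data/./raw//file.csv"))
```

The trade-off is that a relative local path whose first component looks like a scheme (a directory literally named "s3:") would be misread as a URL, so a real implementation would likely need an explicit way to say which kind of path it was given.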