I am writing a Python package that has to use external resources. The user can choose to use their own version of the resources, or simply stick to the default one embedded in the package. I would like to handle the package resources in the same way as the externally supplied resources, which I can access through the filesystem. Is there a standard way to do this in Python?
More precisely, the organization of my project is roughly as follows:
package/
├── __init__.py
├── src.py
└── resources
    ├── __init__.py
    └── lib
        ├── dir1
        │   ├── dir1
        │   ├── file1
        │   └── ...
        └── dir2
            ├── file1
            └── ...
The main embedded resource is lib, which is a directory containing an arbitrary number of nested directories and files. The user can invoke a script using either script (which should use package/resources/lib) or script ./path/to/resource (which should use the directory ./path/to/resource).
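The two invocation modes could be wired up roughly as follows. This is a minimal sketch assuming Python 3.9+ and the standard-library importlib.resources module, which the question itself does not use; build_parser, resolve_resource_dir, DEFAULT_PACKAGE and DEFAULT_NAME are hypothetical names chosen for illustration.

```python
import argparse
from importlib.resources import files
from pathlib import Path
from typing import Optional

# Hypothetical defaults mirroring the layout above (package/resources/lib).
DEFAULT_PACKAGE = "package.resources"
DEFAULT_NAME = "lib"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="script")
    # nargs="?" makes the positional argument optional: it is None when omitted.
    parser.add_argument(
        "resource",
        nargs="?",
        default=None,
        help="resource directory to use instead of the packaged default",
    )
    return parser


def resolve_resource_dir(
    user_path: Optional[str] = None,
    package: str = DEFAULT_PACKAGE,
    name: str = DEFAULT_NAME,
):
    """Return the user-supplied directory, or else the packaged default.

    The default is whatever importlib.resources.files() returns: a plain
    pathlib.Path for an ordinary install, but possibly another Traversable
    (e.g. for a zipped package), so callers should stick to the methods
    both kinds of object share.
    """
    if user_path is not None:
        return Path(user_path)
    return files(package).joinpath(name)
```

A script would then call resolve_resource_dir(build_parser().parse_args().resource) and treat the result the same way in both cases.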
The issue is that I rely heavily on the directory structure of the resources in order to parse them entirely. In particular, I currently handle the files in a resource directory using pathlib.Path.glob. Although embedded resource files can be read with pkg_resources.resource_stream, for example, I have not found a way to handle resource directories and regular directories uniformly.
Is there an API that allows this? The main feature I am looking for is the ability to list all the files under a directory, whether it is an embedded resource or a regular directory on the filesystem.
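For illustration, a recursive listing can be written against only iterdir(), is_dir(), is_file() and name. This is a sketch assuming Python 3.9+, where both pathlib.Path and the Traversable objects returned by the standard-library importlib.resources.files() provide those methods; iter_files and packaged_files are hypothetical helper names, not part of any library.

```python
from importlib.resources import files
from typing import Iterator


def iter_files(root, prefix: str = "") -> Iterator[str]:
    """Yield the relative POSIX path of every file below *root*, recursively.

    *root* only needs iterdir(), is_dir(), is_file() and name, so it can be a
    pathlib.Path (a user-supplied directory) or a Traversable returned by
    importlib.resources.files() (packaged resources).
    """
    # Sort by name so the output order does not depend on the filesystem.
    for entry in sorted(root.iterdir(), key=lambda e: e.name):
        rel = prefix + entry.name
        if entry.is_dir():
            yield from iter_files(entry, prefix=rel + "/")
        elif entry.is_file():
            yield rel


def packaged_files(package: str, *parts: str) -> Iterator[str]:
    """List files under a packaged directory, e.g. ("package.resources", "lib")."""
    root = files(package)
    for part in parts:
        root = root.joinpath(part)
    return iter_files(root)
```

With this, iter_files(Path("./path/to/resource")) and packaged_files("package.resources", "lib") yield paths in the same format, so the parsing code does not need to know where the tree came from.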
Since packaged resources may be compressed, I think I should use something other than pathlib, such as a "Directory" class that works with regular directories as well as compressed resource directories. Another possibility would be to extract the resources to a regular directory before using them, but that seems to go against the principle of the resource system.
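As a point of comparison for the "Directory" idea, the standard library's zipfile.Path (Python 3.8+) already exposes a pathlib-like iterdir()/is_dir()/name interface over a compressed archive without extracting it. As far as I know, recent CPython versions also use zipfile.Path behind importlib.resources.files() when a package is imported from a zip archive, but treat that as an assumption. The sketch below builds a small in-memory archive mimicking the lib/ layout; build_demo_zip and list_dir are hypothetical helpers.

```python
import io
import zipfile
from typing import List, Tuple


def build_demo_zip() -> zipfile.ZipFile:
    """Build an in-memory, compressed archive mirroring the lib/ layout (illustrative)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in ("lib/dir1/dir1/file1", "lib/dir1/file1", "lib/dir2/file1"):
            zf.writestr(name, "contents of " + name)
    buf.seek(0)
    return zipfile.ZipFile(buf)  # reopen read-only over the same buffer


def list_dir(directory) -> List[Tuple[str, bool]]:
    """Return sorted (name, is_dir) pairs for the direct children of *directory*.

    Works unchanged for pathlib.Path and zipfile.Path, which share these methods.
    """
    return sorted((entry.name, entry.is_dir()) for entry in directory.iterdir())
```

For example, list_dir(zipfile.Path(build_demo_zip(), "lib/")) lists dir1 and dir2 as directories, even though the archive only stores file entries.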
The pkg_resources package does exactly this. As described in the Resource Extraction section of its documentation, resource_filename(package_or_requirement, resource_name) returns a true filesystem path for a resource. If the resource is inside an archive (such as a zipped egg), it is extracted to a cache directory and the cached path is returned; if the resource is a directory, everything inside it, including subdirectories, is extracted as well.
Thus, listing the files in the package/resources/lib directory can be done, for example, with:
import pkg_resources
from pathlib import Path
path = pkg_resources.resource_filename("package.resources", "lib")
for file in Path(path).glob("*"):  # direct children only; "**/*" recurses
    print(file)
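Note that pkg_resources is deprecated in recent setuptools releases in favor of the standard-library importlib.resources. A rough counterpart of resource_filename, sketched under the assumption of Python 3.9+, is importlib.resources.as_file(), which yields a real filesystem path and extracts the resource to a temporary location only when needed. Unlike resource_filename, the extracted copy is removed when the context exits instead of being kept in a cache, and as far as I know as_file() only accepts directories from Python 3.12 onward. real_path is a hypothetical helper name.

```python
from contextlib import ExitStack
from importlib.resources import as_file, files
from pathlib import Path


def real_path(stack: ExitStack, package: str, *parts: str) -> Path:
    """Return a real filesystem path for a packaged resource.

    If the resource already lives on disk, its own path is returned;
    otherwise (e.g. a zipped package) it is extracted to a temporary
    location that is cleaned up when *stack* is closed.
    """
    resource = files(package)
    for part in parts:
        resource = resource.joinpath(part)
    return stack.enter_context(as_file(resource))
```

Usage would look like `with ExitStack() as stack: root = real_path(stack, "package.resources", "lib")`, after which root.glob("**/*") behaves as in the snippet above for as long as the stack stays open.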