How can I load a config file from an egg file?

I'm trying to run Python code that is packaged as an egg file on Airflow. The code tries to load a config.json file, and that step fails on Airflow. I guess the issue is that it tries to read the file from inside the egg file, and since the egg is zipped, it can't find it. I updated setup.py as follows to make sure the config file is in the package:
from setuptools import find_packages, setup

setup(
    name='tv_quality_assurance',
    packages=find_packages(),
    version='0.1.0',
    description='Quality checks on IPTV linear viewing data',
    author='Sarah Berenji',
    data_files=[('src/codes', ['src/codes/config.json'])],
    include_package_data=True,
    license='',
)
Now it complains that config_file_path is not a directory:
NotADirectoryError: [Errno 20] Not a directory: '/opt/artifacts/project-0.1.0.dev8-py3.6.egg/src/codes/config.json'
I checked the path, and the JSON file is there. Here is my code, with some print statements added for debugging, which shows that it sees config_file_path as neither a file nor a directory:
import json
import os

dir_path = os.path.dirname(__file__)
config_file_path = dir_path + '/config.json'
print(f"config_file_path = {config_file_path}")
print(f"relpath(config_file_path) = {os.path.relpath(config_file_path)}")
if not os.path.isfile(config_file_path):
    print(f"{config_file_path} is not a file")
if not os.path.isdir(config_file_path):
    print(f"{config_file_path} is not a dir")
with open(config_file_path) as json_file:
    config = json.load(json_file)
It prints the following output:
config_file_path = /opt/artifacts/project-0.1.0.dev8-py3.6.egg/src/codes/config.json
relpath(config_file_path) = ../../artifacts/project-0.1.0.dev8-py3.6.egg/src/codes/config.json
/opt/artifacts/project-0.1.0.dev8-py3.6.egg/src/codes/config.json is not a file
/opt/artifacts/project-0.1.0.dev8-py3.6.egg/src/codes/config.json is not a dir
Traceback (most recent call last):
  File "/opt/test_AF1.10.2_py2/dags/py_spark_entry_point.py", line 8, in <module>
    execute(spark)
  File "/opt/artifacts/project-0.1.0.dev8-py3.6.egg/src/entry_point.py", line 26, in execute
  File "/opt/artifacts/project-0.1.0.dev8-py3.6.egg/src/codes/data_methods.py", line 32, in load_config_file
NotADirectoryError: [Errno 20] Not a directory: '/opt/artifacts/project-0.1.0.dev8-py3.6.egg/src/codes/config.json'
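The guess above (the file sits inside a zipped egg, so it is not a real path on disk) is consistent with this output. As a minimal sketch, assuming a throwaway archive named demo.egg that only stands in for the real egg, the following reproduces the same symptoms: the inner path is neither a file nor a directory, open() fails, and only a zip-aware reader can get at the contents. The archive name and the "threshold" key are made up for illustration.

```python
import json
import os
import tempfile
import zipfile

# Build a throwaway zip archive that stands in for the egg (hypothetical names).
tmp_dir = tempfile.mkdtemp()
egg_path = os.path.join(tmp_dir, "demo.egg")
with zipfile.ZipFile(egg_path, "w") as archive:
    archive.writestr("src/codes/config.json", json.dumps({"threshold": 5}))

# A path that points "inside" the archive is not a real filesystem path.
inner_path = os.path.join(egg_path, "src", "codes", "config.json")
path_is_file = os.path.isfile(inner_path)
path_is_dir = os.path.isdir(inner_path)

try:
    with open(inner_path) as handle:
        handle.read()
    open_error = None
except OSError as exc:
    # NotADirectoryError on Linux; the exact subclass may differ on other platforms.
    open_error = exc

# Reading through the zip API works, because it understands the archive format.
with zipfile.ZipFile(egg_path) as archive:
    config = json.loads(archive.read("src/codes/config.json"))

print(path_is_file, path_is_dir, type(open_error).__name__, config)
```

The operating system treats the .egg as an ordinary file, so any path component after it cannot be resolved, which would explain the "Not a directory" message.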
Next, I tried importlib_resources, but ended up with a strange error: the module is reported as not installed, even though the log shows that pip installed it successfully: ModuleNotFoundError: No module named 'importlib_resources'
import importlib_resources

config_file = importlib_resources.files("src.codes") / "config.json"
with open(config_file) as json_file:
    config = json.load(json_file)
I finally managed to do it using pkg_resources:
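import json
import pkg_resources

# resource_stream returns a file-like object, even when the package is zipped
config_file = pkg_resources.resource_stream('src.codes', 'config.json')
config = json.load(config_file)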
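For comparison, here is a sketch of the same idea using only the standard library, which may help when installing an extra package on the Airflow workers is a problem. It relies on pkgutil.get_data, which asks the package's loader for the bytes instead of opening a filesystem path. The package name demo_pkg, the archive demo.egg, and the "threshold" key are hypothetical stand-ins built on the fly so the example is self-contained; in the real project the call would presumably use 'src.codes' and 'config.json'.

```python
import json
import os
import pkgutil
import sys
import tempfile
import zipfile

# Build a small zipped package to stand in for the egg; "demo_pkg" is a hypothetical name.
tmp_dir = tempfile.mkdtemp()
egg_path = os.path.join(tmp_dir, "demo.egg")
with zipfile.ZipFile(egg_path, "w") as archive:
    archive.writestr("demo_pkg/__init__.py", "")
    archive.writestr("demo_pkg/config.json", json.dumps({"threshold": 5}))

# Putting the archive on sys.path lets zipimport load the package from it.
sys.path.insert(0, egg_path)

# pkgutil.get_data goes through the package's loader, so it works whether the
# package lives in a plain directory or inside a zip archive.
raw = pkgutil.get_data("demo_pkg", "config.json")
config = json.loads(raw.decode("utf-8"))
print(config)
```

Like pkg_resources.resource_stream, this never builds a path into the archive, so the NotADirectoryError shown earlier should not come up.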