Dataset 1 and dataset 2 have the same column names but different descriptions. When I run the transformation for dataset 1, I want the descriptions specific to dataset 1 to take preference; when I run the transformation for another dataset, I want that dataset's descriptions to take preference. Is there a way to populate column descriptions that are dataset-specific?
For example, is there a way to pass, in the arguments of my_compute_function, the name of the dataset that should be given priority?
Column1, Column Description for dataset 1, {Dataset 1 name}
Column1, Column Description for dataset 2, {Dataset 2 name}
...
from transforms.api import transform, Input, Output

@transform(
    my_output=Output("/my/output"),
    my_input=Input("/my/input"),
)
def my_compute_function(my_input, my_output):
    my_output.write_dataframe(
        my_input.dataframe(),
        column_descriptions={
            "col_1": "col 1 description"
        },
        ???
    )
One way to do this is to provide an override dictionary for all your datasets, in which dataset-specific descriptions take precedence over the general ones.
For example:
from transforms.api import transform, Input, Output

GENERAL_DESCRIPTIONS = {
    "col_1": "my general description"
}

LOCAL_DESCRIPTIONS = {
    "/path/to/my/dataset": {
        "col_1": "my override description"
    }
}

@transform(
    my_output=Output("/path/to/my/dataset"),
    my_input=Input("/path/to/input"),
)
def my_compute_function(my_output, my_input):
    local_updates = LOCAL_DESCRIPTIONS.get(my_output.path, {})
    local_descriptions = GENERAL_DESCRIPTIONS.copy()
    local_descriptions.update(local_updates)
    my_output.write_dataframe(
        my_input.dataframe(),
        column_descriptions=local_descriptions
    )
This lets you put GENERAL_DESCRIPTIONS at the root of your module and declare the 'local' overrides at the top of each transformation .py file. You could even define the 'local' descriptions once above a group of transformations, so you don't have to open each and every file to specify overrides.
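The precedence rule can also be checked outside Foundry as plain Python. This is only a minimal sketch: the helper name resolve_descriptions, the extra col_2 entry, and the dataset paths are illustrative assumptions, not part of the transforms API.

```python
GENERAL_DESCRIPTIONS = {
    "col_1": "my general description",
    "col_2": "shared description",  # hypothetical second column, for illustration
}

LOCAL_DESCRIPTIONS = {
    "/path/to/my/dataset": {
        "col_1": "my override description",
    },
}


def resolve_descriptions(dataset_path,
                         general=GENERAL_DESCRIPTIONS,
                         local=LOCAL_DESCRIPTIONS):
    """Return the general descriptions with overrides for dataset_path applied.

    resolve_descriptions is a hypothetical helper name, not a Foundry function.
    """
    merged = dict(general)  # copy, so the shared dictionary is never mutated
    merged.update(local.get(dataset_path, {}))  # dataset-specific entries win
    return merged
```

Inside a transform, the result of resolve_descriptions(my_output.path) could then be passed as column_descriptions. Datasets without an entry in LOCAL_DESCRIPTIONS simply get the general descriptions.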
The most granular way to update the description dictionary is to define the overrides right next to the transformation and merge them into a copy of the general dictionary:
...
GENERAL_DESCRIPTIONS = {
    "col_1": "my general description"
}

LOCAL_DESCRIPTIONS = {
    "col_1": "my override description"
}
...
def my_compute_function(my_output, my_input):
    local_descriptions = GENERAL_DESCRIPTIONS.copy()
    local_descriptions.update(LOCAL_DESCRIPTIONS)
    ...
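If the descriptions are kept in the row form shown in the question (column, description, dataset name), they could be split into a general dictionary and a per-dataset override dictionary before merging. A rough sketch, assuming rows arrive as tuples and that a dataset value of None marks a general description; the function name split_rows, the None convention, and the paths are all hypothetical.

```python
# Hypothetical rows in the (column, description, dataset) shape from the question.
# Assumption: a dataset of None marks a description that applies to every dataset.
ROWS = [
    ("col_1", "my general description", None),
    ("col_1", "description for dataset 1", "/path/to/dataset_1"),
    ("col_1", "description for dataset 2", "/path/to/dataset_2"),
]


def split_rows(rows):
    """Split rows into (general, local) description dictionaries."""
    general, local = {}, {}
    for column, description, dataset in rows:
        if dataset is None:
            general[column] = description
        else:
            # Group dataset-specific descriptions under their dataset path.
            local.setdefault(dataset, {})[column] = description
    return general, local


general, local = split_rows(ROWS)
```

The two resulting dictionaries have the same shape as GENERAL_DESCRIPTIONS and the path-keyed LOCAL_DESCRIPTIONS, so the copy-then-update merge applies unchanged.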