
Is there a way to populate dataset-specific column descriptions?


Dataset 1 and dataset 2 have the same column names but different descriptions. In the transformation that builds dataset 1, I want the dataset 1 descriptions to take priority. In the transformation for another dataset, I want that dataset's descriptions to take priority instead. Is there a way to populate column descriptions that are dataset-specific?

For example, is there a way to pass my_compute_function the name of the dataset whose descriptions should take priority, with the descriptions maintained like this?

Column1, Column Description for dataset 1, {Dataset 1 name}
Column1, Column Description for dataset 2, {Dataset 2 name}
...

from transforms.api import transform, Input, Output


@transform(
    my_output=Output("/my/output"),
    my_input=Input("/my/input"),
)
def my_compute_function(my_input, my_output):
    my_output.write_dataframe(
        my_input.dataframe(),
        column_descriptions={
            "col_1": "col 1 description"
        },
        # ??? how do I say which dataset's descriptions take priority?
    )

Solution

  • One way to do this is to provide an 'override dictionary' for your datasets, in which dataset-specific descriptions take precedence over the general ones.

    For example:

    from transforms.api import transform, Input, Output
    
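    # Defaults shared by every dataset
    GENERAL_DESCRIPTIONS = {
      "col_1": "my general description"
    }
    
    # Overrides keyed by output dataset path
    LOCAL_DESCRIPTIONS = {
      "/path/to/my/dataset": {
        "col_1": "my override description"
      }
    }
    
    @transform(
      my_output=Output("/path/to/my/dataset"),
      my_input=Input("/path/to/input"),
    )
    def my_compute_function(my_output, my_input):
      # Overrides for this output dataset (empty if none are defined)
      local_updates = LOCAL_DESCRIPTIONS.get(my_output.path, {})
      # Copy so the shared GENERAL_DESCRIPTIONS dict is never mutated
      local_descriptions = GENERAL_DESCRIPTIONS.copy()
      # Dataset-specific entries replace general ones for the same column
      local_descriptions.update(local_updates)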
      my_output.write_dataframe(
        my_input.dataframe(),
        column_descriptions=local_descriptions
      )
    

    This lets you define GENERAL_DESCRIPTIONS once at the root of your module and, at the top of each transformation .py file, declare the 'local' overrides for the datasets built there. You could even keep the 'local' descriptions in one place above a group of transformations, so you don't have to inspect every file to find the overrides.

    The most granular option is to skip the path lookup and define the overrides directly in the transformation file:

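    # ... imports as in the previous example
    GENERAL_DESCRIPTIONS = {
      "col_1": "my general description"
    }
    
    # Overrides that apply only to the dataset built in this file
    LOCAL_DESCRIPTIONS = {
      "col_1": "my override description"
    }
    
    # ... @transform decorator as in the previous example
    def my_compute_function(my_output, my_input):
      local_descriptions = GENERAL_DESCRIPTIONS.copy()
      local_descriptions.update(LOCAL_DESCRIPTIONS)
      # ... write_dataframe(..., column_descriptions=local_descriptions) as before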
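If you want to check the precedence logic outside Foundry, it can be pulled into a plain function. This is a minimal sketch, not part of the transforms API: `resolve_descriptions` is a hypothetical helper name, the dataset paths are placeholders, and the optional `columns` filter (which drops descriptions for columns the output doesn't contain) is an assumption about what you want, not something I know Foundry to require.

```python
from typing import Dict, Iterable, Mapping, Optional

# Shared defaults, e.g. defined once in a common module of the repository.
GENERAL_DESCRIPTIONS: Dict[str, str] = {
    "col_1": "my general description",
    "col_2": "another general description",
}

# Per-dataset overrides keyed by output dataset path (placeholder paths).
LOCAL_DESCRIPTIONS: Dict[str, Dict[str, str]] = {
    "/path/to/dataset_1": {"col_1": "col 1 as used in dataset 1"},
    "/path/to/dataset_2": {"col_1": "col 1 as used in dataset 2"},
}


def resolve_descriptions(
    dataset_path: str,
    columns: Optional[Iterable[str]] = None,
    general: Mapping[str, str] = GENERAL_DESCRIPTIONS,
    local: Mapping[str, Mapping[str, str]] = LOCAL_DESCRIPTIONS,
) -> Dict[str, str]:
    """Hypothetical helper: merge general and dataset-specific descriptions."""
    merged = dict(general)  # copy so the shared defaults are never mutated
    merged.update(local.get(dataset_path, {}))  # dataset-specific entries win
    if columns is not None:
        # Optional: keep only descriptions for columns present in the output.
        wanted = set(columns)
        merged = {col: desc for col, desc in merged.items() if col in wanted}
    return merged
```

Inside the transform you could then write `df = my_input.dataframe()` and `my_output.write_dataframe(df, column_descriptions=resolve_descriptions(my_output.path, df.columns))`, assuming `my_output.path` returns the output dataset path as in the answer's example.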
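If you would rather keep the descriptions in the flat "column, description, dataset" form from the question, the same precedence can be derived from that table. Another sketch under assumptions: `DESCRIPTION_ROWS` and `descriptions_for` are hypothetical names, a dataset value of `None` marks the general default, and the dataset tag is whatever identifier you choose to pass in (for example the output path).

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical flat table mirroring the question's layout:
# (column, description, dataset tag or None for the general default).
DESCRIPTION_ROWS: List[Tuple[str, str, Optional[str]]] = [
    ("col_1", "my general description", None),
    ("col_1", "col 1 description for dataset 1", "dataset_1"),
    ("col_1", "col 1 description for dataset 2", "dataset_2"),
    ("col_2", "col 2 description (same everywhere)", None),
]


def descriptions_for(
    dataset: str,
    rows: List[Tuple[str, str, Optional[str]]] = DESCRIPTION_ROWS,
) -> Dict[str, str]:
    """Return {column: description}, preferring rows tagged with `dataset`."""
    general = {col: desc for col, desc, ds in rows if ds is None}
    specific = {col: desc for col, desc, ds in rows if ds == dataset}
    # Later keys win when unpacking, so dataset-specific entries take priority.
    return {**general, **specific}
```

Unpacking the general dictionary first and the dataset-specific one second is what implements "give preference to that dataset": for any column described in both, the later entry replaces the earlier one.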