There are times when an incremental pipeline in Palantir Foundry has to be built as a snapshot. If the data size is large, the resources for the build are increased to reduce run time, and that configuration is then removed after the first snapshot run. Is there a way to set the configuration conditionally? For example, if the pipeline is running in incremental mode, use the default resource allocation, and otherwise use a specified set of resources.
Example: if the pipeline runs as a snapshot transaction, the configuration below has to be applied:
@configure(profile=["NUM_EXECUTORS_8", "EXECUTOR_MEMORY_MEDIUM", "DRIVER_MEMORY_MEDIUM"])
If incremental, then the default one.
The `@configure` and `@incremental` decorators are set during CI execution, while the actual code inside the function annotated with `@transform_df` or `@transform` runs at build time.
This means that you can't programmatically switch between them after CI has passed. What you can do, however, is keep a constant or configuration value in your repo and change it in code whenever you want to switch modes. Please make sure you understand how semantic versioning works before attempting this. For example:
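    from transforms.api import configure, incremental, transform_df, Input, Output

    # Flip this constant (and commit) to switch between incremental and snapshot builds.
    IS_INCREMENTAL = True
    SEMANTIC_VERSION = 1


    def mytransform(input1, input2):
        # Shared logic used by both variants of the transform.
        return input1.join(input2, "foo", "left")


    if IS_INCREMENTAL:
        # Incremental run with the default resource profile.
        @incremental(semantic_version=SEMANTIC_VERSION)
        @transform_df(
            Output("foo"),
            input1=Input("bar"),
            input2=Input("foobar"))
        def compute(input1, input2):
            return mytransform(input1, input2)
    else:
        # Snapshot run with the larger resource profile.
        @configure(profile=["NUM_EXECUTORS_8", "EXECUTOR_MEMORY_MEDIUM", "DRIVER_MEMORY_MEDIUM"])
        @transform_df(
            Output("foo"),
            input1=Input("bar"),
            input2=Input("foobar"))
        def compute(input1, input2):
            return mytransform(input1, input2)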
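If you would rather not write the decorated function twice, one possible variation is to choose the decorator conditionally. The sketch below is only an illustration of that idea in plain Python. `configure` and `incremental` here are stand-ins defined locally so the snippet runs outside Foundry, and `apply_if` and `build_compute` are hypothetical helpers, not part of `transforms.api`. Whether the real Foundry decorators compose cleanly this way is an assumption you should verify in your own repository.

```python
# Stand-in decorators so this sketch runs outside Foundry. In a real repository
# you would import `configure` and `incremental` from transforms.api instead.
SNAPSHOT_PROFILE = ["NUM_EXECUTORS_8", "EXECUTOR_MEMORY_MEDIUM", "DRIVER_MEMORY_MEDIUM"]


def configure(profile):
    def wrap(fn):
        fn.profile = list(profile)  # record the requested resource profile
        return fn
    return wrap


def incremental(semantic_version=1):
    def wrap(fn):
        fn.incremental = True
        fn.semantic_version = semantic_version
        return fn
    return wrap


def apply_if(condition, decorator):
    """Hypothetical helper: return `decorator` if `condition` holds, else a no-op."""
    return decorator if condition else (lambda fn: fn)


def build_compute(is_incremental, semantic_version=1):
    """Define `compute` once, with exactly one of the two decorators applied."""
    @apply_if(is_incremental, incremental(semantic_version=semantic_version))
    @apply_if(not is_incremental, configure(profile=SNAPSHOT_PROFILE))
    def compute(input1, input2):
        return (input1, input2)  # placeholder for the real transform logic
    return compute


# The choice is made when the function is defined, not when it is called.
incremental_compute = build_compute(is_incremental=True, semantic_version=2)
snapshot_compute = build_compute(is_incremental=False)
```

As in the two-branch version, the decision is taken when the module is loaded, so switching modes still requires a code change and a new commit.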