google-cloud-platform prefect

Prefect changes the value of a variable when writing to Google Cloud Storage


Hi, I am beginning to think that Prefect is changing the value of a variable. Here is the situation:

When writing to Google Cloud Storage with upload_from_path, I pass the same variable, path, as both from_path and to_path, but for some reason Prefect changes the structure of the to_path value. Here is the code that builds the path and uploads the file:

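from pathlib import Path

import pandas as pd
from prefect import task
from prefect_gcp.cloud_storage import GcsBucket


@task()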
def write_local(df: pd.DataFrame, color: str, dataset_file: str) -> Path:
    """Write DataFrame out locally as parquet file"""
    Path(f"data/{color}").mkdir(parents=True, exist_ok=True)
    path = Path(f"data/{color}/{dataset_file}.parquet")
    df.to_parquet(path, compression="gzip")
    return path


@task
def write_gcs(path: Path) -> None:
    """Upload local parquet file to GCS"""
    gcs_block = GcsBucket.load("zoom-gcs")
    gcs_block.upload_from_path(from_path=path, to_path=path)
    return

As you can see, in the second task, write_gcs, both arguments are the same variable, path, which originally holds the value 'data/yellow/yellow_tripdata_2021-01.parquet'. The Prefect flow runs, but in the flow run details (the first picture I am attaching) the path used for GCS has changed to 'data\\yellow\\yellow_tripdata_2021-01.parquet'. Because of this, as picture 1 also shows, the file is saved under that odd name instead of inside data/yellow/ folders in GCS. Any idea why this is happening?

[Images: flow run, file in GCS]


Solution

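  • On Windows, pathlib renders a Path with backslashes, while GCS object names use forward slashes to represent folders. You may need to add .as_posix() to the Path variable passed as to_path.

    Also, you may need to ensure you are using prefect-gcp 0.2.6 or newer.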
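To illustrate the .as_posix() suggestion, here is a minimal sketch that uses only the standard library. It assumes the path from the question and uses PureWindowsPath so that the Windows behavior can be reproduced on any operating system; the variable names are illustrative and not part of Prefect.

```python
from pathlib import PureWindowsPath

# PureWindowsPath mimics how a Path behaves on Windows, on any OS.
# The value below is the path from the question.
local_path = PureWindowsPath("data/yellow/yellow_tripdata_2021-01.parquet")

# str() uses the Windows separator, which GCS treats as part of the file name.
windows_style = str(local_path)

# as_posix() returns the same path with forward slashes.
gcs_style = local_path.as_posix()

print(windows_style)
print(gcs_style)
```

In write_gcs, this would likely mean keeping from_path=path for the local file and passing to_path=path.as_posix() for the object name, assuming the installed prefect-gcp version does not already normalize it.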