google-cloud-platform google-cloud-dataprep

Can Google Cloud Dataprep monitor a GCS path for new files?


Google Cloud Dataprep seems great, and we've used it to manually import static datasets. However, I would like to run it more than once so that it can consume new files uploaded to a GCS path. I can see that you can set up a schedule for Dataprep, but I cannot see anywhere in the import setup how it would process new files.

Is this possible? It seems like an obvious need, so hopefully I've just missed something.


Solution

  • You can add a GCS path as a dataset by clicking the + icon to the left of the folder while importing the dataset (see screenshot). When you set up a scheduled job for a flow that uses this dataset, all files in that directory (including new files) will be picked up on each scheduled job run.

    [Screenshot: the + icon to the left of a folder in the dataset import view]
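To make the behaviour concrete, here is a rough sketch in plain Python of what "all files in that directory" means across two runs. It is only a model of the behaviour described above, not Dataprep's implementation: the function `files_for_run`, the folder name `incoming`, and the file names are all made up for illustration, and matching by path prefix (which would also include nested subfolders) is an assumption.

```python
def files_for_run(object_names, folder_prefix):
    """Return the object names under the folder at the time of a run.

    Hypothetical helper: models 'every file in the folder is read on
    each run' as a simple prefix match on object names.
    """
    if not folder_prefix.endswith("/"):
        folder_prefix += "/"
    return sorted(
        name
        for name in object_names
        # Skip the folder placeholder object itself, if one exists.
        if name.startswith(folder_prefix) and name != folder_prefix
    )


# Hypothetical bucket contents at the time of the first scheduled run.
bucket = [
    "incoming/2019-01-01.csv",
    "incoming/2019-01-02.csv",
    "other/readme.txt",
]
first_run = files_for_run(bucket, "incoming")

# A new file is uploaded before the next scheduled run.
bucket.append("incoming/2019-01-03.csv")
second_run = files_for_run(bucket, "incoming")

print(first_run)
print(second_run)
```

Note that in this model the second run reads the two earlier files again as well as the new one, which is what "all files in that directory" implies. If you only want each file processed once, you would need to handle that separately, for example by moving processed files out of the folder.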