google-cloud-platformgoogle-cloud-dataflowgoogle-cloud-dataprocgoogle-cloud-dataprep

Dataprep vs Dataflow vs Dataproc


To perform source data preparation, data transformation or data cleansing, in what scenario should we use Dataprep vs Dataflow vs Dataproc?


Solution

  • Data preparation/transformation/cleaning tasks can all be seen as ETL processes, implementable with any of the products you mention. This older answer covers the basics of the Dataflow vs Dataproc question and includes this link which summarises what you should keep in mind when choosing between these three.

    In brief, you should consider familiarity (have you already worked with Hadoop-ecosystem tools? the beam programming model? would you rather work via a UI?) and desired level of control (dataproc allows more control over the cluster, dataflow and dataprep are fully managed services).

    More good reads: