When using the import functionality within Dataprep to import a BigQuery (BQ) table that has multiple columns and millions of rows, are there any options to simplify the dataset?
Can you choose the columns and parameterise the BigQuery import before wrangling the dataset?
Is my only option to create a view in BQ first, to reduce the number of rows and columns?
Ideally, I want to minimise the cost of the Dataflow job that runs when I execute the output of any recipe that uses this table, and avoid a 'SELECT *' step.
Any tips would be appreciated.
For now, it's not possible to exclude columns before wrangling the dataset. Using a View is a good choice if you want to reduce query cost and processing time.
In the query that defines your View, you can select only the columns you need and add WHERE conditions to reduce the amount of data as much as possible; a sketch is shown below.
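For example, a minimal sketch of such a View, assuming a hypothetical source table `my_project.my_dataset.big_table` with columns `id`, `event_date`, and `amount` (replace these with your own names):

    -- Hypothetical names: substitute your own project, dataset,
    -- table, columns, and filter conditions.
    CREATE OR REPLACE VIEW `my_project.my_dataset.big_table_slim` AS
    SELECT
      id,          -- keep only the columns the recipe needs
      event_date,
      amount
    FROM `my_project.my_dataset.big_table`
    WHERE event_date >= '2020-01-01'  -- filter rows before wrangling
      AND amount > 0;

Importing this View into Dataprep instead of the base table means the job only sees the reduced column set, and the WHERE conditions cut the rows it has to process (and, if the table is partitioned on the filter column, the bytes scanned as well).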
You can also upgrade the Dataflow machine type used to run the job; this could reduce execution time and possibly the overall cost.