
How can I get a list of columns from Dataprep?


I used GCP Dataprep to transform some data and store the results as partitioned CSVs in Google Cloud Storage. Those CSVs are stored without headers, so to load them into BigQuery I need a file that specifies the schema. However, I don't currently have an exact list of the columns that the Dataprep transformation created. Is there a tool in Dataprep that will provide a list of those column names? Or, even better, is there a tool that will provide the schema in JSON format?


Solution

  • Open the Recipe panel. There you can review and modify the steps of the recipe you have already created, and add new steps at the current location, for example a step that supplies a schema for your partitioned data. https://cloud.google.com/dataprep/docs/html/Recipe-Panel_57344894

    When building a recipe, you can also associate a target with it. A target represents the set of columns that your recipe is being built to match. Its schema is displayed in the Target Matching bar on the Transformer page, above the column histograms, so you can track your progress toward completing the recipe. https://cloud.google.com/dataprep/docs/html/Create-Target_118947842#create-schema
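
    Once you have read the column names off the Transformer page, a small script can turn them into the JSON schema file that BigQuery expects. The following is only a sketch and not a Dataprep feature: the column names, the helper `build_bq_schema`, and the default type `STRING` are all hypothetical placeholders, so substitute the names and types from your own recipe.

```python
import json


def build_bq_schema(columns, default_type="STRING"):
    """Build a BigQuery-style schema (a list of field dicts).

    `columns` is a list whose items are either a column name (str)
    or a (name, type) tuple. Names without a type get `default_type`.
    """
    schema = []
    for col in columns:
        if isinstance(col, tuple):
            name, col_type = col
        else:
            name, col_type = col, default_type
        schema.append({"name": name, "type": col_type, "mode": "NULLABLE"})
    return schema


# Hypothetical column list, in the same order as the columns in the CSVs.
columns = ["order_id", ("order_total", "FLOAT"), ("order_date", "DATE")]

schema = build_bq_schema(columns)
schema_json = json.dumps(schema, indent=2)

# Print the schema; redirect this output to a file such as schema.json.
print(schema_json)
```

    Because the CSVs have no header row, the order of the entries must match the column order in the files. The saved file can then be passed as the last argument to a load command along the lines of `bq load --source_format=CSV mydataset.mytable "gs://mybucket/path/*.csv" ./schema.json`, where the dataset, table, and bucket names are placeholders.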