I'm using the Cromwell engine on Google Cloud, which submits pipeline run requests: https://cloud.google.com/genomics/reference/rest/v1alpha2/pipelines/run.
Once the pipelines have finished, I am then able to find the Google Cloud operations associated with each pipeline via the labels. However, I can't determine their cost. The Google Cloud billing logs only list the compute engine bills, but they don't show a connection between the compute engine instances and the genomics operations, so I can't work out how to calculate the cost.
How can I calculate the cost of a Google Cloud Genomics pipeline?
It turns out that if you run the pipeline with the correct labels specified (explained here in the API docs), you can filter the billing logs using these labels. In my case, the Cromwell engine was doing this automatically for me, so I didn't have to do anything extra.
To analyse the bills, you have to export the billing data to BigQuery. Exporting to a file won't work, because the file export doesn't include the required label fields.
Once the bills have loaded into BigQuery (this took about 4-5 hours for me), you can run the following query:
SELECT SUM(cost)
FROM `PipelineBilling.gcp_billing_export_v1_BILLING_ACCOUNT_ID`, UNNEST(labels) as l
WHERE l.key = 'cromwell-workflow-id' AND l.value = 'cromwell-MY-WORKFLOW-ID'
This will return a single number: the total cost of the pipeline whose cromwell-workflow-id label has the value cromwell-MY-WORKFLOW-ID (the label key and value will be different if you're not using Cromwell).
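If you'd rather run the same query from a script, here is a minimal sketch in Python. It assumes the google-cloud-bigquery client library is installed and that application default credentials are set up. The function names build_cost_query and workflow_cost are my own (hypothetical helpers, not part of Cromwell or any Google API). The label key and value are passed as query parameters rather than pasted into the SQL string.

```python
import re


def build_cost_query(table: str) -> str:
    """Build the cost query for a billing export table (hypothetical helper).

    The label key and value are left as named parameters (@label_key,
    @label_value). Table names cannot be query parameters, so the name is
    checked before being interpolated into the SQL.
    """
    if not re.fullmatch(r"[A-Za-z0-9_.\-]+", table):
        raise ValueError(f"unexpected table name: {table!r}")
    return (
        "SELECT SUM(cost) AS total_cost\n"
        f"FROM `{table}`, UNNEST(labels) AS l\n"
        "WHERE l.key = @label_key AND l.value = @label_value"
    )


def workflow_cost(table: str, label_key: str, label_value: str):
    """Run the query and return the summed cost (None if no rows match)."""
    # Imported here so build_cost_query works without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("label_key", "STRING", label_key),
            bigquery.ScalarQueryParameter("label_value", "STRING", label_value),
        ]
    )
    rows = client.query(build_cost_query(table), job_config=job_config).result()
    # The query has no GROUP BY, so it yields exactly one row.
    return next(iter(rows)).total_cost
```

With the names from the query above, the call would look like workflow_cost("PipelineBilling.gcp_billing_export_v1_BILLING_ACCOUNT_ID", "cromwell-workflow-id", "cromwell-MY-WORKFLOW-ID"). SUM over zero matching rows is NULL in BigQuery, so a result of None most likely means the label didn't match anything, or the bills haven't loaded yet.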