google-bigquery

Generate a random value from an array in Google BigQuery standard SQL


I'm looking to anonymize some marketing data for visualization purposes. To do this, I've created several arrays that replace existing campaign names with anonymized values, such as Campaign 1, Campaign 2, Campaign 3, etc...

I got this working in a PostgreSQL database using the following query:

(ARRAY['Campaign 1'::text, 'Campaign 2'::text, 'Campaign 3'::text, 'Campaign 4'::text, 'Campaign 5'::text])[(floor(random() * 5::double precision) + 1::double precision)] AS campaign_name

This does what I need in that it randomly assigns a Campaign the numbers 1-5 for each row of data.

I tried to mimic this in BigQuery but am having trouble. I switched random to rand but that did not work. Anyone done something like this before?


Solution

  • Below is for BigQuery Standard SQL (one of quite many options). Just an example using generated dummy data.

    #standardSQL
    WITH
      `project.dataset.dummy_data` AS (
        SELECT id 
        FROM UNNEST(GENERATE_ARRAY(1, 100)) AS id
      ),
      tmp AS (
        SELECT ['Campaign 1','Campaign 2','Campaign 3','Campaign 4','Campaign 5'] AS campaigns
      )
    SELECT
      id,
      tmp.campaigns[OFFSET(CAST(5 * RAND() - 0.5 AS INT64))] AS campaign_name
    FROM `project.dataset.dummy_data`, tmp
    

    Above is assuming that for some reason you want that array to be set . The version below shows that you can avoid it completely.

    #standardSQL
    WITH `project.dataset.dummy_data` AS (
      SELECT id
      FROM UNNEST(GENERATE_ARRAY(1, 100)) AS id
    )
    SELECT
      id,
      CONCAT('Campaign ', CAST(1 + CAST(5 * RAND() - 0.5 AS INT64) AS STRING)) AS campaign_name
    FROM `project.dataset.dummy_data`