I'm looking to anonymize some marketing data for visualization purposes. To do this, I've created several arrays that replace existing campaign names with anonymized values, such as Campaign 1, Campaign 2, Campaign 3, etc...
I got this working in a PostgreSQL database using the following query:
(ARRAY['Campaign 1'::text, 'Campaign 2'::text, 'Campaign 3'::text, 'Campaign 4'::text, 'Campaign 5'::text])[(floor(random() * 5::double precision) + 1::double precision)] AS campaign_name
This does what I need in that it randomly assigns a Campaign the numbers 1-5 for each row of data.
I tried to mimic this in BigQuery but am having trouble. I switched random to rand but that did not work. Anyone done something like this before?
Below is for BigQuery Standard SQL (one of quite many options). Just an example using generated dummy data.
#standardSQL
WITH
`project.dataset.dummy_data` AS (
SELECT id
FROM UNNEST(GENERATE_ARRAY(1, 100)) AS id
),
tmp AS (
SELECT ['Campaign 1','Campaign 2','Campaign 3','Campaign 4','Campaign 5'] AS campaigns
)
SELECT
id,
tmp.campaigns[OFFSET(CAST(5 * RAND() - 0.5 AS INT64))] AS campaign_name
FROM `project.dataset.dummy_data`, tmp
Above is assuming that for some reason you want that array to be set . The version below shows that you can avoid it completely.
#standardSQL
WITH `project.dataset.dummy_data` AS (
SELECT id
FROM UNNEST(GENERATE_ARRAY(1, 100)) AS id
)
SELECT
id,
CONCAT('Campaign ', CAST(1 + CAST(5 * RAND() - 0.5 AS INT64) AS STRING)) AS campaign_name
FROM `project.dataset.dummy_data`