sql, database, postgresql, dummy-data

Inserting dummy data into Postgres table


I have got the following Postgres table:

create table test (
    id serial, 
    contract varchar, 
    amount0 int, 
    amount1 int, 
    price double precision
);

I would like to insert 100 rows of dummy data that conforms to the following:

– Column 'contract' should contain only the values 'abc', 'klm' and 'xyz'.

– Columns 'amount0' and 'amount1' should contain only the integer values 50, 60, 80, 100 and 200.

– Column 'price' should contain only the values 1.5, 1.8, 2.1 and 2.5.

What I have come up with so far is this:

INSERT INTO test (amount0, amount1, price)
SELECT
  (SELECT val FROM unnest(ARRAY[50, 60, 80, 100, 200]) AS val ORDER BY random() LIMIT 1),
  (SELECT val FROM unnest(ARRAY[50, 60, 80, 100, 200]) AS val ORDER BY random() LIMIT 1),
  (SELECT val FROM unnest(ARRAY[1.5, 1.8, 2.1, 2.5]) AS val ORDER BY random() LIMIT 1)
FROM generate_series(1, 100);

But this doesn't do the trick: every inserted row ends up with the same values, '100' for 'amount0', '50' for 'amount1' and '2.1' for 'price'.

Can anyone help me out?


Solution

  • You can randomly subscript the array of choices: (demo)

    INSERT INTO test (contract,amount0, amount1, price)
    SELECT (ARRAY['abc','klm','xyz']   )[(random()*2+1)::int],
           (ARRAY[50, 60, 80, 100, 200])[(random()*4+1)::int],
           (ARRAY[50, 60, 80, 100, 200])[(random()*4+1)::int],
           (ARRAY[1.5, 1.8, 2.1, 2.5]  )[(random()*3+1)::int]
    FROM generate_series(1, 1e2, 1);
    
    SELECT id, contract,amount0, amount1, price FROM test LIMIT 6;
    
     id | contract | amount0 | amount1 | price
    ----+----------+---------+---------+-------
      1 | klm      |      50 |     100 |   2.1
      2 | xyz      |     100 |     100 |   1.5
      3 | abc      |      80 |     100 |   1.8
      4 | xyz      |      80 |      80 |   2.1
      5 | xyz      |     200 |      50 |   1.8
      6 | xyz      |      60 |      60 |   1.8

    In PostgreSQL 16+, there's also array_sample(). Its intent is clearer, it looks a bit cleaner, and it lets you sample multiple, non-consecutive elements. A potential downside is that you can't make it repeatable with setseed(): (demo2)

    INSERT INTO test (contract,amount0, amount1, price)
    SELECT (array_sample(ARRAY['abc','klm','xyz']   ,1))[1],
           (array_sample(ARRAY[50, 60, 80, 100, 200],1))[1],
           (array_sample(ARRAY[50, 60, 80, 100, 200],1))[1],
           (array_sample(ARRAY[1.5, 1.8, 2.1, 2.5]  ,1))[1]
    FROM generate_series(1, 1e2, 1);
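
    For comparison, the random()-subscript approach above can be made repeatable, because setseed() fixes the stream that subsequent random() calls in the same session draw from. A small sketch (the seed value 0.42 is arbitrary):

    SELECT setseed(0.42);  -- any value in [-1, 1]; fixes the stream of later random() calls

    SELECT (ARRAY['abc','klm','xyz'])[(random()*2+1)::int]
    FROM generate_series(1, 5);
    -- re-running both statements, in order, reproduces the same 5 picks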
    

    Note that this does what you intended: each selection is independently random. As a consequence, it's quite likely that you'll get duplicate combinations. If you'd rather keep getting unique combinations until you run out, and only then start over, you can do something similar to Mike's suggestion: (demo2)

    INSERT INTO test (contract,amount0, amount1, price)
    SELECT contract,amount0, amount1, price
    FROM (values (50),(60),(80),(100),(200)) a(amount0),
         (values (50),(60),(80),(100),(200)) b(amount1),
         (values (1.5), (1.8), (2.1), (2.5)) c(price),
         (values ('abc'),('klm'),('xyz')   ) d(contract),
         generate_series(1, 1e2, 1) duplicator(n)
    order by n, random()
    limit 100;
    

    Here's a function if you find yourself in need of a weighted random pick, meaning that you want to define exactly how likely each option is compared to the others.
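
    As a rough illustration only (this is not the linked function; the name weighted_pick and its signature are made up here), one way to do a weighted pick in plain SQL is to give each option the sort key random() ^ (1 / weight) and take the largest, which selects an option with probability proportional to its weight:

    CREATE FUNCTION weighted_pick(options text[], weights float8[])
    RETURNS text LANGUAGE sql VOLATILE AS $$
      -- Each option gets the key random() ^ (1 / weight); the largest key wins
      -- with probability proportional to its weight (weights must be > 0).
      SELECT o
      FROM unnest(options, weights) AS t(o, w)
      ORDER BY random() ^ (1.0 / w) DESC
      LIMIT 1;
    $$;

    -- 'abc' about 70% of the time, 'klm' 20%, 'xyz' 10%:
    SELECT weighted_pick(ARRAY['abc','klm','xyz'], ARRAY[0.7, 0.2, 0.1]::float8[])
    FROM generate_series(1, 10);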


    The reason your solution didn't work is that your scalar subqueries don't depend in any way on the outer query, so they were evaluated only once and their results re-used for every row. You can check that in the plan by running the statement with explain analyze verbose. You could trick the planner into re-evaluating them by adding an outside reference to each subquery, even one that does nothing (demo3), but the approach above achieves the same in less code.
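
    For illustration, such an outside reference could look something like this (a sketch, not necessarily what the demo shows): referencing the generate_series column inside each subquery turns it into a correlated subplan that is re-run for every row.

    INSERT INTO test (amount0, amount1, price)
    SELECT
      (SELECT val FROM unnest(ARRAY[50, 60, 80, 100, 200]) AS val
       ORDER BY random() + g*0 LIMIT 1),  -- g*0 changes nothing, but correlates the subquery
      (SELECT val FROM unnest(ARRAY[50, 60, 80, 100, 200]) AS val
       ORDER BY random() + g*0 LIMIT 1),
      (SELECT val FROM unnest(ARRAY[1.5, 1.8, 2.1, 2.5])   AS val
       ORDER BY random() + g*0 LIMIT 1)
    FROM generate_series(1, 100) AS g;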