sqlazureazure-databricksdatabricks-sqlazure-notebooks

Using CROSS JOIN SEQUENCE to produce large duplicated tables


I have the following in an Azure notebook (databricks sql):

CREATE TABLE my_new_big_table AS
  SELECT t.*
  FROM my_table t
  CROSS JOIN VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10) v(i);

This duplicates the my_table, 10 times, however how can I use a loop (SEQUENCE, EXPLORE) to create iterations of 100, 1000 etc?


Solution

  • This is my sample data:

    enter image description here

    With your code, I got the output like below which have every row repeated 10 times.

    enter image description here

    To do the above dynamically, you can use range(start,end) like below.

    %sql
    CREATE TABLE result2 AS
      SELECT t.* 
      FROM sample1 as t
      CROSS JOIN (select * from range(0,10)) v(i);
    select * from result2;
    

    My Result:

    enter image description here

    (Or)

    Use sequence(start,end) with explode() as discussed by community in comments.

    create table result3 as
    SELECT t.* 
    FROM sample1 as t 
    CROSS JOIN (select explode(sequence(1,10))) v(i);
    select * from result3;
    

    enter image description here