sqlsql-serveradventureworksnorthwind

How to properly generate big random SQL data respecting primary and foreign keys?


Recently I've got a task to generate big testing database built upon this E-R diagram : https://i.sstatic.net/I2kr9.png

I need to have over 300,000 rows (combined) and it was easy generating tables Customer, Supplier and Product via Excel by using its random functions but I'm out of idea how to properly create Order and OrderItem tables since I need to multiply every UnitPrice and Quantity from multiple rows (with matching IDs) in order to get TotalAmount which is located in the other table, and of course every PK and FK need to completely match.

I know it's a dummy question but any small tip would be helpful, it doesn't matter if I need to create directly through SQL scripts, Excel or any other way.

Thank you in advance!


Solution

  • :) Check out the sample Northwind database. It already has those tables in it. Field names and count do not match 1-to-1 but easy to edit.

    All you need is to proliferate the rows with simple inserts. ie: Double the customers:

    insert into customers (FirstName, LastName, City, Country)
    select FirstName+'2', LastName+'2', City, Country
    from Customers; 
    

    300 K rows is not big at all, it is small in fact.

    PS: I assumed you would change Id for customers to an int identity column. In Northwind it is character data.

    EDIT: Code I promised. It is ugly I know:

    -- Create a tally table
    DECLARE @tally TABLE (n INT);
    
    INSERT INTO @tally (n)
    SELECT TOP(300000) ROW_NUMBER() OVER (ORDER BY t1.Object_ID)
                FROM Master.sys.All_Columns t1
                CROSS JOIN Master.sys.All_Columns t2;
    
    -- 300K in total. But let's say we want 300K order items
    -- Aprx. 1000 customers, each with 50 orders * 6 items per order 
    create table Customers (Id int identity primary key, 
        FirstName varchar(15), 
        LastName varchar(15), 
        City varchar(15), 
        Country varchar(15), 
        Phone varchar(15) );
    
    create table Orders (Id int identity primary key, 
        OrderDate datetime, 
        OrderNumber int, 
        CustomerId int foreign key references Customers(Id), 
        TotalAmount money null);
    
    create table Suppliers (id int identity primary key, 
        CompanyName varchar(15), 
        ContactName varchar(15), 
        ContactTitle varchar(15), 
        City varchar(15), 
        Country varchar(15), 
        Phone varchar(15), 
        Fax varchar(15));
    
    create table Products (Id int identity primary key, 
        ProductName varchar(50), 
        SupplierId int foreign key references Suppliers(Id), 
        UnitPrice money, 
        Package varchar(20),
        IsDiscontinued bit);
    
    create table OrderItems (Id int identity primary key, 
        OrderId int foreign key references Orders(Id),
        ProductId int foreign key references Products(Id),
        UnitPrice money,
        Quantity int);
    
    
    INSERT INTO Customers
    (
        FirstName,
        LastName,
        City,
        Country,
        Phone
    )
    SELECT top 1000 'FirstName'+CAST(n AS VARCHAR(6)),
        'LastName'+CAST(n AS VARCHAR(6)),
        'City'+CAST(n%10 AS VARCHAR(6)),
        'Country'+CAST(n%100 AS VARCHAR(6)),
        'Phone'+cast(n as varchar(6))
    from @tally;
    
    insert into Orders (OrderDate, OrderNumber, CustomerId)
    select t.d, t.n, c.Id
    from customers c
    cross apply (select top(50) n, dateadd(day, -n, getdate()) from @tally) t(n, d);
    
    insert into Suppliers (CompanyName, 
        ContactName, 
        ContactTitle, 
        City, 
        Country, 
        Phone, 
        Fax)
    SELECT top 10 'Company'+CAST(n AS VARCHAR(6)),
        'Contact'+CAST(n AS VARCHAR(6)),
        'Title'+CAST(n AS VARCHAR(6)),
        'City'+CAST(n%10 AS VARCHAR(6)),
        'Country'+CAST(n%100 AS VARCHAR(6)),
        'Phone'+cast(n as varchar(6)),
        'Fax'+cast(n as varchar(6))
    from @tally;
    
    with ts(n, sId) as (
    select t.n, s.Id
    from
    (SELECT top(500) n from @tally) t, Suppliers s
    )
    insert into Products (ProductName, 
        SupplierId, 
        UnitPrice, 
        Package,
        IsDiscontinued)
    SELECT top(5000) 'Product'+CAST(n AS VARCHAR(6)),
    sId,
    n * 10,
    'Package'+CAST(n%5 AS VARCHAR(6)),
    case when n%1500 = 0 then 1 else 0 end
    from ts order by newid();
    
    
    with pdata (oid, pid) aS (
    select top(300*1000) 
     abs(cast(checksum(newid()) as bigint)) % 50000 + 1,
     abs(cast(checksum(newid()) as bigint)) % 5000 + 1
    from @tally
    order by newId())
    insert into OrderItems
        (OrderId,
        ProductId,
        UnitPrice,
        Quantity)
    select  d.oid, d.pid, p.UnitPrice, abs(cast(checksum(newid()) as bigint)) % 20 + 1 
    from pData d inner join Products p on d.pid = p.id
    order by d.oid, d.pid;