Tuesday, June 28, 2016

Sample Data Generator

OK, so sometimes I need to really quickly make a sample table to destroy or modify or whatever.

When that's true, I use a form of the below to do so.

There's a couple of things I'd like to point out about this script.

First, you can run this in any database you have create permissions in, and it really doesn't get that big if you don't want it to. You want a larger table? Fine! Increase the value of SELECT TOP 123 to SELECT TOP 1234567 in the section labelled "Make the sample table"


Does our sample table exist? yes? nuke it. Then, create and fill it in one go, followed by a primary key.


IF OBJECT_ID('dbo.numbers') IS NOT NULL DROP TABLE dbo.numbers;
GO

SELECT TOP 123 N=IDENTITY(INT, 1, 1), NULL AS somenum
INTO dbo.Numbers
FROM master.dbo.syscolumns a CROSS JOIN master.dbo.syscolumns  b;

ALTER TABLE dbo.Numbers ADD CONSTRAINT NBR_pk PRIMARY KEY(N);
GO

Now comes the fun part, where we update back with a quasi-random number. I say "quasi-random" because, with the batch size set at 500, and an initial set size of 123, from above, all the rows are going to get updated to the same value at the same time.

This is intentional, as when I was developing this, I needed to work on a grouping / windowing function, and I needed things that I could group, that I knew would group, into groups that I knew existed.

If you want something closer to a true random value for each row, set the batch size to 1, and run the script. It'll take longer, sure, but it'll look quite a bit more random.


-- declare the variables
DECLARE @batchsize BIGINT = 500;
DECLARE @currID BIGINT = 0;
DECLARE @maxID BIGINT;

-- set the loop limit
SELECT @maxID = MAX(N) FROM dbo.numbers;

-- loop over the inside query using the loop limiter to control, and
-- the batch size to control the size of "bites" we take out of the target
WHILE @currid < @maxID
BEGIN
 UPDATE dbo.numbers
 SET somenum = RAND()*100
 WHERE N > @currID AND N <= (@currID + @batchsize);

 SET @currID = @currID + @batchsize;

END
GO

-- prove it worked
SELECT TOP 5000 * FROM dbo.Numbers WHERE n IN (1,12,123,1234,12345,123456);
GO

No comments:

Post a Comment