# Learn to Generate Random Data Using SQL

Learning to generate random data using SQL is an incredible skill and a massive time-saver.

We have a total of 4 articles on this topic.

- Learn to generate data using SQL
- Learn to generate
**random**data using SQL (You are reading now) - Learn to generate
**sample**data using SQL - Learn to
**split**a dataset for testing and training using SQL

Generating random data is beneficial for a multitude of reasons, particularly when it comes to testing, development, and benchmarking. Frequently, there is a requirement to generate random data for these purposes.

For instance, you may need to populate data in your local development environment, or you might want to evaluate query optimization techniques and necessitate a substantial table for that purpose.

## The `RANDOM`

function

To generate random numbers, PostgreSQL offers the "random" function. This function produces a value ranging from 0 to 1:

`SELECT random();`

output:

```
random
0.604956973344088
```

Note: Executing `random()`

function multiple times, and on each occasion, you will receive a unique random number between 0 and 1.

## Generating random integers

As observed in the preceding example, the `random`

function generates a random number that is greater than or equal to 0 and less than 1. Unfortunately, it does not accept any arguments to control the range. So, what if we require a random integer?

To generate values within different ranges, we can incorporate the `random`

function into an expression. For instance, to generate a random floating-point number between 0 and 99, we can multiply the output of `random()`

by `100`

.

```
# Generates a random float between 0 and 99
SELECT random() * 100;
```

output:

```
?column?
45.9825816098601
```

If our aim is to generate a random integer ranging from 0 to 99, we can accomplish this by rounding down the number using the `floor`

function:

```
# Generate random number between 0 and 99
SELECT floor(random() * 100) as n;
```

output:

```
n
35
```

By implementing the `floor`

function, the outcome of the expression will be rounded down, yielding an integer rather than a floating-point value.

To generate a random integer within the range of 10 to 99, we can expand the expression as follows:

```
# Generate a random integer between 10 and 99
SELECT 10 + floor(random() * 90) as n;
```

output:

```
n
83
```

The above expression utilizes the previously mentioned random expression and adds `10`

to it. The random expression, `floor(random() * 90)`

, generates a random integer ranging from 0 to 89. By adding `10`

to the result of this expression, we obtain an integer within the range of 11-99.

## Converting float to integer

In the previous section, we used the floor function to convert a floating-point number into an integer. However, there are two additional functions that can be utilized for this purpose.

`floor`

: This function rounds a float down to the nearest integer. For example,`floor(2.6) = 2`

.`round`

: This function rounds a float to the nearest integer. For instance,`round(2.6) = 3`

and`round(2.4) = 2`

.`ceil`

: This function rounds a float up to the nearest integer. For example,`ceil(2.4) = 3`

.

To illustrate the usage of each of these functions, we can use the following query:

```
# Different ways to round a number
SELECT
n,
floor(n) as _floor_,
round(n) as _round_,
ceil(n) as _ceil_
FROM (VALUES
(2.0),
(2.2),
(2.5),
(2.7),
(3.0)
) AS t(n)
```

output:

```
n _floor_ _round_ _ceil_
2.0 2 2 2
2.2 2 2 3
2.5 2 3 3
2.7 2 3 3
3.0 3 3 3
```

In the output of the query, we can observe the conversion of each function where the float is transformed into an integer. It is particularly fascinating to note the behavior of the `round`

function when the value is exactly halfway, such as `2.5`

. Based on the obtained results, it appears that PostgreSQL rounds the values of `2.5`

up to `3`

.

## Random choice

Up until now, we have explored the generation of random floating-point numbers and integers. However, there are instances when we require selecting random values from a given list of possibilities.

By employing the knowledge we have gained thus far, we can leverage the `random`

function to pick a random value from an array:

`SELECT (array['red', 'orange', 'green'])[1 + floor(random() * 3)] AS color`

output:

```
color
red
```

To begin, we establish an array called "colors" that encompasses the values `red`

, `orange`

, and `green`

. In order to select a random value from this array, we utilize the `random`

function to generate a random index ranging from 1 to 3. Please note that in PostgreSQL, array indexing commences at 1.

## Producing reproducible random data

Suppose we are conducting a performance test on an extensive table. In order to generate test data, we compose a script that creates a table and fills it with millions of rows containing random data.

Once the script is finalised, we share it with another developer. However, when they attempt to execute the script on their local environment, they obtain entirely different results. This discrepancy arises when the locally generated random data differs from the data we used.

To illustrate this scenario, execute the provided code multiple times:

```
# Generating a random float
SELECT random() as n;
```

Note: Every time you execute the above code you get a different result.

This scenario precisely illustrates what occurs when our fellow developer executes our script on their local environment—they obtain distinct outcomes.

Therefore, the question arises: How can we generate random results that can be reproduced?

In PostgreSQL, we can achieve reproducible random data by specifying the random seed. This can be accomplished by utilizing the `setseed`

function:

```
# Using setseed to generating reproducible random numbers
SELECT setseed(0.2030);
SELECT random() as n;
```

Output:

```
setseed
n
0.568829658441246
```

Believe me, we will now get the exact same value every time we execute the above query.

The `setseed`

function accepts a random "seed" number ranging from -1 to 1. The "seed" value influences the values produced in all subsequent calls to the `random`

function.

Tip: For future demonstrations or when sharing scripts containing random data, it is advisable to include the

`setseed`

function. This ensures that the results can be easily reproduced by others.

Thanks a lot for reading. We have a total of 4 articles on this topic.

- Learn to generate data using SQL
- Learn to generate
**random**data using SQL (You already read!) - Learn to generate
**sample**data using SQL - Learn to
**split**a dataset for testing and training using SQL

*Post Tags:*