ZIPF
Declaration
ZIPF(<s>, <N>, <gen>)
Description
This function returns an integer that follows the Zipf distribution with characteristic exponent s. Returned values fall in the range [0, N).
- s is the characteristic exponent. The larger the value of s, the more skewed the generated sequence is. When the sequence is plotted as a curve, the curve becomes steeper.
- s and N must be scalar values that do not change with row iteration. For example, they can be integer or floating-point constants, scalar functions, or expressions like @v1 or 1+@v3 in PL. The value of s must be in the range [1, +∞), and the value of N must be in the range [1, 16777215].
- The storage and computational resources consumed by the Zipf algorithm are related to the value of N. The space complexity of the algorithm is O(N), and the time complexity for generating each integer is O(log N). Therefore, the value of N is limited to the range [1, 16777215].
- gen is a numerical generation function, typically the RANDOM() function. If the input value is a constant, the ZIPF() function returns a constant value.
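The complexity figures above match a common way to sample a bounded Zipf distribution: precompute a cumulative table of N entries (O(N) space), then binary-search it for each draw (O(log N) time). The Python sketch below illustrates that technique only. It is not the database's implementation: the probability P(k) proportional to 1/(k+1)^s, the helper names build_zipf_cdf and zipf_from_gen, and the way a gen value is mapped to [0, 1) are all assumptions made for illustration.

```python
import bisect
import itertools
import random


def build_zipf_cdf(s, n):
    """Cumulative distribution over 0..n-1, assuming P(k) is proportional to 1/(k+1)**s."""
    weights = [1.0 / (k + 1) ** s for k in range(n)]
    total = sum(weights)
    cdf = list(itertools.accumulate(w / total for w in weights))
    cdf[-1] = 1.0  # guard against floating-point round-off in the last entry
    return cdf     # O(n) space


def zipf_from_gen(cdf, gen_value):
    """Map one integer generator value to a result in [0, len(cdf))."""
    # Hypothetical mapping: keep 53 bits so the division is exact and u < 1.
    u = (gen_value % 2**53) / 2**53
    # Binary search for the first entry greater than u: O(log n) per draw.
    return bisect.bisect_right(cdf, u)


cdf = build_zipf_cdf(1, 10)
rng = random.Random(0)

# A varying generator value yields varying results in [0, 10).
samples = [zipf_from_gen(cdf, rng.getrandbits(64)) for _ in range(6)]

# A constant generator value always maps to the same result.
constant = [zipf_from_gen(cdf, 415) for _ in range(6)]

print(samples)
print(constant)
```

Because the result depends only on the table and the generator value, a constant gen produces the same integer on every row, which matches the behavior described for constant input. The specific integer produced by this sketch is not expected to match the database's output for the same constant.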
Examples
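The following examples use the ZIPF() function to return integers that follow the Zipf distribution. The first passes RANDOM() as gen, the second passes a constant as gen and therefore returns the same value for every row, and the third passes a scalar function as s.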
SELECT ZIPF(1, 10, RANDOM()) FROM TABLE(GENERATOR(6));
+-----------------------+
| ZIPF(1, 10, RANDOM()) |
+-----------------------+
|                     4 |
|                     5 |
|                     2 |
|                     1 |
|                     0 |
|                     2 |
+-----------------------+
6 rows in set (0.002 sec)
SELECT ZIPF(1, 10, 0415) FROM TABLE(GENERATOR(6));
+-------------------+
| ZIPF(1, 10, 0415) |
+-------------------+
|                 8 |
|                 8 |
|                 8 |
|                 8 |
|                 8 |
|                 8 |
+-------------------+
6 rows in set (0.002 sec)
SELECT ZIPF(ABS(-1), 23, RANDOM()) FROM DUAL;
+-----------------------------+
| ZIPF(ABS(-1), 23, RANDOM()) |
+-----------------------------+
|                           1 |
+-----------------------------+
1 row in set (0.001 sec)
The value of s determines how skewed the distribution is. The following examples draw 10,000 values with N = 10 and s from 1 to 5. As s increases, a larger share of the results is 0:
SELECT COUNT(*), ZIPF(1, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v    |
+----------+------+
|      350 |    9 |
|      369 |    8 |
|      450 |    7 |
|      488 |    6 |
|      559 |    5 |
|      727 |    4 |
|      877 |    3 |
|     1100 |    2 |
|     1755 |    1 |
|     3325 |    0 |
+----------+------+
10 rows in set (0.003 sec)
SELECT COUNT(*), ZIPF(2, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v    |
+----------+------+
|       69 |    9 |
|       73 |    8 |
|      102 |    7 |
|      118 |    6 |
|      187 |    5 |
|      260 |    4 |
|      419 |    3 |
|      679 |    2 |
|     1632 |    1 |
|     6461 |    0 |
+----------+------+
10 rows in set (0.003 sec)
SELECT COUNT(*), ZIPF(3, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v    |
+----------+------+
|        6 |    9 |
|       12 |    8 |
|       15 |    7 |
|       35 |    5 |
|       40 |    6 |
|       77 |    4 |
|      118 |    3 |
|      292 |    2 |
|     1106 |    1 |
|     8299 |    0 |
+----------+------+
10 rows in set (0.003 sec)
SELECT COUNT(*), ZIPF(4, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v    |
+----------+------+
|        1 |    8 |
|        1 |    7 |
|        1 |    9 |
|        2 |    5 |
|        6 |    6 |
|       19 |    4 |
|       35 |    3 |
|      120 |    2 |
|      548 |    1 |
|     9267 |    0 |
+----------+------+
10 rows in set (0.003 sec)
SELECT COUNT(*), ZIPF(5, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v    |
+----------+------+
|        2 |    5 |
|        3 |    4 |
|       18 |    3 |
|       31 |    2 |
|      309 |    1 |
|     9637 |    0 |
+----------+------+
6 rows in set (0.003 sec)
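As a rough cross-check of the counts above, the Python sketch below computes the frequencies one would expect if value k is returned with probability proportional to 1/(k+1)^s. That formula is the textbook bounded Zipf distribution shifted to start at 0, and it is an assumption here, not something this page states. The helper name expected_counts is hypothetical.

```python
def expected_counts(s, n, samples):
    """Expected number of occurrences of each value 0..n-1 among `samples` draws,
    assuming P(k) is proportional to 1/(k+1)**s."""
    weights = [1.0 / (k + 1) ** s for k in range(n)]
    total = sum(weights)
    return [samples * w / total for w in weights]


# Same parameters as the queries above: N = 10, 10,000 rows, s from 1 to 5.
for s in range(1, 6):
    counts = expected_counts(s, 10, 10000)
    print(f"s={s}:", [round(c) for c in counts])
```

Under this assumption, the expected count for value 0 is about 3414 when s is 1 and about 6453 when s is 2, which is close to the observed 3325 and 6461. For s = 5 the expected counts for the largest values fall below 1, which is consistent with only six distinct values appearing in the last result.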