Skip to main content
Version: V1.0.0

ZIPF

Declaration

ZIPF(<s> , <N> , <gen>)

Description

This function returns an integer that follows the Zipf distribution, with values in the range [0, N), and the characteristic exponent of the partition is s.

  • s is the characteristic exponent. The larger the value of s, the more skewed the generated sequence is. When the sequence is plotted as a curve, the curve becomes steeper.
  • s and N must be scalar values that do not change with row iteration. For example, they can be integer or floating-point constants, scalar functions, or expressions like @v1 or 1+@v3 in PL. The value of s must be in the range [1, +∞), and the value of N must be in the range [1, 16777215].
  • The storage and computational resources consumed by the zipf algorithm are related to the value of N. The space complexity of the algorithm is O(N), and the time complexity for generating each integer is O(logN). Therefore, the value of N is limited to the range [1, 16777215].
  • gen is a numerical generation function, typically the RANDOM() function. If the input value is a constant, the zipf() function returns a constant value.

Examples

The following example uses the ZIPF() function to return an integer that follows the Zipf distribution.

SELECT ZIPF(1, 10, RANDOM()) FROM TABLE(GENERATOR(6));
+-----------------------+
| ZIPF(1, 10, RANDOM()) |
+-----------------------+
| 4 |
| 5 |
| 2 |
| 1 |
| 0 |
| 2 |
+-----------------------+
6 row in set (0.002 sec)

SELECT ZIPF(1, 10, 0415) FROM TABLE(GENERATOR(6));
+-------------------+
| ZIPF(1, 10, 0415) |
+-------------------+
| 8 |
| 8 |
| 8 |
| 8 |
| 8 |
| 8 |
+-------------------+
6 row in set (0.002 sec)

SELECT ZIPF(ABS(-1), 23, RANDOM()) FROM DUAL;
+-----------------------------+
| ZIPF(ABS(-1), 23, RANDOM()) |
+-----------------------------+
| 1 |
+-----------------------------+
1 row in set (0.001 sec)

The value of s in the ZIPF() function affects the distribution. Here is an example:

SELECT  COUNT(*), ZIPF(1, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v |
+----------+------+
| 350 | 9 |
| 369 | 8 |
| 450 | 7 |
| 488 | 6 |
| 559 | 5 |
| 727 | 4 |
| 877 | 3 |
| 1100 | 2 |
| 1755 | 1 |
| 3325 | 0 |
+----------+------+
10 row in set (0.003 sec)

SELECT COUNT(*), ZIPF(2, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v |
+----------+------+
| 69 | 9 |
| 73 | 8 |
| 102 | 7 |
| 118 | 6 |
| 187 | 5 |
| 260 | 4 |
| 419 | 3 |
| 679 | 2 |
| 1632 | 1 |
| 6461 | 0 |
+----------+------+
10 row in set (0.003 sec)

SELECT COUNT(*), ZIPF(3, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v |
+----------+------+
| 6 | 9 |
| 12 | 8 |
| 15 | 7 |
| 35 | 5 |
| 40 | 6 |
| 77 | 4 |
| 118 | 3 |
| 292 | 2 |
| 1106 | 1 |
| 8299 | 0 |
+----------+------+
10 row in set (0.003 sec)

SELECT COUNT(*), ZIPF(4, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v |
+----------+------+
| 1 | 8 |
| 1 | 7 |
| 1 | 9 |
| 2 | 5 |
| 6 | 6 |
| 19 | 4 |
| 35 | 3 |
| 120 | 2 |
| 548 | 1 |
| 9267 | 0 |
+----------+------+
10 row in set (0.003 sec)

SELECT COUNT(*), ZIPF(5, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v |
+----------+------+
| 2 | 5 |
| 3 | 4 |
| 18 | 3 |
| 31 | 2 |
| 309 | 1 |
| 9637 | 0 |
+----------+------+
6 row in set (0.003 sec)