ZIPF
声明
ZIPF(<s> , <N> , <gen>)
说明
该函数返回一个符合齐夫分布(Zipf-distributed)的整数,取值范围为 [0, N),分区的特征指数为 s。
s为特征指数,s越大生成的序列越倾斜。将序列绘制成曲线时,曲线越陡峭。s和N取值要求:必须是一个标量值,不随行迭代而变。例如,整形或浮点型常量、标量函数等,PL 里还可以是@v1、1+@v3等。s的取值范围为 [1, +∞),N的取值范围为 [1,16777215]。zipf算法实现时消耗的存储、计算资源和N的取值相关。算法空间复杂度为O(N),每生成一个整数的时间复杂度为O(logN)。所以,N的取值范围被限制为 [1, 16777215]。gen是一个数值生成函数,通常使用RANDOM()函数生成。如果传入值是一个常量,则zipf()函数返回值也为一个定值。
示例
如下示例为使用 ZIPF() 函数返回符合齐夫分布的整数。
SELECT ZIPF(1, 10, RANDOM()) FROM TABLE(GENERATOR(6));
+-----------------------+
| ZIPF(1, 10, RANDOM()) |
+-----------------------+
| 4 |
| 5 |
| 2 |
| 1 |
| 0 |
| 2 |
+-----------------------+
6 row in set (0.002 sec)
SELECT ZIPF(1, 10, 0415) FROM TABLE(GENERATOR(6));
+-------------------+
| ZIPF(1, 10, 0415) |
+-------------------+
| 8 |
| 8 |
| 8 |
| 8 |
| 8 |
| 8 |
+-------------------+
6 row in set (0.002 sec)
SELECT ZIPF(ABS(-1), 23, RANDOM()) FROM DUAL;
+-----------------------------+
| ZIPF(ABS(-1), 23, RANDOM()) |
+-----------------------------+
| 1 |
+-----------------------------+
1 row in set (0.001 sec)
ZIPF() 函数的 s 取值对分布存在影响,示例如下。
SELECT COUNT(*), ZIPF(1, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v |
+----------+------+
| 350 | 9 |
| 369 | 8 |
| 450 | 7 |
| 488 | 6 |
| 559 | 5 |
| 727 | 4 |
| 877 | 3 |
| 1100 | 2 |
| 1755 | 1 |
| 3325 | 0 |
+----------+------+
10 row in set (0.003 sec)
SELECT COUNT(*), ZIPF(2, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v |
+----------+------+
| 69 | 9 |
| 73 | 8 |
| 102 | 7 |
| 118 | 6 |
| 187 | 5 |
| 260 | 4 |
| 419 | 3 |
| 679 | 2 |
| 1632 | 1 |
| 6461 | 0 |
+----------+------+
10 row in set (0.003 sec)
SELECT COUNT(*), ZIPF(3, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v |
+----------+------+
| 6 | 9 |
| 12 | 8 |
| 15 | 7 |
| 35 | 5 |
| 40 | 6 |
| 77 | 4 |
| 118 | 3 |
| 292 | 2 |
| 1106 | 1 |
| 8299 | 0 |
+----------+------+
10 row in set (0.003 sec)
SELECT COUNT(*), ZIPF(4, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v |
+----------+------+
| 1 | 8 |
| 1 | 7 |
| 1 | 9 |
| 2 | 5 |
| 6 | 6 |
| 19 | 4 |
| 35 | 3 |
| 120 | 2 |
| 548 | 1 |
| 9267 | 0 |
+----------+------+
10 row in set (0.003 sec)
SELECT COUNT(*), ZIPF(5, 10, RANDOM()) v FROM TABLE(GENERATOR(10000)) GROUP BY v ORDER BY 1;
+----------+------+
| COUNT(*) | v |
+----------+------+
| 2 | 5 |
| 3 | 4 |
| 18 | 3 |
| 31 | 2 |
| 309 | 1 |
| 9637 | 0 |
+----------+------+
6 row in set (0.003 sec)